mark@mips.COM (Mark G. Johnson) (10/22/89)
For a "well designed" computer, how many gate delays are there in one
clock cycle?? That is, what's the ratio [(cycle time)/(gate delay)]?

Presuming for the moment that Cray & Thornton's "CDC-6600" machine was
well-designed :-), its figure of merit is 20 gate delays per clock cycle. **

Of course, wiring delay due to the speed-of-EMwaves is detrimental. So the
ratio of cycle time to gate delay gives an optimistic number for how many
gates the longest path can _really_ contain; unless the machine is teeny
tiny with negligible wiring delay.

What is the gate-delays-per-clock-cycle number for other computers?
Does anybody know, for example,
        ETA-10?
        Amdahl 580?
        IBM 801 (risc minicomputer)?
        Cray Research Y-MP?
        IBM 360/91?

Pedagogical question: is there a "correct value" for gate delays per clock
cycle that represents a good tradeoff, empirically determined over the
last 30 years, that's best in fast machines?

aside remark:
Browsing through my e-mail archives today, I came upon an old message from
1988 that purported to describe the (at the time, hypothetical)
Cray-3. A pair of listings leaped off the screen and really astonished me:

        Instruction issue rate:             1 new instruction per cycle
        # of gate delays per clock cycle:   6 gate delays per cycle
                                            ^^^
        _________ WOW !! ___________________|

** The 6600 used a really wacko design for (clock pulses + latch schematic)
   resulting, apparently, in a setup time of 5 gate delays!! {25ns out of
   100ns according to Thornton's book}. So the logic seems to have only
   had 15 gate delays to calculate results.
--
 -- Mark Johnson
    MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
    (408) 991-0208    mark@mips.com  {or ...!decwrl!mips!mark}
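The footnote's arithmetic can be checked with a short sketch. The only inputs are the figures quoted above for the 6600 (100 ns cycle, 20 gate delays per cycle, 25 ns setup time); the function names are illustrative and not taken from Thornton's book, and wiring delay is ignored, so the result is the same optimistic bound the post describes.

```python
def gate_delay_ns(cycle_ns, gates_per_cycle):
    """Time of one gate delay, given the cycle time and the cycle/gate ratio."""
    return cycle_ns / gates_per_cycle


def usable_logic_gates(cycle_ns, gates_per_cycle, setup_ns):
    """Gate delays left for logic after latch setup time is subtracted.

    Ignores wiring delay, so this is an upper bound on the real path length.
    """
    per_gate = gate_delay_ns(cycle_ns, gates_per_cycle)
    return (cycle_ns - setup_ns) / per_gate


# Figures quoted in the post for the CDC 6600.
CYCLE_NS = 100.0
GATES_PER_CYCLE = 20
SETUP_NS = 25.0

per_gate = gate_delay_ns(CYCLE_NS, GATES_PER_CYCLE)
setup_gates = SETUP_NS / per_gate
logic_gates = usable_logic_gates(CYCLE_NS, GATES_PER_CYCLE, SETUP_NS)

print(f"one gate delay: {per_gate} ns")
print(f"setup time:     {setup_gates} gate delays")
print(f"left for logic: {logic_gates} gate delays")
```

With these inputs one gate delay works out to 5 ns, the setup time to 5 gate delays, and 15 gate delays remain for logic, matching the numbers in the footnote.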
ram@shukra.Sun.COM (Renu Raman) (10/22/89)
In article <29862@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>
> Talks about gate-delay/cycle i.e. average (or maybe max) no. gates between latches.

>Cray-3. A pair of listings leaped off the screen and really astonished me:
>
>       Instruction issue rate:             1 new instruction per cycle
>       # of gate delays per clock cycle:   6 gate delays per cycle
>
> -- Mark Johnson

   Cray-1 had 8.

   renu raman                           email: ram@sun.com
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (10/23/89)
In article <29862@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>For a "well designed" computer, how many gate delays are there in one
>clock cycle?? That is, what's the ratio [(cycle time)/(gate delay)]?

        Cray 1S         8
        Cray 2          4
        Cray 3          6

The Cray-2 gets pipe results every clock, but takes two clocks per
instruction issue. So, the low ratio hurt the scalar performance.
This may represent a true lower bound.
--
Don             D.C.Lindsay     Carnegie Mellon Computer Science
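To make the issue-rate point concrete, here is a small sketch that multiplies gate delays per clock by clocks per instruction issue. The Cray-2 figures come from this post; the Cray-3 issue rate (one instruction per cycle) comes from the 1988 message quoted earlier in the thread. The dictionary and function names are illustrative. The result is counted in each machine's own gate delays, so it says nothing about absolute time across different circuit technologies.

```python
# Gate delays per clock cycle, as listed in the post.
GATES_PER_CLOCK = {"Cray 2": 4, "Cray 3": 6}

# Clocks per instruction issue: Cray 2 from this post,
# Cray 3 from the 1988 message quoted earlier in the thread.
CLOCKS_PER_ISSUE = {"Cray 2": 2, "Cray 3": 1}


def gate_delays_per_issue(machine):
    """Gate delays that elapse between successive instruction issues."""
    return GATES_PER_CLOCK[machine] * CLOCKS_PER_ISSUE[machine]


for machine in GATES_PER_CLOCK:
    print(f"{machine}: {gate_delays_per_issue(machine)} gate delays per issue")
```

On these figures the Cray-2 spends 8 of its gate delays per issue against 6 for the Cray-3, even though its clock is only 4 gate delays long.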
Singhal@proxima.Berkeley.EDU (Ashok Singhal) (10/24/89)
In article <29862@obiwan.mips.COM>, mark@mips.COM (Mark G. Johnson) writes:
> Path: pasteur!ucbvax!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!apple!mips!mark
> From: mark@mips.COM (Mark G. Johnson)
> Newsgroups: comp.arch
> Subject: Gate delays in fast computers
> Message-ID: <29862@obiwan.mips.COM>
> Date: 21 Oct 89 22:13:42 GMT
> Lines: 45
>
> For a "well designed" computer, how many gate delays are there in one
> clock cycle?? That is, what's the ratio [(cycle time)/(gate delay)]?
>

Here is a reference that answers your question:

Steven R. Kunkel and James E. Smith, "Optimal Pipelining in Supercomputers",
Proc. 13th Annual Symposium on Computer Architecture, June 1986.

The paper discusses your question in detail, including an analytical
formulation of the problem, and uses simulation to get numbers for a few
programs on a Cray-1S. They conclude that 8-10 gates per pipeline
segment is optimal. Well written paper.

Ashok
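The kind of trade-off behind such an optimum can be illustrated with a toy model. This is NOT the formulation from Kunkel and Smith's paper: the model and every parameter value below are assumptions made up for illustration, and the optimum it prints depends entirely on them. The model splits a fixed amount of logic into segments of g gate delays, charges a fixed latch overhead per segment, and makes a fraction of instructions wait for the whole pipeline to drain.

```python
import math


def time_per_instruction(g, total_gates, latch_gates, stall_fraction):
    """Average time per instruction, in gate delays, for segments of g gates.

    Toy model (an assumption, not from the cited paper):
      cycle time = g + latch_gates
      stages     = total_gates / g
      a fraction stall_fraction of instructions waits (stages - 1) extra cycles.
    """
    cycle = g + latch_gates
    stages = total_gates / g
    return cycle * (1.0 + stall_fraction * (stages - 1.0))


def best_segment_length(total_gates, latch_gates, stall_fraction):
    """Brute-force search over integer segment lengths 1..total_gates."""
    return min(
        range(1, total_gates + 1),
        key=lambda g: time_per_instruction(
            g, total_gates, latch_gates, stall_fraction
        ),
    )


# Hypothetical parameter values, chosen arbitrarily.
TOTAL_GATES = 100      # logic depth of the unpipelined path
LATCH_GATES = 3        # latch overhead per segment, in gate delays
STALL_FRACTION = 0.1   # fraction of instructions that wait for a full drain

best = best_segment_length(TOTAL_GATES, LATCH_GATES, STALL_FRACTION)

# Setting the derivative of the model to zero gives the continuous optimum.
g_star = math.sqrt(
    LATCH_GATES * STALL_FRACTION * TOTAL_GATES / (1.0 - STALL_FRACTION)
)

print(f"best integer segment length: {best} gate delays")
print(f"continuous optimum:          {g_star:.2f} gate delays")
```

Short segments pay the latch overhead more often and stall for more cycles; long segments waste clock period on every instruction. The minimum sits where the two effects balance, and it moves as the overhead or the stall fraction changes.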
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (10/24/89)
In article <29862@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>Pedagogical question: is there a "correct value" for gate delays per clock
>cycle that represents a good tradeoff, empirically determined over the
>last 30 years, that's best in fast machines?

Waser and Flynn's "Arithmetic.." has a section on this. Segmented FPU's
such as carry-look-ahead adders and multipliers using Booth's encoder w/
Wallace tree, which are segmented at 4-delay intervals, are "optimal" by
their criteria, but I think the rules may change in micros. You see a lot
of machines from the mid 60's to the mid 80's use segmented functional
units with various design choices that look familiar from reading the book.

These days, gate delay is not the single dominating factor that it once
was, so I don't know if there is a simple-minded recipe available. MIPSCo
got excellent results on the R3010, but I don't know what approach was
used. Anyway, some of the basics are in Waser and Flynn. See Ch. 6.

  Hugh LaMaster, m/s 233-9,        UUCP ames!lamaster
  NASA Ames Research Center        ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone: (415)694-6117
kyriazis@plato.rdrc.rpi.edu (George Kyriazis) (10/24/89)
In article <126676@sun.Eng.Sun.COM> ram@sun.UUCP (Renu Raman) writes:
>
>>Cray-3. A pair of listings leaped off the screen and really astonished me:
>>
>>      Instruction issue rate:             1 new instruction per cycle
>>      # of gate delays per clock cycle:   6 gate delays per cycle
>>
>> -- Mark Johnson
>
>   Cray-1 had 8.
>
>   renu raman
>

Remember also that the Crays are pipelined machines, so this is probably
not the total number of gate delays that each instruction goes through,
but the number of gate delays each instruction goes through in EACH
clock cycle.

  George Kyriazis
  kyriazis@turing.cs.rpi.edu
  kyriazis@rdrc.rpi.edu
------------------------------