[comp.arch] Gate delays in fast computers

mark@mips.COM (Mark G. Johnson) (10/22/89)

 
For a "well designed" computer, how many gate delays are there in one
clock cycle??  That is, what's the ratio [(cycle time)/(gate delay)]?
 
Presuming for the moment that Cray & Thornton's "CDC-6600" machine was
well-designed :-), its figure of merit is 20 gate delays per clock cycle. **

Of course, wiring delay (signals only propagate at the speed of EM waves)
eats into the cycle as well.  So the ratio of cycle time to gate delay gives
an optimistic number for how many gates the longest path can _really_
contain, unless the machine is teeny tiny with negligible wiring delay.

What is the gate-delays-per-clock-cycle number for other computers?  Does
anybody know, for example,
	ETA-10?
	Amdahl 580?
	IBM 801 (risc minicomputer)?
	Cray Research Y-MP?
	IBM 360/91?

Pedagogical question: is there a "correct value" for gate delays per clock
cycle that represents a good tradeoff, empirically determined over the
last 30 years, that's best in fast machines?
 
aside remark:
Browsing through my e-mail archives today, I came upon an old message
from 1988 that purported to describe the (at the time, hypothetical)
Cray-3.  A pair of listings leaped off the screen and really astonished me:

	Instruction issue rate:			1 new instruction per cycle
	# of gate delays per clock cycle:	6 gate delays per cycle
                                               ^^^
_________ WOW !! _______________________________|


** The 6600 used a really wacko design for (clock pulses + latch schematic)
   resulting, apparently, in a setup time of 5 gate delays!!  {25ns out
   of 100ns according to Thornton's book}.  So the logic seems to have
   only had 15 gate delays to calculate results.
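
The arithmetic in this footnote can be sketched as follows.  The 100ns cycle
and 25ns setup time are the figures quoted above from Thornton's book; the
5ns gate delay is only inferred from them (25ns / 5 gate delays), and the
function names are hypothetical.

```python
def gate_delays_per_cycle(cycle_time_ns, gate_delay_ns):
    """Ratio (cycle time)/(gate delay); optimistic, since wiring delay is ignored."""
    return cycle_time_ns / gate_delay_ns


def usable_logic_delays(cycle_time_ns, setup_time_ns, gate_delay_ns):
    """Gate delays left for logic once the latch setup time is subtracted."""
    return (cycle_time_ns - setup_time_ns) / gate_delay_ns


CYCLE_NS = 100.0         # CDC 6600 cycle time, as quoted in the post
SETUP_NS = 25.0          # latch setup time, as quoted in the post
GATE_NS = SETUP_NS / 5   # inferred: the setup time is said to be 5 gate delays

total = gate_delays_per_cycle(CYCLE_NS, GATE_NS)
usable = usable_logic_delays(CYCLE_NS, SETUP_NS, GATE_NS)
print(total, usable)     # 20.0 15.0
```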

-- 
 -- Mark Johnson	
 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
	(408) 991-0208    mark@mips.com  {or ...!decwrl!mips!mark}

ram@shukra.Sun.COM (Renu Raman) (10/22/89)

In article <29862@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>
> 

This is about gate delays per cycle, i.e. the average (or maybe max) number
of gates between latches.

>Cray-3.  A pair of listings leaped off the screen and really astonished me:
>
>	Instruction issue rate:			1 new instruction per cycle
>	# of gate delays per clock cycle:	6 gate delays per cycle
>
> -- Mark Johnson	

   Cray-1 had 8.

   renu raman

   email: ram@sun.com

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (10/23/89)

In article <29862@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>For a "well designed" computer, how many gate delays are there in one
>clock cycle??  That is, what's the ratio [(cycle time)/(gate delay)]?

Machine		Gate delays per clock
Cray 1S		8
Cray 2		4
Cray 3		6

The Cray-2 gets pipe results every clock, but takes two clocks per
instruction issue. So, the low ratio hurt the scalar performance.
A ratio of 4 may represent a true lower bound.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science

Singhal@proxima.Berkeley.EDU (Ashok Singhal) (10/24/89)

In article <29862@obiwan.mips.COM>, mark@mips.COM (Mark G. Johnson) writes:
> Path: pasteur!ucbvax!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!apple!mips!mark
> From: mark@mips.COM (Mark G. Johnson)
> Newsgroups: comp.arch
> Subject: Gate delays in fast computers
> Message-ID: <29862@obiwan.mips.COM>
> Date: 21 Oct 89 22:13:42 GMT
> Lines: 45
> 
> 
>  
> For a "well designed" computer, how many gate delays are there in one
> clock cycle??  That is, what's the ratio [(cycle time)/(gate delay)]?
>  

Here is a reference that answers your question:

Steven R. Kunkel and James E. Smith, "Optimal Pipelining in Supercomputers",
Proc. 13th Annual Symposium on Computer Architecture, June 1986.

The paper discusses your question in detail: it gives an analytical
formulation of the problem and uses simulation to get numbers for a few
programs on a Cray-1S.  They conclude that 8-10 gates per pipeline segment
is optimal.  Well-written paper.
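
To show why an optimum exists at all, here is a toy model.  It is NOT the
paper's formulation: the model, the names and every parameter value below
are assumptions made up for illustration.  Fewer gates per segment shortens
the clock, but the fixed latch overhead is paid in every segment and a
deeper pipe loses more cycles to hazards.  In this model the continuous
optimum works out to sqrt(overhead * hazard_rate * logic_depth).

```python
import math


def time_per_instruction(g, overhead, logic_depth, hazard_rate):
    """Average time per instruction, in gate delays, with g gates per segment.

    Toy assumptions: cycle time = g + overhead (logic plus latch overhead),
    segments = logic_depth / g, and each instruction stalls for
    hazard_rate * segments extra cycles on average.
    """
    cycle = g + overhead
    segments = logic_depth / g
    return cycle * (1.0 + hazard_rate * segments)


def best_gates_per_segment(overhead, logic_depth, hazard_rate, candidates=range(1, 41)):
    """Brute-force search for the g that minimises time per instruction."""
    return min(
        candidates,
        key=lambda g: time_per_instruction(g, overhead, logic_depth, hazard_rate),
    )


# Made-up parameters, chosen only so the shape of the tradeoff is visible.
OVERHEAD = 4        # latch overhead per segment, in gate delays
LOGIC_DEPTH = 80    # total gate delays of logic to be pipelined
HAZARD_RATE = 0.2   # average stall cycles per instruction, per segment

best = best_gates_per_segment(OVERHEAD, LOGIC_DEPTH, HAZARD_RATE)
closed_form = math.sqrt(OVERHEAD * HAZARD_RATE * LOGIC_DEPTH)
print(best, closed_form)
```

The result depends entirely on the invented parameters, so it says nothing
about the paper's 8-10 figure; the point is only that overhead pushes the
optimum up and hazards push it up further, while raw clock rate pulls it down.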

Ashok

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (10/24/89)

In article <29862@obiwan.mips.COM> mark@mips.COM (Mark G. Johnson) writes:

>Pedagogical question: is there a "correct value" for gate delays per clock
>cycle that represents a good tradeoff, empirically determined over the
>last 30 years, that's best in fast machines?

Waser and Flynn's "Arithmetic.." has a section on this.  Segmented FPUs
such as carry-lookahead adders, and multipliers using a Booth encoder with a
Wallace tree, segmented at 4-delay intervals, are "optimal" by their criteria,
but I think the rules may change in micros.  A lot of machines from the
mid-60s to the mid-80s use segmented functional units with various
design choices that look familiar from reading the book.  These days, gate
delay is not the single dominating factor that it once was, so I don't
know if there is a simple-minded recipe available.  MIPSCo got excellent
results on the R3010, but I don't know what approach was used.  Anyway, some
of the basics are in Waser and Flynn.  See Ch. 6.

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117

kyriazis@plato.rdrc.rpi.edu (George Kyriazis) (10/24/89)

In article <126676@sun.Eng.Sun.COM> ram@sun.UUCP (Renu Raman) writes:
>
>>Cray-3.  A pair of listings leaped off the screen and really astonished me:
>>
>>	Instruction issue rate:			1 new instruction per cycle
>>	# of gate delays per clock cycle:	6 gate delays per cycle
>>
>> -- Mark Johnson	
>
>   Cray-1 had 8.
>
>   renu raman
>

Remember also that the Crays are pipelined machines, so this is
probably not the total number of gate delays that each instruction goes
through, but the number of gate delays each instruction goes through
in EACH clock cycle.
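
The distinction can be put in numbers with a small sketch.  The 8 gate
delays per cycle is the Cray-1 figure quoted above; the 6-stage pipe depth
is a hypothetical value picked for illustration, not a claim about any
particular functional unit.

```python
def total_gate_delays(stages, gate_delays_per_cycle):
    """Upper bound on gate delays an instruction passes through in one pipe.

    Each stage contributes at most one clock cycle's worth of gate delays.
    """
    return stages * gate_delays_per_cycle


PER_CYCLE = 8   # Cray-1 figure quoted in the thread
STAGES = 6      # hypothetical pipe depth, for illustration only

print(total_gate_delays(STAGES, PER_CYCLE))   # 48
```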

  George Kyriazis
  kyriazis@turing.cs.rpi.edu
  kyriazis@rdrc.rpi.edu
------------------------------