[comp.arch] Impossible 40MHz R2000 ??

mark@mips.UUCP (Mark G. Johnson) (12/16/87)

Quoting from author	jesup@pawl22.pawl.rpi.edu (Randell E. Jesup)
in article		<140@imagine.PAWL.RPI.EDU>
of comp.arch on date	13 Dec 87 13:02:20 GMT

	> Given current technology, r2000 could probably be scaled
	> to about 20 MHz.  However, custom RISC designs in CMOS are
	> now reaching 40 MHz, which would be impossible with the
	> double-clocked interface currently on the r2000.  Perhaps
	> the interface could be removed, given enough pins, but
	> that gets you back into the packaging limits.

"Impossible" is quite a strong word.  "Difficult", sure.  But he's
saying that a 2.4X improvement of a first-chip-designed-at-a-startup-
company, 2-micron-generic-silicon-foundry device is IMPOSSIBLE.

A few things might change :-) :-) between now (16.7 MHz) and 40 MHz.
Principal among these is experience; several different systems using
this double-clocked approach have now been built (by SGI, MIPS, and
others) and their properties have been measured and analyzed.
Weaknesses, if any :-) :-), can be improved, and strengths can
be exploited.

Other factors conspire to make the job of building a 40 MHz
double-clocked interface not "impossible":

	1.  Cache RAM access times will continue to decrease, likely
	    at the same rate as the processor clock, since SRAM vendors
	    now build RISC chips (including SPARC, R2000, Am29K).  So
	    RAM access time will probably stay at 40-50% of processor
	    cycle time.  {presently 60 ns cycle, 25-30 ns RAM access}.
	    The rest of the cycle is used up by setup & hold times,
	    bus drive (slew) times, timing uncertainties, and "margin".

	2.  Surface mount packages (having the *same number* of
	    leads, 144) might be used instead of the current 144 pin
	    Pin Grid Array.  Their lower inductance and better
	    controlled impedance can decrease dispersion and improve
	    signal quality.  Such packages, available today, are
	    more than 2.4X better than the existing PGA package, so
	    the net percentage of the cycle wasted in package-induced
	    "timing slop" would decrease.

	3.  Output voltage drive levels might shrink from the present
	    0.0 volts and 5.0 volts, to (an example) 0.4V and 2.7V.
	    This speeds up output transitions (dT = dV * C/I; see the
	    worked numbers after this list) without increasing switching
	    noise.  Less of the cycle (in percentage terms) would be
	    spent slewing the bus around.

	4.  The clock generation and distribution technology may
	    get a factor of > 2.4X more precise.  If this happened,
	    the fraction of the cycle lost to "slop" (timing edge
	    uncertainty) would go down.

	5.  BiCMOS fab processes might be employed, permitting the
	    use of open emitter, Wired-OR interfacing to ECL-compatible
	    cache RAM chips.  Current ECL RAMs are about 15-20 nsec,
	    a smaller fraction of the current processor cycle (60 nsec)
	    than current MOS RAM chips.  So the timing margins
	    *improve*.  Additionally, multiple-driver "collisions" or
	    "contention" are non-deadly in the wired-OR ECL structure,
	    (unlike CMOS tristate busses), such that the time between
	    disabling X and enabling Y onto the bus, can be reduced
	    dramatically.  And there's reason to believe that BiCMOS
	    RAM access times will scale at about the same rate as
	    traditional CMOS RAMs.
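
For the curious, here is the dT = dV * C/I arithmetic from item 3 worked
out in a tiny C program (the 50 pF bus load and 10 mA of drive current are
round illustrative numbers I picked for the sketch, not R2000 figures):

	/* slew.c -- output transition time, dT = dV * C / I.        */
	/* The 50 pF load and 10 mA drive are assumed round numbers, */
	/* not measured R2000 figures.                                */
	#include <stdio.h>

	int main(void)
	{
	    double c = 50e-12;           /* bus capacitance, farads */
	    double i = 10e-3;            /* drive current, amps     */
	    double full  = 5.0 - 0.0;    /* full-rail swing, volts  */
	    double small = 2.7 - 0.4;    /* reduced swing, volts    */

	    printf("5.0V swing: %4.1f ns\n", full  * c / i * 1e9);  /* 25.0 ns */
	    printf("2.3V swing: %4.1f ns\n", small * c / i * 1e9);  /* 11.5 ns */
	    return 0;
	}

Same load, same drive current: the smaller swing cuts the slew time by
better than 2X, which is the whole point of item 3.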


Taken individually or as a group, these scenarios indicate (at least
to me) that 40 MHz "double clocked cache interfaces" are indeed
possible, and might in fact be as robust as (or more robust than) existing
implementations at 16.7 MHz.

Regards,

-Mark Johnson	*** DISCLAIMER: The opinions above are personal. ***	
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mark   TEL: 408-720-1700 x208
US mail: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

oconnor@sunray.steinmetz (Dennis M. O'Connor) (12/18/87)

An article by hansen@mips.UUCP (Craig Hansen) says
-] Quoting jesup@pawl22.pawl.rpi.edu (Randell E. Jesup) :
-]] The real problem is the fact that your chip edge is clocked at twice your
-]] instruction freqency.  Running a higher-speed clock than the instruction
-]] rate is fine, and makes internal design much easier.  However, packaging
-]] technology will be your limiting factor for some time to come, not really
-]] ram speed per se.  For the large number of pins required, it is hard to
-]] find packages certified at that speed.
-]
-]For speeds well above 40 MHz in CMOS technology, our studies suggest that this
-]will not be a limiting factor at all ...

You missed the point. The relevant number isn't 40MHz, it's 40MIPS. To
achieve that, you'll need 80 MHz. Mr. Jesup ( hi Randell ) has
personal knowledge, as do I, of a 40MHz 40MIPS CMOS microprocessor
that is real silicon, really working, right now. Those are of course
raw machine MIPS, not equivalent anything MIPS. Check out the upcoming
ISSCC conference for more details.

-] ... But the other companies chip had double-frequency clock inputs,
-] too, and when you compared the two chips at their specified clock
-]rate, ours runs more than twice as fast (benchmark-wise), and theirs had double
-]the input clock rate. Talk about double-speak!

The chip Randell and I know of uses a two-phase 40MHz clock to achieve
40 MIPS. The smallest important time interval is 2ns. You'll need,
I think, a four-phase 80MHz clock for 40 MIPS. I don't know what the
smallest important time interval will be for your clocks.
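
To put rough numbers on that (a back-of-the-envelope sketch which assumes
one instruction per cycle and pins toggling at twice the instruction rate;
it is NOT a description of either chip's actual phase timing):

	/* edges.c -- pin-edge rate implied by a double-clocked interface, */
	/* assuming one instruction per cycle.  Not a spec for the R2000   */
	/* or the GE part; just the arithmetic behind the 80 MHz figure.   */
	#include <stdio.h>

	int main(void)
	{
	    double mips     = 40.0;            /* target native MIPS      */
	    double core_mhz = mips;            /* 1 instruction per cycle */
	    double pin_mhz  = 2.0 * core_mhz;  /* double-clocked pins     */

	    printf("core: %.0f MHz, %.1f ns cycle\n", core_mhz, 1e3 / core_mhz);
	    printf("pins: %.0f MHz, %.2f ns between edges\n", pin_mhz, 1e3 / pin_mhz);
	    return 0;
	}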

-] [ ... a bunch of non-sequitors deleted ... ]
-] Because SRAMs are used as technology drivers for new CMOS and BiCMOS
-] technologies, MIPS can be assured of a good supply of highly
-] agressive SRAMs that will work with the MIPS part.

Actually, GE's 1.25 micron AVLSI CMOS process is one of the best going
(ask IBM about it) and never had anything to do with RAMs. Oh, since
you brought it up, the machine Randell and I know of uses 20ns CMOS RAMs
to run at 40MIPS with no wait states. What will you need?

-]The real problem with the other RISC designs ...
-] [massive generalization deleted]

What limited-omniscience you must have, to be able to tell all of us
the ONE TRUE PROBLEM with "other" RISC designs. :-) Baloney.
What do you know about MY RISC machine? Nothing. So pay
attention at ISSCC and learn a few things.

ALSO, An article by mark@mips.UUCP (Mark G. Johnson) says:
-] Quoting from author	jesup@pawl22.pawl.rpi.edu (Randell E. Jesup)
-]] Given current technology, r2000 could probably be scaled
-]] to about 20 MHz.  However, custom RISC designs in CMOS are
-]] now reaching 40 MHz, which would be impossible with the
-]] double-clocked interface currently on the r2000.  Perhaps
-]] the interface could be removed, given enough pins, but
-]] that gets you back into the packaging limits.
-]
-]"Impossible" is quite a strong word.  "Difficult", sure.  But he's
-]saying that a 2.4X improvement of a first-chip-designed-at-a-startup-
-]company, 2-micron-generic-silicon-foundry device is IMPOSSIBLE.
-]
-]A few things might change :-) :-) between now (16.7 MHz) and 40 MHz.

No No NO, 40 _MIPS_, not MHz. 80MHz for you guys. And since it is
already here, your changes better happen yesterday. Randell, you
shoulda said 40MIPS, you know how easily confused people get :-)

-] [ ... standard "learn from experience" stuff deleted ... ]
-]
-]Other factors conspire to make the job of building a 40 MHz
-]double-clocked interface not "impossible":
-]
-]	1.  Cache RAM access times will continue to decrease, likely
-]	    at the same rate as the processor clock, since SRAM vendors
-]	    now build RISC chips (including SPARC, R2000, Am29K).  So
-]	    RAM access time will probably stay at 40-50% of processor
-]	    cycle time.  {presently 60 ns cycle, 25-30 ns RAM access}.

Anyone who thinks system access time for a given memory system
architecture will decrease linearly with RAM access into the tens-of-
nanoseconds range is dreaming. Go figure your peak amps per volt of
swing when trying to charge 120pf in 3 nanoseconds. Run that through
your bonding wires and smoke it :-). Also the idea that there is a
linear relationship between how fast a particular fab technology
can make RAMs and how fast the technology can make CPUs ignores
a lot of important differences between the two. For instance,
how much difference do you think having two levels of metal
makes to a RAM, and how much to a CPU ? Think about it.
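
If you want to do the smoking yourself, here is the peak-current arithmetic,
I = C * dV/dt (120 pF and 3 ns are from the paragraph above; the 5 V swing
is the full-rail case, assumed for illustration):

	/* icharge.c -- peak current to slew a capacitive load, I = C*dV/dt. */
	/* 120 pF and 3 ns come from the discussion above; 5 V is the        */
	/* full-rail swing, assumed for illustration.  Result is per output. */
	#include <stdio.h>

	int main(void)
	{
	    double c  = 120e-12;  /* load capacitance, farads */
	    double dv = 5.0;      /* voltage swing, volts     */
	    double dt = 3e-9;     /* transition time, seconds */

	    printf("peak I = %.2f A per output\n", c * dv / dt);  /* 0.20 A */
	    return 0;
	}

That's 200 mA per pin, through bond-wire inductance, times however many
outputs happen to switch at once.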

-]	2.  Surface mount packages (having [...] leads, 144) might be
-]          used instead of the [... PGA].  ... lower inductance ...
-]	    controlled impedance ... improve signal quality ...
-]	    more than 2.4X better than the existing PGA package ...
-]	    "timing slop" would decrease.

Package isn't that important. The dielectric constant of your
substrate and the size of your input protection are probably going
to limit you first. Besides, surface-mounts beyond 132 or so have
a tendency to jump off the board when thermal-cycled. No fun.


-]
-]	3.  Output voltage drive levels might shrink ... to (an
-]          example) 0.4V and 2.7V. ...speeds up output transitions... 
-]	    without increasing switching noise.  Less of the cycle (in
-]	    percentage terms) would be spent slewing the bus around.

Yes, this is true. But the processor Randell and I know of ( but aren't
really allowed to say much about ) uses good old 5V swings, like
the fast CMOS RAMs it's hooked to and the Sun that talks to it :-)
If IT goes to 3V, well, we'll have to see. Don't have a crystal ball.
Also, what about second-order effects in the MOS transistors, and
what about noise rejection ? It's just not as simple as lowering the supply.

-]	4.  The clock generation and distribution technology may
-]	    get a factor of -] 2.4X more precise  ...
-]	    "slop" (timing edge uncertainty) would go down.

"You cannae change the laws of physics, laws of physics, laws of physics,
 You cannae change the laws of physics, laws of physics, Captain !"
				from _Star Trekkin'_, by The Firm.

" Our new supraluminal wave guide... " "Don't put your finger near
the board, the stray capacitance will kill the clock system..."
Look, using transmission lines instead of wires is one way to go and
yes, it would help. But until then you're just doing that old RC game.
In the imperfect noisy nonlinear real world. Sigh.

-]	5.  BiCMOS fab processes might be employed, permitting the
-]	    use of open emitter, Wired-OR interfacing to ECL-compatible
-]	    cache RAM chips.  Current ECL RAMs are about 15-20 nsec,
-]	    a smaller fraction of the current processor cycle (60 nsec)
-]	    than current MOS RAM chips.  So the timing margins
-]	    *improve*.  Additionally, multiple-driver "collisions" or
-]	    "contention" are non-deadly in the wired-OR ECL structure,
-]	    (unlike CMOS tristate busses), such that the time between
-]	    disabling X and enabling Y onto the bus, can be reduced
-]	    dramatically.  And there's reason to believe that BiCMOS
-]	    RAM access times will scale at about the same rate as
-]	    traditional CMOS RAMs.

Sure, but when YOU use 20ns RAM you get what, 20MIPS? And when I
use 20ns RAM I get (am getting) 40MIPS. So who is better positioned
to take advantage of new RAM technology?

-]Taken individually or as a group, these scenarios indicate (at least
-]to me) that 40 MHz "double clocked cache interfaces" are indeed
-]possible, and might in fact be as robust (or moreso) than existing
-]implementations at 16.7 MHz.
-]
-]Regards,
-]
-]-Mark Johnson	*** DISCLAIMER: The opinions above are personal. ***	

Sure, and warp drive may be safer than skiing. Cutting edge NOW is
40MIPS at 40MHz-2-phase using 20ns RAMs. The future may belong to
GaAs. Are you in the present or the past ?

Gee, I wish I could tell you more about our chip. Maybe after
ISSCC I'll be able to. Like, knock off your socks, fur sure.
--
	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.arpa
        "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"

jesup@pawl22.pawl.rpi.edu (Randell E. Jesup) (12/18/87)

In article <1145@mips.UUCP> mark@mips.UUCP (Mark G. Johnson) writes:
>Quoting from author	jesup@pawl22.pawl.rpi.edu (Randell E. Jesup)
>	> Given current technology, r2000 could probably be scaled
>	> to about 20 MHz.  However, custom RISC designs in CMOS are
>	> now reaching 40 MHz, which would be impossible with the
>	> double-clocked interface currently on the r2000.  Perhaps
>	> the interface could be removed, given enough pins, but
>	> that gets you back into the packaging limits.
>
>"Impossible" is quite a strong word.  "Difficult", sure.  But he's
>saying that a 2.4X improvement of a first-chip-designed-at-a-startup-
>company, 2-micron-generic-silicon-foundry device is IMPOSSIBLE.

	Oops.  You're right, I shouldn't have said 'impossible'.  Take that
as rephrased to 'very tough'.

>Other factors conspire to make the job of building a 40 MHz
>double-clocked interface not "impossible":
>
>	1.  Cache RAM access times will continue to decrease, likely
>	    at the same rate as the processor clock, since SRAM vendors
>	    now build RISC chips (including SPARC, R2000, Am29K).  So
>	    RAM access time will probably stay at 40-50% of processor
>	    cycle time.  {presently 60 ns cycle, 25-30 ns RAM access}.
>	    The rest of the cycle is used up by setup & hold times,
>	    bus drive (slew) times, timing uncertainties, and "margin".

	If you're willing to pay for it, you can get 20ns cycle SRAMs
in reasonable sizes now.

>	2.  Surface mount packages (having the *same number* of
>	    leads, 144) might be used instead of the current 144 pin
>	    Pin Grid Array.  Their lower inductance and better
>	    controlled impedance can decrease dispersion and improve
>	    signal quality.  Such packages, available today, are
>	    more than 2.4X better than the existing PGA package, so
>	    the net percentage of the cycle wasted in package-induced
>	    "timing slop" would decrease.

	You can get leadless chip carriers now with 144 'pins' that are
certified to run at 40MHz.  However, to 'double-clock' I assume they would
have to be certified at 80MHz, which they are not.  PGAs at that speed are
right out (though perhaps not impossible :-)

>	3.  Output voltage drive levels might shrink from the present
>	    0.0 volts and 5.0 volts, to (an example) 0.4V and 2.7V.
>	    This speeds up output transitions (dT = dV * C/I) without
>	    increasing switching noise.  Less of the cycle (in
>	    percentage terms) would be spent slewing the bus around.

	This may well be they way the world goes.  However, at least for
the next few years, I think we're stuck with 0-5V.

>	4.  The clock generation and distribution technology may
>	    get a factor of > 2.4X more precise.  If this happened,
>	    the fraction of the cycle lost to "slop" (timing edge
>	    uncertainty) would go down.

	Definitely helps, but you can get some VERY accurate clocks if
you put some real thought into it and limit the loading.  Things like
two 40MHz non-overlapping clocks, for example.

>	5.  BiCMOS fab processes might be employed, permitting the
>	    use of open emitter, Wired-OR interfacing to ECL-compatible
>	    cache RAM chips.  Current ECL RAMs are about 15-20 nsec,
>	    a smaller fraction of the current processor cycle (60 nsec)
>	    than current MOS RAM chips.  So the timing margins
>	    *improve*.  Additionally, multiple-driver "collisions" or
>	    "contention" are non-deadly in the wired-OR ECL structure,
>	    (unlike CMOS tristate busses), such that the time between
>	    disabling X and enabling Y onto the bus, can be reduced
>	    dramatically.  And there's reason to believe that BiCMOS
>	    RAM access times will scale at about the same rate as
>	    traditional CMOS RAMs.

	I'm not a silicon person, so I'll take your word.  However, as I said
earlier, 20-25ns cycle time SRAMs exist and can be bought now.  In fact, all
40MHz RISC chip designs being worked on that I know about will need them.
	Does the BiCMOS require that ALL chips be BiCMOS?  (sounds like
it would.)

>Taken individually or as a group, these scenarios indicate (at least
>to me) that 40 MHz "double clocked cache interfaces" are indeed
>possible, and might in fact be as robust (or moreso) than existing
>implementations at 16.7 MHz.

	I think your real limiting factor, even with some of those you
stated above, will be packaging technology.  You can build a 40MHz RISC chip
with single-clocked interfaces with current tech.  With better tech, you
could build better ones, more or less as fast as the interface lets you go.
Packaging seems to be the limiting factor at the bleeding edge right now.
The 16MHz double-clocked RISC chips are now putting about the same load on the
technology as a 30-32MHz single-clocked chip would.
	For double-clocking to be a win, you have to have packaging tech that
is at least twice as fast as you can make your CPU cycle times go.  With
16MHz now that's easy; it is not possible (caveat:-) with 40MHz now.  (Note
that my earlier article was about current state-of-the-art, not what can
happen in the next n years.  There are 40MHz RISC chips out there now, using
1.25u.  My comment on scaling was that the r2000 chip couldn't do 40MHz given
CURRENT VLSI, packaging, and RAM technologies (once again, caveat:-))
	If you believe that packaging tech will improve sufficiently faster
than CPU cycle times, maybe it will continue to be reasonable.  However, it
seems that packaging is NOT keeping up right now.
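
	Here is the break-even rule reduced to arithmetic (a sketch only;
the 40MHz figure is the package certification quoted above, and the one
assumption is that a double-clocked interface needs pins good for twice
the CPU cycle rate):

	/* dclk.c -- is a double-clocked interface feasible at a given core */
	/* rate?  Rule of thumb from the discussion: the pins must be good  */
	/* for twice the CPU cycle rate.  40 MHz is the package rating      */
	/* quoted above for today's leadless chip carriers.                 */
	#include <stdio.h>

	int main(void)
	{
	    double pkg_mhz = 40.0;                 /* fastest certified pin rate */
	    double core[]  = { 16.7, 25.0, 40.0 }; /* candidate CPU clock rates  */
	    int i;

	    for (i = 0; i < 3; i++) {
	        double need = 2.0 * core[i];
	        printf("%.1f MHz core needs %.1f MHz pins: %s\n",
	               core[i], need, need <= pkg_mhz ? "fine today" : "not yet");
	    }
	    return 0;
	}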

	One last caveat:  I know there are new packaging technologies under
development that may (more like will) turn the world upside down.  However,
my earlier comment was about NOW, not 1-3 years from now.

>Regards,
>
>-Mark Johnson	*** DISCLAIMER: The opinions above are personal. ***	
>UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mark   TEL: 408-720-1700 x208
>US mail: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

     //	Randell Jesup			Lunge Software Development
    //	Dedicated Amiga Programmer	13 Frear Ave, Troy, NY 12180
 \\//	lunge!jesup@beowulf.UUCP	(518) 272-2942
  \/    (uunet!steinmetz!beowulf!lunge!jesup)

mash@mips.UUCP (John Mashey) (12/20/87)

The following is a general discussion prompted by Dennis O'Connor's
posting, followed by a few detailed comments interspersed with extracts from
the posting itself.  Dennis alluded many times to a GE 40-native-mips CPU
about which he can't say much until after ISSCC.

[BTW: I've occasionally confused people by using "we", making them think
I speak for MIPS Computer Systems, which I do NOT. I sometimes use the editorial
"we" out of personal habit, and sometimes I say "we", in that my comments and
opinions may arise from discussions with other people.  In this case,
mark johnson and craig hansen had useful comments that I've incorporated,
but I'm responsible for any errors.]

GENERAL
1) Fortunately, it's not long until ISSCC, which will help, since it's
hard to have any reasonable discussion without data.  We understand
the ISSCC rules that prevent prior publication, so I won't push for details
that would cause problems.

2) The statements marked 1 & 2 below (on mips) could use some clarification.
	a) Remember that we consistently use 1 mips = vax11/780-with-decent
	compilers-running-variety-of-real-programs-not-dhrystones-or-toy-
	benchmarks type mips, i.e., something people can actually measure
	and evaluate.

	b) 40MHz is not a priori 40mips (of the kind we described above);
	statement 1 (way below: "raw machine MIPS") makes it clear that Dennis
	understands this, but some of the other statements seem to MIX
	apples and oranges.  Note that even getting 40 native-mips from
	40MHz implies certain things about external cache/memory latencies,
	miss penalties, branch-handling, MMU-interference, etc.  I do assume
	40MIPS is for something other than a tight-loop of adds or nops in
	an on-chip cache. (a 25MHz 68020 is a 12.5MIPS thing by that rule).
	
	c) This business has more than once seen spectacular claims, based on
	peak native-mips-ratings, that had little to do with the actual
	performance on real benchmarks.  Until we see the ISSCC paper, it's
	hard to guess what actual performance might be.  I certainly believe
	that a machine should be able to do its clock rate in NOPS or ADDs;
	doing loads, stores, and branches is harder, especially for real-sized
	programs that actually miss in caches now and then.

3) After ISSCC, if the paper itself doesn't reveal such information,
perhaps you can post some real benchmark numbers.  We'd be glad to send
you a copy of the MIPS benchmark tape for a nominal cost (not me:
mail to ....!mips!mannos), and we'll be glad to compare notes.

4) Can you say more on what you're asserting the actual performance is?
If not now, at least after ISSCC? For example,
in a real system that people can build, do you think it is:
	a) 40X faster than 11/780 (4X MIPS M/1000)
	b) 30X "" ...(3X)
	c) 20X ""  ...(2X)
Also, you haven't mentioned floating point.  Can you at least say if the
ISSCC paper will discuss it?

5) There are several ways of convincing somebody that a computer can
achieve a given performance:
	REALITY: Here's the machine.  Benchmark it and see.
	HINT: For a future machine, here are some hints about the ways in
	which it might be done.
	DESIGN: Here is what the future design looks like, and here are the
	innovations and sneaky designs we use to make it work.

REALITY is always preferable: existence is a virtue:
if I see a system remake the UNIX kernel, boot it, and then compile/run
Spice, I believe it might even be a Real Machine, subject to any evidence
to the contrary. This is hard to do with future designs, unless someone
has really great simulators and can convince me of that.

We often tell people under nondisclosure a lot of HINT, and even some DESIGN.
Sometimes HINT alone can sound like hand-waving, which we dislike,
but DESIGN inevitably discloses details that are highly proprietary,
and this is something we can't do in a forum like comp.arch.

Anyway, we look forward to seeing the GE ISSCC paper, and even more, to
some live benchmark numbers to give appropriate perspective to the
40-mips characterization.

SPECIFIC NOTES ON POSTING
--------------------------
In article <8252@steinmetz.steinmetz.UUCP> sunray!oconnor@steinmetz.UUCP writes:
>An article by hansen@mips.UUCP (Craig Hansen) says
....
>-]For speeds well above 40 MHz in CMOS technology,our studies suggest that this
>-]will not be a limiting factor at all ...

>You missed the point. The relevent number isn't 40MHz, it's 40MIPS. To
>achieve that, you'll need 80 MHz. Mr. Jesup ( hi Randell ) has
>personal knowledge of a 40MHz 40MIPS CMOS microprocessor, as do I,
>that is real silicon, really working, right now. Those are of course
>raw machine MIPS, not equivalent anything MIPS. Check out the upcoming

-------------^^^^ statement 1: 

>ISSCC conference for more details.

.....bunch of discussion on clock-rates....

>Actually, GE's 1.25 micron AVLSI CMOS process is one of the best going
>(ask IBM about it) and never had anything to do with RAMs. Oh since
>you brought it up, the machine Randal and I know of uses 20ns CMOS RAMs
>to run at 40MIPS with no wait states. What will you need?

>-]The real problem with the other RISC designs ...
>-] [massive generalization deleted]

>What limited-omniscience you must have, to be able to tell all of us
>the ONE TRUE PROBLEM with "other" RISC designs. :-) Baloney.
>What do you know about MY RISC machine? Nothing. So pay
>attention at ISSCC and learn a few things.

Just before "The real problem..", Craig had referred to two specific other
RISC designs, and was discussing cache issues, by context. He never
stated that this was the ONE TRUE PROBLEM of ALL other RISC designs.

>No No NO, 40 _MIPS_, not MHz. 80MHz for you guys. And since it is
>already here, your changes better happen yesterday. Randel, you 
>shoulda said 40MIPS, you know how easily confused people get :-)

....much EE stuff deleted...

>Sure, but when YOU use 20ns RAM you get what, 20MIPS? And when I
>use 20ns RAM I get (am getting) 40MIPS. So who is better positioned
---------------------------------^^^^^^ statement 2
>to take advantage of new RAM technology?.....
>Sure, and warp drive may be safer than skiing. Cutting edge NOW is
>40MIPS at 40MHz-2-phase using 20ns RAMs. The future may belong to
>GaAs. Are you in the present or the past ?

It's hard to tell when we are.  We think we're going along the edge
of what's practical for us to build, sell, and quickly improve for
the commercial marketplace, and we think what we do will scale well
in the better technologies that we're now getting access to.  It's
quite possible that the chip you speak of will indeed blow everyone's
socks off, and that we are indeed living in the past.  On the other
hand, if you haven't done exceedingly careful performance modeling
of this in realizable system environments, you might get surprised,
as have numerous other people.  Perhaps you could describe your
performance simulation environment, and what kinds of programs
are simulated to predict performance?

>Gee, I wish I could tell you more about our chip. Maybe after
>ISSCC I'll be able to. Like, knock off your socks, fur sure.

Many people here will look forward to the ISSCC paper with strong
interest, and even more to seeing some good benchmarks. 
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

oconnor@sunray.steinmetz (Dennis M. O'Connor) (12/22/87)

An (excellent) article by mash@winchester.UUCP (John Mashey) says :
-]GENERAL
-]1) Fortunately, it's not long until ISSCC ... and we understand
-]the ISSCC rules that prevent prior publication ...

Thanks. As one of the architects of this design, I'm just itchin' to
discuss it, so I'm frustrated by the rules but understand why. I'm
glad you understand too.

-]2) The statements marked 1 & 2 below (on mips) could use some clarification.
-]	a) Remember that we consistently use 1 mips = vax11/780-with-decent
-]	compilers-running-variety-of-real-programs-not-dhrystones-or-toy-
-]	benchmarks type mips, i.e., something people can actually measure
-]	and evaluate.

We've only got some computed VAX equivalence numbers, so I'd rather
not quote them.

-]	b) 40MHZ is not apriori 40mips (of the kind we described above);
-]	statement 1 (way below: "raw machine MIPS") makes it clear that Dennis
-]	understands this, but some of the other statements seem to MIX
-]	apples and oranges.  Note that even getting 40 native-mips from
-]	40MHZ implies certain things about external cache/memory latencies,
-]	miss penalties, branch-handling, MMU-interference, etc.  I do assume
-]	40MIPS is for something other than a tight-loop of adds or nops in
-]	an on-chip cache. (a 25MHZ 68020 is a 12.5MIPS thing by that rule).

The chip itself always runs at 40MIPS. We do have interlocks that
can sometimes require NOPs ( one O or two ?? a new debate :-), and we
get cache misses very occasionally as well, so our delivered performance
is, well, usually somewhat less than 40MIPS. Which is indeed why I've
only spoken of raw machine MIPs. "But they're HONEST raw machine MIPs!" :-)
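
If it helps, the raw-versus-delivered distinction is just this arithmetic
(the stall figures below are invented placeholders for illustration, NOT
measured RPM40 numbers):

	/* effmips.c -- delivered MIPS once interlock and cache-miss stalls */
	/* are counted.  All stall numbers here are invented placeholders,  */
	/* NOT measured RPM40 (or R2000) figures.                           */
	#include <stdio.h>

	int main(void)
	{
	    double peak_mips = 40.0;  /* one instruction per 40 MHz cycle     */
	    double interlock = 0.05;  /* assumed stall cycles per instruction */
	    double miss_rate = 0.02;  /* assumed misses per instruction       */
	    double miss_cost = 5.0;   /* assumed stall cycles per miss        */

	    double cpi = 1.0 + interlock + miss_rate * miss_cost;
	    printf("delivered: %.1f MIPS (CPI = %.2f)\n", peak_mips / cpi, cpi);
	    return 0;
	}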
	
-]	c)This business has more than once seen spectacular claims,based on peak
-]	native-mips-ratings ...  Until we see the ISSCC paper, it's hard to
-]	guess what actual performance might be.  I certainly believe that
-]	a machine should be able to do it's clock rates in NOPS or ADDs;
-]	doing loads, stores, and branches is harder, especially for real-sized
-]	programs that actually miss in caches now and then.

The RPM40 chip does its clock rate on loads and stores. There
are no cache misses on loads and stores. Branches can cause cache
misses, but barring misses we do full clock on them too. I really
wish I could talk about cache design, ooh, it's so good! But
fortunately ISSCC is not so far away.
-]
-]3) After ISSCC, if the paper itself doesn't reveal such information,
-]perhaps you can post some real benchmark numbers. 

I'll post everything I can. But as I will explain later, it's probably
comparing Red Delicious to Macintosh ( the apples ) to compare RPM40
to DEC, Sun, MIPS or Motorola. Different target environment.

-]Also, you haven't mentioned floating point.  Can you at least say if the
-]ISSCC paper will discuss it?

This year's ISSCC does not ( I believe ) discuss the FPU. That may have
to wait till NEXT year's ISSCC, I fear. Even though we've got silicon of it now.

-]5) There are several ways of convincing somebody that a computer can
-]achieve a given performance:
-]	REALITY: Here's the machine.  Benchmark it and see.
-]	HINT: For a future machine, here are some hints about the ways in
-]	which it might be done.
-]	DESIGN: Here is what the future design looks like, and here are the
-]	innovations and sneaky designs we use to make it work.
-]
-]REALITY is always preferable: existence is a virtue:
-]if I see a system remake the UNIX kernel, boot it, and then compile/run
-]Spice, I believe it might even be a Real Machine, subject to any evidence
-]to the contrary. This is hard to do with future designs ...
-]
-]... HINT alone can sound like hand-waving ... DESIGN inevitably
-] discloses details ... highly proprietary ... can't do in ... comp.arch.

After ISSCC I hope I can talk design : this was a non-classified
DARPA project, and GE is NOT in the computer business. Maybe I'll
be allowed to publish. I hope so : I think we did some great work !

-]... we look forward to ... GE ISSCC paper ... live benchmark numbers 

Our live benchmarks, to be applicable to our design environment,
would be different from yours. Unless you've got an ATF ( Advanced
Tactical Fighter ) aerodynamic control surfaces controller benchmark :-).

Well, now for some notes :

First, I hope people caught that my article was intended to be
light-hearted. I've got no bones to pick or axes to grind, I'm just
EXTREMELY happy to have some of my ideas in working silicon. Feels
real good. And yes I'm proud of my work. But I can't really disclose
it yet, which is frustrating.

The chip has, IMHO, some nice new things in it, which will be taken
up with enthusiasm when they go public ( the ideas, not the chip ).

But realize : this is a MILITARY chip. Not commercial. $/MIP is not
a real factor in its design. MIPS/Watt, MIPS/sq-cm, MIPS/package
were bigger drivers. Rad-Hard was a factor as well. So it may never
make a UNIX kernel, or run Spice. Or be for sale without a satellite :-)

So why bring it up ? Because it runs at 40MHz/MIPS. Since we've
actually built a 40MHz/MIPS chip, well, as John Mashey says, reality
is better than design. We have dealt with some of the nasty problems
CMOS encounters driving even 30pF at 40MHz. Or trying to turn a bus
around ( from send to receive ) in 3ns. And it is tough. No question.
Hi-speed CPUs were the original subject, I think ?

Thank you, John Mashey, for an excellent reply that pointed out where
my posting was unclear. I hope I've clarified things some with this.


--
	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.arpa
        "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"

ward@cfa.harvard.EDU (Steve Ward) (12/24/87)

> 
> The RPM40 chip.....
>
> 
> After ISSCC I hope I can talk design : this was a non-classified
> DARPA project, and GE is NOT in the computer business. Maybe I'll
> be allowed to publish. I hope so : I think we did some great work !
> 
> --
> 	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
> 				ARPA: OCONNORDM@ge-crd.arpa
>         "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"



I believe that all non-classified federal research must be made
available to U.S. Citizens.  The Freedom of Information Act can be used
to obtain such information if it is not otherwise forthcoming.  DON'T
JUMP TO CONCLUSIONS!  Clearly the researcher(s) have rights to privacy
and confidentiality during the research, so  nobody can or should be
able to force premature disclosures.  However, a research contract with
DARPA will call for research results in the form of reports, findings,
papers, or whatever to be reported to DARPA according to the contract
timetable, which might even be only upon completion/termination of
the research grant/contract.  All such non-classified documents and
research materials may be requested by anyone.  If the research is
classified then only DOD contractors with the need to know and
industrial and foreign spies :-) will gain access.  Of course, one
can always appeal the DOD classification, as well.

All of this is stated here to pose a question:  Dennis O'Connor seemed
to have doubt as to what he could make public in the long term about his
work.  It seems to me that if his work is sponsored by a non-classified
DARPA grant or research contract, his work MUST be made public as
it is reported to DARPA, meaning that at the very least, the documents
and other information he gives to DARPA should be publicly available
in roughly the timeframe in which DARPA receives such information.
Again, preliminary information and general communications with DARPA
are not what I am talking about, but only final results and findings.

Now for my parachute:  We do a lot of federally-sponsored research
here, but almost always with NASA.  Our work is also of a non-classified
nature.  Our scientists live to publish, so the only secrecy is prior to
publishing, then all the beans are publicly spilled.  There certainly
may be more to this situation than meets my eye.  I am not an expert on
federally-sponsored research legalities.  Hopefully when the time is
right, all technical information regarding the 40MIPS/MHZ beastie will
be revealed.

oconnor@sungoddess.steinmetz (Dennis M. O'Connor) (12/29/87)

I didn't say enough about why I might not be able to publish.
But it's NOT because of the Government. There are indeed reports
you can probably get from the Gov. Printing Office about our
work. And the government is NOT stopping publication.
But I work for GE; GE has the rights to the RPM40 work, and
if GE doesn't want it published, that's GE's right.
Just like it is/was IBM's right to not publish LOTS of its research,
including stuff IBM was never going to use. I have no problems with
this. Sure, I'd like to publish. But if GE says no, I won't.
I'll just move on to other interesting GE work. No problem.
So what if GE doesn't let me do some things I'd like : I still think
working at GE CR&D is lotsa fun. 
--
	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.arpa
        "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"

Steve_D_Wilson@cup.portal.com (01/11/88)

Just a quick comment on your statements concerning ECL.  There
are two points that should be made.

1) Current ECL RAMs aren't at 15-20 ns as stated; they're down
   at 7 to 8 ns, and with the advent of self-timed write
   circuitry will approach 3 to 5 ns cycle times.

2)  When you WIRE-OR ECL drivers you pay a price in performance.
    There is a timing penalty that must be paid when you have
    more than one driver on the line, and it isn't small.  At
    the last company I worked at, the expected hit for using a
    WIRE-OR was greater than 2 ns WITHOUT adding in the delay
    due to driver separation reflections and such.  
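
To see what those two numbers mean against a fast bus window, here is a
rough budget (the 12.5 ns window assumes the 80 MHz edge rate discussed
earlier in the thread; the 8 ns access and the 2 ns wire-OR hit are from
points 1 and 2 above):

	/* eclbudget.c -- rough bus-window budget.  The 12.5 ns window   */
	/* assumes the 80 MHz edge rate discussed earlier in the thread; */
	/* the 8 ns RAM access and 2 ns wire-OR penalty are from the two */
	/* points above.                                                  */
	#include <stdio.h>

	int main(void)
	{
	    double window  = 12.5;  /* ns between 80 MHz edges          */
	    double ram     = 8.0;   /* ns, fast ECL RAM access          */
	    double wire_or = 2.0;   /* ns, multi-driver wire-OR penalty */

	    printf("%.1f ns of %.1f left for setup, skew, and margin\n",
	           window - ram - wire_or, window);
	    return 0;
	}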

jesup@pawl19.pawl.rpi.edu (Randell E. Jesup) (01/12/88)

In article <2367@cup.portal.com> Steve_D_Wilson@cup.portal.com writes:
>1) Current ECL rams aren't at 15-20 ns as stated, there down
>   at 7 to 8 ns and with the advent of self-timed write
>   curcuitry, will approach 3 to 5 ns cycle times.

	Sorry if you misunderstood: 15-20ns was for static CMOS RAMs, not
for ECL.

	I hope they can get that fast, as we will see GaAs chips of >100 MHz,
maybe even 200 MHz, eventually.  Of course, they have to get GaAs to yield
with lots of gates, first.

     //	Randell Jesup			      Lunge Software Development
    //	Dedicated Amiga Programmer            13 Frear Ave, Troy, NY 12180
 \\//	beowulf!lunge!jesup@steinmetz.UUCP    (518) 272-2942
  \/    (uunet!steinmetz!beowulf!lunge!jesup) BIX: rjesup