[comp.arch] The Killer Micro From Hell [Really: fight ...

yam@nttmhs.ntt.jp (Toshihiko YAMAKAMI) (01/04/90)

From article <34030@mips.mips.COM>, by mash@mips.COM (John Mashey):

> 1) The protagonists in this argument, both of whom have access to various
> flavors of supercomputers and Killer Micros, and either of whom would happily
> consume infinite bunches of cycles:
> 	a) Mostly agree on the fundamentals, which I think are:

(stuff on Killer Micros and supercomputers)

> 	b) Maybe disagree a little on:

> 		b3) Exactly how fast the KMs are gaining.

It is interesting to see how fast the KMs are gaining,
and it is more interesting to find how fast the KMs will be gaining.

I have never touched supercomputers, but I use SONY NEWS with MC68030.
And you know, SONY will introduce R3000 in their RISC-NEWS this year.
So I am interested in how fast they will be gaining in these coming
years in the field of scalar computing.
The R6000 runs at 66.7MHz. What bottleneck prevents it
from running faster is a current topic in this newsgroup.
I hope I can replace my SONY machines with Rx000-based ones in 1990 or 1991.

(x) The clock rate? How fast can they get with ECL implementations in the 1990s?
    Can they run at 100MHz, 200MHz, 300MHz??
    How much room is there to improve the cycle time compared to
    the current 15ns (66.7MHz)?

(y) Memory Bandwidth?  What type of CPU starvation is the most critical
    in the R6280? Access time of the first-level cache SRAM? Internal bus bandwidth?
    How much faster will SRAM become in the coming years compared
    to the 7ns ECL SRAM in the R6280? How about the 12ns bipolar CMOS SRAM in
    the second-level cache of the R6280?
    How fast can the system bus become compared to the 266Mbyte/sec system bus with the R6020?

(z) Hunting of parallel computation? Superpipeline or superscalar can
    exploit more parallelism hidden in the codes?
    Or more sophisticated compilers?

MIPS will release 80MHz version in 1990.
The speedup will continue, but the ratio will be smaller. When
will the speedup stop, and what bottleneck will stop it?

						Toshihiko YAMAKAMI



    
Toshihiko YAMAKAMI	NTT Telecommunication Networks Laboratories
 Telephone:	+81-468-59-3781 	FAX:	+81-468-59-2546
 junet:	yam@nttmhs.ntt.jp		CSNET:	yam%nttmhs.ntt.jp@relay.cs.net
 snail-mail:	Take 1-2356-523A, Yokosuka, Kanagawa 238-03 JAPAN

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (01/06/90)

In article <4322@nttmhs.ntt.JP> yam@nttmhs.ntt.jp (Toshihiko YAMAKAMI) writes:

>It is interesting to see how fast the KMs are gaining,
>and it is more interesting to find how fast the KMs will be gaining.
>
>I have never touched supercomputers, but I use SONY NEWS with MC68030.

Supercomputing Review recently had an article on the two Cray companies, and
the Cray-3 is planned to have a 2ns clock, with 16 processors.  This information
was stated in the public stock offering prospectus, so it is now public 
knowledge.  This gives an idea of the numbers people are talking about for
the next generation of supercomputers.  500 MHz still compares favorably 
to 50-80MHz numbers for KMs in the same time frame.  The price/performance 
ratio is somewhat different, of course :-)  See below.
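
For readers who want to check the clock arithmetic above, here is a minimal sketch. It only converts the quoted 2ns period to MHz and compares it with the 50-80MHz KM range; the function names are my own, and raw clock ratio says nothing about delivered performance.

```python
def period_ns_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def mhz_to_period_ns(mhz: float) -> float:
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / mhz


# The 2ns Cray-3 clock quoted from the prospectus.
cray3_mhz = period_ns_to_mhz(2.0)

# Raw clock ratio against the 50-80MHz KM range mentioned in the post.
for km_mhz in (50.0, 80.0):
    ratio = cray3_mhz / km_mhz
    print(f"{cray3_mhz:.0f} MHz vs {km_mhz:.0f} MHz: clock ratio {ratio:.2f}x")

# The 66.7MHz R6000 clock corresponds to a cycle of roughly 15ns.
print(f"66.7 MHz period: {mhz_to_period_ns(66.7):.2f} ns")
```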

Now, many people may differ :-) but *my* rule of thumb for a "good"
implementation (The folks from MIPSCo, Sun, et al. will shoot me for this -
it compresses man years of clever work into a *simplistic* number) is that 
Cray and SPARC tend to produce about .5x clock speed in VAX-equivalent "MIPS" 
(IMHO - this is just a simplistic ballpark figure etc etc your mileage may vary)
while MIPSCo tends to be a little better, around .6x or so.  (I guess Crays
tend not to do as well as this on Eugene Brooks' favorite code...).

>The R6000 runs at 66.7MHz. What bottleneck prevents it

So, I would expect systems to have up to about 40 "MIPS".  An R6280 could
SPECmark, for example, at ~40x an 11/780.  Just my speculation, of course.
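
The rule of thumb above is just a multiplication, sketched here under the post's own caveats. The factors (.5x for Cray and SPARC, .6x for MIPSCo) are the ballpark figures quoted above, not measurements, and the names are hypothetical.

```python
# Ballpark "VAX MIPS per MHz" factors taken from the rule of thumb above.
RULE_OF_THUMB = {"Cray": 0.5, "SPARC": 0.5, "MIPSCo": 0.6}


def estimated_vax_mips(clock_mhz: float, vendor: str) -> float:
    """Estimate VAX-equivalent MIPS as clock speed times the vendor factor."""
    return clock_mhz * RULE_OF_THUMB[vendor]


r6000_now = estimated_vax_mips(66.7, "MIPSCo")   # the ~40 "MIPS" figure
r6000_80 = estimated_vax_mips(80.0, "MIPSCo")    # the planned 80MHz part
cray3 = estimated_vax_mips(500.0, "Cray")        # 2ns clock, scalar only

print(f"R6000 @ 66.7MHz: ~{r6000_now:.0f} MIPS")
print(f"R6000 @ 80MHz:   ~{r6000_80:.0f} MIPS")
print(f"Cray-3 @ 500MHz: ~{cray3:.0f} MIPS per CPU")
```

By this estimate the per-CPU scalar gap between a 500MHz Cray-3 and an 80MHz R6000 is roughly 5x, which is consistent with the "factor of about 5" used later in the post.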

>(y) Memory Bandwidth?  What type of CPU starvation is the most critical
>(z) Hunting of parallel computation? Superpipeline or superscalar can
>    exploit more parallelism hidden in the codes?
>    Or more sophisticated compilers?
>
>MIPS will release 80MHz version in 1990.

It is interesting to contemplate $100-300K systems, like the SGI Power Series,
with each CPU based on an 80MHz R6000.  The possibility is there for a
system which looks like a Cray scaled down by a factor of about 5 for scalar
work, a factor of 15 for vector work.  At a cost of 1/30 - 1/100.  What would
prevent this from happening?  Memory bandwidth could.  Nobody 
really wants to talk about this in public, but I bet a lot of people are
staying up nights trying to figure out how to scale up memory bandwidth
with processor speed.  Cheaply (If you build it like Cray does, it will
cost like a Cray).
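
To make the price/performance claim concrete, here is a rough sketch using only the ratios in the paragraph above (1/5 scalar speed, 1/15 vector speed, 1/30 to 1/100 of the cost). The function name is an assumption for illustration, and the result ignores memory and I/O bandwidth entirely.

```python
from fractions import Fraction


def relative_price_performance(speed_fraction: Fraction,
                               cost_fraction: Fraction) -> Fraction:
    """Performance per dollar relative to the Cray (Cray = 1)."""
    return speed_fraction / cost_fraction


SCALAR_SPEED = Fraction(1, 5)    # "scaled down by a factor of about 5"
VECTOR_SPEED = Fraction(1, 15)   # "a factor of 15 for vector work"
COSTS = (Fraction(1, 30), Fraction(1, 100))

for label, speed in (("scalar", SCALAR_SPEED), ("vector", VECTOR_SPEED)):
    low, high = (relative_price_performance(speed, c) for c in COSTS)
    print(f"{label}: {float(low):.1f}x to {float(high):.1f}x "
          f"the Cray's performance per dollar")
```

Under these figures the KM system comes out ahead per dollar even on vector work, which is where the Cray's advantage is largest.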

The factor of five could be pretty scary to Cray.  I saw a report summary done
at Livermore a while back in which "4" was the critical speed factor
that determined whether systems had "significantly different" speed. 

And looking just a little further ahead, an "R9000"
(just making this up out of whole cloth) with a starting clock speed of
100MHz, scaled up to 200 MHz by 1992 or 1993, could put Cray out of business.
SGI will build scalar graphics workstations with half the power of the then
current Crays at 1/100 the cost, and Ardent will do the vector version, with
a similar price advantage.  An amusing idle speculation (?)

Now, about that memory bandwidth?

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

rhealey@umn-d-ub.D.UMN.EDU (Rob Healey) (01/07/90)

In article <39807@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>(just making this up out of whole cloth) with a starting clock speed of
>100MHz, scaled up to 200 MHz by 1992 or 1993, could put Cray out of business.
>SGI will build scalar graphics workstations with half the power of the then
>current Crays at 1/100 the cost, and Ardent will do the vector version, with
>a similar price advantage.  An amusing idle speculation (?)
>
	For the first part, if MIPS, or whoever, can do 200MHz by 1993 what
	prevents CRI, as opposed to CCC, from coming up with a faster
	version of its architecture? Remember, CCC is the Cray 3 and CRI
	is the 2, X & Y and whatever comes after that. Also, what about 
	the Japanese companies that are working on or that have supers? From
	what I've read of the KM wars so far it seems to me that most
	are assuming that supers will not improve in speed by that much.
	I question whether this is a valid assumption to make. Also, there's
	always the possibility of supers switching to optical or hybrid
	optical technologies. It seems foolish to me that super companies
	would just stand still and let the KMs go whizzing by.

>Now, about that memory bandwidth?
>
	Good question! And how about I/O bandwidth to disks? How about network
	bandwidth? Might as well include anything else that might bottleneck.
	Supers are not supers by MIPS alone, memory, disk and networking all
	go along with the super title.

			-Rob

davec@proton.amd.com (Dave Christie) (01/09/90)

In article <3101@umn-d-ub.D.UMN.EDU> rhealey@ub.d.umn.edu (Rob Healey) writes:
>	For the first part, if MIPS, or whoever, can do 200MHz by 1993 what
>	prevents CRI, as opposed to CCC, from coming up with a faster
>	version of its architecture? 

Nothing.  But coming up with faster versions of classic supercomputers
has (IMHO) been much more difficult and costly, and the resulting
performance improvements not so spectacular, as compared to micros over
the past several years.  I think this will remain true for a while,
although I think the micro performance growth curve will flatten as 
technology and the use of high performance implementation tricks mature.  

>	optical technologies. It seems foolish to me that super companies
>	would just stand still and let the KMs go whizzing by.

Of course they won't stand still, but like I said, it's much more costly
for them to try to improve at the same rate, and of course that cost
gets passed on to the buyers, many of whom will be less and less willing 
to pay for it.  (But there will always be those who want max performance 
for application X at any cost.)

>
>>Now, about that memory bandwidth?
>>
>	Good question! And how about I/O bandwidth to disks? How about network
>	bandwidth? Might as well include anything else that might bottleneck.
>	Supers are not supers by MIPS alone, memory, disk and networking all
>	go along with the super title.

It's too early to use this argument - such comparisons will be more valid
in a year or two when KMs such as the R6000 appear in very high memory and
i/o bandwidth systems.  

I think a valid generalization (it's hard to come up with one in this
group!) is that the application domain of KM's is growing, while the
application domain of traditional supers is not - one could probably
say it's shrinking, considering the applications that will migrate
to parallel and high b/w KM systems as they become available.  And
trying to support the development of costly, exotic supers in the face
of a shrinking application base just isn't going to cut it.
---------------------
Dave Christie              My opinions only, not my employer's.

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (01/09/90)

In article <3101@umn-d-ub.D.UMN.EDU> rhealey@ub.d.umn.edu (Rob Healey) writes:
>In article <39807@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:

>>(just making this up out of whole cloth) with a starting clock speed of

>	For the first part, if MIPS, or whoever, can do 200MHz by 1993 what
>	prevents CRI, as opposed to CCC, from coming up with a faster
>	version of its architecture? Remember, CCC is the Cray 3 and CRI

>	what I've read of the KM wars so far it seems to me that most
>	are assuming that supers will not improve in speed by that much.

Actually, my point about Cray is that CCC has now revealed in its business
plan that the next generation machine is expected to have a 500 MHz clock.
This was reported in SC Review.   I based my calculations on that public
information.  Now, who knows, CRI and/or CCC may have something
faster up their sleeves.  On the other hand, they may continue to have
trouble developing machines sufficiently faster than the current models
to justify the high price tag...        So:

1)  Nothing will stop SC vendors from producing faster versions of their
architectures.  They have well developed plans to do so.  If current and
soon to be planned machines materialize on schedule, the SC speed advantage
could be 10:1.  On the other hand, it could also drop to less than 4:1,
which would be ominous for the SC vendors.

2)  The economic advantage that developed during the last year for KMs
will widen considerably, if nothing speeds up the development cycle of
the SCs.  For 1/100 the cost, you will be able to get a machine with 1/10
the CPU speed (to grossly oversimplify).

3)  The next major challenge will be memory bandwidth, because KM CPU speed
is outstripping memory bandwidth.

4)  Don't hold your breath on real I/O bandwidth.  The challenge will be to
put enough in there to get something reasonable out of the CPU.  I think
10-40 MBytes/sec is probably doable for KMs.  Cray can get over 10MB/sec
through the filesystem on *one* 12MB/sec disk; so could a KM.  Redundant Arrays
of Inexpensive Disks should be able to give us devices with 12 MB/sec speeds.
If you could have *1* equivalent Cray disk on your system for a reasonable
price, your single job speed could match what you would get on a Cray.
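
As a back-of-the-envelope companion to point 4, this sketch estimates how many striped data disks a RAID would need to reach the 12 MB/sec of one Cray-class disk. The 1.5 MB/sec per-disk rate is a hypothetical figure chosen for illustration, not a number from this thread, and parity disks and controller overhead are ignored.

```python
import math


def data_disks_needed(target_mb_per_s: float, per_disk_mb_per_s: float) -> int:
    """Smallest number of striped data disks whose combined rate meets the target."""
    return math.ceil(target_mb_per_s / per_disk_mb_per_s)


TARGET_MB_PER_S = 12.0         # the single fast disk rate quoted in point 4
ASSUMED_DISK_MB_PER_S = 1.5    # hypothetical inexpensive-disk rate (assumption)

n = data_disks_needed(TARGET_MB_PER_S, ASSUMED_DISK_MB_PER_S)
print(f"{n} data disks at {ASSUMED_DISK_MB_PER_S} MB/sec "
      f"reach {TARGET_MB_PER_S:.0f} MB/sec")
```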

>	I question whether this is a valid assumption to make. Also, there's
>	always the possibility of supers switching to optical or hybrid
>	optical technologies.

I don't argue with this.  Certainly, "supercomputers" will always, by
definition, be faster.  But, 6 years ago they were *vastly* more *cost
effective* than smaller systems.  Now, smaller systems have an apparent
edge.  The price disadvantage will get too large to ignore in the next
few years.  At this point, it appears that the commodity nature of the 
microprocessor business will push the future generations of supercomputers
to being built out of micro building blocks.



>	Good question! And how about I/O bandwidth to disks? How about network
>	bandwidth? Might as well include anything else that might bottleneck.
>	Supers are not supers by MIPS alone, memory, disk and networking all
>	go along with the super title.

Absolutely no argument there.  What do you call a machine which is as fast
as a CDC 7600, but has one programmed I/O slow asynch. SCSI disk controller
for I/O?


***********************
Provocation for the day:  Which of today's Killer Micro CPUs is the best
suited towards 16-64 CPU high memory bandwidth shared coherent memory 
supercomputer implementations?

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

jlg@lambda.UUCP (Jim Giles) (01/09/90)

From article <40043@ames.arc.nasa.gov>, by lamaster@ames.arc.nasa.gov (Hugh LaMaster):
> [...]                          What do you call a machine which is as fast
> as a CDC 7600, but has one programmed I/O slow asynch. SCSI disk controller
> for I/O?

Useless!!  Or, was it a trick question?  :-)






J. Giles

rpw3@rigden.wpd.sgi.com (Robert P. Warnock) (01/09/90)

In article <40043@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh
LaMaster) writes:
+---------------
| 4)  Don't hold your breath on real I/O bandwidth.  The challenge will be to
| put enough in there to get something reasonable out of the CPU.  I think
| 10-40 MBytes/sec is probably doable for KMs.  Cray can get over 10MB/sec
| through the filesystem on *one* 12MB/sec disk; so could a KM...
+---------------

FDDI is 12.5 MB/s peak. Most KMs have some sort of FDDI board available
at this point, though the throughputs are somewhat disappointing at present.
However, we will see some KMs handling a full FDDI's worth of data by the
end of this year, even over VME busses. But many pundits already see FDDI
as "too slow" for a back-end I/O channel...

The ANSI X3T9.3 HPPI-PH interface (formerly Cray "HSC") is either 125 MB/s
or 250MB/s (that's "bytes", boys 'n' girls, not "bits"), depending on whether
you choose the 32- or 64-bit data path. Some KMs already have HPPI interfaces
which, as you might imagine, can't suck the pipe at full steam yet. By 18
months from now, that will have changed, as network and channel interfaces
start speaking directly to system memory busses, bypassing traditional
"I/O busses" such as VME.

X3T9.3 is also looking at a Fiber Channel (FC) version of HPPI. At least two
companies offer single chips *today* which take HPPI-speed 32-bit-word streams
and serialize them into gigabit/sec bit-serial streams, and vice-versa. The
holdup right now is affordable fiber-optoelectronics at gigabit speeds, but
that will change. And in the mean time, people are considering going back
to *copper* (but still bit-serial to keep the cables small) for "I/O bus"
distances, since wire is still cheap. [Yes, you can run at 1 Gb/s over cheap
coax, for some 10's of meters. That's about the same distance as HPPI-PH.]

The X3T9.3 meeting this week is considering possible ways to glue HPPI and
IPI-3 together. Fast future disks may (or may not) use IPI-3. Myself, I'd
like the disks to speak HPPI-FC directly, or something equivalent.

All in all, I think the KMs will have the I/O capability. It will just
be a little slower in coming than we'd like, and that due more to delays
in getting fast I/O devices at "KM prices" than in getting fast paths
into the KMs themselves.

And given HPPI (especially HPPI-FC), the idea of SC's as file servers
for KMs isn't totally ridiculous...

-Rob


-----
Rob Warnock, MS-9U/510		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311

mac@ra.cs.Virginia.EDU (M. Alex Colvin) (01/09/90)

> What do you call a machine which is as fast
> as a CDC 7600, but has one programmed I/O slow asynch. SCSI disk controller
> for I/O?

unbalanced.

actually you call it an engine for some really powerful data compression
algorithms.

rpw3@rigden.wpd.sgi.com (Robert P. Warnock) (01/11/90)

As several people have pointed out, my recent article <47800@sgi.sgi.com>
had a couple of typos and a bug in it. Consider this a retraction/correction:

First, when I said ANSI X3T9.3 HPPI-PH was either 125MB/s or 250MB/s,
that was a typo. I have had the bit->baud expansion of 4b/5b coding on
the brain lately, so for some stupid reason I multiplied the correct
numbers by 1.25 when posting! The data rate of HPPI is of course 100 or
200 MB/s (800 or 1600 Mb/s), depending on whether you have a 32 or 64
bit data path, respectively. (It's a 25MHz word rate in either case.)
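
The corrected figures follow directly from the 25MHz word rate. The sketch below recomputes them and also reproduces the mistaken x1.25 numbers for comparison; the function names are hypothetical and are not part of any HPPI document.

```python
WORD_RATE_HZ = 25_000_000  # 25MHz word rate for both data path widths


def hppi_mbytes_per_sec(width_bits: int) -> int:
    """Data rate in MB/s (10**6 bytes) for a 32- or 64-bit data path."""
    return WORD_RATE_HZ * (width_bits // 8) // 1_000_000


def hppi_mbits_per_sec(width_bits: int) -> int:
    """Data rate in Mb/s (10**6 bits) for a 32- or 64-bit data path."""
    return WORD_RATE_HZ * width_bits // 1_000_000


for width in (32, 64):
    mbytes = hppi_mbytes_per_sec(width)
    mbits = hppi_mbits_per_sec(width)
    # The earlier article wrongly applied the 4b/5b expansion (x 5/4).
    mistaken = mbytes * 5 // 4
    print(f"{width}-bit path: {mbytes} MB/s ({mbits} Mb/s); "
          f"mistaken figure was {mistaken} MB/s")
```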

And when I attributed the trademark "HSC" to Cray, I was confused. HPPI
was modelled after Cray HSX, to be sure, but "HSC" is a *DEC* trademark.

I apologize for the inaccuracies.

-Rob


-----
Rob Warnock, MS-9U/510		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311