[net.arch] Caltech's Cosmic Cube

fred@mot.UUCP (Fred Christiansen) (01/04/85)

[]
Dec 27's Electronic Design makes reference to a 64-node parallel processor
using 8086/87's having solved a high-order physics problem which, heretofore,
folk had only had the temerity to try out on a Cray.
	I'm curious.  Anyone know about this or know literature references?
--------------------
Fred Christiansen, Motorola Microsystems, 2900 S Diablo Way, Tempe, AZ 85282
{allegra,ihnp4}!sftig!mot!fred      {ihnp4,seismo}!ut-sally!oakhill!mot!fred
{ihnp4,amdahl}!drivax!mot!fred                       arizona!asuvax!mot!fred

norm@rocksanne.UUCP (01/14/85)

	The Jan 85 issue of ACM communications has a more detailed
	article on the Cosmic Cube for those interested.

rro@csu-cs.UUCP (Rod Oldehoeft) (01/19/85)

> []
> Dec 27's Electronic Design makes reference to a 64-node parallel processor
> using 8086/87's having solved a high-order physics problem which, heretofore,
> folk had only had the temerity to try out on a Cray.
> 	I'm curious.  Anyone know about this or know literature references?

The latest CACM has a special section on computer architecture with
yet another RISC paper by Patterson, an article on the Cosmic Cube,
and one on the Manchester dataflow machine.  Good reading.

chrsbmw@pertec.UUCP (chris mihaly) (01/19/85)

> []
> Dec 27's Electronic Design makes reference to a 64-node parallel processor
> using 8086/87's having solved a high-order physics problem which, heretofore,
> folk had only had the temerity to try out on a Cray.
> 	I'm curious.  Anyone know about this or know literature references?
> --------------------

Piece of Cake

	Yes, I have heard of it.  I live in San Marino, which is about
three minutes' walk from Caltech, and I know several students
at that institution.  I remember one of them talking about it.  I knew
that Caltech has been putting considerable effort into multidimensional
array processors built from microprocessors.  I was told that they were working
on a 64-node 8086 w/8087 array, and that it had successfully completed
the physics task.  I do not have any information on the particulars of the task,
but it could be the very one you mentioned.  I don't think there is much
literature, or, if there is any, that Caltech would be willing to release it,
but I will ask around and get back to you if I get anything.


-- 
	Christopher D. Mihaly
	{ucbvax!unisoft | scgvaxd | trwrb | felix}!pertec!chrsbmw
				or
	{ucbvax!ucivax | trwrb | unisoft!pertec}!csuf!chrsbmw

	"But you told me to type rm * .o and it came back with 
		'rm: .o nonexistent'"

jww@bonnie.UUCP (Joel West) (02/05/85)

> []
> Dec 27's Electronic Design makes reference to a 64-node parallel processor
> using 8086/87's having solved a high-order physics problem which, heretofore,
> folk had only had the temerity to try out on a Cray.
> 	I'm curious.  Anyone know about this or know literature references?

I can add a little more.  JPL/Caltech have several machines planned.  The
research on the "Hypercube" (as I have always heard it termed within JPL) is 
being funded by several different government organizations, each of which hopes 
to eventually use one to solve its own particular computational problems.

A machine consisting of 16 x {8086, 8087, 256KB} is known as a "Mark II".
The architecture favors networks of 2^N nodes, in which the maximum distance
between any two nodes is N links; hence, "hypercube".  I understand that
different configurations of the Mark II are being built, possibly up to 128 nodes.
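The distance claim is easy to check. A minimal sketch, assuming the usual labeling in which nodes are numbered 0 to 2^N - 1 and two nodes are linked exactly when their binary numbers differ in one bit. The routing order (lowest differing bit first) is one common choice, not necessarily what the Caltech machines do, and the function names are illustrative:

```python
def neighbors(node: int, n: int) -> list[int]:
    """Nodes one link away from `node` in an n-dimensional cube."""
    return [node ^ (1 << bit) for bit in range(n)]


def distance(a: int, b: int) -> int:
    """Fewest links between two nodes: the number of bits in which they differ."""
    return bin(a ^ b).count("1")


def route(src: int, dst: int, n: int) -> list[int]:
    """One shortest path, fixing differing bits from lowest to highest."""
    path = [src]
    node = src
    for bit in range(n):
        if (node ^ dst) & (1 << bit):
            node ^= 1 << bit  # cross the link along this dimension
            path.append(node)
    return path


N = 4  # 2**4 = 16 nodes, the size of the Mark II described above
print(neighbors(0, N))                             # [1, 2, 4, 8]
print(max(distance(0, v) for v in range(2 ** N)))  # 4
print(route(0b0011, 0b1100, N))                    # [3, 2, 0, 4, 12]
```

Each hop fixes one differing bit, so no message ever needs more than N hops, which is where the "N links" bound comes from.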

The next version, a "Mark III", is tentatively set to be 64 x {16 MHz 68020,
68881, 1-4MB} for delivery in 1987.  For my purposes (massive discrete event
simulations) that begins to look interesting.  I've heard claims that the
68020/68881 pair is faster than a VAX-11/780...can someone comment on this?
Some also claim that 1024 x {8086...} would be better, but I would strongly
disagree (a side-flame I'll ignore).

I've also heard a rumor that a major firm plans to market its own Intel
386-based hypercube.  I don't know enough about the 386 performance or
schedule to know when this would be or whether the 68020 would be better.

The problem of effectively using this computing power is non-trivial
(ask the folks with Illiac IV).  For simulation purposes, David Jefferson
of UCLA (Jefferson@UCLA-LOCUS.ARPA) has come up with an interesting
approach that JPL plans to try.
-- 
	Joel West
	CACI, Inc. - Federal 3344 N. Torrey Pines Ct La Jolla 92037
	jww@bonnie.UUCP (ihnp4!bonnie!jww)
	westjw@nosc.ARPA

davet@oakhill.UUCP (Dave Trissel) (02/08/85)

>>Dec 27's Electronic Design makes reference to a 64-node parallel processor
>>using 8086/87's having solved a high-order physics problem which, heretofore,
>>folk had only had the temerity to try out on a Cray.
>>      I'm curious.  Anyone know about this or know literature references?

>A machine consisting of 16 x {8086, 8087, 256kb} is known as a "Mark II".
>The architecture encourages (2^N)-node networks by making the maximum distance
>between nodes to be N links; hence, "hypercube".  I understand that different
>configurations  of the Mark II are being built, up to possibly 128-node.

I think it is important to size up the claims made for the power of multiple
microprocessors tied together in ANY configuration.  First, let's look at the
raw power available.  The 8086 at 10 MHz (its highest rated speed) can do
at most 1.25 million integer operations per second (that's a 32-bit
register-to-register ADD).  The 8087 performs floating-point ADD and SUBTRACT
in 20 microseconds apiece (MUL takes around 30 and DIV around 40) at its
highest rated speed of 5 MHz.  (Let's be good guys and forget for the moment
that the 8086 cannot run faster than the 8087, which means it must run at
5 MHz, which lowers its 32-bit integer add rate to .625 MIPS.)

Now, the CRAY runs (I am quoting from memory, but I don't think I'm far off)
scalar rates of 30 Megaflops and vector rates of over 80.  At the scalar rate
of 30 Megaflops, and assuming no interconnect overhead or idle time on any
8087, it would take about 600 8087s to match the floating-point power of a
CRAY!  That's right: 600!  Even if the cube had an array of 64 8086/8087
pairs, its power would only be about one tenth that of a CRAY.  (Cost-wise,
though, 600 8086/8087 pairs would only run about 200 grand, substantially
cheaper than the CRAY.)

Assuming the same 30 MIPS figure for CRAY integer processing, it would only
take about 50 8086's at 5 MHz (about 25 at 10 MHz) to match the CRAY.
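The ballpark arithmetic can be checked mechanically. A minimal sketch using only the figures in this post (20 microseconds per 8087 add, 1.25 or 0.625 MIPS for the 8086, a recalled 30 MFLOPS and 30 MIPS for the CRAY) and assuming perfect scaling with no interconnect overhead, as the post does. The helper names are illustrative. The integer figure comes out to 24 8086s at the 10 MHz rate and 48 at the 5 MHz rate:

```python
CRAY_MFLOPS = 30.0  # from-memory scalar figure for the CRAY
CRAY_MIPS = 30.0    # the post assumes the same number for integer work


def mops(us_per_op: float) -> float:
    """Millions of operations per second from microseconds per operation."""
    return 1.0 / us_per_op


def needed(target: float, per_unit: float) -> float:
    """Processors required to reach `target`, assuming perfect scaling."""
    return target / per_unit


mflops_8087 = mops(20.0)  # floating ADD at 5 MHz: 0.05 MFLOPS
print(f"8087s to match scalar CRAY: {needed(CRAY_MFLOPS, mflops_8087):.0f}")
print(f"64-node cube / CRAY: {64 * mflops_8087 / CRAY_MFLOPS:.2f}")
print(f"8086s at 10 MHz (1.25 MIPS): {needed(CRAY_MIPS, 1.25):.0f}")
print(f"8086s at 5 MHz (0.625 MIPS): {needed(CRAY_MIPS, 0.625):.0f}")
```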

Even though these are ballpark figures, I think the conclusion to be had is
quite obvious.  The cube does not approach the power of a CRAY.

>The next version, a "Mark III", is tentatively set to be 64 x {16 mhz 68020,
>68881, 1-4mb } for delivery in 1987.  For my purposes (massive discrete event
>simulations) that begins to look interesting.  I've heard claims that the
>68020/68881 pair is faster than a VAX-11/780...can someone comment on this?

Well, true and false.  At non-floating-point operations the '020 runs from
20 percent to 80 percent faster than the 780.  For floating-point (DEC gives
out no timings) we figure the 780 is slightly faster for single precision,
slightly slower for double and extended, and moderately slower at
transcendentals.  So the MC68020/881 combination ranges from about the same
speed as the VAX-11/780 to 80 percent faster, depending upon what you are
doing.

Let's make the same ballpark comparison with the CRAY.  Floating ADD/SUB takes
about 2.3 microseconds on the MC68881.  That still means you would need about
70 881s to match the CRAY's 30 Megaflops.  This is a little more encouraging,
as seventy of something is more manageable than 600 of something.  The
MC68020 runs 32-bit register-to-register operations at an impressive 8 MIPS,
which would indicate that only four MC68020's would be needed to approach the
integer power of a CRAY.  (I am assuming a 30 MIPS figure here for the CRAY.
Corrections welcomed from those in the know.  Sorry, but my CRAY manual is in
storage.)
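The same perfect-scaling arithmetic for the MC68881/MC68020 pair, as a sketch that uses only the timings quoted in this post (2.3 microseconds per MC68881 add, 8 MIPS for the MC68020) against the recalled 30 MFLOPS and 30 MIPS CRAY figures. At 2.3 microseconds per add, each MC68881 delivers about 0.43 MFLOPS, so roughly 69 of them are needed:

```python
CRAY_MFLOPS = 30.0  # recalled scalar rate for the CRAY
CRAY_MIPS = 30.0    # assumed integer rate for the CRAY

fadd_us = 2.3               # MC68881 floating ADD/SUB, microseconds
mflops_881 = 1.0 / fadd_us  # about 0.43 MFLOPS per MC68881
mips_020 = 8.0              # MC68020 32-bit register-to-register ops

fpus = CRAY_MFLOPS / mflops_881
cpus = CRAY_MIPS / mips_020
print(f"MC68881s needed: {fpus:.0f}")  # 69
print(f"MC68020s needed: {cpus:.2f}")  # 3.75, i.e. about four
```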

Fermilab, near Chicago, has a serious proposal to build an MC68020
multiprocessor system with the power of a CRAY.  I have seen their prototype
running on MC68000s, and it, along with the software they have developed, is
truly impressive.  They are running ABSOFT FORTRAN on each node with a VAX 780
controlling the whole thing.  However, their nodes do not seem to be as
closely coupled as those of the Cube described here.  I will post a synopsis
of that machine if people are interested.

>I've also heard a rumor that a major firm plans to market its own Intel
>386-based hypercube.  I don't know enough about the 386 performance or
>schedule to know when this would be or whether the 68020 would be better.

   <<<WARNING -- THE FOLLOWING IS FROM A COMPETITOR OF INTEL --->>>
We at Motorola have heard rumors that the 386 is in for its fourth redesign
and that the on-chip instruction cache is now being abandoned.  I fail to see
how any high-performance chip can be effective without an on-chip cache of
some type.  (The EDN benchmarks on the MC68020 show an improvement of over
25 percent when the cache is turned on.)  Intel's sales pitch may give a clue
about the 386's status.  It is a polished presentation which attempts to
prove that you don't need 32 bits for anything, and that the MC68020 is
overkill.

>The problem of effectively using this computing power is non-trivial
>(ask the folks with Illiac IV). ...
>        Joel West
>        CACI, Inc. - Federal 3344 N. Torrey Pines Ct La Jolla 92037
>        jww@bonnie.UUCP (ihnp4!bonnie!jww)
>        westjw@nosc.ARPA

I would have agreed 100 percent with that statement before I saw the Fermilab
demo.  Now I'm not so sure.  It may be non-trivial, but I don't think it's
too difficult to tackle either.

Of course, all responses welcome.

Motorola Semiconductor Inc.                Dave Trissel
Austin, Texas          {ctvax,siesmo,gatech,ihnp4}!ut-sally!oakhill!davet

jlg@lanl.ARPA (02/09/85)

While I agree with most of what this original poster said, I think the
following is somewhat in error:


> Even if the cube
> had an array of 64 8086/8087 pairs its power would only be about one tenth
> that of a CRAY.  (Cost wise though, 600 8086/8087 pairs would only run about
> 200 grand - substantially cheaper than the CRAY.)


The cost of such a system would be MUCH higher, in order to make back
research costs, pay for the labor that assembles the machine (which
must be a nightmare), and cover the cost of memory (less than several
million words would be inadequate for a machine of such projected power).
A way to interface so many processors to memory efficiently has yet to
be found and would add to the expense of the implementation.  Quoted
costs of the Hypercube project itself have ignored labor (they get grad
students and researchers to do it themselves, and their salaries are
figured separately), they ignore parts (all of which are being donated
for the project), and they ignore sales, distribution, etc., all of which
would be required to make a commercial Hypercube feasible.

J. Giles

hal@cornell.UUCP (Hal Perkins) (02/09/85)

>> Dec 27's Electronic Design makes reference to a 64-node parallel processor
>> using 8086/87's having solved a high-order physics problem which, heretofore,
>> folk had only had the temerity to try out on a Cray.
>> 	I'm curious.  Anyone know about this or know literature references?

See the January issue of the Communications of the ACM for a paper on this.

cdshaw@watrose.UUCP (Chris Shaw) (02/12/85)

> While I agree with most of what this original poster said, I think the
> following is somewhat in error:
>
> > Even if the cube
> > had an array of 64 8086/8087 pairs its power would only be about one tenth
> > that of a CRAY.  (Cost wise though, 600 8086/8087 pairs would only run about
> > 200 grand - substantially cheaper than the CRAY.)
> 
> 
> The cost of such a system would be MUCH higher in order to make back
> research costs, pay for the labor that assembles the machine (which
> must be a nightmare), as well as the cost of memory (less than several
> million words would be inadequate for a machine of such projected power).
> A way to interface so many processors to memory efficiently has yet to
> be found and would add to the expense of the implementation.  Quoted
> costs of the Hypercube project itself have ignored labor (they get grad
> students and researchers themselves to do it, and their salaries are
> figured separately), they ignore parts (all of which are being donated
> for the project), and they ignore sales, distribution, etc.  All of which
> would be required to make a commercial Hypercube feasible.
> 
> J. Giles

Oh, come on...
 64 * $2000 = $128,000, given a rough guess of $2000 for one 8086/8087
single-board computer with (say) 512K of memory per board.

The architecture of the Cosmic Cube is such that there is not a common pool
of memory, but that each processor has its own memory and sends messages about
the computation to other machines.

As for labour and parts... the $2000 I mentioned is a price at QUANTITY ONE.
Ordering (say) Multibus boards of the above configuration in 64's would
cost you only 1/2 to 2/3 of that price, thanks to price breaks.
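The board-cost estimate works out as follows. A sketch using only the $2000 quantity-one guess and the 1/2 to 2/3 price-break range from this post; it covers boards only, not the backplane, card cage, or labour:

```python
boards = 64
unit_price = 2_000                # quantity-one guess per 512K board
list_total = boards * unit_price
print(f"list: ${list_total:,}")   # list: $128,000

for fraction in (1 / 2, 2 / 3):   # suggested quantity-price breaks
    print(f"at {fraction:.2f} of list: ${list_total * fraction:,.0f}")
```

Even at list price, the boards alone come to well under the $200-300k guessed below for a complete 64-node cube.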

And as for labour costs...
Any idiot can plug cards into a card cage (the wiring for which is TRIVIAL
by comparison to the 18-inch wire that you have to use with ECL (CRAY))
The real difference between the CosCube and a micro with 64 slots is that
the backplane is much more complex, but as I mentioned, not as complex
as the labour in wiring a CRAY.

So, I submit that a cost of $200-300k  per 64-Cube would not be too out to 
lunch at all, since it's really just 64 copies of an IBM PC  !!  ( :-) ) 

				Yours 'til the baloney melts...
					Chris D Shaw

gnu@sun.uucp (John Gilmore) (02/13/85)

A recent newspaper (Electronic News?) contains a front-page announcement
of a new Intel product containing a cube of 80286/80287's each of
which has 512KB and seven(?) 82586 Ethernet interfaces.  (One is for
global communications with the master control processor and the rest
make up the edges of the cube, from the sketchy description.)

There was no mention of software support in the article.
Prices ranged up into the $500K zone.  There was a Caltech connection
on the design.

eugene@ames.UUCP (Eugene Miya) (02/14/85)

> >The problem of effectively using this computing power is non-trivial
> >(ask the folks with Illiac IV). ...
> >        Joel West
> >        CACI, Inc. - Federal 3344 N. Torrey Pines Ct La Jolla 92037
> >        jww@bonnie.UUCP (ihnp4!bonnie!jww)
> >        westjw@nosc.ARPA
> Motorola Semiconductor Inc.                Dave Trissel
> Austin, Texas          {ctvax,siesmo,gatech,ihnp4}!ut-sally!oakhill!davet

Sorry, we gave what's left of the Illiac to DEC's museum; no spare parts.
The disks made nice coffee tables; wish I'd gotten one.

I made a saying last Sept.:
	"reinventing the illiac again."
Chuck's Cube is not the illiac, but programming any of these machines is not
a piece of cake.  Intel stopped by today and spoke about their cube
versions, the d5, d6, and d7 systems.  You can write them for info.  Read
the CACM paper for a sample C program. 'Applications' programs are going to
look more like 'systems' programs as we increase parallelism.

So long for now.

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,vortex}!ames!aurora!eugene
  emiya@ames-vmsb.ARPA

bcase@uiucdcs.UUCP (02/15/85)

Now wait a minute....

The cosmic cube, in its 64-processor configuration, exists and is
functional.  It is quite price competitive, being about two orders
of magnitude better in price/performance than the Cray.  It has
been used by someone (at Caltech) to work on a very large problem,
and although it took some weeks (or months, I cannot remember now),
it was successful (and purchasing the Cray time would have been
out of the question in this case).  Perhaps more importantly for
demonstrating the feasibility of this configuration, there is an
article in the current Electronics Week describing the Intel version
of this machine, the iPSC ("Intel Personal Super Computer" is one
possible expansion of the name).  As with most parallel computers,
this machine works best on a restricted class of problems; however,
it is believed that the class includes a lot of the most interesting
ones.  It is also believed that the demand for supercomputing is
very elastic with respect to price; perhaps this explains Intel's
entry into this market.  Selling 64 80286s with each machine is a
good deal for them also....

ross@dsd.UUCP (Evan Ross) (02/15/85)

As a matter of fact, Intel has announced the iPSC family of parallel
computers, which seems to be a commercial version of the Cosmic Cube.
Each node has an 80286/80287, 512K RAM, seven point-to-point comm channels,
one global comm channel, and an 82586 LAN coprocessor to handle all of
the comm channels.

There are three versions: the iPSC-d5 with 32 nodes @$150k, the iPSC-d6 with
64 nodes @$275k, and the iPSC-d7 with 128 nodes @$520k.  'Shipments are
expected to begin by late May'.
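These figures fit the hypercube picture if the digit in each model name is the cube dimension. That is an inference, not something the announcement states: 2^5, 2^6, and 2^7 give the quoted 32, 64, and 128 nodes, and a 7-dimensional cube needs exactly seven point-to-point channels per node, matching the seven in the node description. A small sketch that also works out list price per node:

```python
# iPSC models and list prices as quoted in Evan's post.
MODELS = [("iPSC-d5", 150_000), ("iPSC-d6", 275_000), ("iPSC-d7", 520_000)]


def summarize(models):
    """Return (name, dimension, nodes, price per node) for each model.

    Assumption: the digit after the final "d" is the cube dimension.
    """
    rows = []
    for name, price in models:
        dim = int(name.rsplit("d", 1)[1])
        nodes = 2 ** dim  # one node per corner of a dim-dimensional cube
        rows.append((name, dim, nodes, price / nodes))
    return rows


for name, dim, nodes, per_node in summarize(MODELS):
    print(f"{name}: {nodes} nodes, {dim} cube channels per node, "
          f"about ${per_node:,.0f} per node")
```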

Consult the February 11 issue of the E.E. Times for further details
(propaganda, vaporware, wishful thinking...)

-- 
			Evan Ross   decwrl!amd!fortune!dsd!ross

"To oppose something is to maintain it.
 To oppose vulgarity is inevitably to be vulgar."

avie@cmu-cs-wb1.ARPA (Avadis Tevanian) (02/17/85)

This Cosmic Cube is really puzzling me... As I understand it, each processor
has its own local memory, no memory is shared.  Rather, message passing is
used for communications.

What puzzles me is why use point-to-point channels between processors (and
do routing if a connection does not exist)?  Wouldn't it be much simpler to
use a dedicated Ethernet?  A 10 Mbit/s Ethernet should easily provide the
necessary bandwidth for 64 or more processors.  Since the Ethernet would be
dedicated, minimal protocols could be used, keeping the cost of managing
the Ethernet down.  If 10 Mbit/s is not enough bandwidth (which I highly
doubt), it shouldn't be too tough to increase the bandwidth, considering that
the wire will be dedicated and can be very short (it won't need to run all
around a building).

	Avie Tevanian

wjafyfe@watmath.UUCP (Andy Fyfe) (02/19/85)

In article <402@dsd.UUCP> ross@dsd.UUCP (Evan Ross) writes:
>As a matter of fact, Intel has announced the iPSC family of parallel
>computers which seem to be a commercial version of the Cosmic Cube.
>Each node has 80286/80287, 512k Ram, 7 Point to point comm channels,
>one global comm channel, and an 82586 LAN coprocessor to handle all of
>the comm channels.

Actually, there are 8 82586 Ethernet chips, one for each communication
channel.  (I have actually seen (and touched) a board.)  If there's
interest (send me mail), I can post info from the announcement package.

--andy fyfe		...!{decvax, allegra, ihnp4, et. al}!watmath!wjafyfe
			wjafyfe@waterloo.csnet

david@daisy.UUCP (David Schachter) (02/20/85)

8087s are available at 8 MHz now.  (The samples we got a year ago ran at
10 MHz, no problem, in fact.)  Of course they cost a bit more.... But not
as much as a quasi-existent Motorola FPU.  [I am not affiliated with Moto
or Intel and I have no great love for the '86/'87 architecture.  Just the
facts, ma'am.]

The opinions expressed herein are not necessarily those of Daisy Systems
or any sapient lifeform.  

"If at first you don't succeed, quit.  No use making a damn fool of yourself."
          -- W.C. Fields

rej@cornell.UUCP (Ralph Johnson) (02/20/85)

In article <166@cmu-cs-wb1.ARPA> avie@cmu-cs-wb1.ARPA (Avadis Tevanian) writes:
>
>What puzzles me is why use point to point channels between processors (and
>do routing if a connection does not exist)?  Wouldn't it be much simpler to
>use a dedicated ethernet?  A 10mb ethernet should easily provide the
>necessary bandwidth for 64 or more processors.  Since the ethernet would be
>dedicated, minimal protocols could be used, thus keeping the costs of
>managing the ethernet down.  If 10mb is not enough bandwidth (which I highly
>doubt), it shouldn't be too tough to increase the bandwidth considering that
>the wire will be dedicated and can be very short (it won't need to run all
>around a building).
>

I assume that most of the communication between processors consists of very
short packets, e.g., a single floating-point number.  Ethernet is very
inefficient when it is handling short packets, since it has a lot of overhead
per packet.  In actual practice, the 10 Mbit/s bandwidth is approached only when
packets are very long (perhaps 10KB, I forget).

Also, I bet most of the algorithms for the Cosmic Cube are fairly
synchronous, so all the processors would want to be broadcasting at the
same time.  Ethernet assumes that the net is not very loaded.  A 10%
loaded Ethernet is very rare.

Also, Ethernet is not that cheap.  Each connection runs a few hundred
dollars.  A straightforward serial connection would only be a few dollars,
and a parallel port is even faster and almost as cheap (wiring costs, you
know).  As long as the interconnection pattern is regular and there are
not too many processors (too many is more than the number that fit in one
or two cabinets) the Cosmic Cube interconnection scheme should be cheap
and simple.

Ralph Johnson

eugene@ames.UUCP (Eugene Miya) (02/20/85)

<166@cmu-cs-wb1.ARPA>

> This Cosmic Cube is really puzzling me... 
> 
> What puzzles me is why use point to point channels between processors (and
> do routing if a connection does not exist)?  Wouldn't it be much simpler to
> use a dedicated ethernet?  A 10mb ethernet should easily provide the
> necessary bandwidth for 64 or more processors.  Since the ethernet would be
> dedicated, minimal protocols could be used, thus keeping the costs of
> managing the ethernet down.  If 10mb is not enough bandwidth (which I highly
> doubt), it shouldn't be too tough to increase the bandwidth considering that
> the wire will be dedicated and can be very short (it won't need to run all
> around a building).
> 
> 	Avie Tevanian

to this and other articles about the Cube and the new Intel cube:  the
new intel cube does use ethernet controller chips for point to point
communication [Justin Rattner, at Ames last week]. don't forget these are
research machines and multiprocessor communications is one area of research.
the intel machine also has an extra 'global' ethernet for passing
interrupts and the like.

communications and massive memory are major problems with supercomputers.
our cray sends 1.2 GB/sec to a solid state storage device.
big bandwidth is a long term problem since you don't want your processors
waiting very long, nor do you want big buffers for i/o, you don't
want your processors calculating ethernet backoff after a collision
has been detected.

these machines are not cray replacements.  big mainframe 308x machines
outperform these cubes [cube i/o is especially poor].  tech's cube does
i/o through a single pe (processing element); intel has proposed cray-like
disk striping with disks off every pe.

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,vortex}!ames!aurora!eugene
  emiya@ames-vmsb.ARPA

cdshaw@watrose.UUCP (Chris Shaw) (03/01/85)

The Intel machine has 80286/80287 coprocessor pairs.
Does anyone know how fast these boards should run?  (Assuming documented
specs mean anything.)
Chris Shaw

brooks@lll-crg.ARPA (Eugene D. Brooks III) (03/05/85)

> The Intel machine has 80286 / 80287 co-processors pairs.
> Does anyone know how fast these boards should run ? (Assuming documented
> spec mean anything)
> Chris Shaw

These boards run at 8 MHz.  The floating-point speed should be almost double
what we got on the "cosmic" cube, which was 25 microseconds for an expression
like a = b * c; where all three are double-precision memory locations.

rpw3@redwood.UUCP (Rob Warnock) (03/05/85)

+---------------
| >What puzzles me is why use point to point channels between processors (and
| >do routing if a connection does not exist)?  Wouldn't it be much simpler to
| >use a dedicated ethernet?
| I assume that most of the communication between processors consists of very
| short packets, i.e., a single floating point number...
+---------------

Just went to a very interesting talk today at NASA/Ames given by Cleve
Moler of Intel Scientific Computers, who make a commercial hypercube
system (announced in net.arch previously). Don't know about Caltech's
applications, but for Intel's, the messages tend to be fairly large vectors,
actually. (Hundreds of floating-point numbers.)

+---------------
|                                                  ...  Ethernet is very
| inefficient when it is handling short packets, since it has a lot of overhead
| per packet.  In actual practice, the 10mb bandwith is approximated only when
| packets are very long (perhaps 10KB, I forget).
+---------------

Well, not 10KB, since the maximum legal packet is 1518 bytes. The minimum
packet size is 46 data bytes (64 total bytes including addresses, type field,
and CRC, but not the 8-byte preamble), and those can happen every 60.8
microseconds (51.2 for the packet and 9.6 "mikes" of inter-packet delay), or
every 76 byte times. Let's see, that's a minimum efficiency of 46/76, or about
60%, in the absence of collisions. Packets of only 128 data bytes yield 81%;
256 bytes, 89.5%; and 1024 bytes, 97%.
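These efficiency figures follow from counting byte times per packet. A sketch, assuming back-to-back packets with no collisions and 18 bytes of addresses, type field, and CRC per frame; like the numbers above, it leaves out the 8-byte preamble unless asked to include it:

```python
HEADER_CRC = 18  # 6 dest + 6 source + 2 type + 4 CRC bytes
GAP = 12         # 9.6 microseconds at 10 Mbit/s = 96 bit times = 12 bytes
PREAMBLE = 8     # 64-bit preamble and start delimiter


def efficiency(data_bytes: int, count_preamble: bool = False) -> float:
    """Share of channel time carrying data, back-to-back packets, no collisions."""
    if not 46 <= data_bytes <= 1500:
        raise ValueError("Ethernet data field is 46..1500 bytes")
    per_packet = data_bytes + HEADER_CRC + GAP
    if count_preamble:
        per_packet += PREAMBLE
    return data_bytes / per_packet


for size in (46, 128, 256, 1024):
    print(f"{size:5d} bytes: {efficiency(size):.1%}  "
          f"(with preamble {efficiency(size, True):.1%})")
```

Counting the preamble lowers each figure by a few points but does not change the conclusion that packets of a few hundred bytes use the channel well.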

Even with collisions, channel efficiency stays high for packets over 128
bytes or so, but remember that in the backplane "bus" application here, the
Ethernet channel is VERY short (much less than a bit time), so collisions
are much less frequent.  (Try solving the equations for efficiency in the
original Ethernet paper for C = 10 Mbit/sec and T = 0.1 microsecond.)
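For anyone who wants to try that exercise, here is a sketch of the efficiency model from the original Ethernet paper (Metcalfe and Boggs, 1976), as we understand it. Q stations always have a packet queued, each transmits in a contention slot with probability 1/Q, and T is the slot time. The 64-station count and minimum-size packet below are illustrative choices, not figures from the post:

```python
def ethernet_efficiency(packet_bits: float, rate_bps: float,
                        slot_s: float, queued: int) -> float:
    """Metcalfe-Boggs channel efficiency with `queued` stations always ready.

    Each queued station is assumed to transmit in a contention slot with
    probability 1/queued; `slot_s` is the slot time T in seconds.
    """
    q = queued
    a = (1 - 1 / q) ** (q - 1)           # chance exactly one station acquires a slot
    w = (1 - a) / a                      # mean contention slots per packet
    transmit_s = packet_bits / rate_bps  # P/C: time to send one packet
    return transmit_s / (transmit_s + w * slot_s)


C = 10e6  # 10 Mbit/s
P = 512   # a minimum-size 64-byte packet, in bits
for T in (51.2e-6, 0.1e-6):  # standard slot time vs. a very short backplane
    print(f"T = {T * 1e6:4.1f} us, 64 stations: "
          f"{ethernet_efficiency(P, C, T, 64):.1%}")
```

With T shrunk to 0.1 microsecond, contention costs almost nothing even for minimum-size packets, which is the point about short backplane "cables".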

+---------------
| Also, I bet most of the algorithms for the Cosmic Cube are fairly
| synchronous, so all the processors would want to be broadcasting at the
| same time...
+---------------

That didn't seem to be the case for the application problems I saw presented
today -- concurrent, yes; "synchronous", no.

Further, the targets of messages were always specific processors (processes,
actually). Broadcast did not seem to be (yet) implemented.

+---------------
|         ... Ethernet assumes that the net is not very loaded.  A 10%
| loaded Ethernet is very rare.
+---------------

True, a heavily-loaded Ethernet is rare in, say, a real-life
office-automation environment. But Ethernet doesn't "assume" that, in
fact, the access algorithm and total throughput are stable even under
extreme overload. (See "Measured Performance of an Ethernet...", Shoch
& Hupp.) The net will not collapse, as long as the rules are followed,
and the throughput will be high if packets are a few hundred bytes or
more. On a "bus" backplane, the throughput will be even higher (the
number of "hosts" is smaller, and the "cable" is shorter.)

+---------------
| Also, Ethernet is not that cheap.  Each connection runs a few hundred
| dollars.  A straightforward serial connection would only be a few dollars,
+---------------

Geez... I wonder why the Intel hypercube uses ETHERNET chips... EIGHT (8)
OF THEM!!! ;-} ;-}  And they use them for mere point-to-point links!

Seriously, you should look at current chip prices. In "backplane" applications
you don't need a full transceiver per connection, but can interconnect at
the "transceiver cable" level (or even at TTL, if you supply clock).

+---------------
|      ...  A straightforward serial connection would only be a few dollars,
| and a parallel port is even faster and almost as cheap (wiring costs, you
| know).
+---------------

Sorry, most of the cost is NOT in the serialization, but in the bus interface,
buffer handling, and line driving/receiving -- all things which a parallel
interface also has to do. And the parallel interface doesn't have the noise
immunity (at least not a cheap TTL one), while the Ethernet transceiver-cable
driver/receivers cheerfully drive 50 meters over a shielded twisted pair
(differential shifted-ECL levels).

+---------------
|      ...As long as the interconnection pattern is regular and there are
| not too many processors (too many is more than the number that fit in one
| or two cabinets) the Cosmic Cube interconnection scheme should be cheap
| and simple.
|
| Ralph Johnson
+---------------

I'd like to see you interconnect 128 processors in a hypercube using 50-pin
ribbon cable! ;-} The interconnection pattern is regular, but it's not
necessarily convenient! (Remember, each processor is a "corner", and as you
"linearize" the Cube by putting it in a rack, the interconnects get to be a
bit of a rat's nest.)
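To put a number on the rat's nest: a sketch, assuming (hypothetically) that 128 boards sit in a rack in node-number order. It counts the links in a 7-dimensional cube and how many slots apart each link's endpoints are:

```python
from collections import Counter


def cube_links(n: int) -> list[tuple[int, int]]:
    """Every link of an n-cube as a (lower, higher) pair of node numbers."""
    return [(v, v | (1 << k))
            for v in range(2 ** n)
            for k in range(n)
            if not v & (1 << k)]


links = cube_links(7)  # 128 processors
print(len(links))      # 448, i.e. 7 * 2**6
# In node-number order, a link along dimension k joins boards 2**k slots apart.
spans = Counter(b - a for a, b in links)
print(sorted(spans.items()))
```

Every dimension contributes 64 links, so 64 of the 448 cables run halfway down the rack.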

Disclaimer: I am not selling the Intel method; I have some concerns
about having that many high-speed point-to-point links on a memory bus.
(I am an advocate of quasi-bus serial backplanes, rather than point-to-point).
However, Intel's use of Ethernet chips is quite reasonable, given the
connection pattern they chose, and is MUCH preferred to 8 parallel interfaces!


Rob Warnock
Systems Architecture Consultant

UUCP:	{ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD:	(415)572-2607
USPS:	510 Trinidad Lane, Foster City, CA  94404