fred@mot.UUCP (Fred Christiansen) (01/04/85)
[]  Dec 27's Electronic Design makes reference to a 64-node parallel processor
using 8086/87's having solved a high-order physics problem which, heretofore,
folk had only had the temerity to try out on a Cray.
I'm curious.  Anyone know about this or know literature references?
--------------------
Fred Christiansen, Motorola Microsystems, 2900 S Diablo Way, Tempe, AZ 85282
{allegra,ihnp4}!sftig!mot!fred    {ihnp4,seismo}!ut-sally!oakhill!mot!fred
{ihnp4,amdahl}!drivax!mot!fred    arizona!asuvax!mot!fred
norm@rocksanne.UUCP (01/14/85)
The Jan 85 issue of Communications of the ACM has a more detailed article
on the Cosmic Cube for those interested.
rro@csu-cs.UUCP (Rod Oldehoeft) (01/19/85)
> [] Dec 27's Electronic Design makes reference to a 64-node parallel processor
> using 8086/87's having solved a high-order physics problem which, heretofore,
> folk had only had the temerity to try out on a Cray.
> I'm curious.  Anyone know about this or know literature references?
> --------------------
> Fred Christiansen, Motorola Microsystems, 2900 S Diablo Way, Tempe, AZ 85282

The latest CACM has a special section on computer architecture with yet
another RISC paper by Patterson, an article on the Cosmic Cube, and one on
the Manchester dataflow machine.  Good reading.
chrsbmw@pertec.UUCP (chris mihaly) (01/19/85)
> [] Dec 27's Electronic Design makes reference to a 64-node parallel processor
> using 8086/87's having solved a high-order physics problem which, heretofore,
> folk had only had the temerity to try out on a Cray.
> I'm curious.  Anyone know about this or know literature references?
> --------------------

Piece of cake.  Yes, I have heard of it.  I live in San Marino, which is
about three minutes' walking distance from Caltech, and I know several
students at that institution.  I remember one of them talking about it.

I knew that Caltech has been putting a considerable effort into
multidimensional array processors built from microprocessors.  I was told
that they were working on a 64-node 8086 w/8087 array, and that it had
successfully completed the physics task.  I do not have any information on
the particulars of the task, but it could be the very one you mentioned.
I don't think there is much literature, and if there is any, I don't know
whether Caltech would be willing to release it, but I will ask around and
get back to you if I learn anything.
--
Christopher D. Mihaly
{ucbvax!unisoft | scgvaxd | trwrb | felix}!pertec!chrsbmw
or {ucbvax!ucivax | trwrb | unisoft!pertec}!csuf!chrsbmw

"But you told me to type rm * .o and it came back with 'rm: .o nonexistent'"
jww@bonnie.UUCP (Joel West) (02/05/85)
> [] Dec 27's Electronic Design makes reference to a 64-node parallel processor
> using 8086/87's having solved a high-order physics problem which, heretofore,
> folk had only had the temerity to try out on a Cray.
> I'm curious.  Anyone know about this or know literature references?
> --------------------
> Fred Christiansen, Motorola Microsystems, 2900 S Diablo Way, Tempe, AZ 85282

I can add a little more.  JPL/Caltech have several machines planned.  The
research on the "Hypercube" (as I have always heard it termed within JPL)
is being funded by several different government organizations, each of
which hopes to eventually use one to solve its own particular computational
problems.

A machine consisting of 16 x {8086, 8087, 256KB} is known as a "Mark II".
The architecture encourages (2^N)-node networks by making the maximum
distance between nodes N links; hence, "hypercube".  I understand that
different configurations of the Mark II are being built, up to possibly
128 nodes.

The next version, a "Mark III", is tentatively set to be 64 x {16 MHz
68020, 68881, 1-4MB} for delivery in 1987.  For my purposes (massive
discrete event simulations) that begins to look interesting.  I've heard
claims that the 68020/68881 pair is faster than a VAX-11/780... can someone
comment on this?  Some also claim that 1024 x {8086...} would be better,
but I would strongly disagree (a side-flame I'll ignore).

I've also heard a rumor that a major firm plans to market its own Intel
386-based hypercube.  I don't know enough about the 386's performance or
schedule to know when this would be, or whether the 68020 would be better.

The problem of effectively using this computing power is non-trivial (ask
the folks with Illiac IV).  For simulation purposes, David Jefferson of
UCLA (Jefferson@UCLA-LOCUS.ARPA) has come up with an interesting approach
that JPL plans to try.
-- Joel West CACI, Inc. - Federal 3344 N. Torrey Pines Ct La Jolla 92037 jww@bonnie.UUCP (ihnp4!bonnie!jww) westjw@nosc.ARPA
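The hypercube property described above (at most N links between any two of
2^N nodes) follows from giving each node an N-bit ID and wiring together
exactly those nodes whose IDs differ in one bit; the routing distance is
then the Hamming distance of the two IDs.  A minimal sketch of that
numbering convention (the standard binary-hypercube scheme, not taken from
the JPL design documents):

```python
def hypercube_distance(a: int, b: int) -> int:
    """Links between nodes a and b in a binary hypercube: the Hamming
    distance (popcount of XOR) of their node IDs."""
    return bin(a ^ b).count("1")

# In a 64-node cube (N = 6), no two nodes are more than 6 links apart.
N = 6
nodes = range(2 ** N)
assert max(hypercube_distance(a, b) for a in nodes for b in nodes) == N

# Each node has exactly N direct neighbours (IDs differing in one bit).
assert len([b for b in nodes if hypercube_distance(0, b) == 1]) == N
```

This is why doubling the node count adds only one link to the worst-case
path: node 1100101 is one hop from 0100101 in the next-larger cube.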
davet@oakhill.UUCP (Dave Trissel) (02/08/85)
>> Dec 27's Electronic Design makes reference to a 64-node parallel processor
>> using 8086/87's having solved a high-order physics problem which, heretofore,
>> folk had only had the temerity to try out on a Cray.
>> I'm curious.  Anyone know about this or know literature references?

> A machine consisting of 16 x {8086, 8087, 256kb} is known as a "Mark II".
> The architecture encourages (2^N)-node networks by making the maximum distance
> between nodes to be N links; hence, "hypercube".  I understand that different
> configurations of the Mark II are being built, up to possibly 128-node.

I think it is important to size up the claims made for the power of
multiple microprocessors tied together in ANY configuration.  First, let's
look at the raw power available.

The 8086 at 10 MHz (its highest rated speed) can do at most 1.25 million
integer operations per second (that's a 32-bit register-to-register ADD).
The 8087 performs floating-point ADD and SUBTRACT at 20 us a shot (MUL is
around 30 and DIV around 40) at its highest rated speed of 5 MHz.  (Let's
be good guys and forget for the moment that the 8086 cannot run faster
than the 8087, which means it must run at 5 MHz, which lowers its 32-bit
integer add rate to .625 MIPS.)

Now the CRAY runs (I am quoting from memory, but I don't think I'm going
to be far off) scalar rates of 30 megaflops and vector rates of over 80.
At the scalar rate of 30 megaflops, and assuming no interconnect overhead
or idle-time penalties on any of the 8087s, it would take about 600 8087s
to match the floating-point power of a CRAY!  That's right --- 600!  Even
if the cube had an array of 64 8086/8087 pairs, its power would only be
about one tenth that of a CRAY.  (Cost-wise, though, 600 8086/8087 pairs
would only run about 200 grand - substantially cheaper than the CRAY.)
Assuming the same 30 MIPS figure for the CRAY's integer processing, it
would only take about 24 8086's (at 10 MHz) to match the CRAY.
Even though these are ballpark figures, I think the conclusion to be had
is quite obvious.  The cube does not approach the power of a CRAY.

> The next version, a "Mark III", is tentatively set to be 64 x {16 mhz 68020,
> 68881, 1-4mb } for delivery in 1987.  For my purposes (massive discrete event
> simulations) that begins to look interesting.  I've heard claims that the
> 68020/68881 pair is faster than a VAX-11/780...can someone comment on this?

Well, true and false.  At non-floating-point operations the '020 runs from
20 percent to 80 percent faster than the 780.  For floating point (DEC
gives out no timings) we figure the 780 is slightly faster for single
precision, slightly slower for double and extended, and moderately slower
at transcendentals.  So the result is that the MC68020/881 combination
runs from about the same to 80 percent faster than the VAX-11/780,
depending upon what you are doing.

Let's make the same ballpark comparison with the CRAY.  Floating ADD/SUB
is about 2.3 us on the MC68881.  That still means you would need about
69 881s to match the CRAY's 30 megaflops.  This is a little more
encouraging, as seventy of something is more manageable than 600 of
something.  The MC68020 runs 32-bit register-to-register operations at an
impressive 8 MIPS, which would indicate that only four MC68020's would be
needed to approach the integer power of a CRAY.  (I am assuming a 30 MIPS
figure here for the CRAY.  Corrections welcomed from those in the know.
Sorry, but my CRAY manual is in storage.)

Fermilab, near Chicago, has a serious proposal to build an MC68020
multiprocessor system with CRAY-equivalent power.  I have seen their
prototype running on MC68000s, and it, along with the software they have
developed, is truly impressive.  They are running ABSOFT FORTRAN on each
node with a VAX 780 controlling the whole thing.  However, their nodes do
not seem to be as closely coupled as those mentioned here about the Cube.
I will post a synopsis of that machine if people are interested.

> I've also heard a rumor that a major firm plans to market its own Intel
> 386-based hypercube.  I don't know enough about the 386 performance or
> schedule to know when this would be or whether the 68020 would be better.

<<<WARNING -- THE FOLLOWING IS FROM A COMPETITOR OF INTEL --->>>
We at Motorola have heard rumors that it is in for its fourth redesign and
that the on-chip instruction cache is now being abandoned.  I fail to see
how any high-performance chip can be effective without an on-chip cache of
some type.  (The EDN benchmarks on the MC68020 show an over 25 percent
improvement when the cache is turned on.)  Intel's sales pitch may give a
clue about the 386's status.  It is a polished presentation which attempts
to prove that you don't need 32 bits for anything, and that the MC68020 is
overkill.

> The problem of effectively using this computing power is non-trivial
> (ask the folks with Illiac IV). ...
>    Joel West
>    CACI, Inc. - Federal   3344 N. Torrey Pines Ct   La Jolla 92037

I would have agreed 100 percent with that statement before I saw the
Fermilab demo.  Now I'm not so sure.  It may be non-trivial, but I no
longer think it's too difficult to tackle either.  Of course, all
responses welcome.

Motorola Semiconductor Inc.          Dave Trissel
Austin, Texas        {ctvax,siesmo,gatech,ihnp4}!ut-sally!oakhill!davet
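The ballpark comparisons above reduce to a single ratio: chips needed =
(assumed CRAY Mflops) x (microseconds per add per chip).  A sketch using
only the article's own latency figures (note that the straight ratio for
the 68881's 2.3 us add comes out near 69 chips; the 30 Mflops CRAY rate is
the article's from-memory assumption, not a datasheet number):

```python
import math

CRAY_MFLOPS = 30.0  # assumed CRAY scalar rate, per the article

def chips_to_match(add_time_us: float, cray_mflops: float = CRAY_MFLOPS) -> int:
    """Coprocessors needed to equal cray_mflops, if each chip completes
    one floating-point add every add_time_us microseconds, with no
    interconnect overhead or idle time (the article's idealization)."""
    # required ops/s divided by ops/s per chip simplifies to
    # Mflops * (us per op), since 1 op per us == 1 Mop/s.
    return math.ceil(cray_mflops * add_time_us)

assert chips_to_match(20.0) == 600   # 8087:  20 us per add -> 600 chips
assert chips_to_match(2.3) == 69     # 68881: 2.3 us per add -> ~69 chips
```

The same formula with the integer rates (1.25 MIPS per 10 MHz 8086 against
an assumed 30 MIPS CRAY) gives the roughly two dozen 8086s quoted above.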
jlg@lanl.ARPA (02/09/85)
While I agree with most of what the original poster said, I think the
following is somewhat in error:

> Even if the cube had an array of 64 8086/8087 pairs its power would only
> be about one tenth that of a CRAY.  (Cost wise though, 600 8086/8087
> pairs would only run about 200 grand - substantially cheaper than the
> CRAY.)

The cost of such a system would be MUCH higher, in order to make back
research costs, pay for the labor that assembles the machine (which must
be a nightmare), as well as the cost of memory (less than several million
words would be inadequate for a machine of such projected power).  A way
to interface so many processors to memory efficiently has yet to be found
and would add to the expense of the implementation.  Quoted costs of the
Hypercube project itself have ignored labor (they get grad students and
the researchers themselves to do it, and their salaries are figured
separately), they ignore parts (all of which are being donated for the
project), and they ignore sales, distribution, etc.  All of which would be
required to make a commercial Hypercube feasible.

J. Giles
hal@cornell.UUCP (Hal Perkins) (02/09/85)
>> Dec 27's Electronic Design makes reference to a 64-node parallel processor
>> using 8086/87's having solved a high-order physics problem which, heretofore,
>> folk had only had the temerity to try out on a Cray.
>> I'm curious.  Anyone know about this or know literature references?

See the January issue of the Communications of the ACM for a paper on this.
cdshaw@watrose.UUCP (Chris Shaw) (02/12/85)
> While I agree with most of what this original poster said, I think the
> following is somewhat in error:
>
> > Even if the cube had an array of 64 8086/8087 pairs its power would only
> > be about one tenth that of a CRAY.  (Cost wise though, 600 8086/8087
> > pairs would only run about 200 grand - substantially cheaper than the CRAY.)
>
> The cost of such a system would be MUCH higher in order to make back
> research costs, pay for the labor that assembles the machine (which
> must be a nightmare), as well as the cost of memory ...
>
> J. Giles

Oh come on...

	64 * $2000 = $128,000

given a rough guess of $2000 for the cost of one 8086/8087 single-board
computer with (say) 512K of memory per board.  The architecture of the
Cosmic Cube is such that there is not a common pool of memory; rather,
each processor has its own memory and sends messages about the computation
to the other machines.

As for labour and parts... the $2000 I mentioned is a price at QUANTITY
ONE.  Ordering (say) Multibus boards of the above configuration in lots of
64 would cost you only 1/2 or 2/3 the price due to price breaks.  And as
for labour costs...
Any idiot can plug cards into a card cage (the wiring for which is TRIVIAL
by comparison to the 18-inch wire that you have to use with ECL (CRAY)).
The real difference between the CosCube and a micro with 64 slots is that
the backplane is much more complex but, as I mentioned, not as complex as
the labour in wiring a CRAY.

So, I submit that a cost of $200-300K per 64-Cube would not be too out to
lunch at all, since it's really just 64 copies of an IBM PC !!  ( :-) )

Yours 'til the baloney melts...
Chris D Shaw
gnu@sun.uucp (John Gilmore) (02/13/85)
A recent newspaper (Electronic News?) contains a front-page announcement of a new Intel product containing a cube of 80286/80287's each of which has 512KB and seven(?) 82586 ethernet interfaces. (One is for global communications with the master control processor and the rest make up the edges of the cube, from the sketchy description.) There was no mention of software support in the article. Prices ranged up into the $500K zone. There was a Caltech connection on the design.
eugene@ames.UUCP (Eugene Miya) (02/14/85)
> > The problem of effectively using this computing power is non-trivial
> > (ask the folks with Illiac IV). ...
> >    Joel West
>
> Motorola Semiconductor Inc.          Dave Trissel

Sorry, we gave what's left of the Illiac to DEC's museum, no spare parts.
The disks made nice coffee tables; wish I got one.  I made up a saying
last Sept.: "reinventing the Illiac again."  Chuck's Cube is not the
Illiac, but programming any of these machines is not a piece of cake.

Intel stopped by today and spoke about their cube versions: the d5, d6,
and d7 systems.  You can write them for info.  Read the CACM paper for a
sample C program.  'Applications' programs are going to look more like
'systems' programs as we increase parallelism.

So long for now.

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,vortex}!ames!aurora!eugene
  emiya@ames-vmsb.ARPA
bcase@uiucdcs.UUCP (02/15/85)
Now wait a minute....  The Cosmic Cube, in its 64-processor configuration,
exists and is functional.  It is quite price-competitive, being about two
orders of magnitude better in price/performance than the Cray.  It has
been used by someone (at Caltech) to work on a very large problem, and
although it took some weeks (or months, I cannot remember now), it was
successful (and purchasing the Cray time would have been out of the
question in this case).

Perhaps more importantly for demonstrating the feasibility of this
configuration, there is an article in the current Electronics Week
describing the Intel version of this machine, the iPSC (Intel Personal
Super Computer is one of the possible expansions).  As with most parallel
computers, this machine works best on a restricted class of problems;
however, it is believed that the class includes a lot of the most
interesting ones.  It is also believed that the demand for supercomputing
is very elastic with respect to price; perhaps this explains Intel's entry
into this market.  Selling 64 8086s with each machine is a good deal for
them also....
ross@dsd.UUCP (Evan Ross) (02/15/85)
As a matter of fact, Intel has announced the iPSC family of parallel
computers, which seems to be a commercial version of the Cosmic Cube.
Each node has an 80286/80287, 512K RAM, 7 point-to-point comm channels,
one global comm channel, and an 82586 LAN coprocessor to handle all of
the comm channels.  There are three versions: the iPSC-d5 with 32 nodes
@ $150K, the iPSC-d6 with 64 nodes @ $275K, and the iPSC-d7 with 128
nodes @ $520K.  'Shipments are expected to begin by late May'.

Consult the February 11 issue of the E.E. Times for further details
(propaganda, vaporware, wishful thinking...)
--
Evan Ross   decwrl!amd!fortune!dsd!ross

"To oppose something is to maintain it.  To oppose vulgarity is
inevitably to be vulgar."
avie@cmu-cs-wb1.ARPA (Avadis Tevanian) (02/17/85)
This Cosmic Cube is really puzzling me...  As I understand it, each
processor has its own local memory; no memory is shared.  Rather, message
passing is used for communication.

What puzzles me is why use point-to-point channels between processors (and
do routing if a connection does not exist)?  Wouldn't it be much simpler
to use a dedicated Ethernet?  A 10Mb Ethernet should easily provide the
necessary bandwidth for 64 or more processors.  Since the Ethernet would
be dedicated, minimal protocols could be used, thus keeping the costs of
managing the Ethernet down.  If 10Mb is not enough bandwidth (which I
highly doubt), it shouldn't be too tough to increase the bandwidth,
considering that the wire will be dedicated and can be very short (it
won't need to run all around a building).

	Avie Tevanian
wjafyfe@watmath.UUCP (Andy Fyfe) (02/19/85)
In article <402@dsd.UUCP> ross@dsd.UUCP (Evan Ross) writes: >As a matter of fact, Intel has announced the iPSC family of parallel >computers which seem to be a commercial version of the Comsic Cube. >Each node has 80286/80287, 512k Ram, 7 Point to point comm channels, >one global comm channel, and an 82586 LAN coprocessor to handle all of >the comm channels. Actually, there are 8 82586 Ethernet chips, one for each communication channel. (I have actually seen (and touched) a board.) If there's interest (send me mail), I can post info from the announcement package. --andy fyfe ...!{decvax, allegra, ihnp4, et. al}!watmath!wjafyfe wjafyfe@waterloo.csnet
david@daisy.UUCP (David Schachter) (02/20/85)
8087s are available at 8 MHz now. (The samples we got a year ago ran at 10 MHz, no problem, in fact.) Of course they cost a bit more.... But not as much as a quasi-existent Motorola FPU. [I am not affiliated with Moto or Intel and I have no great love for the '86/'87 architecture. Just the facts, ma'am.] The opinions expressed herein are not necessarily those of Daisy Systems or any sapient lifeform. "If at first you don't succeed, quit. No use making a damn fool of yourself." -- W.C. Fields
rej@cornell.UUCP (Ralph Johnson) (02/20/85)
In article <166@cmu-cs-wb1.ARPA> avie@cmu-cs-wb1.ARPA (Avadis Tevanian) writes:
> What puzzles me is why use point to point channels between processors (and
> do routing if a connection does not exist)?  Wouldn't it be much simpler to
> use a dedicated ethernet?  A 10mb ethernet should easily provide the
> necessary bandwidth for 64 or more processors.

I assume that most of the communication between processors consists of
very short packets, i.e., a single floating-point number.  Ethernet is
very inefficient when it is handling short packets, since it has a lot of
overhead per packet.  In actual practice, the 10Mb bandwidth is approached
only when packets are very long (perhaps 10KB, I forget).  Also, I bet
most of the algorithms for the Cosmic Cube are fairly synchronous, so all
the processors would want to be broadcasting at the same time.  Ethernet
assumes that the net is not very loaded; a 10% loaded Ethernet is very
rare.

Also, Ethernet is not that cheap.  Each connection runs a few hundred
dollars.  A straightforward serial connection would only be a few dollars,
and a parallel port is even faster and almost as cheap (wiring costs, you
know).  As long as the interconnection pattern is regular and there are
not too many processors (too many is more than the number that fit in one
or two cabinets), the Cosmic Cube interconnection scheme should be cheap
and simple.

	Ralph Johnson
eugene@ames.UUCP (Eugene Miya) (02/20/85)
<166@cmu-cs-wb1.ARPA>
> This Cosmic Cube is really puzzling me...
>
> What puzzles me is why use point to point channels between processors (and
> do routing if a connection does not exist)?  Wouldn't it be much simpler to
> use a dedicated ethernet?
>
>	Avie Tevanian

To this and other articles about the Cube and the new Intel cube: the new
Intel cube does use Ethernet controller chips for point-to-point
communication [Justin Rattner, at Ames last week].  Don't forget these are
research machines, and multiprocessor communications is one area of
research.  The Intel machine also has an extra 'global' Ethernet for
passing interrupts and the like.

Communications and massive memory are major problems with supercomputers.
Our Cray sends 1.2 GB/sec to a solid-state storage device.  Big bandwidth
is a long-term problem: you don't want your processors waiting very long,
you don't want big buffers for I/O, and you don't want your processors
calculating Ethernet backoff after a collision has been detected.

These machines are not Cray replacements.  Big mainframe 308x machines
outperform these cubes [cube I/O is especially poor].  Caltech's cube has
I/O through a single PE; Intel has proposed Cray-like disk striping with
disks off every PE.

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,vortex}!ames!aurora!eugene
  emiya@ames-vmsb.ARPA
cdshaw@watrose.UUCP (Chris Shaw) (03/01/85)
The Intel machine has 80286/80287 coprocessor pairs.  Does anyone know how
fast these boards should run?  (Assuming documented specs mean anything.)

Chris Shaw
brooks@lll-crg.ARPA (Eugene D. Brooks III) (03/05/85)
> The Intel machine has 80286 / 80287 co-processors pairs.
> Does anyone know how fast these boards should run ? (Assuming documented
> spec mean anything)
>	Chris Shaw

These boards run at 8 MHz.  The floating-point speed should be almost
double what we got on the "Cosmic" Cube.  That was 25 microseconds for an
expression like a = b * c; where all three are memory locations in double
precision.
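The 25 us figure above implies about 0.04 Mflops per node for that
expression; scaled to a 64-node cube with every node busy, that is roughly
2.6 Mflops aggregate.  (This scaling is my own arithmetic from the quoted
per-node figure, not a measured aggregate rate.)

```python
US_PER_OP = 25.0   # a = b * c, double precision, per the post above
NODES = 64

per_node_mflops = 1.0 / US_PER_OP   # ops per microsecond == Mops/s
aggregate_mflops = NODES * per_node_mflops

assert abs(per_node_mflops - 0.04) < 1e-12
assert abs(aggregate_mflops - 2.56) < 1e-12
```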
rpw3@redwood.UUCP (Rob Warnock) (03/05/85)
+---------------
| >What puzzles me is why use point to point channels between processors (and
| >do routing if a connection does not exist)?  Wouldn't it be much simpler to
| >use a dedicated ethernet?
| I assume that most of the communication between processors consists of very
| short packets, i.e., a single floating point number...
+---------------

Just went to a very interesting talk today at NASA/Ames given by Cleve
Moler of Intel Scientific Computers, who make a commercial hypercube
system (announced in net.arch previously).  Don't know about Caltech's
applications, but for Intel's, the messages tend to be fairly large
vectors, actually.  (Hundreds of floating-point numbers.)

+---------------
| ... Ethernet is very
| inefficient when it is handling short packets, since it has a lot of overhead
| per packet.  In actual practice, the 10mb bandwith is approximated only when
| packets are very long (perhaps 10KB, I forget).
+---------------

Well, not 10KB, since the maximum legal packet is 1518 bytes.  The minimum
packet size is 46 data bytes (64 total bytes including preamble, address,
and CRC), and those can happen every 60.8 microseconds (51.2 for the
packet and 9.6 "mikes" of inter-packet delay), or every 76 byte times.
Let's see, that's a minimum efficiency of 46/76, or about 60%, in the
absence of collisions.  Packets of only 128 data bytes yield 81%; 256
bytes, 89.5%; and 1024 bytes, 97%.  Even with collisions, channel
efficiency stays high for packets over 128 bytes or so, but remember that
in the backplane "bus" application here, the Ethernet channel is VERY
short (much less than a bit time), so collisions are much less frequent.
(Try solving the equations for efficiency in the original Ethernet paper
for C = 10 Mbit/sec and T = 0.1 microsecond.)

+---------------
| Also, I bet most of the algorithms for the Cosmic Cube are fairly
| synchronous, so all the processors would want to be broadcasting at the
| same time...
+---------------

That didn't seem to be the case for the application problems I saw
presented today -- concurrent, yes; "synchronous", no.  Further, the
targets of messages were always specific processors (processes, actually).
Broadcast did not seem to be (yet) implemented.

+---------------
| ... Ethernet assumes that the net is not very loaded.  A 10%
| loaded Ethernet is very rare.
+---------------

True, a heavily-loaded Ethernet is rare in, say, a real-life
office-automation environment.  But Ethernet doesn't "assume" that; in
fact, the access algorithm and total throughput are stable even under
extreme overload.  (See "Measured Performance of an Ethernet...", Shoch &
Hupp.)  The net will not collapse, as long as the rules are followed, and
the throughput will be high if packets are a few hundred bytes or more.
On a "bus" backplane, the throughput will be even higher (the number of
"hosts" is smaller, and the "cable" is shorter).

+---------------
| Also, Ethernet is not that cheap.  Each connection runs a few hundred
| dollars.  A straightforward serial connection would only be a few dollars,
+---------------

Geez... I wonder why the Intel hypercube uses ETHERNET chips... EIGHT (8)
OF THEM!!!  ;-}  ;-}  And they use them for mere point-to-point links!
Seriously, you should look at current chip prices.  In "backplane"
applications you don't need a full transceiver per connection, but can
interconnect at the "transceiver cable" level (or even at TTL, if you
supply clock).

+---------------
| ... A straightforward serial connection would only be a few dollars,
| and a parallel port is even faster and almost as cheap (wiring costs, you
| know).
+---------------

Sorry, most of the cost is NOT in the serialization, but in the bus
interface, buffer handling, and line driving/receiving -- all things which
a parallel interface also has to do.
And the parallel interface doesn't have the noise immunity (at least not a
cheap TTL one), while the Ethernet transceiver-cable drivers/receivers
cheerfully drive 50 meters over a shielded twisted pair (differential
shifted-ECL levels).

+---------------
| ...As long as the interconnection pattern is regular and there are
| not too many processors (too many is more than the number that fit in one
| or two cabinets) the Cosmic Cube interconnection scheme should be cheap
| and simple.
|
|	Ralph Johnson
+---------------

I'd like to see you interconnect 128 processors in a hypercube using
50-pin ribbon cable!  ;-}  The interconnection pattern is regular, but
it's not necessarily convenient!  (Remember, each processor is a "corner",
and as you "linearize" the Cube by putting it in a rack, the interconnects
get to be a bit of a rat's nest.)

Disclaimer: I am not selling the Intel method; I have some concerns about
having that many high-speed point-to-point links on a memory bus.  (I am
an advocate of quasi-bus serial backplanes, rather than point-to-point.)
However, Intel's use of Ethernet chips is quite reasonable, given the
connection pattern they chose, and is MUCH preferred to 8 parallel
interfaces!

Rob Warnock
Systems Architecture Consultant
UUCP:	{ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD:	(415)572-2607
USPS:	510 Trinidad Lane, Foster City, CA  94404
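The efficiency figures Warnock quotes follow from a fixed per-packet
overhead: by his accounting, a minimum packet of 46 data bytes occupies 76
byte times on the wire, i.e. 30 byte times of framing and inter-packet gap
per packet regardless of payload.  A sketch reproducing his numbers under
that accounting (collisions ignored, as in his no-collision case):

```python
FIXED_OVERHEAD_BYTES = 30  # framing + 9.6 us gap, in byte times at 10 Mbit/s
                           # (76 byte times per minimum packet - 46 data bytes)

def efficiency(data_bytes: int) -> float:
    """Fraction of channel time carrying payload, ignoring collisions."""
    return data_bytes / (data_bytes + FIXED_OVERHEAD_BYTES)

assert round(100 * efficiency(46)) == 61          # "about 60%" minimum
assert round(100 * efficiency(128)) == 81
assert abs(100 * efficiency(256) - 89.5) < 0.1    # "89.5%"
assert round(100 * efficiency(1024)) == 97
```

This is the crux of the short-packet argument earlier in the thread: a
single 8-byte double sent alone would ride in a padded minimum packet, so
payload efficiency drops to 8/76, about 10%, which is why batching values
into large vectors (as Intel reportedly does) matters so much.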