brooks@maddog.llnl.gov (06/04/89)
There has been quite a lot of discussion on what a computer architecture must have in it to be called a "superscalar." I thought I would contribute some real data to this discussion.

Last week, I had the chance to benchmark Intel's i860 running at 33 MHz with an "alpha" compiler. The compiler did not yet take advantage of delayed branches, and did not use any of the dual or pipelined mode instructions. On a radiation transport Monte Carlo code, which is something we routinely crunch on supercomputers like the Cray machines, that wimpy little i860 with an alpha compiler outran the Cray 1S by 10% or so.

I don't think that anyone, including myself, took seriously the marketing hype that showed a little Cray machine on top of the i860 chip. I don't think that even Intel took it seriously. At this point, for applications which mesh well with a cache, it's not marketing hype. Of course, all the other microprocessor vendors are within 6 months or less of reaching the same performance goal. The MIPS R3000 is probably within epsilon of this performance level, and the rumored ECL RISC implementations from various vendors coming down the pike must be truly impressive.

For those that might say one should have compared to the XMP or YMP, the XMP is 30% faster than the Cray 1S on this application, and the YMP is 50% faster yet. With good compilers the i860, particularly the announced 40 MHz part or the rumored 50 MHz models, will be knocking on the door of the YMP pretty loudly.

Needless to say, when the application starts missing cache (for any of the microprocessors) the performance rapidly drops into a hole when compared to the classic supercomputer. The microprocessor vendors now need to learn the last lesson in supercomputer architecture, which is getting adequate main memory bandwidth.
Since interleaving memory chips with glue logic would raise cost too much, the micro vendors need to collaborate closely with the memory chip vendors to get the interleaving done on the memory chips themselves. This may be a good way for the U.S. manufacturers to get back into the memory chip biz. Design your micro with interleave control on the chip, then design memory chips that have a compatible arrangement, and don't tell the foreign memory chip vendors about the micro/memory chip interface until you get to market. Interleaving on the memory chip is not a difficult thing to do; one only has to decide that it is time to do it.

Just in case the Intel marketing pukes might be tempted to use this posting for their own purposes, please read the disclaimer below:

(C) Copyright 1989, by Eugene Brooks III, all rights reserved. This posting is the personal opinion solely of the author, and does not reflect the opinions of the U.S. Govt or the University of CA in any official capacity. This posting may be transmitted only on the USENET Newsgroup comp.arch, for the purposes of stimulating technical discussion, and may be excerpted for the purposes of further discussion on the USENET if the copyright is left in place. This posting may NOT be printed on paper, and may NOT be used for product endorsement purposes.

brooks@maddog.llnl.gov, brooks@maddog.uucp
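The relative speeds quoted in the posting above can be put on one scale. What follows is a minimal sketch, not a benchmark. It normalizes everything to the Cray 1S, reads "50% faster yet" as 50% faster than the XMP (an assumption), and assumes that i860 speed scales linearly with clock rate, which is only plausible while the code stays in cache. The helper names `scale_by_clock` and `relative_speeds` are hypothetical.

```python
# Relative speeds on the Monte Carlo code, normalized to the Cray 1S = 1.0.
# The percentages come from the posting; everything else is an assumption.
CRAY_1S = 1.0
I860_33MHZ = 1.10 * CRAY_1S   # "outran the Cray 1S by 10% or so"
XMP = 1.30 * CRAY_1S          # "30% faster than the Cray 1S"
YMP = 1.50 * XMP              # ASSUMED reading: 50% faster than the XMP


def scale_by_clock(speed_at_33mhz, clock_mhz):
    """ASSUMPTION: speed scales linearly with clock rate (cache-resident code only)."""
    return speed_at_33mhz * clock_mhz / 33.0


def relative_speeds():
    """Return each machine's speed as a multiple of the Cray 1S."""
    return {
        "Cray 1S": CRAY_1S,
        "i860 33 MHz": I860_33MHZ,
        "XMP": XMP,
        "i860 40 MHz": scale_by_clock(I860_33MHZ, 40.0),
        "i860 50 MHz": scale_by_clock(I860_33MHZ, 50.0),
        "YMP": YMP,
    }


if __name__ == "__main__":
    for name, speed in relative_speeds().items():
        print(f"{name:12s} {speed:5.2f} x Cray 1S   {speed / YMP:5.2f} x YMP")
```

Under these assumptions a 50 MHz part with the same alpha compiler would land at roughly 85% of the YMP figure, before any compiler improvement is counted.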
aglew@mcdurb.Urbana.Gould.COM (06/05/89)
>[Brooks]
>Needless to say, when the application starts missing cache (for any of the
>microprocessors) the performance rapidly drops into a hole when compared to the
>classic supercomputer. The microprocessor vendors now need to learn the last
>lesson in supercomputer architecture, which is getting adequate main memory
>bandwidth. Since interleaving memory chips with glue logic would raise cost
>too much, the micro vendors need to get in close collaboration with the memory
>chip vendors to get the interleaving done on the memory chips themselves. This
>may be a good way for the U.S. manufacturers to get back into the memory chip
>biz. Design your micro with interleave control on the chip and then design
>your memory chips that have a compatible arrangement, then don't tell the
>foreign memory chip vendors about the micro/memory chip interface until you
>get to market. Interleaving on the memory chip is not a difficult thing to do,
>one only has to decide that it is time to do it.

So, we're back to memory again. I hesitate to get involved, since we all heard Mark Johnson saying "Don't talk about, buy it!" to us intellectual weenies who can only talk about things. But I can only talk about it for the moment, so here goes:

Q: How many processor chip vendors will be willing to tie themselves tightly to a memory manufacturer? Well, there are some companies who do both... but are you going to risk customers refusing to buy your processor chip because they have to use your memory chips with it?

Prediction: people will be very slow to get into tightly coupled processor/memory. But when they do, the processor companies will probably put out both custom-memory and non-custom-memory processor chips, probably by last-stage customization of the die. Ditto memory. This will probably be suboptimal.
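The bandwidth argument in the quoted paragraph can be illustrated with a toy model of low-order interleaving. This is a sketch under stated assumptions, not a description of any real memory part: the bank count, the bank busy time, and the rule that the processor issues at most one request per cycle are all hypothetical, and `cycles_for_stream` is an illustrative name.

```python
def cycles_for_stream(addresses, n_banks, bank_busy_cycles):
    """Cycle at which the last access completes in a toy interleaved memory.

    ASSUMPTIONS (hypothetical model): word addresses map to banks by
    address modulo n_banks, a bank is busy for bank_busy_cycles after it
    accepts an access, and at most one access is issued per cycle.
    """
    free_at = [0] * n_banks  # cycle at which each bank can accept a new access
    cycle = 0
    for addr in addresses:
        bank = addr % n_banks              # low-order interleaving
        cycle = max(cycle, free_at[bank])  # stall until the target bank is free
        free_at[bank] = cycle + bank_busy_cycles
        cycle += 1                         # next issue slot
    return max(free_at)


if __name__ == "__main__":
    unit_stride = range(64)
    print("1 bank :", cycles_for_stream(unit_stride, 1, 4))
    print("4 banks:", cycles_for_stream(unit_stride, 4, 4))
    # A stride equal to the bank count sends every access to the same bank.
    print("4 banks, stride 4:", cycles_for_stream(range(0, 256, 4), 4, 4))
```

In this model a unit-stride stream keeps all banks busy once the bank count matches the bank busy time, while a stride equal to the bank count behaves like a single bank. That is the effect the posting wants built into the memory chips instead of into glue logic.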
mat@uts.amdahl.com (Mike Taylor) (06/05/89)
In article <26356@lll-winken.LLNL.GOV>, brooks@maddog.llnl.gov writes:
> ... On a radiation transport Monte Carlo code,
> which is something we routinely crunch on supercomputers like the Cray
> machines, that wimpy little i860 with an alpha compiler outran the
> Cray 1S by 10% or so.

I presume this code doesn't vectorize well (or at all?) What % vector is it on the Cray? Just curious....

--
Mike Taylor ...!{hplabs,amdcad,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's. ]
brooks@vette.llnl.gov (Eugene Brooks) (06/06/89)
In article <14FU029t326G01@amdahl.uts.amdahl.com> mat@uts.amdahl.com (Mike Taylor) writes:
>In article <26356@lll-winken.LLNL.GOV>, brooks@maddog.llnl.gov writes:
>> ... On a radiation transport Monte Carlo code,
>> which is something we routinely crunch on supercomputers like the XXXX
>> machines, that wimpy little XXX with an alpha compiler outran the
>> XXXX by 10% or so.

Please note that my posting contained a copyright notice which specifically forbids excerpting such as this without including the copyright notice. I don't mind excerpts of sections of the posting which did not contain any reference to specific vendors, but this type of excerpt MUST include the copyright.

>I presume this code doesn't vectorize well (or at all?) What % vector
>is it on the Cray? Just curious....

To answer the question, the code vectorizes, with extreme recoding effort, on the Cray to get a factor of 5 in speed. Parallel machines based on the VLSI chips will still have a factor of 20 in cost and performance leverage after this is taken into account.

brooks@maddog.llnl.gov, brooks@maddog.uucp
grunwald@flute.cs.uiuc.edu (Dirk Grunwald) (06/07/89)
If I remember correctly, electron transport code is separable, and ports well to distributed memory multi-processors, right?

Intel has plans to produce a successor to the iPSC/2 based on the i860, and finally using a reasonable I/O architecture. They plan to produce 2048-node systems for DARPA.

--
Dirk Grunwald -- Univ. of Illinois (grunwald@flute.cs.uiuc.edu)
brooks@vette.llnl.gov (Eugene Brooks) (06/09/89)
In article <GRUNWALD.89Jun7094931@flute.cs.uiuc.edu> grunwald@flute.cs.uiuc.edu writes:
>
>If I remember correctly, electron transport code is separable, and ports well
>to distributed memory multi-processors right?

This was a photon transport code, and it evolved the interaction between photons and atoms in an implicit manner. The implicit coupling causes a linear system to get created as the result of the Monte Carlo, and this must be solved for the atomic populations and resulting photon weights. It's not as trivial to parallelize as one might think, but the micro-based machines have incredible leverage.

>Intel has plans to produce a successor to the iPSC/2 based on the i860, and
>finally using a reasonable I/O architecture. They plan to produce 2048-node
>systems for DARPA.

I have no comment on this rumor.

brooks@maddog.llnl.gov, brooks@maddog.uucp
grunwald@flute.cs.uiuc.edu (Dirk Grunwald) (06/09/89)
In article <26641@lll-winken.LLNL.GOV> brooks@vette.llnl.gov (Eugene Brooks) writes:
>>Intel has plans to produce a successor to the iPSC/2 based on the i860, and
>>finally using a reasonable I/O architecture. They plan to produce 2048-node
>>systems for DARPA.
>
>I have no comment on this rumor.

--
I got this from Federal Computer Week, April 10, 1989, ``Intel Lands DARPA Super Award''. DARPA gives $7.6M to Intel, Intel plows in $20M. 2048 i860 processors running at 60 to 80 MHz, giving a machine ``50 to 100 times faster than YMP.'' Many quotes from Rattner. The 3-year project is called Touchstone.

--
Dirk Grunwald -- Univ. of Illinois (grunwald@flute.cs.uiuc.edu)
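The Touchstone figures quoted above imply a per-node target that is easy to work out. This is a back-of-the-envelope sketch only: it assumes perfect linear scaling across all 2048 nodes, which the article does not claim, and `per_node_fraction_of_ymp` is a hypothetical helper name.

```python
NODES = 2048  # node count quoted from the Federal Computer Week article


def per_node_fraction_of_ymp(speedup_over_ymp, nodes=NODES):
    """Fraction of one YMP each node must sustain.

    ASSUMPTION: perfect linear scaling, with no communication or I/O overhead.
    """
    return speedup_over_ymp / nodes


if __name__ == "__main__":
    for speedup in (50, 100):
        frac = per_node_fraction_of_ymp(speedup)
        print(f"{speedup:3d}x YMP over {NODES} nodes -> {frac:.4f} YMP per node")
```

Under that idealized assumption each node would need to sustain between about 2.4% and 4.9% of a YMP; any real parallel overhead raises the per-node requirement.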