brooks@maddog.llnl.gov (Eugene Brooks) (12/20/89)
Well, at SC'89 I speculated that the MIPS R6000, specifically in the MIPS 6280 box (but that doesn't really matter), would perform at roughly 2.5 times the speed of the XMP 4/16 CPU on some of my favorite SCALAR compute-bound applications, ones I have burned some serious computer time on in the last few years.  The XMP 4/16 is the "fast one" for those who don't know, and the YMP is only 30% faster than the XMP 4/16 on the code in question.

Well guys, I WAS WRONG!  I wish to APOLOGIZE for the terrible error!  The 6280 (and when I was given permission to share this bit of data I was also told to inform you that this was a preliminary result on a pre-production machine) has run at 3.3 times the performance of the XMP 4/16 CPU on a SCALAR packet-switched network simulator.  The R6000 is probably, for the very short fleeting moment in the lifetime of a KILLER MICRO, the FASTEST UNIPROCESSOR COMPUTER IN THE WORLD on this code.

Of course, we have to keep in mind that this year's Killer Micro is next year's Lawn Sprinkler Controller, but what a year it has been and what a year the coming one will be!

NO ONE WILL SURVIVE THE ATTACK OF THE KILLER MICROS!

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
COPYRIGHT 1989, Eugene D. Brooks III, all rights reserved.  You are expressly forbidden to use this posting for product endorsement or advertising purposes, or to print it on paper to show to a customer for any reason.  This posting is the personal opinion of the author and is in no way to be construed as the opinion of the U.S. govt. or the University of California.

brooks@maddog.llnl.gov, brooks@maddog.uucp
rhealey@umn-d-ub.D.UMN.EDU (Rob Healey) (12/27/89)
In article <42007@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
>The 6280, and when I was given permission to share this bit of data I was
>also told to inform you that this was a preliminary result on a pre-production
>machine, has run at 3.3 times the performance of the XMP 4/16 CPU on a SCALAR
>packet switched network simulator.

Was the code running on the XMP/YMP optimized to the Cray architecture as much as the code running on the R6000 was optimized to the MIPS architecture?  How much time was spent on each in order to get the code that gave the results above?  Hmmm, I seem to remember the Crays being touted for their VECTOR capability in addition to "respectable" SCALAR performance.  Can the 6000 do seamless vector operations too?  I'm just asking whether comparing Apples to Oranges and saying Apples are better is a valid claim.  Also, with apologies to DEC and Cray: Cray has it now...

While I'm using my spare pocket-change millions for other things right now, B^), one generally buys a high-powered system for many reasons.  The overall performance of the whole system (CPU, memory, I/O, networking) strongly influences the sale of a system.  I'd be interested to see the R6000 system that can beat a Cray in memory, I/O and networking bandwidth.

My main reason for responding to this excited article is that I find it disturbing that A LOT of people pay attention only to MIPS, or only one aspect of a system, and not to full systems as a whole.  To oversimplify: a CPU is only as fast as its slowest subsystem.

Just some musings,

-Rob

#include <std/disclaimers.h>
I speak for myself and no one else.
brooks@maddog.llnl.gov (Eugene Brooks) (12/28/89)
In article <3090@umn-d-ub.D.UMN.EDU> rhealey@ub.d.umn.edu (Rob Healey) writes:
>Was the code running on the XMP/YMP optimized to the Cray architecture
>as much as the code running on the R6000 was optimized to the MIPS
>architecture?
We have Cray machines on site; the MIPS R6000 run was a compile-and-go benchmark done by the vendor.  Just which system do you think the code was "tuned" for, within the limits of keeping the code portable, readable and maintainable?

>Hmmm, I seem to remember the
>Crays being touted for their VECTOR capability in addition to
>"respectable" SCALAR performance.  Can the 6000 do seamless vector
>operations too?
The scalar performance of the Cray machines is no longer "respectable", is it?  I do not believe that the R6000 has vector registers, but I haven't seen technical data on this issue.

>Just asking if comparing Apples to Oranges and saying Apples are better
>is a valid claim?
I was not comparing apples to oranges.  I was comparing the performance of compiled C code on two computers...  In the best tradition of benchmarking, the code was one of MY compute-bound applications.  Your mileage will vary.

>My main reason for responding to this excited article is that
>I find it disturbing that A LOT of people pay attention only to
>MIPS, or only one aspect of a system, and not to full systems as a
>whole.
My main reason for responding to this article is that there are a lot of people with their heads in the sand who still think that traditional supercomputers or mainframes are good buys.  I hate to see people get bushwhacked by Killer Micros when they could just ride the wave.  Killer Micro powered systems are no longer just more cost effective; for scalar application codes they are faster...

brooks@maddog.llnl.gov, brooks@maddog.uucp
csimmons@oracle.com (Charles Simmons) (12/28/89)
In article <42527@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:

[Description of a benchmark comparing the performance of a Cray versus the performance of an R6000.]

>My main reason for responding to this article is that there are a lot
>of people with their heads in the sand who still think that traditional
>supercomputers or mainframes are good buys.  I hate to see people get
>bushwhacked by Killer Micros when they could just ride the wave.  Killer
>Micro powered systems are no longer just more cost effective; for
>scalar application codes they are faster...
>
>brooks@maddog.llnl.gov, brooks@maddog.uucp

The comparison would be slightly more interesting if an Amdahl 5990 were compared to the R6000.  For scalar processing, Amdahl mainframes are (were?) generally considered the fastest obtainable...

-- Chuck
rhealey@umn-d-ub.D.UMN.EDU (Rob Healey) (12/29/89)
In article <42527@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
>In article <3090@umn-d-ub.D.UMN.EDU> rhealey@ub.d.umn.edu (Rob Healey) writes:
>We have Cray machines on site; the MIPS R6000 run was a compile-and-go
>benchmark done by the vendor.  Just which system do you think the code
>was "tuned" for, within the limits of keeping the code portable, readable
>and maintainable?
>
Could also be the fact that MIPS has some of the best compiler technology around.  If I remember right, the Cray C compiler is a pcc derivative; YUCK-O-RAMA.

>>I find it disturbing that A LOT of people pay attention only to
>>MIPS, or only one aspect of a system, and not to full systems as a
>>whole.
>My main reason for responding to this article is that there are a lot
>of people with their heads in the sand who still think that traditional
>supercomputers or mainframes are good buys.  I hate to see people get
>bushwhacked by Killer Micros when they could just ride the wave.  Killer
>Micro powered systems are no longer just more cost effective; for
>scalar application codes they are faster...
>
Seeing as my head is currently in Minnesota, I'd say it might be in a snow bank but definitely NOT in the sand.  What makes you think the bigger systems won't adopt the same technology as the "killer micros", bringing their costs down too?  How well will your scalar 6000 do on HUGE data sets that require movement to and from I/O?  The MIPS performance of the 6000 may well beat a super or mainframe, but what about scalar problems that require heavy I/O?  Will your low-cost workstation be able to handle those problems better?

Supercomputers and mainframes ARE GREAT buys when LOTS of users need to be serviced.  You'd be foolish to think 1000 users would be best served by networked workstations maxed out with disk and memory so they can run at top speed.  That situation requires a hierarchy of disk, CPU and memory networked together very carefully.
My original point is being totally ignored here:

MIPS is useless if the data can't flow in and out of the CPU at the rating of the CPU.  The "Killer Micro" is a glorified oscillator when it has to wait for I/O to complete.  DON'T use a diskless "Killer Micro" low-cost workstation to try to do REAL work.  Let the manufacturer nickel-and-dime you for fast disks and fast memory in vast quantities.  While the MIPS argument might work on the ignorant IBM PeeWee masses, technical people know better than to look at just one aspect of a problem and think the problem solved based only on that one aspect/criterion.  When you solve a problem with a computer you have to weigh MIPS vs memory vs disk vs networking vs ??.  You'll screw yourself over BIG TIME if you totally ignore any of the four in heavy favor of one or two of the factors.

This is my point; it looks like I picked the wrong article to bring it out on.  'Nuff said before we waste bandwidth on a subject most MIPS junkies will "stick their head in the sand" on...

-Rob

I speak for no one but myself, they'd ignore me anyway...
brooks@maddog.llnl.gov (Eugene Brooks) (12/29/89)
In article <3091@umn-d-ub.D.UMN.EDU> rhealey@ub.d.umn.edu (Rob Healey) writes:
>Could also be the fact that MIPS has some of the best compiler
>technology around.  If I remember right, the Cray C compiler is a pcc
>derivative; YUCK-O-RAMA.
No, a high-quality optimizing (and vectorizing, for that matter) C compiler was used on the Cray.  It was the LLNL C/Civic hybrid compiler, which uses the same back end and optimizer as our Civic Fortran compiler; it is not a PCC derivative.  The code quality on the Cray was very good; the poor Cray supercomputer just couldn't be made to go faster at reasonable coding cost.  We could have gotten another 50% out of the Cray in speed for 6 months of coding work, and possibly a factor of 2 in one man-year.  The R6000 just compiled and ran the code 3.3 times faster.  What choice would a sensible buyer of computer time make here???

>What makes you think
>the bigger systems won't adopt the same technology as the "killer
>micros", bringing their costs down too?
I do think that "big systems" will adopt Killer Micro technology.  Supercomputer system integrators which don't will not survive the coming decade, and I personally doubt that they will survive the next 5 years.  No one will survive the attack of the Killer Micros, except those system integrators and users who choose to ride the wave.

>How well will your scalar 6000
>do on HUGE data sets that require movement to and from I/O?  The MIPS
>performance of the 6000 may well beat a super or mainframe, but
>what about scalar problems that require heavy I/O?  Will your
>low-cost workstation be able to handle those problems better?
Yes, but I am not talking about a low-cost workstation here.  I am referring to a system with a respectable number of Killer Micro processors.  Vendors are integrating high-performance, high-reliability disk systems out of commodity disks just as vendors will integrate supercomputers out of Killer Micros.
These disk systems are appearing on boxes in a price range which is dirt cheap compared to traditional supercomputers, but which is much more expensive than what you would put on a desk.  These are time-shared computers for large numbers of users.

>Supercomputers and mainframes ARE GREAT buys when LOTS of
>users need to be serviced.  You'd be foolish to think 1000 users
I think that that cold weather has gotten to your neurons.

>My original point is being totally ignored here:
Your original point is not being ignored; you are ignoring the high-performance I/O systems that are appearing on Killer Micro powered systems.  These high-performance I/O systems are built of commodity disk drives and are much cheaper, while being faster, than the high-performance disk drives used on supercomputers.

brooks@maddog.llnl.gov, brooks@maddog.uucp
rhealey@umn-d-ub.D.UMN.EDU (Rob Healey) (12/29/89)
In article <42600@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
>I do think that "big systems" will adopt Killer Micro technology.
>Supercomputer system integrators which don't will not survive the
>coming decade, and I personally doubt that they will survive the
>next 5 years.  No one will survive the attack of the Killer Micros,
>except those system integrators and users who choose to ride the wave.
>
You sound as if your definition of supercomputer is static.  As we all know, micro, mini, mainframe and super are defined in terms relative to the others.  Traditionally every level snarfs ideas from the level above as technology enables it to be done.  Whatever technology a micro uses can obviously be used on a faster and more expensive scale in a super; that this is not already the case is probably due to the fact that the supers aren't threatened enough yet.

>Yes, but I am not talking about a low-cost workstation here.  I am referring
>to a system with a respectable number of Killer Micro processors.  Vendors
>are integrating high-performance, high-reliability disk systems out
>of commodity disks just as vendors will integrate supercomputers out of
>Killer Micros.  These disk systems are appearing on boxes in a price
>range which is dirt cheap compared to traditional supercomputers but which
>is much more expensive than what you would put on a desk.
Hmmm, parallel OS technology, REAL stable stuff once you get above a dozen or so CPUs...  Again, anything in the I/O systems can easily be improved upon at the next level up.  The need for a computer with abilities beyond the killer would still exist; the killer would still not eliminate the super.  The super wouldn't necessarily be a bunch of micros thrown together in parallel, either.

>you are ignoring the high
>performance I/O systems that are appearing on Killer Micro powered systems.
>These high-performance I/O systems are built of commodity disk drives
>and are much cheaper, while being faster, than the high-performance disk
>drives used on supercomputers.
>I think that that cold weather has gotten to your neurons.
>
NOPE, I have high-powered heaters for the neurons.  B^)  In order for the killer micros to beat out the supers, the supers would have to stand still in parallel OS, I/O subsystems and implementation technologies.  I sincerely doubt that will happen; the scale will shift as it always has.  Micros will still be less powerful than supers; the definition of the terms makes that certain.  There will always be supercomputers; there will just be more people using killer micros, since that's all they can afford for what they need to do.  But by the same token, there will always be a few problems that the killer micros just can't quite cut, and this is where, by definition, supercomputers are usually used.

As far as commodity disk drives go, let's hope our banks don't decide that commodity disks are more cost effective; OOOOOPS, lost a bit or two there, Joe...  The problem is solved by volume shadowing and error-correction technologies, but geez, that sounds familiar from somewhere...  Again, the techniques for correction and detection can be improved if your data warrants it.  To overuse yet another big-boy phrase: one way or another, you get what you pay for.

The killer micros will always be a notch or two below the killer supers in the real world.  Just because supers haven't been threatened enough from below doesn't mean they won't bite back hard when they are.  The 6000 in the original article was a VERY state-of-the-art pre-production CPU; compare its performance to a VERY state-of-the-art pre-production super and see what the results are.

Let's continue the banter via e-mail, I'm sure comp.arch is sick of us already.

-Rob
chris@mimsy.umd.edu (Chris Torek) (12/29/89)
In article <42600@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
>... I am not talking about a low cost work station here.

(Note that the R6000-based MIPS system is expected to be in the $100k to $200k range, if I remember right: rather a bit more than your desktop $10k micro.)

>Your original point is not being ignored, you are ignoring the high
>performance I/O systems that are appearing on Killer Micro powered systems.
>These high performance I/O systems are built of commodity disk drives
>and are much cheaper, while being faster, than high performance disk drives
>used on supercomputers.

A note of caution here: they are cheaper, but not (yet) faster.  The CM Datavault (or whatever they are calling it these days) runs 39 SCSI disks in parallel (32 bits + ECC).  These are doing fairly well if they sustain > 1 MB/s each, so a Datavault gets ~32 MB/s.  With IPI disks expected to do 8 MB/s each in the near future, a Datavault-style system could do 256 MB/s: still slower than Cray, but quite respectable.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu	Path: uunet!mimsy!chris
rpeglar@csinc.UUCP (Rob Peglar x615) (12/29/89)
Eugene (Brooks) has already responded to this, quite elegantly.  Just wanted to throw in my $.02.

In article <3091@umn-d-ub.D.UMN.EDU>, rhealey@umn-d-ub.D.UMN.EDU (Rob Healey) writes:
> Seeing as my head is currently in Minnesota, I'd say it might be in
> a snow bank but definitely NOT in the sand.  What makes you think
> the bigger systems won't adopt the same technology as the "killer
> micros", bringing their costs down too?  How well will your scalar 6000
> do on HUGE data sets that require movement to and from I/O?  The MIPS
> performance of the 6000 may well beat a super or mainframe, but
> what about scalar problems that require heavy I/O?  Will your
> low-cost workstation be able to handle those problems better?

If the meaning of "better" is absolute performance, no, not today; but as Eugene says, "ride the wave".  No will be yes, and Soon.  If, on the other hand, the meaning is "price/performance", most assuredly yes, yes, yes: today, tomorrow, and forevermore.  There are many meanings.  Be specific.

As far as being in the snow banks, I'm there too; fortunately, it's my feet, not my head :-)

> Supercomputers and mainframes ARE GREAT buys when LOTS of
> users need to be serviced.  You'd be foolish to think 1000 users
> would be best served by networked workstations maxed out with disk
> and memory so they can run at top speed.  That situation requires a
> hierarchy of disk, CPU and memory networked together very carefully.

Look around you.  The very same "systems" (in the broad sense of the word, i.e. many components) are indeed overtaking centralized, vertical machines.  There was a long thread on this topic (degree of centralization) a while back.  Personally, I held the same opinion (as described above) for many years, and have since changed my mind.  Rather, opened my mind.  As in almost every problem to be solved, there are many solutions.  Use what's best for you, and don't be afraid to change.
If a large super serving 200 people gives those people the most "numerator" (MIPS, Flops, I/Os, etc.) for the "denominator" (dollars, time, effort, etc., etc.), then great, use the super.  If not, swallow hard and accept the Killer Micro as a fact of life.

> My original point is being totally ignored here:
>
> MIPS is useless if the data can't flow in and out of the CPU
> at the rating of the CPU.  The "Killer Micro" is a glorified oscillator
> when it has to wait for I/O to complete.  DON'T use a diskless
> "Killer Micro" low-cost workstation to try to do REAL work.  Let the
> manufacturer nickel-and-dime you for fast disks and fast memory in
> vast quantities.  While the MIPS argument might work on the
> ignorant IBM PeeWee masses, technical people know better than to
> look at just one aspect of a problem and think the problem solved
> based only on that one aspect/criterion.  When you solve a problem with
> a computer you have to weigh MIPS vs memory vs disk vs networking vs ??.
> You'll screw yourself over BIG TIME if you totally ignore any of the four
> in heavy favor of one or two of the factors.
>
> This is my point; it looks like I picked the wrong article to bring it
> out on.

Editorial note: you aren't scoring any points for phrases like "REAL work", "PeeWee masses", and "BIG TIME".  Anyway, you should carefully look at the issue of CPU starvation on some of the very machines you tout, like the Cray-2.  Some (not all) of the smaller machines exhibit much less CPU starvation.  The ETA-10 is (was) another notable example of real and potential CPU starvation as an architectural flaw.

There will always be room for big supers.  The room, however, is becoming smaller.  Don't get squeezed.

Rob
-- 
Rob Peglar	Control Systems, Inc.	2675 Patton Rd., St. Paul MN 55113
...uunet!csinc!rpeglar		612-631-7800

The posting above does not necessarily represent the policies of my employer.
mccalpin@stat.fsu.edu (John Mccalpin) (12/30/89)
In article <158@csinc.UUCP> rpeglar@csinc.UUCP (Rob Peglar x615) writes:
>
>Anyway, you should carefully look at the issue of CPU starvation on some
>of the very machines you tout, like the Cray-2.  Some (not all) of the
>smaller machines exhibit much less CPU starvation.  The ETA-10 is (was)
>another notable example of real and potential CPU starvation as an
>architectural flaw.

It seems odd to mention the Cray-2 and the ETA-10 in the same sentence with regard to "CPU starvation".  It seems to me that the ETA-10 is a much more balanced design with regard to memory bandwidth; I don't know about I/O speeds past the shared memory, though...  With the most recent release of the operating system, we have gotten paging rates of >500 MB/s on thrashing jobs.  This is almost half of the physical I/O bandwidth to shared memory.  Earlier system software certainly left the CPU hungry, but the hardware is capable of some pretty tremendous bandwidth, and the software is finally starting to catch up....

>There will always be room for big supers.  The room, however, is becoming
>smaller.  Don't get squeezed.

When Cray Research was founded, they estimated a world market for supercomputers in the neighborhood of 40 units.  Maybe they weren't so far off after all!

Anyway, here at FSU we have been pushing the KILLER MICRO bandwagon, too.  Let's get all those !@#$%^&* scalar jobs _off_ of our vector machines and onto the killer micros where they belong....  Then those of us who can effectively use the vector machines will have more time available.

By the way, I estimate that the (soon-to-be-installed) FSU Cray Y/MP-4/432 will only be about 125 times as fast as the new MIPS "KILLER MICRO from HELL" on my code.  Yep, they are closing the gap all right....

>Rob Peglar	Control Systems, Inc.	2675 Patton Rd., St. Paul MN 55113
>...uunet!csinc!rpeglar	612-631-7800
mcdonald@aries.scs.uiuc.edu (Doug McDonald) (12/30/89)
>> My original point is being totally ignored here:
>>
>> MIPS is useless if the data can't flow in and out of the CPU
>> at the rating of the CPU.  The "Killer Micro" is a glorified oscillator
>> when it has to wait for I/O to complete.  DON'T use a diskless
>> "Killer Micro" low-cost workstation to try to do REAL work.  Let the
>> manufacturer nickel-and-dime you for fast disks and fast memory in
>> vast quantities.  While the MIPS argument might work on the
>> ignorant IBM PeeWee masses, technical people know better than to
>> look at just one aspect of a problem and think the problem solved
>> based only on that one aspect/criterion.

Well, I am a member of both the IBM PeeWee masses and a "technical person".  This comment is so obvious that it should go without saying, but I guess it doesn't for the above poster.

>> technical people know better than to
>> look at just one aspect of a problem and think the problem solved
>> based only on that one aspect/criterion.  When you solve a problem with
>> a computer you have to weigh MIPS vs memory vs disk vs networking vs ??.
>> You'll screw yourself over BIG TIME if you totally ignore any of the four
>> in heavy favor of one or two of the factors.

This is quite true, BUT (and it is a big but) when you DO look at the big picture, you will find that some people need only MIPS, others (the IBM mainframe accounting crowd?) mainly need I/O bandwidth, and others need abnormally large memory.  Once you get to the final decision of benchmarking systems to buy, you may well want to weight one aspect at 90% of the total decision.  The problem with the IBM mainframes and the Cray supercomputers is that they have very large, very expensive I/O systems that some people RIGHT NOW simply don't need.  That is (one reason) why killer micros are selling so very well.

>> DON'T use a diskless
>> "Killer Micro" low-cost workstation to try to do REAL work.

It is this statement that I find offensive.  For some people it is indeed what is needed for "real work".
I once had the fastest computer in the world run for 16 hours with ZERO "I" requests (literally) and only a few kilobytes of "O".  (This was long ago on the Illiac IV, and it was a miracle that it didn't die in the 16 hours, but it was free.)

Doug McDonald
brooks@maddog.llnl.gov (Eugene Brooks) (12/30/89)
In article <787@stat.fsu.edu> mccalpin@stat.fsu.edu (John Mccalpin) writes:
>By the way, I estimate that the (soon-to-be-installed) FSU Cray
>Y/MP-4/432 will only be about 125 times as fast as the new MIPS "KILLER
>MICRO from HELL" on my code.  Yep, they are closing the gap all right....
Would you care to enlighten the masses with regard to the basis for this estimate?

brooks@maddog.llnl.gov, brooks@maddog.uucp
brooks@maddog.llnl.gov (Eugene Brooks) (12/30/89)
In article <1989Dec28.000031.14774@oracle.com> csimmons@oracle.UUCP (Charles Simmons) writes:
>The comparison would be slightly more interesting if an Amdahl 5990
>were compared to the R6000.  For scalar processing, Amdahl mainframes
>are (were?) generally considered the fastest obtainable...
I am never one to pass up a chance to collect data...  Let's do it!  Has anyone got access to an Amdahl 5990 with a decent C compiler?

brooks@maddog.llnl.gov, brooks@maddog.uucp
mccalpin@stat.fsu.edu (John Mccalpin) (12/31/89)
In article <787@stat.fsu.edu> I wrote:
>By the way, I estimate that the (soon-to-be-installed) FSU Cray
>Y/MP-4/432 will only be about 125 times as fast as the new MIPS "KILLER
>MICRO from HELL" on my code.  Yep, they are closing the gap all right....

In article <42701@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) asked:
>Would you care to enlighten the masses with regard to the basis for
>this estimate?
>brooks@maddog.llnl.gov, brooks@maddog.uucp

The estimate is based on the _observed_ performance of an 8-processor Cray Y/MP vs a 25 MHz R3000 (SGI 4D/2x0).  The speed ratio in that case is 536:1, and this Cray is an internal machine with a 6.5 ns clock, rather than the 6 ns clock that will be installed at FSU.  So applying some scaling suggests that a 4-cpu Cray Y/MP at 6 ns will be about 290 times as fast as the R3000 box.  Then scale the MIPS CPU speed by the ratio of the clocks of the R6000 to the R3000 to reduce this ratio to about 120:1.  (I am assuming a 60 MHz clock on the R6000; I don't know what the exact value will be....)

Since the code is highly parallelizable, a multi-processor R6000-based machine should show good speedups up to about 16 processors.  Experience on the Cray and Ardent machines suggests that a speedup of 12x should be possible on a 16-cpu system.  However, multi-processor Cray Y/MPs exist today, and multi-processor R6000 machines do not....

The code is a hybrid finite-element/finite-difference ocean circulation model written in portable FORTRAN-77.  The calculations are all done in 64-bit precision, and require 64 bits for reasonable accuracy.

This is all just an excuse to remind Eugene :-) that some users will still be able to make effective use of vector supercomputers.  In price/performance ratios, the scalar KILLER MICROs are not even significantly ahead of the traditional supercomputers on optimal codes.
They are certainly not _yet_ competitive with regard to turnaround time on large vector jobs, though I agree that that will change soon as 8-16 cpu machines in the R6000 class become available.

My next project is porting this code to a Connection Machine CM-2.  I anticipate about the same performance as the 8-processor Y/MP, but in a much more scalable architecture, and at about 1/4 of the price.
brooks@maddog.llnl.gov (Eugene Brooks) (12/31/89)
In article <788@stat.fsu.edu> mccalpin@stat.fsu.edu (John Mccalpin) writes:
>So applying some scaling suggests that a 4-cpu Cray Y/MP at 6 ns will
>be about 290 times as fast as the R3000 box.  Then scale the MIPS cpu
So to really compare one processor to one processor, as any reasonable person would do, we divide the 290 by 4 to get a ratio of 72 for the 6 ns Y to the R3000.  This is the kind of single-CPU speed ratio that we see here, and expect at this point, for codes running near 100% vectorization levels.  If you take the manufacturer's hint of a speed ratio of 2.5 between the R3000 and the R6000, you get a factor of 29 for the YMP vs the R6000.  Now, the ONE data point I have indicates that the ratio between the R3000 and the R6000 can be as good as 2.7, so I am inclined to believe the manufacturer's estimate, which is lower.

I do not know what kind of a deal you fellows got on a Y, but an 8-processor Y with 32 megawords (that's 32 megabytes per CPU) cost (system cost, disk drives included) around 3 million per processor.  Yes, we are looking at increasing the size of the memory of the one here, at a cost I don't care to mention in an open forum.  The single-CPU R6000 is going to be between 100K and 200K depending on whether you go for more memory per CPU, and many gigabytes of disk.  The bottom line: roughly 30 times the speed for 30 times the cost for code which is fully vectorized on the Y.  There is an absolute performance advantage but no cost-performance advantage.  If your code is not 99% vectorized, however, you are very foolish to run it on a traditional supercomputer CPU.  As you correctly point out.

>This is all just an excuse to remind Eugene :-) that some users will
>still be able to make effective use of vector supercomputers.
I pointed out in my posting that Killer Micros have overrun traditional supercomputers in scalar performance.  I qualified this very explicitly in my posting.
The notion that I need to be reminded that traditional supercomputers are still hanging in there for codes which are nearly 100% vectorized is silly.

brooks@maddog.llnl.gov, brooks@maddog.uucp
mccalpin@stat.fsu.edu (John Mccalpin) (12/31/89)
In article <788@stat.fsu.edu> I wrote:
>So applying some scaling suggests that a 4-cpu Cray Y/MP at 6 ns will
>be about 290 times as fast as the R3000 box.  Then scale the MIPS cpu....

To which brooks@maddog.llnl.gov (Eugene Brooks) replied:
>So to really compare one processor to one processor, as any reasonable person
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^
>would do, we divide the 290 by 4 to get a ratio of 72 for the 6 ns Y to the
 ^^^^^^^^
>R3000.  [...details deleted...]  you get a factor of 29 for the YMP vs the R6000.

So, I am not a reasonable person?  I compared a configuration of the Cray which is _smaller_ than the one I ran on with the only configuration of the MIPS product that I have even heard of.  The MIPS machine is not even announced yet as a single-processor product, so the comparison gives a slight advantage to the killer micro: a delivered system against an unannounced one....  Maybe I should use single-cpu performance comparisons with my Connection Machine results?  :-)

>The bottom line: roughly 30 times the speed for 30 times
>the cost for code which is fully vectorized on the Y.  There is an absolute
>performance advantage but no cost-performance advantage.

If the MIPS box had enough memory and disk to run the same jobs that I run on the Cray, then the Cray would be about 2 times more cost-effective by that naive measure.  Of course, if I have a job that takes 100 hours on an 8-processor Y/MP, then I would have to wait 59 weeks on the (almost) equally cost-effective "KILLER MICRO from HELL".

>If your code is not 99% vectorized, however, you are very foolish to run
>it on a traditional supercomputer cpu.  As you correctly point out.

Well, I didn't say "very foolish", but as a taxpayer I would prefer people to use the more expensive of the government-owned machines only for jobs that they are reasonably cost-effective for....

I wrote:
>This is all just an excuse to remind Eugene :-) that some users will
>still be able to make effective use of vector supercomputers.
Eugene replied:
>I pointed out in my posting that Killer Micros have overrun traditional
>supercomputers in scalar performance. I qualified this very explicitly
>in my posting. The notion that I need to be reminded that traditional
>supercomputers are still hanging in there for codes which are nearly 100%
>vectorized is silly.
>brooks@maddog.llnl.gov, brooks@maddog.uucp

That's what the smiley face was there for....

By the way, the most cost-effective machine on my code is the new Stardent
3000. It runs at about 1/15 of the speed of the Cray on a per-cpu basis
and is less than 1/50 of the cost.... Too bad I can't afford one!
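The cost-effectiveness claims traded back and forth above reduce to simple ratios. The following sketch just replays that arithmetic using the figures quoted in the thread; the helper function name is invented for illustration.

```python
# Replaying the cost-effectiveness arithmetic from the posts above.
# All ratios are the ones quoted in the thread, expressed relative to
# the Cray; the function name is invented for illustration.

def cost_effectiveness(speed_ratio, cost_ratio):
    """Relative cost-effectiveness: (relative speed) / (relative cost)."""
    return speed_ratio / cost_ratio

# Brooks's figure: ~30x the speed for ~30x the cost -> no advantage.
cray_vs_micro = cost_effectiveness(30, 30)
print(cray_vs_micro)                      # 1.0 -- a wash

# Stardent 3000: ~1/15 the per-cpu speed at under 1/50 the cost.
stardent_vs_cray = cost_effectiveness(1 / 15, 1 / 50)
print(stardent_vs_cray)                   # ~3.3x more cost-effective

# Turnaround still matters: a 100-hour job on a machine ~100x faster
# takes about 59 weeks of wall-clock time on the slower one.
weeks = (100 * 100) / (24 * 7)
print(weeks)                              # ~59.5 weeks
```

The last line is the point McCalpin makes: near-equal cost-effectiveness does not help when the wall-clock turnaround becomes more than a year.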
shekita@provolone.cs.wisc.edu (E Shekita) (01/01/90)
Speaking of killer micros from hell: In case anyone missed it, MIPS went public recently on the OTC market.
brooks@maddog.llnl.gov (Eugene Brooks) (01/01/90)
In article <791@stat.fsu.edu> mccalpin@stat.fsu.edu (John Mccalpin) writes:
>In a short series of articles, Eugene Brooks and I have been flaming
>back and forth (in a reasonably light-hearted sort of way) about the
>relative merits of vector supercomputers vs KILLER MICROS.

I think that we can cut back on the flames a bit.... Actually, this line
of discussion started out with a posting of a real measurement for a
specific "very scalar" code on the XMP 4/16 CPU and on an R6000. I
speculated that the R6000 is the fastest single-CPU computer in the world
on this specific code. I will provide the code to any person who would
like to run it on another machine and disprove this speculation. I will
even accept accurate simulation results for any traditional supercomputer
which will BE DELIVERED in the same time frame as the R6000's lifetime,
which I estimate to end at close to one year from now.

I also speculated that the R6000 is the first Killer Micro to cleanly
overrun traditional supercomputers for scalar-dominated computation. I
suggest that we compare the performance of the 5 scalar LLNL loops on the
YMP and the R6000, when MIPS cares to release the figures, as a way to
decide this question. We should also compare the R6000 to the Japanese
machines which are currently on the market; I understand that their scalar
performance is quite impressive.

The notion that the R6000 is not really here yet, that it is not yet being
delivered, is a red herring. Its lifetime will be over, having been
replaced by much meaner hardware, before any of the next generation of
traditional supers are ready for benchmarking. How well KMs are doing on
vectorizable workloads is, for the purposes of this discussion, also a red
herring. I prefer to wait for the appropriate time to discuss KMs
overrunning traditional supercomputers for vector workloads.
Killer Micros have been quite brilliant in their strategy of market
conquest; they have always waited for a clean, unambiguous kill of their
prey before visibly moving into the fray. Let's wait till they make their
move before worrying about splitting hairs on the issue.

brooks@maddog.llnl.gov, brooks@maddog.uucp
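The "scalar LLNL loops" Brooks proposes as a yardstick are drawn from the Livermore kernels, whose scalar members resist vectorization because each iteration depends on the previous one. A minimal sketch in that spirit (the coefficients and the loop form here are invented for illustration, not one of the actual kernels):

```python
# A first-order linear recurrence in the spirit of the scalar Livermore
# loops (e.g. Kernel 5): each x[i] depends on x[i-1], so the loop has a
# serial dependence chain and runs at the machine's scalar speed no
# matter how wide the vector pipes are.  Coefficients are made up.

def recurrence(b, c, x0, n):
    x = [0.0] * n
    x[0] = x0
    for i in range(1, n):
        x[i] = b[i] * (x[i - 1] + c[i])   # depends on the previous result
    return x

x = recurrence(b=[0.5] * 10, c=[1.0] * 10, x0=1.0, n=10)
print(x[-1])   # 1.0 -- this choice of coefficients has a fixed point at 1
```

Loops of this shape are exactly why scalar CPU speed, not vector peak, decides the benchmark Brooks has in mind.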
rpeglar@csinc.UUCP (Rob Peglar x615) (01/02/90)
In article <787@stat.fsu.edu>, mccalpin@stat.fsu.edu (John Mccalpin) writes:
> In article <158@csinc.UUCP> rpeglar@csinc.UUCP (Rob Peglar x615) writes:
> >
> >Anyway, you should carefully look at the issue of CPU starvation on some
> >of the very machines you tout - like the Cray-2. Some (not all) of the
> >smaller machines exhibit much less CPU starvation. The ETA-10 is (was)
> >another notable example of real and potential CPU starvation as an
> >architectural flaw.
>
> It seems odd to mention the Cray-2 and the ETA-10 in the same sentence
> with regard to "CPU starvation". It seems to me that the ETA-10 is a
> much more balanced design with regard to memory bandwidth -- I don't
> know about I/O speeds past the shared memory, though... With the most
> recent release of the operating system, we have gotten paging rates of
> >500 MB/s on thrashing jobs. This is almost 1/2 of the physical I/O
> bandwidth to shared memory. Earlier system software certainly left the
> cpu hungry, but the hardware is capable of some pretty tremendous
> bandwidth, and the software is finally starting to catch up....

Sounds like the work of Chris' group (particularly JPH) is finally bearing
fruit - seven months too late..... :-(

McCalpin is correct about the ETA-10 being a "more" balanced design. Let's
take a look at the ETA-10 from the "external" memory perspective, ignoring
the "internal" (e.g. RNI) paths from 1st level store to the CPU(s). Take
my word for it, the internal paths from 1st level store to the CPUs are
sufficient; otherwise, multi-pipe operations would not be possible.

ETA-10 shared memory (SM) (2nd level store) can feed central memory (CM)
(1st level store) at the rate of one 64-bit word per clock. The CPU can
compute at a rate that needs four 64-bit input operands per clock (2
pipes, each doing a memory-to-memory vector A op vector B). Assume for
this case that the input operands are considered "used" after the
computation, i.e. they won't be needed (ever) again.
Thus, to avoid CPU starvation from the hardware perspective, the SM-->CM
bandwidth is too small by a factor of four. If the "software" (OS or
application) could manage its own memory correctly (i.e. four SM-->CM
transfers of N words for every computation on N words) then the
computation could continue at peak forever. Alas, Babylon. Peak rates are
not sustainable.

This problem becomes even worse if one needs third-level store (typically
disk) to refresh SM in a similar manner. It is exacerbated in the
liquid-cooled machines, typically because the ratio of IOUs to SM size was
too low. Current hardware can only extract about 70% of the max IOU-->SM
bandwidth due to the handshaking across the IOI, and current (1.1.5)
software can only get about 70% of that through the file system. E-mail me
for more discussion.

>
> When Cray Research was founded, they estimated a world market for
> supercomputers that was in the neighborhood of 40 units. Maybe they
> weren't so far off after all! Probably only a factor of ten.
>
> Anyway, here at FSU we have been pushing the KILLER MICRO bandwagon,
> too. Let's get all those !@#$%^&* scalar jobs _off_ of our vector
> machines and onto the killer micros where they belong.... Then those
> of us who can effectively use the vector machines will have more time
> available.

Amen.

>
> By the way, I estimate that the (soon-to-be-installed) FSU Cray
> Y/MP-4/432 will only be about 125 times as fast as the new MIPS "KILLER
> MICRO from HELL" on my code. Yep, they are closing the gap all right....

See the comment from Eugene Brooks. The key words, of course, are "my
code" ... there are no absolute answers. Once again, the "gap" of absolute
performance is there. The "gap" of price/performance, on the other hand,
is now in the Killer Micro camp, for enough codes to make it interesting...
John, if you want to discuss more, e-mail...

Rob
--
Rob Peglar
Control Systems, Inc.
2675 Patton Rd., St. Paul MN 55113
...uunet!csinc!rpeglar
612-631-7800

The posting above does not necessarily represent the policies of my
employer.
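Peglar's starvation argument for the ETA-10 comes down to a few ratios. This sketch just restates the figures quoted in his post as arithmetic; the variable names are invented for illustration.

```python
# Back-of-the-envelope version of the ETA-10 starvation argument above,
# using only the ratios quoted in the post.  Names are illustrative.

operands_needed_per_clock = 4   # 2 pipes x 2 input operands (M-M vector op)
words_delivered_per_clock = 1   # SM --> CM transfer rate, one word/clock

# Fraction of peak sustainable when every operand must stream from SM:
sustained_fraction = words_delivered_per_clock / operands_needed_per_clock
print(sustained_fraction)       # 0.25 -- bandwidth short by a factor of four

# Compounding the quoted efficiencies when third-level store is involved:
iou_to_sm = 0.70                # hardware: handshaking across the IOI
filesystem = 0.70               # 1.1.5 software through the file system
print(iou_to_sm * filesystem)   # ~0.49 of max IOU-->SM bandwidth survives
```

The compounding in the last two lines is why the two "only 70%" figures in the post are worse together than either sounds alone.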
desnoyer@apple.com (Peter Desnoyers) (01/04/90)
Just a few thoughts on this ongoing debate -

Eugene Brooks is claiming that the R6000 is (probably) faster than any
other computer for ONE specific simulation that he runs. He describes this
as a packet-switched network simulation, almost completely scalar. Claims
that {supercomputer X} runs {FP app Y} faster don't alter this claim.

Claims that {super X} has much more memory or much more I/O bandwidth than
the R6000 are probably irrelevant as well, as event-driven simulations (I
assume a PSN simulator would be event-driven) may not need the amounts of
memory and I/O that other types of simulations require. [Gross
generalization. However, consider that in many scientific codes - e.g.
weather simulations - you can increase the simulated detail, and hence
accuracy and memory requirements, by decreasing the grid size. To do the
equivalent with an event-driven simulation may require describing the
finer detail yourself in code.]

In other words, Eugene may not be comparing apples to oranges; however, he
is discussing the merits of his apple in a conference full of orange
growers :-)

Peter Desnoyers
Apple ATG
(408) 974-4469
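Desnoyers's point about event-driven simulators being light on memory and I/O can be seen in how little state such a simulator carries: just a priority queue of pending events. A minimal sketch of the event loop (the event names and the fixed link delay are invented for illustration, not details of Brooks's simulator):

```python
import heapq

# Minimal discrete-event simulation loop.  The entire working set is the
# pending-event queue, which is why event-driven simulators tend to need
# far less memory and I/O than grid-based codes.  Event names and the
# 1.5-unit link delay are invented for illustration.

def simulate(events, horizon):
    """Process (time, name) events in time order up to the horizon;
    each 'send' schedules an 'arrive' after a fixed link delay."""
    queue = list(events)
    heapq.heapify(queue)
    log = []
    while queue:
        time, name = heapq.heappop(queue)
        if time > horizon:
            break
        log.append((time, name))
        if name == "send":                       # model a fixed link delay
            heapq.heappush(queue, (time + 1.5, "arrive"))
    return log

log = simulate([(0.0, "send"), (1.0, "send")], horizon=10.0)
print(log)   # [(0.0, 'send'), (1.0, 'send'), (1.5, 'arrive'), (2.5, 'arrive')]
```

The dominant operations are comparisons and pointer chasing on the queue: scalar, branchy work with no long vectors anywhere, which is exactly the workload profile the thread is arguing about.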
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (01/15/90)
In article <28674@amdcad.AMD.COM> davec@proton.amd.com (Dave Christie) writes:
>But coming up with faster versions of classic supercomputers
>has (IMHO) been much more difficult and costly, and the resulting
>performance improvements not so spectacular, as compared to micros over
>the past several years.

Yes. A case in point is the new Cyclone processor from Tandem. I'm not
knocking it: I'm sure that it was built by sharp people, and will be sold
successfully. It has the important property that it's bit-compatible with
Tandem's previous stack machines.

However, it was clearly a major effort - they wrote 420 KB of microcode,
and they did in-house metallization of ECL gate arrays. Nor did the
machine come out small: each CPU+IOP fills three 18 x 18" boards, and the
microcode alone takes over a hundred chips.

So, did they get much for all that? No. It only runs at 22.2 MHz, although
in its defence I should add that it often issues two instructions per
clock. At best, that's equivalent to 45 MHz single-issue. I don't know the
MIPS/VUPS ratio, but even if the ratio is better than I think, the Cyclone
still isn't as fast as the new ECL RISCs. It's also pretty well under the
wheels of the CMOS steamroller.

Is it reliable? Well, yes, it's a Tandem product. It has parity and
temperature compensation and a diagnostic processor and spare cache RAMs.
But a Killer Micro with the same throughput could be made more reliable,
at a lower price, simply from its reduced chip count.
--
Don D.C.Lindsay Carnegie Mellon Computer Science
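Lindsay's "at best" figure is just the clock rate times the peak issue rate; for completeness, the arithmetic behind it:

```python
# Peak native instruction rate of the Cyclone as described in the post.
clock_mhz = 22.2
issue_per_clock = 2             # "often issues two instructions per clock"
peak_mips = clock_mhz * issue_per_clock
print(peak_mips)                # 44.4 -- the "45" figure, and only at best
```

Since dual issue only happens "often", not always, sustained throughput falls somewhere between 22.2 and 44.4 native MIPS, which is the gap Lindsay's comparison to the ECL RISCs turns on.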