yamo@wk46.nas.nasa.gov (Michael Yamasaki) (05/30/90)
In article <359@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
>
>One of the ways in which cheap workstations got to be cheap was by
>neglecting to install I/O hardware. [...]
>
>The single most frustrating thing for me as a consumer of workstations
>is that I/O costs haven't decreased at the same relative rate as MIPS
>and MFLOPS costs --- Although they are decreasing. [...]
>However, the cost of I/O performance is coming down. ESDI drives with
>hundreds of megabytes of storage are available on PCs which give the
>same per user performance as the high performance I/O subsystems on
>supercomputers at a lower per user cost, and usually much more storage
>per user.
>

Greetings. IMHO, one of the places where supercomputers (my only experience
is with Cray 2 and Cray YMP) excel over the touted killer micro solution
is in I/O performance. In a strange way, it is because of the high cost
of the traditional supercomputer that high cost/high performance I/O
subsystems are acceptable.

Your last statements are (in my experience) dubious.
My workstation (a Personal Iris) gets something less than a megabyte/second
disk I/O. Our Cray 2 gets something more than 10 megabytes/second. You
know by the magic of large I/O buffers that it can seem to the user to get
30-40 megabytes/second. (Wasn't HSP-2's acceptance test something like an
aggregate 100 MBytes/second?)

Even with IPI-2 drives, workstations and PCs will quickly run out of I/O
bus bandwidth. Add a HPPI or two, support for those million shaded polygons
a second, a few hundred megabytes of memory, multiple microprocessors, and
uh oh, the main bus gets stressed a bit.

There's more to supercomputer architecture than the CPU. When micros have
successfully addressed the rest of the issues, then they'll really be
"killers".

-Yamo-
yamo@wk46.nas.nasa.gov
yamo@amelia.nas.nasa.gov
{ncar, decwrl, hplabs, uunet}!ames!amelia!yamo

(hey, Marty, you gonna show up for some softball one of these days? -Y-)
mccalpin@vax1.acs.udel.EDU (John D Mccalpin) (05/30/90)
In article <6374@amelia.nas.nasa.gov> yamo@wk46.nas.nasa.gov (Michael Yamasaki) writes:
>In article <359@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
>>
>>One of the ways in which cheap workstations got to be cheap was by
>>neglecting to install I/O hardware. [...]

>My workstation (a Personal Iris) gets something less than a megabyte/second
>disk I/O. Our Cray 2 gets something more than 10 megabytes/second. You
>know by the magic of large I/O buffers that it can seem to the user to get
>30-40 megabytes/second. (Wasn't HSP-2's acceptance test something like an
>aggregate 100 MBytes/second?)
> -Yamo- yamo@amelia.nas.nasa.gov

Well, I guess I am a bit confused, because 10% of the I/O performance of
a Cray 2 means that your workstation is relatively imbalanced *toward* I/O
performance, rather than away from it!

My Personal Iris runs my 64-bit codes at about 1-2 MFLOPS, which makes it
about 1% of a Y/MP. The I/O numbers that I have seen are on the order of
1 MB/s, while the Cray numbers are on the order of 10 MB/s to disk -- so I
get about 10% of the Y/MP.
Thus my machine is 10 times more cost-effective for I/O than for
computations! (relative to a Y/MP).

If you are running scalar codes (yecch!) that are as fast on the
workstation as on the Cray, then this ratio would be turned upside-down.
Perhaps this is the more common occurrence.
--
John D. McCalpin                     mccalpin@vax1.udel.edu
Assistant Professor                  mccalpin@delocn.udel.edu
College of Marine Studies, U. Del.   mccalpin@scri1.scri.fsu.edu
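The balance argument in this post is plain arithmetic, and it can be restated in a few lines. What follows is a minimal sketch using only the round figures quoted above (about 1% of a Y/MP in floating point, about 1 MB/s against about 10 MB/s to disk). The function name `relative_balance` is made up for illustration and is not anything the posters used.

```python
def relative_balance(compute_fraction, io_fraction):
    """Ratio of a machine's I/O share to its compute share.

    Both arguments are fractions of a reference machine's capability.
    A result above 1 means the machine leans toward I/O relative to
    the reference; below 1 means it leans toward compute.
    """
    return io_fraction / compute_fraction


# Round figures from the post, not measurements.
vector_compute = 0.01        # Personal Iris at ~1% of a Y/MP on 64-bit codes
disk_io = 1.0 / 10.0         # ~1 MB/s against ~10 MB/s to disk

print("vector codes:", relative_balance(vector_compute, disk_io))

# The "scalar codes" case from the last paragraph: if the workstation is
# as fast as the Cray (fraction 1.0), the same I/O share now lags compute.
print("scalar codes:", relative_balance(1.0, disk_io))
```

With vector codes the I/O share is ten times the compute share; with scalar codes the ratio inverts, which is the "upside-down" case the post mentions.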
lkaplan@bbn.com (Larry Kaplan) (05/30/90)
In article <6374@amelia.nas.nasa.gov> yamo@wk46.nas.nasa.gov (Michael Yamasaki) writes:
>Greetings. IMHO, one of the places where supercomputers (my only experience
>is with Cray 2 and Cray YMP) excel over the touted killer micro solution
>is in I/O performance.
>
>My workstation (a Personal Iris) gets something less than a megabyte/second
>disk I/O. Our Cray 2 gets something more than 10 megabytes/second.

There are some true "killer micros" that address I/O issues on the market
right now. BBN's newer "Butterfly" machine (TC2000) can be configured with
one I/O bus (read VME) for about every TWO processors. About three quarters
of these busses are single slot, the rest being 4-slot busses. VME-to-SCSI
adapters and VME bus repeaters are available. Note that these busses are
NOT used for direct communication between the processors. There is a
Butterfly interconnection network for that.

These busses communicate with memory at a peak of 8 Mbytes/sec.
A crude test of "dd if=<unbuffered device> of=/dev/null" with some blocking
arguments currently yields about 1 Mbyte/sec. This single threaded I/O rate
is similar to that found on the IRIS above. Distributing files and users
across multiple disks allows you to effectively multiply this bandwidth.
The kernel, including the disk device driver, is fully parallel.
In addition, paging traffic can be easily distributed across separate disks.

All of this capability is available NOW.
One existing customer has 9 disks on five I/O busses.

This amount of I/O is starting to look reasonable for a killer micro.
Some work remains to be done, however, in getting the software to effectively
use all of the bandwidth.

I believe that Alliant has a similar (though not quite so scalable) story.
The rumor mill has NCUBE and Intel developing similar capabilities for their
machines.

#include <std_disclaimer.h>
_______________________________________________________________________________
                         ____ \ / ____
Laurence S. Kaplan      |    \ 0 /    |        BBN Advanced Computers
lkaplan@bbn.com          \____|||____/         10 Fawcett St.
(617) 873-2431            /__/ | \__\          Cambridge, MA 02238
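The claim that spreading files across disks multiplies bandwidth, up to what each bus can carry, can be put as a small model. This is a rough sketch under stated assumptions: the 8 Mbytes/sec bus peak, the roughly 1 Mbyte/sec per-disk rate and the "9 disks on five busses" configuration come from the post, while the way the nine disks are spread over the five busses is hypothetical (the post does not say), and `aggregate_rate` is an illustrative name.

```python
def aggregate_rate(disks_per_bus, disk_rate_mb_s, bus_peak_mb_s):
    """Best-case aggregate transfer rate in MB/s.

    Each bus delivers the sum of its disks' rates, capped at the bus peak.
    Ignores software overhead, seek contention and memory limits.
    """
    return sum(min(n * disk_rate_mb_s, bus_peak_mb_s) for n in disks_per_bus)


DISK_RATE = 1.0   # MB/s, the single-threaded dd figure from the post
BUS_PEAK = 8.0    # MB/s, peak bus-to-memory rate from the post

# Hypothetical placement of the customer's 9 disks on 5 busses.
placement = [2, 2, 2, 2, 1]

print("disks spread over five busses:",
      aggregate_rate(placement, DISK_RATE, BUS_PEAK), "MB/s")

# Same nine disks crowded onto one bus: the bus peak becomes the limit.
print("all nine on one bus:",
      aggregate_rate([9], DISK_RATE, BUS_PEAK), "MB/s")
```

In this model the disks, not the busses, are the limit for the spread-out case, which fits the remark that software work remains before all of the bus bandwidth is used.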
yamo@wk46.nas.nasa.gov (Michael Yamasaki) (05/30/90)
In article <6543@vax1.acs.udel.EDU> mccalpin@vax1.udel.edu (John D Mccalpin) writes:
>
>Well, I guess I am a bit confused, because 10% of the I/O performance of
>a Cray 2 means that your workstation is relatively imbalanced *toward* I/O
>performance, rather than away from it!
>
>My Personal Iris runs my 64-bit codes at about 1-2 MFLOPS, which makes it
>about 1% of a Y/MP. The I/O numbers that I have seen are on the order of
>1 MB/s, while the Cray numbers are on the order of 10 MB/s to disk -- so I
>get about 10% of the Y/MP.
>Thus my machine is 10 times more cost-effective for I/O than for
>computations! (relative to a Y/MP).
>

I didn't intend to say that the Crays were well balanced, flops to I/O.
Some people like to think Crays turn compute bound problems into I/O
bound problems. Uh, I shudder to think what using an Iris with 10% of its
current I/O bandwidth would be like. (Cost effective? If it takes me the
same time to do 10% of a problem, it wouldn't seem to be cost effective
to me. ;-)

The Cray numbers, by the way, are what a typical user might expect for a
single processor job with 50 or so other users. The Iris numbers are of
course for a single user.

-Yamo-
yamo@wk46.nas.nasa.gov
yamo@amelia.nas.nasa.gov
{ncar, decwrl, hplabs, uunet}!ames!amelia!yamo
yamo@wk46.nas.nasa.gov (Michael Yamasaki) (05/30/90)
In article <56754@bbn.BBN.COM> lkaplan@BBN.COM (Larry Kaplan) writes:
>
>There are some true "killer micros" that address I/O issues on the market
>right now. BBN's newer "Butterfly" machine (TC2000) can be configured with
>one I/O bus (read VME) for about every TWO processors.
> [...]
>These busses communicate with memory at a peak of 8 Mbytes/sec.
> [...]
>One existing customer has 9 disks on five I/O busses.
>

The combined peak of these five I/O busses (5 x 8 = 40 Mbytes/sec) is
somewhat less than a single HPPI interface, and less than half of an HSX
(at 100 MBytes/sec.). Again, the numbers I used for the Cray 2 were for a
typical single processor job with 50 or so other users on the system.

>This amount of I/O is starting to look reasonable for a killer micro.
>Some work remains to be done, however, in getting the software to effectively
>use all of the bandwidth.
>

(an exercise for the reader? ;-) Really, this doesn't seem like a trivial
problem to me.

-Yamo-
yamo@wk46.nas.nasa.gov
yamo@amelia.nas.nasa.gov
{ncar, decwrl, hplabs, uunet}!ames!amelia!yamo
ciotti@wilbur.nas.nasa.gov (Robert B. Ciotti) (05/31/90)
In article <6543@vax1.acs.udel.EDU> mccalpin@vax1.udel.edu (John D Mccalpin) writes:
>In article <6374@amelia.nas.nasa.gov> yamo@wk46.nas.nasa.gov (Michael Yamasaki) writes:
>>In article <359@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
>>>
>>>One of the ways in which cheap workstations got to be cheap was by
>>>neglecting to install I/O hardware. [...]
>
>>My workstation (a Personal Iris) gets something less than a megabyte/second
>>disk I/O. Our Cray 2 gets something more than 10 megabytes/second. You

>Well, I guess I am a bit confused, because 10% of the I/O performance of
>a Cray 2 means that your workstation is relatively imbalanced *toward* I/O
>performance, rather than away from it!
>
>My Personal Iris runs my 64-bit codes at about 1-2 MFLOPS, which makes it
>about 1% of a Y/MP.

A single processor, that is; average workload performance puts it at .1%.

>The I/O numbers that I have seen are on the order of
>1 MB/s, while the Cray numbers are on the order of 10 MB/s to disk -- so I
>get about 10% of the Y/MP.

Single process/single stream rates again, that is; aggregate rotating
media performance comes out at .5%, and including the SSD, .0714%.

For our Y-MP, I have measured *Standard* FORTRAN I/O performance exceeding
1.2 gigabytes per second from a single process/single I/O stream to an SSD
cached filesystem (yes, it's memory, but that's what it's for: 2 gigabytes).
Multiple process/multiple stream I/O rates to disk have been benchmarked
exceeding 200 megabytes per second.

Big killer micro configurations are going to have to compete with the
aggregate rates as well as the single thread rates. (Parallel) flops just
ain't good enough; your results have to go somewhere. If you can put a big
SSD type memory to use for I/O, you're still going to have to get it there
somehow, and in a standard way.

Bob
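The two percentages in this post (.5% and .0714%) are not derived in the text. They do fall out if a workstation rate of about 1 MB/s is divided by the aggregate Cray rates given later in the same post. The sketch below shows that reconstruction; it is a guess that happens to reproduce the quoted numbers, not something the post states, and the constant names are illustrative. The post gives the Cray rates as lower bounds ("exceeding"), so the percentages are upper bounds.

```python
WORKSTATION_MB_S = 1.0             # order-of-magnitude figure quoted in the thread
CRAY_DISK_AGGREGATE_MB_S = 200.0   # multi-stream rate to rotating disk
CRAY_SSD_MB_S = 1200.0             # 1.2 gigabytes/sec to the SSD-cached filesystem


def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


disk_only = percent_of(WORKSTATION_MB_S, CRAY_DISK_AGGREGATE_MB_S)
disk_plus_ssd = percent_of(WORKSTATION_MB_S,
                           CRAY_DISK_AGGREGATE_MB_S + CRAY_SSD_MB_S)

print(f"against rotating media only: {disk_only:.4f}%")
print(f"against disk plus SSD:       {disk_plus_ssd:.4f}%")
```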
fouts@bozeman.ingr.com (Martin Fouts) (06/07/90)
In article <6374@amelia.nas.nasa.gov> (Michael Yamasaki) writes:

   In article <359@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
   >
   >One of the ways in which cheap workstations got to be cheap was by
   >neglecting to install I/O hardware. [...]
   >
   >The single most frustrating thing for me as a consumer of workstations
   >is that I/O costs haven't decreased at the same relative rate as MIPS
   >and MFLOPS costs --- Although they are decreasing. [...]
   >However, the cost of I/O performance is coming down. ESDI drives with
   >hundreds of megabytes of storage are available on PCs which give the
   >same per user performance as the high performance I/O subsystems on
   >supercomputers at a lower per user cost, and usually much more storage
   >per user.
   >

   Greetings. IMHO, one of the places where supercomputers (my only
   experience is with Cray 2 and Cray YMP) excel over the touted killer
   micro solution is in I/O performance. In a strange way, it is because
   of the high cost of the traditional supercomputer that high cost/high
   performance I/O subsystems are acceptable.

   Your last statements are (in my experience) dubious.
   My workstation (a Personal Iris) gets something less than a
   megabyte/second disk I/O. Our Cray 2 gets something more than 10
   megabytes/second. You know by the magic of large I/O buffers that it
   can seem to the user to get 30-40 megabytes/second. (Wasn't HSP-2's
   acceptance test something like an aggregate 100 MBytes/second?)

   Even with IPI-2 drives, workstations and PCs will quickly run out of
   I/O bus bandwidth. Add a HPPI or two, support for those million shaded
   polygons a second, a few hundred megabytes of memory, multiple
   microprocessors, and uh oh, the main bus gets stressed a bit.

   There's more to supercomputer architecture than the CPU. When micros
   have successfully addressed the rest of the issues, then they'll really
   be "killers".

Hi. I completely agree that the big systems are still much better at
I/O. The requirement for HSP-2, by the way, was for real aggregate disk
performance across 10 pairs of drives at once, while maintaining an
outrageous amount of network traffic. The Cray 2 passed with ease. (You
can't actually generate enough I/O traffic across all of the disk
drives, memory interfaces and high speed channels to consume the
"backplane" rate that the machine can generate.) The much more recent
Y/MP had to sweat a lot to pass, although it did eventually.

Now let me try to justify my last point by pointing out that I said
"per user". A typical PC has one user. A typical Cray 2 has thousands,
often with dozens of users running simultaneously. The raw performance
of the workstation is often in the 1 to 10 percent range of the Cray,
but the divisor is often 1 user for the workstation (how many people
are you sharing the Iris with?) and often 10 for simultaneous bandwidth
and 1000 for storage for the Cray.

By the way, through the magic of multiple busses, some Irises (I don't
remember the Personal well enough) can pass incredible amounts of data
between the graphics engine and the display without ever touching the
I/O bus.

To reiterate, you are correct; the big win in supercomputer
architecture is the scalability of I/O performance to match CPU
performance. The big lose in "killer" microprocessor systems is the
lack of that scalability. My argument is that we can build scalable
workstations, if we are willing to concentrate on the I/O as well as
the CPU.

One of the things which frustrates me in this area is the lack of
educational effort. The latest book on computer "architecture" from
two people who should know better spends most of its time on CPU
performance and devotes only a small portion to I/O architecture. Of
course, it is much better than typical "architecture" texts which only
deal with CPU architecture.

   -Yamo-
   yamo@wk46.nas.nasa.gov
   yamo@amelia.nas.nasa.gov
   {ncar, decwrl, hplabs, uunet}!ames!amelia!yamo

   (hey, Marty, you gonna show up for some softball one of these days? -Y-)

(When's the next game? Scuba class is over, but nobody has sent me a
schedule in three weeks.)
--
Martin Fouts

 UUCP:  ...!pyramid!garth!fouts  ARPA:  apd!fouts@ingr.com
PHONE:  (415) 852-2310            FAX:  (415) 856-9224
 MAIL:  2400 Geng Road, Palo Alto, CA, 94303

If you can find an opinion in my posting, please let me know.
I don't have opinions, only misconceptions.
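The "per user" argument above comes down to dividing raw capability by the number of people sharing it. Here is a minimal sketch using only the ranges given in the post (workstation raw performance at 1 to 10 percent of the Cray, one workstation user, a divisor of about 10 for simultaneous bandwidth and about 1000 for storage on the Cray). The function name `per_user_ratio` is hypothetical.

```python
def per_user_ratio(raw_fraction, workstation_users, cray_users):
    """Workstation per-user share divided by Cray per-user share.

    raw_fraction is the workstation's raw capability as a fraction of
    the Cray's. A result of 1.0 means parity per user.
    """
    workstation_share = raw_fraction / workstation_users
    cray_share = 1.0 / cray_users
    return workstation_share / cray_share


# Bandwidth: about 10 simultaneous Cray users, 1 workstation user.
for raw in (0.01, 0.10):
    ratio = per_user_ratio(raw, workstation_users=1, cray_users=10)
    print(f"bandwidth, raw fraction {raw:.2f}: per-user ratio {ratio:.2f}")

# Storage: about 1000 Cray users share the disks.
for raw in (0.01, 0.10):
    ratio = per_user_ratio(raw, workstation_users=1, cray_users=1000)
    print(f"storage,   raw fraction {raw:.2f}: per-user ratio {ratio:.2f}")
```

At the top of the quoted range the workstation reaches per-user parity on bandwidth, and on storage it comes out ahead across the whole range, which is the sense of "same per user performance" and "usually much more storage per user".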
eugene@wilbur.nas.nasa.gov (Eugene N. Miya) (06/07/90)
In article <424@garth.UUCP> fouts@bozeman.ingr.com (Martin Fouts) writes:
>One of the things which frustrates me in this area is the lack of
>educational effort......
>it is much better than typical "architecture" texts which only
>deal with CPU architecture.

From George Michael:
    There is a fixation on CPUs in this country sort of like a teat
    [female breast]. We have to stop that. We have to have more
    balanced SYSTEMS.

I would not quite use George's gender-biased language. You would.
I would say we are infatuated, or have a fetish on CPUs, and I/O will
continue to be a problem. So what is industry going to do about it?

The problem, George, is that you scientists want an infinite quantity of
storage, in a finite medium, peta-bytes and peta-bytes, and you want it
all, NOW! You are right. Well, I don't see how you are going to get it
(infinite).

--e. nobuo miya, NASA Ames Research Center, eugene@orville.nas.nasa.gov
  {uunet,mailrus,other gateways}!ames!eugene

To the PAX people: We tend to use the term "peak" as well, but "Olympic"
is some new colorful language we've not heard; since you broadcast it,
I think you have added it to the vernacular.
mash@mips.COM (John Mashey) (06/10/90)
In article <6583@amelia.nas.nasa.gov> eugene@wilbur.nas.nasa.gov (Eugene N. Miya) writes:
>I would say we are infatuated, or have a fetish on CPUs, and I/O will
>continue to be a problem. So what is industry going to do about it?

1) I/O will always be a problem, for all of the usual design reasons,
and especially when working near the edge on peak performance.

2) However, I would suggest that it will start getting better, at least
in certain ways, and especially in price/performance.

3) The reason is the same kind of dynamic that has led to SCSI &
Ethernet chips, for example. Specifically, when the time is "right",
people are induced to do fast, high-integration (and therefore fairly
cheap) chips that help solve parts of the I/O problem as well.

4) Consider, for example, Ethernet support.
    a) Early on, Ethernet support was a board full of logic, not cheap.
    However, since it was in a supermini, it was still a small fraction
    of the price, so it probably didn't matter too much.
    b) As microprocessor systems came along, Ethernet support is now
    down to a LANCE chip or equivalent, and a little bit of glue:
        1) Workstations NEEDED cheap Ethernet.
        2) There was a market for large volumes, hence worth doing.

5) Same kind of progress in SCSI.

I would claim that the following dynamic exists:
    1) Workstations and other micro-based products are zooming upward
    in CPU performance at fairly low cost.
    2) Such machines could certainly use better I/O.
    3) The volumes make it interesting for people to design chips to
    support such things, whereas this has seldom been true in the
    super- or mainframe markets.
    4) Over the next few years, we will see increasing interest in
    people looking to sell support chips for:
        faster & wider busses
        I/O muxing & buffering
        network interfaces
        disk control, such as for RAIDs
    5) There used to be huge numbers of different architectures for
    CPUs, whereas now more people use CPUs, and design I/O systems.
    In some parts of the design space, it is easy enough to design
    computers with just a few VLSI chips that include both CPU & I/O.
    I think that space will enlarge to much higher performance levels,
    as the faster VLSI CPUs make it both necessary to get fast
    inexpensive I/O, and enough volume to make it interesting.
    6) I'd still expect that supercomputers will have an edge in this
    area, although I'd be amazed if killer-micros-with-(forthcoming
    better I/O) don't blow them away on I/O price/performance basis
    (but not absolute performance, of course).
--
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:  mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:   408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:  MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
eugene@wilbur.nas.nasa.gov (Eugene N. Miya) (06/12/90)
In article <39284@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>1) I/O will always be a problem, for all of the usual design reasons,

    "When you can give me disks which rotate significantly
    faster than 3600 RPM, then we can start talking."
        --Don Senzig

Time to round up all the "usual" suspects. ;)

--e. nobuo miya, NASA Ames Research Center, eugene@orville.nas.nasa.gov
  {uunet,mailrus,other gateways}!ames!eugene
mash@mips.COM (John Mashey) (06/12/90)
In article <6662@amelia.nas.nasa.gov> eugene@wilbur.nas.nasa.gov (Eugene N. Miya) writes:
>In article <39284@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>>1) I/O will always be a problem, for all of the usual design reasons,
>
>    "When you can give me disks which rotate significantly
>    faster than 3600 RPM, then we can start talking."
>        --Don Senzig
>
>Time to round up all the "usual" suspects. ;)

Seems like the RPM is only suspect #2. I'd observe that N cheap disks
running in parallel-transfer designs may have N times the "logical"
rotation speed & N times the data xfer rate.... but don't seek in 1/N of
the time of one.... so I'd say that disk seek is suspect #1.
--
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:  mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:   408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:  MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
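The point that seek time is what parallel disks do not fix can be seen in a toy service-time model. This sketch takes the post's framing at face value: with N disks in a parallel-transfer design, rotational delay and transfer time both shrink by a factor of N, while seek time stays the same. The 3600 RPM figure comes from the thread. The 16 ms seek time and the 8 ms transfer time (roughly an 8 KB request at about 1 MB/s) are hypothetical values chosen for illustration, and `service_time_ms` is an illustrative name.

```python
def service_time_ms(n_disks, seek_ms, half_rotation_ms, transfer_ms):
    """Time to serve one request on an N-disk parallel-transfer array.

    Rotation delay and transfer are divided by N (the post's "logical"
    rotation speed and transfer rate); seek is not.
    """
    return seek_ms + (half_rotation_ms + transfer_ms) / n_disks


SEEK_MS = 16.0                            # hypothetical average seek
HALF_ROTATION_MS = 0.5 * 60_000 / 3600    # average rotational delay at 3600 RPM
TRANSFER_MS = 8.0                         # hypothetical: ~8 KB at ~1 MB/s

for n in (1, 2, 4, 8, 16):
    total = service_time_ms(n, SEEK_MS, HALF_ROTATION_MS, TRANSFER_MS)
    print(f"{n:2d} disks: {total:6.2f} ms per request, "
          f"seek is {SEEK_MS / total:.0%} of it")
```

However large N gets, the per-request time in this model never drops below the seek time, so seek accounts for a growing share of every request as disks are added.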