mash@mips.COM (John Mashey) (02/20/89)
In article <4290@pt.cs.cmu.edu> shivers@centro.soar.cs.cmu.edu (Olin Shivers) writes:
>Andy Glew mentioned a barrel processor discussion, and says that
>mash@mips posted a good argument against them (barrel processors, not
>discussions). I would very much like to see that argument.

I don't have a copy of the original, but the argument covers two inter-related areas, technical issues and business issues, and can be summarized as follows:
1) For technical reasons, it's more complicated to build VLSI micros as barrels.
2) Cheap general-purpose chips tend to dominate special-purpose solutions, unless the special-purpose ones have substantial long-term cost or performance advantages.

Good background material can be found in: Bell, Mudge, McNamara, "COMPUTER ENGINEERING: A DEC View of Hardware Systems Design", 1978, Digital Press. Specifically, read Chapter 1, "Seven Views of Computer Systems", especially Views 3 and 4, and especially Figure 7 on levels of integration.

Following is a (brief) technical argument, followed by a (long) business argument that addresses a bunch of related issues that people have asked about. Also, sorry if I step on anybody's toes; maybe this will stir up some discussion.

1) Technical: The first-order determinant of CPU performance, for general-purpose machines, is the aggregate bandwidth into the CPU, with about 1 VUP ==(approx) 10 MB/sec [try this rule-of-thumb and see]. Take the same technology and cache memory. You can either build an N-way barrel processor, where each barrel slot generates B VUPs, or you can build 1 CPU that generates about N*B VUPs, because the basic hardware is running at the same speed. The single CPU has to fight with latency issues that are avoided by the barrel, but the barrel:
 -wastes whole slots whenever there are fewer than N tasks available;
 -needs N copies of registers and state, in general, i.e., things that are fast, and therefore expensive, if only in opportunity cost;
 -probably has worse cache behavior, in terms of the separate tasks banging into each other more.
OF COURSE, ALL THIS NEEDS QUANTIFICATION. The more you split the hardware apart [like separate caches], the closer you get to separate processors. [A toy model of the slot-waste arithmetic appears after this post.]

I think barrel designs might make more sense in board-level implementations than they do in chip-level designs. It is often less expensive to replicate state in the former, and also to afford really wide busses all over the place. Maybe it might make sense to do a barrel design for 1 design round if you think you can get to VLSI in the next.

Anyway, quite specifically, the detailed tradeoffs in building VLSI CPU chips seem to argue against building them as barrels: I don't know of any existing popular CISC or RISC chips that are barrels; if anybody does, please point them out. Likewise, although this is harder data to know, the next round of chips is not likely to do this either: everybody is working on more integrated chips and things like super-scalar or super-pipelined designs, but the MIPS Competitive Intelligence Division has yet to turn up any barrel chips out there. Maybe if we're all building 2M-transistor chips, we'll find that we can't think of anything better to do, although I doubt it...

2) Business issues: (now, it gets long)

The computer business is fundamentally different than it was even 10 years ago, basically because of the microprocessor.
Specifically, if you are a systems designer, and if you choose to design your own CPUs, rather than implement your system out of commercially-available micros, you'd better have a Real Good reason, of which the following are some:
a) You're building something that has to be binary-compatible with an existing line. Your choice is either to build things out of gate-arrays, or semi-custom, or full-custom VLSI, in order of ascending cost and difficulty. [Gate-arrays: most supermini & mainframe vendors; full-custom VLSI: DEC CVAX.]
b) You're a semiconductor vendor, also, and your business is building VLSI chips anyway. [Intel, Motorola, etc.]
c) You're a system vendor who thinks they can design a CPU architecture and get it to be popular enough that it gets access to successive technologies, so that it stays at the leading edge of the technology curve in a cost-effective way. [Sun & MIPS]
d) You're building something whose performance or functionality cannot be done with the existing micros or next year's micros. [CONVEX, CRAY]

However, if you're building something from scratch, it had better do something a lot better than next year's micros, or you'll get run over from behind by CPUs that have 1) a bottom-to-top range of applicability, not limited to a narrow price-performance niche, 2) volume, and hence lower cost, 3) a bigger software base*, and 4) more $$ coming in to fuel the next round of development to the next range of performance.

* Caveat: you do have to be careful that you don't just count # packages available, but the number of RELEVANT packages for the kinds of machines you're building. For example, the ability to run MSDOS applications is a plus for workstations, but probably not very relevant to somebody who wants a Convex, so you can't compare architectures by counting applications. Nevertheless, application availability does count.

Not that many years ago, there used to be LOTS of companies who built mini / supermini class machines out of TTL (and then maybe ECL). You'd probably be surprised how many different proprietary minis have been built: I looked at the DataPro research reports, 1987, and found about 50 different mini or supermini architectures [there used to be more]. Of these, some were produced by companies that have since disappeared; many of them may never be upgraded; only a few are supported by companies successful enough to make the continued enhancement worthwhile. In the early 1980s, proprietary minis started getting badly hurt by the 16-bit micros, and low-end superminis were getting threatened by 32-bit micros. Only a few mini/supermini vendors are left, really. Of course, this is the second wave of this: consider the consolidations among the companies building mainframes and others in the 1950s and 1960s...

OPINION, PERHAPS BIASED (REMEMBER WHERE I WORK):

1) There exist VLSI RISCs in production that already show faster integer and scalar FP performance than any of the popular superminis. Before the end of the year, people will ship ECL VLSI RISCs at supermini prices, whose corresponding uniprocessor performance is equivalent to Amdahl 5990s or IBM 3090s. In addition, one should expect to see, during 1990-1992, CMOS or BiCMOS chips from which one can build 50-100 VUPs machines (still uniprocessor). There's no reason not to have a 1000-VUP multi in a large file-cabinet-size box by 1992 / 1993 (although we'd sure better get some faster disks by then!) at costs competitive with current superminis.
2) Most mini/supermini architectures born in the 1970s or early 1980s are essentially doomed, unless they're owned by a company with strong finances, a big customer base, or, perhaps, a customer base that's heavily locked in for some reason or other. Some of the older mainframe architectures are also doomed, for the same reason. [Note: doom doesn't mean they disappear overnight, but that it gets harder and harder to justify upgrades, and if a company takes the approach of relying only on its installed base of locked-in customers, trouble is coming.] Now, this doesn't mean that the company owning those architectures is doomed. Some mini companies have taken thoughtful and timely steps to adapt to the new technology without dumping their customers: HP would be a good example: think how long ago they saw the RISC stuff coming, and how much work went into assuring reasonable migration. Others have been working the problem as well; some have not, to the best of my knowledge, and I suspect they're going to get hurt.

3) Proprietary mini-supers are in serious danger in the next year or two: one can already see the bloodbath going on there. (Apologies to my friends at various places), but it's hard to see why anybody but Convex is really going to prosper and remain independent in this. Note that Convex seems wisely to be taking the strategy of moving up, chasing supercomputers, and staying out of the frenzy at the lower end of this market, which is, of course, the part starting to be attacked by the VLSI RISCs. I know this overlap is starting to happen, because we (MIPS and some of its friends) are seeing a lot more competitive run-ins with some of the mini-super guys. We lose some (like: real vector problem, need 1 GB of memory, need some application that we don't have yet), but we win some already on cost/performance, and sometimes even on performance. An M/120 (a $30K thing in a PC/AT box) has been known to beat some mini-supers in some kinds of big number-crunching benchmarks, and that is Not Good News.... (well, it's good news for us...:-)

What happens in 1989/1990? Well, we expect to see the first VLSI ECL RISCs appear, at least from us and Sun. These things have got to be Bad News, as they'll be in the 30-60 VUPs range, with reasonable scalar floating-point. They're likely to be quite competitive (on a performance basis) with many of the mini-supers, except in really heavy vector or vector-parallel applications, and they'll probably win on cost/performance numbers in even more cases, leaving a fairly narrow niche.

However, even worse is the software problem. One of the biggest difficulties for mini-supers is the difficulty of getting software on them: the machines are expensive enough that you don't just leave them around at bunches of 3rd-party software developers. BTW, 3rd-party developers are sane people: they don't port software for free, and they care about the number of machines on which they can sell their software. This makes it Real Hard if you only have a few hundred machines in the field, unless your machines are among the few able to run the application. (Note how important it is to be the first to get to a new zone of cost/performance, i.e., part of why CRAY and Convex have been successful.) This is not a problem faced by the ECL RISCs, which both already have large numbers of software-compatible machines out there.
To get a feeling for the scope of the problem, here are some numbers. From COMPUTERWORLD, Feb 13, 1989, page 130, "High Performance Computers", minisuper installed base as of yearend 88 (Computer Technology Research Corp):

     450    FPS
     430    Convex
     335    Alliant
     110    Elxsi
      45    SCS
     150??  Multiflow* (from a different source)
    ----
    1520    TOTAL

* This article didn't include Multiflow: CSN 2/13/89, p. 46, says "As of June 1988, Multiflow had sold 44 of its Trace computers. Since then, the company has stopped revealing how many systems it has sold, but Joseph Fisher, co-founder and EVP, said the 4th and 3rd quarters generated the largest and second-largest revenue for the company in its four-year history." Assume the installed base is now 150 machines (probably optimistic). (And of course, who knows how accurate these numbers really are? However, they're probably the right order of magnitude. To be fair, the CW article claimed minisupers were a real hot growth area, and I'm using the numbers in the opposite direction....)

Now, MIPS and/or semiconductor partners have shipped about 20,000 chipsets as of YE1988. Of course, many of them have gone into prototypes, or into dedicated applications, or other things. Still, MIPS itself built on the order of 1000 machines, as well as a lot of boards that have gone into others, and of course, some of our friends have shipped more MIPS-based machines than we have. Although I'm not privy to the numbers :-), there must be 5-15K SPARC-based things out there, mostly in Sun-4s.

In late 1989, the mini-supers will have to face the spectre of competing with fast and cost-effective machines whose CPU performance overlaps at least the lower-middle of the minisuper performance range, each of which has an installed base of many tens of thousands, low-end machines in the $10K range or lower, lots of software, and little messing around to get reasonable performance. Of course, CPU performance alone does not a minisuper make, and none of this should be taken as disparagement of folks who work at any of these companies, some of whom have built hardware or software that I respect greatly. All I suggest is that the old quote is appropriate: "Don't look back. Something might be gaining on you."

To finish this long tome with the thing that started it: a barrel design had better show some compelling and lasting advantage over VLSI RISCs, because it will probably be more expensive to build, and if it doesn't get volume, business reality will make its life very hard.

Sorry for the length of this, but the topics have come up in a number of side e-mail conversations, and it seemed to fit here.
-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
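[Editorial aside: a toy model of the slot-waste arithmetic from the technical half of the post above. It assumes an N-way barrel whose slots each deliver B VUPs, versus a single conventional CPU of roughly N*B VUPs built from the same hardware, and varies the number of runnable tasks T. The values N=4 and B=2 are invented for illustration and describe no real machine; the model ignores latency and cache effects, which the post notes would need real quantification.]

    /* Toy throughput model: N-way barrel (B VUPs per slot) vs. one CPU
     * of N*B VUPs, as a function of runnable tasks T.  All figures are
     * invented for illustration.
     */
    #include <stdio.h>

    int main(void)
    {
        const int    N = 4;     /* barrel slots (assumed)      */
        const double B = 2.0;   /* VUPs per slot (assumed)     */
        int T;

        printf("tasks  barrel-VUPs  single-CPU-VUPs\n");
        for (T = 1; T <= 6; T++) {
            double barrel = B * (T < N ? T : N);  /* idle slots are wasted */
            double single = N * B;                /* one fast CPU stays busy
                                                     whenever T >= 1       */
            printf("%5d  %11.1f  %15.1f\n", T, barrel, single);
        }
        return 0;
    }

[With fewer runnable tasks than slots, the barrel's aggregate rate drops linearly, which is the "wastes whole slots" point; the single CPU only gives ground to the latency effects the model leaves out.]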
brooks@vette.llnl.gov (Eugene Brooks) (02/21/89)
In article <13582@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
A long, and well founded, analysis of why superminis are being squeezed out
of their performance niche from the rear by VLSI based machines.
This article is conservative at best; there are a whole lot of users of Cray
time buying the latest VLSI-based machines as a more cost-effective alternative.
With the latest microprocessors these machines are within 1/5th of the
performance of a Cray supercomputer for all but the most highly vectorized
codes. For scalar codes the performance of these microprocessors can be
as high as 1/2 of a Cray-1S. The performance of supercomputers has stagnated
in the last 10 years, with only about a factor of 2 in performance per CPU
having been achieved. Needless to say, while traditional computer technology
has stagnated performance-wise, microprocessors have really accelerated as
their designers have learned the basics of pipelining and have had enough gates
on a chip to support full functionality. Supercomputer vendors shudder when
we show them where the best microprocessors stand in relation to mainframes
for the Livermore Loops and point out where their performance will be a year
or two from now. Next year's microprocessors will meet or beat the scalar
performance of supercomputers, and I expect at least one or two further
doublings in speed of these parts before they reach an asymptote. At that
point you will start to see higher bandwidth memory connections for these
parts (as opposed to a simple stall on a cache miss model) and the distinction
between a micro and a supercomputer architecture will be completely blurred.
Supercomputers at this point will still exist, but they will be built out of
modestly large numbers of VLSI processors (shared memory or otherwise depending
on the application). The only hope for supercomputer vendors is to start using
higher levels of integration than they currently use so the cost of their
hardware can be reduced and their reliability increased.
Is the news software incompatible with your mailer too?
brooks@maddog.llnl.gov, brooks@maddog.uucp, uunet!maddog.llnl.gov!brooks
mccalpin@loligo.uucp (John McCalpin) (02/21/89)
In article <20667@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks) writes:
>In article <13582@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>A long, and well founded, analysis of why superminis are being squeezed out
>of their performance niche from the rear by VLSI based machines.
>
>This article is conservative at best, there are a whole lot of users of Cray
>time buying the latest VLSI based machines as a more cost effective alternative
>With the latest microprocessors these machines are within 1/5th of the
>performance of a Cray supercomputer for all but the most highly vectorized
>codes. For scalar codes the performance of these microprocessors can be
>as high as 1/2 of a Cray-1S.

I have had a great deal of trouble believing the poor performance of "supercomputers" on scalar code lately. I just ran the LINPACK 100x100 test on the FSU ETA-10 (10.5 ns = 95 MHz) and got a result of 3.8 64-bit MFLOPS for fully optimized (but not vectorized) code. I used the version of the code with unrolled loops. This performance is EXACTLY the same as the MIPS R-3000/3010 pair running at 25 MHz. I understand that there must be tradeoffs, but considering the difference in cost, this is a bit surprising....

Of course, the vectorized version runs at 60 MFLOPS on the ETA-10 now (90 MFLOPS with the 7 ns CPU's), and gets rapidly faster for larger systems.

I don't mean to pick on CDC/ETA --- even the fastest Cray's are going to get caught by the highest performance RISC chips pretty soon. I haven't seen any MC88000 results yet, but it looks to be able to put out results in the same performance range. Does anyone know if the memory bandwidth of the 88000 is going to be able to keep the floating-point pipeline filled? This could push the performance of the 88000 up to closer to 10 MFLOPS....
---------------------- John D. McCalpin ------------------------
Dept of Oceanography & Supercomputer Computations Research Institute
mccalpin@masig1.ocean.fsu.edu        mccalpin@nu.cs.fsu.edu
--------------------------------------------------------------------
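[Editorial aside: the "100x100 with unrolled loops" LINPACK variant mentioned above spends nearly all its time in a DAXPY whose inner loop is hand-unrolled four ways. The real benchmark is Fortran; the C sketch below only illustrates the shape of the loop being timed and is not the benchmark source.]

    /* daxpy, 4-way unrolled by hand in the style of the "unrolled
     * loops" LINPACK variant: y[i] += a * x[i].  Illustrative C only,
     * not the official Fortran benchmark code.
     */
    void daxpy(int n, double a, const double *x, double *y)
    {
        int i, m;

        if (n <= 0 || a == 0.0)
            return;

        m = n % 4;                     /* clean-up loop for the leftover */
        for (i = 0; i < m; i++)
            y[i] += a * x[i];

        for (i = m; i < n; i += 4) {   /* main loop, unrolled 4 ways     */
            y[i]     += a * x[i];
            y[i + 1] += a * x[i + 1];
            y[i + 2] += a * x[i + 2];
            y[i + 3] += a * x[i + 3];
        }
    }

[A scalar machine runs this at whatever its multiply-add and load/store pipes allow; the scalar-only numbers quoted above come from forbidding the compiler to vectorize exactly this kind of loop.]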
lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (02/22/89)
John Mashey has argued convincingly that single-chip processors are on a faster trend curve than mainframe processors, and in fact are just plain catching up.

The basic reason is the speed of light. As Mr. Cray knows, small == fast. In the long run, the smallest system is the one that fits on a single chip. Now that you've all nodded sagely ... I don't agree with the last sentence above. I think that we're going to see really large chips - perhaps the much fabled wafer-scale integration. And if you think about wire lengths, those chips are going to have some awfully long interconnects: wires just centimeters and centimeters long.

We might do better by going to three dimensions instead of two. The breakthrough I'd like to see is chip vias. For the hardware-impaired, what I mean is, I'd like to see signal paths between the two surfaces of a chip. I'd like to take a stack of naked chips, and then solder them together into a solid cube.
--
Don     D.C.Lindsay     Carnegie Mellon Computer Science
--
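[Editorial aside: the small==fast point is easy to put rough numbers on. A minimal sketch, assuming signals propagate at about two-thirds of c along an ideal interconnect (a generous assumption; real RC-limited on-chip wires are slower); the wire lengths and cycle times are round numbers for illustration only.]

    /* Rough one-way propagation delay for a few interconnect lengths,
     * compared with some example cycle times.  Assumes ~0.66c signal
     * speed; all numbers are illustrative, not measurements.
     */
    #include <stdio.h>

    int main(void)
    {
        const double c = 3.0e10;          /* speed of light, cm/s       */
        const double v = 0.66 * c;        /* assumed signal speed, cm/s */
        const double len_cm[] = { 0.5, 2.0, 10.0, 30.0 };
        const int    n = (int)(sizeof len_cm / sizeof len_cm[0]);
        int i;

        printf("length(cm)  one-way delay(ns)\n");
        for (i = 0; i < n; i++)
            printf("%10.1f  %17.3f\n", len_cm[i], len_cm[i] / v * 1e9);

        printf("\ncompare cycle times: 40 ns (25 MHz), 10 ns, 2 ns\n");
        return 0;
    }

[At a 40 ns cycle even long wires are tolerable; at the few-nanosecond cycles the fastest machines aim for, a wire "centimeters and centimeters long" starts eating a visible fraction of the clock, which is the poster's worry about wafer-scale parts.]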
colwell@mfci.UUCP (Robert Colwell) (02/22/89)
In article <7330@pyr.gatech.EDU> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
>In article <20667@lll-winken.LLNL.GOV>
> brooks@maddog.llnl.gov (Eugene Brooks) writes:
>>In article <13582@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>>A long, and well founded, analysis of why superminis are being squeezed out
>>of their performance niche from the rear by VLSI based machines.
>>
>I don't mean to pick on CDC/ETA --- even the fastest Cray's are going
>to get caught by the highest performance RISC chips pretty soon.

A note from the other side of the aisle. "Even the fastest Crays"? Are you kidding? If you believe the Cray-3 is going to be manufacturable (an entertaining discussion all by itself) then how the heck do you think a micro is going to get 1800 mflops any time soon? I think that's wishful thinking or outright fantasy.

Do you realize how many bits you'd be stuffing into its pins per unit time? Or maybe you think you're going to make the micro out of GaAs? You still have to feed it. Cray has the wire lengths down below 1" to maintain this kind of data rate. I don't think you're going to touch that kind of computation rate without the same big bucks the big boys must spend. Are you going to do CMOS with 100K ECL I/O's? Or do you think you're going to get there with TTL switching levels?

And those large machines put way more than half their money into their memory subsystems. What sleight of hand will make it possible for the micros to do better? And if they don't do better, then their systems won't cost significantly less than the less integrated machines, in which case their cost advantage dissipates. The same goes for the I/O needed to support all the flops being predicted willy-nilly in this stream of discussion. The cost/performance of I/O isn't increasing at anything near the rate of the CPUs.

So my (possibly broken) crystal ball says that the default future isn't so much a world filled with satisfied customers of nothing but micros as one filled with CPUs spending an awful lot of time waiting on badly mismatched memory and I/O systems. Data caches aren't going to help you much, either, running the kinds of codes that Crays are good for. Name a data cache size, and every user will say it's too small.

Bob Colwell            ..!uunet!mfci!colwell
Multiflow Computer  or colwell@multiflow.com
175 N. Main St.
Branford, CT 06405     203-488-6090
seeger@poe.ufnet.ufl.edu (F. L. Charles Seeger III) (02/22/89)
In article <4330@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
|The breakthrough I'd like to see, is chip vias. For the hardware-
|impaired, what I mean is, I'd like to see signal paths between the two
|surfaces of a chip. I'd like to take a stack of naked chips, and then
|solder them together into a solid cube.

I believe through-wafer vias are being done, at least in some labs. However, as you might expect, they are rather large by VLSI standards. Memory-check: my recollection could be about a proposal to do these vias, rather than a report of it being done.

AT&T is working on using wafers as circuit boards, with >= 4 conducting layers, including power and ground planes. Individual chips are mounted to the wafer with solder techniques similar to SMT. A big win here is that this mounting is done before packaging, so that the IO pads on the chips can be scattered about the chip anywhere that is most convenient (i.e., the pads need not be around the periphery). The initiative for this work was to increase the interconnect density, which hasn't been keeping pace with chip density. You can then mount your CPU, MMU, FPU, sRAM, dRAM, etc. all on one wafer, while still using different fabs for the different chips.

Though their work is currently planar, it seems that combining this technology with through-wafer vias would point in the direction that you suggest.
--
Charles Seeger            216 Larsen Hall            +1 904 392 8935
Electrical Engineering    University of Florida
seeger@iec.ufl.edu        Gainesville, FL 32611
brooks@vette.llnl.gov (Eugene Brooks) (02/23/89)
In article <656@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>A note from the other side of the aisle. "Even the fastest Crays"? Are
>you kidding? If you believe the Cray-3 is going to be manufacturable
>(an entertaining discussion all by itself) then how the heck do you think
>a micro is going to get 1800 mflops any time soon? I think that's wishful

Whether or not the Cray-3 is manufacturable, there will certainly be supercomputers with many gigaflops of VECTOR performance in the near term. We were talking about scalar performance, and not vector performance. Certain codes which are heavily run on Cray machines are scalar and would score high hit rates in a rather small cache. I predict that a microprocessor will outrun the scalar performance of the Cray-1S within a year. The "supercomputers" will only hold on for those applications which are 99% vectorized, which are darned few, and because of this supercomputers will share the computer center floor with micro based hardware soon, and on an equal footing.

Is the news software incompatible with your mailer too?
brooks@maddog.llnl.gov, brooks@maddog.uucp, uunet!maddog.llnl.gov!brooks
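[Editorial aside: the "high hit rates in a rather small cache" claim translates directly into average memory access time. A minimal sketch of that arithmetic; the hit time, miss penalty, and hit rates below are invented for illustration and describe no particular machine.]

    /* Average memory access time: AMAT = hit_time + miss_rate * penalty,
     * for a range of hit rates.  Cycle counts are invented.
     */
    #include <stdio.h>

    int main(void)
    {
        const double hit_time = 1.0;     /* cycles, assumed              */
        const double penalty  = 20.0;    /* cycles to memory, assumed    */
        const double rates[]  = { 0.80, 0.90, 0.95, 0.99 };
        int i;

        printf("hit rate   AMAT (cycles)\n");
        for (i = 0; i < 4; i++) {
            double amat = hit_time + (1.0 - rates[i]) * penalty;
            printf("%8.2f   %13.2f\n", rates[i], amat);
        }
        return 0;
    }

[If the scalar Cray codes really do hit 95-99% in a small cache, the micro runs near its peak scalar rate, which is what the prediction above depends on.]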
rpw3@amdcad.AMD.COM (Rob Warnock) (02/23/89)
As John Mashey says, with current chip technology, barrel processors don't seem to make much sense. But there *are* upcoming technologies for which barrel architectures will make sense, at least for a time, just as there was a time in the past in which they made sense -- when memory was much slower than the CPU logic (e.g. the CDC 6600 Peripheral Processors).

Case in point: Several groups -- most notably, that I know of, Alan Huang et al. at Bell Labs (see some recent issue of "Scientific American" or "Discover", I forget) -- are working on true optical computers, where the fundamental logic operations are done with non-linear optics. The total "state" of the CPU might be in the pattern of on/off dots in a planar wavefront of light. A "microcycle" would consist of that wavefront travelling through the "logic", mixing with itself and getting pieces sliced and diced, passing through a regenerator (amplifier/limiter), and looping back to the beginning. (This would *really* be done with mirrors!) That is, all of the optical devices ("gates", if you like) would be operating in parallel, on different pieces of the wavefront.

Now I'm guessing these machines will initially have optical loop paths ("microcycle" times?) in the low to medium nanoseconds (circa 5 ns/meter in glass?), since they won't be sub-miniaturized (initially). But from what I hear, even initially the optical devices will be *very* fast (just a few picoseconds or less), so that you'll only be "using" the gates for the "thickness" of the wavefront. So they're already thinking about taking another wavefront and positioning it "behind" the first one (of course with some guard time to avoid interference). Voila! A barrel processor! In the limit, a given hunk of glass/&c. could support "loop_time/switching_time" CPUs in the barrel.

And if I/O or access to "main" memory (whatever that might be) was slow, it might make sense to artificially increase the microcycle (loop) time to match the external world, which at the same time lets you stack more CPUs in the barrel. (Pushing the analogy: the width of one "stave" is fixed by the speed of the optical logic, including guard bands. "Staves/sec", or circumferential speed, is fixed by the speed of light in the glass/silicon/air/whatever in the loop. But the "RPMs" can be slowed by adding staves to the circumference of the barrel.)

Anyway, just to point out that there is some chance that barrel processors may live again someday...

Rob Warnock
Systems Architecture Consultant
UUCP:     {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:      (415)572-2607
USPS:     627 26th Ave, San Mateo, CA 94403
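[Editorial aside: the "loop_time/switching_time CPUs in the barrel" limit is just a division. A sketch with assumed numbers: a 2-meter optical loop at roughly 5 ns/m, devices that switch in a couple of picoseconds, plus a guard band. None of these figures come from the Bell Labs work; they are purely illustrative.]

    /* How many wavefronts ("barrel slots") fit in one optical loop:
     * slots = loop_time / slot_width, where slot_width is switching
     * time plus guard time.  All inputs are assumptions.
     */
    #include <stdio.h>

    int main(void)
    {
        const double loop_len_m   = 2.0;   /* assumed loop path length      */
        const double ns_per_meter = 5.0;   /* ~speed of light in glass      */
        const double switch_ps    = 2.0;   /* assumed device switching time */
        const double guard_ps     = 8.0;   /* assumed guard band            */

        double loop_ns = loop_len_m * ns_per_meter;
        double slot_ns = (switch_ps + guard_ps) / 1000.0;

        printf("loop time  = %.1f ns\n", loop_ns);
        printf("slot width = %.3f ns\n", slot_ns);
        printf("CPUs in the barrel ~ %.0f\n", loop_ns / slot_ns);
        return 0;
    }

[Slowing the loop to match slow memory, as the post suggests, only lengthens loop_time and so raises the slot count further.]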
paulf@ece-csc.UUCP (Paul D. Franzon) (02/23/89)
In article <19814@uflorida.cis.ufl.EDU> seeger@iec.ufl.edu (F. L. Charles Seeger III) writes:
>In article <4330@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>|The breakthrough I'd like to see, is chip vias. For the hardware-
>|impaired, what I mean is, I'd like to see signal paths between the two
>|surfaces of a chip. I'd like to take a stack of naked chips, and then
>|solder them together into a solid cube.
>
>I believe through-wafer vias are being done, at least in some labs. However,
>as you might expect, they are rather large by VLSI standards.

Hughes is working on mechanical through-wafer vias. They are large (0.5 mm square) but you can put circuits underneath them. They have proposed multi-layer structures. I've heard nothing about mechanical reliability. AT&T is working on through-wafer optical interconnects. At the moment this is at a research stage only.

>AT&T is working on using wafers as circuit boards, with >= 4 conducting
>layers, including power and ground planes. Individual chips are mounted

This effort has been dropped. Several other groups are working on ceramic or Al high density "circuit boards", on which chips are flip mounted. This gives you a very high density I/O capability and very fast interconnect. Some people here are currently exploring structures that can use these capabilities effectively.
--
Paul Franzon                 Aussie in residence
Ph. (919) 737 7351           ECE Dept, NCSU
malcolm@Apple.COM (Malcolm Slaney) (02/24/89)
In article <24582@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
>As John Mashey says, with current chip technology, barrel processors don't
>seem to make much sense.

Maybe I missed something in the definition of a barrel processor, but isn't the new Stellar machine a barrel processor much like the HEP? I just read the machine overview last night and they have four "virtual" pipelines that time share a single long pipeline. I wonder what it is about the Stellar architecture that makes them think they can succeed where Burton Smith (HEP) couldn't? Is it just the graphics?

This is one of the advantages they claim:

    At any particular moment, 12 instructions are active in the
    pipeline, but each stream (of four) has only three instructions
    active. In this way, the architecture can achieve the performance
    of a deep (12 stage, 50 nsec) pipeline, while experiencing the
    minimal "pipeline drain" cost of a shallow (3 stage, 200 nsec)
    pipeline.

I don't have to pay for the machine so I can't comment on its price. I did recompile my ear models on the machine and they worked without changes and ran much faster than a Sun...but not as fast as the code on our Cray XMP.

							Malcolm
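[Editorial aside: a crude way to read the 12-stage/4-stream claim above. Each stream sees a 3-deep pipeline with a 200 ns issue period, so a drain costs it roughly (depth - 1) of its own issue slots, versus (12 - 1) cycles of 50 ns for one stream owning the whole pipe. The arithmetic below is only that simplification, not a description of the real Stellar hardware's branch handling.]

    /* Pipeline-drain cost, per stream: (effective_depth - 1) * issue
     * period.  Numbers are those quoted in the overview above; the
     * model itself is a simplification.
     */
    #include <stdio.h>

    int main(void)
    {
        const int    stages   = 12;     /* physical pipeline stages        */
        const double cycle_ns = 50.0;   /* physical cycle time             */
        const int    streams  = 4;      /* interleaved instruction streams */

        double deep_drain    = (stages - 1) * cycle_ns;
        double issue_ns      = streams * cycle_ns;            /* 200 ns */
        double barrel_drain  = (stages / streams - 1) * issue_ns;

        printf("single stream, 12-stage pipe: drain ~ %.0f ns\n", deep_drain);
        printf("one of 4 streams, 3-deep view: drain ~ %.0f ns\n", barrel_drain);
        return 0;
    }

[The single number understates the win: while one stream drains, the other three keep issuing, so the pipeline as a whole stays fuller than a single deep pipe would.]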
rodman@mfci.UUCP (Paul Rodman) (02/24/89)
In article <20821@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov.UUCP (Eugene Brooks) writes:
>Whether or not the Cray-3 is manufacturable, there will certainly be super-
>computers with many gigaflops of VECTOR performance in the near term.

Please stop using the word VECTOR. Use "large data aggregate" or "parallel" instead. There are many, many problems that are not vectorizable but have large amounts of parallelism. You would call it "scalar" parallelism, but you would be in error if you thought a small RISC chip would compete in performance with a VLIW (or WISC) machine.

It's interesting that the same folks who find "CISC" non-optimal can also refer to vector architectures without flinching! I'm waiting for the day when somebody announces a SPARC or MIPS based processor with a vector unit! :-) :-) [Reminds me of chess programs that follow MCO for the opening, but as soon as they're out of the "book" they have no idea why they did what they did, and they start undoing moves!]

>We were
>talking about scalar performance, and not vector performance. Certain codes
>which are heavily run on Cray machines are scalar and would score high hit
>rates in a rather small cache.
>I predict that a microprocessor will outrun the
>scalar performance of the Cray-1S within a year. The "supercomputers" will
>only hold on for those applications which are 99% vectorized,
>which are darned
>few, and because of this supercomputers will share the computer center floor
>with micro based hardware soon, and on an equal footing.

Well, I have more faith in parallel compilation than you seem to. Probably because I've been able to build hardware for some of the best compiler-writers in the world.

I *DO* agree that canonical supers are dead ducks, in short order. VLIWs using VLSI to much greater advantage will replace them.

    Paul K. Rodman
    rodman@mfci.uucp
colwell@mfci.UUCP (Robert Colwell) (02/24/89)
In article <20821@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov.UUCP (Eugene Brooks) writes:
>In article <656@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>>A note from the other side of the aisle. "Even the fastest Crays"? Are
>>you kidding? If you believe the Cray-3 is going to be manufacturable
>>(an entertaining discussion all by itself) then how the heck do you think
>>a micro is going to get 1800 mflops any time soon? I think that's wishful
>
>Whether or not the Cray-3 is manufacturable, there will certainly be super-
>computers with many gigaflops of VECTOR performance in the near term. We were
>talking about scalar performance, and not vector performance. Certain codes

I gathered that, but I was going to just let it slide. "Vector" performance does not necessarily mean "floating point" performance, and it isn't just floating point that makes supercomputers super. It's also the other things I mentioned. I didn't say the micros will hit a plateau and nothing they ever do thereafter will make interesting applications run any faster. I meant that making balanced systems is just as important for them as for their costlier competition, and that users who see high flop numbers on benchmarks and think it means commensurately high performance on micros may be in for a bigger than usual shock when they try to use I/O or fit their application into main memory.

>which are heavily run on Cray machines are scalar and would score high hit
>rates in a rather small cache.

I guess we could each "prove" our point by judicious selection of interesting benchmarks.

>I predict that a microprocessor will outrun the
>scalar performance of the Cray-1S within a year. The "supercomputers" will
>only hold on for those applications which are 99% vectorized, which are darned
>few, and because of this supercomputers will share the computer center floor
>with micro based hardware soon, and on an equal footing.

I hope you mean that there will be some micro somewhere in a system that achieves a higher throughput on a scalar program, because anything else doesn't count. And there, I further predict that said micro, having achieved this feat, will then have trouble on one of two other counts -- having enough physical memory present to handle the same size jobs that people want, having enough flops to not be embarrassing on more vectorizable codes, and having enough I/O to support all of the above without making the user smash the keyboard in frustration. And all of that at workstation prices. If you go higher in the cost space, then your one-chip solution must start competing with multi-chip solutions that have much more flexibility in their implementation. And as I've argued before, they don't pay all that much a penalty for it either, because systems at these performance levels put most of the implementation dollars into memory and I/O, not CPU.

Bob Colwell            ..!uunet!mfci!colwell
Multiflow Computer  or colwell@multiflow.com
175 N. Main St.
Branford, CT 06405     203-488-6090
brooks@maddog.llnl.gov (Eugene Brooks) (02/24/89)
In article <661@m3.mfci.UUCP> rodman@mfci.UUCP (Paul Rodman) writes:
A bit of a flame, followed by a statement that VLIW machines will dominate
the world of computing.
Some time back I posted a challenge to VLIW proponents to compile and run a
parallel Monte Carlo code of mine and compare the performance to a box full
of microprocessors. There were no takers. This challenge is still open.
I would also like to see a detailed justification for the statement that VLIW
processors will displace vector processors at what they do best. A VLIW
machine has not yet outrun a vector processor on its favorite workload
and I do not see any technology trend that leads to this conclusion even for
the long term.
Is the news software incompatible with your mailer too?
brooks@maddog.llnl.gov, brooks@maddog.uucp, uunet!maddog.llnl.gov!brooks
mccalpin@loligo.uucp (John McCalpin) (02/24/89)
In article <656@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>In article <7330> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
>>
>>I don't mean to pick on CDC/ETA --- even the fastest Cray's are going
>>to get caught by the highest performance RISC chips pretty soon.
>
>A note from the other side of the aisle. "Even the fastest Crays"? Are
>you kidding? If you believe the Cray-3 is going to be manufacturable
>(an entertaining discussion all by itself) then how the heck do you think
>a micro is going to get 1800 mflops any time soon? I think that's wishful
>thinking or outright fantasy.
>Bob Colwell ..!uunet!mfci!colwell

If you re-read my original message, you will see that I am talking about SCALAR code only. Comparing the performance of EXISTING Cray machines to the fastest RISC chips shows that the Whetstone performance of a Cray X/MP is not that much faster than a 25 MHz MIPS R-3000/3010 (or an MC88000). Cray may have a factor of 2 better performance (I don't have the numbers right in front of me), which I again claim is not impressive when the clock speeds (118 MHz vs 25 MHz) and prices ($3,000,000+ vs $150,000) are taken into consideration. Not all the important codes in the world can be vectorized to any significant degree.

I certainly agree with you that micros will never compete with the VECTOR performance of these machines simply because the memory bandwidth is not going to be available. For my large scientific problems (which are >98% vector code), I much prefer the CDC memory-to-memory approach. Having a data cache would be very little help. However, Cray and CDC/ETA machines are not likely to ever be cost-effective on scalar codes, precisely because most of their budget goes into producing huge bandwidth memory subsystems....
---------------------- John D. McCalpin ------------------------
Dept of Oceanography & Supercomputer Computations Research Institute
mccalpin@masig1.ocean.fsu.edu        mccalpin@nu.cs.fsu.edu
--------------------------------------------------------------------
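[Editorial aside: using only the rough figures in the post above (about a 2x scalar Whetstone advantage at 118 MHz vs. 25 MHz, and $3,000,000+ vs. $150,000), the normalized comparison works out as below. The 2x factor is the poster's own rough recollection, so the output is illustrative only.]

    /* Normalize the rough scalar comparison above: performance per MHz
     * and per dollar.  Inputs are the approximate figures quoted in
     * the post; nothing more precise is implied.
     */
    #include <stdio.h>

    int main(void)
    {
        const double perf_ratio  = 2.0;        /* Cray ~2x the micro (approx.) */
        const double clock_cray  = 118.0;      /* MHz                          */
        const double clock_micro = 25.0;       /* MHz                          */
        const double price_cray  = 3000000.0;  /* dollars, approximate         */
        const double price_micro = 150000.0;   /* dollars, approximate         */

        printf("Cray scalar advantage:          ~%.1fx\n", perf_ratio);
        printf("... per MHz of clock:           ~%.2fx\n",
               perf_ratio / (clock_cray / clock_micro));
        printf("... per dollar of system price: ~%.2fx\n",
               perf_ratio / (price_cray / price_micro));
        return 0;
    }

[On these assumptions the micro comes out roughly 2.4x better per clock and 10x better per dollar on scalar code, which is the point being argued.]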
turner@uicsrd.csrd.uiuc.edu (02/25/89)
In article <4330@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
> The breakthrough I'd like to see, is chip vias. For the hardware-
> impaired, what I mean is, I'd like to see signal paths between the two
> surfaces of a chip. I'd like to take a stack of naked chips, and then
> solder them together into a solid cube.

I just heard a talk by Kai Hwang who reported that Hughes labs has produced 4" wafers in stacks of 6 that implement an array of processors 32x32 in size! He only had one slide on this, and he said he had taken it from Hughes. The specs I can remember said that 1" square of the wafer contained circuitry. It was all CMOS, and at 10 MHz consumed 1.3 W. I believe that the processors they are talking about are bit-sliced - but I'm not sure. They have plans for much larger scale (512x512 procs) on 6" wafers, sometime around '93.

Meanwhile I personally feel that this type of technology has *lots* of obstacles to overcome.
1 - heat (obviously).
2 - fault tolerance to an unheard of degree. Think about it: if a single wafer has a fault in a column, the system may have to eliminate the entire column to avoid it!
3 - I/O: how do you think the problem of pin limitation applies to the square/cube law?
Overall, not a pretty picture. But *I* sure won't say it can't be done.
---------------------------------------------------------------------------
Steve Turner (on the Si prairie - UIUC CSRD)
UUCP:    {ihnp4,seismo,pur-ee,convex}!uiucdcs!uicsrd!turner
ARPANET: turner%uicsrd.csrd.uiuc.edu
CSNET:   turner%uicsrd@uiuc.csnet           *-) Mutants for
BITNET:  turner@uicsrd.csrd.uiuc.edu            Nuclear Power! (-%
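[Editorial aside: the column-elimination point above can be made concrete. A sketch assuming each processing element works independently with some yield p, so a 32-element column is fully good with probability p^32 and the expected number of usable columns follows. The yields used are assumptions for illustration, not Hughes data.]

    /* Expected usable columns in a 32x32 wafer-scale array when one
     * bad element forces its whole column out.  Assumes independent
     * per-element yield p; the yields below are invented.
     * (Compile with -lm for pow.)
     */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const int    cols = 32, rows = 32;
        const double yields[] = { 0.90, 0.99, 0.999 };
        int i;

        printf("per-PE yield   P(column good)   expected good columns\n");
        for (i = 0; i < 3; i++) {
            double p_col = pow(yields[i], rows);
            printf("%12.3f   %14.3f   %21.1f\n",
                   yields[i], p_col, cols * p_col);
        }
        return 0;
    }

[Even at 99% per-element yield, roughly a quarter of the columns are lost, which is why the post asks for "fault tolerance to an unheard of degree."]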
mash@mips.COM (John Mashey) (02/25/89)
In article <7367@pyr.gatech.EDU> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
...
>If you re-read my original message, you will see that I am talking about
>SCALAR code only. Comparing the performance of EXISTING Cray machines
>to the fastest RISC chips show that the Whetstone performance of a
>Cray X/MP is not that much faster than a 25 MHz MIPS R-3000/3010
>(or an MC88000).

Note: we haven't seen any published numbers on the 88K yet, but we generally expect a noticeable difference between the R3000 and 88K on 64-bit scalar codes [noticeable = 1.5-2X]. Hopefully, DG will provide a nice bunch of performance data to go with their 88K product announcements next week.
-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mash@mips.COM (John Mashey) (02/25/89)
In article <665@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
......
>I hope you mean that there will be some micro somewhere in a system
>that achieves a higher throughput on a scalar program, because anything
>else doesn't count. And there, I further predict that said micro,
>having achieved this feat, will then have trouble on one of two other
>counts -- having enough physical memory present to handle the same
>size jobs that people want, having enough flops to not be embarrassing
>on more vectorizable codes, and having enough I/O to support all of the
>above without making the user smash the keyboard in frustration. And
>all of that at workstation prices. If you go higher in the cost space,
>then your one-chip solution must start competing with multi-chip solutions
>that have much more flexibility in their implementation. And as I've
>argued before, they don't pay all that much a penalty for it either,
>because systems at these performance levels put most of the implementation
>dollars into memory and I/O, not CPU.

I agree 100% with Robert on the issue of building balanced systems: it's important. I also don't expect widely-available workstations to give supers or minisupers a hard time any time soon. [Why? When you look at the tradeoffs you tend to make for the volume workstations, you tend to limit them in terms of memory size, I/O, and expandability. Of course, one's idea of a workstation certainly goes up over the years.] Workstations are not small servers/minicomputers, which are not large servers/big superminis/small mainframes, which are not big mainframes or mini-supers, which are not supercomputers.

On the other hand, you can take the same chips that you might use in a very unbalanced workstation [i.e., lots of CPU, and less I/O], and build fairly powerful, balanced machines with the same technology, and save a tremendous amount of money across a product line, even though, in the largest machines, the CPU chips themselves are basically almost free. In the small machines, you may well waste CPU power to lower cost, and not have smart peripheral controllers, etc. In the bigger ones, you may have multiple CPUs, smart controllers, high-powered backplanes, etc, etc. A good example that can be seen right now is the way DEC uses its CMOS VAX chips in different configurations, and I have to believe that it really helps their overall product line costs.

The second issue is a more subtle one, which is that as the volume goes up, the unit costs go down, and this can be seen in the I/O area as well. In particular, cheap things that you wished would exist don't come into being until the systems that want them make sense. Let me give a few examples, observing first that in the Old Days, if you built computers, it meant you built everything yourself: CPUs, memories, disks, tapes, etc, etc, and if you couldn't do it, you weren't in the business. That's changed a lot; only the largest companies can afford to, and even they do a lot of judicious outsourcing. Now, you can put together some pretty good machines by integrating a lot of other people's work:

a) Remember when having an Ethernet controller meant you had a good-sized board full of logic? If the only systems that wanted Ethernet were large superminis, you might not have LANCE chips at reasonable costs. [Why would anybody bother to build them if there weren't going to be volume?]

b) SCSI controllers: same thing.
c) Very fast 5 1/4" (or 3 1/2") disks: why would you want these if all you had were mainframes [which want huge fast disks], or PCs [which started wanting small cheap disks]? On the other hand, with workstations and fast supermicros, you can get real use from these things, which raises the volume, which drops the price, and makes it worth investing the effort to make them faster. The best of these compete well with ESMDs on some speed metrics, and now people are looking real hard at building big, fast, cheap disk subsystems out of arrays of these (like the UCB work, as just one example).

d) High-speed controller boards: people like Interphase build some pretty fast controller boards. Who uses them? People who have 1) fast CPUs and 2) nonproprietary I/O busses. People who have slow CPUs naturally choose lower-performance controllers. People who have proprietary I/O busses spend a lot of money building their I/O systems (and sometimes, necessarily so, to get the performance they want, or, perhaps, redundancy or other special functionality). But look what happens when you get cheap, fast CPUs in systems that use industry busses. All of a sudden there's a market there for somebody who wants to supply controllers, and there might be enough volume to justify the effort, and although the high-performance controllers might command a premium price, the costs of these boards are much less than the corresponding costs of more proprietary ones, if the latter must be engineered for lower-volume products, even in the same performance range.

What you see is a pattern:
a) People will build high-performance I/O products, if there are systems they make sense in, and the costs will go down.
b) If cheap CPUs keep getting faster, there will be more pressure to boost the I/O performance (cheaply), and that creates markets for people doing higher-integration-level VLSI to fill the demand.

So that brings us back to the original discussion: if you build systems out of the same (or a small number of different, like CMOS & ECL) VLSI CPUs, the lower part of a product range gets drastic percentage savings [a large board or two less is serious business], the middle can take the cost savings and put some of it into I/O, and the top gets the benefit of spending little engineering effort on the CPU, and can take that effort and also put it into I/O. Maybe all of this gets enough volume that people can say "we should have a super-whizzy bus chip, and we can now afford to build it."

(There are more words than I like in this, and I'm not sure I've communicated the industry-interaction effects as well as I'd like, so maybe somebody else can say it better.)
-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mcdonald@uxe.cso.uiuc.edu (02/25/89)
>Subject is possibility that microprocessors will beat Crays in
speed sometime soon. (For scalar code.)
Remember that Crays are in a sense special-purpose machines.
For some purposes (i.e., some types of calculations) a lowly 386 PC
will beat any single processor Cray. What purpose? Can you say
"integer remainder instruction" bound code? [Some math problems
fall in this category.] One of my most common programs almost
has my PC beating a Cray (but not quite). It is hopelessly scalar,
has gigantic arrays (of numbers which never get bigger than 200),
totally integer, and has just enough remainders to make the Cray
unhappy.
Doug McDonald
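[Editorial aside: a sketch of the kind of kernel being described: totally integer, big arrays of small values, and enough remainder operations to matter. The loop below is invented for illustration; it is not Doug McDonald's actual program.]

    /* Invented example of an integer, remainder-heavy scalar loop over
     * a large array of small values (all < 200) -- the kind of code
     * where a vector unit buys nothing and integer divide/remainder
     * latency dominates.
     */
    #include <stdio.h>

    #define N (1 << 20)

    int main(void)
    {
        static int a[N];
        long sum = 0;
        int i;

        for (i = 0; i < N; i++)            /* small values, 0..199       */
            a[i] = (i * 37 + 11) % 200;

        for (i = 1; i < N; i++)            /* remainder-bound inner loop */
            sum += a[i] % (a[i - 1] + 1);  /* +1 avoids division by zero */

        printf("checksum = %ld\n", sum);
        return 0;
    }

[The remainder is the point: a machine with a slow integer divide pays for it on every trip through the second loop, and no amount of vector hardware helps.]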
colwell@mfci.UUCP (Robert Colwell) (02/25/89)
In article <20862@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov.UUCP (Eugene Brooks) writes:
>Some time back I posted a challenge to VLIW proponents to compile and run a
>parallel Monte Carlo code of mine and compare the performance to a box full
>of microprocessors. There were no takers. This challenge is still open.

As I'm sure you know, Eugene, if there were a potential sale behind this challenge, our level of interest would rise considerably. As it is, just what do you think you'd be entitled to conclude if we DID run something that was tailored for a vector box and came up short? How about this: nothing at all. A VLIW isn't a replacement for a vector machine. It's a different way of computing that does very well on vector code, but also does well on code that chokes a vector machine. Because of Amdahl's law, this situation arises more often than not. So what's your point?

>I would also like to see a detailed justification for the statement that VLIW
>processors will displace vector processors at what they do best. A VLIW
>machine has not yet outrun a vector processor yet on its favorite workload
>and I do not see any technology trend that leads to this conclusion even for
>the long term.

The first answer is to wait and see. What is your rejoinder when it is pointed out that the TRACE (as an example of a VLIW) routinely achieves fractions of a Cray X-MP far out of proportion to the difference in cycle times? I'd imagine the only thing you can say is that you suspect there's something fundamental in the design of a VLIW that will always make it require a much longer cycle time. You can think that if you want to. If you want the opinions of the folks who designed the TRACE, we think that's hogwash.

Arguing in your style: name me a vector machine that can touch a dedicated single-algorithm digital signal processor on ITS favorite workload. Right, none. Did I just prove that digital signal processors are the wave of all computing in the future?

Bob Colwell            ..!uunet!mfci!colwell
Multiflow Computer  or colwell@multiflow.com
175 N. Main St.
Branford, CT 06405     203-488-6090
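[Editorial aside: Amdahl's law, since it keeps coming up in this thread. If a fraction f of the run time can be sped up (vectorized or parallelized) by a factor s, the overall speedup is 1 / ((1 - f) + f/s). The fractions below are round numbers for illustration, not measurements from the Los Alamos workload mentioned elsewhere in the thread.]

    /* Amdahl's law: speedup = 1 / ((1 - f) + f / s), where f is the
     * fraction of time that can be sped up and s is the speedup of
     * that fraction.  The f values are illustrative round numbers.
     */
    #include <stdio.h>

    static double amdahl(double f, double s)
    {
        return 1.0 / ((1.0 - f) + f / s);
    }

    int main(void)
    {
        const double fracs[] = { 0.50, 0.90, 0.99 };
        int i;

        printf("f (speedable)   speedup, s=10   speedup, s=infinite\n");
        for (i = 0; i < 3; i++)
            printf("%13.2f   %13.2f   %19.2f\n",
                   fracs[i], amdahl(fracs[i], 10.0),
                   1.0 / (1.0 - fracs[i]));   /* limit as s -> infinity */
        return 0;
    }

[This is the arithmetic behind both sides of the argument: unless f is very close to 1, the unaccelerated part sets the ceiling, which is why the scalar performance of the rest of the machine keeps coming up.]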
rogerk@mips.COM (Roger B.A. Klorese) (02/26/89)
In article <661@m3.mfci.UUCP> rodman@mfci.UUCP (Paul Rodman) writes:
>I'm waiting for the
>day when somebody announces a SPARC or MIPS based processor with a vector
>unit! :-) :-)

You mean, like Ardent?!
-- 
Roger B.A. Klorese                      MIPS Computer Systems, Inc.
{ames,decwrl,pyramid}!mips!rogerk       928 E. Arques Ave.  Sunnyvale, CA 94086
rogerk@servitude.mips.COM (rogerk%mips.COM@ames.arc.nasa.gov)   +1 408 991-7802
"I majored in nursing, but I had to drop it. I ran out of milk." - Judy Tenuta
rodman@mfci.UUCP (Paul Rodman) (02/27/89)
In article <20862@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov.UUCP (Eugene Brooks) writes:
>Some time back I posted a challenge to VLIW proponents to compile and run a
>parallel Monte Carlo code of mine and compare the performance to a box full
>of microprocessors. There were no takers. This challenge is still open.

A "box" full of microprocessors? Look, if you've got some money, call a salesman and benchmark your code. If you like the results, buy the Trace. Don't conclude that just because there were no takers that you have "proved" anything.

Secondly, if somehow you've managed to port your application to a "box" of micros, congratulations, but most folks don't have the time to do such things. Most computer users have large programs that couldn't be ported to a "box" of micros in a month of Sundays. Eventually, *more* use for multiple-cpu systems will find its way to more and more rank-and-file computer users (for something more than time-sharing performance). When that happens you'll still be better off with a small number of faster machines (VLIWs) than a large number of slow ones. Unless, of course, you have just the right application.

>I would also like to see a detailed justification for the statement that VLIW
>processors will displace vector processors at what they do best.

Why would I need to prove such a silly thing? Why shouldn't *you* have to prove that a vector processor can even come close to VLIWs at what *they* do best (a much larger set of programs, by the way)? Would you like to write vector programs, or programs with parallel expressions? The latter is much easier. Even if your vector performance is some fraction less than a vector machine's, unless the user's application is incredibly vectorizable, you'll easily win due to non-vector speedups.

I'm sure you've read Olaf Ludbeck's paper from Los Alamos, haven't you? What does that paper tell *you*? It tells me that even if the vector machines at Los Alamos had *infinite* speed vector units the speedups are dismal. In a *decade*+ of programming vector machines, some of the best scientists in the world haven't improved the percent vectorization.

>A VLIW
>machine has not yet outrun a vector processor yet on its favorite workload
>and I do not see any technology trend that leads to this conclusion even for
>the long term.

That's because you aren't looking. Vector compilers are topped out, in case you haven't noticed. Vector compilers have been around a long, long time and have gotten quite good. This is *bad* news for vector compilers, not good news. A decent VLIW+compiler has only just popped onto the commercial scene, relatively speaking. Do you think that we are standing still?

    Paul K. Rodman
    rodman@mfci.uucp
rodman@mfci.UUCP (Paul Rodman) (02/27/89)
In article <7367@pyr.gatech.EDU> mccalpin@loligo.cc.fsu.edu (John McCalpin) writes:
>
>If you re-read my original message, you will see that I am talking about
>SCALAR code only.

OOOOOOhhhhh, ok, you mean you want permission to ignore Amdahl's law, do you?

Also, I keep trying to get you guys to stop using the word "SCALAR" when what you really mean is "a small amount of parallelism". This is an extremely sloppy situation. What if my vector lengths are of length 2? Aren't you going to stand by your claim? You will? Then DON'T use the word SCALAR, please. You meant to say "no parallelism".

The whole misuse of the terms "SCALAR" and vector on this net just underlines the lack of understanding about what makes computers slow, or fast. On the one hand I've got folks trashing on me saying that "box"es of micros are going to be faster than hell due to all the parallelism in programs. On the other hand I've got risc/micro guys saying "well as long as there is no parallelism, we'll beat a CRAY!". :-) :-)

>
>I certainly agree with you that micros will never compete with the
>VECTOR performance of these machines simply because the memory
>bandwidth is not going to be available.

Then how come I have to answer flames that claim the opposite? :-)

>For my large scientific
>problems (which are >98% vector code), I much prefer the CDC memory-
>to-memory approach. Having a data cache would be very little help.
>However, Cray and CDC/ETA machines are not likely to ever be cost-
>effective on scalar codes, precisely because most of their budget
>goes into producing huge bandwidth memory subsystems....

However, the ETA machine is fundamentally damaged in its ability to do other than stride-1 accesses. Presumably they will fix this someday.

You may simplify your statement: "Cray and CDC machines are not going to be cost effective." :-)

    Paul K. Rodman
    rodman@mfci.uucp
rodman@mfci.UUCP (Paul Rodman) (02/28/89)
In article <13888@admin.mips.COM> rogerk@mips.COM (Roger B.A. Klorese) writes:
>In article <661@m3.mfci.UUCP> rodman@mfci.UUCP (Paul Rodman) writes:
>>I'm waiting for the
>>day when somebody announces a SPARC or MIPS based processor with a vector
>>unit! :-) :-)
>
>You mean, like Ardent?!

Sheesh, I posted that mail expecting a farrago of replies and received only 4, 2 of which were from MIPS folks (the Ardent uses MIPS chips). None were from Ardent. I guess they don't read comp.arch.....they must be smarter than I thought...:-):-):-).

    Paul K. Rodman
    rodman@mfci.uucp
    Calm down?! Calm down!? But.., but..., I AM calm!....
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/28/89)
In article <661@m3.mfci.UUCP> rodman@mfci.UUCP (Paul Rodman) writes:
>In article <20821@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov.UUCP (Eugene Brooks) writes:
>>computers with many gigaflops of VECTOR performance in the near term.
>Please stop using the word VECTOR. use "large data aggregate" or "parallel"
>instead. There are many, many problems that are not vectorizable but

It makes sense to use VECTOR when that is what you mean. The poster was correct - higher speed vector processors are, in a sense, a known quantity. Given a level of technology, it is a known problem how to build a high performance vector processor with a known behavior relative to a scalar processor of the same technology. Parallel processing is not, yet, equivalent, though the number of parallel approaches to solving problems is increasing.

>day when somebody announces a SPARC or MIPS based processor with a vector
>unit! :-) :-)

I have not looked at the SPARC architecture in sufficient detail to know whether a vector processor that is upward compatible with SPARC is a good idea, but I suspect it is. After all, the Cray machines and the CDC Cyber 205, without their vector capabilities, are high performance "RISC" machines (load/store, an instruction set which is easily pipelined, simple addressing modes, simple R-to-R instructions). Anyway, a vector micro makes perfect sense if you can build a data path into and out of it that is wide enough. If VLIW matures, you can expect to see some VLIW "vector" micros. There is already at least one machine that is close to this - the Weitek 64-bit vector micro.

>I *DO* agree that canonical supers are dead ducks, in short order.
>VLIWs using VLSI to much greater advantage will replace them.

I'm not sure what a "canonical super" is, but VLIW machines are still SIMD, like vector machines (of which they are a generalization from a certain point of view). A "true parallel" machine would be MIMD, like the Cray X-MP and its successors, which allow true parallelism, but with a relatively small number of processors. I note that Cray, CDC/ETA, Convex, etc., all seem to have concluded that a vector (or possibly VLIW, in the future) processor makes a jim dandy building block to build a parallel machine out of, but that building a purely parallel machine with only SISD sub-processors is not optimal.

  Hugh LaMaster, m/s 233-9,   UUCP ames!lamaster
  NASA Ames Research Center   ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     Phone: (415)694-6117
lm@snafu.Sun.COM (Larry McVoy) (02/28/89)
In article <675@m3.mfci.UUCP> rodman@mfci.UUCP (Paul Rodman) writes:
$ I'm sure you've read Olaf Ludbeck paper
$ from Los Alamos, haven't you? What does that paper tell *you*? It tells
$ me that even if the vector machines at Los Alamos had *infinite* speed
$ vector units the speedups are dismal. In a *decade*+ of programming vector
$ machines, some of the best scientists in the world haven't improved the
$ percent vectorization.
I really don't have a lot to add to this other than to repeat it. It
seems that a lot of people out there think VECTOR is the word of God. I
remember this paper and I remember various discussions over coffee when I
worked at ETA. The conclusion was then and is now "if you can't scream on
non-vectorizable, integer code, forget it" (ETA can forget it). You want
a fast Unix box? Get an Amdahl - 30 MIPS and the I/O to go with it. You
want to build your own? Concentrate on I/O and integer performance.
That's your bread and butter.
Larry McVoy, Lachman Associates. ...!sun!lm or lm@sun.com