dan@ccnysci.UUCP (Dan Schlitt) (03/03/89)
At long last I have some time to sit down and summarize the responses that I got from my request for information about minisupercomputers. A week at Usenix can really disrupt things. I want to thank the people who responded. The information was very helpful to me. It was also voluminous, so I am going to edit it severely and rearrange it some. It is still very long. Some of the replies were also pretty candid, so I will not identify the comments by author.

Thanks to those who replied:

dean@violet.Berkeley.EDU (Dean Pentcheff)
sun!pyramid.pyramid.com!csg (Carl S. Gutekunst)
peregrine.COM!chris@uunet (Chris Cole)
kurt@pprg.unm.edu (Kurt Zeilenga [LANL])
m4@ux.acss.UMN.EDU
Timothy R. Giebelhaus <cmcl2!uunet!hi-csc!giebelhaus>
Michael Nittmann <ZREN@DS0RUS1I>
cmcl2!amax.npac.syr.edu!vicki (Vicki A. Haenel)
Charles E Nove <cnove@unix.ETA.COM>
Jean Marie Diaz <oracle!jdiaz@uunet>
cucard!uiucuxc!uicsrd.csrd.uiuc.edu!kai (Kuck And Associates)
geaclib!daveb@uunet (David Collier-Brown)
apollo!jps@EDDIE.MIT.EDU (Jeffrey P. Snover)
sun!convex!c1apple!sawka (Walter Sawka)
uunet!seismo.CSS.GOV!sundc!sun!khb (Keith Bierman - Sun Tactical Engineering)
mike@degobah.nmt.edu (Mike Ames)
sun!convex!williams (Bradley Williams)
Michael J. Tighe <cmcl2!uunet!super!mjt>
Ron Johnson <cmcl2!lanl!hc!arizona.edu!catuc!somewhere!ron>
seibel@zeno (George Seibel)
wolters@erim.org (Bill Wolters)
davidsen@steinmetz (William E. Davidsen Jr)
cmcl2!uunet!seanf (Sean Fagan)
RICHARD KERSHENBAUM <uunet!kuhub.cc.ukans.edu!RICHARD>
uunet!ucsd!celerity!whoops!dave (David L. Smith)
sns@DEImos.Caltech.Edu (Sam Southard)
Chuck@UNLCDC3

Now you see why I need to edit it. I also got *LOTS* of calls from salescritters. Judging from the persistence, this is a tough market these days.

You will recall that I asked:

>We are not looking for a machine to take over our load of general purpose
>computing (e.g., replace our vax) nor are we looking for a workstation
>(graphic or otherwise).
>What we need is something that will fly when doing
>numerical computation and simulations etc.
>We would strongly prefer something with a 4.3BSD Unix-like operating
>system. It must have a good optimizing fortran compiler.
>To make things even more definite, suppose the total system cost must
>be in the range of $500,000 to $1,000,000. The hardware should not be
>at the top of its migration path -- we should be able to add
>processors, memory, disk, etc. to make it bigger and better in the
>future.
>Now the added questions. What about Multiflow? Any experience with
>them? And how about the hypercube designs? How do they compare with
>the more straightforward multiprocessor machines?

Here is the summary. It represents the OPINION of the person who sent it to me. If I messed it up by editing then I take the responsibility for that. But PLEASE, if you think that the computer you sell got a bum rap, DON'T call me. I will check stuff that is important to me.

I will start with systems that seem to fill the request. Comments about ones that seem not to be in that class will be toward the end.

"""
That leaves Alliant, Amdahl, Convex, Cray, Elxsi, and Gould. Of course, Cray and Elxsi and maybe Amdahl are out of your price range. There's also the new class of minisuper startup companies, but most of those are evaporating as fast as they're appearing. For example, Saxpy.
"""
If you want a solid one, check out Convex. (Software and hardware!)
"""
Given the Fortran compiler, check out the Convex.
"""
[....] I know some people who use the Convex C series and like it for fortran number crunching. It runs unix and you could get a system with plenty of room to expand into for about $600k. The compilers do have parallel processing ability also. I can't quote prices, but your range seems reasonable.
"""
I strongly recommend the Convex family of computers. I regularly use both Cray and Convex. I find the Convex compilers to be excellent, while the Cray compilers are junk.
In fact, I have many programs that run faster on a Convex than on anything Cray offers.
"""
We chose Convex after quite a bit of evaluation. It does very well for vector work, quite well for scalar work.
"""
A little over a year ago, the Caltech Astronomy Department purchased a Convex C1-XP minisuper to take over their main crunch jobs (primarily image processing in fortran). It has a BSD operating system (primarily 4.2 with some 4.3, I forget which parts). The fortran compiler has two levels of optimization - scalar and vector. We have encountered few bugs in the compiler, usually of the sort that causes the compiler to abort and can be worked around by compiling with scalar optimization only. This is unfortunate, but these bugs have not occurred in any tight loops so far (with HEAVY use). We have had only one bug that actually caused incorrect program behavior, and that was when the programmer was freely going between logicals and integers, which is pushing it if you ask me.

Convex supplies an easy way to report any bugs that do occur (it is designed to be used with a uucp connection). Once the bug report is received by Convex an automatic reply is generated. Convex also informs you of the progress of the bug report. Usually the bug is fixed in the next release.

The C1-XP is at the bottom of their migration path, and falls at the bottom of your price range (or it was when we purchased it). You can either add processors or go to the C2. Unfortunately, I don't think they make C1s any more. Convex claims that the peripherals from a C1 can simply be attached to a C2, but we cannot speak from experience on that point. All in all, we are very pleased with the system. When running in scalar mode it is not as outstanding as we hoped, but with vectorized code it really flies.
"""

>dws> And for all you folk who must deal with VAX/VMS...

"""
The largest problem we had was due to most (all?) of our users being familiar only with VAX/VMS.
However, this was eased by Convex as much as possible: they have a large (and increasing) number of "COVUE" products, which emulate VMS. We use COVUEnet (Decnet, including set host and file transfer), COVUEedt, and the COVUEshell (DCL). We have found a few bugs in them; COVUEnet presently has a very small (~40) limit on the number of nodes it can know about at any one time. This, however, is soon to be fixed.

The other large problem we had was in code conversion. Any code you have for a VMS system that relies on VMS will require a significant effort to port (as I'm sure you would expect). The large problem areas were byte & word swapping for binary floating point numbers and VMS-specific calls (QIOs and such).
"""
It sounds like your requirements are almost identical to ours. We're currently leaning to either Elxsi or Convex. Convex is more stable financially, and is very well established in certain markets (notably medicinal chemistry). Elxsi, though, is hungrier and is willing to write off 50% of list price. Their system is also expandable to 10 cpus. They haven't yet shipped their new 6460 processor, though.
"""
I would suggest an Elxsi. Very nice computer. Very fast computer (about 80% as fast as a Cyber 170/760, running real programs with real I/O). It has good Pascal and Fortran compilers, with good libraries and good optimizations. It runs an OS called 'Embos,' which I think is *very* nice (better than Unix), but it also comes with both SysV and BSD 4.2 (although they supposedly have 4.3 already; you might be able to get that instead). I don't know if their Fortran compiler runs under BSD, or just under Embos. If the latter, it probably won't take you long to get used to Embos (\ instead of .., [command] instead of `command`, and the commands are different. Other than that, it looks like unix [I know, I know. Sounds strange, right? Well, you can create aliases that have the same names as unix programs, and you will feel right at home]).
CalState has an Elxsi with 2 cpus: a 6410 (their first, "slow" processor) and a 6420 (their second, "faster" processor). The machine can have up to something like 8 or 10 processors (I forget which), and up to either 2 or 4 IO processors (again, I forget which). They have also announced another processor, the 6460; although they don't have 'em available right now, if you buy one they will give you 3 6420's (so I guess they feel that's the relative speed). The OS can automatically take advantage of multiple processors, and your programs can too, if you use the libraries. You can also restrict certain people/processes to certain processors/processor classes, if you wish.
"""
I would suggest looking into Alliant computers. We are very happy with them. For one, they are quite fast single threaded. This is VERY important to get good speedups. Its parallel compiler gives you decent speedups without any effort on the part of the programmer (especially useful if you're in a non-CS shop).
"""
My PERSONAL knowledge and experiences are: best supermini fortran is in Alliant's products. Vectorizing and and and...
"""
Given what little I know about your environment, and what you need, I'd have to recommend the Alliant. [They have an FX/80.] It has a terrific optimizing fortran compiler... you can also add hand optimizations. The compiler tells you when it can't optimize a loop, so you know where to fine-tune your programs. It has really great messages to aid in optimization. It's a very stable machine... I'm the systems administrator for it and the multimaxes, and the hardware rarely gives me any problems. The OS is called Concentrix, based on 4.2 with many 4.3 extensions... it has NFS and some 4.3 networking. The company has recently bought Raster, and seems to be leaning a little towards graphics, but I'm not really sure how far that will go. There are several third-party products that run on the Alliant, like eispack and linpack.
Don't quote me on this, but I believe a fully configured fx/80 benchmarks at about 188 whetstones. It has a parallel Ada compiler, they're working on an optimizing C compiler, it has X, NeWS, etc. It's a pretty robust environment.
"""
Sounds like you are describing the Alliant FX series. We have an FX/8, and it is a really quick number cruncher. There are three kinds of processor resources on the FX/8 (like ours): the IPs (interactive processors), the CEs (computational elements), and the "complex", which is a group of the CEs working together in parallel. You can configure the complex however you want. Memory comes in 8 or 32 Mb chunks up to 1 Gb. Their fortran compiler has a good vectorizer.

The IPs are 68020 processors that are used to execute system daemons and interactive work (when the CEs are busy, that is), and to handle all i/o to external devices. You can have from 1 up to 24 of these guys (we have 3). The CEs are the big guns, Alliant's own design (instructions are a superset of the 68020's), and you can have from 1 to 16 of them working in parallel on one program, or on separate programs, or on a mix controlled by scheduling parameters you decide. On our machine, we have all 8 of our CEs connected to the complex, but because we have to support more interactive and single threaded programs than parallel programs, we give a 33/66 split to complex/single threaded jobs. Sorry if I'm confusing you about this, but it should be enough to know that you have control over the hardware configuration and how programs are scheduled on it.

The operating system is Concentrix, a concurrentized UNIX *4.2* BSD. They haven't (as yet) come out with a 4.3 level system, but I do know that the next major release is due out very shortly, and I am hopeful. An approximate price for a basic system configuration (disk, tape, OS) with 2 IPs and 1 CE is (I think) around $250,000. A full blown FX/8 with 1 Gb memory, 16 CEs and 24 IPs will probably run somewhere around $2.5 M.
We got ours in Sept 1986, and while I will admit they had some software problems, they have all been worked out now. Cleanliness of electricity was a problem for us (not Alliant's fault), but now we have a power conditioner for our computer room, and the machine runs fine and stays up. Their monthly maintenance costs are outrageous.
"""
But first, do you insist on spending $500K? If you have vectorized, contiguous memory sorts of problems, consider spending about 150K on a Stellar today, and see how you do with 4 heads of that. In my hands, one head is near a Convex C1, which is to say approaching a tenth of an XMP4/8. (That's my code though; your mileage *will* differ.) And it has *four* heads! (Some degradation in performance on multiple jobs; you won't ever see 4x single head performance unless code is very scalar, in which case it's slow anyway.)

From my vantage point, anything over about 250K is a waste of money. This class of machine is improving too fast. I've been in this field long enough to see a lot of dollars go down the drain. I've used Crays, FPS's, Convexes, Vaxen, Irises, Stellars, Cybers, Suns, PC's... quite a bunch of hardware. Much of my work is heavy duty simulation - Molecular Dynamics mostly. The best clue I can offer is this: multiple processors are the way to go. We have a 4 headed Stellar here now, and we love it. The reason is that I always have my own processor. I don't make value judgements about other people's work, because they don't impact me. I don't feel bad about abusing one cpu, because there are 3 others. The nice thing about multiple smaller heads is that you have a simple upgrade path - buy another box.
"""
I do not have any first hand experience with any minisupercomputers (alas), but an article in the January 1989 issue of IEEE's Computer magazine on the Cydra 5 minisupercomputer sure made it sound good. Rather than restate the entire article, which is definitely worth reading, I'll try to summarize some of the high points.
- A 68020 based, integrated front end running Cydrix, Cydrome's Unix V implementation.
- 15.4 Mflops sustained on Linpack
- 5.8 Mflops sustained on Livermore Fortran Kernels
- An entry level unit price of around $500,000.
"""

>dws> A couple of systems have strong partisans. (Who happen to work for the companies. But the info may be useful.)

"""
Why bother with a minisuper when you can get a supercomputer for under $1M? In my humble :-) opinion, I'd suggest looking at an ETA10-P. The ETA10 line uses the same CPU board from the lowest speed air-cooled single processor configuration to the highest speed liquid nitrogen-cooled eight processor system (a 27:1 performance range -- now *that's* a migration path!). Available with System V (including bsd4.3 networking extensions and SUN ONC/NFS), FORTRAN, and a FORTRAN vectorizer.
"""
Well, if you don't want an unbiased opinion, I can tell you to get an Apollo DSP10000 with no reservations. It only goes to show how poor Apollo's marketing is that you did not cross post to comp.sys.apollo. The DN10000 has a model which will do graphics very nicely, but I think you will find the DSP10000 to be a great number cruncher (the DSP model does not even have a display device of any type). Don't stop considering the DSP10000's just because they don't cost much. I don't know enough about your application to say they are what you need either. If you are going to have 20 people all doing major number crunching, it would require several DSP10000's. They do network quite nicely so it really wouldn't be much of a problem.
"""
Without a doubt, check out Apollo's DN10000! It is a truly wonderful machine. It runs bsd4.3 and sysV Unix at the same time. I must admit, when I heard this I thought "sure, it probably screwed up both environments" but they (we) didn't. It really works. One other thing I wanted to mention - the DN10000 is 100.0% source code/binary data compatible with our 68k-based workstations.
That means that people can develop/run/debug/repeat code on cheap (~5k) private workstations and then run the "mega-run" on the DN10000.
"""
I feel that the FPS Model 500 is a pretty good machine. The Unix port is solid, mostly 4.3ish with some Sys Visms, etc. Up to four processors, mix and match scalar and vector. (I think you have to have a scalar for each vector, but not vice versa.) Prices start around $250,000, ranging to $1M, more or less. The machine is very similar, architecture- and software-wise, to the Celerity 12xx series, but a lot faster, so if you know anyone with a 1200, you can get a feel for what the 500 is like.
"""

>dws> There isn't much info in the message but the machine may fit.

"""
I was in a similar situation a few years ago (expected to know something about superminis for fast numerics and simulations). To make a long story short, we ended up with a Data General MV/10000.
"""

>dws> I didn't have anyone mention this machine, but the salescritter showed up and it sounds of interest.

>dws> The machine is the ZS-2 and is manufactured by Astronautics Corporation of America of Madison, Wisc. ACA is a 30 year old privately held company that sells instruments, communication and navigational systems to the US Gov. This is their first general purpose computer. They claim an $800,000 two processor system that does 45 MFLOPS. It uses PP's a la CDC to handle the I/O. Their main claim to fame is a "decoupled" architecture and a high degree of pipelining for their main processor, plus a good floating point unit. The operating system is 4.3BSD with NFS. If the truth were told, they are really only in beta test with the system. But now you know they exist.

>dws> Now for systems that don't seem to fit the requirements. The next person has rather definite opinions, bluntly stated.

"""
Well, you can certainly rule out anything made by Sequent or Encore; these are just multiuser engines with the floating point capability of a Compaq.
The DEC 8700 is sort of the industry reference for minicomputer floating point, but that's still two orders of magnitude below a Cray; hardly "super." So you can rule out DEC, too. You can also rule out Arix, CCI, NCR, Sperry, and Pyramid. The CCI Power 6/32 is a little better than a VAX 8700, and the Pyramid is a little worse.

You know, this sounds funny, but you should also consider the MIPS boxes. I suppose these would be classified as workstations, but they have floating-point capacity that takes a VAX 8700 to the cleaners, and solidly beats the Alliant on non-parallelizable problems.
"""

>dws> And some agree and others disagree.

"""
The Sequent Symmetry is a multi-processor with up to 30 Intel 80386 processors. It is much more like a general purpose computer than a supercomputer, unless your problem can be made parallel. It supports parallel symbolic debugging in C. It is very reliable and supports Unix 4.2+ BSD. It will be moving to System V.3 with Berkeley enhancements.
"""
(Sequent would be my other suggestion, and they have a better Fortran compiler on the way, but it ain't here till it's here....)
"""
On the Encores and Sequents, you get killed when you're not running with all CPUs, and since the time running with all cpus is short, the average speedups are not at all what you would expect from an N computer system.
"""
I have an Encore. The software stinks. Bugs with every compiler, etc. [P.S. We've had it 2 1/2 years, and it has gotten leaps and bounds better; it just has never gotten anywhere near "perfect" -- you know, one serious bug per month...]
"""
For problems lending themselves to concurrent processing look at the Encore. We have a couple for file servers, and every once in a while I run something on them... FAST!! We haven't looked at their vector processing units, so I can't say.
"""

>dws> And now for hypercubes and VLIW computers.

"""
Hypercubes are fun.
I worked with the guys at JPL who built the first cubes (it was a joint Caltech/JPL project), and they are still building new ones (the first was 8086/7 based, as was the second (which Intel turned into a 286/7 based commercial product), and the third was (is) 68020 based). Fox, Lyzenga et al. have a fine book on the algorithmic considerations, "Solving Problems on Concurrent Processors"; so far only vol. I is out.

Although Fox strongly disagrees, it is the general consensus that most scientific computing is not yet ready (or vice versa) for cubes. Programming them is non-trivial (though much, much easier than it was), and I/O bandwidth is usually very poor.

Commercial cubes come from Intel, Ncube, Ametek and, sort of, the Connection Machine and the upcoming machine from MASSPAR. Intel is the Caltech/JPL 2nd generation cube, with Intel advances. The chief algorithm expert gave up; he is now at Ardent (Cleve Moler). NCUBE is an Intel breakaway. It seemed like they were pretty smart, and were aware of the Intel problems, but this was some years back, and I've had no contact with the actual product. Ametek is a clone of the Caltech/JPL third generation machine, with commercial enhancements. The Connection Machine is much more expensive, and is based on very primitive nodes (JPL believes in big nodes). MASSPAR will be a CM cheapie, with improvements.

If your primary interest is algorithm research, and your I/O demands are minimal, all should be considered. If you are more interested in solving actual problems, a cube is probably not your best bet for a while (although with enough gradual students the difficulties are easily overcome ... this is the Fox approach).
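[For readers unfamiliar with the topology these machines share: in a d-dimensional hypercube, the 2^d nodes are labeled with d-bit integers and two nodes are wired together exactly when their labels differ in one bit, so any message needs at most d hops. A minimal sketch of the labeling -- a generic illustration, not code for any of the machines above:]

```python
def hypercube_neighbors(node, dim):
    """Labels of the nodes adjacent to `node` in a dim-dimensional hypercube.

    Nodes are numbered 0 .. 2**dim - 1; flipping any single bit of a
    label gives the label of a directly connected neighbor.
    """
    return [node ^ (1 << bit) for bit in range(dim)]

# A 3-cube has 8 nodes; node 0 (binary 000) is wired to 001, 010, 100.
print(hypercube_neighbors(0, 3))   # [1, 2, 4]
# Routing from node a to node b takes one hop per bit in which a and b differ.
```

[Each node having only d wires is why node count scales to thousands while the wiring stays manageable, and also why I/O bandwidth to the outside world is the usual bottleneck.]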
The MF 28 functional unit machine, or a 2x clock rate machine, should be forthcoming shortly (based on my projections, not any inside info!). A good man to talk with there is Chani Pangali (mfci!pangali@uunet.UU.NET). He used to be with Amdahl Fujitsu, is a chemist of some repute, and has worked with various array processors and vector machines.

The sun4 binary compatible supercomputer should cost about $1mil and should be in beta in December. For more info about that, check with primsa!kolstad@uunet.UU.NET. Happy Hunting!
"""
Wellll... it depends on what you want to run. Is your application load highly vectorized, or is it pretty scalar? Multiflow is interesting for scalar problems.
"""
We have been using a Multiflow TRACE 7/200 for about 10 months now. This is Multiflow's entry level system; it supports up to 7 operations in one instruction using a VLIW architecture. (To be more specific, it is a SIMD architecture.) It is field expandable to 28 operations per instruction by adding six more processor cards (they come in pairs -- an integer processor and a floating point processor, for a total of 7 operations per board pair), but that requires a recompilation if you want to take advantage of it, and your application must exhibit sufficient fine grained parallelism (vector operations are also examples of fine grained parallelism).

As you may have guessed, our interests were also in finding a moderate cost (VAX price range) computer that could address computationally intense scientific and engineering problems in signal and image processing, simulation and other areas. I am very new to UNIX (mostly VAX/VMS background), but I am told Multiflow is very vanilla UNIX 4.3BSD. As an example, we have had no problem porting numerous quasi-public domain packages over to the TRACE. Basically, you can use a TRACE like any general purpose UNIX machine. You need not be aware of the underlying VLIW architecture.
Just write your code as usual (the TRACE will reward you for well structured code). Their FORTRAN and C compilers seem to do a super job of optimization, but they do take a little longer to compile. The compilers also seem reliable. We sometimes do our development on VAX/VMS or an IBM PC to take advantage of more user-friendly and familiar program development environments and faster compile times, then upload code after it's debugged. This arrangement works pretty well. One advantage here is portability to other machines to achieve varying price/performance objectives. This is wonderful if you've ever had the "pleasure" of working with array processors or other distributed processing environments, or on proprietary hardware and software platforms. I believe in letting the computer industry work for me, and I like to preserve portability and freedom of choice to capitalize on new technology without rewriting software. Whether this makes sense for you may depend on the amount of production processing you anticipate versus new software development and limited runs.

The TRACE is field upgradeable to a model 28/200 and (I believe) can go to 512Mb. We have (2) CDC 1.2Gbyte disks for mass storage, a 9-track tape transport, a 2.2 Gb Exabyte cartridge tape, and Ethernet. There is room for more controllers on the VME bus used for peripherals. If fast disk I/O is a concern, you can also consider disk striping. I believe a second VME bus can also be added. The whole system only takes two bays, and the CPU chassis is essentially empty (room for more memory and processing cards). They are also offering high density 8" Winchesters. Keep your eyes on the trade rags in the next few months also.

Now, for the question that everyone asks: How fast is it? The answer is: it all depends. I didn't buy it only for performance, but we have experienced speeds on our 7/200 comparable to about 35 VAX Mips (about six times faster than an 8550) for some applications.
We have also achieved performance on one application comparable to a 30 MFLOP array processor. The application and compiler technology account for some of the difference. We have also seen applications that run at only about 8 VAX MIPS. It depends strongly on how much fine grained parallelism you have in your application, but we never concern ourselves with it, except for some production processing. Then you just run the profiler, find out where you are spending your time, and use some of the techniques that are available, such as code inlining (eliminates procedure call overhead and gives the compiler more opportunities to move code around), more aggressive loop unrolling, etc. Normally the default compiler switches are quite adequate, and we hardly ever have to play with the source code.

You also inquired about hypercube designs, of which there are numerous products. Again, whether these are good for you depends on what concerns you want to address. In general, I believe hypercubes are best suited for production applications that exhibit a high degree of coarse grained parallelism, and I think in most cases YOU must write your code to take advantage of it. There are some low cost hypercube-like systems, some even offering software controlled crossbar switches for node interconnect reconfigurability. There may also be issues with memory capacity per node, internode communications bandwidth, multi-user capabilities, program development environment, and so forth.

For other reasons, I was also not really interested in vector oriented machines. As I mentioned, we've been using vector processors (well, array processors) for some time. The break even point for scalar vs vector processing may be very high. And the run time may still be dominated by the scalar code (Amdahl's law).
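[Amdahl's law, mentioned above, is worth spelling out: if a fraction p of the run time benefits from a speedup of s (a vector unit, extra processors), the overall speedup is 1/((1-p) + p/s), so the leftover scalar fraction caps the gain no matter how fast the vector hardware is. A quick sketch with made-up fractions:]

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run time is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# With 80% of the time vectorizable, a 10x vector unit yields only ~3.6x,
# and even an infinitely fast one cannot beat the 1/(1 - 0.8) = 5x ceiling.
print(amdahl_speedup(0.8, 10))    # ~3.57
print(amdahl_speedup(0.8, 1e9))   # approaches 5.0
```

[This is why the "break even point" above can be so high: the scalar 20% comes to dominate long before the vector hardware runs out of steam.]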
Multi-processor systems might be OK if you simply want to support a large user community, but they don't do much when you want to focus your computational horsepower on a specific problem, even though most new multi-processors have threaded operating systems that can run in parallel on any available processor. Time to solution is still important, though, as well as total computational capacity.
"""

>dws> Finally, something that seems like fine general advice.

"""
And some very personal advice: only buy where you can get the SW you need NOW, where it exists NOW and is NOW in use (references). There are many people out there who will tell you what you will get when the machine is finally delivered up to its specs, but who regrettably cannot show it NOW. Only buy where you and your scientists can see NOW what software they will get. Don't trust promises like "we shall port this for you", "the actual system shall do this right", "the machine you will get will come up to these specifications, just this model cannot". A machine bought cheaply mostly becomes the most expensive device of a lab or computer center. The most important things are not cycles per second but performant software, perfect networking integration and accessibility, good tools for programmers and system administration, and very good customer support. Because: BAD SOFTWARE USES UP GOOD CYCLES and drives even a fast machine down to snail's performance.
"""

>dws> I hope that you find this summary useful.

--
Dan Schlitt                                    dan@ccnysci
Manager, Science Division Computer Facility    dan@ccnysci.bitnet
City College of New York                       (212)690-6868
New York, NY 10031