KROWITZ@MIT-MARIE.ARPA (01/16/86)
Our lab here at MIT does some fairly long calculations (e.g. 7 days of CPU time on a DN660), so early last spring and summer we went through the exercise of looking at array processors (e.g. Numerix 432, CSPI 6420, FPS 164) and fast standalone computers (e.g. VAX 8600, Convex C-1). We had a number of constraints on our decision:

1) We needed fast performance on double precision (64 bit) floating point arithmetic. Some of our calculations are sensitive to round off errors.
2) The machine had to be able to co-exist with our Apollo workstations. The interactive graphics of the Apollos is a feature we are not about to give up. The machine either had to be attached to an Apollo node (i.e. an attached array processor) or had to have ethernet access.
3) The performance advantage had to be at least a factor of 5 over the DN660, otherwise it wasn't worth the trouble.
4) We only had $150,000 to start with. The minimum machine configuration, including software, had to be something which could be bought on the budget of a single professor. A machine which offered a growth path to higher performance levels (above 5 X DN660) was desirable.

The Scripps Institution of Oceanography in La Jolla, CA has an Apollo ring similar to ours here at MIT and does similar work with it (in fact, the professor I work for and most of our post docs came from Scripps). They have been trying for a while to attach a Numerix 432 array processor to one of their nodes, and have run into the classical limitations of an AP:

1) There is a time delay while your data is converted to the internal floating point format of the AP and is loaded into the AP's memory.
2) The AP company has to develop special hardware and software to link their AP to the Apollo hardware and software. Most AP companies make their machines to hook onto VAX 11/780s, not Apollos.
3) APs are good at doing vector and matrix arithmetic and poor at doing I/O and scalar code. Our programs have a mix of scalar and vector code.
If an AP has an infinitely fast vector unit but only a 1 MIPS scalar unit, and if your program is 80% vector code (a very high fraction), you will get only a 5 to 1 speed up at best (assuming that the DN660 is roughly 1 MIPS), because the remaining 20% of scalar code still runs at the old speed.

Scripps had problems getting the Numerix machine to work reliably with the Apollo (apparently software interface problems), and it is only a 32 bit floating point unit, which is much slower when doing 64 bit arithmetic, so we wrote Numerix off.

We saw a presentation by CSPI on their 6420, which claimed a 5 MFLOP peak rate and which was not a vector machine. In addition, it had its own Fortran compiler which could run any Fortran program which did not include I/O. It still had the limitations of having to send your data to the AP memory, convert it into the AP format, run the program, and then reconvert the data and load it back into the host. Unfortunately, CSPI did not have the resources to build an interface for the Apollo. They could sell us a MicroVAX with the 6420 attached to it and an ethernet interface to the Apollo. This would require us to use 3 operating systems (the Apollo, VAX/VMS, and the AP's system). They did do 64 bit arithmetic and were only about $120,000, though.

The minimum FPS 164 configuration was more than $250,000 at the time we looked at it, so that was out of the question.

The Alliant Computer Systems Corp. let us get a look at their machines prior to their product announcement and were also more than willing to let us run benchmarks on their development machines (something which CSPI in particular discouraged us from doing). They have two machines: the FX/1, which costs about $130,000 in its minimum configuration with software, and the FX/8, which runs closer to $250,000 in its minimum configuration. Both machines use the same basic hardware modules and are object code compatible.
The FX/8 is an upgradable system, whereas the FX/1 has a much smaller cabinet and has no space for an upgrade.

The FX machines have two sets of processors, interactive processors (IPs) and computational elements (CEs), which share a global memory and cache system. The IPs handle multiple users doing I/O, editing jobs, the Unix kernel, compiling and the like. Jobs from the timesharing queue are scheduled for the first available IP (i.e. multiprocessing of independent user jobs and system processes). The IPs are Motorola 68012 based processors and each IP can have its own I/O bus, so you can spread out the disk controllers and terminal controllers onto separate IPs to avoid bottlenecks in I/O.
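
The 5 to 1 ceiling quoted earlier for an attached processor is an instance of what is usually called Amdahl's law. A minimal sketch of that arithmetic follows, assuming the 80% figure is a fraction of run time on the host; the function name `amdahl_speedup` and its parameters are made up for illustration and do not come from any vendor's software.

```python
def amdahl_speedup(vector_fraction, vector_speedup=float("inf")):
    """Overall speedup when only the vector part of a program is accelerated.

    vector_fraction: share of the original run time spent in vector code (0..1).
    vector_speedup:  how much faster the vector unit runs that share
                     (infinity models the "infinitely fast" AP in the text).
    The scalar share is assumed to run at the same speed as before.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0:
        raise ValueError("vector_speedup must be positive")

    scalar_fraction = 1.0 - vector_fraction
    # New run time, as a fraction of the old one: the untouched scalar
    # part plus the vector part shrunk by the vector speedup.
    remaining = scalar_fraction + vector_fraction / vector_speedup
    if remaining == 0.0:
        return float("inf")
    return 1.0 / remaining


if __name__ == "__main__":
    # 80% vector code on an infinitely fast vector unit: about 5x overall.
    print(round(amdahl_speedup(0.80), 2))
    # The same program on a (hypothetical) vector unit that is 10x faster.
    print(round(amdahl_speedup(0.80, 10.0), 2))
```

The point of the sketch is that the scalar share sets the limit: with 20% of the run time left untouched, no vector unit, however fast, can do better than about 5 X the host.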
DIEGERT@SANDIA-CAD.ARPA (Carl Diegert) (01/16/86)
Mail-From: DIEGERT created at 16-Jan-86 09:31:52
Date: Thu 16 Jan 86 09:31:52-MST
From: Carl Diegert <DIEGERT@SANDIA-CAD.ARPA>
Subject: Re: Alliant FX/1 info request
To: DIEGERT@SANDIA-CAD.ARPA
In-Reply-To: Message from "KROWITZ@MIT-MARIE.ARPA" of Wed 15 Jan 86 17:58:13-MST

Sandia National Labs has 64 bit floating point requirements similar to yours for our semiconductor device modeling and circuit simulation needs. Alliant and FPS were players in a competitive procurement that Alliant has now won with an FX/8 with three computational elements. Performance on our benchmarks was impressive, but we will only really know once our machine gets here. The FX/8 should be on our Apollo ring by the end of March. We will be happy to share our experiences.