mo@seismo.CSS.GOV (Mike O'Dell) (07/24/87)
In a former lifetime I supercomputed to keep myself in beans, and I remember two things people might find interesting.

Based on the parallelizing compiler work done by MCC (Massachusetts Computer Consultants) on IVTRAN for the Illiac IV, John Levesque and friends at RDA built "RDALIB" for the CDC 7600. This library basically provided many of the facilities now in the CRAY machines, but in some ways it was more interesting. They experimented with the automatic parallelizer as a way to target RDALIB, but the best results were obtained after further hand-tuning. Anyway, the secret of RDALIB was the "instruction stack" - what we would now call a "prefetch cache" implemented in very fast memory. If you could contain the loop in the instruction stack, the ol' 7600 could really get with it - memory hits for instruction fetches slowed it down by more than a factor of two. Between the incredibly tense code which kept the RDALIB primitives in the i-stack and the work on the codes, the ol' 7600 was an unbelievably fast machine, considering it was designed in the late-middle 1960's. If I remember right (considerable fog...) it got well into the 40 megaflops sustained, measured over 8 hours clock time, the usual stint when you had "block time" on such machines. And because of tricks like BUFFERIN and BUFFEROUT, clock time essentially equalled CPU time. In fact, when the CRAY-1 was first introduced, it took considerable work to realize its potential, and the 7600 did not abdicate its megaflop crown readily.

Another vector machine which, to quote the designer of Star Trek's M5, "was not completely successful," was the TI-ASC. It was a multipipeline beast - one to four pipes. The machine was largely compatible with the System/360 instruction set except for the vector pipe additions. In fact, they committed the unpardonable sin of reinventing OS/360 for the ASC, with only slightly different JCL syntax. UGH!!!!
Anyway, the problem with the machine, and with most of the pre-CRAY vector machines (the STAR-100 in particular, and somewhat remaining in the Cyber 205), was the pipeline startup overhead. If the vectors weren't about 100 elements long (the break-even number varied between 50 and 100 depending on what you were doing in the pipe), starting the pipeline actually SLOWED DOWN THE PROGRAM!!! The vectorizer was again based on the MCC IVTRAN work and was quite good at vectorizing DO loops, but because it usually pessimized the code, the code had to be liberally laced with $NOVECTORIZE directives. Finally they added a flag to make vectorizing default to OFF, and then you only needed to put in a few $VECTORIZE directives. Anyway, the machine never achieved anything like its advertised speeds. There may still be one ASC running, but there never were very many built.

Yours for faster machines,
-Mike O'Dell
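[The startup-overhead arithmetic above can be sketched with a toy timing model. The cycle counts below are illustrative assumptions, not measured ASC or STAR-100 figures: a vector operation of length n is modeled as s + n*c cycles (s = pipeline startup, c = cycles per element once the pipe is full), against n*t cycles for the scalar loop.]

```python
# Toy model of why short vectors hurt on pipelined vector machines.
# All costs are assumptions for illustration only.

def vector_cycles(n, startup=100, per_elem=1):
    # Pipelined vector op: pay the startup once, then one result per cycle.
    return startup + n * per_elem

def scalar_cycles(n, per_elem=2):
    # Plain scalar loop: no startup, but more cycles per element.
    return n * per_elem

def break_even(startup=100, vec_per_elem=1, scalar_per_elem=2):
    # Smallest vector length where the pipe wins: s + n*v < n*t.
    n = 1
    while vector_cycles(n, startup, vec_per_elem) >= scalar_cycles(n, scalar_per_elem):
        n += 1
    return n

print(break_even())  # -> 101 with these assumed costs
```

[With a 100-cycle startup and a 2:1 per-element advantage, the pipe only pays off past ~100 elements; below that, "vectorizing" the loop really does slow the program down, which is exactly why the $NOVECTORIZE directives were needed.]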
lamaster@pioneer.arpa (Hugh LaMaster) (07/27/87)
In article <44042@beno.seismo.CSS.GOV> mo@seismo.CSS.GOV (Mike O'Dell) writes:

(Discussion of CDC 7600 deleted):
>late-middle 1960's. If I remember right (considerable
>fog...) it got well into the 40 megaflops sustained,
>measured over 8 hours clock time, the usual stint

You are right that the 7600 was the fastest for a long time. But not 40 MFLOPS. Even with perfect overlap on a "vector problem" the maximum rate was about 6 MFLOPS. The fastest sustained rate on an appropriate problem was between 3 and 4 MFLOPS.

Hugh LaMaster, m/s 233-9
NASA Ames Research Center
Moffett Field, CA 94035
UUCP:  {seismo,topaz,lll-crg,ucbvax}!ames!pioneer!lamaster
ARPA:  lamaster@ames-pioneer.arpa
ARPA:  lamaster@pioneer.arc.nasa.gov
Phone: (415)694-6117
"IBM will have it soon" (Disclaimer: All opinions solely the author's responsibility)
root@mfci.UUCP (SuperUser) (03/30/88)
In article <6630@ames.arpa> lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:

....[Fortran vs. C stuff deleted]....
>
>Again, the payoff on a vector machine can be much more than a factor of
>two. True, it doesn't matter how fast most code runs. But it does
>matter how fast certain very important (to you, the user) codes run.
>And now that vector machines are reaching the market of below $100 K
>machines, you will have an opportunity to see which codes will run
>much faster. You may be pleasantly surprised.

Or then again, if what you bought was a vector machine, you may come to agree with Olaf Lubeck of Los Alamos. In his recent tech report "Supercomputer Performance: The Theory, Practice, and Results," LA-11204-MS, he concludes with the following:

"Today both laboratories [Los Alamos and LLLabs] are running vector processors at 70 percent average vectorization levels. Because of Amdahl's law, the overall performance difference (at this vectorization level) between a low-speed vector unit and one an order of magnitude faster is negligible. We must have vectorization levels at the 95% level to see a significant difference, and after ten years of vector experience, this level has simply not been attained.... The fact is that it is difficult to organize computations so that *exactly* the same instruction operates on multiple elements of a data set..."

"Faced with this sobering assessment of where we stand after 10 years of vectorization, I begin to wonder whether vectorization has a substantial place in future supercomputer architectures. The major reason that these architectures prosper today is that they have a significant scalar speed advantage over contemporary mainframes. I believe that future supercomputers are not necessarily bound to vector processing...."

You don't need a vector machine to speed up vectorizable code.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090
<my opinions only>
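[The Amdahl's-law arithmetic behind Lubeck's point is easy to check. With vectorized fraction f and a vector unit running S times the scalar rate, the overall speedup over pure scalar execution is 1 / ((1 - f) + f/S). The specific S values below (2x for the "low-speed" unit, 20x for one an order of magnitude faster) are illustrative assumptions, not figures from the report.]

```python
# Amdahl's law for a partially vectorized workload:
# fraction f runs S times faster, fraction (1 - f) runs at scalar speed.

def speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

for f in (0.70, 0.95):
    slow = speedup(f, 2)    # assumed "low-speed" vector unit: 2x scalar
    fast = speedup(f, 20)   # assumed unit an order of magnitude faster: 20x
    print(f"{f:.0%} vectorized: 2x unit -> {slow:.2f}x, 20x unit -> {fast:.2f}x")
```

[At 70% vectorization, making the vector unit 10 times faster only moves the overall speedup from about 1.5x to about 3x, because the 30% scalar residue dominates; at 95% the same upgrade is worth roughly 1.9x versus 10x. That is the gap Lubeck says ten years of vectorization never closed.]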