[comp.arch] Vector machines

mo@seismo.CSS.GOV (Mike O'Dell) (07/24/87)

In a former lifetime I supercomputed to keep myself in
beans, and I remember two things people might
find interesting.

Based on the parallelizing compiler work done by MCC
(Massachusetts Computer Consultants) on IVTRAN for
Illiac IV, John Levesque and friends at RDA built
"RDALIB" for the CDC 7600.  This library basically
provided many of the facilities now in the CRAY
machines, but in some ways more interesting. The
automatic parallelizer was experimented with to 
try and use RDALIB, but the best results were obtained
after further hand-tuning.  Anyway, the secret of
RDALIB was the "instruction stack" - what we now
would call a "prefetch cache" implemented in
very fast memory.  If you could contain the loop
in the instruction stack, the ol' 7600 could really
get with it - memory hits for instruction fetches
really slowed it down by over a factor of 2.
Between the incredibly tense code which kept the
RDALIB primitives in the i-stack, and the
work on the codes, the ol' 7600 was an unbelievably
fast machine, considering it was designed in the 
late-middle 1960's.  If I remember right (considerable
fog...) it got well into the 40 megaflops sustained,
measured over 8 hours clock time, the usual stint
when you had "block time" on such machines.
And because of tricks like BUFFERIN and BUFFEROUT,
clocktime essentially equalled cputime.
In fact, when the CRAY-1 was first introduced, it
took considerable work to realize its potential
and the 7600 did not abdicate its megaflop crown 
readily.
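The BUFFERIN/BUFFEROUT trick mentioned above is classic double
buffering: while the CPU chews on one buffer, the I/O channel fills the
other, so clock time tracks CPU time.  A minimal modern sketch of the
idea, using a Python thread as a stand-in for the I/O channel; all the
names here are illustrative, not the actual CDC library interface:

```python
# Rough sketch of the double-buffering idea behind BUFFERIN/BUFFEROUT:
# overlap "I/O" (a thread) with computation on the other buffer.
# All names are illustrative stand-ins, not the CDC interface.

import threading

def read_block(source, buf):
    """Stand-in for BUFFERIN: copy the next block into buf."""
    buf[:] = next(source, [])

def process(buf):
    """Stand-in for the compute phase: here, just sum the block."""
    return sum(buf)

def double_buffered_sum(blocks):
    source = iter(blocks)
    buffers = [[], []]
    read_block(source, buffers[0])          # prime the first buffer
    total, i = 0, 0
    while buffers[i % 2]:
        # Start the "channel" filling the other buffer...
        io = threading.Thread(target=read_block,
                              args=(source, buffers[(i + 1) % 2]))
        io.start()
        # ...while the CPU computes on the current one.
        total += process(buffers[i % 2])
        io.join()                           # wait for the overlap to finish
        i += 1
    return total

print(double_buffered_sum([[1, 2], [3, 4], [5]]))   # prints 15
```

The two buffers are never touched by both threads at once, which is
what makes the overlap safe.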

Another vector machine which was, to quote
the designer of the Star Trek M5, "not
completely successful," was the TI-ASC.
It was a multipipeline beast - one to four
pipes.  The machine was largely compatible with
the System 360 instruction set except for
the vector pipe additions.  In fact, they
committed the unpardonable sin of reinventing
OS/360 for the ASC, with only slightly different
JCL syntax.  UGH!!!!  Anyway, the problem
with the machine and most of the pre-CRAY
vector machines (the STAR*100 in particular
and somewhat remaining in the Cyber 205)
was the pipeline startup overhead.  If the
vectors weren't about 100 elements long
(the number varied between 50 and 100 depending
on what you were doing in the pipe), starting
the pipeline actually SLOWED DOWN THE PROGRAM!!!
The vectorizer was again based on the MCC IVTRAN
work and was quite good at vectorizing DO loops, 
but because it usually pessimized the code,
the code had to be liberally laced with
$NOVECTORIZE directives.  Finally they
added a flag to make vectorizing default
to OFF and then you could put in only a few
$VECTORIZE directives.  Anyway, the
machine never achieved anything like its
advertised speeds.  There may still be one
ASC running, but there never were
very many built.
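The startup-overhead problem above is easy to put in arithmetic terms:
the pipeline only wins once the per-element savings amortize the startup
cost.  A toy calculation, with cycle counts chosen purely for
illustration (the real ASC and STAR numbers varied by operation, as
noted above):

```python
# Break-even vector length for a pipelined unit with startup overhead.
# The cycle counts below are illustrative assumptions, picked so the
# break-even lands in the 50-100 element range described in the post.

def break_even_length(startup, t_scalar, t_vector):
    """Smallest n for which startup + n*t_vector < n*t_scalar."""
    # The vector unit wins once n*(t_scalar - t_vector) exceeds startup.
    n = startup / (t_scalar - t_vector)
    return int(n) + 1

# e.g. 150-cycle pipe startup, 3 cycles/result scalar, 1 cycle/result vector:
print(break_even_length(150, 3.0, 1.0))   # prints 76
```

Below that length, starting the pipe really does slow the program down,
exactly as described.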

	Yours for faster machines,
	-Mike O'Dell

lamaster@pioneer.arpa (Hugh LaMaster) (07/27/87)

In article <44042@beno.seismo.CSS.GOV> mo@seismo.CSS.GOV (Mike O'Dell) writes:

(Discussion of CDC 7600 deleted):

>late-middle 1960's.  If I remember right (considerable
>fog...) it got well into the 40 megaflops sustained,
>measured over  8 hours clock time, the usual stint

You are right that the 7600 was the fastest for a long time.  But not 40MFLOPS.
Even with perfect overlap on a "vector problem" the maximum rate was about
6MFLOPS.  The fastest sustained rate on an appropriate problem was between 3
and 4 MFLOPS.
  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov


                 "IBM will have it soon"


(Disclaimer: "All opinions solely the author's responsibility")

root@mfci.UUCP (SuperUser) (03/30/88)

In article <6630@ames.arpa> lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster) writes:
....[Fortran vs. C stuff deleted]....
>
>Again, the payoff on a vector machine can be much more than a factor of
>two.  True, it doesn't matter how fast most code runs.  But it does
>matter how fast certain very important (to you, the user) codes run.
>And now that vector machines are reaching the market of below $100 K
>machines, you will have an opportunity to see which codes will run
>much faster.  You may be pleasantly surprised.

Or then again, if what you bought was a vector machine, you may come to
agree with Olaf Lubeck of Los Alamos.  In his recent "Supercomputer
Performance:  The Theory, Practice, and Results" tech report, LA-11204-MS,
he concludes with the following:

"Today both laboratories [Los Alamos and LLLabs] are running vector
processors at 70 percent average vectorization levels.  Because of
Amdahl's law, the overall performance difference (at this
vectorization level) between a low-speed vector unit and one an order
of magnitude faster is negligible.  We must have vectorization levels
at the 95% level to see a significant difference and after ten years
of vector experience, this level has simply not been attained....The
fact is that it is difficult to organize computations so that
*exactly* the same instruction operates on multiple elements of a
data set..."

"Faced with this sobering assessment of where we stand after 10 years of
vectorization, I begin to wonder whether vectorization has a substantial
place in future supercomputer architectures.  The major reason that
these architectures prosper today is that they have a significant
scalar speed advantage over contemporary mainframes.  I believe that
future supercomputers are not necessarily bound to vector
processing...."
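Lubeck's point is a direct consequence of Amdahl's law.  A quick sketch
of the arithmetic, where the 4x and 40x vector-unit speeds are
illustrative assumptions standing in for his "low-speed" unit and "one
an order of magnitude faster":

```python
# Amdahl's-law estimate: overall speedup when a fraction f of the work
# vectorizes and the vector unit runs r times faster than scalar.
# The r values below are assumptions chosen to illustrate the argument.

def overall_speedup(f, r):
    """Speedup over pure scalar when fraction f runs r times faster."""
    return 1.0 / ((1.0 - f) + f / r)

for f in (0.70, 0.95):
    slow = overall_speedup(f, 4)    # modest vector unit: 4x scalar
    fast = overall_speedup(f, 40)   # one an order of magnitude faster
    print(f"vectorization {f:.0%}: slow unit {slow:.2f}x, "
          f"fast unit {fast:.2f}x")
```

At 70% vectorization the 10x-faster pipe buys only about 3.1x overall
versus 2.1x; at 95% the gap opens to roughly 13.6x versus 3.5x, which
is the "negligible difference" Lubeck is talking about.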

You don't need a vector machine to speed up vectorizable code.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090  <my opinions only>