[comp.parallel] Chaining on IBM 3090 VF

newton@kahuna.UUCP (Mike Newton) (01/05/89)

[ Found this on comp.arch.  Although we're not necessarily interested
  in vector processors, no one else seems to be taking up the slack.
  Pipelining things is a form of parallelism.  Call it
  ``empire building.'' 8-)

  --- Steve
]

In article <3950@pt.cs.cmu.edu> yk@a.nl.cs.cmu.edu (Yasusi Kanada) writes:
>I read an article of IBM 3090 in 88-12 issue of Transaction of Information
>Processing (written in Japanese) recently.  In this article, the author
>(in IBM Tokyo Research Center) writes that the following instruction sequence
>is executed in CHAINED manner, so the result is generated every cycle.
>
>	VL	VR0,A(R1)
>	VA	VR0,B(R2)
>	VST	VR0,C(R3)
>
>Is that true?  Thanks in advance.
>
>-Yasusi Kanada


Once the pipeline is loaded, this sequence gives you a result every cycle.
To get around the bottleneck in floating-point multiply, I believe they
alternate among a bank of 2 or 3 multipliers.
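
A rough way to see why chaining matters is to count cycles.  The sketch
below is a toy model in Python, not a description of the real 3090 VF
pipeline: the stage latencies (4, 3, 2) and the 128-element vector length
are made-up numbers, and the function names are hypothetical.  It assumes
each stage accepts one new element per cycle once it has started.

```python
def chained_cycles(n, latencies):
    """Cycles to finish n elements when load/add/store are chained.

    Element i enters the first stage at cycle i and leaves the last
    stage sum(latencies) cycles later, so the stages overlap.
    """
    if n == 0:
        return 0
    return sum(latencies) + (n - 1)


def unchained_cycles(n, latencies):
    """Cycles when each vector instruction must finish before the next starts."""
    if n == 0:
        return 0
    return sum(lat + (n - 1) for lat in latencies)


# Assumed, illustrative values only (not taken from IBM documentation).
N = 128
LATENCIES = (4, 3, 2)   # load, add, store

print("chained:  ", chained_cycles(N, LATENCIES))
print("unchained:", unchained_cycles(N, LATENCIES))
```

With these assumed numbers the chained sequence takes 136 cycles against
390 unchained, and the per-element cost approaches one cycle as the
vector gets longer.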

I strongly urge you to find an old copy of the IBM Journal of R & D from
last year (Feb?) if you have access to a 3090 (w/ or w/o VF).  I'd
give you the exact date and more precise info on the above, but my copy
is about 3000 miles from here.

Some useful facts for high-speed code generation (my speciality): 'LA'
instructions are effectively executed by the instruction fetcher and
so usually cost 0 clock cycles.  Also: avoid overlapping args for SS
instructions, and avoid self-modifying code, like the plague.  To a good
first-order approximation, program execution time = clock cycle time *
number of instructions executed (i.e., it's basically a RISC :-) !! )
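
The first-order rule above can be written as a tiny estimator.  This is
a sketch only: the opcode counts and the 10 ns cycle time are invented
for illustration (not measured 3090 figures), the function name is
hypothetical, and treating 'LA' as free simply encodes the claim made in
this post.

```python
def estimate_ns(counts, cycle_ns, free_opcodes=("LA",)):
    """First-order run time: one cycle per executed instruction.

    counts       -- dict mapping opcode to number of times executed
    cycle_ns     -- clock cycle time in nanoseconds (assumed value)
    free_opcodes -- opcodes assumed to cost 0 cycles
    """
    charged = sum(n for op, n in counts.items() if op not in free_opcodes)
    return charged * cycle_ns


# Hypothetical dynamic instruction counts for some inner loop.
counts = {"L": 1000, "A": 1000, "ST": 1000, "LA": 500}
print(estimate_ns(counts, cycle_ns=10.0), "ns")
```

The estimate ignores cache misses and the SS-overlap and self-modifying
code penalties mentioned above, so treat it as a lower bound rather than
a prediction.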

- mike

newton@csvax.caltech.edu		Caltech Submillimeter Observatory
(which is forwarded to)			POB 4339 / Hilo HI 96720
 cit-vax!kahuna!newton			808 935 1909