[comp.sys.sun] Accelerators, array processor ?

geodel@prg.oxford.ac.uk (Delman Lee) (02/15/90)

I am interested in add-on boards (accelerators, array processors, etc.) for
Sun workstations. I have heard of two boards for VMEbus --- SKYbolt from SKY
(80 MFLOPS), and SuperCard from CSPI (80 MFLOPS/card).

Questions:
- Anybody out there using these boards? Any comments on them? 
- Are there any add-on boards for SBus?
- Any other boards in the same category?

Thanks --- Delman.

          JANET: geodel%sungeo@uk.ac.oxford.prg
  ARPA Internet: geodel%sungeo%prg.oxford.ac.uk@nsfnet-relay.ac.uk

hart@decwrl.dec.com (Howard C. Hart) (02/18/90)

In article <5021@brazos.Rice.edu> sungeo!geodel@prg.oxford.ac.uk (Delman Lee) writes:
>I am interested in add-on boards (accelerators, array processors, etc.) for
>Sun workstations. I have heard of two boards for VMEbus --- SKYbolt from SKY
>(80 MFLOPS), and SuperCard from CSPI (80 MFLOPS/card).

Can't vouch for the Skybolt, but we did buy the previous version, the SKY
Warrior. For our applications, it ran about as fast as a Sun 3/260 with
an FPA. Definitely a buyer-beware situation.

>- Are there any add-on boards for SBus?

No for SKY and no for Mercury. Don't know about SuperCard. Something about
insufficient power.

>- Any other boards in the same category?

Mercury datasystems or something like that. Unlike SKY, they're offering a
multiprocessor option. Judging by the MFLOPS above, CSPI is probably doing
the same thing as SKY and banking on the 40 MHz i860 chip running two
instructions in parallel (get it? 40 MHz * 2 instructions = 80 MFLOPS ...
maybe).

One big word of warning before you buy. Benchmark using your own primary
applications. The thing that killed the old SKY board was VME bandwidth.
If you can't ship the instructions across the bus faster than the board
can calculate the answers, you'll never see the advertised performance.
The old SKY was benchmarked on FFTs, which sit on the SKY processor
board and crunch away until the final answer comes out. Thus, no I/O
bottleneck. If your applications are inherently sequential, you might be
able to transfer up to 2000 adds, multiplies or divides at a time, but it
takes too long to ship the answers back for the next iteration. Supposedly
these new boards are complete CPUs and floating-point processors on one or
more boards, so VMEbus I/O is minimized. I'd check it anyway.
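
A rough way to see the bus argument: if each operation needs its operands
shipped over the bus and its result shipped back, the sustained rate is
capped at bus bandwidth divided by bytes moved per operation. The sketch
below is a back-of-envelope model, not a measurement. The function name,
the 10 MB/s bandwidth and the 12 bytes per operation are illustrative
assumptions, not figures from SKY, CSPI or Sun.

```python
def sustained_mflops(peak_mflops, bus_mbytes_per_s, bytes_per_flop):
    """Upper bound on sustained MFLOPS when data must cross the bus.

    peak_mflops      -- advertised peak rate of the board
    bus_mbytes_per_s -- usable bus bandwidth in MB/s (assumed, not measured)
    bytes_per_flop   -- bytes moved across the bus per floating-point op
    """
    if bytes_per_flop <= 0:
        # Data stays on the board (e.g. an FFT that runs to completion
        # there), so the bus does not limit the rate.
        return float(peak_mflops)
    bus_limit = bus_mbytes_per_s / bytes_per_flop
    return min(float(peak_mflops), bus_limit)


if __name__ == "__main__":
    # Hypothetical numbers: 80 MFLOPS peak, 10 MB/s usable bus bandwidth.
    # Two 4-byte operands in and one 4-byte result out = 12 bytes per op.
    print(sustained_mflops(80, 10, 12))  # bus-bound case
    print(sustained_mflops(80, 10, 0))   # everything stays on the board
```

With these made-up numbers the bus-bound case comes out under 1 MFLOPS
against an 80 MFLOPS peak, which is the kind of gap worth checking for with
your own code.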

Howard C. Hart                 UUCP:{sun!sunncal,pyramid}!leadsv!laic!nova!hart

carroll@beaver.cs.washington.edu (Jeff Carroll) (02/21/90)

In article <5095@brazos.Rice.edu> nova!hart@decwrl.dec.com (Howard C. Hart) writes:
>In article <5021@brazos.Rice.edu> sungeo!geodel@prg.oxford.ac.uk (Delman Lee) writes:
>>I am interested in add-on boards (accelerators, array processors, etc.) for
>>Sun workstations. I have heard of two boards for VMEbus --- SKYbolt from SKY
>>(80 MFLOPS), and SuperCard from CSPI (80 MFLOPS/card).
>
>Can't vouch for the Skybolt, but we did buy the previous version, the SKY
>Warrior. For our applications, it ran about as fast as a Sun 3/260 with
>an FPA. Definitely a buyer-beware situation.

As I pointed out in an earlier post, the Skybolt is a fundamentally
different concept from the SKY Warrior and other array processors. Having
a real onboard CPU (two, actually), it is capable (according to the mfr)
of running applications which "stand alone" in the sense that the host is
only needed as a target for console I/O; Sky claims that the host process
could be a naked shell.

With last-generation array processors, the high speed of the vector units
was often eaten up in the startup costs of loading data into the AP,
especially if your code didn't use long vectors.
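
To put rough numbers on the startup-cost point, here is a minimal sketch
that assumes a fixed per-call overhead plus compute time at the peak rate.
The time model, both function names and the 100-microsecond overhead are
assumptions made for illustration, not measurements of any real AP.

```python
def effective_mflops(n, peak_mflops, startup_us):
    """Effective rate for one vector operation of length n.

    Assumed time model: startup_us microseconds of fixed overhead, plus
    n operations at the peak rate (1 MFLOPS = 1 op per microsecond).
    """
    if n <= 0:
        return 0.0
    compute_us = n / peak_mflops
    return n / (startup_us + compute_us)


def half_performance_length(peak_mflops, startup_us):
    """Vector length at which the effective rate reaches half of peak."""
    # Solve n / (startup_us + n / peak) == peak / 2 for n.
    return peak_mflops * startup_us


if __name__ == "__main__":
    # Hypothetical board: 80 MFLOPS peak, 100 us to set up each call.
    for n in (10, 100, 1000, 10000, 100000):
        print(n, round(effective_mflops(n, 80, 100), 2))
    print("half of peak at n =", half_performance_length(80, 100))
```

Under this model, short vectors spend nearly all their time in setup, so
the effective rate only approaches the peak once the vectors are much
longer than the half-performance length.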

I'll grant that the i860 has pipelined floating point units as well, and
that it's quite possible that *your* code won't run at anywhere near 80
MFLOPS. But the Skybolt has an i960 on the card doing DMA, and Sky has
apparently spent a lot of money working on parallelization and
optimization for the i860. I'll bet that it'll run most things a hell of a
lot faster than a 3/260.

This isn't really an AP any more; it's a RISC CPU on an add-in board.
Same goes for the CSPI board, though I don't know much about it.

>>- Any other boards in the same category?
>
>Mercury datasystems or something like that. Unlike SKY, they're offering a
>multiprocessor option. Judging by the MFLOPS above, CSPI is probably doing
>the same thing as SKY and banking on the 40 MHz i860 chip running two
>instructions in parallel (get it? 40 MHz * 2 instructions = 80 MFLOPS ...
>maybe).

I wasn't aware that Mercury had an i860 board out. Anybody know more about
it?

There's an outfit called Avalon that builds an 88000 board for the DEC
Q-bus. Don't know whether they have a VME version or not, but somebody
surely ought to. Once again, a RISC CPU, not an AP.

>One big word of warning before you buy. Benchmark using your own primary
>applications. The thing that killed the old SKY board was VME bandwidth.

Amen to that. One should never buy any kind of accelerator before
benchmarking it on one's own code. But one should also try to understand
one's own code well enough to know whether it's a good fit for the kind of
accelerator the friendly neighborhood AP salesman is trying to sell.

>If you can't ship the instructions across the bus faster than the board
>can calculate the answers, you'll never see the advertised performance.
>The old SKY was benchmarked on FFTs, which sit on the SKY processor
>board and crunch away until the final answer comes out. Thus, no I/O
>bottleneck. If your applications are inherently sequential, you might be
>able to transfer up to 2000 adds, multiplies or divides at a time, but it
>takes too long to ship the answers back for the next iteration. Supposedly
>these new boards are complete CPUs and floating-point processors on one or
>more boards, so VMEbus I/O is minimized. I'd check it anyway.

This is exactly the problem with APs in general. If you have to do a lot
of I/O, you're dead. This problem doesn't completely go away with the RISC
chip, though (in the case of the Sky board) it does help to have, in
effect, DMA. The smart guys in this market have developed independent
high-speed paths to disk, so that the CPU doesn't have to be involved in
disk I/O. CSPI (I believe) had this on their AP product line, and some of
the other AP companies are starting to offer it. (Numerix comes to mind;
of course, FPS has had it for a long time on their 164/264 series.) 

Some of these file systems turn out to be pretty kludgy, but it's better
than living with the bottleneck of passing everything through the host CPU.

I haven't bought a Skybolt; I just talked to the guy from Sky, and I was
impressed by what I heard. The CSPI guy hasn't been around to see me
lately.

DISCLAIMER: This is not an endorsement of any company or product herein
named by either myself or The Boeing Company. 

	Jeff Carroll
	carroll@atc.boeing.com