[comp.arch] Alliant FX/8 parallel benchmarking

iddo@nsta.UUCP (Iddo Carmon) (03/17/87)

In article <43700009@uicsrd> turner@uicsrd.CSRD.UIUC.EDU (Steve Turner)
writes:

> There is a BIG difference between parallelizing for a machine like the
> Alliant, with lots of nice loop-type synchronization built in (not to
> mention a shared memory architecture), and automatic parallelization
> for a HYPERCUBE....
> ....If the differences are not bloody obvious, mail me and I'll be more
> than happy to share the troubles I've seen in writing for the iPSC,
> compared to an FX/8 its a totally different ball game.  Fun, but tough.

As far as I know, the Alliant FX/8 machine combines three levels of
parallelism:

1.  Multiprocessing among two sets of processors: "interactive" and
    "computational".
2.  Parallel-processing by spreading DO-loops over the computational
    processors.
3.  Vector-processing in each computational processor.
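To make levels 2 and 3 concrete, here is a rough Python sketch of a loop nest whose outer DO-loop is spread over the computational processors while each inner loop is treated as a single vector operation. It is only an illustration. It is not Alliant FX/Fortran, and it is not how the FX/8 hardware actually schedules work. NUM_CES, split_outer_iterations and the block partitioning are hypothetical choices. The chunks also run one after another here, standing in for processors that would run them at the same time.

```python
# Illustrative sketch only: level 2 (outer loop spread across processors)
# plus level 3 (inner loop as one vector operation). NUM_CES and the block
# partitioning are hypothetical, not Alliant's actual scheduling scheme.

NUM_CES = 8  # hypothetical number of computational processors


def vector_op(a_row, b_row, scale):
    """Stand-in for one vector instruction sequence: y = a + scale * b."""
    return [a + scale * b for a, b in zip(a_row, b_row)]


def split_outer_iterations(n_iters, n_procs):
    """Block-partition outer iterations 0..n_iters-1 over n_procs processors."""
    base, extra = divmod(n_iters, n_procs)
    chunks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # first `extra` procs get one more
        chunks.append(range(start, start + size))
        start += size
    return chunks


def concurrent_outer_vector_inner(a, b, scale, n_procs=NUM_CES):
    """Compute c[i] = a[i] + scale*b[i] row by row, grouping rows by processor."""
    c = [None] * len(a)
    for chunk in split_outer_iterations(len(a), n_procs):
        # On a real multiprocessor each chunk would run on its own processor
        # concurrently; this sketch simply runs the chunks in sequence.
        for i in chunk:
            c[i] = vector_op(a[i], b[i], scale)
    return c
```

For example, with 10 outer iterations and 8 processors, the partition gives the first two processors 2 iterations each and the rest 1 each. That imbalance is one reason the level-2 speedup can fall short of the processor count.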

Since everybody seems to be quite pleased with this approach, I'd like to
get a feel for the contribution of the second level alone (the parallel
DO-loops).

Has anyone published measurements of the speedup this parallelization
feature gives on typical applications?
In other words, once you take a single-threaded task and vectorize it,
how much additional speedup is left for DO-across parallelism, using the
Alliant compiler and the FX/8's special hardware support?
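One way to state the question precisely is to separate the measured speedup into a vector factor and a concurrent (DO-across) factor. Their product is the total speedup. Alongside that, an Amdahl's-law estimate shows what the concurrent factor could be at best. The sketch below assumes three run times are measured for the same program: scalar on one processor, vectorized on one processor, and vectorized plus concurrent. The function names and the parameter f (fraction of vectorized run time spent in concurrently executable loops) are hypothetical, and the Amdahl estimate ignores synchronization and load-imbalance overhead.

```python
def speedup_breakdown(t_scalar, t_vector, t_vector_concurrent):
    """Split total speedup into the vector part and the extra DO-across part.

    t_scalar            : run time, single-threaded, no vectorization
    t_vector            : run time, vectorized on one processor
    t_vector_concurrent : run time, vectorized and spread over all processors
    """
    s_vector = t_scalar / t_vector
    s_concurrent = t_vector / t_vector_concurrent  # level-2 contribution alone
    s_total = t_scalar / t_vector_concurrent       # equals s_vector * s_concurrent
    return s_vector, s_concurrent, s_total


def amdahl_concurrent_speedup(f, n_procs):
    """Ideal (overhead-free) Amdahl's-law estimate of the extra speedup.

    f       : fraction of the vectorized run time spent in loops that can run
              concurrently, assumed perfectly load-balanced
    n_procs : number of computational processors
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return 1.0 / ((1.0 - f) + f / n_procs)
```

For instance, if 90% of the vectorized run time were in concurrent loops on 8 processors, the estimate is 1 / (0.1 + 0.9/8), or about 4.7. Measured values of s_concurrent from real applications are exactly the data being asked for here.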

Also, I'd like to hear about specific applications (e.g., SPICE, PDE
solvers, non-numeric codes) that have probably been tried on the
Alliant, and how much the parallelization contributed on these kinds of
"benchmarks".
Can anyone quantify how much programmer effort it takes to improve
significantly on the compiler-generated code?

I would think Alliant itself and some of its users have this kind of
data.  I'll be happy to summarize and post any results I receive.