[comp.parallel] Alliant FX/2880

neray@Alliant.COM (Phil Neray) (03/16/90)

In article <8261@hubcap.clemson.edu> boulder!foobar!grunwald@ncar.UCAR.EDU (Dirk Grunwald) writes:
>I've heard that Alliant has announced an i860 based system with up to
>28 processors. Anyone have any more information? What's the system
>architecture? FX/80-like but with higher bandwidth bus?

Glad you asked. Alliant announced the FX/2800 in January, with
shipments to begin in March. The FX/2800 consists of:

- Up to 28 Intel i860 64-bit processors (40 MHz).
- Up to 1 GB of main memory and 4 MB of cache memory (connected to the
processors via a 1.28 GB/s crossbar switch).
- Multiple 25 MB/s VME channels with disk striping, UltraNet, etc.
- Extensively multi-threaded Concentrix operating system.
- Parallel FX/Fortran, FX/C and FX/ADA compilers.
- Real-time FX/RT executive (priority-driven, pre-emptive scheduling),
co-resident with UNIX.
- Optional tightly-coupled visualization capability (X11R3,
PHIGS/PHIGS+, visualization toolkits).
- Source-code compatible with the FX/80 Series.
- $500K to $2M price range; the entry-level system has 8 processors.
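As a rough sense of scale for the crossbar figure above, the sketch below divides the quoted 1.28 GB/s evenly among the active processors. The even split and the decimal units (1 GB = 1000 MB) are assumptions for illustration; the announcement does not say how crossbar bandwidth is shared.

```python
# Figures quoted in the announcement; the even-sharing model is an assumption.
CROSSBAR_GB_PER_S = 1.28
MAX_PROCESSORS = 28
ENTRY_PROCESSORS = 8


def even_share_mb_per_s(total_gb_per_s: float, processors: int) -> float:
    """Per-processor crossbar bandwidth (MB/s, decimal units) if shared evenly."""
    return total_gb_per_s * 1000 / processors


full = even_share_mb_per_s(CROSSBAR_GB_PER_S, MAX_PROCESSORS)
entry = even_share_mb_per_s(CROSSBAR_GB_PER_S, ENTRY_PROCESSORS)
print(f"28 processors: {full:.1f} MB/s each")
print(f" 8 processors: {entry:.1f} MB/s each")
```

Under that assumption a fully loaded 28-processor system gets about 45.7 MB/s per processor, and the 8-processor entry system about 160 MB/s.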

What's unique about this system? It's the first general-purpose,
shared memory, parallel supercomputer that uses standard VLSI
processors rather than a proprietary processor architecture.

It's the first truly open supercomputer because it is standard at
the processor/instruction-set level AS WELL AS at the usual UNIX
level (UNIX, NFS, NQS, compilers, etc.).

The idea is to bring the benefits of binary standards to the
supercomputer world, so that users can benefit from a much broader
applications base than has traditionally been available.

The FX/2800 is compatible with the PAX (Parallel Architecture
eXtension) ABI jointly defined by Intel and Alliant. ANY vendor who 
wants to build a binary-compatible system can go to Intel and buy the 
i860 processors, plus Alliant's concurrency control architecture and 
parallelizing compilers (which Alliant has licensed to Intel)...AND
any software vendor can produce a single binary version of his
application that runs on a variety of machines, from workstations to
parallel supercomputers like Alliant's.

Using standard processors (the i860 is a "Cray-on-a chip", as
described by John Rollwagen) combined with parallelism and high-speed
shared memory, we've built a system that is rated at 720 MFLOPS on
the 1000x1000 LINPACK (in comparison, the single-processor Cray
Y-MP/832 is rated at 308 MFLOPS, the C240 at 166 MFLOPS, and the VAX
9000/440VP at 312 MFLOPS).
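The LINPACK comparison above can be restated as simple ratios. This sketch only divides the MFLOPS figures quoted in the post; the machine labels follow the post's wording.

```python
# 1000x1000 LINPACK figures as quoted in the post (MFLOPS).
LINPACK_MFLOPS = {
    "Alliant FX/2800": 720,
    "Cray Y-MP/832 (one processor)": 308,
    "C240": 166,
    "VAX 9000/440VP": 312,
}


def speedup_over(reference: str, table: dict) -> dict:
    """Ratio of the reference machine's MFLOPS to each other machine's."""
    ref = table[reference]
    return {name: ref / mflops for name, mflops in table.items() if name != reference}


for name, ratio in speedup_over("Alliant FX/2800", LINPACK_MFLOPS).items():
    print(f"{name:30s} x{ratio:.2f}")
```

By these numbers the FX/2800 figure is about 2.34 times the single-processor Y-MP, 4.34 times the C240 and 2.31 times the VAX 9000/440VP.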

Other performance metrics: a peak of over 1.12 GFLOPS (double
precision), 1148 VAX MIPS (aggregate, based on Dhrystone V1.1) and
672 Whetstone MIPS (non-inlined, aggregate).
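Dividing the aggregate figures by 28 processors gives per-processor numbers, and comparing LINPACK with peak gives a sustained fraction. This is arithmetic on the quoted figures only; since the peak is stated as "over 1.12 GFLOPS", 1120 MFLOPS is treated as a lower bound, so the computed efficiency is an upper estimate.

```python
# Aggregate figures quoted in the post.
PROCESSORS = 28
CLOCK_MHZ = 40
PEAK_MFLOPS_DP = 1120          # "over 1.12 peak GFLOPS (DP)": lower bound
DHRYSTONE_VAX_MIPS = 1148
WHETSTONE_MIPS = 672
LINPACK_MFLOPS = 720

per_proc_peak = PEAK_MFLOPS_DP / PROCESSORS       # MFLOPS per processor
flops_per_clock = per_proc_peak / CLOCK_MHZ       # DP results per clock
per_proc_vax_mips = DHRYSTONE_VAX_MIPS / PROCESSORS
per_proc_whetstone = WHETSTONE_MIPS / PROCESSORS
linpack_fraction = LINPACK_MFLOPS / PEAK_MFLOPS_DP

print(f"peak per processor : {per_proc_peak:.1f} MFLOPS "
      f"({flops_per_clock:.1f} DP flop/clock at {CLOCK_MHZ} MHz)")
print(f"Dhrystone per proc.: {per_proc_vax_mips:.1f} VAX MIPS")
print(f"Whetstone per proc.: {per_proc_whetstone:.1f} MIPS")
print(f"LINPACK / peak     : {linpack_fraction:.1%}")
```

That works out to 40 MFLOPS (one double-precision result per 40 MHz clock), 41 VAX MIPS and 24 Whetstone MIPS per processor, with the 720 MFLOPS LINPACK result at roughly 64% of the stated peak.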

The processors in the FX/2800 can be used either in parallel clusters
or as independent multiprocessors. Up to six parallel clusters are
supported. The scheduler automatically "breaks up" a cluster into
independent multiprocessors if no parallel jobs are waiting to
execute; it can also break clusters up automatically according to
user-defined time slices.
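A toy model of the scheduling policy just described may make it concrete. Everything here is hypothetical: the function name, the fixed cluster size and the first-come ordering are illustrative assumptions, not Alliant's Concentrix scheduler. Only the six-cluster limit comes from the post. Calling it once per time slice with the current queues lets clusters form and dissolve as parallel work arrives and drains.

```python
def plan_time_slice(processors, cluster_size, parallel_jobs, serial_jobs,
                    max_clusters=6):
    """Hypothetical policy: form clusters only for waiting parallel jobs,
    then give each leftover processor to a serial job as an independent CPU.

    Returns (plan, idle), where plan is a list of (mode, job, cpus).
    """
    plan = []
    free = processors
    clusters = 0
    # Parallel jobs get whole clusters while processors and cluster slots remain.
    for job in parallel_jobs:
        if clusters == max_clusters or free < cluster_size:
            break
        plan.append(("cluster", job, cluster_size))
        free -= cluster_size
        clusters += 1
    # Any processors not in a cluster run serial jobs independently.
    for job in serial_jobs:
        if free == 0:
            break
        plan.append(("single", job, 1))
        free -= 1
    return plan, free


busy = plan_time_slice(28, 14, ["cfd"], ["cc", "vi", "make"])
quiet = plan_time_slice(28, 14, [], ["cc", "vi", "make"])
print(busy)
print(quiet)
```

With a parallel job waiting, 14 processors form a cluster and three more run serial jobs; with no parallel job, the same serial jobs simply run on independent processors and no cluster is formed.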

Each cluster consists of up to 14 processors controlled via
hardware-based concurrency control instructions that are
automatically generated by the compilers. The compilers detect
opportunities for fine-grained parallelism, typically at the loop
level. (In certain situations, such as the 1000x1000 LINPACK, a single
cluster of up to 28 processors is supported.)
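Loop-level parallelism of this kind amounts to dividing a loop's iterations among the processors of a cluster. The sketch below shows one common scheme, contiguous near-equal chunks, applied to a DAXPY-style loop. It is an assumption-laden analogue: the partitioning scheme and names are illustrative, not what Alliant's compilers or concurrency control hardware actually generate, and Python threads show the structure without delivering real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def parallel_daxpy(a, x, y, workers=14):
    """Compute y[i] = a*x[i] + y[i] with iterations split across `workers` threads."""
    def body(lo, hi):
        # Each worker touches a disjoint index range, so no locking is needed.
        for i in range(lo, hi):
            y[i] = a * x[i] + y[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(body, lo, hi) for lo, hi in chunk_bounds(len(x), workers)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return y


print(chunk_bounds(10, 3))
print(parallel_daxpy(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], workers=2))
```

Here `chunk_bounds(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`. Because the iterations are independent, the result is the same however they are divided.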

Explicit parallelism via compiler directives or UNIX tasking is also
supported. (Note that UNIX itself runs directly on the i860
processors in an SMP implementation. There is no "front-end".)
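By contrast with compiler-generated loop parallelism, explicit task-level parallelism means the programmer creates independent units of work directly. As a modern analogue only (this is not FX/Fortran directive syntax or Concentrix tasking), a thread pool can run unrelated tasks concurrently; `run_tasks` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, workers=4):
    """Run explicitly created, independent tasks concurrently.

    `tasks` is a list of (function, args) pairs; results come back in
    submission order regardless of which task finishes first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, *args) for fn, args in tasks]
        return [f.result() for f in futures]


results = run_tasks([
    (sum, (range(100),)),
    (max, ([3, 9, 4],)),
    (sorted, ("tasking",)),
])
print(results)
```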

The i860 also has some interesting instruction-level parallelism
features. It supports "superscalar" operation (up to three operations
per clock cycle: RISC integer/control, FP MUL and FP ADD). This
requires sophisticated instruction scheduling in the compiler. The
chip also supports pipelined floating-point operations, which allows
our compilers to produce code that is optimized for both
vectorization and concurrency.
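The scheduling problem above can be illustrated with a toy list scheduler. It assumes three independent issue slots per cycle (integer/control, FP multiply, FP add), fully pipelined units that accept a new operation every cycle, and made-up latencies; the real i860 pairs a core instruction with a floating-point instruction that may itself be a dual multiply/add operation, so this is a simplification, not a model of the chip.

```python
def schedule_bundles(ops, units=("core", "fmul", "fadd")):
    """Greedy in-order list scheduling with one issue slot per unit per cycle.

    ops: list of (name, unit, latency, deps). Units are assumed fully
    pipelined, so a unit may issue every cycle even while earlier results
    are still in flight. Returns {name: issue_cycle}.
    """
    issue, done_at = {}, {}
    pending = list(ops)
    cycle = 0
    while pending:
        used = set()
        for op in list(pending):
            name, unit, latency, deps = op
            if unit not in units or unit in used:
                continue
            # An op is ready once every producer's result is available.
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                issue[name] = cycle
                done_at[name] = cycle + latency
                used.add(unit)
                pending.remove(op)
        cycle += 1
    return issue


ops = [
    ("i1", "core", 1, []),
    ("m1", "fmul", 3, []),
    ("a1", "fadd", 3, []),
    ("m2", "fmul", 3, []),      # issues while m1 is still in the pipeline
    ("a2", "fadd", 3, ["m1"]),  # must wait for m1's result
]
print(schedule_bundles(ops))
```

Three independent operations share cycle 0, the second multiply enters the pipelined multiplier one cycle later, and the dependent add waits until cycle 3 for its operand. Finding enough independent work to fill those slots is the compiler's job.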

So - the FX/2800 supports parallelism at multiple levels -
instruction-level, loop-level, and task-level - in a truly open
supercomputing environment. Thank you for your support.



-- 
Phil Neray			Domain:	neray@alliant.com
Alliant Computer Systems	UUCP:	{mit-eddie|linus}!alliant!neray
Littleton, MA 01460		Phone:	(508) 486-1429

dbradley@gibson.ncsa.uiuc.edu (David Bradley) (03/17/90)

So what are the architectural differences between an FX/2800 and a
"conventional" shared memory MIMD system like an Encore or Sequent?
Based on the posting by Phil Neray, they appear to be the following:

	- Faster processors
	- Bigger memory and cache
	- Processors connected to memory via crossbar rather than
	  bus.  (Or does the crossbar connect the processors 
	  and cache?  This was ambiguous in Neray's posting.)
	- Special "hardware-based concurrency control instructions".

So from a hardware standpoint the machine is just like a really fast Encore
or Sequent, right?  Or am I missing something?  Of course the software sounds
pretty cool, especially the cluster scheduling.
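A toy contention model may help frame the bus-versus-crossbar question. It assumes each request occupies one cycle, each processor issues a single request, and the memory (or cache; the announcement leaves this open) is split into independent modules. None of these parameters come from Alliant, and the function names are hypothetical.

```python
from collections import Counter


def bus_cycles(requests):
    """A single shared bus serves one request per cycle, whatever its target."""
    return len(requests)


def crossbar_cycles(requests):
    """A crossbar serves one request per module per cycle; requests to
    different modules proceed in parallel, so only same-module requests queue."""
    if not requests:
        return 0
    per_module = Counter(module for _proc, module in requests)
    return max(per_module.values())


spread = [(p, p % 4) for p in range(8)]  # 8 processors over 4 modules
hotspot = [(p, 0) for p in range(8)]     # every processor hits module 0
print(bus_cycles(spread), crossbar_cycles(spread))
print(bus_cycles(hotspot), crossbar_cycles(hotspot))
```

When requests are spread across modules the crossbar finishes in 2 cycles versus 8 on the bus; when they all target one module it degrades to bus-like behavior.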
--
	David Bradley
	University of Illinois at Urbana Champaign

carroll@beaver.cs.washington.edu (Jeff Carroll) (03/19/90)

In article <8389@hubcap.clemson.edu> neray@Alliant.COM (Phil Neray) writes:
>Using standard processors (the i860 is a "Cray-on-a chip", as
>described by John Rollwagen) combined with parallelism and ...
>

Did Rollwagen really say this? I've heard plenty of people at Intel say
it - in fact, I believe I saw it in the early press releases.

If Rollwagen *did* say this, I'd be very appreciative to anyone who can
give me the publication reference.

	Jeff Carroll
	carroll@atc.boeing.com

schumach%convex@uunet.UU.NET (Richard A. Schumacher) (03/22/90)

>In article <8389@hubcap.clemson.edu> neray@Alliant.COM (Phil Neray) writes:
>>Using standard processors (the i860 is a "Cray-on-a chip", as
>>described by John Rollwagen) combined with parallelism and ...


! Please post a reference for this quote!