blossom@dspo.UUCP (03/19/85)
/* This line's for you..... */
Our group is currently in the advanced architectural phase of a
Hypercube-topology ensemble machine design. We are interested in any
suggestions, correspondence, flames, etc. I will list out the primary
specs for the machine below, and a good path for mail and ad hominem
attacks, etc. If there is sufficient response, and I feel up to it,
I will summarize the former to net.arch, and the latter to /dev/null.
Machine Architecture:
1) It will be a homogeneous hypercube (i.e. all nodes will be identical,
and will be interconnected via hypercube topology).
2) It will be a distributed memory machine (each node will have its
own memory).
3) It will be an order 10 hypercube (up to 1024 nodes, although 16 will
be built initially).
Node specs:
1) 32032 full chip set (MMU, FPU, CPU, ICU, TCU). We've built this chip
set into a couple of different designs, and feel comfortable with it.
It should be adequate for the job we intend it to do: bookkeeping,
handholding, etc. We also plan to put Genix 4.2bsd on each node to
enhance the development environment. We will pare back any parts
of the OS that are superfluous or tend to slow things down.
(sorry folks .. no hyperhack)
2) Two AMD 29325 Floating Point Units (10.0 Mflop each, pipelinable to
20.0 Mflop). These FPUs will have a fast track to memory in order
to allow two 32-bit operands to be moved to, and one result from,
the FPU section every 100 ns. The FPU will have a fast local
controller to sequence instructions and data.
3) 16 Mbytes of RAM, with EDAC bits and local memory scrubbing. We
are planning the nodes to be 'computationally fat' so a large local
memory is needed for many of the intended problems.
4) A LAN interface (probably ETHERNET, or maybe PRONET, although
we have more experience with ETHERNET) used to access the cube
as a Linear Array for global messages, trouble-shooting,
initialization, etc.
5) A hard disk controller and 512 Mbyte of disk LOCAL TO THE NODE
(this turns out to be extremely important for many problems)
6) Order 10 communications (10 Hyperlinks designed to pass messages
through the node without disturbing local execution, and to keep
up with the computational units for problems of interest).
We will probably have some form of shared-memory buffering at
each node-node connection, i.e. 2kbytes at each end. Each node
would then have 10 Hyperlinks each with 2kbyte of memory accessible
from either end.
Machine Specs: (uncorrected for speedup, vectorization, and other degradations,
and scaled for a 16-node implementation)
--subject to change without notice, and void where prohibited by law--
--320 Mflops (again, red-lined in an unusual situation, for arm waving only)
--256 Mbytes (64 mega 32-bit words) RAM
--8 Gbytes disk
--16 Mips integer performance (for fully concurrent operation only)
For the 1024 node machine, scale the above by 64. Note that for this machine
aggregate disk and memory bandwidth scales linearly, i.e. more nodes mean
greater total memory and disk bandwidth. Worst case path length (assuming
optimal traversal of the cube .. something that is easy to do in hardware)
goes from 4 links to 10, and *very* large problems would fit on the machine.
P.S. We have a hypercube simulator that runs under 4.2bsd.
GENIX, ETHERNET, PRONET, AMD, and probably some I missed are trademarks
of several companies who should like the extra exposure. No statement
made herein should be construed to be official LANL drivel. All drivel
was drove by the undersigned.
--
Jim Blossom - dspo!blossom@LANL or {ucbvax!unmvax,ihnp4}!lanl!dspo!blossom
Los Alamos National Laboratory - E-10/Data Systems ms-k488 po box 1663
Los Alamos, New Mexico 87545 - (505) 667-9616