blossom@dspo.UUCP (03/19/85)
/* This line's for you..... */

Our group is currently in the advanced architectural phase of a
hypercube-topology ensemble machine design. We are interested in any
suggestions, correspondence, flames, etc. I will list the primary
specs for the machine below, along with a good path for mail, ad
hominem attacks, etc. If there is sufficient response, and I feel up
to it, I will summarize the former to net.arch and the latter to
/dev/null.

Machine Architecture:

1) It will be a homogeneous hypercube (i.e. all nodes will be
   identical, and will be interconnected via hypercube topology).
2) It will be a distributed-memory machine (each node will have its
   own memory).
3) It will be an order-10 hypercube (up to 1024 nodes, although 16
   will be built initially).

Node specs:

1) 32032 full chip set (MMU, FPU, CPU, ICU, TCU). We've built this
   chip set into a couple of different designs and feel comfortable
   with it. It should be adequate for the job we intend it to do:
   bookkeeping, handholding, etc. We also plan to put Genix 4.2bsd
   on each node to enhance the development environment. We will pare
   back any parts of the OS that are superfluous or tend to slow
   things down. (Sorry folks .. no hyperhack.)
2) Two AMD 29325 floating point units (10.0 Mflop each, pipelinable
   to 20.0 Mflop). These FPUs will have a fast track to memory,
   allowing two 32-bit operands to be moved to the FPU section, and
   one from it, every 100 ns. The FPU will have a fast local
   controller to sequence instructions and data.
3) 16 Mbytes RAM, with EDAC bits and local memory scrubbing. We are
   planning the nodes to be 'computationally fat', so a large local
   memory is needed for many of the intended problems.
4) A LAN interface (probably ETHERNET, or maybe PRONET, although we
   have more experience with ETHERNET) used to access the cube as a
   linear array for global messages, trouble-shooting,
   initialization, etc.
5) A hard disk controller and 512 Mbytes of disk LOCAL TO THE NODE
   (this turns out to be extremely important for many problems).
6) Order-10 communications (10 Hyperlinks designed to pass messages
   through the node without disturbing local execution, and to keep
   up with the computational units for problems of interest). We
   will probably have some form of shared-memory buffering at each
   node-node connection, i.e. 2 kbytes at each end. Each node would
   then have 10 Hyperlinks, each with 2 kbytes of memory accessible
   from either end.

Machine Specs: (uncorrected for speedup, vectorization, and other
degradations, and scaled for a 16-node implementation)
--subject to change without notice, and void where prohibited by law--

--320 Mflops (again, red-lined in an unusual situation, for arm
  waving only)
--256 Mbytes (64 mega 32-bit words) RAM
--8 Gbytes disk
--16 Mips integer performance (for fully concurrent operation only)

For the 1024-node machine, scale the above by 64. Note that for this
machine the disk and memory bandwidths scale linearly, i.e. more
nodes mean greater total memory and disk bandwidth. The worst-case
path length (assuming optimal traversal of the cube .. something
that is easy to do in hardware) goes from 4 links to 10, and *very*
large problems would fit on the machine.

P.S. We have a hypercube simulator that runs under bsd4.2.

GENIX, ETHERNET, PRONET, AMD, and probably some I missed are
trademarks of several companies who should like the extra exposure.
No statement made herein should be construed to be official LANL
drivel. All drivel was drove by the undersigned.
--
Jim Blossom - dspo!blossom@LANL or {ucbvax!unmvax,ihnp4}!lanl!dspo!blossom
Los Alamos National Laboratory - E-10/Data Systems
ms-k488 po box 1663 Los Alamos, New Mexico 87545 - (505) 667-9616