blossom@dspo.UUCP (03/19/85)
/* This line's for you..... */

Our group is currently in the advanced architectural phase of a
hypercube-topology ensemble machine design. We are interested in any
suggestions, correspondence, flames, etc. I will list the primary
specs for the machine below, along with a good path for mail, ad
hominem attacks, etc. If there is sufficient response, and I feel up
to it, I will summarize the former to net.arch and the latter to
/dev/null.

Machine Architecture:

1) It will be a homogeneous hypercube (i.e. all nodes will be
   identical, and will be interconnected via hypercube topology).
2) It will be a distributed-memory machine (each node will have its
   own memory).
3) It will be an order-10 hypercube (up to 1024 nodes, although 16
   will be built initially).

Node specs:

1) 32032 full chip set (MMU, FPU, CPU, ICU, TCU). We've built this
   chip set into a couple of different designs and feel comfortable
   with it. It should be adequate for the job we intend it to do:
   bookkeeping, handholding, etc. We also plan to put Genix 4.2bsd
   on each node to enhance the development environment. We will pare
   back any parts of the OS that are superfluous or tend to slow
   things down. (Sorry folks .. no hyperhack.)
2) Two AMD 29325 floating point units (10.0 Mflop each, pipelinable
   to 20.0 Mflop). These FPUs will have a fast track to memory,
   allowing two 32-bit operands to be moved to the FPU section, and
   one from it, every 100 ns. The FPU will have a fast local
   controller to sequence instructions and data.
3) 16 Mbytes RAM, with EDAC bits and local memory scrubbing. We are
   planning the nodes to be 'computationally fat', so a large local
   memory is needed for many of the intended problems.
4) A LAN interface (probably ETHERNET, or maybe PRONET, although we
   have more experience with ETHERNET) used to access the cube as a
   linear array for global messages, trouble-shooting,
   initialization, etc.
5) A hard disk controller and 512 Mbytes of disk LOCAL TO THE NODE
   (this turns out to be extremely important for many problems).
6) Order-10 communications (10 Hyperlinks designed to pass messages
   through the node without disturbing local execution, and to keep
   up with the computational units for problems of interest). We
   will probably have some form of shared-memory buffering at each
   node-node connection, i.e. 2 kbytes at each end. Each node would
   then have 10 Hyperlinks, each with 2 kbytes of memory accessible
   from either end.

Machine Specs: (uncorrected for speedup, vectorization, and other
degradations, and scaled for a 16-node implementation)
--subject to change without notice, and void where prohibited by law--

--320 Mflops (again, red-lined in an unusual situation, for arm
  waving only)
--256 Mbytes (64 mega 32-bit words) RAM
--8 Gbytes disk
--16 Mips integer performance (for fully concurrent operation only)

For the 1024-node machine, scale the above by 64. Note that for this
machine the disk and memory bandwidths scale linearly, i.e. more
nodes mean greater total memory and disk bandwidth. The worst-case
path length (assuming optimal traversal of the cube .. something
that is easy to do in hardware) goes from 4 links to 10, and *very*
large problems would fit on the machine.

P.S. We have a hypercube simulator that runs under bsd4.2.

GENIX, ETHERNET, PRONET, AMD, and probably some I missed are
trademarks of several companies who should like the extra exposure.
No statement made herein should be construed to be official LANL
drivel. All drivel was drove by the undersigned.
--
Jim Blossom - dspo!blossom@LANL or {ucbvax!unmvax,ihnp4}!lanl!dspo!blossom
Los Alamos National Laboratory - E-10/Data Systems
ms-k488 po box 1663 Los Alamos, New Mexico 87545 - (505) 667-9616