[comp.arch] Does anyone know anything about the N-Cubed Hypercube?

kahn@batcomputer.tn.cornell.edu (Shahin Kahn) (05/26/90)

comp.parallel is another good (better?) place to post this question.
Lots of distributed-memory-experienced people there.

Parallelizing code for it can be easy, it can be difficult.  Probably not that easy.
Debugging can be fun.  
But it is probably a good experience.
There was a paper out of AT&T describing, among other things,
'special-purpose' hardware for biochemistry calculations.
Pretty fast, I remember.
It may be interesting to look at it.

Shahin.

csimmons@jewel.oracle.com (Charles Simmons) (06/02/90)

In article <1990May23.172140.17510@portia.Stanford.EDU>,
dhinds@portia.Stanford.EDU (David Hinds) writes:
> From: dhinds@portia.Stanford.EDU (David Hinds)
> Subject: Does anyone know anything about the N-Cubed Hypercube?
> Date: 23 May 90 17:21:40 GMT
> 
> 
>     It seems that a corporate sponsor is going to give my lab access to
> an "N-Cubed Hypercube" for computational biochemistry studies.  I don't
> know much about the machine  - it is supposed to be based on Intel 432-like
> processors, with 16MB of memory each, and up to 8192 CPU's.  The model we
> are getting will have only (!) 32.  Does anyone have any experience with
> these machines?  It seems that it would be difficult to parallelize code to
> make good use of a fully-distributed memory system like this.  How do the
> processors communicate?
> 
>  -David Hinds
>   dhinds@portia.stanford.edu

David --

Seeing as how my current favorite piece of hardware is an NCube,
allow me to correct a few small pieces of misinformation
(if they haven't been corrected already).

The name of the company is NCube.  Although the principal designer
of the processor worked for Intel back in the days of the 432, the
NCube processor looks a lot like a Vax, and has nothing in common
with the 432.

16 general purpose registers, each 64 bits long, can hold either
an integer or floating point value.  IEEE floating point.  There
are also PC and SP registers.  User and Supervisor states.  No paging.
There are 4 segments.
Addressing modes:  register direct, register indirect, immediate,
	predecrement, postincrement, offset+register, offset+register indirect,
	indexed, direct, and a couple of others.

Each chip has 14 on-board communications channels.  Each channel is
two wires wide -- one input and one output.  13 of these channels
are used to create the hypercube -- hence the limit of 8192 (2^13) nodes.
The 14th channel is used to connect to one or more I/O nodes (in a
fashion that I don't really understand).  The communications hardware
uses wormhole routing and seems to be quite efficient.  I haven't had
to carefully position my processes on the 'cube to get efficient
communications.
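
A minimal sketch of the arithmetic behind that limit, assuming the
conventional hypercube numbering in which two nodes are wired together
when their addresses differ in exactly one bit.  The function names are
made up for illustration and are not part of any NCube software.

```python
# Illustrative only: the node numbering and these helper names are
# assumptions, not NCube's actual interface.
DIMENSIONS = 13  # hypercube channels per chip, as described above


def node_count(dimensions):
    """Number of nodes in a hypercube with this many dimensions."""
    return 1 << dimensions


def neighbors(node, dimensions):
    """Addresses that differ from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(dimensions)]


def hop_distance(a, b):
    """Minimum number of links between two nodes (differing address bits)."""
    return bin(a ^ b).count("1")


print(node_count(DIMENSIONS))  # 8192
print(neighbors(0, 5))         # [1, 2, 4, 8, 16] on a 32-node cube
print(hop_distance(0, 31))     # 5, the worst case on a 32-node cube
```

On a 32-node machine no two nodes are more than 5 links apart, which may
be part of why careful process placement matters less than one might expect.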

I disagree that it is difficult to make use of the distributed memory.
I do believe that it requires coming up with a good model for how you
want to break up an algorithm.
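
A minimal sketch of one such model, assuming a plain block decomposition
in which each node owns a contiguous slice of the work (say, rows of an
image).  The name block_range and the numbers are hypothetical; this is
only one of many ways to split a problem.

```python
def block_range(rank, nodes, items):
    """Half-open range [start, stop) of items owned by node `rank`.

    Items are split as evenly as possible; the first `items % nodes`
    nodes each take one extra item.
    """
    base, extra = divmod(items, nodes)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# 100 rows spread over a 32-node cube.
print(block_range(0, 32, 100))   # (0, 4)
print(block_range(31, 32, 100))  # (97, 100)
```

Each node can compute its own range from its node number alone, so no
communication is needed just to decide who does what.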

Our current machine also has 32 nodes, each with 16 MB of memory.  There
may be a couple of things you'd like to pick up from us.  We have ported
GCC (the GNU C compiler) to act as a cross compiler from a Sun to the 'cube.
Our GCC compiler is, in my opinion, about 30-50% faster than the compiler
you will get from NCube.  We do fun little things like pass arguments
in registers.  I've also got some fun Mandelbrot software that you might
want to have lying around.  The compute portion runs on the 'cube, using
a color Sun terminal running X as its output device.

-- Chuck

csimmons@jewel.oracle.com (Charles Simmons) (06/02/90)

In article <9399@pt.cs.cmu.edu>, lindsay@MATHOM.GANDALF.CS.CMU.EDU
(Donald Lindsay) writes:
> From: lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
> Subject: Re: Does anyone know anything about the N-Cubed Hypercube?
> Date: 24 May 90 18:26:57 GMT
> 
> In article <26336@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
> 	[ concerning the NCUBE-2 multiprocessor ]
> >does anyone out there know what if any MMU support the processors 
> >have? 
> >I am trying to get manuals and tech docs but it is taking time, and 
> >I am hearing conflicting reports ranging from "no MMU" to "full MMU".
> 
> There are no page faults. Each CPU has bounds registers and a supervisor
> state, but that's it. 
> 
> This from Stephen Colley's lips when I put up my hand and asked.
> -- 
> Don		D.C.Lindsay 	Carnegie Mellon Computer Science

Elaborating just a little bit on Don's comments...

The memory model is reasonably kinky:

Each user process has 5 "segments".  There is a "text" segment
(which is framed by the "user code base/length" registers) and
there are 4 data segments (which are framed by the "user data space
base/length" registers).  The really kinky thing is that the processor
accesses the "text segment" if you use pc-relative addressing.  Otherwise
a data segment is referenced.  Thus, logical address zero can be either
in the text segment or in data segment zero depending on whether or
not you use pc-relative addressing.
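
A toy model of that behaviour, assuming a physical address is simply
segment base plus logical address, checked against the segment length.
Only the text segment and data segment zero are modelled, since how an
address selects among the four data segments isn't covered here.  The
base and length values are invented.

```python
class Segment:
    """A region framed by a base/length register pair (toy model)."""

    def __init__(self, base, length):
        self.base = base
        self.length = length

    def translate(self, logical):
        # Assumed behaviour: trap on any address outside the segment.
        if not 0 <= logical < self.length:
            raise MemoryError("address outside segment bounds")
        return self.base + logical


def resolve(logical, pc_relative, text, data0):
    """Pick the segment by addressing mode, then translate."""
    segment = text if pc_relative else data0
    return segment.translate(logical)


text = Segment(base=0x1000, length=0x400)   # invented values
data0 = Segment(base=0x8000, length=0x200)  # invented values

print(hex(resolve(0, True, text, data0)))   # 0x1000 (text segment)
print(hex(resolve(0, False, text, data0)))  # 0x8000 (data segment zero)
```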

The operating system allows multiple user processes running on a single
processor node to share one or more segments.

And while we're on the subject...  Things are implemented ever-so-slightly
incorrectly.  It turns out that you cannot dynamically grow your stack
segment.  Basically, the reason for this is that the stack grows down,
but segments "grow" up.  So you get to specify your stack segment size
at program load time and then live with your choice thereafter.
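
A small sketch of why the two directions don't mix, assuming the stack
starts at the top of its segment and that growing a segment only raises
its length.  The addresses and the helper name can_push are invented.

```python
def can_push(base, stack_top, depth):
    """True if a stack `depth` bytes deep still fits in the segment.

    The stack occupies [stack_top - depth, stack_top), so its lowest
    address must not fall below the segment base.
    """
    return stack_top - depth >= base


base, length = 0x8000, 0x100   # invented values
stack_top = base + length      # stack starts at the top and grows down

print(can_push(base, stack_top, 0x100))  # True: exactly fills the segment
print(can_push(base, stack_top, 0x180))  # False: would run below base

# "Growing" the segment only adds addresses above the old top.
length += 0x100
print(can_push(base, stack_top, 0x180))  # False: base has not moved
```

The extra space lands above the stack's starting point, where a
downward-growing stack can never reach it.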

Ah!  The joys of programming on the cutting edge of technology.

-- Chuck