kahn@batcomputer.tn.cornell.edu (Shahin Kahn) (05/26/90)
comp.parallel is another good (better?) place to post this question. Lots of distributed-memory-experienced people there.

It can be easy, it can be difficult. Probably not that easy. Debugging can be fun. But it is probably a good experience.

There was a paper out of AT&T describing, among other things, a 'special-purpose' hardware with uses in biochemistry calculations. Pretty fast, I remember. It may be interesting to look at it.

Shahin.
csimmons@jewel.oracle.com (Charles Simmons) (06/02/90)
In article <1990May23.172140.17510@portia.Stanford.EDU>, dhinds@portia.Stanford.EDU (David Hinds) writes:
> From: dhinds@portia.Stanford.EDU (David Hinds)
> Subject: Does anyone know anything about the N-Cubed Hypercube?
> Date: 23 May 90 17:21:40 GMT
>
> It seems that a corporate sponsor is going to give my lab access to
> an "N-Cubed Hypercube" for computational biochemistry studies. I don't
> know much about the machine - it is supposed to be based on Intel 432-like
> processors, with 16MB of memory each, and up to 8192 CPU's. The model we
> are getting will have only (!) 32. Does anyone have any experience with
> these machines? It seems that it would be difficult to parallelize code to
> make good use of a fully-distributed memory system like this. How do the
> processors communicate?
>
> -David Hinds
> dhinds@portia.stanford.edu

David --

Seeing as how my current favorite piece of hardware is an NCube, allow me to correct a few of your small pieces of mis-information (if they haven't been corrected already). The name of the company is NCube.

Although the principal designer of the processor worked for Intel back in the days of the 432, the NCube processor looks a lot like a Vax, and has nothing in common with the 432.

16 general purpose registers, each 64-bits long, can hold either an integer or floating point value. IEEE floating point. There is also a PC and SP register. User and Supervisor states. No paging. There are 4 segments. Addressing modes: register direct, register indirect, immediate, predecrement, postincrement, offset+register, offset+register indirect, indexed, direct, and a couple of others.

Each chip has 14 on-board communications channels. Each channel is two wires wide -- one input and one output. 13 of these channels are used to create the hypercube -- thus the limitation on 8192 nodes. The 14th channel is used to connect to one or more I/O nodes (in a fashion that I don't really understand).
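
A minimal sketch of the arithmetic behind the 13-channel limit. It assumes the usual hypercube convention that nodes are numbered in binary and that channel k links two nodes whose ids differ only in bit k; the post does not say how NCube actually numbers its nodes or channels, and the function names here are made up for illustration.

```python
def max_nodes(channels):
    """Largest hypercube that can be wired with `channels` links per node."""
    return 1 << channels


def hypercube_neighbors(node, dimension):
    """Ids of the nodes directly linked to `node` in a `dimension`-cube.

    Assumption: flipping bit k of the id gives the neighbor on channel k.
    """
    if not 0 <= node < (1 << dimension):
        raise ValueError("node id out of range for this dimension")
    return [node ^ (1 << k) for k in range(dimension)]


def hop_count(src, dst):
    """Minimum number of links between two nodes (bits in which ids differ)."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    print(max_nodes(13))               # 13 hypercube channels per chip
    print(hypercube_neighbors(0, 5))   # a 32-node machine is a 5-cube
    print(hop_count(0, 31))            # opposite corners of the 5-cube
```

Under that numbering, a 32-node machine uses only 5 of the 13 hypercube channels on each chip, and no two nodes are more than 5 links apart.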
The communications hardware uses wormhole routing and seems to be quite efficient. I haven't had to carefully position my processes on the 'cube to get efficient communications.

I disagree that it is difficult to make use of the distributed memory. I do believe that it requires coming up with a good model for how you want to break up an algorithm.

Our current machine also has 32 nodes, each with 16 MB of memory. There may be a couple of things you'd like to pick up from us. We have ported GCC (the GNU C compiler) to act as a cross compiler from a Sun to the 'cube. Our GCC compiler is, in my opinion, about 30-50% faster than the compiler you will get from NCube. We do fun little things like pass arguments in registers.

I've also got some fun Mandelbrot software that you might want to have lying around. The compute portion runs on the 'cube, using a color Sun terminal running X as its output device.

-- Chuck
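
One possible "model for how you want to break up an algorithm", sketched for a Mandelbrot image on 32 nodes. This is not the Mandelbrot software mentioned above: the interleaved row split, the image bounds, and every function name are assumptions chosen for illustration, and the code runs each node's share sequentially in plain Python rather than on a 'cube.

```python
def rows_for_node(node, num_nodes, height):
    """Rows handled by `node` under an interleaved (cyclic) split."""
    return range(node, height, num_nodes)


def escape_time(c, max_iter=100):
    """Iterations of z = z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def compute_rows(node, num_nodes, width, height, max_iter=100):
    """Compute one node's share of the image as {row: [iteration counts]}.

    Each node needs only its own id and the image size, so no data is
    exchanged until the finished rows are sent to the display.
    """
    result = {}
    for y in rows_for_node(node, num_nodes, height):
        im = -1.5 + 3.0 * y / (height - 1)
        result[y] = [
            escape_time(complex(-2.0 + 3.0 * x / (width - 1), im), max_iter)
            for x in range(width)
        ]
    return result


if __name__ == "__main__":
    share = compute_rows(node=3, num_nodes=32, width=16, height=64)
    print(sorted(share))  # the rows node 3 would own
```

Interleaving rows rather than handing each node one contiguous band tends to even out the work, since rows near the set take far more iterations than rows far from it.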
csimmons@jewel.oracle.com (Charles Simmons) (06/02/90)
In article <9399@pt.cs.cmu.edu>, lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
> From: lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
> Subject: Re: Does anyone know anything about the N-Cubed Hypercube?
> Date: 24 May 90 18:26:57 GMT
>
> In article <26336@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
> [ concerning the NCUBE-2 multiprocessor ]
> >does anyone out there know what if any MMU support the processors
> >have?
> >I am trying to get manuals and tech docs but it is taking time, and
> >I am hearing conflicting reports ranging from "no MMU" to "full MMU".
>
> There are no page faults. Each CPU has bounds registers and a supervisor
> state, but that's it.
>
> This from Stephen Colley's lips when I put up my hand and asked.
> --
> Don D.C.Lindsay Carnegie Mellon Computer Science

Elaborating just a little bit on Don's comments...

The memory model is reasonably kinky: Each user process has 5 "segments". There is a "text" segment (which is framed by the "user code base/length" registers) and there are 4 data segments (which are framed by the "user data space base/length" registers).

The really kinky thing is that the processor accesses the "text segment" if you use pc-relative addressing. Otherwise a data segment is referenced. Thus, logical address zero can be either in the text segment or in data segment zero depending on whether or not you use pc-relative addressing.

The operating system allows multiple user processes running on a single processor node to share one or more segments.

And while we're on the subject... Things are implemented ever-so-slightly incorrectly. It turns out that you cannot dynamically grow your stack segment. Basically, the reason for this is that the stack grows down, but segments "grow" up. So you get to specify your stack segment size at program load time and then live with your choice thereafter.

Ah! The joys of programming on the cutting edge of technology.

-- Chuck
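
A toy model of the base/length scheme described above, to show how the same logical address can land in two places. It is a sketch under assumptions, not the NCube hardware: it models only the text segment and data segment zero (the posts do not say how one of the four data segments is selected), and the class, function, and address values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base: int    # physical address where the segment starts
    length: int  # segment size; valid offsets are 0 .. length - 1


class BoundsError(Exception):
    """Raised when an offset falls outside its segment."""


def translate(offset, text, data, pc_relative):
    """Map a logical offset to a physical address.

    pc-relative accesses go through the text segment's base/length pair;
    every other access goes through the data segment's pair.
    """
    seg = text if pc_relative else data
    if not 0 <= offset < seg.length:
        raise BoundsError("offset %#x outside segment" % offset)
    return seg.base + offset


if __name__ == "__main__":
    text = Segment(base=0x1000, length=0x400)   # made-up values
    data = Segment(base=0x8000, length=0x200)   # made-up values
    print(hex(translate(0, text, data, pc_relative=True)))
    print(hex(translate(0, text, data, pc_relative=False)))
```

In this model, raising `length` only adds valid offsets at the top of a segment. A stack that has run down to offset 0 gains nothing from that, which is one way to read the remark that the stack grows down while segments "grow" up.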