[comp.sys.transputer] Re 64K Node Machine

HALLAM@physics.oxford.ac.uk ("Phillip M. Hallam-Baker") (03/16/90)

At Southampton there is a big(ish) array of 1260 T212s without external
RAM (called `Deep Thought' as each board has 42 nodes...). There was talk
at one time of upgrading the RAM to 64K per node - either on that machine
or on a similar T2 engine - so perhaps someone garbled the message
somewhere? Quite what can be done with a 64K NODE machine I don't know -
surely the link speed would start to be a problem? If not that, how about
the physical cooling/mounting engineering-type problems? Where to get the
256 Gigabytes of RAM to make it worthwhile? Sounds like a fun project! -
Anybody out there want to write me a blank cheque to build one?

			Phillip Hallam-Baker


				Oxford University Nuclear Physics
				ZEUS group 

"You don't have to write in FORTRAN to work here - but it helps."

zenith-steven@cs.yale.edu (Steven Ericsson Zenith) (03/16/90)

In article <1828.9003152120@prg.oxford.ac.uk>,
HALLAM@physics.oxford.ac.uk ("Phillip M. Hallam-Baker") writes:
> 
> At Southampton there is a big(ish) array of 1260 T212s without external
> RAM (called `Deep Thought' as each board has 42 nodes...). There was talk
> at one time of upgrading the RAM to 64K per node - either on that machine
> or on a similar T2 engine - so perhaps someone garbled the message
> somewhere?

An interesting machine, born of a fortuitous error.
Someone bonded a large number of devices incorrectly
in their packages (rotated through either 90 or 180
degrees). Guy Harriman had special boards made to
accommodate the error.

> Quite
> what can be done with a 64K NODE machine I don't know - surely the link
> speed would start to be a problem? If not that, how about the physical
> cooling/mounting engineering-type problems? Where to get the 256 Gigabytes
> of RAM to make it worthwhile? Sounds like a fun project!

Now here's an interesting question. What would the
characteristics of such a machine be? Remember, there
is only 2 Kbytes per node, fixed configuration.
I figure the only useful way to program such a machine
would be of the "load problem, compute, unload solution"
variety, perhaps with an exchange with nearest neighbour
in there somewhere. But then I'm sure that for most problems
2K is just not going to be enough. Anyone know better?
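
For concreteness, here is the sort of node process I have in mind -
a minimal sketch in occam 2, with invented channel names and sizes
and a stand-in for the real computation. Everything, code included,
has to fit in the 2K:

-- Minimal sketch: load problem, compute with a nearest-neighbour
-- exchange each step, unload solution. Sizes are invented: two
-- 512-byte arrays leave about 1K for code and workspace.
VAL INT n IS 256:                     -- 256 16-bit words = 512 bytes
PROC node (CHAN OF INT from.host, to.host, to.nbr, from.nbr)
  [n]INT data, nbr:
  SEQ
    SEQ i = 0 FOR n                   -- load problem
      from.host ? data[i]
    SEQ step = 0 FOR 16               -- compute / exchange phases
      SEQ
        PAR                           -- send and receive in PAR so a
          SEQ i = 0 FOR n             -- pair of neighbours can't deadlock
            to.nbr ! data[i]
          SEQ i = 0 FOR n
            from.nbr ? nbr[i]
        SEQ i = 0 FOR n               -- stand-in "compute": average
          data[i] := (data[i] PLUS nbr[i]) >> 1
    SEQ i = 0 FOR n                   -- unload solution
      to.host ! data[i]
: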

>- Anybody out there want to
> write me a blank cheque to build one?

Make that cheque out to the two of us!

> 
> 			Phillip Hallam-Baker
> 
> 
> 				Oxford University Nuclear Physics
> 				ZEUS group 
> 
> "You don't have to write in FORTRAN to work here - but it helps."

"You don't have to write Occam to work here - and they'd rather you
didn't." sez.
--
.                                                                        .
Steven Ericsson Zenith              *            email: zenith@cs.yale.edu
Department of Computer Science      |            voice: (203) 432 1278
Yale University 51 Prospect Street New Haven CT 06520 USA.
     "All can know beauty as beauty only because there is ugliness"

roger@wraxall.inmos.co.uk (Roger Shepherd) (03/20/90)

In article <19312@cs.yale.edu> zenith-steven@cs.yale.edu (Steven
Ericsson Zenith) writes:

>In article <1828.9003152120@prg.oxford.ac.uk>,
>HALLAM@physics.oxford.ac.uk ("Phillip M. Hallam-Baker") writes:
>> 
>> At Southampton there is a big(ish) array of 1260 T212s without external
>> RAM (called `Deep Thought' as each board has 42 nodes...).....
>
>> what can be done with a 64K NODE machine I don't know - surely the link
>> speed would start to be a problem? If not that, how about the physical
>> cooling/mounting engineering-type problems? Where to get the 256 Gigabytes
>> of RAM to make it worthwhile? Sounds like a fun project!
>
> Now here's an interesting question. What would the characteristics of
> such a machine be? Remember, there is only 2 Kbytes per node, fixed
> configuration. I figure the only useful way to program such a machine
> would be of the "load problem, compute, unload solution" variety,
> perhaps with an exchange with nearest neighbour in there somewhere. But
> then I'm sure that for most problems 2K is just not going to be enough.
> Anyone know better?

This is a very interesting question that Steve has raised. The first
thing that I would note about such a machine is that 2K really is very
little store: it fills up very rapidly with program. When the 1260
transputer node machine was residing with us, one of our staffers,
Graham Cramp, programmed the machine to do primality testing of
Mersenne numbers. This proved to be a difficult problem due to the
lack of memory. Programs had to be written so as to minimise code
size: roll up all your loops, and try to build programs to deal with
the general case even if it would be faster to separate the problem
out into disjoint special cases - in the spirit of the fragment below.
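
For instance (a made-up fragment, not Graham's code) - the rolled,
fully general loop below costs cycles on every element but only a
handful of bytes of code:

-- Hypothetical illustration: keep one general loop rather than
-- unrolling it or special-casing the first/last elements; on a 2K
-- machine the bytes matter far more than the cycles.
PROC scale ([]INT v, VAL INT k)
  SEQ i = 0 FOR SIZE v
    v[i] := v[i] TIMES k              -- TIMES: unchecked modulo multiply
: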
From my experience working with Graham on this problem I would suggest
the following (in addition to SEZ's suggestion) as plausible ideas.

1) Database retrieval. The machine has a reasonable amount of memory
(128 Mbyte). Giving half of this to program (generous) leaves 64
Mbyte. The search problem is compiled, either into t-code, or into
something which will be interpreted by a (small) program sitting on
each processor. This should work pretty well. The balance between
search time and problem distribution seems to be reasonable; the
processors can be organised into a 16-level binary tree, so a problem
needs to pass through only 16 levels of nodes (say 0.2 ms?). The
search time for a trivial linear search of 1 Kbyte is about 1 ms, and
you need to get the answer back (0.2 ms). As you make the lookup more
complex the communication cost becomes proportionately smaller.
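
To make the shape concrete, here is my guess at such a tree node in
occam 2 - the channel names, the match-counting protocol and the sizes
are all my invention. A real version needs a leaf variant without the
child channels, and a phase to load the table:

-- Query goes down the 16-level binary tree; each node scans its
-- local kilobyte in parallel with its subtrees; match counts merge
-- on the way back up.
VAL INT n IS 512:                     -- 1 Kbyte = 512 16-bit words
PROC tree.node (CHAN OF INT from.parent, to.parent,
                [2]CHAN OF INT to.child, from.child)
  [n]INT table:
  WHILE TRUE
    INT key, matches, left, right:
    SEQ
      from.parent ? key               -- query arrives from above
      PAR i = 0 FOR 2                 -- fan out to both subtrees
        to.child[i] ! key
      matches := 0                    -- trivial local linear search
      SEQ i = 0 FOR n
        IF
          table[i] = key
            matches := matches + 1
          TRUE
            SKIP
      PAR                             -- gather the subtree counts
        from.child[0] ? left
        from.child[1] ? right
      to.parent ! (matches + left) + right
: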

2) Functional decomposition. In one application run on the Southampton
machine they used 2 processors as a compute node; this allowed them to
run larger programs. I suspect that for some applications this would
work very well.
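
The structure would presumably be something like the following (pure
guesswork on my part, occam 2, with trivial stand-in computations):
split the program across the link joining the pair, so that each
half's code and data fit in 2K, at the price of halving the node count:

PROC stage.a (CHAN OF INT in, link)
  WHILE TRUE
    INT x:
    SEQ
      in ? x
      link ! x PLUS x                 -- first half of the computation
:
PROC stage.b (CHAN OF INT link, out)
  WHILE TRUE
    INT x:
    SEQ
      link ? x
      out ! x TIMES x                 -- second half, on the partner T212
: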

When considering machines like this you really do need to rethink your
tradeoffs. For example, in an even more exotic case, we looked at using
silicon compilation to build a machine for generating (small, less than
16-bit) prime numbers. We had an architecture which used a number of very
simple processors which could perform division and which were handed out
work by a controlling processor. The controller passed out odd numbers
which were then tested for primality by the workers. The question is:
``Should you store previously computed primes so that the workers divide
candidates only by primes rather than by all odd numbers?'' For small
primes the answer is NO; the density of the primes among the odd numbers
is quite high for numbers of less than 16 bits. This has two implications:
firstly, there is not that much computation saved by dividing only by
prime numbers, and secondly, you need a large RAM to store those primes.
It turns out that it is much more efficient to use that silicon area to
build worker processors than to build a RAM. Of course, not many people
see that sort of economics!
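
For anyone who wants to check the density claim: a sieve below 2^16
finds 6541 odd primes among the 32768 odd numbers, i.e. roughly one odd
number in five is prime. In occam 2 - though note that INT is only 16
bits on a T2, so this little program wants a 32-bit transputer:

VAL INT limit IS 65536:
PROC count.odd.primes (CHAN OF INT result)
  [limit]BOOL composite:
  INT j, count:
  SEQ
    SEQ i = 0 FOR limit
      composite[i] := FALSE
    SEQ i = 2 FOR limit - 2           -- standard sieve of Eratosthenes
      IF
        composite[i]
          SKIP
        TRUE                          -- i is prime: mark its multiples
          SEQ
            j := i + i
            WHILE j < limit
              SEQ
                composite[j] := TRUE
                j := j + i
    count := 0
    SEQ i = 3 FOR limit - 3           -- count the odd primes
      IF
        (NOT composite[i]) AND ((i \ 2) = 1)
          count := count + 1
        TRUE
          SKIP
    result ! count                    -- 6541
: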


Roger Shepherd, INMOS Ltd   JANET:    roger@uk.co.inmos 
1000 Aztec West             UUCP:     ukc!inmos!roger or uunet!inmos-c!roger
Almondsbury                 INTERNET: roger@inmos.com
+44 454 616616              ROW:      roger@inmos.com OR roger@inmos.co.uk