[comp.arch] Transputer based systems.

rsexton@uceng.UC.EDU (robert sexton) (09/20/88)

Being a fan of parallel systems and their advantages, I was wondering why
the transputer has not gotten off the ground as a viable system.  It seems
pretty feasible, as well as very cost-effective.  I imagine a machine with
several transputers, each running unix.  When the machine is lightly loaded,
every user gets a processor, maybe more; when it's heavily loaded, the users
have to share processors.  Admittedly, there are obstacles in the areas
of shared memory, shared storage, and general parallelization.  The first
two are pretty simple to overcome, but the third seems to show no signs of
going away.  It seems, however, that by mapping tasks onto processors, we
could get a pretty flexible system right now.  When you run out of power,
you just add more processors.  A system with 64 transputers could
theoretically provide 16 times the floating point performance
of a VAX 8650, for approximately $64000.  Admittedly these ponderings
are largely wishful thinking, but the price/performance could be incredible.
Natural applications would be ray tracing, fluid flow, etc.
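
Unpacking those numbers (a quick sketch in C that just rearranges the
figures above, nothing more):

    #include <stdio.h>

    int main(void)
    {
        int    nodes   = 64;
        double speedup = 16.0, price = 64000.0;

        /* 16x a VAX 8650 across 64 nodes for $64000 total implies: */
        printf("per node: %.2f VAX-8650 FP equivalents at $%.0f each\n",
               speedup / nodes, price / nodes);    /* 0.25 at $1000 */
        return 0;
    }

In other words, the wishful part is each $1000 transputer delivering a
quarter of an 8650's floating point.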
Thanks in advance for your input.


Robert Sexton, University of Cincinnati
rsexton@uceng.uc.edu tut.cis.ohio-state.edu!uccba!uceng!rsexton
Box Full O' Transputers... The Breakfast with MIPS
I do not speak for UC, They don't speak for me.

fpst@hubcap.UUCP (Steve Stevenson) (09/21/88)

From article <253@uceng.UC.EDU>, by rsexton@uceng.UC.EDU (robert sexton):
>....transputers...

The T-series hypercube from FPS is transputer based.  The glitch there
was religion: Occam or nothing.  INMOS was, until recently, adamant
about it.  Our T and our Levco system (hung off a Mac) are research systems
here at Clemson.  Contact Dan Warner: (warner@hubcap.clemson.edu)
-- 
Steve Stevenson                            fpst@hubcap.clemson.edu
(aka D. E. Stevenson),                     fpst@prism.clemson.csnet
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell

dre%ember@Sun.COM (David Emberson) (09/22/88)

In my former life in hypercube-land, I evaluated the Transputer (the older
one, the T414) and Occam.  The Transputer isn't a bad machine, except that
it has no memory management, no protection of any kind, and no supervisor
mode.  This is probably why you don't see Transputer systems running Unix.
Occam and its associated editor and tools are totally unusable.  Among
other things, the language had no pointers, interprocessor communication
was point-to-point only, the language was white-space dependent (!), and
there were other sins too numerous to list.
Every time we would complain, the poor local technical rep from Inmos would
say something about the great new version just on the horizon.  One time, one
of the big honchos from England (I forget the name but he was one of the top
architects) came through on a U.S. tour.  I spent three hours arguing with
him about giving me the details of the assembly language--which at that time
they did not want to make public.  He made a remark that he did not understand
why "we Americans" were so interested in the machine dependent details.  "In
Europe, no one asks us these questions, and they are satisfied with Occam."
I finally said something like, "If I don't know how the chip works, I sure as
hell am not going to design it into my machine."  End of conversation.

Inmos finally did come around and publish an assembler manual, and a couple of
companies are making Transputer-based machines.  One company makes a four-cpu
board that plugs into the Sun backplane.  I think you can have up to 16 cpus.
I don't know about the programming environment, but I think they have C.
Sorry about not having the name, but I seem to have misplaced the literature.
It's probably buried under heaps of much more interesting Sparc stuff... :-)

			Dave Emberson (dre@sun.com)

bs@linus.UUCP (Robert D. Silverman) (09/22/88)

In article <253@uceng.UC.EDU> rsexton@uceng.UC.EDU (robert sexton) writes:
>Being a fan of parallel systems and their advantages, I was wondering why
>the transputer has not gotten off the ground as a viable system.  It seems
>pretty feasible, as well as very cost-effective.  I imagine a machine with
>several transputers, each running unix.  When the machine is lightly loaded,
 
 
stuff deleted.

We have just been through a major decision process where we chose a parallel
computer. We discarded the transputer for several reasons:

(1) SLOW communication, relative to the IPSC/2 and AMETEK

(2) Lack of software; e.g. good debugging tools, compilers, etc.

(3) Too heavy a dependence on OCCAM

(4) Speed. The IPSC/2 and AMETEK have faster processors and allow for
MERCURY type floating point vector boards as nodes

(5) Uncertainty as to whether the transputer will last as a viable product.

(6) Lack of third party software.

These are just a few of the reasons.

Bob Silverman

pase@ogccse.ogc.edu (Douglas M. Pase) (09/23/88)

Actually, the Transputer has found its way into several commercial products.
I understand it is especially popular in Europe.  Meiko (?) makes a computing
surface built from transputers which does certain modeling and graphics
applications very well, and at low cost.  The FPS T-Series (one tremendous
Mega Flop) is (was) based on the transputer.  Cogent Research also has a
wonderful machine which uses multiple transputers.  The Transputer's on-chip
floating point circuitry and 4-Kbyte memory (yes, also on chip) mean it can
really scream for the right applications.

It will probably be a while before the Transputer is as ubiquitous as the
80x86 or the Motorola 680x0, but it's not doing poorly.
-- 
Douglas M. Pase				Department of Computer Science
tektronix!ogccse!pase			Oregon Graduate Center
pase@cse.ogc.edu (CSNet)		19600 NW Von Neumann Dr.
(503) 690-1121 x7303			Beaverton, OR  97006-1999

aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)

----------
These are the NSIEVE (Sieve of Eratosthenes) results I have at this time.
I have also updated NSIEVE.c.  I added 'free(ptr)' to the SIEVE() routine;
the program was not freeing allocated memory previously.  I added error
checks based on the number of primes found for each array size, and the
program will no longer bomb if 'malloc()' returns a null pointer.  I also
added a timer routine for Microsoft C.  I didn't change the Unix timing
routines, as I think it is probably better to have the user confirm/input
the right 'HZ' value, which is usually given in the 'times()' documentation.
Also, while <sys/param.h> should contain the right 'HZ' or 'COUNTS' value,
this may not always be the case (neither HZ nor COUNTS was defined on our
system, so I had to input it anyway).  Sorry about the 'Primes/sec' output,
but some people seem to prefer it over just the RunTime output, so there
is a 'Primes/sec' output now (calculated as
Primes/sec = 1899 / ( Average RunTime(sec) ) ).  I'll repost NSIEVE next week.
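
For reference, here is a minimal sketch in C of the kernel in the spirit
of the changes above (illustrative only -- the names and driver are mine,
not the actual NSIEVE.c).  With the 8191-byte array it finds the 1899
primes used in the Primes/sec formula:

    #include <stdio.h>
    #include <stdlib.h>

    static long sieve(long size)
    {
        char *flags = malloc(size);
        long i, k, prime, count = 0;

        if (flags == NULL)        /* don't bomb on a failed malloc() */
            return -1;
        for (i = 0; i < size; i++)
            flags[i] = 1;
        for (i = 0; i < size; i++) {
            if (flags[i]) {       /* flags[i] stands for the odd number 2i+3 */
                prime = i + i + 3;
                for (k = i + prime; k < size; k += prime)
                    flags[k] = 0; /* strike out the multiples */
                count++;
            }
        }
        free(flags);              /* release the array, per the fix above */
        return count;
    }

    int main(void)
    {
        printf("%ld primes for the 8191-byte array\n", sieve(8191L));
        return 0;
    }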

NSIEVE (Scaled to 10 Iterations):
Array Size   --------------------RunTime(sec)----------------------------
 (Bytes)       1         2         3         4         5           6
             Amdahl    Amdahl    McCray     MIPS     McCray    Sun 3/280
              5890    5890-300E AMD 29000   R2000   AMD 29000    68020
	      (gcc)     (cc)     BTC ON     M/120    BTC OFF      (cc)

    8191      0.033     0.050     0.116     0.130     0.183       0.267
   10000      0.050     0.083     0.150     0.150     0.200       0.300
   20000      0.117     0.133     0.300     0.320     0.450       0.650
   40000      0.200     0.300     0.616     0.630     0.900       1.333
   80000      0.483     0.683     1.233     1.270     1.816       2.917
  160000      1.200     1.533     2.633     2.580     3.833       7.833
  320000      2.583     3.333     5.300     5.570     7.680      17.600

  Average RunTime With Respect to the 8191 size array:
	      0.049     0.067     0.126     0.131     0.185       0.315
  Primes/sec:
	      38755     28343     15071     14496     10265        6029



Array Size ----------------------RunTime(sec)------------------------------
 (Bytes)       7              8          9           10          11
            VAX 8600     Turbo-Amiga    Amiga       Z-248       Z-248
	   (12.5 MHz)    (14.32 MHz)  (7.16 MHz)  (8.00 MHz)  (8.00 MHz)
			    68020       68000       80286       80286
                                                   (small)     (huge)
    8191      0.267         0.480       2.297       4.830       5.660
   10000      0.383         0.582       2.801       5.930       6.970
   20000      0.800         1.180       5.699      12.030      14.170
   40000      1.767         2.359      11.539      24.380      28.670
   80000      3.800         4.820      23.340      ------      ------
  160000      8.167         9.726      47.180      ------      ------
  320000     17.733        19.660      95.262      ------      ------

  Average RunTime With Respect to the 8191 size Array:
	      0.362         0.489       2.362       4.902       5.761
  Primes/sec:
	       5245          3883         804         387         330


 (1) Amdahl 5890, Using GCC (compiled with 'gcc -S -O -DUNIX nsieve.c').
     From Chuck Simmons at Amdahl,  Sunnyvale CA.

 (2) Amdahl 5890-300E, SYS V Unix, cc -O nsieve.c
     From Chuck Simmons at Amdahl,  Sunnyvale CA.
 
 (3) AMD 29000 at 25 MHz.  Branch Target Cache (BTC) was ON.  Metaware 
     High C 29000 V2.1 with -O option. No effective memory wait states.
     Memory was all physical (i.e., no caching).
     From Trevor Marshall, BIX 'supermicros/bench #925', 07 Sep 1988.

 (4) MIPS R2000 in M/120, 16.7 MHz, 128K Cache, low-latency memory system.
     From John Mashey at MIPS, Sunnyvale CA.

 (5) AMD 29000 at 25 MHz.  Branch Target Cache (BTC) was OFF.  Metaware
     High C 29000 V2.1 with -O option. No effective memory wait states.
     Memory was all physical (i.e., no caching).

 (6) SUN 3/280, 68020 at 25 MHz.  Compiled with 'cc -O nsieve.c'.  The 
     ICache was ON.

 (7) VAX 8600, 12.5 MHz.  Compiled with 'cc -O nsieve.c'.

 (8) Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz.  Compiled 
     with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'.  The ICache
     was ON.

 (9) Amiga with 68000 at  7.16 MHz, 16-bit memory at  7.16 MHz.  Compiled
     with Manx Aztec C V3.4B using 'cc +L +ff nsieve.c'.

(10) Zenith Z-248, 80286 at 8.00 MHz.  Turbo C with 'small' option set.
     Compiled for 'speed'.  Used registers, register optimization, and
     jump optimization.

(11) Zenith Z-248, 80286 at 8.00 MHz.  Turbo C V1.0 with 'huge' option set.
     Compiled for 'speed', used registers, register optimization, and jump
     optimization.

Al Aburto.
aburto@marlin.nosc.mil.UUCP
'ala' on BIX

hankd@pur-ee.UUCP (Hank Dietz) (09/23/88)

David Emberson's comments sum it up nicely; however, as someone who has seen
much of the updated Transputer stuff, I feel obliged to add a few quick
comments:

The newer Occam isn't compatible with the old one...  and the compiler still
lives in its own little world, which isn't very pleasant if you are used to
something else (like a unix environment and editors like emacs).  Inmos and
the Transputer-using world are still encouraging Occam as THE language, with
other languages compiling into it (I don't know if that's what the C compiler
does...  I've never managed to find a copy of it).  As for code quality,
well, I've seen no indication that Occam is doing anything particularly
interesting or clever (I'm an optimizing/parallelizing compiler person :-).

As before, an Occam program is not a complete program unless accompanied by
a description of the physical connection pattern; communication isn't
arbitrary point-to-point, but physical-neighbor point-to-point only.  There
is no standard way to alter the physical connection pattern.  I've talked
with a few folks from Inmos about us (Purdue EE) doing a software-implemented
(interrupt-driven) shared-memory environment managed by compiler-driven
cache techniques, and they sound interested, but they have yet to really
move on it.  I've gotten the same response David got: they claim that Occam
and the connection scheme are basically features to build upon, not handicaps
to overcome.

There are LOTS of companies making little (4-16 processor) Transputer
stick-it-in-there or hang-it-off-that type products, but I don't know of any
general-purpose machine claiming to use Transputers without a host system
which uses another processor.

							-hankd

pauls@nsc.nsc.com (Paul Sweazey) (09/23/88)

[The company whose name Dave Emberson misplaced] is called Niche Data
Systems.  I know a marketeer there named Doug Van Leuven, at 408-730-8963.
He probably has free info and lots of stamps.

pauls

bcase@cup.portal.com (09/24/88)

>These are the NSIEVE (Sieve Of Eratosthenes) results I have at this time.
 
> (3) AMD 29000 at 25 MHz.  Branch Target Cache (BTC) was ON.  Metaware 
>     High C 29000 V2.1 with -O option. No effective memory wait states.
>     Memory was all physical (i.e., No cacheing).
>     From Trevor Marshall, BIX 'supermicros/bench #925', 07 Sep 1988.

Well, "no effective memory wait states" is kinda misleading.  The data
memory access time for this board is two clock cycles; now, maybe this
latency is always overlapped in this benchmark, thus prompting the
comment "no effective memory wait states," but that doesn't change the
implementation details!  Also, the instruction memory has zero wait
states (I *HATE* this damn term, but we're stuck with it) most of the
time, but it can have anywhere from 1 cycle (zero "wait states"; see
why I hate this term?) to 5 cycle latency, depending on circumstances
surrounding branches and static column alignment, page boundaries, etc.
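
To put a number on that, here is the arithmetic with an invented fetch mix
(the percentages are purely illustrative, not measurements of the McCray
board):

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical: 90% of fetches at 1 cycle ("zero wait states"),
           10% at 5 cycles (branch targets, page boundaries, etc.) */
        double avg = 0.90 * 1.0 + 0.10 * 5.0;

        printf("average fetch latency: %.1f cycles\n", avg);   /* 1.4 */
        return 0;
    }

Even a small fraction of 5-cycle fetches pulls the average well away from
the 1-cycle ideal.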

The point I am trying to make is that the McCray board is neither a
"hot box" nor a system with caches.  The 29000 would do better on this
benchmark if it had the advantage of caches like those of the other
systems.  (Note that the current implementation of the 29000 has a
bug:  the BTC doesn't always work right.  This is the reason for the
inclusion of two 29000 times in the NSIEVE stats.)

What is the number for the 25 MHz R3000 box?  Is it close to the Amdahl?

johnwe (John Weber, Celtic sysmom) (09/24/88)

In comp.arch, rsexton@uceng.UC.EDU (robert sexton) discourses on Transputer based systems, thusly:
> Being a fan of parallel systems and their advantages, I was wondering why
> the transputer has not gotten off the ground as a viable system.  It seems
> pretty feasible, as well as very cost-effective.  I imagine a machine with
> several transputers, each running unix.

	The major problem with transputers in a multi-user (UN*X-ish OS)
	environment is the complete lack of memory management, or any
	provision for external memory management.  While this is all well
	and good for a single-user PC, where nobody really cares if one
	process stomps another, it is not really acceptable for a
	multiuser system or a system on a network.  If a process breaks
	in just the right way, it could take out the whole network.
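
	With no MMU there is nothing to trap a stray store.  The address
	below is hypothetical, but any process could do this to any
	other's memory:

	    int *p = (int *) 0x80001000;  /* hypothetical: data belonging */
	    *p = 0;                       /* to some other process -- no
	                                     fault, just silent damage    */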

	There is also the problem that a user can control task switching,
	and can effectively shut it off with a little thought.  This is
	also a bad thing.  The processor also has a few design choices
	which I personally feel a bit uncomfortable with, such as the lack
	of a barrel or funnel shifter, combined with a bit of a problem in
	the microcode which causes the processor to hang for a LONG time
	if you try to left shift by 7fffffffh, and distinctly too few
	registers.  I also have a bit of a problem with the message security.

	On the other hand, they are wonderful chips for parallel processors,
	controllers, and PCs.  Massively fast, hardware multitasking, and
	other wonderful things.  Basically, in any application where
	interprocess security is not needed, they are great.

> Thanks in advance for your input.
> 

	No prob...

> 
> Robert Sexton, University of Cincinnati
> rsexton@uceng.uc.edu tut.cis.ohio-state.edu!uccba!uceng!rsexton
> Box Full O' Transputers... The Breakfast with MIPS
> I do not speak for UC, They don't speak for me.


-- 

"In the fields of Hell,               John Weber, ...!uunet!sco!johnwe   
 where the grass grows high,              @ucscc.ucsc.EDU:johnwe@sco.COM 
 are the graves of dreams,                                               
 allowed to die."  -- Author unknown   Celtic sysmom  with an ATTITUDE!  

 Any opinions expressed are my own, and bear no relationship to those
 of my employers, to the best of my knowledge.

mac3n@babbage.acc.virginia.edu (Alex Colvin) (09/27/88)

In article <40211@linus.UUCP>, bs@linus.UUCP (Robert D. Silverman) writes:
> We have just been through a major decision process where we chose a parallel
> computer. We discarded the transputer for several reasons:
> 
> (1) SLOW communication, relative to the IPSC/2 and AMETEK


I'm surprised at this.  I thought communication was one of the transputer's
strengths.  My understanding was that the effective rate is 10 Mb/s, with
only a few usec of latency for small messages.
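
For scale, here is a back-of-envelope message time in C at that rate.  The
framing overhead (11 bits on the wire per data byte, plus a 2-bit
acknowledge) is my recollection of the link protocol, so treat it as an
assumption:

    #include <stdio.h>

    int main(void)
    {
        double link_bps = 10e6;             /* 10 Mbit/s link            */
        double bits_per_byte = 11.0 + 2.0;  /* data packet + acknowledge */
        long   msg = 64;                    /* a short message, in bytes */
        double usec = msg * bits_per_byte / link_bps * 1e6;

        printf("%ld bytes: %.1f usec, %.2f Mbyte/s effective\n",
               msg, usec, msg / usec);      /* 83.2 usec, 0.77 Mbyte/s */
        return 0;
    }

So the "few usec" figure can only cover the startup latency; for anything
beyond a few bytes the per-byte cost dominates.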

How do transputers fare where the topology and application are fixed,
messages are short, and the critical factor is end-to-end delay?  What are
the limits on link distance, and is there some kind of repeater (besides
another transputer)?