[comp.sys.misc] Atari Transputers ?

ccplumb@watmath.waterloo.edu (Colin Plumb) (01/01/70)

[I've added comp.sys.transputer to this discussion.  For those who
haven't seen this before, someone posted rumours about an Atari
Transputer box.  Then a discussion about a Transputer's power got
started.  Inmos claims 10 MIPS, but those are Transputer MIPS, which
are about 1/2 to 1/3 of RISC MIPS, which are perhaps 1/2 of CISC MIPS.
(Using VAX 11/780 as a guide - some figures I've seen say the canonical
1 MIPS machine actually executes .5 native MIPS!)]
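
As a rough sketch of what those ratios imply (the ratios are only the
ballpark figures quoted above, not measurements, and the names below
are made up for illustration):

```python
# Convert the claimed Transputer rating into VAX-style (CISC) MIPS
# using the ballpark ratios from the note above.
TRANSPUTER_MIPS = 10.0               # Inmos's claimed figure
TRANSPUTER_TO_RISC = (1 / 3, 1 / 2)  # "about 1/2 to 1/3 of RISC MIPS"
RISC_TO_CISC = 1 / 2                 # "perhaps 1/2 of CISC MIPS"


def cisc_equivalent_range(native_mips):
    """Return (low, high) CISC-equivalent MIPS for a Transputer rating."""
    low, high = (native_mips * r * RISC_TO_CISC for r in TRANSPUTER_TO_RISC)
    return low, high


low, high = cisc_equivalent_range(TRANSPUTER_MIPS)
print(f"{low:.2f} to {high:.2f} CISC-equivalent MIPS")
```

On those assumptions the claimed 10 MIPS comes out somewhere around
1.7 to 2.5 VAX-style MIPS.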

On the original subject:

I know Atari has wangled a deal with Inmos to get 20 MHz T414
Transputers for $50.00 (U.S.) apiece.  I also know they're paying Tim
King (the man who brought you Tripos!  Beware!) significant amounts of
money to develop Helios.  I have a preliminary spec for Helios around
here somewhere.

The current debate:

tim@amdcad.UUCP (Tim Olson) writes:
>>For example, the transputer is a stack machine.  To perform the sequence
>>
>>	a = b+c;
>>
>>(assuming a,b, and c are register variables) requires 4 instructions:
>>
>>	push b
>>	push c
>>	add
>>	pop a
>>
>>while on the 68000 it requires 2:
>>
>>	mov b, a
>>	add c, a
>>
>>and on many RISC machines it requires only 1:
>>
>>	add	a, b, c

alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>That last example is a good example of a VAX-class CISC machine, not
>of any RISC machine I know of.  RISC code would probably be identical
>to the 68000 example (which is more of a RISC machine than it is given
>credit for--just wait until the 78000 is announced and you'll see what
>I mean).

Not on Tim's baby!  AMD's Am29000 uses 3-operand instructions.  So, I
might add, did the RISC II prototype chip and, I believe, the original
RISC I.

>Also, assuming that the values (a, b, c) are already in registers
>obscures the difference between RISC and 68000.  Assuming memory values
>gives:

First, Tim said "assuming register variables".  The trick is, the
Transputer hasn't got any registers in the usual sense of the word.
All it's got is a 3-word stack to evaluate expressions on (rather like
most HP calculators, only HP gives 4 words).  What the transputer
*does* have, however, is *very* fast stack-relative addressing.  The
first 16 words can be accessed using one-byte instructions, and are the
Transputer's "registers".
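
A toy model of that arrangement, in Python rather than Transputer
code: a three-word evaluation stack plus a 16-word workspace of
locals.  The class and variable names are invented for illustration,
and the instruction names (ldl, stl, add) are used here only as
labels for load-local, store-local and add.  It is a sketch of the
idea, not a model of the real chip's timing or encoding.

```python
class TinyStackMachine:
    """Toy model: 3-word evaluation stack (a, b, c) plus workspace locals."""

    def __init__(self, workspace):
        self.a = self.b = self.c = 0       # three-deep evaluation stack
        self.workspace = list(workspace)   # locals addressed by small offsets
        self.executed = 0                  # instructions executed so far

    def _push(self, value):
        # Pushing shifts a -> b -> c; whatever was in c is lost.
        self.c, self.b, self.a = self.b, self.a, value

    def ldl(self, offset):
        """Load a local onto the evaluation stack."""
        self._push(self.workspace[offset])
        self.executed += 1

    def add(self):
        """Replace the top two stack words with their sum."""
        self.a, self.b = self.b + self.a, self.c
        self.executed += 1

    def stl(self, offset):
        """Pop the top of the stack into a local."""
        self.workspace[offset] = self.a
        self.a, self.b = self.b, self.c
        self.executed += 1


# Workspace offsets standing in for the variables a, b and c.
VAR_A, VAR_B, VAR_C = 0, 1, 2
m = TinyStackMachine([0, 20, 22] + [0] * 13)   # 16 "register" words

# a = b + c
m.ldl(VAR_B)
m.ldl(VAR_C)
m.add()
m.stl(VAR_A)
print(m.workspace[VAR_A], m.executed)
```

The four instructions match the push/push/add/pop sequence quoted
earlier; the difference is that the "pushes" and "pops" address
workspace words directly.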

I have frequently wished they'd cache those 16 words.  It would
probably reduce bandwidth requirements by about 1/3, and speed up the
chip by a similar factor.

Second, one of the hallmarks of a true-blue RISC chip, however, is
*LOTS* of registers.  The Am29000, for instance, has 192.  Only large
structs and arrays need to go off-chip.

Thus, the assumption that *everything* is in a register is perfectly
valid, even in situations where other architectures would go off-chip.

>   RISC              680x0               VAX
>
>   LOAD a(FP), R0    MOVE.W a(A6), D0    ADD a(R1), b(R1), c(R1)
>   LOAD b(FP), R1    ADD.W b(A6), D0 
>   ADD R0, R1        MOVE.W D0, c(A6)
>   STORE R1, c(FP)
>  
>The RISC code will run in 4 cycles; on a 68030 with no-wait-state memory
>the 680x0 code will take at least 3 cycles but could take as many as 10 cycles 
>and will probably average 5 or 6 cycles, although that will depend on the 
>state of the instruction pipeline and on what's currently in the code and data 
>caches; I don't have enough information about the VAX instruction set to give 
>cycles--it probably is different for each VAX model anyway.

For an 11/780, it would take 1.80 microseconds (assuming a, b and c are
8-bit displacements).  0.60 microseconds if registers are being used.
Sorry, I don't know how many "cycles" that is!
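
A hedged conversion, ASSUMING a 200 ns cycle time for the 11/780
(that figure is not from this post, so treat the result as a sketch):

```python
CYCLE_NS = 200  # ASSUMED VAX-11/780 cycle time in nanoseconds


def cycles(time_ns, cycle_ns=CYCLE_NS):
    """Convert an execution time in nanoseconds to machine cycles."""
    return time_ns / cycle_ns


memory_form = cycles(1800)    # 1.80 microseconds, displacement operands
register_form = cycles(600)   # 0.60 microseconds, register operands
print(memory_form, register_form)
```

Under that assumption the memory-operand ADD takes about 9 cycles and
the register form about 3.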
--
	-Colin Plumb (watmath!ccplumb)

Zippy says:
You can't hurt me!!  I have an ASSUMABLE MORTGAGE!!

mr-frog@amos.ling.ucsd.edu (Dave Pare) (10/07/87)

> Second, one of the hallmarks of a true-blue RISC chip, however, is
> *LOTS* of registers.  The Am29000, for instance, has 192.  Only large
> structs and arrays need to go off-chip.

Is this per-context, or are they shared between all the running processes?
After all, the transputer is not a single-process machine!

> Thus, the assumption that *everything* is in a register is perfectly
> valid, even in situations where other architectures would go off-chip.

As I understand it, RISC machines have difficulties once the number
of runnable processes gets above the number of "hardware" contexts
available to it.  I think that if you have a task that fires off 30
processes, the transputer will blow the doors off anything else,
because its switching overhead is 10% of most other machines.
What is the switch time of the Am29000?

Dave Pare

tim@amdcad.AMD.COM (Tim Olson) (10/08/87)

<<< Postnews can't find comp.sys.transputer, so I removed it from the
<<< Newsgroup line -- could someone who has access to that group
<<< cross-post??

In article <4044@sdcsvax.UCSD.EDU> mr-frog@amos.UUCP (Dave Pare) writes:
+-----
| > Second, one of the hallmarks of a true-blue RISC chip, however, is
| > *LOTS* of registers.  The Am29000, for instance, has 192.  Only large
| > structs and arrays need to go off-chip.
| 
| Is this per-context, or are they shared between all the running processes?
| After all, the transputer is not a single-process machine!
+-----

There exist two "models" of register usage -- the stack cache model and
the register bank model.  In the stack cache model, each process uses
the entire register file (the local registers are used as a scalar stack
cache) and the registers must be saved on context switches.  In the
register bank model, the registers are divided up into 8 banks of 16
registers apiece, so up to 8 processes can be "resident".  Our initial
software will only support the stack cache model, however.

+-----
| As I understand it, RISC machines have difficulties once the number
| of runnable processes gets above the number of "hardware" contexts
| available to it.  I think that if you have a task that fires off 30
| processes, the transputer will blow the doors off anything else,
| because its switching overhead is 10% of most other machines.
| What is the switch time of the Am29000?
+-----

For the stack cache model, the state can be saved in around 120-150
cycles (assuming single-cycle burst-write memory), which is between 4.8
and 6 microseconds.  It takes a like amount of time to reload a new
context.  For the register bank model, the task switch can occur in
about 20 cycles (800 ns).

Yes, the transputer has a low context-switch time, but if you are really
switching so often that 4 microseconds makes a difference, you
probably aren't getting much work done, anyway.  Note that most UNIX
machines perform roughly 60 - 100 context switches per second, so that
register-save time is in the noise.
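
The arithmetic behind those figures can be sketched as follows.  The
40 ns cycle is implied by the numbers above (120 cycles = 4.8
microseconds); the function names are made up for illustration, and
the overhead estimate counts only register save and reload, nothing
else a real context switch does.

```python
CYCLE_NS = 40  # implied by the post: 120 cycles = 4.8 microseconds


def cycles_to_us(cycles, cycle_ns=CYCLE_NS):
    """Convert a cycle count to microseconds."""
    return cycles * cycle_ns / 1000


def switch_overhead(switches_per_sec, save_cycles):
    """Fraction of each second spent saving one context and loading another."""
    per_switch_us = 2 * cycles_to_us(save_cycles)  # save + a like amount to reload
    return switches_per_sec * per_switch_us / 1_000_000


stack_cache_us = (cycles_to_us(120), cycles_to_us(150))
register_bank_us = cycles_to_us(20)
best = switch_overhead(60, 120)     # fewest switches, fastest save
worst = switch_overhead(100, 150)   # most switches, slowest save

print(stack_cache_us, register_bank_us)
print(f"{best:.4%} to {worst:.4%} of CPU time")
```

Even the worst case here is on the order of a tenth of a percent of
the processor's time, which is the sense in which the register-save
cost is "in the noise" at 60 - 100 switches per second.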

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)

aglew@ccvaxa.UUCP (10/10/87)

>Yes, the transputer has a low context-switch time, but if you are really
>switching so often that 4 microseconds makes a difference, you
>probably aren't getting much work done, anyway.  Note that most UNIX
>machines perform roughly 60 - 100 context switches per second, so that
>register-save time is in the noise.
>
>	-- Tim Olson
>	Advanced Micro Devices
>	(tim@amdcad.amd.com)

4 microseconds might make a difference to you, even if you aren't context
switching that often, if you have certain events that you want to
respond to really fast, in real time. For these applications, of course,
you would try to pre-load and lock down the registers in the Am29000.

haitex@pnet01.CTS.COM (Wade Bickel) (10/14/87)

       This is a response to message 8903 about the INMOS Transputer vs.
     other types of processors.  I'm not going to include it here and
     hope you can just call it up and read it.

        Anyway, aren't you all missing the point by analysing the "MIPS"
     rating of these processors and using the results to compare the
     chips?  I followed the Transputer for a couple of years, but became
     discouraged because the actual pricing of the component has never
     even been close to the prices proffered by INMOS at their Transputer
     seminars.  What attracted me to the Transputer was the parallel
     processing capabilities it supports.  This is what the Transputer
     is all about, and none of the other systems discussed support
     this type of capability.

         At the current time I find the University of Lowell's (Mass.)
     image processing board to be more interesting than the Transputer.
     First of all, they (claim to) already have the system working,
     and we're talking 35 MIPS (I mean it will execute 35 million 
     instructions each second) and this INCLUDES MULTIPLY (17x17bits).
     The system is based upon NEC parts, and basically consists of
     up to seven 5 MIPS processors which operate in a bucket brigade,
     and one chip to handle most aspects of communications, including
     direct access to the Amiga memory.
        Furthermore NEC claims it will release a 10 MIPS processor 
     version before the end of the year, resulting in 70 MIPS.  With
     multiply on every cycle that seems damn powerful to me!  Only
     problem is the system looks tricky to program.

        As far as the Transputer goes, I think the Transputer language,
     OCCAM, would be a natural for the Amiga.  It supports multi-tasking
     at the language level, and looks to be fairly complete.

                                        Wade.


UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM

ccplumb@watmath.UUCP (10/16/87)

In article <1858@crash.CTS.COM> haitex@pnet01.CTS.COM (Wade Bickel) writes:
>        As far as the Transputer goes, I think the Transputer language,
>     OCCAM, would be a natural for the Amiga.  It supports multi-tasking
>     at the language level, and looks to be fairly complete.

H'm... The Occam model of message passing involves synchronized passing
and copying of data.  The Amiga's primitives do not copy messages, but
do queue them.  Occam assumes that all memory requirements (including
run-time stack requirements) can be computed at compile time.  The Amiga
allocates memory left, right, and centre.  This is compatible?

Also, the known stack size implies no recursion.  This is a point
against "fairly complete".  You wanted to do *what* to a binary tree?
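
To make the cost concrete: without recursion, a tree walk has to carry
its own stack, and with compile-time memory sizing that stack needs a
bound fixed in advance.  The sketch below is Python, not Occam, and the
names are invented for illustration; it only shows the general
workaround of an explicit, preallocated stack, not how any particular
Occam program does it.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def inorder_bounded(root, max_depth):
    """In-order traversal using a preallocated stack of max_depth slots."""
    stack = [None] * max_depth   # fixed size, chosen before the walk starts
    top = 0                      # number of slots currently in use
    out = []
    node = root
    while node is not None or top > 0:
        # Descend left, remembering each node on the explicit stack.
        while node is not None:
            if top == max_depth:
                raise OverflowError("tree deeper than the preallocated stack")
            stack[top] = node
            top += 1
            node = node.left
        # Visit the most recently remembered node, then go right.
        top -= 1
        node = stack[top]
        out.append(node.key)
        node = node.right
    return out


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(inorder_bounded(tree, max_depth=3))
```

The catch is the max_depth parameter: the programmer has to know the
worst-case depth ahead of time, and a deeper tree simply fails.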
--
	-Colin Plumb (watmath!ccplumb)

"RISC tends to be any 32-bit processor without an established market
introduced since 1982"