[comp.sys.transputer] What makes Transputers interesting

J.Wexler@edinburgh.ac.uk (12/06/88)
I have made a compendium of the recent correspondence on this topic.  If anyone
is interested in seeing it all in one place, here it is. My thanks to all those
contributors whose words are quoted here.
   John Wexler


On one of the electronic bulletin boards much frequented by Transputer users, an
extended discussion was recently (November 1988) provoked by the question "What
distinguishes a Transputer from any other processor, especially if I take a,
let's say 68030 or 32532, add 4 communication channels and write software to do
processor-processor communication? What makes a Transputer so interesting?"

Many of the answers concentrated on the integration of the design and the
consequences of having many features built into a single chip: speed, low chip
count, and the potential simplicity of systems which use Transputers instead of
other processors. Some correspondents mentioned particularly the ability to
build useful single-chip systems (e.g., embedded controllers), and the advantage
of being able to bootstrap one Transputer (or several) from another.

The features themselves make up a long list. As well as the instruction
processor, a single Transputer includes a memory controller so that it can drive
DRAM with no external circuitry; it includes a small amount of on-chip memory;
it includes the DMA control for four independent fast "links" for external and
inter-processor data transfers, with access scheduling to prevent the links and
the processor from locking one another out; it provides hardware and microcode
to support inter-processor and inter-process communications; it includes a
microcoded multitasking kernel, which recognises two priority levels of
processes; it includes an elapsed-time clock; and the T800 model includes a
floating-point processor.

The performance of the individual components is also important. As far as the
instruction and floating-point processors are concerned, a full discussion of
speeds and comparisons with other systems would be impossible in this article,
but one can say, at least, that they are reasonably fast.  However, it is
possible to be much more definite about other aspects.  Process switching, for
instance, is extremely fast. Communications are fast and have low start-up
overheads.

Coherent design is another virtue. All the built-in features are mutually
compatible and make up an overall structure which is simple and easy to use. For
instance, inter-processor and inter-process (within a single processor)
communications are handled by quite different mechanisms, but they are
controlled by identical instructions so that they can be handled in the same way
by software. Again, the multi-tasking and communication facilities are fully
integrated so that the necessary synchronisation of processes (e.g., a receiver
waiting until a message has arrived) is obtained without software intervention
or "busy-waiting".
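
As an illustration of that last point, here is a minimal sketch in Python (not
occam, and not a description of how the Transputer microcode works) of a
point-to-point channel on which each party sleeps until the other is ready,
with no polling loop. The names Channel, producer, consume and run_demo are
invented for this example, and the standard queue module stands in for the
hardware scheduler.

```python
import queue
import threading


class Channel:
    """Hypothetical one-to-one synchronising channel (illustration only)."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, value):
        self._q.put(value)  # hand the value over
        self._q.join()      # sleep until the receiver has taken it

    def recv(self):
        value = self._q.get()  # sleep (no polling) until a sender arrives
        self._q.task_done()    # wake the waiting sender
        return value


def producer(ch, n):
    """Send the squares of 0..n-1, then None as an end marker."""
    for i in range(n):
        ch.send(i * i)
    ch.send(None)


def consume(ch):
    """Collect values from the channel until the end marker arrives."""
    received = []
    while True:
        value = ch.recv()
        if value is None:
            return received
        received.append(value)


def run_demo(n=5):
    """Run one producer thread against a consumer in the calling thread."""
    ch = Channel()
    worker = threading.Thread(target=producer, args=(ch, n))
    worker.start()
    result = consume(ch)
    worker.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

Neither side spins: the receiver blocks inside recv until data exists, and the
sender blocks inside send until the data has been taken, which is the kind of
synchronisation the Transputer is described as providing without software
intervention.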

Scalability is a major advantage of Transputer-based systems: it is much easier
to enhance a system by adding further processors to it than would be the case
with other microprocessors.  Compatibility - the ease of replacing one model of
Transputer by another without major design changes in a system - is also
valuable. This extends to mixing models of Transputers within one system
(including processors running at different speeds, or using different word
lengths).

Whether or not the Transputer is a RISC processor is debatable.  It certainly
has many of the virtues which one expects to derive from RISC-architecture, such
as simplicity and speed.

The principal complaints which came to light in the course of the discussion
were, on the hardware side, that the Transputer should use its on-chip memory as
a cache store, and that it should provide at least some support for memory
management and protection.  Software clearly annoyed a much larger number of
people, who called for something more like an operating system, better support
for standard languages, and a better software development environment.

================================================================================


Various collected comments from contributors:
================================================================================
Integration:

its integration (communication, multitasking, Floating-point on 
the same chip) ==> speed 

Well, you might say that the most interesting thing is that the
transputer does everything that this gob of 68xxx and communications
hardware and software does in one chip! Just the software effort
involved in the "roll your own" version would be an ugly cost.

First of all, I agree that it's the level of integration that makes
a transputer interesting.  They can do useful work with no external
components - just feed it power, ground, clock, reset line, and
hook up a link or two.  You can boot it, download programs, and run
them.
================================================================================
Low chip count
================================================================================
Needs no memory interface chips:

its built-in memory controller: The Transputer can drive DRAM with 
no additional circuitry.

Another hardware facility provided is that of a DRAM controller built right
into the chip. This simplifies DRAM system design considerably.
================================================================================
Arbitration of memory access between the channels and the processor is done
on chip.
================================================================================
Very fast process switch:

The context-switch time on a transputer makes the 68xxx look like
a pig.

As far as the multitasking is concerned, all instructions use a register stack
(on chip) which is valid only for the duration of the instruction. This makes
context switching extremely fast.

ONLY IF you can use the hardware defined task model.  In that case,
it's pretty nice, since it'll actually wait for a minimum of task
state before swapping.  If you wanted to run a standard operating
system on the thing, you'd be in trouble.
================================================================================
Scheduling:

The transputer provides a multitasking kernel built right in the microcode.
================================================================================
"Nearly free" inter-process communications:

The speed and simplicity of its multitasking and communication due 
to the fact that they are integrated at the processor-instruction 
level.  

The communications are very fast, have very low startup overheads,
and operate without any need of the CPU after setup.
This is not easy to accomplish in discrete silicon with software.
In addition, the technology used for the communications allows for
long (about 30-40 feet) cable runs.

Let's start with your 68030 along with its four communication channels - to match
the transputer these links need to operate at 10 Mbps, and contending with
these communication devices is no cakewalk - both in terms of H/W and S/W.
The multitasking processes use channels for interprocess communication - and
these channels can be implemented either with memory exchanges or over the
serial links. This can be made transparent to the application programmer.
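
The transparency described in the last comment can be sketched as follows.
This is an illustrative Python model only: MemoryChannel, LinkChannel, doubler
and run_with are hypothetical names, a local socket pair stands in for a serial
link, and both channels here are buffered, whereas real Transputer channels
synchronise sender and receiver.

```python
import queue
import socket
import struct


class MemoryChannel:
    """Channel between processes on the same processor (memory exchange)."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, n):
        self._q.put(n)

    def recv(self):
        return self._q.get()

    def close(self):
        pass  # nothing to release


class LinkChannel:
    """Stand-in for a serial link: integers travel as 4-byte words."""

    def __init__(self):
        self._tx, self._rx = socket.socketpair()

    def send(self, n):
        self._tx.sendall(struct.pack("<i", n))

    def recv(self):
        buf = b""
        while len(buf) < 4:  # a read may return fewer bytes than requested
            chunk = self._rx.recv(4 - len(buf))
            if not chunk:
                raise EOFError("link closed")
            buf += chunk
        return struct.unpack("<i", buf)[0]

    def close(self):
        self._tx.close()
        self._rx.close()


def doubler(inp, out, count):
    """Application code: it neither knows nor cares which transport is used."""
    for _ in range(count):
        out.send(2 * inp.recv())


def run_with(channel_cls, values=(1, 2, 3)):
    """Feed values through doubler using the given channel implementation."""
    inp, out = channel_cls(), channel_cls()
    try:
        for v in values:
            inp.send(v)
        doubler(inp, out, len(values))
        return [out.recv() for _ in values]
    finally:
        inp.close()
        out.close()
```

Calling run_with(MemoryChannel) and run_with(LinkChannel) exercises the same
doubler code over both transports; only the object passed in changes, which is
the sense in which the placement of processes can be hidden from the
application programmer.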
================================================================================
Processing speed:

The transputer is fast (about the equivalent of a vax or sun 3 now,
and getting faster).
================================================================================
On-chip floating-point:

The (T800) transputer has on-chip floating-point support.

Also provided in hardware is a floating point unit. As to how it compares with
the 80387 and Motorola's FPU I don't know. Reasonably well I'd suspect. 

It blows them away.  Against good FP ALUs (MIPS, Am29027, BIT's
stuff) it's not great, but it's at least in Weitek's league.
We've timed 2 MFLOPS doing dot products in on-chip RAM.

That's probably its nicest feature.  The on-chip floating point is
pretty fast, though it's a small set of operations.  You'd have to
go to a Weitek chipset for that kind of performance on a 68xxx or
80xxx.  Motorola's 88100 has an even better on-chip floating point
scheme, using separate execution units for addition and multiplication.
================================================================================
RISC(ish):

its RISC-like architecture (few simple short and fast instructions) 

The transputer is RISC technology. The small instruction set means
that it's fairly easy to port compilers to it (although INMOS seems
to be real stodgy about realizing that the real world wants C and
FORTRAN).

As far as the software is concerned - Inmos claims that a high percentage (~70?) of
the instructions can be coded in one byte. I have looked at the instruction
encoding philosophy and found it to be impressive. If you are at all interested
in CPU architectures you really should look at it. It is, to say the very least,
'Interesting'.

I haven't really had any experience with the software but soon will have some.
But probably not with Occam.

Well, they have got a patent on it.  I think it's closer to 50%, but still
the code is highly compact.  (There are a few rearrangements I'd like to make,
but that's another story.)  Having programmed it in assembler a fair bit,
I'll avoid "impressive" and stick to "interesting".  There are things they
could have done better.  (Have cj pop the 0?  Unsigned gt?)
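
For readers who have not seen the encoding discussed above, here is a rough
sketch of the prefixing scheme in Python. Every instruction is one byte: a
4-bit function code and 4 bits of data that are merged into an operand
register. Larger or negative operands are built up by preceding the
instruction with "prefix" and "negative prefix" bytes, which is why small
operands cost only a single byte. The function-code values used here (pfix = 2,
nfix = 6, ldc = 4) are as recalled from the Inmos instruction set
documentation and should be treated as assumptions to check against the
manual; the helper names encode and decode are invented for this example.

```python
PFIX, NFIX, LDC = 0x2, 0x6, 0x4  # assumed function codes; verify before relying on them


def encode(op, operand):
    """Return the bytes for direct function `op` applied to a signed operand."""
    low = bytes([(op << 4) | (operand & 0xF)])
    if 0 <= operand < 16:
        return low                               # fits in a single byte
    if operand >= 16:
        return encode(PFIX, operand >> 4) + low  # prefix carries the high bits
    return encode(NFIX, (~operand) >> 4) + low   # negative prefix complements them


def decode(code, word_bits=32):
    """Replay the operand-register rules; return a list of (function, operand)."""
    mask = (1 << word_bits) - 1
    oreg = 0
    decoded = []
    for byte in code:
        fn, data = byte >> 4, byte & 0xF
        oreg |= data
        if fn == PFIX:
            oreg = (oreg << 4) & mask
        elif fn == NFIX:
            oreg = ((~oreg) << 4) & mask
        else:
            # Interpret the register as a signed word, then clear it.
            if oreg >> (word_bits - 1):
                value = oreg - (1 << word_bits)
            else:
                value = oreg
            decoded.append((fn, value))
            oreg = 0
    return decoded


if __name__ == "__main__":
    for n in (3, 0x754, -31):
        print(n, encode(LDC, n).hex())
```

Under these assumptions a "load constant 3" is a single byte, while a constant
such as 0x754 needs two prefix bytes before the load itself. How often the
one-byte case occurs in real programs is the point the contributors above
disagree on.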
================================================================================
Scalability:

What this does is allow initial development and use on a smaller number
of transputers; at a later time, if so desired, performance can be enhanced,
and almost linear speedup achieved, by increasing the number of transputers
in the system and redistributing processes.
================================================================================
On-chip memory:

Another hardware goody provided is on-chip memory. This is either 2k or 4k
depending on the CPU (T414 or T800 resp). While not much in itself it can
be used for code optimization as instructions running out of this on chip RAM
run a lot faster than from external RAM.
================================================================================
Range compatibility:

Most transputer systems can be upgraded by just plugging in newer, faster
chips.
================================================================================
No virtual memory support:

One of the main things I reproach the Transputer for is that it does not
support virtual memory (vital to build any reasonable stand-alone machine).
And the 68030's interface
to memory makes the T800's "look like a pig", to coin a phrase.
================================================================================
Also, the on-board memory should be organised as a cache. This makes programming
much easier.
================================================================================
Miscellaneous:

This philosophy of integrating the links right into the kernel pays dividends
in another manner. The transputer is capable of booting itself right from 
the links. This implies that in a multiple processor system only one transputer
is required to have a ROM. The others will be perfectly content with a simple
RAM subsystem.

And the final hardware goody provided is an on-chip frequency multiplier.
This means that the different speed versions all take in a 5 MHz clock and
multiply it appropriately to generate 20/25/30 MHz. Thus these high frequencies
are restricted to within the chip.

Meanwhile, occam is designed for people who dream distributed systems
as opposed to others who dream von Neumann and then have to coerce
their one thread onto multiple processors.  In fact, that's the key
transputer characteristic as well.  It's designed for the way I think.

To sum up, the Transputer is great because it is ONE Transputer
instead of being MANY circuits + software. 

INMOS has good plans for the future growth and enhancement of the chip
series. (Now if they'd just do the same with the software).

OPINION: The transputer will probably define the future of parallel
computing for the next 5 years or so IF IF IF INMOS will wake up
and realize that the OCCAM language is a significant hindrance to
acceptance of their product in the US market. OCCAM is a language
best suited to CS weenies (BTW IR1, so I can say that :-).

P.S. I am not an INMOS employee. I have had significant experience
with the transputer in a large scale parallel machine.
The transputer hardware works well, the software sucks rocks.
OCCAM is the single biggest roadblock to general acceptance of
transputer based systems. Most people that I introduced to the
OCCAM language system said "Come back when you have 'real' languages".
I am certain that we will not solve the problems of broad acceptance
and understanding of parallel processing's capabilities as long as
OCCAM is the context.

I hope members of this group realize that, in spite of Inmos's best
efforts (in the past), there ARE several compilers available right now
for the transputer, especially C compilers.  There are also assemblers.
Thus, one does NOT need to use the Occam language to utilize the
interesting hardware features of the transputer, even though Occam has
some nice features too.  One can use the third party software, and Inmos
themselves now offer C and FORTRAN too!  So why doesn't the transputer
take over the world?  Lack of decent SYSTEM software.  Of course, we at
Cornell are trying to fix that with the Trollius OS...  (and one should
also mention Helios).  There are already at least two vendors of transputer-
based hardware (for UNIX hosts) offering Trollius.

'Interesting' is a subjective characteristic. I personally feel that the 
transputer is interesting because, at least from a hardware developer's
viewpoint it offers a lot of bang for the design effort.

Well, several C compilers are available.  I recommend Logical Systems' C
compiler, $6xx.xx with full source last time I looked.  Kirk, are you
still out there to correct me?  They're based in Corvallis, Oregon.  We had
a couple of problems writing our OS in it, but it can handle serious work.

I haven't done any work in Occam, but C works fine.  No, Occam isn't mandatory,
although it makes communications-rich code a bit more legible.  Kirk's
compiler has #pragma asm and #pragma endasm so you can escape to assembler
and get at anything the machine provides.

Actually, my experience has been that the transputer works best in
tightly coupled systems, not loosely coupled. Still, it's brought
loosely coupled systems into the realm of affordable reality.
Hence, most of the discussion in this newsgroup centers on how to
make loosely coupled systems work well.