J.Wexler@edinburgh.ac.uk (12/06/88)
I have made a compendium of the recent correspondence on this topic. If anyone is interested in seeing it all in one place, here it is. My thanks to all those contributors whose words are quoted here.

John Wexler

On one of the electronic bulletin boards much frequented by Transputer users, an extended discussion was recently (November 1988) provoked by the question "What distinguishes a Transputer from any other processor, especially if I take a, let's say 68030 or 32532, add 4 communication channels and write software to do processor-processor communication? What makes a Transputer so interesting?"

Many of the answers concentrated on the integration of the design and the consequences of having many features built into a single chip: speed, low chip count, and the potential simplicity of systems which use Transputers instead of other processors. Some correspondents mentioned particularly the ability to build useful single-chip systems (e.g., embedded controllers), and the advantage of being able to bootstrap one Transputer (or several) from another.

The features themselves make up a long list. As well as the instruction processor, a single Transputer includes a memory controller so that it can drive DRAM with no external circuitry; it includes a small amount of on-chip memory; it includes the DMA control for four independent fast "links" for external and inter-processor data transfers, with access scheduling to prevent the links and the processor from locking one another out; it provides hardware and microcode to support inter-processor and inter-process communications; it includes a microcoded multitasking kernel, which recognises two priority levels of processes; it includes an elapsed-time clock; and the T800 model includes a floating-point processor.

The performance of the individual components is also important.
As far as the instruction and floating-point processors are concerned, a full discussion of speeds and comparisons with other systems would be impossible in this article, but one can say, at least, that they are reasonably fast. However, it is possible to be much more definite about other aspects. Process switching, for instance, is extremely fast. Communications are fast and have low start-up overheads.

Coherent design is another virtue. All the built-in features are mutually compatible and make up an overall structure which is simple and easy to use. For instance, inter-processor and inter-process (within a single processor) communications are handled by quite different mechanisms, but they are controlled by identical instructions so that they can be handled in the same way by software. Again, the multi-tasking and communication facilities are fully integrated so that the necessary synchronisation of processes (e.g., a receiver waiting until a message has arrived) is obtained without software intervention or "busy-waiting".

Scalability is a major advantage of Transputer-based systems: it is much easier to enhance a system by adding further processors to it than would be the case with other microprocessors.

Compatibility - the ease of replacing one model of Transputer by another without major design changes in a system - is also valuable. This extends to mixing models of Transputers within one system (including processors running at different speeds, or using different word lengths).

Whether or not the Transputer is a RISC processor is debatable. It certainly has many of the virtues which one expects to derive from a RISC architecture, such as simplicity and speed.

The principal complaints which came to light in the course of the discussion were, on the hardware side, that the Transputer should use its on-chip memory as a cache store, and that it should provide at least some support for memory management and protection.
Software clearly annoyed a much larger number of people, who called for something more like an operating system, better support for standard languages, and a better software development environment.

================================================================================
Various collected comments from contributors:
================================================================================
Integration:

its integration (communication, multitasking, Floating-point on the same chip) ==> speed

Well, you might say that the most interesting thing is that the transputer does everything that his gob of 68xxx and communications hardware and software does in one chip! Just the software effort involved in the "roll your own" version would be an ugly cost.

First of all, I agree that it's the level of integration that makes a transputer interesting. It can do useful work with no external components - just feed it power, ground, clock, reset line, and hook up a link or two. You can boot it, download programs, and run them.

================================================================================
Low chip count
================================================================================
Needs no memory interface chips:

its built-in memory controller: The Transputer can drive DRAM with no additional circuitry.

Another hardware facility provided is that of a DRAM controller built right into the chip. This simplifies DRAM system design considerably.

================================================================================
Memory management of the channels vs processor requirements is done on chip.
================================================================================
Very fast process switch:

The context-switch time on a transputer makes the 68xxx look like a pig.

As far as the multitasking is concerned, all instructions use a register stack (on chip) which is valid only for the duration of the instruction.
This makes context switching extremely fast.

ONLY IF you can use the hardware defined task model. In that case, it's pretty nice, since it'll actually wait for a minimum of task state before swapping. If you wanted to run a standard operating system on the thing, you'd be in trouble.

================================================================================
Scheduling:

The transputer provides a multitasking kernel built right in the microcode.

================================================================================
"Nearly free" inter-process communications:

The speed and simplicity of its multitasking and communication due to the fact that they are integrated at the processor-instruction level.

The communications are very fast, have very low startup overheads, and operate without any need of the CPU after setup. This is not easy to accomplish in discrete silicon with software. In addition, the technology used for the communications allows for long (about 30-40 feet) cable runs.

Let's start with your 68030 along with its four communication channels - to match the transputer these links need to operate at 10 Mbps, and contending with these communication devices is no cakewalk - both in terms of H/W and S/W.

The multitasking processes use channels for interprocess communication - and these channels can be implemented either with memory exchanges or over the serial links. This can be made transparent to the application programmer.

================================================================================
Processing speed:

The transputer is fast (about the equivalent of a vax or sun 3 now, and getting faster).

================================================================================
On-chip floating-point:

The (T800) transputer has on-chip floating-point support.

Also provided in hardware is a floating point unit. As to how it compares with the 80387 and Motorola's FPU I don't know. Reasonably well I'd suspect.

It blows them away.
Against good FP ALUs (MIPS, Am29027, BIT's stuff) it's not great, but it's at least in Weitek's league. We've timed 2 MFLOPS doing dot products in on-chip RAM.

That's probably its nicest feature. The on-chip floating point is pretty fast, though it's a small set of operations. You'd have to go to a Weitek chipset for that kind of performance on a 68xxx or 80xxx. Motorola's 88100 has an even better on-chip floating point scheme, using separate execution units for addition and multiplication.

================================================================================
RISC(ish):

its RISC-like architecture (few simple short and fast instructions)

The transputer is RISC technology. The small instruction set means that it's fairly easy to port compilers to it (although INMOS seems to be real stodgy about realizing that the real world wants C and FORTRAN).

As far as the software is concerned - Inmos claims that a high percentage (~70?) of the instructions can be coded in one byte. I have looked at the instruction encoding philosophy and found it to be impressive. If you are at all interested in CPU architectures you really should look at it. It is, to say the very least, 'Interesting'. I haven't really had any experience with the software but soon will have some. But probably not with Occam.

Well, they have got a patent on it.

I think it's closer to 50%, but still the code is highly compact. (There are a few rearrangements I'd like to make, but that's another story.)

Having programmed it in assembler a fair bit, I'll avoid "impressive" and stick to "interesting". There are things they could have done better. (Have cj pop the 0? Unsigned gt?)
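
For readers who have not seen the instruction encoding that the contributors above are arguing about, here is a minimal sketch of the prefixing scheme. It is not taken from the correspondence. It assumes the commonly published layout in which every instruction is one byte, with a 4-bit function code in the upper nibble and 4 bits of operand data in the lower nibble, and in which the pfix and nfix functions build larger operands in an operand register. The function code values (pfix = 2, nfix = 6, ldc = 4) and the helper names encode and decode are assumptions for illustration and should be checked against the Inmos instruction set documentation.

```python
# Sketch of transputer-style operand prefixing (assumed layout, see text above).
# Each instruction byte = (function code << 4) | 4-bit data.
PFIX = 0x2  # assumed code: load data into operand register, shift left 4
NFIX = 0x6  # assumed code: load data, complement, shift left 4
LDC = 0x4   # assumed code: load constant (used here only as an example)


def encode(function, operand):
    """Return the bytes for one instruction with an arbitrary integer operand."""
    low = (function << 4) | (operand & 0xF)
    if 0 <= operand < 16:
        # Small operands fit in the instruction byte itself: one byte total.
        return bytes([low])
    if operand >= 16:
        # Emit prefixes for the upper bits, then the instruction itself.
        return encode(PFIX, operand >> 4) + bytes([low])
    # Negative operand: prefix with the complemented upper bits via nfix.
    return encode(NFIX, (~operand) >> 4) + bytes([low])


def decode(code):
    """Yield (function, operand) pairs, folding prefixes into the operand."""
    oreg = 0  # models the operand register; Python ints are unbounded
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        oreg |= data
        if function == PFIX:
            oreg <<= 4
        elif function == NFIX:
            oreg = (~oreg) << 4
        else:
            yield function, oreg
            oreg = 0  # operand register is cleared after each instruction
```

Under these assumptions, encode(LDC, 5) is a single byte, while encode(LDC, 0x754).hex() gives "272544": two pfix bytes followed by the ldc byte. This is why code with mostly small operands comes out so compact, whatever the exact one-byte percentage is.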
================================================================================
Scalability:

This allows initial development and use on a smaller number of transputers; at a later time, if so desired, performance can be enhanced and almost linear speedup achieved by increasing the number of transputers in the system and redistributing processes.

================================================================================
On-chip memory:

Another hardware goody provided is on-chip memory. This is either 2k or 4k depending on the CPU (T414 or T800 resp). While not much in itself, it can be used for code optimization, as instructions running out of this on-chip RAM run a lot faster than from external RAM.

================================================================================
Range compatibility:

Most transputer systems can be upgraded by just plugging in newer, faster chips.

================================================================================
No virtual memory support:

One of the main things I reproach the Transputer for is that it does not support virtual memory (vital to build any reasonable stand-alone machine).

And the 68030's interface to memory makes the T800's "look like a pig", to coin a phrase.

================================================================================
Also the on-board memory should be organised as a cache. This makes programming much easier.

================================================================================
Miscellaneous:

This philosophy of integrating the links right into the kernel pays dividends in another manner. The transputer is capable of booting itself right from the links. This implies that in a multiple processor system only one transputer is required to have a ROM. The others will be perfectly content with a simple RAM subsystem.

And the final hardware goody provided is an on chip frequency multiplier.
This means that the different speed versions all take in 5 MHz clocks and multiply them appropriately to generate 20/25/30 MHz. Thus these high frequencies are restricted to within the chip.

Meanwhile, occam is designed for people who dream distributed systems as opposed to others who dream von Neumann and then have to coerce their one thread onto multiple processors. In fact, that's the key transputer characteristic as well. It's designed for the way I think.

To sum up, the Transputer is great because it is ONE Transputer instead of being MANY circuits + software.

INMOS has good plans for the future growth and enhancement of the chip series. (Now if they'd just do the same with the software).

OPINION: The transputer will probably define the future of parallel computing for the next 5 years or so IF IF IF INMOS will wake up and realize that the OCCAM language is a significant hindrance to acceptance of their product in the US market. OCCAM is a language best suited to CS weenies (BTW IR1, so I can say that :-).

P.S. I am not an INMOS employee. I have had significant experience with the transputer in a large scale parallel machine. The transputer hardware works well, the software sucks rocks. OCCAM is the single biggest roadblock to general acceptance of transputer based systems. Most people that I introduced to the OCCAM language system said "Come back when you have 'real' languages". I am certain that we will not solve the problems of broad acceptance and understanding of parallel processing's capabilities as long as OCCAM is the context.

I hope members of this group realize that, in spite of Inmos's best efforts (in the past), there ARE several compilers available right now for the transputer, especially C compilers. There are also assemblers. Thus, one does NOT need to use the Occam language to utilize the interesting hardware features of the transputer, even though Occam has some nice features too.
One can use the third party software, and Inmos themselves now offer C and FORTRAN too! So why doesn't the transputer take over the world? Lack of decent SYSTEM software. Of course, we at Cornell are trying to fix that with the Trollius OS... (and one should also mention Helios). There are already at least two vendors of transputer-based hardware (for UNIX hosts) offering Trollius.

'Interesting' is a subjective characteristic. I personally feel that the transputer is interesting because, at least from a hardware developer's viewpoint, it offers a lot of bang for the design effort.

Well, several C compilers are available. I recommend Logical Systems' C compiler, $6xx.xx with full source last time I looked. Kirk, are you still out there to correct me? They're based in Corvallis, Oregon. We had a couple of problems writing our OS in it, but it can handle serious work.

I haven't done any work in Occam, but C works fine. No, Occam isn't mandatory, although it makes communications-rich code a bit more legible. Kirk's compiler has #pragma asm and #pragma endasm so you can escape to assembler and get at anything the machine provides.

Actually, my experience has been that the transputer works best in tightly coupled systems, not loosely coupled. Still, it's brought loosely coupled systems into the realm of affordable reality. Hence, most of the discussion in this newsgroup centers on how to make loosely coupled systems work well.