ccplumb@watmath.waterloo.edu (Colin Plumb) (01/01/70)
[I've added comp.sys.transputer to this discussion. For those who haven't
seen this before, someone posted rumours about an Atari Transputer box.
Then a discussion about a Transputer's power got started. Inmos claims
10 MIPS, but those are Transputer MIPS, which are about 1/2 to 1/3 of RISC
MIPS, which are perhaps 1/2 of CISC MIPS. (Using VAX 11/780 as a guide -
some figures I've seen say the canonical 1 MIPS machine actually executes
.5 native MIPS!)]

On the original subject: I know Atari has wangled a deal with Inmos to get
20 MHz T414 Transputers for $50.00 (U.S.) apiece. I also know they're
paying Tim King (the man who brought you Tripos! Beware!) significant
amounts of money to develop Helios. I have a preliminary spec for Helios
around here somewhere.

The current debate:

tim@amdcad.UUCP (Tim Olson) writes:
>>For example, the transputer is a stack machine. To perform the sequence
>>
>>      a = b+c;
>>
>>(assuming a,b, and c are register variables) requires 4 instructions:
>>
>>      push b
>>      push c
>>      add
>>      pop a
>>
>>while on the 68000 it requires 2:
>>
>>      mov b, a
>>      add c, a
>>
>>and on many RISC machines it requires only 1:
>>
>>      add a, b, c

alan@pdn.UUCP (0000-Alan Lovejoy) writes:
>That last example is a good example of a VAX-class CISC machine, not
>of any RISC machine I know of. RISC code would probably be identical
>to the 68000 example (which is more of a RISC machine than it is given
>credit for--just wait until the 78000 is announced and you'll see what
>I mean).

Not on Tim's baby! AMD's Am29000 uses 3-operand instructions. So, I might
add, did the RISC II prototype chip and, I believe, the original RISC I.

>Also, assuming that the values (a, b, c) are already in registers
>obscures the difference between RISC and 68000. Assuming memory values
>gives:

First, Tim said "assuming register variables". The trick is, the
Transputer hasn't got any registers in the usual sense of the word. All
it's got is a 3-word stack to evaluate expressions on (rather like most HP
calculators, only HP gives 4 words). What the transputer *does* have,
however, is *very* fast stack-relative addressing. The first 16 words can
be accessed using one-byte instructions, and are the Transputer's
"registers". I have frequently wished they'd cache those 16 words. It
would probably reduce bandwidth requirements by about 1/3, and speed up
the chip by a similar factor.

Second, one of the hallmarks of a true-blue RISC chip, however, is
*LOTS* of registers. The Am29000, for instance, has 192. Only large
structs and arrays need to go off-chip.

Thus, the assumption that *everything* is in a register is perfectly
valid, even in situations where other architectures would go off-chip.

>       RISC                  680x0                 VAX
>
>  LOAD  a(FP), R0      MOVE.W a(A6), D0      ADD a(R1), b(R1), c(R1)
>  LOAD  b(FP), R1      ADD.W  b(A6), D0
>  ADD   R0, R1         MOVE.W D0, c(A6)
>  STORE R1, c(FP)
>
>The RISC code will run in 4 cycles; on a 68030 with no-wait-state memory
>the 680x0 code will take at least 3 cycles but could take as many as 10 cycles
>and will probably average 5 or 6 cycles, although that will depend on the
>state of the instruction pipeline and on what's currently in the code and data
>caches; I don't have enough information about the VAX instruction set to give
>cycles--it probably is different for each VAX model anyway.

For an 11/780, it would take 1.80 microseconds (assuming a, b and c are
8-bit displacements). 0.60 microseconds if registers are being used.
Sorry, I don't know how many "cycles" that is!
--
        -Colin Plumb (watmath!ccplumb)

Zippy says:
You can't hurt me!! I have an ASSUMABLE MORTGAGE!!

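
To make the instruction count in the stack-machine example concrete, here
is a minimal sketch of a toy evaluator, not a model of the real chip. It
assumes a three-word evaluation stack that silently drops its oldest entry
on overflow, uses a dictionary as a stand-in for the stack-relative local
words, and borrows the push/pop/add names from the quoted post rather than
actual Transputer mnemonics. The function and variable names are invented
for illustration.

```python
from collections import deque


def run(program, workspace):
    """Execute a toy stack program; return how many instructions ran."""
    # Three-word evaluation stack: deque(maxlen=3) discards the oldest
    # (bottom) entry when a fourth value is pushed.
    stack = deque(maxlen=3)
    for op, *args in program:
        if op == "push":          # load a local word onto the stack
            stack.append(workspace[args[0]])
        elif op == "pop":         # store the top of stack to a local word
            workspace[args[0]] = stack.pop()
        elif op == "add":         # replace the top two entries by their sum
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unknown op {op!r}")
    return len(program)


# a = b + c, written the way the quoted post spells it out.
workspace = {"a": 0, "b": 2, "c": 3}
program = [("push", "b"), ("push", "c"), ("add",), ("pop", "a")]
count = run(program, workspace)
print(count, workspace["a"])
```

The same statement on a three-operand register machine would be a single
tuple in such a program, which is the whole of the disagreement above.
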
mr-frog@amos.ling.ucsd.edu (Dave Pare) (10/07/87)
> Second, one of the hallmarks of a true-blue RISC chip, however, is
> *LOTS* of registers. The Am29000, for instance, has 192. Only large
> structs and arrays need to go off-chip.

Is this per-context, or are they shared between all the running processes?
After all, the transputer is not a single-process machine!

> Thus, the assumption that *everything* is in a register is perfectly
> valid, even in situations where other architectures would go off-chip.

As I understand it, RISC machines have difficulties once the number
of runnable processes gets above the number of "hardware" contexts
available to it. I think that if you have a task that fires off 30
processes, the transputer will blow the doors off anything else,
because its switching overhead is 10% of most other machines.
What is the switch time of the Am29000?

Dave Pare

tim@amdcad.AMD.COM (Tim Olson) (10/08/87)
<<< Postnews can't find comp.sys.transputer, so I removed it from the
<<< Newsgroup line -- could someone who has access to that group
<<< cross-post?

In article <4044@sdcsvax.UCSD.EDU> mr-frog@amos.UUCP (Dave Pare) writes:
+-----
| > Second, one of the hallmarks of a true-blue RISC chip, however, is
| > *LOTS* of registers. The Am29000, for instance, has 192. Only large
| > structs and arrays need to go off-chip.
|
| Is this per-context, or are they shared between all the running processes?
| After all, the transputer is not a single-process machine!
+-----

There exist two "models" of register usage -- the stack cache model and
the register bank model. In the stack cache model, each process uses the
entire register file (the local registers are used as a scalar stack
cache) and the registers must be saved on context switches. In the
register bank model, the registers are divided up into 8 banks of 16
registers apiece, so up to 8 processes can be "resident". Our initial
software will only support the stack cache model, however.

+-----
| As I understand it, RISC machines have difficulties once the number
| of runnable processes gets above the number of "hardware" contexts
| available to it. I think that if you have a task that fires off 30
| processes, the transputer will blow the doors off anything else,
| because its switching overhead is 10% of most other machines.
| What is the switch time of the Am29000?
+-----

For the stack cache model, the state can be saved in around 120-150
cycles (assuming single-cycle burst-write memory), which is between 4.8
and 6 microseconds. It takes a like amount of time to reload a new
context. For the register bank model, the task switch can occur in about
20 cycles (800 ns).

Yes, the transputer has a low context-switch time, but if you are really
switching that often that 4 microseconds makes a difference, you
probably aren't getting much work done, anyway. Note that most UNIX
machines perform roughly 60 - 100 context switches per second, so that
register-save time is in the noise.

        -- Tim Olson
        Advanced Micro Devices
        (tim@amdcad.amd.com)

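
The "in the noise" claim can be checked with a few lines of arithmetic. This
is a rough sketch: the 40 ns cycle time is an assumption inferred from the
figures quoted above (120 cycles = 4.8 microseconds, 20 cycles = 800 ns),
and the function names are invented for illustration.

```python
CYCLE_NS = 40  # assumption: inferred from "120 cycles = 4.8 microseconds"


def cycles_to_ns(cycles):
    """Convert a cycle count to nanoseconds at the assumed cycle time."""
    return cycles * CYCLE_NS


def switch_overhead_ns_per_second(switches_per_second, save_cycles, reload_cycles):
    """Nanoseconds spent per second of wall time on saving and reloading state."""
    return switches_per_second * cycles_to_ns(save_cycles + reload_cycles)


# Worst case from the post: 150 cycles to save, a like amount to reload,
# at the high end of 100 context switches per second.
worst_ns = switch_overhead_ns_per_second(100, 150, 150)
worst_percent = 100 * worst_ns / 1_000_000_000

print(cycles_to_ns(120), cycles_to_ns(150), cycles_to_ns(20))
print(worst_ns, worst_percent)
```

Under those assumptions the stack cache model spends about 1.2 ms out of
every second on register traffic, roughly 0.12 percent. The picture changes
only if the switch rate is several orders of magnitude higher than the
60 - 100 per second quoted for UNIX machines.
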
aglew@ccvaxa.UUCP (10/10/87)
>Yes, the transputer has a low context-switch time, but if you are really
>switching that often that 4 microseconds makes a difference, you
>probably aren't getting much work done, anyway. Note that most UNIX
>machines perform roughly 60 - 100 context switches per second, so that
>register-save time is in the noise.
>
>       -- Tim Olson
>       Advanced Micro Devices
>       (tim@amdcad.amd.com)

4 microseconds might make a difference to you even if you aren't context
switching that often, if you have certain events that you want to respond
to real fast, in real time. For these applications, of course, you would
try to pre-load and lock down the registers in the Am29000.

haitex@pnet01.CTS.COM (Wade Bickel) (10/14/87)
This is a response to message 8903 about the INMOS Transputer vs. other
types of processors. I'm not going to include it here and hope you can
just call it up and read it.

Anyway, aren't you all missing the point by analysing the "MIPS" rating of
these processors and using the results to compare the chips?

I followed the Transputer for a couple of years, but became discouraged
because the actual pricing of the component has never even been close to
the prices proffered by INMOS at their Transputer seminars. What attracted
me to the Transputer was the parallel processing capabilities it supports.
This is what the Transputer is all about, and none of the other systems
discussed support this type of capability.

At the current time I find the University of Lowell's (Mass.) image
processing board to be more interesting than the Transputer. First of all,
they (claim to) already have the system working, and we're talking 35 MIPS
(I mean it will execute 35 million instructions each second) and this
INCLUDES MULTIPLY (17x17 bits). The system is based upon NEC parts, and
basically consists of up to seven 5 MIPS processors which operate in a
bucket brigade, and one chip to handle most aspects of communications,
including direct access to the Amiga memory. Furthermore NEC claims it
will release a 10 MIPS processor version before the end of the year,
resulting in 70 MIPS. With multiply on every cycle that seems damn
powerful to me! Only problem is the system looks tricky to program.

As far as the Transputer goes, I think the Transputer language, OCCAM,
would be a natural for the Amiga. It supports multi-tasking at the
language level, and looks to be fairly complete.

                                                Wade.

UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM

ccplumb@watmath.UUCP (10/16/87)
In article <1858@crash.CTS.COM> haitex@pnet01.CTS.COM (Wade Bickel) writes:
> As far as the Transputer goes, I think the Transputer language,
> OCCAM, would be a natural for the amiga. It supports multi-tasking
> at the language level, and looks to be fairly complete.

H'm... The Occam model of message passing involves synchronized passing
and copying of data. The Amiga's primitives do not copy messages, but do
queue them.

Occam assumes that all memory requirements (including run-time stack
requirements) can be computed at compile time. The Amiga allocates memory
left, right, and centre. This is compatible?

Also, the known stack size implies no recursion. This is a point against
"fairly complete". You wanted to do *what* to a binary tree?
--
        -Colin Plumb (watmath!ccplumb)

"RISC tends to be any 32-bit processor without an established market
introduced since 1982"
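
To show what the no-recursion restriction costs in practice, here is a
minimal sketch of an in-order binary tree walk that uses a preallocated
stack with a depth bound fixed in advance, which is roughly the shape such
code is forced into when all storage must be sized ahead of time. It is
Python rather than Occam, and the array layout, the `MAX_DEPTH` bound and
all names are assumptions made for illustration. The result list still
grows dynamically, which a statically allocated program could not do.

```python
NONE = -1       # marker for "no child"
MAX_DEPTH = 8   # assumption: a depth bound chosen before the program runs


def inorder(left, right, value, root, max_depth=MAX_DEPTH):
    """In-order traversal using a fixed-size explicit stack, no recursion.

    The tree is stored as parallel arrays indexed by node number.
    """
    stack = [NONE] * max_depth   # preallocated; never grows
    sp = 0                       # index of the next free stack slot
    out = []
    node = root
    while node != NONE or sp > 0:
        # Walk down the left spine, remembering each node on the stack.
        while node != NONE:
            if sp == max_depth:
                raise OverflowError("tree deeper than the preallocated stack")
            stack[sp] = node
            sp += 1
            node = left[node]
        # Visit the most recently remembered node, then go right.
        sp -= 1
        node = stack[sp]
        out.append(value[node])
        node = right[node]
    return out


# Node 0 holds 2, with left child node 1 (holds 1) and right child node 2 (holds 3).
left = [1, NONE, NONE]
right = [2, NONE, NONE]
value = [2, 1, 3]
print(inorder(left, right, value, root=0))
```

Any tree deeper than the bound fails outright, so the programmer has to
know the worst-case depth before the program is built.
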