rsexton@uceng.UC.EDU (robert sexton) (09/20/88)
Being a fan of parallel systems and their advantages, I was wondering why the transputer has not gotten off the ground as a viable system. It seems pretty feasible, as well as very cost-effective. I imagine a machine with several transputers, each running Unix. When the machine is lightly loaded, every user gets a processor, maybe more; when it's heavily loaded, the users have to share processors.

Admittedly, there are obstacles in the areas of shared memory, shared storage, and general parallelization. The first two are pretty simple to defeat, but the third shows no signs of going away. It seems, however, that by mapping tasks onto processors we could get a pretty flexible system right now. When you run out of power, you just add more processors. A system with 64 transputers could theoretically provide 16 times the floating point performance of a VAX 8650, for approximately $64,000. Admittedly these ponderings are largely wishful thinking, but the price/performance could be incredible. Natural applications would be ray tracing, fluid flow, etc.

Thanks in advance for your input.

Robert Sexton, University of Cincinnati
rsexton@uceng.uc.edu    tut.cis.ohio-state.edu!uccba!uceng!rsexton
Box Full O' Transputers... The Breakfast with MIPS
I do not speak for UC, They don't speak for me.
fpst@hubcap.UUCP (Steve Stevenson) (09/21/88)
From article <253@uceng.UC.EDU>, by rsexton@uceng.UC.EDU (robert sexton):
>....transputers...
The T-series hypercube from FPS is transputer-based. The glitch there
was religion: Occam or nothing. INMOS was, until recently, adamant
about it. Our T and Levco systems (hung off a Mac) are research systems
here at Clemson. Contact Dan Warner (warner@hubcap.clemson.edu).
--
Steve Stevenson fpst@hubcap.clemson.edu
(aka D. E. Stevenson), fpst@prism.clemson.csnet
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell
dre%ember@Sun.COM (David Emberson) (09/22/88)
In my former life in hypercube-land, I evaluated the Transputer (the older one, the T414) and Occam. The Transputer isn't a bad machine, except that it has no memory management, no protection of any kind, and no supervisor mode. This is probably why you don't see Transputer systems running Unix.

Occam and its associated editor and tools are totally unusable. Among other things, the language had no pointers, interprocessor communication was point-to-point, the language had white-space dependencies (!), and other sins too numerous to list. Every time we would complain, the poor local technical rep from Inmos would say something about the great new version just on the horizon.

One time, one of the big honchos from England (I forget the name, but he was one of the top architects) came through on a U.S. tour. I spent three hours arguing with him about giving me the details of the assembly language--which at that time they did not want to make public. He made a remark that he did not understand why "we Americans" were so interested in the machine-dependent details: "In Europe, no one asks us these questions, and they are satisfied with Occam." I finally said something like, "If I don't know how the chip works, I sure as hell am not going to design it into my machine." End of conversation.

Inmos finally did come around and publish an assembler manual, and a couple of companies are making Transputer-based machines. One company makes a four-CPU board that plugs into the Sun backplane; I think you can have up to 16 CPUs. I don't know about the programming environment, but I think they have C. Sorry about not having the name, but I seem to have misplaced the literature. It's probably buried under heaps of much more interesting Sparc stuff... :-)

Dave Emberson (dre@sun.com)
bs@linus.UUCP (Robert D. Silverman) (09/22/88)
In article <253@uceng.UC.EDU> rsexton@uceng.UC.EDU (robert sexton) writes:

>being a fan of parallel system and their advantages, I was wondering why
>the transputer has not gotten off the ground as a viable system. It seems
>pretty feasable, as well as very cost-effective. I imagine a machine with
>several transputers, each running unix. When the machine is lightly loaded,

[stuff deleted]

We have just been through a major decision process where we chose a parallel computer. We discarded the transputer for several reasons:

(1) SLOW communication, relative to the IPSC/2 and AMETEK.
(2) Lack of software, e.g. good debugging tools, compilers, etc.
(3) Too heavy a dependence on OCCAM.
(4) Speed. The IPSC/2 and AMETEK have faster processors and allow for
    MERCURY-type floating point vector boards as nodes.
(5) Uncertainty as to whether the transputer will last as a viable product.
(6) Lack of third-party software.

These are just a few of the reasons.

Bob Silverman
pase@ogccse.ogc.edu (Douglas M. Pase) (09/23/88)
Actually, the Transputer has found its way into several commercial products. I understand it is especially popular in Europe. Meiko (?) makes a computing surface built from transputers which does certain modeling and graphics applications very well, and at low cost. The FPS T-Series (one tremendous Mega Flop) is (was) based on the transputer. Cogent Research also has a wonderful machine which uses multiple transputers.

The Transputer's on-chip floating point circuitry and 4-Kbyte memory (yes, also on chip) mean it can really scream for the right applications. It will probably be a while before the Transputer is as ubiquitous as the 80x86 or the Motorola 680x0, but it's not doing poorly.
--
Douglas M. Pase                  Department of Computer Science
tektronix!ogccse!pase            Oregon Graduate Center
pase@cse.ogc.edu (CSNet)         19600 NW Von Neumann Dr.
(503) 690-1121 x7303             Beaverton, OR 97006-1999
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (09/23/88)
----------
These are the NSIEVE (Sieve of Eratosthenes) results I have at this time.

I have also updated NSIEVE.c. Added 'free(ptr)' to the SIEVE() routine; the program was not freeing allocated memory previously. Added error checks based on the number of primes found for each array size. The program will no longer bomb if 'malloc()' returns a null pointer. Also added a timer routine for Microsoft C.

I didn't change the Unix timing routines, as I think it is probably better to have the user confirm/input the right 'HZ' value, and this is usually in the 'times()' documentation. Also, while <sys/param.h> should contain the right 'HZ' or 'COUNTS' values, this may not always be the case (neither HZ nor COUNTS was defined on our system, so I had to input it anyway).

Sorry about the 'Primes/sec' output, but some people seem to prefer this over just the RunTime output. So anyway, there is a 'Primes/sec' output now, calculated as Primes/sec = 1899 / (Average RunTime(sec)). I'll repost NSIEVE next week.

NSIEVE (Scaled to 10 Iterations):

Array Size  ---------------------RunTime(sec)---------------------
 (Bytes)       1         2         3        4         5         6
            Amdahl    Amdahl    McCray    MIPS     McCray    Sun 3/280
             5890    5890-300E AMD 29000  R2000   AMD 29000    68020
            (gcc)      (cc)     BTC ON    M/120    BTC OFF     (cc)

   8191     0.033     0.050     0.116    0.130     0.183     0.267
  10000     0.050     0.083     0.150    0.150     0.200     0.300
  20000     0.117     0.133     0.300    0.320     0.450     0.650
  40000     0.200     0.300     0.616    0.630     0.900     1.333
  80000     0.483     0.683     1.233    1.270     1.816     2.917
 160000     1.200     1.533     2.633    2.580     3.833     7.833
 320000     2.583     3.333     5.300    5.570     7.680    17.600

Average RunTime with respect to the 8191-size array:
            0.049     0.067     0.126    0.131     0.185     0.315

Primes/sec: 38755     28343     15071    14496     10265      6029

Array Size  ---------------------RunTime(sec)---------------------
 (Bytes)       7           8           9          10         11
           VAX 8600   Turbo-Amiga   Amiga       Z-248      Z-248
          (12.5 MHz)  (14.32 MHz) (7.16 MHz)  (8.00 MHz) (8.00 MHz)
                         68020       68000      80286      80286
                                               (small)     (huge)

   8191     0.267       0.480       2.297      4.830      5.660
  10000     0.383       0.582       2.801      5.930      6.970
  20000     0.800       1.180       5.699     12.030     14.170
  40000     1.767       2.359      11.539     24.380     28.670
  80000     3.800       4.820      23.340     ------     ------
 160000     8.167       9.726      47.180     ------     ------
 320000    17.733      19.660      95.262     ------     ------

Average RunTime with respect to the 8191-size array:
            0.362       0.489       2.362      4.902      5.761

Primes/sec:  5245        3883         804        387        330

 (1) Amdahl 5890, using GCC (compiled with 'gcc -S -O -DUNIX nsieve.c').
     From Chuck Simmons at Amdahl, Sunnyvale CA.
 (2) Amdahl 5890-300E, SYS V Unix, 'cc -O nsieve.c'.
     From Chuck Simmons at Amdahl, Sunnyvale CA.
 (3) AMD 29000 at 25 MHz. Branch Target Cache (BTC) was ON. Metaware
     High C 29000 V2.1 with -O option. No effective memory wait states.
     Memory was all physical (i.e., no caching).
     From Trevor Marshall, BIX 'supermicros/bench #925', 07 Sep 1988.
 (4) MIPS R2000 in M/120, 16.7 MHz, 128K cache, low-latency memory system.
     From John Mashey at MIPS, Sunnyvale CA.
 (5) AMD 29000 at 25 MHz. Branch Target Cache (BTC) was OFF. Metaware
     High C 29000 V2.1 with -O option. No effective memory wait states.
     Memory was all physical (i.e., no caching).
 (6) Sun 3/280, 68020 at 25 MHz. Compiled with 'cc -O nsieve.c'.
     The ICache was ON.
 (7) VAX 8600, 12.5 MHz. Compiled with 'cc -O nsieve.c'.
 (8) Amiga with 68020 at 14.32 MHz, 32-bit memory at 14.32 MHz. Compiled
     with Manx Aztec C V3.4B using 'cc +2 +L +ff nsieve.c'. The ICache
     was ON.
 (9) Amiga with 68000 at 7.16 MHz, 16-bit memory at 7.16 MHz. Compiled
     with Manx Aztec C V3.4B using 'cc +L +ff nsieve.c'.
(10) Zenith Z-248, 80286 at 8.00 MHz. Turbo C with 'small' option set.
     Compiled for 'speed'. Used registers, register optimization, and
     jump optimization.
(11) Zenith Z-248, 80286 at 8.00 MHz. Turbo C V1.0 with 'huge' option
     set. Compiled for 'speed', used registers, register optimization,
     and jump optimization.

Al Aburto
aburto@marlin.nosc.mil.UUCP
'ala' on BIX
hankd@pur-ee.UUCP (Hank Dietz) (09/23/88)
David Emberson's comments sum it up nicely; however, as someone who has seen much of the updated Transputer stuff, I feel obliged to add a few quick comments:

The newer Occam isn't compatible with the old one... and the compiler still lives in its own little world, which isn't very pleasant if you are used to something else (like a Unix environment and editors like emacs). Inmos and the Transputer-using world are still encouraging Occam as THE language, with other languages compiling into it (I don't know if that's what the C compiler does... I've never managed to find a copy of it). As for code quality, well, I've seen no indication that Occam is doing anything particularly interesting or clever (I'm an optimizing/parallelizing compiler person :-).

As before, an Occam program is not a complete program unless accompanied by a description of the physical connection pattern; routing isn't point-to-point, but rather physical-neighbor point-to-point. There is no standard way to alter the physical connection pattern.

I've talked with a few folks from Inmos about us (Purdue EE) doing a software-implemented (interrupt-driven) shared-memory environment managed by compiler-driven cache techniques, and they sound interested, but they have yet to really move on it. I've gotten the same response David got: they claim that Occam and the connection scheme are basically features to build upon, not handicaps to overcome.

There are LOTS of companies making little (4-16 processor) Transputer stick-it-in-there or hang-it-off-that type products, but I don't know of any general-purpose machine claiming to use Transputers without a host system which uses another processor.

-hankd
pauls@nsc.nsc.com (Paul Sweazey) (09/23/88)
The company making the four-CPU transputer boards for the Sun backplane is called Niche Data Systems. I know a marketeer there named Doug Van Leuven, at 408-730-8963. He probably has free info and lots of stamps.

pauls
bcase@cup.portal.com (09/24/88)
>These are the NSIEVE (Sieve Of Eratosthenes) results I have at this time.
>
> (3) AMD 29000 at 25 MHz. Branch Target Cache (BTC) was ON. Metaware
>     High C 29000 V2.1 with -O option. No effective memory wait states.
>     Memory was all physical (i.e., No cacheing).
>     From Trevor Marshall, BIX 'supermicros/bench #925', 07 Sep 1988.

Well, "no effective memory wait states" is kinda misleading. The data memory access time for this board is two clock cycles; now, maybe this latency is always overlapped in this benchmark, thus prompting the comment "no effective memory wait states," but that doesn't change the implementation details! Also, the instruction memory has zero wait states (I *HATE* this damn term, but we're stuck with it) most of the time, but it can have anywhere from 1 cycle (zero "wait states"; see why I hate this term?) to 5 cycles of latency, depending on circumstances surrounding branches, static column alignment, page boundaries, etc.

The point I am trying to make is that the McCray board is neither a "hot box" nor a system with caches. The 29000 would do better on this benchmark if it had the advantage of caches like those of the other systems. (Note that the current implementation of the 29000 has a bug: the BTC doesn't always work right. This is the reason for the inclusion of two 29000 times in the NSIEVE stats.)

What is the number for the 25 MHz R3000 box? Is it close to the Amdahl?
johnwe (John Weber, Celtic sysmom) (09/24/88)
In comp.arch, rsexton@uceng.UC.EDU (robert sexton) discourses on Transputer-based systems, thusly:

> being a fan of parallel system and their advantages, I was wondering why
> the transputer has not gotten off the ground as a viable system. It seems
> pretty feasable, as well as very cost-effective. I imagine a machine with
> several transputers, each running unix.

The major problem with transputers in a multi-user (UN*X-ish OS) environment is the complete lack of memory management, or any provision for external memory management. While this is all well and good for a single-user PC, where nobody really cares if one process stomps another, it is not really an acceptable answer for a multiuser system or a system on a network. If a process breaks in just the right way, it could take out the whole network. There is also the problem that a user can control task switching, and with a little thought can effectively shut it off. This is also a bad thing.

The processor also has a few design choices which I personally feel a bit uncomfortable with, such as the lack of a barrel or funnel shifter, combined with a bit of a problem in the microcode which causes the processor to hang for a LONG time if you try to left-shift 7fffffffh, and distinctly too few registers. I also have a bit of a problem with the message security.

On the other hand, they are wonderful chips for parallel processors, controllers, and PCs. Massively fast, hardware multitasking, and other wonderful things. Basically, in any application where interprocess security is not needed, they are great.

> Thanks in advance for your input.

No prob...

> Robert Sexton, University of Cincinnati
> rsexton@uceng.uc.edu tut.cis.ohio-state.edu!uccba!uceng!rsexton
> Box Full O' Transputers... The Breakfast with MIPS
> I do not speak for UC, They don't speak for me.
--
"In the fields of Hell,             John Weber, ...!uunet!sco!johnwe
 where the grass grows high,        @ucscc.ucsc.EDU:johnwe@sco.COM
 are the graves of dreams,
 allowed to die." -- Author unknown Celtic sysmom with an ATTITUDE!

Any opinions expressed are my own, and bear no relationship to those of my employers, to the best of my knowledge.
mac3n@babbage.acc.virginia.edu (Alex Colvin) (09/27/88)
In article <40211@linus.UUCP>, bs@linus.UUCP (Robert D. Silverman) writes:
> We have just been though a major decision process where we chose a parallel
> computer. We discarded the transputer for several reasons:
>
> (1) SLOW communication, relative to the IPSC/2 and AMETEK

I'm surprised at this. I thought communication was one of the transputer's strengths. My understanding was that the effective rate is 10 Mb/s, with only a few usec latency for small messages.

How do transputers fare where the topology and application are fixed, messages are short, and the critical factor is end-to-end delay? What are the limits on link distance, and is there some kind of repeater (besides another transputer)?