ENGLE@A.ISI.EDU (10/14/89)
Does anyone have any information on a version of GNU C for the transputer? Does GNU C work on stack-based machines?
Steven Engle
MIMD Systems, Inc.
schoenfr@tubsibr.uucp (Erik Schoenfelder) (10/18/89)
In article <[A.ISI.EDU]13-Oct-89.18:17:43.ENGLE> Steven Engle writes:
Steven> Does anyone have any information on a version of GNU C for
Steven> the transputer. Does GNU C work on stack based machines?
Interesting question. I found the following in the GCC documentation:
GCC> The main goal of GNU CC was to make a good, fast compiler for
GCC> machines in the class that the GNU system aims to run on: 32-bit
GCC> machines that address 8-bit bytes and have several general registers.
GCC> Elegance, theoretical power and simplicity are only secondary.
[gcc.info: ``GNU CC and Portability'']
The `several general registers' seem to be a problem. The
transputer does not have them.
An idea is to use the fast built-in memory of the transputer like
registers (with an access time of between 1 and 5 (?) cycles).
(Nice idea: It will be possible to get the address of a register variable :-)
But this results in a compiler for a (virtual) processor based on
the transputer-processor.
But I don't know if anyone is working on a port, or if a port is
possible ...
-- Erik
andy@topologix.UUCP (Andy Pfiffer) (10/20/89)
+---- Erik Schoenfelder in "Subject: Re: GNU C for the transputer" - Tue Oct 17 @10:42pm
|
| An idea is to use the fast built-in memory of the transputer like
| registers (with an access time of between 1 and 5 (?) cycles).
| (Nice idea: It will be possible to get the address of a register variable :-)
| But this results in a compiler for a (virtual) processor based on
| the transputer-processor.
+----

The problem with using the on-chip RAM as register variables is that if you want more than one GCC-produced binary running on the Transputer at a time, you have introduced a management problem: you must now context-switch on-chip RAM (or portions of it) between processes. It might be just the thing, however, for monolithic (as in one executable per node) environments.

Perhaps you could limit each process to, say, 8 words of on-chip RAM and, at process start-up, assign the "base" address of your pseudo-register set to an unused 8-word slot. That would limit you, however, to a few fewer than 128 processes (assuming a T800; you can't use all 4KB of on-chip RAM). Quite reasonable, actually.

You don't get anything for free, however. You will still have to pay some cycles in referencing your on-chip pseudo-register set. Assuming we keep the base address of our pseudo-register set in a workspace-based variable, it might look something like this:

    ldl   regbase
    ldnl  4
    add
    .
    .

in which case, if your workspace is off-chip, you've made matters much worse by *still* referencing your workspace. You can also play some games by using "mint" as a base address.

A better solution might be to design a new Transputer that maintained a small, say 16-word, Transputer-maintained data-cache of the top 16 words of the workspace.

The best solution might be to throw an R3000*, an R3010*, two 8-channel NCR-SCSI Scripts processors, a Virtual-Cut-Through router, and a few 100Mb/sec FDDI controllers into a centrifuge and pour the resulting goo into a 12x12 PGA.

Andy

*Acceptable substitutions may also include 88100 and 88200, or the daily special from Your Friendly Neighborhood RISC Store.
--
Andy Pfiffer  Topologix, Inc.  (303) 421-7700
Trillium Diving Team  4860 Ward Road / Wheat Ridge, CO 80033
"...that's the way a Transputer works, right?"
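[Editor's sketch] The slot arithmetic in Andy's post can be illustrated roughly as follows. This is only a sketch under stated assumptions: the function names are invented for illustration, and the 0x70 bytes assumed to be reserved at the bottom of on-chip memory is an assumption that should be checked against the T800 datasheet. The 4KB RAM size, the 8-word slot, and the use of "mint" as the base come from the post.

```python
WORD_BYTES = 4             # 32-bit transputer word
ONCHIP_BYTES = 4 * 1024    # T800 on-chip RAM, as stated in the post
ONCHIP_BASE = 0x80000000   # "mint", the most negative address
RESERVED_BYTES = 0x70      # ASSUMPTION: low memory reserved by the hardware
SLOT_WORDS = 8             # pseudo-registers per process, as in the post


def slot_bases(reserved_bytes=RESERVED_BYTES):
    """Return the base address of every usable 8-word pseudo-register slot."""
    slot_bytes = SLOT_WORDS * WORD_BYTES
    first = -(-reserved_bytes // slot_bytes)   # round up to a whole slot
    total = ONCHIP_BYTES // slot_bytes
    return [ONCHIP_BASE + i * slot_bytes for i in range(first, total)]


def pseudo_register_address(slot_base, n):
    """Address of pseudo-register n (0..7) in the slot starting at slot_base."""
    if not 0 <= n < SLOT_WORDS:
        raise ValueError("pseudo-register index out of range")
    return slot_base + n * WORD_BYTES
```

Under these assumptions 124 of the 128 possible slots remain, which matches "a few fewer than 128"; any workspace kept on-chip would reduce the count further.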
hjm@cernvax.UUCP (Hubert Matthews) (10/20/89)
One of the problems with porting a non-transputer C compiler to the transputer is the transputer's evaluation stack. One can simulate registers using on-chip RAM, in which case the evaluation stack is barely used and all of the loads and stores to and from RAM slow down expression evaluation a lot. If, on the other hand, one uses the stack for expression evaluation, then one has to be very careful about stack overflow (the stack is only three elements deep). A peephole optimiser would clean up a lot of the loads and stores associated with the first approach, but the second approach would need a quite different code generation algorithm.

Would someone who has written for the transputer or ported any form of compiler to the transputer please comment on the feasibility or difficulty of these two approaches, or tell us what transputer compilers really do.
--
Hubert Matthews ...helping make the world a quote-free zone...
hjm@cernvax.cern.ch hjm@vxomeg.decnet.cern.ch ...!mcvax!cernvax!hjm
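[Editor's sketch] The peephole clean-up mentioned for the first approach might look something like the following. It is a minimal sketch, not any real compiler's optimiser: it assumes straight-line code given as a list of instruction strings, and it assumes the caller supplies the set of pseudo-register names (here called `pseudo_regs`, a hypothetical parameter) whose values are dead at the end of the sequence. It removes a store that is immediately followed by a load of the same pseudo-register when that value is never read again.

```python
def reads_before_overwrite(code, start, name):
    """True if `name` is loaded again before it is next stored to."""
    for ins in code[start:]:
        if ins == f"ldl {name}":
            return True
        if ins == f"stl {name}":
            return False
    return False


def peephole(code, pseudo_regs):
    """Drop 'stl r; ldl r' pairs when pseudo-register r is not read afterwards."""
    out, i = [], 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if ins.startswith("stl ") and nxt == "ldl " + ins[4:]:
            name = ins[4:]
            if name in pseudo_regs and not reads_before_overwrite(code, i + 2, name):
                i += 2          # the value simply stays on the evaluation stack
                continue
        out.append(ins)
        i += 1
    return out
```

For example, `peephole(["ldl r1", "ldl r2", "add", "stl r3", "ldl r3", "ldl r4", "mul", "stl r5"], {"r3", "r5"})` drops the `stl r3` / `ldl r3` pair and leaves the rest alone.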
roger@wraxall.inmos.co.uk (Roger Shepherd) (10/21/89)
In article <1126@inmos.co.uk (Hubert Matthews) writes:
>
>One of the problems with porting a non-transputer C compiler to the
>transputer is the transputer's evaluation stack. One can simulate
>registers using on-chip RAM...
> ...If, on the other hand, one uses the
>stack for expression evaluation, then one has to be very careful about
>stack overflow (the stack is only three elements deep). A peephole
>optimiser would clean up a lot of the loads and stores associated with
>the first approach, but the second approach would need a quite
>different code generation algorithm.
>

The compilers we have written at Inmos use the approach outlined in ``The transputer Instruction Set: a compiler writers' guide''. (Surprise, surprise, we designed the machine, we wrote the book, we implemented the compiler). This approach is the second one outlined above. Variables live in the (local) workspace and the evaluation stack is used to evaluate expressions. It is very easy for a compiler to introduce temporary variables if an expression is sufficiently complicated that it cannot be evaluated in three registers. The compiler can make use of the commutativity of certain operators to minimise the introduction of temporaries.

The Compiler Writers' Guide also sets out the best way to load the three registers for procedure calling, or in order to execute an instruction with three parameters (such as `long shift' or `input'). All these methods involve using recursive register counting algorithms which determine the number of registers needed to evaluate an expression. Certain features of the transputer instruction set simplify these calculations; for example, in occam or C any variable, be it local, non-local, or referenced via pointer, requires a single register to load it.
For example,

    ldl x              -- load local x; loads a local variable

    ldl static-chain   -- load local static-chain
    ldnl x             -- load non-local x; loads variable off static chain

    ldl pointer
    ldnl 0             -- load variable via pointer

I know that other people have successfully used these methods. I also know of at least one other compiler which uses another evaluation-stack-based method quite successfully.

As to the simulation of registers by the workspace, this might work, but it would still leave a number of problems about how best to compile code which loaded registers; I suspect that doing this properly requires some sort of counting algorithm, and once this is in place it should be milked for all possible benefit!

All in all, I'd advise following the text-book method.

Roger Shepherd, INMOS Ltd   JANET:    roger@uk.co.inmos
1000 Aztec West             UUCP:     ukc!inmos!roger or uunet!inmos-c!roger
Almondsbury                 INTERNET: roger@inmos.com
+44 454 616616              ROW:      roger@inmos.com OR roger@inmos.co.uk
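[Editor's sketch] A recursive register-counting scheme of the kind Roger describes can be sketched as below. This is a Sethi-Ullman-style labelling written for illustration, not the algorithm from the Compiler Writers' Guide: the `Var`/`Op` tree types, the `tempN` names, and the small `run` checker are all assumptions introduced here. Each variable costs one register to load; the subtree needing more registers is evaluated first, with `rev` restoring operand order for non-commutative operators; a temporary is introduced only when both operands need all three registers.

```python
from dataclasses import dataclass
from itertools import count

STACK_DEPTH = 3                      # Areg, Breg, Creg
COMMUTATIVE = {"add", "mul"}         # operators whose operands may be swapped


@dataclass
class Var:
    name: str


@dataclass
class Op:
    op: str
    left: object
    right: object


def gen(e, fresh):
    """Return (instructions, registers needed) for expression tree e."""
    if isinstance(e, Var):
        return [f"ldl {e.name}"], 1              # any variable needs one register
    lcode, ln = gen(e.left, fresh)
    rcode, rn = gen(e.right, fresh)
    if ln >= rn:                                 # left first, right ends up on top
        code, need = lcode + rcode, max(ln, rn + 1)
    else:                                        # deeper right operand goes first
        code, need = rcode + lcode, rn
        if e.op not in COMMUTATIVE:
            code = code + ["rev"]                # swap Areg and Breg back into order
    if need > STACK_DEPTH:                       # both sides need the whole stack
        temp = f"temp{next(fresh)}"
        code = rcode + [f"stl {temp}"] + lcode + [f"ldl {temp}"]
        need = STACK_DEPTH
    return code + [e.op], need


def run(code, env):
    """Interpret emitted code, failing if more than three registers are live."""
    ops = {"add": lambda b, a: b + a, "sub": lambda b, a: b - a,
           "mul": lambda b, a: b * a}
    stack, env = [], dict(env)
    for ins in code:
        parts = ins.split()
        if parts[0] == "ldl":
            stack.append(env[parts[1]])
            assert len(stack) <= STACK_DEPTH, "evaluation stack overflow"
        elif parts[0] == "stl":
            env[parts[1]] = stack.pop()
        elif parts[0] == "rev":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            a, b = stack.pop(), stack.pop()
            stack.append(ops[parts[0]](b, a))
    return stack[-1]
```

Calling `gen(expr, count())` yields the instruction list and the register count; `run` is only a checking aid and models overflow as an error, whereas real hardware would silently lose the bottom of the stack.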
andy@topologix.UUCP (Andy Pfiffer) (10/22/89)
The Unidot (formerly Pentasoft, formerly Penguin) compiler generates code to an intermediate form, then interprets it as a Transputer would, noting when an expression stack overflow would occur, queueing up assembler for the peephole optimizer as it goes. When it detects an expression stack overflow, it generates code to save and restore intermediate expression values. It does the same for the floating point expression stack.

Andy
--
Andy Pfiffer  Topologix, Inc.  (303) 421-7700
Trillium Diving Team  4860 Ward Road / Wheat Ridge, CO 80033
"...that's the way a Transputer works, right?"
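[Editor's sketch] The general shape of "interpret the code, notice overflow, then save and restore" might be sketched as follows. This is a guess at the idea for illustration only, not Unidot's actual algorithm; the function name, the `spillN` temporaries, and the restriction to `ldl` plus two-operand instructions are all assumptions. Instructions are queued per step so that, when a load would need a fourth register, a store can be attached retroactively to the step that produced the deepest value still held in a register.

```python
STACK_DEPTH = 3                      # the transputer evaluation stack
COMMUTATIVE = {"add", "mul"}         # operand order does not matter


def fit_to_stack(naive):
    """Rewrite stack code that assumes unlimited depth to fit three registers."""
    slots = []       # slots[i]: instructions queued for step i
    stack = []       # abstract stack entries: {"slot": producer step, "temp": name}
    ntemps = 0
    for ins in naive:
        if ins.startswith("ldl "):
            in_regs = [e for e in stack if e["temp"] is None]
            if len(in_regs) == STACK_DEPTH:
                victim = in_regs[0]              # deepest value still in a register
                victim["temp"] = f"spill{ntemps}"
                ntemps += 1
                # save it right after the step that produced it
                slots[victim["slot"]].append(f"stl {victim['temp']}")
            slots.append([ins])
        else:                                    # two-operand instruction
            stack.pop()                          # right operand, always in Areg
            left = stack.pop()
            step = []
            if left["temp"] is not None:         # restore the saved left operand
                step.append(f"ldl {left['temp']}")
                if ins not in COMMUTATIVE:
                    step.append("rev")           # put left back under right
            step.append(ins)
            slots.append(step)
        stack.append({"slot": len(slots) - 1, "temp": None})
    return [ins for step in slots for ins in step]
```

Code that never exceeds three registers passes through unchanged. The sketch makes no attempt to avoid pointless saves (it will store a value immediately after loading it), which is the kind of thing a later peephole pass would be expected to tidy up.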
schoenfr@tubsibr.uucp (Erik Schoenfelder) (10/23/89)
In article <8910192040.AA20132@topologix.com> andy@topologix.UUCP (Andy Pfiffer) writes:
Andy> The problem with using the on-chip RAM as register variables
Andy> is that if you want more than one GCC-produced binary running
Andy> on the Transputer at a time, you have introduced a management
Andy> problem; you must now context-switch on-chip RAM (or portions
Andy> of it) between processes.
Yes, you are right.
Andy> A better solution might be to design a new Transputer ...
Andy> The best solution might be to throw an R3000*, an R3010*, two
Andy> 8-channel NCR-SCSI Scripts processors, a Virtual-Cut-Through
Andy> router, and a few 100Mb/sec FDDI controllers into a centrifuge
Andy> and pour the resulting goo into a 12x12 PGA.
And - don't forget to disconnect the transputer boards: We won't slow the system down.
Erik
--
Trillium ?