[comp.sys.transputer] GNU C for the transputer

ENGLE@A.ISI.EDU (10/14/89)

Does anyone have any information on a version of GNU C for
the transputer?  Does GNU C work on stack-based machines?

Steven Engle
MIMD Systems, Inc.

schoenfr@tubsibr.uucp (Erik Schoenfelder) (10/18/89)

In article <[A.ISI.EDU]13-Oct-89.18:17:43.ENGLE> Steven Engle writes:

Steven>	Does anyone have any information on a version of GNU C for
Steven>	the transputer?  Does GNU C work on stack-based machines?

Interesting question. I found the following in the GCC documentation:

GCC>  	The main goal of GNU CC was to make a good, fast compiler for
GCC>  	machines in the class that the GNU system aims to run on: 32-bit
GCC>  	machines that address 8-bit bytes and have several general registers.
GCC>  	Elegance, theoretical power and simplicity are only secondary.

				[gcc.info: ``GNU CC and Portability'']

The `several general registers' seem to be the problem: the
transputer does not have them.

An idea is to use the fast built-in memory of the transputer like
registers (with an access time between 1 cycle and 5 (?) cycles).
(Nice idea: It will be possible to get the address of a register variable :-)
But this results in a compiler for a (virtual) processor based on
the transputer-processor.

But I don't know if anyone is working on a port, or if a port is
possible ...

-- Erik

andy@topologix.UUCP (Andy Pfiffer) (10/20/89)

 +---- Erik Schoenfelder in "Subject: Re: GNU C for the transputer" - Tue Oct 17 @10:42pm
 | 
 | An idea is to use the fast built-in memory of the transputer like
 | registers (with an access time between 1 cycle and 5 (?) cycles).
 | (Nice idea: It will be possible to get the address of a register variable :-)
 | But this results in a compiler for a (virtual) processor based on
 | the transputer-processor.
 +----

The problem with using the on-chip RAM as register variables is that if you
want more than one GCC-produced binary running on the Transputer at a time,
you have introduced a management problem; you must now context-switch
on-chip RAM (or portions of it) between processes.

It might be just the thing, however, for monolithic (as in one executable
per node) environments.

Perhaps you could limit each process to, say, 8 words of on-chip RAM and
at process start-up, assign the "base" address of your pseudo-register set
to an unused 8-word slot.  That would limit you, however, to a few fewer
than 128 processes (assuming a T800: 4KB of on-chip RAM holds 128 such
slots of 8 four-byte words, and you can't use all of it).
Quite reasonable, actually.
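
A minimal C sketch of the slot scheme just described, to make the
arithmetic concrete.  The figures (4KB on-chip, 4-byte words, 8 words per
set) come from the post; the names regset_alloc and regset_free, and the
RESERVED_SLOTS value standing in for the part of on-chip RAM that cannot
be used, are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the post: T800 with 4 KB on-chip RAM, 32-bit words,
   8 words per pseudo-register set. RESERVED_SLOTS is an assumed value. */
enum {
    ONCHIP_BYTES   = 4096,
    WORD_BYTES     = 4,
    REGS_PER_SET   = 8,
    SLOT_BYTES     = REGS_PER_SET * WORD_BYTES,  /* 32 bytes per set */
    TOTAL_SLOTS    = ONCHIP_BYTES / SLOT_BYTES,  /* 128 sets in all  */
    RESERVED_SLOTS = 4                           /* hypothetical: unusable low memory */
};

static unsigned char slot_in_use[TOTAL_SLOTS];

/* Claim a free pseudo-register set at process start-up.
   Returns its byte offset from the start of on-chip RAM, or -1 if none is left. */
long regset_alloc(void)
{
    for (int i = RESERVED_SLOTS; i < TOTAL_SLOTS; i++) {
        if (!slot_in_use[i]) {
            slot_in_use[i] = 1;
            return (long)i * SLOT_BYTES;
        }
    }
    return -1;
}

/* Release a set when its process exits. */
void regset_free(long offset)
{
    assert(offset >= 0 && offset % SLOT_BYTES == 0);
    assert(offset / SLOT_BYTES < TOTAL_SLOTS);
    slot_in_use[offset / SLOT_BYTES] = 0;
}
```

With these assumed numbers the allocator hands out 124 sets before
returning -1, which matches the "a few fewer than 128" estimate.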

You don't get anything for free, however.  You will still have to pay some
cycles in referencing your on-chip pseudo-register set.  Assuming we keep
the base address of our pseudo-register set in a workspace-based variable,
it might look something like this:

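	ldl  regbase	-- load the pseudo-register base address from the workspace
	ldnl 4		-- load pseudo-register 4 (word offset 4 from the base)
	add		-- add it to the operand already on the evaluation stack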
	.
	.

in which case if your workspace is off-chip you've made matters much worse
by *still* referencing your workspace.

You can also play some games by using "mint" as a base address.

A better solution might be to design a new Transputer with a small,
say 16-word, hardware-maintained data cache of the top 16 words of the
workspace.

The best solution might be to throw an R3000*, an R3010*, two 8-channel
NCR-SCSI Scripts processors, a Virtual-Cut-Through router, and a few
100Mb/sec FDDI controllers into a centrifuge and pour the resulting goo
into a 12x12 PGA.


	Andy

*Acceptable substitutions may also include 88100 and 88200, or the daily
 special from Your Friendly Neighborhood RISC Store.


--
Andy Pfiffer			  Topologix, Inc. (303) 421-7700
Trillium Diving Team              4860 Ward Road / Wheat Ridge, CO 80033
"...that's the way a Transputer works, right?"

hjm@cernvax.UUCP (Hubert Matthews) (10/20/89)

One of the problems with porting a non-transputer C compiler to the
transputer is the transputer's evaluation stack.  One can simulate
registers using on-chip RAM, in which case the evaluation stack is
barely used and all of the loads and stores to and from RAM slow down
expression evaluation a lot.  If, on the other hand, one uses the
stack for expression evaluation, then one has to be very careful about
stack overflow (the stack is only three elements deep).  A peephole
optimiser would clean up a lot of the loads and stores associated with
the first approach, but the second approach would need a quite
different code generation algorithm.

Would someone who has written for the transputer or ported any form of
compiler to the transputer please comment on the feasibility or
difficulty of these two approaches, or tell us what transputer
compilers really do.

-- 
Hubert Matthews      ...helping make the world a quote-free zone...
hjm@cernvax.cern.ch   hjm@vxomeg.decnet.cern.ch    ...!mcvax!cernvax!hjm

roger@wraxall.inmos.co.uk (Roger Shepherd) (10/21/89)

In article <1126@inmos.co.uk> (Hubert Matthews) writes:
>
>One of the problems with porting a non-transputer C compiler to the
>transputer is the transputer's evaluation stack.  One can simulate
>registers using on-chip RAM...
>                                  ...If, on the other hand, one uses the
>stack for expression evaluation, then one has to be very careful about
>stack overflow (the stack is only three elements deep).  A peephole
>optimiser would clean up a lot of the loads and stores associated with
>the first approach, but the second approach would need a quite
>different code generation algorithm.
>

The compilers we have written at Inmos use the approach outlined in 
``The transputer Instruction Set: a compiler writers' guide''. (Surprise,
surprise, we designed the machine, we wrote the book, we implemented 
the compiler). This approach is the second one outlined above. 

Variables live in the (local) workspace and the evaluation stack is
used  to evaluate expressions. It is very easy for a compiler to
introduce temporary variables if an expression is sufficiently
complicated that it cannot be evaluated in three registers. The
compiler can make use of the commutativity of certain operators to 
minimise the introduction of temporaries.

The Compiler Writers Guide also sets out the best way to load the 
three registers for procedure calling, or in order to execute an
instruction with three parameters (such as `long shift' or `input').

All these methods involve using recursive register counting algorithms
which determine the number of registers needed to evaluate an expression.
Certain features of the transputer instruction set simplify these
calculations, for example, in occam or C any variable, be it local,
non-local, or referenced via pointer, requires a single register to load
it. For example,

   ldl x            -- load local x; loads a local variable

   ldl static-chain -- load local static-chain
   ldnl x           -- load non-local x; loads variable off static chain

   ldl pointer
   ldnl 0           -- load variable via pointer
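
As a rough illustration of such a counting algorithm, here is a C sketch
of a Sethi-Ullman-style labelling capped at the three-deep evaluation
stack.  This is a stand-in chosen for illustration, not necessarily the
exact algorithm in the Inmos guide; Expr, plan and STACK_DEPTH are
hypothetical names.  It assumes, as stated above, that any variable
costs one register to load, and that the deeper operand can be evaluated
first (by commutativity, or by swapping the operands afterwards).

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_DEPTH = 3 };   /* the evaluation stack is three registers deep */

typedef struct Expr {
    char op;                     /* 0 marks a leaf (variable or constant) */
    struct Expr *left, *right;   /* operands; NULL for a leaf */
} Expr;

/* Returns the number of stack registers needed to evaluate e, never more
   than STACK_DEPTH, and adds to *temps the number of workspace temporaries
   this simple spill policy would introduce. */
int plan(const Expr *e, int *temps)
{
    if (e->op == 0)
        return 1;                       /* any variable loads in one register */

    int l = plan(e->left, temps);
    int r = plan(e->right, temps);

    if (l != r)
        return l > r ? l : r;           /* deeper operand first; the other fits beside its result */
    if (l < STACK_DEPTH)
        return l + 1;                   /* equal depths cost one extra register */

    (*temps)++;                         /* both operands need the whole stack: */
    return STACK_DEPTH;                 /* store one in a temporary, reload it later */
}
```

For ((a+b)*(c+d)) - ((e+f)*(g+h)), both operands of the subtraction
need all three registers, so the sketch reports one temporary; a
left-leaning chain such as a+b+c+d never needs more than two registers.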

I know that other people have successfully used these methods. I also
know of at least one other compiler which uses another evaluation stack
based method quite successfully.

As to the simulation of registers by the workspace, this might work, but
it would still leave a number of problems about how best to compile code
which loaded registers; I suspect that doing this properly requires some sort of
counting algorithm, and once this is in place it should be milked for
all possible benefit! All in all, I'd advise following the textbook method.

Roger Shepherd, INMOS Ltd   JANET:    roger@uk.co.inmos 
1000 Aztec West             UUCP:     ukc!inmos!roger or uunet!inmos-c!roger
Almondsbury                 INTERNET: roger@inmos.com
+44 454 616616              ROW:      roger@inmos.com OR roger@inmos.co.uk

andy@topologix.UUCP (Andy Pfiffer) (10/22/89)

The Unidot (formerly Pentasoft, formerly Penguin) compiler generates
code to an intermediate form, then interprets it as a Transputer would,
noting when an expression stack overflow would occur, queueing up assembler
for the peephole optimizer as it goes.

When it detects an expression stack overflow, it generates code to save and
restore intermediate expression values.  It does the same for the floating
point expression stack.
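
The overflow-detection pass can be pictured roughly as follows.  This C
sketch is only an assumption about the general shape of such a pass, not
the Unidot compiler's actual code; Insn and first_overflow are
hypothetical names.  Each instruction is described by how many stack
entries it pops and pushes, and the pass walks the sequence tracking the
depth of the three-entry evaluation stack.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_DEPTH = 3 };   /* the evaluation stack is three entries deep */

typedef struct {
    const char *mnemonic;   /* e.g. "ldl", "add", "stl" */
    int pops;               /* entries consumed from the evaluation stack */
    int pushes;             /* entries produced on the evaluation stack */
} Insn;

/* Walk the code as the processor would, tracking stack depth.
   Returns the index of the first instruction that would overflow the
   evaluation stack, or -1 if the whole sequence fits. A code generator
   would emit save/restore code around the reported point. */
int first_overflow(const Insn *code, size_t n)
{
    int depth = 0;
    for (size_t i = 0; i < n; i++) {
        depth -= code[i].pops;
        assert(depth >= 0);             /* the sketch assumes well-formed input */
        depth += code[i].pushes;
        if (depth > STACK_DEPTH)
            return (int)i;
    }
    return -1;
}
```

A sequence of four consecutive ldl instructions is reported at index 3,
since the fourth load is the one that would push a value off the bottom
of the stack.  The same walk, with a separate depth counter, would apply
to the floating point stack.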

	Andy

--
Andy Pfiffer			  Topologix, Inc. (303) 421-7700
Trillium Diving Team              4860 Ward Road / Wheat Ridge, CO 80033
"...that's the way a Transputer works, right?"

schoenfr@tubsibr.uucp (Erik Schoenfelder) (10/23/89)

In article <8910192040.AA20132@topologix.com> andy@topologix.UUCP 
(Andy Pfiffer) writes:

Andy>   The problem with using the on-chip RAM as register variables
Andy>   is that if you want more than one GCC-produced binary running
Andy>   on the Transputer at a time, you have introduced a management
Andy>   problem; you must now context-switch on-chip RAM (or portions
Andy>   of it) between processes.

Yes, you are right.

Andy>	A better solution might be to design a new Transputer ...

Andy>	The best solution might be to throw an R3000*, an R3010*, two
Andy>	8-channel NCR-SCSI Scripts processors, a Virtual-Cut-Through
Andy>	router, and a few 100Mb/sec FDDI controllers into a centrifuge
Andy>	and pour the resulting goo into a 12x12 PGA.

And - don't forget to disconnect the transputer boards: We won't slow
the system down.

Erik
--
Trillium ?