[comp.sys.transputer] Occam2 compiler bugs & misfeatures

vorbrueg@bufo.usc.edu (Jan Vorbrueggen) (10/14/89)

I have found some bugs in the occam2 compiler (we use the slightly
modified version from Parsytec). Some are obscure and hard to find,
so I thought I'd share them.

1. Function returns: The compiler generates incorrect code for
some valid RESULT lines. A glaring example is the RAN function.
(Indeed, I can't understand why nobody seems to have complained
about this; after all, some people have been using the compiler
for over a year now!) In particular, RAN's last line reads:

	RESULT SCALEB (...), NewSeed

The compiler thinks it is returning two integer values; accordingly, it
loads NewSeed (thinking the result of the SCALEB is still on the
register stack) and then generates a REV instruction to get the order
right!  Well, the caller dutifully gets some trash value for NewSeed,
usually a 0, so that the output isn't even pseudo-random.
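
For concreteness, the shape of function that trips it up is roughly the
following. This is only a sketch of mine, not the library source; the seed
update, the constants, and the arguments I pass to SCALEB are made up for
illustration. The point is the function call sitting next to an integer on
the RESULT line:

  REAL32, INT FUNCTION ran.like (VAL INT Seed)
    INT NewSeed :
    VALOF
      NewSeed := (Seed TIMES 1664525) PLUS 1013904223  -- made-up seed update
      RESULT SCALEB (REAL32 ROUND (NewSeed /\ #3FFFFFFF), -30), NewSeed
  :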

A workaround is to write the equivalent, if slightly less efficient

  REAL32 Value :
  .
  .
  SEQ
    ...
    Value := SCALEB (...)
  RESULT Value, NewSeed

2. Associated with 1. is a confusion on the compiler writer's part about the
mechanism for returning results from functions that have both real and integer
results, with one or both groups having more than three elements (so that the
register stack can't hold them all). Attempting to compile such a function
will be the end of your current session with the TDS...
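
For the record, I mean headers of this kind (again a made-up example, not
real library code): one REAL32 result plus four INT results, so the integer
group alone overflows the three-deep register stack.

  REAL32, INT, INT, INT, INT FUNCTION too.many (VAL REAL32 x)
    REAL32 r :
    INT a, b, c, d :
    VALOF
      SEQ
        r := x * 2.0(REAL32)
        a, b, c, d := 1, 2, 3, 4
      RESULT r, a, b, c, d
  :

Feeding the compiler something of this shape was enough to lose the session.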

We have received a modified version of the compiler from Parsytec which
resolves both problems by *always* returning results via workspace locations.
However, you of course have to recompile all your libraries to adapt them to
the new mechanism, and then the compiler goes into Nirvana trying to compile
the SINH function...
Additionally, I think the compiler writers shouldn't gloss over the
unnecessary performance penalty introduced by returning all results via memory.

3. I use the MOVE2D function to extract columns from a 2-dimensional matrix.
(Our application is a 2D FFT, and this speeds up performance for a single
processor by about 5%. It is vital for the multiprocessor version.)
This function takes as input two 2-dimensional byte arrays, starting points
in these arrays (two arguments each), the length of each individual block
to be moved, and the number of rows. The actual T800 instructions implementing
this get arguments which are computed from this information; in particular,
the stride (number of bytes to skip between rows) is computed from the array
size, and the base address and indices of the two arrays are turned into
addresses. At this point, the compiler allocates space for and uses a temporary
value for the index calculation; however, this value is never calculated!
(Parsytec hardware generates some very nice and regular bit patterns when you
access non-existent memory :-).) As an example:

[NY][NX] REAL32 matrix :
[NY] REAL32 column :
[NY][4*NX] BYTE array RETYPES matrix :
[NY][4] BYTE line RETYPES column :
SEQ i = 0 FOR NY
  SEQ
    MOVE2D (array, 4*i, 0,
            line, 0, 0,
            4, NY)
    do.something (line)
    MOVE2D (line, 0, 0,
            array, 4*i, 0,
            4, NY)

In these cases, the compiler generates code to compute the address of 
array[0][4*i] without computing 4*i first! A workaround is, strangely,
to insert a line like
  
  VAL offset IS 4*i :

before the MOVE2D calls and use offset in the call instead of 4*i.
I've reported this to Parsytec; no news as of today.
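
Applied to the loop above, the workaround looks like this (same declarations
as before; do.something is still just a stand-in for the real processing):

  SEQ i = 0 FOR NY
    VAL offset IS 4*i :               -- forces 4*i to actually be evaluated
    SEQ
      MOVE2D (array, offset, 0,
              line, 0, 0,
              4, NY)
      do.something (line)
      MOVE2D (line, 0, 0,
              array, offset, 0,
              4, NY)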

4. When you try the [LOAD NETWORK] function of the compiler on an uncompiled
foldset, you used to get some nice message pointing out the error of your ways.
Now, the host just stops, and a reboot is in order. Hmmm... this must be one
of the new features enhancing the ease of use and user-friendliness of the system.

Now, having gotten that off my chest, I'd like to ask people what they think
of the general quality of the compiler. 

For instance, it's all well and good to have alias and usage checking.
However, we write some heavily numerical code, and I've yet to find a
useful program which doesn't get the message "Implementation limit: Array
too large to check".

I really like the new concept of separate vectorspace and the ability to 
allocate data in either workspace or vectorspace. But the only way to influence
the placement of code is to make use of the order in which things are compiled.
This stops me from putting heavily used subroutines (e.g., functions
implementing operations on complex numbers or an FFT), which should have the
benefit of the on-chip RAM, into libraries, where they belong. And don't fool
yourself: In our application, this makes about a 25% difference in performance!
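
The data side, at least, works nicely. As an illustration (a rough sketch:
the names and sizes are invented, and take the PLACE ... IN WORKSPACE /
IN VECSPACE spelling with a grain of salt), a small, heavily used buffer
can be kept in workspace while a large array is pushed out to vectorspace:

  PROC fft.pass ([]REAL32 data)
    [16]REAL32 twiddle :
    PLACE twiddle IN WORKSPACE :      -- small and hot: gets the on-chip RAM
    [1024]REAL32 scratch :
    PLACE scratch IN VECSPACE :       -- large: out to (external) vectorspace
    SEQ
      SKIP                            -- real work elided
  :

There is no comparable handle on where the code itself ends up.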

Finally, I want to say that we really like to use transputers and occam,
but it is frustrating when, after a few hours debugging, you get the nasty
feeling that, just maybe, it isn't your fault, but the code generated by
the compiler... Ah well, that's what they pay me for :-)

Cheers, Jan Vorbrueggen