vorbrueg@bufo.usc.edu (Jan Vorbrueggen) (10/14/89)
I have found some bugs in the occam2 compiler (we use the slightly modified version from Parsytec). Some are obscure and hard to find, so I thought I'd share them.

1. Function returns: The compiler generates incorrect code for some valid RESULT lines. A glaring example is the RAN function. (Indeed, I can't understand why nobody seems to have complained about this; after all, some people have been using the compiler for over a year now!) In particular, RAN's last line reads:

       RESULT SCALEB (...), NewSeed

   The compiler thinks it is returning two integer values. Accordingly, it loads NewSeed (assuming the result of the SCALEB is still on the register stack) and then generates a REV instruction to get the order right! The caller dutifully gets some trash value for NewSeed, usually 0, so the output isn't even pseudo-random. A workaround is to write the equivalent, if slightly less efficient, form:

       REAL32 Value :
       VALOF
         SEQ
           ...
           Value := SCALEB (...)
         RESULT Value, NewSeed

2. Associated with 1. is a confusion on the compiler writer's part about the mechanism for returning results from functions that have both real and integer results, where one or both groups have more than three elements (so that the register stack can't hold them all). Attempting to compile such a function will be the end of your current session with the TDS...

   We have received a modified version of the compiler from Parsytec which resolves both problems by *always* returning results via workspace locations. However, you of course have to recompile all your libraries to adapt them to the new mechanism, and then the compiler goes into Nirvana trying to compile the SINH function... Additionally, I think the compiler writers shouldn't gloss over the unnecessary performance penalty introduced by returning all results via memory.

3. I use the MOVE2D function to extract columns from a 2-dimensional matrix.
   (Our application is a 2D FFT; this speeds up performance on a single processor by about 5%, and it is vital for the multiprocessor version.) This function takes as input two 2-dimensional byte arrays, a starting point in each array (two arguments each), the length of each individual block to be moved, and the number of rows. The actual T800 instructions implementing the move get arguments computed from this information: the stride (number of bytes to skip between rows) is computed from the array size, and the base address and indices of each array are turned into an address. At this point, the compiler allocates space for, and uses, a temporary value for the index calculation; however, this value is never calculated! (Parsytec hardware generates some very nice and regular bit patterns when you access non-existent memory :-).) As an example:

       [NY][NX] REAL32 matrix :
       [NY] REAL32 column :
       [NY][4*NX] BYTE array RETYPES matrix :
       [NY][4] BYTE line RETYPES column :
       SEQ i = 0 FOR NX
         SEQ
           MOVE2D (array, 4*i, 0, line, 0, 0, 4, NY)
           do.something (line)
           MOVE2D (line, 0, 0, array, 4*i, 0, 4, NY)

   In these cases, the compiler generates code to compute the address of array[0][4*i] without computing 4*i first! Strangely, a workaround is to insert a line like

       VAL offset IS 4*i :

   before the MOVE2D calls and use offset in the calls instead of 4*i. I've reported this to Parsytec; no news as of today.

4. When you tried the [LOAD NETWORK] function of the compiler on an uncompiled foldset, you used to get a nice message pointing out the error of your ways. Now the host just stops, and a reboot is in order. Hmm... this must be one of the new features enhancing the ease of use and user friendliness of the system.

Now, having gotten that off my chest, I'd like to ask people what they think of the general quality of the compiler. For instance, it's all well and good to have alias and usage checking.
However, we write some heavily numerical code, and I've yet to find a useful program that doesn't get the message "Implementation limit: Array too large to check". I really like the new concept of a separate vectorspace and the ability to allocate data in either workspace or vectorspace. But the only way to influence the placement of code is through the order in which things are compiled. This stops me from putting heavily used subroutines (e.g., functions implementing operations on complex numbers, or an FFT), which should get the benefit of the on-chip RAM, into libraries, where they belong. And don't fool yourself: in our application, this makes about a 25% difference in performance!

Finally, I want to say that we really like using transputers and occam, but it is frustrating when, after a few hours of debugging, you get the nasty feeling that, just maybe, it isn't your fault but the code generated by the compiler... Ah well, that's what they pay me for :-)

Cheers,
Jan Vorbrueggen