firth@sei.cmu.edu (Robert Firth) (09/22/87)
The current debate about BCPL, pointers, and the Amiga might find some history useful. The language itself is described in [Martin Richards: BCPL, The Language and its Compiler]. BCPL is a typeless language. That is, while the programmer is free to consider a variable 'v' as holding an integer, a character, an enumeral, a pointer, or even a function or label, the language makes no such distinction. The variable holds a bit pattern, and one can interpret it however one pleases, though of course programs that impose consistent interpretations are more likely to do sensible things. Likewise, the only storage unit is the word, which is a unit big enough to hold a bit pattern of useful size. The normal size chosen is the size that will comfortably hold an integer or a pointer, eg 16 bits on PDP-11, 32 bits on MC68000. The abstract machine assumed by BCPL has a von Neumann store, comprising a linear array of words numbered sequentially from 0 to some maximum, such as 2^15-1 or 2^30-1. Hence, given a pointer to one such word, 'p', say, you can generate a pointer to the next word by writing 'p+1'. An array (vector, in BCPL parlance) is represented by a variable holding a pointer to its first word, so if v is a vector then the value 'v' points to the first word, 'v+1' to the next, and 'v+i' to the i'th. The dereferencing operator in BCPL is written "!", so !(v+i) gets you the i'th component of the vector 'v'. This can be written also v!i using the diadic form of "!". Recall, however, that BCPL is typeless. The compiler doesn't know that 'v' is a vector and 'i' and index; they are both just bit patterns. Hence, the expression '(v+i)' looks just like normal integer addition. This has a strange consequence: if '(v+1)' is the same code, regardless of whether 'v' is an integer or a vector, then the BCPL pointers to adjacent words must indeed differ by 1. On a machine with an underlying byte addressing scheme, therefore, the BCPL pointers cannot be true addresses, but must be scaled, eg by 2 on the PDP-11 or by 4 on the MC68000. They must therefore be rescaled (by a left shift, usually) before being dereferenced. If the language were typed, then the compiler, on seeing 'v+1', could say "Hm, 'v' is of type pointer, so that "+" is not integer addition, it is pointer offsetting, and what I should really do is add '4', since one BCPL word occupiers four address units." This, of course, is just what a C compiler would do. But the BCPL compiler can't, since there is no type information. One alternative, however, is to attach the type information to the operators. That, after all, is why we have "+" for integer addition, and "#+" for floating addition: the compiler can't deduce the operation from the types of the operands, so we have to tell it. Well, suppose we redefine "!" to mean "indexing", ie v!i now means "access the i'th component of v, scaling i as necessary". We can then use the TRUE address as the pointer, and the compiler will indeed know that i must be multiplied by 4. And a good codegenerator can then use the indexing mode of the MC68020 to generate much slicker code. This has several consequences. First, the "!" operator is not symmetric; 'v!i' and 'i!v' no longer mean the same thing. Secondly, we cannot compute the address of 'v!i' just by saying 'v+i', since that is not the address. We could say 'v+4*i', or 'v+UnitsPerWord*i', but a better form seems to be '@(v!i)'. Nastiest, of course, are those FOR loops that say things like FOR p=v TO v+vmax DO .... !p ... since the step is no longer 1, but UnitsPerWord. How hard would this be? I edited these changes into the compiler and a codegenerator in less than a day, making the required consequential changes to the code that would allow the new compiler to compile itself. One necessary concession, was to interpret 'constant!variable' as if it were the other way round, since the code was littered with the idiom "constant_offset ! pointer_to_structure". Otherwise, a global scan of all occurrences of "+", "-", and "FOR" found everything. On the VAX-11, the code size of the compiler decreased by about 6%, because of better use of the indexed address modes and the omission of right shifts for address generation. The interface with assembler code is simplified, too, since addresses were now true machine addresses. Hope this puts the issue in perspective.