[comp.compilers] Out of range pointers

psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/22/88)
In article <8523@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
> In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
> >In C a pointer is a fairly anonymous object.  What you are saying is
> >that it is a potential error to add or subtract an integer from a
> >pointer if the result is out of range.  Very well, but what is that
> >range?
>...
> >Suppose a pointer is passed through a calling sequence.  In
> >the function I have no way of knowing whether &x[n] will break for any
> >n other than 0.  For that matter I have no way of knowing whether 
> >x is a legal pointer!
>...
> If you ever get the chance to help design a brand-new programming
> language, you might consider building in stronger aids for interface
> specification checking.  Ada tried that but didn't carry it far enough.

While Pascal lends itself to this somewhat better than C, I learned
some lessons from one compiler that are worth mentioning here.

Grads of the University of Wisconsin at Madison (and other places with
Univac 1100s) remember Charles Fisher's Pascal compiler with great
fondness.  Aside from some extensions (separate compilation) that
aren't relevant to this discussion, he worked within the limits of
Jensen and Wirth, but still produced a compiler that did an enormous
amount of run-time checking.  (Checking variant tags found several bugs
a day in my CS 701 project.)  The niftiest check, though, was sanity
testing pointers with a "lock" and a "key".

(In Pascal, a programmer can only point to objects that have been
allocated, one at a time, from the heap.  Think "p=malloc(sizeof t)"
or some such.)

When a "new" object is allocated, some space is reserved at the
beginning for a "lock".  This is an integer, usually fairly long
(thirty-two bits).  (As an important tweak, the high order bits of this
"lock" are never all zero or all ones.)  Every pointer has enough room
for both an address and a "key".  When a pointer variable is assigned a
value, both the address and the "key" are assigned.  When an object is
deallocated, the "lock" is changed (typically, set to an otherwise
impossible value, such as zero).

When a pointer is used, the compiler first checks the "key" field in
the pointer to the "lock" field in the structure.  NOTE THAT THIS IS
COMPLETELY TRANSPARENT TO THE PROGRAMMER!  Null pointers can always be
caught by an ambitious compiler.  With "locks" and "keys", even
uninitialized and dangling pointers can be caught.  That is, if you
allocate some memory, assign the address to a pointer variable, and
then deallocate the memory, this run-time check will complain if you
use the obsolete pointer variable to get at memory that's no longer
safe to access or modify.

Now, what changes would need to be made for C?  First, it's possible in
C to allocate not just single objects, but arrays of objects.  An
allocated array would need room for both a lock and a length.  You
don't need a lock for every element.  A "pointer" would be represented
as a base address, an offset, and a key.  The compiler would check that
the offset was no greater than the length, and that the key matched the
lock stored (once) at the beginning of the array.

Second, C can have pointers to global and local variables (to the
initialized data section and to the stack).  It doesn't make sense to
have a "lock" for every auto or extern short.  (It's not too,
unreasonable, though, to have a lock for every array and struct.)  C
would need some sort of "skeleton key" for small non-malloc'ed objects.

Finally, since we're generating code with "pointers" that are very
different than what most C compilers use, we'd have to be very careful
in linking code (say, libraries) that expect "normal" C pointers.

So, this is tough to do in C.  (I think some MS-DOS C interpreters do
at least some of this.)  On the other hand, by redefining Object* and
Object->, "lock" and "key" checking would be straightforward to
implement in C++.  Any takers?

References (CNF = Charles N. Fisher, Computer Sciences Department,
University of Wisconsin at Madison; RJL = Richard J. LeBlanc, then at
the School of Information and Computer Science, Georgia Institute of
Technology):

RJL and CNF, "A case study of run-time errors in Pascal programs",
SOFTWARE-PRACTICE AND EXPERIENCE, v. 12, pp. 825-834 (1982).

CNF and RJL, "Efficient implementation an optimization of run-time
checking in Pascal", SIGPLAN NOTICES, v. 12, #3, p 1924 (March 1977).

CNF and RJL, "The implementation of run-time diagnostics in Pascal",
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, v. 6, pp. 313-319 (1980).

CNF and RJL, UW-PASCAL REFERENCE MANUAL, Madison Academic Computing
Center, Madison, WI, 1977.

(More recent references may be available.  Anyone at uwvax or uwmacc
have anything to contribute?)
--
Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request