psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/22/88)
In article <8523@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes: > In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes: > >In C a pointer is a fairly anonymous object. What you are saying is > >that it is a potential error to add or subtract an integer from a > >pointer if the result is out of range. Very well, but what is that > >range? >... > >Suppose a pointer is passed through a calling sequence. In > >the function I have no way of knowing whether &x[n] will break for any > >n other than 0. For that matter I have no way of knowing whether > >x is a legal pointer! >... > If you ever get the chance to help design a brand-new programming > language, you might consider building in stronger aids for interface > specification checking. Ada tried that but didn't carry it far enough. While Pascal lends itself to this somewhat better than C, I learned some lessons from one compiler that are worth mentioning here. Grads of the University of Wisconsin at Madison (and other places with Univac 1100s) remember Charles Fisher's Pascal compiler with great fondness. Aside from some extensions (separate compilation) that aren't relevant to this discussion, he worked within the limits of Jensen and Wirth, but still produced a compiler that did an enormous amount of run-time checking. (Checking variant tags found several bugs a day in my CS 701 project.) The niftiest check, though, was sanity testing pointers with a "lock" and a "key". (In Pascal, a programmer can only point to objects that have been allocated, one at a time, from the heap. Think "p=malloc(sizeof t)" or some such.) When a "new" object is allocated, some space is reserved at the beginning for a "lock". This is an integer, usually fairly long (thirty-two bits). (As an important tweak, the high order bits of this "lock" are never all zero or all ones.) Every pointer has enough room for both an address and a "key". When a pointer variable is assigned a value, both the address and the "key" are assigned. When an object is deallocated, the "lock" is changed (typically, set to an otherwise impossible value, such as zero). When a pointer is used, the compiler first checks the "key" field in the pointer to the "lock" field in the structure. NOTE THAT THIS IS COMPLETELY TRANSPARENT TO THE PROGRAMMER! Null pointers can always be caught by an ambitious compiler. With "locks" and "keys", even uninitialized and dangling pointers can be caught. That is, if you allocate some memory, assign the address to a pointer variable, and then deallocate the memory, this run-time check will complain if you use the obsolete pointer variable to get at memory that's no longer safe to access or modify. Now, what changes would need to be made for C? First, it's possible in C to allocate not just single objects, but arrays of objects. An allocated array would need room for both a lock and a length. You don't need a lock for every element. A "pointer" would be represented as a base address, an offset, and a key. The compiler would check that the offset was no greater than the length, and that the key matched the lock stored (once) at the beginning of the array. Second, C can have pointers to global and local variables (to the initialized data section and to the stack). It doesn't make sense to have a "lock" for every auto or extern short. (It's not too, unreasonable, though, to have a lock for every array and struct.) C would need some sort of "skeleton key" for small non-malloc'ed objects. Finally, since we're generating code with "pointers" that are very different than what most C compilers use, we'd have to be very careful in linking code (say, libraries) that expect "normal" C pointers. So, this is tough to do in C. (I think some MS-DOS C interpreters do at least some of this.) On the other hand, by redefining Object* and Object->, "lock" and "key" checking would be straightforward to implement in C++. Any takers? References (CNF = Charles N. Fisher, Computer Sciences Department, University of Wisconsin at Madison; RJL = Richard J. LeBlanc, then at the School of Information and Computer Science, Georgia Institute of Technology): RJL and CNF, "A case study of run-time errors in Pascal programs", SOFTWARE-PRACTICE AND EXPERIENCE, v. 12, pp. 825-834 (1982). CNF and RJL, "Efficient implementation an optimization of run-time checking in Pascal", SIGPLAN NOTICES, v. 12, #3, p 1924 (March 1977). CNF and RJL, "The implementation of run-time diagnostics in Pascal", IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, v. 6, pp. 313-319 (1980). CNF and RJL, UW-PASCAL REFERENCE MANUAL, Madison Academic Computing Center, Madison, WI, 1977. (More recent references may be available. Anyone at uwvax or uwmacc have anything to contribute?) -- Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com) AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm I'm not speaking for the company, I'm just speaking my mind. -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { decvax | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request