g-rh@cca.CCA.COM (Richard Harter) (09/18/88)
In article <8515@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:

Re comments about x[-1] should be legal and should be in the standard.

>I doubt that any effect on the computer industry would have occurred
>other than reduced adherence to the postulated C standard.  People
>writing portable applications would still not be able to compute
>&array[-1], since several compilers would ignore that requirement
>(benchmark speed is a far greater driving factor than the desires of
>a few sloppy programmers to compute non-existent addresses).  What
>good would that situation accomplish?  Better that the standard be
>widely followed and that programmers become better educated about
>actual portability considerations, than to encourage false hopes for
>availability of features that are difficult or detrimental to provide.

You may be right about reduced adherence, at least in this regard.
However, the problem is not simply a matter of "sloppy" programming.
In C a pointer is a fairly anonymous object.  What you are saying is
that it is a potential error to add or subtract an integer from a
pointer if the result is out of range.  Very well, but what is that
range?  Suppose a pointer is passed through a calling sequence.  In
the function I have no way of knowing whether &x[n] will break for any
n other than 0.  For that matter I have no way of knowing whether x is
a legal pointer!  In principle this is not right -- there is no way to
write defensive code to check on pointer validity.  To be sure, a
"correct" program never has an invalid pointer and all that, but what
about the rest of us poor mortals?
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
gwyn@smoke.ARPA (Doug Gwyn ) (09/18/88)
In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>In C a pointer is a fairly anonymous object.  What you are saying is
>that it is a potential error to add or subtract an integer from a
>pointer if the result is out of range.  Very well, but what is that
>range?

Wherever the actual data is, plus a nonexistent extra member just past
the end of an array.  But you knew that.

>Suppose a pointer is passed through a calling sequence.  In
>the function I have no way of knowing whether &x[n] will break for any
>n other than 0.  For that matter I have no way of knowing whether
>x is a legal pointer!

That's not quite the same issue.  One presumes that your functions
have interface specifications, which if adhered to guarantee proper
operation.  The code defining the function must of course implement
those specifications.  The C language does not provide the means to
completely specify all relevant aspects of an interface in the C code
itself, nor does it require run-time enforcement of the interface
specifications.  It does permit some degree of compile-time checking
of the interface.

>In principle this is not right -- there is no way to write defensive
>code to check on pointer validity.  To be sure a "correct" program
>never has an invalid pointer and all that but what about the rest of
>us poor mortals?

I suppose you should all learn ways to produce correct code.  Many
such methods were figured out in the 1970s and should be part of every
professional programmer's repertoire by now.

If you ever get the chance to help design a brand-new programming
language, you might consider building in stronger aids for interface
specification checking.  Ada tried that but didn't carry it far enough.
henry@utzoo.uucp (Henry Spencer) (09/20/88)
In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>In C a pointer is a fairly anonymous object.  What you are saying is
>that it is a potential error to add or subtract an integer from a
>pointer if the result is out of range.  Very well, but what is that
>range?

The members of the array that the pointer points into, plus the
special case of just above the end of the array.

>Suppose a pointer is passed through a calling sequence.  In
>the function I have no way of knowing whether &x[n] will break for any
>n other than 0.  For that matter I have no way of knowing whether
>x is a legal pointer!

That's correct.  It is the caller's responsibility to supply a pointer
that is adequate for the function's purposes, and the function
writer's responsibility to document those purposes well enough that
the caller knows what his responsibilities are.  There is no way to
check this at runtime in conventional C implementations.  That's C for
you.
--
NASA is into artificial       |  Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
g-rh@XAIT.XEROX.COM (Richard Harter) (09/21/88)
In article <1988Sep19.213023.13181@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>In C a pointer is a fairly anonymous object.  What you are saying is
>>that it is a potential error to add or subtract an integer from a
>>pointer if the result is out of range.  Very well, but what is that
>>range?

>The members of the array that the pointer points into, plus the special
>case of just above the end of the array.

The question was rhetorical, in that I was pointing out that there is
no way to determine from the pointer itself what its range is.  I
expect it is a good thing to repeat the legal answer, since it does
seem to be a complete mystery to some people :-).

>>Suppose a pointer is passed through a calling sequence.  In
>>the function I have no way of knowing whether &x[n] will break for any
>>n other than 0.  For that matter I have no way of knowing whether
>>x is a legal pointer!

>That's correct.  It is the caller's responsibility to supply a pointer
>that is adequate for the function's purposes, and the function writer's
>responsibility to document those purposes well enough that the caller
>knows what his responsibilities are.  There is no way to check this at
>runtime in conventional C implementations.  That's C for you.

Such as it is :-).  In theory this means that there is an entire class
of error checking that one can't do.  For the most part this doesn't
matter.  However it would be very nice if there were a library routine
that would tell you whether a pointer was legal or not.  I am not much
of a fan of the "if all the specs and the interface definitions and
the code are all correct then you don't need error checking" school of
programming.  I rather like the "be a skeptic and check and report the
errors when you find them" school of thought.

As a side note, one argument for making x[-1] legal is that it permits
you to use sentinels in both directions.  I don't see that this is a
problem, regardless of architecture.  All that is required is that
nothing be allocated on a segment boundary.  However, as the man says,
the way it is is the way it is.  There never was a machine, a
language, or an operating system without arcane restrictions.  [Except
lisp :-)]
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
gwyn@smoke.ARPA (Doug Gwyn ) (09/21/88)
In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>As a side note, one argument for making x[-1] legal is that it permits
>you to use sentinels in both directions.  I don't see that this is a
>problem, regardless of architecture.  All that is required is that nothing
>be allocated on a segment boundary.

There is a considerable practical difference between this case and the
one for a pointer just past the last member of an array.  The [-1]
case can require reservation of an arbitrary amount of unused address
space below an actual object, whereas the other case requires no more
than a byte (or a word, depending on the architecture) of space
reserved above.
g-rh@XAIT.XEROX.COM (Richard Harter) (09/21/88)
In article <8544@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>As a side note, one argument for making x[-1] legal is that it permits
>>you to use sentinels in both directions.  I don't see that this is a
>>problem, regardless of architecture.  All that is required is that nothing
>>be allocated on a segment boundary.

>There is a considerable practical difference between this case and the
>one for a pointer just past the last member of an array.  The [-1] case
>can require reservation of an arbitrary amount of unused address space
>below an actual object, whereas the other case requires no more than a
>byte (or a word, depending on the architecture) of space reserved above.

Good point.  [For those who didn't catch it, the object that a pointer
points to can be a structure of arbitrary size.  To allow &x[-1] to be
legal you have to leave enough space for an instance of the structure
(or array of structures, or whatever) before the actual instance.]

Doug's claim that the other case (one past) need only require a word
(or byte) of memory at the end suggests a question that I don't know
the answer to.  Suppose that x is an array of structures of length n.
As we all know, &x[n] is legal.  But what about &x[n].item?  If Doug's
assertion is correct, and I expect it is since he is quite
knowledgeable in these matters, then it would seem to follow that
&x[n].item is not guaranteed to be legal.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
henry@utzoo.uucp (Henry Spencer) (09/21/88)
In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>As a side note, one argument for making x[-1] legal is that it permits
>you to use sentinels in both directions.  I don't see that this is a
>problem, regardless of architecture.  All that is required is that nothing
>be allocated on a segment boundary...

The situation unfortunately isn't as symmetrical as it looks, because
a pointer to an array element points to the *beginning* of the array
element.  A pointer one past the end of an array points to the byte
(well, the addressing unit, whatever it is) following the array; a
pointer one past the beginning points to the byte (etc.) that is one
array-member-size before the beginning.  Computing x[size] without
risk of overflow only requires that there be at least one byte between
the array and the end of a segment; computing x[-1] without risk of
underflow requires an entire array element between the array and the
start of the segment, which can get expensive if the elements are big
(consider multidimensional arrays).

The difference in costs was felt to be sufficient to justify a
difference in treatment.  Both practices have been technically illegal
all along, so legitimizing both wasn't vitally necessary.  Since
x[size] gets used a lot and is cheap to do, it was legalized.  Since
x[-1] was rather more costly and is used less, it wasn't.
--
NASA is into artificial       |  Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/22/88)
In article <8523@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
> In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
> >In C a pointer is a fairly anonymous object.  What you are saying is
> >that it is a potential error to add or subtract an integer from a
> >pointer if the result is out of range.  Very well, but what is that
> >range?
>...
> >Suppose a pointer is passed through a calling sequence.  In
> >the function I have no way of knowing whether &x[n] will break for any
> >n other than 0.  For that matter I have no way of knowing whether
> >x is a legal pointer!
>...
> If you ever get the chance to help design a brand-new programming
> language, you might consider building in stronger aids for interface
> specification checking.  Ada tried that but didn't carry it far enough.

While Pascal lends itself to this somewhat better than C, I learned
some lessons from one compiler that are worth mentioning here.

Grads of the University of Wisconsin at Madison (and other places with
Univac 1100s) remember Charles Fisher's Pascal compiler with great
fondness.  Aside from some extensions (separate compilation) that
aren't relevant to this discussion, he worked within the limits of
Jensen and Wirth, but still produced a compiler that did an enormous
amount of run-time checking.  (Checking variant tags found several
bugs a day in my CS 701 project.)

The niftiest check, though, was sanity testing pointers with a "lock"
and a "key".  (In Pascal, a programmer can only point to objects that
have been allocated, one at a time, from the heap.  Think
"p=malloc(sizeof t)" or some such.)  When a "new" object is allocated,
some space is reserved at the beginning for a "lock".  This is an
integer, usually fairly long (thirty-two bits).  (As an important
tweak, the high order bits of this "lock" are never all zeros or all
ones.)  Every pointer has enough room for both an address and a "key".

When a pointer variable is assigned a value, both the address and the
"key" are assigned.  When an object is deallocated, the "lock" is
changed (typically, set to an otherwise impossible value, such as
zero).  When a pointer is used, the compiler first checks the "key"
field in the pointer against the "lock" field in the structure.  NOTE
THAT THIS IS COMPLETELY TRANSPARENT TO THE PROGRAMMER!

Null pointers can always be caught by an ambitious compiler.  With
"locks" and "keys", even uninitialized and dangling pointers can be
caught.  That is, if you allocate some memory, assign the address to a
pointer variable, and then deallocate the memory, this run-time check
will complain if you use the obsolete pointer variable to get at
memory that's no longer safe to access or modify.

Now, what changes would need to be made for C?  First, it's possible
in C to allocate not just single objects, but arrays of objects.  An
allocated array would need room for both a lock and a length.  (You
don't need a lock for every element.)  A "pointer" would be
represented as a base address, an offset, and a key.  The compiler
would check that the offset was no greater than the length, and that
the key matched the lock stored (once) at the beginning of the array.

Second, C can have pointers to global and local variables (to the
initialized data section and to the stack).  It doesn't make sense to
have a "lock" for every auto or extern short.  (It's not too
unreasonable, though, to have a lock for every array and struct.)  C
would need some sort of "skeleton key" for small non-malloc'ed
objects.

Finally, since we're generating code with "pointers" that are very
different from what most C compilers use, we'd have to be very careful
in linking code (say, libraries) that expects "normal" C pointers.

So, this is tough to do in C.  (I think some MS-DOS C interpreters do
at least some of this.)
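[Archive note: the lock-and-key scheme described above can be sketched
in C.  This is only a hedged illustration of the idea, not Fisher's
implementation; every name in it (ck_ptr, ck_alloc, ck_deref, ck_free)
is invented, and "free" here deliberately does not return the storage,
so that stale keys can still be checked against the smashed lock:]

```c
#include <stdlib.h>

typedef struct {
    unsigned long lock;          /* nonzero while the object is live */
} ck_header;

typedef struct {
    ck_header *base;             /* allocated object, header first */
    unsigned long key;           /* copy of the lock, taken at allocation */
} ck_ptr;

static unsigned long next_lock = 0x5eed1UL;  /* odd, hence never zero */

/* Allocate size payload bytes preceded by a lock header. */
ck_ptr ck_alloc(size_t size)
{
    ck_ptr p;
    p.base = (ck_header *) malloc(sizeof(ck_header) + size);
    p.base->lock = next_lock;
    p.key = next_lock;
    next_lock += 2;              /* stays odd, so never the "dead" value */
    return p;
}

/* The checked dereference: payload address if the key still matches
   the lock, NULL for a null, dangling, or forged pointer. */
void *ck_deref(ck_ptr p)
{
    if (p.base == NULL || p.base->lock != p.key)
        return NULL;
    return (void *)(p.base + 1);
}

/* Deallocate by smashing the lock, so every outstanding key goes
   stale.  (A real implementation would eventually reclaim the space;
   it is leaked here so stale keys remain safely testable.) */
void ck_free(ck_ptr p)
{
    if (p.base != NULL && p.base->lock == p.key)
        p.base->lock = 0;
}
```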
On the other hand, by redefining Object* and Object->, "lock" and
"key" checking would be straightforward to implement in C++.  Any
takers?

References (CNF = Charles N. Fisher, Computer Sciences Department,
University of Wisconsin at Madison; RJL = Richard J. LeBlanc, then at
the School of Information and Computer Science, Georgia Institute of
Technology):

RJL and CNF, "A case study of run-time errors in Pascal programs",
SOFTWARE-PRACTICE AND EXPERIENCE, v. 12, pp. 825-834 (1982).

CNF and RJL, "Efficient implementation and optimization of run-time
checking in Pascal", SIGPLAN NOTICES, v. 12, #3, p 1924 (March 1977).

CNF and RJL, "The implementation of run-time diagnostics in Pascal",
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, v. 6, pp. 313-319 (1980).

CNF and RJL, UW-PASCAL REFERENCE MANUAL, Madison Academic Computing
Center, Madison, WI, 1977.

(More recent references may be available.  Anyone at uwvax or uwmacc
have anything to contribute?)

Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
--
Send compilers articles to ima!compilers or, in a pinch, to
Levine@YALE.EDU.  Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot
forward mail accidentally sent back to compilers.  Meta-mail to
ima!compilers-request
barmar@think.COM (Barry Margolin) (09/22/88)
In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>However it would be very nice if there were a library routine that would
>tell you whether a pointer was legal or not.

One problem with this is that on the segmented machines it is the act
of computing such a pointer that is invalid, not the pointer itself.
For example, if P is a pointer that happens to point to offset 0 in a
segment, *computing* P-1 will cause a fault.  So, what you need is a
routine that tells you whether a particular offset from a pointer is
legal; something like:

	if (valid_pointer_offset (P, sizeof(*P), -1))
		P--;

Also, in order for this to work on machines with linear address
spaces, all pointers would have to carry around the location and size
of the objects to which they point.  This is done in Symbolics C,
since pointers are actually implemented as two Lisp objects, an array
and an offset into the array, and array objects contain their size,
which is checked by the microcode/hardware array-indexing operations.
Each call to malloc returns a new array, and each stack frame can be
treated as an array (this means that it won't detect overflowing from
one local array into another in the same frame, but nothing's
perfect).  Expecting C implementations on conventional architectures
to do this is too much.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
news@ism780c.isc.com (News system) (09/22/88)
In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>As a side note, one argument for making x[-1] legal is that it permits
>you to use sentinels in both directions.  I don't see that this is a
>problem, regardless of architecture.  All that is required is that nothing
>be allocated on a segment boundary.  However, as the man says, the way
>it is is the way it is.  There never was a machine, a language, or an
>operating system without arcane restrictions.  [Except lisp :-)]

Consider:

	some_big_type x[2];

If sizeof(some_big_type) is half the size of a segment, computing
&x[-1] is no harder (or easier) than computing &x[2].  The standard
mandates that &x[2] be computable but it does not mandate that &x[-1]
be computable.  I suspect that a C implementation that allows arrays
as large as a segment is able to compute both addresses.

Note, mandating that &x[-1] be computable does not mean that x[-1] is
referencable.  So even if &x[-1] were computable we still could not
have a sentinel at the low end of the array.  A sentinel at the low
end is easy: just start the data at x[1].

Marv Rubinstein
gwyn@smoke.ARPA (Doug Gwyn ) (09/22/88)
In article <33598@XAIT.XEROX.COM>, g-rh@XAIT.XEROX.COM (Richard Harter) writes:
> it would seem to follow that &x[n].item is not guaranteed to be legal.

Yup.
gwyn@smoke.ARPA (Doug Gwyn ) (09/22/88)
In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>However it would be very nice if there were a library routine that would
>tell you whether a pointer was legal or not.

I think if your code has to worry about this, it is ALREADY in
trouble.  Pointers should come in two flavors: null (which is easy to
test) and valid (which you should be able to assume when non-null).
will.summers@p6.f18.n114.z1.fidonet.org (will summers) (09/22/88)
In message <33547@XAIT.XEROX.COM> g-rh@XAIT.XEROX.COM (Richard Harter) writes:
> >>In C a pointer is a fairly anonymous object.  What you are saying is
> >>that it is a potential error to add or subtract an integer from a
> >>pointer if the result is out of range.  Very well, but what is that
> >>range?

> >The members of the array that the pointer points into, plus the special
> >case of just above the end of the array.

This got me thinking about a subtle dpANS wording difference:

	struct _whatever *pstruct;
	pstruct = (struct _whatever *) malloc (n * sizeof(struct _whatever));

Is pstruct[n-1] or pstruct+(n-1) -guaranteed- to be allowed on -all-
dpANS conformant installations?

The dpANS (Jan '88) states that malloc() returns space for an -object-
of size (n * sizeof(struct _whatever)), whereas
calloc(n, sizeof(struct _whatever)) allocates space for an -array- of
'n' objects each of whose size is sizeof(struct _whatever).

I guess it comes down to this: does dpANS -guarantee- that an object
is divisible into sub-objects following Chris Torek's "locally flat"
paradigm, and that pointers produced by arithmetic on pointers to the
sub-objects will be valid?

Simply stated, does dpANS wording imply any difference between
calloc (n, size) and malloc (n*size)?

\/\/ill
--
St. Joseph's Hospital/Medical Center - Usenet <=> FidoNet Gateway
Uucp: ...{gatech,ames,rutgers}!ncar!noao!asuvax!stjhmc!18.6!will.summers
g-rh@XAIT.XEROX.COM (Richard Harter) (09/22/88)
In article <8557@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>However it would be very nice if there were a library routine that would
>>tell you whether a pointer was legal or not.

>I think if your code has to worry about this, it is ALREADY in trouble.
>Pointers should come in two flavors: null (which is easy to test) and
>valid (which you should be able to assume when non-null).

Oh come on now, my code is never in trouble.  What, never?  But it is
just this kind of thinking, "you should be able to assume", that leads
to bug-ridden code which lacks robustness.  If one is writing a
service routine one doesn't blissfully assume that the caller hasn't
blown it; one checks the input for validity.  If there is trouble in
your routine, that's your problem.  But if you don't check your input
and it violates your interface assumptions, anything can happen.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
mcdonald@uxe.cso.uiuc.edu (09/22/88)
>One problem with this is that on the segmented machines it is the act
>of computing such a pointer that is invalid, not the pointer itself.

I don't understand this.  I can understand that on certain wacko
architectures computing it IN A SEGMENT REGISTER would cause a
problem.  But why not do the computation in an ordinary arithmetic
register, presumably by casting to an integer type?

Doug McDonald
gwyn@smoke.ARPA (Doug Gwyn ) (09/23/88)
In article <33666@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>one checks the input for validity.  If there is trouble in your routine,
>that's your problem.  But if you don't check your input and it violates
>your interface assumptions anything can happen.

You cannot fix the caller's violation of the interface spec in the
called routine.

It often pays to perform thorough parameter validation while debugging
an application, but you should not rely on such micro-tests for
robustness.
msb@sq.uucp (Mark Brader) (09/23/88)
The conclusion given is right; the reason is wrong.

Richard Harter (g-rh@XAIT.Xerox.COM) writes:
> >As a side note, one argument for making x[-1] legal is that it permits
> >you to use sentinels in both directions.  I don't see that this is a
> >problem, regardless of architecture.  All that is required is that nothing
> >be allocated on a segment boundary...

Henry Spencer (henry@utzoo.uucp), no less, replies:
> The situation unfortunately isn't as symmetrical as it looks, because
> a pointer to an array element points to the *beginning* of the array
> element.

He must not have gotten over his cold yet.  The correct statement is:
a pointer to an array element *is typically implemented as* pointing
to the beginning of the array element.  Depending on the machine
architecture, it might be equally well implementable as a pointer to
the *end* of the array element.  Other implementations are also
conceivable.  A pointer to anything points to *all* of the thing, at
once.  The following code copies all of y over all of x, doesn't it?
(Assuming that x and y have types for which the operations are legal.)

	p = &x;
	q = &y;
	*p = *q;

I'm feeling rather sensitive about this point just now, because I've
been discussing by email with David Prosser, the editor of the Draft
ANSI Standard for C, the several errors in its descriptions of array
and pointer operations.  It appears that he and his predecessors made
the same or similar mistakes.

> Both practices have been technically illegal all along,
> so legitimizing both wasn't vitally necessary.  Since x[size] gets used
> a lot and is cheap to do, it was legalized.  Since x[-1] was rather more
> costly and is used less, it wasn't.

Rather, since x[size] gets used a lot and x[-1] is used less, *and an
implementation is possible on most or all machines where x[size] is
cheap*, it was appropriate to bless x[size].

Mark Brader		"True excitement lies in doing 'sdb /unix /dev/kmem'"
utzoo!sq!msb, msb@sq.com			-- Pontus Hedman
henry@utzoo.uucp (Henry Spencer) (09/25/88)
In article <1988Sep23.141042.23951@sq.uucp> msb@sq.com (Mark Brader) writes:
>Henry Spencer (henry@utzoo.uucp), no less, replies:
>> The situation unfortunately isn't as symmetrical as it looks, because
>> a pointer to an array element points to the *beginning* of the array
>> element.
>
>He must not have gotten over his cold yet.  The correct statement is:
>a pointer to an array element *is typically implemented as* pointing to
>the beginning of the array element...

Mark, you might be surprised if you study the X3J11 drafts very
closely.  Remember, for example, that a pointer to a struct, suitably
cast, is *required* to point to its first member.  Okay, you can say
that the cast can involve a conversion... but when you pay careful
attention to the rules about pointers to unions, such schemes start
coming unravelled.  When you look very carefully at the combination of
the rules for pointers to structs, pointers to unions, pointers to
arrays whose members are also arrays, compatibility across
separately-compiled modules, etc., it's very hard to do seriously
unorthodox things with pointers without breaking a rule somewhere.
--
NASA is into artificial       |  Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
henry@utzoo.uucp (Henry Spencer) (09/25/88)
In article <225800073@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>I don't understand this.  I can understand that on certain wacko
>architectures that computing it IN A SEGMENT REGISTER would cause
>a problem.  But why not do the computation in an ordinary
>arithmetic register, presumably by casting to an integer type?

For one reason, on machines with notions of data type at the hardware
level, this may be illegal.  For another reason, pointer arithmetic
may be seriously unorthodox, to the point where doing it using
integers may be much more expensive than using the segment registers.
(One obvious way this can happen -- it almost did happen on the 68000
-- is that a pointer might not fit in an integer register.)
--
NASA is into artificial       |  Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
g-rh@XAIT.XEROX.COM (Richard Harter) (09/25/88)
In article <8564@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <33666@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>one checks the input for validity.  If there is trouble in your routine,
>>that's your problem.  But if you don't check your input and it violates
>>your interface assumptions anything can happen.

>You cannot fix the caller's violation of the interface spec in the called
>routine.

>It often pays to perform thorough parameter validation while debugging an
>application, but you should not rely on such micro-tests for robustness.

We seem to have strayed out of specifics into the area of general
software methodology.  The question as I see it is not one of "fixing"
caller interface errors -- it is one of handling them gracefully.  In
robust code the effects of an error are predictable and there are
damage control measures; in fragile code the effects of an error are
unpredictable and may be catastrophic.

I would say that it pays to perform parameter validation, not merely
while debugging an application, but at all times, and that the
specifications should include, as a matter of course, the actions to
be taken when parameters are not valid.  My view is that one should
never assume, as a matter of course, that software is correct.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS Inc.
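[Archive note: the "check and report" school argued for above looks
like this in miniature.  A hypothetical example, not code from the
thread -- a service routine whose failure mode for bad parameters is
predictable and documented rather than catastrophic:]

```c
#include <stddef.h>
#include <errno.h>

/* Copy n ints from src to dst.  Returns 0 on success; on an invalid
   parameter it returns -1 with errno set to EINVAL, a specified,
   graceful outcome instead of "anything can happen". */
int copy_ints(int *dst, const int *src, size_t n)
{
    size_t i;

    if (dst == NULL || src == NULL) {
        errno = EINVAL;          /* documented damage control */
        return -1;
    }
    for (i = 0; i < n; i++)
        dst[i] = src[i];
    return 0;
}
```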
psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/27/88)
< "NO toon can resist the old shave-and-a-haircut gag!" >

In article <8557@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
> In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
> >However it would be very nice if there were a library routine that would
> >tell you whether a pointer was legal or not.
> I think if your code has to worry about this, it is ALREADY in trouble.
> Pointers should come in two flavors: null (which is easy to test) and
> valid (which you should be able to assume when non-null).

Well, maybe your code *is* in trouble.  Maybe you're being called by
some other slob's function, and he* can't tell '\0' from NULL.  Or
maybe you've got mysterious core dumps, and would like to at least
printf( "Goodbye, cruel world!\n" ) before you exit() off that mortal
coil.  (Or, who knows, even tell your MS-DOS tester where the software
was right before the PC suddenly froz

Anyway, you do have a few sources of data on your data.  Note that
*all* of these are compiler and operating system dependent to
*implement*, but once implemented, could be used in a fairly portable
function.

Everything's either global/static, automatic, or malloc'ed, right?
You may be able (by staring at the output of nm, or at the .MAP files
your linker generates) to find a relationship between some names that
often (always?) show up.  Is one symbol always the first or last in
initialized (global) memory?  Then you know one limit of the address
range of extern's and static's.

The symbols end, etext, and edata go *way* back in the history of the
UNIX(R) operating system.  "The address of etext is the first address
above the program text [instructions; that is, a limit of the range of
function pointers], edata above the initialized data region [extern's
and static's], and end above the uninitialized data region. . . . the
current value of the program break [initially &end] should be
determined by 'sbrk(0)' (see brk(2))."  [end(3C), from an old UNIX
system manual.]  (This one looks UNIX-system specific, I'm afraid.)

It's very common for at least two of these areas to share a common
boundary.  (For example, the stack begins just above the
instructions.)  So one number gives you two boundaries.

A final trick is to put a magic auto variable in main(), such that
it's the very first object on the stack.  (This may be the lexically
first or last variable in main().)  Store its address in an extern for
later checking.  A checking function can define a local variable of
its own, if only to measure the extent of the stack.

Between this flood of numbers, and some system-specific
experimentation to see how they work together, we could produce the
following checking functions:

valid_fpointer():  Is the argument conceivably a valid function
	pointer?  (In this case, make sure it's on a valid boundary, too.)

valid_extern():  Is the argument possibly a valid pointer to an extern
	or static object?

valid_auto():  Is the argument in the right range to be the address of
	a local variable of some active function?

valid_alloc():  Is the argument a value malloc() or one of its cousins
	has returned?  (There are all sorts of ways of beefing up this one.)

valid_heap():  More lenient than valid_alloc(), is this possibly the
	address of an object, or part of an object, allocated off of
	the heap by the malloc() family?

#define valid_data( p ) \
	( valid_extern( p ) || valid_auto( p ) || valid_heap( p ) )

Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
UNIX(R) is a registered trademark of AT&T
(*"he": No female programmer would ever do that!-)
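[Archive note: the "magic auto variable in main()" trick behind
valid_auto() can be sketched as follows.  This is strictly a heuristic
debugging aid, which is all the post claims for it: comparing pointers
into different objects is formally undefined in C, and stack direction
and layout are implementation details, so the function names and the
whole approach here are illustrative, not portable guarantees:]

```c
#include <stddef.h>

static char *stack_base = NULL;  /* extern holding main()'s marker */

/* Call early in main() with the address of a local char. */
void record_stack_base(char *marker_address)
{
    stack_base = marker_address;
}

/* Heuristic: is p between the recorded stack base and the current
   frame?  A local of this very function marks the stack's current
   extent, as the post suggests. */
int valid_auto(const void *p)
{
    char here;                   /* measures the stack right now */
    const char *c = (const char *)p;

    if (stack_base == NULL)
        return 0;
    if (stack_base > &here)      /* stack grows downward */
        return c <= stack_base && c >= &here;
    else                         /* stack grows upward */
        return c >= stack_base && c <= &here;
}
```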
henry@utzoo.uucp (Henry Spencer) (09/28/88)
In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>... I would say that it pays to perform parameter validation...

True, but in general one must draw the line somewhere.  It's almost always
possible to add just one more check.  There usually has to be some sort of
balance with efficiency.
--
NASA is into artificial        | Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
g-rh@XAIT.XEROX.COM (Richard Harter) (09/29/88)
In article <1988Sep27.172738.16453@utzoo.uucp> henry@utzooo.uucp is quoted below; henry@utzoo.uucp (Henry Spencer) writes:
>In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>... I would say that it pays to perform parameter validation...
>True, but in general one must draw the line somewhere.  It's almost always
>possible to add just one more check.  There usually has to be some sort of
>balance with efficiency.

It is true that one always has to take into account the tradeoff between
efficiency and error checking.  However, that is not normally the tradeoff
involved in parameter validation.  Unless the body of a function is very
short and very fast, parameter validation execution times are a nominal
part of the total cost of a function invocation.

The real tradeoffs are between error checking and code size.  It costs
time and money to write error checking code, and the presence of that code
in the source increases both maintenance costs and object code size.

Furthermore, not all error checking is of equal value, nor is it always
appropriate.  If the called routine is essentially a subcomponent of the
calling routine, parameter validation is of less value; the two routines
are part of a common package.  On the other hand, if the called routine is
a service routine, it is much more important to do things like parameter
validation.

Note: This is not so much a reply to Henry, who knows all these things,
but a general comment.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
    Richard Harter, SMDS Inc.
bill@proxftl.UUCP (T. William Wells) (09/30/88)
In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
: We seem to have strayed out of specifics into the area of general software
: methodology.
Yes, but since C has some specific problems with what you would
like, it is not too far off the subject.
: The question as I see it is not one of "fixing" caller
: interface errors -- it is one of handling them gracefully.
And I see the problem as one of how much code and time one is
willing to spend on "graceful handling". As it happens, I
routinely write code that does not do *any* parameter
validation. And my debugged code almost never breaks in a way
that would have been caught by parameter validation (or, for that
matter, by most of the other tricks of the robust-is-god school).
Now, unless the cost of a failing program is fairly high and the
cost of adding validity checks is low enough, it follows that
such checks do not belong in my code. I don't suppose that I am
really that atypical, so I suppose that the same applies to other
programmers. (Note that I am not speaking of checks used during
the debugging phase; there one needs all the help one can get.
:-)
In fact, the only place that I recommend parameter validation of
any kind is in functions that are callable from the outside world
(entry points to libraries, subsystem entry points, etc.) And
even there, I only recommend this when the cost of a failure (in
the final product) is fairly high.
: I would say that it pays to perform parameter validation,
: not merely while debugging an application, but at all times
Such blanket assertions are almost always wrong; and in this case
it is certainly wrong, for the reasons I gave above.  Could we
please refrain from stating relatives as if they were absolutes?
(Remember the flames about "is volatile necessary"? Using
absolutes this way guarantees that kind of flame.)
: and that the
: specifications should include, as a matter of course, the actions to be
: taken when parameters are not valid.
This *is* a good idea when robustness is important.
: My view is that one should never
: assume, as a matter of course, that software is correct.
And my view is that one should balance the cost of software
failure against the cost of making it less likely to fail.
---
Bill
You can still reach me at proxftl!bill
But I'd rather you send to proxftl!twwells!bill
g-rh@XAIT.XEROX.COM (Richard Harter) (09/30/88)
In article <835@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>: The question as I see it is not one of "fixing" caller
>: interface errors -- it is one of handling them gracefully.
>And I see the problem as one of how much code and time one is
>willing to spend on "graceful handling".  As it happens, I
>routinely write code that does not do *any* parameter
>validation.  And my debugged code almost never breaks in a way
>that would have been caught by parameter validation (or, for that
>matter, by most of the other tricks of the robust-is-god school).
>Now, unless the cost of a failing program is fairly high and the
>cost of adding validity checks is low enough, it follows that
>such checks do not belong in my code.  I don't suppose that I am
>really that atypical, so I suppose that the same applies to other
>programmers.  (Note that I am not speaking of checks used during
>the debugging phase; there one needs all the help one can get.
>:-)

Debugging??!  You mean there are people that actually write code that has
bugs in it?  :-)

As you imply, there are tradeoffs.  Delivered commercial software, real
time systems, software with a lot of users, software in critical
applications, and the like have a high failure cost.  A lot of my work
has been in these areas, so I am biased towards robustness.  There are a
lot of situations where this is not the case.

As a general remark, once you get past the initial typos, etc., most
errors are the result of lacunae in the specifications and analysis.  As
you suggest, parameter validation is not a high-profit area in the robust
software game.  However, it is fairly easy and it mostly can be done in a
routine fashion.  It is also part of a general mind set which asks, as a
matter of course, "What will happen if the assumptions made are violated,
and what should I do about it?"
It is this mind set, and the systematic routine application of this
approach, that reduces the likelihood of these pernicious lacunae.  This
is more important in large systems.

>: My view is that one should never
>: assume, as a matter of course, that software is correct.
>And my view is that one should balance the cost of software
>failure against the cost of making it less likely to fail.

Oh, I agree.  And when I write a throwaway program I cut a lot of corners.
But even then it often pays to be scrupulous -- the routine error checking
makes debugging much more pleasant.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
    Richard Harter, SMDS Inc.
henry@utzoo.uucp (Henry Spencer) (10/02/88)
In article <34041@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>... in general one must draw the line somewhere.  It's almost always
>>possible to add just one more check.  There usually has to be some sort of
>>balance with efficiency.
>
>It is true that one always has to take into account the tradeoff
>between efficiency and error checking.  However that is not normally the
>tradeoff involved in parameter validation.  Unless the body of a function
>is very short and very fast, parameter validation execution times are a
>nominal part of the total cost of a function invocation.

Ah, but it depends on how much validation you want to do, which was my
point.  I'm passing in a pointer to a complex data structure.  Do I just
check the pointer for being NULL?  Do I confirm that the node it points
at has a magic number in the right place?  Do I inspect the rest of the
node's data for plausibility?  Do I apply the same tests to the node's
neighbors?  Do I apply a graph-tracing algorithm to verify correct
linking of the whole multi-node structure?  I've actually done all of
these things at various times.  It depends on the tradeoffs.

(My normal practice?  I almost always check for NULL, and usually have
magic-number code controlled by #ifdef DEBUG.  And any time my code
vitally depends on some non-obvious property I'll usually throw in an
assert(), partly to document the fact and partly as an error-catcher.
The other things are responses to unusual situations.)
--
The meek can have the Earth;    | Henry Spencer at U of Toronto Zoology
the rest of us have other plans.| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
henry@utzoo.uucp (Henry Spencer) (10/02/88)
In article <835@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>... I routinely write code that does not do *any* parameter
>validation.  And my debugged code almost never breaks in a way
>that would have been caught by parameter validation (or, for that
>matter, by most of the other tricks of the robust-is-god school).

Be careful... hubris tends to be followed by nemesis.  Implementing extra
checking has a nasty habit of catching errors that *weren't* caught by
normal debugging.  It often finds them even in "production" code.  Scary
but true.

The experience of preparing my regexp library for posting convinced me
that time spent in more paranoid testing is almost never wasted.  The
code was ready to go (I thought).  I did up a quick regression-test
widget so that other people could tell whether the thing was working on
their weird machine after being compiled by their weird compiler.  I ran
it on my machine and several of the test cases failed.  Oops.  I then
spent considerably more effort on building a fairly thorough set of test
cases, over 100 of them.  It paid off.  The thing *STILL* had a bug or
two that were found by others after I posted it, but it was much more
solid than when I *first* thought it was ready to post.
--
The meek can have the Earth;    | Henry Spencer at U of Toronto Zoology
the rest of us have other plans.| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
wes@obie.UUCP (Barnacle Wes) (10/06/88)
In article <835@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
> ... I routinely write code that does not do *any* parameter
> validation.  And my debugged code almost never breaks in a way
> that would have been caught by parameter validation (or, for that
> matter, by most of the other tricks of the robust-is-god school).

In article <1988Oct1.221726.10698@utzoo.uucp>, henry@utzoo.uucp (Henry
Spencer) replied:
| The experience of preparing my regexp library for posting convinced me
| that time spent in more paranoid testing is almost never wasted.  The code
| was ready to go (I thought).  I did up a quick regression-test widget
| so that other people could tell whether the thing was working on their
| weird machine after being compiled by their weird compiler.  I ran it
| on my machine and several of the test cases failed.  Oops.  I then spent
| considerably more effort on building a fairly thorough set of test cases,
| over 100 of them.  It paid off.  The thing *STILL* had a bug or two that
| were found by others after I posted it, but it was much more solid than
| when I *first* thought it was ready to post.

After having worked in software testing and validation for over 4 years,
I have gotten pretty good at breaking code.  During development testing,
whenever we got a new build of the software, our boss expected us to
break it somehow in the first night.  We didn't let her down, through 11
released builds and several dozen "engineering versions."

If your code is not written in the "robust-is-god" manner, I can break it
easily.  If it is written carefully, I can probably still break it, using
inputs your code considers valid.  Whenever I am confronted with the old
line "I don't need to test/document/debug it, it already works," I have
one simple answer:  "Prove it!"
--
{hpda, uwmcsd1}!sp7040!obie!wes

"How do you make the boat go when there's no wind?" -- Me --