[comp.lang.c] Out of range pointers

g-rh@cca.CCA.COM (Richard Harter) (09/18/88)

In article <8515@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:

Re comments about x[-1] should be legal and should be in the standard.

>I doubt that any effect on the computer industry would have occurred
>other than reduced adherence to the postulated C standard.  People
>writing portable applications would still not be able to compute
>&array[-1], since several compilers would ignore that requirement
>(benchmark speed is a far greater driving factor than the desires of
>a few sloppy programmers to compute non-existent addresses).  What
>good would that situation accomplish?  Better that the standard be
>widely followed and that programmers become better educated about
>actual portability considerations, than to encourage false hopes for
>availability of features that are difficult or detrimental to provide.

You may be right about reduced adherence, at least in this regard.
However the problem is not simply a matter of "sloppy" programming.
In C a pointer is a fairly anonymous object.  What you are saying is
that it is a potential error to add or subtract an integer from a
pointer if the result is out of range.  Very well, but what is that
range?  Suppose a pointer is passed through a calling sequence.  In
the function I have no way of knowing whether &x[n] will break for any
n other than 0.  For that matter I have no way of knowing whether 
x is a legal pointer!

In principle this is not right -- there is no way to write defensive
code to check on pointer validity.  To be sure, a "correct" program
never has an invalid pointer and all that, but what about the rest of
us poor mortals?
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

gwyn@smoke.ARPA (Doug Gwyn ) (09/18/88)

In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>In C a pointer is a fairly anonymous object.  What you are saying is
>that it is a potential error to add or subtract an integer from a
>pointer if the result is out of range.  Very well, but what is that
>range?

Wherever the actual data is, plus a nonexistent extra member just past
the end of an array.  But you knew that.

>Suppose a pointer is passed through a calling sequence.  In
>the function I have no way of knowing whether &x[n] will break for any
>n other than 0.  For that matter I have no way of knowing whether 
>x is a legal pointer!

That's not quite the same issue.  One presumes that your functions have
interface specifications, which if adhered to guarantee proper operation.
The code defining the function must of course implement those
specifications.  The C language does not provide the means to completely
specify all relevant aspects of an interface in the C code itself, nor
does it require run-time enforcement of the interface specifications.
It does permit some degree of compile-time checking of the interface.

>In principle this is not right -- there is no way to write defensive
>code to check on pointer validity.  To be sure a "correct" program
>never has an invalid pointer and all that but what about the rest of
>us poor mortals?

I suppose you should all learn ways to produce correct code.  Many
such methods were figured out in the 1970s and should be part of
every professional programmer's repertoire by now.

If you ever get the chance to help design a brand-new programming
language, you might consider building in stronger aids for interface
specification checking.  Ada tried that but didn't carry it far enough.

henry@utzoo.uucp (Henry Spencer) (09/20/88)

In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>In C a pointer is a fairly anonymous object.  What you are saying is
>that it is a potential error to add or subtract an integer from a
>pointer if the result is out of range.  Very well, but what is that
>range?

The members of the array that the pointer points into, plus the special
case of just above the end of the array.
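
Concretely (the array and its size are invented just for illustration):

    int x[10];
    int *p;

    p = &x[0];      /* fine, of course */
    p = &x[10];     /* fine: the "one past the end" address may be
                       computed, though never dereferenced */
    p = &x[-1];     /* not guaranteed: merely computing this address is
                       undefined, even if you never dereference it */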

>Suppose a pointer is passed through a calling sequence.  In
>the function I have no way of knowing whether &x[n] will break for any
>n other than 0.  For that matter I have no way of knowing whether 
>x is a legal pointer!

That's correct.  It is the caller's responsibility to supply a pointer
that is adequate for the function's purposes, and the function writer's
responsibility to document those purposes well enough that the caller
knows what his responsibilities are.  There is no way to check this at
runtime in conventional C implementations.  That's C for you.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

g-rh@XAIT.XEROX.COM (Richard Harter) (09/21/88)

In article <1988Sep19.213023.13181@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>In C a pointer is a fairly anonymous object.  What you are saying is
>>that it is a potential error to add or subtract an integer from a
>>pointer if the result is out of range.  Very well, but what is that
>>range?

>The members of the array that the pointer points into, plus the special
>case of just above the end of the array.

	The question was rhetorical, in that I was pointing out that there is
no way to determine from the pointer itself what its range is.  I expect
it is a good thing to repeat the legal answer, since it does seem to be
a complete mystery to some people :-).

>>Suppose a pointer is passed through a calling sequence.  In
>>the function I have no way of knowing whether &x[n] will break for any
>>n other than 0.  For that matter I have no way of knowing whether 
>>x is a legal pointer!

>That's correct.  It is the caller's responsibility to supply a pointer
>that is adequate for the function's purposes, and the function writer's
>responsibility to document those purposes well enough that the caller
>knows what his responsibilities are.  There is no way to check this at
>runtime in conventional C implementations.  That's C for you.

Such as it is :-).  In theory this means that there is an entire class
of error checking that one can't do.  For the most part this doesn't matter.
However it would be very nice if there were a library routine that would
tell you whether a pointer was legal or not.  I am not much of a fan of the
"if all the specs and the interface definitions and the code are all correct
then you don't need error checking" school of programming.  I rather like
the "be a skeptic and check and report the errors when you find them"
school of thought.  

As a side note, one argument for making x[-1] legal is that it permits
you to use sentinels in both directions.  I don't see that this is a
problem, regardless of architecture.  All that is required is that nothing
be allocated on a segment boundary.  However, as the man says, the way
it is is the way it is.  There never was a machine, a language, or an
operating system without arcane restrictions.  [Except lisp :-)]
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

gwyn@smoke.ARPA (Doug Gwyn ) (09/21/88)

In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>As a side note, one argument for making x[-1] legal is that it permits
>you to use sentinels in both directions.  I don't see that this is a
>problem, regardless of architecture.  All that is required is that nothing
>be allocated on a segment boundary.

There is a considerable practical difference between this case and the
one for a pointer just past the last member of an array.  The [-1] case
can require reservation of an arbitrary amount of unused address space
below an actual object, whereas the other case requires no more than a
byte (or a word, depending on the architecture) of space reserved above.

g-rh@XAIT.XEROX.COM (Richard Harter) (09/21/88)

In article <8544@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>As a side note, one argument for making x[-1] legal is that it permits
>>you to use sentinels in both directions.  I don't see that this is a
>>problem, regardless of architecture.  All that is required is that nothing
>>be allocated on a segment boundary.

>There is a considerable practical difference between this case and the
>one for a pointer just past the last member of an array.  The [-1] case
>can require reservation of an arbitrary amount of unused address space
>below an actual object, whereas the other case requires no more than a
>byte (or a word, depending on the architecture) of space reserved above.

Good point.  [For those who didn't catch it, the object that a pointer
points to can be a structure of arbitrary size.  To allow &x[-1] to be
legal you have to leave enough space for an instance of the structure
(or array of structures, or whatever) before the actual instance.]

Doug's claim that the other case (one past) need only require a word (or byte)
of memory at the end suggests a question that I don't know the answer to.
Suppose that x is an array of structures of length n.  As we all know, 
&x[n] is legal.  But what about &x[n].item?  If Doug's assertion is correct,
and I expect it is since he is quite knowledgeable in these matters, then
it would seem to follow that &x[n].item is not guaranteed to be legal.
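
In code, the question looks like this (struct and sizes invented for the
example):

    struct s { long item; char rest[1000]; };
    struct s x[10];
    struct s *p;
    long *q;

    p = &x[10];         /* guaranteed computable: one past the end */
    q = &x[10].item;    /* apparently not guaranteed: x[10] is a
                           nonexistent element, and this asks for the
                           address of a member inside it */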



-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

henry@utzoo.uucp (Henry Spencer) (09/21/88)

In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>As a side note, one argument for making x[-1] legal is that it permits
>you to use sentinels in both directions.  I don't see that this is a
>problem, regardless of architecture.  All that is required is that nothing
>be allocated on a segment boundary...

The situation unfortunately isn't as symmetrical as it looks, because
a pointer to an array element points to the *beginning* of the array
element.  A pointer one past the end of an array points to the byte
(well, the addressing unit, whatever it is) following the array; a
pointer one element before the beginning points to the byte (etc.) that is one
array-member-size before the beginning.  Computing x[size] without
risk of overflow only requires that there be at least one byte between
the array and the end of a segment; computing x[-1] without risk of
underflow requires an entire array element between the array and the
start of the segment, which can get expensive if the elements are big
(consider multidimensional arrays).

The difference in costs was felt to be sufficient to justify a difference
in treatment.  Both practices have been technically illegal all along,
so legitimizing both wasn't vitally necessary.  Since x[size] gets used
a lot and is cheap to do, it was legalized.  Since x[-1] was rather more
costly and is used less, it wasn't.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/22/88)

In article <8523@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
> In article <33432@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
> >In C a pointer is a fairly anonymous object.  What you are saying is
> >that it is a potential error to add or subtract an integer from a
> >pointer if the result is out of range.  Very well, but what is that
> >range?
>...
> >Suppose a pointer is passed through a calling sequence.  In
> >the function I have no way of knowing whether &x[n] will break for any
> >n other than 0.  For that matter I have no way of knowing whether 
> >x is a legal pointer!
>...
> If you ever get the chance to help design a brand-new programming
> language, you might consider building in stronger aids for interface
> specification checking.  Ada tried that but didn't carry it far enough.

While Pascal lends itself to this somewhat better than C, I learned
some lessons from one compiler that are worth mentioning here.

Grads of the University of Wisconsin at Madison (and other places with
Univac 1100s) remember Charles Fisher's Pascal compiler with great
fondness.  Aside from some extensions (separate compilation) that
aren't relevant to this discussion, he worked within the limits of
Jensen and Wirth, but still produced a compiler that did an enormous
amount of run-time checking.  (Checking variant tags found several bugs
a day in my CS 701 project.)  The niftiest check, though, was sanity
testing pointers with a "lock" and a "key".

(In Pascal, a programmer can only point to objects that have been
allocated, one at a time, from the heap.  Think "p=malloc(sizeof t)"
or some such.)

When a "new" object is allocated, some space is reserved at the
beginning for a "lock".  This is an integer, usually fairly long
(thirty-two bits).  (As an important tweak, the high order bits of this
"lock" are never all zero or all ones.)  Every pointer has enough room
for both an address and a "key".  When a pointer variable is assigned a
value, both the address and the "key" are assigned.  When an object is
deallocated, the "lock" is changed (typically, set to an otherwise
impossible value, such as zero).

When a pointer is used, the compiler first checks the "key" field in
the pointer against the "lock" field in the structure.  NOTE THAT THIS IS
COMPLETELY TRANSPARENT TO THE PROGRAMMER!  Null pointers can always be
caught by an ambitious compiler.  With "locks" and "keys", even
uninitialized and dangling pointers can be caught.  That is, if you
allocate some memory, assign the address to a pointer variable, and
then deallocate the memory, this run-time check will complain if you
use the obsolete pointer variable to get at memory that's no longer
safe to access or modify.

Now, what changes would need to be made for C?  First, it's possible in
C to allocate not just single objects, but arrays of objects.  An
allocated array would need room for both a lock and a length.  You
don't need a lock for every element.  A "pointer" would be represented
as a base address, an offset, and a key.  The compiler would check that
the offset was no greater than the length, and that the key matched the
lock stored (once) at the beginning of the array.
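
Roughly, the representation might look like this (everything below is
invented for illustration, not how any particular compiler does it):

    struct header {                 /* prefixed to every allocation */
        long lock;                  /* never all-zeros or all-ones while the
                                       object is live; clobbered on free */
        long length;                /* size of the allocation, in bytes */
    };

    struct checked_ptr {            /* what a "pointer" becomes */
        struct header *obj;         /* the allocation pointed into */
        long offset;                /* position within it */
        long key;                   /* copy of obj->lock, taken on assignment */
    };

    /* the test the compiler would emit, invisibly, before every use */
    int ptr_ok(p)
    struct checked_ptr *p;
    {
        if (p->obj == 0)
            return 0;               /* null pointer */
        if (p->key != p->obj->lock)
            return 0;               /* uninitialized or dangling pointer */
        if (p->offset < 0 || p->offset > p->obj->length)
            return 0;               /* out of range */
        return 1;
    }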

Second, C can have pointers to global and local variables (to the
initialized data section and to the stack).  It doesn't make sense to
have a "lock" for every auto or extern short.  (It's not too,
unreasonable, though, to have a lock for every array and struct.)  C
would need some sort of "skeleton key" for small non-malloc'ed objects.

Finally, since we're generating code with "pointers" that are very
different from what most C compilers use, we'd have to be very careful
in linking code (say, libraries) that expects "normal" C pointers.

So, this is tough to do in C.  (I think some MS-DOS C interpreters do
at least some of this.)  On the other hand, by redefining Object* and
Object->, "lock" and "key" checking would be straightforward to
implement in C++.  Any takers?

References (CNF = Charles N. Fisher, Computer Sciences Department,
University of Wisconsin at Madison; RJL = Richard J. LeBlanc, then at
the School of Information and Computer Science, Georgia Institute of
Technology):

RJL and CNF, "A case study of run-time errors in Pascal programs",
SOFTWARE-PRACTICE AND EXPERIENCE, v. 12, pp. 825-834 (1982).

CNF and RJL, "Efficient implementation an optimization of run-time
checking in Pascal", SIGPLAN NOTICES, v. 12, #3, p 1924 (March 1977).

CNF and RJL, "The implementation of run-time diagnostics in Pascal",
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, v. 6, pp. 313-319 (1980).

CNF and RJL, UW-PASCAL REFERENCE MANUAL, Madison Academic Computing
Center, Madison, WI, 1977.

(More recent references may be available.  Anyone at uwvax or uwmacc
have anything to contribute?)
--
Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

barmar@think.COM (Barry Margolin) (09/22/88)

In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>However it would be very nice if there were a library routine that would
>tell you whether a pointer was legal or not.

One problem with this is that on the segmented machines it is the act
of computing such a pointer that is invalid, not the pointer itself.
For example, if P is a pointer that happens to point to offset 0 in a
segment, *computing* P-1 will cause a fault.  So, what you need is a
routine that tells you whether a particular offset from a pointer is
legal; something like:

	if (valid_pointer_offset (P, sizeof(*P), -1))
	    P--;

Also, in order for this to work on machines with linear address
spaces, all pointers would have to carry around the location and size
of the objects to which they point.  This is done in Symbolics C,
since pointers are actually implemented as two Lisp objects, an array
and an offset into the array, and array objects contain their size,
which is checked by the microcode/hardware array-indexing operations.
Each call to malloc returns a new array, and each stack frame can be
treated as an array (this means that it won't detect overflowing from
one local array into another in the same frame, but nothing's
perfect).  Expecting C implementations on conventional architectures
to do this is too much.
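
If an implementation did carry that information around, though, the check
itself would be simple enough; something along these lines (the
representation is invented here purely for illustration):

    struct fat_pointer {            /* a "pointer" on such an implementation */
        char *object;               /* start of the underlying object */
        long size;                  /* its size in bytes */
        long offset;                /* current position within it */
    };

    int valid_pointer_offset(p, elsize, n)
    struct fat_pointer *p;
    long elsize, n;
    {
        long newoff = p->offset + n * elsize;

        /* anywhere inside the object, or just past its end, is acceptable */
        return newoff >= 0 && newoff <= p->size;
    }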

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

news@ism780c.isc.com (News system) (09/22/88)

In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>
>As a side note, one argument for making x[-1] legal is that it permits
>you to use sentinels in both directions.  I don't see that this is a
>problem, regardless of architecture.  All that is required is that nothing
>be allocated on a segment boundary.  However, as the man says, they way
>it is is the way it is.  There never was a machine, a language, or an
>operating system without arcane restrictions.  [Except lisp :-)]
>-- 

Consider:

   some_big_type x[2];

If sizeof(some_big_type) is half the size of a segment, computing &x[-1] is
no harder (or easier) than computing &x[2].  The standard mandates that
&x[2] be computable but it does not mandate that &x[-1] be computable.  I
suspect that a C implementation that allows arrays as large as a
segment is able to compute both addresses.

Note, mandating that &x[-1] be computable does not mean that x[-1] is
referenceable.  So even if &x[-1] were computable we still could not have a
sentinel at the low end of the array.  A sentinel at the low end is easy:
just start the data with x[1].
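
That is, something like this (N and SENTINEL being whatever the application
wants):

    int buf[N + 2];                 /* two extra elements, one at each end */
    int *x = &buf[1];               /* callers use x[0] through x[N-1] */

    buf[0] = SENTINEL;              /* effectively x[-1], and perfectly legal,
                                       since x points into the middle of buf */
    buf[N + 1] = SENTINEL;          /* effectively x[N] */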

   Marv Rubinstein

gwyn@smoke.ARPA (Doug Gwyn ) (09/22/88)

In article <33598@XAIT.XEROX.COM>, g-rh@XAIT.XEROX.COM (Richard Harter) writes:
> it would seem to follow that &x[n].item is not guaranteed to be legal.

Yup.

gwyn@smoke.ARPA (Doug Gwyn ) (09/22/88)

In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>However it would be very nice if there were a library routine that would
>tell you whether a pointer was legal or not.

I think if your code has to worry about this, it is ALREADY in trouble.
Pointers should come in two flavors: null (which is easy to test) and
valid (which you should be able to assume when non-null).

will.summers@p6.f18.n114.z1.fidonet.org (will summers) (09/22/88)

In message  <33547@XAIT.XEROX.COM> g-rh@XAIT.XEROX.COM (Richard Harter)
writes:
 > >>In C a pointer is a fairly anonymous object.  What you are saying is
 > >>that it is a potential error to add or subtract an integer from a
 > >>pointer if the result is out of range.  Very well, but what is that
 > >>range?
 >
 > >The members of the array that the pointer points into, plus the special
 > >case of just above the end of the array.

This got me thinking about a subtle dpANS wording difference:

    struct _whatever *pstruct;

    pstruct = (struct _whatever *) malloc (n * sizeof(struct _whatever));

        is pstruct[n-1] or pstruct+(n-1) -guaranteed- to be allowed on
        -all- dpANS conformant installations?


        The dpANS (Jan '88) states that malloc() returns space for an -object-
        of size (n * sizeof(struct _whatever)) whereas

            calloc(n, sizeof(struct _whatever))

        allocates space for an -array- of 'n' objects each of whose size
        is sizeof(struct _whatever).


        I guess it comes down to this: does dpANS -guarantee- that an object
        is divisible into sub-objects following Chris Torek's "locally
        flat" paradigm, and that pointers produced by arithmetic on
        pointers to the sub-objects will be valid?

        Simply stated, does dpANS wording imply any difference between

            calloc (n, size) and  malloc (n*size) ?


    \/\/ill


--  
St. Joseph's Hospital/Medical Center - Usenet <=> FidoNet Gateway
Uucp: ...{gatech,ames,rutgers}!ncar!noao!asuvax!stjhmc!18.6!will.summers

g-rh@XAIT.XEROX.COM (Richard Harter) (09/22/88)

In article <8557@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>However it would be very nice if there were a library routine that would
>>tell you whether a pointer was legal or not.

>I think if your code has to worry about this, it is ALREADY in trouble.
>Pointers should come in two flavors: null (which is easy to test) and
>valid (which you should be able to assume when non-null).

Oh come on now, my code is never in trouble.  What, never?  But it is
just this kind of thinking "you should be able to assume" that leads to
bug-ridden code which lacks robustness.  If one is writing a service
routine one doesn't blissfully assume that the caller hasn't blown it;
one checks the input for validity.  If there is trouble in your routine,
that's your problem.  But if you don't check your input and it violates
your interface assumptions anything can happen.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

mcdonald@uxe.cso.uiuc.edu (09/22/88)

>One problem with this is that on the segmented machines it is the act
>of computing such a pointer that is invalid, not the pointer itself.

I don't understand this. I can understand that on certain wacko
architectures that computing it IN A SEGMENT REGISTER would cause
a problem. But why not do the computation in an ordinary 
arithmetic register, presumably by casting to an integer type?

DougMcDonald

gwyn@smoke.ARPA (Doug Gwyn ) (09/23/88)

In article <33666@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>one checks the input for validity.  If there is trouble in your routine,
>that's your problem.  But if you don't check your input and it violates
>your interface assumptions anything can happen.

You cannot fix the caller's violation of the interface spec in the called
routine.

It often pays to perform thorough parameter validation while debugging an
application, but you should not rely on such micro-tests for robustness.

msb@sq.uucp (Mark Brader) (09/23/88)

The conclusion given is right; the reason is wrong.

Richard Harter (g-rh@XAIT.Xerox.COM) writes:
> >As a side note, one argument for making x[-1] legal is that it permits
> >you to use sentinels in both directions.  I don't see that this is a
> >problem, regardless of architecture.  All that is required is that nothing
> >be allocated on a segment boundary...

Henry Spencer (henry@utzoo.uucp), no less, replies:
> The situation unfortunately isn't as symmetrical as it looks, because
> a pointer to an array element points to the *beginning* of the array
> element.

He must not have gotten over his cold yet.  The correct statement is:
a pointer to an array element *is typically implemented as* pointing to
the beginning of the array element.  Depending on the machine architecture,
it might be equally well implementable as a pointer to the *end* of the
array element.  Other implementations are also conceivable.

A pointer to anything points to *all* of the thing, at once.
The following code copies all of y over all of x, doesn't it?
(Assuming that x and y have types for which the operations are legal.)

	p = &x; q = &y; *p = *q;

I'm feeling rather sensitive about this point just now, because I've been
discussing by email with David Prosser, the editor of the Draft ANSI
Standard for C, the several errors in its descriptions of array and pointer
operations.  It appears that he and his predecessors made the same or
similar mistakes.

> Both practices have been technically illegal all along,
> so legitimizing both wasn't vitally necessary.  Since x[size] gets used
> a lot and is cheap to do, it was legalized.  Since x[-1] was rather more
> costly and is used less, it wasn't.

Rather, since x[size] gets used a lot and x[-1] is used less, *and an
implementation is possible on most or all machines where x[size] is cheap*,
it was appropriate to bless x[size].

Mark Brader		"True excitement lies in doing 'sdb /unix /dev/kmem'"
utzoo!sq!msb, msb@sq.com				-- Pontus Hedman

henry@utzoo.uucp (Henry Spencer) (09/25/88)

In article <1988Sep23.141042.23951@sq.uucp> msb@sq.com (Mark Brader) writes:
>Henry Spencer (henry@utzoo.uucp), no less, replies:
>> The situation unfortunately isn't as symmetrical as it looks, because
>> a pointer to an array element points to the *beginning* of the array
>> element.
>
>He must not have gotten over his cold yet.  The correct statement is:
>a pointer to an array element *is typically implemented as* pointing to
>the beginning of the array element...

Mark, you might be surprised if you study the X3J11 drafts very closely.
Remember, for example, that a pointer to a struct, suitably cast, is
*required* to point to its first member.  Okay, you can say that the cast
can involve a conversion... but when you pay careful attention to the
rules about pointers to unions, such schemes start coming unravelled.
When you look very carefully at the combination of the rules for pointers
to structs, pointers to unions, pointers to arrays whose members are also
arrays, compatibility across separately-compiled modules, etc., it's very
hard to do seriously unorthodox things with pointers without breaking a
rule somewhere.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (09/25/88)

In article <225800073@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>I don't understand this. I can understand that on certain wacko
>architectures that computing it IN A SEGMENT REGISTER would cause
>a problem. But why not do the computation in an ordinary 
>arithmetic register, presumably by casting to an integer type?

For one reason, on machines with notions of data type at the hardware
level, this may be illegal.  For another reason, pointer arithmetic
may be seriously unorthodox, to the point where doing it using integers
may be much more expensive than using the segment registers.  (One
obvious way this can happen -- it almost did happen on the 68000 --
is that a pointer might not fit in an integer register.)
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

g-rh@XAIT.XEROX.COM (Richard Harter) (09/25/88)

In article <8564@smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <33666@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>one checks the input for validity.  If there is trouble in your routine,
>>that's your problem.  But if you don't check your input and it violates
>>your interface assumptions anything can happen.

>You cannot fix the caller's violation of the interface spec in the called
>routine.

>It often pays to perform thorough parameter validation while debugging an
>application, but you should not rely on such micro-tests for robustness.

We seem to have strayed out of specifics into the area of general software
methodology.  The question  as I see it is not one of "fixing" caller
interface errors -- it is one of handling them gracefully.  In robust code
the effects of an error are predictable and there are damage control measures;
in fragile code the effects of an error are unpredictable and may be
catastrophic.  I would say that it pays to perform parameter validation,
not merely while debugging an application, but at all times and that the
specifications should include, as a matter of course, the actions to be
taken when parameters are not valid.  My view is that one should never
assume, as a matter of course, that software is correct.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/27/88)

< "NO toon can resist the old shave-and-a-haircut gag!" >

In article <8557@smoke.ARPA>, gwyn@smoke.ARPA (Doug Gwyn ) writes:
> In article <33547@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
> >However it would be very nice if there were a library routine that would
> >tell you whether a pointer was legal or not.
> I think if your code has to worry about this, it is ALREADY in trouble.
> Pointers should come in two flavors: null (which is easy to test) and
> valid (which you should be able to assume when non-null).

Well, maybe your code *is* in trouble.  Maybe you're being called by
some other slob's function, and he* can't tell '\0' from NULL.  Or
maybe you've got mysterious core dumps, and would like to at least
printf( "Goodbye, cruel world!\n" ) before you exit() off that mortal
coil.  (Or, who knows, even tell your MS-DOS tester where the software
was right before the PC suddenly froz

Anyway, you do have a few sources of data on your data.  Note that
*all* of these are compiler and operating system dependent to
*implement*, but once implemented, could be used in a fairly portable
function.

Everything's either global/static, automatic, or malloc'ed, right?
You may be able (by staring at the output of nm, or at the .MAP files
your linker generates) to find a relationship between some names that
often (always?) show up.  Is one symbol always the first or last in
initialized (global) memory?  Then you know one limit of the address
range of extern's and static's.

The symbols end, etext, and edata go *way* back in the history of the
UNIX(R) operating system.  "The address of etext is the first address
above the program text [instructions; that is, a limit of the range of
function pointers], edata above the initialized data region [extern's
and static's], and end above the uninitialized data region. . . .  the
current value of the program break [initially & end] should be
determined by 'sbrk(0)' (see brk(2))."  [end(3C), from an old UNIX
system manual.]  (This one looks UNIX-system specific, I'm afraid.)

It's very common for at least two of these areas to share a common
boundary.  (For example, the stack begins just above the instructions.)
So one number gives you two boundaries.

A final trick is to put a magic auto variable in main(), such that
it's the very first object on the stack (this may be the lexically
first or last variable in main()), and store its address in an extern
for later checking.  A checking function can define a local variable of
its own, if only to measure the extent of the stack.

With this flood of numbers and some system-specific experimentation
to see how they work together, we could produce the following checking
functions:

valid_fpointer():  Is the argument conceivably a valid function
pointer?  (In this case, make sure it's on a valid boundary, too.)

valid_extern():  Is the argument possibly a valid pointer to an
extern or static object?

valid_auto():  Is the argument in the right range to be the address of
a local variable of some active function?

valid_alloc():  Is the argument a value malloc() or one of its cousins
has returned?  (There are all sorts of ways of beefing up this one.)

valid_heap():  More lenient than valid_alloc(), is this possibly the
address of an object, or part of an object, allocated off of the heap
by the malloc() family?

#define valid_data( p ) \
	( valid_extern( p ) || valid_auto( p ) || valid_heap( p ) )
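
For what it's worth, the UNIX-specific pieces might look something like this
(the symbol declarations, the direction of stack growth, and the comparisons
between unrelated pointers are all system-dependent assumptions that would
have to be checked on each machine):

    extern char etext, edata, end;  /* limits of text, initialized data,
                                       and bss; see end(3C) */

    static char *stack_base;        /* main() stores the address of its
                                       first auto variable here at startup */

    int valid_extern(p)
    char *p;
    {
        /* globals and statics lie between the end of text and "end" */
        return p >= &etext && p < &end;
    }

    int valid_auto(p)
    char *p;
    {
        char marker;                /* roughly the current top of stack */

        /* assumes the stack grows downward from stack_base */
        return p >= &marker && p <= stack_base;
    }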

Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
UNIX(R) is a registered trademark of AT&T
(*"he":  No female programmer would ever do that!-)

henry@utzoo.uucp (Henry Spencer) (09/28/88)

In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>... I would say that it pays to perform parameter validation...

True, but in general one must draw the line somewhere.  It's almost always
possible to add just one more check.  There usually has to be some sort of
balance with efficiency.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

g-rh@XAIT.XEROX.COM (Richard Harter) (09/29/88)

In article <1988Sep27.172738.16453@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>... I would say that it pays to perform parameter validation...

>True, but in general one must draw the line somewhere.  It's almost always
>possible to add just one more check.  There usually has to be some sort of
>balance with efficiency.

	It is true that one always has to take into account the tradeoff
between efficiency and error checking.  However that is not normally the
tradeoff involved in parameter validation.  Unless the body of a function
is very short and very fast, parameter validation execution times are a
nominal part of the total cost of a function invocation.

	The real tradeoffs are between error checking and code size.
It costs time and money to write error checking code, and the presence
of that code in the source increases both maintenance costs and object
code size.

	Furthermore, not all error checking is of equal value, nor is it
always appropriate.  If the called routine is essentially a subcomponent
of the calling routine, parameter validation is of less value; the
two routines are part of a common package.  On the other hand, if the
called routine is a service routine, it is much more important to do
things like parameter validation.

	Note:  This is not so much a reply to Henry, who knows all these
things, but a general comment.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

bill@proxftl.UUCP (T. William Wells) (09/30/88)

In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
: We seem to have strayed out of specifics into the area of general software
: methodology.

Yes, but since C has some specific problems with what you would
like, it is not too far off the subject.

:               The question  as I see it is not one of "fixing" caller
: interface errors -- it is one of handling them gracefully.

And I see the problem as one of how much code and time one is
willing to spend on "graceful handling".  As it happens, I
routinely write code that does not do *any* parameter
validation.  And my debugged code almost never breaks in a way
that would have been caught by parameter validation (or, for that
matter, by most of the other tricks of the robust-is-god school).
Now, unless the cost of a failing program is fairly high and the
cost of adding validity checks is low enough, it follows that
such checks do not belong in my code.  I don't suppose that I am
really that atypical, so I suppose that the same applies to other
programmers.  (Note that I am not speaking of checks used during
the debugging phase; there one needs all the help one can get.
:-)

In fact, the only place that I recommend parameter validation of
any kind is in functions that are callable from the outside world
(entry points to libraries, subsystem entry points, etc.) And
even there, I only recommend this when the cost of a failure (in
the final product) is fairly high.

:                I would say that it pays to perform parameter validation,
: not merely while debugging an application, but at all times

Such blanket assertions are almost always wrong; and this one
is certainly wrong, for the reasons I gave above.  Could we
please refrain from stating relative claims as if they were absolutes?
(Remember the flames about "is volatile necessary"?  Using
absolutes this way guarantees that kind of flame.)

:                                                             and that the
: specifications should include, as a matter of course, the actions to be
: taken when parameters are not valid.

This *is* a good idea when robustness is important.

:                                       My view is that one should never
: assume, as a matter of course, that software is correct.

And my view is that one should balance the cost of software
failure against the cost of making it less likely to fail.

---
Bill

You can still reach me at proxftl!bill
But I'd rather you send to proxftl!twwells!bill

g-rh@XAIT.XEROX.COM (Richard Harter) (09/30/88)

In article <835@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>In article <33789@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:

>:               The question  as I see it is not one of "fixing" caller
>: interface errors -- it is one of handling them gracefully.

>And I see the problem as one of how much code and time one is
>willing to spend on "graceful handling".  As it happens, I
>routinely write code that does not do *any* parameter
>validation.  And my debugged code almost never breaks in a way
>that would have been caught by parameter validation (or, for that
>matter, by most of the other tricks of the robust-is-god school).
>Now, unless the cost of a failing program is fairly high and the
>cost of adding validity checks is low enough, it follows that
>such checks do not belong in my code.  I don't suppose that I am
>really that atypical, so I suppose that the same applies to other
>programmers.  (Note that I am not speaking of checks used during
>the debugging phase; there one needs all the help one can get.
>:-)

	Debugging??!  You mean there are people that actually write
code that has bugs in it?  :-).  As you imply, there are tradeoffs.
Delivered commercial software, real time systems, software with a
lot of users, software in critical applications, and the like have
a high failure cost.  A lot of my work has been in these areas, so
I am biased towards robustness.  There are a lot of situations where
this is not the case.

	As a general remark, once you get past the initial typos,
etc, most errors are the result of lacunae in the specifications
and analysis.  As you suggest, parameter validation is not a high
profit area in the robust software game.  However it is fairly easy
and it mostly can be done in a routine fashion.  It is also part of
a general mind set which asks as a matter of course, "What will happen
if the assumptions made are violated, and what should I do about it?"
It is this mind set and the systematic routine application of this
approach that reduces the likelihood of these pernicious lacunae.
This is more important in large systems.

>:                                       My view is that one should never
>: assume, as a matter of course, that software is correct.

>And my view is that one should balance the cost of software
>failure against the cost of making it less likely to fail.

	Oh, I agree.  And when I write a throwaway program I cut a lot
of corners.  But even then it often pays to be scrupulous -- the routine
error checking makes debugging much more pleasant.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

henry@utzoo.uucp (Henry Spencer) (10/02/88)

In article <34041@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>... in general one must draw the line somewhere.  It's almost always
>>possible to add just one more check.  There usually has to be some sort of
>>balance with efficiency.
>
>	It is true that one always has to take into account the tradeoff
>between efficiency and error checking.  However that is not normally the
>tradeoff involved in parameter validation.  Unless the body of a function
>is very short and very fast, parameter validation execution times are a
>nominal part of the total cost of a function invocation.

Ah, but it depends on how much validation you want to do, which was my
point.  I'm passing in a pointer to a complex data structure.  Do I just
check the pointer for being NULL?  Do I confirm that the node it points
at has a magic number in the right place?  Do I inspect the rest of the
node's data for plausibility?  Do I apply the same tests to the node's
neighbors?  Do I apply a graph-tracing algorithm to verify correct
linking of the whole multi-node structure?  I've actually done all of
these things at various times.  It depends on the tradeoffs.

(My normal practice?  I almost always check for NULL, and usually have
magic-number code controlled by #ifdef DEBUG.  And any time my code
vitally depends on some non-obvious property I'll usually throw in an
assert(), partly to document the fact and partly as an error-catcher.
The other things are responses to unusual situations.)
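
For instance (the structure, the magic value, and the function are all made
up for the example):

#include <stdio.h>      /* for NULL */
#include <assert.h>

#define NODE_MAGIC      0x4e4f4445L     /* "NODE" in ASCII */

struct node {
#ifdef DEBUG
        long magic;                     /* set to NODE_MAGIC when a node is built */
#endif
        struct node *next;
        /* ... real contents ... */
};

void visit(np)
struct node *np;
{
        assert(np != NULL);
#ifdef DEBUG
        assert(np->magic == NODE_MAGIC);
#endif
        /* ... real work on the node ... */
}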
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (10/02/88)

In article <835@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>... I routinely write code that does not do *any* parameter
>validation.  And my debugged code almost never breaks in a way
>that would have been caught by parameter validation (or, for that
>matter, by most of the other tricks of the robust-is-god school),

Be careful... hubris tends to be followed by nemesis.  Implementing extra
checking has a nasty habit of catching errors that *weren't* caught by
normal debugging.  It often finds them even in "production" code.  Scary
but true.

The experience of preparing my regexp library for posting convinced me
that time spent in more paranoid testing is almost never wasted.  The code
was ready to go (I thought).  I did up a quick regression-test widget
so that other people could tell whether the thing was working on their
weird machine after being compiled by their weird compiler.  I ran it
on my machine and several of the test cases failed.  Oops.  I then spent
considerably more effort on building a fairly thorough set of test cases,
over 100 of them.  It paid off.  The thing *STILL* had a bug or two that
were found by others after I posted it, but it was much more solid than
when I *first* thought it was ready to post.
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu

wes@obie.UUCP (Barnacle Wes) (10/06/88)

In article <835@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
> ... I routinely write code that does not do *any* parameter
> validation.  And my debugged code almost never breaks in a way
> that would have been caught by parameter validation (or, for that
> matter, by most of the other tricks of the robust-is-god school),

In article <1988Oct1.221726.10698@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) replied:
| The experience of preparing my regexp library for posting convinced me
| that time spent in more paranoid testing is almost never wasted.  The code
| was ready to go (I thought).  I did up a quick regression-test widget
| so that other people could tell whether the thing was working on their
| weird machine after being compiled by their weird compiler.  I ran it
| on my machine and several of the test cases failed.  Oops.  I then spent
| considerably more effort on building a fairly thorough set of test cases,
| over 100 of them.  It paid off.  The thing *STILL* had a bug or two that
| were found by others after I posted it, but it was much more solid than
| when I *first* thought it was ready to post.

After having worked in software testing and validation for over 4 years,
I have gotten pretty good at breaking code.  During the development
testing, whenever we got a new build of the software, our boss expected
us to break it somehow in the first night.  We didn't let her down,
through 11 released builds and several dozen "engineering versions."  If
your code is not written in the "robust-is-god" manner, I can break it
easily.  If it is written carefully, I can probably still break it,
using inputs your code considers valid.  Whenever I am confronted with
the old line "I don't need to test/document/debug it, it already works,"
I have one simple answer: "Prove it!"
-- 
                     {hpda, uwmcsd1}!sp7040!obie!wes

         "How do you make the boat go when there's no wind?"
                                 -- Me --