[comp.lang.c] detecting invalid pointers

Kevin_P_McCarty@cup.portal.com (03/07/89)

Is there any guaranteed way to detect an out of range pointer,
i.e., one which is supposed to point into an array but might not?

For example, a friend had something like

     int  x[TABLESIZE];
     int  *p;
     ...
     checkrange(p, __FILE__, __LINE__);

where

     void checkrange(int *p, char *fname, int lineno)
     {
          if ((p < x) || (p >= x+TABLESIZE)) {
               /* error message: out of range pointer */
          }
     }

My first reaction was that one should not rely on plausible behavior
of an out of range pointer in a comparison, since that is undefined.
For example, the null pointer need not pass either comparison.  That's
easy to remedy; append `|| (p == NULL)' to the condition.

It is conceivable, and I can't find anything that would rule it out,
that a non-null pointer p which did not point into x might fail
(p < &x[0]) and/or fail (p >= &x[TABLESIZE]).  Trichotomy can fail
because pointers need not have a global linear order.

My initial response was to recommend what I thought was a stronger,
more reliable test, namely

     if ((p != NULL) && (p >= x) && (p < x+TABLESIZE))
          /* p is in-range */
     else
          /* p is invalid */

But this is little better.  While it is conceivable that if p and q
are non-null and incomparable (don't point into the same array),
none of (p < q), (p == q), (p > q), (p < q+n) holds,  it is harder to
conceive the possibility that (p > q) *and* (p < q+n) could hold, but
I can't find anything to rule that out either.  While perhaps
almost all implementations would behave reasonably here, what
guarantees that one of these comparisons must fail?

A test like
     if ((p != NULL) && (p-x >= 0) && (p-x < TABLESIZE))
is subject to the same doubts.

What's the best way to test this?

Kevin McCarty

henry@utzoo.uucp (Henry Spencer) (03/09/89)

In article <15495@cup.portal.com> Kevin_P_McCarty@cup.portal.com writes:
>Is there any guaranteed way to detect an out of range pointer,
>i.e., one which is supposed to point into an array but might not?

No.  There simply is no way to do this portably, since the results of
pointer comparisons (ignoring NULL for the moment) are well-defined only
within the array.
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

barmar@think.COM (Barry Margolin) (03/09/89)

I don't think it is possible to do what you want.  C only defines the
result of pointer less-than and greater-than comparisons when the
pointers are to the same array.

It's easy to imagine an implementation where A < P && P < A+N, yet P
is not a pointer into the array A[N].  If the system is segmented, and
arrays are required to fit within a single segment, the comparison
operations might only compare the offset portion of a pointer.  If A
were at seg1:100, N were 100, and P were seg2:150, the comparison
would be implemented as 100 < 150 && 150 < 200.
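
A minimal sketch of that scenario, assuming a made-up struct segptr that
models a segment:offset pointer and a comparison that looks only at the
offset.  The names segptr, seg_less and naive_in_range are hypothetical and
do not describe any particular compiler.

```c
#include <assert.h>

/* Hypothetical model of a segmented pointer; not any real compiler's layout. */
struct segptr {
    unsigned seg;   /* segment selector */
    unsigned off;   /* offset within the segment */
};

/* Relational comparison as described above: the segment is ignored. */
int seg_less(struct segptr a, struct segptr b)
{
    return a.off < b.off;
}

/* Naive range test (A <= P && P < A+N) built on the offset-only comparison. */
int naive_in_range(struct segptr p, struct segptr a, unsigned n)
{
    struct segptr end = a;

    assert(n <= 0xFFFFu - a.off);   /* the array must fit in one segment */
    end.off += n;
    return !seg_less(p, a) && seg_less(p, end);
}
```

With A at seg1:100, N = 100 and P at seg2:150, naive_in_range() reports
P as in range even though P lies in a different segment.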

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

karl@haddock.ima.isc.com (Karl Heuer) (03/10/89)

In article <15495@cup.portal.com> Kevin_P_McCarty@cup.portal.com writes:
>Is there any guaranteed way to detect an out of range pointer,
>i.e., one which is supposed to point into an array but might not?

Yes (my distinguished colleagues to the contrary notwithstanding):
	int within(void *ptr, void *a, size_t n) {
	    char *p;
	    for (p = (char *)a; p < (char *)a + n; ++p) {
	        if ((char *)ptr == p) return (1);
	    }
	    return (0);
	}
This works because pointer *equality* is well-defined even on pointers into
different arrays.  If you want to do it in constant time rather than linear,
then the answer is No.  However, on any given implementation there ought to be
an unportable way (e.g., with a type pun followed by one or more integer
compares).  So if you absolutely have to do this, just add a bunch of #ifdef's
(and a big comment, in a blinking font) to the within() routine above.
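
A sketch of both options, for illustration only.  within() is the loop
above with const added; within_flat() is a hypothetical constant-time
variant that ASSUMES a flat address space and the C99 type uintptr_t,
neither of which the language guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linear-time test: relies only on pointer equality.  n is a byte count. */
int within(const void *ptr, const void *a, size_t n)
{
    const char *p;

    for (p = (const char *)a; p < (const char *)a + n; ++p) {
        if ((const char *)ptr == p) return 1;
    }
    return 0;
}

/* Unportable constant-time variant.  ASSUMPTION: converting a pointer to
   uintptr_t yields a linear address, as on flat-memory machines. */
int within_flat(const void *ptr, const void *a, size_t n)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)a;

    assert(a != NULL);
    return p >= base && p - base < n;
}
```

For an array declared as int x[TABLESIZE], the call would be
within(p, x, sizeof x), since n counts bytes rather than elements.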

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

jsdy@hadron.UUCP (Joseph S. D. Yao) (03/11/89)

In article <15495@cup.portal.com> Kevin_P_McCarty@cup.portal.com writes:
>Is there any guaranteed way to detect an out of range pointer,
>i.e., one which is supposed to point into an array but might not?
>     int  x[TABLESIZE];
>     int  *p;

How about something on the order of:
	if (p != (int *) NULL &&
	    (i = p - x) >= 0 && i < TABLESIZE &&
	    p == &x[i]) {
		...
	}

I don't really think that the first comparison against NULL is
necessary, but feel free to contradict.  (I know I couldn't stop ya.)

	Joe Yao		jsdy@hadron.COM (not yet domainised??)
	hadron!jsdy@{uunet.UU.NET,dtix.ARPA,decuac.DEC.COM}
	Xarc,arinc,att,avatar,blkcat,cos,decuac,\
	dtix,ecogong,empire,gong,grebyn,inco,    \
	insight,kcwc,lepton,lsw,netex,netxcom,    >!hadron!jsdy
	paul,phw5,research,rlgvax,seismo,sms,    /
	smsdpg,sundc,telenet,uunet              /

jeenglis@nunki.usc.edu (Joe English) (03/11/89)

karl@haddock.ima.isc.com (Karl Heuer) writes:
>In article <15495@cup.portal.com> Kevin_P_McCarty@cup.portal.com writes:
>>Is there any guaranteed way to detect an out of range pointer,
>>i.e., one which is supposed to point into an array but might not?
>
>Yes (my distinguished colleagues to the contrary notwithstanding):
>	int within(void *ptr, void *a, size_t n) {
>	    char *p;
>	    for (p = (char *)a; p < (char *)a + n; ++p) {
>	        if ((char *)ptr == p) return (1);
>	    }
>	    return (0);
>	}
>This works because pointer *equality* is well-defined even on pointers into
>different arrays.  

Well, maybe not...

Take, for example, the (you guessed it) 80x86 series in 
'large' model, where pointers aren't normalized.  It's
then possible for two pointers to point to the same location
yet have different segment:offset pair values.  Most '86 compilers
will *not* get this comparison right, since pointers are in general
not normalized before comparisons to save time and space.

>However, on any given implementation there ought to be
>an unportable way (e.g., with a type pun followed by one or more integer
>compares).

Right -- in Microsoft/Turbo C a cast to 'huge *' should do the
trick.  On most reasonable architectures the code above ought to work
(but it's still nonportable.)

\begin{IMHO}

In My Opinion, you should never *have* to check a raw pointer for
validity.  Any code that might possibly generate an out-of-range
pointer should check the subscript (or loop count, or whatever)
beforehand.  I wouldn't bother to validate pointers inside, say, a
utility routine either (other than checking for non-NULL), because it
takes space and time and, quite frankly, it's the caller's
responsibility not to pass bad pointers around.

\begin{VeryBiasedOpinion}

Range-checking and pointer validation are for programmers who are
afraid that they might shoot themselves in the foot.  Correct code
will avoid generating bad pointers (among other things) in the first
place.  True, this is easier said than done -- but writing correct
code is much more worth the effort than inserting runtime checks all
over the place.

\end{VeryBiasedOpinion}

By the way, I don't practice what I preach as often as I should.

\end{IMHO}
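
One way to read "check the subscript beforehand", sketched with
hypothetical helpers table_at() and table_get() and an assumed TABLESIZE
of 16.  The index is validated before any pointer is formed, so no
out-of-range pointer ever exists to be tested.

```c
#include <assert.h>
#include <stddef.h>

#define TABLESIZE 16   /* assumed size, for illustration only */

int x[TABLESIZE];

/* Returns a pointer to x[i], or NULL if i is out of range.  The range
   check is on the integer index, which is always well-defined. */
int *table_at(long i)
{
    if (i < 0 || i >= TABLESIZE)
        return NULL;
    return &x[i];
}

/* Utility routine that trusts its caller; the assert only documents the
   contract and disappears when NDEBUG is defined. */
int table_get(long i)
{
    assert(i >= 0 && i < TABLESIZE);
    return x[i];
}
```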

--Joe English

  jeenglis@nunki.usc.edu

bill@twwells.uucp (T. William Wells) (03/12/89)

In article <3011@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
: Take, for example, the (you guessed it) 80x86 series in
: 'large' model, where pointers aren't normalized.  It's
: then possible for two pointers to point to the same location
: yet have different segment:offset pair values.  Most '86 compilers
: will *not* get this comparison right, since pointers are in general
: not normalized before comparisons to save time and space.

In which case, the compiler is broken. But we knew that...?

: In My Opinion, you should never *have* to check a raw pointer for
: validity.  Any code that might possibly generate an out-of-range
: pointer should check the subscript (or loop count, or whatever)
: beforehand.  I wouldn't bother to validate pointers inside, say, a
: utility routine either (other than checking for non-NULL), because it
: takes space and time and, quite frankly, it's the caller's
: responsibility not to pass bad pointers around.

One might want to check pointer validity to cope with program
behavior that is outside the C model.  Such can result in an invalid
pointer even when all other pointers are valid. Consider a pointer
munged by a bad array reference.

: Range-checking and pointer validation are for programmers who are
: afraid that they might shoot themselves in the foot.  Correct code
: will avoid generating bad pointers (among other things) in the first
: place.  True, this is easier said than done -- but writing correct
: code is much more worth the effort than inserting runtime checks all
: over the place.

When you find a perfect programmer, then we can talk about correct
code and its relevance to programming. Till then, we will have to
deal with less than correct code and thus will be forced to deal with
the fact that a C program can operate in non-C ways. And so run time
checks are and will remain important, in many applications.

---
Bill
{ uunet | novavax } !twwells!bill
(BTW, I'm going to be looking for a new job sometime in the next
few months.  If you know of a good one, do send me e-mail.)

bobmon@iuvax.cs.indiana.edu (RAMontante) (03/12/89)

bill@twwells.UUCP (T. William Wells) <767@twwells.uucp> :
-In article <3011@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
-: Take, for example, the (you guessed it) 80x86 series in
-: 'large' model, where pointers aren't normalized.  It's
-: then possible for two pointers to point to the same location
-: yet have different segment:offset pair values.  Most '86 compilers
-: will *not* get this comparison right, since pointers are in general
-: not normalized before comparisons to save time and space.
-
-In which case, the compiler is broken. But we knew that...?


Hmm, this leads me to a question.  I think I understand that pANS requires a
valid comparison of pointers that refer to the same object (esp. array).
But does the compiler or run-time code need to KNOW that the pointers are to
the same object for this to hold?  Or is it sufficient that the pointers
refer to memory that is somehow associated (by malloc'ing perhaps)?

To put it another way -- If I get tricky enough with indirection, casting,
etc. (but I only do legal things), is it possible/legal to wind up with a
pointer comparison that is NOT guaranteed to produce a "correct" answer
within the bounds of pANS?

Vacationing minds hope the answer will be here when I get back....

bill@twwells.uucp (T. William Wells) (03/12/89)

In article <18460@iuvax.cs.indiana.edu> bobmon@iuvax.cs.indiana.edu (RAMontante) writes:
: Hmm, this leads me to a question.  I think I understand that pANS requires a
: valid comparison of pointers that refer to the same object (esp. array).
: But does the compiler or run-time code need to KNOW that the pointers are to
: the same object for this to hold?  Or is it sufficient that the pointers
: refer to memory that is somehow associated (by malloc'ing perhaps)?
:
: To put it another way -- If I get tricky enough with indirection, casting,
: etc. (but I only do legal things), is it possible/legal to wind up with a
: pointer comparison that is NOT guaranteed to produce a "correct" answer
: within the bounds of pANS?

No. The whole point of sticking with legal things is that, after you
are done, it all still works. So, for as long as each cast is a valid
one, indirection is only done when the pointer is of the right type,
and all the other things that are supposed to be done correctly are
in fact done correctly, one is not supposed to be able to produce an
"incorrect" answer.

---
Bill
{ uunet | novavax } !twwells!bill
(BTW, I'm going to be looking for a new job sometime in the next
few months.  If you know of a good one, do send me e-mail.)

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (03/15/89)

In article <767@twwells.uucp>, bill@twwells.uucp (T. William Wells) writes:
> In article <3011@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
> : In My Opinion, you should never *have* to check a raw pointer for
> : validity.  Any code that might possibly generate an out-of-range
> : pointer should check the subscript (or loop count, or whatever)
> : beforehand.  I wouldn't bother to validate pointers inside, say, a
> : utility routine either (other than checking for non-NULL), because it
> : takes space and time and, quite frankly, it's the caller's
> : responsibility not to pass bad pointers around.
> 
> One might want to check pointer validity to cope with program
> behavior that is outside the C model.  Such can result in an invalid
> pointer even when all other pointers are valid. Consider a pointer
> munged by a bad array reference.

There is a common case where the pointers are perfectly valid
and yet you still need to check to see if one points inside the
other even though they might actually point to completely different
objects.

Consider implementing memmove(void *to, void *from, size_t bytes).

You have to determine if the arrays starting at "to" and at "from"
overlap before you know which direction is safe for doing the copy.
i.e. if "from" points inside the "to" array, you start the copy at
the beginning of the array; but if "to" points inside the "from" array,
you start the copy at the end of the array.
(If neither condition is true it doesn't matter where you start the copy,
 and if both conditions are true you don't need to do the copy).
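
The direction rule can be sketched as follows.  my_memmove is a
hypothetical name, and the comparison through uintptr_t (a C99 type)
ASSUMES a flat address space in which integer order matches address
order; it is not portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overlap-safe byte copy.  ASSUMPTION: uintptr_t values order the same
   way as the addresses they were converted from. */
void *my_memmove(void *to, const void *from, size_t bytes)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    size_t i;

    assert(to != NULL && from != NULL);
    if ((uintptr_t)d == (uintptr_t)s || bytes == 0)
        return to;                      /* nothing to copy */
    if ((uintptr_t)d < (uintptr_t)s) {
        for (i = 0; i < bytes; i++)     /* start at the beginning */
            d[i] = s[i];
    } else {
        for (i = bytes; i > 0; i--)     /* start at the end */
            d[i - 1] = s[i - 1];
    }
    return to;
}
```

When the two regions do not overlap, either branch gives the same
result, so the sketch does not need a separate overlap test.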

As has been discussed in recent postings, there isn't any obvious
simple way of performing these tests required by the pANS C library.

karl@haddock.ima.isc.com (Karl Heuer) (03/15/89)

In article <3011@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
>karl@haddock.ima.isc.com (Karl Heuer) writes:
>>This works because pointer *equality* is well-defined even on pointers into
>>different arrays.
>
>Well, maybe not... [80x86 in large model with unnormalized pointers]

If the compiler ever generates unnormalized pointers, then it had better be
prepared to generate code to compare them correctly.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

henry@utzoo.uucp (Henry Spencer) (03/16/89)

In article <24230@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>...There is a common case where the pointers are perfectly valid
>and yet you still need to check to see if one points inside the
>other even though they might actually point to completely different
>objects.
>Consider implementing memmove(void *to, void *from, size_t bytes).
>...
>As has been discussed in recent postings, there isn't any obvious
>simple way of performing these tests required by the pANS C library.

Nobody has ever pretended that it is possible to implement ANSI C in a
totally machine-independent way; there has to be hardware-specific code
down there somewhere.  The insides of things like memmove are likely
to be heavily machine-dependent in any case.
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jeenglis@nunki.usc.edu (Joe English) (03/17/89)

karl@haddock.ima.isc.com (Karl Heuer) writes:
>In article <3011@nunki.usc.edu> jeenglis@nunki.usc.edu (Joe English) writes:
>>karl@haddock.ima.isc.com (Karl Heuer) writes:
>>>This works because pointer *equality* is well-defined even on pointers into
>>>different arrays.
>>
>>Well, maybe not... [80x86 in large model with unnormalized pointers]
>
>If the compiler ever generates unnormalized pointers, then it had better be
>prepared to generate code to compare them correctly.

You're right, but Turbo C (and probably MSC as well)
doesn't bother to most of the time.  Since in 'large'
model all arrays have to fit in the same segment, any
two pointers calculated from the base of the array can
be compared just by their offsets.  In normal usage,
this isn't a problem; however, it is possible for two
pointers into *different* arrays to test equal -- which
isn't that bad, since that's a meaningless comparison
to start with -- or for a normalized (cast to huge *)
pointer to test unequal to the equivalent
non-normalized value -- which can cause problems
and is one of the many reasons why I don't use
'large' model.

(According to the TC documentation, pointers cast
 to huge or in the huge model always compare correctly,
 but the execution speed suffers terribly.)

--Joe English

  jeenglis@nunki.usc.edu

Kevin_P_McCarty@cup.portal.com (03/20/89)

In <3092@nunki.usc.edu>, jeenglis@nunki.usc.edu (Joe English) writes:
>Since in 'large'
>model all arrays have to fit in the same segment, any
>two pointers calculated from the base of the array can
>be compared just by their offsets.  In normal usage,
>this isn't a problem; however, it is possible for two
>pointers into *different* arrays to test equal

Not true.  I just tried it.  The <, <=, >= and > comparisons
use only the offset parts of the pointers, but == looks
at both segment and offset when pointers are 32 bits.
This is also documented in the manual (User's Guide v2.0, p.346).

It is possible, however, to have two pointers that point to the same
storage location but compare unequal.  A one-to-one mapping between
pointers and storage locations is not required.
Not only is it true that pointers are not ints, they don't even
behave like ints.

Kevin McCarty

chip@ateng.ateng.com (Chip Salzenberg) (03/21/89)

According to jeenglis@nunki.usc.edu (Joe English), referring to
normalizing 80x86 segment+offset pointers before comparing them:

>Turbo C (and probably MSC as well) doesn't bother to most of the time.  [...]

Quite true.  For totally correct comparisons of all pointers, it's necessary
to normalize them by hand, or be sure that they are cast to "huge *" when
any pointer arithmetic is done.  Otherwise, the different combinations of
segment+offset that actually refer to the same address do not compare equal.

>it is possible for two pointers into *different* arrays to test equal [...]

Fortunately, this is false.  Comparisons of *magnitude* only examine the
offset -- which is OK.  Comparisons of *equality* examine both the segment
and offset -- otherwise, any pointer with an offset of zero would compare
equal to NULL.
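
A small simulation of the behavior described above, assuming 8086
real-mode addressing (physical address = segment * 16 + offset).  The
struct farptr and the functions linear(), normalize() and far_equal()
are hypothetical stand-ins, not what any compiler actually emits.

```c
#include <assert.h>

/* Simulated real-mode far pointer (illustration only). */
struct farptr {
    unsigned seg;   /* 16-bit segment */
    unsigned off;   /* 16-bit offset */
};

/* Physical address: segment * 16 + offset. */
unsigned long linear(struct farptr p)
{
    assert(p.seg <= 0xFFFFu && p.off <= 0xFFFFu);
    return ((unsigned long)p.seg << 4) + p.off;
}

/* Normalize by hand so that the offset is in the range 0..15. */
struct farptr normalize(struct farptr p)
{
    unsigned long a = linear(p);
    struct farptr r;

    r.seg = (unsigned)((a >> 4) & 0xFFFFu);
    r.off = (unsigned)(a & 0xFu);
    return r;
}

/* Equality on the stored representation: both segment and offset. */
int far_equal(struct farptr a, struct farptr b)
{
    return a.seg == b.seg && a.off == b.off;
}
```

In this model 1000:0010 and 1001:0000 name the same byte yet fail
far_equal() until both are passed through normalize(), and a pointer
such as 1234:0000 does not compare equal to a null pointer of 0000:0000.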

For understanding compiler options and non-standard keywords, there's no
substitute for looking at the assembler output of a compiler.  Yes, the
documentation is the definitive guide, but looking at the assembler output
is a wonderful aid in understanding what the documentation means.  :-)
-- 
Chip Salzenberg             <chip@ateng.com> or <uunet!ateng!chip>
A T Engineering             Me?  Speak for my company?  Surely you jest!
	  "It's no good.  They're tapping the lines."