[comp.lang.c] Pointer arithmetic and comparisons.

daveh@marob.masa.com (Dave Hammond) (12/08/89)

Machine/OS: 286-compatible/MSDOS
Compiler: Turbo-C 1.5

One method I use to avoid overflow while filling a buffer is to set a
pointer to the address of the last element (&buffer[last]), then compare
the current buffer position to the end pointer, testing that "current"
is never greater than "end".  An example fragment might be:

some_function(char *buffer, int len)
{
	char *p = buffer;
	char *e = &buffer[len];

	while ((*p++ = getchar()) != EOF && p < e) {
		...
		}

	...
}

This method has never bitten me on Unix/Xenix/*nix systems of any
flavor, on machines ranging from 286's to 68K's to Sparcs.  Now, in an
attempt to port to MSDOS/Turbo-C, this method breaks.

The problem occurs when the address resulting from &buffer[len] exceeds
65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
becomes 99, making `while (p < e)' immediately false.
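
The wraparound itself can be modeled with plain 16-bit unsigned
arithmetic; a minimal sketch (assuming the offset part of a pointer
behaves like an unsigned short):

	#include <stdio.h>

	int main(void)
	{
		unsigned short off = 65535U;	/* offset of &buffer[0] */
		unsigned short end = off + 100;	/* offset of &buffer[100] */

		/* 16-bit offsets wrap mod 65536, so end comes out as 99 */
		printf("end offset = %u\n", end);
		return 0;
	}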

I was under the impression that (for n > 0) buffer[n] should not yield
an address lower than buffer[0].  Is the pointer comparison I am doing
non-portable, or otherwise ill-advised ?

BTW, I am well aware of alternate methods of checking for buffer
overflow, so please do not suggest alternatives.  I am interested in why
*this* method fails.

Respond via e-mail if you feel this is of little net-wide interest.
Post if you feel others might benefit.

Thanks in advance.

--
Dave Hammond
daveh@marob.masa.com

c9h@psuecl.bitnet (12/08/89)

In article <257ECDFD.CDD@marob.masa.com>, daveh@marob.masa.com (Dave Hammond) writes:
> The problem occurs when the address resulting from &buffer[len] exceeds
> 65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
> becomes 99, making `while (p < e)' immediately false.
>
> I was under the impression that (for n > 0) buffer[n] should not yield
> an address lower than buffer[0].  Is the pointer comparison I am doing
> non-portable, or otherwise ill-advised ?

This will *only* occur if you are accessing an element outside of the
array's boundaries.

--
- Charles Martin Hannum II       "Klein bottle for sale ... inquire within."
    (That's Charles to you!)     "To life immortal!"
  c9h@psuecl.{bitnet,psu.edu}    "No noozzzz izzz netzzzsnoozzzzz..."
  cmh117@psuvm.{bitnet,psu.edu}  "Mem'ry, all alone in the moonlight ..."

gs940971@longs.LANCE.ColoState.Edu (glen sunada f84) (12/08/89)

In article <257ECDFD.CDD@marob.masa.com>, daveh@marob.masa.com (Dave
Hammond) writes:
> Machine/OS: 286-compatible/MSDOS
> Compiler: Turbo-C 1.5
>    [ stuff deleted in concern of bandwidth]
> some_function(char *buffer, int len)
> {
> 	char *p = buffer;
> 	char *e = &buffer[len];
> 
> 	while ((*p++ = getchar()) != EOF && p < e) {
> 		...
> 		}
> 
> 	...
> }
>     [ more stuff deleted ]
> 
> The problem occurs when the address resulting from &buffer[len] exceeds
> 65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
> becomes 99, making `while (p < e)' immediately false.
> 
> I was under the impression that (for n > 0) buffer[n] should not yield
> an address lower than buffer[0].  Is the pointer comparison I am doing
> non-portable, or otherwise ill-advised ?
> 
> Thanks in advance.
> 
> --
> Dave Hammond
> daveh@marob.masa.com

This question is probably one that a lot of people on the net have been
bitten by or will be bitten by, so I will answer over the net.  Yes, the
value of a pointer to the end of an array is supposed to be larger than
a pointer to its beginning.

What is causing the problems with the comparison in the MS-DOS
environment is that the 8086 family of micro-processors has a segmented
architecture.  The net result of this is that in comparing pointers you
need to compare both the segment and the offset to determine the
placement in memory.  Because of the segmented architecture, compilers
for MS-DOS machines support many memory models.  These models are
listed below:
     TINY    - one segment; all pointers are a 16-bit offset
     SMALL   - one segment for code and one for data; all pointers are
               a 16-bit offset, but code is in a different segment than
               data (i.e. pointers to functions have a different
               segment than pointers to data) - usual default
                 NOTE: the example you give should result in a stack
                 corruption problem
     MEDIUM  - any number of code segments, one data segment; pointers
               are non-normalized (i.e. there are many segment:offset
               pairs that resolve to the same linear address; segments
               are defined on 16-byte boundaries with a size of 64K)
     COMPACT - one code segment, any number of data segments;
               non-normalized pointers
     LARGE   - any number of code segments, any number of data
               segments; non-normalized pointers
     HUGE    - any number of code segments, any number of data
               segments; normalized pointers

There is also another difference between non-normalized and normalized
pointers.  Non-normalized pointers wrap around at 64K (65536 -> 0) and
do not adjust the segment.  Therefore, in the code fragment you posted,
when p is incremented past 64K it points to offset zero in the same
segment as the start of the array, not to the segment starting 64K
above the one that holds the start of the array.  A normalized pointer,
however, has its offset automatically reset to keep the offset less
than 16 bytes; to do this the segment is adjusted.  This means that
using huge pointers in the example above should (CMA - Cover My A**)
fix the problem with wraparound.

There is a disadvantage to working with huge pointers - they take more
time because of the normalization.
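
To make that concrete, here is a sketch of what the fix might look like
with Turbo C's huge pointers ('far' and 'huge' are Turbo C extensions,
not portable C, and this function shape is only an illustration; the
bounds test has also been moved before the store, and getchar() is
read into an int):

	#include <stdio.h>

	/* sketch: 'huge' keeps the pointer normalized, so the
	   comparison works even across a segment boundary */
	void some_function(char far *buffer, int len)
	{
		char huge *p = (char huge *) buffer;
		char huge *e = p + len;
		int c;

		while (p < e && (c = getchar()) != EOF)
			*p++ = (char) c;
	}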

I hope this quick overview of the architecture of the IBM-PC helps and
is not just a waste of network bandwidth.

BTW - this segmentation is not a problem with *NIX on a PC, because
*NIX automatically uses the HUGE memory model for an 8088, 8086, 80188,
80186, and 80286, and either the HUGE model or the protected-mode flat
memory model for the 80386 (unfortunately the flat model is not
available to DOS on an 80386, because DOS runs in real mode, not
protected mode).


Glen U. Sunada
gs940971@longs.LANCE.ColoState.EDU

sdawalt@wright.EDU (Shane Dawalt) (12/10/89)

in article <257ECDFD.CDD@marob.masa.com>, daveh@marob.masa.com (Dave Hammond) says:
> 
> The problem occurs when the address resulting from &buffer[len] exceeds
> 65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
> becomes 99, making `while (p < e)' immediately false.
> 

  That same problem also haunted me: could a block be allocated such that
the block was split in two by a segment boundary?  I was inclined to
believe not.  I was correct for once.  The question was asked in the
CompuServe Borland forum (BPROGB).  Both Borland techs and forum members
assured me this cannot happen unless I attempt to access outside of the
allocated block.

  Shane    sdawalt@cs.wright.edu

ejp@bohra.cpg.oz (Esmond Pitt) (12/11/89)

In article <257ECDFD.CDD@marob.masa.com> daveh@marob.masa.com (Dave Hammond) writes:
>
>One method I use to avoid overflow while filling a buffer is to set a
>pointer to the address of the last element (&buffer[last]), then compare
>the current buffer position to the end pointer, testing that "current"
>is never greater than "end" ...
>
>This method has never bitten me on Unix/Xenix/*nix systems of any
>flavor, on machines ranging from 286's to 68K's to Sparcs.  Now, in an
>attempt to port to MSDOS/Turbo-C, this method breaks.

That's because the last element is not &buffer[last] but &buffer[last-1],
and so you should test for <= &buffer[last-1], not < &buffer[last].
You are incrementing a pointer to point outside the object, and this is
not guaranteed to work under _any_ implementation of C. Your code
should read:

some_function(char *buffer, int len)
{
	char *p = buffer;
	char *e = &buffer[len-1];

	while ((*p++ = getchar()) != EOF && p <= e) {
	...
	}

	...
}
-- 
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz

bill@twwells.com (T. William Wells) (12/11/89)

In article <232@bohra.cpg.oz> ejp@bohra.UUCP (Esmond Pitt) writes:
: That's because the last element is not &buffer[last] but &buffer[last-1],
: and so you should test for <= &buffer[last-1], not < &buffer[last].
: You are incrementing a pointer to point outside the object, and this is
: not guaranteed to work under _any_ implementation of C.

Not only is this going to work in most implementations, it is
*required* to work by the ANSI standard.

My guess is that his object is exactly 64K bytes long and he is
running into the silly segmentation problems of the 8086. There
are two fixes: use an object of no more than 64K-1 bytes or use
whatever model it is that does the right thing with objects of
size 64K and larger.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/12/89)

In article <232@bohra.cpg.oz> ejp@bohra.UUCP (Esmond Pitt) writes:

| That's because the last element is not &buffer[last] but &buffer[last-1],
| and so you should test for <= &buffer[last-1], not < &buffer[last].
| You are incrementing a pointer to point outside the object, and this is
| not guaranteed to work under _any_ implementation of C. Your code
| should read:

  There seems no reason why "< &buf[last]" should fail.  ANSI C allows
you to take the address of the element past the end of the array,
although it doesn't require that you be allowed to dereference it.
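
The idiom the standard blesses looks like this (a minimal sketch):

	#define SIZE 100

	int main(void)
	{
		char buf[SIZE];
		char *p;

		/* &buf[SIZE] may be formed and compared against,
		   but must never be dereferenced */
		for (p = buf; p < &buf[SIZE]; p++)
			*p = '\0';
		return 0;
	}
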
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

rob@cs.mu.oz.au (Robert Wallen) (12/12/89)

In article <232@bohra.cpg.oz| ejp@bohra.UUCP (Esmond Pitt) writes:
|In article <257ECDFD.CDD@marob.masa.com> daveh@marob.masa.com (Dave Hammond) writes:
|>
|>One method I use to avoid overflow while filling a buffer is to set a
|>pointer to the address of the last element (&buffer[last]), then compare
|>the current buffer position to the end pointer, testing that "current"
|>is never greater than "end" ...

You don't work on GNU by any chance, do you?  They seem to have this
annoying habit.  Annoying because it's hard to port to most DOS compilers,
where memory is broken up into stupid little chunks with segment/offset
pairs.

|>This method has never bitten me on Unix/Xenix/*nix systems of any
|>flavor, on machines ranging from 286's to 68K's to Sparcs.  Now, in an
|>attempt to port to MSDOS/Turbo-C, this method breaks.

Although K&R sez that you should be able to do p < q where p and q point to
members of the same array, Turbo C hangs an extra requirement on you.  That
is, that the pointers are normalized as well.  Either use the HUGE model or
normalize the pointers before comparison.  This has made my porting of
gnu-egrep tedious to the point of not bothering...

Personally, I favor wasting a whole integer and keeping track of the number
of elements that I have used already; integer comparison is not only
guaranteed to work, it should be faster given sizeof(int) < sizeof(int *).
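
A sketch of that counting style (a hypothetical fill(), just to show the
shape):

	#include <stdio.h>

	void fill(char *buffer, int len)
	{
		int n = 0;	/* elements used so far */
		int c;

		/* the comparison is on plain ints, so segmented
		   pointers never enter into it */
		while (n < len && (c = getchar()) != EOF)
			buffer[n++] = (char) c;
	}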

Now a beef.  Why check for (p <= pend)?  Why not just (p == pend)?  After
all, it's not as though your code might accidentally skip over the last
entry!!

chris@mimsy.umd.edu (Chris Torek) (12/13/89)

In article <232@bohra.cpg.oz> ejp@bohra.cpg.oz (Esmond Pitt) writes:
>... you should test for <= &buffer[last-1], not < &buffer[last].

It makes no difference:
>	char *p = buffer;
>	char *e = &buffer[len-1];
>
>	while ((*p++ = getchar()) != EOF && p <= e) {

(For the moment, let us ignore the fact that *p==EOF might well never be
true.)  Chances are that if the loop

	for (p = base; p <= &base[SIZE - 1]; p++)

works, the loop

	for (p = base; p < &base[SIZE]; p++)

will also work.  In particular, the latter loop tends to fail when
base+SIZE `wraps around', as it were, to 0.  That is, &base[SIZE] is
very small, and hence (p < &base[SIZE]) is always false.

Suppose this is the case.  Now let p be &buf[SIZE - 1], and assume we
are working on the former loop (p <= &base[SIZE - 1]).  Then, if
arithmetic works `as expected', p++ will cause p to wrap around to zero,
and the next iteration will test a very small `p' against a large address
and the loop will continue.  That is, the loop will run forever, rather
than not at all.

The only time something useful happens is when &base[SIZE] is computed
in a different way than (&base[SIZE - 1]) + 1.  Any system that does
this is not compiling C code correctly.
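
That identity is easy to spot-check (a minimal sketch):

	#include <assert.h>

	#define SIZE 100

	int main(void)
	{
		char base[SIZE];

		/* a correct compiler computes both sides identically */
		assert(&base[SIZE] == &base[SIZE - 1] + 1);
		return 0;
	}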

The situation is almost exactly the same as the case of

	for i := 0 to maxint do ... od;

which must be coded in C as

	for (i = 0;; i++) {
		...
		if (i == INT_MAX) break;
	}

(or any equivalent).  The variable i must not be incremented when
it is INT_MAX, as the result can either be overflow or INT_MIN (e.g.,
from 32767 to -32768, or from 2147483647 to -2147483648).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

bright@Data-IO.COM (Walter Bright) (12/14/89)

In article <2908@murtoa.cs.mu.oz.au> rob@murtoa.UUCP (Robert Wallen) writes:
<Now a beef.  Why check for (p <= pend)?  Why not just ( p == pend) ?
<After all,
<its not as though your code might accidentally skip over the last entry !!

(p <= pend) generates more efficient code (on PCs) than (p == pend), because
for the former only the offset portions of the pointers need to be compared,
while for the latter both the segment and the offset must be compared.

karl@haddock.ima.isc.com (Karl Heuer) (12/14/89)

In article <2908@murtoa.cs.mu.oz.au> rob@murtoa.UUCP (Robert Wallen) writes:
>Although K&R sez that you should be able to do p < q where p and q point to
>members of the same array, Turbo C hangs an extra requirement on you.  That
>is, that the pointers are normallized as well.

If the pointers point into the same array, which is smaller than 64K, why
shouldn't they already be normalized?

>Now a beef.  Why check for (p <= pend)?  Why not just (p == pend) ?

I believe this one is in The Elements of Programming Style.  To roughly
paraphrase from memory (it's been a decade since I read it): when either of
two comparisons is correct (because the case where they differ "can't
happen"), it's considered better to use the one that would stop the loop in
the event that the "impossible" happens.

Also, in a segmented implementation (p == q) is generally slower.  For
equality compares, the compiler has to ensure that the pointers are
normalized, but with non-huge use of (p < q) it can assume that the pointers
will already be in the same segment.
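
For reference, 'normalized' here means reducing a segment:offset pair so
the offset is less than 16.  A sketch of the arithmetic, modeling the
20-bit linear address with an unsigned long:

	#include <stdio.h>

	int main(void)
	{
		unsigned seg = 0x1234, off = 0xABCD;
		unsigned long linear = (unsigned long) seg * 16 + off;

		/* normalized form: largest possible segment, offset < 16 */
		unsigned nseg = (unsigned) (linear >> 4);
		unsigned noff = (unsigned) (linear & 0xF);

		printf("%04X:%04X -> %04X:%04X\n", seg, off, nseg, noff);
		return 0;
	}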

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

peter@ficc.uu.net (Peter da Silva) (12/14/89)

> <Now a beef.  Why check for (p <= pend)?  Why not just ( p == pend) ?

In article <2241@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
> (p <= pend) generates more efficient code (on PC's) than (p == pend), because
> on the former only the offset portions of the pointer need to be compared,
> while on the latter both the segment and the offset must be compared.

Implementation-dependent micro-optimisation isn't a good reason.  A good
one is that (p <= pend) is a broader termination condition, and thus safer.
-- 
`-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
 'U`  Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>.
"Scientific progress is predicated on name-calling and intolerance, not to
 mention asking lots of stupid questions." -- Mark S Madsen

fogel@dvinci.usask.ca (Earl Fogel) (12/15/89)

daveh@marob.masa.com (Dave Hammond) writes:
>Machine/OS: 286-compatible/MSDOS
>Compiler: Turbo-C 1.5
>
>The problem occurs when the address resulting from &buffer[len] exceeds
>65535.  For example, if &buffer[0] is 65535 and len is 100, &buffer[len]
>becomes 99, making `while (p < e)' immediately false.
>
>I was under the impression that (for n > 0) buffer[n] should not yield
>an address lower than buffer[0].  Is the pointer comparison I am doing
>non-portable, or otherwise ill-advised ?
>

The problem, I believe, is in the memory model you are using in
Turbo C.  If your data does not all fit into one 64K (65536 byte)
segment, then you have to declare pointers to be 'huge' for them to
properly address your buffer.