[comp.std.c] correct code for pointer subtraction

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (01/06/89)

In article <PINKAS.89Jan3082456@hobbit.intel.com> pinkas@hobbit.intel.com (Israel Pinkas ~) writes:

>>   > >         static int a[30000];
>>   > >         printf("%d\n",&a[30000]-a);

>                                                Let's ignore the fact that
>there is no a[30000] (and that taking its address is invalid). 

I have been told that dpANS explicitly states that the address of
"one-after-last" element of an array may be taken, and subtractions 
like the above are legal and should give correct result.
I do not have access to the dpANS - could somebody who does
please look this up?  
In any case all compilers I know of do it just fine
(unless some kind of overflow occurs, like in this very example -
but that's independent of how big the array is declared)
and a lot of existing code does rely on it.

Regarding the original problem, it *is* possible to do the subtraction
correctly, although not simply by using unsigned division.
Here is one way I think would work (on the left is what Turbo C
generates, for comparison):

                             	xor 	dx,dx
 mov    ax,&a[30000]         	mov 	ax,&a[30000]
 sub    ax,a                 	sub 	ax,a
 mov    bx,2                 	mov 	bx,2
 cwd                         	sbb 	dx,dx
 idiv   bx                   	idiv 	bx

I.e., take advantage of the fact that we can treat carry and
AX as one 17-bit register containing the result of subtraction.
It will cost a few clock cycles, I'm afraid.
In this particular case it can actually be done with
no speed penalty with the following trick:

 mov	ax,&a[30000]
 sub	ax,a
 rcr	ax,1
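
The three instruction sequences above can be modelled in C to check the arithmetic. This is only a sketch: the function names are hypothetical, uint16_t stands in for a 16-bit register, and the borrow flag is recomputed by comparison instead of being read from the CPU.

```c
#include <stdint.h>

/* Models "sub / cwd / idiv": the 16-bit byte difference is sign-extended
   from AX alone, so any difference of 32768 bytes or more turns negative. */
int naive_diff(uint16_t hi, uint16_t lo, int elem_size)
{
    uint16_t bytes = (uint16_t)(hi - lo);                 /* sub ax,a */
    int32_t wide = bytes >= 0x8000u ? (int32_t)bytes - 0x10000
                                    : (int32_t)bytes;     /* cwd      */
    return (int)(wide / elem_size);                       /* idiv bx  */
}

/* Models "sub / sbb dx,dx / idiv": DX is filled from the borrow, so
   carry and AX together behave as one 17-bit signed value. */
int carry_diff(uint16_t hi, uint16_t lo, int elem_size)
{
    uint16_t bytes = (uint16_t)(hi - lo);                 /* sub ax,a  */
    int borrow = hi < lo;                                 /* CF        */
    int32_t wide = (int32_t)bytes - (borrow ? 0x10000 : 0); /* sbb dx,dx */
    return (int)(wide / elem_size);                       /* idiv bx   */
}

/* Models "sub / rcr": rotate the borrow into the top bit, which halves
   the 17-bit value. Only valid for an element size of 2. */
int rcr_diff(uint16_t hi, uint16_t lo)
{
    uint16_t bytes = (uint16_t)(hi - lo);
    uint16_t top = (hi < lo) ? 0x8000u : 0u;              /* CF -> bit 15 */
    uint16_t r = (uint16_t)((bytes >> 1) | top);
    return r >= 0x8000u ? (int)r - 0x10000 : (int)r;      /* read as signed */
}
```

With a at offset 256 and &a[30000] at offset 60256, naive_diff(60256, 256, 2) gives -2768, while carry_diff(60256, 256, 2) and rcr_diff(60256, 256) both give 30000.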

In the general case it seems we must choose between doing it fast
and getting it right every time.  Perhaps a compiler option for
those who would otherwise use an old compiler version to save
the two cycles or whatever it costs...

Tapani Tarvainen
------------------------------------------------------------------
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi
BitNet:    tarvainen@finjyu

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/07/89)

In article <18683@santra.UUCP> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>>>   > >         static int a[30000];
>>>   > >         printf("%d\n",&a[30000]-a);
>I have been told that dpANS explicitly states that the address of
>"one-after-last" element of an array may be taken, and subtractions 
>like the above are legal and should give correct result.

Almost.  Address of "one after last" is legal for all data objects,
but of course cannot be validly used to access the pseudo-object.
All objects can be considered to be in effect arrays of length 1.
Pointers to elements of the same array object can be subtracted;
the result is of type ptrdiff_t (defined in <stddef.h>).
The example above assumes that ptrdiff_t is int,
which is not guaranteed by the pANS.
Casting to another, definite, integral type such as (long)
would make the result portably usable in printf() etc.
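
A minimal sketch of the portable form described above, reusing the array from the thread. The function names span_of_a and print_span are hypothetical; the only point is that the difference is kept as ptrdiff_t and cast to long before it reaches printf().

```c
#include <stddef.h>
#include <stdio.h>

static int a[30000];

/* Element count from the start of a to its one-past-the-end address.
   Forming &a[30000] is allowed; reading or writing through it is not. */
ptrdiff_t span_of_a(void)
{
    return &a[30000] - a;
}

/* Print the difference without assuming that ptrdiff_t is int. */
void print_span(void)
{
    printf("%ld\n", (long)span_of_a());
}
```

This removes the mismatch between "%d" and ptrdiff_t; it does not by itself repair a compiler that computes the difference incorrectly.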

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (01/09/89)

In article <9878@drutx.ATT.COM> mayer@drutx.ATT.COM (gary mayer) writes:
>In article <18123@santra.UUCP>, tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>
>> The same error occurs in the following program 
>> (with Turbo C 2.0 as well as MSC 5.0):
>>
>> main()
>> {
>>         static int a[30000];
>>         printf("%d\n",&a[30000]-a);
>> }
>>
>> output:  -2768
>
>I grant that this is probably not the answer you would like, but it
>is the answer you should expect once pointer arithmetic is understood.
>
[deleted explanation (very good, btw) about why this happens]
>
>In summary, be careful with pointers on these machines, and try to
>learn about how things work "underneath".  The C language is very
>close to the machine, and there are many times that this can have
>an effect - understanding and avoiding these where possible is what
>writing portable code is all about.

I couldn't agree more with the last paragraph.  My point, however,
was that the result above is 
(1) Surprising: It occurs in the small memory model, where both ints
and pointers are 16 bits, and the result fits in an int.
When I use large data model I expect trouble with pointer
arithmetic and cast to huge when necessary, but it shouldn't
be necessary with the small model (or at least the manual
should clearly say it is).
(2) Unnecessary: Code that does the subtraction correctly has
been presented here.  
(3) WRONG according to K&R or dpANS -- or does either say that
pointer subtraction is valid only when the difference *in bytes*
fits in an int?  If not, I continue to consider it a bug.

Another matter is that the above program isn't portable
anyway, because (as somebody else pointed out),
pointer difference isn't necessarily an int (according to dpANS).
Indeed, in Turbo C the difference of huge pointers is long,
and the program can be made to work as follows:

         printf("%ld\n", (int huge *)&a[30000] - (int huge *)a);

Actually all large data models handle this example correctly (in Turbo C),
and thus casting to (int far *) also works here,
but as soon as the difference exceeds 64K (or the pointers
have different segment values) they'll break too; only
huge is reliable then (but this the manual _does_ explain).

To sum up: Near pointers are reliable up to 32K,
far up to 64K, anything more needs huge.
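
Why segment values matter can be sketched with a model of 8086 real-mode addressing, where the linear address is segment * 16 + offset. This is an illustration of the idea only, not Turbo C's actual runtime routine; struct seg_off, linear and huge_diff are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical model of a real-mode segment:offset pointer. */
struct seg_off {
    uint16_t seg;
    uint16_t off;
};

/* 20-bit linear address: segment * 16 + offset. */
long linear(struct seg_off p)
{
    return (long)p.seg * 16L + (long)p.off;
}

/* Element difference taken on full linear addresses in long arithmetic,
   which is what a subtraction spanning more than 64K needs. */
long huge_diff(struct seg_off hi, struct seg_off lo, long elem_size)
{
    return (linear(hi) - linear(lo)) / elem_size;
}
```

In this model 1000:0010 and 1001:0000 (hex) are the same address, so huge_diff reports 0 for them, whereas a subtraction that looks only at the offsets would see a difference of 16 bytes.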

With this I think enough (and more) has been said about the behaviour
of the 8086 and the compilers. However, I'd still like somebody
with the dpANS to confirm whether or not this is a bug
- does it say anything about when pointer arithmetic may
fail because of overflow?

------------------------------------------------------------------
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

msb@sq.uucp (Mark Brader) (01/10/89)

Of the code:
		static int a[30000];
		printf("%d\n",&a[30000]-a);

Someone says:
> > I have been told that dpANS explicitly states that the address of
> > "one-after-last" element of an array may be taken, and subtractions 
> > like the above are legal and should give correct result.

And Doug Gwyn says:
> Almost. ... the result is of type ptrdiff_t (defined in <stddef.h>).
> The example above assumes that ptrdiff_t is int ...

Right so far.  But in addition, it's possible for a valid expression to
result in an overflow.  This is not a problem in the particular example
since 30000 can't overflow an int, but it's permissible for subscripts to
run higher than the maximum value that ptrdiff_t can contain.  In that
case, the analogous subtraction "like the above" would not work.

Section 3.3.6 in the October dpANS says:
#  As with any other arithmetic overflow, if the result does not fit in
#  the space provided, the behavior is undefined.
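
A program can guard against this case by checking an element count before relying on a pointer difference of that size. The sketch below assumes PTRDIFF_MAX from <stdint.h>, which comes from C99 and so postdates the draft quoted here; count_fits_ptrdiff is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an element count n is representable in ptrdiff_t,
   i.e. if subtracting pointers n elements apart cannot overflow. */
int count_fits_ptrdiff(size_t n)
{
    return (uintmax_t)n <= (uintmax_t)PTRDIFF_MAX;
}
```

For the 30000-element array in the thread the check passes on any implementation whose ptrdiff_t has at least 16 bits.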

Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com
	A standard is established on sure bases, not capriciously but with
	the surety of something intentional and of a logic controlled by
	analysis and experiment. ... A standard is necessary for order
	in human effort.				-- Le Corbusier