[comp.lang.c] correct code for pointer subtraction

carlp@iscuva.ISCS.COM (Carl Paukstis) (12/20/88)

(I apologize in advance for the length of this and for omitting several
comments by other folks.  I wanted to get the whole of the main context
since I am crossposting to comp.lang.c and redirecting followups there.)

Eric Gisin at Mortice Kern Systems writes:
>
>How come I can't find a compiler that generates correct
>code for pointer subtraction in C on 8086s?
>Neither Turbo, Microsoft, or Watcom do it right.
>Here's an example:
>
>struct six {
>	int i[3];		/* six bytes, at least for MSC (comment by C.P.) */
>};
>
>int diff(struct six far* p, struct six far* q) {
>  	return p - q;
>}
>
>main(void) {
>	struct six s[1];
>	printf("%d\n", diff(s+10000, s));	/* 10000 */
>	printf("%d\n", diff(s, s+100));		/* -100 */
>}
>
>All of the compilers I tried computed a 16 bit difference,
>then sign extended it before dividing.
>This does not work if the pointers differ by more than 32K.

(NOTE CRITICAL POINT FOR ERIC'S COMPLAINT:  the difference between s and
s+10000 is 60,000 bytes - easily less than the 64K segment limit)

Then I (Carl Paukstis) pick a nit and respond:

>Of course, the code you posted is NOT legal, since the two pointers in the
>example *do not* point inside the same object.  You have verified that the
>incorrect code is generated when you IN FACT declare "struct six s[10000]"?
>If so, it's a bona-fide bug.  But if it won't work with your example, the
>worst conclusion you can directly draw is that your example is "not 
>conformant".

And Eric (thoroughly frustrated by the Intel architecture by now) responds:

>Summary: Oh my god
>
>No, I DO NOT have to verify that it still generates incorrect code
>when I declare "s" as s[10000]. "diff" is a global function,
>and could be called from another module with a legal object.
>A compiler with a dumb linker cannot generate 
>code for diff depending on how it is called in that module.

OK, I admit, I was picking a nit.  I stand by my original comment, but please
note that I wasn't claiming that it DID work, only that Eric's posted
code didn't PROVE it didn't work.

(In fact, of course, it does NOT work, even if one defines s[10000])

Anyway, I got interested and did some actual research (who, me? find facts 
before I post? nontraditional for me, I admit) with Microsoft C 5.1.  I even
went so far as to read the manual.  In chapter 6 (Working With Memory
Models) of the _Microsoft C Optimizing Compiler User's Guide_, I find
table 6.1 (Addressing of Code and Data Declared with near, far, and
huge).  The row for "far", column for "Pointer Arithmetic", says "Uses
16 bits".  Hmmm. This is consistent with Eric's results, if a tad
ambiguous - they only use 16 bits for ALL the arithmetic, including the
(required) signedness of the address difference.

I also find in section 6.3.5 (Creating Huge-Model Programs), the
following paragraph:

"Similarly, the C language defines the result of subtracting two
pointers as an _int_ value.  When subtracting two huge pointers,
however, the result may be a _long int_ value.  The Microsoft C
Optimizing Compiler gives the correct result when a type cast like the
following is used:
    (long)(huge_ptr1 - huge_ptr2)"

So, I altered the "diff()" function as follows:

long diff(struct six huge* p, struct six huge* q) {
  	return (long)(p - q);
}

(and changed the printf() format specs to "%ld")

and left the rest of the program exactly as given in the original post.
No great surprise, it works fine.  Compile with any memory model
{small|medium|compact|large|huge} and it still works.  Split the code so
that diff() is defined in a different source file (but put a 
prototype declaration in the file with main()) and it still works.  The
prototype causes the arguments to be converted to a pointer type for which
MSC can do the subtraction correctly.  The cast to the return type is
apparently necessary for subtracting "huge" pointers - even in "huge"
model.  If one removes the model designators (far or huge) from the
prototype for diff() and compiles in "huge" model, it is still necessary
to cast and return a long.  My code presented above seems a complete and
fairly non-intrusive solution; it works for any compilation memory model.
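
For the record, here is the whole test program with that change folded in,
much as I compiled it.  (A sketch only: it assumes the MSC-style "huge"
keyword discussed above, and - as I nitpicked earlier - s+10000 still points
outside the declared object, since this is just Eric's original test case.)

#include <stdio.h>

struct six {
	int i[3];		/* six bytes, at least for MSC */
};

/* when diff() lives in another source file, this prototype goes in the
   file with main() so the arguments get converted to huge pointers */
long diff(struct six huge *p, struct six huge *q);

long diff(struct six huge *p, struct six huge *q) {
	return (long)(p - q);	/* the cast forces a full 32-bit result */
}

main(void) {
	struct six s[1];
	printf("%ld\n", diff(s + 10000, s));	/* 10000 */
	printf("%ld\n", diff(s, s + 100));	/* -100 */
}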

Eric: does this prototype declaration and return type satisfy your needs
to avoid grepping through thousands of lines of code and changing same?

Gentlepersons all: is this about the best job Microsoft could have done,
given the wonderfulness of Intel segmented address space?

What's the moral of the story? (nb: not necessarily intended for Eric,
who I'm sure is aware of all this):

1)  Examine the manuals for odd requirements of your target environment
    with utmost care.  Experiment in this environment.
2)  Use prototypes whenever you have a compiler that supports them.
    They can be a BIG help in odd situations like this.
3)  Avoid huge (in the abstract, not Intel/Microsoft, sense) data
    objects whenever possible.
4)  Avoid Intel-based systems whenever possible :-)
5)  all of the above
6)  1) and 2)
7)  Isn't this article too damned long already?  Shaddup!
-- 
Carl Paukstis    +1 509 927 5600 x5321  |"The right to be heard does not
                                        | automatically include the right
UUCP:     carlp@iscuvc.ISCS.COM         | to be taken seriously."
          ...uunet!iscuva!carlp         |                  - H. H. Humphrey

egisin@mks.UUCP (Eric Gisin) (12/22/88)

In article <2245@iscuva.ISCS.COM>, carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
> Eric: does this prototype declaration and return type satisfy your needs
> to avoid grepping through thousands of lines of code and changing same?

It was fairly easy to find the relevant code; it was all of the form
	... (ptr - array) ...
so all I had to do was grep for "array".

The program was compiled in the large model, so "array" was implicitly "far*".
There was no way I could declare "array" huge;
the performance loss would be too great.

I had to change the code, and since this is a portable program
and I don't like scattering "#if PC" all over the place,
I defined a macro PTRDIFF(p,q) and replaced all occurrences
of (ptr-array) with PTRDIFF(ptr,array).

There are several possible definitions for PTRDIFF:
#define	PTRDIFF(p,q) (int)((TYPE huge*)(p) - (TYPE huge*)(q))
where TYPE has to be replaced with the type of p and q,
  or
#define	PTRDIFF(p,q) (int)(((long)(unsigned)(p) - (long)(unsigned)(q)) / sizeof(*q))

Both involve calls to library support functions, but
there is a more efficient definition of PTRDIFF that works
when p>=q holds (which was the case here: I am deriving a non-negative
array index from a pointer to an array element).  It is:

/* machine dependencies */
#if !PC
#define	PTRDIFF(p, q)	((p) - (q))
#else
/* 8086 compiler writers are incapable of generating correct code */
#define	PTRDIFF(p, q)	(((unsigned)(p) - (unsigned)(q)) / sizeof(*q))
#endif

All it does is subtract the pointer offsets, resulting in an
*unsigned* (not signed) size_t which I then divide by the object's size.
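
Here is roughly how the macro slots into otherwise portable code (a sketch
only; the struct, the table, and index_of() are made-up names for
illustration, not taken from the real program):

/* machine dependencies */
#if !PC
#define	PTRDIFF(p, q)	((p) - (q))
#else
#define	PTRDIFF(p, q)	(((unsigned)(p) - (unsigned)(q)) / sizeof(*(q)))
#endif

struct entry { char name[8]; int value; };

/* derive a non-negative index from a pointer to a table element;
   ep is known to point at or beyond table, so p >= q holds */
int index_of(struct entry *ep, struct entry *table)
{
	return (int)PTRDIFF(ep, table);
}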

rwwetmore@grand.waterloo.edu (Ross Wetmore) (12/26/88)

In article <2245@iscuva.ISCS.COM> carlp@iscuva (Carl Paukstis) writes:
>Eric Gisin at Mortice Kern Systems writes:
>>How come I can't find a compiler that generates correct
>>code for pointer subtraction in C on 8086s?
>>Neither Turbo, Microsoft, or Watcom do it right.
>>
>>All of the compilers I tried computed a 16 bit difference,
>>then sign extended it before dividing.
>>This does not work if the pointers differ by more than 32K.
>
>(NOTE CRITICAL POINT FOR ERIC'S COMPLAINT:  the difference between s and
>s+10000 is 60,000 bytes - easily less than the 64K segment limit)
  The 64K segment limit has little to do with it.  The 16-bit architecture,
i.e. the 16-bit _int_, is the determining factor.

>"Similarly, the C language defines the result of subtracting two
>pointers as an _int_ value.  
  ... as you posted.

>Gentlepersons all: is this about the best job Microsoft could have done,
>given the wonderfulness of Intel segmented address space?
>
>4)  Avoid Intel-based systems whenever possible :-)
  ... but forgot when you let your prejudices take control.

  Have you tried the same on a 32 bit VAX where the addresses differ by
more than 2**31 - 1? Oops, DEC ducked this one by putting the 'negative'
addresses into a separate 'system' space so the address space is still
describable by a positive _int_. However, is the point not clear ... ?

>Carl Paukstis    +1 509 927 5600 x5321  

Ross W. Wetmore                 | rwwetmore@water.NetNorth
University of Waterloo          | rwwetmore@math.Uwaterloo.ca
Waterloo, Ontario N2L 3G1       | {uunet, ubc-vision, utcsri}
(519) 885-1211 ext 4719         |   !watmath!rwwetmore

carlp@iscuva.ISCS.COM (Carl Paukstis) (01/04/89)

In article <22905@watmath.waterloo.edu> rwwetmore@grand.waterloo.edu (Ross Wetmore) writes:
>In article <2245@iscuva.ISCS.COM> carlp@iscuva (Carl Paukstis) writes:
>>Eric Gisin at Mortice Kern Systems writes:
>>>How come I can't find a compiler that generates correct
>>>code for pointer subtraction in C on 8086s?
>>>Neither Turbo, Microsoft, or Watcom do it right.
>>>
>>>All of the compilers I tried computed a 16 bit difference,
>>>then sign extended it before dividing.
>>>This does not work if the pointers differ by more than 32K.
                                                          ^^^ BYTES!
>>
>>(NOTE CRITICAL POINT FOR ERIC'S COMPLAINT:  the difference between s and
>>s+10000 is 60,000 bytes - easily less than the 64K segment limit)

>  The 64K segment limit has little to do with it. The 16 bit architecture
>ie 16 bit _int_ is the determining factor.

I'm not sure which side of this I really want to argue :-)

In Eric's original code, the difference was between two pointers to
structures, each structure six bytes long.  The pointer difference, if
properly computed, comes out 10,000 - which I would think is fairly easy to
represent in a 16-bit int.  The difference in BYTES (a necessary
intermediate step in the generated code) doesn't fit in a 16-bit signed
int, which Microsoft (sort of) acknowledges in the manual passage I quoted
about casting to long.  Apparently they do "the right thing" when the
pointers are "huge" - they do a 32-bit (20-bit?) difference using the
segment registers.
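
To make the failure concrete, here is a tiny sketch of the arithmetic the
broken code performs on Eric's example, using a 16-bit type to stand in for
the compilers' int (it assumes a 16-bit, two's-complement short, as on the
machines under discussion):

#include <stdio.h>

main(void)
{
	long bytes = 10000L * 6;	/* true byte difference: 60,000 */
	short clipped = (short)bytes;	/* kept in 16 bits: 60000 -> -5536 */

	printf("correct element difference: %ld\n", bytes / 6);   /* 10000 */
	printf("what the compilers compute:  %d\n", clipped / 6); /* -922, not 10000 */
}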

This is what prompted my (admittedly somewhat muddled) remark about 
"CRITICAL POINT".  They provide a way to get the right answer, but only
when you use "huge" pointers, which include the segment information.

>  ... but forgot when you let your prejudices take control.

OK, I'm not happy about segmented address space, at least the Intel
version.  I do find the MS-DOS software base useful, and I even kind of
enjoy the arcana of 80?86 PC's - call me a masochist.

>                                 However, is the point not clear ... ?

Which point was that?  What DOES happen in other 16-bit-int environments?
Would somebody care to run Eric's example and let me know the outcome?  I'm
tempted to agree with Eric that a compiler which doesn't get the END RESULT
of the pointer arithmetic right is broken.  At least Microsoft provides a
way to get the correct result, albeit with some "unusual" coding - what
does it take to get the right result in another 16-bit-int environment?

-- 
Carl Paukstis    +1 509 927 5600 x5321  |"The right to be heard does not
                                        | automatically include the right
UUCP:     carlp@iscuvc.ISCS.COM         | to be taken seriously."
          ...uunet!iscuva!carlp         |                  - H. H. Humphrey

blarson@skat.usc.edu (Bob Larson) (01/04/89)

In article <2254@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:

[discussion of Intel brain damage mostly omitted]

>At least Microsoft provides a
>way to get the correct result, albeit with some "unusual" coding - what
>does it take to get the right result in another 16-bit-int environment?

The result of pointer subtraction should be a long if an int might not be
big enough to hold it.  A couple of bugs in beta-test Mg were related to
this.  This seems to be strictly an Intel bug.  (I've also used C compilers
where pointer subtraction resulted in a 16-bit int -- but only 64K bytes of
memory were addressable.)
-- 
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list:	info-prime-request%ais1@ecla.usc.edu
			oberon!ais1!info-prime-request

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (01/06/89)

In article <PINKAS.89Jan3082456@hobbit.intel.com> pinkas@hobbit.intel.com (Israel Pinkas ~) writes:

>>   > >         static int a[30000];
>>   > >         printf("%d\n",&a[30000]-a);

>                                                Let's ignore the fact that
>there is no a[30000] (and that taking its address is invalid). 

I have been told that the dpANS explicitly states that the address of
the "one-after-last" element of an array may be taken, and subtractions
like the above are legal and should give the correct result.
I do not have access to the dpANS - could somebody who does
please look this up?
In any case, all compilers I know of do it just fine
(unless some kind of overflow occurs, as in this very example -
but that's independent of how big the array is declared)
and a lot of existing code does rely on it.


Regarding the original problem, it *is* possible to do the subtraction
correctly, although not simply by using unsigned division.
Here is one way I think would work (on the left is what Turbo C
generates, for comparison):

 Turbo C generates:             corrected:

                                xor     dx,dx
 mov    ax,&a[30000]            mov     ax,&a[30000]
 sub    ax,a                    sub     ax,a
 mov    bx,2                    mov     bx,2
 cwd                            sbb     dx,dx
 idiv   bx                      idiv    bx
        
I.e., take advantage of the fact that we can treat carry and
AX as one 17-bit register containing the result of subtraction.
It will cost a few clock cycles, I'm afraid.
In this particular case it can actually be done with
no speed penalty with the following trick:

 mov	ax,&a[30000]
 sub	ax,a
 rcr	ax

In the general case it seems we must choose between doing it fast
and getting it right every time.  Perhaps a compiler option is in
order, for those who would otherwise stick with an old compiler
version to save the two cycles or whatever it costs...

Tapani Tarvainen
------------------------------------------------------------------
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi
BitNet:    tarvainen@finjyu

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/07/89)

In article <18683@santra.UUCP> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>>>   > >         static int a[30000];
>>>   > >         printf("%d\n",&a[30000]-a);
>I have been told that dpANS explicitly states that the address of
>"one-after-last" element of an array may be taken, and subtractions 
>like the above are legal and should give correct result.

Almost.  Address of "one after last" is legal for all data objects,
but of course cannot be validly used to access the pseudo-object.
All objects can be considered to be in effect arrays of length 1.
Pointers to elements of the same array object can be subtracted;
the result is of type ptrdiff_t (defined in <stddef.h>).
The example above assumes that ptrdiff_t is int,
which is not guaranteed by the pANS.
Casting to another, definite, integral type such as (long)
would make the result portably usable in printf() etc.
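
A minimal sketch of that portable form, assuming an ANSI compiler with
<stddef.h> (on a conforming implementation this prints 30000; the compilers
discussed above of course still trip over the 16-bit intermediate
arithmetic):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	static int a[30000];
	ptrdiff_t d = &a[30000] - a;	/* one-past-the-end address is legal */

	/* ptrdiff_t need not be int, so convert before printing */
	printf("%ld\n", (long)d);
	return 0;
}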

chasm@killer.DALLAS.TX.US (Charles Marslett) (01/08/89)

In article <2254@iscuva.ISCS.COM>, carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
> In article <22905@watmath.waterloo.edu> rwwetmore@grand.waterloo.edu (Ross Wetmore) writes:
> I'm not sure which side of this I really want to argue :-)

I have been on one side for a long time, but I've got to agree with this
comment -- can't we go on to something else (how about Windows debuggers
... no, forget that!)?

>                 ...    What DOES happen in other 16-bit-int environments?
> Would somebody care to run Eric's example and let me know the outcome?  I'm
> tempted to agree with Eric that a compiler which doesn't get the END RESULT
> of the pointer arithmetic right is broken.  At least Microsoft provides a
> way to get the correct result, albeit with some "unusual" coding - what
> does it take to get the right result in another 16-bit-int environment?

The other environments I have used with 16-bit integers and 32-bit pointers
all converted the intermediate result to long (so got the right result) or
paid attention to the overflow and carry results of the 16-bit arithmetic
(so they also got the right answer).
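
In C terms, the first of those strategies amounts to something like the
following sketch (assuming 32-bit linear pointers, as in the environments
described; elem_diff() is a made-up name, and the pointer-to-long
conversions are implementation-defined):

/* widen to 32 bits first, then subtract and divide */
long elem_diff(int *p, int *q)
{
	return ((long)p - (long)q) / (long)sizeof(int);
}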

> -- 
> Carl Paukstis    +1 509 927 5600 x5321  |"The right to be heard does not
>                                         | automatically include the right
> UUCP:     carlp@iscuvc.ISCS.COM         | to be taken seriously."
>           ...uunet!iscuva!carlp         |                  - H. H. Humphrey

===========================================================================
Charles Marslett
STB Systems, Inc.  <== Apply all standard disclaimers
Wordmark Systems   <== No disclaimers required -- that's just me
chasm@killer.dallas.tx.us

pja@ralph.UUCP (Pete Alleman) (01/09/89)

In article <2254@iscuva.ISCS.COM> carlp@iscuva.ISCS.COM (Carl Paukstis) writes:
>In article <22905@watmath.waterloo.edu> rwwetmore@grand.waterloo.edu (Ross Wetmore) writes:
>>In article <2245@iscuva.ISCS.COM> carlp@iscuva (Carl Paukstis) writes:
>>>Eric Gisin at Mortice Kern Systems writes:
>>>>How come I can't find a compiler that generates correct
>>>>code for pointer subtraction in C on 8086s?
>>>>Neither Turbo, Microsoft, or Watcom do it right.
>>>>
>>>>All of the compilers I tried computed a 16 bit difference,
>>>>then sign extended it before dividing.
>>>>This does not work if the pointers differ by more than 32K.
>                                                          ^^^ BYTES!
>>>
>>>(NOTE CRITICAL POINT FOR ERIC'S COMPLAINT:  the difference between s and
>>>s+10000 is 60,000 bytes - easily less than the 64K segment limit)
>
>>  The 64K segment limit has little to do with it. The 16 bit architecture
>>ie 16 bit _int_ is the determining factor.

But on a 16 bit machine you have the 16 bit result AND A CARRY FLAG.  That
gives the full 17 bit representation needed for the signed result!

>Which point was that?  What DOES happen in other 16-bit-int environments?
>Would somebody care to run Eric's example and let me know the outcome? 

I tried this simple program on a PDP-11 running K&R's V7 compiler:

	int a[20000];
	main ()
	{
		printf ("%d\n", &a[20000] - &a[0]);
	}

Sure enough, it gets the right answer! (20000)

Here is an example of the proper code on a 16 bit machine:
	_main:
		jsr	r5,csv
		mov	$-61700+_a,r1
		sub	$_a,r1
		bic	r0,r0
		sbc	r0
		div	$2,r0
		mov	r0,(sp)
		mov	$L4,-(sp)
		jsr	pc,*$_printf
		tst	(sp)+
		jmp	cret

The problem is NOT a limitation of the processor!  Some compilers just
generate WRONG CODE.  (I'm glad such stupidity is usually restricted to
IBM FECES and MESSY-DOS)

-- 
Pete Alleman
	ralph!pja or
	digitran!pja

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (01/09/89)

In article <9878@drutx.ATT.COM> mayer@drutx.ATT.COM (gary mayer) writes:
>In article <18123@santra.UUCP>, tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>
>> The same error occurs in the following program 
>> (with Turbo C 2.0 as well as MSC 5.0):
>>
>> main()
>> {
>>         static int a[30000];
>>         printf("%d\n",&a[30000]-a);
>> }
>>
>> output:  -2768
>
>I grant that this is probably not the answer you would like, but it
>is the answer you should expect once pointer arithmetic is understood.
>
[deleted explanation (very good, btw) about why this happens]
>
>In summary, be careful with pointers on these machines, and try to
>learn about how things work "underneath".  The C language is very
>close to the machine, and there are many times that this can have
>an effect - understanding and avoiding these where possible is what
>writing portable code is all about.

I couldn't agree more with the last paragraph.  My point, however,
was that the result above is
(1) Surprising: it occurs in the small memory model, where both ints
and pointers are 16 bits, and the result fits in an int.
When I use a large data model I expect trouble with pointer
arithmetic and cast to huge when necessary, but it shouldn't
be necessary with the small model (or at least the manual
should clearly say it is).
(2) Unnecessary: code that does the subtraction correctly has
been presented here.
(3) WRONG according to K&R or the dpANS -- or does either say that
pointer subtraction is valid only when the difference *in bytes*
fits in an int?  If not, I continue to consider it a bug.

Another matter is that the above program isn't portable
anyway, because (as somebody else pointed out)
the pointer difference isn't necessarily an int (according to the dpANS).
Indeed, in Turbo C the difference of huge pointers is long,
and the program can be made to work as follows:

         printf("%ld\n", (int huge *)&a[30000] - (int huge *)a);

Actually all of the large data models handle this example correctly (in
Turbo C), and thus casting to (int far *) also works here, but as soon as
the difference exceeds 64K (or the pointers have different segment values)
they'll break too; only huge is reliable then (but this the manual _does_
explain).

To sum up: near pointers are reliable up to 32K,
far up to 64K, and anything more needs huge.

With this I think enough (and more) has been said about the behaviour
of the 8086 and its compilers; however, I'd still like somebody
with the dpANS to confirm whether or not this is a bug
- does it say anything about when pointer arithmetic may
fail because of overflow?

------------------------------------------------------------------
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

msb@sq.uucp (Mark Brader) (01/10/89)

Of the code:
		static int a[30000];
		printf("%d\n",&a[30000]-a);

Someone says:
> > I have been told that dpANS explicitly states that the address of
> > "one-after-last" element of an array may be taken, and subtractions 
> > like the above are legal and should give correct result.

And Doug Gwyn says:
> Almost. ... the result is of type ptrdiff_t (defined in <stddef.h>).
> The example above assumes that ptrdiff_t is int ...

Right so far.  But in addition, it's possible for a valid expression to
result in an overflow.  This is not a problem in the particular example
since 30000 can't overflow an int, but it's permissible for subscripts to
run higher than the maximum value that ptrdiff_t can contain.  In that
case, the analogous subtraction "like the above" would not work.

Section 3.3.6 in the October dpANS says:
#  As with any other arithmetic overflow, if the result does not fit in
#  the space provided, the behavior is undefined.
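
A hedged sketch of the case being described, assuming an implementation
whose ptrdiff_t is a 16-bit int but which allows single objects larger
than 32767 bytes (as the large-model compilers discussed above do); buf
and gap() are made-up names:

	static char buf[40000];

	long gap(void)
	{
		/* both pointers are valid and the expression is legal, yet
		   the true result, 40000, does not fit in a 16-bit
		   ptrdiff_t - so by 3.3.6 the behavior is undefined */
		return (long)(&buf[40000] - buf);
	}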

Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com
	A standard is established on sure bases, not capriciously but with
	the surety of something intentional and of a logic controlled by
	analysis and experiment. ... A standard is necessary for order
	in human effort.				-- Le Corbusier