[comp.lang.c] Adding two pointers

mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) (05/06/89)

In this article, I give two possible definitions of pointer addition.
I then indicate that these definitions cannot be implemented
efficiently on most architectures.  I show that Peter de Silva's card
analogy is misleading.  At the end, I question the need for pointer
addition, and assign extra-credit problems for the interested student.
:-)

In article <4093@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <563@lzaz.ATT.COM>, hutch@lzaz.ATT.COM (R.HUTCHISON) writes:
>> 	midpoint_pointer = (start_pointer + end_pointer) / 2;
>You're right. It's a valid operation.

At most, it's "valid" only in some "human semantic" sense.  As has
been pointed out before, it's not "valid" in the C language at
present.

How in creation do you define pointer addition?

One possible definition is this: 'For pointers P and Q of the same
type, both non-null and [WLOG] pointing into the same array, the value
of P+Q is defined to be the bit pattern which is the unsigned integral
sum of the bit patterns of P and Q.'  This is a bad definition,
because pANS C quite rightly says nothing about "bit patterns",
particularly of pointers, but I'll assume that you know what I mean.

Can we just add P and Q's internal bit representation as unsigned
integers?  No -- even one addition can overflow.

For example, suppose that all machine addresses are in the range
0 to 255.  Let Start be at 200, and End be at 220 (both are valid
addresses).  Suppose we want to compute "Total = Start + End;".
        Start + End = 200 + 220 = "420", which cannot be represented.
        Assume the result is mod 256 (a la unsigned ops in pANS C)
        Total is "420" % 256 = 164.
So Total/2, which by Peter's reasoning should be the average of Start
and End, is 82 -- NOT what was intended.  (If you prefer addresses to
be -128 to 127, consider Start = 72 and End = 92.  The same problem
occurs, so I consider only unsigned addition here.)

If overflow can occur for just one addition of two valid addresses,
there's not much use for the concept.

How would you fix this problem?  If all data addresses (due to the
stack, malloc, or static data) are in the lower half of memory
addresses, overflow can't occur in one addition.  In this case,
though, the stack has to grow upward -- suddenly, the idea of adding
pointers ripples all the way back to the hardware designer!

Even if data be restricted to low memory (and note that some current
implementations put data in high memory), a statement like
        Total = Loc1 + Loc2 + Loc3;
can still cause overflow.  If you choose any fixed-size representation
for pointers, I can add as many addresses as I like and cause overflow.

Perhaps, just like for other data (integers, floating-point), it might
be the user's responsibility to avoid overflow.  In that case, I have
to use
        Average = (End - Start) / 2 + Start;
anyway, or the equivalent for the three-Loc case, and there's no use
for pointer addition.

Here's another possible definition, more in line with pANS C concepts.
Maybe you think it's too restrictive, but I intend to show that even
this restrictive definition can't be implemented efficiently.

'Two pointers P and Q may be added if they are of the same type and
point into the same array A.  If P points at the Ith element of A, and
Q points at the Jth element of A, then P+Q has the same type as P and
Q, and P+Q points to the (I+J)th element of A, if there is one.  Under
any other conditions, the addition is undefined.'

This has different effects from the previous definition.  If
P=Q=&Array[0], here P+Q=P=Q.  Before, P+Q=P only if there's overflow
or if Q's internal bit-pattern representation is all zeroes.

Consider an implementation of this.  Let Start and End be pointers to
elements an array whose first address is Array.  If we want to compute
"Total = Start + End;", we can compute it something like this:
        Offset1 = Start - Array;
        Offset2 = End   - Array;
        Total = Array + Offset1 + Offset2;
or just
        Total = Start + End - Array;
If it overflows, the result should be undefined anyway.

But how do we know what Array is, given just Start or End?  Start and
End may have been computed in one function, and passed to a function
in a separate file.  The only way to know the value of Array is if all
pointers are implemented as pairs "(pointer,base)", like
"(Start,Array)" and "(End,Array)".

Then all pointers are twice as long as they used to be.  They take up
twice as much space on the stack as arguments.  They probably take
twice as long to manipulate.

One of the uses of pointers is to move around in arrays.  But if
pointers have to be pairs of addresses, we should use subscripts
instead, for many architectures.  Pointer arithmetic is generally
useful only for moving within arrays.  If subscripting is more
efficient, why have pointer arithmetic at all?

Are there any other possible implementations?

Here's how Peter da Silva's card analogy fails:
>What's the average of the 7 of clubs and the 9 of clubs?

Overflow: if you add them, you get the 16 of clubs, which does not
exist.  Peter carried more precision in the intermediate results than
is permitted in the representation.

Knowledge of the base: Peter knows that the low card is the 2 (or ace,
if you like).  With a bare pointer, all we know is that we're looking
at the (Querg) of clubs and the (Querg+2) of clubs.

In short, we either have to make each suit arbitrarily large, or we
can add only the low cards, or we have to know the low card in each
suit, or we have to avoid addition altogether.

One question has not yet been answered, and it makes this whole fiasco
moot.  What additional power does pointer addition give?  What can be
done with it that can't be done about as easily as with pointer
subtraction?  The example given, taking an average, is not such a
case.  To find the point x% of the way between Start and End, you
could use
        x*Start + (1-x)*End
(reducing to (Start+End)/2 in the special case x=50%), but you can
also use
        x*(End-Start) + Start
which uses only pointer subtraction, is more efficient in the general
case, and takes only one extra add in the x=50% case.

Are there any other applications in which pointer addition is useful?

For extra credit: :-)

Suppose all addresses on the machine in question are integers in the
range 0 to 255 (like a small VAX), and all integral types (signed and
unsigned) take exactly 8 bits.

1. Give a simple implementation of pointer subtraction as defined in
pANS C, for P and Q pointers to some type T, ignoring overflow.

2. The difference of two pointers must be storable in some signed
integral type.  Overflow is not permitted.  Give a simple condition on
the implementation that prevents overflow.  (Hint: If we have an array
in memory, "char A[201];", what are the values of "&A[200] - &A[0]"
and "&A[0] - &A[200]"?  What does this imply about the declaration of
A?)

3. I do a little sleight-of-hand above.  Peter de Silva's original
example was
 	midpoint_pointer = (start_pointer + end_pointer) / 2;
but my example was
        Total = Start + End;
How does pANS C justify my substitution?

4. What does "WLOG" mean?

--

             Tim, the Bizarre and Oddly-Dressed Enchanter

Center for      |||  Internet, BITNET:  mcdaniel@uicsrd.csrd.uiuc.edu
Supercomputing  |||  UUCP:     {uunet,convex,pur-ee}!uiucuxc!uicsrd!mcdaniel
Research and    |||  ARPANET:  mcdaniel%uicsrd@uxc.cso.uiuc.edu
Development,    |||  CSNET:    mcdaniel%uicsrd@uiuc.csnet
U of Illinois   |||  DECnet:   GARCON::"mcdaniel@uicsrd.csrd.uiuc.edu"

chris@mimsy.UUCP (Chris Torek) (05/06/89)

In article <922@garcon.cso.uiuc.edu> mcdaniel@uicsrd.csrd.uiuc.edu
(Tim McDaniel) writes:
>One question has not yet been answered, and it makes this whole fiasco
>moot.

Indeed, it is the heart of the matter.  (But beware the word `moot',
for it means both `not worthy of discussion' and `worthy of discussion'.
A `moot' is a meeting, where a group discusses some topic of interest.
The discussion is often interminable and leads to no resolution; hence
the double meaning.)

>What additional power does pointer addition give?  What can be
>done with it that can't be done about as easily as with pointer
>subtraction?

Presumably the reason someone might want to write

	some_type base[N], *low, *high, *mid;
	...
	mid = (low + high) / 2;

is to avoid the scaling that occurs with

	mid = ((low - base) + (high - base)) / 2 + base;

which, if compiled in the most straightforward (and stupid) manner,
yeilds machine code much like this:

	// pretend everything is an int
	// mid <- ( (low-base)/size + (high-base)/size ) * size + base
	sub	base,low,r0
*1	div	r0,size,r0	# `size' being a constant
	sub	base,high,r1
*2	div	r1,size,r1
	add	r0,r1,r0
	div	r0,$2,r0	# divide by 2 is really shift right 1
*3	mul	r0,size,r0
	add	r0,base,mid

The three lines marked with `*' are unnecessary.  They cannot be
deleted if the code is expressed as in the second comment above,
because it is not clear that the division by `size' (*1 and *2) does
not discard low bits---for instance, if size is 2, that it does not
force the result of the subtractions to be even.  We, however, know
that both `low' and `high' point into the same array, and that the
result of the two `sub's just before the starred `div's are in fact
a multiple of `size', that no bits are discarded, and that the marked
`div' and `mul' instructions can in fact be factored out.

But subtraction of pointers is a common occurrence in C, and any
optimising compiler worth its name should note this.  Subtraction of
two pointers is only defined if both point to elements of the same
array; and when they do, it is guaranteed that the difference is a
whole number of elements, or in other words a proper multiple of `size'.
So the compiler should reduce this to

	sub	base,low,r0
	sub	base,high,r1
	add	r0,r1,r0
	div	r0,$2,r0
	add	r0,base,mid

which is what we need anyway (to avoid overflow).

(Sad to say, gcc does not win this one....  It does, however, avoid
divides by using sub+bge+add+label+shift.  The bge+add+label sequence
is unnecessary.  Is rms listening?)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

njk@freja.diku.dk (Niels J|rgen Kruse) (05/06/89)

In article <17340@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>
> But subtraction of pointers is a common occurrence in C, and any
> optimising compiler worth its name should note this.  Subtraction of
> two pointers is only defined if both point to elements of the same
> array; and when they do, it is guaranteed that the difference is a
> whole number of elements, or in other words a proper multiple of `size'.

The word "array" as used when saying that two pointers must
point into the same array in order to subtract them is a bit
foggy to me.

What about

struct {
  char c;
  double d;
} x[10];

Are (& x[3].d) and (& x[7].d) not pointing into x?

Is (& x[3].d - x[7].d) undefined?

What array is (char *)x pointing into?

I seem to remember to have seen statements in this newsgroup by
people enjoying gurustatus, that dpANS C provides a way to get
offsets of fields within structs. What would be the use of that
if objects like x are not flat?
-- 
         Niels J|rgen Kruse
Email    njk@diku.dk
Mail     Tustrupvej 7, 2 tv, 2720 Vanlose, Denmark

peter@ficc.uu.net (Peter da Silva) (05/07/89)

In article <922@garcon.cso.uiuc.edu>, mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) writes:
> In this article, ...  I show that Peter de Silva's card
> analogy is misleading.

It's just an analogy. But you do seem to misunderstand it.

"two of clubs" is a pointer. It's the second element in an object based at the
address "clubs". Pointer arithmetic between objects is meaningless.
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) (05/07/89)

I've thought of one more argument against pointer addition in C.  It
is of a more philosophical nature (and thus I grant that it's not as
strong as my previous ones).

There are a lot of arithmetic identities that are handy to have
around.  Granted that some don't hold, even with the comparatively-
tame integers excluding overflow.  For example, you can't make all of
the div/mod identities true simultaneously for integers:
        a%b + a/b == a
        0 <= a%b and a%b < abs(b)
        (-a)/b == a/(-b) == -(a/b)

However, one identity that holds (excluding overflow for signed
integers) is
        a + b == c    if and only if    a == c - b

Suppose addition of pointers gives a pointer, such that
        a + b == c
However,
        a == c - b
is erroneous from a type point of view -- the right-hand side is an
integer in C, but the left-hand side is a pointer.

Since pointer subtraction yields an integer in C, its inverse
operation is the addition of a pointer to an integer.  (If you want to
design a language in which pointer subtraction somehow yields another
pointer, go right ahead, but I'll never program in it, and it's not
compatable with C.)

To use the "deck of cards" analogy: in C, the difference between the 9
of clubs and the 7 of clubs is 2.  Not "2 of clubs" -- just plain 2.

As a friend just pointed out (thanks, Paul!), adding the 2 of clubs to
the 7 of clubs just gives you a two-card hand.  You *can't* add cards.
:-)

Pointers are not cards.  Pointers do not have units like "feet".  Any
analogies may work, or they may not -- no guarantees.

--

             Tim, the Bizarre and Oddly-Dressed Enchanter

Center for      |||  Internet, BITNET:  mcdaniel@uicsrd.csrd.uiuc.edu
Supercomputing  |||  UUCP:     {uunet,convex,pur-ee}!uiucuxc!uicsrd!mcdaniel
Research and    |||  ARPANET:  mcdaniel%uicsrd@uxc.cso.uiuc.edu
Development,    |||  CSNET:    mcdaniel%uicsrd@uiuc.csnet
U of Illinois   |||  DECnet:   GARCON::"mcdaniel@uicsrd.csrd.uiuc.edu"

chris@mimsy.UUCP (Chris Torek) (05/07/89)

In article <17340@mimsy.UUCP>, regarding finding the midpoint of two
pointers, I wrote:
>So the compiler should reduce this to
>
>	sub	base,low,r0
>	sub	base,high,r1
>	add	r0,r1,r0
>	div	r0,$2,r0
>	add	r0,base,mid
>
>which is what we need anyway (to avoid overflow).

RMS listened, and pointed out that this is wrong.  The result of
the divide can be halfway between two valid pointer offsets.  (E.g.,
if size is 4 and low-base=4 and high=base-8---i.e., low is &arr[1]
and high is &arr[2]---then (4+8)/12 is 6, rather than the 4 that we
should get.) This can be fixed by rounding down to the nearest multiple
of `size' (here,

	and	r0,$~3,r0

between the final `div' and `add'); in the worst case this requires
a divide and a multiply anyway.  This particular sequence is probably
not common enough to bother with, although the fact that pointer
subtraction implies remainderless division does seem worth noting.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

trebor@biar.UUCP (Robert J Woodhead) (05/08/89)

In article <17357@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>	sub	base,low,r0
>>	sub	base,high,r1
>>	add	r0,r1,r0
>>	div	r0,$2,r0
>	and	r0,$~3,r0
>>	add	r0,base,mid

Gee, if you define things as an array of pointers, you not only get
clearer source code, but more efficient machine code:

	ptr=&ptrArray[(high+low)/2]

becomes (pardon my assembly language, I'm extrapolating from your example)

	add	high,low,r0	; add high and low
	shr	r0,$1,r0	; divide by 2 (gee, a smart compiler!)
	shl	r0,$2,r0	; adjust to wordsize
	add	ptrArray,r0,r0	; add in the base address
	move	r0,ptr		; all done.

Seems to me this whole argument is spurious.  You can't add/mul/div/sub two
pointers together because the result is semantically meaningless.  You want
to redefine the language to ``hack in'' an operation that is already there,
and your hack will result in code that is harder for humans to read, and
harder for compilers to optimize.

-- 
Robert J Woodhead, Biar Games, Inc.  !uunet!biar!trebor | trebor@biar.UUCP
"The lamb will lie down with the lion, but the lamb won't get much sleep."
     -- Woody Allen.

diamond@diamond.csl.sony.junet (Norman Diamond) (05/08/89)

In article <4646@freja.diku.dk> njk@freja.diku.dk (Niels J|rgen Kruse) writes:

>What about
>
>struct {
>  char c;
>  double d;
>} x[10];
>
>Are (& x[3].d) and (& x[7].d) not pointing into x?

They point "into" x, yes, as far as the English language is concerned.
But they do not point to ELEMENTS OF x.  You have to read technicalese
more carefully than ordinary English.

>Is (& x[3].d - x[7].d) undefined?

I think you mean (& x[3].d - & x[7].d).  Again, when you read the
technicalese carefully enough, you find that the answer is yes
(maybe surprisingly).

>What array is (char *)x pointing into?

Any pointer can be cast to a (char *) pointer and back to its original
type, but the (char *) version doesn't have to have a legal meaning.
If you dereference it, you might get a core dump or worse.

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-inventing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (05/08/89)

In article <924@garcon.cso.uiuc.edu> mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) writes:
>I've thought of one more argument against pointer addition in C.  It
>is of a more philosophical nature (and thus I grant that it's not as
>strong as my previous ones).
>
Perhaps more of a mathematical nature.

> [Gives argument showing that if addition is added an equation can 
   result that has different types...]

The properties stated result in fact from some basic properties present
in arithmetic models: fields. (These models model our
arithmetic or arithmetic is based on these models - take your pick).

These properties include (math books everyone!) associativity, closure,
inverseness (having an inverse),distribution and a null element and other
pleasant niceties . 

Lets assume pointer addition (new style). There seems to be a consensus that 
this should result in a pointer type. In other words:

		P + Q = R	(where P,Q,R are pointer types)
also
		M + n = P   	( where M,P are pointer types, n is an integer)
		  ^
Now a little operator overloading is going on here. The first + and the
second + are conceptually different operators. So lets use '+' for the 2nd 
operator.

Thus
		M '+' n = P

This is something akin to scalar multiplication of vectors.

Now in "standard" arithmetic models the operation '+' would have to 
support distribution i.e.

		n '+' (P+Q) = (n'+'P) +  (n'+'Q)

Just as 3(a+b) = 3a + 3b.

However the semantics we seem to be aiming for is:

		n '+' (P+Q) = (n'+'P) + Q = P + (n'+'Q)

Now this is going to result in a "nonstandard" model with any number of
(for us) strange arithmetic properties. (Left as exercise to the reader :-).

Now theres nothing intrinsically wrong with nonstandard arithmetic models.
However the makers of C in their infinite wisdom (naive innocence?) were
probably just trying to model standard arithmetic. Which I think is as
it should be. Introducing pointer addition with the above semantics would
have introduced nonstandard arithmetic into the language.

Disclaimer: Opinions are my own, for my employers you pay extra.

karl@haddock.ima.isc.com (Karl Heuer) (05/09/89)

In article <4646@freja.diku.dk> njk@freja.diku.dk (Niels J|rgen Kruse) writes:
>The word "array" as used when saying that two pointers must point into the
>same array in order to subtract them is a bit foggy to me.  What about
>	struct { char c; double d; } x[10];
>Are (& x[3].d) and (& x[7].d) not pointing into x?

Not in this sense.  They point into (different) arrays of type `double [1]'.

>Is (&x[3].d - &x[7].d) undefined?

Yes (even after I fixed the typo).  On some machines, there does not exist an
integer k such that (&x[1].d == &x[0].d + k), so it would be meaningless to
attempt to evaluate (&x[1].d - &x[0].d) to any integer.

>What array is (char *)x pointing into?

An array of type `char [sizeof(x)]' (or equivalently, `char [10*sizeof(S)]'
where S is the struct type) whose existence is supposedly implied by the pANS
statement that `objects are composed of bytes'.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

kevinf@infmx.UUCP (Kevin Franden) (05/11/89)

Assuming that there does exist an application that wnats/needs to add
pointers, why won't this 'model' work?

	Assume an array a to be made up of elements of
	arbitrary size.  Assume further that this array has
	100 elements arranged (as we know C does) contiguously
	in memory.

	if p1 is pointing to element 37 and p2 is pointing to
	element 4, I'd like p3 to point to a[37+4] or a[41].
	
	struct{...}foo;

	foo p1,p2,p3;

	x=37;
	y=4;

	p1=a[x];
	p2=a[y];
	p3=(x+y)*sizeof(foo);


I realize that I missed part of this discussion, sorry if this has been
posted already, but won't p3 be pointing to the 41st element?  if this
is true, this is pointer addition, no?


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Kevin Franden		    UUCP: {backbone}!pyramid!infmx!kevinf
Informix Software Inc
disclaimer("I said what I said and not my employer");
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Cocaine: The big lie.

OS/2: The BIG hack.

gdtltr@vax1.acs.udel.EDU (Gary D Duzan) (05/11/89)

In article <1321@infmx.UUCP> kevinf@infmx.UUCP (Kevin Franden) writes:
=>
=>Assuming that there does exist an application that wnats/needs to add
=>pointers, why won't this 'model' work?
=>
=>	Assume an array a to be made up of elements of
=>	arbitrary size.  Assume further that this array has
=>	100 elements arranged (as we know C does) contiguously
=>	in memory.
=>
=>	if p1 is pointing to element 37 and p2 is pointing to
=>	element 4, I'd like p3 to point to a[37+4] or a[41].
=>	
=>	struct{...}foo;
=>
=>	foo p1,p2,p3;
=>
=>	x=37;
=>	y=4;
=>
=>	p1=a[x];
=>	p2=a[y];
=>	p3=(x+y)*sizeof(foo);
=>
=>
=>I realize that I missed part of this discussion, sorry if this has been
=>posted already, but won't p3 be pointing to the 41st element?  if this
=>is true, this is pointer addition, no?
=>
   No. I believe what you want is a[x+y] . p3 will be the number of
bytes taken up by x+y elements of foo. You also might be thinking
a+x+y . This should be legal and do what you want.
   Sorry if I am incorrect here, but I am a bit new to C myself. This
is a bit odd when you consider the fact that I am trying to put together
a minimal K&R C compiler (don't ask for what machine, you'll just laugh).

					Gary Duzan
					Time  Lord
				    Third Regeneration




-- 
+------------------------------------------+ UUCP: !uunet!udel.edu!vax1!gdtltr
| "Two hearts are better than one." -- Yes | Internet: gdtltr@vax1.acs.udel.edu
| "Don't listen to me; I never do." -- Doctor Who | Other: Relay through CUNYVM 
+-------------------------------------------------+        or something.

bengsig@oracle.nl (Bjorn Engsig) (05/11/89)

In article <1321@infmx.UUCP> kevinf@infmx.UUCP (Kevin Franden) writes:
=
=	struct{...}foo;
=
=	foo *p1,*p2,*p3;
/* I added the *'s in the above declarations */
/* And I would add the following: */
	int x,y;
=
=	x=37;
=	y=4;
=
=	p1=a[x];
=	p2=a[y];
=	p3=(x+y)*sizeof(foo);
=
= I realize that I missed part of this discussion, sorry if this has been
= posted already, but won't p3 be pointing to the 41st element?
Hmm, yes, but the construct is highly debatable ...
=                                                               if this
=is true, this is pointer addition, no?
No, it's integer addition, and a highly debatable computation of a pointer,
and an assignment which needs a cast.

In pointer arithmetic, you can achieve what you are trying by
	
	p3 = p2 + (p1 - a)

which means let p3 equeal the current p2 plus the number of elements between
p1 and the beginning of the array a.  Note, however that this is a pointer
plus an integer, the latter being the difference between two pointers.
-- 
Bjorn Engsig, ORACLE Europe         \ /    "Hofstadter's Law:  It always takes
Path:   mcvax!orcenl!bengsig         X      longer than you expect, even if you
Domain: bengsig@oracle.nl           / \     take into account Hofstadter's Law"