mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) (05/06/89)
In this article, I give two possible definitions of pointer addition. I then indicate that these definitions cannot be implemented efficiently on most architectures. I show that Peter de Silva's card analogy is misleading. At the end, I question the need for pointer addition, and assign extra-credit problems for the interested student. :-) In article <4093@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: >In article <563@lzaz.ATT.COM>, hutch@lzaz.ATT.COM (R.HUTCHISON) writes: >> midpoint_pointer = (start_pointer + end_pointer) / 2; >You're right. It's a valid operation. At most, it's "valid" only in some "human semantic" sense. As has been pointed out before, it's not "valid" in the C language at present. How in creation do you define pointer addition? One possible definition is this: 'For pointers P and Q of the same type, both non-null and [WLOG] pointing into the same array, the value of P+Q is defined to be the bit pattern which is the unsigned integral sum of the bit patterns of P and Q.' This is a bad definition, because pANS C quite rightly says nothing about "bit patterns", particularly of pointers, but I'll assume that you know what I mean. Can we just add P and Q's internal bit representation as unsigned integers? No -- even one addition can overflow. For example, suppose that all machine addresses are in the range 0 to 255. Let Start be at 200, and End be at 220 (both are valid addresses). Suppose we want to compute "Total = Start + End;". Start + End = 200 + 220 = "420", which cannot be represented. Assume the result is mod 256 (a la unsigned ops in pANS C) Total is "420" % 256 = 164. So Total/2, which by Peter's reasoning should be the average of Start and End, is 82 -- NOT what was intended. (If you prefer addresses to be -128 to 127, consider Start = 72 and End = 92. The same problem occurs, so I consider only unsigned addition here.) If overflow can occur for just one addition of two valid addresses, there's not much use for the concept. How would you fix this problem? If all data addresses (due to the stack, malloc, or static data) are in the lower half of memory addresses, overflow can't occur in one addition. In this case, though, the stack has to grow upward -- suddenly, the idea of adding pointers ripples all the way back to the hardware designer! Even if data be restricted to low memory (and note that some current implementations put data in high memory), a statement like Total = Loc1 + Loc2 + Loc3; can still cause overflow. If you choose any fixed-size representation for pointers, I can add as many addresses as I like and cause overflow. Perhaps, just like for other data (integers, floating-point), it might be the user's responsibility to avoid overflow. In that case, I have to use Average = (End - Start) / 2 + Start; anyway, or the equivalent for the three-Loc case, and there's no use for pointer addition. Here's another possible definition, more in line with pANS C concepts. Maybe you think it's too restrictive, but I intend to show that even this restrictive definition can't be implemented efficiently. 'Two pointers P and Q may be added if they are of the same type and point into the same array A. If P points at the Ith element of A, and Q points at the Jth element of A, then P+Q has the same type as P and Q, and P+Q points to the (I+J)th element of A, if there is one. Under any other conditions, the addition is undefined.' This has different effects from the previous definition. If P=Q=&Array[0], here P+Q=P=Q. Before, P+Q=P only if there's overflow or if Q's internal bit-pattern representation is all zeroes. Consider an implementation of this. Let Start and End be pointers to elements an array whose first address is Array. If we want to compute "Total = Start + End;", we can compute it something like this: Offset1 = Start - Array; Offset2 = End - Array; Total = Array + Offset1 + Offset2; or just Total = Start + End - Array; If it overflows, the result should be undefined anyway. But how do we know what Array is, given just Start or End? Start and End may have been computed in one function, and passed to a function in a separate file. The only way to know the value of Array is if all pointers are implemented as pairs "(pointer,base)", like "(Start,Array)" and "(End,Array)". Then all pointers are twice as long as they used to be. They take up twice as much space on the stack as arguments. They probably take twice as long to manipulate. One of the uses of pointers is to move around in arrays. But if pointers have to be pairs of addresses, we should use subscripts instead, for many architectures. Pointer arithmetic is generally useful only for moving within arrays. If subscripting is more efficient, why have pointer arithmetic at all? Are there any other possible implementations? Here's how Peter da Silva's card analogy fails: >What's the average of the 7 of clubs and the 9 of clubs? Overflow: if you add them, you get the 16 of clubs, which does not exist. Peter carried more precision in the intermediate results than is permitted in the representation. Knowledge of the base: Peter knows that the low card is the 2 (or ace, if you like). With a bare pointer, all we know is that we're looking at the (Querg) of clubs and the (Querg+2) of clubs. In short, we either have to make each suit arbitrarily large, or we can add only the low cards, or we have to know the low card in each suit, or we have to avoid addition altogether. One question has not yet been answered, and it makes this whole fiasco moot. What additional power does pointer addition give? What can be done with it that can't be done about as easily as with pointer subtraction? The example given, taking an average, is not such a case. To find the point x% of the way between Start and End, you could use x*Start + (1-x)*End (reducing to (Start+End)/2 in the special case x=50%), but you can also use x*(End-Start) + Start which uses only pointer subtraction, is more efficient in the general case, and takes only one extra add in the x=50% case. Are there any other applications in which pointer addition is useful? For extra credit: :-) Suppose all addresses on the machine in question are integers in the range 0 to 255 (like a small VAX), and all integral types (signed and unsigned) take exactly 8 bits. 1. Give a simple implementation of pointer subtraction as defined in pANS C, for P and Q pointers to some type T, ignoring overflow. 2. The difference of two pointers must be storable in some signed integral type. Overflow is not permitted. Give a simple condition on the implementation that prevents overflow. (Hint: If we have an array in memory, "char A[201];", what are the values of "&A[200] - &A[0]" and "&A[0] - &A[200]"? What does this imply about the declaration of A?) 3. I do a little sleight-of-hand above. Peter de Silva's original example was midpoint_pointer = (start_pointer + end_pointer) / 2; but my example was Total = Start + End; How does pANS C justify my substitution? 4. What does "WLOG" mean? -- Tim, the Bizarre and Oddly-Dressed Enchanter Center for ||| Internet, BITNET: mcdaniel@uicsrd.csrd.uiuc.edu Supercomputing ||| UUCP: {uunet,convex,pur-ee}!uiucuxc!uicsrd!mcdaniel Research and ||| ARPANET: mcdaniel%uicsrd@uxc.cso.uiuc.edu Development, ||| CSNET: mcdaniel%uicsrd@uiuc.csnet U of Illinois ||| DECnet: GARCON::"mcdaniel@uicsrd.csrd.uiuc.edu"
chris@mimsy.UUCP (Chris Torek) (05/06/89)
In article <922@garcon.cso.uiuc.edu> mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) writes: >One question has not yet been answered, and it makes this whole fiasco >moot. Indeed, it is the heart of the matter. (But beware the word `moot', for it means both `not worthy of discussion' and `worthy of discussion'. A `moot' is a meeting, where a group discusses some topic of interest. The discussion is often interminable and leads to no resolution; hence the double meaning.) >What additional power does pointer addition give? What can be >done with it that can't be done about as easily as with pointer >subtraction? Presumably the reason someone might want to write some_type base[N], *low, *high, *mid; ... mid = (low + high) / 2; is to avoid the scaling that occurs with mid = ((low - base) + (high - base)) / 2 + base; which, if compiled in the most straightforward (and stupid) manner, yeilds machine code much like this: // pretend everything is an int // mid <- ( (low-base)/size + (high-base)/size ) * size + base sub base,low,r0 *1 div r0,size,r0 # `size' being a constant sub base,high,r1 *2 div r1,size,r1 add r0,r1,r0 div r0,$2,r0 # divide by 2 is really shift right 1 *3 mul r0,size,r0 add r0,base,mid The three lines marked with `*' are unnecessary. They cannot be deleted if the code is expressed as in the second comment above, because it is not clear that the division by `size' (*1 and *2) does not discard low bits---for instance, if size is 2, that it does not force the result of the subtractions to be even. We, however, know that both `low' and `high' point into the same array, and that the result of the two `sub's just before the starred `div's are in fact a multiple of `size', that no bits are discarded, and that the marked `div' and `mul' instructions can in fact be factored out. But subtraction of pointers is a common occurrence in C, and any optimising compiler worth its name should note this. Subtraction of two pointers is only defined if both point to elements of the same array; and when they do, it is guaranteed that the difference is a whole number of elements, or in other words a proper multiple of `size'. So the compiler should reduce this to sub base,low,r0 sub base,high,r1 add r0,r1,r0 div r0,$2,r0 add r0,base,mid which is what we need anyway (to avoid overflow). (Sad to say, gcc does not win this one.... It does, however, avoid divides by using sub+bge+add+label+shift. The bge+add+label sequence is unnecessary. Is rms listening?) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
njk@freja.diku.dk (Niels J|rgen Kruse) (05/06/89)
In article <17340@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes: > > But subtraction of pointers is a common occurrence in C, and any > optimising compiler worth its name should note this. Subtraction of > two pointers is only defined if both point to elements of the same > array; and when they do, it is guaranteed that the difference is a > whole number of elements, or in other words a proper multiple of `size'. The word "array" as used when saying that two pointers must point into the same array in order to subtract them is a bit foggy to me. What about struct { char c; double d; } x[10]; Are (& x[3].d) and (& x[7].d) not pointing into x? Is (& x[3].d - x[7].d) undefined? What array is (char *)x pointing into? I seem to remember to have seen statements in this newsgroup by people enjoying gurustatus, that dpANS C provides a way to get offsets of fields within structs. What would be the use of that if objects like x are not flat? -- Niels J|rgen Kruse Email njk@diku.dk Mail Tustrupvej 7, 2 tv, 2720 Vanlose, Denmark
peter@ficc.uu.net (Peter da Silva) (05/07/89)
In article <922@garcon.cso.uiuc.edu>, mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) writes: > In this article, ... I show that Peter de Silva's card > analogy is misleading. It's just an analogy. But you do seem to misunderstand it. "two of clubs" is a pointer. It's the second element in an object based at the address "clubs". Pointer arithmetic between objects is meaningless. -- Peter da Silva, Xenix Support, Ferranti International Controls Corporation. Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180. Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.
mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) (05/07/89)
I've thought of one more argument against pointer addition in C. It is of a more philosophical nature (and thus I grant that it's not as strong as my previous ones). There are a lot of arithmetic identities that are handy to have around. Granted that some don't hold, even with the comparatively- tame integers excluding overflow. For example, you can't make all of the div/mod identities true simultaneously for integers: a%b + a/b == a 0 <= a%b and a%b < abs(b) (-a)/b == a/(-b) == -(a/b) However, one identity that holds (excluding overflow for signed integers) is a + b == c if and only if a == c - b Suppose addition of pointers gives a pointer, such that a + b == c However, a == c - b is erroneous from a type point of view -- the right-hand side is an integer in C, but the left-hand side is a pointer. Since pointer subtraction yields an integer in C, its inverse operation is the addition of a pointer to an integer. (If you want to design a language in which pointer subtraction somehow yields another pointer, go right ahead, but I'll never program in it, and it's not compatable with C.) To use the "deck of cards" analogy: in C, the difference between the 9 of clubs and the 7 of clubs is 2. Not "2 of clubs" -- just plain 2. As a friend just pointed out (thanks, Paul!), adding the 2 of clubs to the 7 of clubs just gives you a two-card hand. You *can't* add cards. :-) Pointers are not cards. Pointers do not have units like "feet". Any analogies may work, or they may not -- no guarantees. -- Tim, the Bizarre and Oddly-Dressed Enchanter Center for ||| Internet, BITNET: mcdaniel@uicsrd.csrd.uiuc.edu Supercomputing ||| UUCP: {uunet,convex,pur-ee}!uiucuxc!uicsrd!mcdaniel Research and ||| ARPANET: mcdaniel%uicsrd@uxc.cso.uiuc.edu Development, ||| CSNET: mcdaniel%uicsrd@uiuc.csnet U of Illinois ||| DECnet: GARCON::"mcdaniel@uicsrd.csrd.uiuc.edu"
chris@mimsy.UUCP (Chris Torek) (05/07/89)
In article <17340@mimsy.UUCP>, regarding finding the midpoint of two pointers, I wrote: >So the compiler should reduce this to > > sub base,low,r0 > sub base,high,r1 > add r0,r1,r0 > div r0,$2,r0 > add r0,base,mid > >which is what we need anyway (to avoid overflow). RMS listened, and pointed out that this is wrong. The result of the divide can be halfway between two valid pointer offsets. (E.g., if size is 4 and low-base=4 and high=base-8---i.e., low is &arr[1] and high is &arr[2]---then (4+8)/12 is 6, rather than the 4 that we should get.) This can be fixed by rounding down to the nearest multiple of `size' (here, and r0,$~3,r0 between the final `div' and `add'); in the worst case this requires a divide and a multiply anyway. This particular sequence is probably not common enough to bother with, although the fact that pointer subtraction implies remainderless division does seem worth noting. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
trebor@biar.UUCP (Robert J Woodhead) (05/08/89)
In article <17357@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >> sub base,low,r0 >> sub base,high,r1 >> add r0,r1,r0 >> div r0,$2,r0 > and r0,$~3,r0 >> add r0,base,mid Gee, if you define things as an array of pointers, you not only get clearer source code, but more efficient machine code: ptr=&ptrArray[(high+low)/2] becomes (pardon my assembly language, I'm extrapolating from your example) add high,low,r0 ; add high and low shr r0,$1,r0 ; divide by 2 (gee, a smart compiler!) shl r0,$2,r0 ; adjust to wordsize add ptrArray,r0,r0 ; add in the base address move r0,ptr ; all done. Seems to me this whole argument is spurious. You can't add/mul/div/sub two pointers together because the result is semantically meaningless. You want to redefine the language to ``hack in'' an operation that is already there, and your hack will result in code that is harder for humans to read, and harder for compilers to optimize. -- Robert J Woodhead, Biar Games, Inc. !uunet!biar!trebor | trebor@biar.UUCP "The lamb will lie down with the lion, but the lamb won't get much sleep." -- Woody Allen.
diamond@diamond.csl.sony.junet (Norman Diamond) (05/08/89)
In article <4646@freja.diku.dk> njk@freja.diku.dk (Niels J|rgen Kruse) writes: >What about > >struct { > char c; > double d; >} x[10]; > >Are (& x[3].d) and (& x[7].d) not pointing into x? They point "into" x, yes, as far as the English language is concerned. But they do not point to ELEMENTS OF x. You have to read technicalese more carefully than ordinary English. >Is (& x[3].d - x[7].d) undefined? I think you mean (& x[3].d - & x[7].d). Again, when you read the technicalese carefully enough, you find that the answer is yes (maybe surprisingly). >What array is (char *)x pointing into? Any pointer can be cast to a (char *) pointer and back to its original type, but the (char *) version doesn't have to have a legal meaning. If you dereference it, you might get a core dump or worse. -- Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net) The above opinions are my own. | Why are programmers criticized for If they're also your opinions, | re-inventing the wheel, when car you're infringing my copyright. | manufacturers are praised for it?
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (05/08/89)
In article <924@garcon.cso.uiuc.edu> mcdaniel@uicsrd.csrd.uiuc.edu (Tim McDaniel) writes: >I've thought of one more argument against pointer addition in C. It >is of a more philosophical nature (and thus I grant that it's not as >strong as my previous ones). > Perhaps more of a mathematical nature. > [Gives argument showing that if addition is added an equation can result that has different types...] The properties stated result in fact from some basic properties present in arithmetic models: fields. (These models model our arithmetic or arithmetic is based on these models - take your pick). These properties include (math books everyone!) associativity, closure, inverseness (having an inverse),distribution and a null element and other pleasant niceties . Lets assume pointer addition (new style). There seems to be a consensus that this should result in a pointer type. In other words: P + Q = R (where P,Q,R are pointer types) also M + n = P ( where M,P are pointer types, n is an integer) ^ Now a little operator overloading is going on here. The first + and the second + are conceptually different operators. So lets use '+' for the 2nd operator. Thus M '+' n = P This is something akin to scalar multiplication of vectors. Now in "standard" arithmetic models the operation '+' would have to support distribution i.e. n '+' (P+Q) = (n'+'P) + (n'+'Q) Just as 3(a+b) = 3a + 3b. However the semantics we seem to be aiming for is: n '+' (P+Q) = (n'+'P) + Q = P + (n'+'Q) Now this is going to result in a "nonstandard" model with any number of (for us) strange arithmetic properties. (Left as exercise to the reader :-). Now theres nothing intrinsically wrong with nonstandard arithmetic models. However the makers of C in their infinite wisdom (naive innocence?) were probably just trying to model standard arithmetic. Which I think is as it should be. Introducing pointer addition with the above semantics would have introduced nonstandard arithmetic into the language. Disclaimer: Opinions are my own, for my employers you pay extra.
karl@haddock.ima.isc.com (Karl Heuer) (05/09/89)
In article <4646@freja.diku.dk> njk@freja.diku.dk (Niels J|rgen Kruse) writes: >The word "array" as used when saying that two pointers must point into the >same array in order to subtract them is a bit foggy to me. What about > struct { char c; double d; } x[10]; >Are (& x[3].d) and (& x[7].d) not pointing into x? Not in this sense. They point into (different) arrays of type `double [1]'. >Is (&x[3].d - &x[7].d) undefined? Yes (even after I fixed the typo). On some machines, there does not exist an integer k such that (&x[1].d == &x[0].d + k), so it would be meaningless to attempt to evaluate (&x[1].d - &x[0].d) to any integer. >What array is (char *)x pointing into? An array of type `char [sizeof(x)]' (or equivalently, `char [10*sizeof(S)]' where S is the struct type) whose existence is supposedly implied by the pANS statement that `objects are composed of bytes'. Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
kevinf@infmx.UUCP (Kevin Franden) (05/11/89)
Assuming that there does exist an application that wnats/needs to add pointers, why won't this 'model' work? Assume an array a to be made up of elements of arbitrary size. Assume further that this array has 100 elements arranged (as we know C does) contiguously in memory. if p1 is pointing to element 37 and p2 is pointing to element 4, I'd like p3 to point to a[37+4] or a[41]. struct{...}foo; foo p1,p2,p3; x=37; y=4; p1=a[x]; p2=a[y]; p3=(x+y)*sizeof(foo); I realize that I missed part of this discussion, sorry if this has been posted already, but won't p3 be pointing to the 41st element? if this is true, this is pointer addition, no? -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Kevin Franden UUCP: {backbone}!pyramid!infmx!kevinf Informix Software Inc disclaimer("I said what I said and not my employer"); =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Cocaine: The big lie. OS/2: The BIG hack.
gdtltr@vax1.acs.udel.EDU (Gary D Duzan) (05/11/89)
In article <1321@infmx.UUCP> kevinf@infmx.UUCP (Kevin Franden) writes: => =>Assuming that there does exist an application that wnats/needs to add =>pointers, why won't this 'model' work? => => Assume an array a to be made up of elements of => arbitrary size. Assume further that this array has => 100 elements arranged (as we know C does) contiguously => in memory. => => if p1 is pointing to element 37 and p2 is pointing to => element 4, I'd like p3 to point to a[37+4] or a[41]. => => struct{...}foo; => => foo p1,p2,p3; => => x=37; => y=4; => => p1=a[x]; => p2=a[y]; => p3=(x+y)*sizeof(foo); => => =>I realize that I missed part of this discussion, sorry if this has been =>posted already, but won't p3 be pointing to the 41st element? if this =>is true, this is pointer addition, no? => No. I believe what you want is a[x+y] . p3 will be the number of bytes taken up by x+y elements of foo. You also might be thinking a+x+y . This should be legal and do what you want. Sorry if I am incorrect here, but I am a bit new to C myself. This is a bit odd when you consider the fact that I am trying to put together a minimal K&R C compiler (don't ask for what machine, you'll just laugh). Gary Duzan Time Lord Third Regeneration -- +------------------------------------------+ UUCP: !uunet!udel.edu!vax1!gdtltr | "Two hearts are better than one." -- Yes | Internet: gdtltr@vax1.acs.udel.edu | "Don't listen to me; I never do." -- Doctor Who | Other: Relay through CUNYVM +-------------------------------------------------+ or something.
bengsig@oracle.nl (Bjorn Engsig) (05/11/89)
In article <1321@infmx.UUCP> kevinf@infmx.UUCP (Kevin Franden) writes:
=
= struct{...}foo;
=
= foo *p1,*p2,*p3;
/* I added the *'s in the above declarations */
/* And I would add the following: */
int x,y;
=
= x=37;
= y=4;
=
= p1=a[x];
= p2=a[y];
= p3=(x+y)*sizeof(foo);
=
= I realize that I missed part of this discussion, sorry if this has been
= posted already, but won't p3 be pointing to the 41st element?
Hmm, yes, but the construct is highly debatable ...
= if this
=is true, this is pointer addition, no?
No, it's integer addition, and a highly debatable computation of a pointer,
and an assignment which needs a cast.
In pointer arithmetic, you can achieve what you are trying by
p3 = p2 + (p1 - a)
which means let p3 equeal the current p2 plus the number of elements between
p1 and the beginning of the array a. Note, however that this is a pointer
plus an integer, the latter being the difference between two pointers.
--
Bjorn Engsig, ORACLE Europe \ / "Hofstadter's Law: It always takes
Path: mcvax!orcenl!bengsig X longer than you expect, even if you
Domain: bengsig@oracle.nl / \ take into account Hofstadter's Law"