keesan@bbncca.ARPA (Morris Keesan) (07/09/84)
On machines where pointers are shorter than longs, should the pointer sign-extend when being converted to long (assume all pointers the same size for simplicity -- on machines with more than one size of pointer, the question applies to those types of pointers which are shorter than longs)?

This question arises because of some code we came across which is converting pointers to long in various ways (don't ask why -- it's a long story, but the reasons turn out to be mostly valid), and the compiled code is generating sign-extension. At first, this appears to be wrong, but on second thought I'm not sure. K&R says (section 7.14, p.192) "The compilers currently allow a pointer to be assigned to an integer, an integer to a pointer, and a pointer to a pointer of another type. The assignment is a pure copy operation, with no conversion." But what does "pure copy" mean, when the objects are of different sizes?

Is there any reason to prefer non-sign-extension over sign-extension, or vice-versa? Is there any reason why a C programmer should legitimately care whether sign-extension occurs in these cases?

Please no flames about whether converting pointers to longs is a reasonable operation. I'm just trying to figure out what the compiler should do, given that the operation is allowed.
--
Morris M. Keesan
{decvax,linus,wjh12,ima}!bbncca!keesan
keesan @ BBN-UNIX.ARPA
jhh@ihldt.UUCP (John Haller) (07/10/84)
There is a definite precedent for considering pointers to be unsigned. In the Version 6 (Remember that?) C compiler, when there were no unsigned quantities, (char *) was used where unsigned int's were desired. John Haller
dmk@lanl-a.UUCP (07/10/84)
You should be able to get both types of conversion by saying either

    anytype *x;
    long i;

    i = (long)((int) x);        /* giving sign extension, or */
    i = (long)((unsigned) x);   /* no sign extension */

but I'm not sure which one is better to use as the default when you say,

    i = (long)x;

-- it depends on whether you think of an address as just a token (unsigned) or an integer (signed). Maybe it's best to just do whatever is the most natural for the machine involved and if the user needs to force it one way or the other then let him use a double typecast as above.

David Keaton
Los Alamos National Laboratory
ucbvax!lbl-csam!lanl-a!dmk
dmk@lanl.arpa
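The two double casts can be tried out on a modern host only by simulating the widths, since today's pointers are rarely shorter than long. The following is a minimal sketch under that assumption: a 16-bit "pointer" is modelled as raw bits in a uint16_t and the "long" as an int32_t. The names ptr16_bits, widen_signed and widen_unsigned are hypothetical and do not come from the post.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit pointer: just its raw bit pattern. */
typedef uint16_t ptr16_bits;

/* Analogue of i = (long)((int) x): widen as a signed quantity.
 * The xor/subtract form sign-extends without relying on an
 * implementation-defined narrowing conversion. */
int32_t widen_signed(ptr16_bits p)
{
    return (int32_t)(p ^ 0x8000u) - 0x8000;
}

/* Analogue of i = (long)((unsigned) x): widen as an unsigned quantity,
 * so the high-order word is zero-filled. */
int32_t widen_unsigned(ptr16_bits p)
{
    return (int32_t)p;
}
```

For the bit pattern 0x8000, widen_signed yields -32768 and widen_unsigned yields 32768; for any pattern below 0x8000 the two agree.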
miller@saturn.UUCP (Terrence C. Miller) (07/11/84)
The ANSI standard (the last time I looked) defined pointer comparison to be unsigned, and I suspect most of us regard memory addresses as unsigned. Thus pointers should be extended like unsigned numbers (i.e. zero filled).
woods@hao.UUCP (07/11/84)
As usual, I don't know what the official manual says. All I know is, a negative pointer value is total nonsense. Therefore it seems to me that *any* pointer value assigned to an integer (or long) should be positive. So, I would advocate *no* sign extension. I would be interested to hear if someone can point out a reference stating what a compiler should do, or one that explicitly says that this is implementation dependent. --Greg -- {ucbvax!hplabs | allegra!nbires | decvax!stcvax | harpo!seismo | ihnp4!stcvax} !hao!woods "Cherish well your thoughts, keep a tight grip on your booze 'Cause thinkin' and drinkin' are all I have today"
DBrown@HI-MULTICS.ARPA (07/13/84)
Well, on a machine with 3-word pointers the operation refrains from sign-extending so as not to accidentally create a long which points somewhere else when converted back to a pointer. Probably the criterion should be "do what you must, but try to retain correctness". This implies not sign-extending (so as to avoid a non-"pure" copy), but it also implies that a loss of significance should be visible to the programmer somehow. How you're going to do that I wouldn't know. --dave (unix hack on a 'bun) brown
keesan@bbncca.ARPA (Morris Keesan) (07/17/84)
>From: Travis Lee Winfrey <Us.Travis%CU20B@COLUMBIA-20.ARPA>
>
>Well, by "pure copy", I think that what they had in mind was a copy where
>information is neither lost (bits dropped) or irretrievably transmuted.
>So if I take a pointer and convert it to a long, I should be able to take
>that long variable containing the pointer, and convert it back to a usable
>pointer. If your compiler doesn't sign-extend, isn't it losing a bit of
>information?

Given pointers that are one word and longs that are two words, it actually makes no difference in this scheme what goes into the high-order word. It could be sign-extension, zero-fill, one-fill, or alternating ones and zeros. In any case, converting it back to a pointer means taking the low-order word and discarding the high-order word, so the high-order word contains absolutely no information about the pointer that isn't in the low-order word.

-------------------------------------------------------------------------------

Although most of the votes on this issue seem to agree with my initial feeling that sign-extension shouldn't be done, I've also seen enough votes the other way, and enough agreement with my position that it doesn't matter, that I've decided to leave the compiler alone.

If you thought that was fun, you'll love what the compiler does with equality tests between a pointer and a long: it compares only the low-order word of the long, ignoring the high order. I'm not even going to ask for opinions on this one. It's incredibly ugly, anyone writing that kind of code is a fool, and I'm hiding behind the "result is machine dependent" clause in the reference manual.

One last question -- does anyone know what the draft ANSI standard says about all of this?
--
Morris M. Keesan
{decvax,linus,ihnp4,wivax,wjh12,ima}!bbncca!keesan
keesan @ BBN-UNIX.ARPA
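The round-trip argument in this post, and the low-word-only equality test, can be sketched with simulated widths. This assumes a one-word (16-bit) pointer and a two-word (32-bit) long, with the pointer occupying the low-order word; the function names are hypothetical and nothing here is taken from the compiler being discussed.

```c
#include <stdint.h>

/* Build a two-word "long" from a one-word pointer and an arbitrary
 * high-order word (sign bits, zeros, ones, anything). */
uint32_t widen_with_fill(uint16_t ptr, uint16_t high_word)
{
    return ((uint32_t)high_word << 16) | ptr;
}

/* Converting back to a pointer keeps the low-order word and discards
 * the high-order word. */
uint16_t narrow_to_ptr(uint32_t l)
{
    return (uint16_t)(l & 0xffffu);
}

/* Equality test as described in the post: only the low-order word of
 * the long takes part in the comparison. */
int low_word_equal(uint16_t ptr, uint32_t l)
{
    return ptr == narrow_to_ptr(l);
}
```

Whatever fill is chosen, narrow_to_ptr(widen_with_fill(p, fill)) returns p, which is the sense in which the high-order word carries no information. The same truncation is what makes low_word_equal report a match for a long such as 0x00050010 against the pointer 0x0010.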
phipps@fortune.UUCP (Clay Phipps) (07/21/84)
Whether pointers should be sign-extended or not depends on the architecture of the host machine. Addresses in the ELXSI 6400 are *signed* 32-bit integers. Therefore, they should be sign-extended when converted to 64-bit "long", so that they retain their integral value in both sizes. Although signed addresses are unconventional, it turns out that they cause no problems whatsoever, and they removed the need to perform 32-bit unsigned arithmetic as a special case within the machine for address calculation. If minus signs on addresses bother you, just print addresses in hexadecimal (or octal, if you're an octopus :-), and you won't see them at all. -- Clay Phipps -- { amd hplabs!hpda sri-unix ucbvax!amd } !fortune!phipps { ihnp4 cbosgd decvax!decwrl!amd harpo allegra}
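The claim that signed addresses "retain their integral value in both sizes" only under sign extension can be illustrated as follows. This is a sketch using fixed-width integers as stand-ins for a signed 32-bit address and a 64-bit long; it says nothing about how the ELXSI compiler is actually implemented, and the function names are hypothetical.

```c
#include <stdint.h>

/* Widening a signed 32-bit address: C's integer conversion preserves
 * the value, which at the bit level is sign extension. */
int64_t widen_signed_address(int32_t addr)
{
    return (int64_t)addr;
}

/* Zero-fill, for contrast: reinterpret the bits as unsigned first,
 * then widen. Negative addresses change value. */
int64_t widen_zero_filled(int32_t addr)
{
    return (int64_t)(uint32_t)addr;
}
```

For the address -4096, sign extension keeps -4096, while zero-fill produces 4294963200. Zero-fill also reverses the ordering of a negative address relative to address 0.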
smh@mit-eddie.UUCP (Steven M. Haflich) (07/21/84)
>>Since when should pointers be sign-extended on a 68000, or any other
>>machine? Unless someone comes up with a meaning for the notion of a
>>negative address, pointers should always be zero-extended.
>>-- Jim Balter, INTERACTIVE Systems (ima!jim)

Typical language execution environments on typical architectures (e.g. C on a 68000) have two kinds of allocated memory. The stack grows automatically with things like procedure invocation. The heap is managed by explicit user calls (e.g. malloc/free).

On machines with huge virtual address spaces, it is quite reasonable to keep addresses for the two separate. Typically, virtual stack space could grow downward from 0xffffffff while the heap grows upward from 0x0. On a small dedicated-application system with limited physical memory, and in which it is certain neither the stack nor the heap will grow beyond 32K, it would be quite reasonable to keep pointers inside data structures as 16-bit *signed* quantities. In effect, one is modelling memory as a range around zero (0xffff8000..0x00007fff) instead of the usual notion of positive-number addressing (0x0..0x0000ffff).

I know it smells of crock. You didn't say it had to be a *good* reason, did you?

Steve Haflich, MIT
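The "range around zero" scheme can be sketched as a pack/unpack pair. This assumes 32-bit addresses, a heap confined to 0x00000000..0x00007fff and a stack confined to 0xffff8000..0xffffffff, as in the post. The stored word is kept as raw bits in a uint16_t and interpreted as signed only when it is expanded; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True if a 32-bit address lies in the window around zero:
 * 0xffff8000..0xffffffff (stack) or 0x00000000..0x00007fff (heap). */
int fits_in_16_signed(uint32_t addr)
{
    return addr <= 0x00007fffu || addr >= 0xffff8000u;
}

/* Store only the low 16 bits inside the data structure. */
uint16_t pack_address(uint32_t addr)
{
    assert(fits_in_16_signed(addr));
    return (uint16_t)(addr & 0xffffu);
}

/* Recover the full address by sign-extending the stored word. */
uint32_t unpack_address(uint16_t stored)
{
    int32_t v = (int32_t)(stored ^ 0x8000u) - 0x8000;
    return (uint32_t)v;
}
```

A stack address such as 0xfffffff0 packs to 0xfff0 and unpacks back to 0xfffffff0; zero-extension would instead give 0x0000fff0, which is why this particular layout depends on sign extension.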
alan@allegra.UUCP (Alan S. Driscoll) (07/25/84)
> This discussion of sign extending pointers when converting to longs
> has got me wondering what the Prime C compiler does when extending a
> long into a pointer. Pointers are the longest type on this compiler:
> 48 bits, while long, int, and unsigned are all 32 bits. This breaks
> all programs that assume pointers can be assigned to longs and recovered
> unchanged. Only char pointers need more than 32 bits, but the compiler
> author decided to make all pointers 48 bits anyway. I'll investigate
> this and report back if anyone is interested.

If this is the case, then the Prime compiler is just wrong. See section 14.4 of the C Reference Manual, "Explicit pointer conversions." At least one of the integral types must be large enough to hold a pointer.
--
Alan S. Driscoll
AT&T Bell Laboratories
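Why a 32-bit long cannot carry a 48-bit pointer through a round trip can be shown with a small simulation. This is only a sketch: the 48-bit pointer is modelled as the low 48 bits of a uint64_t, which is an assumption made for illustration and not a description of the Prime pointer layout. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Squeeze a simulated 48-bit pointer into a 32-bit "long" by keeping
 * the low 32 bits. */
uint32_t to_long32(uint64_t ptr48)
{
    assert(ptr48 <= 0xffffffffffffULL);   /* must fit in 48 bits */
    return (uint32_t)(ptr48 & 0xffffffffu);
}

/* True if widening the 32-bit value again gives back the same pointer. */
int survives_round_trip(uint64_t ptr48)
{
    return (uint64_t)to_long32(ptr48) == ptr48;
}
```

A pointer whose upper 16 bits are zero survives; one with any of those bits set does not, and no choice of extension on the way back can restore them.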
bsafw@ncoast.UUCP (The WITNESS) (07/30/84)
What he said is VERY implementation dependent -- the 68000 is a bit screwy. To wit: the program counter is 4 bytes and only uses the low three, and there is a "zero page" addressing mode that extends from 0x0 to 0x7fff AND from 0xff8000 to 0xffffff (sign-extended word quantity). THAT'S why it worked. But I wouldn't want to port it to a VAX. -- Brandon Allbery: decvax!cwruecmp{!atvax}!bsafw 6504 Chestnut Road, Independence, OH 44131 Witness, n. To watch and learn, joyously.
jim@ism780b.UUCP (08/01/84)
#R:bbncca:-83100:ism780b:25500012:000:563
ism780b!jim Jul 23 10:59:00 1984

> Although signed addresses are unconventional,
> it turns out that they cause no problems whatsoever,
> and they removed the need to perform 32-bit unsigned arithmetic
> as a special case within the machine for address calculation.

So NULL points to the middle of the address space? And the loader origin for programs is -0x7fffffff-1, which can't be expressed properly as a negative number without breaking most software? Or does no one use the first byte? I'm not sure I believe this "no problems whatsoever".

-- Jim Balter, INTERACTIVE Systems (ima!jim)
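The two objections here can be made concrete with a sketch of a hypothetical signed 32-bit address space. The lowest address has to be spelled -0x7fffffff-1 because, where int is 32 bits, the literal 0x80000000 has an unsigned type, so negating it does not produce a negative number. The macro and function names are illustrative only.

```c
#include <stdint.h>

/* Lowest address in a hypothetical signed 32-bit address space,
 * spelled the way the post spells it. */
#define LOWEST_SIGNED_ADDRESS (-0x7fffffff - 1)

/* Distance in bytes from the lowest address to a given address,
 * computed in unsigned arithmetic so it cannot overflow. */
uint32_t offset_from_lowest(int32_t addr)
{
    return (uint32_t)addr - (uint32_t)INT32_MIN;
}
```

Under this model offset_from_lowest(0) is 0x80000000, so a null pointer with value 0 sits exactly half way through the address space.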
jim@ism780b.UUCP (08/01/84)
#R:mit-eddie:-241000:ism780b:25500015:000:1078
ism780b!jim Jul 23 11:28:00 1984

> On machines with huge virtual address spaces, it is quite reasonable to
> keep addresses for the two separate. Typically, virtual stack space
> could grow downward from 0xffffffff while the heap grows upward from
> 0x0. On a small dedicated-application system with limited physical
> memory, and in which it is certain neither the stack nor the heap will grow
> beyond 32K, it would be quite reasonable to keep pointers inside data
> structures as 16-bit *signed* quantities. In effect, one is modelling
> memory as a range around zero (0xffff8000..0x00007fff) instead of the
> usual notion of positive-number addressing (0x0..0x0000ffff).

No, one is modelling two distinct address spaces, [0..0x7fff] and [0x8000..0xffff]; since there is no order relationship between the two, it doesn't matter whether you sign-extend or not. I would claim that, whenever there is such an order relationship, it is much simpler to base your addresses at 0 rather than at 0x8000 or 0x80000000, or whatever depending upon your address space size.

-- Jim Balter, INTERACTIVE Systems (ima!jim)
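The point about order relationships can be checked directly: the choice of extension changes the result of a comparison only when the two addresses lie in different halves. A minimal sketch, assuming 16-bit addresses held as raw bits and hypothetical function names:

```c
#include <stdint.h>

/* Compare two 16-bit addresses after zero-extending both. */
int less_zero_extended(uint16_t a, uint16_t b)
{
    return (uint32_t)a < (uint32_t)b;
}

/* Compare two 16-bit addresses after sign-extending both. */
int less_sign_extended(uint16_t a, uint16_t b)
{
    int32_t sa = (int32_t)(a ^ 0x8000u) - 0x8000;
    int32_t sb = (int32_t)(b ^ 0x8000u) - 0x8000;
    return sa < sb;
}
```

Within [0..0x7fff] or within [0x8000..0xffff] the two functions always agree. They disagree only across the boundary, for example on 0x7fff versus 0x8000, which is the case the post argues has no meaningful ordering when the halves are separate address spaces.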
jim@ism780b.UUCP (08/01/84)
#R:bbncca:-83100:ism780b:25500004:000:242 ism780b!jim Jul 13 19:46:00 1984 Since when should pointers be sign-extended on a 68000, or any other machine? Unless someone comes up with a meaning for the notion of a negative address, pointers should always be zero-extended. -- Jim Balter, INTERACTIVE Systems (ima!jim)
geoff@callan.UUCP (Geoff Kuenning) (08/02/84)
Interestingly enough, the 68000 considers word-length pointers to be signed in certain contexts. The "absolute short" addressing mode sign-extends the 16-bit address. Perhaps this is intended for use with a split user/system address space like on the VAX? -- Geoff Kuenning Callan Data Systems ...!ihnp4!wlbr!callan!geoff
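The effect of that sign extension on the reachable addresses can be sketched in a few lines. This models only the arithmetic: the 16-bit operand is sign-extended to 32 bits and then masked to the 24 address bits that, as noted earlier in the thread, the 68000 actually uses. The function name is hypothetical.

```c
#include <stdint.h>

/* Effective address for a 16-bit absolute short operand: sign-extend
 * the word to 32 bits, then keep the low 24 address bits. */
uint32_t abs_short_effective_address(uint16_t word)
{
    int32_t extended = (int32_t)(word ^ 0x8000u) - 0x8000;
    return (uint32_t)extended & 0x00ffffffu;
}
```

Operands 0x0000..0x7fff map to addresses 0x000000..0x007fff, and operands 0x8000..0xffff map to 0xff8000..0xffffff, which gives the two 32K windows at the bottom and top of memory.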