[net.lang.c] pointer -> long conversion

keesan@bbncca.ARPA (Morris Keesan) (07/09/84)

----------------------

    On machines where pointers are shorter than longs, should the pointer
sign-extend when being converted to long (assume all pointers the same size
for simplicity -- on machines with more than one size of pointer, the question
applies to those types of pointers which are shorter than longs)?  This   
question arises because of some code we came across which is converting
pointers to long in various ways (don't ask why -- it's a long story, but the
reasons turn out to be mostly valid), and the compiled code is generating
sign-extension.  At first, this appears to be wrong, but on second thought I'm
not sure.  K&R says (section 7.14, p.192) "The compilers currently allow a
pointer to be assigned to an integer, an integer to a pointer, and a pointer to
a pointer of another type.  The assignment is a pure copy operation, with no
conversion."  But what does "pure copy" mean, when the objects are of different
sizes?  Is there any reason to prefer non-sign-extension over sign-extension,
or vice-versa?  Is there any reason why a C programmer should legitimately care
whether sign-extension occurs in these cases?  Please no flames about whether
converting pointers to longs is a reasonable operation.  I'm just trying to
figure out what the compiler should do, given that the operation is allowed.
-- 
					Morris M. Keesan
					{decvax,linus,wjh12,ima}!bbncca!keesan
					keesan @ BBN-UNIX.ARPA

jhh@ihldt.UUCP (John Haller) (07/10/84)

There is a definite precedent for considering pointers to be unsigned.
In the Version 6 (Remember that?) C compiler, when there were no unsigned
quantities, (char *) was used where unsigned ints were desired.

		John Haller

dmk@lanl-a.UUCP (07/10/84)

     You should be able to get both types of conversion by saying
either
                    anytype *x;
                    long i;

                    i = (long)((int) x);  /* giving sign extension, or */
                    i = (long)((unsigned) x);  /* no sign extension */

but I'm not sure which one is better to use as the default when you say,

                    i = (long)x;

-- it depends on whether you think of an address as just a token (unsigned)
or an integer (signed).  Maybe it's best to just do whatever is the most
natural for the machine involved and if the user needs to force it one
way or the other then let him use a double typecast as above.
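
A minimal sketch of the two double casts above, for readers without a
16-bit machine at hand.  It assumes the pointer's bit pattern can be
modelled as a 16-bit value, and it uses the fixed-width types from
<stdint.h> (which postdate this thread) to stand in for that machine's
int, unsigned, and long.  The function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model: ptr_bits is the raw 16-bit pattern of a pointer. */

/* Mirrors i = (long)((int) x): the signed intermediate sign-extends.
 * Converting a value above 0x7fff to int16_t is implementation-defined
 * in older C standards; two's-complement compilers simply wrap it. */
int32_t widen_via_int(uint16_t ptr_bits)
{
    return (int32_t)(int16_t)ptr_bits;
}

/* Mirrors i = (long)((unsigned) x): the unsigned intermediate zero-fills. */
int32_t widen_via_unsigned(uint16_t ptr_bits)
{
    return (int32_t)ptr_bits;
}
```

For a pattern with the top bit clear the two agree; for 0x8000 the first
yields -32768 and the second 32768.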

                               David Keaton
                               Los Alamos National Laboratory
                               ucbvax!lbl-csam!lanl-a!dmk
                               dmk@lanl.arpa

miller@saturn.UUCP (Terrence C. Miller) (07/11/84)

The ANSI standard (the last time I looked) defined pointer comparison to
be unsigned, and I suspect most of us regard memory addresses as unsigned.
Thus pointers should be extended like unsigned numbers (i.e., zero filled).

woods@hao.UUCP (07/11/84)

  As usual, I don't know what the official manual says. All I know is,
a negative pointer value is total nonsense. Therefore it seems to me that
*any* pointer value assigned to an integer (or long) should be positive. 
So, I would advocate *no* sign extension.
  I would be interested to hear if someone can point out a reference stating
what a compiler should do, or one that explicitly says that this is
implementation dependent.

--Greg
-- 
{ucbvax!hplabs | allegra!nbires | decvax!stcvax | harpo!seismo | ihnp4!stcvax}
       		        !hao!woods
   
   "Cherish well your thoughts, keep a tight grip on your booze
    'Cause thinkin' and drinkin' are all I have today"

DBrown@HI-MULTICS.ARPA (07/13/84)

  Well, on a machine with 3-word pointers the operation refrains from
sign-extending so as not to accidentally create a long which points
somewhere else when converted back to a pointer.
  Probably the criterion should be "do what you must, but try to retain
correctness".  This implies not sign-extending (so as to avoid a
non-"pure" copy), but it also implies that a loss of significance should
be visible to the programmer somehow.  How you're going to do that I
wouldn't know.
 --dave (unix hack on a 'bun) brown

keesan@bbncca.ARPA (Morris Keesan) (07/17/84)

--------------------------------
>From:  Travis Lee Winfrey <Us.Travis%CU20B@COLUMBIA-20.ARPA>
>
>Well, by "pure copy", I think that what they had in mind was a copy where
>information is neither lost (bits dropped) nor irretrievably transmuted.
>So if I take a pointer and convert it to a long, I should be able to take
>that long variable containing the pointer, and convert it back to a usable
>pointer.  If your compiler doesn't sign-extend, isn't it losing a bit of
>information?  

 Given pointers that are one word and longs that are two words, it actually
makes no difference in this scheme what goes into the high-order word.  It
could be sign-extension, zero-fill, one-fill, or alternating ones and zeros.
In any case, converting it back to a pointer means taking the low-order word
and discarding the high-order word, so the high-order word contains absolutely
no information about the pointer that isn't in the low-order word.
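
The round trip described above can be sketched as follows.  The 16-bit
word size is an assumption (the post says only "one word" and "two
words"), and to_long and to_pointer are hypothetical names, not anything
the compiler provides.

```c
#include <stdint.h>

/* Hypothetical model: a one-word (16-bit) pointer pattern stored in the
 * low-order word of a two-word (32-bit) long. */

/* Build the long: low word is the pointer, high word is any fill at all. */
uint32_t to_long(uint16_t ptr_bits, uint16_t high_word)
{
    return ((uint32_t)high_word << 16) | ptr_bits;
}

/* Converting back keeps the low-order word and discards the high one. */
uint16_t to_pointer(uint32_t long_bits)
{
    return (uint16_t)(long_bits & 0xFFFFu);
}
```

Whatever is passed as high_word (0x0000, 0xFFFF, 0xAAAA, ...), to_pointer
returns the original pattern, which is the sense in which the fill
carries no information.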

-------------------------------------------------------------------------------
 Although most of the votes on this issue seem to agree with my initial feeling
that sign-extension shouldn't be done, I've also seen enough votes the other
way, and enough agreement with my position that it doesn't matter, that I've
decided to leave the compiler alone.

 If you thought that was fun, you'll love what the compiler does with equality
tests between a pointer and a long:  it compares only the low-order word of the
long, ignoring the high order.  I'm not even going to ask for opinions on this
one.  It's incredibly ugly, anyone writing that kind of code is a fool, and I'm
hiding behind the "result is machine dependent" clause in the reference manual.
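
For contrast, here is a rough sketch of that low-word-only comparison
next to a full-width one.  As before, 16-bit pointers and 32-bit longs
are assumed, and both function names are invented for the example; the
full-width version zero-fills, which is only one of the possible choices.

```c
#include <stdint.h>

/* What the compiler described above does: look at the low word only. */
int equal_low_word_only(uint16_t ptr_bits, uint32_t long_bits)
{
    return ptr_bits == (uint16_t)(long_bits & 0xFFFFu);
}

/* Alternative: widen the pointer (zero-fill here) and compare all 32 bits. */
int equal_full_width(uint16_t ptr_bits, uint32_t long_bits)
{
    return (uint32_t)ptr_bits == long_bits;
}
```

The two disagree exactly when the high-order word of the long is
nonzero, e.g. pointer 0x1234 against long 0x00051234.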

 One last question -- does anyone know what the draft ANSI standard says about
all of this?
-- 
			    Morris M. Keesan
			    {decvax,linus,ihnp4,wivax,wjh12,ima}!bbncca!keesan
			    keesan @ BBN-UNIX.ARPA

phipps@fortune.UUCP (Clay Phipps) (07/21/84)

Whether pointers should be sign-extended or not depends on the architecture
of the host machine.

Addresses in the ELXSI 6400 are *signed* 32-bit integers.
Therefore, they should be sign-extended when converted to 64-bit "long",
so that they retain their integral value in both sizes.

Although signed addresses are unconventional,
it turns out that they cause no problems whatsoever,
and they removed the need to perform 32-bit unsigned arithmetic
as a special case within the machine for address calculation.

If minus signs on addresses bother you, just print addresses in hexadecimal
(or octal, if you're an octopus :-), and you won't see them at all.
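
A small sketch of the point about retaining the integral value, assuming
only what is stated above (signed 32-bit addresses, 64-bit long).  It
uses <stdint.h> types and made-up function names, and is not taken from
any ELXSI compiler.

```c
#include <stdint.h>

/* Sign-extending keeps the numeric value of a signed 32-bit address. */
int64_t widen_signed_address(int32_t addr)
{
    return (int64_t)addr;
}

/* Zero-filling the same bit pattern changes the value of every
 * negative address (and so changes how it orders against others). */
int64_t widen_zero_filled(int32_t addr)
{
    return (int64_t)(uint32_t)addr;
}
```

With sign extension, address -8 is still below address 16 after
widening; with zero fill it becomes 4294967288 and sorts above it.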

-- Clay Phipps

-- 
            { amd  hplabs!hpda  sri-unix  ucbvax!amd }          
                                                      !fortune!phipps
   { ihnp4  cbosgd  decvax!decwrl!amd  harpo  allegra}

smh@mit-eddie.UUCP (Steven M. Haflich) (07/21/84)

>>Since when should pointers be sign-extended on a 68000, or any other
>>machine?  Unless someone comes up with a meaning for the notion of a
>>negative address, pointers should always be zero-extended.
>>-- Jim Balter, INTERACTIVE Systems (ima!jim)

Typical language execution environments on typical architectures (e.g.
C on a 68000) have two kinds of allocated memory.  The stack grows
automatically with things like procedure invocation.  The heap is
managed by explicit user calls (e.g. malloc/free).

On machines with huge virtual address spaces, it is quite reasonable to
keep addresses for the two separate.  Typically, virtual stack space
could grow downward from 0xffffffff while the heap grows upward from
0x0.  On a small dedicated-application system with limited physical
memory, and in which it is certain neither the stack nor the heap will grow
beyond 32K, it would be quite reasonable to keep pointers inside data
structures as 16-bit *signed* quantities.  In effect, one is modelling
memory as a range around zero (0xffff8000..0x00007fff) instead of the
usual notion of positive-number addressing (0x0..0x0000ffff).
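
As a sketch of the scheme just described (hypothetical names, <stdint.h>
types assumed), a pointer stored in a data structure as a 16-bit signed
quantity is expanded to a full 32-bit address by sign extension:

```c
#include <stdint.h>

/* Expand a stored 16-bit signed pointer into a 32-bit address. */
uint32_t expand_stored_pointer(int16_t stored)
{
    return (uint32_t)(int32_t)stored;   /* sign-extend, then reinterpret */
}

/* An address can be stored this way only if it lies in the range
 * around zero: 0x00000000..0x00007fff or 0xffff8000..0xffffffff. */
int fits_in_stored_pointer(uint32_t addr)
{
    return addr <= 0x00007FFFu || addr >= 0xFFFF8000u;
}
```

Non-negative stored values land in the low (heap) region and negative
ones in the high (stack) region.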

I know it smells of crock.  You didn't say it had to be a *good* reason,
did you?

Steve Haflich, MIT

alan@allegra.UUCP (Alan S. Driscoll) (07/25/84)

> This discussion of sign extending pointers when converting to longs
> has got me wondering what the Prime C compiler does when extending a
> long into a pointer.  Pointers are the longest type on this compiler:
> 48 bits, while long, int, and unsigned are all 32 bits.  This breaks
> all programs that assume pointers can be assigned to longs and recovered
> unchanged.  Only char pointers need more than 32 bits,  but the compiler
> author decided to make all pointers 48 bits anyway.  I'll investigate
> this and report back if anyone is interested.  

If this is the case, then the Prime compiler is just wrong.  See section
14.4 of the C Reference Manual, "Explicit pointer conversions."  At least
one of the integral types must be large enough to hold a pointer.
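
To make the failure concrete, here is a rough model of a 48-bit pointer
passing through a 32-bit long.  It says nothing about how Prime actually
lays out its pointers; the 48-bit value is simply held in a uint64_t, and
every name below is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: a 48-bit pointer kept in the low bits of a uint64_t. */
#define PTR48_MASK 0xFFFFFFFFFFFFULL

/* Pointer -> long: only 32 bits fit, the upper 16 are dropped. */
uint32_t ptr48_to_long32(uint64_t ptr)
{
    return (uint32_t)(ptr & 0xFFFFFFFFu);
}

/* Long -> pointer: the missing upper bits can only be guessed (zero here). */
uint64_t long32_to_ptr48(uint32_t value)
{
    return (uint64_t)value;
}

/* Nonzero if the pointer comes back unchanged from the round trip. */
int survives_round_trip(uint64_t ptr)
{
    return long32_to_ptr48(ptr48_to_long32(ptr)) == (ptr & PTR48_MASK);
}
```

Only pointers whose upper 16 bits happen to be zero survive, which is
why programs relying on the round trip break.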

-- 

	Alan S. Driscoll
	AT&T Bell Laboratories

bsafw@ncoast.UUCP (The WITNESS) (07/30/84)

What he said is VERY implementation dependent -- the 68000 is a bit screwy.
To wit:  the program counter is 4 bytes and only uses the low three, and
there is a "zero page" addressing mode that extends from 0x0 to 0x7fff AND
from 0xff8000 to 0xffffff (sign-extended word quantity).  THAT'S why it
worked.  But I wouldn't want to port it to a VAX.
-- 
		Brandon Allbery: decvax!cwruecmp{!atvax}!bsafw
		  6504 Chestnut Road, Independence, OH 44131

		  Witness, n.  To watch and learn, joyously.
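
The two address ranges quoted in the post above fall out of a short
calculation.  This is a sketch, not emulator code: it assumes the 16-bit
operand of the 68000's absolute short mode is sign-extended to 32 bits
and that only the low 24 bits reach the address bus, and the function
name is invented.

```c
#include <stdint.h>

/* Effective address of an absolute-short operand on a 24-bit 68000. */
uint32_t abs_short_effective_address(uint16_t word)
{
    /* Sign-extend the 16-bit word to 32 bits. */
    uint32_t extended = (word & 0x8000u) ? (0xFFFF0000u | word) : word;

    /* Keep the 24 bits that are actually used as an address. */
    return extended & 0x00FFFFFFu;
}
```

Words 0x0000..0x7fff map to 0x000000..0x007fff, and words
0x8000..0xffff map to 0xff8000..0xffffff.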

jim@ism780b.UUCP (08/01/84)

#R:bbncca:-83100:ism780b:25500012:000:563
ism780b!jim    Jul 23 10:59:00 1984

> Although signed addresses are unconventional,
> it turns out that they cause no problems whatsoever,
> and they removed the need to perform 32-bit unsigned arithmetic
> as a special case within the machine for address calculation.

So NULL points to the middle of the address space?  And the loader origin
for programs is -0x7fffffff-1, which can't be expressed properly as
a negative number without breaking most software?  Or does no one use
the first byte?  I'm not sure I believe this "no problems whatsoever".

-- Jim Balter, INTERACTIVE Systems (ima!jim)

jim@ism780b.UUCP (08/01/84)

#R:mit-eddie:-241000:ism780b:25500015:000:1078
ism780b!jim    Jul 23 11:28:00 1984

> On machines with huge virtual address spaces, it is quite reasonable to
> keep addresses for the two separate.  Typically, virtual stack space
> could grow downward from 0xffffffff while the heap grows upward from
> 0x0.  On a small dedicated-application system with limited physical
> memory, and in which it is certain neither the stack nor the heap will grow
> beyond 32K, it would be quite reasonable to keep pointers inside data
> structures as 16-bit *signed* quantities.  In effect, one is modelling
> memory as a range around zero (0xffff8000..0x00007fff) instead of the
> usual notion of positive-number addressing (0x0..0x0000ffff).

No, one is modelling two distinct address spaces, [0..0x7fff] and
[0x8000..0xffff]; since there is no order relationship between the two,
it doesn't matter whether you sign-extend or not.  I would claim that,
whenever there is such an order relationship, it is much simpler to base
your addresses at 0 rather than at 0x8000 or 0x80000000, or whatever
depending upon your address space size.
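
The order-relationship point can be checked with a short sketch.  It
assumes 16-bit address patterns and uses invented names; the signed view
is computed arithmetically so that it does not depend on how a compiler
converts out-of-range values.

```c
#include <stdint.h>

/* Signed (sign-extended) value of a 16-bit address pattern. */
static int32_t signed_view(uint16_t bits)
{
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Is a below b when the patterns are read as unsigned addresses? */
int below_unsigned(uint16_t a, uint16_t b)
{
    return a < b;
}

/* Is a below b when the same patterns are read as signed addresses? */
int below_signed(uint16_t a, uint16_t b)
{
    return signed_view(a) < signed_view(b);
}
```

Within [0..0x7fff] or within [0x8000..0xffff] the two readings agree;
they disagree only when the operands come from different halves, e.g.
0x7fff against 0x8000.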

-- Jim Balter, INTERACTIVE Systems (ima!jim)

jim@ism780b.UUCP (08/01/84)

#R:bbncca:-83100:ism780b:25500004:000:242
ism780b!jim    Jul 13 19:46:00 1984

Since when should pointers be sign-extended on a 68000, or any other machine?
Unless someone comes up with a meaning for the notion of a negative address,
pointers should always be zero-extended.

-- Jim Balter, INTERACTIVE Systems (ima!jim)

geoff@callan.UUCP (Geoff Kuenning) (08/02/84)

Interestingly enough, the 68000 considers word-length pointers to be signed
in certain contexts.  The "absolute short" addressing mode sign-extends the
16-bit address.  Perhaps this is intended for use with a split
user/system address space like on the Vax?
-- 

	Geoff Kuenning
	Callan Data Systems
	...!ihnp4!wlbr!callan!geoff