[comp.lang.c] pointer representation

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/09/89)

In article <2064@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>... I assume that x == y in C means "x and y point to the same place".
>1.  Does the current draft actually say this?

What it says is that (apart from null pointers) pointers that compare
equal refer to the same object (or function) and pointers that refer
to the same object (ditto) compare equal.

>If that is so, then a C compiler which exploits x == y to use y where x
>was written is _not_ guaranteed to preserve things like a ring number,
>which could be Bad News for people trying to write operating system
>kernels in C.

If they're trying to represent (address+ring) as a C pointer, they're
ALREADY in trouble!

>2.  According to the current draft, are C compilers allowed to use
>    y instead of x if x == y and it is known that x and y cannot have
>    changed since that was established?

Compilers can optimize however they want so long as the observable
results (apart from timing considerations and other side-effects not
within the scope of the Standard) are as though produced by the
abstract C machine operating as specified in the Standard .

>If, in order to avoid the problem of misteriously losing access rights,
>getting pointers with the invalid bit on, and so on, compiler writers
>adopt the "same pointer" semantics rather than the "same place" semantics,
>the argument for doing pointer comparison in address registers goes away.

So does speed of computation on many systems, where the natural way
to do pointer arithmetic can produce several different representations
for the same pointer.  By limiting the need for "normalization" to
just the comparison of pointers, considerable overhead is avoided.

What you're calling "same pointer" is actually "same representation".
So far as C is concerned "same pointer" means: both null or both
refer to the same object or both refer to the same function.
Details of representation are left up to the implementor.

ok@cs.mu.oz.au (Richard O'Keefe) (09/10/89)

In article <11016@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
: In article <2064@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
: >If that is so, then a C compiler which exploits x == y to use y where x
: >was written is _not_ guaranteed to preserve things like a ring number,
: >which could be Bad News for people trying to write operating system
: >kernels in C.
: 
: If they're trying to represent (address+ring) as a C pointer, they're
: ALREADY in trouble!

I don't quite follow this.  On the PR1ME 50 series -- and some others --
the native "address" format *always* has a ring number in it.  It's not
a question of someone implementing C on such a machine _choosing_ to
represent address+ring as a C pointer, it's a question of them choosing
to represent C pointers as native hardware-type addresses, and those
addresses always have ring numbers in them.  If the C implementors choose
some other representation for C pointers than native hardware-type addresses,
then they're going to get a performance hit whenever there is an & or *.
All the address registers (remember that this thread is about address
registers?) have a ring number field in them.  The PCL (procedure call)
instruction constructs a stack frame where the argument slots are filled
in with addresses which contain ring numbers.  And so it goes.  There are
three C compilers on these machines that I've heard of, is there anyone on
the net who knows how the "pointers that refer to the same object must
compare equal" rule will affect any of them?

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/10/89)

In article <2072@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>In article <11016@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>: In article <2064@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>: >If that is so, then a C compiler which exploits x == y to use y where x
>: >was written is _not_ guaranteed to preserve things like a ring number,
>: >which could be Bad News for people trying to write operating system
>: >kernels in C.
>: If they're trying to represent (address+ring) as a C pointer, they're
>: ALREADY in trouble!
>I don't quite follow this.  On the PR1ME 50 series -- and some others --
>the native "address" format *always* has a ring number in it.  It's not
>a question of someone implementing C on such a machine _choosing_ to
>represent address+ring as a C pointer, it's a question of them choosing
>to represent C pointers as native hardware-type addresses, and those
>addresses always have ring numbers in them.

You're reading it backwards -- of course C pointers would be represented
using machine addresses, which in this case happen to have ring numbers
attached to them.  Your original concern was with OS kernels trying to
rely on the ring number extracted from a C pointer.  If the OS kernel
needs to deal with the ring number, it needs to do the bookkeeping some
other way.  Your example is essentially
	struct mach_addr { ... ring; ... loc; } x, y;
	if ( x.loc == y.loc )
		operation_on( y.ring, y.loc );
		/* where operation_on( x.ring, x.loc ) was intended */
When spelled out in detail, it looks more like the bug that it is.
The OS, in dealing with multiple active rings, has failed to check
whether the ring is appropriate, and if it were able to know that x.ring
was appropriate, it should also be able to retain the knowledge of what
x.ring happens to be and do the operation using that explicit ring
knowledge.  In fact a data structure similar to my "struct mach_addr"
may be desirable in such an OS implementation.

I don't think the difficulty of a ring-supporting OS implementation is
a very strong argument for imposing additional constraints on C pointer
implementation.

Pointer implementation that ignores ring number in comparisons will
perhaps work for user processes, but the kernel would have to be more
careful.  (It's probably written in PL/I anyhow.  You think C is hard
to use for such an OS implementation...)

Almost any reasonable multi-tasking OS is going to have to keep
straight which context a given address value belongs to.  It hasn't
been a big problem in UNIX so far; one merely has functions for
facilitating data access "across the OS/user interface", or even
better in some implementations, uses remapping to temporarily share
a portion of address space.  Right off hand, I don't see how it would
be much worse on a ring-based machine.

ok@cs.mu.oz.au (Richard O'Keefe) (09/11/89)

[This thread is about the effect of the "pointers are equal if and only
if they refer to the same object" rule on systems with ring numbers in
hardware addresses.]

Doug Gwyn wrote:
> Your original concern was with OS kernels trying to
> rely on the ring number extracted from a C pointer.  If the OS kernel
> needs to deal with the ring number, it needs to do the bookkeeping some
> other way.  Your example is essentially
> 	struct mach_addr { ... ring; ... loc; } x, y;
> 	if ( x.loc == y.loc )
> 		operation_on( y.ring, y.loc );
> 		/* where operation_on( x.ring, x.loc ) was intended */
> When spelled out in detail, it looks more like the bug that it is.

That doesn't look _quite_ like what I had in mind.  Here's an example
which I did have in mind:

	int store_for_user(int *p, int i)
	    {
		extern int *q;

		return p == q ? *p = i, 1 : 0;
	    }

where q is a pointer which has been created by kernel code (and thus has
ring number 0) while p is a pointer which has been passed by user code
(and thus has ring number 3).  The point is that on a machine with ring
numbers (or any other access right mechanism) encoded into machine
addresses, p and q can point to the same object without having the same
access rights to it.  If pointer equality ignores those access rights,
then pointer comparison on such machines (a) must be expensive and typically
can't be done using address registers and (b) has the bizarre property that
equality does not entail equivalence.  My point is that optimising compilers
generally assume that "equals may be subsituted for equals" (that's what
equality is all about), so that *correct* source code is very likely to be
incorrectly *optimised*.  For example, a compiler might translate my
store_for_user() example as if the return statement were

		return p == q ? *q = i : 0;

(q is not declared 'volatile').  It's not a question of allowing badly-
written programs to work; the programmer may have been quite careful about
which of several apparently equal pointers was used.  The point is that an
assumption which *is* true of numbers is true of pointers on a great many
machines, but under the pANS interpretation of "equality" it is not true,
so that there is a serious risk of compiler-writers getting it wrong.  At
the very least this needs to be spelled out in unmistakeable detail in
the Rationale so that compiler writers have no excuse for missing it.

Doug Gwyn wrote:
> I don't think the difficulty of a ring-supporting OS implementation is
> a very strong argument for imposing additional constraints on C pointer
> implementation.

Tu quoque: I don't think the difficulty of a loads-into-address-registers-
may-trap implementation is a very strong argument either.  

barmar@think.COM (Barry Margolin) (09/11/89)

In article <2075@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>  My point is that optimising compilers
>generally assume that "equals may be subsituted for equals" (that's what
>equality is all about), so that *correct* source code is very likely to be
>incorrectly *optimised*.

If the optimizer for an architecture where pointers may compare equal
but not be completely equivalent were to perform such a
transformation, it would be a buggy optimizer.  You appear to be
assuming a system-independent optimizer; what you've done is point out
why such a thing is not really a good idea.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

ok@cs.mu.oz.au (Richard O'Keefe) (09/11/89)

In article <29171@news.Think.COM>, barmar@think.COM (Barry Margolin) writes:
> In article <2075@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
[deleted]

> If the optimizer for an architecture where pointers may compare equal
> but not be completely equivalent were to perform such a
> transformation, it would be a buggy optimizer.

Yes, it would.  But the compiler writer would have been _seduced_ into
that mistake by the standard.  People are encouraged to think of == as
testing for EQUALITY.  In dpANS C, it appears that == does *NOT* have
the properties of equality, and at the very least this needs to be said
clearly and explicitly in the Rationale.

By the way, proper mathematical terminology would be that the pointers
are EQUIVALENT (under ==) but not EQUAL (because they may not be
substituted freely for one another).  It is precisely this kind of
jumbling of words and ideas which I claim is likely to confuse compiler
writers.

> You appear to be
> assuming a system-independent optimizer; what you've done is point out
> why such a thing is not really a good idea.

I am _sick_ of people trying to guess what I am assuming; they are _always_
wrong.  (At least in this thread they have been.)  I wasn't assuming a
system-dependent optimizer at all.  The point is that it is a standard
technique described in most good compiler books to keep track of which
locations hold the "same" value and to use the "cheapest" location with
the desired value.  Anyone writing an optimising compiler for C who was
not on the committee and hasn't been reading this thread, whether he is
trying to out-do GCC or just generate code for the HAL Whizzbang 99 is likely
to make the natural assumption that because == is _called_ an "EQUALITY
operator" it has the properties of equality.  The more cunning of them may
be aware of some problems with floating-point "equality", but they are
likely to also be aware that Fortran and Ada compilers are licensed to act
_as if_ floating-point equality was exact.


The bottom line for those of us who do not write compilers is that we
should take care that two pointers are only compared when
EITHER (a) the two cannot refer to the same object
  OR   (b) the two must have been generated in the same way, so that they
	   cannot have different access rights &c
and that a pointer must not be referred to in any way after it has been
free()d.  Even ptr = NULL is too risky:  what if the compiler generates a
sequence which uses a read-modify-write cycle with the old value being
loaded into an address register?  -- remember how CLR works on a 68000...

barmar@think.COM (Barry Margolin) (09/12/89)

In article <2079@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>In article <29171@news.Think.COM>, barmar@think.COM (Barry Margolin) writes:
>> If the optimizer for an architecture where pointers may compare equal
>> but not be completely equivalent were to perform such a
>> transformation, it would be a buggy optimizer.
>Yes, it would.  But the compiler writer would have been _seduced_ into
>that mistake by the standard.  People are encouraged to think of == as
>testing for EQUALITY.  In dpANS C, it appears that == does *NOT* have
>the properties of equality, and at the very least this needs to be said
>clearly and explicitly in the Rationale.

This is only true if there can actually be non-interchangeable
representations for pointers to the same location.  I'd expect the
compiler implementor for a system to know whether this is true, and
implement the optimizer accordingly.

By the way, since the difference between these representations is
outside the scope of the C standard, there's actually no reason such
optimizations couldn't be done (yes, I've changed my position from
when I concluded it would be a bug), as far as ANSI C is concerned.
In the case you described, the difference between using the two
pointers to the same object would be in whether it gets a memory
access violation because of ring numbers in the pointer; but C doesn't
specify the behavior regarding memory access violations.  If a C
implementation does performs such a transformation it may just not be
a good implementation to use for implementing certain inner-ring code
on such machines (although I'd expect the "volatile" modifier to be of
use in preventing such unwanted optimizations).  But it would still be
ANSI-conformant.

>> You appear to be
>> assuming a system-independent optimizer; what you've done is point out
>> why such a thing is not really a good idea.
>I am _sick_ of people trying to guess what I am assuming; they are _always_
>wrong.  (At least in this thread they have been.)  I wasn't assuming a
>system-dependent optimizer at all.  The point is that it is a standard
>technique described in most good compiler books to keep track of which
>locations hold the "same" value and to use the "cheapest" location with
>the desired value.  

But someone familiar enough with the architecture to be writing an
optimizer should know that "same" and "==" aren't necessarily the same
thing for pointers.  "Standard techniques" are fine, but they must be
understood in context.

>Anyone writing an optimising compiler for C who was
>not on the committee and hasn't been reading this thread, whether he is
>trying to out-do GCC or just generate code for the HAL Whizzbang 99 is likely
>to make the natural assumption that because == is _called_ an "EQUALITY
>operator" it has the properties of equality.  

Well, since I've reverse my position on whether that optimization is
valid (it's probably still not desirable, though), I don't see a major
problem with implementors who haven't read the thread.

>The bottom line for those of us who do not write compilers is that we
>should take care that two pointers are only compared when
>EITHER (a) the two cannot refer to the same object
>  OR   (b) the two must have been generated in the same way, so that they
>	   cannot have different access rights &c

Unfortunately, this may not be good enough.  What if the generated
code does the comparison and substitution without you writing the
comparison explicitly?  I doubt there's anything in the standard
prohibiting this.  I've never heard of an optimizer doing this, but I
don't read much compiler implementation literature; if I can conceive
of it, so can actual compiler researchers.

>and that a pointer must not be referred to in any way after it has been
>free()d.  Even ptr = NULL is too risky:  what if the compiler generates a
>sequence which uses a read-modify-write cycle with the old value being
>loaded into an address register?  -- remember how CLR works on a 68000...

Uh oh, how are you ever going to assign something to an automatic
pointer?  Their initial value is undefined, so they might contain
invalid addresses.  I suspect the standard requires assignments into
pointer variables to always work, regardless of the validity of the
previous value; the compiler will have to avoid generating
instructions that care what the old value is.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/12/89)

In article <2075@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>My point is that optimising compilers generally assume that "equals may
>be subsituted for equals" (that's what equality is all about), so that
>*correct* source code is very likely to be incorrectly *optimised*.

Any my point is that the code was NOT correct, because it relied on the
unwarranted assumption that C pointers' associated ring numbers were
preserved, while the C implementation specifically ignores the ring
numbers, which should be a tip-off that they should not be relied on.
I still maintain that the proper way to deal with this is via some
implementation-specific code that explicitly knows about ring numbers.

cudcv@warwick.ac.uk (Rob McMahon) (09/15/89)

In article <29250@news.Think.COM> barmar@think.COM (Barry Margolin) writes:
>>Yes, it would.  But the compiler writer would have been _seduced_ into that
>>mistake by the standard.  People are encouraged to think of == as testing
>>for EQUALITY.  In dpANS C, it appears that == does *NOT* have the properties
>>of equality, and at the very least this needs to be said clearly and
>>explicitly in the Rationale.
>
>This is only true if there can actually be non-interchangeable
>representations for pointers to the same location.  I'd expect the compiler
>implementor for a system to know whether this is true, and implement the
>optimizer accordingly.

Who remembers the Burroughs B6700, which had (in Algol)

	POINTER A, B;
	IF A = B THEN ...
vs
	IF A IS B THEN ...

compiling to

	A
	B
	EQUL (sp?)
vs.
	A
	B
	SAME

Sigh.  Those were the days ...

Rob
-- 
UUCP:   ...!mcvax!ukc!warwick!cudcv	PHONE:  +44 203 523037
JANET:  cudcv@uk.ac.warwick             ARPA:   cudcv@warwick.ac.uk
Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England

ok@cs.mu.oz.au (Richard O'Keefe) (09/15/89)

In article <206@titania.warwick.ac.uk>, cudcv@warwick.ac.uk (Rob McMahon) writes:
> Who remembers the Burroughs B6700, which had (in Algol)
> 	POINTER A, B;
> 	IF A = B THEN ...
> vs
> 	IF A IS B THEN ...

I remember the B6700, with considerable fondness.  You have got it _wrong_.
The difference between = and IS concerned _numbers_.  The B6700 used 3 tag
bits and 48 data bits (double it for double precision, where tag=2).  The
numeric representation was that floating point numbers were
	sign x (8 ** exponent) * mantissa.0
(octonary point on the _right_, so no hidden bit) and integers were just
floats with exponent 0.  There were 39 bits of mantissa, and the 48th
bit of the word was ignored.  (I believe this was for backwards
compatibility with the B5500/5700, where the 48th bit was the tag bit.)
'=' compared numbers for numeric equality, treating -0.0 the same as +0.0,
ignoring the 48th bit, and normalising as required.
'IS' compared 48-bit words for bitwise identity.  This was useful if you
were using them as bit-vectors (bitwise operations on REALs gave you the
full 48 bits) or to hold characters (6 EBCDIC characters in one word).
If I remember correctly, the SAME instruction also compared tags.

For a comparison, consider IBM/370 floating-point numbers.  They are not
necessarily normalised, and can represent both +0.0 and -0.0.  Using the
floating-point comparison instructions would be like '=', comparing /370
floats using integer comparisons would be like 'IS'.

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/15/89)

In article <206@titania.warwick.ac.uk> cudcv@warwick.ac.uk (Rob McMahon) writes:
>Who remembers the Burroughs B6700, ...

I do!  It was a nice system.  The architecture made it somewhat difficult
to implement Fortran (and, I would suspect, C), but it was accomplished
nevertheless.  Burroughs gambled on Algol taking over, once it had been
approved as an international standard.  That may have occurred to some
degree in Europe, but over here people followed IBM's lead..