gwyn@smoke.BRL.MIL (Doug Gwyn) (09/09/89)
In article <2064@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: >... I assume that x == y in C means "x and y point to the same place". >1. Does the current draft actually say this? What it says is that (apart from null pointers) pointers that compare equal refer to the same object (or function) and pointers that refer to the same object (ditto) compare equal. >If that is so, then a C compiler which exploits x == y to use y where x >was written is _not_ guaranteed to preserve things like a ring number, >which could be Bad News for people trying to write operating system >kernels in C. If they're trying to represent (address+ring) as a C pointer, they're ALREADY in trouble! >2. According to the current draft, are C compilers allowed to use > y instead of x if x == y and it is known that x and y cannot have > changed since that was established? Compilers can optimize however they want so long as the observable results (apart from timing considerations and other side-effects not within the scope of the Standard) are as though produced by the abstract C machine operating as specified in the Standard . >If, in order to avoid the problem of misteriously losing access rights, >getting pointers with the invalid bit on, and so on, compiler writers >adopt the "same pointer" semantics rather than the "same place" semantics, >the argument for doing pointer comparison in address registers goes away. So does speed of computation on many systems, where the natural way to do pointer arithmetic can produce several different representations for the same pointer. By limiting the need for "normalization" to just the comparison of pointers, considerable overhead is avoided. What you're calling "same pointer" is actually "same representation". So far as C is concerned "same pointer" means: both null or both refer to the same object or both refer to the same function. Details of representation are left up to the implementor.
ok@cs.mu.oz.au (Richard O'Keefe) (09/10/89)
In article <11016@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes: : In article <2064@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: : >If that is so, then a C compiler which exploits x == y to use y where x : >was written is _not_ guaranteed to preserve things like a ring number, : >which could be Bad News for people trying to write operating system : >kernels in C. : : If they're trying to represent (address+ring) as a C pointer, they're : ALREADY in trouble! I don't quite follow this. On the PR1ME 50 series -- and some others -- the native "address" format *always* has a ring number in it. It's not a question of someone implementing C on such a machine _choosing_ to represent address+ring as a C pointer, it's a question of them choosing to represent C pointers as native hardware-type addresses, and those addresses always have ring numbers in them. If the C implementors choose some other representation for C pointers than native hardware-type addresses, then they're going to get a performance hit whenever there is an & or *. All the address registers (remember that this thread is about address registers?) have a ring number field in them. The PCL (procedure call) instruction constructs a stack frame where the argument slots are filled in with addresses which contain ring numbers. And so it goes. There are three C compilers on these machines that I've heard of, is there anyone on the net who knows how the "pointers that refer to the same object must compare equal" rule will affect any of them?
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/10/89)
In article <2072@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: >In article <11016@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes: >: In article <2064@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: >: >If that is so, then a C compiler which exploits x == y to use y where x >: >was written is _not_ guaranteed to preserve things like a ring number, >: >which could be Bad News for people trying to write operating system >: >kernels in C. >: If they're trying to represent (address+ring) as a C pointer, they're >: ALREADY in trouble! >I don't quite follow this. On the PR1ME 50 series -- and some others -- >the native "address" format *always* has a ring number in it. It's not >a question of someone implementing C on such a machine _choosing_ to >represent address+ring as a C pointer, it's a question of them choosing >to represent C pointers as native hardware-type addresses, and those >addresses always have ring numbers in them. You're reading it backwards -- of course C pointers would be represented using machine addresses, which in this case happen to have ring numbers attached to them. Your original concern was with OS kernels trying to rely on the ring number extracted from a C pointer. If the OS kernel needs to deal with the ring number, it needs to do the bookkeeping some other way. Your example is essentially struct mach_addr { ... ring; ... loc; } x, y; if ( x.loc == y.loc ) operation_on( y.ring, y.loc ); /* where operation_on( x.ring, x.loc ) was intended */ When spelled out in detail, it looks more like the bug that it is. The OS, in dealing with multiple active rings, has failed to check whether the ring is appropriate, and if it were able to know that x.ring was appropriate, it should also be able to retain the knowledge of what x.ring happens to be and do the operation using that explicit ring knowledge. In fact a data structure similar to my "struct mach_addr" may be desirable in such an OS implementation. I don't think the difficulty of a ring-supporting OS implementation is a very strong argument for imposing additional constraints on C pointer implementation. Pointer implementation that ignores ring number in comparisons will perhaps work for user processes, but the kernel would have to be more careful. (It's probably written in PL/I anyhow. You think C is hard to use for such an OS implementation...) Almost any reasonable multi-tasking OS is going to have to keep straight which context a given address value belongs to. It hasn't been a big problem in UNIX so far; one merely has functions for facilitating data access "across the OS/user interface", or even better in some implementations, uses remapping to temporarily share a portion of address space. Right off hand, I don't see how it would be much worse on a ring-based machine.
ok@cs.mu.oz.au (Richard O'Keefe) (09/11/89)
[This thread is about the effect of the "pointers are equal if and only if they refer to the same object" rule on systems with ring numbers in hardware addresses.] Doug Gwyn wrote: > Your original concern was with OS kernels trying to > rely on the ring number extracted from a C pointer. If the OS kernel > needs to deal with the ring number, it needs to do the bookkeeping some > other way. Your example is essentially > struct mach_addr { ... ring; ... loc; } x, y; > if ( x.loc == y.loc ) > operation_on( y.ring, y.loc ); > /* where operation_on( x.ring, x.loc ) was intended */ > When spelled out in detail, it looks more like the bug that it is. That doesn't look _quite_ like what I had in mind. Here's an example which I did have in mind: int store_for_user(int *p, int i) { extern int *q; return p == q ? *p = i, 1 : 0; } where q is a pointer which has been created by kernel code (and thus has ring number 0) while p is a pointer which has been passed by user code (and thus has ring number 3). The point is that on a machine with ring numbers (or any other access right mechanism) encoded into machine addresses, p and q can point to the same object without having the same access rights to it. If pointer equality ignores those access rights, then pointer comparison on such machines (a) must be expensive and typically can't be done using address registers and (b) has the bizarre property that equality does not entail equivalence. My point is that optimising compilers generally assume that "equals may be subsituted for equals" (that's what equality is all about), so that *correct* source code is very likely to be incorrectly *optimised*. For example, a compiler might translate my store_for_user() example as if the return statement were return p == q ? *q = i : 0; (q is not declared 'volatile'). It's not a question of allowing badly- written programs to work; the programmer may have been quite careful about which of several apparently equal pointers was used. The point is that an assumption which *is* true of numbers is true of pointers on a great many machines, but under the pANS interpretation of "equality" it is not true, so that there is a serious risk of compiler-writers getting it wrong. At the very least this needs to be spelled out in unmistakeable detail in the Rationale so that compiler writers have no excuse for missing it. Doug Gwyn wrote: > I don't think the difficulty of a ring-supporting OS implementation is > a very strong argument for imposing additional constraints on C pointer > implementation. Tu quoque: I don't think the difficulty of a loads-into-address-registers- may-trap implementation is a very strong argument either.
barmar@think.COM (Barry Margolin) (09/11/89)
In article <2075@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: > My point is that optimising compilers >generally assume that "equals may be subsituted for equals" (that's what >equality is all about), so that *correct* source code is very likely to be >incorrectly *optimised*. If the optimizer for an architecture where pointers may compare equal but not be completely equivalent were to perform such a transformation, it would be a buggy optimizer. You appear to be assuming a system-independent optimizer; what you've done is point out why such a thing is not really a good idea. Barry Margolin Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
ok@cs.mu.oz.au (Richard O'Keefe) (09/11/89)
In article <29171@news.Think.COM>, barmar@think.COM (Barry Margolin) writes: > In article <2075@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: [deleted] > If the optimizer for an architecture where pointers may compare equal > but not be completely equivalent were to perform such a > transformation, it would be a buggy optimizer. Yes, it would. But the compiler writer would have been _seduced_ into that mistake by the standard. People are encouraged to think of == as testing for EQUALITY. In dpANS C, it appears that == does *NOT* have the properties of equality, and at the very least this needs to be said clearly and explicitly in the Rationale. By the way, proper mathematical terminology would be that the pointers are EQUIVALENT (under ==) but not EQUAL (because they may not be substituted freely for one another). It is precisely this kind of jumbling of words and ideas which I claim is likely to confuse compiler writers. > You appear to be > assuming a system-independent optimizer; what you've done is point out > why such a thing is not really a good idea. I am _sick_ of people trying to guess what I am assuming; they are _always_ wrong. (At least in this thread they have been.) I wasn't assuming a system-dependent optimizer at all. The point is that it is a standard technique described in most good compiler books to keep track of which locations hold the "same" value and to use the "cheapest" location with the desired value. Anyone writing an optimising compiler for C who was not on the committee and hasn't been reading this thread, whether he is trying to out-do GCC or just generate code for the HAL Whizzbang 99 is likely to make the natural assumption that because == is _called_ an "EQUALITY operator" it has the properties of equality. The more cunning of them may be aware of some problems with floating-point "equality", but they are likely to also be aware that Fortran and Ada compilers are licensed to act _as if_ floating-point equality was exact. The bottom line for those of us who do not write compilers is that we should take care that two pointers are only compared when EITHER (a) the two cannot refer to the same object OR (b) the two must have been generated in the same way, so that they cannot have different access rights &c and that a pointer must not be referred to in any way after it has been free()d. Even ptr = NULL is too risky: what if the compiler generates a sequence which uses a read-modify-write cycle with the old value being loaded into an address register? -- remember how CLR works on a 68000...
barmar@think.COM (Barry Margolin) (09/12/89)
In article <2079@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: >In article <29171@news.Think.COM>, barmar@think.COM (Barry Margolin) writes: >> If the optimizer for an architecture where pointers may compare equal >> but not be completely equivalent were to perform such a >> transformation, it would be a buggy optimizer. >Yes, it would. But the compiler writer would have been _seduced_ into >that mistake by the standard. People are encouraged to think of == as >testing for EQUALITY. In dpANS C, it appears that == does *NOT* have >the properties of equality, and at the very least this needs to be said >clearly and explicitly in the Rationale. This is only true if there can actually be non-interchangeable representations for pointers to the same location. I'd expect the compiler implementor for a system to know whether this is true, and implement the optimizer accordingly. By the way, since the difference between these representations is outside the scope of the C standard, there's actually no reason such optimizations couldn't be done (yes, I've changed my position from when I concluded it would be a bug), as far as ANSI C is concerned. In the case you described, the difference between using the two pointers to the same object would be in whether it gets a memory access violation because of ring numbers in the pointer; but C doesn't specify the behavior regarding memory access violations. If a C implementation does performs such a transformation it may just not be a good implementation to use for implementing certain inner-ring code on such machines (although I'd expect the "volatile" modifier to be of use in preventing such unwanted optimizations). But it would still be ANSI-conformant. >> You appear to be >> assuming a system-independent optimizer; what you've done is point out >> why such a thing is not really a good idea. >I am _sick_ of people trying to guess what I am assuming; they are _always_ >wrong. (At least in this thread they have been.) I wasn't assuming a >system-dependent optimizer at all. The point is that it is a standard >technique described in most good compiler books to keep track of which >locations hold the "same" value and to use the "cheapest" location with >the desired value. But someone familiar enough with the architecture to be writing an optimizer should know that "same" and "==" aren't necessarily the same thing for pointers. "Standard techniques" are fine, but they must be understood in context. >Anyone writing an optimising compiler for C who was >not on the committee and hasn't been reading this thread, whether he is >trying to out-do GCC or just generate code for the HAL Whizzbang 99 is likely >to make the natural assumption that because == is _called_ an "EQUALITY >operator" it has the properties of equality. Well, since I've reverse my position on whether that optimization is valid (it's probably still not desirable, though), I don't see a major problem with implementors who haven't read the thread. >The bottom line for those of us who do not write compilers is that we >should take care that two pointers are only compared when >EITHER (a) the two cannot refer to the same object > OR (b) the two must have been generated in the same way, so that they > cannot have different access rights &c Unfortunately, this may not be good enough. What if the generated code does the comparison and substitution without you writing the comparison explicitly? I doubt there's anything in the standard prohibiting this. I've never heard of an optimizer doing this, but I don't read much compiler implementation literature; if I can conceive of it, so can actual compiler researchers. >and that a pointer must not be referred to in any way after it has been >free()d. Even ptr = NULL is too risky: what if the compiler generates a >sequence which uses a read-modify-write cycle with the old value being >loaded into an address register? -- remember how CLR works on a 68000... Uh oh, how are you ever going to assign something to an automatic pointer? Their initial value is undefined, so they might contain invalid addresses. I suspect the standard requires assignments into pointer variables to always work, regardless of the validity of the previous value; the compiler will have to avoid generating instructions that care what the old value is. Barry Margolin Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/12/89)
In article <2075@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: >My point is that optimising compilers generally assume that "equals may >be subsituted for equals" (that's what equality is all about), so that >*correct* source code is very likely to be incorrectly *optimised*. Any my point is that the code was NOT correct, because it relied on the unwarranted assumption that C pointers' associated ring numbers were preserved, while the C implementation specifically ignores the ring numbers, which should be a tip-off that they should not be relied on. I still maintain that the proper way to deal with this is via some implementation-specific code that explicitly knows about ring numbers.
cudcv@warwick.ac.uk (Rob McMahon) (09/15/89)
In article <29250@news.Think.COM> barmar@think.COM (Barry Margolin) writes: >>Yes, it would. But the compiler writer would have been _seduced_ into that >>mistake by the standard. People are encouraged to think of == as testing >>for EQUALITY. In dpANS C, it appears that == does *NOT* have the properties >>of equality, and at the very least this needs to be said clearly and >>explicitly in the Rationale. > >This is only true if there can actually be non-interchangeable >representations for pointers to the same location. I'd expect the compiler >implementor for a system to know whether this is true, and implement the >optimizer accordingly. Who remembers the Burroughs B6700, which had (in Algol) POINTER A, B; IF A = B THEN ... vs IF A IS B THEN ... compiling to A B EQUL (sp?) vs. A B SAME Sigh. Those were the days ... Rob -- UUCP: ...!mcvax!ukc!warwick!cudcv PHONE: +44 203 523037 JANET: cudcv@uk.ac.warwick ARPA: cudcv@warwick.ac.uk Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England
ok@cs.mu.oz.au (Richard O'Keefe) (09/15/89)
In article <206@titania.warwick.ac.uk>, cudcv@warwick.ac.uk (Rob McMahon) writes: > Who remembers the Burroughs B6700, which had (in Algol) > POINTER A, B; > IF A = B THEN ... > vs > IF A IS B THEN ... I remember the B6700, with considerable fondness. You have got it _wrong_. The difference between = and IS concerned _numbers_. The B6700 used 3 tag bits and 48 data bits (double it for double precision, where tag=2). The numeric representation was that floating point numbers were sign x (8 ** exponent) * mantissa.0 (octonary point on the _right_, so no hidden bit) and integers were just floats with exponent 0. There were 39 bits of mantissa, and the 48th bit of the word was ignored. (I believe this was for backwards compatibility with the B5500/5700, where the 48th bit was the tag bit.) '=' compared numbers for numeric equality, treating -0.0 the same as +0.0, ignoring the 48th bit, and normalising as required. 'IS' compared 48-bit words for bitwise identity. This was useful if you were using them as bit-vectors (bitwise operations on REALs gave you the full 48 bits) or to hold characters (6 EBCDIC characters in one word). If I remember correctly, the SAME instruction also compared tags. For a comparison, consider IBM/370 floating-point numbers. They are not necessarily normalised, and can represent both +0.0 and -0.0. Using the floating-point comparison instructions would be like '=', comparing /370 floats using integer comparisons would be like 'IS'.
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/15/89)
In article <206@titania.warwick.ac.uk> cudcv@warwick.ac.uk (Rob McMahon) writes: >Who remembers the Burroughs B6700, ... I do! It was a nice system. The architecture made it somewhat difficult to implement Fortran (and, I would suspect, C), but it was accomplished nevertheless. Burroughs gambled on Algol taking over, once it had been approved as an international standard. That may have occurred to some degree in Europe, but over here people followed IBM's lead..