[comp.compilers] Pointers in C

pfeiffer@herve.cs.wisc.edu (Phil Pfeiffer) (07/06/88)

Before other comp.compilers readers point out that my posting about
C's semantics was not entirely accurate:

When I posted that C's semantic model allows unconstrained use of pointers,
I was relying on my experience with the Unix C compiler, and I did not
double-check K&R before posting.  My mistake.  I received two messages
today from Bob Larson (blarson%skat.usc.edu@oberon.usc.edu) that I'd like
to pass along, with his permission.

> But C does constrain pointer arithmetic to the bounds of the array.
> (ANSI will allow the address following the array to be calculated but
> not referenced.)  Most compilers don't enforce this, but it is there
> in K&R ....
>
>K&R 1, page 98:
>"But all bets are off if you do arithmetic or comparisons with pointers
>pointing to different arrays.  If you're lucky, you'll get obvious
>nonsense on all machines.  If you're unlucky, your code will work on one
>machine but collapse mysteriously on another."
>
>This doesn't seem to be restated in the reference manual section.
>
>My copy of K&R 2 is elsewhere, but I'm pretty sure the restriction still
>holds.  (My info on ANSI C is mostly from comp.lang.c and comp.std.c,
>so is less than perfectly reliable.)
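
A minimal sketch of the rule quoted above, written in ANSI-style C.  The
names count_positive and sample are hypothetical and exist only for this
illustration.  The loop computes the address one past the end of the array
and compares against it, but never dereferences it, and both pointers in
the comparison refer to the same array:

```c
#include <stddef.h>

/* Hypothetical sample data, used only for illustration. */
static const int sample[5] = { 3, -1, 0, 7, 2 };

/* Count the positive elements of a[0..n-1]. */
size_t count_positive(const int *a, size_t n)
{
    const int *end = a + n;   /* one past the last element: computed, never dereferenced */
    const int *p;
    size_t count = 0;

    for (p = a; p < end; p++) /* p and end both refer to the same array */
        if (*p > 0)
            count++;
    return count;
}
```

Comparing p against a pointer into some unrelated array would fall under
the "all bets are off" warning, even if it happens to work on a flat
address space.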

Also, on page 90 of K&R (version 1):

"You should also note the implications in the declaration that a pointer is
constrained to point to a particular kind of object."

I guess this is why formal language specification and compiler validation
were invented.


-- Phil
[From Phil Pfeiffer <pfeiffer@herve.cs.wisc.edu>]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

louie@trantor.umd.edu (Louis A. Mamakos) (07/06/88)

I use a compiler that really does enforce the rule that two pointers being
compared (or otherwise operated on together) must point into the same object.
The compiler is for a Unisys 1100 mainframe, which is a ones' complement,
36-bit, word-addressable machine.  This thing is a walking validation test
for well-written C programs.

Its pointers are generally 2 words (8 9-bit bytes) long, except for pointers
to functions, which are 8 words (32 9-bit bytes) long.  Even with all of the
obvious adversities, it is not difficult to write C code.  You have to watch
out for code which thinks that pointers can be put into ints (or longs) and
then back again, but other than that, things work.

You will also see problems like this, I suspect, on 8086 type segmented
architectures.  Or so it would seem; I don't use such things.


Now, if I can just convince people that

			-1 != ~0

You see, on a ones' complement computer you have both +0 and -0, and

			-0 == ~0

Fun stuff.
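
The point can be checked with a small simulation.  This is only a sketch:
it models a 36-bit ones' complement word in the low bits of a 64-bit
integer, and the names WORD_MASK, oc_not and oc_neg are made up for the
illustration.  It compares bit patterns only and says nothing about how
the real hardware treats +0 and -0 in an arithmetic comparison.

```c
#include <stdint.h>

/* A 36-bit word kept in the low bits of a 64-bit integer
   (an assumption of this simulation, not a property of the machine). */
#define WORD_MASK UINT64_C(0xFFFFFFFFF)

/* Bitwise complement of a 36-bit word: what ~ does. */
static uint64_t oc_not(uint64_t w)
{
    return ~w & WORD_MASK;
}

/* In ones' complement, arithmetic negation is also a bitwise complement. */
static uint64_t oc_neg(uint64_t w)
{
    return ~w & WORD_MASK;
}
```

Here oc_neg(1) is all ones except the low bit, while oc_not(0) is all
ones, which is the bit pattern of oc_neg(0), that is, -0.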

Louis A. Mamakos  WA3YMH    Internet: louie@TRANTOR.UMD.EDU
University of Maryland, Computer Science Center - Systems Programming

mrspock@hubcap.clemson.edu (Steve Benz) (07/06/88)

From article <1262@ima.ISC.COM>, by pfeiffer@herve.cs.wisc.edu (Phil Pfeiffer):
> ...my posting about C's semantics was not totally accurate...
> 
>>K&R 1, page 98:
>>"But all bets are off if you do arithmetic or comparisons with pointers
>>pointing to different arrays.  If you're lucky, you'll get obvious
>>nonsense on all machines.  If you're unlucky, your code will work on one
>>machine but collapse mysteriously on another."

To combine this with what was said before:  An optimizer can assume
that once a pointer, 'p', is assigned to an address within an array, 'a',
'p' will stay within the bounds of 'a' until assigned to an address
within some other array or in the heap space.

While this assumption is valid (according to the book), it is also true
that optimized code may behave differently from unoptimized code,
but only for programs that are outside the domain of valid K&R C programs.
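
A rough sketch of what that assumption buys, with hypothetical names
(scale, table, fill_table).  Because 'p' is set to an address within
'table' and only stepped through it, an optimizer relying on the K&R rule
could load 'scale' into a register once, before the loop, instead of
reloading it after every store through 'p':

```c
static int scale = 3;   /* scalar the optimizer would like to keep in a register */
static int table[4];

int fill_table(void)
{
    int *p = table;     /* p is assigned an address within table */
    int i;

    for (i = 0; i < 4; i++)
        *p++ = i * scale;   /* stores stay inside table, so scale is unchanged */
    return table[3];
}
```

If the loop ran past the end of 'table', a store might land on 'scale',
and the optimized and unoptimized versions could then disagree.  That is
exactly the class of program the book rules out.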

Nevertheless, I can think of a rather large set of programs that
aren't "valid K&R C programs":  All those programs that use varargs.
(At least by the definitions of varargs that I've seen.)
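
For comparison, here is a small sketch using the ANSI <stdarg.h>
interface, which hides the argument traversal behind va_start, va_arg and
va_end rather than exposing a pointer walking through the argument area,
as pre-ANSI <varargs.h> implementations typically did.  The function name
sum_ints is hypothetical.

```c
#include <stdarg.h>

/* Add up 'count' int arguments that follow the count itself. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);            /* begin after the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) yields 6.  How va_list is represented
is left to the implementation, so the program itself never does arithmetic
on a pointer outside an array.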

> Also, on page 90 of K&R (version 1):
> "You should also note the implications in the declaration that a pointer is
> constrained to point to a particular kind of object."

I think a C compiler would be very hard pressed to guarantee this sort
of thing, with the cast operation lurking about.  The only way to absolutely
guarantee this is to do some unpleasantly complicated typechecking on
every object in memory.
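
A small sketch of the cast problem, using the hypothetical function
nonzero_bytes.  The declaration says 'p' points to unsigned char, yet
after the cast it actually walks over the storage of a long, and nothing
in the declaration of 'p' records that fact:

```c
#include <stddef.h>

/* Count the bytes of 'value' that are not zero. */
size_t nonzero_bytes(long value)
{
    /* The cast makes a pointer-to-char refer to a long object. */
    const unsigned char *p = (const unsigned char *)&value;
    size_t count = 0;
    size_t i;

    for (i = 0; i < sizeof value; i++)
        if (p[i] != 0)
            count++;
    return count;
}
```

Inspecting an object through a character pointer is one of the tamer uses
of a cast; the same syntax permits conversions that are far harder for a
compiler to reason about.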

Insofar as mainstream C compilers go, I think you have to accept Phil's
original assertion about pointer operations -- at least within the heap
space.  If you can circumvent the varargs problem, you might be able to
get by with the K&R definition in the stack space.

				- Steve Benz

daveb@geac.uucp (David Collier-Brown) (07/13/88)

In article <2109@hubcap.UUCP> mrspock@hubcap.clemson.edu (Steve Benz) writes:
[discussion of binding of pointers to arrays/heap]
|Nevertheless, I can think of a rather large set of programs that
|aren't "valid K&R C programs":  All those programs that use varargs.
|(At least by the definitions of varargs that I've seen.)

Yes, it is hard to make any useful assumptions about a pointer being
used for varargs processing, or for wandering through a
core/executable file in a debugger (the other "fun" example I run
into often).

|> Also, on page 90 of K&R (version 1):
|> "You should also note the implications in the declaration that a pointer is
|> constrained to point to a particular kind of object."
|
|I think a C compiler would be very hard pressed to guarantee this sort
|of thing, with the cast operation lurking about.  The only way to absolutely
|guarantee this is to do some unpleasantly complicated typechecking on
|every object in memory.

  In some sense, that's what an optimizing compiler tends to have to
do.  In tightly-typed languages, there's more information about the
use of a pointer (and syntactic sugar), but one can extract the
required information from C in many of the common cases, based on 
the **particular** optimization being applied.

  
Let's look at the example that started this discussion thread: a
pointer in a copy routine which, because it can point anywhere, is
proposed to invalidate my register history... (I'm assuming that the
optimization in question is fetch minimization).

  If the pointer is kept "within bounds" by the copy routine, then
we can assume that passing the pointer to the routine has exactly
the same effect as a local assignment through the pointer. 

  If the pointer is not being kept within bounds, both the
non-optimized copy code and the optimized calling code fail.

  But it fails in a manner which does not involve the optimization!
If a function scribbles on memory, and I'm trying to do
fetch-minimization from memory, it doesn't matter if the fetch
minimization is rendered incorrect, because the fetch **itself** has
been rendered incorrect by the scribbler. 
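
A sketch of the situation being described, with hypothetical names
(copy_bytes, copy_then_use).  The caller holds 'count' in what would
likely be a register and passes a pointer to 'buf' into a copy routine.
If the routine stays within 'buf', reusing the register copy of 'count'
after the call is safe.  If it runs off the end, the caller is broken
whether or not the fetch was eliminated.

```c
#include <stddef.h>

/* Copy n bytes from src to dst; stays within bounds if the caller
   passes a correct n. */
static void copy_bytes(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

int copy_then_use(void)
{
    char buf[4];
    int count = 3;      /* candidate for fetch minimization */

    copy_bytes(buf, "abc", (size_t)count);

    /* The compiler may reuse its register copy of count here
       instead of fetching it from memory again. */
    return count + (buf[0] == 'a');
}
```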

  You can apply this same kind of argument to a number of common
cases of optimization in the presence of errors, to discover which
optimizations are independent of/orthogonal to the error... And
apply more optimizations than expected on first glance.  (In
a real sense, you're doing complicated mental typechecking on the
operations when you write the optimizer).

  In my global pessimizer, I assume all registers and all stack
entries are invalid at all times, and so generate code which
depends only on read-only data in the linkage segment and random
numbers in the heap.

--dave (whats a correctness?) c-b

  
-- 
 David Collier-Brown.  {mnetor yunexus utgpu}!geac!daveb
 Geac Computers Ltd.,  |  Computer science loses its
 350 Steelcase Road,   |  memory, if not its mind,
 Markham, Ontario.     |  every six months.