[comp.lang.c] char

greg@utcsri.UUCP (01/01/70)

In article <253@xyzzy.UUCP> throopw@xyzzy.UUCP (Wayne A. Throop) writes:
>> franka@mmintl.UUCP (Frank Adams)
>> So does arithmetic on a null pointer produce undefined results?  I don't
>> have a copy of the proposed standard available, so I don't know what it
>> says.  This is about what it *should* say; if it doesn't, it should be
>> changed.
...
>on this point is that the standard does *NOT* say that arithmetic on the
>null pointer produces undefined results (contrary to my own
...
>> [it should be undefined because...]
>> So we have the general principle that pointer arithmetic should not be able
>> to adjust the value of the pointer outside the guaranteed "neighborhood" of
>> legal values near that pointer.

>> In the case of a null pointer, the only legal value in that neighborhood
>> is null itself; thus "(char *)0 + 1" produces an undefined result.
>> ("(char *)0 + 0" would be legal, and equivalent to "(char *)0".)

I agree that it should be illegal to do any kind of arithmetic on
a null pointer of any type.

All this stuff gets very interesting when you consider what happens on
an 80286 in its native 'protected' mode (as opposed to the 'fast 8086' mode
in which most of them are warming their sockets).

In this mode, a pointer is 32 bits; a 16-bit segment number, and a 16-bit
offset. It is meaningless to do arithmetic on the segment number since
it is just an index into a table maintained by the OS. Pointer
arithmetic as we know it in C affects only the offset.

The CPU supports a 'null' pointer as follows: Any pointer whose segment
part is zero is considered a null pointer. It is not legal to dereference
such a pointer, and it is not legal to load one into the stack-pointer
register pair (SS:SP) or the program-counter register pair (CS:IP).
Violations cause hardware traps.

It is legal to load a null pointer as a 'data pointer'. (What this really
means is that you can put 0 into DS and ES but not CS or SS). Thus the
code for incrementing a pointer, when given a null pointer, will always
produce a null pointer.
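
The behaviour described above can be modelled in a few lines of C. This is only a sketch: `struct far_ptr`, `far_is_null` and `far_add` are hypothetical names, and the model ignores everything about the 286 except the segment/offset split.

```c
#include <stdint.h>

/* Hypothetical model of a 286 protected-mode pointer. */
struct far_ptr {
    uint16_t segment;   /* selector: an index into an OS-maintained table */
    uint16_t offset;    /* byte offset within that segment */
};

/* A pointer counts as null when its segment part is zero,
   whatever the offset happens to be. */
int far_is_null(struct far_ptr p)
{
    return p.segment == 0;
}

/* Pointer arithmetic touches only the offset, modulo 65536. */
struct far_ptr far_add(struct far_ptr p, int32_t bytes)
{
    p.offset = (uint16_t)(p.offset + (uint32_t)bytes);
    return p;
}
```

In this model, incrementing a null pointer changes its offset but leaves the segment at zero, so the result still tests as null.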

The other weird bit concerns the range of these pointers. The
compiler may assign a separate segment for every data object.
The segment has a size, and any reference to that segment beyond this
size causes a trap.
Suppose I declare 'int foo[10]', then I may get a 20-byte segment for
foo. Then &foo[10] is a pointer which is illegal to dereference.
This is good.
There are lots of bits of code like this:
	for( p = foo; p < &foo[10]; ++p ){
which cause p to be repeatedly compared to a constant invalid pointer until it
becomes an invalid pointer itself.  I can live with that.

What gets a little weird is this: pointer inequalities are done by comparing
only the offset part, since the comparison is invalid anyway if the segment
numbers are different. Also, offset arithmetic is done in 16 bits.  This
means that &foo[-1] is not only an invalid pointer, but it will be 'greater
than' &foo[0] since it will have an offset of 0xfffe. What this means is that
the following won't work:
	for( p = &foo[9]; p >= foo; --p ){	/* loops forever */
Furthermore, if I declare a 64K segment ( int foo64[32768] ), the (overflowed)
value of &foo64[32767] + 1 is the same as &foo64[0]. Thus not even this
will work:
	for( p = foo64; p <= &foo64[32767]; ++p ){	/* loops forever */
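
Both failures can be reproduced on any machine by simulating the 16-bit offsets. The following is a sketch under stated assumptions: 2-byte ints, comparisons on offsets only, and hypothetical names (`offset_t`, `elem_offset`, `descending_iterations`). It is not code for a real 286 compiler.

```c
#include <stdint.h>

/* Hypothetical model: an offset is 16 bits and wraps modulo 65536. */
typedef uint16_t offset_t;

/* Offset of element `index` of an array of 2-byte ints whose first
   element sits at offset `base` in its segment. */
offset_t elem_offset(offset_t base, long index)
{
    return (offset_t)(base + (uint32_t)(index * 2));
}

/* Number of iterations of
       for( p = &foo[9]; p >= foo; --p )
   in the model, giving up after `limit` iterations. */
long descending_iterations(offset_t base, long limit)
{
    offset_t p = elem_offset(base, 9);
    long n = 0;

    while (p >= base && n < limit) {
        p = (offset_t)(p - 2);   /* --p on a 2-byte int pointer */
        n++;
    }
    return n;
}
```

With `base` equal to 0 (one segment per object), `elem_offset(0, -1)` is 0xfffe and the descending loop runs until the limit stops it. With a nonzero base such as 100 the loop terminates after ten iterations, which is why the bug can hide on other memory layouts. Likewise `elem_offset(0, 32768)` wraps to 0, the offset of the first element.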

In order to avoid these problems, then, we need a class of pointers
which cannot be dereferenced but which can be used in comparisons.
It is sufficient that these pointers be restricted to the form (&x)+1,
where x is any valid data object. (&x)+1 > (&x) must always hold for
any data object x (which rules out a full 64k byte segment on a 286).
It would be nice if &x-1 were always less than &x, but that is not
possible under this segmentation scheme.
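
A descending loop can be written so that it relies only on the (&x)+1 form and never forms &foo[-1]. This is a minimal sketch; `sum_descending` is a hypothetical name chosen for illustration.

```c
#include <stddef.h>

/* Sum a[0..n-1] from last to first without ever forming a
   pointer below a[0]. */
long sum_descending(const int *a, size_t n)
{
    const int *p = a + n;   /* one past the end: formed, never dereferenced */
    long total = 0;

    while (p != a) {
        --p;                /* p now points at an element of a */
        total += *p;
    }
    return total;
}
```

The test uses `!=` against the start of the array, so the pointer is decremented only when there is still an element below it.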

The ANSI standard must have something about such pointers. Do they
say roughly the same thing about them as I have in the preceding paragraph?

Sorry for all the blather, but I have noticed several previous
postings that have overlooked these considerations. These people
may never have to program on such an architecture, but it seems
like it isn't too much trouble to avoid constructs which won't port.
What I am looking for is a somewhat more concrete definition of
which constructs will and won't work.

[ e.g. what about: p = &foo[-1]; do{ ++p; ... }while( p <= &foo[9] );
 Does the first ++p cause p to be &foo[0]?
 Can I legally add 4123 to &foo[0], and if I then subtract 4120 do
 I get &foo[3]?
]
P.S. I am not a segment fan, but a pragmatist recently transplanted
to the real world ( arrggg! ).
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

msb@sq.UUCP (08/11/87)

Regarding the code...

> >   main(a)
> >   char (*a)[];
> >   { a = 0; printf("a=0x%x\n", a); a++; printf("a=0x%x\n", a);	}

Wayne Throop writes:

> But the scariest thing about all this is that *none* *of* *my* *tools*
> *caught* *this* *bug*!!!!  Lint happily passed the program ...
> And the compiler didn't complain ...

Same on our machine, by the way.

> (By the way, for those of you who missed it, the program is illegal for
> the obvious reason that it increments a pointer to an object of unknown
> size,

Actually, *declaring* such a pointer is probably illegal.  The language
in K&R appendix A section 8.4 is a bit fuzzy, but seems to imply this;
and section 3.5.3.2 of the (Oct.'86) ANSI draft nails it down clearly.

> but *also* because it performs arithmetic on a null pointer, and
> of course, this is illegal.)

Um, I don't think so, Wayne; it's just that the result, if you indirect
through such a pointer, is undefined.  K&R is silent on this, but ANSI
3.3.6 seems pretty clear.  And here the pointer isn't being indirected through.

The OTHER thing that's wrong with the code is that a "%x" format is used
to print a pointer variable.  "%x" is used to print ints, or at least,
things that printf() can pretend are ints.  Pointers needn't be the
same size as ints.  It's much safer to do this:

	printf ("a=0x%lx\n", (long) a);

Then you get surprised only if the pointers won't even fit in a long.

ANSI has a better solution to this: the new format "%p".  (See 4.9.6.1).
On an ANSI compiler, you would write:

	printf ("a=%p\n", (void *) a);

and be guaranteed reasonable results.  But I don't think "%p" exists yet.
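
On a compiler that does provide "%p", the call can be wrapped as below. This is a sketch only: `format_pointer` is a hypothetical helper, and it uses `snprintf`, which was standardized later (C99) than anything discussed in this thread. The text produced by "%p" is implementation-defined, so no particular output is assumed.

```c
#include <stdio.h>

/* Format a pointer value into buf with %p.  Returns what snprintf
   returns: the number of characters the full text needs. */
int format_pointer(char *buf, size_t size, const void *p)
{
    /* %p expects a void *, so the argument is cast explicitly. */
    return snprintf(buf, size, "a=%p", (void *)p);
}
```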

Mark Brader, utzoo!sq!msb			C unions never strike!

strouckn@nvpna1.UUCP (Louis Stroucken 42720) (08/12/87)

In article <1987Aug10.192923.7879@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>
>Regarding the code...
>
>> >   main(a)
>> >   char (*a)[];
       [ discussion whether "a++;" should do something sensible ]
>
>Actually, *declaring* such a pointer is probably illegal.  The language
>in K&R appendix A section 8.4 is a bit fuzzy, but seems to imply this;
>and section 3.5.3.2 of the (Oct.'86) ANSI draft nails it down clearly.

I haven't got any ANSI draft here, so I'd better stay out of the
discussion, but:

Please note that "a" is a formal argument of main!!

K&R appendix A section 10.4 says on array arguments:
	...formal parameters declared "array of..." are adjusted to read
	"pointer to...".

The declaration of "a" might as well read "char **a;". "a++;" should
increment "a" with sizeof( char * ) bytes.

If I miss something, please let me know.

Louis Stroucken

UUCP:  ...!mcvax!philmds!prle1!nvpna1!strouckn

chris@mimsy.UUCP (Chris Torek) (08/12/87)

In article <234@nvpna1.UUCP> strouckn@nvpna1.UUCP (Louis Stroucken) writes:
>Please note that "a" is a formal argument of main!!

>K&R appendix A section 10.4 says on array arguments:
>	...formal parameters declared "array of..." are adjusted to read
>	"pointer to...".

>The declaration of "a" might as well read "char **a;". "a++;" should
>increment "a" with sizeof( char * ) bytes.

The type of `a' in `char (*a)[]' is `pointer to array <unspecified
size> of char'.  Aside from the fact that pointers to arrays of
unspecified size are illegal[1], this declaration is correct and
cannot be altered.  The adjustment is for formal parameters declared
`array ...', not `... array ...'.  The single ellipsis means that
the array type must come first.

-----
[1]This illegality is in fact unnecessary; a pointer to an array
   of unspecified size can be dereferenced.  It cannot be used in
   any pointer arithmetic except to add or subtract zero.  Nonetheless
   it was deemed illegal, and this loses nothing, since C does not
   have dynamic arrays.  (C has dynamic memory allocation, but what
   you get are flat blocks of address space, though they are not
   necessarily contained within a globally flat space.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

karl@haddock.ISC.COM (Karl Heuer) (08/13/87)

In article <1987Aug10.192923.7879@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>Wayne Throop writes:
>>... but *also* because it performs arithmetic on a null pointer, and
>>of course, this is illegal.)
>
>Um, I don't think so, Wayne; it's just that the result, if you indirect
>through such a pointer, is undefined.  K&R is silent on this, but ANSI
>3.3.6 seems pretty clear.

"A.6.2 Undefined behavior: ... A pointer that is not to a member of an array
object is added to or subtracted from" [Oct86 dpANS].  A null pointer is an
extreme example of this.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

karl@haddock.ISC.COM (Karl Heuer) (08/13/87)

In article <234@nvpna1.UUCP> strouckn@nvpna1.UUCP (Louis Stroucken 42720) writes:
>[In the declaration "main(a) char (*a)[]; ..."] Please note that "a" is a
>formal argument of main!!  K&R [says] `...formal parameters declared "array
>of..." are adjusted to read "pointer to...".'

Which is irrelevant, since "a" is not an array.  It is a pointer to an array
(note the parens in the declaration), which is not converted.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

throopw@xyzzy.UUCP (Wayne A. Throop) (08/13/87)

> msb@sq.uucp (Mark Brader),   >> throopw@xyzzy.UUCP
>>>   main(a)
>>>   char (*a)[];
>>>   { a = 0; printf("a=0x%x\n", a); a++; printf("a=0x%x\n", a);	}
>> the program is illegal for
>> the obvious reason that it increments a pointer to an object of unknown
>> size,
> Actually, *declaring* such a pointer is probably illegal.  

True.  Draft X3J11 limits arrays of unknown size to formals and
externals.  In my opinion, this is wrong.

>> but *also* because it performs arithmetic on a null pointer, and
>> of course, this is illegal.
> Um, I don't think so, Wayne; it's just that the result, if you indirect
> through such a pointer, is undefined.  K&R is silent on this, but ANSI
> 3.3.6 seems pretty clear.  

Yes indeed.  Pretty clear:

    For addition, either both operands shall have arithmetic type, or
    one operand shall be a pointer to an object and the other shall have
    integral type.

I repeat: "pointer to an object".  The null pointer doesn't qualify.
Further, K&R are not totally silent on this point either.  I quote from
7.4 in the reference:

    A pointer to an object in an array and a value of any integral type
    may be added.

K&R therefore can be considered to be even *more* restrictive, since
they call out that the object must be a member of an array.  (Actually,
I think draft X3J11 calls this out also, but this result must be pieced
together from several readings of Holy Writ, and I don't want to be
thought to be in competition with The World Tomorrow TV show.)

> The OTHER thing that's wrong with the code is that a "%x" format is used
> to print a pointer variable. [...]
> On an ANSI compiler, you would write:
> 	printf ("a=%p\n", (void *) a);
> and be guaranteed reasonable results.  But I don't think "%p" exists yet.

Good point.

--
IBM manuals are written by little old ladies in Poughkeepsie who are
instructed to say nothing specific.
                                --- R. T. Lillington

drw@cullvax.UUCP (Dale Worley) (08/13/87)

strouckn@nvpna1.UUCP (Louis Stroucken 42720) writes:
# >> >   main(a)
# >> >   char (*a)[];
#        [ discussion whether "a++;" should do something sensible ]
# Please note that "a" is a formal argument of main!!
# The declaration of "a" might as well read "char **a;". "a++;" should
# increment "a" with sizeof( char * ) bytes.

And so, Louis is the only one to notice that Lint passes this code,
*because it is correct* (as far as static analysis goes)!  Now, let's
make the declaration local, rather than a parameter, and try again...

Dale (hey, I missed it too!)
-- 
Dale Worley	Cullinet Software		ARPA: cullvax!drw@eddie.mit.edu
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
OS/2: Yesterday's software tomorrow	    Nuclear war?  There goes my career!

bright@dataio.Data-IO.COM (Walter Bright) (08/13/87)

In article <234@nvpna1.UUCP> strouckn@nvpna1.UUCP (Louis Stroucken 42720) writes:
>In article <1987Aug10.192923.7879@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>>Regarding the code...
>>> >   main(a)
>>> >   char (*a)[];
>       [ discussion whether "a++;" should do something sensible ]
>>
>>Actually, *declaring* such a pointer is probably illegal.
>Please note that "a" is a formal argument of main!!
>K&R appendix A section 10.4 says on array arguments:
>	...formal parameters declared "array of..." are adjusted to read
>	"pointer to...".
>
>The declaration of "a" might as well read "char **a;". "a++;" should
>increment "a" with sizeof( char * ) bytes.
>
>If I miss something, please let me know.

I'm letting you know :-)

The declaration:
	char (*a)[];
means:
	'a' is a pointer to an array of chars, the size of the array
	is unknown.
Since 'a' is not an "array of...", it is not adjusted to "pointer to..."
and is not equivalent to "char **a;".

The expression "a++" means "add the size of the array to the pointer 'a'".
Since the size of the array is unspecified, the compiler can't do it
in any 'unsurprising' way. Therefore, the attempt to do this should
be illegal.

Expressions of the form (*a)[n] are legal, however, since the compiler
does not need to know the size of the array to compile it.
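
The difference between the two cases can be seen in a short sketch. It assumes a compiler following the later ISO C standard, which accepts a pointer to an array of unspecified size as a parameter type; `step_of_pointer_to_array10` and `nth_char` are hypothetical names.

```c
#include <stddef.h>

/* With a known bound, incrementing the pointer moves it by the
   size of the whole array. */
size_t step_of_pointer_to_array10(void)
{
    char rows[2][10];
    char (*a)[10] = rows;       /* points at rows[0] */
    char (*next)[10] = a;

    next++;                     /* now points at rows[1] */
    return (size_t)((char *)next - (char *)a);
}

/* With an unspecified bound, (*a)[n] still compiles, because
   indexing needs only the element size, not the array size. */
char nth_char(char (*a)[], size_t n)
{
    return (*a)[n];
}
```

Writing `a++` inside `nth_char` would be rejected, since the compiler has no array size to add.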

john@caeco.UUCP (John Rigby) (08/14/87)

About the program
	main(a) 
	    char (*a)[];
	{ 
	    a = 0;
	    printf("a=0x%x\n", a);
	    a++;
	    printf("a=0x%x\n", a);
	}

in article <234@nvpna1.UUCP>, strouckn@nvpna1.UUCP (Louis Stroucken 42720) says:

> 
> K&R appendix A section 10.4 says on array arguments:
> 	...formal parameters declared "array of..." are adjusted to read
> 	"pointer to...".
> 
> The declaration of "a" might as well read "char **a;". "a++;" should
> increment "a" with sizeof( char * ) bytes.
> 
> If I miss something, please let me know.

"a" is NOT an array.  It is a pointer to an array.  As such, your argument 
is invalid.

in article <189@xyzzy.UUCP>, throopw@xyzzy.UUCP (Wayne A. Throop) says:
> 
> But the scariest thing about all this is that *none* *of* *my* *tools*
> *caught* *this* *bug*!!!!  Lint happily passed the program, as did other
> typecheckers.  And the compiler didn't complain (though on our system
> the output is
> 
>         a=0x0
>         a=0x1
> 

On my machine (Sun 3-260 running 3.2) both the compiler and lint give the
same warning:

	warning: zero-length array element

And the output is:

	a=0x0
	a=0x0

Which makes sense since the size is zero.

John Rigby		!utah-cs!caeco!john
CAECO Inc.
Salt Lake City, UT

franka@mmintl.UUCP (Frank Adams) (08/15/87)

In article <1987Aug10.192923.7879@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>Wayne Throop writes:
>>... but *also* because it performs arithmetic on a null pointer, and
>>of course, this is illegal.)
>
>Um, I don't think so, Wayne; it's just that the result, if you indirect
>through such a pointer, is undefined.

Operations which produce undefined results are a special case of illegal
operations.  Specifically, they are illegal operations where the response of
the program can be anything.  Including, at the extremes, aborting with an
error message, or performing in some well-defined way as an implementation
extension.

On the other hand, I don't know of any programs (ala lint) which perform
flow analysis on C programs, and nothing less will detect this kind of bug.
Lint certainly cannot be expected to find it.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

throopw@xyzzy.UUCP (Wayne A. Throop) (08/21/87)

) john@caeco.UUCP (John Rigby)
) About the program
) 	main(a) char (*a)[]; { a = 0; printf("a=0x%x\n", a); a++; printf("a=0x%x\n", a);}
) [...]
) On my machine (Sun 3-260 running 3.2) both the compiler and lint give the
) same warning:
) 	warning: zero-length array element
) [...] Which makes sense since the size is zero.

Well, no, not quite.  The size is unknown, which is not the same thing
as having size zero.  A small nit, to be sure, but mine own.

(On the other hand, it is a Good Thing to see that some instances of
 lint and/or other tools catch the thing and at least complain about it,
 however inaccurately.)
--
The best book on programming for the layman is "Alice in Wonderland";
but that's because it's the best book on anything for the layman.
                                --- Alan J. Perlis

DHowell.ElSegundo@Xerox.COM (08/22/87)

In article <1347@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>In article <234@nvpna1.UUCP> strouckn@nvpna1.UUCP (Louis Stroucken 42720) writes:
>>In article <1987Aug10.192923.7879@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>>>Regarding the code...
>>>> >   main(a)
>>>> >   char (*a)[];
>>       [ discussion whether "a++;" should do something sensible ]
>>>
>>>Actually, *declaring* such a pointer is probably illegal.
>>Please note that "a" is a formal argument of main!!
>>K&R appendix A section 10.4 says on array arguments:
>>	...formal parameters declared "array of..." are adjusted to read
>>	"pointer to...".
>>
>>The declaration of "a" might as well read "char **a;". "a++;" should
>>increment "a" with sizeof( char * ) bytes.
>>
>>If I miss something, please let me know.
>
>The declaration:
>	char (*a)[];
>means:
>	'a' is a pointer to an array of chars, the size of the array
>	is unknown.
>Since 'a' is not an "array of...", it is not adjusted to "pointer to..."
>and is not equivalent to "char **a;".
>
>Expressions of the form (*a)[n] are legal, however, since the compiler
>does not need to know the size of the array to compile it.

I'm confused.

Suppose I declare:

char (*a)[10];
char b[10];

Now b is an array of size 10 of char, and a is a pointer to an array of
size 10 of char.  So this means I should be able to say:

a = &b;

However, as I understand it, b is actually &b[0], which means a gets set
to &&b[0], which I'm not sure makes any sense at all.

What exactly does a point to?  Does it point to the first element of an
array? Does it point to a descriptor of an array?  How would I assign
anything useful to a, if I can't use the above type of assignment?  Or
is the assignment valid?  If so what is the meaning of &b?

Dan <DHowell.ElSegundo@Xerox.COM>

guy%gorodish@Sun.COM (Guy Harris) (08/22/87)

> So this means I should be able to say:
> 
> a = &b;
> 
> However, as I understand it, b is actually &b[0], which means a gets set
> to &&b[0], which I'm not sure makes any sense at all.

Well, given the current way array names (or array-valued expressions) are
treated, it doesn't.  PCC will treat "&b" as being equivalent to "b".

However, in the language described by the current ANSI C draft standard, the
"b is actually &b[0]" rule does not apply in certain contexts; one such context
is that of an operand of "&".  So, in ANSI C, "&b" is valid, and does make
sense; it is a pointer to the array "b", as opposed to being a pointer to the
first member of that array.
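
Under the ANSI rule the two pointers have different types even though, on ordinary implementations, they hold the same address. The sketch below illustrates this with hypothetical function names; it assumes a compiler that implements the ANSI treatment of "&b".

```c
#include <stddef.h>

/* 1 if &b and &b[0] compare equal once both are converted to void *. */
int same_address(void)
{
    char b[10];
    char (*a)[10] = &b;     /* pointer to the whole array */
    char *p = &b[0];        /* pointer to the first member */

    return (void *)a == (void *)p;
}

/* Size of what &b points to: the whole array. */
size_t pointee_size_whole_array(void)
{
    char b[10];
    return sizeof *&b;
}

/* Size of what b, converted to a pointer, points to: one char. */
size_t pointee_size_first_member(void)
{
    char b[10];
    return sizeof *b;
}
```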

(It is a trivial change to PCC to implement this; you just delete a couple of
lines in "cgram.y", namely the one that converts the type of "&b" to "pointer
to <element of b>" and the one that prints a warning message telling you it is
doing so.)
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

chris@mimsy.UUCP (Chris Torek) (08/22/87)

In article <8942@brl-adm.ARPA> DHowell.ElSegundo@Xerox.COM writes:
>I'm confused.

With good reason.

>Suppose I declare:
>
>char (*a)[10];
>char b[10];
>
>Now b is an array of size 10 of char, and a is a pointer to an array of
>size 10 of char.

Right so far.

>So this means I should be able to say:
>
>a = &b;

And confusion strikes.  K&R C makes &b illegal.

>However, as I understand it, b is actually &b[0], which means a gets set
>to &&b[0], which I'm not sure makes any sense at all.

This is per K&R, and compilers that implement K&R C.  &b is illegal,
and in PCC, is ignored with a warning, such that the compiler `sees'

	a = b;

which is a type mismatch (<pointer to array 10 of char> = <pointer
to char>).

Other C definitions, in particular draft ANS X3J11, make &b legal,
defining it to produce an rvalue of type <pointer to array 10 of
char> and having as its value the address of the entire array,
whatever that means in the implementation.  Given that other parts
of C demand that each array be stored in a flat linear address
space (that may nonetheless be disjoint from any or all other flat
linear address spaces), this value will probably be the same as
that of the address of the first element of b.

>What exactly does a point to?

This is machine- and implementation-dependent.

>Does it point to the first element of an array?

Quite possibly.

>Does it point to a descriptor of an array?

This is possible but unlikely.

>How would I assign anything useful to a, if I can't use the above
>type of assignment?

You could write, e.g.,

	a = (char (*)[10]) malloc(sizeof (char [10]));
	if (a == NULL) ...
	(*a)[9] = 'c';
/*or*/	strcpy(a[0], "text");	/* < 10 characters */

This does not entirely rule out an implementation that creates
array descriptors, but makes things difficult for such.  Note that

	a = (char (*)[10]) malloc(5 * sizeof (char [10]));

is also legal, and creates something that can be addressed as

	a[4][9] = 'c';

or

	for (i = 0; i < 5; i++)
		strcpy(a[i], "text");	/* < 10 characters each */

A compiler that creates array descriptors will have quite a job
dealing with these.
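
The fragments above can be assembled into a complete function. This is a sketch, not part of the original argument: the typedef `row10` and the function `make_rows` are hypothetical names introduced to keep the declarations readable.

```c
#include <stdlib.h>
#include <string.h>

typedef char row10[10];     /* "array 10 of char" */

/* Allocate `rows` consecutive arrays of 10 char and put a short
   string in each.  Returns NULL on failure; the caller frees. */
row10 *make_rows(size_t rows)
{
    row10 *a = malloc(rows * sizeof *a);
    size_t i;

    if (a == NULL)
        return NULL;
    for (i = 0; i < rows; i++)
        strcpy(a[i], "text");   /* 5 bytes with the terminator, fits in 10 */
    return a;
}
```

After `row10 *a = make_rows(5);` succeeds, both `a[4][9] = 'c';` and `(*a)[9] = 'c';` are valid, and a single `free(a)` releases the whole block.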
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

francus@cheshire.columbia.edu (Yoseff Francus) (08/24/87)

In article <8942@brl-adm.ARPA> DHowell.ElSegundo@Xerox.COM writes:
>I'm confused.
>
>Suppose I declare:
>
>char (*a)[10];
>char b[10];
>
>Now b is an array of size 10 of char, and a is a pointer to an array of
>size 10 of char.  So this means I should be able to say:
>
>a = &b;
>
>However, as I understand it, b is actually &b[0], which means a gets set
>to &&b[0], which I'm not sure makes any sense at all.
>
>What exactly does a point to?  Does it point to the first element of an
>array? Does it point to a descriptor of an array?  How would I assign
>anything useful to a, if I can't use the above type of assignment?  Or
>is the assignment valid?  If so what is the meaning of &b?
>
>Dan <DHowell.ElSegundo@Xerox.COM>

Since b is the name of an array it is considered to be a constant, and
you cannot use the & operator on a constant. The assignment you
want is simply 

a = b;

Be careful though, since a++ will not move to the next character, but
rather will jump forward by 10*sizeof(char).

******************************************************************
yf
In Xanadu did Kubla Khan a stately pleasure dome decree
But only if the NFL to a franchise would agree.

ARPA: francus@cs.columbia.edu
UUCP: seismo!columbia!francus

karl@haddock.ISC.COM (Karl Heuer) (08/24/87)

In article <8942@brl-adm.ARPA> DHowell.ElSegundo@Xerox.COM writes:
>char (*a)[10];  char b[10];  a = &b;
>
>However, as I understand it, b is actually &b[0],

A better way to state this rule is something like "the array-valued expression
b, if used in a rvalue context, will automatically be converted to the
pointer-valued expression which is conceptually &b[0]".  Since the premise is
false, the rule does not apply in this situation.

(Others have already mentioned that PCC interprets &b as a typo for b, and that
ANSI has fixed this, making the above code legal and useful.)

>What exactly does a point to?  Does it point to the first element of an
>array?  Does it point to a descriptor of an array?

As stated by its declaration, a points to an array.  Thus, if you dereference
it, you get an array (which, if used in an rvalue context, will be converted
to a pointer).  Your other two questions don't make sense, unless you rewrite
them as "If I cast a into a different pointer type and dereference it, will I
get ...?".  In this form, the answers are likely to be "Yes" and "No",
respectively; but I strongly discourage that type of code.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

mouse@mcgill-vision.UUCP (der Mouse) (08/26/87)

In article <2310@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> In article <1987Aug10.192923.7879@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>> Wayne Throop writes:
>>> ... but *also* because it performs arithmetic on a null pointer,
>>> and of course, this is illegal.)
>> Um, I don't think so, Wayne; it's just that the result, if you
>> indirect through such a pointer, is undefined.
> Operations which produce undefined results are a special case of
> illegal operations.

Yes.  But arithmetic on a null pointer is not what Mark was saying
produces undefined results, it's indirecting through the result.  And
the sample program didn't indirect through the pointer after it did the
arithmetic.

(Of course, arithmetic on a null pointer may *also* be illegal, or
produce undefined results, I don't know for sure.  But that's not my
point.)

					der Mouse

				(mouse@mcgill-vision.uucp)

franka@mmintl.UUCP (Frank Adams) (09/08/87)

In article <871@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
>Yes.  But arithmetic on a null pointer is not what Mark was saying
>produces undefined results, it's indirecting through the result.
>
>(Of course, arithmetic on a null pointer may *also* be illegal, or produce
>undefined results, I don't know for sure.  But that's not my point.)

A good point.

So does arithmetic on a null pointer produce undefined results?  I don't
have a copy of the proposed standard available, so I don't know what it
says.  This is about what it *should* say; if it doesn't, it should be
changed.

Arithmetic on a null pointer should produce an undefined result.

To see this, first consider arithmetic on non-null pointers.  Take a
pointer to an element of an array a of size N.  Arithmetic on such a pointer
should not produce a pointer to anything outside the range a[0] to a[N].
That is, doing so should produce an undefined result.  This is because such
an operation may produce an arithmetic overflow in some cases; and
implementations should be able to enable interrupts for arithmetic overflow
without having to generate special code for pointer arithmetic.

So we have the general principle that pointer arithmetic should not be able
to adjust the value of the pointer outside the guaranteed "neighborhood" of
legal values near that pointer.

In the case of a null pointer, the only legal value in that neighborhood
is null itself; thus "(char *)0 + 1" produces an undefined result.
("(char *)0 + 0" would be legal, and equivalent to "(char *)0".)

As further evidence for this point of view, I note that there could be
machines where the hardware traps attempts to increment the null pointer.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

throopw@xyzzy.UUCP (09/12/87)

> franka@mmintl.UUCP (Frank Adams)
> So does arithmetic on a null pointer produce undefined results?  I don't
> have a copy of the proposed standard available, so I don't know what it
> says.  This is about what it *should* say; if it doesn't, it should be
> changed.

The conclusion reached by several people I know who studied the wording
on this point is that the standard does *NOT* say that arithmetic on the
null pointer produces undefined results (contrary to my own
interpretation).  Most of them, however, agree that it *OUGHT* to make
this undefined.

> [it should be undefined because...]
> So we have the general principle that pointer arithmetic should not be able
> to adjust the value of the pointer outside the guaranteed "neighborhood" of
> legal values near that pointer.
> In the case of a null pointer, the only legal value in that neighborhood
> is null itself; thus "(char *)0 + 1" produces an undefined result.
> ("(char *)0 + 0" would be legal, and equivalent to "(char *)0".)

I think this is not sufficient.  The operation of adding zero to the null
pointer should be illegal, because it is not a member of a neighborhood
at all, let alone a neighborhood of one element.  That is, the value
"null" is not a member of an ordered set of addresses upon which
arithmetic is meaningful.

> As further evidence for this point of view, I note that there could be
> machines where the hardware traps attempts to increment the null pointer.

And there could be machines where the hardware traps attempts to add
zero to the null pointer as well, so that should be undefined also.

--
To understand a program you must become both the machine and the program.
                                        --- Alan J. Perlis
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

cabo@tub.UUCP (09/14/87)

Several people have argued that arithmetic on null pointers should be
disallowed in the C standard on the grounds that hardware may trap
attempts to alter the special cookie that implements a null pointer.

While I agree that

	char *p = 0;
	p++;

should not be allowed by the standard, I see some benign applications
for constant expressions involving null pointers, e.g.

	(char *)&((struct foo *)0)->bar - (char *)&((struct foo *)0)->baz

for computing the relative offset of two structure members in character
sized units.  I'm not saying that the above is a constant expression
according to the wording of the draft (I don't have it, unfortunately),
but I would like it to be one.

The alternative, declaring a dummy object (or worse, a dummy pointer
that would have to be initialized via malloc()) just for being able to
reference its members, doesn't appeal to me at all (is this PL/1?).
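
ANSI C as eventually standardized provides the `offsetof` macro in `<stddef.h>`, which gives the same quantity without a dummy object and without touching a null pointer in user code. The sketch below assumes such a compiler; the layout of `struct foo` and the name `member_distance` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical layout, for illustration only. */
struct foo {
    int    baz;
    char   pad[3];
    double bar;
};

/* Distance from member baz to member bar, in character-sized units. */
ptrdiff_t member_distance(void)
{
    return (ptrdiff_t)offsetof(struct foo, bar)
         - (ptrdiff_t)offsetof(struct foo, baz);
}
```

Both operands are integer constant expressions, so the difference can be computed at compile time.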

Carsten
--
Carsten Bormann, <cabo@tub.UUCP> <cabo@db0tui6.BITNET> <cabo@tub.BITNET>
Communications and Operating Systems Research Group
Technical University of Berlin (West, of course...)
Path: ...!pyramid!tub!cabo from the world, ...!unido!tub!cabo from Europe only.

guy@sun.uucp (Guy Harris) (09/15/87)

> The ANSI standard must have something about such pointers. Do they
> say roughly the same thing about them as I have in the preceding paragraph?

Yes.  Pointer inequalities are only valid for pointers that "are members of the
same aggregate object", with one exception: "If P points to the last member of
an array object, the pointer expression P+1 compares higher than P, even
though P+1 does not point to a member of the array object."  (3.3.8 Relational
operators).
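
The exception is exactly what the usual ascending loop depends on. A minimal sketch follows, with the hypothetical name `count_until_end`.

```c
#include <stddef.h>

/* Count the elements between `first` and `one_past_last`, where
   one_past_last is the P+1 pointer of the quoted rule.  It is
   compared against but never dereferenced. */
size_t count_until_end(const int *first, const int *one_past_last)
{
    const int *p;
    size_t n = 0;

    for (p = first; p < one_past_last; ++p)
        n++;
    return n;
}
```

Calling `count_until_end(foo, foo + 10)` on `int foo[10]` is covered by the rule; passing `foo - 1` as either argument is not.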
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

kent@xanth.UUCP (Kent Paul Dolan) (09/18/87)

In article <5391@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>All this stuff gets very interesting when you consider what happens on
>an 80286 in its native 'protected' mode (as opposed to the 'fast 8086' mode
>in which most of them are warming their sockets).
>
>In this mode, a pointer is 32 bits; a 16-bit segment number, and a 16-bit
>offset. It is meaningless to do arithmetic on the segment number since
>it is just an index into a table maintained by the OS. Pointer
>arithmetic as we know it in C affects only the offset.
>The other weird bit concerns the range of these pointers. The
>compiler may assign a separate segment for every data object.
>The segment has a size, and any reference to that segment beyond this
>size causes a trap.
>Suppose I declare 'int foo[10]', then I may get a 20-byte segment for
>foo. Then &foo[10] is a pointer which is illegal to dereference.
>This is good.
>There are lots of bits of code like this:
>	for( p = foo; p < &foo[10]; ++p ){
>which cause p to be repeatedly compared to a constant invalid pointer until it
>becomes an invalid pointer itself.  I can live with that.
>
>What gets a little weird is this: pointer inequalities are done by comparing
>only the offset part, since the comparison is invalid anyway if the segment
>numbers are different. Also, offset arithmetic is done in 16 bits.  This
>means that &foo[-1] is not only an invalid pointer, but it will be 'greater
>than' &foo[0] since it will have an offset of 0xfffe. What this means is that
>the following won't work:
>	for( p = &foo[9]; p >= foo; --p ){	/* loops forever */
>
>The ANSI standard must have something about such pointers. Do they
>say roughly the same thing about them as I have in the preceding paragraph?
>

I love it!

OK, X3J11, the ball's in your court.  Do we teach every C programmer
on every architecture in the world that decrementing pointer loops are
a no-no, and break half the code in existence, or do we finally bite
the bullet and decide that compiler writers for brain dead
architectures, and not the whole C community, pay the penalty for bad
hardware designs?  Especially since these folks are often the
perpetrators of the bad hardware design.

Obviously, at a cost in execution speed, several system trap calls and
whatever, the pointer can be converted to an integer, the arithmetic
done, and the result converted back to a (possibly now illegal)
pointer; if the code, as above, doesn't need the pointer, just the
arithmetic result for a comparison, then the programmer can go on
writing high level language code instead of worrying about boobosities
in the hardware.

If people who chose such obscenities to build their systems around
were made responsible for making them work like normal computers,
instead of having the whole rest of the world "fill the C compiler
with 'if' kludges" (from another posting), the forces of evolution
would consign these duds to the recycling heap in a heartbeat.  It is
only by continuing to cater to the inanities of some hardware (and
software, I suppose) designers that we are forced to continue to
suffer their stupidities in the systems we use.

Yes, I suffer fools.  But I _do_ suffer, and so do you.

Kent, the man from xanth.

"His expression lit up.  'Hey, you wouldn't be a dope smuggler, would you?'

Rail looked confused.  'Why would anyone wish to smuggle stupidity when
there is so much of it readily available?'"

		-- Alan Dean Foster, GLORY LANE

chris@mimsy.UUCP (09/20/87)

>>	for( p = &foo[9]; p >= foo; --p ){	/* loops forever */

In article <2474@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>I love it!
>
>OK, X3J11, the ball's in your court.  Do we teach every C programmer
>on every architecture in the world that decrementing pointer loops are
>a no-no, and break half the code in existence, or do we finally bite
>the bullet and decide that compiler writers for brain dead
>architectures, and not the whole C community, pay the penalty for bad
>hardware designs?  Especially since these folks are often the
>perpetrators of the bad hardware design.

Unfortunately for you, it has already been considered, and the
answer is that

	for (p = foo; p < &foo[10]; p++)

is portable, and compilers must allow for it, but

	for (p = &foo[9]; p >= foo; p--)

is not, and compilers need not allow for it.  This does not `break
half the code in existence' except when one tries to run it on an
architecture in which &foo[-1] >= &foo[0] (where---surprise!---it
already fails).
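
A small model of that failure case, assuming 2-byte ints and a 16-bit
segment offset as in the 80286 description quoted earlier in the
thread.  The functions elem_offset and downward_iterations are
hypothetical; this only imitates offset arithmetic with uint16_t (so
it needs a compiler with <stdint.h>) and is not real segmented-pointer
code.

```c
#include <stdint.h>

/* Simulated 16-bit offset of &foo[index] for an int array that starts
 * at offset `base` in its segment, with 2-byte ints (an assumption).
 * The result wraps modulo 65536, as 16-bit offset arithmetic would. */
uint16_t elem_offset(uint16_t base, int index)
{
    return (uint16_t)(base + index * 2);
}

/* Counts iterations of "for (p = &foo[9]; p >= foo; p--)" when the
 * comparison looks only at offsets.  `limit` caps the count so the
 * model returns even when the real loop would never stop. */
unsigned downward_iterations(uint16_t base, unsigned limit)
{
    int i = 9;
    unsigned n = 0;

    while (elem_offset(base, i) >= elem_offset(base, 0) && n < limit) {
        n++;
        i--;
    }
    return n;
}
```

With base 0 the offset of &foo[-1] comes out as 0xfffe and the loop
runs until the cap.  With a nonzero base such as 0x100 the model stops
after ten iterations, which is why the loop can appear to work on such
a machine until the array happens to land at the start of a segment.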
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

rbutterworth@orchid.UUCP (09/20/87)

In article <8655@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>     for (p = foo; p < &foo[10]; p++)
> is portable, and compilers must allow for it, but
>     for (p = &foo[9]; p >= foo; p--)
> is not, and compilers need not allow for it.  This does not `break
> half the code in existence' except when one tries to run it on an
> architecture in which &foo[-1] >= &foo[0] (where---surprise!---it
> already fails).

Using
    for (p = &foo[9]; p != foo; p--)
gets around this problem of the incorrect inequality.

The question is, does an expression, such as "p--", that generates
an illegal address violate the standard if that address is never
used as an address?

e.g. given "p = &foo[-1];", it is obviously wrong to use "*p",
but is there hardware for which the assignment itself would cause
a machine fault (assuming "p" is not declared as "register")?

If there aren't any such machines, why does this assignment violate
the ANSI standard?

greg@gryphon.CTS.COM (Greg Laskin) (09/21/87)

In article <2474@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>In article <5391@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>>All this stuff gets very interesting when you consider what happens on
>>an 80286 in its native 'protected' mode ...
>>	for( p = &foo[9]; p >= foo; --p ){	/* loops forever */
>
>OK, X3J11, the ball's in your court.  Do we teach every C programmer
>on every architecture in the world that decrementing pointer loops are
>a no-no, and break half the code in existence, or do we finally bite
>the bullet and decide that compiler writers for brain dead
>architectures, and not the whole C community, pay the penalty for bad
>hardware designs?  Especially since these folks are often the
>perpetrators of the bad hardware design.
>

Given:
	extern struct FOO foo[];

Assume:
	sizeof(struct FOO) == n+1
	foo == (struct FOO *) n;

where n is any address in any linear address space.

Does the example code loop forever?
If so, does this mean that linear address spaces are brain-dead or that
the code is broken?
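
The arithmetic behind the question can be sketched as follows,
assuming a flat 32-bit address space modelled with uint32_t (so a
compiler with <stdint.h>).  The function names are invented for the
illustration, and n = 1000 below is just a sample value.

```c
#include <stdint.h>

/* Address of element `index` of an array placed at `base`, with
 * elements of `size` bytes, in a modelled flat 32-bit address space.
 * The result wraps modulo 2^32. */
uint32_t flat_elem_addr(uint32_t base, uint32_t size, int32_t index)
{
    return base + size * (uint32_t)index;
}

/* Nonzero when &foo[-1] compares >= &foo[0] for an array at address n
 * whose elements are n+1 bytes each, as in the setup above. */
int minus_one_compares_higher(uint32_t n)
{
    return flat_elem_addr(n, n + 1, -1) >= flat_elem_addr(n, n + 1, 0);
}
```

For n = 1000 the modelled address of &foo[-1] is 1000 - 1001, which
wraps to 0xFFFFFFFF, so the "p >= foo" test stays true after p steps
below the array.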

&
-- 
Greg Laskin   
"When everybody's talking and nobody's listening, how can we decide?"
INTERNET:     greg@gryphon.CTS.COM
UUCP:         {hplabs!hp-sdd, sdcsvax, ihnp4}!crash!gryphon!greg
UUCP:         {philabs, scgvaxd}!cadovax!gryphon!greg

gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/21/87)

In article <48400001@tub.UUCP> cabo@tub.UUCP writes:
>	(char *)&((struct foo *)0)->bar - (char *)&((struct foo *)0)->baz
>for computing the relative offset of two structure members in character
>sized units.  I'm not saying that the above is a constant expression
>according to the wording of the draft (I don't have it, unfortunately),
>but I would like it to be one.

Unfortunately, X3J11 cannot guarantee that such use of null pointers
would be portable.  However, they have specified an offsetof() macro
to accomplish what you're trying to do; it's still being debated and
may change or (unlikely) vanish before the second public review.
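
A sketch of how that might look, assuming offsetof() ends up in
<stddef.h> taking a structure type and a member name.  The layout of
struct foo here is invented for the example; only the member names bar
and baz come from the quoted article.

```c
#include <stddef.h>

/* Hypothetical layout; any structure with members bar and baz works. */
struct foo {
    long baz;
    char bar;
};

/* Offset of bar relative to baz, in char-sized units, with no null
 * pointer and no dummy object involved. */
long bar_minus_baz(void)
{
    return (long)offsetof(struct foo, bar)
         - (long)offsetof(struct foo, baz);
}

/* The dummy-object alternative, kept only for comparison. */
long bar_minus_baz_via_object(void)
{
    struct foo dummy;

    return (long)((char *)&dummy.bar - (char *)&dummy.baz);
}
```

Both functions should return the same value on a given implementation;
the offsetof() version simply leaves it to the implementation to do
whatever is safe on that machine.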

gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/21/87)

In article <2474@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>OK, X3J11, the ball's in your court.  Do we teach every C programmer
>on every architecture in the world that decrementing pointer loops are
>a no-no, and break half the code in existence, or do we finally bite
>the bullet and decide that compiler writers for brain dead
>architectures, and not the whole C community, pay the penalty for bad
>hardware designs?  Especially since these folks are often the
>perpetrators of the bad hardware design.

A pointer to the [-1]st element of an array may well have an address
CONSIDERABLY smaller than that of the [0]th element, if the array
elements are large.  This is not a problem for the [N+1]st element.
Therefore, there is only a small penalty in requiring implementations
to ensure that [N+1] pointers have valid addresses (generally only
wastes at most one word of "slop" space per segment), but there would
be an unacceptably large penalty with requiring that a pointer to the
[-1]st element of an array have a valid address.

This was in fact discussed by X3J11 and general agreement reached to
require [N+1] pointer validity, but not [-1] pointer validity.  This
permits one common form of somewhat sloppy coding, but not the other.
Please note that the outlawed form was NEVER safe; I've seen it break
in a PDP-11 implementation of bsearch() (due to data space address
wrap-around), for example.  It is not within X3J11's power to "somehow"
make the unworkable work.

And yes, please teach every C programmer in the world how to write
reliable code.  Thanks in advance.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/21/87)

In article <10758@orchid.waterloo.edu> rbutterworth@orchid.waterloo.edu (Ray Butterworth) writes:
>Using
>    for (p = &foo[9]; p != foo; p--)
>gets around this problem of the incorrect inequality.

	for ( p = &foo[LIMIT]; p-- != foo; )
would be more correct, since it includes the case of &foo[0]
and excludes the case right after the end of the array.
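
A variant sketch, not taken from any of the posts above, for anyone who
would rather not form a pointer below foo at all, even on loop exit.
It tests first and decrements afterwards.  LIMIT is given a sample
value of 10, and sum_downward and sum_first_ten are hypothetical names.

```c
#define LIMIT 10

/* Sums foo[LIMIT-1] down to foo[0].  p starts one past the end and is
 * decremented only after the test shows it is still above foo, so it
 * never goes below &foo[0]. */
long sum_downward(const int *foo)
{
    const int *p = foo + LIMIT;
    long sum = 0;

    while (p != foo) {
        --p;
        sum += *p;
    }
    return sum;
}

/* Sample use on an array holding 1 through 10. */
long sum_first_ten(void)
{
    static const int foo[LIMIT] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    return sum_downward(foo);
}
```

This visits the same elements as the loop above, including foo[0]; the
only difference is that p ends at foo instead of one step below it.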

>The question is, does an expression, such as "p--", that generates
>an illegal address violate the standard if that address is never
>used as an address?

Not according to current wording.  This was reaffirmed at the
Framingham meeting, in the course of revising related wording.