[net.unix-wizards] NULL vs 0 - chapter and verse, and a reply to Charles LaBrec

guy@rlgvax.UUCP (Guy Harris) (01/21/84)

An interesting comment from "The C Reference Manual":

	14.4  Explicit pointer conversions

		A pointer may be converted to any of the integral types
	large enough to hold it.  *Whether an "int" or "long" is required
	is machine dependent.*  ("Italics" mine)

Note that this means that a C implementation with 16-bit "int"s and 32-bit
pointers, in which only a "long" is large enough to hold a pointer, is
perfectly legal.  Whether it is the "right" way to do it is a valid point of
debate, but whether it is a legal way to do it is *not*.

However, the manual also says:

	7.4  Additive operators

	.
	.
	.
	If two pointers to objects of the same type are subtracted, the
	result is converted (by division by the length of the object) to
	*an "int"* ("italics" mine) representing the number of objects
	separating the pointed-to objects.

This sounds to me like:

Machine X has a C implementation with 16-bit "int"s and 32-bit pointers.
As such,

	long l;
	char *p;

	l = p;

assigns the 32 bits of p to the 32 bits of l, while

	int i;
	char *p;

	i = p;

may throw away bits.  *However*,

	char *p;
	char *q;

	foo(p - q);

must pass an "int", *not* a "long int", to "foo".  I don't know whether this
is what is intended.  If it is, there must not be more than "maxint" items
between any two pointers, which will probably break lots of programs which
subtract two arbitrary heap pointers - programs like the UNIX kernel,
various memory allocators, etc.
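
For what it's worth, here is a minimal sketch of the same question in later
C.  It assumes an ANSI C compiler, where pointer subtraction yields the
implementation-defined type "ptrdiff_t" from <stddef.h> rather than "int";
nothing like that appears in the manual quoted above, and the helper names
are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: number of chars separating two pointers into the
   same array.  In ANSI C the subtraction has type ptrdiff_t, not int. */
ptrdiff_t chars_between(const char *hi, const char *lo)
{
    return hi - lo;
}

/* Hypothetical helper: nonzero if ptrdiff_t is at least as wide as a
   "char *" on this implementation.  Nothing in the language promises
   this, so treat the answer as a property of one compiler only. */
int ptrdiff_covers_pointer(void)
{
    return sizeof(ptrdiff_t) >= sizeof(char *);
}
```

Calling "chars_between(buf + 70, buf + 20)" on a 100-byte "buf" gives 50 on
any implementation; whether the result type can hold the distance between
two arbitrary heap pointers is exactly the machine-dependent part.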

> What makes a program portable?  Adhering strictly to the C reference
> manual is the answer I'd give.  Since the manual states that 0 == NULL,
> I believe that's that.  It is up to the implementation to assure that
> this works.

Well, that isn't that.  First of all, the manual does not state that
"0 == NULL".  For one thing, since NULL is defined in <stdio.h> with a
pre-processor statement, "0 == NULL" expands to "0 == 0" which is a
tautology.  What the manual says is:

	7.14  Assignment operators

	.
	.
	.
	The compilers currently allow a pointer to be assigned to an
	integer (note: an integer, not an "int"), an integer to a pointer,
	and a pointer to a pointer of another type.  (This implies that
	it may not be required that compilers permit these assignments, in
	general.)  The assignment is a pure copy operation, with no
	conversion.  This usage is nonportable, and may produce pointers
	which cause addressing exceptions when used.  However, it is guaranteed
	that assignment of the constant 0 to a pointer will produce a null
	pointer distinguishable from a pointer to any object.

This implies to me that

	char *p;

	p = 1;

and

	char *p;

	p = 0;

are not the same sort of thing.  I take it to mean that a C implementation
need not represent null pointers as a bit string all of whose bits are zero,
and that if it doesn't, assignment of 0 to a pointer is *not* done as a pure
copy; it causes the bit string which represents a null pointer to be stuffed
into the pointer.  On page 98 of "The C Programming Language", it says "In
general, integers cannot meaningfully be assigned to pointers; zero is a
special case" which supports this interpretation.

This takes care of pointer conversions in general.  For comparisons, the
manual says:

	7.7  Equality operators

	.
	.
	.
		A pointer may be compared to an integer, but the
	result is machine-dependent unless the integer is the
	constant 0.  A pointer to which 0 has been assigned is
	guaranteed not to point to any object, and will *appear to be
	equal to* 0 ("italics" mine); in conventional usage, such a pointer
	is considered to be null.

Again, I read this as saying that a 0 pointer, which is conventionally called
a null pointer, need not have the same bit pattern as an integer (note: not
an "int") with the value 0.
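
A small sketch of the distinction, assuming an ANSI C compiler with
<string.h>; the function names are hypothetical.  The first function relies
only on what the manual guarantees.  The second peeks at the stored bytes,
which is precisely the thing the manual does *not* pin down.

```c
#include <string.h>

/* Guaranteed by the language: a pointer assigned the constant 0
   compares equal to the constant 0.  Returns 1 everywhere. */
int null_compares_equal_to_zero(void)
{
    char *p;

    p = 0;
    return p == 0;
}

/* NOT guaranteed: returns 1 only if this implementation happens to
   store a null "char *" as all-zero bytes, and 0 otherwise. */
int null_is_all_zero_bytes(void)
{
    char *p = 0;
    unsigned char zeros[sizeof p];

    memset(zeros, 0, sizeof zeros);
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

Portable code may depend on the first result but not on the second.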

Furthermore, "(char *)0" is not the same as "0"; it is a 0 pointer of type
"char *".  As such,

	foo(0);

and

	foo((char *)0);

are not equivalent.  Period.  Note that "lint" agrees with me 100% in this
case; if it finds that "foo" has a "char *" as its first argument, it will
report a type clash for the first statement but not for the second.  All
pointers are not created equal; it just so happens that on *most* - NOT all -
implementations of C, they are all the same sort of beast.  I'm sure that
many, if not all, implementations on word-addressable machines treat
"char *" and pointers to word-sized objects as two very different kinds of
pointer.  In fact, they could even have different sizes!
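
One way to see that the two expressions differ in type is to ask the
compiler for their sizes.  This is only a sketch with made-up function
names; on many machines the two numbers coincide, which is exactly why the
mistake goes unnoticed there.

```c
#include <stddef.h>

/* The constant 0 has type int, so this is sizeof(int). */
size_t size_of_plain_zero(void)
{
    return sizeof 0;
}

/* The cast expression has type "char *", so this is sizeof(char *).
   The outer parentheses are needed: "sizeof (char *)0" does not parse. */
size_t size_of_char_pointer_zero(void)
{
    return sizeof((char *)0);
}
```

On an implementation with 16-bit "int"s and 32-bit pointers the two
functions would return different values.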

"foo(0)" could fail for several reasons, not just the 16-bit versus 32-bit
problem.  If a null pointer were represented as something like 0xff000000
(which is, I think, the representation of a NULL pointer in System/360 and
successors' PL/I implementations),

	foo(0);

is passing foo an "int" of value 0, which has the bit pattern 0x00000000
on the 360, while

	foo((char *)0);

is passing foo a "char *" of value 0, which has the bit pattern 0xff000000.

The problem here is that in all C contexts except for subroutine arguments,
the C compiler can determine what type an expression is to be converted to
and will do that conversion automatically.  Since there is *currently* no
way of declaring the types of the arguments to a function, the compiler
assumes that the programmer got it right and will not do such conversions
implicitly.  The programmer *must*, as a result, specify such conversions
explicitly, by writing "foo((char *)0)", in order to write a correct C
program.  (They must also declare the return value of routines correctly,
which is a related problem I've seen with a lot of code.)  There is some
discussion of adding to C the ability to declare the types of the arguments
to a function, which would obviate most of this problem: you could say
"foo(0)", and the proper bit pattern for a 0 pointer of type "char *" would
be passed to "foo".  "execl" would still require a cast, though, as it takes
a variable number of arguments.  (Variable argument lists are *not*
explicitly mentioned in the C Reference Manual, so an implementation that
didn't support them would be legal, but it would break so many UNIX
programs - like those that use "printf", "scanf", and "execl" - that anybody
who makes such an implementation had better have a very good reason for it.)
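
Here is a sketch of both halves of that, assuming a compiler that already
has argument-type declarations and <stdarg.h> (neither is in the manual
quoted here).  "is_null" and "count_strings" are hypothetical names;
"count_strings" imitates the "execl" convention of ending the list with a
null "char *".

```c
#include <stdarg.h>

/* With the argument type declared, the compiler converts a plain 0
   to a null "char *" at the call, so is_null(0) is correct. */
int is_null(char *p)
{
    return p == 0;
}

/* Counts "char *" arguments up to a terminating null pointer.  The
   compiler knows nothing about the types of the variable arguments,
   so the caller must write the terminator as (char *)0, not 0. */
int count_strings(const char *first, ...)
{
    va_list ap;
    const char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != 0; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

So "is_null(0)" is fine, but the call has to read
"count_strings("a", "b", "c", (char *)0)"; writing a bare 0 as the last
argument passes an "int" where the function fetches a "char *".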

> Offhand, I could not find anything in the manual that says that function
> arguments on the stack are no smaller than type int.  (I could have
> easily overlooked this, however.)  Couldn't machines with 32 bit pointers
> and 16 bit ints push 32 bits on the stack always.  This is analogous to
> how chars are done now.

	7.1  Primary expressions

	.
	.
	.
	A function call is...

	Any actual arguments of type "float" are converted to "double"
	before the call; any of type "char" or "short" are converted to
	"int"; and as usual, array names are converted to pointers.  No
	other conversions are performed automatically; *in particular,
	the compiler does not compare the types of actual arguments
	with those of formal arguments.  If conversion is needed, use a
	cast*... ("Italics" mine)

That's where it says that function arguments ("on the stack" is an
implementation detail) are no smaller than type "int".  Machines with 32-bit
pointers and 16-bit "int"s *could* push the "int" and 16 bits of zero, but
this wouldn't do the right thing if a zero pointer didn't consist of 32
bits of zero - remember, this could be a tagged architecture, for instance.
If it were felt that the implementation should compensate for deficient
programmers (sic!), one would just have to pay a price in performance
for this - but the whole point of *having* 16-bit "int"s and 32-bit pointers
is to *avoid* the performance penalty of 32-bit quantities on machines that
don't support them as fully and efficiently as one might like (i.e., they
don't fully do 32-bit arithmetic, like the 68K, or they require two memory
fetches on a 16-bit bus).  Since such an implementation would solely be
compensating for programmers too lazy to run their code through "lint", and
who, as such, are writing code which is not guaranteed to be portable - in
fact, code which is probably *guaranteed not to be portable*, even if this
"feature" were put into 16-bit "int" and 32-bit pointer implementations -
there's no point in doing it.
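
The widening rule quoted from section 7.1 can be watched at work in a
variable-argument function, where no declared argument types get in the
way.  This sketch assumes <stdarg.h>, which is newer than the manual, and
"sum_promoted" is a hypothetical name.

```c
#include <stdarg.h>

/* Expects, after the fixed argument, one char, one short and one
   float.  They arrive widened, so they must be fetched as int, int
   and double; fetching them as char, short or float would be wrong. */
double sum_promoted(int unused, ...)
{
    va_list ap;
    int c, s;
    double f;

    (void)unused;             /* only there to anchor va_start */
    va_start(ap, unused);
    c = va_arg(ap, int);      /* the char, converted to int */
    s = va_arg(ap, int);      /* the short, converted to int */
    f = va_arg(ap, double);   /* the float, converted to double */
    va_end(ap);
    return c + s + f;
}
```

Nothing comparable happens for an "int" passed where a pointer is wanted,
which is why the cast is the caller's job.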

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy