[comp.lang.c] null pointers

kent@ncoast.UUCP (Kent Williams) (11/14/86)

RE : The Use of NULL

Maybe I'm ignorant, but the use of NULL is a constant source of pain, for which
it seems there is a simple solution, to wit:

/* local.h - include after other include modules */
#ifdef HASVOID
#define NULL ((void *)0)
#else
#define NULL 0L
#endif

. . .

If NULL is typed to have the size of the largest possible object that it
will be assigned to, it will be 'narrowed' to fit into whatever object you
are assigning it to.

As a matter of style, NULL for me seems to be something that should only be
used as a pointer object.  I get sick to my stomach every time I see

int stupid(cp)
char *cp;
{
	*cp = NULL;
}

Do they REALLY mean they want a 0 poked into the character pointed to by cp,
or did they mean cp = NULL, or what?

If you have void type, then a void pointer should be a 'universal pointer,'
i.e. assignable to any pointer variable, but un-dereference-able without a cast
to a non-void type.

Also, malloc and calloc should be of the type void pointer, so that you
don't get invalid pointer assignment complaints from compilers.  It seems
supremely asinine that Microsoft C complains about

struct nameless x;
x = malloc(sizeof(struct nameless));

All of the above suggestions should be very portable - if they're not, let me
know why.

AND another thing - why isn't it standard practice to bracket the standard
include files with the following?

/* stdio.h */
#ifndef STDIO_H_INCLUDED
#define STDIO_H_INCLUDED
/* rest of stdio.h */
#endif

This would mean that you could re-include things without complaints from
the pre-processor and compiler.

These opinions are my own, and reflect the views only of me.

desj@brahms (David desJardins) (11/16/86)

In article <1696@ncoast.UUCP> kent@ncoast.UUCP (Kent Williams) writes:
>[...]  It seems supremely asinine that Microsoft C complains about
>
>struct nameless x;
>x = malloc(sizeof(struct nameless));

   Maybe I'm confused, but this seems completely wrong (you are assigning
a pointer to an object).  Do you perhaps mean

struct nameless *x;

?  If so, you should really write

x = (struct nameless *) malloc (sizeof (struct nameless));

as this is what casts are for.  It doesn't seem asinine at all to give a
warning about this.
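
For comparison, here is a minimal sketch of the same allocation under an
ANSI-style <stdlib.h>, where malloc is declared to return void *.  The member
"value" and the helper make_nameless are invented for illustration; the thread
never shows the contents of struct nameless.  Under that assumption the cast
becomes optional, since void * converts to any object pointer type on
assignment.

```c
#include <stdlib.h>

/* Hypothetical structure standing in for "struct nameless" from the post. */
struct nameless {
	int value;
};

/* Allocate one struct nameless; returns a null pointer if malloc fails. */
struct nameless *make_nameless(int value)
{
	/* malloc returns void *, which converts to struct nameless * implicitly. */
	struct nameless *x = malloc(sizeof *x);

	if (x != NULL)
		x->value = value;
	return x;
}
```

A caller would test the result against NULL before use and pass it to free()
when done.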

   -- David desJardins

chris@mimsy.UUCP (Chris Torek) (11/21/86)

In article <1696@ncoast.UUCP> kent@ncoast.UUCP (Kent Williams) writes:
>... the use of NULL is a constant source of pain for which
>it seems there is a simple solution to wit
>
>#ifdef HASVOID
>#define NULL ((void *)0)
>#else
>#define NULL 0L
>#endif

This is neither necessary nor sufficient:

>If NULL is typed to have the size of the largest possible object that it
>will be assigned to, it will be 'narrowed' to fit into whatever object you
>are assigning it to.

If you are using `0' (the present proper definition for NULL) in
an assignment or comparison context, the narrowing or widening or
implicit cast conversion or whatever-is-required is done automatically.
If you are using 0 in some other context, the information as to
just what narrowing or widening or implicit cast conversion or
whatever-is-required is unavailable.  It is not difficult to add
an assignment context by using an explicit cast:

	foo((char *) NULL);	 /* or foo((char *) 0); */

In C++, or some other C-like language with function prototypes, the
prototype can provide the context:

	extern void foo(char *s);
	...
	foo(NULL);		/* or foo(0); */

In either case, an explicit cast is still legal, and may help the
reader as well as the compiler.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

jdb@mordor.ARPA (John Bruner) (11/21/86)

>#ifdef HASVOID
>#define NULL ((void *)0)
>#else
>#define NULL 0L
>#endif

There is no need to define NULL to be anything other than 0.
Defining NULL as (void *)0 or 0L to avoid casts in function calls
detracts from portability.

In all contexts except arguments to functions, the integer constant 0
will be converted to the appropriate representation for a nil pointer.
This may mean that it is widened (e.g. on a 68000 whose C compiler uses
16-bit "int"s), or there may be some other conversion.

When a nil pointer is to be passed as an argument to a function, NULL
must be cast to the appropriate type.  (This assumes that there is no
function prototype, which is true for the vast majority of C compilers
available today.)  It is not sufficient to pass (void *)0, because
there is no guarantee that pointers to different-sized objects have
the same representation.

It is worse to use 0L when a pointer argument to a function is expected.
There are implementations [e.g. a PDP-11 running version 7] where
sizeof(anything *) < sizeof(long).  The only way to ensure that the
correct amount of information, in the correct format, is passed, is
to cast the argument when the function is called.
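
A rough sketch of the size problem, assuming nothing beyond standard C: the
three spellings below have types int, long, and char *, and without a
prototype each is passed as-is.  The struct and function names are
hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* Sizes of three spellings one might pass where a char * is expected. */
struct arg_sizes {
	size_t plain_zero;	/* sizeof(0), an int */
	size_t long_zero;	/* sizeof(0L), a long */
	size_t char_ptr;	/* sizeof((char *)0), the type the callee expects */
};

/* Report how much storage each spelling of "nil" occupies here. */
struct arg_sizes null_arg_sizes(void)
{
	struct arg_sizes s;

	s.plain_zero = sizeof(0);
	s.long_zero = sizeof(0L);
	s.char_ptr = sizeof((char *)0);
	return s;
}
```

On an implementation where long_zero and char_ptr differ, passing 0L to an
unprototyped function hands it a different amount of data than it expects;
only the cast form is sure to match.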
-- 
  John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
  MILNET: jdb@mordor [jdb@s1-c.ARPA]	(415) 422-0758
  UUCP: ...!ucbvax!decwrl!mordor!jdb 	...!seismo!mordor!jdb

msb@sq.uucp (Mark Brader) (05/26/88)

> An interesting point: ANSI does not define (at least not anywhere
> I can find it) the result of `x == y' when x and y are both null
> pointers.

No, this bug (which I, at least, pointed out in the first-round public
comments) has been fixed in the January 1988 draft.  I may as well quote the
paragraph preceding the new one, as well.  From section 3.2.2.3 (on page 38):

# An integral constant expression with the value 0, or such an expression
# converted to type void *, is called a "null pointer constant".  If a null
# pointer constant is assigned to or compared for equality to a pointer,
# the constant is converted to a pointer of that type.  Such a pointer,
# called a "null pointer", is guaranteed to compare unequal to a pointer
# to any object or function.
#
# Two null pointers, converted through possibly different sequences of
# casts to pointer types, shall compare equal.

By the way, section 4.1.5 guarantees on page 99 that the macro NULL,
defined in certain standard #include headers,

# expands to an implementation-defined null pointer constant

thus it could be #defined as 0, 0L, (void*)0, 1-1, etc., but not (char*)0,
which always was wrong.  This has stayed the same through several drafts
and I think it is most unlikely to change before the final Standard.
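
The quoted paragraphs can be checked with a small sketch.  The function name
null_pointers_agree is made up for illustration, and the casts to void * are
there only so that pointers of different types can be compared with each
other.

```c
#include <stddef.h>

/* Returns 1 if null pointers produced from different null pointer
 * constants, by different routes, all compare equal. */
int null_pointers_agree(void)
{
	char *from_zero = 0;		/* plain integer constant 0 */
	int *from_macro = NULL;		/* whatever <stddef.h> defines */
	void *from_void = (void *)0;	/* the void * form */
	double *from_long = 0L;		/* another integral constant with value 0 */

	return (void *)from_zero == from_void
	    && (void *)from_macro == from_void
	    && (void *)from_long == (void *)from_zero
	    && from_zero == 0
	    && from_macro == 0;
}
```

A conforming implementation should return 1 here whatever bit pattern it
uses for each pointer type.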

Mark Brader, SoftQuad Inc., Toronto, utzoo!sq!msb, msb@sq.com
#define	MSB(type)	(~(((unsigned type)-1)>>1))

guy@gorodish.Sun.COM (Guy Harris) (06/08/88)

(The preceding article's contents represent a misunderstanding of the C
language, rather than a valid statement of what an architecture should or
should not have; I've therefore redirected followups to "comp.lang.c", although
many readers there are probably tired of seeing these same
misunderstandings....)

> This presents a sequential search through a linear list more positively than
> 
> 	#include <stdio.h>	/*stupid place for NULL*/
> 	...
> 	Foobar *foobar;
> 	for (foobar = first; foobar != (Foobar*)NULL; foobar = foobar->next)
> 	{
> 	}
> 
> (For code to be truly portable, NULL must be cast to the appropriate pointer
> type every time it is used since it is nothing more than an integer bit
> pattern :-)

If the smiley really means "I don't believe the statement that precedes the
smiley", OK; however, the statement in question is completely false.  Any valid
C compiler will turn

	foobar != 0

into a comparison of "foobar" with the appropriate representation for a null
pointer of type "pointer to Foobar".

> Why not just adopt a pattern of all zeroes as NULL instead?  Just ensure
> that this convention is observed by your malloc.  Then we can write
> 
> 	if (func)
> 		result = (*func)();
> 
> rather than using double negatives:
> 
> 	if (func != (int(*)(void))NULL)
> 		result = (*func)();

"if (func != (int(*)(void))NULL)" is equivalent to "if (func != NULL)", because
the compiler will perform the type conversion.  "if (func != NULL)" is
equivalent to "if (func != 0)", because "NULL" is supposed to be defined as "0"
(or "(void *)0", but the latter isn't really necessary).  "if (func != 0)" is
equivalent to "if (func)".  Therefore, you can write "if (func)" instead of
"if (func != (int(*)(void))NULL)" *regardless* of the representation of a null
pointer.
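
The chain of equivalences above can be written out as a short sketch.  The
names forty_two, call_or_default and spellings_agree are hypothetical and
exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical callback used only for this illustration. */
int forty_two(void)
{
	return 42;
}

/* Calls func if it is not a null pointer; otherwise returns fallback. */
int call_or_default(int (*func)(void), int fallback)
{
	if (func)		/* same test as func != 0 and func != NULL */
		return (*func)();
	return fallback;
}

/* Returns 1 if all four spellings of the test give the same answer. */
int spellings_agree(int (*func)(void))
{
	int a = func ? 1 : 0;
	int b = (func != 0);
	int c = (func != NULL);
	int d = (func != (int (*)(void))NULL);

	return a == b && b == c && c == d;
}
```

In each comparison the compiler knows the type of func, so it can convert
the 0 or NULL itself; the explicit cast in the last spelling adds nothing.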

The confusion on this point probably results because C did not have function
prototypes until some compiler vendors put them in in anticipation of ANSI C.
Thus, while the compiler has enough information to know that NULL (or 0, it's
the same thing) in "foobar = NULL" or "if (func != NULL)" should be converted
to the appropriate pointer type, the compiler lacks enough information to do
the type conversion in "setbuf(stdout, NULL)" - it doesn't know that the second
argument to "setbuf" is "char *".  Therefore, you have to tell the compiler to
perform this conversion by explicitly casting the pointer:
"setbuf(stdout, (char *)NULL)".

If you have function prototypes in your C implementation, and have included the
proper include files or have otherwise arranged that prototypes are in scope
for all functions to which you refer, the compiler can do the conversion:

	void setbuf(FILE *stream, char *buf);

	...

	setbuf(stdout, NULL);

Please don't post a followup to this article unless you have good evidence that
the C *language* doesn't specify that the appropriate conversions be done in
the examples given.  Thank you.

guy@gorodish.Sun.COM (Guy Harris) (06/21/88)

> ANSI C is not C.  Prototypes do not exist in C.  Please show me where in
> K&R that it states that "0" refers to the NULL pointer irrespective of the
> underlying implementation.

7.7 Equality operators

	...

	A pointer may be compared to an integer, but the result is machine
	dependent unless the integer is the constant 0.  A pointer to which 0
	has been assigned is guaranteed not to point to any object, and will
	appear to be 0; in conventional usage, such a pointer is considered to
	be null.

7.14 Assignment operators

	...

	However, it is guaranteed that assignment of the constant 0 to a
	pointer will produce a null pointer distinguishable from a pointer to
	any object.

Next question.

> ...if "0" does refer to the null pointer, why do some systems have #define
> NULL (-1) in stdio?

Because the implementors were ignorant.  Any language implementation that
defines NULL as -1, and purports to be an implementation of C, is broken.

Next question.

> My statement regarding casting is correct, since not all pointers need be 
> of the same size.  Prototypes eliminate this annoyance, but I live with
> a compiler void of prototypes :-(

Your statement regarding casting was:

> 	#include <stdio.h>	/*stupid place for NULL*/
> 	...
> 	Foobar *foobar;
> 	for (foobar = first; foobar != (Foobar*)NULL; foobar = foobar->next)
> 	{
> 	}

> (For code to be truly portable, NULL must be cast to the appropriate pointer
> type every time it is used since it is nothing more than an integer bit
> pattern :-)

which is absolutely, positively, INcorrect.  The cast is totally unnecessary in
the "for" loop in question; the compiler is quite capable of figuring out that
0 or NULL must be converted to a null pointer of type "Foobar *" before
comparing it, and all valid C compilers will do so (if you know of a compiler
that does not, it is invalid - period).

Furthermore, as there are no function calls whatsoever in your example,
prototypes have no effect.

The ONLY place where you are required to cast NULL properly is when passing
arguments to a procedure; prototypes eliminate that unless you have a function
(such as "execl") that takes a variable number of arguments or otherwise cannot
have its calling sequence fully described by a prototype.  Any valid compiler
will perform the required conversion automatically in all other cases
(assignments and comparisons, as described above).
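
A minimal sketch of the one case where the cast is still needed, using a
made-up execl-style function.  count_args and count_three are hypothetical
names; the real "execl" is not called here.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical execl-style function: counts its char * arguments up to,
 * but not including, the terminating null pointer. */
int count_args(const char *first, ...)
{
	va_list ap;
	int n = 0;
	const char *s;

	va_start(ap, first);
	for (s = first; s != NULL; s = va_arg(ap, char *))
		n++;
	va_end(ap);
	return n;
}

/* The terminator is cast because the "..." gives the compiler no type
 * to convert a bare 0 or NULL to. */
int count_three(void)
{
	return count_args("ls", "-l", "/tmp", (char *)NULL);
}
```

The comparison "s != NULL" inside the loop needs no cast, since the compiler
knows the type of s; only the argument passed through "..." does.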

chris@mimsy.UUCP (Chris Torek) (07/01/88)

In article <3100003@hpmwtla.HP.COM> jeffa@hpmwtla.HP.COM (Jeff Aguilera) asks:
>>why do some systems have #define NULL (-1) in stdio? 

In article <743@award.UUCP> scott@award.UUCP (Scott Smith) answers:
>K&R p97: "C guarantees that no pointer that validly points
>at data will contain zero" ...

(A better place is the appendix, which is more explicit.)

>The reason some systems have #define NULL (-1) in stdio is because zero
>*can* be a pointer to valid data (as is the case in my micro). In this
>case, NULL is simply changed to be a value that can't be a valid pointer
>on that particular system.

Or, to simplify it:

	Some systems have `#define NULL (-1)' in <stdio.h> because
	some systems are broken.

If location zero is a valid data address, the compiler must take
care to ensure that either nil pointers are not the all-zero bit
pattern, or that something which is never accessed from C is stored
in location zero.

Given the C code

	main()
	{
		char *p = 0;

		if (p) *p = 0;
	}

the following pseudo-assembly is legal and correct:

	main_::
		create_frame

		move	#0xff00,-4(stack)	| p = (char *)0

		move	-4(stack),r0
		cmp	r0,#0xff00		| if (p)
		branch	equal,$1		| skip the store when p is nil
		move	#0,0(r0)		| *p = 0
	$1:

		destroy_frame

		return

Notice how pointer assignments and comparisons with `0' turn into
assignments and comparisons with a magic nil pointer.  Whether that
nil pointer's value is in fact zero is not determined, but IN THE
SOURCE LANGUAGE THAT NIL POINTER IS WRITTEN AS IF IT WERE THE INTEGER
ZERO.  (The dpANS also allows (void *)0.)

-----------------------------------------------------------------------
In C, NULL may or may not be in fact zero, but it is *written* as zero.
-----------------------------------------------------------------------

(Nil pointers also have types, and cannot be correctly discussed without
also mentioning their types, but this is the important part.)
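
The distinction between the source spelling and the stored bits can be
sketched in portable C.  Both function names below are hypothetical.  The
first test is correct on any implementation; the second merely reports what
this implementation happens to do, and the language permits either answer.

```c
#include <string.h>

/* Returns 1 if p is a null pointer.  Correct whatever bit pattern the
 * implementation uses, because the compiler converts the 0. */
int is_nil(const char *p)
{
	return p == 0;
}

/* Returns 1 if this implementation happens to represent a null char *
 * as all-zero bits, 0 otherwise. */
int nil_is_all_zero_bits(void)
{
	char *nil = 0;
	unsigned char zeros[sizeof nil];

	memset(zeros, 0, sizeof zeros);
	return memcmp(&nil, zeros, sizeof nil) == 0;
}
```

Code that relies on the second function returning 1 (for instance, clearing
a structure with memset and expecting its pointer members to be nil) is
depending on the implementation, not on the language.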
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

guy@gorodish.Sun.COM (Guy Harris) (07/01/88)

> The reason some systems have #define NULL (-1) in stdio is because zero
> *can* be a pointer to valid data (as is the case in my micro). In this
> case, NULL is simply changed to be a value that can't be a valid pointer
> on that particular system.

Only if the implementor was ignorant, in which case they shouldn't be trying to
implement C without having somebody around who *does* know the language.

You don't have to guarantee that that location is inaccessible, you merely have
to guarantee that no object that a C program can refer to is at that address.
If "malloc" will never return a pointer to an object at location 0, and if no
static or global data or code defined in a C program will ever be put at
location 0, and if no data object on the stack will be at location 0, this is
sufficient.

If it is truly impossible to arrange for this to happen, what you can do is:

	1) define the format of a null pointer to be, say, 0xFFFF;

	2) have the C compiler properly handle null pointers.

Thus, the C code

	char *p;

	p = 0;

would generate machine code something like

	move.i	#ffff,p

(i.e., store the bit pattern 0xFFFF into "p"), and the C code

	char *p;

	if (p == 0)

or

	char *p;

	if (!p)

would generate machine code something like

	cmp.i	#ffff,p
	bne	1f
	(code inside "if")
1:

(i.e., compare "p" against the bit pattern 0xFFFF; if it's equal, the pointer
is "equal to zero", or null).

Given that, NULL should be defined as 0, not -1.

chad@lakesys.UUCP (D. Chadwick Gibbons) (07/24/89)

	I have seen so many contradictions of the definition of null, and most
of the constant discussion of it has caused my sense of the definition of
null to fly way out in left field.  Let me relate to you my current
understanding of the null pointer, and if anyone wishes, they may mail me a
different story.  I don't think the news group needs another rash of postings,
so let's stay away from that.

	As defined by K&R2 (A6.6, p. 198) null is

		"An integral constant expression with [the] value 0, or such
	an expression cast to type void * [which may be] converted, by a cast,
	by assignment, or by comparison, to a pointer of any type."

Which means to me that if you state a comparison such as "if (ptr == 0)" then
you are indeed checking for a null pointer in a valid way.  On the actual
symbolic constant of NULL, The Book says (p. 102):
		"The symbolic constant NULL is often used in place of zero, as
	a mnemonic to indicate more clearly that this is a special value for a
	pointer."
Fine.  That makes sense.

	From what I have seen however, NULL may not be a number with a zero
bit pattern, since some implementations do not store zero as a zero bit
pattern (depending if the value in question is signed or unsigned.)  Thus,
zero and NULL may not equal each other in all implementations, yet _both_ may
be used safely in comparison with a null pointer.  It appears that in
pre-ANSI C, this is not true, and often the symbol constant NULL requires
casting to the proper type to ensure proper conversion of alignment
restrictions.

	And then there is the problem of some architectures storing data at
address zero.  According to the definitions above - this does not matter.  It
is the compiler's job to assume that the constant zero is for checking for a
null pointer, and if valid data can be at address zero that comparing a
pointer against zero is _not_ a comparison with that address.  Or so I have
extrapolated.

If you respond to this in any way, shape, or form, try reading the whole thing
first, go grab something with caffeine in it, and then respond.
-- 
D. Chadwick Gibbons, chad@lakesys.lakesys.com, ...!uunet!marque!lakesys!chad

chris@mimsy.UUCP (Chris Torek) (07/25/89)

In article <883@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
>	As defined by K&R2 (A6.6, p. 198) null is
>
>		"An integral constant expression with [the] value 0, or such
>	an expression cast to type void * [which may be] converted, by a cast,
>	by assignment, or by comparison, to a pointer of any type."

This is the pANS definition (modulo minor possible wording changes since
the draft I have).  The Classic definition (Classic C vs New C) simply
leaves out the `void *' alternative.

The untyped nil pointer is an integral constant expression whose value
is zero; it acquires a type (and hence an actual internal
representation) by being cast, assigned, or compared to a pointer
type.  Note that `(void *)0' is a typed nil pointer, but is freely
convertible to other typed nil pointers (possibly changing in internal
representation in the process).

>	From what I have seen however, NULL may not be a number with a zero
>bit pattern, since some implementations do not store zero as a zero bit
>pattern (depending if the value in question is signed or unsigned.)

This is not quite right.  A typed nil pointer (the only `real' nil
pointers are all typed; the untyped nil pointer is a `fake') is not
necessarily represented as an all-zero-bits value.  When an untyped nil
acquires its type, it also acquires its representation, and that
representation may be virtually arbitrary, including being different
for different types of nil pointer.

>Thus, zero and NULL may not equal each other in all implementations,

More precisely, `The actual representation for int-zero may differ from
that of a nil pointer of any particular type.  The four-letter sequence
N, U, L, L, is a preprocessor macro which must be defined as one of
the source-code representations for an untyped or freely-convertible
nil pointer' (the latter is an escape clause to allow `(void *)0'),
`and might not happen to be the unadorned number ``0''.'

>yet _both_ may be used safely in comparison with a null pointer.

This is because the unadorned number ``0'' is an integral constant
expression whose value is zero, which is one of the two pANS-legal
source code representations for a general nil pointer, while the four
letter sequence N, U, L, L, is required to be defined as one of the two
pANS-legal source code representations for a general nil pointer.

>It appears that in pre-ANSI C, this is not true,

No: this has always been true.  Before the pANS (and before the dpANSes
that preceded the pANSes) there was only one source code representation
for a general nil pointer, namely an integral constant expression whose
value was zero.  Hence, there was only one legal definition (with many
possible spellings) for `#define NULL'.  Now there are two (again with
many possible spellings---how many ways can *you* write an integral
constant expression with value zero?).

>and often the symbol constant NULL requires casting to the proper type
>to ensure proper conversion of alignment restrictions.

This is wrong.  Leave off the `of alignment restrictions' and it
becomes correct, even in New C.  The symbol NULL expands either to the
untyped source code nil pointer (`0') or to the freely-convertible
source code nil pointer (`(void *)0'), which must be given a type (and
hence a true representation) before being used.  It acquires that type
by being cast, assigned, or compared to a variable or expression that
has a pointer type.

The freely-convertible nil pointer (void *)0 already has a type (and
hence a representation), but when it is used where its type is both
incorrect and not-automatically-converted-to-correct, it may have the
wrong representation as well.  There is only one such place, and that is
as an argument to a function, where that function does not have a
prototype, or that argument is part of a `...' prototype.  This is, by
design, the same place where the untyped source code nil pointer (`0')
is not automatically converted to a nil pointer to the correct type.
This is true in both New C and Classic C.

What all this means is that there are exactly two correct definitions
for NULL in New C, exactly one in Classic C, and exactly one place
where casts are required.  (There are no places where casts hurt.)  The
casts are required because each different kind of nil pointer can have
a different run-time representation (even though all can use the same
source code representation) and the compiler needs to know which
run-time representation to use.  Without the cast, the compiler has
to assume that you really meant `0' (if NULL is #defined to 0) or
`(void *)0' (if NULL is #defined this way), because functions without
prototypes, or functions with `...' prototypes, can certainly have
such arguments.

>	And then there is the problem of some architectures storing data at
>address zero.  According to the definitions above - this does not matter.

Correct---this is a separate problem, and comes down to the one of
choosing the run-time representations for each possible kind of nil
pointer.

>It is the compiler's job to assume that the constant zero is for
>checking for a null pointer, and if valid data can be at address zero
>that comparing a pointer against zero is _not_ a comparison with that
>address.  Or so I have extrapolated.

This is correct, but not terribly well phrased.  It is the compiler's
job to know that the integer constant zero can be a source code expression
meaning `nil pointer to T' for some type T, and to so convert it where
there is sufficient information---cast, assignment, or comparison to
expression or variable of type pointer to T---and it is the entire
runtime system's job to make sure that whatever scheme is chosen works.

One scheme, which could be used on a machine where addresses from
0x0000 through 0x3FFF are the only valid addresses, would be to choose
the value 0xBAAD for all nil pointer types, and convert
	if (pointer_var == 0)
into
	compare var,0xBAAD
instructions, and so forth.  Another more common and lazy scheme, which
is what was used for the Unix PDP-11 split I&D systems, is to put a
`shim' in at location zero, so that even though addresses 0x0000
through 0xFFFF were all legal data locations, there was nothing useful
at 0x0000, it being already occupied by the shim.  Then the system can
use 0 as the runtime representation for all nil pointer types, which
makes the compiler a little bit easier to write.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris