[net.lang.c] Must a NULL pointer be a 0 bit pattern?

kendall@wjh12.UUCP (Sam Kendall) (10/15/84)

> ...  There is no commitment to the bit pattern of a 0 pointer in C.
> As such, I see no reason not to map a 0 pointer onto any bit pattern you
> want, as long as 1) it's distinct from *all* bit patterns for
> legitimate pointers in C, ...  and 2) it fits in the same number of
> bits as any other pointer.
> 
> 	Guy Harris

   This is a very interesting point.  I claim that for K&R C, a null
pointer, and a floating point zero as well, MUST be the zero bit
pattern.  (The fact that some implementations cannot conform to this
means that the language must change; ANSI C deals with this problem, as
explained below.)

   The "proof" goes as follows: consider this external declaration:

	union {
		char *	p_member;
		double	f_member;
		char	c_member[COVERS_P_AND_F];
	} implicitly_initialized_to_zero;

It seems to me that K&R guarantees that, when the program begins
execution, p_member, f_member and c_member are guaranteed to be
zero simultaneously, and the only way to do it is to make them all a
zero bit pattern.  I don't have my manual with me--can anyone try to
poke holes in this?  It may be that the K&R wording isn't rigorous
enough, and that my proof "falls to the ground".

   ANSI C deals with this by giving a rule for explicit initialization
of unions: the first member is the one that is initialized.  Implicit
initialization can use the same rule, meaning that in the above example,
only p_member would be guaranteed to start out as a zero (null) value.
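
A minimal sketch of the first-member rule described above, assuming a compiler that follows the ANSI rule. COVERS_P_AND_F is never defined in the post; the definition here is an assumption for illustration, and the function name first_member_is_null is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition: large enough to cover both other members. */
#define COVERS_P_AND_F \
	(sizeof(double) > sizeof(char *) ? sizeof(double) : sizeof(char *))

/* External declaration with no initializer, as in the post. */
union {
	char *	p_member;
	double	f_member;
	char	c_member[COVERS_P_AND_F];
} implicitly_initialized_to_zero;

/* Under the first-member rule only p_member is guaranteed to hold a
   zero (null) value; nothing is promised about f_member or c_member. */
int first_member_is_null(void)
{
	return implicitly_initialized_to_zero.p_member == NULL;
}
```

The comparison is against the null pointer value, not against a bit pattern, so the check should hold whatever representation the implementation picks for null.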

	Sam Kendall	  {allegra,ihnp4,ima,amd}!wjh12!kendall
	Delft Consulting Corp.	    decvax!genrad!wjh12!kendall

henry@utzoo.UUCP (Henry Spencer) (10/17/84)

> 	union {
> 		char *	p_member;
> 		double	f_member;
> 		char	c_member[COVERS_P_AND_F];
> 	} implicitly_initialized_to_zero;
> 
> It seems to me that K&R guarantees that, when the program begins
> execution, p_member, f_member and c_member are guaranteed to be
> zero simultaneously, and the only way to do it is to make them all a
> zero bit pattern.  I don't have my manual with me--can anyone try to
> poke holes in this?  It may be that the K&R wording isn't rigorous
> enough, and that my proof "falls to the ground".

I think your last conjecture is correct, i.e. K&R simply is not being
specific enough.  Note that a floating-point zero isn't necessarily
an all-zeros bit pattern either.  The K&R wording can be interpreted
in one of two ways:

1. Implicitly-initialized storage starts out as all-zeros bit patterns,
	which doesn't necessarily look like a 0 in all data types.

2. Implicitly-initialized storage looks like 0's, and the semantics of
	initializing unions simply aren't defined well enough.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (10/18/84)

The ANSI C committee apparently has talked about the problem of the semantics
of default initialization to "zero".  I am told that the latest draft, about
to be released, says that the default initialization of static data acts as
if everything had been assigned the integer constant 0.  So pointers really
do get initialized to NULL and floating-point numbers to 0.0, regardless of
the actual bit-level representations.  And the rule for initialization of
unions resolves the original example that started this discussion.

It is agreed that the semantics of calloc() would be tricky on a machine
with non-000 representations of 0.0 or NULL, but there's no simple fix.
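
One way to sidestep the calloc() question, sketched under the assumption that the caller can afford to assign each member explicitly. The struct node type and the new_node function are hypothetical names, not anything from the draft.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record holding a pointer and a floating-point value. */
struct node {
	struct node *	next;
	double		weight;
};

/* Allocate a node and assign zero values member by member, so the
   compiler supplies the right representation for NULL and 0.0
   instead of relying on calloc()'s all-zero bits. */
struct node *new_node(void)
{
	struct node *n = malloc(sizeof *n);

	if (n != NULL) {
		n->next = NULL;
		n->weight = 0.0;
	}
	return n;
}
```

This costs a few assignments per allocation but makes no assumption about how the machine represents a null pointer or a floating-point zero.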
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

hokey@plus5.UUCP (Hokey) (10/19/84)

>    ANSI C deals with this by giving a rule for explicit initialization
> of unions: the first member is the one that is initialized.  Implicit
> initialization can use the same rule, meaning that in the above example,
> only p_member would be guaranteed to start out as a zero (null) value.

I'm not up on the reasons for making the first member of the union
the one which is initialized.

I can see how initialization of unions can be very useful (I need it
in several places), but why the constraint on the type?  If I have an
external structure in which a member is a union which I want to initialize,
I gather I am out of luck unless the elements of the union which I am
initializing are of the same type.

It would be much more useful to be able to (explicitly) initialize a union
to *any* legal value.  I don't see why this is either bad or hard.  K&R
states in 6.8 (page 139) that "It is the responsibility of the programmer
to keep track of what type is currently stored in a union;...".

Can somebody tell me either why the ANSI C restriction is there, or
where I am missing the point?
-- 
Hokey           ..ihnp4!plus5!hokey
		  314-725-9492

kpmartin@watmath.UUCP (Kevin Martin) (10/19/84)

>   The "proof" goes as follows: consider this external declaration:
>
>	union {
>		char *	p_member;
>		double	f_member;
>		char	c_member[COVERS_P_AND_F];
>	} implicitly_initialized_to_zero;
>
>It seems to me that K&R guarantees that, when the program begins
>execution, p_member, f_member and c_member are guaranteed to be
>zero simultaneously, and the only way to do it is to make them all a
>zero bit pattern.  I don't have my manual with me--can anyone try to
>poke holes in this?  It may be that the K&R wording isn't rigorous
>enough, and that my proof "falls to the ground".

It is indeed the case that K&R isn't very rigorous on that point.
I have always read "guaranteed to start off as 0" as denoting the bit
pattern, rather than the interpretation of the bit pattern in any particular
type.

I would prefer that uninitialized variables be just that: uninitialized,
but initializing to the *bit pattern* of zero bits is second best.

>   ANSI C deals with this by giving a rule for explicit initialization
>of unions: the first member is the one that is initialized.  Implicit
>initialization can use the same rule, meaning that in the above example,
>only p_member would be guaranteed to start out as a zero (null) value.
>	Sam Kendall	  {allegra,ihnp4,ima,amd}!wjh12!kendall

As for explicit initializers, I certainly don't see a good reason for
picking the first element of a union; it is very likely that in two
variables of the same union type, I would want the initialization to
occur to two different elements.

It would be far more advantageous to add a method of explicitly naming the
union element you want to hit. As a side effect of the required syntax, you
could also initialize array and struct elements in any desired order.
Unfortunately, this would require additions to the language, which is
not what the committee is out to do, etc. etc.
                     Kevin Martin, UofW Software Development Group

henry@utzoo.UUCP (Henry Spencer) (10/22/84)

> As for explicit initializers, I certainly don't see a good reason for
> picking the first element of a union; it is very likely that in two
> variables of the same union type, I would want the initialization to
> occur to two different elements.

As I understand it, nobody is claiming that the "first element" rule is
good; all they are claiming is that it's simple and does not have adverse
consequences elsewhere.  Apparently the various alternatives all have
serious problems of one kind or another.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (10/22/84)

> It would be much more useful to be able to (explicitly) initialize a union
> to *any* legal value.  I don't see why this is either bad or hard.  K&R
> states in 6.8 (page 139) that "It is the responsibility of the programmer
> to keep track of what type is currently stored in a union;...".

The problem is, *which* member of the union are you initializing?  If your
union has int and double members, and you initialize it to 0, which member
does this initialize?  Remember that C converts int to double as necessary.
Suppressing the conversion for this case only is an awkward special case,
and creates other problems.

Nobody contends that the "first member" rule is a particularly useful way
of initializing unions.  It is there because (a) "doing it right" isn't
easy and there is little experience with the problems created, but (b) it
really is necessary to make initialization of unions meaningful, if only
so you can answer questions like "what does initialization to 0 mean?".
The "first member" rule is solely a matter of needing to do something,
not having any clear indication that there is any "best" way, and wanting
to avoid dangerous complexity.  The "first member" rule has the virtue
that there *is* implementation experience with it, so its effects are
understood to some extent.  Not true of most alternatives.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

grahamr@azure.UUCP (Graham Ross) (10/31/84)

Three comments about non-zero NULL:

1. Because of implicit comparison with zero, as in "while(p)s;", this idea
cannot be implemented simply by changing stdio.h to read
#define NULL ((char*)0x87654321)

2. In making changes to the compiler, this must remain zero:
	(p = NULL , (int)p)
Also, for every declared "var", this must remain one:
	(p = &var , (int)p != 0)

3. The issue of how a union can be set to zero was handled properly by the
correspondent who said the first member is initialized to zero.  A further
note might be made that exec(2) need not know where to put the 0x87654321
patterns; crt0 or something emitted by the linker can do this, but perhaps
with a substantial amount of work and/or abandonment by Unix of the
"common model" for linking.

	Graham Ross, Tektronix, ...!tektronix!tekmdp!grahamr

henry@utzoo.UUCP (Henry Spencer) (11/04/84)

> 1. Because of implicit comparison with zero, as in "while(p)s;", this idea
> cannot be implemented simply by changing stdio.h to read
> #define NULL ((char*)0x87654321)

Quite correct.  The compiler itself has to know what the bit pattern of
the 0 pointer is, so that it can generate correct code for implicit
comparisons against 0.
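
A small sketch of what "implicit comparison" means here. The struct cell type and both function names are hypothetical; the point is only that the two loops must behave identically, which is why the compiler, and not a header, has to know the null representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list cell. */
struct cell {
	struct cell *	next;
};

/* Implicit test: "while (p)" means "while p is not a null pointer". */
int length_implicit(struct cell *p)
{
	int n = 0;

	while (p) {
		n++;
		p = p->next;
	}
	return n;
}

/* Explicit test: the same comparison written out. */
int length_explicit(struct cell *p)
{
	int n = 0;

	while (p != NULL) {
		n++;
		p = p->next;
	}
	return n;
}
```

On a machine with a non-zero null pointer, the code generated for "while (p)" would compare p against that machine's null bit pattern, not against an integer zero.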

> 2. In making changes to the compiler, this must remain zero:
> 	(p = NULL , (int)p)
> Also, for every declared "var", this must remain one:
> 	(p = &var , (int)p != 0)

Sorry, wrong.  The results of casting a pointer to integer are explicitly
implementation-defined, except for the ability -- in certain limited
circumstances -- to cast it back to pointer and get the same one you
started with.  There are *no* *guarantees* anywhere about the results
of the comparisons you give, and compilers for odd machines are entitled
to give results other than the ones you suggest.

The key point in all of this is that the constant 0 does not necessarily
have the same representation in all types.
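
A sketch that separates the two questions, assuming only the standard memset() and memcmp() routines; both function names are hypothetical. The first result is the same on every implementation, while the second depends on the machine.

```c
#include <assert.h>
#include <string.h>

/* Always 1: the constant 0 converts to the null pointer, whatever
   bits the implementation uses to store it. */
int null_compares_equal_to_zero(void)
{
	char *p = 0;

	return p == 0;
}

/* Implementation-dependent: 1 if the stored null pointer happens to
   be all zero bits, 0 on a machine where it is something else. */
int null_is_all_zero_bits(void)
{
	char *p = 0;
	unsigned char zeros[sizeof p];

	memset(zeros, 0, sizeof zeros);
	return memcmp(&p, zeros, sizeof p) == 0;
}
```

Portable code can rely on the first function's result but should treat the second as a property of one particular machine.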
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry