[comp.lang.c] What C compilers have non-zero null pointers?

jc@minya.UUCP (John Chambers) (07/08/90)

After following a bit of debate in another newsgroup concerning 
dereferencing null pointers, I've become curious as to how various
C compilers actually represent null pointers.  I've never actually
seen a C compiler that uses anything other than all 0 bits for
a null pointer, but some people insist that this is quite common
(and I'm a total idiot for not knowing it ;-).  Now, I've known
for some time that all-zeroes wasn't the *required* representation
of a null pointer, but I also understand why it's the obvious one.
Consider that the C bible (page 192) says, concerning assignments
to/from pointers and other types, "The assignment is a pure copy
operation, with no conversion."  This means that in:
	int i;
	char*p;
	i = 0;
	p = i;
the value assigned to p is the same bit pattern as i (which needs
to be long on some machines, of course).  Now, I've never even
heard of a C compiler that uses anything other than all zeroes
for an int (or long) 0, so it seems to my naive little mind that
p must be all zeroes, also.  Of course, the manual doesn't quite
say anywhere that the above code gives p the same value as 
	p = 0;
though I sorta expect that most programmers would be surprised
if the values were different.

Anyhow, what's the story here?  Are there really C compilers that
use something other than all-zero bits for a null pointer?  If so,
can you name the compilers, and describe their representations and
how they handle code like the above?

This seems like it could be the source of a lot of fun portability
problems.  Any insights here?
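
For instance, here's the sort of trap I have in mind (just a sketch,
assuming a hypothetical machine where a null pointer is NOT all-zero
bits): clearing an array of pointers with memset() gives you all-zero
bits, which on such a machine wouldn't be null pointers, while assigning
the constant 0 is supposed to give real null pointers either way.

	#include <string.h>

	char *table[100];

	void
	clear_table()
	{
		int i;

		/* Tempting, but only guarantees all-zero bits, which need
		   not be null pointers on our hypothetical machine: */
		memset((char *)table, 0, sizeof(table));

		/* Portable: the constant 0 becomes a genuine null pointer: */
		for (i = 0; i < 100; i++)
			table[i] = 0;
	}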

[It seems to me that it'd be useful if there were a compiler option
to specify the representation of null pointers, but that's probably
far too much to hope for... :-]

-- 
Typos and silly ideas Copyright (C) 1990 by:
Uucp: ...!{harvard.edu,ima.com,eddie.mit.edu,ora.com}!minya!jc (John Chambers)
Home: 1-617-484-6393
Work: 1-508-952-3274

unhd (Paul A. Sand) (07/10/90)

In article <422@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>Anyhow, what's the story here?  Are there really C compilers that
>use something other than all-zero bits for a null pointer?  If so,
>can you name the compilers, and describe their representations and
>how they handle code like the above?

"Certain Prime computers use a value different from all-bits-0 to
encode the null pointer. Also, some large Honeywell-Bull machines use
the bit pattern 06000 to encode the null pointer. On such machines, the
assignment of 0 to a pointer yields the special bit pattern that
designates the null pointer. Similarly, (char *)0 yields the special
bit pattern that designates a null pointer."

-- "Portable C" by H. Rabinowitz and Chaim Schaap, Prentice-Hall, 1990,
page 147.  A good book. Rex Jaeschke's "Portability and the C Language"
(Hayden, 1988) makes the same point but doesn't name names.

>This seems like it could be the source of a lot of fun portability
>problems.  Any insights here?

I bet you're right, although it's rather easy to be careful in these
cases; there are a lot of more common and subtler portability
problems.

These books point out, for example, that calloc() initializes its
allocated memory to all-bits-0.  Interestingly [at least for those
interested by such things] Rabinowitz & Schaap claim that "most C
environments" initialize non-explicitly-initialized static and extern
variables to all-bits-0. On the other hand, Jaeschke claims that such
variables are assigned the value of 0 "cast to their type." Unless
you're working in guaranteed ANSI-land only, I wouldn't rely on
Jaeschke being right.
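
Here's a little sketch of the difference in guarantees (purely
illustrative; imagine a machine where the null pointer is not
all-bits-0):

	#include <stdlib.h>

	char *static_ptr;	/* Jaeschke: initialized to 0 cast to its type,
				   i.e. a null pointer; Rabinowitz & Schaap:
				   initialized to all-bits-0 */

	int
	main()
	{
		char **heap_ptrs = (char **) calloc(100, sizeof(char *));
		int i;

		if (heap_ptrs == 0)	/* the constant 0 works here regardless */
			return 1;

		/* calloc() promises only all-bits-0, which on our imagined
		   machine is not a null pointer; assign 0 if you want nulls: */
		for (i = 0; i < 100; i++)
			heap_ptrs[i] = 0;
		return 0;
	}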

It's also a good idea, I'm told, to cast the null pointer explicitly
when using it as a function argument, for this and other reasons.
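
For example (a sketch only; lookup() and find() are made-up names, and
there is no prototype in scope):

	extern char *lookup();		/* old-style declaration, no prototype */

	char *
	find(name)
	char *name;
	{
		/* Risky: passes an int 0, which need not look like a null pointer:
		   return lookup(name, 0);                                          */

		/* Safer: passes a genuine null char pointer: */
		return lookup(name, (char *)0);
	}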

-- 
-- Paul A. Sand
-- University of New Hampshire
-- uunet!unhd!pas -or- pas@unh.edu

henry@zoo.toronto.edu (Henry Spencer) (07/10/90)

In article <422@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>Consider that the C bible (page 192) says, concerning assignments
>to/from pointers and other types, "The assignment is a pure copy
>operation, with no conversion."  This means that in:
>	int i;
>	char*p;
>	i = 0;
>	p = i;
>the value assigned to p is the same bit pattern as i...

Reading the Old Testament (K&R1) and trying to apply it to modern C is
a mistake.  This code isn't even legal nowadays.  You need an explicit
cast to turn the int into a pointer, and there is no promise that that
cast doesn't do some sort of arcane conversion operation.

Actually, even the Old Testament continued with:  "This usage is
nonportable, and may produce pointers which cause addressing exceptions
when used.  However, it is guaranteed that assignment of the *constant*
0 to a pointer will produce a null pointer distinguishable from a
pointer to any object." [emphasis added]

The constant 0 in a pointer context has no relationship whatsoever to
the integer value 0; it is a funny way of asking for the null pointer,
which need not resemble the int value 0 in any way.
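
To make the distinction concrete (a minimal sketch; the comments state
the guarantees, not any particular bit patterns):

	void
	example()
	{
		char *p;
		int i = 0;

		p = 0;		/* constant 0: p is a null pointer,
				   whatever its bits look like */
		p = (char *) i;	/* non-constant 0: legal with the cast, but
				   the result is machine-dependent -- no
				   null-pointer guarantee */
	}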
-- 
NFS is a wonderful advance:  a Unix    | Henry Spencer at U of Toronto Zoology
filesystem with MSDOS semantics. :-(   |  henry@zoo.toronto.edu   utzoo!henry

roger@everexn.uucp (Roger House) (07/11/90)

In <422@minya.UUCP> jc@minya.UUCP (John Chambers) writes:

>                             ...  "The assignment is a pure copy
>operation, with no conversion."  This means that in:
>	int i;
>	char*p;
>	i = 0;
>	p = i;
>the value assigned to p is the same bit pattern as i (which needs
>to be long on some machines, of course).    ... 

I don't know about the Bible, but the ANSI C standard does NOT say that p = i
is a pure copy.  Page 37 of the standard Rationale says:

	Since pointers and integers are now considered incommensurate,
	the only integer that can be safely converted to a pointer is
	the constant 0.  The result of converting any other integer to
	a pointer is machine dependent.

Also, p38 of the standard itself says:

	An integral constant expression with the value 0, or such an
	expression cast to type void *, is called a null pointer constant.
	If a null pointer constant is assigned to or compared
	for equality to a pointer, the constant is converted to a 
	pointer of that type.  Such a pointer, called a null pointer, is
	guaranteed to compare unequal to a pointer to any object or
	function.

Note the term "integral constant expression".  In your example, i is not
a constant expression, so the result of p = i is machine dependent.

						Roger House

amoore@softg.uucp (07/12/90)

In article <1990Jul10.141208.24902@uunet!unhd>, pas@uunet!unhd (Paul A. Sand) writes:

> "Certain Prime computers use a value different from all-bits-0 to
> encode the null pointer. 

The Prime 50 series (segmented) architecture has a real segment 0, so an
all-zero bit pattern is a perfectly valid address.  A null pointer on a
Prime is instead segment 7777, location 0 (usually written 7777/0).
The C compiler (written by Garth Conboy of Pacer Software) deals with
comparisons to 0.

marking@drivax.UUCP (M.Marking) (07/13/90)

jc@minya.UUCP (John Chambers) writes:

) After following a bit of debate in another newsgroup concerning 
) dereferencing null pointers, I've become curious as to how various
) C compilers actually represent null pointers.  I've never actually
) seen a C compiler that uses anything other than all 0 bits for
) a null pointer, but some people insist that this is quite common

My recollection is that there are "funny" NULLs on those machines
that use 1s-complement arithmetic (Univac 1100, some CDC stuff)
because sometimes you can conveniently generate exceptions on
"minus zero". But it's been a few years...

bls@svl.cdc.com (brian scearce) (07/14/90)

The CDC Cyber series of computers uses not-all-0-bits for NULL.
Cyber addresses are 48 bits long, with 4 bits for ring, 12 bits
for segment and 32 bits for offset.  If you load an address register
with a number with ring == 0, you get a hardware trap.

So, on our compiler, NULL is represented by ring == (ring your
program is executing in), segment == 0, offset == 0.

This means that you have to be quite careful in those situations
where you type 0 and mean NULL and it isn't inferable from context
what you mean.  The only time that this makes a difference is (I
think) arguments to functions (should be "non-prototyped functions",
but I haven't implemented ANSI yet).

So, the output from:
#include <stdio.h>

main()
{
  char *p = 0;
  printf("%x\n", (int)p);   /* show the bit pattern our compiler uses for NULL */
}

is:
b00000000000

It's still a very good compiler.  Really.  This small oddity isn't
as bad as most people seem to think.  As I've explained to a few
through email, it's almost like floating point.  Nobody expects
2.0 to have the same representation as 2, but we still write 2
sometimes when we mean 2.0 (like double x = 2; is OK).  

--
Brian Scearce        \ "I tell you Wellington is a bad general, the English are
(not on CDCs behalf)  \ bad soldiers; we will settle the matter by lunch time."
bls@u02.svl.cdc.com    \   -- Napoleon Bonaparte, June 18, 1815 (at Waterloo)
shamash.cdc.com!u02!bls \ From _The Experts Speak_, Cerf & Navasky

rja@edison.cho.ge.com (rja) (07/17/90)

I used to use a compiler for MSDOS and the 80x86 cpus 
whose NULL pointer was F000:0000 hex when examined via
a debugger.  It of course did compile fine as long as one
used sense and compared pointers to NULL rather than 
a constant of zero...

Compilers where NULL isn't represented as all zero bits
just aren't that uncommon.

darcy@druid.uucp (D'Arcy J.M. Cain) (07/17/90)

In article <9007161750.AA00664@edison.CHO.GE.COM> rja <rja@edison.cho.ge.com> writes:
>I used to use a compiler for MSDOS and the 80x86 cpus 
>whose NULL pointer was F000:0000 hex when examined via
>a debugger.  It of course did compile fine as long as one
>used sense and compared pointers to NULL rather than 
>a constant of zero...
>
Which compiler was that?  I hope it didn't claim to be ANSI compatible.  The
NULL pointer does not have to be represented in memory as all zero bits, but
the constant 0, written in a pointer context, must yield the null pointer.
Comparing a pointer to the constant 0 is therefore always correct and has
nothing to do with the internal representation of the NULL pointer.

However I always use NULL, for two reasons: it protects against broken
compilers on brain-dead CPUs, and if NULL is defined as "(void *)0" the
compiler can catch an accidental comparison of a NULL pointer against a
non-pointer variable.  For example:
    int a = 0;
    if (a == NULL)
        do_something();
If tested against 0 the compiler won't complain but it will complain if it
is tested against (void *)0.  At least GNU C complains.  In other words, use
NULL not because 0 may not be the NULL pointer but because NULL can't be
anything else.

-- 
D'Arcy J.M. Cain (darcy@druid)     |   Government:
D'Arcy Cain Consulting             |   Organized crime with an attitude
West Hill, Ontario, Canada         |
(416) 281-6094                     |

peter@ficc.ferranti.com (Peter da Silva) (07/17/90)

In article <9007161750.AA00664@edison.CHO.GE.COM> rja <rja@edison.cho.ge.com> writes:
> I used to use a compiler for MSDOS and the 80x86 cpus 
> whose NULL pointer was F000:0000 hex when examined via
> a debugger.  It of course did compile fine as long as one
> used sense and compared pointers to NULL rather than 
> a constant of zero...

If that was the case, the compiler was broken. A constant zero in a pointer
context is the definition of NULL. `pointer == 0' and `pointer == NULL' should
evaluate the same way (as if they generated the same code).

> Compilers where NULL isn't represented as all zero bits
> just aren't that uncommon.

Compilers where it's something you need to watch out for should be.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

ergo@netcom.UUCP (Isaac Rabinovitch) (07/18/90)

In <9007161750.AA00664@edison.CHO.GE.COM> rja@edison.cho.ge.com (rja) writes:

>I used to use a compiler for MSDOS and the 80x86 cpus 
>whose NULL pointer was F000:0000 hex when examined via
>a debugger.  It of course did compile fine as long as one
>used sense and compared pointers to NULL rather than 
>a constant of zero...

True.  But what the "NULL should always be 0" diehards want is not to
write (for example)

	for (ptr = fist; ptr != 0; ptr = ptr->next)

in which 0 should probably be #DEFINED anyway, but rather

	for (ptr = first; ptr ; ptr = ptr->next)

which produces tighter code and (most important of all) looks
spiffier.  It has the elegance of expression old C hands crave.

>Compilers where NULL isn't represented as all zero bits
>just aren't that uncommon.

My '78 K&R says that assigning 0 to a pointer is (or was) guaranteed
to produce a NULL, even on compilers that didn't like other
integer-to-pointer assignments.  But, interestingly, they did *not*
guarantee, even then, the reverse!
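
In other words (a tiny sketch of what I mean by "the reverse"):

	void
	reverse_example()
	{
		char *p = 0;	/* guaranteed, even in '78: p is a null pointer */
		int i;

		i = (int) p;	/* NOT guaranteed: i need not come out 0, even
				   though p is a perfectly good null pointer */
	}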
-- 

ergo@netcom.uucp			Isaac Rabinovitch
atina!pyramid!apple!netcom!ergo		Silicon Valley, CA
uunet!mimsy!ames!claris!netcom!ergo

	"I hate quotations.  Tell me what you know!"
			-- Ralph Waldo Emerson

leo@ehviea.ine.philips.nl (Leo de Wit) (07/18/90)

In article <1990Jul17.123627.1932@druid.uucp> darcy@druid.uucp (D'Arcy J.M. Cain) writes:
    []
|However I always use NULL for two reasons.  Broken compilers on brain dead
|CPUs and if NULL is defined as "(void *)0" then it tests for accidentally
|testing a NULL pointer against a non pointer variable.  For example:
|    int a = 0;
|    if (a == NULL)
|        do(something);
|If tested against 0 the compiler won't complain but it will complain if it
|is tested against (void *)0.  At least GNU C complains.  In other words, use
|NULL not because 0 may not be the NULL pointer but because NULL can't be
|anything else.

For much the same reason I always use explicit casts for null pointers;
this also catches inadvertent assignments or comparisons to a pointer
of a different type, and has the additional advantage that null
pointers as parameters have the same "appearance"; you don't have to
develop different habits of treating pointers. As an example:

    if (fgets(inbuf,sizeof(inbuf),fp) != (char *)0) ....

but also:

    execl("/bin/ls","ls",(char *)0);

Another advantage is that you see immediately from the program text
what kind of pointer is expected at some stage; in production code, and
especially in the maintenance phase, this may prevent a lot of type
lookups.

    Leo.

steve@taumet.com (Stephen Clamage) (07/18/90)

ergo@netcom.UUCP (Isaac Rabinovitch) writes:

>But what the "NULL should always be 0" diehards want is not to
>write (for example)
>	for (ptr = fist; ptr != 0; ptr = ptr->next)
>in which 0 should probably be #DEFINED anyway, but rather
>	for (ptr = first; ptr ; ptr = ptr->next)
>which produces tighter code ...

If in this context the expression
	ptr
produces code which is better than
	ptr != 0
then you are the victim of a lazy compiler writer.  There should be
no difference, since 'ptr' is shorthand for 'ptr != 0'.  I would
complain to the vendor, or buy a better compiler.

This is the sort of micro-optimization that programmers in a higher
level language should NOT have to worry about.  On compiler A, one
code version may produce a more efficient program; on compiler B, the
reverse may be true.  Compiler B might even be the next release of
compiler A.  Thus, the effort spent in this micro-optimization is
not only wasteful, but may be counter-productive over time.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

ark@alice.UUCP (Andrew Koenig) (07/19/90)

In article <12288@netcom.UUCP>, ergo@netcom.UUCP (Isaac Rabinovitch) writes:

> True.  But what the "NULL should always be 0" diehards want is not to
> write (for example)

> 	for (ptr = fist; ptr != 0; ptr = ptr->next)

> in which 0 should probably be #DEFINED anyway, but rather

> 	for (ptr = first; ptr ; ptr = ptr->next)

These two forms are guaranteed to be equivalent (if you change `fist'
to `first' in the first example).  Period.

> which produces tighter code and (most important of all) looks
> spiffier.  It has the elegance of expression old C hands crave.

Whether it produces tighter code is a matter between you and your
local implementation.  Since the two forms are equivalent, there is
no particular reason to believe that they will produce different code
at all.

> My '78 K&R says that assigning 0 to a pointer is (or was) guaranteed
> to produce a NULL, even on compilers that didn't like other
> integer-to-pointer assignments.  But, interestingly, they did *not*
> guarantee, even then, the reverse!

Yes indeed.  However, if you write

	ptr != 0

then the 0 is converted to a pointer of the appropriate type and then
compared, and that *is* guaranteed to work.
-- 
				--Andrew Koenig
				  ark@europa.att.com

karl@haddock.ima.isc.com (Karl Heuer) (07/19/90)

In article <12288@netcom.UUCP> ergo@netcom.UUCP (Isaac Rabinovitch) writes:
>	for (ptr = fist; ptr != 0; ptr = ptr->next)
>	for (ptr = first; ptr ; ptr = ptr->next)
>which produces tighter code and (most important of all) looks
>spiffier.

There is no reason it should produce tighter code; the compiler still has
to generate a compare against zero.  And whether it "looks spiffier" is a
matter of taste.  I personally switched to explicit compares (against an
*appropriately typed* zero!) many years ago.  Redundancy is your friend.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
if (i != 0 && c != '\0' && x != 0.0 && p != NULL) abort();

martin@mwtech.UUCP (Martin Weitzel) (07/19/90)

Sometimes I think we should collect votes for starting a new group
"comp.lang.c.nullpointers". Yes, I know about the kill-files but some
of us could tell our feeds then not to send this group ... 1/2:-).
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

thacher@unx.sas.com (Clarke Thacher) (07/20/90)

Actually, the Prime 50 series has both types of null pointers.
The older (PL/I) style of pointer uses segment 7777 (octal) in its
NULL pointers (there can never be a segment 7777); dereferencing
this type of pointer raises a NULL_POINTER condition.
The newer C compiler uses a pointer with a segment of 0 and an
offset of 0.  Even this pointer is not bit-for-bit equal to an integer 0:
there are two extra bits (for the ring number) that may or may not
be set.  To solve this, Prime added a TCNP (Test C Null Pointer)
instruction to the instruction set.  They also added a bunch of other
instructions for the C compiler (mostly having to do with character
operations).

-- 
Clarke Thacher        PRIMOS Host developer          SAS Institute, Inc.
sasrer@unx.sas.com    (919) 677-8000 x7703          Box 8000, Cary, NC 27512

ansok@stsci.EDU (Gary Ansok) (07/22/90)

In article <12288@netcom.UUCP> ergo@netcom.UUCP (Isaac Rabinovitch) writes:
>True.  But what the "NULL should always be 0" diehards want is not to
>write (for example)
>
>	for (ptr = fist; ptr != 0; ptr = ptr->next)
>
>in which 0 should probably be #DEFINED anyway, but rather
>
>	for (ptr = first; ptr ; ptr = ptr->next)
>
>which produces tighter code and (most important of all) looks
>spiffier.  It has the elegance of expression old C hands crave.

Once more with feeling:

	if (ptr)		/* or for(;ptr;) */

is exactly equivalent to

	if (ptr != 0)

which is exactly equivalent to

	if (ptr != (typeof ptr) 0)

which is exactly equivalent to

	if (ptr != NULL-pointer-for-typeof-ptr)

Any C compiler that has a not-all-bits-zero NULL internal representation
and does not compare a pointer to that in "if (ptr)" or "for (...; ptr; ...)"
is seriously BROKEN.

Whether you like "if (ptr)" on readability grounds is a different question
(I like it, but I seem to be in the minority) -- but that's purely a style 
question and the compiler had better produce the correct code.

Gary Ansok
ansok@stsci.edu