[comp.lang.c] Null revisited

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (02/02/89)

So many people have misread my last posting that I feel it is necessary
to restate what I said more clearly. When I corrected the person who
said any compiler which didn't define NULL to be zero was broken, and
quoted the dpANS, I was not trying in any way to convince anyone that NULL
*must* be cast to a pointer type, only that it is valid to do so.

A portable program will not assume that it is an int or a pointer, but
will use it in ways which do not make any assumptions as to type. I
offer these ideas on avoiding problems, with *no* claim that they cover
all cases:
	1) If you mean zero, use zero
	2) NULL with either type of declaration works when assigned
	   to a pointer or compared with a pointer, i.e. "foo = NULL"
	   or "if (foo == NULL)" should work for NULL defined as int
	   or pointer.
	3) NULL used as an argument to a procedure or macro should be
	   cast. The macro may not use it in a way which requires
	   casting, but it is better to be consistent.

About end of string:

  Some people put a zero in a char using NULL, such as "foo[n]=NULL".
I think this is portable for either type, but I have seen compilers
which warn about type conversion if NULL is a pointer. To avoid this I
have defined an EOS (end of string) macro which does not produce the
warnings, at least on compilers I've tried.

	#define EOS	((unsigned char) 0)

  I don't think this will work any better than NULL, but it certainly is
quieter.

  Hopefully this will clarify what I originally meant, so that the
people who want to disagree can at least do so based on what I meant and
not what they thought I said.

  Several people wrote and asked why anyone would define NULL as a
pointer in their stdio.h file. Because there is a lot of code which
uses uncast NULL as a procedure argument, it is important that it be a
pointer type where the size or form of an int is not the same as that
of a pointer. There is no way to solve all portability problems if
pointers to different types have different forms, but for implementations
in which all pointers have the same form, a pointer NULL can save the
users trouble porting code which was written without adequate attention
to portability.

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

les@chinet.chi.il.us (Leslie Mikesell) (02/05/89)

In article <13068@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:

>A portable program will not assume that it is an int or a pointer, but
>will use it in ways which do not make any assumptions as to type. 

Does the (IMHO) incorrect cast of NULL to (char *) in some compilers'
stdio.h hurt anything in a correctly written program?  That is, does
(char *)0 become something other than 0 in the context of an assignment
or comparison, and is it treated differently when cast to the correct
type (perhaps (int *)(char *)0 or (struct foo *)(char *)0)?

Les Mikesell

john@frog.UUCP (John Woods) (02/06/89)

In article <13068@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes:
>   Some people put a zero in a char using NULL, such as "foo[n]=NULL".
> I think this is portable for either type, but I have seen compilers
> which warn about type conversion if NULL is a pointer.

If NULL is #defined as 0, this will work, but if it is #defined as (void *)0
(which I think is the other ANSI-legal #definition; if you don't know, assume
that only 0 works, if you do know, quietly correct me by mail), this is not
guaranteed to work:  on a sufficiently bizarre architecture, the nil-pointer
need not have an all-zero-bits representation, as long as the compiler can
(a) properly recognize that the constant 0 must be coerced to the nil-pointer
pattern, and (b) can coerce the (void *)nil-pointer to the (foo *)nil-pointer
for all types "foo".  The compiler is not obligated to know how to turn
(void *)0 back into 0.

Myself, I usually use #define NUL '\000', named after the ASCII NUL character,
though I often worry about misunderstandings between NUL and NULL.
-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

Presumably this means that it is vital to get the wrong answers quickly.
		Kernighan and Plauger, The Elements of Programming Style

chris@mimsy.UUCP (Chris Torek) (02/06/89)

In article <7630@chinet.chi.il.us> les@chinet.chi.il.us
(Leslie Mikesell) writes:
>Does the (IMHO) incorrect cast of NULL to (char *) in some compilers'
>stdio.h hurt anything in a correctly written program?

That (not just opinion) incorrect definition of NULL does indeed hurt.
For instance, given the (correct) C code:

	#include <stdio.h>
	int *ip = NULL;

the compiler should at least warn about the improper assignment of
(char *)0 to a variable of type (int *).  A compiler with such a
definition is likely to warn about or refuse to compile a large number
of correct programs.

Once again, the *ONLY* correct definitions of NULL are

	#define NULL 0
	#define NULL 0L
	#define NULL ((void *)0)

and possibly other constant expressions that evaluate to 0, 0L,
or (void *)0 anyway.  (char *)0 is *not* a proper untyped nil pointer.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/07/89)

In article <15820@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>	#define NULL 0L

That's not legal, although you can probably get away with it.

penneyj@servio.UUCP (D. Jason Penney) (02/15/89)

I strongly disagree with Bill Davidsen's suggestion for null-terminating
C strings.  I submit the following example as safer, cleaner, and more
legible:

char aString[20];

aString[0] = '\0';

'x' is a literal of type char.  Thus, '\0' is the char with value 0,
which is really what was intended here.

When you assign an int or a pointer to a character, the reader ends
up wondering if the type mismatch is unintentional.

-- 
D. Jason Penney                  Ph: (503) 629-8383
Servio Logic Corporation       uucp: ...ogccse!servio!penneyj
15220 NW Greenbrier Parkway #100
Beaverton, OR 97006

diamond@diamond.csl.sony.junet (Norman Diamond) (02/20/89)

In article <102@servio.UUCP> penneyj@servio.UUCP (D. Jason Penney) writes:
[for null-terminating a C string]

>char aString[20];
>aString[0] = '\0';

Yes, this is fine.

>'x' is a literal of type char.  Thus, '\0' is the char with value 0,
>which is really what was intended here.

In fact, 'x' is a literal of type int, and '\0' is the same as 0.

>When you assign an int or a pointer to a character, the reader ends
>up wondering if the type mismatch is unintentional.

When you assign 'x' to a character, you are assigning an int to a
character.  The reader knows that the type mismatch was intentional.
But readers who do not understand the type mismatch often make
different errors, where results are counter-intuitive.
--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-inventing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?

scm@datlog.co.uk ( Steve Mawer ) (02/27/89)

In article <10138@socslgw.csl.sony.JUNET> diamond@diamond. (Norman Diamond) writes:
>
>When you assign 'x' to a character, you are assigning an int to a
>character.  The reader knows that the type mismatch was intentional.

Not if he knows the C language.  A single character written within
single quotes is a *character constant*.  This isn't an int.

'\0' is a special case to permit the representation of non-graphical
characters (also newline, tab, backslash, return, etc.) and is not
the same as 0, which is an integer constant.

It should, however, be noted that some compilers will allow the use
of multiple characters, as in 'abcd' (which *may* work on 32 bit
machines).  I wouldn't recommend this usage in portable software.
(In fact I wouldn't *ever* do it.  Well, maybe just to try it :-))
-- 
Steve C. Mawer        <scm@datlog.co.uk> or < {backbone}!ukc!datlog!scm >
                       Voice:  +44 1 863 0383 (x2153)

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/01/89)

In article <1783@dlvax2.datlog.co.uk> scm@datlog.co.uk ( Steve Mawer ) writes:
>In article <10138@socslgw.csl.sony.JUNET> diamond@diamond. (Norman Diamond) writes:
>>When you assign 'x' to a character, you are assigning an int to a
>>character.  The reader knows that the type mismatch was intentional.
>Not if he knows the C language.  A single character written within
>single quotes is a *character constant*.  This isn't an int.
>'\0' is a special case to permit the representation of non-graphical
>characters (also newline, tab, backslash, return, etc.) and is not
>the same as 0, which is an integer constant.

"Open mouth, insert foot."

Ok, gang, should we beat up on him or should we be nice?

A so-called "character constant" in C, such as 'x', is an integer
constant, expressed in a funny notation, that's all.  '\0' is no
more and no less than a fancy way of writing 0.  '\012' is a fancy
way of writing 10, and so forth.  The value of 'x' is some positive
integer that is identical to the numeric code used internally to
represent the character that is normally known as "x" on the system,
when it is read from a text file via getc(), for example.  To take a
specific case, if the system primarily uses ASCII codes to represent
characters, then 'x' would most likely be exactly the same as 120.

This is not new with ANSI C; it's the way it has always been in C.

nagel@blanche.ics.uci.edu (Mark Nagel) (03/01/89)

In article <1783@dlvax2.datlog.co.uk>, scm@datlog ( Steve Mawer ) writes:
|In article <10138@socslgw.csl.sony.JUNET> diamond@diamond. (Norman Diamond) writes:
|>
|>When you assign 'x' to a character, you are assigning an int to a
|>character.  The reader knows that the type mismatch was intentional.
|
|Not if he knows the C language.  A single character written within
|single quotes is a *character constant*.  This isn't an int.

If you knew the C language, you'd know that a character constant *is*
an integer.  See K&R pages 17 and 35.

Mark Nagel @ UC Irvine, Dept of Info and Comp Sci
ARPA: nagel@ics.uci.edu              | Charisma doesn't have jelly in the
UUCP: {sdcsvax,ucbvax}!ucivax!nagel  | middle. -- Jim Ignatowski

chris@mimsy.UUCP (Chris Torek) (03/01/89)

>In article <10138@socslgw.csl.sony.JUNET> diamond@diamond.
>(Norman Diamond) writes:
>>When you assign 'x' to a character, you are assigning an int to a
>>character.  The reader knows that the type mismatch was intentional.

This is correct.

In article <1783@dlvax2.datlog.co.uk> scm@datlog.co.uk ( Steve Mawer ) writes:
>Not if he knows the C language.  A single character written within
>single quotes is a *character constant*.  This isn't an int.

False.  A character constant is an int (not a char) whose value is the
representation of that character in the machine's character set
(typically ASCII; EBCDIC is confused about where the []s go, and
sometimes does not have {} \ and ~, depending on which EBCDIC you
are using---there are at least three major variants).

>'\0' is a special case to permit the representation of non-graphical
>characters (also newline, tab, backslash, return, etc.) and is not
>the same as 0, which is an integer constant.

Aside from the backslash interpretation, '\0' is *not* a special case.
The type of '\0' is int and its value is zero, no matter what the
character set; this is guaranteed by the C language.  (The <type,value>
pair for an unadorned 0 is also <int,0>, although the <type,value>
pair of something like 65432 is sometimes <long,65432>.)

>It should, however, be noted that some compilers will allow the use
>of multiple characters, as in 'abcd' (which *may* work on 32 bit
>machines).  I wouldn't recommend this usage in portable software.

This is correct.  It is hard to predict whether (char)'abcd' will
be the same as (char)'a' or (char)'d' (or even b or c), so it is
best to avoid these.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gsh7w@astsun1.acc.virginia.edu (Greg Hennessy) (03/01/89)

Mark Nagel writes:
#If you knew the C language, you'd know that a character constant *is*
#an integer.  See K&R pages 17 and 35.
#

Picking up the Bible (K&R) and quoting without permission from page 17

Any single character can be written between single quotes, to produce
a value equal to the numerical value of the character in the machine's
character set; this is called a *character constant*. So, for
example, 'A' is a character constant; in the ASCII character set its
value is 65, the internal representation of the character A.

and page 35 says

A *character constant* is a single character written within
single quotes, as in 'x'.

end quotations of K&R ISBN 0-13-110163-3 (aka first edition).
 

-Greg Hennessy, University of Virginia
 USPS Mail:     Astronomy Department, Charlottesville, VA 22903-2475 USA
 Internet:      gsh7w@virginia.edu  
 UUCP:		...!uunet!virginia!gsh7w

guy@auspex.UUCP (Guy Harris) (03/02/89)

>>When you assign 'x' to a character, you are assigning an int to a
>>character.  The reader knows that the type mismatch was intentional.
>
>Not if he knows the C language.  A single character written within
>single quotes is a *character constant*.  This isn't an int.

Wanna bet?  The May 13, 1988 dpANS says:

	3.1.3.4 Character constants

	...

	Description

	   An integer character constant is a sequence of one or more
	multibyte characters enclosed in single quotes, as in 'x' or
	'ab'.  A wide character constant is the same except prefixed by
	the letter L. ...

	Semantics

	   An integer character constant has type "int". ...

>'\0' is a special case to permit the representation of non-graphical
>characters (also newline, tab, backslash, return, etc.) and is not
>the same as 0, which is an integer constant.

Wrong.  '\0' *is* the same as 0.

djones@megatest.UUCP (Dave Jones) (03/02/89)

From article <1783@dlvax2.datlog.co.uk>, by scm@datlog.co.uk ( Steve Mawer ):
> In article <10138@socslgw.csl.sony.JUNET> diamond@diamond. (Norman Diamond) writes:
>>
>>When you assign 'x' to a character, you are assigning an int to a
>>character.  The reader knows that the type mismatch was intentional.
> 
> Not if he knows the C language.  A single character written within
> single quotes is a *character constant*.  This isn't an int.
> 


From _The_C_Programming_Language_, by Kernighan and Ritchie (a couple
of guys who probably "know the C language"):

  p. 19:  "Character constant ... is just another way to write a
          small integer."

  p. 37:  "A character constant is an integer, written as one
           character within single quotes, such as 'x'."

          "Character constants participate in numeric operations
           just as any other integers..."

> '\0' is a special case to permit the representation of non-graphical
> characters (also newline, tab, backslash, return, etc.) and is not
> the same as 0, which is an integer constant.

'\0' is not a special case. It is just an instance of the octal
escape sequences.  It is exactly the same as 0.

nagel@blanche.ics.uci.edu (Mark Nagel) (03/02/89)

In article <1207@hudson.acc.virginia.edu>, gsh7w@astsun1 (Greg Hennessy) writes:
|Mark Nagel writes:
|#If you knew the C language, you'd know that a character constant *is*
|#an integer.  See K&R pages 17 and 35.

|Picking up the Bible (K&R) and quoting without permission from page 17

We're going to be that way, hmm?

|Any single character can be written between single quotes, to produce
|a value equal to the numerical value of the character in the machine's
|character set; this is called a *character constant*. So, for
|example, 'A' is a character constant; in the ASCII character set its
|value is 65, the internal representation of the character A.

Instead of quoting points that say nothing, why not look a bit
further down the page:

"You should note carefully that '\n' is a single character, and in
expressions is equivalent to a single integer; ..."

|and page 35 says
|
|A *character constant* is a single character written within
|single quotes, as in 'x'.

as well as:

"Character constants participate in numeric operations just as any
other numbers, although they are most often used in comparisons with
other characters."

It's not clear that you are refuting my references above, although you
have certainly left out of your quotations the relevant points.

Mark Nagel @ UC Irvine, Dept of Info and Comp Sci
ARPA: nagel@ics.uci.edu              | Charisma doesn't have jelly in the
UUCP: {sdcsvax,ucbvax}!ucivax!nagel  | middle. -- Jim Ignatowski

dhesi@bsu-cs.UUCP (Rahul Dhesi) (03/02/89)

In article <1095@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>Wrong.  '\0' *is* the same as 0.

Isn't it possible to postulate the existence of a bizarre 1's
complement machine in which the lexical analyzer produces
binary 00000000000000000000000000000000 when it sees the
character 0, but 11111111111111111111111111111111 when it
sees the sequence '\0'?
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
                    ARPA:  dhesi@bsu-cs.bsu.edu

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/02/89)

In article <5984@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
-In article <1095@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
->Wrong.  '\0' *is* the same as 0.
-Isn't it possible to postulate the existence of a bizarre 1's
-complement machine in which the lexical analyzer produces
-binary 00000000000000000000000000000000 when it sees the
-character 0, but 11111111111111111111111111111111 when it
-sees the sequence '\0'?

Only if all computations made with the -0 value are indistinguishable
from those made with the 0 value.  (This includes bitwise operations!)

ftw@masscomp.UUCP (Farrell Woods) (03/03/89)

In article <1783@dlvax2.datlog.co.uk> scm@datlog.co.uk ( Steve Mawer ) writes:

>A single character written within
>single quotes is a *character constant*.  This isn't an int.

Wrong!

>'\0' is a special case to permit the representation of non-graphical
>characters (also newline, tab, backslash, return, etc.) and is not
>the same as 0, which is an integer constant.

Absolutely WRONG!  I strongly suggest you try this on your favorite
compiler:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
    {
    /* sizeof yields an unsigned type, so cast it to int for %d */
    printf("repeat after me: sizeof ('x') ==  %d\n", (int) sizeof ('x'));
    printf("                 sizeof ('\\0') == %d\n", (int) sizeof ('\0'));
    printf("                 sizeof (int) ==  %d\n", (int) sizeof (int));
    printf("if these are not all the same, your compiler is broken!\n");
    exit(0);
    }

Study the results carefully...
-- 
Farrell T. Woods				Voice: (508) 692-6200 x2471
MASSCOMP Operating Systems Group		Internet: ftw@masscomp.com
1 Technology Way				uucp: {backbones}!masscomp!ftw
Westford, MA 01886				OS/2: Half an operating system

guy@auspex.UUCP (Guy Harris) (03/03/89)

>Isn't it possible to postulate the existence of a bizarre 1's
>complement machine in which the lexical analyzer produces
>binary 00000000000000000000000000000000 when it sees the
>character 0, but 11111111111111111111111111111111 when it
>sees the sequence '\0'?

Yes, one could postulate that, but given that:

	1) '\0' is ASCII NUL on ASCII implementations of C (K&R I, p.
	   181);

	2) "it is guaranteed that a member of the standard character set
	   is non-negative" (K&R I, p. 183);

	3) the bit pattern you list is negative zero on a 1's complement
	   machine;

if the lexical analyzer is a lexical analyzer for a C implementation
that uses ASCII as "the standard character set", or that uses any other
character set in which '\0' is "a member of the standard character set",
the technical term for that C implementation would be "buggy".