[comp.lang.c] Passing

dhesi@bsu-cs.UUCP (Rahul Dhesi) (07/24/87)

In article <157@hobbes.UUCP> root@hobbes.UUCP (John Plocher) writes:
>+---- (Robert White) writes the following in article <166@qetzal.UUCP> ----
>| 10. Passing (char *) NULL to a function in a large model program can cause
>| a core dump/program abort. . . .
>| This blows up a LOT of things, including printf("%s", (char *) NULL)
>| (or equivalent).
>+----
>
>  WRONG!  Passing the NULL pointer is not what is wrong!  What IS wrong
>  is the ASSUMPTION that the data stored at location NULL is anything useful!
>
>  printf("%s", (char *) NULL); says print the string at address 0.  This
>  happens to work on a VAX because that machine SPECIFICALLY has set things
>  up so that the contents of location 0 is 0.  This bad coding practice
>  hits owners of Sun machines as well and is NOT a compiler bug.

I believe that the interpretation of (char *) NULL when supplied as the
actual parameter where printf is looking for a string may have changed
over the years.  The "correct" behavior today, according to ANSI C as
I know it, is for printf to print a token signifying that a NULL
pointer was passed.  Microsoft C will print the string "(null)" when
this happens.  However my System V Release 2 manual as supplied with
Microport System V/AT says that printf's behavior on a NULL pointer is
undefined.  (Then again, the UNIX C compilers are quite far away from
ANSI C, since they don't support function prototypes at all.)

I agree that passing NULL to printf is a bad idea, mostly because it's
not wise to assume that any C compiler conforms to ANSI specs,
especially since ANSI specs still aren't official.

I added comp.lang.c to the newsgroups heading, so the Real Gurus can
give us their opinion.  Let's keep followups there or, better still,
let the Gurus have the last word.
-- 
Rahul Dhesi         UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/25/87)

In article <875@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>I believe that the interpretation of (char *) NULL when supplied as the
>actual parameter where printf is looking for a string may have changed
>over the years.  The "correct" behavior today, according to ANSI C as
>I know it, is for printf to print a token signifying that a NULL
>pointer was passed.  Microsoft C will print the string "(null)" when
>this happens.

The Draft Proposed Standard for C says no such thing.
(Unless it was approved in the June 1987 Paris meeting of X3J11;
it's not in the current draft as mailed to the committee members,
nor was it in the draft that went out for public comment.)

Some implementations of printf() do indeed catch the null pointer
and do something non-random with it.  My implementation does the
same thing as MicroSoft's.  But that is a matter of quality of
implementation; it is not required by the specs.  It is an ERROR
for a programmer to pass a null pointer to printf() except for
parameters corresponding to %p format specifiers.

FLAME:  I have yet to see anything posted by Dhesi that was correct.

kyle@xanth.UUCP (Kyle Jones) (07/25/87)

First:

#define NULL 0

Well the answer to this one seems simple enough.  For each %s that appears in
the printf() conversion string, a matching argument that is a pointer to an
array of characters should be provided.  While (char *) NULL is a pointer that
is the same size as any other character pointer, it cannot point to an array
of characters in a valid C implementation.

Therefore if you pass (char *) NULL to printf(), you are giving it an invalid
argument.  What's stored at location 0 has nothing to do with this.  In C
(object *) NULL simply cannot point to any object.  This is because (object *)
NULL must remain "distinguishable from a pointer to any object." K&R p. 192

guy%gorodish@Sun.COM (Guy Harris) (07/27/87)

> I believe that the interpretation of (char *) NULL when supplied as the
> actual parameter where printf is looking for a string may have changed
> over the years.  The "correct" behavior today, according to ANSI C as
> I know it, is for printf to print a token signifying that a NULL
> pointer was passed.

Well, you don't know ANSI C then.  I quote from the October 1, 1986 draft:

	A.6.2 Undefined behavior

	   The behavior in the following circumstances is undefined:

	...

	 + An invalid array reference, *null pointer reference*, or
	   reference to an object declared with automatic storage
	   duration in a terminated block occurs (§3.3.3.2).
	   ("italics" mine.)

As for "printf" (or, more correctly, "fprintf", since the description
of "printf" and "sprintf" refer back to the description of
"fprintf"), in its description of "fprintf" the ANSI C draft says
nothing whatsoever about its behavior when a NULL pointer is passed
as an argument to be converted with a "%s" conversion.

> Microsoft C will print the string "(null)" when this happens.

Many implementations do that, but that's just a common convention,
not any sort of requirement.

> I agree that passing NULL to printf is a bad idea, mostly because it's
> not wise to assume that any C compiler conforms to ANSI specs,
> especially since ANSI specs still aren't official.

Unless the ANSI C spec changes before it becomes official, it will
STILL be a bad idea to pass NULL to "printf".
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

ejbjr@ihlpg.ATT.COM (Branagan) (07/27/87)

> >  printf("%s", (char *) NULL); says print the string at address 0.  This
> >  happens to work on a VAX because that machine SPECIFICALLY has set things
> >  up so that the contents of location 0 is 0.  This bad coding practice
> >  hits owners of Sun machines as well and is NOT a compiler bug.

>                  The "correct" behavior today, according to ANSI C as
> I know it, is for printf to print a token signifying that a NULL
> pointer was passed.  Microsoft C will print the string "(null)" when
> this happens.

Some time ago I noticed one of our VAXes printing a NULL pointer as
"(null)".  I thought it was neat that printf now detected NULL string
pointers and did something sensible, until I found out that this only
worked because the loader stored the string "(null)" at location 0.
This apparently broke so many other programs it was backed out, and
location 0 now has a 0 in it again.   Oh well...
-- 
-----------------
Ed Branagan
ihnp4!ihlpg!ejbjr
(312) 369-7408 (work)

minow@decvax.UUCP (Martin Minow) (07/30/87)

In article <3533@ihlpg.ATT.COM> ejbjr@ihlpg.ATT.COM (Branagan) writes:
>Some time ago I noticed one of our VAXes printing a NULL pointer as
>"(null)".  I thought it was neat that printf now detected NULL string
>pointers and did something sensible, until I found out that this only
>worked because the loader stored the string "(null)" at location 0.
>This apparently broke so many other programs it was backed out, and
>location 0 now has a 0 in it again.   Oh well...
>-- 

Sorry, those programs were broken to begin with.  Some history:
Unix V6 stored a zero at memory location zero.  Thus, printf("%s", 0)
printed "" and, as always happens, people came to rely on this.
Decus C printf (which I wrote) detected a NULL argument and changed
it to "{null}" rather than print garbage or crash.  (PDP-11s running DEC
operating systems loaded non-zero values at location zero.)  I assume
other developers implemented similar tests for similar reasons.

VMS doesn't map virtual locations 0 through 511 to trap these dereferencing
errors, so these bugs got shaken out of programs as they were ported to
Vax C.

You are asking for trouble if your programs expect to survive memory
accesses to random locations.

Martin Minow
decvax!minow

brianc@cognos.uucp (Brian Campbell) (08/04/87)

In article <875@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
! I believe that the interpretation of (char *) NULL when supplied as the
! actual parameter where printf is looking for a string may have changed
! over the years.  The "correct" behavior today, according to ANSI C as
! I know it, is for printf to print a token signifying that a NULL
! pointer was passed.  Microsoft C will print the string "(null)" when
! this happens.  However my System V Release 2 manual as supplied with

  Speaking of this... has anyone else noticed any problems with this in
small model Microsoft C?  I seem to be unable to print out a string that
happens to be placed at offset 0 of the data segment -- MSC's libraries
decide it is a null pointer and format it as "(null)".
-- 
Brian Campbell          uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc
Cognos Incorporated     mail: 3755 Riverside Drive, Ottawa, Ontario, K1G 3N3
(613) 738-1440          fido: sysop@163/8

guy%gorodish@Sun.COM (Guy Harris) (08/09/87)

>   Speaking of this... has anyone else noticed any problems with this in
> small model Microsoft C?  I seem to be unable to print out a string that
> happens to be placed at offset 0 of the data segment -- MSC's libraries
> decide it is a null pointer and format it as "(null)".

Either the representation of a null pointer in Microsoft C small
model is a pointer to offset 0 in the data segment, or it is not.  If
it is not, then there is a bug in their library; it is improperly
deciding that the pointer is null.

If it *is*, however (which I suspect it is), then a string that
"happens" to be placed at offset 0 of the data segment also "happens"
not to be a valid C object!  In a valid C implementation, *no* object
may have an address that compares equal to a null pointer.

Either Microsoft C allowed it to be placed there, in which case
*that* is the bug, or somebody got around whatever mechanism
Microsoft C provides to ensure that no C object gets put at that
location, in which case they got what they deserved.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

cramer@kontron.UUCP (Clayton Cramer) (08/11/87)

> In article <875@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> ! I believe that the interpretation of (char *) NULL when supplied as the
> ! actual parameter where printf is looking for a string may have changed
> ! over the years.  The "correct" behavior today, according to ANSI C as
> ! I know it, is for printf to print a token signifying that a NULL
> ! pointer was passed.  Microsoft C will print the string "(null)" when
> ! this happens.  However my System V Release 2 manual as supplied with
> 
>   Speaking of this... has anyone else noticed any problems with this in
> small model Microsoft C?  I seem to be unable to print out a string that
> happens to be placed at offset 0 of the data segment -- MSC's libraries
> decide it is a null pointer and format it as "(null)".
> -- 
> Brian Campbell          uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc

I would love to know how you get a string at DS:0 with Microsoft C --
they reserve something like 16 bytes starting at DS:0 so that at end of
job, they can check for nil pointer reference writes by checking to
see if any of these bytes have been altered.

Clayton E. Cramer

atbowler@orchid.UUCP (08/14/87)

In article <24247@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>Unless the ANSI C spec changes before it becomes official, it will
>STILL be a bad idea to pass NULL to "printf".

I certainly hope that the standard does not ever do that.  Sun's approach
of faulting gets bugs in programs detected much earlier in the
development cycle.  Printing "(null)" is roughly equivalent
to quietly mapping an out of bound subscript to zero.  It "cures"
the problem of programs faulting, instead they just quietly
screw up.

gary@apex.UUCP (Gary Wisniewski) (08/15/87)

In article <25173@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>>   Speaking of this... has anyone else noticed any problems with this in
>> small model Microsoft C?  I seem to be unable to print out a string that
>> happens to be placed at offset 0 of the data segment -- MSC's libraries
>> decide it is a null pointer and format it as "(null)".
> [ ... comments about NULLness of pointers ... ]
>Either Microsoft C allowed it to be placed there [ds:0], in which case
>*that* is the bug, or somebody got around whatever mechanism
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>Microsoft C provides to ensure that no C object gets put at that
>location, in which case they got what they deserved.
>	Guy Harris

Code at the beginning of DS is an unlikely place to find user strings.
Microsoft places a small unchanging bit of junk there which it keeps
a checksum on.  If that section is written to during program execution,
MSC prints out "Error 2000: Null Pointer Assignment" when the program
terminates (the message may not be quite right---you get the idea).

This is actually a helpful technique in lieu of hardware address fault
checking.  It is also nice that NULL pointer references don't destroy
anything of great importance.

The original sender (name not known) seems to have reorganized or rewritten
the stock MSC runtime library routines.  I can't think of any other way
to get useful code at DS:0.  However, the tone of their comments suggests
that the NULL behavior is a mystery to them (anyone who had rewritten
runtime libraries would doubtless understand the formalities of segment
usage under MSC!)---Hmmmm.

Gary Wisniewski
{allegra,bellcore,cadre}!pitt!darth!apex!gary

henry@utzoo.UUCP (Henry Spencer) (08/16/87)

> ...Sun's approach
> of faulting gets bugs in programs detected much earlier in the
> development cycle.  Printing "(null)" is roughly equivalent
> to quietly mapping an out of bound subscript to zero...

It depends on what you are after.  For debugging work in particular, printing
"(null)" is a significant convenience.  As for detecting bugs early... don't
you *look* at the printed output?!?

I have no strong feelings either way on the issue, actually -- the debugging
issue can be handled with '#define NONNULL(s) ((s) != NULL ? (s) : "(null)")'
and a bit of care -- but the justification offered strikes me as bogus.
-- 
Support sustained spaceflight: fight |  Henry Spencer @ U of Toronto Zoology
the soi-disant "Planetary Society"!  | {allegra,ihnp4,decvax,utai}!utzoo!henry

guy@gorodish.UUCP (08/16/87)

> > ...Sun's approach
> > of faulting gets bugs in programs detected much earlier in the
> > development cycle.  Printing "(null)" is roughly equivalent
> > to quietly mapping an out of bound subscript to zero...
> 
> It depends on what you are after.  For debugging work in particular, printing
> "(null)" is a significant convenience.  As for detecting bugs early... don't
> you *look* at the printed output?!?

Sun's approach of faulting when dereferencing a null pointer (which was, I
believe, not done to catch bugs, but was a consequence of some UNIX
implementation on Suns - although making a NULL dereference fault in the
*kernel* on Sun-3s was, I believe, done to catch bugs, which it did) does
detect bugs in places other than "printf" calls, where looking at the output
may not help.

However, our "printf" doesn't dereference null pointers:

	gorodish$ cat foo.c
	#include <stdio.h>

	int main(void)
	{
		printf("%s\n", (char *)0);
		return 0;
	}
	gorodish$ cc foo.c
	gorodish$ ./a.out
	(null)
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

drew@geac.UUCP (Drew Sullivan) (08/18/87)

If the person is writing mixed C and assembler, then this can happen
very easily.
	In my initial misunderstanding of what was going on, I
wrote:
_data	segment word public 'DATA'
_data	ends
	assume	ds:_data
rather than
_data segment word public 'DATA'
_data ends
dgroup	group _data
	assume	ds:dgroup

The first caused my assembler data to be offset relative to these extra
8 bytes and hence made all of my offsets incorrect.

That coupled with being a TSR (Dos Pop-up) routine made it very hard
to debug.  (Oh for the want of a real OS)

kent@xanth.UUCP (Kent Paul Dolan) (08/18/87)

[I just _know_ this is gonna get me flamed.  Line eater, eat this posting!]

Is it just me, or has anyone else noticed how C is being warped, made
ugly and complex, because ONE microprocessor manufacturer couldn't see
fit to provide a flat, linear address space?  How about if we just
write off the 8086 and 80286, and the usual mode of the 80386, and
design C as if pointers were NOT structures, there was only ONE
address zero, and so on?  If someone _has_ to use those bizarre devices,
let him invent a compiler that reads "virtual machine C" (i.e., flat
address space) and kludges it for the weird case, but let the standard
C language treat the machine as if designed by a rational being.

I don't think it is a useful endeavor to labor at such length to embed
the hardware design mistakes of the past in the C standards of the
future.

[Incoming!  I'm _allergic_ to asbestos long johns, and breathing
asbestos lint (asbestos lint: the only way to find all the subtle
usage errors in flames ;-) is really bad for my lungs!  I think I'll
invent the multi-layer metal-layer insulating set of long johns.  Pat.
Pending]

Kent, the man from xanth.

rmtodd@uokmax.UUCP (Richard Michael Todd) (08/26/87)

In article <2163@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>[I just _know_ this is gonna get me flamed.  Line eater, eat this posting!]
>Is it just me, or has anyone else noticed how C is being warped, made
>ugly and complex, because ONE microprocessor manufacturer couldn't see
>fit to provide a flat, linear address space?  How about if we just
>write off the 8086 and 80286, and the usual mode of the 80386, and
>design C as if pointers were NOT structures, there was only ONE
>address zero, and so on?  If someone _has_ to use those bizarre devices,
>let him invent a compiler that reads "virtual machine C" (i.e., flat
>address space) and kludges it for the weird case, but let the standard
>C language treat the machine as if designed by a rational being.

I have a PC with one of those processors in it, and I program in C, and
I fail to see just what is being 'perverted' in C on the 8088.  As I
understand it, it has never been guaranteed that pointers will fit into an
int; such code is non-portable to other machines besides PCs with their wacky
memory models.  
  I vaguely recall that the original discussion was on something like
problems with Microsoft C because some valid data object was placed at address
0.  Well, Microsoft C may be broke, but Aztec C's linker makes sure that
none of the user's data has address 0 in the data segment.  (From what I've
heard, Microsoft C is broke in lots of other ways, too :-(). 
  As for the nonsense with declaring 'near' and 'far' pointers for the
mixed memory models, I don't like it much, either.  It's a pure efficiency
hack;  a program can be written entirely with the large data model, but it'll
be slower than the 'optimized' version with mixed-memory model.  Frankly, I've
*never* done anything where I felt I needed it. 
  Is there something else here I'm missing that causes problems with C
on the 8088?  Frankly, while I don't *like* the 8088 segmentation setup, and
I doubt anyone else much does either, it doesn't seem to cause that much
trouble. 
  Now that that's over with, I'll put forth my flame against a certain
characteristic of MSDOS that *I* feel is perverting C, and which I have found
to be much more of a pain in the rear end than any memory-model nonsense.
>>>>>>>>>FLAME ON<<<<<<<<<
I refer to the wonderful "feature" of MSDOS that has all text files in a
format where the lines are terminated by CR and LF instead of just LF alone.
This causes great problems with the C library, which was designed around the
implicit assumption that lines are terminated with one character only, called
'\n'.  This requires all sorts of disgusting kluges.  The two main approaches
seem to be the Lattice/Microsoft approach of requiring the file to be opened
in a special "binary" mode if the automatic CRLF->newline mapping is to be
disabled, and the Aztec approach of having the I/O function used determine
whether mapping is done.  Under the Aztec approach, functions such as
read(), write(), fread(), and fwrite() do binary (untranslated) I/O, whereas
fprintf(), fscanf(), fputs(), and fgets() do ASCII I/O.  There are functions
getc() and putc() for single byte binary I/O, and agetc() and aputc()
for single character ASCII I/O.  Getchar() and putchar() are #defined in
terms of agetc() and aputc().  I prefer the Aztec approach because it seems
to do what is 'right' most of the time: binary files are usually handled
thru read() and write(), and text files thru gets(), puts(), etc.
If you need to do text I/O on single characters from all files in your program,
you can just put
	#ifdef AZTEC
	#define getc agetc
	#define putc aputc
	#endif
in a header file somewhere.  I still prefer the Aztec approach, but alas the other
approach seems to be the one blessed by the ANSI standards committee. 
  But, in case you've missed the point in all this digression about I/O
functions, NONE OF THIS GARBAGE WOULD BE NECESSARY if the standard text 
format had been designed more sensibly.  In fact, except for the built-in device
drivers for the console and printer in MSDOS and the built-in TYPE command,
files with LF-only-terminated lines work just fine on my system. This is
due to a neat feature of the Aztec C I/O routines:  When reading in text data,
they do CRLF->LF mapping in the most obvious way--simply throw out any CRs
in the input.  This means that if the file is already in Unix-style LF-only
format, the I/O routines handle it with no problem whatsoever.  My copies
of MicroEMACS and Less, compiled with Aztec C, handle Unix-style files with
no problem at all, as will the C compiler itself (evidently Aztec uses their
own compiler). It's only the MSDOS commands and drivers that require the
CRLF.
  So, in analogy with Kent's question, I ask:  Is the C library going to
continually be perverted to accommodate braindead operating systems of
the past that can't handle text files correctly?
>>>>>>>Flame off<<<<<<<<
  (Yes, I suspect there will continue to be altered I/O libraries that
accommodate the weirder operating systems out there.  But I don't have
to like it much.  Oh well.  Anyone for XENIX? :-)
--------------------------------------------------------------------------
Richard Todd
USSnail:820 Annie Court,Norman OK 73069
UUCP: {allegra!cbosgd|ihnp4}!occrsh!uokmax!rmtodd

jfjr@mitre-bedford.arpa (08/27/87)

  I'd like to add my fist to MSDOS C bashing (MSDOS bashing even).
I have programmed on MSDOS machines using C, Turbo Pascal and
MASM, and every time I have hit my head on problems with the
text file format and/or standard input redirection/inheritance.
I could go on forever and produce simple Turbo Pascal programs (~5 lines)
whose behaviour will astonish the most hardened programmer.
I NEVER use the standard input functions in MSDOS C because
they are useless/unpredictable GRRRRRrrrr...

                              Jerry Freedman
                              jfjr@mitre-bedford.arpa