[net.lang.c] how has C bitten you?

ark@alice.UUCP (Andrew Koenig) (07/26/85)

I am collecting examples of how C can bite the unwary user.

Consider, for example, the following program fragment:

	int i, a[10];
	for (i = 0; i <= 10; i++)
		a[i] = 0;

On many implementations, this will result in an infinite loop.
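
For comparison, a minimal sketch of the loop with the bound written the way C expects.  The names zero_fill and N are made up for illustration and are not part of the original fragment.

```c
#include <assert.h>
#include <stddef.h>

#define N 10	/* illustrative size, matching a[10] */

/* Zero a[0] .. a[n-1].  The test is i < n: a[n] is not an element. */
void zero_fill(int *a, size_t n)
{
	size_t i;

	assert(a != NULL);
	for (i = 0; i < n; i++)
		a[i] = 0;
}
```

A caller would write zero_fill(a, N), or zero_fill(a, sizeof a / sizeof a[0]) where a is a true array in scope.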

If you have any good examples, and you don't mind my re-publishing
them (with attribution), send them along!



				Andrew Koenig
				AT&T Bell Laboratories
				600 Mountain Avenue
				Murray Hill NJ 07974
				research!ark      (or alice!ark)

matt@prism.UUCP (07/30/85)

> /* Written  8:36 pm  Jul 25, 1985 by ark@alice in prism:net.lang.c */
> /* ---------- "how has C bitten you?" ---------- */
> 
> I am collecting examples of how C can bite the unwary user.
> 
> Consider, for example, the following program fragment:
> 
> 	int i, a[10];
> 	for (i = 0; i <= 10; i++)
> 		a[i] = 0;
> 
> On many implementations, this will result in an infinite loop.
> 
> /* End of text from prism:net.lang.c */

This looks to me like it will simply overwrite one int's worth of
memory beyond the end of the array "a" with the value 0.  Granted,
depending on what happens to be after "a", this can have disastrous
results, but is there really an implementation in which it will
(reliably) lead to infinite looping?

On the other hand, in an implementation where char's are unsigned,
this common construct WILL lead to a non-terminating loop.  I have
been bitten by this several times porting code that assumed signed
characters to implementations of C without them.

	char	x;

	while (--x)
	{	do anything...
		and then some...
	}

I sure wish that while the ANSI committee was adding "signed" to the
language, they had standardized whether the default for "char" was
signed or unsigned.  As long as compilers have to provide them both
anyway, what's the harm in choosing one as the default?  (Well,
maybe the C programming community will eschew the use of "char" and
always use either "signed char" or "unsigned char" as appropriate.
Wanna bet?)
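
One way to sidestep the question, sketched under the assumption that the loop is meant to stop when the counter goes negative (the test "--x >= 0" is that assumption, not the fragment as posted): keep the counter in an int, so the result does not depend on the default for plain char.  The function name is hypothetical.

```c
#include <assert.h>

/* Count how many times the body of "while (--x >= 0)" runs.
 * x is an int, so char signedness never enters into it. */
int countdown_iterations(int start)
{
	int x = start;
	int runs = 0;

	assert(start >= 0);
	while (--x >= 0)
		runs++;
	return runs;
}
```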

-----------------------------------------------------------------------------
 Matt Landau            {cca, datacube, ihnp4, inmet, mit-eddie, wjh12}...
 Mirror Systems, Inc.                                   ...mirror!prism!matt
 Cambridge, MA		(617) 661-0777
-----------------------------------------------------------------------------
 "Replace this mandolin with your wombat..."

lcc.niket@LOCUS.UCLA.EDU (Niket K. Patwardhan) (07/30/85)

Andrew:
      Regarding

	int i,a[10];
	for(i=0; i<=10; i++)
		a[i] = 0;

you should have expected some problems, as you are writing one past the end of
the array! The correct test to use is < not <=!

ark@alice.UUCP (Andrew Koenig) (07/31/85)

> Andrew:
>      Regarding

>	int i,a[10];
>	for(i=0; i<=10; i++)
>		a[i] = 0;

> you should have expected some problems, as you are writing one past the end of
> the array! The correct test to use is < not <=!

Yes, I know that!

The point is that this is an example of something that looks reasonable
at first glance and isn't, because of a property that C does not
share with many other languages (in most languages, a 10-element
array has an element #10).

Read my article again.  I am looking for examples for my collection,
not asking for advice on how to solve this particular problem.

john@frog.UUCP (John Woods) (07/31/85)

My favorite ouch is the following:

	if ( thingy_bits & TEST_ME == 0) {
	}
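
A small sketch of why this bites: == binds tighter than &, so the test above is read as thingy_bits & (TEST_ME == 0).  The value given to TEST_ME and the function names are assumptions for illustration only.

```c
#include <assert.h>

#define TEST_ME 0x04	/* hypothetical flag bit */

/* As the compiler reads it: bits & (TEST_ME == 0), i.e. bits & 0. */
int bit_clear_as_parsed(unsigned bits)
{
	return (bits & (TEST_ME == 0)) != 0;
}

/* As it was meant: mask first, then compare with zero. */
int bit_clear_as_meant(unsigned bits)
{
	return (bits & TEST_ME) == 0;
}

/* Usage: the parsed form is false whatever the bits are. */
void precedence_demo(void)
{
	assert(bit_clear_as_parsed(0x00u) == 0);
	assert(bit_clear_as_parsed(0x04u) == 0);
	assert(bit_clear_as_meant(0x00u) == 1);
	assert(bit_clear_as_meant(0x04u) == 0);
}
```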

"When in doubt, parenthesize." -Kernighan and Plauger

--
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

peter@kitty.UUCP (Peter DaSilva) (08/01/85)

> I am collecting examples of how C can bite the unwary user.
> 
> Consider, for example, the following program fragment:
> 
> 	int i, a[10];
> 	for (i = 0; i <= 10; i++)
> 		a[i] = 0;
> 
> On many implementations, this will result in an infinite loop.

I assume you mean that auto's are allocated on the stack so &a[10]==&i.
I don't see an easy solution to this, except for built-in range checking.
I think "Safe/C" has this...
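
Short of compiler support, the range check can be written by hand.  A minimal sketch; checked_store is a hypothetical helper, not a library routine and not anything Safe/C is known to provide.

```c
#include <assert.h>
#include <stddef.h>

/* Store v in a[i] only if i < n.  Returns 0 on success, -1 if out of range. */
int checked_store(int *a, size_t n, size_t i, int v)
{
	assert(a != NULL);
	if (i >= n)
		return -1;
	a[i] = v;
	return 0;
}
```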

Anyone who uses "<=" in a for(;;) loop to initialise an array should be
strung up by their index(3) fingers and forced to listen to Sonny Bono
chanting "Zero Origin Arrays" until their ears fall off [:->].

david@ecrhub.UUCP (David M. Haynes) (08/02/85)

One of my all time favourites is the non-orthogonality between
scanf and printf. Especially the following:

	scanf("%D %F", long, double); or
	scanf("%ld %lf", long, double);
vs.
	printf("%ld %f", long, double);

Why no %F or %D on printf?
And why %lf vs %f? fun!

-- 
--------------------------------------------------------------------------
						David M. Haynes
						Exegetics Inc.
						..!utzoo!ecrhub!david

"I am my own employer, so I guess my opinions are my own and that of
my company."

chris@umcp-cs.UUCP (Chris Torek) (08/02/85)

>> 	int i, a[10];
>> 	for (i = 0; i <= 10; i++)
>> 		a[i] = 0;
>> 
>> On many implementations, this will result in an infinite loop.

>This looks to me like it will simply overwrite one int's worth of
>memory beyond the end of the array "a" with the value 0.  Granted,
>depending on what happens to be after "a", this can have disastrous
>results, but is there really an implementation in which it will
>(reliably) lead to infinte looping?

How does "every PCC implementation" grab you?  (Actually, I suspect
there may be three or four PCC implementations in which it won't run
forever, but it *will* run forever on 4BSD Vaxen.)

>On the other hand, in an implementation where char's are unsigned,
>this common construct WILL lead to an unterminating loop.
>
>	char	x;
>	while (--x)

I assume you mean "while (--x >= 0)".  I only use this on "register
int"s (especially since it generates a sobgeq if the loop's small
enough).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

bright@dataio.UUCP (Walter Bright) (08/03/85)

In article <5400010@prism.UUCP> matt@prism.UUCP writes:
>I sure wish that while the ANSI committee was adding "signed" to the
>language, they had standardized whether the default for "char" was
>signed or unsigned.  As long as compilers have to provide them both
>anyway, what's the harm in choosing one as the default?

The reason why some compilers default to signed chars and others
default to unsigned can be found in the instruction set of the underlying
machine. Some machines support signed chars more easily than unsigned ones,
and vice versa. Some examples:

	The 8088 can only sign extend a byte to a word from the AL register,
	whereas a zero extension can cheaply be done from the AL,BL,CL
	or DL register. Thus, most 8088 C compilers default to unsigned
	chars.

	The PDP-11, when reading a byte from memory, automatically sign
	extends the byte. Thus, implementing unsigned chars costs an
	extra mask instruction for each char read. Not surprisingly,
	chars default to being signed.

For most applications, it doesn't matter whether a char is signed or
not, and so it is appropriate for the compiler to select that which
can be implemented most efficiently. When it does matter, the programmer
should take care (by explicitly declaring it as signed or unsigned).
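
A sketch of what "taking care" can look like, assuming a compiler that supplies <limits.h> (a later standard header, not something every compiler of this vintage has).  Both function names are invented for the example.

```c
#include <assert.h>
#include <limits.h>

/* 1 if plain char is signed on this implementation, 0 if it is unsigned. */
int plain_char_is_signed(void)
{
	return CHAR_MIN < 0;
}

/* Value of c in the range 0..UCHAR_MAX, whatever the default signedness. */
int byte_value(char c)
{
	int v = (unsigned char) c;

	assert(v >= 0 && v <= UCHAR_MAX);
	return v;
}
```

Going through unsigned char before widening is what makes table lookups and comparisons on byte values behave the same on both kinds of machine.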

mjs@eagle.UUCP (M.J.Shannon) (08/03/85)

Then of course is the obvious:

	if (fd = open("/etc/passwd", 0) == -1)
		panic("No password file?");

This, too, goes in the category of "when in doubt, parenthesize".
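
A sketch of the two readings.  To keep it self-contained, fake_open is a hypothetical stand-in for open(2) that returns 3 on success and -1 on failure; nothing here touches a real file.

```c
#include <assert.h>

/* Hypothetical stand-in for open(2): a descriptor, or -1 on failure. */
static int fake_open(int succeed)
{
	return succeed ? 3 : -1;
}

/* As written: == binds tighter than =, so fd gets the result of the
 * comparison (0 or 1), never the descriptor. */
int fd_as_parsed(int succeed)
{
	int fd;

	fd = (fake_open(succeed) == -1);
	return fd;
}

/* As meant: assign first, then compare. */
int fd_as_meant(int succeed)
{
	int fd;

	if ((fd = fake_open(succeed)) == -1)
		return -1;
	return fd;
}

void open_demo(void)
{
	assert(fd_as_parsed(1) == 0);	/* success leaves fd == 0 */
	assert(fd_as_parsed(0) == 1);	/* failure leaves fd == 1 */
	assert(fd_as_meant(1) == 3);
	assert(fd_as_meant(0) == -1);
}
```

Note the nasty part of the unparenthesized form: when the open succeeds, fd is 0, so later reads quietly come from standard input.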
-- 
	Marty Shannon
UUCP:	ihnp4!eagle!mjs
Phone:	+1 201 522 6063

Warped people are throwbacks from the days of the United Federation of Planets.

qwerty@drutx.UUCP (Brian Jones) (08/05/85)

> One of my all time favourites is the non-orthagonality between
> scanf and printf. Especially the following:
> 
> 	scanf("%D %F", long, double); or
> 	scanf("%ld %lf", long, double);
> vs.
> 	printf("%ld %f", long, double);
> 
> Why no %F or %D on printf?
> And why %lf vs %f? fun!
> 
> -- 
> --------------------------------------------------------------------------
> 						David M. Haynes
> 						Exegetics Inc.
> 						..!utzoo!ecrhub!david
> 
> "I am my own employer, so I guess my opinions are my own and that of
> my company."

scanf can be given a pointer to any data type:
	char (string)
	int,
	long,
	float,
	double;

When you put arguments on stack, expansion rules are followed.

	char => int
	float => double

So, printf can never get a float as an argument, it always gets a double.
Therefore, %lf or %F are meaningless to printf.

Note that printf does support %d and %ld, and will happily screw up if
there is a disagreement between the args and their specification in the
format string. ie. %d given a long arg, or %ld given a short. (machine
dependent!!).
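
A minimal sketch of the promotion rule at work.  It uses snprintf, which is a later addition to the library than anything discussed here, purely so the result lands in a buffer; format_pair is a made-up name.

```c
#include <assert.h>
#include <stdio.h>

/* Format a long and a float into buf.  The float is widened to double
 * as it is passed, so %f is correct; the long needs %ld. */
int format_pair(char *buf, size_t n, long count, float ratio)
{
	assert(buf != NULL && n > 0);
	return snprintf(buf, n, "%ld %f", count, ratio);
}
```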
-- 

Brian Jones  aka  {ihnp4,}!drutx!qwerty  @  AT&T-IS

preece@ccvaxa.UUCP (08/05/85)

> > 	int i, a[10];
> > 	for (i = 0; i <= 10; i++)
> > 		a[i] = 0;
> > 

> This looks to me like it will simply overwrite one int's worth of
> memory beyond the end of the array "a" with the value 0.  Granted,
> depending on what happens to be after "a", this can have disastrous
> results, but is there really an implementation in which it will
> (reliably) lead to infinte looping?
----------
Yes.  Any implementation that allocates the space for i following the
space for a.

tim@callan.UUCP (Tim Smith) (08/06/85)

> > Consider, for example, the following program fragment:
> > 
> > 	int i, a[10];
> > 	for (i = 0; i <= 10; i++)
> > 		a[i] = 0;
> > 
> > On many implementations, this will result in an infinite loop.
> 
> This looks to me like it will simply overwrite one int's worth of
> memory beyond the end of the array "a" with the value 0.  Granted,
> depending on what happens to be after "a", this can have disastrous
> results, but is there really an implementation in which it will
> (reliably) lead to infinte looping?
> 
The UniSoft System V C compiler for the 68k will reliably produce an
infinite loop here.  Note that i and a[] are both on the stack.  This
is what you get: ( high address higher up on page )

	i:	4 bytes
	a[9]:	4 bytes
	.
	.
	.
	a[0]	4 bytes

a[10] will overwrite i.
-- 
					Tim Smith
				ihnp4!{cithep,wlbr!callan}!tim

bet@ecsvax.UUCP (Bennett E. Todd III) (08/06/85)

In article <243@ecrhub.UUCP> david@ecrhub.UUCP (David M. Haynes) writes:
>One of my all time favourites is the non-orthagonality between
>scanf and printf. Especially the following:
>
>	scanf("%D %F", long, double); or
>	scanf("%ld %lf", long, double);
>vs.
>	printf("%ld %f", long, double);

Interesting. The mismatch in formatting arguments was something I had
never noticed; I was always amused by this one instance where C's
call-by-value catches every C programmer, at least once. (I have never
heard anybody claim to have never been bitten by this one -- and it's
worst for those who had heard of it before it bit them.)

	printf("%d", i);

seems to cause people to want to try

	scanf("%d", i);

After you have been bitten once or twice you get really paranoid about
making sure you pass the *address* of i, not its value:

	scanf("%d", &i);

I am certain that this belongs on the list of all-time most popular
blunders.
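
A small sketch of the right habit, using sscanf on a string so it needs no input stream.  parse_int is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a decimal int from s into *out.  Returns 1 on success, 0 on failure.
 * sscanf stores through a pointer, so it is handed an address, not a value. */
int parse_int(const char *s, int *out)
{
	assert(s != NULL && out != NULL);
	return sscanf(s, "%d", out) == 1;
}
```

The caller writes parse_int(line, &i); checking the return value also catches input that is not a number at all.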

-Bennett
-- 

"Some people are lucky; the rest of us have to work at it."

Bennett Todd -- Duke Computation Center, Durham, NC 27706-7756; (919) 684-3695
 ...{decvax,seismo,philabs,ihnp4,akgua}!mcnc!ecsvax!bet or dbtodd@tucc.BITNET

david@ecrhub.UUCP (David M. Haynes) (08/07/85)

>> One of my all time favourites is the non-orthagonality between
                                        ^^^^^^^^^^^^^^^^^
>> scanf and printf. Especially the following:
>> 
>> 	scanf("%D %F", long, double); or
>> 	scanf("%ld %lf", long, double);
>> vs.
>> 	printf("%ld %f", long, double);
>> 
>> Why no %F or %D on printf?
>> And why %lf vs %f? fun!
>> 
>
>scanf can be given a pointer to any data type:
>	char (string)
>	int,
>	long,
>	float,
>	double;
>
>When you put arguments on stack, expansion rules are followed.
>
>	char => int
>	float => double
>
>So, printf can never get a float as an argument, it always gets a double.
>Therefore, %lf or %F are meaningless to printf.
>
>Note that printf does support %d and %ld, and will happily screw up if
>there is a disagreement between the args and their specification in the
>format string. ie. %d given a long arg, or %ld given a short. (machine
>dependent!!).
>Brian Jones  aka  {ihnp4,}!drutx!qwerty  @  AT&T-IS

Yes, I did realize that, but (and this is where the show really starts..)
the problem I reported is one of NON-ORTHOGONALITY not implementation.
Your explanation is quite correct, but why should I (as a programmer)
have to worry about translation to stack? Why doesn't printf take
%F and %D and translate for me so that the orthogonality of the two
library calls (which are considered by most to be related functions)
is the same?

B.T.W. This originally was in response to the "How has C bitten you?" question
but has digressed at this point. Apologies and I'll mail further 
discussion directly.

-- 
--------------------------------------------------------------------------
						David M. Haynes
						Exegetics Inc.
						..!utzoo!ecrhub!david

"I am my own employer, so I guess my opinions are my own and that of
my company."

mbarker@BBNZ.ARPA (Michael Barker) (08/08/85)

...omitted

>So, printf can never get a float as an argument, it always gets a double.
>Therefore, %lf or %F are meaningless to printf.
>
>Brian Jones  aka  {ihnp4,}!drutx!qwerty  @  AT&T-IS

Brian (et al) - the reasoning is correct, but printf could easily be changed to
accept %lf or %F (or any useful convention) as formatting directions for a
value with the knowledge that the value will *actually* be a double.  Let's try
to avoid letting the implementation details run rough-shod over the
abstraction.

In this case, the original poster indicated that the mnemonics are incomplete
(you can't match up the type of variable and the formatting string in all
cases).  I think this is a very valid point.  The fact that the implementation
of printf will receive both types of variables as double shouldn't stop us from
providing a complete set of mnemonics.

	"The sleep of reason produces monsters"
	mike

ARPA: mbarker@bbnz.ARPA
UUCP: harvard!bbnccv!mbarker

roy@phri.UUCP (Roy Smith) (08/10/85)

	Here's one that just got me:

		if (sv > score);   <----- note extraneous semi-colon
			score = sv;

	This was in a series of computations which gave various scores; the
fragment above was repeated in various places to pick out the maximum.  Of
course, the test is a no-op and the assignment was always done.  Naturally,
this passes lint (even with the -h flag which uses "heuristic tests to
attempt to intuit bugs") without any complaint.
-- 
Roy Smith <allegra!phri!roy>
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

atbowler@watmath.UUCP (Alan T. Bowler [SDG]) (08/11/85)

In article <505@brl-tgr.ARPA> mbarker@BBNZ.ARPA (Michael Barker) writes:
>>So, printf can never get a float as an argument, it always gets a double.
>>Therefore, %lf or %F are meaningless to printf.
>>
>>Brian Jones  aka  {ihnp4,}!drutx!qwerty  @  AT&T-IS
>
>Brian (et al) - the reasoning is correct, but printf could easily be changed to
>accept %lf or %F (or any useful convention) as formatting directions for a
>value with the knowledge that the value will *actually* be a double.  Let's try

I thought that the implicit promotion of float to double on passing
an argument was one of the things that was going away with the
new C standard.   It certainly has been high on my personal hit list.
  I grant that there was a reasonable case for it when C was just for
PDP-11's.  But these days when there is a good possibility that
floating point is being handled by a software implementation of
the IEEE standard, it is a loser.  In this situation the conversion
between float and double is a reasonably expensive operation,
and really should only be done when the programmer explicitly asks for it.

ken@turtlevax.UUCP (Ken Turkowski) (08/12/85)

In a previous article Brian Jones writes:
>>So, printf can never get a float as an argument, it always gets a double.
>>Therefore, %lf or %F are meaningless to printf.

PLEASE, don't use %F, when you can use %lf, and similarly for %E, %G,
%X, etc.

The biggest mistake in the implementation of printf is a disregard for
the standard in outputting hexadecimal and e-type output.  In the rest
of the programming world, hexadecimal is output as (for example):

	10AD            rather than             10ad

and floating-point e-type output as:

	3.1415926E+00   rather than             3.141592654e+00

Some implementations of printf interpret %E and %G to mean "use 'E'
rather than 'e'".  Similarly, %X means "use the character set
[0123456789ABCDEF] rather than [0123456789abcdef] to print hexadecimal
numbers."  If you want to print out a long using cap hex, you would
use the format specifier "%lX".

Does anyone know what the proposed ANSI standard says about this?
-- 

Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,hplabs,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA

mouse@mcgill-vision.UUCP (der Mouse) (08/13/85)

>	scanf("%D %F", long, double);
>	scanf("%ld %lf", long, double);
[should be &long, &double in both cases]
> vs.
>	printf("%ld %f", long, double);

> Why no %F or %D on printf?

Good question.  Belongs there.  So don't use it in scanf and there's no
problem.

> And why %lf vs %f? fun!

Disclaimer first:  What I say here is based on my hacking on a VAX.
Lots of my comments may well be invalid elsewhere.

The C compiler produces exactly the same code for
	printf(format,long,double)
as
	printf(format,long,float)

Remember in K&R how all floats are converted to doubles all the time?  This
also happens in function calls.  Printf may support %lf; I haven't checked.
But it would necessarily be treated exactly the same as %f because of this
extension.  Scanf does not have the same problem (feature?) because you pass
a pointer, you don't pass the value directly.

By the way (this is very VAX-dependent), you can scanf into a double and tell
scanf it's a float (use %f rather than %lf).  This works because the first 4
bytes of a double form a valid float.  The extra precision will be unchanged,
but for user input, the data generally isn't that precise anyway.
-- 
					der Mouse
				System hacker and general troublemaker
				CVaRL, McGill University

Hacker: One responsible for destroying /
Wizard: One responsible for recovering it afterward

guy@sun.uucp (Guy Harris) (08/13/85)

> Does anyone know what the proposed ANSI standard says about (%X meaning
> "print hexadecimal with capital A-F" instead of "print a "long" in
> hexadecimal with lower-case a-f", and likewise for %E and %G)?

It agrees with Systems III and V, Sun 4.2BSD, and, I believe, 4.3BSD - %X
means print an "int" in hex with capital A-F.

(Note that if you use %D, put a number followed by something else using a
%<something> format, and put your code under SCCS, you get a *big* surprise
- %D% gets expanded into the date...)

	Guy Harris

jmoore@mips.UUCP (Jim Moore) (08/13/85)

> 
> 	Here's one that just got me:
> 
> 		if (sv > score);   <----- note extraneous semi-colon
> 			score = sv;
>	....
> -- 
> Roy Smith <allegra!phri!roy>
> System Administrator, Public Health Research Institute
> 455 First Avenue, New York, NY 10016

I have seen this bug many times, especially in code written by people
who routinely switch programming languages. It does seem that the compiler
should warn that that test is a no-operation. The problem in general is
that there are 2 copies of the same information: the control flow of
the program. The compiler's copy is contained strictly in the syntax
of the program, while the programmer's copy is more loosely defined by
program layout conventions. It is strictly up to the programmer to keep
the 2 copies in sync in some situations. There was a paper given at 
a USENIX (Toronto?) describing an experiment with different program layout
techniques. The programs were written without any explicit grouping brackets,
and were specified by the layout and indentation. A program filter would
add all the required brackets and buzzard wings before feeding it to the
compiler.

Jim Moore
MIPS Computer Systems
Mountain View, Ca	[ucbvax | decvax]!decwrl!mips!jmoore

mash@mips.UUCP (John Mashey) (08/13/85)

> > 	Here's one that just got me:
> > 
> > 		if (sv > score);   <----- note extraneous semi-colon
> > 			score = sv;
> I have seen this bug many times, especially in code written by people
> who routinely switch programming languages....  There was a paper given at a
> USENIX (Toronto?) describing an experiment with different program layout
> techniques. The programs were written without any explicit grouping brackets,
> and were specified by the layout and indentation. A program filter would
> add all the required brackets and buzzard wings before feeding it to the
> compiler.

As I recall, there was a related bug in MERT, way back, of the form:

	if (something)
		stmt1;
		stmt2;
		stmt3;
where the {}'s were "invisible".

The one I always remember most of the C bites was the truly infamous
bug in chksum in uucp/pk0.c.  (This was actually a code bug, masked by
a bug in the VAX compiler and irrelevant on 16-bit machines; it caused almost
every 68K port (that used the MIT C compiler, anyway) to break uucp, in that
the 68Ks could talk to each other, but not to VAXen or 16-bit machines).
The bug was in lines of code that looked like:
	short s;
	unsigned short t;
	...
	if ((unsigned) s <= t) ...
where they really meant  if ((unsigned short)s <= t).
The VAX did (incorrectly) a 16-bit compare, rather than all of the
correct conversions. I'd call this a C bite, simply because psychologically,
it "feels" like (unsigned) type should mean (unsigned type) type,
although it clearly does not.
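
A sketch of the difference, assuming (as on the VAX and 68K) that unsigned int is wider than unsigned short.  The function names are made up; this is not the pk0.c code.

```c
#include <assert.h>
#include <limits.h>

/* The cast as written: s becomes an unsigned int, so a negative short
 * turns into a very large value. */
int le_as_written(short s, unsigned short t)
{
	return (unsigned) s <= t;
}

/* The cast as meant: s is reduced to the width of unsigned short first. */
int le_as_meant(short s, unsigned short t)
{
	return (unsigned short) s <= t;
}

void cast_demo(void)
{
	assert(UINT_MAX > USHRT_MAX);	/* the assumption stated above */
	assert(le_as_written(-1, USHRT_MAX) == 0);
	assert(le_as_meant(-1, USHRT_MAX) == 1);
}
```

On a machine where int and short are the same width the two forms agree, which is why the bug was invisible on 16-bit systems.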
-- 
-john mashey
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash
DDD:  	415-960-1200
USPS: 	MIPS Computer Systems, 1330 Charleston Rd, Mtn View, CA 94043

henry@utzoo.UUCP (Henry Spencer) (08/14/85)

> I thought that the implicit promotion of float to double on passing
> an argument was one of the things that was going away with the
> new C standard. ...

Not quite.  Making it go away would break many, many programs.  What has
actually happened is a bit more complex.  Implicit float->double in most
contexts is now at the compiler's discretion, i.e. a compiler for a Cray
would probably opt to do it only if asked.  Function calls are messier.
If there's a function prototype in scope, then conversions get done to the
types in the prototype, so your function prototypes can all just say "float"
for the parameters in question and there will be no implicit widening to
double.  If there is *no* function prototype in scope, or if the prototype
ends with the "..." syntax for a variable-length argument list, then the
old behavior still applies and floats widen to double.
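
A minimal sketch of the "..." case, written with <stdarg.h> on the assumption that the draft's variable-argument interface is available.  sum_args is a hypothetical function.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum n floating-point arguments.  Arguments matching "..." still get
 * the old widening, so each arrives as a double even when the caller
 * passed a float; va_arg must therefore ask for double. */
double sum_args(int n, ...)
{
	va_list ap;
	double total = 0.0;
	int i;

	assert(n >= 0);
	va_start(ap, n);
	for (i = 0; i < n; i++)
		total += va_arg(ap, double);
	va_end(ap);
	return total;
}
```

This is the same reason printf has only one conversion for both float and double values.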
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

tps@sdchema.UUCP (Tom Stockfisch) (08/14/85)

<>
roy@phri.UUCP (Roy Smith) writes:

>	Here's one that just got me:
>
>		if (sv > score);   <----- note extraneous semi-colon
>			score = sv;

This type of error is easy to find with cb(1), which indents your code
according to its logic.  The above fragment is turned by cb into

		if (sv > score);
		score = sv;

cb is particularly useful if you have macro functions, as these can easily
cause unexpected control-of-flow problems and are expanded on one long
line.  I often do

		cc -E prog.c | cb | cat -s

The -E flag just runs the preprocessor, and
the cat -s is to get rid of the masses of white space which lines like
"#include <stdio.h>" cause.

				-- Tom Stockfisch

meissner@rtp47.UUCP (Michael Meissner) (08/15/85)

In article <860@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>
>Some implementations of printf intrepret %E and %G to mean "use 'E'
>rather than 'e'".  Similarly, %X means "use the character set
>[0123456789ABCDEF] rather than [0123456789abcdef] to print hexadecimal
>numbers."  If you want to print out a long using cap hex, you would
>use the format specifier "%lX".
>
>Does anyone know what the proposed ANSI standard says about this?
>
	ANSI requires this behavior (as does system III, V, V.2, IEEE P1003,
	and /usr/group).
--
	Michael Meissner
	Data General
	...{ ihnp4, decvax }!mcnc!rti-sel!rtp47!meissner

mike@whuxl.UUCP (BALDWIN) (08/15/85)

> The biggest mistake in the implementation of printf is a disregard to
> the standard in outputting hexadecimal and e-type output.  In the rest
> of the programming world, hexadecimal is output as (for example):
> 
> 	10AD            rather than             10ad
> 
> and floating-point e-type output as:
> 
> 	3.1415926E+00   rather than             3.141592654e+00
> 
> Some implementations of printf intrepret %E and %G to mean "use 'E'
> rather than 'e'".  Similarly, %X means "use the character set
> [0123456789ABCDEF] rather than [0123456789abcdef] to print hexadecimal
> numbers."  If you want to print out a long using cap hex, you would
> use the format specifier "%lX".
> 
> Does anyone know what the proposed ANSI standard says about this?

April 30 X3J11C uses %x -> "abcdef", %X -> "ABCDEF", %e -> "e",
%E -> "E", %g -> "e", %G -> "E".
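
A short sketch of that behaviour, assuming a library that follows the draft.  snprintf is used only to capture the output in a buffer and is newer than the draft itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Show that the case of the conversion letter selects the case of the output. */
void case_demo(void)
{
	char buf[32];

	snprintf(buf, sizeof buf, "%x", 0x10adu);
	assert(strcmp(buf, "10ad") == 0);
	snprintf(buf, sizeof buf, "%X", 0x10adu);
	assert(strcmp(buf, "10AD") == 0);
	snprintf(buf, sizeof buf, "%lX", 0x10adUL);	/* a long, in capitals */
	assert(strcmp(buf, "10AD") == 0);
	snprintf(buf, sizeof buf, "%.2e", 3.14159);
	assert(strcmp(buf, "3.14e+00") == 0);
	snprintf(buf, sizeof buf, "%.2E", 3.14159);
	assert(strcmp(buf, "3.14E+00") == 0);
}
```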
-- 
						Michael Baldwin
						AT&T Bell Labs
						harpo!whuxl!mike

conrad@ucsfcca.UUCP (Conrad Huang) (08/15/85)

This one got me:

foo(a, b)
int	a[16], b[16];
{

	bcopy((char *) a, (char *) b, sizeof a);
	...
}

'sizeof a' is, of course, 4 (here).
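
The usual way around it, as a sketch: inside the function an array parameter is only a pointer, so the length has to be passed in.  memcpy stands in for bcopy here (note the reversed argument order), and copy_ints is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n ints from src to dst.  The count comes from the caller,
 * because sizeof on a parameter gives the size of a pointer. */
void copy_ints(int *dst, const int *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n * sizeof *src);
}
```

The caller, where the array type is still visible, can write copy_ints(b, a, sizeof a / sizeof a[0]).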

					Eric

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/15/85)

> 		if (sv > score);   <----- note extraneous semi-colon
> 			score = sv;

This sort of thing makes me think that a few extra keywords
are called for in programming languages like this.  E.g.
	if <bool_expr> then <stmt> fi
	while <bool_expr> do <stmt> od
Something to keep in mind when you design an Algol-like language.

greenber@timeinc.UUCP (Ross M. Greenberg) (08/15/85)

One that has bitten me on more occasions than I'm willing to
admit is the unspecified order of evaluation within an expression:

Imagine transmitting a two byte checksum:

putc(highbyte, fd);
putc(lowbyte, fd);

and then reading it on the other side:

crc = (getch(fd) * 256) + getch(fd);


Different machines (compilers) do the two getch's in different orders.

-- 
------------------------------------------------------------------
Ross M. Greenberg  @ Time Inc, New York 
              --------->{vax135 | ihnp4}!timeinc!greenber<---------

I highly doubt that Time Inc.  would make me their spokesperson.
---

rbp@investor.UUCP (Bob Peirce) (08/16/85)

Here's one that trapped me this week.  It took much head scratching
and debug prints to figure it out.

int dial(telno)
char *telno;
{
	if(telno){		/*  should be if(*telno)  */
		dial it;
	}
	else{
		hang up;
	}
}

Print statements showed the telno was being handed to the routine,
but the if said nothing was there.  Turns out, on my system, the
address of telno is NULL.  I needed to check the contents not the
address!
-- 

		 	Bob Peirce, Pittsburgh, PA
		uucp: ...!{allegra, bellcore, cadre, idis}
		  	 !pitt!darth!investor!rbp
				412-471-5320

		NOTE:  Mail must be < 30,000 bytes/message

michael@python.UUCP (M. Cain) (08/16/85)

This one didn't bite me directly, but my wife spent most of a day
finding a more complicated instance of it in someone else's code.

Start with two source files:
foo.c:
main()
{
    sub2(1);
}

sub1()
{
}

bar.c:
extern sub1(a,b);

sub2(x)
int x;
{
    printf("a = %d, b = %d, x = %d\n",a,b,x);
}

Compiling with "cc foo.c bar.c" produced no error messages at all.
But when a.out was executed, the output was

a = 1, b = junk, x = junk

This was all done under XENIX on a Sritek 68000 board.  Same kind
of screw-up in both AT&T and Berkeley universes on a Pyramid.  Lint
on the Pyramid complains that sub2() has a variable number of
arguments.  Two different 68000 cross-compilers make the same mistake.
Our VAX running System V correctly tagged the extern statement as
incorrect.  My 6809 OS-9 system missed the extern statement, but at
least pointed out that a and b are undefined within sub2().

Michael Cain
Bell Communications Research
..!bellcore!python!michael

peter@baylor.UUCP (Peter da Silva) (08/18/85)

> > 		if (sv > score);   <----- note extraneous semi-colon
> > 			score = sv;
> 
> This sort of thing makes me think that a few extra keywords
> are called for programming languages like this.  E.g.
> 	if <bool_expr> then <stmt> fi
> 	while <bool_expr> do <stmt> od
> Something to keep in mind when you design an Algol-like language.

ICK ICK ICK! I hate languages that do that. Ever considered using "cb" as a
debugging tool? I have an MS-DOS version if anyone wants it...
-- 
	Peter da Silva (the mad Australian werewolf)
		UUCP: ...!shell!neuro1!{hyd-ptd,baylor,datafac}!peter
		MCI: PDASILVA; CIS: 70216,1076

ark@alice.UUCP (Andrew Koenig) (08/18/85)

> int dial(telno)
> char *telno;
> {
>	if(telno){		/*  should be if(*telno)  */
>		dial it;
>	}
>	else{
>		hang up;
>	}
> }

Bob Peirce says that this program failed because it should have been
checking *telno instead of telno.

If telno is NULL, you had better not look at *telno; it's illegal.
If the address of a legal character item is NULL, your compiler is
not implementing the language properly.
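
A minimal sketch of a test that is safe either way: check the pointer first, and look at the character only if the pointer is not null.  has_number is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* True if telno points at a non-empty string.  && evaluates left to
 * right, so *telno is never examined for a null pointer. */
int has_number(const char *telno)
{
	return telno != NULL && *telno != '\0';
}

void has_number_demo(void)
{
	assert(!has_number(NULL));
	assert(!has_number(""));
	assert(has_number("5551212"));
}
```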

ludemann@ubc-cs.UUCP (Peter Ludemann) (08/18/85)

Here's my favourite bite in the neck (apologies if I've made
any typos - this is just an example):

typedef union {
		int  u1;
		char u2;
	} union_type;

typedef struct {
		int        f1;
		union_type f2;
	} struct_type;

struct_type s;

s.u1 = 0;   /* should be: s.f2.u1 = 0 */

This has the effect of "s.f1 = 0" with no complaint from
the compiler (lint, of course, is another matter).  Truly
spectacular results can occur if "f1" is a pointer to
another area.

The really annoying thing is that K&R (page 186) says:
    A primary expression followed by a dot followed by an
    identifier is an expression.  The first expression must
    be an lvalue naming a structure or union, and the 
    identifier must name a member of the structure or union.
In other words, type checking almost as strong as Pascal's (yes,
I know about the case where two structures have the first fields
declared the same).

However, K&R (page 209) says "... this restriction is not
firmly enforced by the compiler."  It is sad that the defects
of the original C compiler have been slavishly copied by
subsequent implementations.  If backward compatibility were
important, a "don't check structures strictly" switch could
have been added to the compilers.
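
For reference, a sketch of the access spelled out in full, using the declarations from the example; clear_u1 is a hypothetical helper.  A compiler that does enforce the rule should reject the short form "s.u1" outright.

```c
#include <assert.h>
#include <stddef.h>

typedef union {
	int  u1;
	char u2;
} union_type;

typedef struct {
	int        f1;
	union_type f2;
} struct_type;

/* Clear the union member through its full path; f1 is left alone. */
void clear_u1(struct_type *s)
{
	assert(s != NULL);
	s->f2.u1 = 0;
}
```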
-- 
ludemann%ubc-vision@ubc-cs.uucp (ubc-cs!ludemann@ubc-vision.uucp)
ludemann@cs.ubc.cdn
ludemann@ubc-cs.csnet
Peter_Ludemann@UBC.mailnet

rbp@investor.UUCP (Bob Peirce) (08/19/85)

>    What's worse, the optimiser has in this case hidden a program bug!!!
> 
> Thus the moral:
> 
> 	"Don't just test your code once.  Test it again, this time
>     	 turn the optimiser OFF first".

and vice versa!
-- 

		 	Bob Peirce, Pittsburgh, PA
		uucp: ...!{allegra, bellcore, cadre, idis}
		  	 !pitt!darth!investor!rbp
				412-471-5320

		NOTE:  Mail must be < 30,000 bytes/message

peter@baylor.UUCP (Peter da Silva) (08/19/85)

> 		cc -E prog.c | cb | cat -s

ANOTHER FLAG FOR CAT!?!?!? How many places have cat -s?
-- 
	Peter (Made in Australia) da Silva
		UUCP: ...!shell!neuro1!{hyd-ptd,baylor,datafac}!peter
		MCI: PDASILVA; CIS: 70216,1076

levy@ttrdc.UUCP (Daniel R. Levy) (08/19/85)

In article <389@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
>	Here's one that just got me:
>
>		if (sv > score);   <----- note extraneous semi-colon
>			score = sv;
>
>	This was in a series of computations which gave various scores; the
>fragment above was repeated in various places to pick out the maximum.  Of
>course, the test is a no-op and the assignment was always done.  Naturally,
>this passes lint (even with the -h flag which uses "heuristic tests to
>attempt to intuit bugs") without any complaint.
>--
>Roy Smith <allegra!phri!roy>

Sounds like a question of style hiding function.  Why not stick to something
like

		if (sv > score) score = sv;

?

I can't think of anything much more straightforward than that.
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer, my pets, my plants, my boss, or the
| at&t computer systems division |  s.a. of any computer upon which I may hack.
|        skokie, illinois        |
|          "go for it"           |  Path: ..!ihnp4!ttrdc!levy
 --------------------------------     or: ..!ihnp4!iheds!ttbcad!levy

ark@alice.UUCP (Andrew Koenig) (08/21/85)

> typedef union {
>		int  u1;
>		char u2;
>	} union_type;
>
> typedef struct {
>		int        f1;
>		union_type f2;
>	} struct_type;
>
> struct_type s;
>
> s.u1 = 0;   /* should be: s.f2.u1 = 0 */

Gee, our compiler certainly complains about this one.
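
For reference, here is a minimal sketch of the member access that the
quoted declarations permit.  The typedefs are copied from the quoted
article; the helper name set_union_int is made up for illustration.
Since the union is the member f2, the full path to its int is f2.u1.

```c
typedef union {
	int  u1;
	char u2;
} union_type;

typedef struct {
	int        f1;
	union_type f2;
} struct_type;

/* set_union_int is a hypothetical helper, not part of the quoted code. */
int set_union_int(struct_type *s, int value)
{
	s->f2.u1 = value;	/* the union lives in f2, so the path is f2.u1 */
	return s->f2.u1;
}
```

A compiler that checks member names strictly should reject "s.u1"
because struct_type has no member called u1.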

cdshaw@watmum.UUCP (Chris Shaw) (08/22/85)

In article <372@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>In article <389@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
>>	Here's one that just got me:
>>		if (sv > score);   <----- note extraneous semi-colon
>>			score = sv;
>
>Sounds like a question of style hiding function.  Why not stick to something
>like
>		if (sv > score) score = sv;
>?
>|       dan levy 

...because 

if(sv>score||this==that+the_other||fopen("crap","r"))save=the+whales+fur+christ++;

is the kind of statement where bugs really happen. Can you seriously spend less
than two seconds reading that to comprehend what's going on ? If you answered
yes, how about this (more important) question: Can you read a whole FILE of
this kind of crap and then be able to find a variable at will ?

I doubt it. I can think of more straightforward ways of producing code, some
of which include programming while awake, so that the errors like the
one in the original posting don't happen. Others include using a self-consistent
style, which Mr Levy's is not. Compound if statements should look the same
as simple if statements.

Mr Levy's style of if statement has an equivalent in English called 
"the run-on-sentence". What's silly about the whole thing is that a program
formatter can make this stuff QUITE readable, and will probably find the
bug that "bit" Mr Smith.
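
As a sketch of what a formatter makes visible, here is roughly how the
original fragment looks once re-indented, next to the intended version.
Both function names are invented for this example; only the if
statement comes from the thread.

```c
/* The posted fragment, laid out the way the compiler reads it. */
int keep_max_as_posted(int score, int sv)
{
	if (sv > score)
		;		/* empty statement: this is the whole body */
	score = sv;		/* not controlled by the if at all */
	return score;
}

/* The intended logic, with braces so a stray ';' has nowhere to hide. */
int keep_max_intended(int score, int sv)
{
	if (sv > score) {
		score = sv;
	}
	return score;
}
```

With score 10 and sv 3, the first function returns 3 and the second
returns 10.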

The most important element of a readable programming style is the use of white
space. I personally can't stand the K&R style because I get visually confused
when I read it. It's similar to an English paragraphing that doesn't use
indenting or spaces between paragraphs. In the C book itself, this isn't bad,
because the program fragments are small and the structures are simple.
In real programs, however, there are lots of programs which are unreadable
until passed through "indent" (on 4.2).

Chris Shaw    watmath!watmum!cdshaw  or  cdshaw@watmath
University of Waterloo
In doubt?  Eat hot high-speed death -- the experts' choice in gastric vileness !

rlk@chinet.UUCP (Richard L. Klappal) (08/22/85)

In article <471@baylor.UUCP> peter@baylor.UUCP (Peter da Silva) writes:
>> 		cc -E prog.c | cb | cat -s
>
>ANOTHER FLAG FOR CAT!?!?!? How many places have cat -s?
>-- 
>	Peter (Made in Australia) da Silva

The Fortune 32:16 has it.  Means force single spacing on output
(kinda like uniq) to get rid of excessive blank lines.
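
For systems without the flag, the effect is easy to approximate.  The
following is only a sketch of the idea, working on strings rather than
files; squeeze_blank is a made-up name, and it assumes that "blank"
means a completely empty line.

```c
#include <stddef.h>

/* Copy src to dst, collapsing each run of empty lines to one empty line.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, not counting the final '\0'. */
size_t squeeze_blank(const char *src, char *dst)
{
	size_t n = 0;
	int newlines = 1;	/* start of input counts as "just after a newline" */

	for (; *src != '\0'; src++) {
		if (*src == '\n') {
			if (newlines >= 2)
				continue;	/* later empty line in a run: drop it */
			newlines++;
		} else {
			newlines = 0;
		}
		dst[n++] = *src;
	}
	dst[n] = '\0';
	return n;
}
```

Given "a\n\n\n\nb\n" this writes "a\n\nb\n".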

PS: Peter:  Could you post the MSDOS version of cb. (if legal to
do so).  A friend uses the Idiot/Barely Moron with Lattice, and
would appreciate having cb. (+vi + ... UN*X  :-)).

Richard Klappal

UUCP:		..!ihnp4!chinet!uklpl!rlk  | "Money is truthful.  If a man
MCIMail:	rklappal		   | speaks of his honor, make him
Compuserve:	74106,1021		   | pay cash."
USPS:		1 S 299 Danby Street	   | 
		Villa Park IL 60181	   |	Lazarus Long 
TEL:		(312) 620-4988		   |	    (aka R. Heinlein)
-------------------------------------------------------------------------

alan@drivax.UUCP (Alan Fargusson) (08/22/85)

>	Here's one that just got me:
>
>		if (sv > score);   <----- note extraneous semi-colon
>			score = sv;
>
>	This was in a series of computations which gave various scores; the
>fragment above was repeated in various places to pick out the maximum.  Of
>course, the test is a no-op and the assignment was always done.  Naturally,
>this passes lint (even with the -h flag which uses "heuristic tests to
>attempt to intuit bugs") without any complaint.
>--
>Roy Smith <allegra!phri!roy>

I have to tell you that I got bit the same way in PASCAL when I was
a student. This is not just a C problem. I think that all of the
structured languages I have seen (except Modula-2 and Algol 68)
have this problem.
-- 

Alan Fargusson.

{ ihnp4, amdahl, mot }!drivax!alan

mouse@mcgill-vision.UUCP (der Mouse) (08/23/85)

  [ ... ]
>	if(telno){		/*  should be if(*telno)  */
  [ ... ]

> Print statements showed the telno was being handed to the routine,
> but the if said nothing was there.  Turns out, on my system, the
> address of telno is NULL.  I needed to check the contents not the
> address!

Gee....and I thought a zero pointer was guaranteed not to point to
anything valid (K&R says this).  Or is NULL not a zero?!  No, you
are comparing to 0 not NULL.
-- 
					der Mouse

{ihnp4,decvax,akgua,etc}!utcsri!mcgill-vision!mouse
philabs!micomvax!musocs!mcgill-vision!mouse

Hacker: One responsible for destroying /
Wizard: One responsible for recovering it afterward

lam@btnix.UUCP (lam) (08/23/85)

[*** The Phantom Article Gobbler Strikes Again ***]
> > > 	int i, a[10];
> > > 	for (i = 0; i <= 10; i++)
> > > 		a[i] = 0;
> > > 

> > This looks to me like it will simply overwrite one int's worth of
> > memory beyond the end of the array "a" with the value 0.  Granted,
> > depending on what happens to be after "a", this can have disastrous
> > results, but is there really an implementation in which it will
> > (reliably) lead to infinite looping?
> ----------
> Yes.  Any implementation that allocates the space for i following the
> space for a.

The cause of the infinite loop is due to the storage allocation.
	i.e.	&i == &a[10]
   causing i to be overwritten with 0 when i is 10.

The more interesting thing is that on some compilers, the infinite
   loop does NOT occur.  Lo and behold, the OPTIMISER comes into play.
   If i is put in a Register at the start of the for(), a[10] = 0 
   will indeed overwrite i in memory but not the register !!! and the
   loop terminates normally.
   What's worse, the optimiser has in this case hidden a program bug!!!

Thus the moral:

	"Don't just test your code once.  Test it again, this time
    	 turn the optimiser OFF first".
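
Testing aside, the fix is to keep the index inside the array.  A
minimal sketch follows; the NELEMS macro and the zero_ints function
are names invented here, not anything from the original fragment.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define NELEMS(a)	(sizeof(a) / sizeof((a)[0]))

/* Zero the first n elements of a; the valid indices are 0 .. n-1. */
void zero_ints(int *a, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)		/* '<', not '<=' */
		a[i] = 0;
}
```

For "int a[10];" the call would be zero_ints(a, NELEMS(a)), which
touches a[0] through a[9] and nothing else.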

------------------------------------------------------------------
	Onward Lam 
	CAP Group, Reading, England.

root@bu-cs.UUCP (Barry Shein) (08/24/85)

Not really a bite, but I remember when I was first learning C
I was quite bewildered by the fact that you couldn't really
declare your own 'argv', that is, you couldn't declare an
array of pointers to fixed length buffers except perhaps by:

char *myargv[] = {
	"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
	"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",

	etc

I mean, argv seemed kinda holy to me, disturbing.

	-Barry Shein, Boston University

P.S. I know argv is var length, but that would be even harder to declare!

guy@sun.uucp (Guy Harris) (08/25/85)

>   [ ... ]
> >	if(telno){		/*  should be if(*telno)  */
>   [ ... ]
> 
> > Print statements showed the telno was being handed to the routine,
> > but the if said nothing was there.  Turns out, on my system, the
> > address of telno is NULL.  I needed to check the contents not the
> > address!
> 
> Gee....and I thought a zero pointer was guaranteed not to point to
> anything valid (K&R says this).

All valid implementations of C guarantee this.  Obviously, the
implementation of C that this was done on is not valid.  He should complain
to the vendor.  (Yes, there have been such implementations; one well-known
chip maker's first UNIX release didn't put the necessary shim at data
location 0 on a separate I&D space program.  They fixed it shortly
afterwards.)

> Or is NULL not a zero?!  No, you are comparing to 0 not NULL.

If you compare a pointer against 0, the actual code compiled compares it
against a null pointer.  NULL *is* 0, if you're talking from the standpoint
of "what does the '#define' in <stdio.h> and other places say":

	/*	@(#)stdio.h 1.2 85/01/21 SMI; from UCB 1.4 06/30/83	*/

	...

	#define	NULL	0

(and you'll find the same thing in V7, 4.2, 4.3, S3, S5, ...).  In any
context where it is known to the compiler that something is supposed to be a
pointer to a specific data type, any zero that appears there is treated as a
null pointer of the type "pointer to that data type" (obviously, not a null
pointer to an object of that data type, since a null pointer can't point to
anything).  These contexts include comparisons and assignments, so the two
assignments in

	register struct frobozz *p;

	p = 0;
	p = (struct frobozz *)0;

are equivalent and the two comparisons in

	if (p == 0)
		foo();
	if (p == (struct frobozz *)0)
		foo();

are equivalent.  Procedure calls, however, are not such a context, so the
two procedure calls in

	bar(0);
	bar((struct frobozz *)0);

are very definitely *not* equivalent.  In ANSI Standard C, there is a syntax
to specify that "bar" takes an argument of type "struct frobozz *"; if you
declared "bar" in such a manner, the two procedure calls would be equivalent.
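
A sketch of what that looks like, assuming a compiler that accepts the
draft ANSI prototype syntax.  The member of struct frobozz and the
function call_both_ways are placeholders invented for this example.

```c
struct frobozz { int weight; };		/* placeholder member, assumed */

/* Prototype: tells the compiler that bar takes a "struct frobozz *". */
int bar(struct frobozz *p);

/* Returns 1 if handed a null pointer, 0 otherwise. */
int bar(struct frobozz *p)
{
	return p == 0;
}

int call_both_ways(void)
{
	/* With the prototype in scope, the bare 0 is converted to a null
	   "struct frobozz *" just as the cast form is. */
	return bar(0) && bar((struct frobozz *)0);
}
```

Without the prototype, only the second call is safe on a machine where
an int and a pointer differ in size or representation.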

	Guy Harris

peters@cubsvax.UUCP (Peter S. Shenkin) (08/26/85)

I've had several bugs involving code hidden in macro definitions which have 
been very difficult to find.  One I recall offhand went something like this:

/* OPEN MOUTH *****************************************************************/
#define Coords(I)	(complicated.structure.redirection[I].x, \
			 complicated.structure.redirection[I].y, \
			 complicated.structure.redirection[I].z   )
main()
{
	...
	subr(Coords(i));  /* BITE */
	...
}
/***************************************************************************/
subr(x,y,z)
float x,y,z;
{...}
/* SWALLOW ******************************************************************/

Problem is, when expanded, the call to subr looks like
	subr((exp1,exp2,exp3));
The comma operator is applied, and subr() gets only one argument, the value
of exp3 (a comma expression yields its last operand) !!!  The interesting
thing is that if anyone had asked me whether (something), ((something)),
and (((something))) mean the same in C, I would have said "Yes," without
thinking.  Obviously, I would have been wrong.
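
One way around it, sketched here with a simplified stand-in for the
structure, is to leave the outer parentheses off a macro that is meant
to expand to an argument list.  All names below are invented, and the
functions use prototype syntax rather than the old-style definitions
above.

```c
struct point { float x, y, z; };	/* stand-in for the real structure */

/* No outer parentheses: the expansion is three separate arguments. */
#define COORDS(p)	(p).x, (p).y, (p).z

static float sum3(float x, float y, float z)
{
	return x + y + z;
}

/* Expands to sum3((p).x, (p).y, (p).z): three arguments. */
float sum_of_coords(struct point p)
{
	return sum3(COORDS(p));
}

/* Wrapping the list in parentheses turns the commas into comma
   operators, so the whole thing is one expression whose value is p.z. */
float parenthesized_coords(struct point p)
{
	return ((void)p.x, (void)p.y, p.z);
}
```

The price is that such a macro is only usable where an argument list
is expected, which is worth a comment at its definition.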

Peter S. Shenkin	philabs!cubsvax!peters		Columbia Univ. Biology

mab@druca.UUCP (BlandMA) (08/28/85)

I was amused when I realized why this statement didn't print anything:

	printf("toggle ">" verbosity\n");
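
The compiler sees two string literals with a ">" operator between
them, so printf is handed the int result of a pointer comparison
instead of a format string.  A sketch of what was presumably meant,
with the inner quotes escaped; the function name is made up, and
snprintf is assumed to be available:

```c
#include <stdio.h>

/* Format the intended message into buf.  Returns what snprintf returns:
   the length of the message, not counting the terminating '\0'. */
int format_toggle_message(char *buf, size_t size)
{
	return snprintf(buf, size, "toggle \">\" verbosity\n");
}
```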

-- 
Alan Bland     {ihnp4|allegra}!druca!mab
AT&T Information Systems, Denver CO

lee@eel.UUCP (08/29/85)

>>Gee....and I thought a zero pointer was guaranteed not to point to
>>anything valid (K&R says this).

>All valid implementations of C guarantee this.  Obviously, the
>implementation of C that this was done on is not valid.  He should complain
>to the vendor.  (Yes, there have been such implementations; one well-known
>chip maker's first UNIX release didn't put the necessary shim at data
>location 0 on a separate I&D space program.  They fixed it shortly
>afterwards.)

Speaking of issues that have been beaten to death!  K&R says only that the
value 0 is distinguishable from pointers that point to objects, and that
therefore the value zero is not a "valid" pointer.  It certainly does not
say that the 0 pointer will give you the "null" or empty value of any
object, and in particular it does not promise that there will be an integer
zero if you dereference (int*)0, or a character zero if you dereference
(char*)0, nor a memory fault if you reference (foo*)0.

NO, you cannot depend upon the value obtained by dereferencing ANY pointer
that has been assigned the value zero.  It does not point to any object;
the implementation of C does not guarantee  to protect you from erroneously
trying to access that object and the result is unpredictable over various
implementations.

darryl@ISM780.UUCP (08/29/85)

[]

One final, subtle, point.  K&R does not guarantee that the *value* 0 is
distinguishable from all other pointers, but rather, that the *constant* 0
is.  That is to say, you may compare against 0 to determine the validity of
a pointer (or assign to guarantee invalidity), but you may not assume that
comparison against (or assignment of) an int variable whose value is 0 will
have the same result.  This picky distinction probably doesn't affect any
of the better known chips, but might be important on a machine where a null
pointer is not a bit string of 0s.

	    --Darryl Richman, INTERACTIVE Systems Corp.
	    ...!cca!ima!ism780!darryl
	    The views expressed above are my opinions only.

P.S.:  I know that this sounds amazing, so look at the top of K&R p190,
under the section 7.7, equality operators (second paragraph), and again
on top of p192, section 7.14, assignment operators.

dave@lsuc.UUCP (David Sherman) (08/30/85)

> >ANOTHER FLAG FOR CAT!?!?!? How many places have cat -s?
> 
> The Fortune 32:16 has it.  Means force single spacing on output
> (kinda like uniq) to get rid of excessive blank lines.

For those with BSD systems, or (as in our case) systems with
some BSD utilities, the ssp(1) program does this. (It's used
by man(1) for output to a terminal.)

Dave Sherman
The Law Society of Upper Canada
Toronto
-- 
{  ihnp4!utzoo  pesnta  utcs  hcr  decvax!utcsri  }  !lsuc!dave

ark@alice.UucP (Andrew Koenig) (08/30/85)

>>All valid implementations of C guarantee this.  Obviously, the
>>implementation of C that this was done on is not valid.  He should complain
>>to the vendor.  (Yes, there have been such implementations; one well-known
>>chip maker's first UNIX release didn't put the necessary shim at data
>>location 0 on a separate I&D space program.  They fixed it shortly
>>afterwards.)

>Speaking of issues that have been beaten to death!  K&R says only that the
>value 0 is distinguishable from pointers that point to objects, and that
>therefore the value zero is not a "valid" pointer.  It certainly does not
>say that the 0 pointer will give you the "null" or empty value of any
>object, and in particular it does not promise that there will be an integer
>zero if you dereference (int*)0, or a character zero if you dereference
>(char*)0, nor a memory fault if you reference (foo*)0.

>NO, you cannot depend upon the value obtained by dereferencing ANY pointer
>that has been assigned the value zero.  It does not point to any object;
>the implementation of C does not guarantee  to protect you from erroneously
>trying to access that object and the result is unpredictable over various
>implementations.

I think the "necessary shim" referred to in the first note quoted
above has nothing to do with a value intended to ensure that *(int*)0
give a defined value.  Rather, it is a dummy variable located at
location 0 designed to ensure that NOTHING ELSE finds itself at location
0 by accident!  The trouble with putting a variable at location 0
is that its address will then erroneously appear to be NULL.

guy@sun.uucp (Guy Harris) (08/31/85)

> Not really a bite, but I remember when I was first learning C
> I was quite bewildered by the fact that you couldn't really
> declare your own 'argv', that is, you couldn't declare an
> array of pointers to fixed length buffers except perhaps by:
> 
> char *myargv[] = {
> 	"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 	"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 
> 	etc
> 
> I mean, argv seemed kinda holy to me, disturbing.

If you want an array of pointers to fixed-length buffers, you can declare it
as long as the number of such pointers can be determined at the time you
write the code.

	char bufs[3][20];

	char *bufps[3] = {
		bufs[0],
		bufs[1],
		bufs[2],
	};

If the number can't be fixed when you write the code, you can set up "bufps"
at run time.

Also note that "argv" isn't a pointer to an array of pointers to fixed-length
buffers, it's a pointer to an array of pointers to strings, which you *can*
declare.

> P.S. I know argv is var length, but that would be even harder to declare!

The secret is that "argv" (or, more correctly, what "argv" points to)
*isn't* declared.  Pointers need not point to things which have been
declared; "malloc" returns pointers to objects fabricated on the fly.  If
you have "n" arguments ("n" is a variable here), just do

	register char **argv;

	argv = (char **)malloc(n * sizeof(char *));

And you can fill them in.
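
Spelled out a little further, as a sketch only: the names make_vector
and free_vector are invented here, each slot gets its own writable
buffer, and one extra slot holds a terminating null pointer the way
argv does.

```c
#include <stdlib.h>

/* Build an argv-style vector of n writable buffers of buflen bytes each,
   followed by a terminating null pointer.  Returns 0 on failure. */
char **make_vector(int n, size_t buflen)
{
	char **v;
	int i;

	if (n < 0 || buflen == 0)
		return 0;
	v = (char **)malloc((n + 1) * sizeof(char *));
	if (v == 0)
		return 0;
	for (i = 0; i < n; i++) {
		v[i] = (char *)malloc(buflen);
		if (v[i] == 0) {
			while (i > 0)		/* undo the partial build */
				free(v[--i]);
			free(v);
			return 0;
		}
		v[i][0] = '\0';			/* start as an empty string */
	}
	v[n] = 0;				/* terminator, as in argv */
	return v;
}

/* Release a vector built by make_vector. */
void free_vector(char **v)
{
	char **p;

	if (v == 0)
		return;
	for (p = v; *p != 0; p++)
		free(*p);
	free(v);
}
```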

	Guy Harris

gostas@kuling.UUCP (G|sta Simil{/ml) (09/01/85)

In article <2702@sun.uucp> guy@sun.uucp (Guy Harris) writes:

>Procedure calls, however, are not such a context, so the
>two procedure calls in
>
>	bar(0);
>	bar((struct frobozz *)0);
>
>are very definitely *not* equivalent.  In ANSI Standard C, there is a syntax
>to specify that "bar" takes an argument of type "struct frobozz *"; if you
>declared "bar" in such a manner, the two procedure calls would be equivalent.
>
>	Guy Harris

Is it also possible to give a NULL-pointer to a procedure as a parameter,
if for example the procedure would return several values, and we are not
interested in all of them?

wait(0) works at least here (4.2BSD), but something like this does not:

skip(fd, n)	/* skip n bytes on streams that don't allow lseek() */
int fd, n;
{
	(void)read(fd, 0, n);
}

		G|sta Simil{		gostas@kuling.UUCP

lee@eel.UUCP (09/02/85)

	One final, subtle, point.  K&R does not guarantee that the *value* 0
	is distinguishable from all other pointers, but rather, that the
	*constant* 0 is.  That is to say, you may compare against 0 to
	determine the validity of a pointer (or assign to guarantee
	invalidity), but you may not assume that comparison against (or
	assignment of) an int variable whose value is 0 will have the same
	result.  This picky distinction probably doesn't affect any of the
	better known chips, but might be important on a machine where a null
	pointer is not a bit string of 0s.

While the quotation is true, I think that it refers to the automatic
coercion that is required to give the constant 0 the proper distinguishable
pattern in the appropriate pointer type.  I think we all fairly assume that

	char *p=0, *q="a";
	main() {if (p==q) printf("bogus");}

will fail to print because one of the pointers has been assigned the
constant 0 and one has been assigned a pointer to a real object.  Therefore
the value 0 does persist after assignment to any pointer type and is
distinguishable from the values in other pointers as well.  And two such
pointers to the same type both of which have been assigned the value 0
will compare equal.

I don't see why the restriction applies to non-pointer variables.  As long as
type coercions are explicit, this should apply to all values of zero, whether
encountered as a literal in the program or as the value of a variable of
integral type.

I think it is not unreasonable, tho it is certainly not covered anywhere,
that coercions between pointers of different types should map the 0 value
properly so that, for example,

	int *p=0;
	char *q=0;
	main() {if (p==(int *)q) printf("this is right");}

should produce output.  We all know that 0 cannot be interpreted as a
pointer without knowing what it is a pointer to, but given that we know the
types of the pointers involved, the "I don't point to anything" values
should be considered equivalent in assignments and comparisons.

guy@sun.uucp (Guy Harris) (09/02/85)

> >>All valid implementations of C guarantee (that a null pointer doesn't
> >>point to anything valid). ... (Yes, there have been (invalid)
> >>implementations; one well-known chip maker's first UNIX release didn't
> >>put the necessary shim at data location 0 on a separate I&D space program.
> 
> >Speaking of issues that have been beaten to death!  K&R says only that the
> >value 0 is distinguishable from pointers that point to objects, and that
> >therefore the value zero is not a "valid" pointer.  It certainly does not
> >say that the 0 pointer will give you the "null" or empty value of any
> >object ...
>
> I think the "necessary shim" referred to in the first note quoted
> above has nothing to do with a value intended to ensure that *(int*)0
> give a defined value.

Yes, that is exactly what I was referring to.  Ideally, if possible,
location zero should literally have nothing there - i.e., your program
should get a segmentation violation if it tries to use the contents of
location 0.  (This hits errant programs upside the head at a nice early
stage in their lives.)  If not, however, you must ensure that it doesn't
have any code or data there - you have to stick a shim in there to prevent
this on separate I&D systems (the startup code acts as a shim in most
non-separate I&D systems).

	Guy Harris

peter@graffiti.UUCP (Peter da Silva) (09/03/85)

> > Not really a bite, but I remember when I was first learning C
> > I was quite bewildered by the fact that you couldn't really
> > declare your own 'argv', that is, you couldn't declare an
> > array of pointers to fixed length buffers except perhaps by:
> > 
> > char *myargv[] = {
> > 	"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> > 	"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> > 

What are you talking about?

char *myargv[5] = { "/bin/sh", "sh", "-c", "echo 'well it worked'", NULL };

What's so holy about this?

guy@sun.uucp (Guy Harris) (09/04/85)

> 	One final, subtle, point.  K&R does not guarantee that the *value* 0
> 	is distinguishable from all other pointers, but rather, that the
> 	*constant* 0 is.  That is to say, you may compare against 0 to
> 	determine the validity of a pointer (or assign to guarantee
> 	invalidity), but you may not assume that comparison against (or
> 	assignment of) an int variable whose value is 0 will have the same
> 	result.
> 
> I don't see why the restriction applies to non-pointer variables.  As long
> as type coercions are explicit, this should apply to all values of zero,
> whether encountered as a literal in the program or as the value of a
> variable of integral type.

("Oh no, Mabel!  Here comes another K&R quote!")

7.7 Equality operators

	A pointer may be compared to an integer, but the result is
	machine dependent unless the integer is *the constant* 0.
	(Italics mine)

7.13 Conditional operator

	...otherwise, one must be a pointer and the other *the constant*
	0, and the result has the type of the pointer.  (Italics mine)

7.14 Assignment operators

	...However, it is guaranteed that assignment of *the constant*
	0 to a pointer will produce a null pointer distinguishable
	from a pointer to any object.  (Italics mine)

I'd say the intent of K and R was pretty clear here, wouldn't you?

As for "why" - think of a machine where a null pointer *didn't* have the
same bit pattern as the integer 0.  Every time you assigned an integer to a
pointer, you'd have to check whether the integer was zero or not and assign
a null pointer instead (unless the computation you had to do to convert an
integer to a pointer did this anyway).  Why penalize those assignments
solely to make assigning a 0 other than a constant 0 set the pointer to a
null pointer?
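
In code, the portable forms are the ones that use the constant.  A
small sketch, with function names invented for the example:

```c
/* Portable: the 0 here is the constant 0 in a pointer context,
   so the compiler compares p against a null pointer. */
int is_null(const char *p)
{
	return p == 0;
}

/* Portable: assigning the constant 0 produces a null pointer. */
const char *null_string(void)
{
	const char *p = 0;

	return p;
}

/* NOT guaranteed by K&R, and so left as a comment:
 *
 *	int zero = 0;
 *	p = (char *)zero;
 *
 * This converts an int *variable* and need not yield a null pointer
 * on a machine whose null pointer is not all zero bits.
 */
```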

	Guy Harris

henry@utzoo.UUCP (Henry Spencer) (09/06/85)

> Is it also possible to give a NULL-pointer to a procedure as a parameter,
> if for example the procedure would return several values, and we are not
> interested in all of them?
> 
> wait(0) works at least here (4.2BSD), but something like this does not:
> ...
> 	(void)read(fd, 0, n);

Passing NULL only works if the function is prepared for the possibility
and explicitly checks for it.  wait() does; read() does not.  See the
documentation.  By the way, that should be "wait( (int *)0 )", to make
sure the type is right; the Unix documentation is often sloppy about this
particular detail, since it originated on machines where the sloppiness
didn't cause any problems.
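
A function that wants to allow this has to be written for it.  Here is
a sketch of the pattern using a made-up routine, divide(), which is
not a library function; it assumes b is nonzero.

```c
/* Returns a / b.  The remainder is stored through rem only if the
   caller supplied somewhere to put it. */
int divide(int a, int b, int *rem)
{
	if (rem != 0)		/* caller may pass (int *)0 to say "don't care" */
		*rem = a % b;
	return a / b;
}
```

A caller who doesn't want the remainder writes
divide(17, 5, (int *)0), with the cast, for the same reason as above.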
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

mouse@mcgill-vision.UUCP (der Mouse) (09/07/85)

>> ... so the two procedure calls in
>>	bar(0);
>>	bar((struct frobozz *)0);
>>are very definitely *not* equivalent.
>>	Guy Harris

>Is it also possible to give a NULL-pointer to a procedure as a parameter,
>if for example the procedure would return several values, and we are not
>interested in all of them?
>
>wait(0) works at least here (4.2BSD), but something like this does not:
>
>skip(fd, n)	/* skip n bytes on streams that don't allow lseek() */
>int fd, n;
>{
>	(void)read(fd, 0, n);
>}

     Let me be the 18th of  69 netters (we are a  leaf node so there's a
k-day delay, for some small integer k, between you and us)  to point out
that....

     Wait(0) works  because  the wait code *specifically* checks for a 0
argument.  I believe the code reads something like

	wait(stpointer)
	struct status *stpointer;
	{
	 ....
	 if (stpointer)
	  { *stpointer = ststruct;
	  }
	 ....
	}

     A lot of code (sigvec, for instance) works this way.  However,  for
calls like read(), where the lack  of interest  is a *very*  exceptional
case,  this  check  is  omitted.  Some machines, notably the 68K family,
will  catch  a  zero pointer  because  there's  no memory there.   Some,
notably VAXen,  will not.  However, for  syscalls involving writing into
memory, for most (-z format, see ld(1)) executable files,  attempting to
write into  address 0 will  fault (syscalls return EFAULT, user code get
SEGV errors).

     Read(fd,0,n)  *should* give you  a memory error  (EFAULT returned).
The only case I know  of in which it won't is when  you are running on a
VAX, so there is memory  at address 0  (it's usually the C startup  code
from crt0.o), and the executable file is in the old old old format which
doesn't do sharing of  text segments, so the text segment  is writeable.
In this case, read will happily overwrite the first n bytes  of the text
segment.   Normally (because that *is* the crt0 code, which  doesn't get
reentered), you won't notice unless n is big.
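
     What the skip routine presumably wanted is to read into a scratch
buffer and throw the data away.  A rough sketch, with the name
skip_bytes and the buffer size chosen arbitrarily, and with no attempt
to retry interrupted reads:

```c
#include <unistd.h>

/* Read and discard up to n bytes from fd.
   Returns the number of bytes skipped (less than n at end of file),
   or -1 if the first read fails. */
long skip_bytes(int fd, long n)
{
	char scratch[512];
	long done = 0;

	while (done < n) {
		long want = n - done;
		long got;

		if (want > (long)sizeof scratch)
			want = (long)sizeof scratch;
		got = (long)read(fd, scratch, (size_t)want);
		if (got < 0)
			return done > 0 ? done : -1;
		if (got == 0)
			break;			/* end of file */
		done += got;
	}
	return done;
}
```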
-- 
					der Mouse

{ihnp4,decvax,akgua,etc}!utcsri!mcgill-vision!mouse
philabs!micomvax!musocs!mcgill-vision!mouse

Hacker: One responsible for destroying /
Wizard: One responsible for recovering it afterward

henry@utzoo.UUCP (Henry Spencer) (09/08/85)

> Exactly, but also consider what K&R says in section 7.14:
> 
>     The compilers currently allow a pointer to be assigned to an integer, an
>     integer to a pointer, and a pointer to a pointer of another type.  The
>     assignment is a pure copy operation, with no conversion.

Note that they do not say that this is a legitimate feature of the language!
All they say is that the current compilers will let you get away with it.
This is no longer generally true, by the way.  K&R is quite old.

> Also, in section 14.4:
> 
>     A pointer may be converted to any of the integral types large enough to
>     hold it. [...]  The mapping function is also machine dependent, but is
>     intended to be unsurprising to those who know the addressing structure
>     of the machine.
>
> Although this does not seal it up completely, it seems that K&R had it in
> mind that putting pointers into integers (and taking them back again) would
> have no overhead....

True, but there is a subtle point here:  they say you can convert pointers
to (sufficiently large) integers, they may say that you can convert the
result back, but they don't say what the integer will look like.  A NULL
pointer will not necessarily show up as an integer zero.  The equality
between NULL pointers and 0 works only when 0 is a literal constant, in
which case it is (potentially) treated specially by the compiler when
encountered in a "pointer" context.  The conversion of literal 0 to the
NULL pointer is *not* an instance of the general "putting pointers into
integers (and taking them back again)" conversion.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

darryl%ism780.uucp@BRL.ARPA (09/08/85)

>>                                  K&R does not guarantee that the *value* 0
>>       is distinguishable from all other pointers, but rather, that the
>>       *constant* 0 is.
>>
>> I don't see why the restriction applies to non-pointer variables.  As long
>> as type coercions are explicit, this should apply to all values of zero,
>> whether encountered as a literal in the program or as the value of a
>> variable of integral type.
>
>As for "why" - think of a machine where a null pointer *didn't* have the
>same bit pattern as the integer 0.  Every time you assigned an integer to a
>pointer, you'd have to check whether the integer was zero or not and assign
>a null pointer instead (unless the computation you had to do to convert an
>integer to a pointer did this anyway).

Exactly, but also consider what K&R says in section 7.14:

    The compilers currently allow a pointer to be assigned to an integer, an
    integer to a pointer, and a pointer to a pointer of another type.  The
    assignment is a pure copy operation, with no conversion.

Also, in section 14.4:

    A pointer may be converted to any of the integral types large enough to
    hold it. [...]  The mapping function is also machine dependent, but is
    intended to be unsurprising to those who know the addressing structure
    of the machine.

Although this does not seal it up completely, it seems that K&R had it in
mind that putting pointers into integers (and taking them back again) would
have no overhead.  Checking for a 0 *value* probably is more overhead than
they had in mind.

	    --Darryl Richman, INTERACTIVE Systems Corp.
	    ...!cca!ima!ism780!darryl
	    The views expressed above are my opinions only.

darryl@ISM780.UUCP (09/10/85)

>> Although this does not seal it up completely, it seems that K&R had it in
>> mind that putting pointers into integers (and taking them back again) would
>> have no overhead....
>
>True, but there is a subtle point here:  they say you can convert pointers
>to (sufficiently large) integers, they may say that you can convert the
>result back, but they don't say what the integer will look like.

Henry, you and I are NOT arguing;  I agree that the implicit conversion of
0 to a null pointer only happens for constant 0s.  Perhaps I was less than
completely clear, but I wanted to be sure (hah!) that the netters would
understand that 0 and an int variable containing the value 0 are (may be)
treated differently here.

	    --Darryl Richman, INTERACTIVE Systems Corp.
	    ...!cca!ima!ism780!darryl
	    The views expressed above are my opinions only.

peterc@ecr2.UUCP (Peter Curran) (09/11/85)

Although the topic of Null pointers has been beaten to death many times,
there is one point that I have never seen discussed.

External variables are to be initialized to 0, according to the C Reference
Manual (I don't have a copy of K&R handy, but I'm pretty sure it says the
same thing.)  This means that integers get 0, and pointers get NULL.
(I don't know what is supposed to happen to variables for which 0 is not
valid - what really happens is they get 0 anyhow, of course).  Since this
includes all variables not explicitly initialized, it includes unions.
It is hard to imagine an implementation of this that allows a block of
memory representing simultaneously one or more pointers and one or more
integers to be initialized correctly unless the bit pattern for a null
pointer is identical to the bit pattern for a 0 integer of the same size
(assuming one exists - otherwise concatenations of integers, or whatever
else is required).

I can think of at least two ways it could happen.  First, I believe a
compiler is free to treat 'union' as equivalent to 'struct' - i.e. ignore
the intended overlaying of memory.  It could then initialize the two
sets of variables entirely independently.  Second, I can imagine some
form of tagged memory architecture in which the tags are only used in
conjunction with instructions that use the memory as an address, so the
non-tag (i.e. the integer) is zero, but the entire location (including the
tag) is non-zero.  I don't know enough about tagged memory architectures
to pursue this very far, but it seems too complex to be really credible.

Therefore, unless you accept a brain-damaged compiler that treats 'union'
as equivalent to 'struct,' it seems hard to avoid the conclusion that
C requires that the bit-pattern for a "null" pointer be identical to the
bit pattern of "(int) 0" (except possibly in length).

gwyn@BRL.ARPA (VLD/VMB) (09/19/85)

The last X3J11 draft that I have a copy of states that objects with
static storage duration that are not initialized explicitly are
initialized implicitly as if every scalar member were assigned the
integer constant 0.

This does not imply anything about bit patterns for null pointers.
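
In other words, the guarantee is about values, member by member.  A
small sketch of what a program may rely on; the struct and the names
are invented for illustration:

```c
struct entry {
	struct entry *next;
	int count;
	double weight;
};

static struct entry table_head;		/* no explicit initializer */

/* Returns 1 if every scalar member compares equal to zero, as if each
   had been assigned the constant 0.  Nothing here inspects the bits. */
int head_starts_empty(void)
{
	return table_head.next == 0
	    && table_head.count == 0
	    && table_head.weight == 0.0;
}
```

The comparison "table_head.next == 0" asks whether the pointer is
null, whatever bit pattern the implementation uses for that.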