[comp.lang.c] "do ... while

ccdn@levels.sait.edu.au (DAVID NEWALL) (08/07/89)

I want to scan a string with fields separated by commas.  To do this, I
wrote the following:
                do
                   ...
                while ((s = strchr(s, ',') + 1) - 1);

I've been told that this is not valid C because, in the case that there
are no more fields (commas), strchr() returns NULL; and NULL + 1 is not
valid.

Comments, anyone?


David Newall                     Phone:  +61 8 343 3160
Unix Systems Programmer          Fax:    +61 8 349 6939
Academic Computing Service       E-mail: ccdn@levels.sait.oz.au
SA Institute of Technology       Post:   The Levels, South Australia, 5095

cpcahil@virtech.UUCP (Conor P. Cahill) (08/08/89)

In article <1043@levels.sait.edu.au>, ccdn@levels.sait.edu.au (DAVID NEWALL) writes:
> 
> I've been told that this is not valid C because, in the case that there
> are no more fields (commas), strchr() returns NULL; and NULL + 1 is not
> valid.

NULL, in this case, is just a pointer that has the value 0.  NULL + 1 is a valid
operation; however, *(NULL+1) is not.

I wouldn't code the loop as you have displayed because one has to
spend time thinking about what the code is trying to do.  A "better" method
would be something like:

		do {
		   ...;
		   s = strchr(s,',');
		} while ( s++ != NULL );

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/09/89)

In article <1043@levels.sait.edu.au> ccdn@levels.sait.edu.au (DAVID NEWALL) writes:
-I've been told that this is not valid C because, in the case that there
-are no more fields (commas), strchr() returns NULL; and NULL + 1 is not
-valid.
-Comments, anyone?

You were told right.  You're not allowed to perform pointer arithmetic
involving null pointers.
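
One way to keep the shape of the original loop without ever doing
arithmetic on a null pointer is to test the result of strchr() before
stepping past the comma.  A minimal sketch, not the poster's code: the
helper name count_fields is hypothetical, and its counter merely stands
in for whatever the "..." loop body does.

```c
#include <stddef.h>
#include <string.h>

/* Count the comma-separated fields in s (hypothetical helper; the
 * counter stands in for the "..." loop body of the original post). */
size_t count_fields(const char *s)
{
    size_t n = 0;
    const char *comma;

    do {
        n++;                        /* process the field starting at s */
        comma = strchr(s, ',');     /* null pointer when no comma is left */
        if (comma != NULL)
            s = comma + 1;          /* add 1 only to a valid pointer */
    } while (comma != NULL);

    return n;
}
```

The increment happens only after the null test, so the loop never
depends on what a null pointer plus one might be.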

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/09/89)

In article <961@virtech.UUCP> cpcahil@virtech.UUCP (Conor P. Cahill) writes:
>NULL + 1 is a valid operation, ...

No!

wjr@ftp.COM (Bill Rust) (08/09/89)

In article <10684@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <961@virtech.UUCP> cpcahil@virtech.UUCP (Conor P. Cahill) writes:
>>NULL + 1 is a valid operation, ...
>
>No!


In my experience, NULL is always defined using the preprocessor line
"#define NULL 0" (or 0L). Since the while construct is relying on the
fact NULL is, in fact, 0, doing NULL + 1 - 1 is ok. I certainly wouldn't
recommend using it as a reference to memory. But, unless NULL is a 
reserved word to your compiler, the compiler sees 0 + 1 - 1 and that is
ok.

Bill Rust (wjr@ftp.com)

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/10/89)

In article <696@ftp.COM> wjr@ftp.UUCP (Bill Rust) writes:
-In article <10684@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
->In article <961@virtech.UUCP> cpcahil@virtech.UUCP (Conor P. Cahill) writes:
->>NULL + 1 is a valid operation, ...
->No!
-In my experience, NULL is always defined using the preprocessor line
-"#define NULL 0" (or 0L).

That's not always true, but anyway it's irrelevant...

-Since the while construct is relying on the fact NULL is, in fact, 0,
-doing NULL + 1 - 1 is ok.

The code example was adding 1 to the return value from strchr().
strchr() does not return a preprocessor macro; it returns a null macro
(when it doesn't return a pointer to a valid char object).  You are not
allowed to add 1 to a null pointer.  If you happen to get away with it,
you're just lucky; it's not correct code.

In any event, if you rely on NULL being defined (for example in <stdio.h>)
as the source character string "0", then you're asking for trouble, since
it can be defined as any valid form of null pointer constant, including
for example "((void*)0)".  Indeed, it's rather expected that standard-
conforming implementations are more likely to choose the latter form.
Your program may suddenly stop working when a new release of the compiler
is installed, or when you port it to another environment.

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/10/89)

In article <10691@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>strchr() does not return a preprocessor macro; it returns a null macro

That was supposed to say:

strchr() does not return a preprocessor macro; it returns a null pointer ...

Our stupid news system software wouldn't let me cancel the article so
that I could send a corrected version.  I hope that this slip-up didn't
cause anybody too much confusion.

strchr(/*...*/)+1 is WRONG when strchr() returns a null pointer.

chris@mimsy.UUCP (Chris Torek) (08/10/89)

In article <696@ftp.COM> wjr@ftp.COM (Bill Rust) writes:
>In my experience, NULL is always defined using the preprocessor line
>"#define NULL 0" (or 0L).

NULL may correctly (by the pANS) be defined as `(void *)0'.

>Since the while construct is relying on the fact NULL is, in fact, 0,
>doing NULL + 1 - 1 is ok.

It is *if* two conditions hold:

	0. NULL is `#define'd as an integral constant zero
	   rather than (void *)0, and
	1. the loop actually reads `while (NULL + 1 - 1)'.

The latter did not hold in the original example, which was

	do ... while ((s = index(s, ',') + 1) - 1);

The result of

	<expression yielding non nil character pointer> + 1

is a pointer to the character `beyond the one returned', so that

	s = index("foo, bar", ',') + 1

winds up making s point to the space in "foo, bar"; but the
result of

	<expression yielding nil character pointer> + 1

is not defined.%  On many machines it `just happens' to give the
address of byte number 1 in the machine; loading this into a machine
pointer register (e.g., for assignment to s) may cause a runtime trap.
In any case, its being undefined gives the system license to do
arbitrarily annoying things at this point.  The `-1' after this
is thus irrelevant: like Humpty Dumpty, once a pointer is broken,
not all the King's horses nor all the King's persons%% can put it
back together again.
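
The defined case is easy to check directly.  A small sketch, assuming
strchr() (the ANSI name for index()); the function name after_comma is
hypothetical and only illustrates doing the addition after the nil test.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the character after the first comma in s,
 * or a null pointer if s contains no comma (hypothetical helper). */
const char *after_comma(const char *s)
{
    const char *p = strchr(s, ',');
    return p != NULL ? p + 1 : NULL;   /* never add 1 to a null pointer */
}
```

For "foo, bar" the result points to the space after the comma; for a
string with no comma the nil pointer is passed through untouched.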
-----
% So *that* is how you get a butterfly! :-)
%% non-sexist noun :-) [too bad about `King']
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chad@lakesys.UUCP (D. Chadwick Gibbons) (08/10/89)

In article <696@ftp.COM> wjr@ftp.UUCP (Bill Rust) writes:
|In my experience, NULL is always defined using the preprocessor line
|"#define NULL 0" (or 0L). Since the while construct is relying on the
|fact NULL is, in fact, 0, doing NULL + 1 - 1 is ok. I certainly wouldn't
|recommend using it as a reference to memory. But, unless NULL is a 
|reserved word to your compiler, the compiler sees 0 + 1 - 1 and that is
|ok.

	Bad assumption.  In most decent implementations, NULL is indeed
defined as either 0 or 0L.  But this can't be, and isn't, true in all
implementations, which immediately prohibits use of it, if for nothing else
than portability reasons.  In many current implementations, NULL is often
defined as ((char *)0) since it is the only "safe" thing to do--since many
programmers are not safe.

	As defined by K&R2, NULL is an expression "with value 0, or such
cast to type void *." (A6.6, p.198) This allows implementations to define NULL
as (void *)0, which would cause your NULL +1 -1 to fail.

	In spite of all that, why the hell would you want to use something
designed to designate a _nil pointer_ as an integer expression?! All of the
above is moot; NULL should not be used as an integer in an integer expression!
If using a symbolic name is that important to you, do the ASCII thing:

#define NUL	(0)

Or perhaps something a little more readable:

#define ZERO	(0)

(Don't you feel sorry for those who don't know what a "0" means when they see
it inside code?  I know I sure do.)
-- 
D. Chadwick Gibbons, chad@lakesys.lakesys.com, ...!uunet!marque!lakesys!chad

wolfgang@ruso.UUCP (Wolfgang Deifel) (08/10/89)

ccdn@levels.sait.edu.au (DAVID NEWALL) writes:

>                do
>                   ...
>                while ((s = strchr(s, ',') + 1) - 1);

>I've been told that this is not valid C because, in the case that there
>are no more fields (commas), strchr() returns NULL; and NULL + 1 is not
>valid.

Why should NULL + 1 not be valid ??? NULL is a pointer with the value 0
and you can add the integer 1 to it ( but you cannot access *s in the case
strchr is NULL of course ).

    Wolfgang.

rae98@wash08.UUCP (Robert A. Earl) (08/10/89)

In article <696@ftp.COM> wjr@ftp.UUCP (Bill Rust) writes:
>In article <10684@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>In article <961@virtech.UUCP> cpcahil@virtech.UUCP (Conor P. Cahill) writes:
>>>NULL + 1 is a valid operation, ...
>>No!
>In my experience, NULL is always defined using the preprocessor line
>"#define NULL 0" (or 0L). Since the while construct is relying on the
>fact NULL is, in fact, 0, doing NULL + 1 - 1 is ok.
>Bill Rust (wjr@ftp.com)


I have to disagree with Bill here.  The NULL being
returned was from a string manipulation function...ie
not just a NULL but a (char *) NULL....I believe it is
illegal (or at least unportable) to add (char *)NULL + 1.


-- 
===========================================================
Name:	Bob Earl		Phone:	(202) 872-6018 (wk)
UUCP:	...!uunet!wash08!rae98
BITNET:	...rae98@CAS	(At least, that is what I'm told)

chris@mimsy.UUCP (Chris Torek) (08/11/89)

In article <940@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
>... In most decent implementations, NULL is indeed defined as either
>0 or 0L.

Right.

>But this can't be, and isn't, true in all implementations,

No and yes: it could be, but it is not.

>... In many current implementations, NULL is often defined as ((char *)0)
>since it is the only "safe" thing to do [meaning `the only way the vendor
>can keep the authors of bad code happy'].

This is both unsafe and wrong, even if it does keep such authors happy.
Consider:  If we write

	char *cp;
	int *ip;

	ip = cp;

the compiler must issue some kind of diagnostic (it says so in the
proposed ANSI C specification, and it says in K&R-1 that this operation
is machine-dependent, and all quality compilers do indeed generate a
warning).  This situation does not change if we write

	ip = (char *)ip;

It does change if we write instead

	ip = (int *)(char *)ip;

which puts the value in ip through two transformations (from
pointer-to-int to pointer-to-char, then from pointer-to-char to
pointer-to-int), and these two together are required to reproduce
the original value (this is something of a special case).
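
The round-trip guarantee can be sketched as below.  The function name
round_trip is hypothetical; the point is only that both conversions are
written as explicit casts and that the pointer started life as a
pointer-to-int.

```c
/* Convert a pointer-to-int to pointer-to-char and back again.
 * The round trip is required to reproduce the original pointer. */
int *round_trip(int *ip)
{
    char *cp = (char *)ip;   /* pointer-to-int -> pointer-to-char */
    return (int *)cp;        /* pointer-to-char -> pointer-to-int */
}
```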

So: consider what happens if some implementer has wrongly put the line

	#define NULL ((char *)0)

in <stdio.h> and <stddef.h> and so forth, and we write

	ip = NULL;

The compiler sees

	ip = ((char *)0);

which, as far as the type system is concerned, is identical to

	ip = cp;

---that is, it is machine dependent, and requires a warning.  We can
(probably) eliminate the warning% by adding a cast:

	ip = (int *)NULL;

which expands to

	ip = (int *)((char *)0);

On *most* machines, this `just happens' to work.  But if we look very
closely at the language definition, we find that it is not *required*
to work.  The version of this that is required to work is instead

	ip = (int *)(char *)(int *)0;

We are not allowed (outside of machine-dependent code) to change a
pointer-to-char into a pointer-to-int unless the pointer-to-char itself
came into existence as the result of a cast from a pointer-to-int.  The
only way to *create* a nil-pointer-to-int in the first place is to
write (int *)0, or (in the proposed ANSI C) (int *)(void *)0.

Of course, the actual definition of NULL in <stdio.h> and <stddef.h>
and so on is provided per machine, so if

	int *ip = (int *)(char *)0;

`just happens' to work on that machine, the vendor could get away
with it.  But

	int *ip = NULL;

is guaranteed to work *without* generating warnings on any machine
where NULL is correctly defined, and one should not have to write

	int *ip = (int *)NULL;

just to avoid getting warnings---nor should the compiler be silent
about code like

	int *ip; char *cp; ip = cp;

The rest of <940@lakesys.UUCP> is correct.
-----
% The (probably) in eliminating warnings refers to the fact that
  a compiler can warn about anything it pleases:

	% cc -o foo foo.c
	cc: Warning: relative humidity and barometer pressure
		indicate that thunderstorms are likely
	cc: Warning: your shoelace is untied
	cc: Warning: this code looks ugly
	cc: Warning: your mother wears army boots
	cc: Warning: Hey!  Keep away from me with that axe!
	cc: Warning: Ack!  No, wait, I di(*&1to01llk

chris@mimsy.UUCP (Chris Torek) (08/11/89)

>>                while ((s = strchr(s, ',') + 1) - 1)

In article <826@ruso.UUCP> wolfgang@ruso.UUCP (Wolfgang Deifel) writes:
>Why should NULL + 1 not be valid ??? NULL is a pointer with the value 0
>and you can add the integer 1 to it ....

NULL is not a pointer with the value 0, and 1 is not being added to
NULL here, but rather to a nil-pointer-to-char in the case in question.

NULL is a preprocessor macro; it expands to either an integral constant
zero (whose type is one of the integral types, e.g., int or short or long,
and whose value is zero) or to such a value cast to pointer-to-void
(whose type is pointer-to-void and whose value is unknowable).

A nil-pointer-to-char has type pointer-to-char and an ineffable value.
There is no way to talk about its value other than to say `it is a
nil pointer to char'.  In particular, you cannot say what happens
when you add one to it.

dfp@cbnewsl.ATT.COM (david.f.prosser) (08/12/89)

In article <18996@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>
>	ip = (int *)((char *)0);
>
>On *most* machines, this `just happens' to work.  But if we look very
>closely at the language definition, we find that it is not *required*
>to work.  The version of this that is required to work is instead
>
>	ip = (int *)(char *)(int *)0;
>
>We are not allowed (outside of machine-dependent code) to change a
>pointer-to-char into a pointer-to-int unless the pointer-to-char itself
>came into existence as the result of a cast from a pointer-to-int.  The
>only way to *create* a nil-pointer-to-int in the first place is to
>write (int *)0, or (in the proposed ANSI C) (int *)(void *)0.

The pANS does guarantee that, for example,

	0 == (void *)(int *)(char *)0

[3.2.2.3: "Two null pointers, converted through possibly different sequences
of casts to pointer types, shall compare equal."]

Therefore, I interpret the pANS as requiring

	(int *)(char *)0

to have the same value as

	(int *)0

(the nil-pointer-to-int, in your terminology)--not `just happening' to work.
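
That reading can be written down as a check.  A sketch only, with a
hypothetical function name; it assumes nothing beyond the comparison
rule quoted above.

```c
#include <stddef.h>

/* Return nonzero if a null pointer taken through char * compares
 * equal to one created directly as int *. */
int null_casts_agree(void)
{
    int *direct = (int *)0;
    int *via_char = (int *)(char *)0;
    return direct == via_char && via_char == NULL;
}
```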

Dave Prosser	...not an official X3J11 answer...

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/12/89)

In article <18996@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>	cc: Warning: this code looks ugly

I've seen this one (actually I think it was "expression too complex")

>	cc: Warning: your mother wears army boots

You must be using our compiler..

bph@buengc.BU.EDU (Blair P. Houghton) (08/12/89)

In article <10709@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <18996@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>	cc: Warning: this code looks ugly
>
>I've seen this one (actually I think it was "expression too complex")
>
>>	cc: Warning: your mother wears army boots
>
>You must be using our compiler..

I got this one the other day:

	cc: Warning: this is a riot; posting to comp.lang.c

				--Blair
				  "Y'see, I was trying to
				   add two NULL pointers
				   together, and..."

bengsig@oracle.nl (Bjorn Engsig) (08/14/89)

Article <3726@buengc.BU.EDU> by bph@buengc.bu.edu (Blair P. Houghton) says:
|				--Blair
|				  "Y'see, I was trying to
|				   add two NULL pointers
|				   together, and..."
Can you add two pointers ?

[ No replies or followups please :-) ]
-- 
Bjorn Engsig, ORACLE Europe         \ /    "Hofstadter's Law:  It always takes
Path:   mcvax!orcenl!bengsig         X      longer than you expect, even if you
Domain: bengsig@oracle.nl           / \     take into account Hofstadter's Law"

wolfgang@ruso.UUCP (Wolfgang Deifel) (08/15/89)

ccdn@levels.sait.edu.au (DAVID NEWALL) writes:

>                while ((s = strchr(s, ',') + 1) - 1);

>I've been told that this is not valid C because, in the case that there
>are no more fields (commas), strchr() returns NULL; and NULL + 1 is not
>valid.

I think it's a difference if you write " NULL + 1 " ( which is non-
portable C, NULL is a machine dependent macro ) and " strchr(...) + 1 ".
strchr() is a function that returns always a legal value. If strchr fails
it will return (char*)0 ( regardless of the machine or the compiler ),
and here it's legal to add '1' ( the result is (char*)1 ).

----------------------------------------------------------------------------
Wolfgang Deifel
Dr. Ruff Software GmbH, 5100 Aachen, Juelicherstr. 65-67, W-Germany
uucp: ...!uunet{!mcvax}!unido!rwthinf!ruso!wolfgang - phone : +49 241 156038

bph@buengc.BU.EDU (Blair P. Houghton) (08/15/89)

In article <474.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes:
>Article <3726@buengc.BU.EDU> by bph@buengc.bu.edu (Blair P. Houghton) says:
>>				  "Y'see, I was trying to
>>				   add two NULL pointers
>>				   together, and..."
>
>Can you add two pointers ?  [...]  :-)

Well, _I_ can.  :-)

				--Blair
				  "But not in this dump."

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/16/89)

In article <828@ruso.UUCP> wolfgang@ruso.UUCP (Wolfgang Deifel) writes:
>it will return (char*)0 ( regardless of the machine or the compiler ),
>and here it's legal to add '1' ( the result is (char*)1 ).

What is this, a time-shift phenomenon?  We keep getting a sprinkling
of articles making this incorrect claim.  It is NOT legal to perform
arithmetic on a null pointer.

chris@mimsy.UUCP (Chris Torek) (08/16/89)

In article <828@ruso.UUCP> wolfgang@ruso.UUCP (Wolfgang Deifel) writes:
>I think it's a difference if you write " NULL + 1 " ( which is non-
>portable C, NULL is a machine dependent macro ) and " strchr(...) + 1 ".

There is a difference.  However:

>strchr() is a function that returns always a legal value. If strchr fails
>it will return (char*)0 ( regardless of the machine or the compiler ),
>and here it's legal to add '1' ( the result is (char*)1 ).

this is what we just got finished saying is false: it is NOT legal to
add 1 to (char *)0; indeed, it is not legal to add 1 to any of the
various infinite varieties of nil pointer.

ruud@targon.UUCP (Ruud Harmsen) (08/16/89)

In article <18996@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>Consider:  If we write
>	char *cp;
>	int *ip;
>	ip = cp;
>
>the compiler must issue some kind of diagnostic (it says so in the
>proposed ANSI C specification, and it says in K&R-1 that this operation
>is machine-dependent, ...

I suppose this is machine-dependent because of alignment: char-pointers can
point to just about anywhere, but int-pointers on many machines have to be
aligned properly.  My question is: can I make sure in my program, that
though generally non-portable this IS portable?  I tried this once in the
following way:
The char-pointer gets its value from malloc, which the manual says gives
pointers properly aligned for any type.  I never change that char-pointer
other than by adding multiples of sizeof(int) to it.
Is a "ip = cp" guaranteed safe under these conditions, so can I ignore
the compiler-warning?

ccdn@levels.sait.edu.au (DAVID NEWALL) (08/17/89)

A while ago, ccdn@levels.sait.edu.au (That's me!) wrote:
>                 do
>                    ...
>                 while ((s = strchr(s, ',') + 1) - 1);
>
> I've been told that this is not valid C

Thanks, everyone, for your opinions.  I'll remember the rule in future:
(Offsets to NULL are non-portable, and should never be used).



dfp@cbnewsl.ATT.COM (david.f.prosser) (08/18/89)

In article <597@targon.UUCP> ruud@targon.UUCP (Ruud Harmsen) writes:
>I suppose this is machine-dependent because of alignment: char-pointers can
>point to just about anywhere, but int-pointers on many machines have to be
>aligned properly.  My question is: can I make sure in my program, that
>though generally non-portable this IS portable?  I tried this once in the
>following way:
>The char-pointer gets its value from malloc, which the manual says gives
>pointers properly aligned for any type.  I never change that char-pointer
>other than by adding multiples of sizeof(int) to it.
>Is a "ip = cp" guaranteed safe under these conditions, so can I ignore
>the compiler-warning?

Almost.  Strictly speaking, malloc must return a pointer to an object that
can be accessed by a type commensurate with its size in bytes.  For example,
``malloc(1)'' need not return a pointer that is appropriately aligned for a
pointer-to-int.

Moreover, it may well be possible to argue that unless the requested size
is a multiple of the size of an int, the returned pointer need not be
aligned appropriately for an int.  For example, ``malloc(5)''.

However, the rest of your conditions are sufficient for the guarantee of
correct behavior.
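
A minimal sketch of code that stays inside those conditions: the
request is a multiple of sizeof(int), the char pointer is only ever
offset in int-sized steps, and every conversion is an explicit cast.
The helper name make_ints is hypothetical.

```c
#include <stdlib.h>

/* Allocate room for n ints through a char pointer, then view the
 * storage as ints (hypothetical helper; the caller frees the result). */
int *make_ints(size_t n)
{
    char *cp = malloc(n * sizeof(int));   /* size is a multiple of sizeof(int) */
    int *ip;
    size_t i;

    if (cp == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        ip = (int *)(cp + i * sizeof(int));   /* offset in int-sized steps, explicit cast */
        *ip = (int)i;
    }
    return (int *)cp;
}
```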

Dave Prosser	...not an official X3J11 answer...

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/19/89)

In article <597@targon.UUCP> ruud@targon.UUCP (Ruud Harmsen) writes:
>Is a "ip = cp" guaranteed safe under these conditions, so can I ignore
>the compiler-warning?

If you use a cast it is.

casper@betty.fwi.uva.nl (Casper H.S. Dik) (08/20/89)

In article <597@targon.UUCP> ruud@targon.UUCP (Ruud Harmsen) writes:
>
>I suppose this is machine-dependent because of alignment: char-pointers can
>point to just about anywhere, but int-pointers on many machines have to be
>aligned properly.  My question is: can I make sure in my program, that
>though generally non-portable this IS portable?  I tried this once in the
>following way:
>The char-pointer gets its value from malloc, which the manual says gives
>pointers properly aligned for any type.  I never change that char-pointer
>other than by adding multiples of sizeof(int) to it.
>Is a "ip = cp" guaranteed safe under these conditions, so can I ignore
>the compiler-warning?

No. It is not safe. If you ever want to run your program on a Data General
MV, among others, you should use "ip = (int *) cp".

Since pointers to anything except char are word aligned on MV machines,
they decided that they could drop the last bit of the address and shift it.

A char pointer pointing to the start of the second word of memory is
represented with 0x2. A word pointer to the same location is represented by 0x1.

This gave problems when porting programs. Most programmers write 
"newp = (type *) malloc (sizeof (type))"

but many forget the cast to char * with free:

      "free(oldp)" instead of "free((char *) oldp)"

This works fine in most cases, but not on machines that shift pointers
when casting.

  --cd
Casper H.S. Dik				VCP/HIP: +31205922022
University of Amsterdam     |		casper@fwi.uva.nl
The Netherlands             |		casper%fwi.uva.nl@hp4nl.nluug.nl

ruud@targon.UUCP (Ruud Harmsen) (08/22/89)

In article <781@janus.UUCP> casper@fwi.uva.nl (Casper H.S. Dik) writes:
>> The char-pointer gets its value from malloc, and I never change that char-
>> pointer other than by adding multiples of sizeof(int) to it.  Is a "ip =
>> cp" guaranteed safe under these conditions, so can I ignore the compiler-
>> warning?
>No. It is not safe. If you ever want to run your program on a Data General
>MV, among others, you should use "ip = (int *) cp".

You're right, of course.  As a matter of fact, I did use the cast in my
program.  Sorry I didn't mention that in the original article.

	Ruud Harmsen