[net.lang.c] Cryptic C code?

bobc@tektools.UUCP (Bob Crane) (08/09/85)

I was looking through the book, _The C Programming Language_,
and came across something very disturbing.

In chapter 5, pg. 101 it says:

   As the final abbreviation, we again observe that a comparison 
   against \0 is redundant, so the function is often written as

      strcpy(s, t)  /* copy t to s; pointer version 3 */
      char *s, *t;
      {
	 while (*s++ = *t++)
	    ;
      }
   
   Although this may seem cryptic at first sight, the notational
   convenience is considerable, and the idiom should be mastered,
   if for no other reason than that you will see it frequently in
   C programs.

Yeaacch!!!!!!  It was still very cryptic to me the tenth time that I read
it!!!  A friend explained it to me by saying that the character in the
'while' expression is converted to an int and that the NULL character has
an ascii value of 0 so the test will exit when the NULL character is
encountered.

I have trouble believing that the above has advantages of great
speed OR readability over:

   strcpy(s,t)  /* copy t to s; pointer version 2 */
   char *s, *t;
   {
      while ((*s++ = *t++) != '\0')
	 ;
   }

Does anyone out there support the author by saying that Version 3 of
'strcpy' is better than Version 2?
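
For anyone who wants to try the two versions side by side, here is a minimal sketch. It assumes a compiler that accepts ANSI-style prototypes, and the names copy_v2, copy_v3 and check_copies are hypothetical, chosen only to avoid clashing with the library strcpy.

```c
#include <assert.h>
#include <string.h>

/* Version 2: explicit comparison against the string terminator. */
void copy_v2(char *s, const char *t)
{
    while ((*s++ = *t++) != '\0')
        ;
}

/* Version 3: the character just assigned is itself the loop condition.
   The doubled parentheses only quiet the "assignment used as a
   condition" warning that many compilers emit; the meaning is unchanged. */
void copy_v3(char *s, const char *t)
{
    while ((*s++ = *t++))
        ;
}

/* Both loops copy the terminating '\0' before they stop,
   so the destination is always a complete string. */
void check_copies(void)
{
    char a[8], b[8];

    copy_v2(a, "cryptic");
    copy_v3(b, "cryptic");
    assert(strcmp(a, "cryptic") == 0);
    assert(strcmp(a, b) == 0);
}
```

Calling check_copies() from a main() should pass silently if the two versions really do behave alike, as the book's text implies.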

Bob Crane
!tektronix!tektools!bobc
(503)627-5379

gwyn@BRL.ARPA (VLD/VMB) (08/11/85)

First, the (char) in the while(expression) is NOT converted to an (int)
in case 3; it is tested against zero directly.  In case 2 it is
converted to (int) for the comparison against '\0'.

I think case 2 is certainly more readable, but as the book says, you
need to learn to read things like case 3 since a lot of code is like
that.  More usually one will see something like
	char *s;
	...
	while ( *s++ )
		...
This really is a standard C idiom, although I don't recommend writing
code that way.  I personally prefer to distinguish between Boolean
expressions (such as comparisons) and arithmetic expressions, using
strictly Boolean expressions as conditions.  Thus:
	while ( *s++ != '\0' )
or even
	while ( (int)*s++ != '\0' )
The typecast is perhaps overly fussy; it is not required by the
language rules and may detract from readability.

Tests for NULL pointers and flags often are written
	if ( p )
		...
	if ( flag & BIT )
		...
rather than
	if ( p != NULL )
		...
	if ( (flag & BIT) != 0 )
		...
(I prefer the latter.)  Get used to it.
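
The pointer and flag tests above can be put next to each other in a small sketch. BIT is given the hypothetical value 0x04 here, and the function names are made up for illustration; nothing about them comes from the post.

```c
#include <assert.h>
#include <stddef.h>

#define BIT 0x04   /* hypothetical flag bit, chosen only for this sketch */

/* Implicit style: the pointer and the masked flag are the conditions. */
int usable_implicit(const char *p, unsigned flag)
{
    if (p && (flag & BIT))
        return 1;
    return 0;
}

/* Explicit style: every condition is a written-out comparison. */
int usable_explicit(const char *p, unsigned flag)
{
    if (p != NULL && (flag & BIT) != 0)
        return 1;
    return 0;
}

/* The two styles are expected to agree for every input tried here. */
void check_flag_tests(void)
{
    assert(usable_implicit("x", BIT) == usable_explicit("x", BIT));
    assert(usable_implicit(NULL, BIT) == usable_explicit(NULL, BIT));
    assert(usable_implicit("x", 0x03) == usable_explicit("x", 0x03));
}
```

Which of the two reads better is exactly the matter of taste under discussion; the sketch only shows that they select the same cases.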

nather@utastro.UUCP (Ed Nather) (08/11/85)

> Does anyone out there support the author by saying that Version 3 of
> 'strcpy' is better than Version 2?
> Bob Crane

No.

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather%utastro.UTEXAS@ut-sally.ARPA

ark@alice.UUCP (Andrew Koenig) (08/12/85)

      strcpy(s, t)  /* copy t to s; pointer version 3 */
      char *s, *t;
      {
	 while (*s++ = *t++)
	    ;
      }
   
   strcpy(s,t)  /* copy t to s; pointer version 2 */
   char *s, *t;
   {
      while ((*s++ = *t++) != '\0')
	 ;
   }

> Does anyone out there support the author by saying that Version 3 of
> 'strcpy' is better than Version 2?


Yes.

In version 3, I am saying that the character that terminates a string
is the same character that is the implicit subject of an unstated
comparison in a `while' statement.  In version 2, the string terminator
is an explicitly stated constant.  Viewed that way, the two versions
are equivalent only by coincidence.

darryl@ISM780.UUCP (08/12/85)

>   Although this may seem cryptic at first sight, the notational
>   convenience is considerable, and the idiom should be mastered,
>   if for no other reason than that you will see it frequently in
>   C programs.
>
>I have trouble believing that the above has advantages of great
>speed OR readability over:

Note that K&R didn't say that the terse form had speed or readability
advantages; their comment was that the lack of keystrokes overrode
other considerations, once you got used to it.  They were writing
code using the ed editor on 110 or 300 baud terminals;  anything that
cut down the number of keystrokes was a big win.  If you don't like
the popular idioms in C, no one says (well, at least, I don't) you have
to use them.  But you'd better get used to them, 'cause you'll see
them a lot.

Henry Spencer aside, there does not seem to be a great force in the
C community to throw these idioms out of the language or of common use.
I suggest that they are here to stay;  if you don't like them, you're
going to be lumping them for a long time to come.


	    --Darryl Richman, INTERACTIVE Systems Corp.
	    ...!cca!ima!ism780!darryl
	    The views expressed above are my opinions only.

haahr@siemens.UUCP (08/12/85)

Relevant code: (Kernighan & Ritchie, chapter 5, page 105)

     strcpy(s, t)  /* copy t to s; pointer version 3 */
     char *s, *t;
     {
 	 while (*s++ = *t++)
 	    ;
     }
  

Bob Crane (tektools!bobc) writes:
>    ... [text from K&R, everyone owns it, no point quoting again] ...
> 
> Yeaacch!!!!!!  It was still very cryptic to me the tenth time that I read
> it!!!  A friend explained it to me by saying that the character in the
> 'while' expression is converted to an int and that the NULL character has
> an ascii value of 0 so the test will exit when the NULL character is
> encountered.
> 
> I have trouble believing that the above has advantages of great
> speed OR readability over:
> 
>    strcpy(s,t)  /* copy t to s; pointer version 2 */
>    char *s, *t;
>    {
>       while ((*s++ = *t++) != '\0')
> 	 ;
>    }
> 
> Does anyone out there support the author by saying that Version 3 of
> 'strcpy' is better than Version 2?

I do.  Why?  Read on.

Doug Gwyn (brl-tgr!gwyn) responds:
> I think case 2 is certainly more readable, but as the book says, you
> need to learn to read things like case 3 since a lot of code is like
> that.  More usually one will see something like
> 	char *s;
> 	...
> 	while ( *s++ )
> 		...
> This really is a standard C idiom, although I don't recommend writing
> code that way.  I personally prefer to distinguish between Boolean
> expressions (such as comparisons) and arithmetic expressions, using
> strictly Boolean expressions as conditions.  Thus:
> 	while ( *s++ != '\0' )
> 
> Tests for NULL pointers and flags often are written
> 	if ( p )
> 		...
> 	if ( flag & BIT )
> 		...
> rather than
> 	if ( p != NULL )
> 		...
> 	if ( (flag & BIT) != 0 )
> 		...
> (I prefer the latter.)  Get used to it.

The two pieces are different in terms of the abstraction presented.
	while (*s++ != ANYTHING) ...
This code looks for some character in a string.  In C, the character '\0'
is the character after the last character in a string, so when you find
that character, you have reached the end of the string.  It is an idiom that
we have all gotten used to, knowing to look for '\0'.  On the other hand:
	while (*s++) ...
loops through a string until the first 'false' character.  Now what does
falseness for a character mean?  A logical (and, in the case of C, correct)
interpretation is that we have reached the end of a string.
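
As a rough illustration of the two readings, here is a sketch that measures a string both ways. The function names are hypothetical. One detail worth noticing is that the post-increment leaves the pointer one position past the '\0' when the loop exits, which is why 1 is subtracted.

```c
#include <assert.h>
#include <stddef.h>

/* "Loop until the first false character." */
size_t length_implicit(const char *s)
{
    const char *start = s;

    while (*s++)
        ;
    /* s now points one past the terminator, hence the -1 */
    return (size_t)(s - start) - 1;
}

/* "Loop until the character '\0' is found." */
size_t length_explicit(const char *s)
{
    const char *start = s;

    while (*s++ != '\0')
        ;
    return (size_t)(s - start) - 1;
}

/* Both readings should give the same count, including for "". */
void check_lengths(void)
{
    assert(length_implicit("") == 0);
    assert(length_implicit("abc") == 3);
    assert(length_implicit("abc") == length_explicit("abc"));
}
```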

With the case of (p != NULL) I can understand Doug's argument a little bit
better, because NULL is a better abstraction for pointer to nothing (i.e.
end of a list) than '\0' is for end of a string.  But code like
	while (p->next) ...
says "while there is a pointer after p on the list" very clearly.
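
A small sketch of that list idiom, assuming a singly linked list whose last node has a null next pointer. The struct layout and the names node, last_node and check_last_node are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;   /* NULL marks the end of the list */
};

/* Return the last node of a non-empty list:
   "while there is a pointer after p on the list", move along. */
struct node *last_node(struct node *p)
{
    while (p->next)
        p = p->next;
    return p;
}

/* Build the three-node list a -> b -> c and walk it. */
void check_last_node(void)
{
    struct node a, b, c;

    a.value = 1; a.next = &b;
    b.value = 2; b.next = &c;
    c.value = 3; c.next = NULL;

    assert(last_node(&a) == &c);
    assert(last_node(&c) == &c);
}
```

Note that last_node must not be handed a null pointer, since the loop test dereferences p before checking anything.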

The (flag & BIT) comparison is also easier for me to understand than
the explicit test because it allows me to forget about the low-level
bit-twiddling that is going on, and worry about the actual test.

Now, the hard case is the one Bob brought up.
	while (*s++ = *t++) ...
looks very much like the
	while (*s++ == *t++) ...
one would expect from strcmp or similar functions.  I think a comment
or something in this case is much more help than the explicitly redundant
comparison against zero.  This is a matter of personal preference.  The
reason I wouldn't put the "!= '\0'" in this code is that it doesn't tell
you anything, unless you are used to a convention that says something like
"thou shalt always compare everything but explicit tests to 0."  But
putting in a '\0' test won't even make lint complain on the one where it
doesn't belong.  Again, with the possible confusion brought up because
of the = and == operators, maybe one should take special care with tests like
this one.
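
To make the = versus == hazard concrete, here is a sketch with a commented copy loop next to a comparison loop. The comparison is loosely modeled on the strcmp in K&R; the unsigned char casts and all the names are additions for this example.

```c
#include <assert.h>

/* Copy t to s.  The single '=' is deliberate: each character is
   ASSIGNED, and the loop stops once the '\0' has been copied. */
void copy_string(char *s, const char *t)
{
    while ((*s++ = *t++))   /* assignment, not comparison */
        ;
}

/* Compare s with t: returns 0 if equal, negative if s sorts first,
   positive if t sorts first.  Here '==' really is a comparison, and
   the end-of-string test has to be written separately. */
int compare_strings(const char *s, const char *t)
{
    for ( ; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return (unsigned char)*s - (unsigned char)*t;
}

void check_copy_and_compare(void)
{
    char buf[4];

    copy_string(buf, "abc");
    assert(compare_strings(buf, "abc") == 0);
    assert(compare_strings("abc", "abd") < 0);
    assert(compare_strings("b", "a") > 0);
}
```

A bare while (*s++ == *t++) would run past the end of two equal strings, which is one more reason the one-character difference deserves the comment.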

The idea of abstracting a test beyond explicitly testing for zero is nice
and C is not the first language to do it.  Bjarne Stroustrup recognized this
and included in C++ (as part of the general overloading capability), the
ability to overload comparisons, and retained the convention that an if
is an implicit comparison against zero.  Any class can be the object of
an if or while and the appropriate comparison operator is called.  A
conditional of the form
	while (cin) ...		// cin is the stream associated with stdin
will fail when there is no more input on cin.  Exactly what one would
expect.  While
	while (cin.state != eof && cin.state != fail) ...
(or whatever it is exactly -- I forget) tells you explicitly what it is
doing, it tells you more than you normally need to know.

Because the values that fail tests in C (null pointer, character beyond end
of string) are logical and consistent, they provide a nice abstraction beyond
worrying about what should be implementation details (i.e. '\0' is the
end of a string, NULL is the pointer to nothing).

					Paul Haahr
					..!princeton!macbeth!haahr

jeq@laidbak.UUCP (Jonathan E. Quist) (08/15/85)

>Relevant code: (Kernighan & Ritchie, chapter 5, page 105)
>
>     strcpy(s, t)  /* copy t to s; pointer version 3 */
>     char *s, *t;
>     {
> 	 while (*s++ = *t++)
> 	    ;
>     }
>  
>
>Bob Crane (tektools!bobc) writes:
>>    ... [text from K&R, everyone owns it, no point quoting again] ...
>> 
>> Yeaacch!!!!!!  It was still very cryptic to me the tenth time that I read
>> it!!!  A friend explained it to me by saying that the character in the
>> 'while' expression is converted to an int and that the NULL character has
>> an ascii value of 0 so the test will exit when the NULL character is
>> encountered.
>> 
>> I have trouble believing that the above has advantages of great
>> speed OR readability over:
>> 
>>    strcpy(s,t)  /* copy t to s; pointer version 2 */
>>    char *s, *t;
>>    {
>>       while ((*s++ = *t++) != '\0')
>> 	 ;
>>    }
>> 
>> Does anyone out there support the author by saying that Version 3 of
>> 'strcpy' is better than Version 2?

Personally, I find Version 3 easier to deal with, but then,
I learned C after years of assembly language bit-twiddling.
Version 3 (and many other "standard" C constructs) happens
to be the form I settled on in various assembly language
implementations with various microprocessors.  In my case,
compactness of code and speed were of utmost importance.
(In some cases, saving 10 bytes of instructions meant
the difference between using a 2K or 4K EPROM.)
In working on things like tty drivers and such,
I found forms without the "extra" comparison easier to live
with while scanning unfamiliar code, because, though cryptic,
it is compact and unambiguous, and after a while (i.e. with experience),
I think it is easier to recognize (*s++ = *t++) than to
see ((*s++ = *t++) != '\0') and stop to think `what are they
comparing?'
I think it's more or less the same as finding it
easier to scan through lower case comments (as opposed to
all upper case) to find something I wrote "some time back."
I suspect that this has something to do with the fact that
there is more size contrast between lower case letters than
the same in upper case.  This is only my personal theory.

I don't mean to flame those who prefer version 2, this
is just my own preference.

As to efficiency, that would depend upon the hardware and
the cleverness of the compiler.  If the machine sets
a zero flag when a '\0' is transferred, then a
"branch if zero" type of instruction can be immediately executed,
without additionally comparing the character to 0.
Whether the compiler takes advantage of this is another matter...

Jonathan E. Quist
Lachman Associates, Inc.
ihnp4!laidbak!jeq
``I deny this is a disclaimer.''

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/15/85)

Well, yes, but.  Abstract objects (such as input data streams)
can have more than one interesting predicate.  What would
testing for the "truth" of such an object mean?  Clearly you
would have to include a (predicate) selection operation, and
that isn't notably different from just writing the boolean
expression (predicate) that one has in mind.  Just packaged
differently.

guy@sun.uucp (Guy Harris) (08/19/85)

> As to efficiency (of "if (*p++ = *q++)" vs "if ((*p++ = *q++) == '\0')"
> - gh), that would depend upon the hardware and
> the cleverness of the compiler.  If the machine sets
> a zero flag when a '\0' is transferred, then a
> "branch if zero" type of instruction can be immediately executed,
> without additionally comparing the character to 0.
> Whether the compiler takes advantage of this is another matter...

What??  I dunno about your compilers, but every one I've worked with
generates the exact same code for both constructs; some may even convert the
first to the second in their internal representation (it's been too long
since I've poked inside PCC to remember).  It doesn't take much in the way
of compiler technology to make the efficiency issue irrelevant in this case.

	Guy Harris