[comp.lang.c] Writing readable code

stevesu@copper.TEK.COM (Steve Summit) (06/23/87)

In article <6714@auspyr.UUCP>, mick@auspyr.UUCP (Mick Andrew) writes:
> As far as typedefs go, I fall into the "anti" camp.  When supporting
> unfamiliar code, the following sequences drive me crazy
> 
> func()
> {
> sometype  var;
> }
> 
> Hmm, search for "sometype".  In an (always obscure :-) include file we find
> 
> typedef struct s  sometype;
> 
> Ah ha, "var" is actually a structure...  now where is that structure
> definition... you get the idea.  By using a typedef (or even in this case
> a #define), the program becomes less readable.

And in article <5999@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> >     extern   func_ptr_func_int  getfunc ;
> 
> The problem with this is that anyone reading your code would have to
> untangle the several levels of typedef in order to determine the meaning
> of this declaration, whereas with
> 	extern int (*getfunc())();
> the meaning of the declaration is patent.  This is not a particularly
> complicated construct; anyone who has to read C code should learn how
> type specification works in C, after which this is easy to understand.

It's unfortunate that people are finding reasons to deprecate
typedefs, which can be used to substantially *increase* the
readability and portability of code.  The two complaints above
follow from the fact that typedefs, like many features of C, can
be (ab)used, with devastating effect, to make code absolutely
impenetrable.

Many of the things being discussed on this newsgroup (including
especially the bizarre ideas about enums) simply wouldn't come
up if people would be content to write their programs simply and
straightforwardly, instead of treating every new programming
opportunity as a chance to enter the obfuscated C hall of
fame.

If the program you are dealing with has been conscientiously
written (and yes, I know, many programs are not) then the
definition for "sometype" should be in some obviously-named
header file like "sometype.h", and even if it's not obvious, it
should require only a few seconds with grep to find it, if you
really need to know.  When you come across something like

	extern   func_ptr_func_int  getfunc ;

it's a safe bet, especially when you glance down at how getfunc
is used, that it is a pointer to a function returning an int,
unless whoever wrote it was deliberately trying to make your life
miserable.

A friend of mine actually objects to

	#define TRUE 1
	#define FALSE 0

because whenever he comes across something like

	while(TRUE)

he claims he has to go fishing around to find out how TRUE is
defined.  Now, if you're dealing with a mentality that would
do something like

	#define TRUE 0		/* don't try this at home, kids */

then you've got major problems, and it wouldn't help you if that
misguided individual had avoided typedefs and/or #defines.

(I will admit that the examples quoted in Mick's and Doug's
postings could have been written more clearly, but the problem is
not with the typedefs per se.  Although typedeffed structures can
certainly be confusing, they can also provide a useful abstraction,
especially at a library interface, where you aren't supposed to
have to know what the insides of the structure look like.)

There appears to be a widespread notion, a sort of overzealous
application of Occam's Razor, that programs are to be judged
with the wc utility, and that, for a given algorithm, the source
file with the fewest characters shall be deemed best.  This leads
to constructions like

	char *p;

	if(p) ...

which are so endlessly discussed in this newsgroup.  My only
complaint with K&R, which is otherwise as close to perfection as
I expect to find in this industry, is that it appears to condone
that sort of usage:

	Although this may seem cryptic at first sight, the
	notational convenience is considerable, and the idiom
	should be mastered, if for no other reason than that you
	will frequently see it in C programs.  [1]

I have no idea what "notational convenience" is, but I suspect it
has to do with keeping that character count down.  The right
thing to try to keep down in your programs is needless complexity,
by doing everything you can to make the code comprehensible to
someone (including yourself) who might have to make sense of it
later.  The well-considered #include, #define, typedef, or even
(heavens!) goto can work wonders towards improving the
readability of code.  Discouraging the use of these constructs
only makes it harder to write clean code, and doesn't really slow
down the antagonistic programmers out there who seem to think
they'll be violated if anyone can penetrate their code.

It seems that there are two schools of thought on this.  A
large body of opinion (suggested by Doug's comment that "anyone
who has to read C code should learn how type specification works")
holds that, for us Real Programmers, any attempt to demystify C
is akin to quiche-eating.  Given the current state of the "art,"
it is true that everyone who wants to read existing C code should
learn about type specification in all its gory detail, but I wish
they didn't *have* to.

Let me make some confessions.  I consider myself an excellent C
programmer, but:

     a)	I still don't completely understand *all* the intricate nuances
	of the more obscure C types.  I can usually come pretty
	close, but for the really strange ones, like char (*b)[6],
	I always check myself with a mindless automaton like cdecl.
	Interestingly enough, the *only* time I have to use cdecl is
	when I'm reading the fascinating but artificial posers
	in this newsgroup; those bizarre type constructions just
	don't come up that often in the real programs I work with.

     b)	I actively prefer

		if(p != NULL)
	to
		if(!p)

	I know that they are 100% equivalent as far as a correct
	compiler is concerned (and they are; let's not get
	started on that again), and I have "mastered the idiom,"
	but the second form takes enough extra thought, and is
	confusing enough to the uninitiated, that it just plain
	isn't worth the seven characters saved.

The following quote, from _The Elements of Style_, explains very
well what I am trying to say:

	The practical objection to unaccepted and oversimplified
	spellings is the disfavor with which they are received by
	the reader.  They distract his attention and exhaust his
	patience.  He reads the form "though" automatically,
	without thought of its needless complexity; he reads the
	abbreviation "tho" and mentally supplies the missing
	letters, at a cost of a fraction of his attention.  The
	writer has defeated his own purpose.  [2]

Now, I'm sure that a lot of you disagree with this (as I said,
there seem to be two camps, and mine may be the minority), so
please don't flame me on the network, unless you feel you must
save others from my corrupting influence.  (Private flames are
always welcome, and occasionally replied to.)

                                           Steve Summit
                                           stevesu@copper.tek.com

References:
[1] Kernighan and Ritchie, _The C Programming Language_, p. 101
[2] Strunk and White, _The Elements of Style_, Third Edition, p. 75

ddl@husc6.UUCP (Dan Lanciani) (06/24/87)

In article <1158@copper.TEK.COM>, stevesu@copper.TEK.COM (Steve Summit) writes:
|      b)	I actively prefer
| 
| 		if(p != NULL)
| 	to
| 		if(!p)
| 
| 	I know that they are 100% equivalent as far as a correct
| 	compiler is concerned (and they are; let's not get
| 	started on that again), and I have "mastered the idiom,"
| 	but the second form takes enough extra thought, and is
| 	confusing enough to the uninitiated, that it just plain
| 	isn't worth the seven characters saved.

	Are you absolutely sure that you have "mastered the idiom?" :-)
(Sorry, I couldn't resist.)

					Dan Lanciani
					ddl@harvard.*

mick@auspyr.UUCP (Mick Andrew) (06/26/87)

in article <1158@copper.TEK.COM>, stevesu@copper.TEK.COM (Steve Summit) says:

[ All kinds of good stuff deleted. ]

Steve Summit made an excellent posting regarding writing clear, understandable
code.  More good sense and useful observations than have been seen on the 
net in many moons.

I agree 99.9% with all of his comments on coding style, especially
the awful habit of making code "better" by reducing the total character count.

In the spirit of
 		if(p != NULL)  vs.    if(!p)

I offer one of my favourite obfuscated one liners:

		if (strcmp(s1, s2) == 0)     vs.   if (!strcmp(s1, s2))


In general, I try to use the yardstick, 
"if it seems like a neat trick, don't do it!"

-- 
----
Mick    Austec, Inc.,   San Jose, CA 
	{sdencore,necntc,cbosgd,amdahl,ptsfa,dana}!aussjo!mick
	{styx,imagen,dlb,gould,sci,altnet}!auspyr!mick

chris@mimsy.UUCP (Chris Torek) (06/27/87)

In article <1158@copper.TEK.COM> stevesu@copper.TEK.COM (Steve Summit) writes:
>It's unfortunate that people are finding reasons to deprecate
>typedefs, which can be used to substantially *increase* the
>readability and portability of code.  The two complaints above
>follow from the fact that typedefs, like many features of C, can
>be (ab)used, with devastating effect, to make code absolutely
>impenetrable.

This connects with my own distrust of adding strong typing to C.
Here is a real live example.  Looking in <stdio.h>, we find

	struct _iobuf { ... };
	...
	#define FILE struct _iobuf

(Here the define is much like a typedef; either would do.)  This
virtually (but not in fact) adds a type to the set of types the
programmer must understand.  `FILE' is such a well-known type that
no one has any more trouble with it than with `int'.

For reasons that I shall not explain, I have decided to change the
internals of stdio such that user code can `open' a set of I/O
functions.  Stdio will call these functions instead of `read' and
`write', `seek', and `close'.  For instance:

	struct info info;	/* where to write, etc. */
	void *p = &info;	/* p will be given to writefn */
	int writefn();		/* and writefn will write things */
	FILE *fp = write_function_open(p, writefn);

	... fprintf(fp, fmt, ...) ...

would open `writefn' for writing, and would call `writefn(p, buf, n)'
instead of `write(fileno(fp), buf, n)'.  This can be used for a
dynamically allocating `sprintf', or for a curses-style `wprintf',
or whatever.  But this is not properly typed: writefn() is not a
function of no arguments.  This:

	int writefn(void *p, const char *buf, int n);

is the proper declaration.

But stdio can do more than just write: it can read, and it can
seek; and it eventually needs to close the `file'.  So the general
form is this:

	FILE *funopen();

Whoops, we forgot the types of the arguments.

	FILE *funopen(void *p, int (*readfn)(), int (*writefn)(),
		long (*seekfn)(), int (*closefn)());

All these parentheses are annoying, so let us add a few typedefs:

	typedef int (*iofun)();
	typedef long (*seekfun)();
	FILE *funopen(void *p, iofun readfn, iofun writefn,
		seekfun seekfn, iofun closefn);

But wait, this is not right!  The arguments to close are not the
same as those to read!  We need to declare *everything*, to get
those types right.

	typedef int (*iorfp)(void *p, char *buf, int n);
	typedef int (*iowfp)(void *p, const char *buf, int n);
	typedef long (*ioseekfp)(void *p, long off, int whence);
	typedef int (*ioclosefp)(void *p);
	FILE *funopen(void *p, iorfp readfn, iowfp writefn,
		ioseekfp seekfn, ioclosefp closefn);

Well, we finally did it.  But look at the cost:  Four typedefs just
for the arguments to `funopen'.  It really takes four; all four
are different.  We could get away with three, by lying and pretending
that the write function is allowed to clobber the contents of *buf:

	typedef int (*iorwfp)(void *p, char *buf, int n);

but if we are going to use strong typing at all, we should keep
the two distinct.  Well, we could discard the typedefs entirely,
since they are just aliases and the declaration of an ior function
will have to match the use of the iorfp:

	FILE *funopen(
		void *p,
		int (*readfn)(void *p, char *buf, int n),
		int (*writefn)(void *p, const char *buf, int n),
		long (*seekfn)(void *p, long off, int whence),
		int (*closefn)(void *p));

It is hard to say which is worse.  This does not steal away any
of the global namespace, as do the typedefs, but it is horribly
monolithic.

This demonstrates what happens with strongly typed systems:  They
lead to a profusion of types, and it becomes difficult to keep them
straight.  Does database_write call an iowfp function, or did we
put a database type layer over it, and it calls a dbiowfp? Of
course, these types are indeed different, and calling an iowfp
function when you were supposed to call a dbiowfp is likely to be
a drastic error.  So all this strong typing may be good.  But
it certainly does not look like C anymore.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

g-rh@cca.CCA.COM (Richard Harter) (06/27/87)

In article <6858@auspyr.UUCP> mick@auspyr.UUCP (Mick Andrew) writes:
>
>I agree 99.9% with all of his comments on coding style, especially
>the awful habit of making code "better" by reducing the total character count.
>
>In the spirit of
> 		if(p != NULL)  vs.    if(!p)
>
>I offer one of my favourite obfuscated one liners:
>
>		if (strcmp(s1, s2) == 0)     vs.   if (!strcmp(s1, s2))
>
>
>In general, I try to use the yardstick, 
>"if it seems like a neat trick, don't do it!"

	Speaking as one who distinctly falls into the "not minimizing the
total character count" school, I fear I must disagree with these specific
examples.  When I see 

	if (!p)

I read it as

	if p is not valid then ...

The (!p) syntax tells me that p is among the class of items that may be
treated as boolean (under the C language conventions) and that we are
testing whether it is false.  This is not a matter of "saving characters";
it is a matter of classification.  When I see

	if (p != NULL)

it tells me two rather different things.  First of all it tells me that
p is an item for which there are one or more coded values, among which
is NULL, and that for all cases where p is not NULL, there is some action
to be taken.  Secondly it tells me that the file that the statement is in
includes stdio.h (or that the author of the code is a dweeb.)  And that
should tell me that the code in this file needs stdio.h, FOR I DO NOT
CONSIDER IT GOOD PROGRAMMING PRACTICE TO INCLUDE INCLUDE FILES WHICH ARE
NOT USED.

	This is not an issue of compactness; rather it is an issue of
clarity of format.  Different people, different strokes, and all that.
My vote is for simple clear code, generously commented.  But clarity is
relative -- if you understand the principles of a particular algorithm
or language or the standard conventions of a large program then a piece
of code may be quite clear to you and cryptic to someone without that
understanding.  And I do not really see that one can claim to understand
C and also claim that 'if (!p)' is obscure.
-- 

Richard Harter, SMDS Inc. [Disclaimers not permitted by company policy.]

guy%gorodish@Sun.COM (Guy Harris) (06/28/87)

> When I see 
> 
> 	if (!p)
> 
> I read it as
> 
> 	if p is not valid then ...
> 
> The (!p) syntax tells me that p is among the class of items that may be
> treated as boolean (under the C language conventions) and that we are
> testing whether it is false.  This is not a matter of "saving characters";
> it is a matter of classification.

But what does it mean to say that a pointer is "false"?  Pointers
themselves really aren't Boolean; there is a boolean predicate *on* a
pointer, namely the "is this pointer null" predicate.  You could view
the construct "p", used in a context that requires a Boolean
expression, as really meaning "is_non_null(p)", and "!p" as meaning
"!is_non_null(p)", or "is_null(p)".

> When I see
> 
> 	if (p != NULL)
> 
> it tells me two rather different things.  First of all it tells me that
> p is an item for which there are one or more coded values, among which
> is NULL, and that for all cases where p is not NULL, there is some action
> to be taken.

All of which happen to be the case for pointers.  Are you arguing
that "!p" is somehow better than "p != NULL"?  This is a matter of
taste; if you do not view "!p" as shorthand for "is_null(p)", then "p
== NULL" makes more sense as a way of writing "is_null(p)", and many
good programmers do not view "!p" as such a shorthand.

> Secondly it tells me that the file that the statement is in
> includes stdio.h (or that the author of the code is a dweeb.)  And that
> should tell me that the code in this file needs stdio.h, FOR I DO NOT
> CONSIDER IT GOOD PROGRAMMING PRACTICE TO INCLUDE INCLUDE FILES WHICH ARE
> NOT USED.

OK, so either:

	1) stick

		#define	NULL	0

	   at the front of all modules not including <stdio.h>

	2) say "if (p == 0)" instead of "if (p == NULL)"

	3) in an ANSI C implementation, stick

		#include <stddef.h>

	   at the front of your module; you *are* using at least one
	   of the items it defines, namely NULL.

Also, you didn't address the issue of

	if (!strcmp(str1, str2))

which I find less defensible.  "strcmp" is not really boolean; it's a
function from the set of strings to the set of "int"s, such that
there are three predicates on the result with the following
properties:

	strcmp(str1, str2) == 0 iff str1 == str2
	strcmp(str1, str2) > 0 iff str1 > str2
	strcmp(str1, str2) < 0 iff str1 < str2

(where something like "str1 == str2" refers to equality in the sense
of string equality, *not* in the sense of pointer equality) and, as
such, there are several predicates to be applied to the result of
"strcmp" that model various predicates that can't be applied directly
to strings in C.  Given that, "!strcmp(str1, str2)" is "meaningful"
in the sense that the semantics of C give it a meaning; however, it isn't
"meaningful" in the sense that it very clearly suggests what you're
testing.  "!strcmp(str1, str2)" means that the two strings are equal,
but someone not reading the expression carefully and just seeing the
"!" might see it as testing whether they were *un*equal.

If, however, there were a macro "streq", defined as

	#define	streq(str1, str2)	(strcmp(str1, str2) == 0)

(we ignore the efficiency issue here, and don't stick in the
recently-much-discussed optimization of comparing the first two
characters) one could write "streq(str1, str2)" and it would be more
clear that it was testing whether the strings were equal or not.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

g-rh@cca.CCA.COM (Richard Harter) (06/28/87)

RH: Richard Harter
GH: Guy Harris

In article <22250@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
RH:  When I see 
RH:  
RH:  	if (!p)
RH:  
RH:  I read it as
RH:  
RH:  	if p is not valid then ...
RH:  
RH:  The (!p) syntax tells me that p is among the class of items that may be
RH:  treated as boolean (under the C language conventions) and that we are
RH:  testing whether it is false.  This is not a matter of "saving characters";
RH:  it is a matter of classification.

GH: But what does it mean to say that a pointer is "false"?  Pointers
GH: themselves really aren't Boolean; there is a boolean predicate *on* a
GH: pointer, namely the "is this pointer null" predicate.  You could view
GH: the construct "p", used in a context that requires a Boolean
GH: expression, as really meaning "is_non_null(p)", and "!p" as meaning
GH: "!is_non_null(p)", or "is_null(p)".

	Well now, C doesn't really have boolean types, but it does have
a convention that anything which is not zero tests true and any thing
which is zero tests false, and the convention that a null pointer (which
we all know is 0x6000000 :-)) behaves on assignment and testing as zero.
So the expression "!p" in "if (!p)" means "null pointer", i.e. not a
potentially legal pointer.  Since that is what I am asking, I see the construct
as saying what I mean.

RH:  When I see
RH:  
RH:  	if (p != NULL)
RH:  
RH:  it tells me two rather different things.  First of all it tells me that
RH:  p is an item for which there are one or more coded values, among which
RH:  is NULL, and that for all cases where p is not NULL, there is some action
RH:  to be taken.

GH: All of which happen to be the case for pointers.  Are you arguing
GH: that "!p" is somehow better than "p != NULL"?  This is a matter of
GH: taste; if you do not view "!p" as shorthand for "is_null(p)", then "p
GH: == NULL" makes more sense as a way of writing "is_null(p)", and many
GH: good programmers do not view "!p" as such a shorthand.

	Well, as you say, it is a matter of taste.  Many good programmers
use "p != NULL" and many equally good programmers use "!p".  However let
us define FALSE as zero explicitly and ask which is more reasonable to 
say "if (p == FALSE)" or "if (!p)", or, in English, 

	if p is false then ... vs
	if not p then

This is truly a matter of taste; many people would find the first form
to be clearer and more comprehensible.  Many other people would find the
second form equally clear or clearer, and the first form to be redundant.
I suspect that people who have a background in formal logic will prefer the
second form.  From my viewpoint, the form "!p" asserts that "p" is in the
large class of things which takes the values "true" (where true is any
non null) and "false" (null) and that I am testing on whether it is false
or not.  Note that under this viewpoint I don't care what the specific
values for "true" and "false" are -- I only care that the semantics
of the language provide for a "true" and a "false".

	Now the chap who prefers "p == FALSE" doesn't see it that way.
He sees it as a test on whether p has a specified value.  This, too,
is a perfectly legitimate approach.  However...

	I take exception to those who, only understanding their own
viewpoint and approach, treat any other approach based on some other
viewpoint as being bad code.  The person to whom I originally responded
supposed that "if (!p)" was a sloppy abomination used to save characters.
This is an intensely parochial view.

RH:  Secondly it tells me that the file that the statement is in
RH:  includes stdio.h (or that the author of the code is a dweeb.)  And that
RH:  should tell me that the code in this file needs stdio.h, FOR I DO NOT
RH:  CONSIDER IT GOOD PROGRAMMING PRACTICE TO INCLUDE INCLUDE FILES WHICH ARE
RH:  NOT USED.

GH: OK, so either:
GH: 
GH: 	1) stick
GH: 
GH: 		#define	NULL	0
GH: 
GH: 	   at the front of all modules not including <stdio.h>
GH: 
GH: 	2) say "if (p == 0)" instead of "if (p == NULL)"
GH: 
GH: 	3) in an ANSI C implementation, stick
GH: 
GH: 		#include <stddef.h>
GH: 
GH: 	   at the front of your module; you *are* using at least one
GH: 	   of the items it defines, namely NULL.

	Sorry Guy, this one I cannot buy.  Your third alternative is not
acceptable for those of us who are writing code which runs on a variety
of implementations.  The first is a kludge.  The second is perfectly
acceptable.  In point of fact we actually defined lower case null as 0
in the stddef just to avoid conflicts with stdio.h.  We also have FALSE
defined as 0 (obviously).  So we have lots of choices.  However we don't
use NULL unless it is specifically referring to a null file pointer.

GH: Also, you didn't address the issue of
GH: 
GH: 	if (!strcmp(str1, str2))
GH: 
GH: which I find less defensible.  "strcmp" is not really boolean; it's a
GH: function from the set of strings to the set of "int"s, such that
GH: there are three predicates on the result with the following
GH: properties:
GH: 
GH: 	strcmp(str1, str2) == 0 iff str1 == str2
GH: 	strcmp(str1, str2) > 0 iff str1 > str2
GH: 	strcmp(str1, str2) < 0 iff str1 < str2
GH: 
GH: (where something like "str1 == str2" refers to equality in the sense
GH: of string equality, *not* in the sense of pointer equality) and, as
GH: such, there are several predicates to be applied to the result of
GH: "strcmp" that model various predicates that can't be applied directly
GH: to strings in C.  Given that, "!strcmp(str1, str2)" is "meaningful"
GH: in the sense that the semantics of C give it a meaning; however, it isn't
GH: "meaningful" in the sense that it very clearly suggests what you're
GH: testing.  "!strcmp(str1, str2)" means that the two strings are equal,
GH: but someone not reading the expression carefully and just seeing the
GH: "!" might see it as testing whether they were *un*equal.

	Here I have to agree -- strcmp returns, according to the semantics
of C, a value of true if the strings are unequal and a value of false if
they are equal.  Accordingly, strcmp is not really in the category of
functions which match the semantics of C.  In this case, !strcmp is a
trick, which is an entirely different matter.  We seldom use the string
library [portability, you know -- we avoid library functions, other than
those we absolutely have to use] so I have never dealt with this particular
issue.
-- 

Richard Harter, SMDS Inc. [Disclaimers not permitted by company policy.]
			  [I set company policy.]

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/29/87)

In this silly debate, people have been making noises that sound as if
they thought !p and p!=NULL were equivalent.  They of course have
opposite meaning, which perhaps helps make the point that one should
"say what one means" rather than trying to take clever shortcuts.

ron@topaz.rutgers.edu.UUCP (06/29/87)

I have always wondered why people think NULL is more mnemonic than 0.
I read them as exactly the same (and fortunately in C, they are
defined to be).  I also wonder about people who define TRUE to be
anything, since it leads to things like
	if( bool == TRUE )
which is different than
	if(!bool)

Generally, I use !p when I'm dealing with things that are supposed
boolean values like

	if(!isdigit(c))

and comparison to zero for things that are supposed to be returning
a value
	if( malloc(100) == 0 )


MY PET PEEVES:

1.  Comparing error returns from UNIX syscalls to be less than zero.
    UNIX system calls that return ints, are usually defined to return
    -1 on error.  It drives me crazy to see code test for less than
    zero.  It doesn't say returns negative value on error, it says
    -1.

2.  Needless use of the comma operator and parenthesis to demonstrate
    manhood to the obliteration of code readability, e.g.

	    if((fd=open("foo",1)<0))

    SCREW this, too difficult, how about writing the code to indicate
    what is going on:

	    fd = open("foo", 1);
	    if(fd == -1)

-Ron

    

jas@rtech.UUCP (Jim Shankland) (06/29/87)

In article <6034@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In this silly debate, people have been making noises that sound as if
>they thought !p and p!=NULL were equivalent.  They of course have
>opposite meaning, which perhaps helps make the point that one should
>"say what one means" rather than trying to take clever shortcuts.

Good point.  Another example:  I frequently see code written on a VAX
that incorrectly says "if (*p)" instead of "if (p)" or "if (p !=
NULL)".  Granted, inexperienced C programmers are more likely to make
this mistake (then again, staying on the VAX, they never learn better);
I still suggest that it is less error-prone to use NULL for zero-valued
pointers, '\0' for the zero-valued character, and "if (e)" only if e is
CONCEPTUALLY boolean (i.e., it is a variable declared as the typedef
'bool',  or its main operator is a relop, or....).

This is, indeed, a silly debate, and I'm sure we will never come to
consensus; but years of moving other people's C code to dozens of
different machines has absolutely convinced me that being able to say:

    if (integer-or-pointer-valued-expr)

is an error-prone misfeature, and that we'd all be better off programming
as though it didn't exist.
-- 
Jim Shankland
 ..!ihnp4!cpsc6a!\
                  rtech!jas
..!ucbvax!mtxinu!/

mlandau@Diamond.BBN.COM (Matt Landau) (06/29/87)

In comp.lang.c (<13008@topaz.rutgers.edu>), ron@topaz.rutgers.edu writes:
>and comparison to zero for things that are supposed to be returning
>a value
>	if( malloc(100) == 0 )

I tend to use NULL in this case because it has the connotation of 
a zero valued *pointer*.  Thus, seeing NULL instead of 0 clues the
reader in to the fact that it's a pointer value that is being 
tested instead of an int or a long (in which case I'd want to see
the zero written as 0L anyway).

>MY PET PEEVES:
>
>1.  Comparing error returns from UNIX syscalls to be less than zero.
>    UNIX system calls that return ints, are usually defined to return
>    -1 on error.  It drives me crazy to see code test for less than
>    zero.  It doesn't say returns negative value on error, it says
>    -1.

It's an efficiency hack of sorts.  Comparison of a value to zero to
check for less than zero values is MUCH cheaper on most architectures
than comparison to an explicit -1 (which may require saving, overwriting, 
and reloading registers).  Granted, on a 68000 or a VAX, the difference 
is probably too small to notice.  But believe me, a few such efficiency 
hacks can make a BIG difference on a slow, stupid 8088!  (Yes, I've
profiled and benchmarked code on little machines when I used to do that
sort of thing professionally, and verified that even little tweaks like
this make an important difference in a frequently-executed loop.)

>2.  Needless use of the comma operator and parenthesis to demonstrate
>    manhood to the obliteration of code readability, e.g.
>
>	    if((fd=open("foo",1)<0))
>
>    SCREW this, too difficult, how about writing the code to indicate
>    what is going on:
>
>	    fd = open("foo", 1);
>	    if(fd == -1)

As with any programming language, it's all a matter of being comfortable
with the idioms.  Anyone who programs in C for any length of time, or 
who has read K&R carefully, should not be mystified by seeing

	if ((fp = fopen(foo, "r")) == NULL)

Granted, it's easier for novices or occasional programmers to read the
more verbose (pascaloid?) forms, but a language isn't designed only for
the novice.  When I'm doing serious work on large chunks of code,
what's important is expressing what's going on concisely, but clearly
to anyone who knows *the standard C idioms*.  I'd rather fit that extra
couple of lines of code on the screen or page than not.  

Besides, the single statement version may actually be EASIER to read: 
to me, it says "open this file and check to see if the fopen call 
succeeded", i.e., it's conceptually a single atomic operation, and should 
be written so as to indicate that fact.  That's how I define "writing the 
code to indicate what's going on."

On the other hand, this is all aesthetics/religion, and everyone is 
certainly entitled to his or her opinion.
-- 
 Matt Landau					"Wage Peace"
 mlandau@bbn.com

edw@ius2.cs.cmu.edu (Eddie Wyatt) (06/29/87)

In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie) writes:
> 
> MY PET PEEVES:
> 
> 1.  Comparing error returns from UNIX syscalls to be less than zero.
>     UNIX system calls that return ints, are usually defined to return
>     -1 on error.  It drives me crazy to see code test for less than
>     zero.  It doesn't say returns negative value on error, it says
>     -1.
> 
> 2.  Needless use of the comma operator and parenthesis to demonstrate
>     manhood to the obliteration of code readability, e.g.
> 
> 	    if((fd=open("foo",1)<0))
> 
>     SCREW this, too difficult, how about writing the code to indicate
>     what is going on:
> 
> 	    fd = open("foo", 1);
> 	    if(fd == -1)
> 
> -Ron
> 
>     


  Comment 1.  It's all a question of taste.

  Comment 2.  Though the code may be logically equivalent, the assembly
actually generated differs.  Example for Sun's 3.2 cc:

  Consider the program


	main()
    	    {
    	    int x,y;
	
    	    if (x=y);
    	    }

  Running cc -S over it yields :

        LL0:
	        .data
	        .text
        |#PROC# 04
	        .globl	_main
        _main:
        |#PROLOGUE# 0
	        link	a6,#0
	        addl	#-LF12,sp
	        moveml	#LS12,sp@
        |#PROLOGUE# 1
	        movl	a6@(-0x8),a6@(-0x4)
	        jeq	L14
        L14:
        LE12:
	        unlk	a6
	        rts
	        LF12 = 8
	        LS12 = 0x0
	        LFF12 = 8
	        LSS12 = 0x0
	        LP12 =	0x8
	        .data

  Consider the program


        main()
            {
            int x,y;

            x = y;
            if (x);
            }

  Running cc -S over it yields :

    LL0:
	    .data
	    .text
    |#PROC# 04
	    .globl	_main
    _main:
    |#PROLOGUE# 0
	    link	a6,#0
	    addl	#-LF12,sp
	    moveml	#LS12,sp@
    |#PROLOGUE# 1
	    movl	a6@(-0x8),a6@(-0x4)
	    tstl	a6@(-0x4)		; NOTE the extra tstl
	    jeq	L14
    L14:
    LE12:
	    unlk	a6
	    rts
	    LF12 = 8
	    LS12 = 0x0
	    LFF12 = 8
	    LSS12 = 0x0
	    LP12 =	0x8
	    .data

   Note the extra test instruction.  So the two methods are not
totally equivalent.

   I prefer "if ((fd = open("foo",O_WRONLY)) < 0)".  Major difference
being, I parenthesize the operation.  Just a matter of taste, that's all.

  Also, another note: any good (even bad) optimizing compiler should
reckonize that the tstl is overkill and remove it.

-- 
					Eddie Wyatt

e-mail: edw@ius2.cs.cmu.edu

terrorist, cryptography, DES, drugs, cipher, secret, decode, NSA, CIA, NRO.

ehrhart@aai8..istc.sri.com (Tim Ehrhart) (06/29/87)

In article <13008@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>MY PET PEEVES:
>
>1.  Comparing error returns from UNIX syscalls to be less than zero.
>    UNIX system calls that return ints, are usually defined to return
>    -1 on error.  It drives me crazy to see code test for less than
>    zero.  It doesn't say returns negative value on error, it says
>    -1.

I'd guess some folks do that because they know that most of today's
microprocessors have more than enough bits in their status register
to tell whether a value is negative or not.  This saves the CPU from
actually having to compare two values and then look at its status
register bits to determine the result.  In terms of microprocessor
instructions, this is basically a single one-operand instruction
versus two two-operand instructions.  For example, on a 680x0, here's
the assembler code:

if ( x < 0) case:

	tstl	d0		| test on value in d0

if ( x == -1) case:

	moveq	#-1,d7		| load -1 into register d7 (if == -1)
	cmpl	d7,d0		| compare d0 with d7

I know it's only a few assembler instructions and a few microseconds,
but if I can get small optimizations like this at no cost I'll take
them every time.  Using clearly stated #defines in place of the magic
numbers 0 and -1 is my way of not compromising clarity and readability.

>2.  Needless use of the comma operator and parenthesis to demonstrate
>    manhood to the obliteration of code readability, e.g.
>	    if((fd=open("foo",1)<0)
>

Once again, to save instructions.  The function return value is usually
already in d0, and the CPU need only check its status bits for negative,
as opposed to loading -1 into another register and then comparing
the two register values.

Sorry for getting so low level, but I think sometimes that is where the
answers or the motivation for things lie.  And no, I don't think coding
for these hardware features is non-portable or cutesy.  It's just getting
the job done with the least work possible without compromising
readability or maintainability.  I can assure you these two issues are
very important to me also.

Tim Ehrhart  SRI International  Menlo Park, CA 94025   415-859-5842

edw@ius2.cs.cmu.edu (Eddie Wyatt) (06/29/87)

 Please excuse the terribly bad spelling of recognize - I was dozing
off again.
-- 
					Eddie Wyatt

e-mail: edw@ius2.cs.cmu.edu

terrorist, cryptography, DES, drugs, cipher, secret, decode, NSA, CIA, NRO.

ken@argus.UUCP (Kenneth Ng) (06/30/87)

In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie) writes:
> I have always wondered why people think NULL is more mnemonic than 0.
> -Ron

It's not mnemonic; on some machines it's just wrong.  NULL is ***NOT***
defined as zero on all machines.  Therefore software written with
that assumption will not work on such a machine.  More than likely
the machine will be blamed even though the writer of the software is
to blame.

> 	    if((fd=open("foo",1)<0)
[edit]
> 	    fd = open("foo", 1);
> 	    if(fd == -1)

Almost agreed: but if a negative return code other than -1 is returned
the code doesn't react the same.


... This signature was put in in a way to bypass the 
... bogus artificial line limit on the .signature file.
... Also, by its length it adds fodder to help avoid having
... my followups being bounced due to the restriction on
... followup articles.

Kenneth Ng: Post office: NJIT - CCCC, Newark New Jersey  07102
uucp !ihnp4!allegra!bellcore!argus!ken *** NOT ken@bellcore.uucp ***
bitnet(prefered) ken@orion.bitnet

ken@argus.UUCP (Kenneth Ng) (06/30/87)

In article <10509@sri-spam.istc.sri.com>, ehrhart@aai8..istc.sri.com (Tim Ehrhart) writes:
> >2.  Needless use of the comma operator and parenthesis to demonstrate
> >    manhood to the obliteration of code readability, e.g.
> >	    if((fd=open("foo",1)<0)
> Once again to save instructions.
> Tim Ehrhart  SRI International  Menlo Park, CA 94025   415-859-5842

How about getting a better compiler?  Or kicking the butts of the
compiler writers and getting them to write better compilers?

... This signature was put in in a way to bypass the 
... bogus artificial line limit on the .signature file.
... Also, by its length it adds fodder to help avoid having
... my followups being bounced due to the restriction on
... followup articles.

Kenneth Ng: Post office: NJIT - CCCC, Newark New Jersey  07102
uucp !ihnp4!allegra!bellcore!argus!ken *** NOT ken@bellcore.uucp ***
bitnet(prefered) ken@orion.bitnet

stevesu@copper.TEK.COM (Steve Summit) (06/30/87)

It has been pointed out that my original posting had a blatant
error in it: (!p) and (p != NULL) are not at all equivalent, but
in fact are exactly opposite.  I was at first acutely embarrassed,
but I am now flatly amazed that the statements have been argued
so hotly, by some of the best minds on the network, without
noting this crucial fact.

The fact that this error is so easy to make (I spent several
hours on the original posting, and I read through it countless
times, and I still missed it) ends up illustrating the point very
well: when coding, say what you _m_e_a_n.

Since I started this, I might as well make my thinking process
clear: we have a pointer variable, and it either holds or does
not hold a single "out of band" nil value.  The closest
transliterations of those English statements into C are

	if(p == NULL)
and
	if(p != NULL)

(assuming a proper #definition of NULL).

In my opinion, the _m_o_s_t _i_m_p_o_r_t_a_n_t _t_h_i_n_g is to keep that
transliteration step simple.  Everything else (compilation time,
efficiency, source code character count) is secondary.  The
biggest problem in Software Engineering is not code size or
efficiency but correctness: a program is supposed to do what you
want it to do, but rarely does.  The more I can make my C code
"read" like an English description of what the code is supposed to
do, the fewer transliteration errors I will make, and the easier
my code will be for me, and others, to read.

While I am back up on the soapbox, I will respond to one other point:

> 	I take exception to those who, only understanding their own
> viewpoint and approach, treat any other approach based on some other
> viewpoint as being bad code.  The person to whom I originally responded
> supposed that "if (!p)" was a sloppy abomination used to save characters.
> This is an intensely parochial view.

That was only part of my supposition, and it may be parochial,
but it's no more parochial than the supposition that "C is for
experts" is elitist.  "if(p == NULL)" is not only easier for me
to understand, but I suspect that it is also easier for people
who are not as experienced with C as I am.  (Arguments of the
form "but anyone who considers himself a C programmer _s_h_o_u_l_d
understand" ignored.  Face it, there are less experienced C
programmers in the world, and one of them will have to maintain
your programs some day.  Why not make their job a bit easier?)

> And I do not really see that one can claim to understand
> C and also claim that 'if (!p)' is obscure.

Apparently you do not understand me any better than you think I
understand you.

					Steve Summit
					stevesu@copper.tek.com

"I meant what I said and I said what I meant.
An elephant's honest, one hundred percent."
	-- Dr. Suess, "Horton Hears a Who!"

edw@ius2.cs.cmu.edu (Eddie Wyatt) (06/30/87)

In article <926@argus.UUCP>, ken@argus.UUCP (Kenneth Ng) writes:
> In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie) writes:
> > I have always wondered why people think NULL is more mnemonic than 0.
> > -Ron
> 
> It's not mnemonic, on some machines its just wrong.  NULL is ***NOT***
> defined as zero on all machines.  Therefore the software writen with
> that assumption will not work on such a machine.  More than likely
> the machine will be blamed even though the writer of the software is
> to blame.

   Yes, it's a software problem, but it's a problem with that C compiler.

    Sorry, but the C compiler is supposed to treat 0 as a special token in this
case.  Quote from K&R pg. 192:

     However, it is guaranteed that assignment of the constant 0 to a pointer
    will produce a null pointer distinguishable from a pointer to any object.

The concept is extended to comparison too.  

   But as a safeguard against implementations of C that don't adhere to
this rule, one should use NULL.  So to answer Ron's question, you use
NULL instead of 0 because there are implementations of C (wrongly
implemented ones) that don't treat 0 the way they should.


> 
> > 	    if((fd=open("foo",1)<0)
> [edit]
> > 	    fd = open("foo", 1);
> > 	    if(fd == -1)
> 
> Almost agreed: but if a negative return code other than -1 is returned
> the code doesn't react the same.
> 

    I can think of no Unix system call that doesn't return -1 on error.
So I would say that it's a pretty good bet that "if (call(...) < 0)" and
"if (call(...) == -1)" will act the same in all cases. Though, one should
always consult the man pages for return values if in doubt.

> 
> Kenneth Ng: Post office: NJIT - CCCC, Newark New Jersey  07102
> uucp !ihnp4!allegra!bellcore!argus!ken *** NOT ken@bellcore.uucp ***
> bitnet(prefered) ken@orion.bitnet

-- 
					Eddie Wyatt

e-mail: edw@ius2.cs.cmu.edu

terrorist, cryptography, DES, drugs, cipher, secret, decode, NSA, CIA, NRO.

ron@topaz.rutgers.edu (Ron Natalie) (06/30/87)

> It's not mnemonic, on some machines its just wrong.  NULL is ***NOT***
> defined as zero on all machines.

Come on.  We've been through this a hundred times.  Any C compiler
that has NULL set to anything other than plain 0 is wrong.  It's
in the language spec.  Pointers must standup to comparison to the
constant 0.  If NULL is ever set to anything other than 0, it will
break far more things.

-Ron

guy%gorodish@Sun.COM (Guy Harris) (06/30/87)

> >2.  Needless use of the comma operator and parenthesis to demonstrate
> >    manhood to the obliteration of code readability, e.g.
> >	    if((fd=open("foo",1)<0)
> >
> 
> Once again to save instructions. The function return value is usually 
> already in d0, and it needs only check it's status bits for negative, 
> etc.. as opposed to loading -1 into another register, then comparing
> the two register values.

The "< 0" vs. "== -1" may make a difference in the generated code, but if the

	if ((fd = open("foo", 1)) < 0)

vs.

	fd = open("foo", 1);
	if (fd < 0)

makes a difference, you don't have a good compiler.  Our compiler
gives the exact same code for

	if ((fd = open("foo", 1)) < 0)
		die();

and

	fd = open("foo", 1);
	if (fd < 0)
		die();

namely, something similar to:

	pea	1
	pea	"foo"
	jbsr	_open
	addqw	#8,sp
	movl	d0,fd
	jge	1f
	jbsr	_die
1:
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

dant@tekla.TEK.COM (Dan Tilque;1893;92-789;LP=A;608C) (06/30/87)

Ron Natalie writes:
>I have always wondered why people think NULL is more mnemonic than 0.

NULL is more mnemonic than 0 because it's easy to confuse 0 (zero) with
the letter O.  On many printers and some terminals it's impossible to
tell them apart without close comparison with a known zero.  Since it's
bad form to use a variable named O, and the compiler will give a warning
if you use an undeclared variable, this may not be a problem.  However,
I suspect that some of the early compilers did not give a warning and
just defaulted the letter O to an int.  I know this is (was?) a common
problem in FORTRAN.

>	I also wonder about people who define TRUE to be any
>thing, since it leads to things like
>	if( bool == TRUE )
>which is different than
>	if(!bool)

I thought that TRUE and FALSE should be:

#define FALSE 0
#define TRUE !FALSE

With these #defines the above two statements are equivalent.
>
>Generally, I use !p when I'm dealing with things that are supposed
>boolean values like
>
>	if(!isdigit(c))

It's often easy to miss a single character (especially one that doesn't
stand out like the "!") when quickly scanning code.

	if (isdigit(c) == TRUE)

will compile to the same object code on almost every compiler and is easier
to grasp immediately.  The fact that it's positive logic also makes it
easier.


---
Dan Tilque
dant@tekla.tek.com

mpl@sfsup.UUCP (06/30/87)

In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu.UUCP writes:
> I have always wondered why people think NULL is more mnemonic than 0.
> When I read them as exactly the same (and fortunately in C, they are
> defined to be).  I also wonder about people who define TRUE to be any
> thing, since it leads to things like
> 	if( bool == TRUE )
> which is different than
> 	if(!bool)

I define TRUE to be 1, so I can say things like
	if (blah) {
		do some processing
		return TRUE;
	}
	return FALSE;

I also think NULL *is* more mnemonic than 0, since when I see
	if (p == NULL) /* which I always use rather than if (!p) */
it is clear that I am doing a *pointer* test rather than any other type.

> Generally, I use !p when I'm dealing with things that are supposed
> boolean values like
> 
> 	if(!isdigit(c))

right on!

> and comparison to zero for things that are supposed to be returning
> a value
> 	if( malloc(100) == 0 )

also right on, but I'd prefer to use NULL for malloc.  I insist on
	if (strcmp(a, b) == 0)
however, since strcmp returns an integer, *not* a boolean.

> MY PET PEEVES:
> 
> 1.  Comparing error returns from UNIX syscalls to be less than zero.
>     UNIX system calls that return ints, are usually defined to return
>     -1 on error.  It drives me crazy to see code test for less than
>     zero.  It doesn't say returns negative value on error, it says
>     -1.

If you read "The Elements of Programming Style" by Kernighan and Plauger
(which you may or may not agree with), the authors suggest that a test for
< 0 is better than a test for -1, since a coding error could possibly cause
a function to return a value that is out of the legal range (>= 0) but is not
the intended error value (-1).  Not only that, but on many machines a test
for < 0 may be more efficient than a test against a specific value.  I know,
this is just fluff, but I think it makes a program more readable to test for
out-of-range conditions rather than a specific value:
	n = foo(bar);
	if (n != -1)
		a = goo[n];
		/* oh, I guess goo[-2] is legal, or am I *that sure*
		   foo could *never* return -2? */

> 2.  Needless use of the comma operator and parenthesis to demonstrate
>     manhood to the obliteration of code readability, e.g.
> 
> 	    if((fd=open("foo",1)<0)
> 
>     SCREW this, too difficult, how about writing the code to indicate
>     what is going on:
> 
> 	    fd = open("foo", 1);
> 	    if(fd == -1)
> -Ron

Oh, but this is one of the things that makes C beautiful.  Granted, the comma
operator should rarely be used outside of #defines (where it is useful to
get some things to work).  However, I find
	if ((fd = open("foo", 1)) < 0)
*more* readable than
 	    fd = open("foo", 1);
 	    if(fd == -1)
since what I want to test is the return of open, *not* fd.  The "fd =" is
only there so I can capture the value to use it later.

Mike

gwyn@brl-smoke.UUCP (06/30/87)

In article <926@argus.UUCP> ken@argus.UUCP (Kenneth Ng) writes:
>NULL is ***NOT*** defined as zero on all machines.

I suspect readers of this newsgroup are sick of this by now,
but it is IMPORTANT to understand that:
	(a) NULL is not defined by a machine; it is defined in <stdio.h>
	and perhaps other places for programmer convenience
	(b) #define NULL 0	is ALWAYS a CORRECT definition; this has
	nothing to do with how null pointers are represented in a C
	implementation but is required by the C language definition
	(c) NULL is not appropriate for passing as a function parameter
	unless it is appropriately cast

karl@haddock.UUCP (Karl Heuer) (06/30/87)

In article <13008@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>I have always wondered why people think NULL is more mnemonic than 0.
>When I read them as exactly the same (and fortunately in C, they are
>defined to be).

When used properly, NULL refers only to a null pointer constant.  Thus, I can
read "if (x == NULL)" and know without further context that x is a pointer.
(The constant 0 with an explicit (redundant) cast conveys even more
information; I use that notation too.)

>I also wonder about people who define TRUE to be any thing, since it leads to
>things like "if( bool == TRUE )" which is different than "if(!bool)"

There is that danger.  However, the reason *I* define TRUE (actually I prefer
YES, as in K&R) is for assignment ("flag=YES" instead of "flag=1"), not
comparison.  See above paragraph for reason.

>Generally, I use !p when I'm dealing with things that are supposed
>boolean values like "if(!isdigit(c))"

Good idea.  (Btw, note that some implementations of isdigit do not return 1
for success, so "if (isdigit(c) == TRUE)" is bad news.)

>MY PET PEEVES: ...
>2.  Needless use of the comma operator and parenthesis to demonstrate
>    manhood to the obliteration of code readability, e.g.
>	    if((fd=open("foo",1)<0)
>    SCREW this, too difficult, how about writing the code to indicate
>    what is going on:
>	    fd = open("foo", 1);
>	    if(fd == -1)

The main advantage of this idiom is for "while" statements.  The usual example
is "while ((c = getchar()) != EOF) ...", which cannot be written cleanly
without the embedded assignment.  The use in "if" statements often permits one
to collapse nested ifs, which can *improve* code readability.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

chris@mimsy.UUCP (Chris Torek) (07/01/87)

In article <13008@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron
Natalie) says he hates code that:
>[compares] error returns from UNIX syscalls to be less than zero.
>UNIX system calls that return ints, are usually defined to return
>-1 on error.  It drives me crazy to see code test for less than
>zero.  It doesn't say returns negative value on error, it says -1.

% man 2 open
     ...
     Upon successful completion a non-negative integer termed a
     file descriptor is returned. ...

This is from the 4.3BSD manuals.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

ddl@husc6.UUCP (Dan Lanciani) (07/01/87)

In article <1176@copper.TEK.COM>, stevesu@copper.TEK.COM (Steve Summit) writes:
| It has been pointed out that my original posting had a blatant
| error in it: (!p) and (p != NULL) are not at all equivalent, but
| in fact are exactly opposite.  I was at first acutely embarrassed,
| but I am now flatly amazed that the statements have been argued
| so hotly, by some of the best minds on the network, without
| noting this crucial fact.

	I sincerely doubt that the "best minds on the network" are the
ones doing the arguing.  Nevertheless, I noted this "crucial fact"
and hence my question as to whether you had really mastered the idiom.
I guess I was too subtle and you missed the point.

| The fact that this error is so easy to make (I spent several
| hours on the original posting, and I read through it countless
| times, and I still missed it) ends up illustrating the point very
| well: when coding, say what you mean.

	I really hope you mean the fact that this error is so easy
for YOU to make illustrates the point that YOU need to be very
careful when coding.  The error was immediately obvious to me and
to the only other person I mentioned it to.  When I code "if(!p)"
I say exactly what I mean and I mean what I say.  Your reasoning
is about as good as my saying, "Since I can't fly an airplane, it
is clearly impossible to do so."

| Since I started this, I might as well make my thinking process
| clear: we have a pointer variable, and it either holds or does
| not hold a single "out of band" nil value.  The closest
| transliterations of those English statements into C are
| 
| 	if(p == NULL)
| and
| 	if(p != NULL)
| 
| (assuming a proper #definition of NULL).

	I would claim that the closest transliterations are "if(!p)"
and "if(p)," if only because they DON'T assume a proper #define of
ANYTHING.

| In my opinion, the most important thing is to keep that
| transliteration step simple.

	Again, C has made that step very simple for you by providing the
"if(p)" construct.  Of course, this is my opinion.

| Everything else (compilation time,
| efficiency, source code character count) is secondary.

	Other than the fact that I happen to find less cluttered code
easier to read, the issue of character count is a joke.  Compilation
time is a programmer resource and can often be adjusted...  But efficiency
is not always a secondary issue.  In fact, it is RARELY a secondary issue.
Sometimes efficiency makes the difference between an application's working
or failing.  More often however, this attitude encourages the production
of needlessly slow programs with which users must suffer for years.

| The
| biggest problem in Software Engineering is not code size or
| efficiency but correctness: a program is supposed to do what you
| want it to do, but rarely does.  The more I can make my C code
| "read" like an English description of what the code is supposed to
| do, the fewer transliteration errors I will make, and the easier
| my code will be for me, and others, to read.

	Maybe you should try COBOL.  It was designed to read like an
English description.  C was not.  Why not let C read like a C description
of the problem?  When I program in C I do not transliterate from English
because I find C to provide a better mechanism for describing many
problems.  This is my personal style.  You have your own style.  I have
no trouble reading code written in either style.  Can't we leave it at
that rather than trying to prove that one style somehow encourages
correctness and another style tends to discourage it?


					Dan Lanciani
					ddl@harvard.*

jr@amanue.UUCP (Jim Rosenberg) (07/01/87)

In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie)
writes:
>  [... other pet peeves ... ]
>
> 2.  Needless use of the comma operator and parenthesis to demonstrate
>     manhood to the obliteration of code readability, e.g.
> 
> 	    if((fd=open("foo",1)<0)
> 
>     SCREW this, too difficult, how about writing the code to indicate
>     what is going on:
> 
> 	    fd = open("foo", 1);
> 	    if(fd == -1)

I recall many moons ago whilst browsing K&R and before really learning C that I
swore up and down that I would forego such (what I thought at the time to be)
unwarranted over-abbreviation as:

	while ((c = getchar()) != EOF) {
		.
		.
		.

It didn't take me long once I was actually using C on a regular basis to
realize that forgoing constructs such as the one above is not only ridiculous,
a case can be made that it is actually LESS CLEAR stylistically.  Yes, it is
more difficult to read for novices to C.  But consider the alternative:

	for (;;) {
		c = getchar();
		if (c == EOF)
			break;
		.
		.
		.

I would argue that a strong case can be made that the for loop is actually
LESS CLEAR than the while loop.  By announcing the for loop as a forever loop,
in effect I'm saying that I really don't know what will terminate this loop
'till I find it.  But of course I *do* know, as the while loop shows.  I.e.
the while loop is more clear because it does a better job of "front-loading"
the code so the reader can understand what's coming.  A true forever loop is a
whole different kettle of fish than a loop where you really can predict in
advance the condition that will cause it to terminate.  It seems to me to be a
self-evident principle of coding clarity that the sooner in the code you can
state your intentions the better, other things being equal.  If you forego
using *ANY* assignments in expressions then you must use the forever loop to
code this example, which logically speaking isn't really right.

Using assignments in expressions can be taken to extremes of course, but it
doesn't always hide what's going on -- sometimes it *REVEALS* what is going
on.  Note that I'm making this point purely on the basis of code readability,
regardless of what number of machine cycles the generated code might save.
The case you made was for if statements not loops, but once you get used to
the idea, judicious use of an assignment in an expression will seem perfectly
natural.  I think it certainly does to 90% of all C programmers, and it's not
a matter of showing off.  It's a matter of saying what you mean.
-- 
 Jim Rosenberg
     CIS: 71515,124                         decvax!idis! \
     WELL: jer                                   allegra! ---- pitt!amanue!jr
     BIX: jrosenberg                 seismo!cmcl2!cadre! /

edw@ius2.cs.cmu.edu (Eddie Wyatt) (07/01/87)

In article <221@amanue.UUCP>, jr@amanue.UUCP (Jim Rosenberg) writes:
> 	while ((c = getchar()) != EOF) {
> 		.
> 		.
> 		.
> 
> It didn't take me long once I was actually using C on a regular basis to
> realize that forgoing constructs such as the one above is not only ridiculous,
> a case can be made that it is actually LESS CLEAR stylistically.  Yes, it is
> more difficult to read for novices to C.  But consider the alternative:
							 ^^^^^^^^^^^^^^^
> 
> 	for (;;) {
> 		c = getchar();
> 		if (c == EOF)
> 			break;
> 		.
> 		.
> 		.
[ lots of stuff about how this is unreadable]
> -- 
>  Jim Rosenberg
>      CIS: 71515,124                         decvax!idis! \
>      WELL: jer                                   allegra! ---- pitt!amanue!jr
>      BIX: jrosenberg                 seismo!cmcl2!cadre! /


	
   "the alternative"!  Come on, there are a thousand and one ways to code
the semantics of the above loop.  For example:


	c = getchar();
	while (c != EOF)
	    {
	    .
	    .
	    .
	    c = getchar();
	    }

  Look, no assignments in the conditionals, no hidden gotos (break). 
For the people that argue the first form is bad, this would 
probably be the approach they would take.

  I still prefer the first form though (the assignment inside the conditional).
It's just a matter of style to me, not readability.

-- 
					Eddie Wyatt

e-mail: edw@ius2.cs.cmu.edu

terrorist, cryptography, DES, drugs, cipher, secret, decode, NSA, CIA, NRO.

dg@wrs.UUCP (David Goodenough) (07/01/87)

In article <927@argus.UUCP> ken@argus.UUCP (Kenneth Ng) writes:
>In article <10509@sri-spam.istc.sri.com>, ehrhart@aai8..istc.sri.com (Tim Ehrhart) writes:
>> >2.  Needless use of the comma operator and parenthesis to demonstrate
>> >    manhood to the obliteration of code readability, e.g.
>> >	    if((fd=open("foo",1)<0)
>> >	    [edit]
>> >        fd = open("foo", 1);
>> >	    if (fd < 0)

but what about the line I use (in one form or another) in maybe 40% of
the C programs I write:

	while ((ch = getchar()) != EOF)

(think about it!! :-))

and on the subject of readability one thing that would help above _ALL_
others is the use of white space, not just for indenting, but for
separating lexical tokens (where appropriate) in expressions: I find
the above while statement 1000 times easier to read than:

	while((ch=getchar())!=EOF)
--
		dg@wrs.UUCP - David Goodenough

					+---+
					| +-+-+
					+-+-+ |
					  +---+

rice@swatsun (Dan Rice) (07/02/87)

	Here's a simple question for you.  Suppose I have defined structures
containing other structures, i.e., 

typedef struct {
	float x, y, z;	/* Coordinates of a vector in 3-space */
} vector;

typedef struct {
	vector o;	/* Center */
	float r;	/* Radius */
} sphere;

sphere s1, *s2;

Now, say I refer to s1.o.x and s2->o.y.  Does the compiler convert this into
a simple address reference at compile time, or is work performed at runtime?
Should I define

vector center;

center = s2->o;

if I plan to use s2->o several times?  Thanks for any help.
-- 
- Dan Rice, Swarthmore College, Swarthmore PA 19081
...!sun!liberty!swatsun!rice
 ...!seismo!bpa!swatsun!rice

rblieva@cs.vu.nl (Roemer b Lievaart) (07/02/87)

I know I won the "obfuscated C code contest", but when I read these
articles I guess I didn't deserve it!  :-))))

No, seriously, I just joined in (and will join out soon, vacation), but
I can't help thinking I must set it all straight.  Probably I missed
most of the point of these articles, but I didn't miss the blunders
spread around!  Flame on!

In article <1941@zeus.TEK.COM> dant@tekla.UUCP (Dan Tilque) writes:
>Ron Natalie writes:
>
>>	I also wonder about people who define TRUE to be any
>>thing, since it leads to things like
>>	if( bool == TRUE )
>>which is different than
>>	if(!bool)

Of course it's different. If bool can only be 0 or TRUE, "bool == TRUE"
is exactly the opposite of " ! bool ", isn't it? Unless TRUE is defined
as 0, which is a great idea for the next year's obf. C contest.

>
>I thought that TRUE and FALSE should be:
>
>#define FALSE 0
>#define TRUE !FALSE
>
>With these #defines the above two statements are equivalent.

Somebody is being very stupid - the two of you or me... who can tell?

>>
>>Generally, I use !p when I'm dealing with things that are supposed
>>boolean values like
>>
>>	if(!isdigit(c))
>
>It's often easy to miss a single character (especially one that doesn't
>stand out like the "!") when quickly scanning code.

I have no problems with that as I mostly write

	if ( ! isdigit(c) )

Stands out enough for me.
>
>	if (isdigit(c) == TRUE)
>
>will compile to the same object code on almost every compiler and is easier
>to grasp immediately.  The fact that it's positive logic also makes it
>easier.
>
Sigh. I really wouldn't define TRUE as 0. However

	if (isdigit(c) == 0)

is a good idea, I would say.

But!

To avoid some misunderstandings:
	isdigit(), isalpha(),  etc.
do NOT just return FALSE or "TRUE" (1)!!!
Watch the next program and its output:

-----
#include <ctype.h>

#define FALSE 0
#define TRUE  (!FALSE)

main()
{
	printf("%d %d %d\n", isalpha('a'), isupper('Z'), isdigit('0') ) ;

	if ( isalpha('a') ) puts("yes") ;
			else puts("no") ;

	if ( isalpha('a') == TRUE ) puts("yes") ;
			else puts("no") ;
}
------
output:
------
2 1 4
yes
no
--------
Do you get it? isalpha('a') is not true! $%&#?!
It's not true, it's 2. This may look sickening to some, but if you think so,
learn Pascal. 

And so I come to another article (too bad rn lets you 'F'ollowup only one
article at a time) :

In article <whatever> jas@rtech.UUCP (Jim Shankland) writes:
> In article <6034@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
> This is, indeed, a silly debate, and I'm sure we will never come to
> consensus; but years of moving other people's C code to dozens of
> different machines has absolutely convinced me that being able to say:
> 
>     if (integer-or-pointer-valued-expr)
> 
> is an error-prone misfeature, and that we'd all be better off programming
> as though it didn't exist.
> -- 
> Jim Shankland

I get your point, Jim, but I must say it's rather Pascal-minded (or
whatever other boolean-obsessed language you prefer).
In C it doesn't always work that perfectly, as I illustrated above. So what
do you do? You adapt yourself to C (instead of expecting C, or the other
C programmers, to adapt to you) and you learn to work with things like

	if ( strlen( name ) )  ...

and get used to it, so that it is just as clear for you as is

	if ( strlen( name ) != 0 )    or even "better"
	if ( strlen( name ) > 0 )

It is for me now. It can be for you too.
Just as you learn that in Pascal

	if a then	is equivalent to	if a = true then

you'll have to learn that in C

	if ( a )	is equivalent to	if ( a != 0 )


[ new subject: ]
However, I must say I'm very much against using '0' instead of NULL.
I know it's the same, but I want to know whether someone is working
with integers or pointers. (Yes, I know, maybe I have to learn that, too,
just like the thing I described above, but I don't think so.
The example above really can give wrong results,
as indicated, while a strict 0 / NULL distinction only makes your programs
clearer, with no side effects.)
And so I come to a last point:

Our lint complains about:

main(){
	bar( NULL ) ;
}

bar ( foo )
char *foo ;
{ ... }

I agree that there should be a complaint in the second case, but
it should read something like:
"the compiler doesn't know that the parameter 0 is a pointer; possible
alignment problem"  or something like that. Instead lint says:
"bar, arg. 1 used inconsistently   ..."
Which is not true. I didn't use the argument inconsistently; I passed a
null pointer, which is a correct pointer. The problem is that the
compiler doesn't KNOW it's a pointer (it's just the integer 0), as
long as I don't put "(char *)" in front of it.
Okay, I can get over this. But how about this:

char *strings[] = { "one", "two", "three", NULL } ;

This won't pass our compiler. Which is dumb: the compiler knows pointers
to characters are coming up, finds a zero, and does not conclude it should
be a null pointer. But it does conclude so in:

char *onestring = NULL ;

Strange, I would say. How do other compilers react to this?

Anyway, that's enough for today.
It's almost bedtime, let's not dream about null-pointers, let's dream about
birds, music, love and all those other things some programmers sometimes
forget... :-)   (And I may well be one of those addicts!)


	Let's live,
		Roemer B. Lievaart,
		Amsterdam.

Disclaimer: The opinions expressed by the VU are not mine.

marc@pismo.cs.ucla.edu (Marc Kriguer) (07/02/87)

In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu.UUCP writes:
> MY PET PEEVES:
>
> 2.  Needless use of the comma operator and parenthesis to demonstrate
>     manhood to the obliteration of code readability, e.g.
>
>          if((fd=open("foo",1))<0)

       (versus)

>          fd = open("foo", 1);
>          if(fd == -1)
> -Ron


This is an example of the use of the comma operator?  I thought the comma
operator was something along the likes of:

	(a = b, c)

which sets a to b, then yields the value of c.  Isn't the comma in Ron's
example just a delimiter between parameters to open()?
 _  _  _                        Marc Kriguer
/ \/ \/ \
  /  /  / __     ___   __       BITNET: REMARCK@UCLASSCF
 /  /  / /  \   /  /  /         ARPA:   marc@pic.ucla.edu
/  /  /  \__/\_/   \_/\__/      UUCP:   {backbones}!ucla-cs!pic.ucla.edu!marc

ron@topaz.rutgers.edu (Ron Natalie) (07/02/87)

> The main advantage of this idiom is for "while" statements.  The usual
> example is "while ((c = getchar()) != EOF) ...", which cannot be
> written cleanly without the embedded assignment.  The use in "if"
> statements often permits one to collapse nested ifs, which can
> *improve* code readability.

If "c" is of type "char" this is still not written cleanly.  EOF is of
type int.  The assignment into a character causes getchar's int return
to be narrowed to char, invalidating the following comparison.  The
effect is that characters of value 0xFF will cause an erroneous end-of-
file indication.  This is a common error in C, but fortunately most
text this is used on never contains any characters with the most
significant bit set.

To do this right you need an extra int temporary value

	while ((i = getchar()) != EOF)
	    or
	while( i = getchar(), i != EOF)

followed by
	    c = i;

-Ron

ron@topaz.rutgers.edu (Ron Natalie) (07/02/87)

More precisely, I meant to say the comma operator OR parentheses.
The example used parentheses, but you could as easily have written:

	if(fd = open("foo", 0), fd < 0)

cramer@kontron.UUCP (Clayton Cramer) (07/03/87)

> In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie) writes:
> > I have always wondered why people think NULL is more mnemonic than 0.
> > -Ron
> 
> It's not mnemonic; on some machines it's just wrong.  NULL is ***NOT***
> defined as zero on all machines.  Therefore software written with
> that assumption will not work on such a machine.  More than likely
> the machine will be blamed even though the writer of the software is
> to blame.

Or even the same machine.  For example, if you are compiling in small
model on the 8086 with the Microsoft compiler:

	#define NULL 0
        
but in large model:

	#define NULL 0L
        
Depending on how you are using it, the results may come out the same --
but relying on the compiler to notice that you are comparing int and long
(in large model) is a very poor practice.

Clayton E. Cramer

guy%gorodish@Sun.COM (Guy Harris) (07/03/87)

> The assignment into a character causes getchar's int return
> to be changed to char, invalidating the following comparison.  The
> effect is that characters of value 0xFF will cause erroneous end of
> file indication.  This is a common error in C, but it is fortunate
> that most text this is used on never contains any characters with the
> most significant bit set.

On a machine where "char" is not signed, this won't work even if you
run it on text consisting entirely of 7-bit characters.  "getchar"
will return EOF when it hits end-of-file; EOF (-1) will get converted
to '\377' (we assume 8-bit characters here; translate as needed for
other character sizes) which compares equal to 255 but not to -1.
Thus, you'll never see the end-of-file indication.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

guy%gorodish@Sun.COM (Guy Harris) (07/03/87)

> It's not mnemonic, on some machines its just wrong.  NULL is ***NOT***
> defined as zero on all machines.

There does not exist, and will never exist, a machine for which a
legitimate C compiler will produce different code for:

	char *p;
	if (p == 0)

and

	char *p;
	if (p == NULL)

if NULL is defined as 0 - just 0, nothing else.  The same is true for

	char *p;
	p = 0;

and

	char *p;
	p = NULL;

(If anybody cares to challenge this statement, I will point out that
all such challenges in the past have been refuted.  There are
passages in K&R and in the ANSI C draft that make it abundantly clear
that the compiler must do the right thing both for "if (p == 0)" and
"p = 0"; for example, if you have 16-bit "int"s and 32-bit pointers,
the compiler must not generate a 16-bit assignment or a 16-bit
comparison.  If this seems counter-intuitive to anybody, replace
"char *p" with "long p", and note that the compiler obviously should
not generate 16-bit assignments or comparisons in those cases.  A
pointer is no different here.)

Neither pointer assignment nor pointer comparison, then, requires any
decoration to be wrapped around 0 to make them work.  The only case
where you have to wrap decoration around 0 to make it work is when
you are passing null pointers as arguments to procedures.  In the
general case, there is no magic goop that you can wrap around *all*
instances of 0 used as null pointers that will make it work for all
those instances.

For instance, you might have a word-addressable 16-bit machine, where
character pointers require 32 bits.  In this case, passing "(char *)0"
to a procedure might be done by passing 32 bits of zero, and passing
"(int *)0" might be done by passing 16 bits of zero.

As such, the only correct way to handle this is to cast all
occurrences of 0 or NULL to a pointer of the appropriate type when
passing them as procedure arguments.  (If you don't, and you have a
type-checker like "lint", if it's any good it will complain about
code that doesn't properly cast the pointers.)

Some vendors offering C implementations on machines where all
pointers are the same length, and all null pointers have the same bit
pattern, cheat by defining NULL as 0L or (char *)0.  This makes the
code work - *on that particular machine* - but doesn't make it
correct, in the sense that it will work correctly on any correct C
implementation.  (It also won't silence a good type-checker.  In
fact, the "NULL is (char *)0" trick would NOT silence complaints from a
type-checker if you write code that does the casting properly,
whereas leaving NULL as 0 would do so.)
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

guy%gorodish@Sun.COM (Guy Harris) (07/03/87)

> Or even the same machine.  For example, if you are compiling in small
> model on the 8086 with the Microsoft compiler:
> 
> 	#define NULL 0
>         
> but in large model:
> 
> 	#define NULL 0L
>         
> Depending on how you are using it, the results may come out the same --
> but relying on the compiler to notice that you are comparing int and long
> (in large model) is a very poor practice.

Where are "int"s and "long"s being compared here?  Ron Natalie was
referring to constructs such as

	char *p;
	if (p == NULL)

and in this case a *pointer*, which is a totally different animal
from an "int" or a "long", is being compared with NULL, which is
generally an "int" except on some implementations that cheat and make
it a "long".  Any correct C compiler will generate a 32-bit
comparison in the large model for the code above (or whatever
comparison tests *all* the bits in the pointer that are actually
used; if, say, 8 of those bits are not considered part of the
pointer's value, they need not be compared).
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

rwa@auvax.UUCP (Ross Alexander) (07/03/87)

The following discussion is all relative to

	#define EOF -1
	char c;
	while ( ( c = getchar() ) != EOF ) { /* do something */ }

In article <22635@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
> > The assignment into a character causes getchar's int return
> > to be changed to char, invalidating the following comparison.
> On a machine where "char" is not signed, this won't work even if you
> run it on text consisting entirely of 7-bit characters.  "getchar"
> will return EOF when it hits end-of-file; EOF (-1) will get converted
> to '\377' ... which compares equal to 255 but not to -1.

Isn't there a little room here for argument?  Now of course K&R say
that an expression is coerced to the type it is being assigned to.
This is _a good thing_, since otherwise there's no way to assign to a
char (think about the promotion rules :-).  But isn't assignment, in
the strictest sense, a side effect?  Or, phrased a little differently,
is the comparison being made to the value of c or to the value of
getchar()?  So the coercion for the assignment of `int' to `char' in
this case is a side effect and shouldn't have an effect on the value
of the whole expression `( c = getchar() )'.  Interestingly enough,
the VAX 4.2bsd cc agrees with me, and produces (code to get the value
of getchar() omitted, but it's in r0):

	L9998:
		cvtlb	r0,-1(fp)	; coercion and assignment to c as byte
		cmpl	r0,$-1		; but comparison is int to int
		jeql	L36		; break the loop iff same

whilst the SUN 4.2 cc feels otherwise, and produces (getchar() value in d0):

	L2000001:
		movb	d0,a6@(-0x1)	; assignment to c
		cmpb	#-0x1,d0	; comparison of a byte to a byte
		jeq	L23		; break loop iff same

Now while some might argue that the second interpretation is properly
conformant, and the first is brain-damaged (smile, gentle people), I
would argue that the first way of doing things is less likely to get
the poor programmer (myself, for instance) into trouble.  I argue
that this is because in this context, the assignment is a side
effect, and isn't intended to `narrow' the value of the expression.  
Just to point out where the logical extension of case two gets us,
what is the value of x in this fragment:

	float x;
	int i;

	x = 2.0 + ( i = 3.5 );

I would say 5.5; others might say 5.0, it seems.  But if I _wanted_
5.0, I would expect to write

	x = 2.0 + (int) ( i = 3.5 );

and I appeal to the principle of least astonishment for justification.

...!ihnp4!alberta!auvax!rwa	Ross Alexander @ Athabasca University

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/04/87)

In article <1571@sfsup.UUCP> mpl@sfsup.UUCP writes:
>If you read "The Elements of Programming Style" by Kernighan and Plauger (?)
>(which you may or may not agree with), the authors suggest that a test for
>< 0 is better than a test for -1, since a coding error could possibly cause
>a function to return a value that is out of the legal range (> 0) but is not
>the intended error value (-1).

Some people (Dijkstra and Gries among them) no longer believe that this
kind of "conservative" test is preferable to the exact test; they base
this on what it takes to prove an implementation of an algorithm to be
correct.

>Not only that, but on many machines, a test for < 0 may be more
>efficient than a test against a specific value.

I doubt that the difference is detectable (remember, the code just made
a system call!).

spencer@osu-cgrg.UUCP (Steve Spencer) (07/04/87)

In article <1213@carthage.swatsun.UUCP>, rice@swatsun (Dan Rice) writes:
> 
> center = s2->o;
> 
> if I plan to use s2->o several times?  Thanks for any help.
> -- 
> - Dan Rice, Swarthmore College, Swarthmore PA 19081
> ...!sun!liberty!swatsun!rice
>  ...!seismo!bpa!swatsun!rice

Yes, I would (and have) used something like what you describe when I have to
use the current values in s2->o many times in a function (example: if s2->o
was the intersection point of a ray with a sphere in a ray-tracing program
and the function in which I was working was a shader. :-))

BUT......

Some C compilers will let you get away with 

   center = s2->o;

(i.e.: structure assignments)

But some won't.  Your best bet for non-flakiness is:

   center.x = (s2->o).x;
   center.y = (s2->o).y;
   center.z = (s2->o).z;

Sure, it's three statements instead of one, but 

 (1) You are certain that all three values were assigned correctly.
 (2) If you had to access (as you speculated) s2->o.x multiple times,
     you are saving yourself cycles in the long run.

Hope I helped...


steve

-- 
...I'm growing older but not up...       - Jimmy Buffett

Stephen Spencer, Graduate Student
The Computer Graphics Research Group
The Ohio State University
1501 Neil Avenue, Columbus OH 43210
{decvax,ucbvax}!cbosg!osu-cgrg!spencer        (uucp)

ark@alice.UUCP (07/04/87)

In article <262@auvax.UUCP>, rwa@auvax.UUCP writes:
> what is the value of x in this fragment:
> 
> 	float x;
> 	int i;
> 
> 	x = 2.0 + ( i = 3.5 );
> 
> I would say 5.5; others might say 5.0, it seems.  But if I _wanted_
> 5.0, I would expect to write
> 
> 	x = 2.0 + (int) ( i = 3.5 );
> 
> and I appeal to the principle of least astonishment for justification.

Ah yes, what should be the value of an assignment? The LHS or the
RHS?  Both answers have their pitfalls, and C says the correct
answer is (a copy of) the LHS.  Thus the answer in your example
is 5.0.  While this produces a surprise in your particular case,
consider:

	double d, a, sqrt();
	d = 2;
	a = sqrt(d);
	a = sqrt(d=2);

Now, don't you want the second and third assignment above to do
the same thing?

In APL, the value of an assignment is the value of its right-hand side.
This mostly works well, but leads to some surprises too:

	A <- 0 0 0 0 0
	B <- A[2 3 4] <- 6
	B <- A[2 3 4]

The first line sets A to a five-element array, all of whose
elements are zero.  The second line sets A[2], A[3], and A[4]
to 6, and also sets B to 6.  The third line sets B to a three-
element array whose elements are all 6.

ron@topaz.rutgers.edu (Ron Natalie) (07/04/87)

Microsoft C (if it really requires NULL to be 0L) is broken by all
standards that C compilers have ever been designed to.  K&R explicitly
defined 0 comparisons to pointers to be the rule and no one since has
come up with any convincing arguments to the contrary.

-Ron

rwhite@nu3b2.UUCP (Robert C. White Jr.) (07/05/87)

In article <262@auvax.UUCP>, rwa@auvax.UUCP (Ross Alexander) writes:
} The following discussion is all relative to
} 
} 	#define EOF -1
} 	char c;
} 	while ( ( c = getchar() ) != EOF ) { /* do something */ }
} 
} Isn't there a little room here for arguement?  Now of course K&R say
} that an expression is coerced to the type it is being assigned to.
} This is _a good thing_, since otherwise there's no way to assign to a
} char (think about the promotion rules :-).  But isn't assignment, in
} the strictest sense, a side effect?  Or phrased a little differently,
} is the comparison being made to the value of c or the value of the
} getchar() ?  So the coercion for the assignment of `int' to `char' in
} this case is a side effect and shouldn't have an affect on the value
} of the whole expression `( c = getchar() )'.  Interestingly enough,
} the VAX 4.2bsd cc agrees with me, and produces (code to get value of
} getchar() ommited, but it's in r0):  
} 
} 	L9998:
} 		cvtlb	r0,-1(fp)	; coercion and assignment to c as byte
} 		cmpl	r0,$-1		; but comparison is int to int
} 		jeql	L36		; break the loop iff same
} 
} whilst the SUN 4.2 cc feels otherwise, and produces (getchar() value in d0):
} 
} 	L2000001:
} 		movb	d0,a6@(-0x1)	; assignment to c
} 		cmpb	#-0x1,d0	; comparison of a byte to a byte
} 		jeq	L23		; break loop iff same
} 
} Now while some might argue that the second interpretation is properly
} conformant, and the first is brain-damaged (smile, gentle people), I
} would argue that the first way of doing things is less likely to get
} the poor programmer (myself, for instance) into trouble.  I argue
} that this is because in this context, the assignment is a side
} effect, and isn't intended to `narrow' the value of the expression.  
} Just to point out where the logical extension of case two gets us,
} what is the value of x in this fragment:
} 
} 	float x;
} 	int i;
} 
} 	x = 2.0 + ( i = 3.5 );
} 
} I would say 5.5; others might say 5.0, it seems.  But if I _wanted_
} 5.0, I would expect to write
} 
} 	x = 2.0 + (int) ( i = 3.5 );
} 
} and I appeal to the principle of least astonishment for justification.

Sir,
	In my copy of K&R's "The C Programming Language", the last
line of section 2.10, "Assignment Operators and Expressions",
states: "The type of an assignment expression is the type of its
left operand."
	In "The C Programmer's Handbook" [a Prentice-Hall publication
sanctioned by Bell Labs, if it really matters], at the top of page 12,
just under the title "ASSIGNMENT", is the following:

NOTE -- The value of an assignment expression
is the value of the left operand _AFTER_ the assignment.

It would seem obvious to me, even without these quotes as backup, that
your VAX is leading you down the garden path and you are happy to follow.
The original questioner also must be questioned.  K&R repeatedly and
consistently state that in the expression (c = getchar()), c must be of
type int in order to preserve the EOF condition, because EOF must lie
outside the range of any and all possible characters, while any value
held by a variable of type char is a valid character.

In your first code fragment c should be of type int, and any operations
on c will be cast as appropriate.

As for your statement that the assignment is, or should be, a secondary
consideration to the value: the parentheses "(" ")" say quite the
contrary.  As all arithmetic is done in longs and doubles under C, at
least according to the standard, the type of an expression must come
from somewhere.  The precedence rules don't allow the expression,
without the parens, to assign any value but 0 or 1 to the variable c.

In your second code fragment the answer is 5.0.  You don't need the
cast to int, because the declaration of i as int does that.

The rule of least astonishment involves the comparison of explicit
operations to implicit ones, stating that the implicit should follow
the explicit.  That is, if you place every parenthesized fragment in
an autonomous statement, the values should remain unchanged.

therefore:

char	c;
while ((c = getchar()) != EOF) {
	/* Valid Operations */
	}

Becomes:

char	c;
c = getchar();
while (c != EOF) {
	/* Valid Operations */;
	c = getchar ();
	}

Which you may recognize, in form anyway, from Pascal.  By definition
both these loops must behave exactly the same.  By your argument the
second fails but the first would operate properly.  With least astonishment
in C, this loop will never terminate, in either form, with c typed as char,
but it will function correctly with c typed as int because, as you know,
type char cannot hold EOF but int can.


Robert.

Disclaimer:  My mind is so fragmented by random excursions into a
	wilderness of abstractions and incipient ideas that the
	practical purposes of the moment are often submerged in
	my consciousness and I don't know what I'm doing.
		[my employers certainly have no idea]

g-rh@cca.CCA.COM (Richard Harter) (07/05/87)

In article <6051@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <1571@sfsup.UUCP> mpl@sfsup.UUCP writes:
>Some people (Dijkstra and Gries among them) no longer believe that this
>kind of "conservative" test is preferable to the exact test; they base
>this on what it takes to prove an implementation of an algorithm to be
>correct.

	Are you sure that this is what they are saying?  Although I
have strong reservations about Dijkstra and his theories about programming,
he is no dummy.  The argument, I would suppose, is that a "conservative"
test may conceal errors in the code.  This is a well taken point.  But
the cure is to be both "conservative" and "exact".  For example, if a
function is supposed to return either -1 or a non-negative integer then
you should test for

	-1	special return
	<-1	invalid return
	>-1	legitimate return

and robust code will make all three tests.  To test for <0 is to fuse
two different classes of returns.  (People who worry about the cost
of two tests instead of one in these situations end up with code with
mystery bugs in it.)

	Incidentally, my objection to Dijkstra et al. is just to this
conception of code as an implementation of an algorithm.  In my
experience, algorithms, once you have a stock of them under your belt
are a minor element in software generation.  Coding an algorithm is
simple (usually -- I grant that there are some hairy ones.)  You can
use black box techniques and treat them in isolation.  The important
part in design and implementation is the system specification, structure,
and control.  Algorithms are just parts on the inventory shelf.

-- 

Richard Harter, SMDS Inc. [Disclaimers not permitted by company policy.]
			  [I set company policy.]

jr@amanue.UUCP (Jim Rosenberg) (07/05/87)

In article <262@auvax.UUCP>, rwa@auvax.UUCP (Ross Alexander) writes:
> The following discussion is all relative to
> 
> 	#define EOF -1
> 	char c;
> 	while ( ( c = getchar() ) != EOF ) { /* do something */ }
> 
> In article <22635@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
> > > The assignment into a character causes getchar's int return
> > > to be changed to char, invalidating the following comparison> 
> > On a machine where "char" is not signed, this won't work even if you
> > run it on text consisting entirely of 7-bit characters.  "getchar"
> > will return EOF when it hits end-of-file; EOF (-1) will get converted
> > to '\377' ... which compares equal to 255 but not to -1.
> 
>Isn't there a little room here for arguement?  Now of course K&R say
>that an expression is coerced to the type it is being assigned to.
>This is _a good thing_, since otherwise there's no way to assign to a
>char (think about the promotion rules :-).  But isn't assignment, in
>the strictest sense, a side effect?  Or phrased a little differently,
>is the comparison being made to the value of c or the value of the
>getchar() ?  So the coercion for the assignment of `int' to `char' in
>this case is a side effect and shouldn't have an affect on the value
>of the whole expression `( c = getchar() )'.  Interestingly enough,
>the VAX 4.2bsd cc agrees with me ...

Gads!!!  On this one, when Guy talks you better listen.  I earned some cute
money once over exactly this point.  Someone was trying to port a program from
BSD on a VAX to System V on a 3B2.  It ran fine on the VAX (don't they
always??) but dumped core on the 3B2.  In the code above c should be declared
int -- nothing else, int!  I can't testify to how this works on all compilers
under the sun, but on the 3B2 the compiler treated char as unsigned.  It
considered the type of (c = getchar()) to be the type of c -- i.e. unsigned
char.  Since EOF was defined as -1, c received 0xff.  But when the value of
the assignment was compared to EOF it was coerced back to an int.  Since the
compiler considered it unsigned, when the value of the assignment was coerced
back to an int it became plain old 0xff -- **NO SIGN EXTENSION**.  When
getchar() finally hit end of file the code compared 0xff to 0xffffffff and got
**FALSE**.  Sorry to repeat exactly what Guy just said, but it didn't seem to
get through.

K&R (referring to assignment operators):  "The value is the value stored in
the left operand after the assignment has taken place."
                 ^^^^^
and

"If both operands have arithmetic type, the right operand is converted to the
type of the left preparatory to the assignment."
                 ^^^^^^^^^^^
I don't think that leaves any room for doubt.  The type of an assignment is
supposed to be the type to which the rvalue must be coerced to be stored in
the lvalue.  H&S state explicitly:

"the type of the result is equal to the (unconverted) type of the left
operand."                                                         ^^^^

I don't have a copy of the ANSI draft, but I would assume they must make this
explicit.

K&R, in discussing whether sign extension must occur when a char is converted
to an int:

"Whether or not sign-extension occurs for characters is machine dependent."

So there's nothing inherently broken about the way the 3B2 does it.  But that
code above is very definitely broken.  On a 3B2 it will run right past the
EOF.  It works on a VAX because C on the VAX sign-extends char in promoting it
to int.  The VAX is probably the *WORST* machine in the world on which to try
to decide if something is "clean C" by trying it and seeing if it works.  All
kinds of dirt work on a VAX.  :-)
-- 
 Jim Rosenberg
     CIS: 71515,124                         decvax!idis! \
     WELL: jer                                   allegra! ---- pitt!amanue!jr
     BIX: jrosenberg                 seismo!cmcl2!cadre! /

mcdonald@uiucuxe.cso.uiuc.edu (07/05/87)

>/* Written 12:25 pm  Jul  2, 1987 by ron@topaz.rutgers.edu in uiucuxe:comp.lang.c */
>
>> The main advantage of this idiom is for "while" statements.  The usual
>> example is "while ((c = getchar()) != EOF) ...", which cannot be
>> written cleanly without the embedded assignment.  The use in "if"
>> statements often permits one to collapse nested ifs, which can
>> *improve* code readability.
>
>If "c" is of type "char" this is still not written cleanly.  EOF is
>type int.  The assignment into a character causes getchar's int return
>to be changed to char, invalidating the following comparison.  The
>effect is that characters of value 0xFF will cause erroneous end of
>file ...
>To do this right you need an extra int temporary value
>
>	while ((i = getchar()) != EOF)
>	    or
>	while( i = getchar(), i != EOF)
>
>followed by
>	    c = i;

How about using the notorious comma operator TWICE:

         while(i = getchar(), c = i , i != EOF)

Is this correct, or would you need

         while( (i = getchar(), c = i ), i != EOF )        ?

As a poor C novice I see no way around this short of putting c = i
on a separate line.

Doug McDonald

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)

Let me explain by example why
	if ( bool_expr == TRUE )
is dangerous (it should be obvious why it's redundant),
so that
	if ( bool_expr )
should be preferred.  There is no corresponding problem in C with
	if ( bool_expr == FALSE )
although I would recommend
	if ( !bool_expr )
instead.  As other have remarked, the proper use for symbolic constants
TRUE and FALSE is as values to be assigned to variables (or function
returns) that are considered to be inherently Boolean in nature.

(Actually, I so strongly consider treating Boolean objects as inherent
primitive types to be important that my own definitions are lower-case,
to appear just like language keywords.)

EXAMPLE:

The original UNIX System V Release 2.0 definitions in <stdio.h> were
	#define feof(p)		((p)->_flag & _IOEOF)
	#define ferror(p)	((p)->_flag & _IOERR)
These are supposed to be treated as Boolean expressions, but their
TRUE values are NOT 1.  In my version of <stdio.h> I changed these to
	#define feof(p)		(((p)->_flag & _IOEOF) != 0)
	#define ferror(p)	(((p)->_flag & _IOERR) != 0)
for safety's sake.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)

In article <1213@carthage.swatsun.UUCP> rice@swatsun (Dan Rice) writes:
>sphere s1, *s2;
>Now, say I refer to s1.o.x and s2->o.y.  Does the compiler convert this into
>a simple address reference at compile time, or is work performed at runtime?

It of course depends on the implementation, but usually s1.o.x will become a
direct memory reference while s2->o.y will of necessity (because s2 is
variable) turn into a single-level indexed reference (the ->o and .y will
usually be folded into a single offset).

>Should I define
>vector center;
>center = s2->o;
>if I plan to use s2->o several times?

That depends on what you plan to do with it.  The above causes a block move
of the contents of the vector structure, whereas direct use of s2->o will
not necessarily cause much data motion or even much additional overhead.
In most cases you needn't allocate temporaries like this just for the sake
of "efficiency".  In case of bottleneck code, the best approach is to try
each proposed efficiency hack individually to see if it makes a difference.
Usually it won't, so you're better off keeping the code as intelligible as
possible.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)

In article <262@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
>In article <22635@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
>> "getchar"
>> will return EOF when it hits end-of-file; EOF (-1) will get converted
>> to '\377' ... which compares equal to 255 but not to -1.
>Isn't there a little room here for arguement?

No, of course not (:-)).  The semantics of the expression in question
are well understood to agree with Guy Harris's explanation and in fact
the forthcoming ANS for C will back him up on this.

>the VAX 4.2bsd cc agrees with me...

? The most buggy of the generally-used C compilers, eh?

>whilst the SUN 4.2 cc feels otherwise...

Is Sun up to SunOS Release 4.2 already?

All the argument for your "intuition" of what should happen is beside
the point -- there are regular semantics for C expressions, and no
legitimate compiler should circumvent them.  The whole point of
specifications and standards is to define exactly what the rules are.
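The failure mode Guy describes can be pinned down without relying on implementation-defined char signedness by using unsigned char explicitly, to model a machine where plain char is unsigned; a small sketch:

```c
#include <stdio.h>   /* for EOF */

/* getchar() returns EOF (-1) as an int at end-of-file.  Stored into
   an unsigned char it becomes 255; promoted back to int for the
   comparison, 255 != -1, so the EOF test can never succeed. */
int eof_test_can_succeed(void)
{
    unsigned char c;
    c = EOF;             /* models: char c; c = getchar(); at EOF */
    return c == EOF;     /* always 0 with unsigned chars */
}
```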

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)

In article <13148@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>Microsoft C (if it really requires NULL to be 0L) is broken ...

True, but I doubt that the "large model" Microsoft C REALLY "requires"
that definition.  I suspect it was intended to make sloppy code such as
	func( 1.0, NULL, &a );
"work" even though such code is simply non-portable NO MATTER WHAT rules
one tries to impose on the definition for "NULL".  (For people who STILL
don't understand this, consider that sizeof(char *) need not be the same
as sizeof(int *), so the correct size for the second parameter to func()
depends on the particular pointer type that func() expects; there is NO
universal pointer size in such an implementation.)
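To make the hazard concrete, here is a hypothetical variadic helper (the name is invented for illustration) that scans its arguments the way execl() does:

```c
#include <stdarg.h>
#include <stddef.h>

/* Walks char pointers until it finds a null pointer terminator.
   Where sizeof(char *) differs from sizeof(int), an uncast 0
   terminator pushes the wrong number of bytes onto the argument
   list; only a properly cast null pointer is safe. */
int count_args(const char *first, ...)
{
    va_list ap;
    const char *p;
    int n = 0;

    va_start(ap, first);
    for (p = first; p != NULL; p = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}
```

The portable call is count_args("a", "b", (char *)0); an uncast 0 happens to work only where int and char * are the same size.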

Personally I would rather see the code BREAK, so that the programmer would
LEARN how to code portably and FIX the code for once and for all.  (This
is the same reason that I chose to make "ordinary" chars unsigned in my
Gould implementation of System V (as opposed to signed in my VAX
implementation), even though the native Gould compiler made them
signed, presumably to help prop up buggy code from VAX 4.nBSD.  I have seen
a lot of 3Bx AT&T code assume that chars are unsigned, which is just the
opposite bug.)  We occasionally hear (in this and the UNIX newsgroups)
marketing "justifications" for adapting implementations to certain
erroneous coding practices, but the customer is NOT "always right"!  In
fact, when I evaluate a system I would give NEGATIVE scoring points to
those vendors who deliberately pander to such errors.

ka@hropus.UUCP (Kenneth Almquist) (07/06/87)

Let me give a little history here.  In the mid 70's, there were two
documents describing C:  the C Language Tutorial and the C Reference
Manual.  The C Tutorial stated that the value of an assignment statement
was the right hand side, and the C compiler agreed.  When I read this, I
was impressed with the attention to detail this showed.  Unfortunately,
I was not so attentive to detail, and failed to note that the C Reference
Manual disagreed.  So when the stdio library was written, I started
writing, "char c; while ((c = getchar()) != EOF) ...", assuming that the
test for EOF would work as the C Language Tutorial said it would.

Eventually (around 1979 or 1980), the UNIX Support Group at Bell
Labs noticed the discrepancy between the C Reference Manual and the C
compilers, and modified the C compilers to agree with the Reference Manual.
This generally made the object code produced by the compilers larger, and
broke a bunch of existing code in subtle ways (the change was just made,
not announced), but certainly the discrepancy between the documentation
and the code had to be resolved somehow.  Probably the BSD Vax compiler
is based upon an early version of the AT&T Vax compiler which predated the
modifications.

As you can probably tell, I would never have made the value of the
assignment operator be the left hand side.  But this is a minor matter compared
with, say, the priority of the & and | operators, and Dennis Ritchie
endorsed the left hand side approach in the C Reference Manual, even if
he contradicted himself elsewhere.  It seems a little late to re-fight
the battle at this point.
					Kenneth Almquist

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/06/87)

In article <1219@ius2.cs.cmu.edu> edw@ius2.cs.cmu.edu (Eddie Wyatt) writes:
-... use NULL instead of 0 because there are those implementations of C
-(which are wrongly implemented) that don't treat 0 the way they should.

Name one.

DRIEHUIS%HLERUL5.BITNET@wiscvm.wisc.EDU (07/06/87)

First: the subject of NULL being defined as 0 has been explored
rather thoroughly over the time I read comp.lang.c, and as far as
I'm concerned an implementation of C with NULL being defined as
something other than 0 is plainly wrong.
Second: the open() call is defined as returning a file
descriptor, which is explicitly defined as a small, positive
integer. Therefore, checking against minus one is equivalent to
checking against being negative. As for the readability, I
dislike both non-zero constants (as in 'fd == -1') and
symbolic constants (as in 'fd == ERROR'), if the definition
of (in this case) a file descriptor specifically says that
only *non-negative* values are valid (it says so on page 159 in K&R
for the truly devout).
Of course, you should keep in mind that fopen() returns EOF,
which is defined as -1, and that assuming that any negative
return from fopen() means an error has occurred, is non-portable
as far as machines with possibly valid negative pointers are
concerned (This is the case on small-model 808x for instance).
                                        - Bert
Bert Driehuis, LICOR Leiden, The Netherlands <DRIEHUIS@HLERUL5>
                        V.N.G. The Hague
<I speak for neither of the above>
P.S. While proofreading this letter, I noticed I had written
something to the extent that zero equals minus one (I thought:
'x == -1', I wrote down: 'checking against zero'). Now, I
remember someone being flamed for implying that if(!p) is
equivalent to if(p!=NULL), and retaliating that this provides
conclusive proof that C isn't readable. Well, I just proved
conclusively that English isn't writable (at least to me) - Bert.

edw@ius2.cs.cmu.edu (Eddie Wyatt) (07/06/87)

In article <8168@brl-adm.ARPA>, DRIEHUIS%HLERUL5.BITNET@wiscvm.wisc.EDU writes:
> Of course, you should keep in mind that fopen() returns EOF,
> which is defined as -1, and that assuming that any negative
> return from fopen() means an error has occurred, is non-portable
> as far as machines with possibly valid negative pointers are
> concerned (This is the case on small-model 808x for instance).
>                                         - Bert
> Bert Driehuis, LICOR Leiden, The Netherlands <DRIEHUIS@HLERUL5>
>                         V.N.G. The Hague

  fopen returns EOF on error???  My man page says it returns NULL (0)
if it can't open a file.

  If what you really meant was open, well my man page says it
returns -1 not EOF on error.
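To spell out the two conventions being untangled here (the file name below is just an assumed example):

```c
#include <stdio.h>

/* fopen() reports failure with a null FILE pointer -- not EOF --
   while the lower-level open() system call reports failure with -1. */
int can_fopen(const char *name)
{
    FILE *fp = fopen(name, "r");

    if (fp == NULL)       /* failure: null pointer, never EOF */
        return 0;
    fclose(fp);
    return 1;
}
```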

          (This really doesn't belong in comp.lang.c)

-- 
					Eddie Wyatt

e-mail: edw@ius2.cs.cmu.edu

terrorist, cryptography, DES, drugs, cipher, secret, decode, NSA, CIA, NRO.

preece@ccvaxa.UUCP (07/06/87)

   stevesu@copper.TEK.COM:
> It has been pointed out that my original posting had a blatant
> error in it: (!p) and (p != NULL) are not at all equivalent, but
> in fact are exactly opposite.  I was at first acutely embarrassed,
> but I am now flatly amazed that the statements have been argued
> so hotly, by some of the best minds on the network, without
> noting this crucial fact.
----------
Actually, when I wrote my response to the original note I did
notice and correct the error in my own text; I didn't comment on
the error, though I chuckled.

Why, though, does the presence of the error and your own failure to
notice it imply to you that one form or the other is preferable?
All it says to me is that MIXING the two forms is dangerous.

While we have, here, house coding standards, the bottom line is
"Make it fit in with the surrounding code."  If you are maintaining
a program written in "if (!p)" style, PLEASE, PLEASE don't decide
that you want to use the "right" "if (p == NULL)" form in your
code; likewise don't use "if (!p)" if the rest of the program
uses "if (p == NULL)".  Consistency within a program is the
most important thing.

-- 
scott preece
gould/csd - urbana
uucp:	ihnp4!uiucdcs!ccvaxa!preece
arpa:	preece@Gould.com

rwa@auvax.UUCP (Ross Alexander) (07/06/87)

In article <225@amanue.UUCP>, jr@amanue.UUCP (Jim Rosenberg) writes:
> In article <262@auvax.UUCP>, rwa@auvax.UUCP (Ross Alexander) writes:
> > The following discussion is all relative to [... and other maunderings]

I withdraw!  I recant!  assignment-produces-an-implicit-cast, see K&R Appendix
A, 7.14!  I wuz rong!!!  Please let's save net bandwidth (and my shattered
ego).  Gads.  Everybody's an expert [grumble mutter rhubarb rhubarb rhubarb].

As an aside (I _DO NOT_ propose this be made standard), just as a very
personal, off-the-cuff remark, the rule does seem like a bit of a lose.
There must be a good reason for it.  What is it?  After all, after the
store the original value is still sitting around in a register (of course
you can dream up architectures where this is not so, yes, I know, I know).

Anyway, it annoys me to see what I think of as a side-effect (the
store) throwing away perfectly useful information - and then the
first thing that happens after the store is that the char expression is
re-promoted to int!  Waste motion, waste motion...  Yes, a clever
compiler will do the comparison char-to-char to avoid doing the
promotion, because it's equivalent (in a formal sense) to the
int-to-int case, but the whole original point was that the assignment
introduces a bug through the loss of the out-of-band value (EOF).  

So as Guy Harris observes, the vax 4.2bsd compiler generates incorrect
code.  Yet it seems to me that this code is more robust than the
`correct' code :-).  This is something for the designers of P to
think about, I guess.

However, the semantics of C are perfectly clear on this, and I was wrong.

...!ihnp4!alberta!auvax!rwa	Ross Alexander @ Athabasca University

mpl@sfsup.UUCP (07/06/87)

In article <13112@topaz.rutgers.edu>, ron@topaz.rutgers.edu.UUCP writes:
> To do this right you need an extra int temporary value
> 
> 	while ((i = getchar()) != EOF)
> 	    or
> 	while( i = getchar(), i != EOF)
> 
> followed by
> 	    c = i;

Since when?  What's wrong with:

	int	c;

	while ((c = getchar()) != EOF) {
		/* use c as you would any char variable */
		/* because chars are promoted to int ANYWAY */
		/* in expressions - no need for a temp variable */
	}

jfh@killer.UUCP (07/06/87)

In article <221@amanue.UUCP>, jr@amanue.UUCP (Jim Rosenberg) writes:
> In article <13008@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie)
> writes:
> >  [... other pet peeves ... ]
> >
> > 2.  Needless use of the comma operator and parenthesis to demonstrate
> >     manhood to the obliteration of code readability, e.g.
> > 
> > 	    if((fd=open("foo",1))<0)
> > 
> >     SCREW this, too difficult, how about writing the code to indicate
> >     what is going on:
> > 
> > 	    fd = open("foo", 1);
> > 	    if(fd == -1)
> 
> I recall many moons ago whilst browsing K&R and before really learning C that I
> swore up and down that I would forego such (what I thought at the time to be)
> unwarranted over-abbreviation as:
> 
> 	while ((c = getchar()) != EOF) {
> 
> It didn't take me long once I was actually using C on a regular basis to
> realize that forgoing constructs such as the one above is not only ridiculous,
> a case can be made that it is actually LESS CLEAR stylistically.  Yes, it is
> more difficult to read for novices to C.  But consider the alternative:
> 
> 	for (;;) {
> 		c = getchar();
> 		if (c == EOF)
> 			break;
> 
> I would argue that a strong case can be made that the for loop is actually
> LESS CLEAR than the while loop.  By announcing the for loop as a forever loop,

I guess this is why some programmers make more than others.  I find the
open() example to be perfectly readable.  The while() example is also
my preferred method.  Some of the other code I preferred to use:

	struct something {
		int	whoknows;
		char	whatever;
	};

	func (member)
	struct	something	*member;
	{
		if (member && member->whatever == ILLEGALVALUE)

relying on the left-to-right order of && evaluation.  I hated PASCAL in
college because the above had to be coded

		if member <> nil then
			if member^.whatever = ILLEGALVALUE then
				. . . .

Also, I like to perform if-then-else's in expressions sometimes so
the value inside a block of code is more uniform -

	if (isupper (c) || (islower (c) && (c = toupper (c)))) { ...

This is a lousy example for me since I almost never use ctype ... but
you get the point.  Inside the then part 'c' is an UPPER case letter -
if there is an else-part then c is not a letter of any case.
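Unpacked, the condition above behaves like this helper (the name is invented for illustration):

```c
#include <ctype.h>

/* In the then-branch, c is guaranteed to be an upper-case letter:
   either it already was, or the embedded assignment c = toupper(c)
   made it one.  Returns -1 when c is not a letter of any case. */
int force_upper(int c)
{
    if (isupper(c) || (islower(c) && (c = toupper(c))))
        return c;
    return -1;
}
```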

My favorite this year is

	switch (fork ()) {
		case -1:	/* fork() failed */
			perror ("some message");
			break;
		case 0:		/* in child process */
			execl ("my_prog", "my_prog", (char *)0);
			perror ("some other message");
			exit (-1);
		default:	/* in parent process */
			who_knows_what ();
	}

I got tired of declaring int fork(); on one line, int child on another and
then having to write all of those stupid if-then-else's.  This example seems
to show the _exact_ nature of the fork() system call.  My big beef with
the for(;;) example the poster gave is that it is a perfect example of a
middle tested loop - and in this case the code to make it a top tested
loop is very simple.  If you want to use a for() loop, at least use a
good one:

	for (c = getchar ();c != EOF;c = getchar ()) { ...

I don't care for this much because of the two getchar calls on the same
line - looks wasteful. And besides,

	while ((c = getchar ()) != EOF) { ...

has the exact same behavior and may well optimize to the exact same code.
(It can be optimized to exactly the same code as the for() loop).  But
then I never assume anything about my optimizer.

I ask the group, which do you prefer?

	if (access (file, 0) == 0) {
		fd = open (file, 0);
		if (fd >= 0)
			good_stuff ();
		else
			error_handler ();
	} else
		error_handler ();

	- or -

	if (access (file, 0) == 0 && (fd = open (file, 0)) >= 0)
		good_stuff ()
	else
		error_handler ();

When I see the first example, I read it in English as

	If I can access the file, then
		get a file descriptor for the file from open ();
		if the file descriptor is legal, then
			do good_stuff();
		else
			do error_handler();
	else
		do error_handler();

and by the time I am down to the second call to error_handler I have
forgotten why the test failed in the first place.

The second example I read as

	If I can access the file and get a legal file descriptor from open, then
		do good_stuff();
	else
		do error_handler();

I might even be inclined to throw in a || creat (file, 0666) someplace in there
just in case I might want to have the file no matter what.

This to me is concise code.  Anything more is a waste of my time.

- John.

henry@utzoo.UUCP (Henry Spencer) (07/06/87)

> Some people (Dijkstra and Gries among them) no longer believe that this
> kind of "conservative" test is preferable to the exact test; they base
> this on what it takes to prove an implementation of an algorithm to be
> correct.

Since in the general case such proof requires the use of the ultimate high-
level software tool -- a graduate student -- those of us without access to
such facilities can safely disregard this argument! :-) :-)
-- 
Mars must wait -- we have un-         Henry Spencer @ U of Toronto Zoology
finished business on the Moon.     {allegra,ihnp4,decvax,pyramid}!utzoo!henry

jpn@teddy.UUCP (John P. Nelson) (07/07/87)

>Microsoft C (if it really requires NULL to be 0L) is broken by all
>standards that C compilers have ever been designed to.  K&R explicitly
>defined 0 comparisons to pointers to be the rule and no one since has
>come up with any convincing arguments to the contrary.

I don't know, I think this is defensible.  The reason that Microsoft did
this was so that UNIXisms like:

   execl(a, b, c, NULL);

continue to work even on the bizarre 8086 architecture.  I realize that
this is strictly incorrect (NULL should be cast to (char *)), but a lot
of existing code depends on this behavior.  Of course, pointer
assignments and comparisons to (int)0 MUST continue to work properly
also, or the compiler is broken (and in fact, they DO work properly)!

dg@wrs.UUCP (David Goodenough) (07/07/87)

In article <1213@carthage.swatsun.UUCP> rice@swatsun (Dan Rice) writes:
>	Here's a simple question for you.  Suppose I have defined structures
>containing other structures, i.e., 
>
>typedef struct {
>	float x, y, z;	/* Coordinates of a vector in 3-space */
>} vector;
>
>typedef struct {
>	vector o;	/* Center */
>	float r;	/* Radius */
>} sphere;
>
>sphere s1, *s2;
>
>Now, say I refer to s1.o.x and s2->o.y.  Does the compiler convert this into
>a simple address reference at compile time, or is work performed at runtime?

It's done at compile time.

>Should I define
>	vector center;
>	center = s2->o;
>if I plan to use s2->o several times?  Thanks for any help.

You'd do better with

	vector *center;
	center = &(s2->o);

and refer via center->. You will get two savings out of doing it this way:
whenever you do a structure reference e.g.

	foo.bar

the compiler internally has to convert it to

	(&foo)->bar

i.e. it's more efficient to work with the address of a structure; and

	center = s2->o  

is a whole structure assignment (12 bytes worth of move on a 68K)
whereas

	center = &(s2->o)

is only a pointer assignment (i.e. 4 bytes). NOTE also that these two
will have very different results if you start assigning back into
center: with your method since center is a complete new copy of the
structure

	center.x = 3.0

would not assign 3.0 into s2->o.x; whereas with pointer work

	center->x = 3.0

drops the value into s2->o.x - something to be aware of when you're deciding
which way to go.
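The difference can be demonstrated directly; a minimal sketch using the same typedefs:

```c
typedef struct { float x, y, z; } vector;
typedef struct { vector o; float r; } sphere;

/* Copy semantics: the write lands in the copy, not the original. */
float write_via_copy(sphere *s2)
{
    vector center = s2->o;     /* whole-structure block move */
    center.x = 3.0f;           /* s2->o.x is unchanged */
    return s2->o.x;
}

/* Pointer semantics: the write goes through to the original. */
float write_via_pointer(sphere *s2)
{
    vector *center = &s2->o;   /* just an address, no copy */
    center->x = 9.0f;          /* drops the value into s2->o.x */
    return s2->o.x;
}
```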
--
		dg@wrs.UUCP - David Goodenough

					+---+
					| +-+-+
					+-+-+ |
					  +---+

mc68020@nonvon.UUCP (mc68020) (07/07/87)

   I have been reading all the articles in this series on readable code.  I
have seen some thoroughly ridiculous and outrageous comments and suggestions,
fortunately more than outweighed by the rational, well thought out comments
of a few of our resident experts.

   What bothers me, however, is that while all of this discussion about style,
readability and portability is educational and even, for the most part, 
enjoyable, I have seen not one mention of what I consider to be the single
most important element of well written, portable code (in *ANY* language,
not just C!).  That element is DOCUMENTATION.  COMMENTS.  

   It is all well and good for the expert C guru to casually dismiss this 
with the most common (and most heinously WRONG) argument "any programmer
good enough to work on this code should be able to figure out what it does."

   Sure, a good programmer needs to be able to analyze a piece of code.  This
does not by any means imply that *EVERY* piece of code written should force
such analysis through lack of proper documentation.  Such excuses are precisely
that: the rationalizations of lazy, arrogant programmers.

   The plain fact of the matter is that any program that is undocumented is
a poorly written program, I don't care *WHO* wrote it, how elegant the
executable code is, how beautiful the algorithms...without documentation,
the program is incomplete.

   As an example, there is a small C program included in the news 2.11
distribution, under the misc directory, named article.c.  I will not name
any names, but it was written by a very well known and highly respected
member of the UNIX/net community.  I am sure that the program does whatever
it does elegantly and efficiently.  Therein lies the rub:  What the ****
does the program do?  No documentation header, no comments saying:  this program
does XYZ.

   There are a few scattered comments in the code which are virtually useless
to anyone but a guru level programmer, and probably not of much value to them.

   This program should NOT have been included in the distribution.  What
for?  So I can take several hours to several days of my time analyzing it
to figure out what the **** it does?????  Of course, in all fairness, it is
almost as well documented as the news software itself.

    I *HATE* approaching the problem of porting some piece of code from 
mod.sources to my system, because in 98% of the cases, there is little or
no documentation included.

    It is *WONDERFUL* that someone took the time to write a piece of useful
code, and chose to share it with the rest of us.  Commendations and kudos
to the contributors!  I should think, however, that they would be ashamed
to place in the public domain incomplete, poorly written programs with
THEIR names on them! (By that I am referring to the above harped-upon issue
of undocumented programs being incomplete programs.)

    Frankly, I see two major classes of people foisting this crap off on the
world:  those who are simply too lazy to do things correctly (probably far
and away the largest group), and those whose arrogance causes them to
deliberately obfuscate.

    Yes, I accuse many of the experts, particularly the older experts,
of deliberately making their code difficult to read, and to understand.
I believe their attitude is something like: "*I* had to suffer through
poorly documented code, and no comments, so everyone else should too."

   I have had some experts say such things to me in so many words.  I say
to them, and to all of you, that such an attitude is anti-social, counter-
productive, rude, and downright vicious.  So far as I am concerned, if that
is your attitude, then KEEP your damned knowledge, because it is flawed,
just as you are flawed.

   My apologies for the long-windedness and slight flamage.  This is an issue
I take very personally.

dg@wrs.UUCP (David Goodenough) (07/07/87)

In article <262@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
>The following discussion is all relative to
>
>	#define EOF -1
>	char c;
>	while ( ( c = getchar() ) != EOF ) { /* do something */ }

In the above case I have worked this on several machines (PDP 11/44A,
VAX 11/750, 68K, Z80) and in the first three cases I get the following
_WARNING_:

"Non-portable character comparison"

However on the Z80 it gets it wrong (because chars are implicitly
unsigned) .....

>I argue
>that this is because in this context, the assignment is a side
>effect, and isn't intended to `narrow' the value of the expression.  
>Just to point out where the logical extension of case two gets us,
>what is the value of x in this fragment:
>
>	float x;
>	int i;
>
>	x = 2.0 + ( i = 3.5 );
>
>I would say 5.5; others might say 5.0, it seems.  But if I _wanted_
>5.0, I would expect to write
>
>	x = 2.0 + (int) ( i = 3.5 );
>
>and I appeal to the principle of least astonishment for justification.

/*   FLAME THROWER OUT!!  :-)  */

Now hold on just a minute !!!!! - aren't we forgetting something about
the order of evaluation rules here: the (i = 3.5) _MUST_ be evaluated
first, and evaluates to an _int_ type expression whose value is 3 (i.e.
the value assigned into i), so we effectively have:

	x = 2.0 + 3

and the day that evaluates to 5.5 I'm gonna jump off the Golden Gate:
Think about the following:

	float x;
	float y;
	int i;
	int j;

	j = (x = 2.0 + (i = 3.5));

	    (y = 2.0 + (i = 3.5));

Are you going to try to convince me that the two lines above put different
values into x and y??????, because that seems to be what is being said here:
type coercion works from the INSIDE OUT, NOT the OUTSIDE IN, i.e. let
each expression look after itself; i.e. you can't cast the (i = 3.5)
based on the fact that it's about to be assigned into a float, it is
an integer type expression pure and simple.
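The inside-out rule is easy to check; a small sketch:

```c
/* (i = 3.5) truncates to the int value 3 before the addition, so
   the result is 2.0 + 3 == 5.0, never 5.5 -- no matter what the
   whole expression is later assigned to. */
float coercion_demo(void)
{
    float x;
    int i;

    x = 2.0 + (i = 3.5);
    return x;
}
```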

/*   FLAME THROWER AWAY!!  :-)  */

--
		dg@wrs.UUCP - David Goodenough

					+---+
					| +-+-+
					+-+-+ |
					  +---+

guy%gorodish@Sun.COM (Guy Harris) (07/07/87)

> I don't know, I think this is defensible.  The reason that Microsoft did
> this was so that UNIXisms like:
> 
>    execl(a, b, c, NULL);
> 
> continue to work even on the bizarre 8086 architecture.

1) This is NOT a "UNIXism".  There are UNIX implementations on which this
does NOT work.

2) The 8086 architecture may be bizarre, but this doesn't have a lot
to do with pointers and integers not being the same size.  There is
at least one 68000 C implementation on a UNIX system that provides
16-bit "int"s and 32-bit pointers, and *this* is defensible as well,
given the 68000 and 68010's incomplete support for 32-bit arithmetic.
(On balance, going with 16-bit "int"s on a 680[01]0, especially in a
UNIX system, may not be the right choice, given that:

	1) the maximum value representable in an "int", in pre-ANSI C
	   implementations, is the maximum size of an object and the
	   largest value you can use as a size for "malloc" *OR*
	   "realloc"

and

	2) the 68020 makes it less of a win to have 16-bit "int"s, so
	   you'd either have to sacrifice binary compatibility when
	   upgrading, throw in hacks to make it work, or hobble the
	   68020

but going with 16-bit "int"s is certainly not INdefensible.)

3) There are probably some architectures that most people would
consider more bizarre than the 8086, for which a C implementation
would make sense, but where

	1) not all pointers are the same size

or

	2) null pointers are NOT represented as a string of
	   zero-valued bits

and, as such, on which you MUST properly cast null pointers when
passing them as arguments.  Defining NULL as 0L may act as a
Band-Aid(TM) for some architectures, but it may merely postpone the
day of reckoning in some cases.  One might as well do it right in the
first place....

> I realize that this is strictly incorrect (NULL should be cast to
> (char *)), but a lot of existing code depends on this behavior.

It's not a question of '"strictly" incorrect', it's a question of
"incorrect".  There was some code in 4.1BSD that "depended" on
location 0 containing a byte with a value of zero, but, well, that
code doesn't work everywhere and has to be fixed.  There may be cases
where contorting the implementation to cater to bad code causes more
problems than it solves.  And if you don't supply negative
reinforcement, some bad habits are never unlearned.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

dave@sds.SciCom.MN.ORG (dave schmidt x194) (07/07/87)

> > I don't know, I think this is defensible.  The reason that Microsoft did
> > this was so that UNIXisms like:
> > 
> >    execl(a, b, c, NULL);
> > 
> > continue to work even on the bizarre 8086 architecture.
> 
> 1) This is NOT a "UNIXism".  There are UNIX implementations on which this
> does NOT work.

Absolutely true.  However, looking through my Microsoft Xenix manual,
I can find correct synopses such as

	int execl( path, arg0, arg1, ..., argN, (char *)0 )
	char *path, *arg0, *arg1, ..., *argN;

and

	int wait(stat_loc)
	int *stat_loc;

	int wait((int *)0)

but I can also find incorrect (or at least misleading) statements such as

	char *strtok(s1, s2)
	char *s1, *s2;

	the description for strtok() states:
	"... Subsequent calls with NULL for the first argument ..."

and

	char *defread(pattern)
	char *pattern;

	the description for defread() states:
	"... or it [the program] may call defopen with NULL, ..."

which certainly implies that strtok(NULL, s2) and defread(NULL) are valid
calls.  Hopefully, the misleading implications above are peculiar to the
MS manuals and are not indicative of what the System V or BSD manuals say.
If, however, other manuals mislead the reader in this manner, then it is
small wonder that people (a) don't cast NULL when passing it as an argument
to functions, (b) regard things such as execl(a,b,c,NULL) as a "UNIXism",
(c) think (as I used to, before my enlightenment :-) that #define-ing NULL
as something other than 0 (like (char *)0) is correct, since if it were
otherwise one couldn't write defread(NULL) which the manual says is OK.

Anyway, how about this: if your reference manuals seem to imply that 
passing NULL to a function without a cast is an OK thing to do, drop
me an e-mail message and I'll summarize to the net later if there is
sufficient response.

Dave Schmidt

{cbosgd, ihnp4, rutgers}!meccts!sds!dave

wcs@ho95e.ATT.COM (Bill.Stewart) (07/08/87)

In article <262@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
: The following discussion is all relative to
: 
: 	#define EOF -1
: 	char c;
: 	while ( ( c = getchar() ) != EOF ) { /* do something */ }
: 
: In article <22635@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
: > > The assignment into a character causes getchar's int return
: > > to be changed to char, invalidating the following comparison> 
: > On a machine where "char" is not signed, this won't work even if you
: > run it on text consisting entirely of 7-bit characters.  "getchar"
: > will return EOF when it hits end-of-file; EOF (-1) will get converted
: > to '\377' ... which compares equal to 255 but not to -1.
: 
: Isn't there a little room here for argument?  Now of course K&R say
: that an expression is coerced to the type it is being assigned to.
	As with other expressions, the definition of the language
	specifies the value and type of the expression.  For an
	assignment expression "lvalue = rvalue", the type is the type
	of the lvalue, and the value is the value assigned to the lvalue.

: But isn't assignment, in the strictest sense, a side effect?
	Yes, in some sense.  Consider ++n and n++.  Both have the side effect
	of incrementing n, but they have different values.

: and I appeal to the principle of least astonishment for justification.
	Consider the case 
		char c,d;
		c = -1;
		printf( "%d %d\n", c, (d = -1) );
	By "least astonishment", it should print the same number twice.
	On your VAX, that will be -1 -1; on my 3B it will be 255 255,
	because characters have values in the range 0 - 255, not -128 - 127.
	While the 255 is mildly surprising, it's a lot less surprising than
	255 -1 would be.  But yes, people occasionally get surprised
	when they move code from VAXen to other machines.
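The same demonstration with the signedness spelled out, so it behaves identically on every machine; a minimal sketch:

```c
/* Plain char may be signed (VAX-like) or unsigned (3B-like).
   Writing the signedness explicitly shows both outcomes after the
   usual promotion to int. */
int char_minus_one_signed(void)
{
    signed char c = -1;
    return c;              /* -1: sign-extended on promotion */
}

int char_minus_one_unsigned(void)
{
    unsigned char c = -1;
    return c;              /* 255: wraps modulo 256 */
}
```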
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

devine@vianet.UUCP (07/08/87)

In article <13148@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie) writes:
> Microsoft C (if it really requires NULL to be 0L) is broken by all
> standards that C compilers have ever been designed to.  K&R explicitly
> defined 0 comparisons to pointers to be the rule and no one since has
> come up with any convincing arguments to the contrary.

  Yes, MS does define NULL to be 0L for large model programs.  It is
correctly set to the integer constant 0 for small and medium models.
The error is compounded by having NULL #define'd like that in *several*
include files -- not just stdio.h.

  That split definition comes from the MS decision [*] to support its
linker.  This thinking also created the atrocity of "near" and "far" pointers.

Bob Devine

[*] It also may be argued that this all stems from Intel's decision to be
    forever backward-compatible with an old chip design.

jr@amanue.UUCP (07/08/87)

In article <1221@ius2.cs.cmu.edu>, edw@ius2.cs.cmu.edu (Eddie Wyatt) writes:
> In article <221@amanue.UUCP>, jr@amanue.UUCP (Jim Rosenberg) writes:
> > 	while ((c = getchar()) != EOF) {
> > 		.
> > 		.
> > 		.
> > 
> > ... But consider the alternative:
>                    ^^^^^^^^^^^^^^^
> > 
> > 	for (;;) {
> > 		c = getchar();
> > 		if (c == EOF)
> > 			break;
> > 		.
> > 		.
> > 		.
>    "the alternative"!  Come on there are a thousand and one ways to code
> the semantics of the above loop.  For example :
> 
> 
> 	c = getchar();
> 	while (c != EOF)
> 	    {
> 	    .
> 	    .
> 	    .
> 	    c = getchar();
> 	    }
> 


I don't mean to flame you, but as it happens I believe your solution is
dead wrong, on two counts.  (1) The two loops are *NOT EQUIVALENT*!  They
may be equivalent to Pascal programmers, but they certainly are not to C
programmers.  The reason is simply that in your version you must completely
forego the use of the continue statement.  The fact that break and continue
are missing from Pascal is a general class disaster, whereas the fact that
they're present in C is a constant joy.  A continue statement inside your
loop will obviously fail to reexecute getchar().  I suppose you could get
around that as follows:

	for (c = getchar(); c != EOF; c = getchar()) {
		.
		.
		.
	}

which frankly I don't find objectionable.
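Jim's point about continue can be made concrete with a short sketch (the helper name and the non-space count are mine, purely illustrative): in a for loop, continue transfers to the third expression, so the next read still happens.

```c
#include <stdio.h>

/* Count non-space characters in a stream.  The continue skips the
 * counting for spaces, yet the next character is still read, because
 * continue in a for loop jumps to the increment expression
 * (c = getc(fp)) before the test runs again. */
int count_nonspace(FILE *fp)
{
    int c, n = 0;

    for (c = getc(fp); c != EOF; c = getc(fp)) {
        if (c == ' ')
            continue;       /* safe: the re-read is not skipped */
        n++;
    }
    return n;
}
```

With the while (c != EOF) form that re-reads at the bottom of the body, the same continue would skip the re-read and loop forever on the first space.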

(2) Once again I must protest that your version is really less clear.  Think
about how *pseudocode* for this loop might look:

	while (the next character is not EOF) {
		.
		. do stuff with that character
		.
	}

As I understand it the whole idea behind structured programming is to build
the logic insofar as possible by making boxes, where the control flow is
determined by what kind of box you have.  If we ignore (1) above your method
will work, but it doesn't really *EXPRESS THE THOUGHT* cogently.  It is not
c I'm really testing for EOF -- c is a handy tool that I use because I have
to have a variable name.  The real thought here is that it's "the next
character" that I'm testing for EOF.  By putting the call to getchar() right
there in the while statement I state clearly that this is a "testing-the-next-
character" box -- which your code doesn't.

It's hard for me to fathom why the idea of assignment as an operator causes
such consternation.  It's obviously due to the fact that most programmers
learned some other language before they learned C (so did I for that matter)
and you get caught in a rut where it becomes hard to escape the habit that
assignment is a statement, and so must go on a line by itself.  Requiring
assignments to be on a line by themselves is no more necessary to code
clarity than adding dozens of useless temporary variables to arithmetic
expressions because you feel you should never be faced with the sight of
3 right parentheses in a row.

I will grant anybody that this is an aesthetic or philosophical argument,
and there are bound to be differing points of view.  Where I will get upset
is with those who believe that legitimate coding techniques like

	while ((c = getchar()) != EOF) {

are mere attempts to show off.  For *me* personally, putting the assignment
into the condition of the while statement is more clear and says what I mean
much more effectively than your version.  If you like, call it a matter of
personal taste.
-- 
 Jim Rosenberg
     CIS: 71515,124                         decvax!idis! \
     WELL: jer                                   allegra! ---- pitt!amanue!jr
     BIX: jrosenberg                 seismo!cmcl2!cadre! /

shaffer@operations.dccs.upenn.edu (Earl Shaffer) (07/08/87)

YEA!!!  Finally, someone has the guts to say that bad code is bad
code!  Code is not a sculpture.  Commercial companies are interested
in a commercial product.  It must work, be maintainable, and be
tested.

People take that "... real programmers don't comment their code; if it
was hard to write, it should be hard to understand" seriously!  Are
they working for your company?
==============================================================================
Earl Shaffer - University of Pennsylvania - Data Communications Department
"Time was invented so that everything wouldn't happen at once." Steven Wright
==============================================================================

dsill@NSWC-OAS.arpa (Dave Sill) (07/08/87)

Ross Alexander (auvax!rwa) has been blasted for saying:
>Isn't there a little room here for arguement [over the value of an
>assignment expression]?

But if one reads his next sentence:
>Now of course K&R say
>that an expression is coerced to the type it is being assigned to.
one can see that he was not disputing "the correct behavior" of C
compilers implementing K&R, but was asking us to question whether this
is consistent with the philosophy of the language.

Given this context, his point:
>So the coercion for the assignment of `int' to `char' in
>this case is a side effect and shouldn't have an affect on the value
>of the whole expression `( c = getchar() )'.
makes a lot of sense.  Another example may help clarify the point.
Consider
	long i, func();
	short j;

	i = j = func(...);
Under K&R, the return value of func will be converted to short before
the assignment to j takes place.  If, however, this value is too large
to fit in a short it will be truncated.  The value assigned to i,
then, will be the one assigned to j.
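A sketch of the conversion Dave describes (the function name is mine; it assumes only the K&R rule that the value of an assignment is its value after conversion to the left-hand type):

```c
/* The value of (j = x) is x converted to short, so i receives the
 * possibly-truncated value, never the original long. */
long chain_assign(long x)
{
    long i;
    short j;

    i = j = x;      /* x -> short -> long */
    return i;
}
```

On a machine with 16-bit short and 32-bit long, chain_assign(0x12345L) yields 0x2345, not 0x12345.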

Ross's point, I believe, was that this is probably more likely to
cause trouble than if the value of an assignment expression was
defined to be the right-hand side. I wouldn't suggest changing the
semantics of C in this case, but it's something to keep in mind.

-Dave Sill
 dsill@nswc-oas.arpa

The opinions expressed above are those of the author and do not
reflect the opinions or policy of the Department of Defense or the
U.S. Navy.

ron@topaz.rutgers.edu.UUCP (07/08/87)

If your variable "c" is a character, all of your alternatives are wrong.
EOF is an integer -1 return from getchar.  If you cast it (or assign, same
thing) to char before the test, the best you can hope for is that you will
only get misleading information from time to time (in the case that 0xFF
was really input) or that it doesn't work at all (where (char)  -1 is 0xFF
and won't ever equal -1).
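Ron's failure mode, pinned down in a sketch (the helper is mine; signed char is spelled out so the arithmetic comes out the same everywhere):

```c
#include <stdio.h>

/* Simulate storing getchar()'s result in a char on a signed-char
 * machine: the byte 0xFF narrows to -1 and becomes indistinguishable
 * from EOF, so a loop testing a char would stop early on real input. */
int mistaken_for_eof(int byte)
{
    signed char c = (signed char)byte;  /* the bug: narrowing the result */

    return c == (signed char)EOF;
}
```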

-Ron

gwyn@brl-smoke.UUCP (07/09/87)

In article <598@nonvon.UUCP> mc68020@nonvon.UUCP (mc68020) writes:
[stuff about insufficient comments]

I think the reason there hasn't been much discussion about code commenting
is because the need for it is well understood and drummed into the heads
of most CS students.  You are quite correct in remarking that much code
available in the UNIX community is not very maintainable, with lack of
documentation being a primary factor.  Remember, however, that a lot of
the original UNIX code was developed as "fast prototypes" by people who
needed functions for their research projects, so that production quality
was not an issue for them.  What is much harder to excuse is the fact that
the code quality and documentation have not been much improved by the folks
who package and commercially license it.  One wonders what their job is, if
not to clean up the product before distribution.  BWK's "pic" particularly
comes to mind as a very useful tool that has some incredible nonportable
"quick hacks" that should have been fixed inside AT&T, not by me and others
who receive the code after forking over a lot of money.

Another solution might be for even the researchy types to exert care to
make their code high-quality and as portable as possible.  I've found that
it doesn't take much longer to produce quality code, and in the long run
EVEN FOR ONE'S PERSONAL USE it saves time to do so.  Many's the time when
I've thanked the DAG of past years for anticipating future maintenance
questions and providing helpful information in the original sources.

I haven't even mentioned structured software design methodology, which
inherently produces accurate, complete software documentation.  (UNIX
types tend to resist doing things in an organized manner.)

garys@bunker.UUCP (Gary M. Samuelson) (07/09/87)

In article <598@nonvon.UUCP> mc68020@nonvon.UUCP (mc68020) writes:

>   ... while all of this discussion about style,
>readability and portability is educational and even, for the most part, 
>enjoyable, I have seen not one mention of what I consider to be the single
>most important element of well written, portable code (in *ANY* language,
>not just C!).  That element is DOCUMENTATION.  COMMENTS.  

I heartily agree with the above and with most of what followed (deleted
to save space).

Since code will be read more times than written (hopefully, it will be
written only once), it should be written in such a way that it is easy
to read.  The headaches you save may be your own.  In particular, one
should resist the temptation to be clever just for the sake of cleverness.
Six months later, you won't remember why you yourself wrote that bizarre
piece of code, or why it works.  Even if you do, the peculiar circumstances
that made it work then don't apply anymore.

>    Frankly, I see two major classes of people foisting this crap off on the
>world:  those who are simply too lazy to do things correctly (probably far
>and away the largest group), and those whose arrogance causes them to 
>deliberate obfuscate.

Well, there is a third class.  Some of us are actually not permitted
to "do it right."

Out in the "real world," as they called it when I was still a student,
we have things called schedules and deadlines.  Documentation doesn't
get written because it takes time and might, the bosses say, delay the
introduction of our product past the time when it would make economic
sense to produce it.  Now I personally think this is shortsighted, and
penny-wise-but-pound-foolish, and that investing time in documentation
would save more than enough development time to justify it, even on the
short-to-moderate term, but I don't know how to prove it to the powers
that be.  Those one or two levels up say that they believe in the value
of documentation, but somewhere up the line the commitment is not there.
"Yes, documentation is important, but so-and-so customer is screaming
for such-and-such a feature, so we can't schedule time to write anything
but code."

I'm sure that there are published studies which show that writing
good documentation results in reduced development time, fewer bugs,
and ease of maintenance. Pointers, please?

But even if I present all the theory in the world, and point to my
own experience (MY software is always within a month of schedule),
the reply is always the same: "So-and-so customer is screaming...
we're going to lose a several million dollar contract..."  But at
some point improving the car is no longer worthwhile; it's time to
build a plane.  And while we're working on the plane, someone has
to start thinking about the rocketship.

People -- managers being no exception -- tend to believe what they
want to, and don't want to change (they would have to admit being
wrong in order to change).  Not even real life examples are always
persuasive.  In one case, I said to the boss three levels up, "Behold,
I did it my way, even the way I have been recommending as a better way,
and lo, the project was done on time, yea, even early."  (On time
around here is rare; early is unheard of.)  And this person actually
told me that if I had done it the Traditional way, it would have
taken even less time.  An unassailable position, false though it be.

Well, managers have their problems, too.  Managers have to be able
to measure progress, and it's hard to measure the progress in a software
development effort, especially when software, per se, is not currently
being developed, but documentation is.  Lines of code per day is a rotten
metric, but what else is there?  LOC is zero during documentation, and
artificially high for inefficient programmers (if Goofus writes 100 lines
to perform a function that Gallant's 10 line function performs, it's Goofus
who appears more productive by the LOC metric).  Some measurement more like
features per fortnight is needed, but it's still going to be zero
during the documentation phase.

Then, when Goofus puts in lots of overtime to clean up what he should
have done right in the first place, he looks like the hero, and he gets
the bonus or the promotion.  Gallant, who works much more efficiently,
and has therefore completed his own tasks within a normal work week,
and still had time to help others, gets criticized for not putting out.

To change all this, you have to convince managers to start managing
differently.  So you say something like this: "Using our traditional
ways of development, this project is going to take two years (even though
the Old Guard who have been here forever have committed themselves to
a one year schedule).  Using the new state-of-the-art development
techniques, we (essentially, the new kids on the block) will complete
the project in 18 months.  This is six months more than the Old Guard
said it would take, but six months less than what it will really take
them, based on the fact that every project done by the Old Guard has
taken at least twice as long as the schedule said.  Since we are going
to spend two-thirds of our time planning and documenting, and only
one-third coding, if you use the metrics you are accustomed to using
(e.g., lines of code), there won't be any measurable progress for the first
12 months."

And the manager says, "I see.  You claim that our most senior,
most experienced engineers won't make their schedule.  You make
this claim with less than half the years of experience they have.
Further, you make this claim before the project even starts; before
they have run into any problems.  You further claim that you will
be able to meet your proposed schedule, even though the project hasn't
started, and even though you don't know what problems lie ahead of
you.  And you admit that for the first 12 months, I won't be able to
show *my* superiors any of the kind of progress that they have come
to expect, but after those 12 months are up, when the Old Guard says
they will be *done*, you will be ready to *start* coding.  And besides
all that, you expect me to measure your progress in a manner which you
yourself will prescribe."

"Well, yes, that about sums it up," you are forced to admit.

And the manager says, "Well, when you have a few more years of experience
then we will see what you can do when you are given a project to manage.
In the meantime, you'll have to work with the Old Guard and do things
their way."

And then you go off and learn how to do things the old way, and spend the
rest of your life holding one hand over your mouth (to keep from complaining
too much -- one has to have a good attitude, you know) and the other on your
stomach (to keep from retching every time you have to make your software
interface with spaghetti code that should have been re-written -- I won't
say re-designed, because it hasn't been designed the first time yet --
years ago).

>    Yes, I accuse many of the experts, particularly the older experts,
>of deliberately making their code difficult to read, and to understand.
>I believe their attitude is something like: "*I* had to suffer through
>poorly documented code, and no comments, so everyone else should too."

Well, I am not one of the older experts, but I have to confess to doing
what you say, but not for the reason you state.  I have written things
which were difficult to read not to confuse anyone, but because I was
trying to be "clever."  I have repented of such things, and try to
suppress that temptation now.  Ironically, some of these tricks I
gleaned from this network.

Gary Samuelson

rwhite@nu3b2.UUCP (07/10/87)

In general x = y = z as a "C" statement is simply a short way to write
y = z;
x = y;

If we make z long, y short, and x int and we invoke the proposed change
we get the unique situation that any of the following sequences MAY
or MAY NOT evaluate as true on a machine depending only on the size
of the types:
after x = y = z;

( x == y || y == z || x == z) WILL BE FALSE ON SOME MACHINES

Part of the idea of x = y = z where y is the smallest-ranged type is
that x and y WILL BE EQUAL on subsequent tests for all ranges and returns
of z.  If x is the smallest-ranged type, x and y WILL BE EQUAL only for
those ranges and returns of z which are valid for x.  This is one of the
core issues of return validation under "C".
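Robert's claim that x and y stay equal when y is the narrower type can be checked with a small sketch (the names are mine):

```c
/* After x = y = z with y narrower than x, x receives y's value after
 * conversion, so x == y holds for every z under the existing rule.
 * Under the proposed right-hand-side rule it would fail whenever z
 * overflows y. */
int equal_after_chain(long z)
{
    int x;
    signed char y;

    x = y = z;      /* x gets the already-narrowed value of y */
    return x == y;
}
```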

The question can be simplified to:

Do you think: y = z; x = y; if (x == y) {...};
AND	    : x = y = z; if (x == y) {...};
should behave in an identical manner on all machines?

Before you say no, consider flag masking and complex algebra.  Simple
functions like:

#define	FLAGPART	0x003f
#define ERROR		0x0010

...

	if ((tempval &= FLAGPART) != ERROR) {
		/* Process Other VALID Flags */
		}
	else {
		/* Process Error Flags		*/
		}

Yes, there are other ways to write this, but as things get more
complex, how would you like to discover that all your temporary variables
contain masks instead of values?  Or even worse: since ++x is identical
to x += 1 and x = x + 1, then l = n[++x] equals l = n[(x += 1)], always
yielding the first element of n, or l = n[(x = x + 1)] with x as an
enum and n dimensioned in the enum type maxenum, which is undefined
unless this trick is applied to enums with special quantities [like char]
which wrap in their domain; if x = maxenum, n[++x] is OUT OF BOUNDS in
the PROPOSED system [max + 1] but a valid wrap in STANDARD "C".

To the claim that x = y = z is too prone to simple types without the
mod, I directly say:

1) PAY ATTENTION WHEN YOU CODE.

2) IF YOU ACCIDENTALLY TEND TO WRITE y = x = z you will probably write
	x = z = y or
	x = y;
	y = z;
	and miss the difference; it is an OVERSIGHT of equal precedence
	and equal danger.

If you try to "protect" the programmer until assignment ORDER doesn't
matter [and to put it simply, that IS what you are trying to do] then
your results won't matter either.

The SHORTCUTS in "C", like multiple concatenated assignment and all the
op-equals [etc.], are for ease of coding without excess size or complexity;
they never were intended to prevent the careless from making mistakes.
If you add, and keep adding, protections you will have COBOL when you
are finished.  The reason things in "C" have a "scanning" or "grouping"
direction is because they are scanned and/or grouped out to their
EXPLICIT form.  x = y = z is NOT "give x and y the value of z"; it IS
"give y the value of z and then give x the value of y" [you will note that
it groups RIGHT TO LEFT].  This is correct and should not be changed.  The
value of this is obvious, contains no abstractions, and is not dependent
on anything other than programmer intent.


Robert.

Disclaimer:  My mind is so fragmented by random excursions into a
	wilderness of abstractions and incipient ideas that the
	practical purposes of the moment are often submerged in
	my consciousness and I don't know what I'm doing.
		[my employers certainly have no idea]

ggs@ulysses.homer.nj.att.com (Griff Smith) (07/10/87)

In article <6087@brl-smoke.ARPA>, gwyn@brl-smoke.UUCP writes:
> In article <598@nonvon.UUCP> mc68020@nonvon.UUCP (mc68020) writes:
> [stuff about insufficient comments]
> 
> I think the reason there hasn't been much discussion about code commenting
> is because the need for it is well understood and drummed into the heads
> of most CS students.

I hope the students you know of are the norm.  When I was teaching a class
the only thing that got their attention was an ultimatum: "If I can't
understand your program in ten minutes, your grade is no better than a C".

> Remember, however, that a lot of
> the original UNIX code was developed as "fast prototypes" by people who
> needed functions for their research projects, so that production quality
> was not an issue for them.  What is much harder to excuse is the fact that
> the code quality and documentation have not been much improved by the folks
> who package and commercially license it.  One wonders what their job is, if
> not to clean up the product before distribution.

Sigh...  My spies at Summit tell me that part of the clean-up consists of
removing all author names and all attempts at humor.  There is now a
company-wide "quality" campaign, however, so maybe some of it will surface.

A few years ago there was resistance to changing anything if it didn't
improve the benchmarks.  It took me two years to convince people that
a command with ten bugs in 500 lines of code was broken.  I kept getting
the comment that "there haven't been any complaints, so nobody must be
using the features that are broken".  I was also told that the required
feature proposals, design reviews and quality assurance documentation
would be too expensive.  I finally gave them a fixed version accompanied by
a feature verification script and got it accepted.

Good comments, Doug.  Second the motion about doing it right the first time.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{allegra|ihnp4}!ulysses!ggs
Internet:	ggs@ulysses.uucp

Leisner.Henr@Xerox.COM (marty) (07/10/87)

>RE:  If I understand correctly, according to K&R the "usual arithmetic
>conversions" will be applied to the operands of the conditional
>operator causing the int 0 to be converted to long int 0.

I've found it is often a bad idea to assume these funny type conversions
will take place.  If it doesn't do what you expect, this is one of the
worst possible bugs to track down.

C has the power to let you do anything you want -- including shoot
yourself in the foot.

The PC memory models make life rougher for the programmer, but they also
force you to compare apples to apples.  A lot of C code with the
implicit assumption that sizeof(char *) == sizeof(int) is very hard to port.

Also, it is bad practice to define NULL to be something memory model
dependent.  A better practice is:

#define NULL	(char *) 0

This automatically takes care of the sizeof dependencies.  This gets to
be a major issue when routines are being passed and/or return NULL.





marty
GV:  leisner.henr
NS:  martin leisner:henr801c:xerox
UUCP: martyl@rockvax.uucp

ron@topaz.rutgers.edu (Ron Natalie) (07/10/87)

Yes, but you can't use NULL defined to be anything other than 0
if you hope to use near and far pointers.  They've defeated themselves
by trying to guess how the user was going to use NULL ahead of time.

-Ron

rice@pilchuck.Data-IO.COM (Ken Rice) (07/10/87)

Putting an assignment statement into a control statement like this

	while( (c=test()) != EOF )
		;

Makes for readable code, I agree--but can play havoc with maintainability.
If this code breaks, it can be hard to single-step through this section
and find out what "c" is after test() (depends upon your tools, of course).

I often have to rewrite to separate things into visible parts for maintenance,
as was suggested:

	c=test();
	while(c!=EOF)
	    {

	    c=test();
	    }

Now, I can see what's going on with my debugger.

The problem is that, by changing the code, I may have broken it more. It's
different now and, by definition, untested.  Should I put it back?  That
risks breaking it too. Should I leave it? Should I have been a Forest 
Ranger? 

This is a case where one quality issue--readability--can reduce the level
of another quality issue. Readability is my usual soapbox, until I run
into this crap

	return(x=test());

Then, I want to kill.

Let's write readable and MAINTAINABLE code; Code that we won't have to
change to see how it works. If it reads well and fixes lousy, it's wrong.

Ken Rice

tim@praxis.co.uk (Tim Magee) (07/10/87)

	I'm catching up on comp.lang.c, due to a break in our news supply.
    This reply may duplicate something I haven't seen yet.  From the tenor of
    previous articles, though, I doubt it.

In article <1941@zeus.TEK.COM> dant@tekla.UUCP (Dan Tilque) writes:
- Ron Natalie writes:
- >	I also wonder about people who define TRUE to be any
- >thing, since it leads to things like
- >	if( bool == TRUE )
- >which is different than
- >	if(!bool)
- I thought that TRUE and FALSE should be:
- #define FALSE 0
- #define TRUE !FALSE
- With these #defines the above two statements are equivalent.

	Are you absolutely sure? suppose 'bool' is TRUE, then the first
	says:
		if (TRUE == TRUE)
	while the second says
		if (!TRUE), or	if (!!FALSE).
	Are you still sure?

- >Generally, I use !p when I'm dealing with things that are supposed
- >boolean values like
- >	if(!isdigit(c))
- It's often easy to miss a single character (especially one that doesn't
- stand out like the "!") when quickly scanning code.
- 	if (isdigit(c) == TRUE)
- will compile to the same object code on almost every compiler...

	I quote from the SUN 3.2 UN*X manual page for is****, the ctype macros:

    'Each is a predicate returning non-zero for true, zero for false.'

	The two examples may well compile to the same object code. However the
    second example won't do anything like it seems to be aimed at achieving,
    because the ctype macros return the result of AND-ing integer constants of
    various values with members of an external array. A hallmark of readable
    code is that it does what it appears to say!
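Tim's rule of thumb reduces to a one-liner (the helper is mine): normalize a predicate with !! before ever comparing it against a constant.

```c
#include <ctype.h>

/* The ctype macros promise only zero / non-zero.  !! collapses any
 * non-zero result to exactly 1, so comparing against 1 (or a TRUE
 * defined as 1) becomes safe. */
int is_alpha_flag(int c)
{
    return !!isalpha(c);    /* never write isalpha(c) == TRUE */
}
```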

	The reason most people write obscure C is because they think it's
    faster. Why do they care? I guess most C programmers work on UN*X or a
    poor man's copy, and the amount of time UN*X spends executing other
    peoples' code is large enough to make your trivial 50 nanoseconds saving
    worth nothing.

	In the case of the 'isdigit' above, and making 'if (!isdigit(c))'
    less unreadable, how do you feel about:

#define NOT !
...
	if (NOT isdigit(c)) {

    Tim M.
-- 
    Tim Magee
...........P[{}([{>)}>>)}<(>]([)>}<[([{<[>)}}>)>]})]}[({>)>}...........
    "Eliminate the impossible, my dear Watson, and what remains..."
    "Is still an uncountably infinite set, Holmes."

tim@amdcad.AMD.COM (Tim Olson) (07/10/87)

In article <8249@brl-adm.ARPA> Leisner.Henr@Xerox.COM (marty) writes:
>Also, it is bad practice to define NULL to be something memory model
>dependent.  A better practice is:
>
>#define NULL	(char *) 0
>
>This automatically takes care of the sizeof dependencies.  This gets to
>be a major issue when routines are being passed and/or return NULL.

AARRRGH!!!  *PLEASE* people -- we've been over this time after time:

1)	#define NULL 0

2)	When using NULL as a parameter, cast it to the *correct* type

The above definition of NULL will choke on systems which have
different bit patterns for different pointer types.
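Tim's two rules in practice, shown on a hypothetical execl-style routine (the function is mine, not from the thread):

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the string arguments before a terminating null pointer.
 * The terminator in a call must be written (char *)0 or with an
 * explicit cast of NULL: in a variadic argument list the compiler
 * cannot convert a bare 0 for you, and where pointers are wider
 * than int a plain 0 passes the wrong number of bytes. */
static int nstrings(const char *first, ...)
{
    va_list ap;
    const char *p;
    int n = 0;

    va_start(ap, first);
    for (p = first; p != NULL; p = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}
```

A call then looks like nstrings("usr", "lib", (char *)0), never nstrings("usr", "lib", 0).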

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)

blarson@castor.usc.edu (Bob Larson) (07/10/87)

In article <2365@bunker.UUCP> garys@bunker.UUCP (Gary M. Samuelson) writes:
Paraphrase:  Managers frequently don't see time spent writing documentation
as progress to the goal of a completed project.

True.  This also applies to reading documentation, only you don't have
any physical work to point to after you are done.  The people who
can't see spending the time reading new manuals as they become
available tend to be the same ones that treat you as a guru when you
use the knowledge gained.  Even harder to prove directly are the benefits
of reading newsgroups such as comp.lang.c.  (With manuals, you can
occasionally point out things like "the answer is in the chapter on
wildcards in the primos reference manual".)

[Fortunately, I have a manager who knows the benefits of writing
readable code and doesn't breathe over my shoulder all the time.]
--
Bob Larson		Arpa: Blarson@Ecla.Usc.Edu
Uucp: {sdcrdcf,seismo!cit-vax}!oberon!castor!blarson
"How well do we use our freedom to choose the illusions we create?" -- Timbuk3

dg@wrs.UUCP (David Goodenough) (07/10/87)

In article <228@amanue.UUCP> jr@amanue.UUCP (Jim Rosenberg) writes:
>It's hard for me to fathom why the idea of assignment as an operator causes
>such consternation.  It's obviously due to the fact that most progammers
>learned some other language before they learned C (so did I for that matter)
>and you get caught in a rut where it becomes hard to escape the habit that
>assignment is a statement, and so must go on a line by itself.  Requiring
>assingments to be on a line by themselves is no more necessary to code
>clarity than adding dozens of useless temporary variables to arithmetic
>expressions because you feel you should never be faced with the sight of
>3 right parentheses in a row.

_WELL_SAID_!! I agree totally - I grew up (god forbid) on a mixture of
BASIC, FORTRAN, and ALGOL, and when I finally latched onto how much fun
I could have with embedded assignments, I never looked back.
It _IS_ necessary to use a little caution, because they can be overdone

	a = (ptr = expr[i = 3])->foo;

kind of thing (!!), but when used in moderation I find them _VERY_ useful
from the point of view of readability.

For those of you that don't agree - I've got my asbestos suit and fire
extinguisher ready :-)
--
		dg@wrs.UUCP - David Goodenough

					+---+
					| +-+-+
					+-+-+ |
					  +---+

mat@mtx5a.COM (m.terribile) (07/11/87)

>    "the alternative"!  Come on there are a thousand and one ways to code
> the semantics of the above loop.  For example :
> 
> 	c = getchar();
> 	while (c != EOF)
> 	    {
> 	    .
> 	    .
> 	    .
> 	    c = getchar();
> 	    }
> 
>   Look, no assignments in the conditionals, no hidden gotos (break). 
> For the people that argue the first form is bad, this would 
> probably be the approach they would take.

C'mon!  This is everything that is wrong with Pascal.  The data for the
first pass through the loop are gathered by operation X in location A.
The data for the second pass of the loop are gathered by identical operation
X in widely separated location B.  Give us a break, fer pitsaeks!
-- 

	from Mole End			Mark Terribile
		(scrape .. dig )	mtx5b!mat
					(Please mail to mtx5b!mat, NOT mtx5a!
						mat, or to mtx5a!mtx5b!mat)
					(mtx5b!mole-end!mat will also reach me)
    ,..      .,,       ,,,   ..,***_*.

peter@sugar.UUCP (Peter DaSilva) (07/11/87)

> defined.  Now, if you're dealing with a mentality that would
> do something like
> 	#define TRUE 0		/* don't try this at home, kids */
> then you've got major problems, and it wouldn't help you if that
> misguided individual had avoided typedefs and/or #defines.

I've seen this one in an otherwise mediocre-but-not-gross program
written by someone who didn't grok the return value from strcmp()
in all fullness.

Come to think of it, I did have other major problems at the time.
More of politics than programming, though.
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)

zeeff@b-tech.UUCP (Jon Zeeff) (07/11/87)

In article <598@nonvon.UUCP> mc68020@nonvon.UUCP (mc68020) writes:
>
>   As an example, there is a small C program included in the news 2.11
>distribution, under the misc directory, named article.c.  I will not name

>    Frankly, I see two major classes of people foisting this crap off on the
>world:  those who are simply too lazy to do things correctly (probably far
>and away the largest group), and those whose arrogance causes them to 
>deliberate obfuscate.

I can see why you didn't use your real name in your article.  How 
about a class of people who were very generous in working on the news 
code at all and had a limited amount of "free" time to spend on it.  
They spent what time they had where they felt it was best used.  
Personally, I'm grateful.  Why don't *you* write better documentation
for it (or quit using it if you don't like it).





-- 
Jon Zeeff           		Branch Technology, Ann Arbor, MI
seismo!umix!b-tech!zeeff  	zeeff%b-tech.uucp@umix.cc.umich.edu

--  Communication begets knowledge and knowledge begets power --

rwhite@nu3b2.UUCP (Robert C. White Jr.) (07/11/87)

I will not include the entire thing, but you are correct to convert

while ((x = func()) == y) {...}
into
x = func();
while (x == y) {...; x = func();}
for debugging purposes.

To your question: "should I leave it that way?" I SAY NO.  The acceptability
of the expansion is required by most rules governing these things.  The
expanded version, however, may take MUCH more space when done on a
whole program.  In a sense, the structure [at object level] is polluted
by the extra call.  The entire code group, even during debugging, is
more CONCISE and easy to deal with in the compressed form.  The approach
to debugging the compressed form is actually easier when done from the
correct angle.  It may be a trick nobody ever explained, but the invocation
level and values expressed in the return statement are generally better
debugging points when they can be compared to the assigned-to variable's
value in the macro statement [it all gets displayed together].

The expansion should be saved for times when you are checking the algorithm
itself.  [Algorithm analysis should NEVER be done with a debugging tool
(there is a controversial opinion for you :-) because you will often take
value progression for granted, or miss a multi-path variant because you
couldn't massage the value correctly.]  ALWAYS compress up to, but not as
far as, the unreadability threshold.  You should be able to read the line
once and get the execution path the first time, if not the functional
behavior. (-: give the optimizer a break, eh? ;-)

As if you really ever cared what I thought....		8-)

Robert.

Disclaimer:  My mind is so fragmented by random excursions into a
	wilderness of abstractions and incipient ideas that the
	practical purposes of the moment are often submerged in
	my consciousness and I don't know what I'm doing.
		[my employers certainly have no idea]

mouse@mcgill-vision.UUCP (der Mouse) (07/11/87)

In article <849@tjalk.cs.vu.nl>, rblieva@cs.vu.nl (Roemer b Lievaart) writes:
>>>	if( bool == TRUE )
>>> [which] is different than
>>>	if(!bool)
> Of course it's different. If bool can only be 0 or TRUE,

If it were possible to guarantee somehow that bool could take on only
the values 0 and TRUE, we'd be nearly home.  Unfortunately, one-bit
unsigned bitfields are ugly to declare (must be structure members),
uncommon, and usually inefficient.  Therefore, we are, generally, stuck
with using types that can take on more than two values.

> To avoid some misunderstandings:
> 	isdigit(), isalpha(),  etc.
> do NOT just return FALSE or "TRUE" (1)!!!
> 	if ( isalpha('a') ) puts("yes") ; else puts("no") ;
> 	if ( isalpha('a') == TRUE ) puts("yes") ; else puts("no") ;
[prints]
> yes
> no

> Do you get it? isalpha('a') is not true! $%&#?!
> It's not true, it's 2.

isalpha('a') *is* true.  It is not TRUE.  True, in C, is defined to be
non-zero.  2, therefore, is true.  That's why the first if worked
correctly, or should I say unsurprisingly.

> However, I must say I'm very against using '0' instead of NULL.
> [clarity of intended type argument]

I agree in principle.  Unfortunately, I disagree in practice.  My
reasons are strictly pragmatic: (a) too many C implementations have
NULL defined as something wrong (ie, other than plain unadorned 0), (b)
using 0 doesn't mean I have to include some .h file, and (c) using 0
makes it plain to someone trying to debug the code and/or a port of it
that the problem is *not* a mis-defined NULL.

> Our lint complains about:
> main() { bar(NULL); }
> bar(foo) char *foo; {...}

> [Lint's complaint should be <suggestion>].  Instead lint says:
> "bar, arg. 1 used inconsistently   ..."
> Which is not true.  I didn't use the argument inconsistently, I
> passed a NULL-pointer which is a correct pointer.

No you didn't, you passed the integer zero (assuming a correct
definition of NULL), which is no sort of pointer at all.  Lint is
perfectly correct.

> char *strings[] = { "one", "two", "three", NULL } ;

> This won't pass our compiler.

Then either your compiler or the definition of NULL that you're getting
is broken, I would say.  (No definitive reference comes to mind;
anybody care to produce one?)

					der Mouse

				(mouse@mcgill-vision.uucp)

mouse@mcgill-vision.UUCP (der Mouse) (07/11/87)

In article <1219@ius2.cs.cmu.edu>, edw@ius2.cs.cmu.edu (Eddie Wyatt) writes:
>>> [ stuff re syscall(...) < 0 versus syscall(...) == -1 ]
>> Almost agreed: but if a negative return code other than -1 is
>> returned the code doesn't react the same.
> I can think of no Unix system call that doesn't return -1 on error.
> So I would say that it's a pretty good bet that "if (call(...) < 0)"
> and "if (call(...) == -1)" will act the same in all cases.

( First point: this is drifting off of C and into UNIX, so I'm trying
  to move it to comp.unix.questions. )

Well, yes, all syscalls return -1 on error.  However, that is not to
say that none ever return negative values except for an error return of
-1.  In fact, I was surprised by this recently.  I was using lseek() on
/dev/kmem on a VAX, and it was (correctly) returning success values on
the order of 0x80020000, which were negative!  I had to check for -1
explicitly instead of my usual check for <0 (yes, I am in the <0 camp
as far as actual coding practice goes).

					der Mouse

				(mouse@mcgill-vision.uucp)

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/12/87)

In article <654@pilchuck.Data-IO.COM> rice@pilchuck.Data-IO.COM (Ken Rice) writes:
>	while( c=test() != EOF )
>If this code breaks, ...

Of course, it's already broken.  Presumably you meant
	while( (c = test()) != EOF )

Don't feel bad; not long after commenting on (!p) vs. (p != NULL),
I made a similar slip and wrote that (1 == 0) was precisely 1; I was
actually talking about the example (0 == 0) but I (and others) didn't
notice my typo.  It does add support to the notion that code should
be written as straightforwardly as possible; I admit that it's harder
to evaluate (1 == 0) mentally than to evaluate 1 -- indeed that's why
I was making that response in the first place.

>	c=test();
>	while(c!=EOF)
>	    {
>
>	    c=test();
>	    }

Ugh, this Pascalism is horrible!  There is conceptually a single
operation ("get next character") being performed over and over until
the operation fails (EOF).  Writing the operation in two places is
not only conceptually more difficult, it make it more likely that a
future change (perhaps from c=test() to c=GetNextChar()) will miss
one of the repeated occurrences, introducing a perhaps subtle bug.

The standard C idiom
	while ( (c = get_next()) != EOF )
		do_stuff_with( c );
(with `c' an (int), not a (char)!) may look complicated, but after one
gains experience it seems like the "obvious", natural way to write this.
Perhaps an ideal programming language would make this something like:
	until Get_Next_Character named `c' indicates No_More_Chars,
		Do_Stuff_With `c'
but that's not far from the way one should read the C idiom.

>	return(x=test());
>Then, I want to kill.

Especially since 99 times out of 100 this should have been
	return test();

>Let's write readable and MAINTAINABLE code; Code that we won't have to
>change to see how it works. If it reads well and fixes lousy, it's wrong.

I agree with the desire but dispute that the Pascalism is more fixable.

greg@utcsri.UUCP (07/12/87)

In article <47000012@uiucuxe> mcdonald@uiucuxe.cso.uiuc.edu writes:
>>> The main advantage of this idiom is for "while" statements.  The usual
>>> example is "while ((c = getchar()) != EOF) ...", which cannot be
>>If "c" is of type "char" this is still not written cleanly.  EOF is
>>type int.  The assignment into a character causes getchar's int return
>>to be changed to char, invalidating the following comparison.  The
>>effect is that characters of value 0xFF will cause erroneous end of
>>file ...
>>To do this right you need an extra int temporary value

>>	while( i = getchar(), i != EOF){
>>	    c = i;
>
>How about using the notorious comma operator TWICE:
>
>         while(i = getchar(), c = i , i != EOF)
>
>Is this correct, or would you need
>
>         while( (i = getchar(), c = i ), i != EOF )        ?
>
Both are correct, (a,b,c) is equivalent to ((a,b),c).
But WHY DO YOU NEED char c IN THE FIRST PLACE???
There is no reason to use char instead of int, except to save space in arrays
or structs. Simple auto variables should be declared as int. Using a char
instead of int in this case will rarely save space, and will often produce
slower code (grotty 8088 machines notwithstanding). Not to mention all the EOF
pitfalls we've been talking about (oops I mentioned it).  The mnemonic effect
of making the variable a 'char' because you are using it to store a character
is not really worth the trouble. If the type had been called 'byte' we would
be using it less often and having fewer problems.

>As a poor C novice I see no way around this short of putting c = i
>on a separate line.

And there is nothing wrong with that, is there?
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

rbutterworth@orchid.UUCP (07/13/87)

In article <17466@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
> In article <8249@brl-adm.ARPA> Leisner.Henr@Xerox.COM (marty) writes:
> >#define NULL	(char *) 0
> AARRRGH!!!  *PLEASE* people -- we've been over this time after time:
> 1)	#define NULL 0

This has got to be the most discussed topic in this newsgroup.
Every few months it comes up again and again.  Dozens of articles
state one thing, dozens of others say something else, and eventually
it always boils down to the fact that "#define NULL 0" is always
correct, "#define NULL ((void*)0)" is (unfortunately) acceptable on
some compilers, and everything else is wrong.

Considering that this is an open-and-shut case of a simple fact in the
language, isn't it amazing how much it is discussed and argued about?

Clearly there is something wrong with the concept of NULL, or there
wouldn't be such confusion.

NULL was introduced as an attempt at making the use of the literal 0
a little more obvious.  e.g. in "p=NULL;" vs. "p=0;", the use of NULL
indicates that a null-pointer is intended and so reminds the reader
that "p" is a pointer of some kind and not an integer.  But note that
by "indicates" I mean that it indicates it to a human being reading
the code, not to the compiler.  It is simply a documentation device;
it has no such special meaning in the language and to the compiler it
is simply treated the same as the literal 0.

To me, all this confusion indicates that perhaps it was a mistake
to have ever defined NULL in the first place.  Surely the number of
people that have been fooled by this simple device far exceeds the
usefulness that was originally intended.

If I were god, when I wrote K&R, I think I'd either build NULL into
the language, or I would use the following definition of NULL:
#define NULL(type) ((type)0)
In the former case "p=NULL" would work fine, but "func(NULL)" would
produce an error since there is no type for it to be coerced to.
In the latter case, the user would be forced to code "p=NULL(char*);"
or "func(NULL(int*));".

Either way there wouldn't be any of this confusion that we have now,
and will probably always have.  (Actually with ANSI officially supporting
"(void*)0", things will probably be even more confusing than they ever
were.)

I for one will never use NULL in anything I write.  I'm sure no one
could have foreseen this when it was first introduced, but in retrospect
it really is a half-assed solution that has caused far more problems
than it has solved.

edw@ius2.cs.cmu.edu (Eddie Wyatt) (07/13/87)

In article <228@amanue.UUCP>, jr@amanue.UUCP (Jim Rosenberg) writes:
> > 
> > 	c = getchar();
> > 	while (c != EOF)
> > 	    {
> > 	    .
> > 	    .
> > 	    .
> > 	    c = getchar();
> > 	    }
> > 
> 
> 
> I don't mean to flame you, but as it happens I believe your solution is
> dead wrong, on two counts.  (1) The two loops are *NOT EQUIVALENT*!  They
> may be equivalent to Pascal programmers, but they certainly are not to C
> programmers.  The reason is simply that in your version you must completely
> forego the use of the continue statement.  The fact that break and continue
> are missing from Pascal is a general class disaster, whereas the fact that
> they're present in C is a constant joy.  A continue statement inside your
> loop will obviously fail to reexecute getchar().  I suppose you could get
> around that as follows:
>
> (2) Once again I must protest that your version is really less clear.  Think
> about how *pseducode* for this loop might look:
> 
> 	while (the next character is not EOF) {
> 		.
> 		. do stuff with that character
> 		.
> 	}
> 
>  Jim Rosenberg
>      CIS: 71515,124                         decvax!idis! \
>      WELL: jer                                   allegra! ---- pitt!amanue!jr
>      BIX: jrosenberg                 seismo!cmcl2!cadre! /


    
   1) Next time, don't quote me out of context!!!  I said I like

      while((c = getchar()) != EOF)

      over my proposed alternative.


    2) This posting was more in response to your absolutism "the alternative".


    3) The altered loop will work for all but "gotos" and "continues".
       Also the loop could be made to work with continues if one
       made the substitution "{ c = getchar(); continue; }" for
       "continue" (what a hack).

    4) I think you have missed the points others have been trying to make.

       o. Novices do have problems with the construct of assignments
	  in conditions.

       o. it deviates from the notion that expressions do not have side
	  effects.  (call it a Pascalism if you will).


 Final comment, read the f*ck*ng article and think before you post!!!!

-- 
					Eddie Wyatt

e-mail: edw@ius2.cs.cmu.edu

terrorist, cryptography, DES, drugs, cipher, secret, decode, NSA, CIA, NRO.

mouse@mcgill-vision.UUCP (der Mouse) (07/13/87)

In article <1597@sfsup.UUCP>, mpl@sfsup.UUCP writes:
> In article <13112@topaz.rutgers.edu>, ron@topaz.rutgers.edu.UUCP writes:
>> To do this right you need an extra int temporary value
>>	while ((i = getchar()) != EOF)
>> followed by  c = i;

> Since when?  What's wrong with:
> 	int	c;
> 	while ((c = getchar()) != EOF) {
> 		/* use c as you would any char variable */
> 		/* because char's are promoted to int ANYWAY */
> 		/* in expressions - no need for a temp variable */
> 	}

The difference becomes significant if you ever take the address of c.
For example,

	write(fd,&c,1);

takes on an entirely different meaning.  If c is a char, this is
portable (provided sizeof(char)==1 -- are there any machines on which
this is not true?), but if c is an int you get what you probably expect
on a little-endian and it breaks mysteriously on a big-endian.

					der Mouse

				(mouse@mcgill-vision.uucp)

dsill@NSWC-OAS.arpa (Dave Sill) (07/13/87)

John Haugh <killer!jfh> wrote:
>I ask the group, [which] do you prefer?
>
>	if (access (file, 0)) {
>		fd = open (file, 0);
>		if (fd >= 0)
>			good_stuff ();
>		else
>			error_handler ();
>	} else
>		error_handler ();
>
>	- or -
>
>	if (access (file, 0) && (fd = open (file, 0)) >= 0)
>		good_stuff ()
>	else
>		error_handler ();

In general I agree with John.  When possible, complex nested
conditional structures should be rewritten as complex conditional
expressions in a single structure.

However, one must be careful to do this only when it is appropriate.
For example, in code like the above, where there are two separate
calls to error_handler() it would be preferable, if generating an
error message, to be able to identify the exact cause of the error.
If the file can't be access'd, the first structure would allow a
message like "File exists but you can't access it" (assuming the file
had already been successfully stat'd), while the second would have to
say something like "Couldn't open file."

It's important not to let the structure of the code determine the
functionality of the program (aka "the tail wagging the dog"
syndrome).  The desired functionality should be predetermined before
the code writing is done.

-Dave Sill
 dsill@nswc-oas.arpa

The opinions expressed above are those of the author and do not
necessarily reflect those of the Department of Defense or the U.S.
Navy.

cg@myrias.UUCP (Chris Gray) (07/13/87)

On the subject of 'while' loops, Doug Gwyn (gwyn@brl.arpa) suggests:

> Perhaps an ideal programming language would make this something like:
> 	until Get_Next_Character named `c' indicates No_More_Chars,
> 		Do_Stuff_With `c'

Actually, what is (to me) much better, and what I've included in a couple
of compilers I've done, is:

	while
	    ch := getNextCharacter();
	    ch ~= EOF
	do
	    doStuffWith(ch);
	od;

Actually, in my latest language, I would write:

	while read(inputChannel; ch) do
	    doStuffWith(ch);
	od;

where 'read' is the usual language construct, but which can return a failure
indicator where needed.
-- 
Chris Gray		Myrias Research, Edmonton	+1 403 432 1616
	{seismo!mnetor,ubc-vision,watmath,vax135}!alberta!myrias!cg

barmar@think.uucp (Barry Margolin) (07/14/87)

In article <1219@ius2.cs.cmu.edu> edw@ius2.cs.cmu.edu (Eddie Wyatt) writes:
>    I can think of no Unix system call that doesn't return -1 on error.
>So I would say that it's a pretty good bet that "if (call(...) < 0)" and
>"if (call(...) == -1)" will act the same in all cases. Though, one should
>always consult the man pages for return values if in doubt.

The first sentence is probably correct; however, it doesn't imply the
second.  I recently fixed a bug in some network code that assumed the
two were equivalent.  It was calling the function that translates a
host name into a network address, expressed as an int.  The man page
didn't explicitly mention that many valid addresses are negative
numbers, so the programmer assumed that a <0 test would be OK.  Until
I fixed it, this program was unable to communicate with any hosts on
internet class B and C networks.

karl@haddock.ISC.COM (Karl Heuer) (07/15/87)

In article <5065@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>In article <47000012@uiucuxe> mcdonald@uiucuxe.cso.uiuc.edu writes:
>>>> The main advantage of this idiom is for "while" statements.  The usual
>>>> example is "while ((c = getchar()) != EOF) ...", which cannot be
>>>If "c" is of type "char" this is still not written cleanly ...
>>>To do this right you need an extra int temporary value.
>But WHY DO YOU NEED char c IN THE FIRST PLACE???

As the person who posted the original idiom, I'll mention here that I intended
the variable c to be type int; I left out the explicit declaration and the
#include <stdio.h> because I assumed both were well-known.  If I had known the
subject was going to change like this, I would have included a footnote.

>There is no reason to use char instead of int, except to save space in arrays
>or structs.  Simple auto variables should be declared as int.

Not entirely true; consider "while ((c=getchar()) != EOF) write(1, &c, 1);".
This is incorrect% for "int" as well as for "char", and the simplest way to
make it work is to use one of each: "char c; int i; while ((i=getchar()) != \
EOF) { c=i; write(1, &c, 1); }".

>The mnemonic effect of making the variable a 'char' because you are using it
>to store a character is not really worth the trouble.

Given the semantics of getchar(), I agree.  However, I consider getchar() to
be a botch; given proper error%% handling, getchar() *should* return a char.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
% With "int c" it happens to work on a VAX.  That doesn't make it right.
%% This includes end-of-file, although it isn't an "error" in the usual sense.

devine@vianet.UUCP (Bob Devine) (07/15/87)

In article <8249@brl-adm.ARPA>, Leisner.Henr@Xerox.COM (marty) writes:
> Also, it is bad practice to define NULL to be something memory model
> dependent.  A better practice is:
> 
> #define NULL	(char *) 0
> 
> This automatically takes care of the sizeof dependencies.  This gets to
> be a major issue when routines are being passed and/or return NULL.

  Don't do this.  For x86 C Compiler memory models, (char*) might be 2,
or 4 bytes depending on the model.  BUT, there is often mixing of models
and there are portability concerns beyond the immediate need.

  Just use 0 for NULL and cast it appropriately.

Bob Devine

chips@usfvax2.UUCP (Chip Salzenberg) (07/16/87)

John Haugh <killer!jfh> wrote:
>
>	if (access (file, 0) && (fd = open (file, 0)) >= 0)
>		good_stuff ()
>	else
>		error_handler ();
>

Just a nit:  I believe that the return value of access() will be zero if
the file does exist; thus the test should be "if (access(...) == 0" etc.

Keeping the world safe for debuggers, this is Chip Salzenberg signing off
at WBUG, Bug Radio.

-- 
Chip Salzenberg		   UUCP: "uunet!ateng!chip" ..or.. "chips@usfvax2.UUCP"
A.T. Engineering, Tampa    Fidonet: 147/40
"Use the Source, Luke!"	   My opinions do not necessarily agree with anything.

smith@COS.COM (Steve Smith) (07/16/87)

In article <2365@bunker.UUCP> garys@bunker.UUCP (Gary M. Samuelson) writes:

>...  Some of us are actually not permitted
>to "do it right."
> ...
>"Yes, documentation is important, but so-and-so customer is screaming
>for such-and-such a feature, so we can't schedule time to write anything
>but code."

Amen!!

Gary's points are well taken.  Besides managerial incompetence, there
is another problem to deal with, and that is the (all kowtow)
CUSTOMER.  As a former employee of a Beltway Bandit (suburban
Washington DC defense contractor) whose only desire is to remain
nameless, I had to constantly deal with managers who swore up and down
that "the Customer needs first rate documentation".  Very nice.
Except that "documentation" meant "documentation to MIL-STD-1679"
AND NOTHING ELSE.

For those of you who are mercifully unaware of this monstrosity, it
was written in the early '70s to standardize the documentation of
COBOL programs.  The documentation has two sections, variables and
code.  All data structures, variables, and subroutines must be
treated as globals.  Variables are described by little pictures
telling what the bits mean.  Defining structures in terms of other
structures is forbidden.  There is no way to represent any form of
concurrency or multitasking.  This awful stuff must be written and
approved by the Customer before you can do any coding at all
(including proof-of-concept).  Once it is approved, it cannot be
changed.

This may be OK for COBOL (I count COBOL as an unnatural act, anyway).
For FORTRAN or C, it is just marginal.  For C++, PASCAL, ADA, or any
AI language, it is impossible.

The effect of all this is to push the code away from a modular, object
oriented design toward "structured spaghetti code".  I don't care if a
program doesn't use GOTOs, if it has hundreds of teeny little modules
all calling each other with no apparent large scale structure, it's
going to be hard to figure out.  If you DO try to maintain some high
level structure through "informal design documents" (specifically
forbidden by most contracts) you confuse the bejesus out of everybody,
because you now have two sets of contradictory documentation.

The net effect of this is that Management and the Customer THINK that
they are ensuring good documentation, while what happens is the
reverse.  After any maintenance activity at all, any structure that
might have existed is completely trashed.

Faugh!  Anybody who has ever done program maintenance knows what s/he
needs.  The exact format is unimportant.

mpl@sfsup.UUCP (M.P.Lindner) (07/17/87)

Right on!  I can't tell you how many times I've seen this:

	while ((foo & 17) || (*((struct goo *) &x[3])->blah)(97)->bar & 11)
		n++;	/* increment n */

Argh!  Or how about a 200 line source file to look at, with no mention of
where the entry points are (if there's no "main()").  static declarations
can help this somewhat, but LET'S COMMENT!

cramer@kontron.UUCP (Clayton Cramer) (07/18/87)

> In article <2365@bunker.UUCP> garys@bunker.UUCP (Gary M. Samuelson) writes:
> 
> >...  Some of us are actually not permitted
> >to "do it right."
> > ...
> >"Yes, documentation is important, but so-and-so customer is screaming
> >for such-and-such a feature, so we can't schedule time to write anything
> >but code."

My least pleasant experience in this vein was sometime back -- I won't
name any names -- but it's a painful example.

A company I worked for had been run by some less than scrupulous sorts
(or perhaps incompetents).  The California division had been saying,
"Yes, we have X people writing design documents for the new product.
We're making great progress."  The management team from corporate
headquarters flies out to California, and finds out that basically,
the steaming pile of badly written, pie-in-the-sky documentation for
the product was so vague and meaningless as to be useless.  The
California management runs to the startup they've been putting
together in the meantime (on company time, I understand), and I,
the peon, end up a supervisor.

Corporate's management asks, "How long will it take to build the
product?"  I say, "Well, the current design documents really don't
say anything.  I figure in about six to nine months we should have a 
functional specification and design document.  Then we'll get everyone 
to look it over, sign off, and start detail design and coding."

They look at me as though I'm pulling a fast one on them.  (Remember,
they've already invested a year and a half of several people's time
writing a functional spec which is nonsense.)  They say, "You will
have to design and code in parallel."

Can you blame them?  They were already taken for a ride once -- and
put out huge chunks of money, with no real return -- at least for the
people paying the bills.  They wanted proof that the engineers who
hadn't jumped ship weren't just a bunch of Newport Beach
slimebuckets ripping them off.

It worked better than I would have guessed -- but I guess it's just
my strong leadership and project management skills that made it work. :-)

Clayton E. Cramer

rwhite@nu3b2.UUCP (Robert C. White Jr.) (07/25/87)

In article <840@mcgill-vision.UUCP>, mouse@mcgill-vision.UUCP (der Mouse) writes:
> The difference becomes significant if you ever take the address of c.
> For example,
> 
> 	write(fd,&c,1);
> 
> takes on an entirely different meaning.  If c is a char, this is
> portable (provided sizeof(char)==1 -- are there any machines on which
> this is not true?), but if c is an int you get what you probably expect
> on a little-endian and it breaks mysteriously on a big-endian.

	Oddly enough, by definition what you have used above is a valid
construct on just about every machine I have ever seen.  Since the low
order byte of a word, and the low order word of a double-word are stored
"above" the high-order portion you get:

		+--------+--------+
		| low    | high   |
		+--------+--------+
		    ^
		    |
		This is the point indicated by the address-of operator
and if char is the size of int, it does not matter.  If char is a byte
and int is a word, the low order bits of the int correspond to the
byte that would be union aligned of type char [etc].  By the definition
your code fragment WILL work [i.e. produce the correct result] whether
c is int or char.  [which is why the spec. tells you to use int]
	On any machines which do not use this inter-position, the use
of pointers is massaged in the memory model to produce the same net
effect.  If this arrangement, or some adaptive work-around were not
built into the language, the casting between types [i.e. assigning
the value of a sufficiently small int to a char] would produce more
logical shifting and adaptive work than the actual math/assignment.
 int x; char y; x = 13 ; y = x;  would be unreasonably large.
	The only real problem comes down to pointer MATH, but as long
as arrays of int/char are used consistently that won't matter either.

	If you don't believe me, get out your DDT and take a look.
	If I am wrong..... SUE ME!


Robert.

Disclaimer:  My mind is so fragmented by random excursions into a
	wilderness of abstractions and incipient ideas that the
	practical purposes of the moment are often submerged in
	my consciousness and I don't know what I'm doing.
		[my employers certainly have no idea]

guy%gorodish@Sun.COM (Guy Harris) (07/27/87)

> 	Oddly enough, by definition what you have used above is a valid
> construct on just about every machine I have ever seen.

"valid", yes, in the sense that the C compiler won't reject it; it may
give you a warning, but it won't reject it.

However, that construct won't do what you want it to do on a
big-endian machine; therefore, since the 3B2 is a big-endian machine,
if the "3b2" in the name of the machine you're posting from indicates
what type of machine it is, you mustn't have "seen" that machine.

I took the code:

	main()
	{
		int c = 'a';

		write(1, &c, 1);
	}

and compiled it on a 3B2, ran it, and piped the output through "od -c"
to see exactly what the output was.  "od -c" printed:

	0000000  \0  \0
	0000001

(the second '\0' is an artifact of "od" working in 2-byte words).
Note that it did NOT write an 'a', but wrote a '\0'.  This is what
was in the high-order byte of the (4-byte) quantity "c".

> Since the low order byte of a word, and the low order word of a double-word
> are stored "above" the high-order portion you get:
> 
> 		+--------+--------+
> 		| low    | high   |
> 		+--------+--------+
> 		    ^
> 		    |
> 		This is the point indicated by the address-of operator

Wrong.  On big-endian machines, the *high*-order byte of a 2-byte or
4-byte quantity is the one whose address has the same bit-pattern as
the address of the quantity itself.

> 	On any machines which do not use this inter-position, the use
> of pointers is massaged in the memory model to produce the same net
> effect.

Wrong.  This is not done on any big-endian machine that I've worked
with.

> If this arrangement, or some adaptive work-around were not
> built into the language, the casting between types [i.e. assigning
> the value of a sufficiently small int to a char] would produce more
> logical shifting and adaptive work than the actual math/assignment.
>  int x; char y; x = 13 ; y = x;  would be unreasonably large.

Wrong.  The code sequence for "x = 13; y = x" generated for a 68020
(another big-endian machine) by our compiler is:

	moveq	#13,d1		/* 13 */
	movl	d1,a6@(-4)	/* x = 13; "x" is at a6@(-4) */
	movb	a6@(-1),a6@(-5)	/* y = x */

The "movb" merely picks up the appropriate byte of "x" and stuffs it
into "y".  Please note that the address of "x" is <contents of a6>-4,
but the address of the appropriate byte of "x" is <contents of a6>-1,
or <address of x>+3.

No shifting or "adaptive work" (whatever THAT means) is required.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

m5@bobkat.UUCP (Mike McNally ) (07/28/87)

In article <1126@nu3b2.UUCP> rwhite@nu3b2.UUCP (Robert C. White Jr.) writes:
 >
 >	Oddly enough, by definition what you have used above is a valid
 >construct on just about every machine I have ever seen.  Since the low
 >order byte of a word, and the low order word of a double-word are stored
 >"above" the high-order portion you get:
 >
 >		+--------+--------+
 >		| low    | high   |
 >		+--------+--------+
 >		    ^
 >		    |
 >		This is the point indicated by the address-of operator
 >and if char is the size of int, it does not matter.  If char is a byte
 >and int is a word, the low order bits of the int correspond to the
 >byte that would be union aligned of type char [etc].  By the definition
 >your code fragment WILL work [i.e. produce the correct result] whether
 >c is int or char.  [which is why the spec. tells you to use int]

Sorry, Robert, but I'm going to have to sue you.  On many machines this
is very much untrue.  Multi-byte primitive objects are stored

    +-------+-------+-------+       +-------+
    | MSB   | MSB-1 | MSB-2 | . . . | LSB   |
    +-------+-------+-------+       +-------+

The C language does not specify the byte ordering used for multi-byte
primitive data objects.

 >   On any machines which do not use this inter-position, the use
 >of pointers is massaged in the memory model to produce the same net
 >effect.  

Not on any I've ever used.  It's very simple and inexpensive to do
casts:

    int i;
    char c;

    i = c;

translates to (on a 680x0):

    MOV.B   c, D0
    EXT.W   D0      ; Maybe followed by an ext.l if ints are 32 bits

On *any* machine, something has to be done to set the value of the
upper eight bits of the int, so there is no difference in code.

 >       If this arrangement, or some adaptive work-around were not
 >built into the language, the casting between types [i.e. assigning
 >the value of a sufficiently small int to a char] would produce more
 >logical shifting and adaptive work than the actual math/assignment.
 > int x; char y; x = 13 ; y = x;  would be unreasonably large.

Wrong wrong wrong.

 >   The only real problem comes down to pointer MATH, but as long
 >as arrays of int/char are used consistently that won't matter either.
 >
 >   If you don't believe me, get out your DDT and take a look.
 >   If I am wrong..... SUE ME!
 >
 >
 >Robert.

My lawyers will be in touch.

-- 
Mike McNally, mercifully employed at Digital Lynx ---
    Where Plano Road the Mighty Flood of Forest Lane doth meet,
    And Garland fair, whose perfumed air flows soft about my feet...
uucp: {texsun,killer,infotel}!pollux!bobkat!m5 (214) 238-7474

dg@wrs.UUCP (David Goodenough) (07/28/87)

In article <1126@nu3b2.UUCP> rwhite@nu3b2.UUCP (Robert C. White Jr.) writes:
>In article <840@mcgill-vision.UUCP>, mouse@mcgill-vision.UUCP (der Mouse) writes:
>> The difference becomes significant if you ever take the address of c.
>> For example,
>> 
>> 	write(fd,&c,1);
>> 
>> takes on an entirely different meaning.  If c is a char, this is
>> portable (provided sizeof(char)==1 -- are there any machines on which
>> this is not true?), but if c is an int you get what you probably expect
>> on a little-endian and it breaks mysteriously on a big-endian.
>
>	Oddly enough, by definition what you have used above is a valid
>construct on just about every machine I have ever seen. .....

Since the NUXI problem exists (DEC and IBM disagree on byte order),
exactly one of those architectures will fail.  In addition, the Motorola
6800 and 6809 will fail, since both store the high byte first, as will
the 68K: in a 4-byte int the bytes are stored highest first, lowest
last.  On the rare occasions I use write(fd, &c, 1) (terminal output
only) I *ALWAYS* declare c as a char - it is the only safe, portable way
to do it.  (If anyone disagrees, I've got my asbestos suit ready :-)
--
		dg@wrs.UUCP - David Goodenough

					+---+
					| +-+-+
					+-+-+ |
					  +---+

dave@murphy.UUCP (Dave Cornutt) (07/31/87)

In article <22250@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
> > When I see 
> > 
> > 	if (!p)
> > 
> > I read it as
> > 
> > 	if p is not valid then ...
> > 
> > The (!p) syntax tells me that p is among the class of items that may be
> > treated as boolean (under the C language conventions) and that we are
> > testing whether it is false.  This is not a matter of "saving characters";
> > it is a matter of classification.
> 
> But what does it mean to say that a pointer is "false"?  Pointers
> themselves really aren't Boolean; there is a boolean predicate *on* a
> pointer, namely the "is this pointer null" predicate.  

I was hoping you were going to go in a different direction here.  I
*do* like to use the constructs "if (p)" and "if (!p)"; I think that
they are quite clear in what they mean, because I do think of pointers
as being true or false: if a pointer points to something valid, then
it's a "true" pointer because you can dereference it.  If it is nil,
then it's a "false" pointer because you can't dereference it.  I think
that this is quite clear; it's not just a matter of saving a couple of
keystrokes, but that, in a large program, whenever you can say
something clearly with less text, I think it adds to the readability
of the program.  The meaning of the syntax is quite clear in both K&R
and the ANSI draft, and there should be no problems with portability
among conforming compilers.  Also, there should be no performance
difference, since any halfway decent compiler should be able to
recognize that "if (p)" and "if (p == NULL)" are the same and generate
the same code.  I do it purely because I *like* it. 

Then again, it isn't something I'm religious about.  Lots of people that
I respect write "if (p == NULL)" and I don't flame them for it.  It's
a free country, individual choice, etc...

>You could view
> the construct "p", used in a context that requires a Boolean
> expression, as really meaning "is_non_null(p)", and "!p" as meaning
> "!is_non_null(p)", or "is_null(p)".
> 
> > When I see
> > 
> > 	if (p != NULL)
> > 
> > it tells me two rather different things.  First of all it tells me that
> > p is an item for which there are one or more coded values, among which
> > is NULL, and that for all cases where p is not NULL, there is some action
> > to be taken.
> 
> All of which happen to be the case for pointers.  Are you arguing
> that "!p" is somehow better than "p != NULL"?  This is a matter of
> taste; if you do not view "!p" as shorthand for "is_null(p)", then "p
> == NULL" makes more sense as a way of writing "is_null(p)", and many
> good programmers do not view "!p" as such a shorthand.

I *do* view !p as a shorthand for p == NULL.  I could make an argument that
"p == NULL" implies that p is an enumerated type, but I really don't
believe that it does, so I won't.  

> > Secondly it tells me that the file that the statement is in
> > includes stdio.h (or that the author of the code is a dweeb.)  And that
> > should tell me that the code in this file needs stdio.h, FOR I DO NOT
> > CONSIDER IT GOOD PROGRAMMING PRACTICE TO INCLUDE INCLUDE FILES WHICH ARE
> > NOT USED.

A legitimate gripe; you run into the same problem if you use TRUE and
FALSE.  ANSI
is going to take care of this by moving these types of things into a
separate include file which just about everything will have to include.

> Also, you didn't address the issue of
> 
> 	if (!strcmp(str1, str2))
> 

This has always been one of my pet peeves.  Even after years of C
programming experience, I still find myself sometimes looking at this
and reading it as "if str1 is not equal to str2..."  I wrote a little
function of my own which takes a string containing an operator in
between the other two arguments, so it looks something like this:

if (strcompare(str1,"==",str2))

An extremely simplistic parser decodes the operator, then strcmp is called
to actually do the compare, and the routine figures out what to return
depending on what the operator was; if the indicated operation was true,
it returns nonzero, else zero.  So,

if (strcompare(str1,"<",str2))

and

if (strcompare(str1,"!=",str2))

mean what they appear to mean.  This isn't as efficient as defining macros
to do the compare, but I think it reads a lot better.
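Dave doesn't show the body of strcompare; a minimal sketch of such a
routine (the name, argument order, and the operator set are taken from
his description, but the implementation is a guess) might look like:

```c
#include <string.h>

/* Hypothetical reconstruction of Dave's strcompare(): decode the  */
/* operator string, call strcmp() to do the real work, and test    */
/* its result against zero.  Unknown operators return 0 (false).   */
int strcompare(const char *s1, const char *op, const char *s2)
{
    int r = strcmp(s1, s2);

    if (strcmp(op, "==") == 0) return r == 0;
    if (strcmp(op, "!=") == 0) return r != 0;
    if (strcmp(op, "<")  == 0) return r <  0;
    if (strcmp(op, "<=") == 0) return r <= 0;
    if (strcmp(op, ">")  == 0) return r >  0;
    if (strcmp(op, ">=") == 0) return r >= 0;
    return 0;       /* unrecognized operator */
}
```

With this, strcompare("abc", "<", "abd") is true, as the examples
above suggest.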

---
"I dare you to play this record" -- Ebn-Ozn

Dave Cornutt, Gould Computer Systems, Ft. Lauderdale, FL
[Ignore header, mail to these addresses]
UUCP:  ...!{sun,pur-ee,brl-bmd,seismo,bcopen,rb-dc1}!gould!dcornutt
 or ...!{ucf-cs,allegra,codas,hcx1}!novavax!gould!dcornutt
ARPA: dcornutt@gswd-vms.arpa

"The opinions expressed herein are not necessarily those of my employer,
not necessarily mine, and probably not necessarilla

mpl@sfsup.UUCP (08/01/87)

In article <24246@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
> > 	Oddly enough, by definition what you have used above is a valid
> > construct on just about every machine I have ever seen.
> 
> "valid", yes, in the sense that the C compiler won't reject it; it may
> give you a warning, but it won't reject it.
	[deleted stuff]
> 	main()
> 	{
> 		int c = 'a';
> 
> 		write(1, &c, 1);
> 	}

So, the point is (if I remember the original article correctly) that
one should use:

	main()
	{
		int	i;
		char	c;

		while ((i = getchar()) != EOF) {
			c = i;
			write(1, &c, 1);
		}
	}

perhaps.  However, why not use the (more efficient) code:

	main()
	{
		int	i;

		while ((i = getchar()) != EOF)
			putchar(i);
	}

I mean, I think this all stemmed from an early reply to the original article
which claimed you *needed* an intermediate variable (i) no matter *what*
you wanted to do with the character.  For code like this "write" business
an intermediate (c) might be useful, but you can still do without in a
better fashion.  How about:

	#include	<sys/param.h>

	main()
	{
		int	i;

		while ((i = getchar()) != EOF)
			write(1, lobyte(loword(i)), 1);
	}

ANSI C promises to be better at providing standard macros like this as well.
So....
Happy hacking!

Mike

guy%gorodish@Sun.COM (Guy Harris) (08/02/87)

> > But what does it mean to say that a pointer is "false"?  Pointers
> > themselves really aren't Boolean; there is a boolean predicate *on* a
> > pointer, namely the "is this pointer null" predicate.  
> 
> I was hoping you were going to go in a different direction here.  I
> *do* like to use the constructs "if (p)" and "if (!p)"; I think that
> they are quite clear in what they mean, because I do think of pointers
> as being true or false: if a pointer points to something valid, then
> it's a "true" pointer because you can dereference it.  If it is nil,
> then it's a "false" pointer because you can't dereference it.

Well, you can hope for lots of things; many of them won't come true,
though, so such hope is largely a waste of energy.

I don't personally care for "if (p)" and "if (!p)", but I don't
strongly object to them either.  However, defending them by saying
that you're testing whether a pointer is "true" or "false" is not
valid.  You can say that, for pointers, they test whether the pointer
does or does not point to something valid; however, pointers are NOT
Booleans.  They do not have a truth value as their value.  There is a
boolean *predicate* on pointers, namely the "is this pointer valid"
predicate.  If you want to say that "if (!p)" means "if (p is not
valid)", fine, but this is very different from "p is false".

> Then again, it isn't something I'm religious about.  Lots of people that
> I respect write "if (p == NULL)" and I don't flame them for it.  It's
> a free country, individual choice, etc...

I don't flame them either.  (Anyone who disagrees with that statement
has serious problems with reading comprehension; such problems are,
unfortunately, quite common on USENET.)

> > > Secondly it tells me that the file that the statement is in
> > > includes stdio.h (or that the author of the code is a dweeb.) ...
> 
> A legitimate gripe; you run into the same if you use TRUE and FALSE. ...

If you are replying to an article that includes quotes from other
articles, please try to keep the replies to the various articles
separate (or, even better, make your reply to those quotes a reply to
the article in which they appeared).  Thank you.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

blm@cxsea.UUCP (Brian Matthews) (08/03/87)

In article <525@murphy.UUCP> dave@murphy.UUCP (Dave Cornutt) writes:
|In article <22250@sun.uucp>, guy%gorodish@Sun.COM (Guy Harris) writes:
|> Also, you didn't address the issue of
|> 	if (!strcmp(str1, str2))
|This has always been one of my pet peeves.  Even after years of C
|programming experience, I still find myself sometimes looking at this
|and reading it as "if str1 is not equal to str2..."  I wrote a little
|function of my own which takes a string containing an operator in
|between the other two arguments, so it looks something like this:
|
|if (strcompare(str1,"==",str2))

I usually don't like mucking up code with lots of strange macros, but one I do
find useful is something like:

#define	    strrel(s,op,t)	(strcmp((s), (t)) op 0)

where op can be any of the relational operators: ==, <, etc.

So, your example would become:

if (strrel(str1,==,str2))

This avoids the overhead of an additional procedure call, and parsing "==" (or
"<", or whatever), but maintains the (in my opinion) increased readability.
-- 
Brian L. Matthews                               "A man with one watch knows
...{mnetor,uw-beaver!ssc-vax}!cxsea!blm          what time it is; a man with
+1 206 251 6811                                  two watches isn't so sure."
Computer X Inc. - a division of Motorola New Enterprises

DHowell.ElSegundo@Xerox.COM (08/04/87)

In article <525@murphy.UUCP>, Dave Cornutt <dave@murphy.uucp> writes:
>...any halfway decent compiler should be able to
>recognize that "if (p)" and "if (p == NULL)" are the same and generate
>the same code.

I'd hope that any decent compiler would recognize that they are
completely opposite.  The fact that it is easy to make the above mistake
precisely the reason why one of these constructs is not clear.  I leave
it as an exercise to the reader to decide which one.

Dan <DHowell.ElSegundo@Xerox.COM>

DISCLAIMER: The opinions expressed above may not be anyone's opinions at
all, but the random output of a bunch of monkeys on computer terminals.