[comp.lang.c] thanks for "down" answers

slores%gables.span@umigw.miami.edu (Stanislaw L. Olejniczak) (12/11/88)

Many, many, many thanks to the numerous good souls who sent me detailed
explanations of why my example would not work.  I am particularly indebted to the
several wonderful people who took the example apart and detailed why things
would not work.  I apologize (I really do) for the missing parens for getchar.

I would also like to thank ALL of you for not broiling me too much.  I guess I deserved
to be flamed for posting such an outrageous example, and I appreciate that all
respondents were so understanding about it.

Several respondents have pointed out that many compilers would NOT accept
			(char_var = getchar()) != EOF
because getchar() returns an integer, EOF may be a negative integer, and on many
compilers char variables may not accept signed integers.  I had entirely missed
that point.  This is how I was shown and taught.  I have directly asked a couple
of the especially kind respondents about their way of handling this.  If you have
an unusually good suggestion, I would be most glad to read about it.

Again, with many thanks and much gratitude,
Stan
--
Stan Olejniczak           Internet:   slores%gables.span@umigw.miami.edu
University of Miami       UUCP:       {uunet!gould}!umbio!solejni
Miami, Florida, USA       Voice:      (305)-547-6005
My opinions cannot possibly represent the views of anyone else!

guy@auspex.UUCP (Guy Harris) (12/13/88)

>Several respondents have pointed out that many compilers would NOT accept
>			(char_var = getchar()) != EOF
>because getchar() returns an integer, EOF may be a negative integer, and
>on many compilers char variables may not accept signed integers.

Well, actually, most compilers will accept it, which is the problem -
it'll pass the compiler without complaint, but *still* not work on
machines where "char" is unsigned.  And, frankly, it may not work on
machines where "char" is signed, either; the problem is that
"getchar()", on a machine with 8-bit bytes, can return either

	1) a value in the range 0 to 255, which represents a character
	   read from the standard input

or

	2) EOF, usually -1, which represents an end-of-file condition.

The intent is that EOF not be a value in the range 0 to 255 (some
implementations may give it such a value, but that merely means the
implementor didn't know what they were doing).

On a machine with unsigned "char"s, and 8-bit bytes, a "char" can have a
value in the range 0 to 255.  If EOF is -1, assigning EOF to a "char"
on a 2's complement machine gives the value 255, which does not compare
equal to EOF (-1).

On a machine with signed "char"s, and 8-bit bytes, a "char" can have a
value in the range -128 to 127.  If EOF is -1, assigning EOF to a "char"
gives the value -1 - but then, on a 2's complement machine, so does
assigning the value 255.  This means that if you read a character from
the file with the hex value 0xFF - which is "y with a diaeresis" in ISO
Latin-1 (ISO 8859-1), so even in a pure text file you can have such a character - it
will look just like an EOF.

>I had entirely missed that point.  This is how I was shown and taught.

Oh dear.  Sounds like the person who taught you needs a little remedial
education; could you please point out to them that assigning the result
of "getchar()" to a "char" variable is incorrect?

>I have directly asked a couple of the especially kind respondents about
>their way of handling this.  If you have an unusually good
>suggestion, I would be most glad to read about it.

There's only one valid suggestion, and that's to have the variable to
which the value of "getchar()" is assigned be of some signed integral
type larger than "char"; "int" is the best choice ("short" might work on
some implementations, possibly most implementations, but it's wisest not
to fool Mother Nature; "long" will work, but it's overkill and may be
inefficient).

jlh@loral.UUCP (Physically Phffft) (12/14/88)

In article <685@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>>I had entirely missed that point.  This is how I was shown and taught.
>
>Oh dear.  Sounds like the person who taught you needs a little remedial
>education; could you please point out to them that assigning the result
>of "getchar()" to a "char" variable is incorrect?

It gets worse.  2-3 weeks ago one of my instructors decided to explain
fork, exec, and waits.  In all his examples he used wait ( (char *) 0).
I pointed out to him that wait wanted an address in which to stuff a result,
and using 0 was probably not a good idea.  His reply was 'that's how it is
in my manual'; after a few minutes of discussion it got upgraded to 'I tried
it on my system and it works'.  So, Chris, Doug, and Henry, prepare yourselves
for 30 or so bright and eager new programmers who will think 'wait ((char *) 0)'
is the preferred way to do things.  Coming your way this June!

								Jim

-- 
Jim Harkins		jlh@loral.cts.com
Loral Instrumentation, San Diego

djones@megatest.UUCP (Dave Jones) (12/15/88)

From article <1886@loral.UUCP>, by jlh@loral.UUCP (Physically Phffft):

...

> In all his examples he used wait ( (char *) 0).
> I pointed out to him that wait wanted an address in which to 
> stuff a result, and using 0 was probably not a good idea.  His 
> reply was 'that's how it is in my manual'; after a few minutes
> of discussion it got upgraded to 'I tried it on my system and it 
> works'. 

Both perfectly valid and correct responses.

From the manual:

     #include <sys/wait.h>

     pid = wait(status)
     int pid;
     union wait *status;

     pid = wait(0)
     int pid;

[Your instructor correctly casts the 0 to a pointer-type, which
the manual omits.]

If you want the status, you pass a non-null pointer and wait
knows what to do.  If you don't want the status, you pass a
null pointer, and wait knows what *not* to do.

To paraphrase Samuel L. Clemens, I think that you will discover
in a couple of years that your instructors have learned quite a bit
in the interim. :-)

wald-david@CS.YALE.EDU (david wald) (12/16/88)

In article <1082@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>From article <1886@loral.UUCP>, by jlh@loral.UUCP (Physically Phffft):
>> In all his examples he used wait ( (char *) 0).
>
>From the manual:
>
>     #include <sys/wait.h>
>
>     pid = wait(status)
>     int pid;
>     union wait *status;
>
>     pid = wait(0)
>     int pid;
>
>Your instructor correctly casts the 0 to a pointer-type, which
>the manual omits.

On the other hand, it's the wrong type, and there's no guarantee that
a (char *) will work any better than an int when the function expects a
(union wait *).

============================================================================
David Wald                                              wald-david@yale.UUCP
						       waldave@yalevm.bitnet
============================================================================

pcg@aber-cs.UUCP (Piercarlo Grandi) (12/16/88)

In article <685@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:

	[ good reminder on why the result of getchar(3) has type int ]

    On a machine with signed "char"s, and 8-bit bytes, a "char" can have a
    value in the range -128 to 127.

This is exactly true.

    This means that if you read a character from the file with the hex value
    0xFF - which is "y with a diaeresis" in ISO Latin-1, so even in a pure
    text file you can have such a character ...

This is not entirely accurate. Classic C says that chars must be able to
represent the machine's character set, whose character codes are assumed to
be *positive*:  '... it is guaranteed that a member of the standard character
set is non negative'. So a "char" variable may indeed represent negative
values, but they cannot be characters... So EOF == -1 is indeed guaranteed
to work.  In other words, getchar(3) is not a "getbyte()".

    ... - it will look just like an EOF.

Supposedly not, because getchar(3) returns an "int", not a "char" that is
widened to an "int".

There are three points here:

[1] In Classic C *characters* (as opposed to "char" values) can only be
    non-negative. There is no problem even when "char" is signed by default.

[2] In dpANS C you can explicitly control whether a "char" is signed or not,
    so there is no problem either.

[3] In any case, the result of getchar(3) is an "int", not a "char".

Admittedly, the ice is not thick here (pun on K&R :->). If one wants absolute
safety in reading *bytes* (not characters), one must use fread(3) (ugh!).
-- 
Piercarlo "Peter" Grandi			INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science	UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)

guy@auspex.UUCP (Guy Harris) (12/16/88)

>It gets worse.  2-3 weeks ago one of my instructors decided to explain
>fork, exec, and waits.  In all his examples he used wait ( (char *) 0).
>I pointed out to him that wait wanted an address in which to stuff a result,
>and using 0 was probably not a good idea.  His reply was 'that's how it is
>in my manual'; after a few minutes of discussion it got upgraded to 'I tried
>it on my system and it works'.  So, Chris, Doug, and Henry, prepare yourselves
>for 30 or so bright and eager new programmers who will think
>'wait ((char *) 0)' is the preferred way to do things.

No, if you're not interested in the return status of the process for
which you're waiting, and you're running on a UNIX system more recent
than, say, V6, the preferred way of doing this is

	wait((int *)0)

not

	wait((char *)0)

If the manual says "char *", it's wrong.  (Yes, I know about BSD's
"union wait"; it was a dumb idea, and the BSD *kernel* still thinks it's
supposed to be a pointer to "int".)

Passing a NULL pointer is *not* an error; "wait" treats that as an
indication that it is not to return the exit status.  Aside from the
incorrect data type, your instructor was correct.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/16/88)

In article <1082@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>From article <1886@loral.UUCP>, by jlh@loral.UUCP (Physically Phffft):
>> I pointed out to him that wait wanted an address in which to 
>> stuff a result, and using 0 was probably not a good idea.  His 
>> reply was 'that's how it is in my manual'; after a few minutes
>> of discussion it got upgraded to 'I tried it on my system and it 
>> works'. 
>Both perfectly valid and correct responses.

Well, not really.  First of all, the manual synopsis of the null-pointer
case is wrong, if you're talking about the 4.3BSD manual as your response
indicates.  In fact, Berkeley broke the non-null case too by changing
it from pointing to an int to pointing to a "union wait".  It should be
a pointer-to-int.  (This is specified as such in IEEE Std 1003.1-1988.)

The second point is, just because something happens to work does not
mean it has been done correctly.  Correct things work, but not the
converse.  Code that "works on my system" often stops working when
ported to another system, or even when a new compiler is installed.

>[Your instructor correctly casts the 0 to a pointer-type, which
>the manual omits.]

No, he INcorrectly cast it to a char* instead of an int* (or, to try
to follow the misdefinition in the 4.3BSD manual, a union wait*).

>To paraphrase Samuel L. Clemens, I think that you will discover
>in a couple of years that your instructors have learned quite a bit
>in the interim. :-)

But not enough, apparently.

guy@auspex.UUCP (Guy Harris) (12/17/88)

>This is not entirely accurate. Classic C says that chars must be able to
>represent the machine's character set, whose character codes are assumed to
>be *positive*:  '... it is guaranteed that a member of the standard character
>set is non negative'. So a "char" variable may indeed represent negative
>values, but they cannot be characters...

Err, umm, no, they cannot be characters in the "standard character set".
This does not mean that they are not "characters"; I suspect some
speaker of whatever language uses "y with a diaeresis" is likely to be
a{mused|nnoyed} by a claim that it is not a character....

>    ... - it will look just like an EOF.
>
>Supposedly not, because getchar(3) returns an "int", not a "char" that is
>widened to an "int".

The issue being discussed was why

	char c;

	if ((c = getchar()) != EOF)

is a Bad Thing regardless of whether "char"s are signed or not; on a
machine where "char"s are signed, a "y with a diaeresis" character *will*
look like an EOF *in the "if" clause in question*, because the "int"
result of "getchar" - having the value 255 - will get stuffed through
the "char-sized knothole" represented by "c", and will come out the
other end looking like -1, i.e. EOF, which tends to look like EOF....

>There are three points here:
>
>[1] In Classic C *characters* (as opposed to "char" values) can only be
>    non-negative. There is no problem even when "char" is signed by
>    default.

Err, umm, I'm sure you're tired of having this pointed out to you, but
you need to provide a reference that demonstrates that "character" and
"member of the *standard* character set" ("italics" mine) are
equivalent; just because you *interpreted* it as meaning that doesn't
mean that *was* what it meant.  If you don't like having it pointed out
to you, provide more references; it's really that simple....

gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/17/88)

In article <411@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>Admittedly, the ice is not thick here (pun on K&R :->). If one wants absolute
>safety in reading *bytes* (not characters), one must use fread(3) (ugh!).

But fread() is defined in terms of getc().  getc() and getchar() are
required to read any value, not just those corresponding to meaningful
characters of the local character set.  There IS a possible mapping
when reading text files, in order to accommodate various line delimiter
conventions, but that applies to all the functions including fread().
Binary streams have no such mapping, and getc() on them is perfectly
safe.