slores%gables.span@umigw.miami.edu (Stanislaw L. Olejniczak) (12/11/88)
Many, many, many thanks to the numerous good souls who sent me detailed explanations of why my example would not work. I am particularly indebted to the several wonderful people who took the example apart and detailed why things would not work. I apologize (I really do) for the missing parentheses on getchar. I would also like to thank ALL of you for not broiling me too much. I guess I deserved to be flamed for posting such an outrageous example, and I appreciate that all respondents were understanding about it.

Several respondents have pointed out that many compilers would NOT accept

	(char_var = getchar()) != EOF

because getchar() returns an integer, EOF may be a negative integer, and on many compilers char variables may not accept signed integers. I have entirely missed that point. This is how I was shown and taught. I have directly asked a couple of the especially kind respondents about their way of handling this. If you have an unusually excellent suggestion I would be most glad to read about it.

Again, with many thanks, and very grateful,
Stan
--
Stan Olejniczak            Internet: slores%gables.span@umigw.miami.edu
University of Miami        UUCP: {uunet!gould}!umbio!solejni
Miami, Florida, USA        Voice: (305)-547-6005
My opinions cannot possibly represent the views of anyone else!
guy@auspex.UUCP (Guy Harris) (12/13/88)
>Several respondents have pointed out that many compilers would NOT accept
>	(char_var = getchar()) != EOF
>because getchar() returns an integer, EOF may be a negative integer, and
>on many compilers char variables may not accept signed integers.

Well, actually, most compilers will accept it, which is the problem - it'll pass the compiler without complaint, but *still* not work on machines where "char" is unsigned. And, frankly, it may not work on machines where "char" is signed, either. The problem is that "getchar()", on a machine with 8-bit bytes, can return either

	1) a value in the range 0 to 255, which represents a character read from the standard input, or

	2) EOF, usually -1, which represents an end-of-file condition.

The intent is that EOF not be a value in the range 0 to 255 (some implementation may give it such a value, but that merely means the implementor didn't know what they were doing).

On a machine with unsigned "char"s and 8-bit bytes, a "char" can have a value in the range 0 to 255. If EOF is -1, assigning EOF to a "char" on a 2's complement machine gives the value 255, which does not compare equal to EOF (-1).

On a machine with signed "char"s and 8-bit bytes, a "char" can have a value in the range -128 to 127. If EOF is -1, assigning EOF to a "char" gives the value -1 - but then, on a 2's complement machine, so does assigning the value 255. This means that if you read a character from the file with the hex value 0xFF - which is "y with a diaeresis" in ISO Latin #1, so even in a pure text file you can have such a character - it will look just like an EOF.

>I have entirely missed that point. This is how I was shown and taught.

Oh dear. Sounds like the person who taught you needs a little remedial education; could you please point out to them that assigning the result of "getchar()" to a "char" variable is incorrect?

>I have directly asked a couple of the especially kind respondents about
>their way of handling this. If you have an unusually excellent
>suggestion I would be most glad to read about it.

There's only one valid suggestion, and that's to have the variable to which the value of "getchar()" is assigned be of some signed integral type larger than "char"; "int" is the best choice ("short" might work on some implementations, possibly most implementations, but it's wisest not to fool Mother Nature; "long" will work, but it's overkill and may be inefficient).
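A minimal C sketch of the idiom Guy recommends, assuming 8-bit bytes as in his discussion. It uses getc() on an arbitrary FILE * (getchar() is equivalent to getc(stdin)) so it can be exercised on a temporary file; the name count_bytes is hypothetical. Because c is an int, a 0xFF byte comes back as 255 and cannot be mistaken for EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the characters read from fp up to end of file.
 * c is an int, not a char: it must hold every unsigned char value
 * (0..255 with 8-bit bytes) plus the distinct negative value EOF. */
long count_bytes(FILE *fp)
{
    int c;
    long n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

Declaring c as a plain char instead would still compile without complaint, which is exactly the trap described above.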
jlh@loral.UUCP (Physically Phffft) (12/14/88)
In article <685@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>>I have entirely missed that point. This is how I was shown and taught.
>
>Oh dear. Sounds like the person who taught you needs a little remedial
>education; could you please point out to them that assigning the result
>of "getchar()" to a "char" variable is incorrect?

It gets worse. 2-3 weeks ago one of my instructors decided to explain fork, exec, and waits. In all his examples he used wait ( (char *) 0). I pointed out to him that wait wanted an address in which to stuff a result, and using 0 was probably not a good idea. His reply was 'that's how it is in my manual'; after a few minutes of discussion it got upgraded to 'I tried it on my system and it works'. So, Chris, Doug, and Henry, prepare yourselves for 30 or so bright and eager new programmers who will think 'wait ((char *) 0)' is the preferred way to do things. Coming your way this June!

Jim
--
Jim Harkins                        jlh@loral.cts.com
Loral Instrumentation, San Diego
djones@megatest.UUCP (Dave Jones) (12/15/88)
From article <1886@loral.UUCP>, by jlh@loral.UUCP (Physically Phffft):
...
> In all his examples he used wait ( (char *) 0).
> I pointed out to him that wait wanted an address in which to
> stuff a result, and using 0 was probably not a good idea. His
> reply was 'that's how it is in my manual'; after a few minutes
> of discussion it got upgraded to 'I tried it on my system and it
> works'.

Both perfectly valid and correct responses.

From the manual:

	#include <sys/wait.h>

	pid = wait(status)
	int pid;
	union wait *status;

	pid = wait(0)
	int pid;

[Your instructor correctly casts the 0 to a pointer-type, which the manual omits.]

If you want the status, you pass a non-null pointer and wait knows what to do. If you don't want the status, you pass a null pointer, and wait knows what *not* to do.

To paraphrase Samuel L. Clemens, I think that you will discover in a couple of years that your instructors have learned quite a bit in the interim. :-)
wald-david@CS.YALE.EDU (david wald) (12/16/88)
In article <1082@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>From article <1886@loral.UUCP>, by jlh@loral.UUCP (Physically Phffft):
>> In all his examples he used wait ( (char *) 0).
>
>From the manual:
>
>	#include <sys/wait.h>
>
>	pid = wait(status)
>	int pid;
>	union wait *status;
>
>	pid = wait(0)
>	int pid;
>
>Your instructor correctly casts the 0 to a pointer-type, which
>the manual omits.

On the other hand, it's the wrong type: there's no guarantee that passing a (char *), rather than a (union wait *), will work any better than passing a plain int.

============================================================================
David Wald                                          wald-david@yale.UUCP
                                                    waldave@yalevm.bitnet
============================================================================
pcg@aber-cs.UUCP (Piercarlo Grandi) (12/16/88)
In article <685@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
[ good reminder on why the result of getchar(3) has int length ]
>On a machine with signed "char"s, and 8-bit bytes, a "char" can have a
>value in the range -128 to 127.

This is exactly true.

>This means that if you read a character from the file with the hex value
>0xFF - which is "y with a diaeresis" in ISO Latin #1, so even in a pure
>text file you can have such a character ...
This is not entirely accurate. Classic C says that chars must be able to
represent the machine's character set, whose character codes are assumed to
be *positive*: '... it is guaranteed that a member of the standard character
set is non negative'. So a "char" variable may indeed represent negative
values, but they cannot be characters... So, EOF == -1 is guaranteed to work
indeed. In other words, getchar(3) is not a "getbyte()".
>... - it will look just like an EOF.
Supposedly not, because getchar(3) returns an "int", not a "char" that is
widened to an "int".
There are three points here:
[1] In Classic C *characters* (as opposed to "char" values) can only be
non-negative. There is no problem even when "char" is signed by default.
[2] In dpANS C you can explicitly control whether a "char" is signed or not,
so there is no problem either.
[3] In any case, the result of getchar(3) is an "int", not a "char".
Admittedly, the ice is not thick here (pun on K&R :->). If one wants absolute
safety in reading *bytes* (not characters), one must use fread(3) (ugh!).
--
Piercarlo "Peter" Grandi INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)
guy@auspex.UUCP (Guy Harris) (12/16/88)
>It gets worse. 2-3 weeks ago one of my instructors decided to explain
>fork, exec, and waits. In all his examples he used wait ( (char *) 0).
>I pointed out to him that wait wanted an address in which to stuff a result,
>and using 0 was probably not a good idea. His reply was 'that's how it is
>in my manual'; after a few minutes of discussion it got upgraded to 'I tried
>it on my system and it works'. So, Chris, Doug, and Henry, prepare yourselves
>for 30 or so bright and eager new programmers who will think
>'wait ((char *) 0)' is the preferred way to do things.

No, if you're not interested in the return status of the process for which you're waiting, and you're running on a UNIX system more recent than, say, V6, the preferred way of doing this is

	wait((int *)0)

not

	wait((char *)0)

If the manual says "char *", it's wrong. (Yes, I know about BSD's "union wait"; it was a dumb idea, and the BSD *kernel* still thinks it's supposed to be a pointer to "int".)

Passing a NULL pointer is *not* an error; "wait" treats that as an indication that it is not to return the exit status. Aside from the incorrect data type, your instructor was correct.
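A minimal sketch of the null-pointer form Guy describes, assuming a POSIX system; spawn_and_reap is a hypothetical name, and the child does nothing but exit. The only point of interest is the cast: the null pointer is an int *, matching the type wait() actually takes.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once, then reap it
 * without collecting its exit status.  Returns the reaped pid, or -1. */
pid_t spawn_and_reap(void)
{
    pid_t child, reaped;

    child = fork();
    if (child == -1)
        return -1;               /* fork failed */
    if (child == 0)
        _exit(0);                /* child: exit immediately */

    /* Parent: a null pointer of the right type, int *, tells wait()
     * not to store the exit status. */
    reaped = wait((int *)0);

    /* With only this one child outstanding, wait() should return it. */
    assert(reaped == child);
    return reaped;
}
```

With an ANSI prototype for wait() in scope, a bare 0 or NULL is converted to int * automatically; the explicit cast mattered in pre-prototype code like the thread's, where arguments were passed without conversion.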
gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/16/88)
In article <1082@goofy.megatest.UUCP> djones@megatest.UUCP (Dave Jones) writes:
>From article <1886@loral.UUCP>, by jlh@loral.UUCP (Physically Phffft):
>> I pointed out to him that wait wanted an address in which to
>> stuff a result, and using 0 was probably not a good idea. His
>> reply was 'that's how it is in my manual'; after a few minutes
>> of discussion it got upgraded to 'I tried it on my system and it
>> works'.
>Both perfectly valid and correct responses.

Well, not really. First of all, the manual synopsis of the null-pointer case is wrong, if you're talking about the 4.3BSD manual as your response indicates. In fact, Berkeley broke the non-null case too by changing it from pointing to an int to pointing to a "union wait". It should be a pointer-to-int. (This is specified as such in IEEE Std 1003.1-1988.)

The second point is, just because something happens to work does not mean it has been done correctly. Correct things work, but the converse does not follow. Code that "works on my system" often stops working when ported to another system, or even when a new compiler is installed.

>[Your instructor correctly casts the 0 to a pointer-type, which
>the manual omits.]

No, he INcorrectly cast it to a char* instead of an int* (or, to try to follow the misdefinition in the 4.3BSD manual, a union wait*).

>To paraphrase Samuel L. Clemens, I think that you will discover
>in a couple of years that your instructors have learned quite a bit
>in the interim. :-)

But not enough, apparently.
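A sketch of the non-null case as Doug describes it, with the status stored in a plain int and decoded by the WIFEXITED/WEXITSTATUS macros from <sys/wait.h>; it assumes a POSIX system. child_exit_code is a hypothetical name, and the child simply calls _exit() with the requested code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait for it
 * with a pointer to a plain int, and return the decoded exit status
 * (or -1 if anything goes wrong). */
int child_exit_code(int code)
{
    int status;                  /* int, not union wait */
    pid_t child;

    /* Only the low-order 8 bits of an exit status reach the parent. */
    assert(code >= 0 && code <= 255);

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(code);

    if (wait(&status) != child)
        return -1;
    if (!WIFEXITED(status))
        return -1;               /* e.g. the child was killed by a signal */
    return WEXITSTATUS(status);
}
```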
guy@auspex.UUCP (Guy Harris) (12/17/88)
>This is not entirely accurate. Classic C says that chars must be able to
>represent the machine's character set, whose character codes are assumed to
>be *positive*: '... it is guaranteed that a member of the standard character
>set is non negative'. So a "char" variable may indeed represent negative
>values, but they cannot be characters...

Err, umm, no, they cannot be characters in the "standard character set". This does not mean that they are not "characters"; I suspect some speaker of whatever language uses "y with a diaeresis" is likely to be a{mused|nnoyed} by a claim that it is not a character....

>> ... - it will look just like an EOF.
>
>Supposedly not, because getchar(3) returns an "int", not a "char" that is
>widened to an "int".

The issue being discussed was why

	char c;

	if ((c = getchar()) != EOF)

is a Bad Thing regardless of whether "char"s are signed or not. On a machine where "char"s are signed, a "y with a diaeresis" character *will* look like an EOF *in the "if" clause in question*, because the "int" result of "getchar" - having the value 255 - will get stuffed through the "char-sized knothole" represented by "c", and will come out the other end looking like -1, i.e. EOF, which tends to look like EOF....

>There are three points here:
>
>[1] In Classic C *characters* (as opposed to "char" values) can only be
>    non-negative. There is no problem even when "char" is signed by
>    default.

Err, umm, I'm sure you're tired of having this pointed out to you, but you need to provide a reference that demonstrates that "character" and "member of the *standard* character set" ("italics" mine) are equivalent; just because you *interpreted* it as meaning that doesn't mean that *was* what it meant. If you don't like having it pointed out to you, provide more references; it's really that simple....
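To make the "char-sized knothole" concrete, a small C sketch. Both helper names are hypothetical: the first mimics storing a getchar() result in a plain char before comparing, the second keeps it in an int. It assumes EOF is -1 and 8-bit chars; when plain char is signed, converting 255 to it is implementation-defined (commonly -1 on two's-complement machines).

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper mimicking `char c; if ((c = getchar()) != EOF)`:
 * the int result v is squeezed through a char-sized knothole first. */
int eof_after_char(int v)
{
    char c;

    /* v is what getchar() can return: EOF or an unsigned char value. */
    assert(v == EOF || (v >= 0 && v <= UCHAR_MAX));

    /* If char is signed and v > CHAR_MAX, the result of this conversion
     * is implementation-defined; two's-complement machines usually give
     * v - 256, so 255 becomes -1. */
    c = (char)v;
    return c == EOF;
}

/* The fix: compare the int itself, so 255 stays 255. */
int eof_as_int(int v)
{
    return v == EOF;
}
```

CHAR_MIN from <limits.h> is negative exactly when plain char is signed, which tells you which failure you get: a 0xFF byte mistaken for EOF (signed) or EOF never recognized at all (unsigned).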
gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/17/88)
In article <411@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>Admittedly, the ice is not thick here (pun on K&R :->). If one wants absolute
>safety in reading *bytes* (not characters), one must use fread(3) (ugh!).

But fread() is defined in terms of getc(). getc() and getchar() are required to read any value, not just those corresponding to meaningful characters of the local character set. There IS a possible mapping when reading text files, in order to accommodate various line delimiter conventions, but that applies to all the functions including fread(). Binary streams have no such mapping, and getc() on them is perfectly safe.
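A sketch of Doug's point, assuming tmpfile() (which opens a binary update stream) supplies the test data: reading the same bytes once with getc() into an int and once with fread() should give identical results, 0xFF bytes included. The helper names are hypothetical; the current ISO C standard specifies fread() in terms of repeated fgetc() calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read up to max bytes with getc() into buf. */
size_t read_with_getc(FILE *fp, unsigned char *buf, size_t max)
{
    size_t n = 0;
    int c;                       /* int, so byte 0xFF differs from EOF */

    assert(fp != NULL && buf != NULL);
    while (n < max && (c = getc(fp)) != EOF)
        buf[n++] = (unsigned char)c;
    return n;
}

/* The same job done with fread(). */
size_t read_with_fread(FILE *fp, unsigned char *buf, size_t max)
{
    return fread(buf, 1, max, fp);
}

/* Read up to 256 bytes of the stream both ways; 1 if the results agree. */
int getc_and_fread_agree(FILE *fp)
{
    unsigned char a[256], b[256];
    size_t na, nb;

    rewind(fp);
    na = read_with_getc(fp, a, sizeof a);
    rewind(fp);
    nb = read_with_fread(fp, b, sizeof b);
    return na == nb && memcmp(a, b, na) == 0;
}
```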