[net.unix-wizards] getc

FIRTH%TARTAN@CMU-CS-C.ARPA (05/24/84)

In all conscience,

	while ( (c = getc()) != EOF )

ought to work.  If somebody is to be blamed, it is surely not the
people who wrote the code, but the people who made a C implementation
that broke it.

-------

guy@rlgvax.UUCP (05/26/84)

> In all conscience,

> 	while ( (c = getc()) != EOF )

> ought to work.  If somebody is to be blamed, it is surely not the
> people who wrote the code, but the people who made a C implementation
> that broke it.

Assuming you're referring to the case where "c" was declared as "char" and
it didn't work, the code was incorrect.  "getc" is documented as returning
an "int".  The reason is that it should be able to return every possible
value that fits into a "char" (the manual page says "Getc returns the
next character (i.e., byte)"), but if it returned a "char" there would be
no distinguished value which would indicate EOF.  It "ought to work" only
if 1) it is defined not to work except on 7-bit ASCII text files (or, at least,
files not containing the character '\0', or '\377', or '\351', or whatever
your choice for EOF is) or 2) it is defined as returning an "int", so that
in addition to all possible one-byte values it can also return a distinguished
value for EOF.  Consider this as possibly a weak vote for languages in
which a procedure (or expression; "getc" is a macro in UNIX) can return a
success/failure indication as well as a value.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
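
A minimal sketch of the idiom described above, assuming a hosted C
implementation with <stdio.h>.  The name count_bytes is hypothetical and
does not come from the thread.  Because c is an int, every byte value
stays distinct from EOF, including a byte with all bits set:

```c
#include <assert.h>
#include <stdio.h>

/* Counts the bytes remaining on fp.  c is an int, so each byte value
   (0..UCHAR_MAX) returned by getc() stays distinct from EOF. */
long count_bytes(FILE *fp)
{
    int c;
    long n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

On a stream holding the three bytes 'A', 0xFF, 'B', this returns 3.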

nather@utastro.UUCP (Ed Nather) (05/27/84)

[]
>
>In all conscience,
>
>	while ( (c = getc()) != EOF )
>
>ought to work.  If somebody is to be blamed, it is surely not the
>people who wrote the code, but the people who made a C implementation
>that broke it.

It will work if "c" is declared "int."
It will not work if "c" is declared "char."

Variable declarations are an essential part of the program, and should be
included in illustrative code fragments, so problems are not concealed.

Grumph.

-- 
                                 Ed Nather
                                 {allegra,ihnp4}!{ut-sally,noao}!utastro!nather
                                 Astronomy Dept., U. of Texas, Austin

ken@turtlevax.UUCP (05/29/84)

Beware that an alternative test for end-of-file doesn't seem to work on
4.2bsd like it did on 4.1 and before.  I am referring to feof():

	while (!feof(stdin)) putchar(getchar());

does not work.  It seems that the EOF indicator does not come on until
the EOF marker has been read. Previous versions of the standard I/O
library set the EOF flag if the last character has been read and the
next one will be an EOF.  To run correctly on 4.2, one needs to do:

	while ((i = getchar()) != EOF) putchar(i);

or

	for (i = getchar(); !feof(stdin); i = getchar()) putchar(i);

-- 
Ken Turkowski @ CADLINC, Palo Alto, CA
UUCP: {amd70,decwrl,flairvax}!turtlevax!ken
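
A hedged sketch of the difference, assuming a stdio in which the EOF
indicator is set only after a read has actually hit end-of-file (the
behavior reported above for 4.2bsd).  Both function names are
hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Tests feof() before reading.  The indicator is still clear after the
   last byte has been delivered, so the body runs one extra time and
   that final getc() returns EOF. */
long count_feof_first(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    while (!feof(fp)) {
        (void)getc(fp);
        n++;
    }
    return n;
}

/* Reads first, then tests the value that getc() actually returned. */
long count_read_first(FILE *fp)
{
    int c;
    long n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

On a three-byte stream, count_feof_first returns 4 and count_read_first
returns 3; the extra iteration is the one that would hand EOF to
putchar() in the one-line loop quoted above.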

chris@basser.SUN (Chris Maltby) (05/30/84)

[]
> >
> >In all conscience,
> >
> >	while ( (c = getc()) != EOF )
> >
> >ought to work.  If somebody is to be blamed, it is surely not the
> >people who wrote the code, but the people who made a C implementation
> >that broke it.
> 
> It will work if "c" is declared "int."
> It will not work if "c" is declared "char."
> 
> Variable declarations are an essential part of the program, and should be
> included in illustrative code fragments, so problems are not concealed.
> 
>                                  Ed Nather

WRONG! The code above will work if c is int or char.
Char variables are promoted to int in expressions (see C manual)
and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
different (Any C implementors there? (kvm?)). 

Chris Maltby
University of Sydney

opus@drutx.UUCP (ShanklandJA) (05/30/84)

(sigh.)

> > >	while ( (c = getc()) != EOF )
> > >
> > >ought to work.  If somebody is to be blamed, it is surely not the
> > >people who wrote the code, but the people who made a C implementation
> > >that broke it.
> > 
> > It will work if "c" is declared "int."
> > It will not work if "c" is declared "char."
> > 
> 
> WRONG! The code above will work if c is int or char.
> Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different (Any C implementors there? (kvm?)). 
> 
> Chris Maltby
> University of Sydney

But it is not defined whether char is a signed or unsigned type in C.
On machines where char is unsigned, c will never have the value -1,
and the comparison with EOF will always fail.

All this is quite clearly described on page 40 of K&R.

Jim Shankland
..!ihnp4!druxy!opus

ark@rabbit.UUCP (Andrew Koenig) (05/30/84)

>>> 
>>> In all conscience,
>>> 
>>> 	while ( (c = getc()) != EOF )
>>> 
>>> ought to work.  If somebody is to be blamed, it is surely not the
>>> people who wrote the code, but the people who made a C implementation
>>> that broke it.
>> 
>> It will work if "c" is declared "int."
>> It will not work if "c" is declared "char."
>> 
>> Variable declarations are an essential part of the program, and should be
>> included in illustrative code fragments, so problems are not concealed.
>> 
>>                                  Ed Nather

> WRONG! The code above will work if c is int or char.
> Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different (Any C implementors there? (kvm?)). 
> 
> Chris Maltby
> University of Sydney
> 
 
Ed Nather is right here: a char -1 is not identical to an int -1.
C isn't obligated to sign-extend characters when converting to ints,
although it is obligated to refrain from sign-extending unsigned
chars.  Getc (and getchar) return ints, not chars, and the result
returned is always non-negative (except EOF), even on those
machines that sign-extend characters.  If I write:

	char c;

	while ((c = getc (file)) != EOF) ...

I will lose on a machine that sign-extends chars as soon as I read
a char with all its bits turned on, but it will work OK if c is an
int.


			--Andrew Koenig
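
The two failure modes can be sketched side by side.  This assumes EOF is
-1 and that storing an out-of-range value in a signed char wraps around,
as it does on common two's-complement implementations; neither is
guaranteed by the language.  The function names are hypothetical, and
"signed char" is spelled out only to make the sketch behave the same on
every machine:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* v must be something getc() can return: EOF or 0..UCHAR_MAX. */
static int is_getc_value(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Stores v in a signed char, then compares with EOF. */
int eof_test_via_signed_char(int v)
{
    signed char c;
    int promoted;

    assert(is_getc_value(v));
    c = (signed char)v;      /* 0xFF typically becomes -1 here */
    promoted = c;            /* sign-extended back to int */
    return promoted == EOF;
}

/* Stores v in an unsigned char, then compares with EOF. */
int eof_test_via_unsigned_char(int v)
{
    unsigned char c;
    int promoted;

    assert(is_getc_value(v));
    c = (unsigned char)v;    /* EOF typically becomes 0xFF here */
    promoted = c;            /* zero-extended: never negative */
    return promoted == EOF;
}

/* Keeps v as an int: only a real EOF matches. */
int eof_test_via_int(int v)
{
    assert(is_getc_value(v));
    return v == EOF;
}
```

Under those assumptions the signed version reports EOF for the data
byte 0xFF, the unsigned version never reports EOF at all, and only the
int version distinguishes the two.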

alan@allegra.UUCP (Alan S. Driscoll) (05/30/84)

> ... Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different...

"Whether or not sign-extension occurs for characters is machine
dependent, but it is guaranteed that a member of the standard
character set is non-negative."

	-- C Reference Manual, September 1980

-- 

	Alan S. Driscoll
	AT&T Bell Laboratories

pedz@smu.UUCP (05/30/84)

smu!pedz    May 30 14:32:00 1984

Would it still work if c (the variable) were declared to be a
char instead of an int?  It seems to me that there would be
(or could be) a truncation/sign-extension problem.

Perry
15884

toml@druxm.UUCP (LaidigTL) (05/30/84)

> Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different (Any C implementors there? (kvm?)). 
> 
> Chris Maltby
> University of Sydney

Jim Shankland answered this adequately, but for those who feel that only
the C Reference Manual contains truth, see page 183 (section 6.1,
"Characters and integers").  Note that of the four machines they
describe, only one sign-extends (converts a char whose leftmost bit is
one to a negative int).

		Tom Laidig
		AT&T Information Systems Laboratories, Denver
		...!ihnp4!druxm!toml

chris@umcp-cs.UUCP (05/31/84)

I beg to differ.  K&R, p. 40:

	``There is one subtle point about the conversion of characters
	to integers.  The language does not specify whether variables
	of type {\tt char} are signed or unsigned quantities.  When a
	{\tt char} is converted to an {\tt int}, can it ever produce a
	{\it negative} integer?  Unfortunately, this varies from
	machine to machine, reflecting differences in architecture.
	On some machines ({\csc pdp-11}, for instance), a {\tt char}
	whose leftmost bit is 1 will be converted to a negative
	integer (``sign extension'').  On others, a {\tt char} is
	promoted to an {\tt int} by adding zeros at the left end, and
	thus is always positive.''

(Now if you state that all {\it civilized} compilers default to signed
characters and allow {\tt unsigned char} datatypes, I will agree.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

guy@rlgvax.UUCP (Guy Harris) (06/01/84)

> (Now if you state that all {\it civilized} compilers default to signed
> characters and allow {\tt unsigned char} datatypes, I will agree.)

Well, on some machines supporting signed characters is painful; if the
machine's byte manipulation instructions don't extend the sign bit, a
program with "char" could involve more instructions than one involving
"unsigned char".  (Always using "unsigned char" isn't a fix, either; on
some machines (like the PDP-11), "unsigned char" requires more code than
"char".)  (Our machines all have signed characters, by the way.)

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

guido@mcvax.UUCP (Guido van Rossum) (06/01/84)

>	while (!feof(stdin)) putchar(getchar());
>
>does not work.  It seems that the EOF indicator does not come on until
>the EOF marker has been read. Previous versions of the standard I/O
>library set the EOF flag if the last character has been read and the
>next one will be an EOF.

How *could* this ever have worked under UNIX???  Remember that the input
can be a pipe.  You only know there's no more data when a READ system
call returns <= 0.

--
	Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
	guido @ mcvax
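
A rough sketch of that rule, assuming a POSIX system with read(2)
declared in <unistd.h>.  The helper name drain_fd is hypothetical.  The
end of the data is discovered only when read() returns 0, and not
before:

```c
#include <assert.h>
#include <unistd.h>

/* Reads fd until read() reports end of data (0) or an error (-1).
   Returns the number of bytes read, or -1 on a read error.  There is
   no way to learn that the data has ended without making the read()
   call that returns 0. */
long drain_fd(int fd)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    assert(fd >= 0);
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}
```

For simplicity the sketch treats every negative return, including an
interrupted call, as an error.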

bsafw@ncoast.UUCP (Brandon Allbery) (06/04/84)

	The local "lint" tells me that ((ch = getc ()) != EOF) is illegal on
IBM-based Cs.  This fits in with the (assumption) that an IBM/370 C would use
EBCDIC, NOT ASCII, and all 8 bits of the character data are significant, so
the ONLY way to trap an EOF would be the feof () function.  (OK, for normal
text files, 0xff is not normally used, but they may have thought they were
stretching it.  They don't use 0x0a either, usually, although I've never run
Unix on an IBM.)

-- 
--------------------------------------------------------------------------------

						Brandon Allbery
						decvax!cwruecmp!ncoast!bsafw
"...he himself being one universe's prime	MCI MAIL: 161-7070
example of utter, rambunctious free will!"	USMail (core dump):
							6504 Chestnut Road
							Independence, OH 44131

jack@vu44.UUCP (Jack Jansen) (06/04/84)

With the PR1ME c-compiler, chars are unsigned, and they have
their parity bit on(!!). This means that 
    while( (c=getc())!= EOF)
doesn't work, since the (c=getc()) is not sign-extended to
an integer, but just zero padded. As soon as I found this
out I worked myself through the C manual, but this behavior 
doesn't seem to violate the standard.....
	Jack, {philabs|decvax}!mcvax!vu44!jack

keesan@bbncca.ARPA (Morris Keesan) (06/04/84)

----------------------------

> > >	  while ( (c = getc()) != EOF )
> > 
> > It will work if "c" is declared "int."
> > It will not work if "c" is declared "char."
> > 
> >                                  Ed Nather
> 
> WRONG! The code above will work if c is int or char.
> Char variables are promoted to int in expressions (see C manual)
> and a char -1 is IDENTICAL with an int -1. Unsigned char c could be
> different (Any C implementors there? (kvm?)). 
> 
> Chris Maltby
> University of Sydney

  1) When saying things like "See C manual", it would sure help if people would
     give references -- preferably section numbers or page numbers.
  2) The reference missing above is section 6.6, "Arithmetic conversions", on
     page 184 of Kernighan and Ritchie:

	A GREAT MANY operators cause conversions . . . called the "usual
	arithmetic conversions."

	First, any operands of type char . . . are converted to int.

    (Emphasis mine -- there are some expressions where the usual arithmetic
    conversions don't apply; above, they apply to !=, but NOT to = ).
  3) From section 6.1 of the C manual (page 183 of K&R): 

	Whether or not sign-extension occurs for characters is machine
	dependent . . .  Of the machines treated by this manual, only the
	PDP-11 sign-extends.

     On many machines, char and unsigned char are equivalent.  On these
     machines, (char)-1 and (int)-1 are very different.  C provides no
     way to specify 'signed char' (a shortcoming of the language).
  4) Even on machines that sign-extend characters, the above code is incorrect
     if c is declared "char", because it will halt not only on EOF, but also on
     (char)-1, which is a valid char.
-- 
					Morris M. Keesan
					{decvax,linus,wjh12,ima}!bbncca!keesan
					keesan @ BBN-UNIX.ARPA

ken@turtlevax.UUCP (Ken Turkowski) (06/05/84)

The number 0xff is a legal return value from getc(), and is different from -1.
Therefore, c in (c = getc(file)) should be int.
-- 
Ken Turkowski @ CADLINC, Palo Alto, CA
UUCP: {amd70,decwrl,flairvax}!turtlevax!ken

paul@ism780.UUCP (06/06/84)

ism780!paul    Jun  4 17:21:00 1984

[Nothing happens till it happens twice.]

All the comments I have seen here on

#define EOF (-1)
      char c;
      while ( (c = getc()) != EOF )

ignore one possibility:
if chars are signed and the file being read contains a byte equal to -1,
the loop will terminate BEFORE the end-of-file is reached!  If, that is,
the compiler implements assignment expressions correctly.  The VAX System III
compiler, for one, gets it wrong.

Paul Perkins
...{uscvax|ucla-vax|vortex}!ism780!paul
...decvax!yale-co!ima!ism780!paul
"Any opinions expressed in this message are not necessarily those of any
real person, organization, or computer."
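
The early termination described above can be sketched as follows,
assuming EOF is -1 and that narrowing to a signed char wraps around
(true of common two's-complement implementations, but not promised by
the language).  The function name is hypothetical, and "signed char" is
written explicitly so the sketch does not depend on the default
signedness of char:

```c
#include <assert.h>
#include <stdio.h>

/* Same loop as the fragment under discussion, with c narrowed to a
   signed char.  A data byte of 0xFF is stored as -1, compares equal to
   EOF, and ends the loop before the real end of file. */
long count_bytes_signed_char(FILE *fp)
{
    signed char c;
    long n = 0;

    assert(fp != NULL);
    while ((c = (signed char)getc(fp)) != EOF)
        n++;
    return n;
}
```

Under those assumptions, a stream holding 'A', 0xFF, 'B' yields 1
rather than 3.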

gam@proper.UUCP (Gordon Moffett) (06/07/84)

From: bsafw@ncoast.UUCP  Brandon Allbery
Organization: North Coast XENIX, Cleveland


> 	The local "lint" tells me that ((ch = getc ()) != EOF) is illegal on
> IBM-based Cs.  This fits in with the (assumption) that an IBM/370 C would use
> EBCDIC, NOT ASCII, and all 8 bits of the character data are significant, so

No, no, no!  It has nothing to do with EBCDIC; the comparison fails
because EOF is explicitly OUTSIDE of the underlying character set,
WHATEVER IT HAPPENS TO BE.  It is exactly (int)-1, and not (char)-1.

Amdahl's UTS (v7 and Sys V) runs on 370's and uses ascii anyway, but
the chars are unsigned, and that is why any comparison of a character to
-1 (EOF) is always false ... and that is where this discussion
started most recently (and I think it's been beaten to death now ...).

nather@utastro.UUCP (Ed Nather) (06/10/84)

[]
    >Re the following code:
    >	char c;
    >	while ((c = getc(file)) != EOF);
    >
    >The issue is whether the value returned from the assignment operator
    >should be the value of the left or of the right hand side of the as-
    >signment.  C defines the value to be the value of the left hand side
    >of the assignment. 
    >
    >I contend that this decision was a mistake.
    >
    >				Kenneth Almquist

I disagree.  In my view, the parentheses indicate quite clearly the sequence
of operations expected:

	1. Call getc with the argument "file".
	2. Store the result into the variable location called "c".
	3. Compare that value with the value of "EOF".

The "truncate and store" operation precedes the comparison, and I would be
thoroughly confused if the sequence shown gave a different result from the
(unfolded) sequence

	c = getc(file);
	if(c != EOF)
		...

-- 

                                 Ed Nather
                                 {allegra,ihnp4}!{ut-sally,noao}!utastro!nather
                                 Astronomy Dept., U. of Texas, Austin
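
A small sketch of the equivalence claimed above: the value of an
assignment expression is the value of the left operand after the store,
so the folded and unfolded forms should agree for every value getc()
can return.  The function names are hypothetical; plain "char" is used
as in the fragment, so the individual results depend on the machine's
char signedness, but the agreement between the two forms does not:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Folded form: store and compare in one expression. */
int folded(int v)
{
    char c;

    assert(v == EOF || (v >= 0 && v <= UCHAR_MAX));
    return (c = (char)v) != EOF;
}

/* Unfolded form: store first, then compare the stored value. */
int unfolded(int v)
{
    char c;

    assert(v == EOF || (v >= 0 && v <= UCHAR_MAX));
    c = (char)v;
    if (c != EOF)
        return 1;
    return 0;
}

/* Returns 1 if both forms agree for EOF and for every byte value. */
int all_agree(void)
{
    int v;

    if (folded(EOF) != unfolded(EOF))
        return 0;
    for (v = 0; v <= UCHAR_MAX; v++)
        if (folded(v) != unfolded(v))
            return 0;
    return 1;
}
```

Either way, the comparison sees the truncated value, not the int that
getc() returned.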

pedz@smu.UUCP (06/14/84)

smu!pedz    Jun 14 12:40:00 1984

This is not true at all.  getc is defined to return a value which
is not a legal character value upon encountering EOF.  This value
is what is defined as EOF in stdio.h.  Although it is generally
true that EOF is -1 it is not defined to be such by the language and
any program which tests for -1 instead of EOF is wrong!

Perry
convex!smu!pedz

guido@mcvax.UUCP (06/15/84)

[This discussion is getting silly.  Trying to stamp out one more
 misunderstanding...]

Someone suggests that a -1 byte in the file can be promoted to EOF.

No it can't, for the simple reason that getc() is defined as returning
an *int* in the range 0..255.  My v7 manual doesn't state this, but just
have a short look at the definition of getc() in /usr/include/stdio.h!

Surely this was intended; the designers of the package were very well
aware of what they were doing: they do warn that the value EOF (-1) returned
by getw() [did you know that it existed?! ever used it?] can be a perfectly
valid integer.

Of course, when assigned to a signed char variable, the values in the range
128..255 become negative; but only then.

--
	Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
	guido @ mcvax

jim@ism780.UUCP (06/21/84)

ism780!jim    Jun 12 22:10:00 1984

> [This discussion is getting silly.  Trying to stamp out one more
>  misunderstanding...]
> 
> Someone suggests that a -1 byte in the file can be promoted to EOF.
> 
> No it can't, for the simple reason that getc() is defined as returning
> an *int* in the range 0..255.  My v7 manual doesn't state this, but just
> have a short look at the definition of getc() in /usr/include/stdio.h!
> 
> Surely this was intended; the designers of the package were very well
> aware of what they were doing: they do warn that the value EOF (-1) returned
> by getw() [did you know that it existed?! ever used it?] can be a perfectly
> valid integer.
> 
> Of course, when assigned to a signed char variable, the values in the range
> 128..255 become negative; but only then.

I guess people are so blinded by their desire to be more right than the other guy
that they insist on misreading and misinterpreting everything.
The last sentence indicates that you have all the info to realize that
the code under discussion {char c; while ((c = getchar()) != EOF) ...}
promotes -1's in the input to EOF on sign-extending machines,
yet you insist on contradicting that.  Sigh sigh sigh.

-- Jim Balter, INTERACTIVE Systems (ima!jim)


P.S.  For a really good joke, check out the treatment of putw.
      According to some version of the manual,

DIAGNOSTICS
     These functions return the constant EOF upon error.  Since this is a good
     integer, ferror(3s) should be used to detect putw errors.

However, not only does putw not return its argument, so you shouldn't need
ferror to detect an error, but, contrary to the documentation, it does not
return EOF upon error; rather it returns the value of ferror() (0 or 1)
in any case.