[comp.lang.c] getchar and EOF

scs@adam.mit.edu (Steve Summit) (04/07/91)

In article <1991Apr4.215605.2801@syssoft.com> tom@ssi.UUCP (Rodentia) writes:
>In article <3465@litchi.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>>In <3555@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:
>>>toupper.c:    while ( (int) (c = getchar()) != EOF )
>>The cast implies that c is char.  If so, this line is buggy.
>Does this mean that if c is char, there is no way to assign the
>getchar and test it for EOF without having it cast down to char?

Yes.

Proper use of getchar is (as it ought to be) simple.
Any recent confusion is an unfortunate but inevitable result
of a particularly absurd line of discussion.

getchar can return any char value, plus the single, "out of band"
value EOF [note 1].  Obviously, a variable of type char cannot
hold any-char-value-or-EOF, so getchar() is specified to return,
and any variable used to hold its return value must be declared
as, an int.

If getchar's return value is assigned to a variable of type char,
or otherwise cast to char, EOF becomes indistinguishable from
some valid char value, usually '\377'.  Mapping two values onto
one is an information-losing transformation, so no amount of
casting back to int, after the fact, can restore the lost
information (i.e. distinguish EOF from that other char value).
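
A minimal sketch of that collapse, for illustration only.  The names
through_char and eof_twin are hypothetical helpers, and converting an
out-of-range int to a signed char is implementation-defined, so the
collision shown is what common two's-complement implementations do
rather than something the standard promises.

```c
#include <stdio.h>

/* Narrow an int to char and widen it back, which is all that
   "(int)(c = getchar())" achieves when c is a char. */
int through_char(int value)
{
    char c = (char)value;   /* implementation-defined if value does not fit */
    return (int)c;
}

/* The byte, as getchar() would report it, that EOF lands on after
   narrowing: 255 ('\377') when EOF is -1 and chars have 8 bits. */
int eof_twin(void)
{
    return (unsigned char)EOF;
}
```

As ints the two values differ (eof_twin() != EOF), but on such
implementations through_char(eof_twin()) == through_char(EOF), and
nothing done afterwards can tell them apart.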

The simple rule is, always use int variables to hold getchar's
return value.  By doing so, you almost never have to worry (or
even think) about this issue.
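
For concreteness, a sketch of the usual idiom.  It is written against
an arbitrary stream with getc() so that it can be tried on a file;
getchar() is defined as getc(stdin), so the loop has the same shape.
The name count_bytes is made up for this example.

```c
#include <stdio.h>

/* Count the bytes remaining on a stream. */
long count_bytes(FILE *fp)
{
    int c;          /* int, not char: must hold every byte value and EOF */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

Calling count_bytes(stdin) counts standard input to its end, and a
'\377' byte in the data is counted like any other.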

                                            Steve Summit
                                            scs@adam.mit.edu

[note 1]  As a return value from getchar, EOF is guaranteed to
be distinct from all char values.  This is because getchar
essentially returns (as the ANSI C standard explicitly requires
it to; see sec. 4.9.7.1) normal characters as unsigned characters
cast to int (i.e. as non-negative values, even on a machine on which
chars are usually signed), while EOF is always a negative value.

I was going to say that "EOF is guaranteed not to compare equal
to any char value," but this is not really true.  If you have

	signed char c = '\377';

and EOF is -1, then c == EOF will succeed.  ("signed" is a new
ANSI C keyword; the test also succeeds if c is a "plain" char, on
machines for which plain chars are signed.)

ark@alice.att.com (Andrew Koenig) (04/08/91)

In article <1991Apr7.064003.8552@athena.mit.edu> scs@adam.mit.edu writes:

> getchar can return any char value, plus the single, "out of band"
> value EOF [note 1].  Obviously, a variable of type char cannot
> hold any-char-value-or-EOF, so getchar() is specified to return,
> and any variable used to hold its return value must be declared
> as, an int.

Well, almost right.

It returns a non-negative integer or EOF.  If you are on a machine
where chars are naturally signed, getchar will happily return values
that are incapable of comparing equal to any char and are not EOF.
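
To illustrate with a sketch (the function names are invented for this
example): comparing getchar's result against a char is reliable only
if the char is first converted to unsigned char, the form in which
getchar reports it.

```c
#include <stdio.h>

/* Naive comparison: where plain char is signed, a char such as '\377'
   is promoted to a negative int and can never equal got. */
int naive_match(int got, char wanted)
{
    return got == wanted;
}

/* Compare in getchar's own terms: the char viewed as an unsigned char. */
int byte_match(int got, char wanted)
{
    return got == (unsigned char)wanted;
}
```

With got equal to 0xFF (what getchar returns for a '\377' byte when
chars have 8 bits) and wanted equal to '\377', byte_match succeeds
everywhere, while naive_match succeeds only where plain char is
unsigned.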
-- 
				--Andrew Koenig
				  ark@europa.att.com

enag@ifi.uio.no (Erik Naggum) (04/08/91)

In article <1991Apr7.064003.8552@athena.mit.edu>, Steve Summit writes:

>[note 1]  As a return value from getchar, EOF is guaranteed to
>be distinct from all char values.  This is because getchar
>essentially returns (as the ANSI C standard explicitly requires
>it to; see sec. 4.9.7.1) normal characters as unsigned characters
>cast to int (i.e. as non-negative values, even on a machine on which
>chars are usually signed), while EOF is always a negative value.

>I was going to say that "EOF is guaranteed not to compare equal
>to any char value," but this is not really true.

It is interesting to note that Unicode, one of the proposed new
character sets with "universal" scope and coverage, with its 16-bit
characters has recognized the value of EOF (-1) to applications and
programmers alike, and states

		Not a character code

	FFFF	This 16-bit value is guaranteed not to
		be any Unicode character at all

--
[Erik Naggum]					     <enag@ifi.uio.no>
Naggum Software, Oslo, Norway			   <erik@naggum.uu.no>

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (04/09/91)

In <1991Apr7.064003.8552@athena.mit.edu> scs@adam.mit.edu (Steve Summit) writes:

>As a return value from getchar, EOF is guaranteed to
>be distinct from all char values.

I have always assumed that EOF is guaranteed to be -1.  I think there
is enough history behind EOF == -1 (just as with NULL == 0) that it
isn't likely to be anything else.
--
Rahul Dhesi <dhesi@cirrus.COM>
UUCP:  oliveb!cirrusl!dhesi

jfc@athena.mit.edu (John F Carr) (04/09/91)

In article <3043@cirrusl.UUCP>
	dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:

>I have always assumed that EOF is guaranteed to be -1.  I think there
>is enough history behind EOF == -1 (just as with NULL == 0) that it
>isn't likely to be anything else.

The ANSI standard says EOF is a negative number.  It does not have to be -1.
If I ever write a C implementation which doesn't need to be binary
compatible with an existing UNIX library, I'll make EOF something like -256
to make sure it can never equal a signed or unsigned char value.
Traditionally, EOF is -1.  I don't know what POSIX says.

--
    John Carr (jfc@athena.mit.edu)

bhoughto@nevin.intel.com (Blair P. Houghton) (04/09/91)

In article <1991Apr7.064003.8552@athena.mit.edu> scs@adam.mit.edu writes:
>I was going to say that "EOF is guaranteed not to compare equal
>to any char value," but this is not really true.  If you have
>
>	signed char c = '\377';
>
>and EOF is -1, then c == EOF will succeed.  ("signed" is a new
>ANSI C keyword; the test also succeeds if c is a "plain" char, on
>machines for which plain chars are signed.)

This is a strong example of the oft-forgotten distinction that
`char' is a datatype that implies bytes rather than characters.

If one could assume only characters, then (barring
locale-specific features) one can assume (7-bit) ascii,
which has values only from '\0' to '\177', obviating this
confusion with `(signed char)(-1)'.  But one can't, so one
shouldn't (the ascii-only assumption is bogus at the
outset, since it ignores the loadable (8-bit) fonts of most
ANSI terminals, of which DEC VT character terminals are a
near conformant).

But like I said before, anything much more complex than
7-bit ascii usually deserves more care than getchar(3).

				--Blair
				  "Trigraphs? We don' got no trigraphs...
				   We don' need no steenking trigraphs!"

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (04/10/91)

In article <3727@inews.intel.com> bhoughto@nevin.intel.com (Blair P. Houghton) writes:
> (the ascii-only assumption is bogus at the
> outset, since it ignores the loadable (8-bit) fonts of most
> ANSI terminals, of which DEC VT character terminals are a
> near conformant).

I would phrase that as ``since it ignores the loadable (8-bit) fonts of
most VT-compatible terminals, of which ANSI terminals are a near
conformant.'' Let's keep straight who copied whose models here.

(I really wish ANSI hadn't futzed with the entirely reasonable VT
end-of-line behavior. But I'm afraid this is getting a bit far from
comp.lang.c, so followups to comp.terminals.)

---Dan

enag@ifi.uio.no (Erik Naggum) (04/12/91)

In article <1991Apr8.222824.24474@athena.mit.edu>, John F Carr writes:
   In article <3043@cirrusl.UUCP>, Rahul Dhesi writes:

   >I have always assumed that EOF is guaranteed to be -1.  I think there
   >is enough history behind EOF == -1 (just as with NULL == 0) that it
   >isn't likely to be anything else.

   The ANSI standard says EOF is a negative number.  It does not have to be -1.
   If I ever write a C implementation which doesn't need to be binary
   compatible with an existing UNIX library, I'll make EOF something like -256
   to make sure it can never equal a signed or unsigned char value.
   Traditionally, EOF is -1.  I don't know what POSIX says.

POSIX.1 points to the C standard and has no further comments on EOF.

--
[Erik Naggum]					     <enag@ifi.uio.no>
Naggum Software, Oslo, Norway			   <erik@naggum.uu.no>

gwyn@smoke.brl.mil (Doug Gwyn) (04/13/91)

In article <3727@inews.intel.com> bhoughto@nevin.intel.com (Blair P. Houghton) writes:
>But like I said before, anything much more complex than
>7-bit ascii usually deserves more care than getchar(3).

getchar() is often used to handle so-called "binary" data,
and getc() even more so.