[comp.lang.c] EOF considered harmful

ian@r6.uucp (Ian Cottam) (10/22/89)

Some observations on the little program below (following some recent
discussions in this group):
_______________________
#include <stdio.h>

int
main()
{
	char ch;
	while ( ! feof(stdin) ) {
		ch= getchar();
		putchar(ch);
	}

	return 0;
}
______________________

1) This program runs as quickly as the ``((ch= getchar()) != EOF)''
   version (on my SUN3 with gcc).

2) Although in this specific example the variable ch is redundant, I
   include it to show that declaring it to be a char is quite sensible
   given the feof() test.  Thus a common error in C code -- a colleague
   of mine even found this error in K&R first edition -- the sign extension
   (or not) character comparison with EOF is avoided.

3) The, to my mind, awful idiom in 1) above is avoided, with the extra
   benefit that the ``(ch= getchar() != EOF)'' slip up will not be made.

4) This code is completely portable to implementations that have:
		sizeof(char) == sizeof(int)

5) Although, to my mind, the above is a compelling argument to abandon the
   explicit test for EOF, everyone reading this newsgroup (except me of
   course :-) ) will ignore it!
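Point 2 refers to the classic trap of storing getchar()'s result in a char before comparing it with EOF. The two ways this goes wrong can be sketched in a few lines. This is a minimal sketch that assumes 8-bit two's-complement chars, char narrower than int, and EOF defined as -1, which holds on common implementations but is not guaranteed. The function names are made up for illustration.

```c
#include <stdio.h>

/* Two ways a char variable goes wrong when compared with EOF.
   Assumes 8-bit two's-complement chars, char narrower than int,
   and EOF == -1, as on common implementations. */

/* Signed char: a legitimate 0xFF input byte becomes -1 and is
   mistaken for end of file, so the input is silently truncated. */
int ff_byte_equals_eof(void)
{
    signed char ch = (signed char)0xFF;   /* implementation-defined: -1 here */
    return ch == EOF;                     /* 1 under the assumptions above */
}

/* Unsigned char: EOF is stored as UCHAR_MAX, which never compares
   equal to EOF, so while ((ch = getchar()) != EOF) never ends. */
int eof_survives_unsigned_char(void)
{
    unsigned char ch = (unsigned char)EOF;  /* UCHAR_MAX when EOF is -1 */
    return ch == EOF;                       /* 0 whenever char is narrower than int */
}
```

If sizeof(char) == sizeof(int), as in point 4, unsigned char promotes to unsigned int instead, and the second comparison can come out equal again. That is why the in-band test gets murky on such machines.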
-----------------------------------------------------------------
Ian Cottam, Room IT101, Department of Computer Science,
University of Manchester, Oxford Road, Manchester, M13 9PL, U.K.
Tel: (+44) 61-275 6157         FAX: (+44) 61-275-6280
Internet: ian%cs.man.ac.uk@nss.cs.ucl.ac.uk   
JANET: ian@uk.ac.man.cs    UUCP: ..!mcvax!ukc!man.cs!ian
-----------------------------------------------------------------

peter@ficc.uu.net (Peter da Silva) (10/24/89)

> 	while ( ! feof(stdin) ) {
> 		ch= getchar();
> 		putchar(ch);
> 	}

> 5) Although, to my mind, the above is a compelling argument to abandon the
>    explicit test for EOF, everyone reading this newsgroup (except me of
>    course :-) ) will ignore it!

The fact that it appends a 0xFF byte to the output is, of course, quite
irrelevant.

I did exactly the same thing (though in a much more complex loop) about
eight years ago. It didn't take more than 15 minutes to find it, but I'd
already posted the program on a local CP/M bulletin board so it took a
little longer to get over the embarrassment. If CP/M didn't use ^Z as EOF
it would have been a much quicker fix.
-- 
Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation.
Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-'
"I feared that the committee would decide to go with their previous        'U`
 decision unless I credibly pulled a full tantrum." -- dmr@alice.UUCP

ark@alice.UUCP (Andrew Koenig) (10/24/89)

In article <266@m1.cs.man.ac.uk>, ian@r6.uucp (Ian Cottam) writes:

> Some observations on the little program below (following some recent
> discussions in this group):
> 
> #include <stdio.h>

> int
> main()
> {
> 	char ch;
> 	while ( ! feof(stdin) ) {
> 		ch= getchar();
> 		putchar(ch);
> 	}
> 	return 0;
> }

[several reasons that this program is a better way
 of testing for end of file than  ((ch=getchar()) != EOF),
 ending in the following:]

> 5) Although, to my mind, the above is a compelling argument to abandon the
>    explicit test for EOF, everyone reading this newsgroup (except me of
>    course :-) ) will ignore it!

The trouble, of course, is that this program doesn't work!

The feof() test determines whether a call to getc() has already returned
EOF, not whether the next call to getc() will.  Thus the last time
through the loop, the feof() test will return `false,' the getchar()
call will return EOF, and the call to putchar() will write out a
single extra character whose value is the truncation of EOF.
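Koenig's explanation is easy to check. The following minimal sketch runs the original feof()-controlled loop over a temporary file and counts how many characters it would hand to putchar(). The counter stands in for putchar(), and the function name is hypothetical. For two bytes of input, the loop body runs three times.

```c
#include <stdio.h>

/* Runs the loop from the original post over a temporary file holding
   `input` and returns how many characters it would have written.
   Returns -1 if the temporary file cannot be created. */
long feof_loop_count(const char *input)
{
    FILE *fp = tmpfile();
    long count = 0;

    if (fp == NULL)
        return -1;
    fputs(input, fp);
    rewind(fp);

    while (!feof(fp)) {   /* feof() is true only AFTER a read has hit EOF */
        (void)getc(fp);   /* on the last pass this returns EOF ...        */
        count++;          /* ... but the body still "outputs" it          */
    }
    fclose(fp);
    return count;
}
```

Even empty input produces one bogus character.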

I agree that the above is a compelling argument, but I suspect we
may not have quite the same view of its direction.
-- 
				--Andrew Koenig
				  ark@europa.att.com

tps@chem.ucsd.edu (Tom Stockfisch) (10/24/89)

In article <266@m1.cs.man.ac.uk> ian@r6.UUCP (Ian Cottam) writes:
>	char ch;
>	while ( ! feof(stdin) ) {
>		ch= getchar();
>		putchar(ch);
>	}
>1) This program runs as quickly as the ``((ch= getchar()) != EOF)''
>   version (on my SUN3 with gcc).

The problem with this version is that feof() does
not test for an error condition on stdin, whereas
getc() will return EOF on i/o error.  So your
program might loop infinitely if there is
an i/o error.

The above code fragment will also fill up the disk
with (char)EOF if chars are unsigned.
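Because getc() returns EOF both at end of file and on a read error, the usual fix keeps the in-band test in the loop and asks ferror() afterwards which of the two occurred. The following is a minimal sketch along those lines. Both function names are hypothetical, and the second function exists only to exercise the first with temporary files.

```c
#include <stdio.h>

/* Copies `in` to `out` a character at a time.  Returns 0 on a clean
   end of file, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    int c;                            /* int, not char: must also hold EOF */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;                /* write error */
    }
    return ferror(in) ? -1 : 0;       /* was that EOF really an error? */
}

/* Demonstration helper: pushes `input` through copy_stream() using
   temporary files and returns the number of bytes written, or -1. */
long copied_length(const char *input)
{
    FILE *in = tmpfile();
    FILE *out = tmpfile();
    long n = -1;

    if (in != NULL && out != NULL) {
        fputs(input, in);
        rewind(in);
        if (copy_stream(in, out) == 0)
            n = ftell(out);           /* binary stream: position == bytes */
    }
    if (in != NULL)
        fclose(in);
    if (out != NULL)
        fclose(out);
    return n;
}
```

Holding the result in an int also means a 0xFF input byte is copied instead of ending the loop.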
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

ian@r6.uucp (Ian Cottam) (10/24/89)

In article <266@m1.cs.man.ac.uk> ian@r6.UUCP I (Ian Cottam) write:
>Some observations on the little program below (following some recent
>discussions in this group):
>_______________________
>#include <stdio.h>
>
>int
>main()
>{
>	char ch;
>	while ( ! feof(stdin) ) {
>		ch= getchar();
>		putchar(ch);
>	}
>
>	return 0;
>}
>______________________
Whoops! Red face!  Is this any better (he says cautiously)?
_______
#include <stdio.h>

int
main()
{
	char ch;
	for(;;) {
		ch= getchar();
		if ( feof(stdin) || ferror(stdin) )
			break;
		else
			putchar(ch);
	}

	return 0;
}
_________
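The revised loop can be checked the same way. In the following sketch, putchar() is again replaced by a counter, and the function name is hypothetical. Because feof() and ferror() are consulted only after getc() has been called, no spurious character appears, and a 0xFF byte in the input is treated as data even though ch is a char.

```c
#include <stdio.h>

/* Ian's revised loop over a temporary file holding `input`, counting
   the characters it would pass to putchar().  Returns -1 if the
   temporary file cannot be created. */
long revised_loop_count(const char *input)
{
    FILE *fp = tmpfile();
    long n = 0;
    char ch;

    if (fp == NULL)
        return -1;
    fputs(input, fp);
    rewind(fp);

    for (;;) {
        ch = getc(fp);
        if (feof(fp) || ferror(fp))   /* ask only after reading */
            break;
        (void)ch;                     /* putchar(ch) would go here */
        n++;
    }
    fclose(fp);
    return n;
}
```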


-----------------------------------------------------------------
Ian Cottam, Room IT101, Department of Computer Science,
University of Manchester, Oxford Road, Manchester, M13 9PL, U.K.
Tel: (+44) 61-275 6157         FAX: (+44) 61-275-6280
Internet: ian%cs.man.ac.uk@nss.cs.ucl.ac.uk   
JANET: ian@uk.ac.man.cs    UUCP: ..!mcvax!ukc!man.cs!ian
-----------------------------------------------------------------

frank@zen.co.uk (Frank Wales) (10/24/89)

In article <266@m1.cs.man.ac.uk> ian@r6.UUCP (Ian Cottam) writes:
>Some observations on the little program below (following some recent
>discussions in this group):
>_______________________
>#include <stdio.h>
>
>int
>main()
>{
>	char ch;
>	while ( ! feof(stdin) ) {
>		ch= getchar();
>		putchar(ch);
>	}
>
>	return 0;
>}

Unfortunately, this program may not work correctly.  The manual for
feof(3) states:

  "Feof returns non-zero when EOF has *previously* been detected reading
   the named input stream, otherwise zero".  [my emphasis]

This causes one extra getchar()/putchar() cycle to be executed after EOF
before the program stops.  That last cycle passes putchar() EOF truncated to a
char, which probably doesn't look like EOF any more (not that putchar() has to
cope with being handed EOF in any case).  Pascal programmers beware.  :-)

Running this program on HP-UX 3.1 and SunOS 4.0.1, I always get at least
one extra char at EOF, usually a space.

Alternative versions of the above could be:

    #include<stdio.h>
    main()
    {
      char c=getchar();

      while (!feof(stdin))
      {
	(void)putchar(c);
	c=getchar();
      }
      return 0;
    }

or, according to taste:

    #include<stdio.h>
    main()
    {
      for(;;)
      {
	char c=getchar();
	if (feof(stdin))
	  return 0;
	(void)putchar(c);
      }
    }

On the very odd occasion when I have used feof() in a loop condition, I have
always ensured that it was valid to check it, if necessary by throwing in

    ungetc(getc(stream),stream);  /* grungy hack */

ahead of it.  I haven't done this too often, though, since it is rare that
I need to test for end of file on a stream separate from reading the
stream itself, and rarer that I can't rearrange the logic to avoid it.
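The ungetc(getc(stream),stream) trick can be wrapped as a look-ahead test, which makes a Pascal-style "test first, then read" loop behave correctly. The following is a minimal sketch. at_eof() and count_with_lookahead() are hypothetical names, not standard library functions. C guarantees only one character of pushback, so at_eof() should not be called while a character pushed back by an earlier ungetc() is still pending.

```c
#include <stdio.h>

/* Returns 1 if the NEXT read from fp would return EOF (end of file
   or error), 0 otherwise, without consuming any input. */
int at_eof(FILE *fp)
{
    int c = getc(fp);

    if (c == EOF)
        return 1;        /* feof(fp) or ferror(fp) is now set */
    ungetc(c, fp);       /* push the character back for the real read */
    return 0;
}

/* Demonstration: counts the characters in a temporary file holding
   `input` using a test-then-read loop.  Returns -1 if tmpfile() fails. */
long count_with_lookahead(const char *input)
{
    FILE *fp = tmpfile();
    long n = 0;

    if (fp == NULL)
        return -1;
    fputs(input, fp);
    rewind(fp);

    while (!at_eof(fp)) {
        (void)getc(fp);  /* returns the pushed-back character, never EOF */
        n++;
    }
    fclose(fp);
    return n;
}
```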
--
Frank Wales, Systems Manager,        [frank@zen.co.uk<->mcvax!zen.co.uk!frank]
Zengrange Ltd., Greenfield Rd., Leeds, ENGLAND, LS9 8DB. (+44) 532 489048 x217

condict@cs.vu.nl (Michael Condict) (10/25/89)

In article <266@m1.cs.man.ac.uk> ian@r6.UUCP (Ian Cottam) writes:
| Some observations on the little program below (following some recent
| discussions in this group):
| _______________________
| #include <stdio.h>
| 
| int
| main()
| {
| 	char ch;
| 	while ( ! feof(stdin) ) {
| 		ch= getchar();
| 		putchar(ch);
| 	}
| 
| 	return 0;
| }
| ______________________
| 
| 1) This program runs as quickly as the ``((ch= getchar()) != EOF)''
|    version (on my SUN3 with gcc).
| 
| 2) Although in this specific example the variable ch is redundant, I
|    include it to show that declaring it to be a char is quite sensible
|    given the feof() test.  Thus a common error in C code -- a colleague
|    of mine even found this error in K&R first edition -- the sign extension
|    (or not) character comparison with EOF is avoided.
| 
| 3) The, to my mind, awful idiom in 1) above is avoided, with the extra
|    benefit that the ``(ch= getchar() != EOF)'' slip up will not be made.
| 
| 4) This code is completely portable to implementations that have:
| 		sizeof(char) == sizeof(int)
| 

It is hard to argue for or against any of the above four points, since the
program is just wrong.  The feof test indicates whether EOF has PREVIOUSLY
been encountered in stdin.  It does not mean that it is safe to read a char.
This program always produces an extra character of output (EOF truncated to
a char) that wasn't in the input.

The program does, however, nicely display another disadvantage of an
eof-testing predicate: it is all too easy for the eof test to be "out of sync"
with the call to the input function.  This happens to beginner Pascal
programmers all the time.  It cannot happen with an in-band EOF value.

Michael Condict		condict@cs.vu.nl
Vrije University
Amsterdam

dg@lakart.UUCP (David Goodenough) (10/27/89)

ian@r6.UUCP (Ian Cottam) sez:
> 3) The, to my mind, awful idiom in 1) above is avoided, with the extra
>    benefit that the ``(ch= getchar() != EOF)'' slip up will not be made.

	while (EOF != (ch = getchar()))
	  putchar(ch);

NOW try leaving the parentheses off the (ch = getchar()) bit. As a side
note (and as has already been observed) the general construct:

	if (CONSTANT == variable)

will always be safer than:

	if (variable == CONSTANT)

because CONSTANT isn't an lvalue, the compiler complains if you type =
instead of ==.
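The slip itself compiles quietly, which is what makes it dangerous. Because != binds more tightly than =, the expression c = getc(fp) != EOF stores the result of the comparison (0 or 1) rather than the character. The following sketch reads one character both ways. first_seen() is a hypothetical name, and an if stands in for the while loop, since the condition behaves the same in either. Dropping the inner parentheses from the constant-first form gives EOF != c = getc(fp), and the compiler rejects that, because the left operand of = would be the comparison EOF != c, which is not an lvalue.

```c
#include <stdio.h>

/* Returns the value the loop body would see for the first character
   of `input` (EOF if the body would not run), using either the
   constant-first test or the classic precedence slip. */
int first_seen(const char *input, int use_slip)
{
    FILE *fp = tmpfile();
    int c, seen = EOF;

    if (fp == NULL)
        return EOF;
    fputs(input, fp);
    rewind(fp);

    if (use_slip) {
        if ((c = getc(fp) != EOF))    /* parsed as c = (getc(fp) != EOF) */
            seen = c;                 /* 1, whatever the character was */
    } else {
        if (EOF != (c = getc(fp)))    /* constant on the left */
            seen = c;                 /* the actual character */
    }
    fclose(fp);
    return seen;
}
```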
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com			  +---+