[net.lang.c] fgets

greg@utcsri.UUCP (Gregory Smith) (04/10/86)

In article <2476@brl-smoke.ARPA> rbj@icst-cmr (Root Boy Jim) writes:
>Which brings me to another point. Fgets is worthless on binary
>data. It returns its first argument, which I already know.
>If a null is part of the data, how do you know where it stopped
>reading? Well, if you're lucky, there will be a newline in there
>and that's the end of it. But if you're reading blocks of nulls,
>you're SOL. I would like fgets to return the number of chars read.
>
That's exactly what 'read' is there for, no?
Still - I agree. Even if there is a single null in a line,
you will effectively lose everything between that null and the next '\n'
if you read it with fgets.
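
A minimal sketch of the effect described above, assuming a C library where tmpfile() is usable. The helper name visible_length_after_fgets is hypothetical. fgets does read the bytes after the null, but anything that treats the buffer as a string (strlen here) stops at the first null, so the rest of the line is effectively invisible.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the six bytes "ab\0cd\n" to a temporary
 * file, reads them back with fgets(), and returns the length that
 * strlen() reports for the buffer. Returns -1 if setup fails. */
long visible_length_after_fgets(void)
{
    static const char data[] = { 'a', 'b', '\0', 'c', 'd', '\n' };
    char buf[64];
    long len = -1;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, sizeof data, fp) == sizeof data) {
        rewind(fp);
        /* fgets copies all six bytes, then appends its own '\0' */
        if (fgets(buf, sizeof buf, fp) != NULL)
            len = (long)strlen(buf);  /* stops at the embedded null */
    }
    fclose(fp);
    return len;
}
```

Six bytes were read, but the caller has no way to learn that from the return value of fgets alone.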

I too have a question regarding fgets. fgets, as has been said, normally
stops reading at the end of a line (after a '\n').
I had the following problem with EOF detection:  Suppose that the
last line in the file is "wombat soup", and this is followed by '\n' and
EOF, as is the normal case for text files. So my second-to-last call to
fgets reads "wombat soup\n" and does not set feof(infile). My last call
to fgets, however, just sets feof(infile) and returns! It didn't write
anything into the buffer. So the program saw "wombat soup\n" twice. If
the last line is *not* '\n'-terminated, the last call to fgets puts a
null-terminated "wombat soup" into the string and sets feof(infile),
which is reasonable and what I expected. So why doesn't fgets stick
a '\0' in the buffer when it sees EOF immediately? Isn't this a bug?
What I did to fix it was to set line_buffer[0]=NUL *before* calling
fgets, which is simple enough to do. Still.... grumble, grumble...

We are running 4.2BSD on a VAX-11/780.

-- 
"If you aren't making any mistakes, you aren't doing anything".
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg

henry@utzoo.UUCP (Henry Spencer) (04/11/86)

> ...So why doesn't fgets stick
> a '\0' in the buffer when it sees EOF immediately? Isn't this a bug?
> What I did to fix it was to set line_buffer[0]=NUL *before* calling
> fgets, which is simple enough to do. Still.... grumble, grumble...

Probably the right answer to this is that you should be checking the
return value from fgets, rather than consulting feof separately.  The
semantics of the return value aren't well explained in the manual, but
the code is doing the right thing:  you get NULL back only if there was
*nothing* available on the input.  If there's a partial line at the end
of the file, you get that partial line and a non-NULL return, and then
*next* time you get a NULL.
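
The return-value semantics described above can be checked with a small sketch. It assumes tmpfile() is usable, and count_fgets_calls is a hypothetical helper, not anything from the original posts. It counts how many times fgets returns non-NULL for a given file content.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to a temporary file, then counts how
 * many calls to fgets() return non-NULL while reading it back.
 * Returns -1 if the temporary file cannot be set up. Lines are assumed
 * to be shorter than the 128-byte buffer. */
int count_fgets_calls(const char *text)
{
    char buf[128];
    int calls = 0;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    /* The loop is driven by the return value, not by feof() */
    while (fgets(buf, sizeof buf, fp) != NULL)
        calls++;
    fclose(fp);
    return calls;
}
```

With this loop, "wombat soup\n" and an unterminated "wombat soup" are each seen exactly once, and an empty file is seen zero times.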

It is widely agreed that the details of the semantics of fgets could
have been done better.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

jsdy@hadron.UUCP (Joseph S. D. Yao) (04/16/86)

In article <2524@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>I had the following problem with EOF detection:  Suppose that the
>last line in the file is "wombat soup", and this is followed by '\n' and
>EOF, as is the normal case for text files. So my second-to-last call to
>fgets reads "wombat soup\n" and does not set feof(infile). My last call
>to fgets, however, just sets feof(infile) and returns! It didn't write
>anything into the buffer. So the program saw "wombat soup\n" twice.

Good heavens, man.  Do you mean to say they don't teach you to check
your return values?  That's what they're for, after all.  The correct
paradigm is:
	char buf[...];		/* if buf is a parameter or a char *, */
				/* sizeof(buf) is not the buffer size */
	extern char *fgets();

	while (fgets(buf, sizeof(buf), file) != (char *) NULL) {
		...
	}
This is why fgets() returns a value: the fact that a non-NULL return
is the value of buf, the usefulness of which was questioned by an
earlier writer, is just to make it something non-NULL.  (If buf is
NULL-valued, you have other problems.)

Once again, apropos another comment in the above note, fgets() is
intended for reading "standard text files," which are strings of
ASCII characters (assumed non-NUL), each "line" of which is
terminated by a newline (NL) character.  For anything else, one should
check whether fread() might not be a better routine to use.
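
As a rough illustration of why fread() suits non-text data, here is a sketch, assuming tmpfile() is usable; read_back_nulls is a hypothetical helper. fread() reports how many items it actually read, so a block made up entirely of NUL bytes can still be measured.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes n zero bytes (at most 256) to a temporary
 * file, reads them back with fread(), and returns the count fread()
 * reports. Returns 0 if the temporary file cannot be created. */
size_t read_back_nulls(size_t n)
{
    unsigned char block[256];
    size_t got = 0;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return 0;
    if (n > sizeof block)
        n = sizeof block;
    memset(block, 0, sizeof block);
    if (fwrite(block, 1, n, fp) == n) {
        rewind(fp);
        /* The return value is the number of bytes read, nulls included */
        got = fread(block, 1, sizeof block, fp);
    }
    fclose(fp);
    return got;
}
```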
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

BJORNDAS%CLARGRAD.BITNET@WISCVM.WISC.EDU (08/20/86)

O great C gurus, help a relative greenhorn!  Why is it that fgets()
returns NULL when it reaches end of file, whereas all the other
standard i/o functions seem to return EOF at that point?  This
confuses me, especially since one would suppose fputs() to be the
partner function of fgets() and therefore to work in the same way.

Sterling Bjorndahl
BJORNDAS at CLARGRAD on BITNET

bzs@bu-cs.BU.EDU (Barry Shein) (08/22/86)

fgets() returns a char * to the string read. Traditionally, any
function that cannot return a promised pointer (such as when an EOF
occurs) returns NULL. (There exist a few syscalls which return ((char
*) -1) or equivalent; c'est la vie. This has been hashed out, and I guess
the rule that syscalls return -1 won [e.g. sbrk].)

puts() and fputs() always return the result of the last putc() done
(at least they do in 4.2/4.3BSD). No mention of this is made in the
manual pages I have. This will be the last character of the string
unless an error occurred, in which case it will be EOF. Notice that
for puts() this will always be '\n' (or EOF on error).

Your intuitions seem right: they should probably both return a
similar thing (i.e. fputs() should probably return a pointer to the
string printed, or NULL on error, or maybe fgets() should return the
value of the last 'getc()' done; I like the former better). Oh well,
such is history.

Of course, in the case of getc() et al., EOF makes sense as an
out-of-band character (i.e. a value that cannot be a legal char,
hence distinguishable, but with no relation to a pointer).

Thus, fgets() is consistent with the idea of returning a NULL as
a failed pointer while fputs() is consistent with the documentation
in that the doc seems to promise nothing...

	-Barry Shein, Boston University

guy@sun.uucp (Guy Harris) (08/22/86)

> Why is it that fgets() returns NULL when it reaches end of file,
> whereas all the other standard i/o functions seem to return EOF
> at that point?

Because "fgets" returns a value of type "char *" while most of the other
functions and macros return a value of type "int".  EOF is not a valid value
of type "char *", so "fgets" can't return EOF.  NULL is a valid value of
type "char *", and doesn't refer to any object, so it's the proper choice
for an "out-of-band" value for "fgets" to return on error.

Now, you can ask "why does 'fgets' return a value of type 'char *'?"  at
this point.  It returns a pointer to the buffer that it just filled in;
obviously *somebody* found this useful, although I don't find it so.  If
"fgets" didn't return that pointer, it could have been defined as returning
a value of type "int" instead, and that value would have been 0 on success
and EOF on failure.  It's too late to change it, though.

BTW: "fgets()" returns NULL on end-of-file OR error; don't write code that
assumes that "fgets()" returning NULL means that the end of the file was
found, use "ferror" or "feof" to disambiguate these cases.
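
One way to apply this advice is sketched below. The enum and the function name read_line are hypothetical, introduced only for illustration. The wrapper turns a NULL return from fgets into either an end-of-file status or an error status by consulting ferror().

```c
#include <stdio.h>

/* Hypothetical status codes for the sketch below */
enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Reads one line with fgets() and reports why it failed, if it did.
 * A NULL return alone does not say whether the cause was end-of-file
 * or a read error, so ferror() is checked to tell them apart. */
enum read_status read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) != NULL)
        return READ_OK;
    return ferror(fp) ? READ_ERROR : READ_EOF;
}
```

A caller would loop while the result is READ_OK and then report a diagnostic only when the final status is READ_ERROR.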
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

cramer@kontron.UUCP (Clayton Cramer) (08/29/86)

> O great C gurus, help a relative greenhorn!  Why is it that fgets()
> returns NULL when it reaches end of file, whereas all the other
> standard i/o functions seem to return EOF at that point?  This
> confuses me, especially since one would suppose fputs() to be the
> partner function of fgets() and therefore to work in the same way.
> 
> Sterling Bjorndahl
> BJORNDAS at CLARGRAD on BITNET

Ah, heck.  Next you'll complain because fgets() and gets() do different
things to the \n at the end of a string.

There are days I wish every line of C currently existing would evaporate,
so that the I/O functions could be...rationalized.

Clayton E. Cramer

karl@haddock (09/03/86)

sun!guy (Guy Harris) writes:
>Now, you can ask "why does 'fgets' return a value of type 'char *'?"  at
>this point.  It returns a pointer to the buffer that it just filled in;
>obviously *somebody* found this useful, although I don't find it so.

Me neither.  It sounds like a "why not?" situation.

>If "fgets" didn't return that pointer, it could have been defined as
>returning a value of type "int" instead, and that value would have been 0 on
>success and EOF on failure.

Better yet, return the number of characters read (so 0 on failure).

>It's too late to change it, though.

Not without changing the name again.  (That's how fgets() evolved from
gets(), though the latter still exists.)

Karl W. Z. Heuer (ima!haddock!karl; karl@haddock.isc.com), The Walking Lint

karl@haddock (09/03/86)

bzs@bu-cs.BU.EDU (Barry Shein) writes:

>Traditionally, any function that cannot return a promised pointer ...
>returns NULL (there exists a few syscalls which return ((char *) -1) or
>equivalent, c'est la vie, this has been hashed out, I guess the rule that
>syscalls return -1 won [eg. sbrk].)

Last time I used a pdp11, there was a bug in lseek().  The error return,
which should have been (long)-1 (i.e. 0xffffffff) was 0xffff0000.  I never
determined whether this was an AT&T standard bug or a local glitch.  The
surprising thing is that lseek *was* making the check, but it explicitly
set r1 (the lower half) to 0 instead of -1!  On systems where char* is wider
than int, sbrk() could have a similar problem.

Karl W. Z. Heuer (ima!haddock!karl; karl@haddock.isc.com), The Walking Lint

guy@sun.uucp (Guy Harris) (09/04/86)

> Better yet, return the number of characters read (so 0 on failure).

Better still, return the number of characters read, but return EOF on
failure; 1) this is what "fputs" does and 2) this encourages you to
distinguish between EOF and error, something that several UNIX utilities, to
their everlasting shame, do not do.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

Bader@b.psy.cmu.edu (Miles Bader) (09/05/86)

Neither of your suggestions (# of characters, EOF or 0) makes any more sense
to me than the current behavior, which seems perfectly reasonable.  If
nothing better is being offered, why waste time arguing about it?

karl@haddock (09/08/86)

sun!guy (Guy Harris) writes:
>[haddock!karl writes:]
>>Better yet, return the number of characters read (so 0 on failure).

>Better still, return the number of characters read, or EOF;

One could argue that zero is the number of chars read, and that there is no
failure return as such (cf. fread()).  But see below.

>1) this is what "fputs" does

The successful result of fputs/puts is not mentioned in my manual; all one
can conclude is that it differs from the failure result (EOF).  X3J11 says
that the successful result is zero.
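
Given that uncertainty, a cautious sketch tests only for the failure value and assumes nothing about what success looks like. The function name write_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes s followed by a newline to fp.
 * Returns 0 on success and -1 on failure. Only the EOF failure value
 * of fputs() and putc() is relied on; the exact successful result of
 * fputs() is deliberately never inspected. */
int write_line(const char *s, FILE *fp)
{
    if (fputs(s, fp) == EOF)
        return -1;
    if (putc('\n', fp) == EOF)
        return -1;
    return 0;
}
```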

>and 2) this encourages you to distinguish between EOF and error.

Are you suggesting that the result should be EOF if end-of-file was reached,
but 0 if a read error occurred?  This is workable for gets()/puts() (except
for fputs() of an empty string), but not (e.g.) scanf(), and is probably a
bad idea in general.

My revised opinion: functions other than system calls should return NULL for
an out-of-band pointer, EOF for an OOB character, or ERROR (which should be
defined someplace) for an OOB int.  ERROR and EOF are logically different,
even if they have the same value.  Physical end-of-file, read-error, and a
legitimate result of -1 can (and should) be distinguished with feof, ferror,
and/or errno.

The ngets() function (if it gets written) should return ERROR on failure.
It can return zero only if passed a zero-length buffer.  Similarly, nputs()
should return the number of characters written, or ERROR.  (This one could
be called "puts", except that on an implementation that uses zero for puts()
success it may break programs that depend on this.)
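
What such an ngets() might look like is sketched here under loud assumptions: the signature, the value chosen for ERROR, and the handling of a read error in mid-line are all guesses for illustration, since the post gives only the name and the return convention.

```c
#include <stdio.h>

#define ERROR (-1)  /* assumed value; the post leaves it undefined */

/* Hypothetical ngets(): like fgets(), but returns the number of
 * characters stored (not counting the terminating '\0'), or ERROR on
 * end-of-file or read error. It returns 0 only when size is 1, that
 * is, when the buffer has room for no characters at all. */
int ngets(char *buf, int size, FILE *fp)
{
    int n = 0;
    int c = 0;

    if (size <= 0)
        return ERROR;           /* no room even for the terminator */
    while (n < size - 1 && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    /* Nothing available, or a read error part-way through a line */
    if (c == EOF && (n == 0 || ferror(fp)))
        return ERROR;
    return n;
}
```

Because the count is returned directly, embedded NUL characters in a line no longer hide the data that follows them.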

>distinguish between EOF and error, something that several UNIX utilities,
>to their everlasting shame, do not do.

Programs should always check for failure returns, and distinguish the types
of failure when it matters (as it usually does when end-of-file is one type).
But that's a more general problem, and although there are some fairly nice
ways to solve it, they would break a lot of existing code.

Karl W. Z. Heuer (ima!haddock!karl; karl@haddock.isc.com), The Walking Lint

karl@haddock (09/09/86)

Bader@b.psy.cmu.edu writes:
>[concerning the return value of gets()/fgets()]
>Neither of your suggestions (# of characters, EOF or 0) makes any more sense
>to me than the current behavior, which seems perfectly reasonable.  If
>nothing better is being offered, why waste time arguing about it?

In the current behavior, the value on successful return is the buffer arg --
a useless value already available to the user.  The character count is not
available except by calling strlen(), which is somewhat redundant since the
library function already has the value (or enough information to construct
it in constant time).

There is a similar duplication of effort in strcpy() followed by strcat(),
for which reason I think strcpy() should've returned the END of the string
(pointer to the '\0') instead of the beginning.  I need that value more
often than I need a copy of the first argument.
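
A sketch of the variant being wished for, with the hypothetical name copy_to_end. It copies like strcpy() but returns a pointer to the terminating '\0' in the destination, so the next piece can be appended without rescanning the string.

```c
#include <stdio.h>

/* Hypothetical strcpy() variant: copies src (including its '\0') into
 * dst and returns a pointer to the '\0' just written in dst.
 * dst must be large enough to hold the result. */
char *copy_to_end(char *dst, const char *src)
{
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}
```

Chained calls then replace the strcpy()/strcat() pair: p = copy_to_end(buf, "wombat "); followed by p = copy_to_end(p, "soup"); leaves "wombat soup" in buf, with each character visited only once.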

Karl W. Z. Heuer (ima!haddock!karl; karl@haddock.isc.com), The Walking Lint