[comp.unix.internals] Checking return values

scs@adam.mit.edu (Steve Summit) (10/25/90)

Jon Kamens mentioned a good example of why checking the return
values of ALL system calls (e.g. close), even when you think they
"can't fail," is a good idea.  A few people have disputed this,
choosing to label filesystems (e.g. remote ones) which can return
an error on close "broken."  Evidently the wish is to retain the
notion that close "can't fail."

As Boyd Roberts and others have correctly pointed out, the best
policy is to check for errors from all routines which can return
them, system calls and otherwise.  An interface specification is
a contract between implementer and caller.  If it says "this
routine returns -1 and sets errno on error," then the most robust
code will dutifully check for the error condition, even if the
programmer "knows" that the implementation currently in use
"never fails."  Those words on the specification mean that some
later reimplementor is perfectly within rights to add new error
returns.  (If, on the other hand, the routine were declared as
returning void, or without any other error indication, a later
reimplementor would not have this option, and the calling code
would be justified in not checking.)  Remember that binary
compatibility means that the operating system can change out from
under your program without your even recompiling or relinking, so
it is particularly important to anticipate changes in system call
returns.

You can't even blame the man page for not being updated to
reflect new errno values (e.g. EDQUOT for close when remote
filesystems were introduced).  Looking at a man page and deciding
that you don't care about any of the errno values listed there is
as bad as looking at the kernel source code and deciding that you
don't care about any of the errno values you find there.  Both
are liable to change (and the man page is likely to be incorrect
in the first place, since gleaning syscall errno values from the
kernel and keeping man pages up-to-date is drudgework that is not
likely to be done reliably).

(Remember, too, as Kristoffer Eriksson has pointed out, that
close() could conceivably fail, with EIO, under the good,
old-fashioned Unix filesystem, on a local disk, due to
buffering and write-behind.)

If there are errno values you truly don't care about, check for
them explicitly, but complain about the ones you don't not care
about (and the ones that haven't been invented yet):

	openfile(name)
	char *name;
	{
	int fd;

	if((fd = open(name, 1)) < 0)
		{
		switch(errno)
			{
			case ENOENT:
				/* doesn't exist, so create: */
				fd = creat(name, 0666);
				if(fd >= 0)
					return fd;
				/* else fall through: */

			default:
				fprintf(stderr, "%s: can't open %s: ",
							progname, name);
				perror("");
				exit(1);
			}
		}

	return fd;
	}

In earlier and simpler days, I never checked the return value of
close.  Then, when I started messing around with distributed
filesystems, I discovered that it was reasonable for close() to
fail.  I didn't complain that the distributed filesystem was
"broken" (in fact, I was writing it, so I could have "fixed" it
if it was).  I shook my finger at myself, and chastised myself
for having been disobeying the "always check error returns" rule.
I've been checking the return value from close (at least, for
fd's open for writing :-( ) ever since.

The call I've still been ignoring the return value of (shake,
shake, naughty, naughty) is fstat.  Anybody know when that can
fail, as long as the file descriptor and stat buf pointer are
valid?  Actually, don't answer that.  I'm going to listen to my
own advice and start checking that value too, even if it "can't
fail" today.

Note that all of this stuff is relative.  You're not a heathen
atheist scum if you don't check the return value of a call that
you are pretty sure can't fail.  (There are even good reasons for
not doing so, such as on systems with limited code or error
message space.)  Above I used words like "the best policy is..."
and "...the most robust code will dutifully check...".  The more
care you put into things like rigorous error return checking and
other specification details, the more robust, forward-looking,
and widely portable your code will be.  If you are not writing
production-quality code, and you leave out some checking, your
code may need tweaking later.  (Putting /* XXX */ as a reminder
in soft spots in the code is a common device.)

By the way, I've included comp.unix.programmer in the
distribution, since checking return values is something all
programmers (Unix and otherwise) need to know about.  (I missed
out on the great comp.unix.wizards shakedown and all the brouhaha
surrounding it.  What a lot of squalling!  Not that some of the
new groups aren't reasonable, but what was so wrong with
unix-wizards that it had to be removed?  Answer by mail only,
if at all, please -- I don't want to start more flames.)

                                            Steve Summit
                                            scs@adam.mit.edu

ronald@robobar.co.uk (Ronald S H Khoo) (10/25/90)

[ wizard readers: skip now.  followups redirected. ]

scs@adam.mit.edu (Steve Summit) writes:

>			default:
>				fprintf(stderr, "%s: can't open %s: ",
>							progname, name);
>				perror("");

usenet: can't open scs@adam.mit.edu: not a tty.

(Steve, I'm shocked!  *you* of all people :-)

For those of you still puzzled: perror() must be called *immediately* after the
error returns, or the internal error state may be modified by calls
to system calls in the intervening situation.  It's annoying, but

	perror(name);
	fprintf(stderr, "%s: error opening %s (see error message above)\n",
		progname, name);

is more correct.

-- 
ronald@robobar.co.uk +44 81 991 1142 (O) +44 71 229 7741 (H)

boyd@necisa.ho.necisa.oz (Boyd Roberts) (10/26/90)

In article <1990Oct24.193712.8693@athena.mit.edu> scs@adam.mit.edu (Steve Summit) writes:
>The call I've still been ignoring the return value of (shake,
>shake, naughty, naughty) is fstat.  Anybody know when that can
>fail, as long as the file descriptor and stat buf pointer are
>valid?  Actually, don't answer that.  I'm going to listen to my
>own advice and start checking that value too, even if it "can't
>fail" today.

Good idea.  The fstat (or stat) may fail when it goes to write/read
the inode times on the file-system.  The file-system could go bad.



Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets wierd, the weird turn pro...''