[comp.unix.programmer] Checking return values

scs@adam.mit.edu (Steve Summit) (10/25/90)

Jon Kamens mentioned a good example of why checking the return
values of ALL system calls (e.g. close), even when you think they
"can't fail," is a good idea.  A few people have disputed this,
choosing to label filesystems (e.g. remote ones) which can return
an error on close "broken."  Evidently the wish is to retain the
notion that close "can't fail."

As Boyd Roberts and others have correctly pointed out, the best
policy is to check for errors from all routines which can return
them, system calls and otherwise.  An interface specification is
a contract between implementer and caller.  If it says "this
routine returns -1 and sets errno on error," then the most robust
code will dutifully check for the error condition, even if the
programmer "knows" that the implementation currently in use
"never fails."  Those words on the specification mean that some
later reimplementor is perfectly within rights to add new error
returns.  (If, on the other hand, the routine were declared as
returning void, or without any other error indication, a later
reimplementor would not have this option, and the calling code
would be justified in not checking.)  Remember that binary
compatibility means that the operating system can change out from
under your program without your even recompiling or relinking, so
it is particularly important to anticipate changes in system call
returns.

You can't even blame the man page for not being updated to
reflect new errno values (e.g. EDQUOT for close when remote
filesystems were introduced).  Looking at a man page and deciding
that you don't care about any of the errno values listed there is
as bad as looking at the kernel source code and deciding that you
don't care about any of the errno values you find there.  Both
are liable to change (and the man page is likely to be incorrect
in the first place, since gleaning syscall errno values from the
kernel and keeping man pages up-to-date is drudgework that is not
likely to be done reliably).

(Remember, too, as Kristoffer Eriksson has pointed out, that
close() could conceivably fail, with EIO, under the good,
old-fashioned Unix filesystem, on a local disk, due to
buffering and write-behind.)

If there are errno values you truly don't care about, check for
them explicitly, but complain about the ones you don't not care
about (and the ones that haven't been invented yet):

	openfile(name)
	char *name;
	{
	int fd;

	if((fd = open(name, 1)) < 0)
		{
		switch(errno)
			{
			case ENOENT:
				/* doesn't exist, so create: */
				fd = creat(name, 0666);
				if(fd >= 0)
					return fd;
				/* else fall through: */

			default:
				fprintf(stderr, "%s: can't open %s: ",
							progname, name);
				perror("");
				exit(1);
			}
		}

	return fd;
	}

In earlier and simpler days, I never checked the return value of
close.  Then, when I started messing around with distributed
filesystems, I discovered that it was reasonable for close() to
fail.  I didn't complain that the distributed filesystem was
"broken" (in fact, I was writing it, so I could have "fixed" it
if it was).  I shook my finger at myself, and chastised myself
for having been disobeying the "always check error returns" rule.
I've been checking the return value from close (at least, for
fd's open for writing :-( ) ever since.

The call I've still been ignoring the return value of (shake,
shake, naughty, naughty) is fstat.  Anybody know when that can
fail, as long as the file descriptor and stat buf pointer are
valid?  Actually, don't answer that.  I'm going to listen to my
own advice and start checking that value too, even if it "can't
fail" today.

Note that all of this stuff is relative.  You're not a heathen
atheist scum if you don't check the return value of a call that
you are pretty sure can't fail.  (There are even good reasons for
not doing so, such as on systems with limited code or error
message space.)  Above I used words like "the best policy is..."
and "...the most robust code will dutifully check...".  The more
care you put into things like rigorous error return checking and
other specification details, the more robust, forward-looking,
and widely portable your code will be.  If you are not writing
production-quality code, and you leave out some checking, your
code may need tweaking later.  (Putting /* XXX */ as a reminder
in soft spots in the code is a common device.)

By the way, I've included comp.unix.programmer in the
distribution, since checking return values is something all
programmers (Unix and otherwise) need to know about.  (I missed
out on the great comp.unix.wizards shakedown and all the brouhaha
surrounding it.  What a lot of squalling!  Not that some of the
new groups aren't reasonable, but what was so wrong with
unix-wizards that it had to be removed?  Answer by mail only,
if at all, please -- I don't want to start more flames.)

                                            Steve Summit
                                            scs@adam.mit.edu

ronald@robobar.co.uk (Ronald S H Khoo) (10/25/90)

[ wizard readers: skip now.  followups redirected. ]

scs@adam.mit.edu (Steve Summit) writes:

>			default:
>				fprintf(stderr, "%s: can't open %s: ",
>							progname, name);
>				perror("");

usenet: can't open scs@adam.mit.edu: not a tty.

(Steve, I'm shocked!  *you* of all people :-)

For those of you still puzzled: perror() must be called *immediately* after the
error returns, or the internal error state may be modified by calls
to system calls in the intervening situation.  It's annoying, but

	perror(name);
	fprintf(stderr, "%s: error opening %s (see error message above)\n",
		progname, name);

is more correct.

-- 
ronald@robobar.co.uk +44 81 991 1142 (O) +44 71 229 7741 (H)

richard@aiai.ed.ac.uk (Richard Tobin) (10/26/90)

In article <1990Oct25.075856.4923@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:
>For those of you still puzzled: perror() must be called *immediately* after
>the error returns

Or else you must save and restore errno.  Is this guaranteed to work?

-- Richard

-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (10/26/90)

In article <1990Oct25.075856.4923@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:
> 	perror(name);
> 	fprintf(stderr, "%s: error opening %s (see error message above)\n",
> 		progname, name);

Correct but ugly. As long as you have an array sys_errlist[] with the
messages, you can just copy errno into a variable and use that. To skip
coding most of the common idioms you might snarf err.c from the pty
package.

---Dan

boyd@necisa.ho.necisa.oz (Boyd Roberts) (10/26/90)

In article <1990Oct24.193712.8693@athena.mit.edu> scs@adam.mit.edu (Steve Summit) writes:
>The call I've still been ignoring the return value of (shake,
>shake, naughty, naughty) is fstat.  Anybody know when that can
>fail, as long as the file descriptor and stat buf pointer are
>valid?  Actually, don't answer that.  I'm going to listen to my
>own advice and start checking that value too, even if it "can't
>fail" today.

Good idea.  The fstat (or stat) may fail when it goes to write/read
the inode times on the file-system.  The file-system could go bad.



Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets wierd, the weird turn pro...''

ronald@robobar.co.uk (Ronald S H Khoo) (10/26/90)

In article <3648@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:

> Or else you must save and restore errno.  Is this guaranteed to work?

I had thought not, but apparently the answer is "yes" because 4.1.3
says that errno expands to a "modifiable lvalue" and 4.9.10.4 says that perror()
"maps the error number in the integer expression errno to an error message"

So I guess you're right.  OK.

In <8215:Oct2521:30:3890@kramden.acf.nyu.edu>,
Dan Bernstein <brnstnd@kramden.acf.nyu.edu> suggested using
sys_errlist[] which is not standard.  I suppose you can always provide
a strerror() written in terms of sys_errlist[] if you don't already have
strerror() [4.11.6.2].  Many people don't have strerror() which is why
I did not mention it previously.  (I have it, though)

-- 
ronald@robobar.co.uk +44 81 991 1142 (O) +44 71 229 7741 (H)

chris@mimsy.umd.edu (Chris Torek) (10/26/90)

>In article <1990Oct25.075856.4923@robobar.co.uk> ronald@robobar.co.uk
>(Ronald S H Khoo) writes:
>> 	perror(name);
>> 	fprintf(stderr, "%s: error opening %s (see error message above)\n",
>> 		progname, name);

In article <8215:Oct2521:30:3890@kramden.acf.nyu.edu>
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Correct but ugly. As long as you have an array sys_errlist[] ...

But you do not---not on all systems, anyway.  (It was not documented
originally and a number of vendors changed things.)

Instead of inventing Yet Another Way To Print Errors, why not use the
one mandated by ANSI C, called `strerror'?

	char *strerror(int err);
	/* returns the system error string associated with system error
	   number `err' */

	fprintf(stderr, "%s: error opening %s: %s\n",
	    progname, name, strerror(errno));

strerror is to be declared in <string.h> (`errno' is declared in <errno.h>)
and found in the standard C library.  If it does not exist on your system,
add it (edit <string.h> if necessary; be sure to save the original, and to
note that the change may need to be reapplied on every new release until
your vendor catches on to the fact that the ANSI C standard exists).

A suitable implementation (which may be compiled and merged into
libc.a if necessary) appears below.  (I could delete the copyright,
since I wrote the thing, but I am feeling perverse. :-) )

/*
 * Copyright (c) 1989 The Regents of the University of California.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms are permitted
 * provided that the above copyright notice and this paragraph are
 * duplicated in all such forms and that any documentation,
 * advertising materials, and other materials related to such
 * distribution and use acknowledge that the software was developed
 * by the University of California, Berkeley.  The name of the
 * University may not be used to endorse or promote products derived
 * from this software without specific prior written permission.
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
 */

#include <string.h>

/*
 * Return the error message corresponding to some error number.
 */
char *
strerror(e)
	int e;
{
	extern int sys_nerr;
	extern char *sys_errlist[];
	static char unknown[30];

	if ((unsigned)e < sys_nerr)
		return (sys_errlist[e]);
	(void) sprintf(unknown, "Unknown error: %d", e);
	return (unknown);
}
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

sarima@tdatirv.UUCP (Stanley Friesen) (10/26/90)

In article <1990Oct25.075856.4923@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:
>scs@adam.mit.edu (Steve Summit) writes:
>>			default:
>>				fprintf(stderr, "%s: can't open %s: ",
>>							progname, name);
>>				perror("");
>For those of you still puzzled: perror() must be called *immediately* after the
>error returns, or the internal error state may be modified by calls
>to system calls in the intervening situation.

My solution, when I feel I *really* must place something in between the error
and the call to perror(), is to save and restore errno.  Thus:

	default:
		old_err = errno;
		fprintf(stderr, "%s: can't open %s: ", progname, name);
		errno = old_err;
		perror("");


Actually, I am likely to make the fprintf() above into an sprintf and pass
the resulting string to perror() - it gets the same results, and doesn't
involve passing an empty string to perror().

had enough?

-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

jim@segue.segue.com (Jim Balter) (10/31/90)

In article <27201@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
[ stuff about strerror, including an implementation ]

If you use strerror to implement perror, beware of a subtle trap:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	char *	p;

	p = strerror(31415);	/* presumably unknown error */
	errno = 27182;		/* ditto */
	perror("");
	printf("%s\n", p);	
}


A naive implementation will print 27182 twice, which is wrong, because ANSI
says that the string returned by strerror may be overwritten by a subsequent
call of strerror, but it doesn't say that perror may overwrite it or that
perror may call strerror.  This and many other gotchas were known to the
committee, but unfortunately they didn't provide an implementers' guide
enumerating them.