[comp.unix.wizards] checking close's return value

henry@utzoo.uucp (Henry Spencer) (09/21/88)

In article <20981@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>[new magtape] devices use a very large buffer, and in many cases
>the tapes don't even start to move until the write-end-of-file
>command is issued by the device driver in the close.  If anything
>goes wrong and the data isn't written correctly, the close()
>function returns an error status but everything simply ignores it.
>
>If you are writing (or buying) software that is going to write
>to these devices, I strongly suggest you make sure that it
>checks the return value of close().

The same comment, actually, is much more broadly applicable.  It's not
at all inconceivable for devices that use the buffer cache to report an
error in asynchronous I/O by returning an error from close().  One should
always check the result from close().

The same goes double for fclose().  There the case is even stronger,
because fclose() has a high probability of doing buffer flushes that
involve actual I/O.
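
A minimal sketch of the sort of check being advocated, for the fclose()
case (the program structure and messages here are purely illustrative):

#include <stdio.h>

/* Copy stdin to the named file; the point is that both ferror() and
 * fclose() are checked, since the buffered data may not reach the
 * device until the final flush and the error may first show up there. */
int
main(int argc, char **argv)
{
	FILE *fp;
	int c, status = 0;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		return 2;
	}
	if ((fp = fopen(argv[1], "w")) == NULL) {
		perror(argv[1]);
		return 1;
	}
	while ((c = getchar()) != EOF)
		putc(c, fp);
	if (ferror(fp))			/* an earlier buffered write failed */
		status = 1;
	if (fclose(fp) == EOF)		/* the final flush may be what fails */
		status = 1;
	if (status != 0)
		fprintf(stderr, "%s: write error\n", argv[1]);
	return status;
}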
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity.  - Jerry Pournelle  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

mike@turing.unm.edu (Michael I. Bushnell) (09/22/88)

In article <1988Sep20.230150.7574@utzoo.uucp>, henry@utzoo (Henry Spencer) writes:
>In article <20981@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:

>>If you are writing (or buying) software that is going to write
>>to these devices, I strongly suggest you make sure that it
>>checks the return value of close().

>The same comment, actually, is much more broadly applicable.  It's not
>at all inconceivable for devices that use the buffer cache to report an
>error in asynchronous I/O by returning an error from close().  One should
>always check the result from close().

Sigh.  If only.  The problem here is that I have *never* seen a man
page for close which documents errors like EDQUOT or EIO.  The 4.3+NFS
man page says close can only fail with EBADF, and SunOS 4.0 says that
close will only fail with EBADF or EINTR.

If close can return errors like EDQUOT or EIO, then the man page needs to
be rewritten.  UNIX does *not* guarantee that hardware related errors
will get reflected on write.  This is one of its deficiencies, but is
unavoidable given the implementation of the filesystem.  The actual
disk write may take place hours after the write(2) system call
(assuming update isn't running).  Do we resurrect the process to
return an error from the close(2)?  What about processes that don't
explicitly close their file descriptors?  That has always been
acceptable practice, but now we are told we are supposed to close
everything before we exit so we can check for undocumented errors.

The quota problem is one that NFS did not implement very well, alas.
The EIO problem is a different story.  Since applications should not
count on write(2) having worked, even when the call returned success,
in the event that the hardware goes bad, they should not complain about
EIO being returned late.  At least it was returned at some point.  And
if the kernel gives me EIO on a close, and the man page says nothing
about it, then the kernel should NOT return that error.
-- 
                N u m q u a m   G l o r i a   D e o 

       \                Michael I. Bushnell
        \               HASA - "A" division
        /\              mike@turing.unm.edu
       /  \ {ucbvax,gatech}!unmvax!turing.unm.edu!mike

rb@ist.CO.UK (News reading a/c for rb) (09/23/88)

From article <1988Sep20.230150.7574@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> It's not at all inconceivable for devices that use the buffer
> cache to report an error in asynchronous I/O by returning an
> error from close().

NFS is a case in point - there are all sorts of cases where
close() can fail after other operations have succeeded.

bzs@xenna (Barry Shein) (09/25/88)

For years I've suggested from time to time that there should be a
signal assigned for I/O errors which is by default OFF but can be
enabled by calling signal().  It should call the signal handler with
the fd that caused the signal and the errcode it would have returned
to the call that generated it (possibly some indication of the system
call too, though that might be hard).

The advantage is that you don't have to wrap all your I/O's with
checks for errors (although I suppose there's still the grey area of
short reads/writes; I think semantics can be worked out for that with
a little work).  Another, similar, advantage is that I can simply add
a few lines to an existing program (like cat) and now check errors
without hunting down every place it does I/O.  More importantly, I
don't need the sources to a library to add error checking (assuming I
can return sanely).
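
Purely as an illustration of the proposed semantics, those "few lines"
might look something like the sketch below.  Nothing in it exists today:
the signal name and the fd/errcode arguments are hypothetical, and
SIGUSR1 is only a stand-in so the fragment compiles.

#include <signal.h>
#include <unistd.h>

/* Hypothetical: no such signal exists.  Under the proposal the kernel
 * would also pass the fd and the errno value it had no way to return
 * to the original call; a real signal() handler gets only the number. */
#ifndef SIGIOERR
#define SIGIOERR SIGUSR1	/* stand-in value, purely illustrative */
#endif

static void
io_error(int sig)
{
	(void) sig;
	write(2, "asynchronous I/O error\n", 23);
}

int
main(void)
{
	/* off by default; an existing program (cat, say) opts in with
	 * one extra line and otherwise runs unchanged */
	signal(SIGIOERR, io_error);

	/* ... rest of the program ... */
	return 0;
}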

The disadvantages (and implementation difficulties) are several, the
worst being that the error will not be detected until all I/O
physically completes.  This means that I could, if nothing is done to
prevent it, close a file and later receive a signal on that fd saying
there was an I/O error, even though by then I have it opened to a
different file, which would be very confusing.  I suppose one semantic
requirement might be that if the signal is enabled then a close()
automatically implies an fsync() first.  At any rate, I don't think
it's too much of a rat's nest; I didn't say the change was trivial.

I dunno, if it's useful for SIGFPE it seems similarly useful for I/O.

P.S. Yes, I am fully aware of what a SYNAD is.

	-Barry Shein, ||Encore||

daryl@ihlpe.ATT.COM (Daryl Monge) (09/25/88)

In article <1213@unmvax.unm.edu> mike@turing.unm.edu (Michael I. Bushnell) writes:
>UNIX does *not* guarantee that hardware related errors
>will get reflected on write.  This is one of its deficiencies, but is
>unavoidable given the implementation of the filesystem.  The actual
>disk write may take place hours after the write(2) system call
>(assuming update isn't running).

So true.  I would like it if close(2) would ensure all blocks were
successfully written to disk before it returned.
(Possibly via an fcntl(2) option, in case not everyone wants this?)

Daryl Monge				UUCP:	...!att!ihcae!daryl
AT&T					CIS:	72717,65
Bell Labs, Naperville, Ill		AT&T	312-979-3603

jfh@rpp386.Dallas.TX.US (The Beach Bum) (09/26/88)

In article <3542@ihlpe.ATT.COM> daryl@ihlpe.UUCP (Daryl Monge) writes:
>In article <1213@unmvax.unm.edu> mike@turing.unm.edu (Michael I. Bushnell) writes:
>>UNIX does *not* guarantee that hardware related errors
>>will get reflected on write. ...
>> ... The actual
>>disk write may take place hours after the write(2) system call
>
>So true.  I would like it if close(2) would ensure all blocks were
>successfully written to disk before it returned.
>(Possibly via an fcntl(2) option, in case not everyone wants this?)

This can be handled by OR'ing O_SYNCW into the flags passed to open(2).

The following routine will cause the given file descriptor to have
the O_SYNCW bit set:

#include <fcntl.h>

int	setsync (fd)
int	fd;
{
	int	flags;

	/* fetch the current file status flags */
	if ((flags = fcntl (fd, F_GETFL, 0)) == -1)
		return (-1);

	/* set synchronous-write mode and write the flags back */
	flags |= O_SYNCW;
	return (fcntl (fd, F_SETFL, flags));
}
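
A caller would then use it right after the open, assuming the system's
<fcntl.h> defines O_SYNCW at all (some systems spell it O_SYNC); the
file name below is only an example:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

extern int setsync(int);	/* the routine above */

int
main(void)
{
	int fd;

	if ((fd = open("logfile", O_WRONLY | O_CREAT, 0666)) == -1 ||
	    setsync(fd) == -1) {
		perror("logfile");
		return 1;
	}
	/* with O_SYNCW set, each write() waits for the device, so an
	 * I/O error surfaces on the write() itself rather than at close */
	if (write(fd, "hello\n", 6) != 6)
		perror("logfile");
	if (close(fd) == -1)	/* still worth checking, per this thread */
		perror("logfile");
	return 0;
}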
-- 
John F. Haugh II (jfh@rpp386.Dallas.TX.US)                   HASA, "S" Division

      "Why waste negative entropy on comments, when you could use the same
                   entropy to create bugs instead?" -- Steve Elias