[comp.unix.internals] dealing with close

jik@athena.mit.edu (Jonathan I. Kamens) (10/29/90)

In article <thurlow.657136228@convex.convex.com>, thurlow@convex.com (Robert Thurlow) writes:
|> Even here, a workaround might be to have the
|> process retry the close so the kernel will retry the NFS writes, after
|> telling the user he is over quota so that he can try to delete some
|> files on the server.  If your process exited, _close() could just go
|> ahead and burn the blocks out of the cache.

  If a user process tries to access a file/directory in an AFS volume that is
currently being operated upon (e.g. moved to another fileserver, backed up,
released to read-only from read-write, etc.) by the AFS servers, the process
hangs in the call that is doing the accessing, and the kernel does a uprintf()
telling the user something like, "afs: Waiting for busy volume 536870973 in
cell athena.mit.edu" (that message is taken verbatim from when this happened
to me this evening during the nightly backup of my home directory).  The
kernel then delays for a noticeable but relatively small amount of time
(probably on the order of ten real-time seconds, although I can't say what the
exact interval is) and tries to do the access again; if it fails again, the
same message is printed.  This loops until the access succeeds.

  It might be worthwhile to consider a similar approach to dealing with EDQUOT
errors, both on write() and on close().  Although I'm not convinced I'd want
the kernel to keep trying forever (heck, I'm not even sure it keeps trying
forever in the AFS case -- it may eventually decide that something is screwed
up on the server and return an error to the user process, which is almost
certainly the right thing to do), I think it would be reasonable for the
kernel to uprintf() a message about quotas and try to write a few more times,
after suitable delays.  This would give the user a chance to rectify the
problem before data lossage occurs.
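
  The retry-with-delay idea above is proposed for the kernel, but its shape
can be sketched in user space.  What follows is only an illustration under
stated assumptions: the name write_retry_quota() and its max_tries and
delay_secs parameters are hypothetical, and stderr stands in for uprintf().
On a remote filesystem the quota error may not surface until close(), so
this covers only the write() half of the suggestion.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any real API): a user-space analogue of
 * the proposed kernel behaviour.  If write() fails with EDQUOT, warn the
 * user, wait, and try again, giving up after max_tries attempts.
 * Returns the result of the last write(): a byte count, or -1 with errno set.
 */
ssize_t write_retry_quota(int fd, const void *buf, size_t nbytes,
                          int max_tries, unsigned delay_secs)
{
    int attempt;

    for (attempt = 1; ; attempt++) {
        ssize_t n = write(fd, buf, nbytes);

        /* Success, a non-quota error, or out of attempts: stop here. */
        if (n >= 0 || errno != EDQUOT || attempt >= max_tries)
            return n;

        fprintf(stderr, "over quota; retrying write in %u seconds "
                "(attempt %d of %d)\n", delay_secs, attempt, max_tries);
        sleep(delay_secs);
    }
}
```

  Bounding the number of attempts reflects the reservation above about
retrying forever: once the limit is reached, the caller sees the original
EDQUOT and can decide what to do.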

  Another possibility is to add a new system call, something like try_close().
It takes a file descriptor, just like close(), but only actually completes the
close() if it is possible to do so without errors (although it should treat
EBADF and EINTR the same way close() does, since there is nothing the
programmer can do about them in any case).  So, if a programmer is concerned
about data integrity, he can do a try_close() before he does a close(), and if
try_close() returns EDQUOT or some such thing, the program can print a warning
and wait for advice from the user before continuing.

  We can generalize that and say that there should be a flush() system call
that takes a file descriptor and verifies that all output to it has been
performed and was successful.  I believe that the hypothetical effects of such
a system call can be simulated both on NFS and AFS files by doing lseek(fd,
(off_t) 0, L_INCR) (substitute SEEK_CUR for L_INCR on a POSIX system, and/or 1
for L_INCR on a SysV system).  A program which is paranoid about being sure
that data gets written to disk can therefore define a macro vwrite that does
something like so:

	static int _vwrite_tmp;
	#define vwrite(fd,buf,nbytes) \
		((_vwrite_tmp = write((fd),(buf),(nbytes))) >= 0 \
			? (flush(fd) >= 0 \
				? _vwrite_tmp \
				: -1) \
			: -1)

I'm not sure whether or not I need more parentheses in there to force the
grouping to the way I want, but you get the idea.

  (Credit where credit is due: The suggestion that started me thinking about
try_close() comes from John Carr here at Athena, but any problems with the
suggestions I've posted are of course completely my fault :-)

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710

hunt@dg-rtp.rtp.dg.com (Greg Hunt) (10/29/90)

In article <1990Oct29.051212.13740@athena.mit.edu>, jik@athena.mit.edu
(Jonathan I. Kamens) writes:
>
>   We can generalize that and say that there should be a flush() system call
> that takes a file descriptor and verifies that all output to it has been
> performed and was successful.  I believe that the hypothetical effects of
> such a system call can be simulated both on NFS and AFS files by doing
> lseek(fd, (off_t) 0, L_INCR) (substitute SEEK_CUR for L_INCR on a POSIX
> system, and/or 1 for L_INCR on a SysV system).  A program which is paranoid
> about being sure that data gets written to disk can therefore define a macro
> vwrite that does something like so:
> 

Doesn't the already existing fsync() system call do what you want?  It
flushes the data buffers and any inode information to disk, and doesn't
return until it completes.  Any errors resulting from the completion
of buffered NFS operations are returned by the call as well, so it
solves some other problems mentioned about close().

Before I close() a critical file, I always code a fsync() for the file
to guarantee that the output is safely on disk.  I check for errors on
both calls and report them to the user.
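
The fsync()-then-close() habit described above might look roughly like the
sketch below.  The name sync_and_close() is hypothetical, and leaving the
descriptor open when fsync() fails is a design choice made for this
illustration, not something any particular system guarantees to be useful.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a descriptor with fsync(), then close it,
 * reporting a failure of either call on stderr.  name is used only in
 * the error message.  Returns 0 if both calls succeed, -1 otherwise.
 * If fsync() fails, the descriptor is deliberately left open so the
 * caller can decide whether to retry or give up.
 */
int sync_and_close(int fd, const char *name)
{
    if (fsync(fd) < 0) {
        perror(name);       /* buffered data could not be flushed */
        return -1;          /* fd is still open here */
    }
    if (close(fd) < 0) {
        perror(name);       /* close() itself reported an error */
        return -1;
    }
    return 0;
}
```

A caller would use it in place of a bare close(), for example
`if (sync_and_close(fd, path) < 0) exit(1);`.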

Maybe the fsync() call doesn't exist in all flavors of UNIX?

I also disagree with a previous poster (whose name escapes me) about
checking error returns.  I believe that good programmers always check
for errors from all system calls, whether they're documented as 
returning errors or not.  Then you deal with those that you decide you
can handle somehow, and report any others to the user.  That way your
program won't accidentally get caught by semantic changes from future OS
changes.  It's also easy to code, so it's not a big hassle.

--
Greg Hunt                        Internet: hunt@dg-rtp.rtp.dg.com
DG/UX Kernel Development         UUCP:     {world}!mcnc!rti!dg-rtp!hunt
Data General Corporation
Research Triangle Park, NC       These opinions are mine, not DG's.

jgm@fed.expres.cs.cmu.edu (John G. Myers) (10/30/90)

People have stated on this newsgroup that if close(2) returns an error
such as EDQUOT, but releases the file descriptor (as AFS is wont to
do), then the application can do nothing to recover from the error.

This is not the case--most programs are not able to determine and/or
correct the underlying cause of any given error.  Most times, what is
important is that they find out that an error occurred and report it to
the user and/or their parent process.  Also, an open file descriptor
on a file is not always necessary for recovering from an error.  For
example, there is a program which I have modified to deal more
gracefully with AFS: compress(1).

When given an argument, stock compress reads from an input file and
writes an output file.  When the output file is written, it closes it
and unlinks the input file.  If it encounters an error from write(2),
it prints an error message, unlinks the output file, and leaves the
input file alone.

Unfortunately, stock compress does not check the return value from
close(2).  If the user goes over quota, compress does not notice this
and unlinks the input file anyway.  The compress we run at
andrew.cmu.edu has been modified to check the return value of close
and deal with an error by reporting it to the user, unlinking the
output file, and leaving the input file alone.%

-----
% It also has been modified to set a magic "make sure this file gets
shipped all the way to the fileserver before returning from close()"
bit so that network communication errors will also be noticed.
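
The policy described above for the modified compress can be sketched as
follows.  This is not the actual compress source: finish_output() and its
parameters are hypothetical names chosen for the illustration, and the AFS
"ship to the fileserver" bit from the footnote is not shown.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: finish writing an output file and remove the
 * input file only if the output was closed without error.
 * out_fd is the open output descriptor; in_path and out_path are the
 * names of the input and output files.
 * Returns 0 on success, -1 on failure.
 */
int finish_output(int out_fd, const char *in_path, const char *out_path)
{
    if (close(out_fd) < 0) {
        /* e.g. EDQUOT surfacing at close time on a remote filesystem */
        perror(out_path);
        unlink(out_path);   /* discard the possibly incomplete output */
        return -1;          /* the input file is left alone */
    }
    if (unlink(in_path) < 0) {
        perror(in_path);
        return -1;
    }
    return 0;
}
```

No open descriptor is needed for the recovery step: the output is removed
by name, and the input was never touched.
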
--
_.John G. Myers		Internet: jgm@fed.expres.cs.cmu.edu
(412) 268-2984		LoseNet:  ...!seismo!ihnp4!wiscvm.wisc.edu!give!up
"It's not bogus, It's an IBM standard" --Esther Filderman

jik@athena.mit.edu (Jonathan I. Kamens) (10/30/90)

In article <1990Oct29.142933.5893@dg-rtp.dg.com>, hunt@dg-rtp.rtp.dg.com (Greg Hunt) writes:
|> Doesn't the already existing fsync() system call do what you want?  It
|> flushes the data buffers and any inode information to disk, and doesn't
|> return until it completes.  Any errors resulting from the completion
|> of buffered NFS operations are returned by the call as well, so it
|> solves some other problems mentioned about close().

  The semantics of fsync() are not clear when discussing remote filesystems,
i.e. it isn't clear for some filesystem types exactly what fsync() "should" do
and what it does in reality.

  In AFS, for example, files are stored locally while they are being created
or edited.  Should fsync() make sure that the file has been flushed to the
disk, or make sure that it has been sent across the network to the AFS server?
As it happens, it does the latter, but the only way you can know that for sure
is by experimenting (which is what I just did :-).

  Also, what happens if fsync() fails?  Is the file descriptor valid, and is
all of the data still available in the file, even though the file could not be
pushed to disk?  I don't know about this, which is why I'm asking.... if
fsync() will cause the kernel to throw away any data that it can't save to the
disk, then my suggestion to create another system call that would notify you
on success *and not throw away data* on failure is still pertinent.

  Despite the fact that I'm not sure fsync() completely fits the bill for what
I'm talking about, I must confess that until I read your message, I thought
that fsync() took a FILE *, not a file descriptor, and that it simply verified
that the FILE *'s buffer had been write()ten to disk.  That's fflush(), of
course, not fsync().  You learn something new every day :-).

hunt@dg-rtp.rtp.dg.com (Greg Hunt) (10/30/90)

In article <1990Oct29.202811.9409@athena.mit.edu>, jik@athena.mit.edu
(Jonathan I. Kamens) writes:
> 
>   The semantics of fsync() are not clear when discussing remote filesystems,
> i.e. it isn't clear for some filesystem types exactly what fsync() "should"
> do and what it does in reality.
> 

You're right.  I forgot to mention in my original article that my
perspective is how the Data General DG/UX "fsync" system call works.
I don't know how other systems handle it.

>   Also, what happens if fsync() fails?  Is the file descriptor valid, and is
> all of the data still available in the file, even though the file could not
> be pushed to disk?  I don't know about this, which is why I'm asking.... if
> fsync() will cause the kernel to throw away any data that it can't save to
> the disk, then my suggestion to create another system call that would notify
> you on success *and not throw away data* on failure is still pertinent.

I'm not 100% certain, but from reading the DG/UX man page on fsync,
there is a clear (to me) implication that when an fsync fails the
file descriptor remains open and valid, and the data remains buffered
in the system.  If this isn't the way it works in reality, then I would
also want the new system call that you propose.
