jik@athena.mit.edu (Jonathan I. Kamens) (10/29/90)
In article <thurlow.657136228@convex.convex.com>, thurlow@convex.com (Robert Thurlow) writes:
|> Even here, a workaround might be to have the process retry the close
|> so the kernel will retry the NFS writes, after telling the user he is
|> over quota so that he can try to delete some files on the server.  If
|> your process exited, _close() could just go ahead and burn the blocks
|> out of the cache.

If a user process tries to access a file or directory in an AFS volume that is currently being operated upon by the AFS servers (e.g. moved to another fileserver, backed up, released from read-write to read-only, etc.), the process hangs in the call that is doing the accessing, and the kernel does a uprintf() telling the user something like

    afs: Waiting for busy volume 536870973 in cell athena.mit.edu

(that message is taken verbatim from when this happened to me this evening during the nightly backup of my home directory).  The kernel then delays for a noticeable but relatively small amount of time (probably on the order of ten real-time seconds, although I can't say what the exact interval is) and tries the access again; if it fails again, the same message is printed.  This loops until the access succeeds.

It might be worthwhile to consider a similar approach to dealing with EDQUOT errors, both on write() and on close().  I'm not convinced I'd want the kernel to keep trying forever (heck, I'm not even sure it keeps trying forever in the AFS case -- it may eventually decide that something is screwed up on the server and return an error to the user process, which is almost certainly the right thing to do), but I think it would be reasonable for the kernel to uprintf() a message about quotas and retry the write a few more times, after suitable delays.  This would give the user a chance to rectify the problem before data lossage occurs.

Another possibility is to add a new system call, something like try_close().
It takes a file descriptor, just like close(), but only actually completes the close() if it is possible to do so without errors (although it should treat EBADF and EINTR the same way close() does, since there is nothing the programmer can do about them in any case).  So, if a programmer is concerned about data integrity, he can do a try_close() before he does a close(), and if try_close() returns EDQUOT or some such thing, the program can print a warning and wait for advice from the user before continuing.

We can generalize that and say that there should be a flush() system call that takes a file descriptor and verifies that all output to it has been performed and was successful.  I believe that the hypothetical effects of such a system call can be simulated on both NFS and AFS files by doing lseek(fd, (off_t) 0, L_INCR) (substitute SEEK_CUR for L_INCR on a POSIX system, and/or 1 for L_INCR on a SysV system).  A program which is paranoid about being sure that data gets written to disk can therefore define a macro vwrite that does something like this:

    static int _vwrite_tmp;
    #define vwrite(fd,buf,nbytes) \
        (((_vwrite_tmp = write(fd,buf,nbytes)) >= 0) \
         ? ((flush(fd) >= 0) ? _vwrite_tmp : -1)    \
         : -1)

The extra parentheses make the grouping explicit, but you get the idea.

(Credit where credit is due: The suggestion that started me thinking about try_close() comes from John Carr here at Athena, but any problems with the suggestions I've posted are of course completely my fault :-)

--
Jonathan Kamens                         USnail:
MIT Project Athena                      11 Ashford Terrace
jik@Athena.MIT.EDU                      Allston, MA  02134
Office: 617-253-8085                    Home: 617-782-0710
hunt@dg-rtp.rtp.dg.com (Greg Hunt) (10/29/90)
In article <1990Oct29.051212.13740@athena.mit.edu>, jik@athena.mit.edu (Jonathan I. Kamens) writes:
> We can generalize that and say that there should be a flush() system
> call that takes a file descriptor and verifies that all output to it
> has been performed and was successful.  I believe that the hypothetical
> effects of such a system call can be simulated both on NFS and AFS
> files by doing lseek(fd, (off_t) 0, L_INCR) (substitute SEEK_CUR for
> L_INCR on a POSIX system, and/or 1 for L_INCR on a SysV system).  A
> program which is paranoid about being sure that data gets written to
> disk can therefore define a macro vwrite that does something like so:

Doesn't the already existing fsync() system call do what you want?  It flushes the data buffers and any inode information to disk, and doesn't return until it completes.  Any errors resulting from the completion of buffered NFS operations are returned by the call as well, so it solves some other problems mentioned about close().

Before I close() a critical file, I always code an fsync() for the file to guarantee that the output is safely on disk.  I check for errors on both calls and report them to the user.  Maybe the fsync() call doesn't exist in all flavors of UNIX?

I also disagree with a previous poster (whose name escapes me) about checking error returns.  I believe that good programmers always check for errors from all system calls, whether they're documented as returning errors or not.  Then you deal with those that you decide you can handle somehow, and report any others to the user.  That way your program won't accidentally get caught by semantic changes from future OS changes.  It's also easy to code, so it's not a big hassle.

--
Greg Hunt                       Internet: hunt@dg-rtp.rtp.dg.com
DG/UX Kernel Development        UUCP: {world}!mcnc!rti!dg-rtp!hunt
Data General Corporation
Research Triangle Park, NC      These opinions are mine, not DG's.
jgm@fed.expres.cs.cmu.edu (John G. Myers) (10/30/90)
People have stated on this newsgroup that if close(2) returns an error such as EDQUOT, but releases the file descriptor (as AFS is wont to do), then the application can do nothing to recover from the error.

This is not the case--most programs are not able to determine and/or correct the underlying cause of any given error.  Most times, what is important is that they find out that an error occurred and report it to the user and/or their parent process.  Also, an open file descriptor on a file is not always necessary for recovering from an error.

For example, there is a program which I have modified to deal more gracefully with AFS: compress(1).  When given an argument, stock compress reads from an input file and writes an output file.  When the output file is written, it closes it and unlinks the input file.  If it encounters an error from write(2), it prints an error message, unlinks the output file, and leaves the input file alone.

Unfortunately, stock compress does not check the return value from close(2).  If the user goes over quota, compress does not notice this and unlinks the input file anyway.  The compress we run at andrew.cmu.edu has been modified to check the return value of close and deal with an error by reporting it to the user, unlinking the output file, and leaving the input file alone.%

-----
% It also has been modified to set a magic "make sure this file gets
shipped all the way to the fileserver before returning from close()"
bit so that network communication errors will also be noticed.
--
_.John G. Myers         Internet: jgm@fed.expres.cs.cmu.edu
(412) 268-2984          LoseNet:  ...!seismo!ihnp4!wiscvm.wisc.edu!give!up
"It's not bogus, It's an IBM standard" --Esther Filderman
jik@athena.mit.edu (Jonathan I. Kamens) (10/30/90)
In article <1990Oct29.142933.5893@dg-rtp.dg.com>, hunt@dg-rtp.rtp.dg.com (Greg Hunt) writes:
|> Doesn't the already existing fsync() system call do what you want?  It
|> flushes the data buffers and any inode information to disk, and
|> doesn't return until it completes.  Any errors resulting from the
|> completion of buffered NFS operations are returned by the call as
|> well, so it solves some other problems mentioned about close().

The semantics of fsync() are not clear when discussing remote filesystems, i.e. it isn't clear for some filesystem types exactly what fsync() "should" do and what it does in reality.  In AFS, for example, files are stored locally while they are being created or edited.  Should fsync() make sure that the file has been flushed to the local disk, or make sure that it has been sent across the network to the AFS server?  As it happens, it does the latter, but the only way you can know that for sure is by experimenting (which is what I just did :-).

Also, what happens if fsync() fails?  Is the file descriptor still valid, and is all of the data still available in the file, even though the file could not be pushed to disk?  I don't know about this, which is why I'm asking....  If fsync() will cause the kernel to throw away any data that it can't save to the disk, then my suggestion to create another system call that would notify you on success *and not throw away data* on failure is still pertinent.

Despite the fact that I'm not sure fsync() completely fits the bill for what I'm talking about, I must confess that until I read your message, I thought that fsync() took a FILE *, not a file descriptor, and that it simply verified that the FILE *'s buffer had been write()ten to disk.  That's fflush(), of course, not fsync().  You learn something new every day :-).

--
Jonathan Kamens                         USnail:
MIT Project Athena                      11 Ashford Terrace
jik@Athena.MIT.EDU                      Allston, MA  02134
Office: 617-253-8085                    Home: 617-782-0710
hunt@dg-rtp.rtp.dg.com (Greg Hunt) (10/30/90)
In article <1990Oct29.202811.9409@athena.mit.edu>, jik@athena.mit.edu (Jonathan I. Kamens) writes:
> The semantics of fsync() are not clear when discussing remote
> filesystems, i.e. it isn't clear for some filesystem types exactly
> what fsync() "should" do and what it does in reality.

You're right.  I forgot to mention in my original article that my perspective is how the Data General DG/UX fsync() system call works.  I don't know how other systems handle it.

> Also, what happens if fsync() fails?  Is the file descriptor valid,
> and is all of the data still available in the file, even though the
> file could not be pushed to disk?  I don't know about this, which is
> why I'm asking....  if fsync() will cause the kernel to throw away any
> data that it can't save to the disk, then my suggestion to create
> another system call that would notify you on success *and not throw
> away data* on failure is still pertinent.

I'm not 100% certain, but from reading the DG/UX man page on fsync, there is a clear (to me) implication that when an fsync fails, the file descriptor remains open and valid, and the data remains buffered in the system.  If this isn't the way it works in reality, then I would also want the new system call that you propose.

--
Greg Hunt                       Internet: hunt@dg-rtp.rtp.dg.com
DG/UX Kernel Development        UUCP: {world}!mcnc!rti!dg-rtp!hunt
Data General Corporation
Research Triangle Park, NC      These opinions are mine, not DG's.