rivest@THEORY.LCS.MIT.EDU (02/20/89)
Hi -- The version of emacs that the LCS Theory group is using (18.50.16 of Mon Oct 24 1988 on allspice) seems to be having problems properly detecting whether or not the disk version of a file in a buffer has been changed. More precisely, it frequently reports the error message "File on disk has changed, ..." when in fact the only program manipulating that file is emacs itself.

This occurs most frequently during the use of rmail, but it can also be triggered by an auto-save of a file being edited. (The most common occurrence is when it is writing back the RMAIL file after having retrieved some new messages.) I'm not able to force this to occur in any deterministic manner, so I can't give you a simple sequence of commands that will cause it to appear. Nonetheless, it occurs all too frequently.

It may be important to note that most users in the theory group use NFS as provided by the "allspice system" in such a way that all files accessed by emacs are stored on a remote file server. It is conceivable that the bug is somehow an NFS bug and not an EMACS bug, or that it is due to some unfortunate interaction between NFS and EMACS. Ray and I have experimented to see if it might be due to clock skew between the client and the server; our experiments indicate that this is unlikely to be the cause.

The bug appeared recently when Ray Hirschfeld reconciled our server (theory) with the allspice system. This primarily brought in the X11 system to all the theory machines (although some had been running X11 previously). It is probably not an X problem, since it has appeared when I was logged in over a modem from a dumb terminal.

I know this is not a crisp bug report, but the bug is not a crisp one either.

-- Is this a known problem? If so, is there a known fix?

I will attempt to gather more information on this problem, but would appreciate any information or guidance you may have at this point...

Thanks!
Ron Rivest
jis@ATHENA.MIT.EDU (Jeffrey I. Schiller) (02/21/89)
I have also run into this problem and tracked it down to an interaction between the way GNU Emacs determines whether or not a file has been modified and the asynchronous nature of NFS.

Basically, when you write out a file with Emacs, it writes the file, closes the file, and then "stats" it to get its modification time. With a local file system the modification time for the file is stable after the close() routine completes. However, with NFS the close() routine can complete and return control to Emacs (which then stat()s the file for its modification time) while output to the NFS file is still pending. Some amount of time later the output finishes being written to the file and the modification time stabilizes on a value later than the one Emacs saw when it stat()'d the file. This of course later confuses Emacs into thinking that the file has been changed since the last time it was written. This tends to happen more often with large files, and RMAIL files do tend to be large (my RMAIL file is currently at 1.5 Megabytes, for example).

Unfortunately I don't have a clean solution to this problem (with the current kernel interface there isn't a way to know deterministically when a file is done being written and thus when its modification time is stable). A kludge might be to add a pause between the time the file is close()'d and the time the stat() is performed. The length of this pause could be proportional to the size of the file (so unnecessary delays aren't added when small files are written). I don't know what a good set of parameters for the pause would be.

-Jeff
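For readers following the thread, the write/close/stat sequence and the proposed pause kludge can be sketched roughly as below. This is a Python illustration only, not Emacs source. The function names (`pause_for_size`, `save_and_record_mtime`, `looks_changed_on_disk`) are hypothetical, and the pause constants are arbitrary placeholders, since the message itself says no good parameters are known.

```python
import os
import time

def pause_for_size(nbytes, seconds_per_megabyte=0.5, max_pause=5.0):
    """Hypothetical kludge: pause length proportional to file size.

    Both constants are placeholders chosen for illustration only.
    """
    return min(max_pause, seconds_per_megabyte * nbytes / (1024 * 1024))

def save_and_record_mtime(path, data, pause=False):
    """Write the file, close it, then stat it for its modification time."""
    with open(path, "wb") as f:
        f.write(data)
    # The file is closed at this point. On a local file system the
    # modification time is now stable; over NFS, output may still be pending.
    if pause:
        time.sleep(pause_for_size(len(data)))
    return os.stat(path).st_mtime

def looks_changed_on_disk(path, recorded_mtime):
    """Later check: any difference in modification time counts as a change."""
    return os.stat(path).st_mtime != recorded_mtime
```

On a local file system `looks_changed_on_disk` should return False right after a save. The spurious warning corresponds to the case where the server settles on a later modification time after the stat has already been taken.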
rivest@THEORY.LCS.MIT.EDU (02/21/89)
Thanks for your reply and the diagnosis of the problem. I'm sorry to hear there doesn't seem to be a clean fix.

Actually, your suggestion to put a pause in reminds me that another "problem" we've been having recently is that emacs already seems to take an unreasonably long time to save files. It is certainly much longer than it took, say, using Ultrix. Maybe someone has already stuffed a number of long pauses into the file-save code (but not enough to catch all occurrences of this timing problem...)

I think the best fix, given the current kernel interface, is the following:

-- when emacs detects (by means of its stat() information) that a file has been ``changed'', it should actually go read the file and see if it is actually any different from the buffer. If the file and buffer contents are the same, then it should swallow its complaint and not bother the user with a spurious notification that a change has occurred.

(I'd rather have a little extra delay on writing than have to try to distinguish real from spurious warning messages myself.)

Cheers,
Ron Rivest
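A minimal sketch of the check proposed above, written in Python purely for illustration (it is not Emacs code). The name `changed_on_disk` is hypothetical, and the sketch assumes the buffer contents being compared are the bytes that were last written out.

```python
import os

def changed_on_disk(path, recorded_mtime, buffer_bytes):
    """Return True only if the file on disk really differs from the buffer."""
    st = os.stat(path)
    if st.st_mtime == recorded_mtime:
        return False  # timestamps agree, nothing to report
    if st.st_size != len(buffer_bytes):
        return True   # sizes differ, so the contents must differ
    # Timestamps disagree but sizes match: read the file and compare.
    with open(path, "rb") as f:
        return f.read() != buffer_bytes
```

The size test is only a shortcut to avoid reading a large file when the answer is already known; the full read happens only in the suspicious case where the modification time moved.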
rivest@THEORY.LCS.MIT.EDU (02/22/89)
Jeff -- Regarding your explanation of the NFS/Emacs bug we discussed, wherein emacs calls stat() on the file before NFS is done getting it all written onto the remote server, causing a spurious warning from emacs that the file has changed on disk:

It seems to me that either the problem ought to be easily fixable, or else the problem is more serious than I thought. Consider the following scenario:

-- I open a file for output with NFS, and write to it.
-- I close the file.
-- I open the file and read from it.

Presumably, NFS will guarantee that the contents of the file I read are identical to what I wrote. That is, NFS should not have the bug that the read operation can get inaccurate data because the write is not yet finished.

If NFS has this bug, then things are worse than I thought.

If NFS doesn't have this bug, then can't we use a dummy read operation to more or less force NFS to finish writing and close the file so stat() will work OK?

Thanks,
Ron
jis@ATHENA.MIT.EDU (Jeffrey I. Schiller) (02/23/89)
Date: Wed, 22 Feb 89 13:27:27 PST
From: guy@auspex.com (Guy Harris)

> -- I open a file for output with NFS, and write to it.
> -- I close the file.
> -- I open the file and read from it.
>Presumably, NFS will guarantee that the contents of the file I read are
>identical to what I wrote.

Yes, but it does *not* necessarily do so by syncing all unwritten data to the server before doing the "read". The unwritten data is stored somewhere on the NFS client; the "read" can just pick up that data rather than going to the server.

Indeed, that is what is happening: the read is going to the local cache.

>If NFS doesn't have this bug, then can't we use a dummy read operation
>to more or less force NFS to finish writing and close the file so
>stat() will work OK?

Try doing an "fsync" before the "close", assuming your OS has "fsync" (SunOS and Ultrix should both have it); if "fsync" is implemented properly (as far as I know, it is so implemented in SunOS), it will not return until *all* unwritten data for the file descriptor handed to it has been sent to the server. The SunOS version of "vi" does an "fsync" after writing out a file and before closing it.

I believe Ron is using a Wisconsin port of the Sun NFS code. Last time I looked at the source code for fsync, it only guaranteed that the writes had been queued (which only means that they are on the "async_daemon" queue [the queues serviced by the /etc/biod processes]), not that the i/o had in fact completed.

An alternative to what I suggested in my last message might be to change the comparison code in Emacs so that, rather than requiring the file modification date to exactly match the buffer modification date, it allows a certain small tolerance, proportional to file size. This would eliminate an unnecessary pause and in effect provide the same semantics.

-Jeff
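The two suggestions in this message, an fsync before the close and a size-proportional tolerance in the comparison, might look roughly like the following. This is a Python sketch for illustration, not code from Emacs or vi. The names `save_with_fsync` and `mtime_within_tolerance` are hypothetical, and the tolerance constants (including the minimum, which is an added assumption) are placeholders. As noted above, whether fsync actually waits for the server depends on the NFS implementation.

```python
import os

def save_with_fsync(path, data):
    """Write data and fsync before closing, then stat for the mtime."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    return os.stat(path).st_mtime

def mtime_within_tolerance(disk_mtime, recorded_mtime, nbytes,
                           seconds_per_megabyte=1.0, min_seconds=1.0):
    """Treat modification times as equal if they differ by a small amount.

    The allowed difference grows with file size; both constants are
    placeholders, not measured values.
    """
    tolerance = max(min_seconds, seconds_per_megabyte * nbytes / (1024 * 1024))
    return abs(disk_mtime - recorded_mtime) <= tolerance
```

A tolerance trades a small window in which a genuine outside modification could go unnoticed for the absence of any extra delay at save time.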