karl@cbrma.UUCP (Karl Kleinpaste) (11/10/86)
mike@louis.UUCP writes:
>Recently we have been doing a study of NFS fileservers and we have
>come across unreliability in NFS (i.e writing something to a remote
>file and finding something different when reading it back) when the
>server was under extreme load. Now we are starting to notice the same
>behaviour on our existing Sun fileservers.
>
>The question is, have other noticed this and does anyone know why
>it happens? [mumble]

Yes, I've seen such a thing. At OSU, there is a small set of Suns (11?), 3 of which are Sun-2s and the rest are recently-purchased Sun-3s. Unfortunately, one of the Sun-2s is the server for *all* the rest. Some would call this a Bad Thing, and they would be right. It is equipped with 2 Eagle drives for a decent amount of disc, and all those other Suns are usually quite busy during office hours.

This problem was first noticed in, of all things, the "hack" game, and more recently in GNU Emacs. GNU Emacs has lisp code to detect whether a file has changed on disc more recently than the last time the current user either read the file in or wrote his changes out. Periodically, when the server node is seriously overloaded (which is the case more and more often), GNU Emacs utters the evil phrase, "File has changed on disc; save anyway [y or n]?"

It is *believed* (that is, we can't quite prove it yet) that this is due to the sequence of events where [a] Joe User saves his file, which causes additional work for an already-overloaded server, [b] GNU Emacs stat(2)'s the file to get its modification time, but [c] the server is so overloaded that the file wasn't finished being written at the time of the stat(2), so [d] Joe goes on and hacks at his file a while longer, [e] issues another save for it, at which time [f] GNU Emacs stat(2)'s the file again, compares it against its saved write-time, and [g] finds that the last modification time is later than the saved write-time.
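The suspected sequence [a] through [g] can be sketched as a toy simulation. This is only an illustration of the hypothesis above, not GNU Emacs's lisp code and not a model of real NFS semantics; the class SimulatedServerFile, the helper names, and the integer "tick" timestamps are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulatedServerFile:
    """Hypothetical stand-in for a file on an overloaded server."""
    mtime: int = 0                       # mtime the server currently reports
    pending_mtime: Optional[int] = None  # write accepted but not yet applied

    def write(self, now: int, delay: int) -> None:
        # The save is issued at `now` but only lands `delay` ticks later.
        self.pending_mtime = now + delay

    def stat(self, now: int) -> int:
        # Apply the pending write once its completion time has passed.
        if self.pending_mtime is not None and now >= self.pending_mtime:
            self.mtime = self.pending_mtime
            self.pending_mtime = None
        return self.mtime


def changed_on_disc(recorded_mtime: int, current_mtime: int) -> bool:
    """Editor-side check: is the file newer than the time we recorded?"""
    return current_mtime > recorded_mtime


def second_save_warns(delay: int) -> bool:
    f = SimulatedServerFile(mtime=100)
    f.write(now=200, delay=delay)   # [a] Joe saves his file
    recorded = f.stat(now=200)      # [b]/[c] stat right after the save
    # [d] Joe keeps editing, [e] then saves again at tick 500
    current = f.stat(now=500)       # [f] stat again before the second save
    return changed_on_disc(recorded, current)  # [g]


if __name__ == "__main__":
    print("prompt server:    ", second_save_warns(delay=0))
    print("overloaded server:", second_save_warns(delay=5))
```

With no delay the recorded and current times agree and no warning is raised; with any delay the first stat records the stale time, so the second save sees a "newer" file even though nobody else touched it.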
Potent words of evil tend to get uttered by Joe when he sees GNU Emacs' comment, because (generally speaking) he hasn't the FAINTEST idea what caused it.

> And, of course, does anyone know how to stop it?

OSU is choosing to solve the whole problem (that is, overall performance, not just GNU Emacs and similar programs' foolish comments) by replacing the Sun-2 file server with >1 Sun-3 file servers. You do what you have to. Unfortunately, it costs significant $$$ to do what you have to in such cases.

--
Karl Kleinpaste
west@onion.cs.reading.ac.uk (Jerry West) (11/15/86)
With regard to the Emacs filetime problems on a nova of Suns hanging off one server.... we found that we had more problems with date(1) being different on different machines. Sun have "fixed" ls et al to allow for files created some (small) time in the future, but GNU Emacs might be falling foul of this. An rdate(8) in crontab keeps things in line.

Jerry West
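One rough way to check whether a client's clock disagrees with the server's is to create a scratch file on the mounted directory and compare its modification time with the local clock. The sketch below assumes the file's mtime is stamped by the server; the function names are hypothetical, and on a real NFS mount client-side attribute caching and network delay could blur the result.

```python
import os
import tempfile
import time


def estimate_skew(local_before: float, file_mtime: float,
                  local_after: float) -> float:
    """Return file_mtime minus the midpoint of the two local readings.

    Positive means the file's clock appears to be ahead of ours.
    """
    return file_mtime - (local_before + local_after) / 2.0


def measure_skew(directory: str) -> float:
    """Create and remove a scratch file in `directory`; estimate the skew."""
    before = time.time()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x")            # force a write so the mtime is set
        mtime = os.stat(path).st_mtime
    finally:
        os.unlink(path)
    after = time.time()
    return estimate_skew(before, mtime, after)


if __name__ == "__main__":
    # On a local disc this should be close to zero; pointing it at a
    # directory on a remote mount would show the skew instead.
    print("apparent skew (s):", measure_skew(tempfile.gettempdir()))
```

A result that drifts away from zero between runs would suggest the clocks need resynchronising more often.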
jim@cs.strath.ac.uk (Jim Reid) (11/16/86)
In article <85@onion.cs.reading.ac.uk> west@onion.UUCP (Jerry West) writes:
>With regard to the Emacs filetime problems on a nova of Suns hanging
>off one server.... we found that we had more problems with date(1) being
>different on different machines. Sun have "fixed" ls et al to allow for
>files created some (small) time in the future, but GNU Emacs might be
>falling foul of this. An rdate(8) in crontab keeps things in line.

Of course, you should also ensure the client and server machines have their kernels configured for the same timezone. A former colleague of mine was baffled by time funnies until he found that the client kernel thought it was in California (PST) while the server had been properly configured for local time in Norway!

Jim
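To see how large the California/Norway mismatch is, the same instant can be rendered under both offsets. This is only a sketch of the display discrepancy, not of how the kernel stores its timezone; the fixed offsets (PST as UTC-8, Norwegian winter time as UTC+1) and the sample date are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for illustration only.
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def wall_clock(epoch_seconds: float, tz: timezone) -> str:
    """Render an absolute timestamp as local wall-clock time in `tz`."""
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M")


# One instant in time: noon UTC on the day of the post.
instant = datetime(1986, 11, 16, 12, 0, tzinfo=timezone.utc).timestamp()

if __name__ == "__main__":
    print("client (PST):", wall_clock(instant, PST))  # 1986-11-16 04:00
    print("server (CET):", wall_clock(instant, CET))  # 1986-11-16 13:00
```

The two machines agree on the instant but disagree by nine hours on what the clock on the wall should say, which is the sort of "time funny" that looks baffling until the timezone settings are compared.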