tr@bellcore.com (tom reingold) (10/05/89)
I manage a network of 100 Suns and two Pyramids. Something strange happened today, and it has happened before, too. A user was working on two Suns simultaneously. He was editing one file on one Sun, and reading the file using LaTeX on the other. The file was NFS mounted on both Suns. It physically resided on the NFS server, a Pyramid. One of the Suns had an out-of-date copy! It was a minute old.

Unmounting the filesystem on the Sun client and remounting it fixed the problem.

This is disturbing. I would like to know what causes this. If I have to live with it, I would like to know what workarounds exist other than remounting. Of course, users cannot do this, and I am not always around to do it for them.

Thank you.

Tom Reingold               | INTERNET: tr@bellcore.com
Bellcore                   | UUCP: bellcore!tr
444 Hoes La room 1H217     | PHONE: (201) 699-7058 [work],
Piscataway, NJ 08854-4182  |        (201) 287-2345 [home]
thurlow@convex.com (Robert Thurlow) (10/06/89)
tr@bellcore.com (tom reingold) writes:

>A user was working on two Suns simultaneously. He was editing one file
>on one Sun, and reading the file using LaTeX on the other. The file
>was NFS mounted on both Suns. It physically resided on the NFS server,
>a Pyramid. One of the Suns had an out-of-date copy! It was a minute
>old.
>Unmounting the filesystem on the Sun client and remounting it fixed the
>problem.

We've had this; what seems to be a common cause for it is that the time is not synchronized between the updating client and the server, so the file attributes don't get through the server and show up on disk until something changes ("gee, 'now' plus one minute - I think I'll hang onto this request for awhile"). Moving the file to a new name and back seemed to be another workaround here. Are you running the time daemon, or did you check for a system time difference? This may not be the only cause of the problem.

>This is disturbing. I would like to know what causes this. If I have
>to live with it, I would like to know what workarounds exist other than
>remounting. Of course, users cannot do this, and I am not always
>around to do it for them.

And please post! I want to know the answer to this, too.

Rob T
--
Rob Thurlow - Expatriate Canadian      thurlow@convex.com
"From the heart of 'The Friendship State'"
mjb@acd4.UUCP ( Mike Bryan ) (10/07/89)
In article <1967@convex.UUCP> thurlow@convex.com (Robert Thurlow) writes:
>tr@bellcore.com (tom reingold) writes:
>>A user was working on two Suns simultaneously. He was editing one file
>>on one Sun, and reading the file using LaTeX on the other. The file
>>was NFS mounted on both Suns. It physically resided on the NFS server,
>>a Pyramid. One of the Suns had an out-of-date copy! It was a minute
>>old.
>
>We've had this; what seems to be a common cause for it is that the
>time is not synchronized between the updating client and the server,
>so the file attributes don't get through the server and show up on
>disk until something changes.

Well, here goes an attempt to describe what's happening. We had this problem with our systems, and because of it have had to abandon using NFS for our customer systems (at least for now).

An NFS client maintains a cache of accessed files. This cache includes file attributes (such as modification time and ownership/protection). If the client has a file's data locally, and the attributes were "recently" read, it will not try to access the server. It's the definition of "recent" that causes the problems. The client periodically re-reads the file attributes from the server. If it determines that the file has been modified, it decides the local data is invalid and requests the file data from the server. The problem you are seeing is that the client can take too long to realize the file has changed.

(The following might be a bit off technically; it's been almost a year since I investigated all of this. If so, I apologize. However, I'm certain any errors are minimal, and it should get the gist across.)

Normally, the file attributes are checked every 3 seconds. However, if the system times are skewed, it can take longer. (I don't remember exactly which times are being compared, but it has something to do with the last time the attributes were read and the times within those attributes.)
In Ultrix 2.3, at least, these "re-check" times are controlled by the following four kernel variables:

    Name                 Value (in seconds)
    ------------------   ------------------
    nfsac_regtimeo_min    3
    nfsac_regtimeo_max   60
    nfsac_dirtimeo_min   30
    nfsac_dirtimeo_max   60

The "*_min" values determine how often the client decides to look at the file attributes. These values don't hold if the time is skewed, however. The "*_max" values determine how often the attributes are re-read NO MATTER WHAT. Even if the times are skewed, your data should be no more than 60 seconds out of date (and this *is* what you reported seeing). The above values come in two sets: "*dir*" applies to directory files, and "*reg*" applies to regular files. I don't know if these same names are used in other O/S's, but I'd bet Sun's are at least close, since Ultrix changed very little of Sun NFS for 2.2/2.3.

What does all this mean? Well, you can try changing these kernel values. We did, and saw the data-skew problem lessen as expected. However, you pay a performance penalty, since requests are more likely to go to the server rather than use the cache. Also, even at "0", there is up to a one-second delay, since the code apparently waits until the time difference is strictly greater than the given value. (Without source, I can't say for sure, however.) Admittedly, I did not try "-1", but that might cause problems, especially if these are unsigned variables. (Hmm, infinite time/data skew. How lovely!)

Also, you can supposedly remove all data skew by using the NFS lock daemons and applying a lock to the file in question. Since we were running Ultrix 2.3 at the time, and it did not have NFS locking, I haven't verified this, nor do I know the details. Maybe I'll check it out again, since we are gearing up for Ultrix 3.0/3.1 support now.

Note: All of the above deals with the case of keeping data synched between a client and its server.
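[Editor's sketch: the attribute-cache behaviour described above can be modelled in a few lines. This is an illustrative toy in Python, not actual NFS client code; the class and function names are made up, and the exact aging rule (timeout proportional to how recently the file changed, clamped between a min and a max) is an assumption based on the description in this thread.]

```python
ACREGMIN = 3    # seconds; cf. nfsac_regtimeo_min above
ACREGMAX = 60   # seconds; cf. nfsac_regtimeo_max above

class CachedFile:
    """Toy model of one client-side attribute-cache entry."""

    def __init__(self, getattr_rpc, now):
        self.getattr_rpc = getattr_rpc   # stand-in for the NFS GETATTR call
        self.attrs = getattr_rpc()       # e.g. {"mtime": <server time>}
        self.checked = now               # when we last asked the server

    def timeout(self):
        # Assumed aging rule: a recently modified file gets a short
        # timeout, a stable one a long timeout, clamped to the range.
        # Note the two terms come from *different clocks* on a real
        # network (client "now" vs. server mtime) -- which is exactly
        # where clock skew bites.
        age = self.checked - self.attrs["mtime"]
        return min(max(age, ACREGMIN), ACREGMAX)

    def read_attrs(self, now):
        # Revalidate with the server only if the cache has timed out.
        if now - self.checked >= self.timeout():
            fresh = self.getattr_rpc()
            if fresh["mtime"] != self.attrs["mtime"]:
                self.attrs = fresh   # a real client would also toss its
                                     # cached data pages here
            self.checked = now
        return self.attrs
```

Within the timeout window the client happily serves stale attributes (and hence stale data) without ever talking to the server; that window is the inconsistency the thread is about.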
If you have multiple clients, and one client is reading what another client is writing, there is an additional delay: the time for the data to propagate from the writing client to the server. This is controlled by the sync/update procedure and can cause further delays of up to 30 seconds. (NFS *might* be a write-through cache, but I don't think so.) We at least had the writes occurring on the server, but we were unable to use NFS for this particular application even then, as we had to have the same synchronous read/write semantics as for local files. *Sigh*.

Anyway, I hope this helps anyone who has noticed the same problem. Normally, it should not cause serious problems, especially if you keep the system times synchronized. If you aren't expecting it, though, it can be quite frustrating.
--
Mike Bryan, Applied Computing Devices, 100 N Campus Dr, Terre Haute IN 47802
Phone: 812/232-6051  FAX: 812/231-5280  Home: 812/232-0815
UUCP: uunet!acd4!mjb  ARPA: mjb%acd4@uunet.uu.net
"Did you make mankind after we made you?" --- XTC, "Dear God"
beepy%commuter@Sun.COM (Brian Pawlowski) (10/07/89)
> >>A user was working on two Suns simultaneously. He was editing one file
> >>on one Sun, and reading the file using LaTeX on the other. The file
> >>was NFS mounted on both Suns. It physically resided on the NFS server,
> >>a Pyramid. One of the Suns had an out-of-date copy! It was a minute
> >>old.
> >
> >We've had this; what seems to be a common cause for it is that the
> >time is not synchronized between the updating client and the server,
> >so the file attributes don't get through the server and show up on
> >disk until something changes.
>
> Well, here goes an attempt to describe what's happening. We had this
> problem with our systems, and because of it have had to abandon using
> NFS for our customer systems (at least for now).

I'm curious what the application was that made you abandon NFS, and what the systems were. Your explanation was cogent; I'll babble on along the same lines.

Not to be a pain, but it's not so much a "problem" as an implementation behaviour. Caching with consistency checks (typically on the order of every 30 seconds) introduces a window in which a client's modifications to a file are not noticed by other clients until the cache consistency check (which consists of an NFS GETATTR call to inspect the modified time of the file to see if the cached file data is still valid). Unfortunately, these consistency checks are not documented very well, and possibly differ from implementation to implementation.

> The problem you are seeing is that the client can take too
> long to realize the file has changed.

There is a tradeoff here between performance and consistency. Cache check intervals in our current NFS implementation differ for directories and plain files: directory timeouts are 30-60 seconds, file timeouts are 3-60 seconds. In the NFSSRC 4.0 reference port:

    acregmin=n   Hold cached attributes for at least
                 n seconds after file modification.
    acregmax=n   Hold cached attributes for no more
                 than n seconds after file modification.
    acdirmin=n   Hold cached attributes for at least
                 n seconds after directory update.
    acdirmax=n   Hold cached attributes for no more
                 than n seconds after directory update.
    actimeo=n    Set min and max times for regular files
                 and directories to n seconds.

The regular defaults are:

    fg,retry=10000,timeo=7,retrans=3,port=NFS_PORT,hard,\
    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

These values are bounded in the kernel between the above limits. The reasoning behind allowing the file timeout to be set lower than 30 seconds is to permit stronger (more frequent) consistency checks. In an internal version of NFS, there is also an option to turn off caching on a mount point.

Caching is important to performance. NFS introduces a window of inconsistency defined by the timeout period. Other distributed filesystems (Sprite, Andrew) use callbacks to ensure data consistency. Another option one might consider to ensure consistency with a single writer and multiple readers (or multiple writers and multiple readers) is to use locking. If you have a custom application, you might try it. This would force exact consistency and proper serialization of readers and writers.

> This is controlled by the sync/update
> procedure, and can cause further delays of up to 30 seconds. (NFS
> *might* be a write-through cache, but I don't think so.)

So, with a 30-second sync update on the writer and a 30-second default cache check window on the reader, you get: 60 seconds (bingo).

> Anyway, hope this helps anyone who has noticed the same problem.
> Normally, it should not cause serious problems, especially if you keep
> the system times synchronized. If you aren't expecting it though, it
> can be quite frustrating.

Hmmm... yeah, I guess it can be a problem - especially if you're not expecting it. But caching is how NFS attacks the performance problem.
Cache consistency checking on a more frequent basis would load both the network and the server with checks that are, I believe, unnecessary in most circumstances; synchronized writers and readers are an infrequent case in most applications. Think about it: I typically work on my own files, in my own area, and when I share files for program development, I use SCCS to synchronize access and manage the software. Many files accessed through NFS are read-only (shared executables). Locking is available for the cases requiring synchronization and tight consistency.

Brian Pawlowski <beepy@sun.com> <sun!beepy>
Sun Microsystems, NFS Development
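[Editor's sketch: the locking option mentioned above, in miniature. A reader takes an advisory shared lock before reading; on an NFS mount this relies on both ends running the NFS lock daemon, and on the client implementation flushing its caches around lock operations -- behaviour reported in this thread, not something the NFS spec promises. The function name is made up; the locking calls are standard.]

```python
import fcntl
import os

def read_locked(path):
    """Read a whole file under an advisory shared lock.

    A writer that takes LOCK_EX around its updates, paired with
    readers using this routine, serializes access; clients that
    flush caches on lock/unlock then see consistent data.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH)   # blocks while a writer holds LOCK_EX
        data = os.read(fd, os.fstat(fd).st_size)
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
    return data
```

The locks are advisory: they only help if every cooperating process uses them, which is why the approach works for custom applications but not for off-the-shelf utilities.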
thurlow@convex.com (Robert Thurlow) (10/07/89)
mjb@acd4.UUCP ( Mike Bryan ) writes:
>In article <1967@convex.UUCP> thurlow@convex.com (Robert Thurlow) writes:
>>tr@bellcore.com (tom reingold) writes:
>.... If the times are skewed your data should be no more than
>60 seconds out of date (and this *is* what you reported seeing).

Hmmm, okay. I haven't seen this, but I've heard about stuff more on the order of ten minutes. (I'll have to play with that if I see it, because it would be pretty interesting! :-) Has anyone seen something this pathological?

Rob T
--
Rob Thurlow - Expatriate Canadian      thurlow@convex.com
"From the heart of 'The Friendship State'"
guy@auspex.auspex.com (Guy Harris) (10/08/89)
>In the NFSSRC 4.0 reference port
>
>    acregmin=n   Hold cached attributes for at least
>                 n seconds after file modification.
>    acregmax=n   Hold cached attributes for no more
>                 than n seconds after file modification.
>    ...

You forgot to mention that those are per-mount options from "/etc/fstab" in SunOS 4.x and NFSSRC 4.0, as opposed to per-system options to tweak by patching kernel variables.

>In an internal version of NFS, there is also an option to turn off caching
>on a mount point.

If you're referring to "noac", it's in SunOS 4.0.3, and even documented in MOUNT(8):

    noac   Suppress attribute caching.

Some code for it is in NFSSRC 4.0 as well.
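[Editor's sketch: for concreteness, an /etc/fstab fragment using these per-mount options on a SunOS 4.x client might look like the following. The server name, export paths, and mount points are made up; check the exact option syntax against MOUNT(8) on your own system.]

```
# shorter attribute timeouts: tighter consistency, more GETATTR traffic
server:/export/home   /home   nfs  rw,hard,acregmin=1,acregmax=5   0 0

# or no attribute caching at all (SunOS 4.0.3 "noac", per above)
server:/export/spool  /spool  nfs  rw,hard,noac                    0 0
```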
russ@Alliant.COM (Russell McFatter) (10/09/89)
I also experienced this problem in a different manifestation. While using a diskless workstation, I edited a file (on an NFS partition), and the program I was working on would see the old version of the file for several minutes. It turns out that this behavior was a bug in SunOS 4.0.1 (NFS writes are not properly flushed to disk on the server); it was fixed in SunOS 4.0.3. Check out the 4.0.3 release notes for all the details.

--- Russ McFatter
    russ@alliant.alliant.COM
mjb@acd4.UUCP ( Mike Bryan ) (10/10/89)
In article <125974@sun.Eng.Sun.COM> beepy%commuter@Sun.COM (Brian Pawlowski) writes:
>
> [Info about data inconsistencies on NFS clients caused by attribute
> caching deleted.]
>
>I'm curious what the application was that you had to abandon using NFS,
>and what the systems were.
> ...
>Not to be a pain, but it's not so much a "problem" as an implementation
>behaviour.

Hmmm, I hate to disagree with you, but it is a problem. Agreed, it is caused by implementation behaviour, but I think that behaviour is wrong in some cases. It works fine 99.9% of the time, but in some cases, full UNIX semantics are just plain necessary. I like hearing about the availability of turning off attribute caching at a mount point. Now I just have to wait 32.7 years for DEC to absorb that code into Ultrix. :-)

As for the exact reason we couldn't use NFS, I'll try to give a brief non-proprietary description. There are two processes, "A" and "B". Process A is receiving data and putting it into a file. On some types of data, it will send a message to process B, telling it there is data available. Process B then reads the data from the file. If everything is on a single machine, there is no problem. The problem arises when they are on different machines (process A on the NFS server, process B on the NFS client). Process B gets the message and goes to read the file. If the file had been read "recently", as is often the case, the client will not have the updated data.

Since NFS could not be relied on, our solution was to pass the necessary data with the message. Whether or not this is a preferred method is irrelevant. If NFS had supported full UNIX semantics, we could have expanded our system without a code change. As it was, we had to change our programs to handle the network case. As I said before, NFS locking was not available. Perhaps the "new" feature of turning off the attribute cache could be used to our benefit here... it sure would be nice.
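[Editor's sketch: the fix described above -- carrying the data in the notification message instead of through the shared file -- in miniature. A local socketpair() stands in for whatever IPC channel the real processes A and B used; that detail isn't in the post, and all names here are made up.]

```python
import socket

def _recv_exact(sock, n):
    """Receive exactly n bytes or raise."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed early")
        buf += chunk
    return buf

def send_with_data(payload):
    """Deliver the 'data ready' message *with* the data in it, so the
    reader never has to trust the NFS client's cached view of the file."""
    a, b = socket.socketpair()           # A holds a, B holds b
    try:
        # A: length-prefixed message carrying the data itself
        a.sendall(len(payload).to_bytes(4, "big") + payload)
        # B: read the length, then the data, straight off the channel
        n = int.from_bytes(_recv_exact(b, 4), "big")
        return _recv_exact(b, n)
    finally:
        a.close()
        b.close()
```

The cost is that the message channel now has to carry the payload, but the NFS consistency window drops out of the picture entirely.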
As a side note, we also investigated a product called FreedomNet, from RTI. It did support full UNIX semantics, including reading from character and block devices. It is a "stateful" architecture, as opposed to the "stateless" NFS. FreedomNet also supplied a lot of other neat features for distributed systems. Its drawbacks were a slight performance degradation, a few minor bugs in the version we had, and the price. We finally decided it was too expensive for our purposes, and we don't use it either. I personally feel it is a nice package, however, and if a site can afford it (a few thousand dollars), they should investigate it.

>Think about it: I typically work on my own files, in my
>own area, and when I share files for program development, I use SCCS
>to synchronize access and manage the software. Many files accessed
>through NFS are read-only (shared executables). Locking is available
>in the case requiring synchronization and tight consistency.

Well, I don't see how SCCS can help you get around the NFS problem. Also, locking is not always possible. For starters, Ultrix 2.3 did not support it. Not to mention the fact (but I will anyway) that you cannot always use locking, especially when using pre-supplied system utilities (such as formatting a file and then reading/printing the formatted file on another client). The ability to turn off caching at a mount point is a good idea, however, and might be just what we need.

On a more positive note, we *do* use NFS internally very heavily for our development systems. We have a network of five VAXen, each of which is an NFS server to the other four. Through a judicious sprinkling of symbolic links, a user can log into any of the machines and still see the same files. NFS works great for us in this respect. (Except for access to character/block devices... oh well :-( )

--
Mike Bryan, Applied Computing Devices, 100 N Campus Dr, Terre Haute IN 47802
Phone: 812/232-6051  FAX: 812/231-5280  Home: 812/232-0815
UUCP: uunet!acd4!mjb  ARPA: mjb%acd4@uunet.uu.net
"Did you make mankind after we made you?" --- XTC, "Dear God"
mogul@decwrl.dec.com (Jeffrey Mogul) (10/17/89)
In article <1989Oct9.191726.23428@acd4.UUCP> mjb@acd4.UUCP ( Mike Bryan ) writes:
>As for the exact reason we couldn't use NFS, I'll try to give a brief
>non-proprietary description. There are two processes, "A" and "B".
>Process A is receiving data, and putting it into a file. On some
>types of data, it will send a message to process B, telling it there
>is data available. Process B then reads the data from the file. If
>everything is on a single machine, there is no problem. The problem
>is when they are on different machines (process A on the NFS server,
>process B on the NFS client). Then process B gets the message, and
>goes to read the file. If it had been read "recently", as is often
>the case, the client will not have the updated data.

On fundamental philosophical grounds, I'll basically agree with you that NFS doesn't provide the appropriate consistency guarantees here; people interested in this might want to read the paper by V. Srinivasan and myself at the upcoming SOSP.

However, I think your problem has a simple, if not entirely efficient, solution. Since you seem to have control over the source of the program used by Process B, if you have it do an "fstat()" of the file after receiving the synchronization message and before calling "read()" on the file, this will force the NFS client code on machine B to check with the server. This is not, as far as I can tell, an official feature of the NFS specification, but is rather the way that NFS is actually implemented (at least in the earlier reference ports).

In your application, this should not be extremely inefficient: you will always be paying the latency of writing the data to the server disk once per "transaction", so the extra cost (latency) of the "getattr" RPC (the underlying implementation of "fstat()") should be nearly negligible. I don't think it's a good general solution, though, especially given that one doesn't always have source code for the application.
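[Editor's sketch: the fstat()-before-read() suggestion, as Process B might implement it. Python's os.fstat wraps the same stat call; whether it actually triggers an over-the-wire GETATTR and a cache flush is, as noted above, observed implementation behaviour, not an NFS spec guarantee. The function name is made up.]

```python
import os

def read_after_notify(fd, length):
    """Process B: call this when A's 'data ready' message arrives."""
    os.fstat(fd)                  # nudge the NFS client to revalidate its
                                  # attribute cache before we touch the data
    os.lseek(fd, 0, os.SEEK_SET)  # re-read from the start of the file
    return os.read(fd, length)
```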
-Jeff

P.S.: I'll make sure that someone in the Ultrix group at Digital knows that you want the "noac" option when mounting NFS files; no promises, of course (I'm in research).
cs@Eng.Sun.COM (Carl Smith) (10/18/89)
In article <205@jove.dec.com>, mogul@decwrl.dec.com (Jeffrey Mogul) writes:
...
> Since you seem to have control over the source of the program used
> by Process B, I think if you have it do an "fstat()" of the file
> after receiving the synchronization message and before calling "read()"
> on the file, this will force the NFS client code on machine B to
> check with the server. This is not, as far as I can tell, an official
> feature of the NFS specification, but is rather the way that NFS is
> actually implemented (at least in the earlier reference ports).

Oh, dear. I do hope no one will begin to rely on characteristics of some NFS implementations to guarantee the correct behavior of their applications. NFS runs on too many operating systems to make that a pleasant experience.

The ``noac'' mount option that Sun and others use is a bit heavy-handed for this application. It seems to me all that's needed is a cache-flush fcntl.

Carl
mogul@decwrl.dec.com (Jeffrey Mogul) (10/18/89)
In article <126470@sun.Eng.Sun.COM> cs@Eng.Sun.COM (Carl Smith) writes:
[copy of my suggestion to use fstat() to synch the reader-side cache]
>
> Oh, dear. I do hope no one will begin to rely on characteristics
>of some NFS implementations to guarantee the correct behavior of their
>applications. NFS runs on too many operating systems to make that a
>pleasant experience.

Actually, I tried to convey that point: the NFS spec certainly doesn't require that this works, and I take no responsibility for it working. On the other hand, all the suggestions about mucking with timers have the same feature; the NFS spec doesn't even bound the possible lifetime of the attribute-cache timers! In general, the consistency properties of NFS are "specified" by the behaviour of the reference port, not by any of the language in the spec.

> The ``noac'' mount option that Sun and others use is a bit heavy-
>handed for this application. It seems to me all that's needed is a cache
>flush fcntl.

Just more patches on top of kludges. There's no intrinsic reason why a network file system can't guarantee consistency and provide reasonable performance at the same time; it's just that NFS has gone too far in the "stateless" direction to avoid compromising one for the other.

-Jeff