[comp.protocols.nfs] NFS client has out-of-date files

tr@bellcore.com (tom reingold) (10/05/89)

I manage a network of 100 Suns and two Pyramids.  Something strange
happened today, and it has happened before, too.

A user was working on two Suns simultaneously.  He was editing one file
on one Sun, and reading the file using LaTeX on the other.  The file
was NFS mounted on both Suns.  It physically resided on the NFS server,
a Pyramid.  One of the Suns had an out-of-date copy!  It was a minute
old.

Unmounting the filesystem on the Sun client and remounting it fixed the
problem.

This is disturbing.  I would like to know what causes this.  If I have
to live with it, I would like to know what workarounds exist other than
remounting.  Of course, users cannot do this, and I am not always
around to do it for them.

Thank you.

Tom Reingold                   |INTERNET:       tr@bellcore.com
Bellcore                       |UUCP:           bellcore!tr
444 Hoes La room 1H217         |PHONE:          (201) 699-7058 [work],
Piscataway, NJ 08854-4182      |                (201) 287-2345 [home]

thurlow@convex.com (Robert Thurlow) (10/06/89)

tr@bellcore.com (tom reingold) writes:
>A user was working on two Suns simultaneously.  He was editing one file
>on one Sun, and reading the file using LaTeX on the other.  The file
>was NFS mounted on both Suns.  It physically resided on the NFS server,
>a Pyramid.  One of the Suns had an out-of-date copy!  It was a minute
>old.

>Unmounting the filesystem on the Sun client and remounting it fixed the
>problem.

We've had this; what seems to be a common cause for it is that the
time is not synchronized between the updating client and the server,
so the file attributes don't get through the server and show up on
disk until something changes ("gee, 'now' plus one minute - I think
I'll hang onto this request for awhile").  Moving the file to a new
name and back seemed to be another work-around here.  Are you running
the time daemon, or did you check for a system time difference?  This
may not be the only cause of the problem.

>This is disturbing.  I would like to know what causes this.  If I have
>to live with it, I would like to know what workarounds exist other than
>remounting.  Of course, users cannot do this, and I am not always
>around to do it for them.

And please post!  I want to know the answer to this, too.

Rob T
--
Rob Thurlow - Expatriate Canadian                      thurlow@convex.com
"From the heart of 'The Friendship State'"

mjb@acd4.UUCP ( Mike Bryan ) (10/07/89)

In article <1967@convex.UUCP> thurlow@convex.com (Robert Thurlow) writes:
>tr@bellcore.com (tom reingold) writes:
>>A user was working on two Suns simultaneously.  He was editing one file
>>on one Sun, and reading the file using LaTeX on the other.  The file
>>was NFS mounted on both Suns.  It physically resided on the NFS server,
>>a Pyramid.  One of the Suns had an out-of-date copy!  It was a minute
>>old.
>
>We've had this; what seems to be a common cause for it is that the
>time is not synchronized between the updating client and the server,
>so the file attributes don't get through the server and show up on
>disk until something changes.

Well, here goes an attempt to describe what's happening.  We had this
problem with our systems, and because of it have had to abandon using
NFS for our customer systems (at least for now).

An NFS client maintains a cache of accessed files.  This cache
includes file attributes (such as modification time and
ownership/protection).  If it has a file's data locally, and the
attributes were "recently" read, it will not try to access the server.
It's the definition of "recent" that causes the problems.

The client will periodically re-read the file attributes from the
server.  If it determines that the file has been modified, it will
decide the local data is invalid, and request the file data from the
server.  The problem you are seeing is that the client can take too
long to realize the file has changed.

(The following might be a bit off technically, it's been almost a year
since I investigated all of this... If so, I apologize.  However, I'm
certain any errors are minimal, and it should get the gist across.)

Normally, the file attributes are checked every 3 seconds.  However,
if the system times are skewed, it can take longer.  (I don't remember
exactly which times are being compared, but it has something to do
with the last time the attributes were read and the times within
those attributes.)

In Ultrix 2.3, at least, these "re-check" times are controlled by the
following four kernel variables:

	Name			Value (in seconds)
	------------------	------------------
	nfsac_regtimeo_min	 3
	nfsac_regtimeo_max	60
	nfsac_dirtimeo_min	30
	nfsac_dirtimeo_max	60

The "*_min" values determine how often it decides to try to look at
the file attributes.  These values don't hold if the time is skewed,
however.  The "*_max" values determine how often they are re-read NO
MATTER WHAT.  If the times are skewed your data should be no more than
60 seconds out of date (and this *is* what you reported seeing).  The
above values are in two sets:  "*dir*" applies to directory files, and
"*reg*" applies to regular files.  I don't know if these same names
are used in other O/S's, but I'd bet Sun is at least close, since
Ultrix changed very little of Sun NFS for 2.2/2.3.

What does all this mean?  Well, you can try changing these kernel
values.  We did, and saw the data-skew problem lessen as expected.
However, you pay a performance penalty, since requests are more likely
to access the server rather than use the cache.  Also, even at "0",
there is up to a one second delay, since the code apparently waits
until the time difference is strictly greater than the given value.
(Without source, I can't say for sure, however.)  Admittedly, I did
not try a "-1", but that might cause problems, especially if they are
unsigned variables.  (Hmm, infinite time/data skew.  How lovely!)

Also, you can supposedly remove all data skew by using the NFS lock
daemons and applying a lock to the file in question.  Since we were
running 2.3 Ultrix at the time, and it did not have NFS locking, I
haven't verified this, nor do I know the details.  Maybe I'll check it
out again since we are gearing up for Ultrix 3.0/3.1 support now.

Note: All of the above deals with the case of keeping data synched
between a client and its server.  If you have multiple clients, and
one client is reading what another client is writing, you have an
additional delay added by the time for the data to propagate from the
writing client to the server.  This is controlled by the sync/update
procedure, and can cause further delays of up to 30 seconds.  (NFS
*might* be a write-through cache, but I don't think so.)  We at least
had the writes occurring on the server, but we were unable to use NFS
for this particular application even then, as we had to have the same
synchronous read/write semantics as for local files.  *Sigh*.

Anyway, hope this helps anyone who has noticed the same problem.
Normally, it should not cause serious problems, especially if you keep
the system times synchronized.  If you aren't expecting it though, it
can be quite frustrating.

-- 
Mike Bryan, Applied Computing Devices, 100 N Campus Dr, Terre Haute IN 47802
Phone: 812/232-6051  FAX: 812/231-5280  Home: 812/232-0815
UUCP: uunet!acd4!mjb  ARPA: mjb%acd4@uunet.uu.net
"Did you make mankind after we made you?" --- XTC, "Dear God"

beepy%commuter@Sun.COM (Brian Pawlowski) (10/07/89)

> >>A user was working on two Suns simultaneously.  He was editing one file
> >>on one Sun, and reading the file using LaTeX on the other.  The file
> >>was NFS mounted on both Suns.  It physically resided on the NFS server,
> >>a Pyramid.  One of the Suns had an out-of-date copy!  It was a minute
> >>old.
> >
> >We've had this; what seems to be a common cause for it is that the
> >time is not synchronized between the updating client and the server,
> >so the file attributes don't get through the server and show up on
> >disk until something changes.
> 
> Well, here goes an attempt to describe what's happening.  We had this
> problem with our systems, and because of it have had to abandon using
> NFS for our customer systems (at least for now).

I'm curious what the application was that you had to abandon using NFS,
and what the systems were. Your explanation was cogent, I'll babble on
along the same lines.

Not to be a pain, but it's not so much a "problem" as an implementation
behaviour.  The caching with consistency checks (typically on the order
of every 30 seconds) introduces a window where a client's modifications
to a file are not noticed by other clients until the next cache
consistency check (which consists of an NFS GETATTR call to inspect
the modified time of the file to see if the cached file data is still
valid).  Unfortunately these consistency checks are not documented very
well, and possibly differ from implementation to implementation.

>   The problem you are seeing is that the client can take too
> long to realize the file has changed.

There is a tradeoff here between performance and consistency.  Actually,
cache check intervals in our current NFS implementation differ for
directories and plain files.  Directory timeouts are 30 - 60 seconds;
file timeouts are 3 - 60 seconds.  In the NFSSRC 4.0 reference port:

               acregmin=n    Hold cached attributes for at least
                             n seconds after file modification.
               acregmax=n    Hold cached attributes for no more
                             than n seconds after file modification.
               acdirmin=n    Hold cached attributes for at least
                             n seconds after directory update.
               acdirmax=n    Hold cached attributes for no more
                             than n seconds after directory update.
               actimeo=n     Set min and max times for regular
                             files and directories to n seconds.

               Regular defaults are:
                    fg,retry=10000,timeo=7,retrans=3,port=NFS_PORT,hard,\
                    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

Values supplied at mount time are bounded in the kernel by the ranges
above.  The reasoning behind allowing the file timeout to be set lower
than 30 seconds is to permit stronger (more frequent) consistency
checks.  In an internal version of NFS, there is also an option to turn
off caching on a mount point.

Caching is important to performance. NFS introduces a window of inconsistency
defined by the timeout period. Other distributed filesystems (Sprite, Andrew)
use call backs to ensure data consistency.

Another option one might consider to ensure consistency of single writer
multiple readers (or multiple writers multiple readers) is to use locking.
If you have a custom app. you might try it. This would force exact consistency
and proper serialization of readers and writers.

> This is controlled by the sync/update
> procedure, and can cause further delays of up to 30 seconds.  (NFS
> *might* be a write-through cache, but I don't think so.)

So, with a 30 second sync update on the writer, and a 30 second default
cache check window on the reader, you get: 60 seconds (bingo).

> 
> Anyway, hope this helps anyone who has noticed the same problem.
> Normally, it should not cause serious problems, especially if you keep
> the system times synchronized.  If you aren't expecting it though, it
> can be quite frustrating.
> 

Hmmmm... yeah I guess it can be a problem - especially if you're not expecting
it. But it is a performance attack for NFS. Cache consistency checking
on a more frequent basis would load both the network and the server
with unnecessary checks for what I believe for most circumstances;
that is synchronized writers and readers is an infrequent case in most
applications. Think about it: I typically work on my own files, in my
own area, and when I share files for program development, I use SCCS
to synchronize access and manage the software. Many files accessed
through NFS are read-only (shared executables). Locking is available
in the case requiring synchronization and tight consistency.

Brian Pawlowski

			Brian Pawlowski <beepy@sun.com> <sun!beepy>
			Sun Microsystems, NFS Development

thurlow@convex.com (Robert Thurlow) (10/07/89)

mjb@acd4.UUCP ( Mike Bryan          ) writes:

>In article <1967@convex.UUCP> thurlow@convex.com (Robert Thurlow) writes:
>>tr@bellcore.com (tom reingold) writes:
>....  If the times are skewed your data should be no more than
>60 seconds out of date (and this *is* what you reported seeing).

Hmmm, okay.  I haven't seen this, but I've heard about stuff more
on the order of ten minutes.  (I'll have to play with that if I
see it, because it would be pretty interesting! :-)  Has anyone
seen something this pathological?

Rob T
--
Rob Thurlow - Expatriate Canadian                      thurlow@convex.com
"From the heart of 'The Friendship State'"

guy@auspex.auspex.com (Guy Harris) (10/08/89)

 >In the NFSSRC 4.0 reference port
 >
 >               acregmin=n    Hold cached attributes for at  least
 >                             n seconds after file modification.
 >               acregmax=n    Hold cached attributes for  no  more
 >                             than  n seconds after file modifica-
 >                             tion.

...

You forgot to mention that those are per-mount options from "/etc/fstab"
in SunOS 4.x and NFSSRC4.0, as opposed to per-system options to tweak
by patching kernel variables.
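For the record, such a per-mount entry in "/etc/fstab" might look like the following (server name and paths invented for illustration):

```
pyramid:/usr/src  /usr/src  nfs  rw,hard,acregmin=1,acregmax=15  0  0
```

Tightening acregmin/acregmax like this trades extra GETATTR traffic for a smaller inconsistency window on that one mount, without touching any other filesystem.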

 >In an internal version of NFS, there is also an option to turn off caching
 >on a mount point.

If you're referring to "noac", it's in SunOS 4.0.3, and even documented
in MOUNT(8):

               noac          Suppress attribute caching.

Some code for it is in NFSSRC4.0 as well.

russ@Alliant.COM (Russell McFatter) (10/09/89)

I also experienced this problem in a different manifestation...  While
using a diskless workstation, I edited a file (on a NFS partition), and
the program I was working on would see the old version of the file for
several minutes.  Turns out that this behavior was a bug in SunOS 4.0.1
(NFS writes are not properly flushed to disk on the server) and it was
fixed in SunOS 4.0.3.  Check out the 4.0.3 release notes for all the
details.

--- Russ McFatter
    russ@alliant.alliant.COM

mjb@acd4.UUCP ( Mike Bryan ) (10/10/89)

In article <125974@sun.Eng.Sun.COM> beepy%commuter@Sun.COM (Brian Pawlowski) writes:
>
> [Info about data inconsistencies on NFS clients caused by attribute
>  caching deleted.]
>
>I'm curious what the application was that you had to abandon using NFS,
>and what the systems were.
>
> ...
>
>Not to be a pain, but it's not so much a "problem" as an implementation
>behaviour.

Hmmm, hate to disagree with you, but it is a problem.  Agreed, it is
caused by implementation behaviour, but I think that that behaviour is
wrong in some cases.  It works fine 99.9% of the time, but in some
cases, full UNIX semantics are just plain necessary.  I like hearing
about the availability of turning off attribute cacheing at a mount
point.  Now I just have to wait 32.7 years for DEC to absorb that code
into Ultrix.  :-)

As for the exact reason we couldn't use NFS, I'll try to give a brief
non-proprietary description.  There are two processes, "A" and "B".
Process A is receiving data, and putting it into a file.  On some
types of data, it will send a message to process B, telling it there
is data available.  Process B then reads the data from the file.  If
everything is on a single machine, there is no problem.  The problem
is when they are on different machines (process A on the NFS server,
process B on the NFS client).  Then process B gets the message, and
goes to read the file.  If it had been read "recently", as is often
the case, the client will not have the updated data.

Since NFS could not be relied on, our solution was to pass the
necessary data with the message.  Whether or not this is a preferred
method is irrelevant.  If NFS had supported full UNIX semantics, we
could have expanded our system without a code change.  As it was, we
had to change our programs to handle the network case.  As I said
before, NFS locking was not available.  Perhaps the "new" feature of
turning off the attribute cache could be used to our benefit here...
it sure would be nice.

As a side note, we also investigated a product called FreedomNet, from
RTI.  It did support full UNIX semantics, including reading from
character and block devices.  It is a "stateful" architecture, as
opposed to the "stateless" NFS.  FreedomNet also supplied a lot of
other neat features for distributed systems.  Its drawbacks were a
slight performance degradation, a few minor bugs in the version we
had, and the price.  We finally decided it was too expensive for our
purposes, and we don't use it either.  I personally feel it is a nice
package however, and if a site can afford it (a few thousand dollars),
they should investigate it.

>Think about it: I typically work on my own files, in my
>own area, and when I share files for program development, I use SCCS
>to synchronize access and manage the software. Many files accessed
>through NFS are read-only (shared executables). Locking is available
>in the case requiring synchronization and tight consistency.

Well, I don't see how SCCS can help you get around the NFS problem.
Also, locking is not always possible.  For starters, Ultrix 2.3 did
not support it.  Not to mention the fact (but I will anyway) that you
cannot always use locking, especially when using pre-supplied system
utilities (such as formatting a file and then reading/printing the
formatted file on another client).  The ability to turn off cacheing
at a mount point is a good idea, however, and might be just what we
need.

On a more positive note, we *do* use NFS internally very heavily for
our development systems.  We have a network of five VAXen, each of
which is an NFS server to the other four.  Through a judicious
sprinkling of symbolic links, a user can log into any of the machines
and still see the same files.  NFS works great for us in this respect.
(Except for access to character/block devices... oh well :-( )

-- 
Mike Bryan, Applied Computing Devices, 100 N Campus Dr, Terre Haute IN 47802
Phone: 812/232-6051  FAX: 812/231-5280  Home: 812/232-0815
UUCP: uunet!acd4!mjb  ARPA: mjb%acd4@uunet.uu.net
"Did you make mankind after we made you?" --- XTC, "Dear God"

mogul@decwrl.dec.com (Jeffrey Mogul) (10/17/89)

In article <1989Oct9.191726.23428@acd4.UUCP> mjb@acd4.UUCP ( Mike Bryan          ) writes:
>As for the exact reason we couldn't use NFS, I'll try to give a brief
>non-proprietary description.  There are two processes, "A" and "B".
>Process A is receiving data, and putting it into a file.  On some
>types of data, it will send a message to process B, telling it there
>is data available.  Process B then reads the data from the file.  If
>everything is on a single machine, there is no problem.  The problem
>is when they are on different machines (process A on the NFS server,
>process B on the NFS client).  Then process B gets the message, and
>goes to read the file.  If it had been read "recently", as is often
>the case, the client will not have the updated data.

On fundamental philosophical grounds, I'll basically agree with
you that NFS doesn't provide the appropriate consistency guarantees
here ... people interested in this might want to read the paper
by V. Srinivasan and myself at the upcoming SOSP.  However, I think
your problem has a simple, if not entirely efficient, solution.

Since you seem to have control over the source of the program used
by Process B, I think if you have it do an "fstat()" of the file
after receiving the synchronization message and before calling "read()"
on the file, this will force the NFS client code on machine B to
check with the server.  This is not, as far as I can tell, an official
feature of the NFS specification, but is rather the way that NFS is
actually implemented (at least in the earlier reference ports).

In your application, this should not be extremely inefficient, since
you will always be paying the latency of writing the data to the server
disk once per "transaction", and so the extra cost (latency) of doing
the "getattr()" RPC (the underlying implementation of "fstat()") should
be nearly negligible.  I don't think it's a good general solution,
especially given that one doesn't always have source code for the
application.

-Jeff
P.S.: I'll make sure that someone in the Ultrix group at Digital
knows that you want the "noac" option when mounting NFS files; no
promises, of course (I'm in research).

cs@Eng.Sun.COM (Carl Smith) (10/18/89)

In article <205@jove.dec.com>, mogul@decwrl.dec.com (Jeffrey Mogul) writes:
...
> Since you seem to have control over the source of the program used
> by Process B, I think if you have it do an "fstat()" of the file
> after receiving the synchronization message and before calling "read()"
> on the file, this will force the NFS client code on machine B to
> check with the server.  This is not, as far as I can tell, an official
> feature of the NFS specification, but is rather the way that NFS is
> actually implemented (at least in the earlier reference ports).

	Oh, dear.  I do hope no one will begin to rely on characteristics
of some NFS implementations to guarantee the correct behavior of their
applications.  NFS runs on too many operating systems to make that a
pleasant experience.
	The ``noac'' mount option that Sun and others use is a bit heavy-
handed for this application.  It seems to me all that's needed is a cache
flush fcntl.

			Carl

mogul@decwrl.dec.com (Jeffrey Mogul) (10/18/89)

In article <126470@sun.Eng.Sun.COM> cs@Eng.Sun.COM (Carl Smith) writes:
  [copy of my suggestion to use fstat() to synch the reader-side cache]
>
>	Oh, dear.  I do hope no one will begin to rely on characteristics
>of some NFS implementations to guarantee the correct behavior of their
>applications.  NFS runs on too many operating systems to make that a
>pleasant experience.

Actually, I tried to convey that point ... the NFS spec certainly doesn't
require that this works, and I take no responsibility for it working.
On the other hand, all the suggestions about mucking with timers have
the same feature ... the NFS spec doesn't even bound the possible lifetime
of the attributes cache timers!  In general, the consistency properties
of NFS are "specified" by the behaviour of the reference port, not by
any of the language in the spec.

>	The ``noac'' mount option that Sun and others use is a bit heavy-
>handed for this application.  It seems to me all that's needed is a cache
>flush fcntl.

Just more patches on top of kludges.  There's no intrinsic reason why
a network file system can't guarantee consistency and provide reasonable
performance at the same time; it's just that NFS has gone too far in
the "stateless" direction to avoid compromising one for the other.

-Jeff