rayan@cs.toronto.edu (Rayan Zachariassen) (09/10/89)
I have an application which sits as a daemon with a bunch of files open all the time, to avoid reopening them 3 times a second when things get busy. One of the files it keeps open is the active file for news on an NFS partition.

Once in a blue moon, non-repeatable but occurring in clusters, an lseek() followed by a read() (actually rewind() and fgets()) on the open file descriptor (pointer) will fail with a stale NFS file handle error. This happens inside stdio, but this is the error reported by trace on the active daemon.

To summarize: NFS client (SunOS 4.0.3) has daemon keeping file open for read on NFS server (SunOS 3.5), said file is frequently modified locally on the server, perhaps even recreated (i.e. unlinked and new independent file by same name created) by the server.

Question:
Why is the stale NFS handle error returned? Any way to compensate within the daemon apart from doing a close()/open() sequence? Any way to tell that a read() will return ESTALE without actually doing the read()?

Hypothesis:
NFS doesn't deal as expected ("unix semantics") with disappearing file links.

Any info would be appreciated.

rayan
chuq@Apple.COM (Chuq Von Rospach) (09/10/89)
>Question:
>Why is the stale NFS handle error returned? Any way to compensate within
>the daemon apart from doing a close()/open() sequence? Any way to tell
>that a read() will return ESTALE without actually doing the read() ?

A stale file handle is reported when the fhandle to the file in question becomes out of date. Every inode on an NFS server has a generation count that is modified whenever that inode is re-used (or whenever fsirand is run, like when you run newfs).

What's probably happening to you, since this is a news partition, is that expire is being run. As part of expire, a new active file is created, then there's a renaming of active to oactive and nactive to active. Your daemon now has an fhandle pointing to what is now oactive.

Since NFS is stateless, the fhandle has a generation count to avoid the following scenario:

    o client opens server:/tmp/foo
    o server deletes /tmp/foo
    o server creates new file /tmp/financial_data (which happens to
      re-use the same inode)
    o client attempts to use the fhandle given it by /tmp/foo.

Without the generation count, since the fhandle would point to an inode with a real file attached, the access would succeed, even though the file you opened has nothing to do with the data you are now playing with.

Generally, you'll see stale fhandles under two circumstances:

    o the filesystem you have mounted gets restored from backup and you
      haven't umounted it.
    o the file you have open has been deleted and re-created.

The stale message is a way of telling you that something has happened to the file since you opened it that makes the fhandle you have invalid, so you don't continue using an fhandle that may well be pointing to incorrect data.

--
Chuq Von Rospach <+> Editor,OtherRealms <+> Member SFWA/ASFA
chuq@apple.com <+> CI$: 73317,635 <+> AppleLink: CHUQ
[This is myself speaking. No company can control my thoughts.]

Perhaps I should say Dr. *Von* Rospach, Dr. Rospach? (Gasp)
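
The /tmp/foo scenario above can be sketched as a toy model in C. This is only an illustration of the generation-count idea, not actual NFS server code: the names toy_inode, toy_fhandle, toy_create, toy_remove and toy_access are hypothetical, the inode table is a fixed in-memory array, and the fallback ESTALE value is an assumption for platforms whose errno.h does not define it.

```c
#include <assert.h>
#include <errno.h>

#ifndef ESTALE
#define ESTALE 116  /* assumed fallback, for illustration only */
#endif

#define NINODES 8   /* size of the toy inode table (hypothetical) */

/* Toy server-side inode: an in-use flag plus a generation count. */
struct toy_inode {
    int in_use;
    unsigned gen;
};

/* Toy file handle: what a client holds on to after opening a file. */
struct toy_fhandle {
    unsigned ino;   /* which inode the handle refers to */
    unsigned gen;   /* generation of that inode when the handle was issued */
};

static struct toy_inode inode_table[NINODES];

/* Create a file on inode `ino`. Each (re)use of the inode bumps its
 * generation, so handles issued for earlier files no longer match. */
struct toy_fhandle toy_create(unsigned ino)
{
    struct toy_fhandle fh;

    assert(ino < NINODES);
    inode_table[ino].in_use = 1;
    inode_table[ino].gen++;
    fh.ino = ino;
    fh.gen = inode_table[ino].gen;
    return fh;
}

/* Delete the file living on inode `ino`; the inode becomes free. */
void toy_remove(unsigned ino)
{
    assert(ino < NINODES);
    inode_table[ino].in_use = 0;
}

/* Return 0 if the handle still names the file it was issued for,
 * or ESTALE if the inode was freed or re-used since then. */
int toy_access(struct toy_fhandle fh)
{
    assert(fh.ino < NINODES);
    if (!inode_table[fh.ino].in_use || inode_table[fh.ino].gen != fh.gen)
        return ESTALE;
    return 0;
}
```

In this model, a handle obtained from toy_create(3) is accepted by toy_access() until toy_remove(3) is called. If a second file is then created on inode 3, the new handle is accepted, while the old one still gets ESTALE instead of silently reading the new file's data.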