(Doug Alan) (06/17/88)
I just installed a patch to NFS that allows you to mount the entire
filesystem of a remote computer, rather than having to mount all of
its individual disk partitions. The patch came from someone at BRL,
but the file I have does not say who -- it only says that his first
name is Doug. Unfortunately, I have already noticed a bug or two.
The date of the patch I have is 26 Jan 1987.
The most prominent bug is as follows: Let's say the NFS server is
called "server" and you are using a client machine. 'server' has
several disk partitions: /a, /b, and /c. On the client machine, you
have mounted server:/ on /@/server. You now cd to /@/server/c/foodir,
and do a 'pwd'. 'pwd' should tell you that you are in
/@/server/c/foodir. But instead of doing that, it says that you are
in /foodir. If you cd to /@/server/c/foodir/subdir, then pwd says
that you are in /foodir/subdir. In contrast, if you cd to
/@/server/etc, pwd tells you that you are in /@/server/etc -- which,
indeed, you are.
I have also just noticed another problem since installing this patch.
I cannot say whether or not this bug has always been there, or whether
it appeared upon installing this patch. This problem is intermittent
and I cannot reproduce it on demand. I was looking at a text file
that was on the remote machine. Unfortunately, there appeared to be a
bunch of nulls on the end of the file that weren't really there. On
this particular file, the problem was reproducible for a while, but
eventually it stopped happening.
So, does anyone have a fix for the first problem mentioned above, or
know someone who does? And does anyone know whether or not the second
problem is caused by the patch and how I should fix it?
The computers involved are VAXstation II's running 4.3BSD+NFS (from U.
of Wisc.).
|>oug /\lan
"Once more at dawn I drive
the weary cattle of my soul to the mud hole of your eyes"
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/18/88)
In article <9514@eddie.MIT.EDU> (Doug Alan) writes:
>The patch came from someone at BRL, but the file I have, does not say who --
>it only says that his first name is Doug.

Probably Doug Kingston, who no longer works for BRL, although mail to
DPK@BRL.MIL might still reach him.
mouse@mcgill-vision.UUCP (der Mouse) (06/26/88)
In article <9514@eddie.MIT.EDU>, (Doug Alan) writes:
> I just installed a patch to NFS that allows you to mount the entire
> filesystem of a remote computer, rather than having to mount all of
> its individual disk partitions.

> The most prominent bug is as follows: Let's say the NFS server is
> called "server" and you are using a client machine. [Server has
> local disk partition /c. server:/ is mounted on /@/server. We cd to
> /@/server/c/foodir, pwd says /foodir instead.]

When I was writing my NFS server, I ran into similar problems. This
sounds very much as though the inumbers in the returned structures are
the real disk inumbers - which is wrong. This leads to the client
seeing the rather unpleasant situation of two distinct files having
the same (dev,inum) pair.

My solution was to stripe the space of available inumbers, based on
the number of local disk partitions on the server. However, given
that you're using a patch to an existing NFS implementation, you don't
have the freedom to do this. I think you're pretty much out of luck,
unless you want to dive rather deeply into the NFS implementation on
the server.

Why does this cause the above anomaly with pwd? Because getwd (in
pwd) reads .. and finds the inumber for .; this gives it the foodir
part. Then it looks at ../.. and notices it has the same inumber as
.., because they're both roots of filesystems on the server (server's
/ and /c are both filesystem roots on the server, so they're both
inode 2 - the device number gets lost by the time they reach the
client). Normally, the only time foo/ and foo/.. have the same
(dev,inum) is when foo is /, so getwd assumes it's reached /.

Try mounting the disk on the server somewhere else, instead of /c.
Make it somewhere at least two levels down in the hierarchy:
/etc/bardir, say. Then try cd /@/server/etc/bardir/foodir and see
what pwd has to say.
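
The getwd walk described above can be sketched as a small simulation.
This is a hypothetical model, not the 4.3BSD source: the Dir class,
the "nfs" and "local" device labels, and the inode numbers 5, 6, 7 and
11 are made up for illustration; only inode 2 for filesystem roots
comes from the explanation above. Entries are matched by comparing
(dev, ino) as the client sees them, which simplifies what a real
getwd does, so treat the results as what this model predicts rather
than what a real client must print.

```python
class Dir:
    """A directory as the client sees it: a name plus a (dev, ino) pair."""

    def __init__(self, name, dev, ino, parent=None):
        self.name, self.dev, self.ino = name, dev, ino
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def dotdot(self):
        # '..' of the root directory is the root itself.
        return self.parent if self.parent is not None else self


def build_client_view(partition_path):
    """Build the client's tree with server:/ mounted on /@/server.

    partition_path says where the server mounts its second partition,
    e.g. ["c"] or ["etc", "bardir"]. Every directory under the NFS mount
    gets dev "nfs": the server's real device numbers are lost.
    """
    root = Dir("", "local", 2)
    at = Dir("@", "local", 11, root)          # inode 11 is arbitrary
    server = Dir("server", "nfs", 2, at)      # server's / is inode 2
    Dir("etc", "nfs", 7, server)              # inode 7 is arbitrary
    cur = server
    for name in partition_path[:-1]:
        cur = cur.children[name]
    # Root of the second partition: also inode 2 on its own disk.
    part_root = Dir(partition_path[-1], "nfs", 2, cur)
    foodir = Dir("foodir", "nfs", 5, part_root)
    Dir("subdir", "nfs", 6, foodir)
    return root


def lookup(root, path):
    cur = root
    for name in path.strip("/").split("/"):
        cur = cur.children[name]
    return cur


def simulated_getwd(node):
    """Climb through '..' until a directory equals its own parent."""
    parts = []
    cur = node
    while True:
        parent = cur.dotdot()
        if (parent.dev, parent.ino) == (cur.dev, cur.ino):
            break                              # looks like "/", so stop
        # Find our own name among the parent's entries.
        name = next(n for n, c in parent.children.items()
                    if (c.dev, c.ino) == (cur.dev, cur.ino))
        parts.append(name)
        cur = parent
    return "/" + "/".join(reversed(parts))


if __name__ == "__main__":
    view = build_client_view(["c"])
    print(simulated_getwd(lookup(view, "/@/server/c/foodir")))
    print(simulated_getwd(lookup(view, "/@/server/etc")))
    view = build_client_view(["etc", "bardir"])
    print(simulated_getwd(lookup(view, "/@/server/etc/bardir/foodir")))
```

In this model the first call stops early at /c, because /c and the
server's / both look like (nfs, 2), and so it yields /foodir.
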
This time, you see, getwd() will not see two consecutive directories
with the same inumber as it winds its way up through .., ../..,
../../.., etc.

By now some of you may be wondering how come files with the same
inumber don't get confused with one another. This is because files
are accessed by file handles, not inumber. And the file handles,
presumably, are different. (If they weren't, such files *would* get
confused.)

> I have also just noticed another problem since installing this patch.
> I cannot say whether or not this bug has always been there, or
> whether it appeared upon installing this patch. This problem is
> intermittent and I can not reproduce it on demand. I was looking at
> a text file that was on the remote machine. Unfortunately, there
> appeared to be a bunch of nulls on the end of the file that weren't
> really there. On this particular file, the problem was reproducible
> for a while, but eventually it stopped happening.

The problem persisted for as long as the block was present in the
client's buffer cache, I feel sure. The question then is "how did it
get there?". Are the filesystem mounts hard or soft (i.e., do timeouts
cause the operation to retry indefinitely or to fail)? With soft
mounts, the client implementation may wind up producing a bufferful of
nulls when it shouldn't. If this is what's happening, it's a bug.

I could also see this being due to a race condition: the client tries
to read the last block of the file, based on its idea of the size of
the file. However, in between its getting the size of the file and
its attempt to read the last block, someone else (another client, or a
process on the server) truncates the file to a shorter size. The
result may well be a bufferful of nulls.

der Mouse
uucp: mouse@mcgill-vision.uucp
arpa: