rmilner@zia.aoc.nrao.edu (Ruth Milner) (08/08/90)
Most of you are probably familiar with the old problem of innocent bystanders getting hung when an NFS server goes down, even if they don't refer to it in any way. The problem is due to stat(2) checking the entries in /, and the workaround is to have the actual mount points for the NFS filesystems in a subdirectory a couple of levels down rather than directly in /.

There is a twist to this, however, that just bit me. If you have symbolic links in / which point in any semi-direct fashion to the NFS mount point, then the directory entry in / for the symbolic link must come *after* the directory entries for any local filesystems being referenced.

For example, in /etc/fstab I have

    cholla:/mnt /nfs/cholla/mnt nfs rw,bg,intr,noquota 0 0

and in /

    lrwxrwxrwx 1 root 11 Aug 7 11:23 /cholla -> /nfs/cholla/

so that users can refer to /cholla/mnt without worrying about the extra directory.

Even though this symbolic link doesn't point to the mount point itself, everyone was still getting hung whenever they tried to do such things as pwd, cd, read mail, etc. The reason was that the slot in the directory file "/" occupied by the symbolic link "cholla" was ahead of the slot occupied by the entry for the local filesystem containing user files (called "/u").

Note that a normal "ls" will only show you the alphabetical order. Using the "-f" option, however, you can see the real order within /, which is the order in which stat(2) searches.

In order to shuffle these around, you must:

  1. rm the offending symbolic link;
  2. create enough junk files (touch(1) suffices for this) to occupy any free slots ahead of all your local filesystems;
  3. recreate the link;
  4. get rid of your junk files.

From then on, anyone who really isn't referring to the dead NFS server in any way (i.e. no PATH entries, etc.) will be able to work as usual.
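
The raw-order check described above ("ls -f") can also be sketched in Python. This is only a minimal illustration under stated assumptions: the function names raw_entry_order and symlinks_before are hypothetical, and the sketch assumes that os.scandir() hands back entries in the same unsorted order that "ls -f" shows, since both read the directory without sorting. Unlike "ls -f", os.scandir() omits "." and "..".

```python
import os


def raw_entry_order(directory="/"):
    """Return (name, is_symlink) pairs in the order the OS yields them.

    Nothing is sorted here, unlike a plain "ls". is_symlink() inspects
    the link itself and does not follow it to its target.
    """
    with os.scandir(directory) as entries:
        return [(entry.name, entry.is_symlink()) for entry in entries]


def symlinks_before(entries, local_name):
    """List the symlinks that appear ahead of local_name.

    entries    -- (name, is_symlink) pairs in raw directory order
    local_name -- entry for a local filesystem, e.g. "u" for /u

    Any name returned here sits in an earlier slot than local_name and
    is a candidate for the reshuffle described in the post.
    """
    early = []
    for name, is_link in entries:
        if name == local_name:
            return early
        if is_link:
            early.append(name)
    raise ValueError(f"{local_name!r} not found in directory listing")
```

With the layout from the post, symlinks_before(raw_entry_order("/"), "u") would be expected to include "cholla" before the reshuffle and not after it. Whether a given symlink actually leads to an NFS mount still has to be judged by hand; the sketch only reports ordering.
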
If anyone out there has attempted the normal workaround without success, you might want to check whether you are running into this charming little quirk.

BTW, this isn't fixed in 4.1. One of our users on a diskless client running 4.1 has just reported exactly the same problem, even though the dead NFS server is not the one he's dependent on, and not one he's trying to access.

Ruth Milner
Systems Manager
NRAO/VLA
Socorro NM
rmilner@zia.aoc.nrao.edu