lamy@ai.utoronto.ca (Jean-Francois Lamy) (01/04/89)
The problem of totally unrelated processes hanging in disk wait when a server about which they could not care less goes down is a well-known NFS curse.

Under 3.x one should make sure that no directory ever contains mount points from two different machines. Under 4.0, getwd sometimes feels the urge to scan /etc/mtab and perform lstats on whatever mount points are listed there. If the remote server is down, presto, one hung process.

So under 4.0 one should consider using symbolic links to refer to the actual mount points. Consider the following line from /etc/fstab:

    neat.ai:/ai/cdr /ai/cdr nfs rw,hard,intr 0 0

If /ai/cdr on the local machine is a symlink to /nfs/neat.ai/ai/cdr, mount will follow the symlink when mounting the file system, but getwd won't try to follow the link and so won't access the dead machine. Tada!

Things like "df" still hang under this setup; no hope for a quick fix there.

[[ This also happens under 3.x, due to getwd, but not because it scans mtab. Basically it gets stuck while stat-ing files in a directory where NFS mount points occur, such as the root ('/'). For a complete description of the problem where non-responsive NFS partitions hang getwd calls, see my article in v6n143 (and watch out for the typo). I *think* that one can prevent an NFS server from hanging a client if the client mounts the partition soft,noquota. --wnl ]]

[enter soapbox mode]
Sometimes one wonders if people at Sun ever mount file systems from many servers or if their servers ever drop dead... The automounter is of little use if the remote server dies while the partition is mounted. The only thing the automounter does for this problem is shrink the window during which a file system is mounted. We have seen automounter daemons get stuck on their own and have stopped using the automounter.

Jean-Francois Lamy               lamy@ai.utoronto.ca, uunet!ai.utoronto.ca!lamy
AI Group, Department of Computer Science, University of Toronto, Canada M5S 1A4
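
The symlink arrangement described in the post can be tried out roughly as follows. This is only a sketch: the PREFIX variable is a hypothetical scratch directory added here so the layout can be built without root and without a real NFS server (on an actual client the paths would sit directly under "/"), and nothing is mounted. It only shows that the user-visible path is a link whose target is the per-server mount point.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch prefix (an assumption, not from the post); on a
# real client these paths would be /nfs/neat.ai/ai/cdr and /ai/cdr.
PREFIX="${PREFIX:-$(mktemp -d)}"

# The real mount point lives off by itself, one directory per server.
mkdir -p "$PREFIX/nfs/neat.ai/ai/cdr"
mkdir -p "$PREFIX/ai"

# The path users see is only a symlink to the real mount point.
ln -s "$PREFIX/nfs/neat.ai/ai/cdr" "$PREFIX/ai/cdr"

# Inspect the link itself (lstat-style), without resolving its target.
ls -ld "$PREFIX/ai/cdr"
readlink "$PREFIX/ai/cdr"
```

With this layout the fstab entry quoted in the post can stay as written, since mount is said to follow the link to the real directory.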