kyle@xanth.UUCP (06/29/87)
Currently, if a user cd's to a directory on an NFS filesystem that is mounted from a machine that is down, the user and his terminal are hung. The shell cannot be killed because it is in 'disk' wait and thus sleeping at too high a priority to receive signals. Thus we have a process that is blocked, potentially for hours, in what is supposed to be a short-term wait state.

Why can't the relevant system calls that can be blocked by an overloaded or dead NFS server return -1, with errno == EWOULDBLOCK?

kyle jones <kyle@xanth.cs.odu.edu> old dominion university, norfolk, va
mb@ttidca.TTI.COM (Michael Bloom) (07/01/87)
In article <1446@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
>Why can't the relevant system calls that can be blocked by an overloaded or
>dead NFS server return -1, with errno == EWOULDBLOCK?
>
>kyle jones <kyle@xanth.cs.odu.edu> old dominion university, norfolk, va

We do something like this at TTI. In the retry loop in rfscall (right after the clget call), we add a test for a server being down. If it is, we do the following:

	{	/* MB, ER - Citicorp/TTI */
		status = rpcerr.re_status = RPC_SYSTEMERROR;
		rpcerr.re_errno = EHOSTDOWN;
		goto splitfast;
	}

The splitfast label is added just before the mi_printed = 0 and clfree call near the bottom of the routine.

The way we test for a server being down is very site dependent, allowing for a maximum of 4 class C networks. It uses a table, indexed by network and host, that a daemon we call nfsmonitor modifies by writing kmem.

- Michael Bloom
  Citicorp/TTI (213 450-9111x3097)
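
For readers trying to picture the "table, indexed by network and host" described above, here is a minimal user-space sketch in C. It is not the TTI code: the names server_down, known_nets, add_network, mark_server and server_is_down are all hypothetical, addresses are assumed to be IPv4 in host byte order, and mark_server only stands in for whatever the nfsmonitor daemon does when it writes kmem. The only details taken from the post are the limit of 4 class C networks and the network/host indexing.

```c
#include <stdint.h>

#define MAX_NETS      4    /* from the post: at most 4 class C networks */
#define HOSTS_PER_NET 256  /* one slot per final octet of a class C address */

/* Hypothetical table: nonzero means the server is believed to be down. */
unsigned char server_down[MAX_NETS][HOSTS_PER_NET];

/* Hypothetical list of known class C prefixes (top 24 bits of the address). */
uint32_t known_nets[MAX_NETS];
int num_known_nets = 0;

/* Return the table row for addr's network, or -1 if the network is unknown. */
int net_index(uint32_t addr)
{
    uint32_t prefix = addr & 0xFFFFFF00u;
    int i;

    for (i = 0; i < num_known_nets; i++)
        if (known_nets[i] == prefix)
            return i;
    return -1;
}

/* Register addr's network; return its row, or -1 if all rows are in use. */
int add_network(uint32_t addr)
{
    int i = net_index(addr);

    if (i >= 0)
        return i;
    if (num_known_nets >= MAX_NETS)
        return -1;
    known_nets[num_known_nets] = addr & 0xFFFFFF00u;
    return num_known_nets++;
}

/* Stand-in for the daemon's update: record whether a server is down. */
void mark_server(uint32_t addr, int down)
{
    int i = net_index(addr);

    if (i >= 0)
        server_down[i][addr & 0xFFu] = (unsigned char)(down != 0);
}

/* The test a retry loop could make; unknown networks are treated as up. */
int server_is_down(uint32_t addr)
{
    int i = net_index(addr);

    return i >= 0 && server_down[i][addr & 0xFFu] != 0;
}
```

In this sketch a lookup is a short scan of at most four prefixes plus one array index, which is cheap enough to run on every pass through a retry loop. Treating unknown networks as "up" is a design choice made here for illustration, so that a server outside the table falls back to the normal blocking behaviour.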