[comp.unix.wizards] NFS and EWOULDBLOCK

kyle@xanth.UUCP (06/29/87)

Currently if a user cd's to a directory on an NFS filesystem that is mounted
on a machine which is down, the user and his terminal are hung.  The shell
cannot be killed because it is in 'disk' wait and thus sleeping at too
high a priority to receive signals.  Thus we have a process that is blocked
potentially for hours in what is supposed to be a short term wait state.

Why can't the relevant system calls that can be blocked by an overloaded or
dead NFS server return -1, with errno == EWOULDBLOCK?

kyle jones   <kyle@xanth.cs.odu.edu>   old dominion university, norfolk, va

mb@ttidca.TTI.COM (Michael Bloom) (07/01/87)

In article <1446@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:

>Why can't the relevant system calls that can be blocked by an overloaded or
>dead NFS server return -1, with errno == EWOULDBLOCK?
>
>kyle jones   <kyle@xanth.cs.odu.edu>   old dominion university, norfolk, va

We do something like this at TTI. In the retry loop in rfscall (right
after the clget call), we add a test for the server being down.
If it is, we do the following:

		  {		/* MB, ER - Citicorp/TTI  */
		    status = rpcerr.re_status = RPC_SYSTEMERROR;
		    rpcerr.re_errno = EHOSTDOWN;
		    goto splitfast;
		  }

The splitfast label is added just before the mi_printed = 0 assignment
and the clfree call near the bottom of the routine.
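
To make the control flow concrete, here is a minimal user-space sketch of
that pattern. It is not the kernel source: rfscall_sketch, server_is_down,
try_call, the sk_ status names and the two sketch_ variables are all
hypothetical stand-ins, and the real rfscall does considerably more. It
only shows a retry loop that bails out to a cleanup label with EHOSTDOWN
instead of retrying against a server known to be down.

```c
#include <errno.h>

#ifndef EHOSTDOWN
#define EHOSTDOWN 64            /* fallback for systems lacking it (assumption) */
#endif

/* Hypothetical stand-ins for the RPC status codes used by the real code. */
enum sk_status { SK_RPC_OK = 0, SK_RPC_TIMEDOUT, SK_RPC_SYSTEMERROR };

struct sk_rpc_err {
    enum sk_status re_status;
    int re_errno;
};

/* Test knobs (hypothetical): whether the server is marked down, and how
 * many attempts time out before a reply arrives. */
static int sketch_server_down;
static int sketch_attempts_until_reply;

static int server_is_down(void)
{
    return sketch_server_down;
}

/* Simulates one RPC attempt: times out until the counter reaches zero. */
static enum sk_status try_call(void)
{
    if (sketch_attempts_until_reply > 0) {
        sketch_attempts_until_reply--;
        return SK_RPC_TIMEDOUT;
    }
    return SK_RPC_OK;
}

/* Returns 0 on success, otherwise an errno value. */
int rfscall_sketch(int max_retries)
{
    struct sk_rpc_err rpcerr = { SK_RPC_TIMEDOUT, 0 };
    enum sk_status status = SK_RPC_TIMEDOUT;
    int tries;

    for (tries = 0; tries < max_retries; tries++) {
        if (server_is_down()) {
            /* The added test: give up at once rather than retry. */
            status = rpcerr.re_status = SK_RPC_SYSTEMERROR;
            rpcerr.re_errno = EHOSTDOWN;
            goto splitfast;
        }
        status = rpcerr.re_status = try_call();
        if (status == SK_RPC_OK)
            break;
    }

splitfast:
    /* Cleanup shared by every exit path would go here (in the real
     * routine, resetting mi_printed and releasing the client handle). */
    if (status == SK_RPC_OK)
        return 0;
    return rpcerr.re_errno ? rpcerr.re_errno : ETIMEDOUT;
}
```

The point of jumping to a label rather than returning directly is that
the early exit still runs the same cleanup as the normal path.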

The way we test for a server being down is very site dependent,
allowing for a maximum of 4 class C networks. It uses a table, indexed
by network and host, that a daemon we call nfsmonitor modifies by
writing kmem.
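
For anyone wanting to try something similar, a table of that shape might
look roughly like the sketch below. This is an assumption about the
layout, not our actual code: every sk_ name is made up, addresses are
taken as 32-bit values in host byte order, and an ordinary array stands
in for the kernel table that the daemon would patch through kmem.

```c
#include <stdint.h>

#define SK_MAXNETS        4     /* at most 4 class C networks */
#define SK_HOSTS_PER_NET  256   /* one slot per host number on a class C net */

static uint32_t sk_nets[SK_MAXNETS];    /* known network numbers (addr >> 8) */
static int sk_nnets;                    /* how many entries of sk_nets are used */
static unsigned char sk_down[SK_MAXNETS][SK_HOSTS_PER_NET]; /* 1 = host down */

/* Returns the table row for the address's network, or -1 if unknown. */
static int sk_net_index(uint32_t addr)
{
    uint32_t net = addr >> 8;   /* class C: top 24 bits name the network */
    int i;

    for (i = 0; i < sk_nnets; i++)
        if (sk_nets[i] == net)
            return i;
    return -1;
}

/* Registers the network containing addr; returns its row, or -1 if full. */
int sk_add_net(uint32_t addr)
{
    int i = sk_net_index(addr);

    if (i >= 0)
        return i;
    if (sk_nnets >= SK_MAXNETS)
        return -1;
    sk_nets[sk_nnets] = addr >> 8;
    return sk_nnets++;
}

/* What the monitoring daemon would do: record a host as down or up.
 * Returns 0 on success, -1 if the host's network is not in the table. */
int sk_mark(uint32_t addr, int down)
{
    int i = sk_net_index(addr);

    if (i < 0)
        return -1;
    sk_down[i][addr & 0xff] = down ? 1 : 0;
    return 0;
}

/* What the retry loop would ask. Hosts on unknown networks count as up,
 * so they fall back to the ordinary retry behaviour. */
int sk_server_is_down(uint32_t addr)
{
    int i = sk_net_index(addr);

    return i >= 0 && sk_down[i][addr & 0xff];
}
```

A fixed 4 x 256 byte table keeps the lookup cheap enough to sit inside
the retry loop, at the price of the hard limit on networks.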

- Michael Bloom
  Citicorp/TTI (213 450-9111x3097)