[comp.sys.sun] Sun-4 severe NFS hanging problem--and its solution

dan@watson.bbn.com (Dan Franklin) (04/04/89)
We were having severe NFS problems involving our (only) Sun 4/110, running
SunOS 4.0.1. The symptom was that a process on the Sun-4 attempting to
copy (say, with cp) a "large" file (greater than 2k bytes or so) between
the Sun-4 and any of several others, including a Sun-3/160 (SunOS 3.4), a
MicroVAX (Ultrix 2.3), and our diskless Sun-3/50 machines (SunOS 3.4),
would almost always hang.  Generally we got an accompanying "NFS server
<hostname> not responding still trying" message.  The cp usually didn't
return at all; even when interrupted, it took a long time to come back.

While a cp was hung, all of the machines involved in the cp operation
continued to respond to other commands, including other NFS commands.
However, on the initiating machine, you couldn't access the directory
containing the file being cp'd.  But you could look at that file on the
serving machine, as well as on other machines besides the Sun-4 that had
that file mounted.  Other network services, including FTP and rlogin,
worked perfectly.  These symptoms seemed to be very different from those
discussed in other Sun-4 hanging situations.  No nfsd ever ended up in a
permanent "D" wait state on any of the machines, including the Sun-4.
Unrelated NFS activities on the two machines in question worked perfectly.

The answer turns out to be--buffer size!  The Sun-4's big buffers just
couldn't be received by the Sun-3, MicroVAX, or anything else on our
network.  But when we remounted those other machines' filesystems on our
Sun-4 with "-o wsize=512", the problem went away.  This suggestion came
from someone here at BBN (Matt Landau), who had had a similar problem a
long time ago between Sun-3 and Sun-2 machines; at that time the Sun
Hotline suggested this workaround.  So history repeats itself.
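
The post does not say why the big buffers could not be received.  One
plausible reading, offered here only as an assumption, is that each NFS
write travels as a single UDP datagram, and a large one is split into
several back-to-back IP fragments that a slower receiver may not keep up
with.  The sketch below just does the arithmetic for that reading.  The
8192-byte write size, the 150-byte RPC/NFS header overhead, and the
function name are illustrative assumptions, not figures from the post.

```python
import math

ETHERNET_MTU = 1500   # bytes; standard Ethernet payload size
IP_HEADER = 20        # bytes; IPv4 header without options
UDP_HEADER = 8        # bytes
RPC_OVERHEAD = 150    # bytes; ASSUMED rough size of the RPC + NFS write headers


def fragments_per_write(wsize, rpc_overhead=RPC_OVERHEAD):
    """Estimate how many IP fragments one NFS write of `wsize` bytes needs."""
    # Everything the IP layer has to carry for one write request.
    ip_payload = UDP_HEADER + rpc_overhead + wsize
    # Data per fragment must be a multiple of 8 bytes (except the last one).
    per_fragment = (ETHERNET_MTU - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


# 8192 is an ASSUMED large write size; 512 is the value used in the post.
for wsize in (8192, 2048, 512):
    print(wsize, fragments_per_write(wsize))
```

Under these assumptions an 8192-byte write needs 6 fragments, a
2048-byte write needs 2, and a 512-byte write fits in a single packet,
which would be consistent with "-o wsize=512" making the hangs go away.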

	Dan Franklin