dan@watson.bbn.com (Dan Franklin) (04/04/89)
We were having severe NFS problems involving our (only) Sun 4/110, running SunOS 4.0.1. The symptom was that a process on the Sun-4 attempting to copy (say, with cp) a "large" file (greater than 2k bytes or so) between the Sun-4 and any of several other machines, including a Sun-3/160 (SunOS 3.4), a MicroVAX (Ultrix 2.3), and our diskless Sun-3/50 machines (SunOS 3.4), would almost always hang. Generally we got an accompanying "NFS server <hostname> not responding still trying" message. The cp usually didn't return at all, until a long time after it was interrupted.

While a cp was hung, all of the machines involved in the cp operation continued to respond to other commands, including other NFS commands. However, on the initiating machine, you couldn't access the directory containing the file being cp'd. You could still look at that file on the serving machine, as well as on other machines besides the Sun-4 that had that file mounted. Other network services, including FTP and rlogin, worked perfectly.

These symptoms seemed to be very different from those discussed in other Sun-4 hanging situations. No nfsd ever ended up in a permanent "D" wait state on any of the machines, including the Sun-4. Unrelated NFS activities on the two machines in question worked perfectly.

The answer turns out to be: buffer size! The Sun-4's big buffers just couldn't be received by the Sun-3, MicroVAX, or anything else on our network. When we remounted those other machines' filesystems on our Sun-4 with "-o wsize=512", the problem went away.

This suggestion came from someone here at BBN (Matt Landau), who had had a similar problem a long time ago between Sun-3 and Sun-2 machines; at that time the Sun Hotline suggested this workaround. So history repeats itself.

Dan Franklin
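
For anyone wanting to try the same workaround, the remount might look roughly like the sketch below. The server name and paths are hypothetical placeholders (the post does not give them); only the "-o wsize=512" option comes from the post. Exact mount syntax varies between systems, so check mount(8) locally. The script only prints the commands rather than running them, since a real remount needs root and real hosts.

```shell
# Hypothetical placeholders, not taken from the post:
SERVER=sun3                 # NFS server whose filesystem is mounted on the Sun-4
REMOTE=/usr/share           # exported directory on that server
MOUNTPOINT=/mnt/share       # where the Sun-4 mounts it

# Write size in bytes; 512 is the value reported to work in the post.
WSIZE=512

# Build the unmount/mount pair and print it (dry run) instead of executing it.
UMOUNT_CMD="umount $MOUNTPOINT"
MOUNT_CMD="mount -o wsize=$WSIZE $SERVER:$REMOTE $MOUNTPOINT"

echo "$UMOUNT_CMD"
echo "$MOUNT_CMD"
```

To make the setting survive a reboot, the same wsize option would presumably go in the options field of the corresponding /etc/fstab entry.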