[comp.sys.sun] PC-NFS 2.0 vs. SunOS 4.0.1?

jdp@vine.vine.com (John D. Polstra) (04/26/89)

I apologize in advance for the length of this posting.  This is a rather
complicated question.

I have a Sun 3/180 which is used primarily as a stand-alone system, but
which also is a file server for a Compaq Deskpro 386/20 running PC-NFS
version 2.0 under MS-DOS 3.31.  The combination was working fine until I
upgraded to SunOS 4.0.1 from 3.5.  Now I am seeing intermittent problems
which really have me baffled.

The basic symptom I am seeing is that when the PC tries to create a file
on an NFS-mounted file system, the create occasionally fails.  The error
message (from DOS) refers to permission problems.  For example, the PC
configuration which I always use is like this:

	net name means *
	net use d: \\polstra\usr\means

and the root of the mounted directory tree on the server looks like this:

	drwxr-sr-x 19 means    means         512 Apr  5 20:47 /usr/means

("polstra" is the server, "means" is the PC, and "means" is also the
username used by the PC.)  Now, if I say "copy c:\autoexec.bat d:" on the
PC, sometimes the command fails and prints an error message about
permission denied.  If I repeat the command many times in a row, it
succeeds some of the time and fails the rest.  It doesn't make any
difference whether "d:\autoexec.bat" already exists on the server before
the command is run.

The failure happens most frequently right after I reboot the PC.  The
failures then become rarer and rarer, until after a few minutes I can't
make it fail at all.

I've done a lot of looking at the messages coming across the Ethernet,
using "etherfind".  Here is what's really strange:  the "RFS_CREATE" calls
which succeed are virtually identical, byte-for-byte, with those that
fail.  The only differences are in what seem to be sequence numbers and
checksums.  Here is the output of "etherfind -v -r -x -u -t -i ie0
-between means polstra" for a create which failed:

 0.06 UDP from means.2049 to polstra.2049  160 bytes
 RPC Call nfs  RFS_CREATE  V2
 08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00  .. .o..`.E)D..E.
 00 b4 96 00 00 00 0f 11 05 22 c0 09 c8 02 c0 09  ........."......
 c8 01 08 01 08 01 00 a0 00 00 00 00 00 96 00 00  ................
 00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00  ................
 00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00  ......... ......
 00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00  ..means......u..
 00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00  .u.......u......
 00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05  ..............H.
 5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b  \.r.........[b.k
 00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62  ......autoexec.b
 61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00  at.......u...u..
 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
 ff ff 00 00 00 00 00 00 00 00 00 00              ............

(The characters at the right-hand side of the hex dump were put there by
a little Icon program that I wrote.)  Now here is the output for the
same command when it succeeded:

 5.08 UDP from means.2049 to polstra.2049  160 bytes
 RPC Call nfs  RFS_CREATE  V2
 08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00  .. .o..`.E)D..E.
 00 b4 99 00 00 00 0f 11 02 22 c0 09 c8 02 c0 09  ........."......
 c8 01 08 01 08 01 00 a0 00 00 00 00 00 99 00 00  ................
 00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00  ................
 00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00  ......... ......
 00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00  ..means......u..
 00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00  .u.......u......
 00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05  ..............H.
 5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b  \.r.........[b.k
 00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62  ......autoexec.b
 61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00  at.......u...u..
 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
 ff ff 00 00 00 00 00 00 00 00 00 00              ............

Only 3 bytes of the hex differ between the two cases; they are in the
second and third lines of hex.  In two places, a "96" changes to "99"; but
this is surely a sequence number, as it increments on every packet.  In
another place, an "05" changes to "02".  But this byte decrements on every
packet, so perhaps it is a checksum or another sequence number.  At any
rate, the actual RPC part of the packet (beginning at byte 42, counting
from 0) looks perfectly fine (to me) in each case.
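
The comparison above can be reproduced mechanically.  The following Python
sketch is not part of the original post.  It diffs the two dumps and,
assuming the frames are plain Ethernet II carrying an IPv4 header without
options followed by UDP (so that the RPC message starts at byte 42, as
noted above), maps each differing offset to a header field.  Under that
assumption the "96"/"99" bytes fall in the IP identification field and in
the RPC transaction ID, and the "05"/"02" byte is the high byte of the IP
header checksum, which drops by 3 because the identification field rose
by 3.  The names parse, ip_checksum, field_at and FIELDS are made up for
this illustration.

```python
import struct

# Hex columns copied from the two etherfind dumps (ASCII column dropped).
FAILED = """
08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00
00 b4 96 00 00 00 0f 11 05 22 c0 09 c8 02 c0 09
c8 01 08 01 08 01 00 a0 00 00 00 00 00 96 00 00
00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00
00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00
00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00
00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00
00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05
5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b
00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62
61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00
00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff 00 00 00 00 00 00 00 00 00 00
"""
SUCCEEDED = """
08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00
00 b4 99 00 00 00 0f 11 02 22 c0 09 c8 02 c0 09
c8 01 08 01 08 01 00 a0 00 00 00 00 00 99 00 00
00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00
00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00
00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00
00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00
00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05
5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b
00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62
61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00
00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff 00 00 00 00 00 00 00 00 00 00
"""

# Assumed layout: Ethernet II (bytes 0-13), IPv4 without options (14-33),
# UDP (34-41), RPC message from byte 42 onward.
FIELDS = [
    (18, 20, "IP identification"),
    (24, 26, "IP header checksum"),
    (42, 46, "RPC transaction ID"),
]

def parse(dump):
    """Turn whitespace-separated hex pairs into a bytes object."""
    return bytes.fromhex("".join(dump.split()))

def field_at(offset):
    """Name the header field covering a byte offset, if it is in FIELDS."""
    for lo, hi, name in FIELDS:
        if lo <= offset < hi:
            return name
    return "unknown"

def ip_checksum(header):
    """Internet checksum of a 20-byte IP header, checksum field taken as 0."""
    words = struct.unpack(">10H", header[:10] + b"\x00\x00" + header[12:20])
    total = sum(words)
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

failed, succeeded = parse(FAILED), parse(SUCCEEDED)

# Offsets at which the two frames differ, with the field each one falls in.
diffs = [i for i, (a, b) in enumerate(zip(failed, succeeded)) if a != b]
for i in diffs:
    print(i, "%02x -> %02x" % (failed[i], succeeded[i]), field_at(i))

# Recompute each IP header checksum and compare with the stored value.
for pkt in (failed, succeeded):
    stored = struct.unpack(">H", pkt[24:26])[0]
    print("stored %04x, computed %04x" % (stored, ip_checksum(pkt[14:34])))

# First six words of the RPC call: xid, message type, RPC version,
# program, program version, procedure.
print(struct.unpack(">6I", failed[42:66]))
```

If the layout assumption holds, the only differences are bookkeeping
fields that change on every packet, which agrees with the observation
that the RPC part of the two requests is identical.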

Since the packets seem to be reaching the server without any problems, I
thought that perhaps the responses to the PC were getting garbled, causing
the PC to think that an error had occurred when in reality the create had
worked.  This is not the case, however; when the create fails, the file is
really not created.

I wrote a little program on the PC (using uSoft C 5.0) which simply does a
creat("d:\autoexec.bat", 0666) inside a loop.  It *never* fails. (!)
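
The test program itself was written in C and is not shown.  For readers
who want to try the same experiment, here is a rough Python equivalent of
such a loop.  It is only a sketch: the function name create_loop, the
attempt count and the failure reporting are assumptions, not a
reconstruction of the original program.

```python
import os

def create_loop(path, attempts=100, mode=0o666):
    """Repeatedly create (or truncate) path; return how many attempts failed."""
    failures = 0
    for _ in range(attempts):
        try:
            # creat(path, mode) behaves like open() with these three flags.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
        except OSError as err:
            failures += 1
            print("create failed:", err)
        else:
            os.close(fd)
    return failures
```

On the PC this would be called as create_loop(r"d:\autoexec.bat"), using
a raw string so that the backslash is taken literally.  The same care is
needed in C source, where "\a" inside a string literal is the alert
character and the path has to be written "d:\\autoexec.bat".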

I tried running the Compaq at a slower speed.  That doesn't make any
difference.

I'm not running "secure NFS", and "rpc.mountd" is running with the "-n"
flag, as it apparently should.  I've tried changing the number of "nfsd"
daemons down to 1, and that doesn't make any difference.  I'm not running
the Yellow Pages.  No symbolic links are involved.

There aren't any other machines on the Ethernet, so I can't find out
whether this is a generic NFS problem or whether it's just something to do
with PC-NFS.

If anybody has a clue about this, I'd sure like to hear about it.  My news
feed is not reliable at the moment; please send me Email and I will
summarize any information that you send.

Thanks in advance,
John Polstra			(206) 932-6482
jdp@polstra.uucp	or	...!uunet!practic!polstra!jdp

[[ I really think that messages like this are better off being sent to the
nfs list, where the primary topic of discussion is PC-NFS.  The list
address is "nfs@tmc.edu" and requests to be added should be sent to
"nfs-request@tmc.edu".  --wnl ]]