jdp@vine.vine.com (John D. Polstra) (04/26/89)
I apologize in advance for the length of this posting.  This is a
rather complicated question.

I have a Sun 3/180 which is used primarily as a stand-alone system,
but which is also a file server for a Compaq Deskpro 386/20 running
PC-NFS version 2.0 under MS-DOS 3.31.  The combination was working
fine until I upgraded to SunOS 4.0.1 from 3.5.  Now I am seeing
intermittent problems which really have me baffled.

The basic symptom is that when the PC tries to create a file on an
NFS-mounted file system, the create occasionally fails.  The error
message (from DOS) refers to permission problems.  For example, the PC
configuration which I always use is like this:

    net name means *
    net use d: \\polstra\usr\means

and the root of the mounted directory tree on the server looks like
this:

    drwxr-sr-x 19 means    means         512 Apr  5 20:47 /usr/means

("polstra" is the server, "means" is the PC, and "means" is also the
username used by the PC.)

Now, if I say "copy c:\autoexec.bat d:" on the PC, sometimes the
command fails and prints an error message about permission denied.  If
I repeat the command many times in a row, it will succeed some of the
time and fail other times.  It doesn't make any difference whether
"d:\autoexec.bat" already exists on the server before the command is
run.  The failure happens most frequently right after I reboot the PC.
Then failures become more and more rare until, after a few minutes, I
can't make it fail any more at all.

I've done a lot of looking at the messages coming across the Ethernet,
using "etherfind".  Here is what's really strange: the "RFS_CREATE"
calls which succeed are virtually identical, byte-for-byte, with those
that fail.  The only differences are in what seem to be sequence
numbers and checksums.  Here is the output of

    etherfind -v -r -x -u -t -i ie0 -between means polstra

for a create which failed:

     0.06 UDP from means.2049 to polstra.2049 160 bytes
     RPC Call nfs RFS_CREATE V2
     08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00   .. .o..`.E)D..E.
     00 b4 96 00 00 00 0f 11 05 22 c0 09 c8 02 c0 09   ........."......
     c8 01 08 01 08 01 00 a0 00 00 00 00 00 96 00 00   ................
     00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00   ................
     00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00   ......... ......
     00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00   ..means......u..
     00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00   .u.......u......
     00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05   ..............H.
     5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b   \.r.........[b.k
     00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62   ......autoexec.b
     61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00   at.......u...u..
     00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff   ................
     ff ff 00 00 00 00 00 00 00 00 00 00               ............

(The characters at the right-hand side of the hex dump were put there
by a little Icon program that I wrote.)

Now here is the output for the same command when it succeeded:

     5.08 UDP from means.2049 to polstra.2049 160 bytes
     RPC Call nfs RFS_CREATE V2
     08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00   .. .o..`.E)D..E.
     00 b4 99 00 00 00 0f 11 02 22 c0 09 c8 02 c0 09   ........."......
     c8 01 08 01 08 01 00 a0 00 00 00 00 00 99 00 00   ................
     00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00   ................
     00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00   ......... ......
     00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00   ..means......u..
     00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00   .u.......u......
     00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05   ..............H.
     5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b   \.r.........[b.k
     00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62   ......autoexec.b
     61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00   at.......u...u..
     00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff   ................
     ff ff 00 00 00 00 00 00 00 00 00 00               ............

Only 3 bytes of the hex differ between the two cases; they are in the
second and third lines of hex.
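
The comparison of the two dumps can also be done mechanically.  The
following is a minimal Python sketch, not part of the original
investigation: it parses the hex columns of both dumps and lists every
byte position where they differ.  The field labels are an assumption
based on the usual framing (a 14-byte Ethernet header, a 20-byte IPv4
header without options, and an 8-byte UDP header ahead of the RPC
call); the names parse_hex, byte_diffs and field_name are hypothetical
helpers chosen for this illustration.

```python
# Hex columns of the two frames, copied from the etherfind dumps in the post.
FAILED_HEX = """
08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00
00 b4 96 00 00 00 0f 11 05 22 c0 09 c8 02 c0 09
c8 01 08 01 08 01 00 a0 00 00 00 00 00 96 00 00
00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00
00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00
00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00
00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00
00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05
5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b
00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62
61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00
00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff 00 00 00 00 00 00 00 00 00 00
"""

SUCCEEDED_HEX = """
08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00
00 b4 99 00 00 00 0f 11 02 22 c0 09 c8 02 c0 09
c8 01 08 01 08 01 00 a0 00 00 00 00 00 99 00 00
00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00
00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00
00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00
00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00
00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05
5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b
00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62
61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00
00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff 00 00 00 00 00 00 00 00 00 00
"""

# ASSUMED framing (not stated in the post): Ethernet (14 bytes),
# IPv4 without options (20 bytes), UDP (8 bytes), then the RPC call.
FIELDS = [
    (18, 20, "IP identification"),
    (24, 26, "IP header checksum"),
    (42, 46, "RPC transaction ID (XID)"),
]


def parse_hex(dump):
    """Turn whitespace-separated hex pairs into a bytes object."""
    return bytes.fromhex("".join(dump.split()))


def byte_diffs(a, b):
    """Return (offset, byte_in_a, byte_in_b) for every differing position."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def field_name(offset):
    """Label an offset using the assumed header layout in FIELDS."""
    for start, end, name in FIELDS:
        if start <= offset < end:
            return name
    return "unlabelled"


FAILED = parse_hex(FAILED_HEX)
SUCCEEDED = parse_hex(SUCCEEDED_HEX)

if __name__ == "__main__":
    for offset, old, new in byte_diffs(FAILED, SUCCEEDED):
        print(f"byte {offset}: {old:02x} -> {new:02x}  ({field_name(offset)})")
```

Under that assumed framing, the differing bytes fall inside per-packet
header fields rather than inside the arguments of the create call.
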
In two places, a "96" changes to "99"; but this is surely a sequence
number, as it increments on every packet.  In another place, an "05"
changes to "02".  But this byte decrements on every packet, so perhaps
it is a checksum or another sequence number.  At any rate, the actual
RPC part of the packet (beginning at byte 42, counting from 0) looks
perfectly fine (to me) in each case.

Since the packets seem to be reaching the server without any problems,
I thought that perhaps the responses to the PC were getting garbled,
causing the PC to think that an error had occurred when in reality the
create had worked.  This is not the case, however; when the create
fails, the file is really not created.

I wrote a little program on the PC (using uSoft C 5.0) which simply
does a creat("d:\autoexec.bat", 0666) inside a loop.  It *never*
fails.  (!)

I tried running the Compaq at a slower speed.  That doesn't make any
difference.

I'm not running "secure NFS", and "rpc.mountd" is running with the
"-n" flag, as it apparently should.  I've tried changing the number of
"nfsd" daemons down to 1, and that doesn't make any difference.  I'm
not running the Yellow Pages.  No symbolic links are involved.

There aren't any other machines on the Ethernet, so I can't find out
whether this is a generic NFS problem or whether it's just something
to do with PC-NFS.

If anybody has a clue about this, I'd sure like to hear about it.  My
news feed is not reliable at the moment; please send me Email and I
will summarize any information that you send.

Thanks in advance,

John Polstra                (206) 932-6482
jdp@polstra.uucp        or        ...!uunet!practic!polstra!jdp

[[ I really think that messages like this are better off being sent to
the nfs list, where the primary topic of discussion is PC-NFS.  The
list address is "nfs@tmc.edu" and requests to be added should be sent
to "nfs-request@tmc.edu".  --wnl ]]
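
The remark in the post that the RPC part of the packet, beginning at
byte 42, looks fine can be cross-checked by decoding it.  The
following is a minimal Python sketch under stated assumptions: it
treats the bytes as an ONC RPC version 2 call with AUTH_UNIX
credentials, followed by NFS version 2 create arguments (a 32-byte
file handle, a file name, and the mode word of the attributes).  The
function decode_create_call and the dictionary keys are hypothetical
names chosen for this illustration; the frame is the failing one from
the post.

```python
import struct

# The failing frame from the post (hex column of the etherfind dump).
FRAME = bytes.fromhex("".join("""
08 00 20 01 6f 02 02 60 8c 45 29 44 08 00 45 00
00 b4 96 00 00 00 0f 11 05 22 c0 09 c8 02 c0 09
c8 01 08 01 08 01 00 a0 00 00 00 00 00 96 00 00
00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00
00 09 00 00 00 01 00 00 00 20 00 00 00 00 00 00
00 05 6d 65 61 6e 73 00 00 00 00 00 00 75 00 00
00 75 00 00 00 01 00 00 00 75 00 00 00 00 00 00
00 00 00 00 03 06 00 00 00 01 00 08 00 01 48 05
5c 18 72 c6 00 00 00 08 00 00 00 02 5b 62 f6 6b
00 00 00 00 00 0c 61 75 74 6f 65 78 65 63 2e 62
61 74 00 00 81 f8 00 00 00 75 00 00 00 75 00 00
00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ff ff 00 00 00 00 00 00 00 00 00 00
""".split()))

RPC_OFFSET = 42  # stated in the post: the RPC part begins at byte 42


def decode_create_call(frame, offset=RPC_OFFSET):
    """Decode an RPC call assumed to carry AUTH_UNIX creds and NFSv2 create args."""
    pos = offset

    def u32():
        # XDR unsigned integer: 4 bytes, big-endian.
        nonlocal pos
        (value,) = struct.unpack_from(">I", frame, pos)
        pos += 4
        return value

    def opaque():
        # XDR variable-length data: length word, data, padding to 4 bytes.
        nonlocal pos
        length = u32()
        data = frame[pos:pos + length]
        pos += (length + 3) & ~3
        return data

    call = {}
    for key in ("xid", "msg_type", "rpc_version", "program", "version", "procedure"):
        call[key] = u32()

    call["cred_flavor"] = u32()      # 1 would be AUTH_UNIX
    call["cred_length"] = u32()
    call["stamp"] = u32()
    call["machine"] = opaque().decode("ascii")
    call["uid"] = u32()
    call["gid"] = u32()
    count = u32()
    call["gids"] = [u32() for _ in range(count)]

    call["verf_flavor"] = u32()      # 0 would be AUTH_NULL
    call["verf_body"] = opaque()

    call["dir_handle"] = frame[pos:pos + 32]   # NFSv2 file handles are 32 bytes
    pos += 32
    call["name"] = opaque().decode("ascii")
    call["mode"] = u32()
    return call


if __name__ == "__main__":
    for key, value in decode_create_call(FRAME).items():
        print(key, value)
```

Read this way, the call names program 100003 (NFS), version 2,
procedure 9 (create), with machine name "means" and uid and gid 117
(hex 75), and the file name "autoexec.bat", which is consistent with
the post's reading that the request itself is well-formed.
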