chris@yarra.oz.au (Chris Jankowski) (05/31/89)
Some time ago I posted a question about Ethernet type code 8035 (hex).
Thanks again to all who responded; the answers helped me track down a bug
in PC-NFS, which is described below. I hope this does not waste news
bandwidth. Is this a known bug? Has anybody else encountered the problem?
Is my analysis correct? Your comments are much appreciated.

A PC-NFS bug.
=============

1. Conditions to recreate:
--------------------------

- "A" class addressing Internet network.
- YP not used.
- RARP not used.
- Name server not used.
- Subnetting not used.
- Ethernet network (as opposed to SLIP).
- PC-NFS 3.0.

2. Description of the problem as seen by users.
-----------------------------------------------

Telnet terminal sessions with a UNIX host are run on PC/AT class
computers. The program used on the PCs is TELNET, part of the PC-NFS
Rel. 3.0 package from Sun Microsystems.

Users of such sessions experience delays (screen freezes) of about 11
seconds, during which no information is sent from the host. During this
time the user can still key in some input, but only until the input
buffer on the PC is full (8 characters). After about 11 seconds normal
work resumes, i.e. output is again sent from the host and input is
processed (including the characters accumulated in the buffer).

The delays seem to happen at random and are experienced at a rate of a
few per day per user.

Apart from the random delays there is also one situation in which the
delay always happens, but exactly once: after a PC is rebooted, either
when starting a telnet session or when executing the NET NAME command,
whichever comes first. Starting further telnet sessions and/or executing
further NET NAME commands does not cause the delay.

3. Results of investigation.
----------------------------

The problem seems to manifest itself only if class A Internet addressing
is used, e.g. 125.30.1.1 for a host and, say, 125.30.1.20 for a PC. The
problem is not experienced when the network is either class B or class C.
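The class of an Internet address is fixed by the leading bits of its first octet, and on a network without subnetting the default network mask follows from the class. The short Python sketch below is only an illustration. It assumes plain classful addressing with no subnetting, as in the conditions above. The function names `address_class` and `network_number` and the class B and C sample addresses are made up. The sketch shows that the example host and PC addresses both sit on class A network 125.0.0.0:

```python
# Classful IPv4 addressing (pre-CIDR, no subnetting), as assumed for the
# networks described above. Function names are hypothetical, for illustration.

def address_class(addr: str) -> str:
    """Return the class letter 'A'..'E' of a dotted-decimal address."""
    first = int(addr.split(".")[0])
    if first < 128:      # leading bit 0
        return "A"
    if first < 192:      # leading bits 10
        return "B"
    if first < 224:      # leading bits 110
        return "C"
    if first < 240:      # leading bits 1110 (multicast)
        return "D"
    return "E"           # leading bits 1111 (reserved)

# Default (natural) network mask for each unicast class.
# Only classes A, B and C have one; D or E raises KeyError below.
DEFAULT_MASK = {"A": "255.0.0.0", "B": "255.255.0.0", "C": "255.255.255.0"}

def network_number(addr: str) -> str:
    """AND the address with its class's default mask, octet by octet."""
    mask = DEFAULT_MASK[address_class(addr)]
    octets = [int(a) & int(m) for a, m in zip(addr.split("."), mask.split("."))]
    return ".".join(str(o) for o in octets)

# The host and PC from the report, plus made-up class B and C examples.
for a in ("125.30.1.1", "125.30.1.20", "150.10.1.1", "192.9.200.5"):
    print(a, address_class(a), network_number(a))
```

The sketch only illustrates the addressing rule that separates the failing networks from the working ones; nothing in the capture shows why PC-NFS behaves differently on class A.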
Also, if a network is class A, all PCs on it experience the problem
regardless of which host they have a telnet session with. Other hardware
and software dependencies have been eliminated by swapping equipment
between networks, partitioning networks, etc.

As the exact nature of the problem was a complete mystery, an Isolan
Monitor was used to capture all Ethernet packets whose source or
destination address field was equal to the Ethernet address of a PC.
Fortunately, it was known that the delay always happens when the first
telnet session is started after a PC is rebooted, so it was easy to
start capturing there.

The following table lists all captured packets ("A" class network, start
of the first telnet session after rebooting the PC). All packets are sent
one after another within the resolution of the Isolan Monitor clock
(.005s) unless specifically noted.

------------------------------------------------------------------------------
Packet|Ether.|Ether.|Ether.| Packet type, contents and comments
  #   |source|dest. |type  |
      |addr. |addr. |code  |
------------------------------------------------------------------------------
  1.  | PC   | bcast| 806  | ARP request
      |      |      |      | PC to the world: Send me the Ethernet address of
      |      |      |      | the host having Internet address such and such.
------------------------------------------------------------------------------
  2.  | host | PC   | 806  | ARP response
      |      |      |      | Host to PC: This is my Ethernet address.
------------------------------------------------------------------------------
  3.  | PC   | host | 800  | IP; TCP options (window size) negotiation.
------------------------------------------------------------------------------
  4.  | PC   | bcast| 8035 | RARP request
      |      |      |      | PC to the world: Send me my Internet address!??
      |      |      |      | This packet does not make much sense in this
      |      |      |      | context. The PC already knows its Internet
      |      |      |      | address. Moreover, RARP has been specifically
      |      |      |      | disabled in the NFSCONF program, so it is
      |      |      |      | unreasonable to expect RARP support on the
      |      |      |      | network; in fact RARP should not be used at all.
      |      |      |      | The silliest thing is that the RARP packet
      |      |      |      | itself already contains the Internet and
      |      |      |      | Ethernet addresses of both the PC and the host
      |      |      |      | in the proper fields.
------------------------------------------------------------------------------
  5.  | host | PC   | 800  | IP; TCP options (window size) negotiation.
      |      |      |      | The host responds to packet #3.
------------------------------------------------------------------------------
  6.  | PC   | bcast| 8035 | RARP request
      |      |      |      | PC to the world: Send me my Internet address!??
      |      |      |      | This packet is an exact copy of packet #4 and is
      |      |      |      | sent 3.5 seconds after packet #4. Apparently the
      |      |      |      | PC times out after getting no response to its
      |      |      |      | RARP request (no wonder: RARP is not used on the
      |      |      |      | network) and resends the packet. We start to see
      |      |      |      | where the delay comes from.
------------------------------------------------------------------------------
  7.  | host | PC   | 800  | IP; TCP options (window size) negotiation.
      |      |      |      | This packet is a copy of TCP packet #5 and is
      |      |      |      | sent 5.8s after packet #5. The host responds to
      |      |      |      | packet #3 again, as it has not received any
      |      |      |      | confirmation that its previous packet (#5)
      |      |      |      | reached the PC.
------------------------------------------------------------------------------
  8.  | PC   | bcast| 8035 | RARP request
      |      |      |      | PC to the world: Send me my Internet address!??
      |      |      |      | This packet is an exact copy of packets #4 and
      |      |      |      | #6 and is sent 3.5 seconds after packet #6. The
      |      |      |      | total delay is now up to 7 seconds.
------------------------------------------------------------------------------
  9.  | PC   | host | 800  | IP
      |      |      |      | After another 3.5 seconds (10.5 in total) the PC
      |      |      |      | times out for the third time and, all of a
      |      |      |      | sudden, drops the idea of getting its Internet
      |      |      |      | address from the network and acknowledges the
      |      |      |      | TCP window size negotiation packet sent by the
      |      |      |      | host so long ago.
------------------------------------------------------------------------------
 10.  | host | PC   | 800  | IP; TCP packet, telnet protocol: option
      |      |      |      | negotiation begins with "do option 18hex"
      |      |      |      | (terminal type).
------------------------------------------------------------------------------

Just for comparison, the following table shows what happens on a class C
network when the first telnet session is started after rebooting the PC.
All packets are sent one after another within the resolution of the
Isolan Monitor clock (.005s) unless specifically noted.

------------------------------------------------------------------------------
Packet|Ether.|Ether.|Ether.| Packet type, contents and comments
  #   |source|dest. |type  |
      |addr. |addr. |code  |
------------------------------------------------------------------------------
  1.  | PC   | bcast| 806  | ARP request
      |      |      |      | PC to the world: Send me the Ethernet address of
      |      |      |      | the host having Internet address such and such.
------------------------------------------------------------------------------
  2.  | host | PC   | 806  | ARP response
      |      |      |      | Host to PC: This is my Ethernet address.
------------------------------------------------------------------------------
  3.  | PC   | host | 800  | IP; TCP options (window size) negotiation.
------------------------------------------------------------------------------
  4.  | PC   | bcast| 800  | IP; UDP packet containing the string
      |      |      |      | PC-NFSxxxxxxxxxx, where xxxxxxxx is a hex number.
      |      |      |      | My guess is that it is an attempt to detect
      |      |      |      | another copy of PC-NFS with the same serial
      |      |      |      | number on the network, i.e. one violating the
      |      |      |      | terms of the PC-NFS licence.
------------------------------------------------------------------------------
  5.  | host | PC   | 800  | IP; TCP options (window size) negotiation.
      |      |      |      | The host responds to packet #3.
------------------------------------------------------------------------------
  6.  | PC   | host | 800  | IP; TCP acknowledgement of packet #5.
------------------------------------------------------------------------------
  7.  | host | PC   | 800  | IP; TCP packet, telnet protocol: option
      |      |      |      | negotiation begins with "do option 18hex"
      |      |      |      | (terminal type).
------------------------------------------------------------------------------

Here everything goes smoothly.

Having found that RARP packets were causing the delay at the start of the
first telnet session after a reboot, it was easy to check whether the
same packets are present during the random delays. And indeed they are.
The precise trigger for the random delays is still unknown (the PC knows
its Internet address all the time, after all), but the pattern is clear:
it is very likely that the same code path is followed.

At this point it is rather obvious that this must be a bug in the PC-NFS
code. Hopefully it is documented precisely enough to enable the
developers to fix it.

What makes me think that this must be a bug is that no software vendor
would let you bypass all the anti-piracy devices so carefully designed
into their product just by specifying a particular type of network
addressing. (;-))

Chris Jankowski
chris@yarra.oz.au
chris@yarra.oz
Pyramid Technology Australia - Melbourne
Append your favourite disclaimer here: