[comp.protocols.tcp-ip.ibmpc] PC-NFS bug

chris@yarra.oz.au (Chris Jankowski) (05/31/89)

Some time ago I posted a question about Ethernet type code 8035 (hex).
Thanks again to all who responded. It helped me to track down a bug in PC-NFS.
The bug is described below. I hope this does not waste news bandwidth.
Is it a known bug? Has anybody else encountered the problem? Is my analysis
correct? Your comments are much appreciated.


		A PC-NFS bug.
		=============

1. Conditions to recreate:
--------------------------

	- Internet network using class A addressing.
	- YP not used.
	- RARP not used.
	- Name server not used.
	- Subnetting not used.
	- Ethernet network (as opposed to SLIP).
	- PC-NFS 3.0.

2. Description of the problem as seen by users.
-----------------------------------------------

Telnet terminal sessions with a UNIX host are run on PC/AT class computers.
The program used on the PCs is TELNET, part of the PC-NFS Rel. 3.0 package
from Sun Microsystems.

Users of such sessions experience delays (screen freezes) of about 11 seconds
during which no output arrives from the host. During this time the user
can still type, but only until the input buffer on the PC fills up
(8 characters).
After about 11 seconds normal work resumes, i.e. output is again sent from the
host and input is processed (including the characters accumulated in the
buffer). The delays seem to happen at random, at a rate of a few per day
per user.

Apart from the random delays there is one situation in which the delay always
happens, but exactly once: after rebooting a PC, on either starting a telnet
session or executing the NET NAME command, whichever comes first.
Starting subsequent telnet sessions and/or executing further NET NAME commands
does not cause the delay.


3. Results of investigation.
----------------------------

The problem seems to manifest itself only if class A Internet addressing is
used, e.g. 125.30.1.1 for a host and, say, 125.30.1.20 for a PC.
The problem is not experienced when the network is either class B or class C.
Also, if a network is class A, all PCs on it experience the problem regardless
of which host they have a telnet session with.
Other hardware/software dependencies have been eliminated by swapping
equipment between networks, partitioning networks etc.
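
For readers who want to check which class their own addresses fall into,
here is a minimal sketch of the classful rule (the class is decided by the
leading bits of the first octet). The function names are my own and have
nothing to do with PC-NFS internals; since subnetting is not used here, the
natural mask is the one that applies.

```python
def address_class(dotted: str) -> str:
    """Return the classful network class (A-E) of a dotted-quad IPv4 address."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {dotted!r}")
    first = octets[0]
    if first < 128:   # leading bit 0
        return "A"
    if first < 192:   # leading bits 10
        return "B"
    if first < 224:   # leading bits 110
        return "C"
    if first < 240:   # leading bits 1110 (multicast)
        return "D"
    return "E"        # leading bits 1111 (reserved)


def natural_netmask(dotted: str) -> str:
    """Return the unsubnetted mask; only defined for classes A, B and C."""
    masks = {"A": "255.0.0.0", "B": "255.255.0.0", "C": "255.255.255.0"}
    return masks[address_class(dotted)]


if __name__ == "__main__":
    for addr in ("125.30.1.1", "125.30.1.20"):
        print(addr, address_class(addr), natural_netmask(addr))
```

Both example addresses above come out as class A with a natural mask of
255.0.0.0.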

As the exact nature of the problem was a complete mystery, an Isolan
Monitor was used to capture all Ethernet packets with either the source
address or the destination address field equal to the Ethernet address of
a PC. Fortunately it was known that the delay always happens after rebooting
a PC, when starting the first telnet session. Therefore it was easy
to start capturing there.
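
The capture filter can be sketched in software as follows, assuming the raw
frames are available as byte strings (how they would be exported from the
Isolan Monitor is not covered here). The sketch reads the 14-byte Ethernet
header and keeps a frame if either address matches the PC. The station
address in the usage example is made up; the three type codes are the ones
that appear in the tables of this posting.

```python
import struct

# Ethernet type codes seen in the traces (hex).
ETHERTYPES = {0x0800: "IP", 0x0806: "ARP", 0x8035: "RARP"}


def parse_mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    mac = bytes(int(part, 16) for part in text.split(":"))
    if len(mac) != 6:
        raise ValueError(f"not an Ethernet address: {text!r}")
    return mac


def ethernet_header(frame: bytes):
    """Return (destination, source, type code) from a raw Ethernet frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    return struct.unpack("!6s6sH", frame[:14])


def involves_station(frame: bytes, station: bytes) -> bool:
    """True if the station is the source or the destination of the frame."""
    dst, src, _ = ethernet_header(frame)
    return station in (dst, src)


def describe(frame: bytes) -> str:
    """Name the Ethernet type code of a frame."""
    _, _, ethertype = ethernet_header(frame)
    return ETHERTYPES.get(ethertype, f"unknown (0x{ethertype:04x})")


if __name__ == "__main__":
    pc = parse_mac("02:00:00:00:00:01")      # hypothetical PC address
    broadcast = b"\xff" * 6
    frame = broadcast + pc + struct.pack("!H", 0x8035) + bytes(28)
    print(involves_station(frame, pc), describe(frame))
```

Broadcasts sent by the PC pass the filter because the PC is their source.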

The following table lists all captured packets (class A network,
start of the first telnet session after rebooting the PC).
All packets are sent one after another within the resolution of the Isolan
Monitor clock (.005s) unless specifically noted.
------------------------------------------------------------------------------
Packet|Ether.|Ether.|Ether.|       Packet type, contents and comments
#     |source|dest. |type  |
      |addr. |addr. |code  |
------------------------------------------------------------------------------
1.    | PC   | bcast| 806  | ARP request
                             PC to the world: Send me Ethernet address of the
                             host having Internet address such and such.
------------------------------------------------------------------------------
2.    | host | PC   | 806  | ARP response
                             Host to PC: This is my Ethernet address.          
------------------------------------------------------------------------------
3.    | PC   | host | 800  | IP; TCP options (window size) negotiation. 
------------------------------------------------------------------------------
4.    | PC   | bcast| 8035 | RARP request 
                             PC to the world: send me my Internet address !??
                             This packet does not make much sense in this
                             context. The PC already knows its Internet
                             address. Moreover, RARP has been specifically
                             disabled in the NFSCONF program, so it is
                             unreasonable to expect RARP support on the
                             network; in fact RARP should not be used at all.
                             The silliest thing is that the RARP packet itself
                             already contains the Internet and Ethernet
                             addresses of both the PC and the host in the
                             proper fields.
------------------------------------------------------------------------------
5.    | host | PC   | 800  | IP; TCP options (window size) negotiation. 
                             The host responds to the packet #3.
------------------------------------------------------------------------------
6.    | PC   | bcast| 8035 | RARP request 
                             PC to the world: send me my Internet address !??
                             This packet is an exact copy of packet #4 and is
                             sent 3.5 seconds after packet #4.
                             Apparently the PC times out after not getting any
                             response to its RARP request (no wonder, as RARP
                             is not used on the network) and resends the
                             packet. We start to see where the delay comes
                             from.
------------------------------------------------------------------------------
7.    | host | PC   | 800  | IP           
                             TCP options (window size) negotiation.            
                             The packet is a copy of TCP packet #5 and is sent
                             5.8s after packet #5.
                             The host responds to the packet #3 again as it
                             has not received any confirmation that its 
                             previous packet (#5) was received by the PC.
------------------------------------------------------------------------------
8.    | PC   | bcast| 8035 | RARP request 
                             PC to the world: send me my Internet address !??
                             This packet is an exact copy of packets #4 and #6
                             and is sent 3.5 seconds after packet #6. The total
                             delay is now up to 7 seconds.
------------------------------------------------------------------------------
9.    | PC   | host | 800  | IP             
                             After another 3.5 seconds (10.5 in total) the
                             PC times out for the third time, all of a sudden
                             drops the idea of getting its Internet address
                             from the network, and acknowledges the TCP
                             window size negotiation packet sent by the host
                             so long ago.
------------------------------------------------------------------------------
10.   | host | PC   | 800  | IP             
                             TCP packet, telnet protocol - option negotiation
                             begins: do option 18hex.
------------------------------------------------------------------------------
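
The timings in the table account for the freeze seen by users: three RARP
attempts, each waiting 3.5 seconds for a reply that never comes, add up to
10.5 seconds, which agrees with the observed "about 11 seconds". A small
sketch of that arithmetic follows. The two constants are simply read off
the trace; they are not taken from any PC-NFS documentation or source.

```python
RARP_TIMEOUT_S = 3.5   # gap observed between packets #4, #6 and #8
RARP_ATTEMPTS = 3      # packets #4, #6 and #8


def rarp_send_times(timeout_s: float = RARP_TIMEOUT_S,
                    attempts: int = RARP_ATTEMPTS) -> list:
    """Times (seconds after the first RARP request) at which requests go out."""
    return [i * timeout_s for i in range(attempts)]


def stall_seconds(timeout_s: float = RARP_TIMEOUT_S,
                  attempts: int = RARP_ATTEMPTS) -> float:
    """Total time until the last attempt has timed out and TCP traffic resumes."""
    return attempts * timeout_s


if __name__ == "__main__":
    print("RARP requests sent at:", rarp_send_times())
    print("TCP acknowledgement delayed by:", stall_seconds(), "seconds")
```
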
Just for comparison the following table lays out what happens on a class C
network when beginning the first telnet session after rebooting the PC.
All packets are sent one after another within the resolution of the Isolan
Monitor clock (.005s) unless specifically noted.

------------------------------------------------------------------------------
Packet|Ether.|Ether.|Ether.|       Packet type, contents and comments
#     |source|dest. |type  |
      |addr. |addr. |code  |
------------------------------------------------------------------------------
1.    | PC   | bcast| 806  | ARP request
                             PC to the world: Send me Ethernet address of the
                             host having Internet address such and such.
------------------------------------------------------------------------------
2.    | host | PC   | 806  | ARP response
                             Host to PC: This is my Ethernet address.          
------------------------------------------------------------------------------
3.    | PC   | host | 800  | IP           
                             TCP options (window size) negotiation.            
------------------------------------------------------------------------------
4.    | PC   | bcast| 800  | IP           
                             UDP packet                            
                             contains the following string: PC-NFSxxxxxxxxxx
                             where xxxxxxxx is a hex number.
                             My guess is that it is an attempt to detect
                             another copy of PC-NFS with the same serial
                             number on the network, which would violate the
                             terms of the PC-NFS licence.
------------------------------------------------------------------------------
5.    | host | PC   | 800  | IP           
                             TCP options (window size) negotiation.            
                             The host responds to the packet #3.
------------------------------------------------------------------------------
6.    | PC   | host | 800  | IP             
                             TCP 
                             Acknowledgement of packet #5.
------------------------------------------------------------------------------
7.    | host | PC   | 800  | IP             
                             TCP packet, telnet protocol - option negotiation
                             begins: do option 18hex.
------------------------------------------------------------------------------

Here everything goes smoothly.

Having found that RARP packets cause the delay at the beginning of the
first telnet session after the PC is rebooted, it was easy to check
whether the same packets are present during the random delays. And indeed
they are. The precise trigger for the random delays is still unknown (the
PC knows its Internet address all the time, after all), but the pattern is
clear: it is very likely that the same code path is followed.
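
Anyone wanting to repeat the check on their own capture could look for
RARP requests along the lines of the sketch below. It assumes the frames
are available as raw byte strings and that the packets are RARP for
Ethernet and IP, laid out as in the ARP packet format with opcode 3
("request reverse", per RFC 903). The function names are my own.

```python
import struct

ETHERTYPE_RARP = 0x8035
RARP_REQUEST = 3        # opcode "request reverse" (RFC 903)


def dotted(ip: bytes) -> str:
    """Format 4 raw bytes as a dotted-quad Internet address."""
    return ".".join(str(b) for b in ip)


def decode_rarp(frame: bytes):
    """Return the RARP fields of a frame, or None if it is not Ethernet/IP RARP."""
    if len(frame) < 14 + 28:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_RARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    return {
        "opcode": oper,
        "sender_ether": sha.hex(":"),
        "sender_ip": dotted(spa),
        "target_ether": tha.hex(":"),
        "target_ip": dotted(tpa),
    }


def count_rarp_requests(frames) -> int:
    """Count the RARP requests in an iterable of raw frames."""
    decoded = (decode_rarp(frame) for frame in frames)
    return sum(1 for d in decoded if d and d["opcode"] == RARP_REQUEST)
```

A non-zero count during a freeze on a network where RARP is disabled would
point at the same behaviour as in the first table.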

At this point it is rather obvious that this must be a bug in
the PC-NFS code. Hopefully it is documented precisely enough to enable
the developers to fix it.

What makes me think that this must be a bug is that no software vendor
would let you bypass all the anti-piracy devices so carefully designed
into their product just by specifying a particular type of network
addressing. (;-))

Chris Jankowski     chris@yarra.oz.au     chris@yarra.oz
Pyramid Technology Australia - Melbourne

Append your favourite disclaimer here: