[comp.unix.wizards] A possible network bug in Sun unix?

netser%limbo.uci.edu@icsg.UCI.EDU (Richard Johnson) (12/18/86)

One of our Computing Support people here (Scott Menter) noticed a strange
problem today.  We investigated and we don't know exactly what to make of
the situation.  Let me explain:

1) You rlogin from your sun workstation (Sun-3/50 in this case) to another
   system on the network.
2) Your sun workstation crashes.
3) After rebooting you try to rlogin to the same other system again and
   you can't, even after multiple tries.

We investigated and found that Sun seems to always allocate the first
unused port number above 1021 for an rlogin connection.  Since the
other end of the rlogin will stick around until some I/O forces it to
recognize that the connection is broken (we just cat'ed to the pty on the
remote system and it closed), you get the same hosta:porta - hostb:portb
pair EVERY time, and that pair HAS to be rejected by the remote system
because a TCP connection is identified by exactly those four values!

Of course all you have to do to work around it is just rlogin to some OTHER
system and then rlogin to the one you want!

Is this a bug?  Am I missing something?  (By the way, this is SUN 3.0.)

----------------------------------------------------------------------------
Richard Johnson                          netser@ics.uci.edu       (Internet)
UCI ICS Network Services                 ...!ucbvax!ucivax!netser     (UUCP)

rackow@anl-mcs.arpa (Gene Rackow) (12/18/86)

We have the same problem on our network.  The workaround that I have
found is:
      1.  On the Sun, do a "rlogin machine &",
          and while that is timing out,
      2.  do another "rlogin machine";
          this one will now get into the remote machine.
      3.  Do a "who" on machine to find the ghost user on hostb.
      4.  Kill the login shell of the ghost user.
From here, until the next crash, rlogins work properly.

I have heard rumors that this problem is corrected in 4.3bsd and/or Sun 3.2.
Can anyone confirm/deny this rumor?

Gene Rackow              rackow@anl-mcs.arpa
312-972-7126

narten@purdue.EDU (Thomas Narten) (12/18/86)

This may be a feature of Sun UNIX, but it is probably not restricted to
it.  It is caused by two problems:

1) Unix has a keepalive option on sockets that times out (breaks) a
connection if the peer goes away. For TCP, "going away" is defined as
not having received any packets from the peer in X amount of time.
Rlogind uses this option.

2) Sun diskless machines reboot much more quickly than normal Unix
machines, because they don't have large disks for fsck to churn away
on. In particular, they are back up and running before old connections
have timed out due to (1).

(1) is implemented by running a timer that expires whenever no packets
have been exchanged for a certain period of time. When the timer
expires, TCP sends a one-byte data segment that is outside of its send
window (i.e. it already has an ACK for that sequence number). The peer
TCP, on receiving the segment, notes that it already has the data and
sends back an ACK for the sequence number it expects to see. The
client TCP gets that ACK and resets its timer, concluding that the
connection is still alive. The connection eventually breaks if no ACKs
are received.

This works just fine as long as both TCPs are still there, or if one
end of the TCP connection goes away in the sense that the host is
unreachable. On the other hand, if one machine crashes and reboots
quickly, the following occurs:

The client TCP sends a keepalive packet, which the peer TCP receives.
Now, however, there is no protocol control block for that connection,
so the peer TCP sends back a RESET. The client TCP receives the
packet, updates its keepalive timer (hmm... I got a packet, the
connection must still be fine), then checks the sequence numbers that
were ACKed. The ACK is outside of its receive window and no data was
sent in the segment, so TCP drops the packet, ignoring the RESET.
(This follows the TCP spec.)

>Since the other end of the rlogin will stick around until some I/O
>forces it to recognize the connection is broken (we just cat'ed to the
>pty on the remote system and it closed),

This results from the RESET being ignored because it is not within the
receive window. If you force the TCP to send real data, the ACK that
comes back will be within the receive window, and the RESET then
breaks the connection.

One workaround is to change the line in tcp_input(...):
	tp->t_timer[TCPT_KEEP] = TCPTV_KEEP;
to something like:
	if ((tiflags&TH_RST) == 0)
		tp->t_timer[TCPT_KEEP] = TCPTV_KEEP;

This will cause the connection to time out eventually. Both 4.2BSD and
4.3BSD suffer from this problem.

>1) You rlogin from your sun workstation (Sun-3/50 in this case) to another
>   system on the network.
>2) Your sun workstation crashes.
>3) After rebooting you try to rlogin to the same other system again and
>   you can't even after multiple tries.

I tried to duplicate the behavior on our Sun machines running NFS3.2,
connecting to 4.2, 4.3, and NFS3.0 machines. (I don't have a 3.0
machine handy that I can crash at will.) I would rlogin to host A,
reboot the workstation, and rlogin to A again. Each time, I was able
to rlogin successfully. Each connection used the same port numbers.
Note that under normal conditions, the following packet exchange takes
place:

A				B
send SYN, SEQ=n, ACK=0		(thinks connection is established)
				gets SYN, sends back ACK=m, SEQ=o
gets ACK, notices the
sequence number is not what
it expects & replies with:
ACK=0, SEQ=m, RESET
				gets RESET, drops the connection, and
				sends back RESET, ACK=m, SEQ=o

At this point the "old" rlogin has gone away, and the next SYN will
cause the connection to become established properly. 

I suppose that things could break if the sequence number chosen by A
were the same one B was expecting, but that would be an awful
coincidence. It is the case, however, that when a machine reboots, it
starts with an initial sequence number of 0. If your machine crashes
several times in quick succession, it is possible that the sequence
numbers on the peer connection could also be very low. Still, I find
it hard to believe that this is the cause of the problem.

Do you have any way of determining what sequence numbers are involved
in the connections, or what sort of packets are floating around for
the connection in question?

Thomas Narten
narten@purdue.EDU or {ihnp4, allegra}!purdue!narten

thomson@uthub.toronto.edu (Brian Thomson) (12/22/86)

I submitted a fix to this problem to net.bugs.4bsd in July, 1985.
The TCP connection establishment protocol is supposed to recover from
these 'half-open' connections, but a problem with the 4.2BSD implementation
prevented it from working properly.
4.3 has apparently adopted the same fix I proposed, although because of
interaction with other BSD TCP bugs I no longer use it in its original
form.  Presumably, later SUN distributions made a similar fix.
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    {linus,ihnp4,uw-beaver,floyd,utzoo}!utcsri!uthub!thomson