steveh@hammer.UUCP (Stephen Hemminger) (05/29/85)
Here is a rather complex bug in the BSD implementation of the TCP protocol. It is a problem with TCP, not BSD, but a good solution is hard to decide on; what's your opinion?

Scenario: Suppose two programs have a TCP/IP connection over a socket with keepalives set (for example, rlogin or telnet). Keepalives mean that every thirty seconds a null packet is sent. The idea is that if this null packet is not acknowledged, then eventually the idle timer will go off and the connection will terminate. For discussion, let's call the two hosts A and B. Suppose host B goes down; then the keepalive packet from A to B will never be acknowledged, so A will know the connection is severed. The idle timer is 15 min. in 4.2BSD.

Problem: Suppose host B goes down and then comes back up in less than 15 min. (obviously it can't be a vax :-) ). When B gets a keepalive packet, it will correctly send a RST (reset) back to A. The problem is that since the keepalive packet is a window update pointing one past the valid window, the RST is ignored by A. Yet the idle timer on A is cleared by this invalid RST, so A will never abort the connection. The socket on A will hang open forever!

Possible solutions:

A) Make RSTs outside the window valid. The TCP protocol says that RSTs outside the current window are to be ignored. Let's not violate the protocol.

B) Send a different keepalive packet (i.e., one inside the window). What is a packet which is a guaranteed no-operation? Any data inside the window has already been seen by B and therefore no acknowledgment is needed.

C) Don't use keepalives. Then how do you keep idle connections alive?

D) Don't use TCP. Then what, ISO (are you kidding? :-( ).

Comments?
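The hang described in this post can be shown with a toy model. The sketch below is not the 4.2BSD code; it is a minimal Python simulation that assumes only what the post states: a probe every thirty seconds, a 15 minute idle limit, and an idle timer that is cleared by any reply, even a RST that is then ignored. The function name and parameters are made up for illustration.

```python
KEEP_INTERVAL = 30        # seconds between probes (figure from the post)
IDLE_LIMIT = 15 * 60      # seconds of silence before giving up (figure from the post)

def seconds_until_drop(peer_answers, rst_accepted, horizon=3600):
    """Return the time at which host A drops the connection, or None if it
    never does within `horizon` seconds.

    peer_answers: host B is back up and answers every probe with a RST.
    rst_accepted: host A treats that RST as valid (in the scenario it does not).
    """
    idle = 0
    for now in range(KEEP_INTERVAL, horizon + 1, KEEP_INTERVAL):
        idle += KEEP_INTERVAL
        if idle >= IDLE_LIMIT:
            return now            # idle timer expired: connection aborted
        # Host A sends a probe at this point.
        if peer_answers:
            if rst_accepted:
                return now        # valid RST: connection reset at once
            idle = 0              # ignored RST still clears the idle timer
    return None

if __name__ == "__main__":
    print("B stays down:          ", seconds_until_drop(False, False))
    print("B reboots, RST ignored:", seconds_until_drop(True, False))
    print("B reboots, RST honored:", seconds_until_drop(True, True))
```

In this model the connection is dropped only when B stays silent or when its RST is accepted; the middle case, a reply that is discarded but still counts as activity, never times out.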
louie@umd5.UUCP (05/31/85)
I would have to question the existence in TCP of a KEEPALIVE construct at all. Suppose that I want to have a connection alive for many hours at a time; why should I force traffic to flow across it at periodic intervals? This is a real issue if your link-level network is a PDN, and you get charged by the packet.

If you want to verify the existence of the connection, why not have whatever application uses TCP make sure that the connection is "warm"? This can easily be done in the case of a TELNET session by sending a timing-mark TELNET option every so often.

--
Louis A. Mamakos WA3YMH
University of Maryland, Computer Science Center
Internet: louie@umd5.arpa
UUCP: {seismo!umcp-cs, ihnp4!rlgvax}!cvl!umd5!louie
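An application-level probe of the kind suggested here might look like the following. This is only a sketch in Python, assuming the standard TELNET command bytes and the TIMING-MARK option code (6); the helper names are hypothetical, and the reply check is deliberately naive (it does not handle an escaped IAC IAC pair in the data stream).

```python
# Standard TELNET command bytes and the TIMING-MARK option code.
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251
TIMING_MARK = 6

def timing_mark_probe():
    """Bytes to send to ask the peer for a timing mark: IAC DO TIMING-MARK."""
    return bytes([IAC, DO, TIMING_MARK])

def is_timing_mark_reply(data):
    """True if `data` contains IAC WILL TIMING-MARK or IAC WONT TIMING-MARK.

    Either answer shows that the remote TELNET is alive. Naive substring
    search: a real parser must also track IAC IAC escaping.
    """
    return (bytes([IAC, WILL, TIMING_MARK]) in data
            or bytes([IAC, WONT, TIMING_MARK]) in data)

if __name__ == "__main__":
    print(timing_mark_probe())
    print(is_timing_mark_reply(bytes([IAC, WILL, TIMING_MARK])))
```

The application decides how often to send the probe and how long to wait for an answer, so a site that pays by the packet can simply choose not to probe at all.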
karn@petrus.UUCP (06/01/85)
The fundamental problem with TCP keepalives in 4.2BSD, as Jon Postel said, is that they are a "braindamaged hack" (his words). According to the entry in RFC 944 (Official ARPA Internet Protocols) for TCP, "there is no TCP 'probe' mechanism, [i.e., keepalives] and none is needed." I suggest that the best way to deal with this problem is to remove the KEEPALIVE options from all of the 4.2BSD network servers.

This misfeature has caused considerable aggravation in our environment. Connections established from IBM PCs running the MIT PC/IP Telnet code get gratuitously dropped unless you type on them often enough. Apparently, the acknowledgment number contained in the 4.2 "probe" "takes back" a previous acknowledgment. The spec is not clear on this point, but in my opinion the PC code is perfectly entitled to ignore it completely, since it could only be an old duplicate. This is contrary to the assertion made in the 4.2BSD code: "Saying rcv_nxt-1 lies about what we have received, and by the protocol spec requires the correspondent TCP to respond." As described later, I believe this lie is also to blame for Steve's problem.

What the code SHOULD do is to retransmit the last byte sent to the other end, exactly as if it had been lost in transmission and never acknowledged. This WOULD force the other end to respond. The code notes that there is a problem with a one-way data stream; however, in this case, you might try retransmitting your SYN (since it has a number in the sequence space) along with your Initial Sequence Number, and the other end OUGHT to respond with the desired acknowledgment (hopefully not with a RST).

I have never seen the behavior Steve described on our systems, although this could be because the keepalive timer on our systems is quite a bit shorter than 15 minutes, and no machine can reboot this quickly. I'm not sure I fully understand his problem, but here goes.
When the pollee comes back up, it has forgotten the sequence numbers it was using on the connection. It therefore attempts to formulate its RST response to the poller by using the ACK number contained in the poll (as dictated by the spec). Unfortunately, as I mentioned earlier, the poller is "taking back" an acknowledgement, so the sequence number contained in the RST will be outside the poller's acceptable window. Therefore it is ignored and Steve's problem occurs.

So it seems that this is another problem caused by the poller lying in its ACK field about what it has received. Fix this and both Steve's problem and the dropped IBM PC Telnet connection problem should go away.

I should point out here that there is a very good reason the spec says to ignore RSTs that lie outside the current window: if you didn't, an old duplicate RST could drop your connection unnecessarily. Therefore Steve's solution #A is unacceptable.

Regarding solution #B, you must always acknowledge data, whether or not it lies inside your window (i.e., whether you have already seen it or not), because this could be a retransmission due to an earlier acknowledgment being lost. So probing with the last transmitted byte (assuming you DON'T lie about your acknowledgment number) ought to work.

I like solution #C the best. Polling is worse than useless in virtually all situations. It causes even idle connections to get dropped if an intervening gateway goes down for a minute or so. I often have a half dozen idle rlogins going in different windows on my Sun workstation, and having to re-establish them all after somebody reboots a gateway is a real pain. I can understand losing the one I'm actively working on (although I wish the TCP giveup timers were longer), but breaking idle connections is unacceptable.

I assume solution #D is a joke.

Phil
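The sequence-number argument in this post can be checked with a few lines of arithmetic. The following is a sketch in Python, not the 4.2BSD code: it assumes that a peer with no connection state builds its RST sequence number from the ACK field of the incoming probe, and that the poller accepts a RST only if that sequence number falls inside its receive window. The function names and the window values used are made up for illustration.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32

def in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd

def rst_seq_from_rebooted_peer(probe_ack):
    """A peer that has lost its state takes the RST sequence number
    from the ACK field of the segment it is answering."""
    return probe_ack % MOD

def poller_accepts_rst(rcv_nxt, rcv_wnd, lie_about_ack):
    """Would the poller accept the RST that its own probe provokes?

    lie_about_ack=True models a probe that acknowledges rcv_nxt-1;
    lie_about_ack=False models a probe with an honest ACK of rcv_nxt.
    """
    probe_ack = (rcv_nxt - 1) % MOD if lie_about_ack else rcv_nxt
    rst_seq = rst_seq_from_rebooted_peer(probe_ack)
    return in_window(rst_seq, rcv_nxt, rcv_wnd)

if __name__ == "__main__":
    # Illustrative values only: rcv_nxt = 1000, window of 4096 bytes.
    print("lying ACK: ", poller_accepts_rst(1000, 4096, True))
    print("honest ACK:", poller_accepts_rst(1000, 4096, False))
```

With the lying ACK the RST arrives one byte to the left of the window and is discarded; with an honest ACK it lands exactly on rcv_nxt and is accepted, which is the fix argued for above.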