tcp-ip@ucbvax.ARPA (07/25/85)
From: ucdla@ucbtopaz.CC I am wondering if anyone has developed a strategy by which TCP can inquire on the aliveness of connections at some regular interval (perhaps timer-driven). A problem has occurred for us in TELNET where connections abort or otherwise go away without being properly closed, and our TELNET never finds out if there is no activity on the connection. The INTERNET Implementation workbook suggests having TELNET negotiate some meaningless option every so often as a way of finding out if the other side is still there. This would fix our problem, but I would like to fix it in TCP if possible, so that the same fix will apply to all higher-level protocols that use TCP without the need for them to do it themselves. Comments and suggestions would be most welcome.
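[The "meaningless option" probe the workbook suggests can be sketched as follows. This is a minimal modern illustration, not from the workbook itself; the option chosen here, TIMING-MARK (code 6, RFC 860), is one plausible candidate, and the function names are invented.]

```python
# Sketch of the "negotiate a meaningless option" probe: send a TELNET
# option negotiation and treat any reply, WILL or WONT, as proof that
# the other side is still alive.
IAC, DO, WILL, WONT = 255, 253, 251, 252   # TELNET command codes
TIMING_MARK = 6                            # option code from RFC 860

def make_probe():
    """Build the 3-byte negotiation the peer is obliged to answer."""
    return bytes([IAC, DO, TIMING_MARK])

def peer_alive(reply):
    """Any well-formed WILL/WONT response counts as 'still there'."""
    return (len(reply) == 3 and reply[0] == IAC
            and reply[1] in (WILL, WONT) and reply[2] == TIMING_MARK)
```

Either answer is acceptable: the probe carries no meaning, so the only information it extracts is that the peer's TELNET is still responding.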
tcp-ip@ucbvax.ARPA (07/25/85)
From: louie@umd5 (Louis Mamakos) Please! Don't try to 'fix' TCP. I don't want my TCP to generate network traffic on an idle connection. The TELNET protocol has a method to accomplish what you want; send a TELNET NOP control sequence, or negotiate the timing mark option. If your application cares about an idle connection, let it probe the remote host. If you are running IP over an X.25 PDN where you pay by the packet, having an idle TCP connection stay idle can save you real dollars. I'm sure the CSNET folks can speak to this issue. Now, if I can just get the stupid user TELNET programs out there that talk to my line-at-a-time Sperry system to send the CR-LF sequence instead of just a LF to end the line. The 4.2 BSD TELNET is a notable offender. Louis A. Mamakos WA3YMH University of Maryland, Computer Science Center Internet: louie@umd5.arpa UUCP: {seismo!umcp-cs, ihnp4!rlgvax}!cvl!umd5!louie
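[The NVT end-of-line rule behind the complaint above can be sketched as follows; a minimal illustration with invented function names, not code from any of these systems.]

```python
# The network-virtual-terminal rule Louis is complaining about: a TELNET
# client should put CR LF on the wire at the end of each line, never a
# bare LF (and a lone CR must go out as CR NUL so it isn't mistaken for
# a line ending).
def nvt_line(text):
    """Encode one complete line of user input for the wire."""
    return text.encode("ascii") + b"\r\n"

def is_proper_eol(wire_bytes):
    """True if the line ends with the CR LF the TELNET spec requires."""
    return wire_bytes.endswith(b"\r\n")
```

A 4.2BSD-style client sending `b"ls\n"` fails this check, which is exactly what confuses a line-at-a-time host waiting for CR-LF.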
tcp-ip@ucbvax.ARPA (07/25/85)
From: Mark Crispin <MRC@SIMTEL20.ARPA> Hear hear! This is a great idea. It is one thing to provide connection continuation across a service interruption (a real win on satellite links!) but it is quite another to keep an idle connection around for days until the operator nukes it. I suggest using some sort of zero-window probing, and consider the other end to be down if there is a no-reply condition or if the hardware level (e.g. 1822) has a "host down" condition and reports that (this slightly violates the modularity of TCP, but if you're careful you can make it into some sort of "signal to TCP from hardware level" which isn't too kludgy). There should be a minimum "keep time" for the connection to come back -- say 30 minutes to an hour -- after which the host should discard the connection even if it doesn't get a reset from the other host. If a host comes back from a service interruption without a reload, it should immediately probe all its connections to see what connections are still valid and immediately nuke all the ones which don't answer or reset. -------
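[Mark's probe-then-discard idea can be sketched as a toy state machine. All names and the exact threshold are invented for illustration; the point is only that an unanswered probe does not kill the connection until a generous "keep time" has elapsed, so connections survive service interruptions.]

```python
KEEP_TIME = 30 * 60   # seconds: Mark's suggested minimum grace period

class Connection:
    """Toy model: probe an idle connection, discard only after KEEP_TIME."""
    def __init__(self, now):
        self.last_reply = now        # when we last heard from the peer
        self.state = "established"

    def on_reply(self, now):
        """Any answer from the peer resets the grace period."""
        self.last_reply = now

    def on_probe_timeout(self, now):
        """Called each time a probe goes unanswered."""
        if now - self.last_reply >= KEEP_TIME:
            self.state = "discarded"  # nuke it even without a reset
```

A brief outage leaves the connection established; only half an hour of silence discards it, matching the "30 minutes to an hour" keep time suggested above.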
tcp-ip@ucbvax.ARPA (07/26/85)
From: "J. Spencer Love" <JSLove@MIT-MULTICS.ARPA> My recommendation is that you send out an unsolicited acknowledgement every 5 minutes or so when there is no activity of any kind on the connection. Activity is defined as any traffic, incoming or outgoing. Thus, in practice only one end of the connection will be pinging the other, since the receipt of such an unsolicited acknowledgement would constitute activity. If the host at the other end of the connection has crashed, there will be no response to the ping at the TCP level, and most TCP implementations don't understand ICMP destination unreachable messages anyway. However, when the host comes back up, the unsolicited ack will elicit a reset, aborting the half-connection. If your TCP implements ICMP destination unreachable messages, you probably don't want to actually reset a connection because of one unless the destination remains unreachable for some period of time, such as the transmission timeout. Given a little time, IP routing may find another route or the gateway may come back up. Multics uses transmission timeouts of typically one minute. Pinging is in disrepute these days because of network loading which it causes. On the good side, you are only pinging connections which are in the established state, and if the clocks of the system are not perfectly in step (well, "clocks" is an oversimplification) and the packet dropping rate is low, only one system will be doing the pinging every ping interval. If different timeouts are used based on whether the activity was transmission or reception, the systems can take turns doing the pinging. The ping interval can be quite long, perhaps several minutes, since it is not constrained to be less than a transmission timeout interval. On the bad side, you are pinging for each connection, not just for each host or even first hop gateway. 
The rationale for leaving the pinging to the next higher level protocol (telnet negotiations) is that then only telnet connections will present this additional load on the network. I think this is a crock of your least favorite substance. This means that ANY protocol or application which is potentially idle for long periods must reinvent this wheel. I think that any application with humans in the loop will require this, and that most other applications (e.g., SMTP, FTP data connections, finger) do not have TCP connections sitting idle in both directions for extended periods of time anyway. A reliable stream protocol should be more reliable than TCP is at telling its clients that the connection has gone away. Most real TCP implementations have ways of passing this information around. For example, if the client is blocked on read it has to handle an abort, and a telnet user has to be able to receive urgent data to implement Interrupt Process, even though these mechanisms are not formally part of the TCP spec. Multics does not do this pinging (yet?), and I know of no implementation which does. Would anyone care to offer additional reasons why implementations should NOT do this, and why the TCP spec shouldn't be amended?
tcp-ip@ucbvax.ARPA (07/26/85)
From: Bob Walsh <walsh@BBN-LABS-B.ARPA> TCP is designed to provide a reliable transport layer. It is NOT designed to ensure application <-> application reliability. Robustness of application communication is the responsibility of the application. Mechanisms have been tried to solve this problem. For example, the Berkeley 4.2BSD TCP periodically sends a byte just beyond the window in an attempt to force an ack when the connection is otherwise idle. But not every system is a 4.2BSD host, and what happens when it communicates with a generous implementation that recognizes the high per-packet overhead (interrupt processing, header checking, and network resources) and buffers the incoming packet? A garbage byte may make its way into the stream. And yet, the receiving TCP has made no mistake. Don't try to change the protocol, or you may run into problems. Interoperability is one of the benefits of standards. The RDP protocol's designers recognized this problem and addressed it through the use of NULL messages to which all implementations must respond. They also recognized that the application layer must be responsible for end-to-end reliability and forced the developer to keep this in mind through RDP's abrupt closing method. bob walsh
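[The hazard Bob describes can be shown in a few lines. This is a toy model with invented names, not 4.2BSD code: a strict receiver drops a segment outside its advertised window, while a "generous" one that buffers it lets the keep-alive garbage byte into the stream.]

```python
def deliver(seq, data, rcv_nxt, rcv_wnd, generous):
    """Return the bytes a receiver accepts into its stream."""
    in_window = rcv_nxt <= seq < rcv_nxt + rcv_wnd
    if in_window or generous:
        return data          # generous host buffers it anyway
    return b""               # strict host drops it and just sends an ack

# 4.2BSD-style probe: one garbage byte just beyond a window of [100, 1100).
probe_seq, probe = 1100, b"X"
strict  = deliver(probe_seq, probe, 100, 1000, generous=False)
lenient = deliver(probe_seq, probe, 100, 1000, generous=True)
```

Both receivers behave legally, yet the generous one ends up with a stray `X` in the application's data, which is Bob's point about relying on behavior the standard never promised.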
tcp-ip@ucbvax.ARPA (07/26/85)
From: imagen!geof@su-shasta.ARPA It seems that every now and then someone new brings up the topic of keep-alives again. I guess that I'll be the one (probably of many) to answer this time. TCP is a general purpose protocol, which is used over networks where bandwidth is cheap and over networks where bandwidth is expensive. Since a major goal of the Internet protocol is to abstract the exact nature of the network(s) in use, it is almost always impossible for a host to know what networks a TCP connection is using (indeed, the connection may be using more than one set of networks). The only way to verify that a TCP connection is working is to send a packet over the connection. On some networks, this costs money. On others it costs processing time in loaded gateways. In short, it would be unacceptably expensive for TCP implementations that use SOME parts of the Internet to probe the connection every so often just to verify that the connection is still there. Since an implementation has no way of knowing what parts of the Internet it is using, this sort of probe was left out of TCP-IP. This means that probing the connection is left as a higher-level issue, one that TCP explicitly does not deal with. This is actually a good idea for another reason. Some protocols don't need to verify that an idle connection is still around, because they always push data through that connection. Other applications don't need to determine that the connection has gone down until the connection is actually needed again. For example, a TAC never times out its TCP connection -- it relies on the human user (who is always present) to decide for himself that the connection is dead. If you need to generate timeouts on your telnet programs, the place to do it is within telnet itself.
Now you can understand why the telnet specification suggests sending bogus option negotiations (the other thing to do is to send AYT's, but some hosts do silly things with them, like converting them into random control characters that are sent to an application). One more thing: at MIT we used to have problems with gateways crashing. Sometimes (in the early days) a gateway would crash for ten or fifteen minutes. How elated we were that TCP does not send keep-alive probes, since they would have determined (in their cleverness) that the connection should have been reset, while we human users had higher level information (Mr. Chiappa yelling down the hall) that the connection would soon return to good health. - Geof
tcp-ip@ucbvax.ARPA (07/26/85)
From: CERF@USC-ISI.ARPA The design of TCP explicitly did NOT introduce an are-you-there automatic facility. We reasoned that the process above the TCP level did not care what the condition of the other end was until it had some data to send. If the process sent data (or a request or query or a task initiation etc.) then it MIGHT care about response. Certainly in that case, TCP tries to send repeatedly and reports back when it cannot get an ACK within the specified amount of time. Once the data is accepted (ACKed), only the process using TCP knows when it should get impatient about getting an answer. TCP certainly doesn't know. If the using process does have a timeout after which it really wants an answer, that is the right time for it to query the other side (e.g. send a query at the protocol layer above TCP) and find out what its condition is. It is fair to argue about reinventing the wheel for each protocol above TCP, but it was our thought at the time that each process or application would have different levels of patience regarding when to get nervous about not hearing a response. So we left out that feature in TCP, not wanting to impose something arbitrary on the next level of protocol up. Vint Cerf
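[The pattern Vint describes, where the application applies its own patience rather than relying on TCP, can be sketched over a loopback socket. The silent server, the query text, and the 0.2-second deadline are all invented for illustration.]

```python
import socket
import threading

def silent_server(srv):
    """Accept the query but never answer, like a wedged peer process."""
    conn, _ = srv.accept()
    conn.recv(64)
    threading.Event().wait(1.0)  # stay silent past the client's patience
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=silent_server, args=(srv,), daemon=True).start()

c = socket.create_connection(srv.getsockname())
c.sendall(b"are you there?")     # the application-level query
c.settimeout(0.2)                # the application's own patience, not TCP's
try:
    peer_responsive = bool(c.recv(64))
except socket.timeout:
    peer_responsive = False      # give up at this layer, not inside TCP
c.close()
```

TCP delivered the query perfectly well; it is the application that decides how long to wait for an answer and what to do when none comes, which is exactly the division of labor argued for above.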
tcp-ip@ucbvax.ARPA (07/27/85)
From: Ron Natalie <ron@BRL.ARPA> Actually the stupid 4.2 BSD telnet sends a CR-NUL at the end of line which really isn't what the spec had in mind either. -Ron
tcp-ip@ucbvax.ARPA (07/31/85)
From: David C. Plummer in disguise <DCP@SCRC-QUABBIN.ARPA> Folks, we've been through this before, and there is the following large comment in the TCP implementation I wrote for Symbolics:

;; Very important note: This does NOT, and is not intended to,
;; ensure the other side of the connection is alive.  We are
;; not asking for any positive confirmation that this ack was
;; received.  What this IS for is to generate a RESET from the
;; foreign host if the connection is known to be dead.  This
;; issue was discussed on the TCP-IP@SRI-NIC mailing list from
;; 22 Nov 83 and lasting for a few days.
;; This mechanism is also used for the "zero window probe".
(send-ack-for-tcb tcb (get-tcp-segment tcb t) :idle-probe))