[fa.tcp-ip] A Noop Strategy for TCP

tcp-ip@ucbvax.ARPA (07/25/85)

From: ucdla@ucbtopaz.CC


I am wondering if anyone has developed a strategy by which TCP can inquire
about the aliveness of connections at some regular interval (perhaps timer
driven). A problem has occurred for us in TELNET where connections
abort or otherwise go away without being properly closed, and our TELNET
never finds out if there is no activity on the connection. The
INTERNET Implementation workbook suggests having TELNET negotiate some
meaningless option every so often as a way of finding out if the other
side is still there. This would fix our problem, but I would like to fix it
in TCP if possible so that the same fix will apply to all higher-level
protocols that use TCP without the need for them to do it themselves.
Comments and suggestions would be most welcome.
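[The workbook's suggestion amounts to arming an idle timer and sending a
do-nothing TELNET command when it fires. A minimal sketch in modern Python,
purely illustrative -- the IAC and NOP byte values are from the TELNET
specification, but the function name and timer structure are invented here:]

```python
import socket
import time

IAC = 0xFF  # TELNET "Interpret As Command" escape (255)
NOP = 0xF1  # TELNET No-Operation command (241)

IDLE_PROBE = bytes([IAC, NOP])  # two-byte do-nothing sequence

def probe_if_idle(sock, last_activity, idle_limit=300.0):
    """Send a TELNET NOP if the connection has been idle too long.

    A failed send is our only hint at this level that the peer is gone;
    a dead-but-unreset peer will eventually answer the probe with a RST,
    which surfaces as an error on a later operation.
    """
    if time.monotonic() - last_activity < idle_limit:
        return True  # still active, nothing to do
    try:
        sock.sendall(IDLE_PROBE)
        return True
    except OSError:
        return False  # connection is dead; tear it down
```

[Note that, as the replies below point out, this keeps the probing in
TELNET rather than in TCP itself.]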

tcp-ip@ucbvax.ARPA (07/25/85)

From: louie@umd5 (Louis Mamakos)

Please! Don't try to 'fix' TCP.  I don't want my TCP to generate network
traffic on an idle connection.  The TELNET protocol has a method to 
accomplish what you want;  send a TELNET NOOP control sequence, or
negotiate the timing mark option.
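[The timing-mark probe is a three-byte negotiation: a conforming peer must
answer DO TIMING-MARK with either WILL or WONT, so either reply proves the
far end is alive. A rough sketch of the byte-level exchange -- the command
and option numbers are from the TELNET specs; the helper name is invented:]

```python
# TELNET command bytes and the timing-mark option number
IAC, WILL, WONT, DO = 0xFF, 0xFB, 0xFC, 0xFD
TIMING_MARK = 0x06  # TELNET option 6

# What the probing side sends on an idle connection
PROBE = bytes([IAC, DO, TIMING_MARK])

def is_alive_reply(data):
    """Either WILL or WONT TIMING-MARK proves the peer is still there;
    we do not care which answer it gives, only that it answered."""
    return data in (bytes([IAC, WILL, TIMING_MARK]),
                    bytes([IAC, WONT, TIMING_MARK]))
```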

If your application cares about an idle connection, let it probe the remote
host.  
	
If you are running IP over an X.25 PDN where you pay by the packet, having
an idle TCP connection stay idle can save you real dollars.  I'm sure
the CSNET folks can speak to this issue.

Now, if I can just get the stupid user TELNET programs out there that
talk to my line-at-a-time Sperry system to send the CR-LF sequence instead
of just a LF to end the line.  The 4.2 BSD TELNET is a notable offender.
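[The complaint is about the NVT end-of-line convention: end-of-line on the
wire must be the two-byte CR-LF sequence, and a CR not followed by LF must
be sent as CR-NUL. A hypothetical translation a user TELNET could apply to
outgoing text (the function name is invented; the byte rules are from the
TELNET spec):]

```python
def nvt_encode(text):
    """Translate local line conventions to TELNET NVT form."""
    data = text.encode('ascii')
    out = bytearray()
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == 0x0D and i + 1 < len(data) and data[i + 1] == 0x0A:
            out += b'\r\n'; i += 2    # CR-LF passes through intact
        elif ch == 0x0A:
            out += b'\r\n'; i += 1    # bare LF (the 4.2BSD habit) -> CR-LF
        elif ch == 0x0D:
            out += b'\r\x00'; i += 1  # lone CR -> CR-NUL per NVT rules
        else:
            out.append(ch); i += 1
    return bytes(out)
```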

Louis A. Mamakos WA3YMH   University of Maryland, Computer Science Center
 Internet: louie@umd5.arpa
 UUCP: {seismo!umcp-cs, ihnp4!rlgvax}!cvl!umd5!louie

tcp-ip@ucbvax.ARPA (07/25/85)

From: Mark Crispin <MRC@SIMTEL20.ARPA>

     Hear hear!  This is a great idea.  It is one thing to provide
connection continuation across a service interruption (a real win
on satellite links!) but it is quite another to keep an idle
connection around for days until the operator nukes it.

     I suggest using some sort of zero-window probing, and consider
the other end to be down if there is a no-reply condition or if the
hardware level (e.g. 1822) has a "host down" condition and reports
that (this slightly violates the modularity of TCP, but if you're
careful you can make it into some sort of "signal to TCP from
hardware level" which isn't too kludgy).  There should be a minimum
"keep time" for the connection to come back -- say 30 minutes to an
hour -- after which the host should discard the connection even if
it doesn't get a reset from the other host.
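[This scheme -- probe periodically, accept any reply as life, honor a
hardware-level "host down" signal, and discard only after a generous keep
time -- can be sketched as a small state check. The 5-minute probe interval
is illustrative and the keep time splits the suggested 30-60 minute range;
the class and method names are invented for this sketch:]

```python
import time

PROBE_INTERVAL = 5 * 60   # how often to probe an idle connection
KEEP_TIME = 45 * 60       # within the suggested 30-60 minute range

class ConnectionState:
    def __init__(self, now=None):
        now = time.monotonic() if now is None else now
        self.last_reply = now          # last time the peer answered anything
        self.last_probe = now
        self.host_down_signal = False  # e.g. an 1822 "host down" report

    def on_reply(self, now):
        self.last_reply = now

    def should_probe(self, now):
        return now - self.last_probe >= PROBE_INTERVAL

    def should_discard(self, now):
        """Discard if the hardware level says the host is down, or if
        the peer has been silent for longer than the keep time --
        even if no reset ever arrives."""
        return self.host_down_signal or (now - self.last_reply > KEEP_TIME)
```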

     If a host comes back from a service interruption without a
reload, it should immediately probe all its connections to see what
connections are still valid and immediately nuke all the ones which
don't answer or reset.
-------

tcp-ip@ucbvax.ARPA (07/26/85)

From: "J. Spencer Love" <JSLove@MIT-MULTICS.ARPA>

My recommendation is that you send out an unsolicited acknowledgement
every 5 minutes or so when there is no activity of any kind on the
connection.  Activity is defined as any traffic, incoming or outgoing.
Thus, in practice only one end of the connection will be pinging the
other, since the receipt of such an unsolicited acknowledgement would
constitute activity.  If the host at the other end of the connection has
crashed, there will be no response to the ping at the TCP level, and
most TCP implementations don't understand ICMP destination unreachable
messages anyway.  However, when the host comes back up, the unsolicited
ack will elicit a reset, aborting the half-connection.

If your TCP implements ICMP destination unreachable messages, you
probably don't want to actually reset a connection because of one unless
the destination remains unreachable for some period of time, such as the
transmission timeout.  Given a little time, IP routing may find another
route or the gateway may come back up.  Multics uses transmission
timeouts of typically one minute.

Pinging is in disrepute these days because of network loading which it
causes.  On the good side, you are only pinging connections which are in
the established state, and if the clocks of the systems are not perfectly
in step (well, "clocks" is an oversimplification) and the packet
dropping rate is low, only one system will be doing the pinging every
ping interval.  If different timeouts are used based on whether the
activity was transmission or reception, the systems can take turns doing
the pinging.  The ping interval can be quite long, perhaps several
minutes, since it is not constrained to be less than a transmission
timeout interval.  On the bad side, you are pinging for each connection,
not just for each host or even first hop gateway.
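[The turn-taking trick above is simply to use a shorter idle timeout after
receiving than after sending: the end that heard from its peer most
recently pings first, and its ping counts as activity that resets the
other end's timer. A toy model of the rule -- the interval values and the
function name are invented for illustration:]

```python
SEND_IDLE_LIMIT = 7 * 60   # wait longer if our own traffic came last
RECV_IDLE_LIMIT = 5 * 60   # ping sooner if the peer's traffic came last

def should_ping(now, last_sent, last_received):
    """Ping only when the whole connection is idle; the receiver-side
    limit is shorter, so normally only one end ever fires."""
    last_activity = max(last_sent, last_received)
    if last_received >= last_sent:
        return now - last_activity >= RECV_IDLE_LIMIT
    return now - last_activity >= SEND_IDLE_LIMIT
```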

The rationale for leaving the pinging to the next higher level protocol
(telnet negotiations) is that then only telnet connections will present
this additional load on the network.  I think this is a crock of your
least favorite substance.  This means that ANY protocol or application
which is potentially idle for long periods must reinvent this wheel.  I
think that any application with humans in the loop will require this,
and that most other applications (e.g., SMTP, FTP data connections,
finger) do not have TCP connections sitting idle in both directions for
extended periods of time anyway.  A reliable stream protocol should be
more reliable than TCP is at telling its clients that the connection has
gone away.

Most real TCP implementations have ways of passing this information
around.  For example, if the client is blocked on read it has to handle
an abort, and a telnet user has to be able to receive urgent data to
implement Interrupt Process, even though these mechanisms are not
formally part of the TCP spec.

Multics does not do this pinging (yet?), and I know of no implementation
which does.  Would anyone care to offer additional reasons why
implementations should NOT do this, and why the TCP spec shouldn't be
amended?

tcp-ip@ucbvax.ARPA (07/26/85)

From: Bob Walsh <walsh@BBN-LABS-B.ARPA>


TCP is designed to provide a reliable transport layer.  It is NOT designed
to ensure application <-> application reliability.  Robustness of application
communication is the responsibility of the application.  Mechanisms have
been tried to solve this problem.

For example, the Berkeley 4.2BSD TCP periodically
sends a byte just beyond the window in an attempt to force an ack when the
connection is otherwise idle.   But not every system is a 4.2BSD host, and
what happens when it communicates with a generous implementation that
recognizes the high per packet overhead (interrupt processing, header checking,
and network resources) and buffers the incoming packet?  A garbage byte
may make its way into the stream.  And yet, the receiving TCP has made no
mistake.
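[The hazard can be made concrete with the receive-side sequence check. A
strict receiver rejects a segment whose data lies wholly outside the
window (the probe byte then merely forces an ACK), while a "generous"
receiver that buffers it would inject a garbage byte into the stream. A
toy model with simplified sequence arithmetic -- no wraparound, and the
function name is invented:]

```python
def strict_accepts(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """Strict acceptability test: some part of the segment's data must
    fall inside the receive window [rcv_nxt, rcv_nxt + rcv_wnd)."""
    if seg_len == 0:
        return rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    first_ok = rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    last_ok = rcv_nxt <= seg_seq + seg_len - 1 < rcv_nxt + rcv_wnd
    return first_ok or last_ok
```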

Don't try to change the protocol, or you may run into problems.
Interoperability is one of the benefits of standards.  The RDP protocol
specifiers recognized this problem and addressed it through the use of
NULL messages to which all implementations must respond.  They also
recognized that the application layer must be responsible for end to end
reliability and forced the developer to keep this in mind through RDP's
abrupt closing method.

bob walsh

tcp-ip@ucbvax.ARPA (07/26/85)

From: imagen!geof@su-shasta.ARPA


It seems that every now and then someone new brings up the topic of
keep-alives again.  I guess that I'll be the one (probably of many) to
answer this time.

TCP is a general purpose protocol, which is used over networks where
bandwidth is cheap and over networks where bandwidth is expensive. 
Since a major goal of the Internet protocol is to abstract the exact
nature of the network(s) in use, it is almost always impossible for a
host to know what networks a TCP connection is using (Indeed, the
connection may be using more than one set of networks).

The only way to verify that a TCP connection is working is to send a
packet over the connection.  On some networks, this costs money.  On
others it costs processing time in loaded gateways.  In short, it would be
unacceptably expensive for TCP implementations that use SOME parts of
the Internet to probe the connection every so often just to verify that
the connection is still there.  Since an implementation has no way of
knowing what parts of the Internet it is using, this sort of probe was
left out of TCP-IP.

This means that probing the connection is left as a higher-level issue,
one that TCP explicitly does not deal with.  This is actually a good
idea for another reason.  Some protocols don't need to verify that an
idle connection is still around, because they always push data through
that connection.  Other implementations don't need to determine that
the connection has gone down until they are actually needed again.  For
example, a TAC never times out its TCP connection -- it relies on the
human user (who is always present) to decide for himself that the
connection is dead.

If you need to generate timeouts on your telnet programs, the place to
do it is within telnet itself.  Now you can understand why the telnet
specification suggests sending bogus option negotiations (the other
thing to do is to send AYT's, but some hosts do silly things with them,
like converting them into random control characters that are sent to an
application).

One more thing: at MIT we used to have problems with gateways crashing.
Sometimes (in the early days) a gateway would crash for ten or fifteen
minutes.  How elated we were that TCP does not send keep-alive probes,
since they would have determined (in their cleverness) that the
connection should have been reset, while we human users had higher level
information (Mr. Chiappa yelling down the hall) that the connection
would soon return to good health.

- Geof

tcp-ip@ucbvax.ARPA (07/26/85)

From: CERF@USC-ISI.ARPA


The design of TCP explicitly did NOT introduce an are-you-there
automatic facility. We reasoned that the process above the TCP
level did not care what the condition of the other end was until
it had some data to send. If the process sent data (or a request
or query or a task initiation etc.) then it MIGHT care about
response. Certainly in that case, TCP tries to send repeatedly
and reports back when it cannot get an ACK within the specified
amount of time.  Once the data is accepted (ACKED), only the
process using TCP knows when it should get impatient about getting
an answer. TCP certainly doesn't know.  

If the using process does have a time out after which it really
wants an answer, that is the right time for it to query the
other side (e.g. send a query at the protocol layer above TCP)
and find out what its condition is.

It is fair to argue about reinventing the wheel for each protocol
above TCP, but it was our thought at the time that each process
or application would have different levels of patience regarding
when to get nervous about not hearing a response. So we left out
that feature in TCP, not wanting to impose something arbitrary on
the next level of protocol up.

Vint Cerf

tcp-ip@ucbvax.ARPA (07/27/85)

From: Ron Natalie <ron@BRL.ARPA>

Actually the stupid 4.2 BSD telnet sends a CR-NUL at the end of line
which really isn't what the spec had in mind either.

-Ron

tcp-ip@ucbvax.ARPA (07/31/85)

From: David C. Plummer in disguise <DCP@SCRC-QUABBIN.ARPA>

Folks, we've been through this before, and there is the following large
comment in the TCP implementation I wrote for Symbolics:

	  ;; Very important note:  This does NOT, and is not intended to,
	  ;; ensure the other side of the connection is alive.  We are
	  ;; not asking for any positive confirmation that this ack was
	  ;; received.  What this IS for is to generate a RESET from the
	  ;; foreign host if the connection is known to be dead.  This
	  ;; issue was discussed on the TCP-IP@SRI-NIC mailing list from
	  ;; 22 Nov 83 and lasting for a few days.
	  ;; This mechanism is also used for the "zero window probe".
	  (send-ack-for-tcb tcb (get-tcp-segment tcb t) :idle-probe))