rws@EXPO.LCS.MIT.EDU (05/23/89)
I have a random question that I hope this illustrious audience can answer definitively for me (or else point me to a definitive source). Is the BSD notion of SO_KEEPALIVE on a TCP connection considered kosher with respect to the TCP specification? If so, is its use to be encouraged? Specifically, it has been suggested that in the X Window System world, X libraries should automatically be setting SO_KEEPALIVE on connections to X servers. Is this a reasonable thing to do? [If this is a totally inappropriate forum for this question, I apologize.]
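(For concreteness: under 4.3BSD sockets, what is being proposed is a one-line option on each connection.  A minimal sketch, with an illustrative function name rather than any real Xlib entry point:)

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Sketch only: enable TCP-level keepalives on a connected socket.
     * "x_set_keepalive" is an illustrative name, not a real Xlib call. */
    int x_set_keepalive(int fd)
    {
        int on = 1;
        return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE,
                          (char *)&on, sizeof on);
    }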
dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) (05/23/89)
The use of Keepalives is terrible, but sometimes necessary.  The key word, here, is "sometimes".

The "terrible" is due to the fact that they add traffic to the net.  An important point to keep in mind, with TCP connections, is that they may span the globe, over thin wires.  Extra traffic can have a very serious effect.  Further, they scale poorly.  The incremental traffic from one connection may not be onerous, but what about 1000 connections?  Lastly, of course, there is the small fact that there may be a charge for those extra packets, such as may happen if one of the links along the path is over a public X.25 network.

If the group proposing the use of Keepalives has already gone through the exercise of convincing themselves that critical functionality will be lost if they are not used, then I hope the next question was/is how to minimize their use.

Dave
craig@NNSC.NSF.NET (Craig Partridge) (05/23/89)
> I have a random question that I hope this illustrious audience can answer
> definitively for me (or else point me to a definitive source).  Is the BSD
> notion of SO_KEEPALIVE on a TCP connection considered kosher with respect to
> the TCP specification?  If so, is its use to be encouraged?  Specifically,
> it has been suggested that in the X Window System world, X libraries
> should automatically be setting SO_KEEPALIVE on connections to X servers.
> Is this a reasonable thing to do?

Oh what fun!  Keepalive wars return....

Well, I'm a firm hater of keep-alives, although Mike Karels has persuaded me that in the current world they are a useful tool for catching clients that go off into hyperspace without telling you.  I have lots of fellow travellers (actually, I'm probably a fellow traveller with Phil Karn, president of the "I hate keep-alives" party), witness the current host requirements text, which is appended.

Craig

    Implementors MAY include "keep-alives" in their TCP
    implementations, although this practice is not universally
    accepted.  If keep-alives are included, the application MUST
    be able to turn them on or off for each TCP connection, and
    they MUST default to off.

    Keep-alive packets MUST NOT be sent when any data or
    acknowledgement packets have been received for the
    connection within a configurable interval; this interval
    MUST default to no less than two hours.

    An implementation SHOULD send a keep-alive segment with no
    data; however, it MAY be configurable to send a keep-alive
    segment containing one garbage octet, for compatibility
    with erroneous TCP implementations.

    DISCUSSION:
         A "keep-alive" mechanism would periodically probe the
         other end of a connection when the connection was
         otherwise idle, even when there was no data to be sent.
         The TCP specification does not include a keep-alive
         mechanism because it could:  (1) cause perfectly good
         connections to break during transient Internet
         failures; (2) consume unnecessary bandwidth ("if no one
         is using the connection, who cares if it is still
         good?"); and (3) cost money for an Internet path that
         charges for packets.

         Some TCP implementations, however, have included a
         keep-alive mechanism.  To confirm that an idle
         connection is still active, these implementations send
         a probe segment designed to elicit a response from the
         peer TCP.  Such a segment generally contains SEG.SEQ =
         SND.NXT-1.  The segment may or may not contain one
         garbage octet of data.  Note that on a quiet
         connection, SND.NXT = RCV.NXT and SEG.SEQ will be
         outside the window.  Therefore, the probe causes the
         receiver to return an acknowledgment segment,
         confirming that the connection is still live.  If the
         peer has dropped the connection due to a network
         partition or a crash, it will respond with a reset
         instead of an acknowledgement.

         Unfortunately, some misbehaved TCP implementations fail
         to respond to a segment with SEG.SEQ = SND.NXT-1 unless
         the segment contains data.  Alternatively, an
         implementation could determine whether a peer responded
         correctly to keep-alive packets with no garbage data
         octet.

         A TCP keep-alive mechanism should only be invoked in
         network servers that might otherwise hang indefinitely
         and consume resources unnecessarily if a client crashes
         or aborts a connection during a network partition.
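(To make the DISCUSSION's arithmetic concrete, here is a toy illustration of why the probe works; a real TCP compares sequence numbers modulo 2^32, which this sketch ignores:)

    #include <stdio.h>

    int main(void)
    {
        unsigned long snd_nxt = 4000;        /* prober's next send sequence */
        unsigned long rcv_nxt = snd_nxt;     /* quiet connection: both sides agree */
        unsigned long seg_seq = snd_nxt - 1; /* sequence number of the keep-alive */

        /* The probe falls just below the peer's receive window, so it is
         * not acceptable as data; a live peer answers with a bare ACK,
         * while a peer that has lost the connection answers with a RST. */
        if (seg_seq < rcv_nxt)
            printf("probe rejected; live peer ACKs with RCV.NXT = %lu\n", rcv_nxt);
        return 0;
    }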
hrp@boring.cray.com (Hal Peterson) (05/24/89)
Here are a couple of relevant extracts from section 4.2.3.5 of the 17 May draft of the Requirements for Internet Hosts RFC:

         A "keep-alive" mechanism would periodically probe the
         other end of a connection when the connection was
         otherwise idle, even when there was no data to be sent.
         The TCP specification does not include a keep-alive
         mechanism because it could:  (1) cause perfectly good
         connections to break during transient Internet
         failures; (2) consume unnecessary bandwidth ("if no one
         is using the connection, who cares if it is still
         good?"); and (3) cost money for an Internet path that
         charges for packets.

         [ . . . ]

         A TCP keep-alive mechanism should only be invoked in
         network servers that might otherwise hang indefinitely
         and consume resources unnecessarily if a client crashes
         or aborts a connection during a network partition.

Bob Braden points out that one of the design goals of TCP/IP was and is robustness in the face of errors: even if a few gateways melt down, the TCP connections that had been using them should pick up where they left off when new routes materialize.  Keepalives, by design, defeat exactly this robustness.  The pros and cons, however, are subject to some disagreement.

--
Hal Peterson                      Domain:    hrp@cray.com
Cray Research                     Old style: hrp%cray.com@uc.msc.umn.edu
1440 Northland Dr.                UUCP:      uunet!cray!hrp
Mendota Hts, MN  55120  USA       Telephone: +1 612 681 3145
mo@prisma.UUCP (05/24/89)
If Keepalives are not (judiciously!) used, how does one transparently discover that the other end of the connection has died a horrible, sudden death?  One can argue whether this is a transport or session function, but the ability to lose one end of the connection while the passive end just hangs forever is NOT a feature.

	-Mike
dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) (05/24/89)
I tried to avoid saying that keepalives should be prohibited, except, perhaps, from an aesthetic point of view.  Since aesthetics often are altered by reality, it is no great concession to acknowledge the occasional need for the mechanism.  My point was that they are dangerous and therefore should be used VERY judiciously.  Craig's note puts this point forward in more detail.

It is worth adding that the excessive use of keepalives has removed a feature that used to be in TCP and has been recently re-documented by Bob Braden:  TCP used to be remarkably robust against temporary outages.  If you were willing to wait, so was TCP.  Now, an outage of a very short time -- on some implementations, as short as 1-2 minutes -- will abort the connection.

Dave
casey@gauss.llnl.gov (Casey Leedom) (05/24/89)
| From: dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker)
|
| If the group proposing the use of Keepalives has already gone through the
| exercise of convincing themselves that critical functionality will be
| lost if they are not used, then I hope the next question was/is how to
| minimize their use.

I think that the big problem that Robert may be trying to deal with is server crashes.  (Correct me if I'm totally off the deep end, Robert.)  Currently, when an X.V11R3 server crashes or simply exits for normal reasons while there are still clients using it, those clients will (typically) lie around forever, because they never try to contact the server on their own and they never receive anything from the [now defunct] server.

[One exception to this is the "xperfmon" client, which periodically attempts to update a system statistics display.  When the X server disappears, xperfmon starts gobbling up reams of CPU time, not recognizing the closed connection for what it is.  But this is just a coding error.]

I would say that any X client which only tries to use its connection to the server in response to input from the server should run with keep-alives on the connection.  Otherwise it will never exit.  I'm constantly having to go around killing off abandoned xterms because some people just can't remember to terminate all their clients before shutting down their server.

Casey
rws@EXPO.LCS.MIT.EDU (05/24/89)
Thanks for all the responses so far, I think I get the picture.
braden@VENERA.ISI.EDU (05/24/89)
I don't believe anyone has advocated that keep-alives are a bad thing... indeed, they appear to be a necessity in an imperfect world.  The controversy (for the past 10 years, at least!) is whether or not they belong in TCP.

The decision of the TCP/IP developers was that keepalives ought to be in the application layer, not the transport layer.  Each application has its own parameters for keepalive.  Furthermore, cautious application implementors may already have application-level keepalives, and economy of protocol mechanism argues for having the functionality at only one level.

On the other hand, one can (and some people do) argue that economy of mechanism requires that TCP provide a keepalive mechanism that may be invoked and parametrized by an application.  The Host Requirements RFC explicitly allows that.

Bob Braden
mre@beatnix.UUCP (Mike Eisler) (05/25/89)
In article <8905231205.AA00500@expire.lcs.mit.edu> rws@EXPO.LCS.MIT.EDU writes:
>I have a random question that I hope this illustrious audience can answer
>definitively for me (or else point me to a definitive source).  Is the BSD
>notion of SO_KEEPALIVE on a TCP connection considered kosher with respect to
>the TCP specification?  If so, is its use to be encouraged?  Specifically,
>it has been suggested that in the X Window System world, X libraries
>should automatically be setting SO_KEEPALIVE on connections to X servers.

When we brought up X on our BSD systems we tested it against a Visual Graphics 640 X-term.  xterm was set up to be spawned by init.  When the Visual was powered off during a connection, a new xterm wouldn't get respawned.  Analysis of the BSD client showed the old xterm connection intact, and the xterm process waiting for a message from the Visual which it would never get.

We figured KEEP alives would solve the problem and put them into the X library.  We found that this cured the problem when the Visual was powered off for a long time; the KEEP alives eventually timed out waiting for a response.  But for a quick power-off/power-on, KEEPs didn't help.

KEEPs are implemented as 1-byte segments containing rcv_next-1, snd_una-1 as the ACK and SEQ number values (i.e., a 1-byte segment that the segment's receiver has already acknowledged, containing an ACK sequence # for a byte that the segment's sender has already received).  The Visual is listening for an X connection, and as expected responds with a 0-byte reset, using rcv_next-1 as the SEQ number value.  After getting the reset, BSD resets the KEEP alive timer because it has "proof" that the connection is no longer idle.  BSD then proceeds to follow the instructions of section 9.2.15.2 "Reset processing" in MIL-STD-1778 (12 Aug 83):

    " ... A reset is valid if its sequence number is in the
    connection's receive window. ... "

Well, rcv_next-1 is not in the xterm client's window, so the reset is tossed, *after* the KEEP timer was reset.  So the BSD client sends another KEEP a few seconds later and the process repeats itself.  So we don't get a connection reset, and we don't even get a connection timeout as a consolation prize.

I suppose we could have "fixed" the BSD code to not reset the KEEP timer on resets, but we wanted to have something that would work in the field on existing versions of our O/S.  We hacked xterm to send the NOP request of the X protocol to the server every so often, and this has the desired effect (I'm putting on my asbestos suit now...) of getting the immediate reset from the Visual, *within* the client's window.

The KEEP alive feature doesn't seem that well thought out.  Nor does server crash recovery seem well thought out in X.

	-Mike Eisler
	{uunet,sun}!elxsi!mre
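(The xterm hack Mike describes amounts to something like the following sketch; this is not the actual xterm change, and the ten-minute interval is arbitrary:)

    #include <X11/Xlib.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Called by Xlib when the server connection dies; must not return. */
    static int lost_server(Display *dpy)
    {
        fprintf(stderr, "X server connection lost; exiting\n");
        exit(1);
    }

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL)
            return 1;
        XSetIOErrorHandler(lost_server);
        for (;;) {
            XNoOp(dpy);   /* queue an X protocol NoOperation request */
            XFlush(dpy);  /* push it onto the wire; a dead server shows up
                           * here as an I/O error, i.e., the in-window reset */
            sleep(600);   /* probe every ten minutes (arbitrary choice) */
        }
    }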
phil@ux1.cso.uiuc.edu (05/25/89)
>         A TCP keep-alive mechanism should only be invoked in
>         network servers that might otherwise hang indefinitely
>         and consume resources unnecessarily if a client crashes
>         or aborts a connection during a network partition.

Even this should be unnecessary for servers that have a specific timeout, e.g. FTP or SMTP will drop on you if you are idle too long.

--Phil howard--  <phil@ux1.cso.uiuc.edu>
barmar@think.COM (Barry Margolin) (05/25/89)
In article <8905250638.AA21706@ucbvax.Berkeley.EDU> dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) writes:
>It is worth adding that the excessive use of keepalives has removed a
>feature that used to be in TCP and has been recently re-documented by
>Bob Braden:  TCP used to be remarkably robust against temporary
>outages.  If you were willing to wait, so was TCP.  Now, an outage of
>a very short time -- on some implementations, as short as 1-2 minutes --
>will abort the connection.

I dispute this claim.  TCP is only robust against temporary outages if you don't try to use the connection during that period.  For instance, if I'm using telnet, the connection will stay alive during outages if I don't type anything to the client and the host doesn't try to send any output.  If either end tries to use the connection, and the outage is longer than the TCP acknowledgement timeout, then the connection will die.  If I happen to know that the network is having trouble I won't type anything, but how often is this the case?  What it mostly means is that a temporary outage after I go home won't break my connections.

TCP's robustness is still a good idea.  It's nice to be able to swap Ethernet cables without causing all the network connections to die.  But in my experience (which, I admit, isn't all that extensive), any connection that dies for more than a minute or two probably isn't going to come back.  What I mostly care about, though, is learning that the other end has definitely reinitialized, e.g. it has crashed and been rebooted.  If it's a telnet server that crashed, I can find out by typing into the client, which will provoke a reset, and the client will abort.  But if it's the telnet client or an X server that died, there's often no way to force the other end to try to send something so it will get a reset.

I think the right solution is a compromise.  What's needed is a way to send a segment with infinite (or near-infinite, e.g. hours or a day) retransmissions and a slow retransmit rate (one to two minutes).  This would allow idle connections to stay up across most network failures, but they would die within a minute or so of the other end rebooting.  And, of course, it should be optional, so that applications that perform frequent output of their own need not compound their network use (although since keepalives need only be sent when there are no normal packets in the retransmit queue, any application whose output rate is higher than the keepalive rate will never invoke the keepalive mechanism).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
MRC@CAC.WASHINGTON.EDU (Mark Crispin) (05/26/89)
I like Barry Margolin's suggestion a lot.  I am responsible for several servers which have autologout timers *solely* to handle the case of a client getting rebooted with no mechanism for the server to get a reset.  It has proven virtually impossible to pick a timer value short enough to be of use (particularly if a resource is locked up while the server lives) yet long enough to survive some of the delays and temporary outages we see on the operating network.

-- Mark --
PADLIPSKY@A.ISI.EDU (Michael Padlipsky) (05/26/89)
In the context of crashes/reboots, the TCP Initial Sequence Number magic is SUPPOSED to save you from "embarrassment" (or so it says here).  In the context of Telnet, periodic phoney traffic is completely counter to the desire/necessity for certain Hosts to abort inactive connections (lest, e.g., a terminal be borrowed by an unauthorized user when the authorized user broke for coffee too long) -- unless, of course, such traffic is never "seen" by the relevant timer-outer.

cheers, map

Past President, IHK-A's (unless Phil Karn started agitating against 'em before I wrote what became p. 151 of The Book)
karn@jupiter (Phil R. Karn) (05/26/89)
>>It is worth adding that the excessive use of keepalives has removed a
>>feature that used to be in TCP and has been recently re-documented by
>>Bob Braden:  TCP used to be remarkably robust against temporary
>>outages. [...]

>I dispute this claim.  TCP is only robust against temporary outages if
>you don't try to use the connection during that period.

TCP becomes quite robust against all outages (whether or not the connection is idle) once you make a very simple change: get rid of TCP-level timeouts!  I feel very strongly that TCP should *never* just give up of its own accord; that decision belongs to the application.  And, in the event the application is an interactive one, the decision to abort should be left to the human user.  If he's willing to wait, why shouldn't the system let him?  (The only case when TCP should abort a connection on its own is when it has clear proof that the other end has crashed, i.e., by receiving a valid RST.)

Users of my TCP/IP package on amateur packet radio occasionally report cases of FTP transfers that resume automatically after network outages lasting for *days* (e.g., those due to crashes of network nodes in remote locations that require manual resets).  They are most happy to do without TCP give-up timers, as long as TCP backs off its retransmissions to avoid channel congestion.

Phil
barmar@THINK.COM (Barry Margolin) (05/26/89)
    Date: Thu, 25 May 89 13:32:04 PDT
    From: braden@venera.isi.edu

    Sorry, but Dave Crocker is perfectly correct.  The behaviour that
    you describe is a property of many current-generation LAN-oriented
    TCP's [a transparent euphemism], but not of the original research
    TCP's that were WAN-oriented ... nor even of a TAC.  A host
    implementation that follows the Host Requirements RFC can behave
    like a TAC for Telnet connections: tell the user when it is
    retransmitting excessively, but DO NOT CLOSE the connection.  Let
    the user decide when to give up.  I don't think we users should
    accept anything less of our communication software.

RFC-793, which defines TCP, says, "If data is not successfully delivered to the destination within the timeout period, the TCP will abort the connection."  I can believe that the Host Requirements RFC changes "abort the connection" to "signal an error", but this contradicts your claim that the original TCPs were more forgiving.

Also, how is a TELNET server or xterm client supposed to tell the user when it is retransmitting excessively?  Its communication path to the user is the failing connection.  Sure, it could put something in a system log or write a message to the system console, but how is the operator (if there is one) supposed to know why the remote machine isn't responding?

barmar
dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) (05/26/89)
The issue of aborting a connection, due to a retransmission timeout, is the choice of the application.  Telnet could, as easily, decide to keep trying.

Dave
davecb@yunexus.UUCP (David Collier-Brown) (05/26/89)
In article <20761@news.Think.COM> barmar@kulla.think.com.UUCP (Barry Margolin) writes:

|  TCP's robustness is still a good idea.  It's nice
| to be able to swap Ethernet cables without causing all the network
| connections to die.  But in my experience (which, I admit, isn't all
| that extensive), any connection that dies for more than a minute or
| two probably isn't going to come back. [...]

Actually the connection might well come back: I had a crossbar switch that timed me out every so often if I left the terminal for substantial periods without disconnecting.  (This is silly, but not unreasonable for a device which thinks it's switching telephone voice lines.)  After I got back to the terminal controller I could then reconnect to my process.  Keepalives would be more secure in such a situation (anyone could pretend to be me if the tty server disconnected), but would tend to cause me to lose work-in-process...

Methinks that a facility for polling a connection, as well as one to send "reset the poll clock, if any" (keepalives redux), would be useful.  As does Barry, I'd propose they be optional.  I'd also propose that:
 1) if one exists, so must its complement,
 2) they be composed out of existing facilities, as were keepalives, and
 3) they be distinguishable from any other facility (unambiguous).

--dave
dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) (05/26/89)
Phil,

As a test-of-concept: I assume that you have no objection to a TCP implementation's being able to do keepalives, under the control of the application, where both the fact of keepalives AND their periodicity can be specified; and the effect of a timeout is a signal to the application, not an abort?

Dave
barns@GATEWAY.MITRE.ORG (Bill Barns) (05/26/89)
Sigh.  The Tenex TCP of ages ago certainly allowed the user timeout to be set infinite, by specifying a value of 0.  If there are older ones than that, I don't think they can be much older!  I think the claims about the old TCP's having this capability are grounded in fact.

However, it just may be that Bob & Dave fell into a trap here.  I almost wrote the message they both wrote, but went off to check RFC 793 and I did not locate any text stating that an RFC 793 conformant TCP necessarily has to provide any particular range of user timeout settings, except that the default is five minutes.  And I was just SURE it was there.  Oops.  The designers probably had it so firmly in their minds from prior discussions that they forgot to write it down explicitly.  THAT sounds like a job for **!!HOST REQUIREMENTS MAN!!**

However**2, TCP is ALSO specified in MIL-STD-1778, and it DOES have an explicit requirement for the TCP to allow the upper layer to choose whether a user timeout should result in a notification to the upper layer or should cause the TCP to abort the connection.  This is referred to in many places, but the most coherent description is in section 9.2.9.  For your convenience, I've appended it below.

Sad to say, there are (other) inconsistencies within the MIL-STD and between it and the RFC.  The MIL-STD, section 9.4.4.7, sets the default timeout as 120 unidentified units.  Obviously 24ths of a minute... etc.

Bill Barns / MITRE-Washington / barns@gateway.mitre.org

-------
[MIL-STD-1778]

    9.2.9  ULP timeout and ULP timeout action.  The timeout allows a
    ULP to set up a timeout for all data submitted to the TCP entity.
    If some data is not successfully delivered to the destination
    within the timeout period, the state of ULP_timeout_action is
    checked.  If ULP_timeout_action is 1, the TCP entity will terminate
    the connection.  If it is 0, the TCP entity informs the ULP that a
    timeout has occurred, and then resets the timer.

    The timeout appears as an optional parameter in the open request
    and the send request.  Upon receiving either an active open
    request, or a SYN segment after a passive open request, the TCP
    entity must maintain a timer set for the interval specified by the
    ULP.  As acknowledgments arrive from the remote TCP, the timer is
    cancelled and set again for the timeout interval.

    As parameters of the SEND request, timeout and timeout_action can
    change during connection lifetime.  If the timeout is reduced below
    the age of data waiting to be acknowledged, the event dictated by
    ULP_timeout_action will occur.  The implementor may choose to allow
    additional options when informing the ULP in case of a timeout; for
    example, informing the ULP only on the first timeout.
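(Paraphrased in code, the 9.2.9 rule is simply the following; the names are mine, not the MIL-STD's:)

    #include <stdio.h>

    struct ulp_params {
        long timeout;       /* seconds, settable via OPEN and SEND */
        int  abort_action;  /* ULP_timeout_action: 1 = abort, 0 = notify */
    };

    /* Sketch of what a conforming TCP does when the ULP timer expires. */
    void ulp_timer_expired(struct ulp_params *p)
    {
        if (p->abort_action)
            printf("terminate the connection\n");
        else
            printf("inform the ULP of the timeout, rearm for %ld s\n",
                   p->timeout);
    }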
karn@THUMPER.BELLCORE.COM (Phil R. Karn) (05/27/89)
Dave,

Yes, that might be acceptable to me.  I'd go a little further, though, and say that a REMOTE USER (not just the application code) must always be able to turn off keepalives, even on binary-only systems.  It does no good to say "the application must be able to disable keepalives" when I'm having problems with a remote server that I have no administrative control over.

Much of my animosity toward keepalives came from trying to make a Sun workstation work properly over SLIP links and amateur packet radio.  I finally replaced the TCP object modules provided by Sun with ones compiled from Van's latest TCP, which I had already edited to disable keepalives.  Works like a charm.

At the last InterOp, I sat next to Dave Borman in a panel session on TCP performance.  Between us, we represented a "dynamic range" of about 6 orders of magnitude in TCP transfer rates (1200 bps amateur packet radio to 500 Mbps between Crays).  This is an exceptional achievement for a single networking protocol, but it was possible only because TCP was designed from the beginning to scale well over a wide network performance range.  But broken mechanisms like keepalives threaten this.  We need a big red warning light that flashes whenever someone proposes to put a fixed time interval into a protocol spec, because you can't scale protocols that have arbitrary timers.

Phil
barr@frog.UUCP (Chris Barr) (05/27/89)
In support of 'no timeouts at TCP layer': The first explanation I ever heard of TCP/IP was that it was designed (for DOD) to survive battle conditions where connections were expected to break and later be restored. I then used someone's Telnet which broke a session after 15 minutes without keystrokes.
CERF@A.ISI.EDU (05/29/89)
When TCP was first designed, and for all subsequent versions, it was thought inappropriate to impose any kind of semantics on the logical connections established by TCP.  In particular, no sense of absolute timeout for the severing of a connection was desired.  We thought that such notions of "impatience" or "time to give up" ought to be the choice of the upper-level protocol using TCP as the basis merely for reliable delivery.

A part of this view stemmed from the fact that the networks over which TCP had to function, for the DoD applications we had in mind, were potentially very unpredictable as to loss and delay.  Mobile packet radio systems had to function under jamming and radio shadow effects, for instance.

TCP never unilaterally severed connections but only reported failure to achieve positive acknowledgement after a time which could be controlled by the application or upper-level protocol.  It was up to the application to decide whether to sever the connection and, even then, the choice to do so gracefully or abruptly was also left to the application.

The use of a feature (X-level NOP) to test the liveness of a TCP connection is consonant with the model against which the TCP was designed.

Vint Cerf
mo@prisma.UUCP (05/30/89)
I hear you, Bob, but I, for one, don't think it reasonable for every application protocol developer to have to reinvent all the common stuff of doing keep-alives at the application level.  According to the advertising copy, TCP provides reliable virtual circuits.  In my book, knowing that the other end has croaked is part of the definition of "reliable."

Since this is a mechanism that is going to have to be reinvented by lots of protocols, it makes sense to get it right ONCE so people (1) don't have to reinvent all the bugs and (2) can just use it for what they really want to be doing.  The notion that protocols are only designed by "mavens" is long dead, and rightly so.

	-Mike
jas@proteon.com (John A. Shriver) (06/01/89)
The user (client) Telnet in the MIT UNIX V6 TCP/IP (one of those pre-Bezerkely WAN TCP/IP's) would periodically print:

    Host not responding, type ^^q to quit

on the user's terminal when (and only when) it had outstanding data to send and could not get it acknowledged.  If you had reason to believe it was right, you aborted the connection.  Otherwise, it sat there retransmitting at a slow rate until connectivity was regained.  Meanwhile, you would go and fix the broken router, and *would not lose your current session* on the remote host.

Now, if the server Telnet gets into a pickle, it would probably just abort and die.  That UNIX lacks any way to preserve a login session is its problem; MIT AI ITS (on PDP-10's) knew exactly how to preserve your state when this happened.  Of course, most systems are not in the habit of generating unsolicited output, so this didn't happen as often.
jqj@HOGG.CC.UOREGON.EDU (06/02/89)
Seems to me that much of this discussion is missing the point that an open TCP connection (especially a telnet session) can tie up expensive resources on the server; most of the recent discussion has focussed on the problems of a user who may or may not want to abort a connection on network or remote host failure.  For example, many timesharing systems charge based on "connect time", and some even enforce a maximum number of outstanding sessions.  In such cases it is in the interest of the user and the system to abort a telnet session if there is reason to believe that loss of connectivity is not just briefly transient.

One can obviously do this with a (perhaps user-settable) timeout, but are there other heuristics that might usefully be used as well?  Does anyone have any data on the distribution of time-length of network partitions?  How, for that matter, might we define a network partition?  Many events (e.g. the TR card in our NSS going bad) yield obvious network partitions with well-defined lengths.  Others, e.g. a degraded-quality line, may imply very short (a few ms or s) partitions, which increase the errors and retransmissions and ultimately imply an unusable TCP connection.  Can we come up with an analytic model that includes both sorts of failures?
stev@VAX.FTP.COM (06/02/89)
*Phil,
*
*As a test-of-concept: I assume that you have no objection to a TCP
*implementation's being able to do keepalives, under the control of the
*application, where both the fact of keepalives AND their periodicity
*can be specified; and the effect of a timeout is a signal to the
*application, not an abort?
*
*Dave

if an application wants a keep alive mechanism, it should do it itself, sending a byte of garbage data; abusing the sequence numbers is not the way to go about this . . . .

and hopefully, the people doing the keepalive mechanism will allow either end to disable it.  if i start up an ftp to run all night sucking over the latest X distribution, i dont want it being aborted because a gateway goes down for an hour for PM.

stev knowles
ftp software
stev@ftp.com
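(For a Telnet-style protocol, "doing it yourself" is about a dozen lines.  A sketch, assuming a connected socket and using the protocol's own no-op, IAC NOP, rather than garbage data:)

    #include <sys/types.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Application-level keepalive: if the connection stays idle for
     * `interval` seconds, send a Telnet NOP.  Returns -1 once the write
     * fails, i.e., once the peer is provably gone. */
    int app_keepalive(int fd, int interval)
    {
        static const unsigned char iac_nop[2] = { 255, 241 }; /* IAC, NOP */
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = interval;
        tv.tv_usec = 0;

        if (select(fd + 1, &rfds, (fd_set *)0, (fd_set *)0, &tv) == 0)
            return write(fd, iac_nop, sizeof iac_nop) == 2 ? 0 : -1;
        return 0;  /* peer data (or EOF) pending: let the caller read it */
    }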
MAP@LCS.MIT.EDU (Michael A. Patton) (06/08/89)
    From: prisma!mo@uunet.uu.net
    Date: Tue, 30 May 89 08:07:02 -0600

    [...] According to the advertising copy, TCP provides reliable
    virtual circuits.  In my book, knowing that the other end has
    croaked is part of the definition of "reliable."

But just because you aren't getting replies does NOT mean the other end "croaked", just that something did.  If it's internal to the network, it should recover and you can continue.  The indication that the other end "croaked" is receiving a RST!  How an application deals with being temporarily partitioned has to be up to that application.  There are just too many possibilities.

    Since this is a mechanism that is going to have to be reinvented by
    lots of protocols, it makes sense to get it right ONCE [...]

But there isn't one right answer, so how can we "get it right ONCE"?  The whole argument here is that the BSD implementation goes against the design of TCP: they chose one specific requirement and implemented a solution to it ONCE, but what I want is NOT what they provide, and what the guy in the next office wants is not what I want.  No strategy that is built into the TCP layer will be right for all applications, and it can get in the way of applications that want some other specific type of handling for these cases.

    [...] so people (1) don't have to reinvent all the bugs and (2) can
    just use it for what they really want to be doing.

But you don't have to break TCP (oops, I mean add to it) to prevent people from reinventing things.  Provide them with a library of different techniques for handling various network problems.  If I want one of the standard techniques, I just use it.  If I want something special, I write it (and if it's of general use, it's an addition to the library).

    __ /| /| /|  \     Michael A. Patton, Network Manager
   / | / | /_|__/      Laboratory for Computer Science
   /  |/  |/  |atton   Massachusetts Institute of Technology

Disclaimer: The opinions expressed above are a figment of the phosphor on your screen and do not represent the views of MIT, LCS, or MAP.  :-)
frg@jfcl.dec.com (Fred R. Goldstein) (06/09/89)
This is probably a stupid question, since I'm not familiar with the way different systems (i.e., BSD) implement TCP timeouts.  But wouldn't the problem of dissimilar systems (i.e., AX.25 on one end and a Cray on the other) still be solvable by basing the timeout on the smoothed round-trip time (srtt)?  If the keepalive timer were some significant multiple of srtt (or longer, if srtt is short) then it would still scale.

Proper behavior, of course, is still open to debate -- whether the application or TCP should do the teardown.  I'm not joining in...

fred
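(Concretely, the scaled timer might be computed as below; K and the floor are illustrative values, not from any implementation:)

    /* Keepalive interval scaled to the smoothed round-trip time. */
    long keepalive_interval(long srtt_ms)
    {
        const long K = 64;             /* assumed multiple of srtt */
        const long floor_ms = 60000L;  /* never probe more than once a minute */
        long iv = K * srtt_ms;
        return iv > floor_ms ? iv : floor_ms;
    }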
karels@OKEEFFE.BERKELEY.EDU (Mike Karels) (06/09/89)
Sorry, I can't let this go by without commenting on Phil's message and this discussion, even though the discussion has mostly died down.  (I haven't been reading tcp-ip very often, but noticed this subject line going by.)

Last time Phil and I talked about keepalives in person, I asked him whether he had problems with telnet/rlogin servers accumulating on his systems if they didn't use keepalives.  We certainly accumulate junk, including xterm programs, waiting for input from a half-open connection.  Phil told me that he doesn't have problems, because he runs a "wall" every night to force output to all users, and of course breaking connections that time out.  In other words, Phil violently objects to servers requesting keepalives from TCP, but allows the system manager (himself) to force them above the application level.

And before people jump up to point out the difference in time scales: the current BSD code sends no keepalive packets until a connection has been idle for 2 hr, and that interval is easily changeable.  One proposal for the Host Requirements document was to wait for 12 hr.  I think that's a bit high, but the difference is only a factor of 6.  Compare the number of keepalive packets with the number of packets exchanged by an xterm and an X server over the course of a week if used 4 hours a day!

Phil says:

    ...  I'd go a little further, though, and say that a REMOTE USER
    (not just the application code) must always be able to turn off
    keepalives, even on binary-only systems.  It does no good to say
    "the application must be able to disable keepalives" when I'm
    having problems with a remote server that I have no administrative
    control over.

I'm sorry, Phil, but remote users have no more right to override system management policies than do local users (at least on *our* systems!).  On some of the systems where I have guest accounts, local or remote users are logged off if they aren't active for two hours.  I don't like that either, but I don't claim that the managers of those systems have no right to enforce such a policy.

Mike
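(Rough arithmetic in support of that comparison, assuming the 2-hour BSD default: a connection that sits idle 20 hours a day generates at most about 10 probes a day, or some 70 keepalive packets a week, while 28 hours of interactive xterm use over the same week can easily amount to tens of thousands of data and ACK segments.)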
dcrocker@AHWAHNEE.STANFORD.EDU (Dave Crocker) (06/10/89)
Steve,

Let me try, one last time: if the application can direct TCP as to the periodicity and the action to be taken (notify application vs. abort connection), then the application will not abort your connection unless the application programmer decided to force that condition.  Under proper design, the programmer will give the user a switch to set, indicating something about the "persistence" that is desired.

With respect to having the mechanism in TCP or the application, I agree with you, philosophically, that the mechanism should be in the application (although I believe the OSI model would put it into the session layer, but that seems mostly to be part of the application process, these days).  The major issues, however, are kernel vs. user space, and additional complexity to the application protocol.

There is a remarkable economy that derives from putting this mechanism into the kernel/transport system.  It may be an accident that TCP does not have the mechanism but can be tricked into creating one, but it still is remarkably simple.  Most application protocols have very simple interaction styles and tend to be relatively easy to program.  To force time-based generation of actions would complicate these protocols significantly.

Dave