jack@cwi.nl (Jack Jansen) (03/31/89)
I noticed that the X library doesn't set the SO_KEEPALIVE option on
the socket. This causes X clients to hang if the server machine crashes
(something my experimental server machine often does, and also something
I would expect to be common with X terminals).

Is there a good reason not to set SO_KEEPALIVE?

Please reply, since I don't have the time to keep up with this newsgroup
most of the time. I'll summarize, of course.
--
A people that yields to tyrants    | Oral: Jack Jansen
will lose more than life and goods | Internet: jack@cwi.nl
then the light goes out            | Uucp: mcvax!jack
rws@EXPO.LCS.MIT.EDU (05/24/89)
    Is there a good reason not to set SO_KEEPALIVE?

I've probed a TCP/IP wizards list about this. While there doesn't seem to
be unanimity, there is clearly a sizable contingent that believes this
feature is evil, and that even if it is deemed necessary in certain
situations, "indiscriminate" use (e.g. automatically in all X connections)
is evil. Here are excerpts from a draft Requirements for Internet Hosts RFC:

    Implementors MAY include "keep-alives" in their TCP
    implementations, although this practice is not universally
    accepted. If keep-alives are included, the application MUST
    be able to turn them on or off for each TCP connection, and
    they MUST default to off.

    The TCP specification does not include a keep-alive
    mechanism because it could:
    (1) cause perfectly good connections to break during
        transient Internet failures;
    (2) consume unnecessary bandwidth ("if no one is using the
        connection, who cares if it is still good?"); and
    (3) cost money for an Internet path that charges for packets.

    A TCP keep-alive mechanism should only be invoked in
    network servers that might otherwise hang indefinitely
    and consume resources unnecessarily if a client crashes
    or aborts a connection during a network partition.
barmar@think.COM (Barry Margolin) (05/25/89)
In article <8905241247.AA00833@expire.lcs.mit.edu> rws@EXPO.LCS.MIT.EDU writes:
>"indiscriminate" use (e.g. automatically in all X connections) is evil.

Agreed. xperfmon, xclock, and other clients that produce frequent
automatic output don't need keepalives, but xterm, emacs, and most
other event-driven applications generally do. Servers probably don't
need to use keepalives, either.

> A TCP keep-alive mechanism should only be invoked in
> network servers that might otherwise hang indefinitely
> and consume resources unnecessarily if a client crashes
> or aborts a connection during a network partition.

Since only the client application knows its interaction style, only it
knows whether it needs keepalives. This implies that there needs to
be an option to the X stream-creation routine to specify this. If
this doesn't fit into the current Xlib design, then it could be done
with a new Xlib function to turn keepalives on and off. Needless to
say, this option would be advisory only, since not all OSes and
transport protocols may support this notion.

I definitely think that this is a case where keepalives are warranted.
I'm getting sick of having to hunt down all my xterms whenever my
server crashes.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
rws@EXPO.LCS.MIT.EDU (05/25/89)
    This implies that there needs to
    be an option to the X stream-creation routine to specify this.

At present, there appears to be enough contention about use of keepalives
that I would prefer to "ignore" them at the Xlib level entirely. Clients
that wish to (ab)use SO_KEEPALIVE and other OS "features" can use the
XConnectionNumber() directly.
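
The do-it-yourself route described here comes down to one socket-option call on the descriptor that XConnectionNumber() returns. In C that would be a setsockopt() call; the sketch below uses Python's socket module, which wraps the same BSD calls. The helper names `enable_keepalive` and `keepalive_enabled` are hypothetical, and treating the descriptor as a TCP (AF_INET) stream socket is an assumption that does not hold for UNIX domain connections.

```python
import socket


def enable_keepalive(fd):
    """Turn on SO_KEEPALIVE for the socket behind descriptor `fd`.

    `fd` stands in for the value an X client would get from
    XConnectionNumber(); nothing here is part of Xlib.
    """
    # socket.fromfd() duplicates the descriptor. Both descriptors refer to
    # the same underlying socket, so the option applies to the original too.
    # Assumption: the connection is a TCP stream socket.
    dup = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
    try:
        dup.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    finally:
        dup.close()  # closes only the duplicate, not the caller's descriptor


def keepalive_enabled(sock):
    """Report whether SO_KEEPALIVE is currently set on `sock`."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

Because the option is set per connection and defaults to off, this matches the draft RFC's requirement that the application decides; it also shows why the approach is not portable, since it assumes a BSD-style socket underneath the connection.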
barmar@THINK.COM (Barry Margolin) (05/25/89)
    Date: Wed, 24 May 89 16:06:03 -0400
    From: rws@expo.lcs.mit.edu

        This implies that there needs to
        be an option to the X stream-creation routine to specify this.

    At present, there appears to be enough contention about use of keepalives
    that I would prefer to "ignore" them at the Xlib level entirely. Clients
    that wish to (ab)use SO_KEEPALIVE and other OS "features" can use the
    XConnectionNumber() directly.

Well, I'd prefer a more portable mechanism. An Xlib-based keepalive
interface could turn a timeout into an error event, so the check would
fit naturally into the application's event loop.

                                                barmar
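
One way to picture the proposal above is a wrapper that waits on the connection and reports a dead connection as just another event. The following Python sketch is an illustration only, not an Xlib interface: the function name `next_event` and the event tags ("idle", "data", "connection-lost") are invented for this example, and it assumes that a keepalive failure surfaces as an error on the next read, as it does on typical BSD-derived socket implementations.

```python
import select
import socket


def next_event(conn, timeout=1.0):
    """Wait for activity on `conn` and describe it as an (event, value) pair."""
    readable, _, _ = select.select([conn], [], [], timeout)
    if not readable:
        return ("idle", None)  # nothing arrived within the timeout
    try:
        data = conn.recv(4096)
    except OSError as exc:
        # Assumption: a failed keepalive shows up here as a socket error
        # (for example ETIMEDOUT) rather than as a hang.
        return ("connection-lost", exc.errno)
    if not data:
        return ("connection-lost", None)  # orderly close by the peer
    return ("data", data)
```

An application's main loop would then handle "connection-lost" in the same switch that handles ordinary events, instead of checking the socket separately.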
rbj@DSYS.ICST.NBS.GOV (Root Boy Jim) (05/26/89)
? From: rws@expo.lcs.mit.edu

?     This implies that there needs to
?     be an option to the X stream-creation routine to specify this.

? At present, there appears to be enough contention about use of keepalives
? that I would prefer to "ignore" them at the Xlib level entirely. Clients
? that wish to (ab)use SO_KEEPALIVE and other OS "features" can use the
? XConnectionNumber() directly.

RWS, I saw your query on the TCP/IP list. There seemed to be several
heavyweights aligned against keep-alives, and I don't really know
enough to oppose them. However, I do have several comments.

1) As to the objection that KA's would interact poorly with long-haul
   networks, I agree. However, quite a few X clients talk to a server
   on the same network, and many of these talk to the same machine
   (note: I am hereby cautioning people against using UNIX domain
   sockets, i.e. unix:0, until they work perfectly. Use "localhost:0"
   or "`$hostname`:0" instead). In this case, KA's might be appropriate.

2) In the case of an xclock that is producing frequent output
   (especially with a second hand), are KA's sent anyway? My SunOS 3.5
   says the KA timer value is 45 seconds. Pretty long time.

3) Several people have mentioned that this feature might be nice on
   xterms. Perhaps a "-keepalive" option could be added either to xterm
   or as a standard option. At least then everyone can do it the same way.

4) I am not sure that Barmar's idea of mapping KA failure into an X
   event helps. Doesn't the OS kill the connection? Or is that left
   up to the guy who receives the SIGPIPE?

	Root Boy Jim is what I am
	Are you what you are or what?
diamant@hpfclp.SDE.HP.COM (John Diamant) (06/01/89)
> I definitely think that this is a case where keepalives are warranted.
> I'm getting sick of having to hunt down all my xterms whenever my
> server crashes.

I'm told that KEEPALIVE wouldn't solve your problem here anyway. It only
solves the problem on remote connections when the server machine actually
crashes (panic, powerfail, whatever, or is disconnected from the network).
If the server dies and the machine continues to run, the operating system
will close all open file descriptors (including sockets). The problem must
be a lack of handling of the socket closure in the client that causes the
hung processes, not the fact that it remained open (it may be that the
process doesn't notice that the socket is closed until it tries to write
to it).

John Diamant
Software Engineering Systems Division
Hewlett-Packard Co.		ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		UUCP:  {hplabs,hpfcla}!hpfclp!diamant
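
The case described above, where the server process dies but its machine keeps running, can be reproduced on one host. This Python sketch makes a loopback TCP connection, closes the accepting side (standing in for the operating system closing a dead server's descriptors), and shows that the client learns of it as an end-of-file on its next read, with no keepalive involved. The function name `read_after_peer_close` is hypothetical, and the loopback address and five-second safety timeout are choices made for the example.

```python
import socket


def read_after_peer_close():
    """Return what a client reads after the other end has been closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # Stand-in for the OS closing the sockets of a server process that died
    # while its machine kept running: the peer is told about the close.
    server_side.close()
    listener.close()

    client.settimeout(5.0)  # safety net so the sketch cannot block forever
    data = client.recv(1)   # an orderly close reads as an empty byte string
    client.close()
    return data
```

A client that never reads from or writes to the connection, or that ignores the empty read, stays hung even though the information was available. When the whole machine crashes or drops off the network, nothing is sent to the client, which is the case keepalives address.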
barmar@think.COM (Barry Margolin) (06/05/89)
In article <9740091@hpfclp.SDE.HP.COM> diamant@hpfclp.SDE.HP.COM (John Diamant) writes:
>> I definitely think that this is a case where keepalives are warranted.
>> I'm getting sick of having to hunt down all my xterms whenever my
>> server crashes.
>I'm told that KEEPALIVE wouldn't solve your problem here anyway. It only
>solves the problem on remote connections when the server machine actually
>crashes (panic, powerfail, whatever, or is disconnected from the network).

Read my lips: "whenever my server crashes". My server is a Symbolics
3640 Lisp Machine, and from time to time it crashes with a hard disk
error. I usually warm boot it, and this reinitializes the network
software, thus getting rid of all the TCP connections without sending
RST packets (the warm boot software doesn't want to trust that the old
TCB's haven't been corrupted by the crash).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar