[comp.windows.x] Why does X not use SO_KEEPALIVE?

jack@cwi.nl (Jack Jansen) (03/31/89)

I noticed that the X library doesn't set the SO_KEEPALIVE option on the
socket. This causes X clients to hang if the server machine crashes
(something my experimental server machine often does, and also something
I would expect to be common with X terminals).

Is there a good reason not to set SO_KEEPALIVE?

Please reply, since I don't have the time to keep up with this newsgroup
most of the time. I'll summarize, of course.
-- 
--
A people that yields to tyrants		| Oral:     Jack Jansen
will lose more than life and goods	| Internet: jack@cwi.nl
then the light goes out			| Uucp:     mcvax!jack

rws@EXPO.LCS.MIT.EDU (05/24/89)

    Is there a good reason not to set SO_KEEPALIVE?

I've probed a TCP/IP wizards list about this.  While there doesn't seem to be
unanimity, there is clearly a sizable contingent that believe this feature is
evil, and that even if it is deemed necessary in certain situations,
"indiscriminant" use (e.g. automatically in all X connections) is evil.
Here are excerpts from a draft Requirements for Internet Hosts RFC:

	Implementors MAY include "keep-alives" in their TCP
	implementations, although this practice is not universally
	accepted.  If keep-alives are included, the application MUST
	be able to turn them on or off for each TCP connection, and
	they MUST default to off.

	The TCP specification does not include a keep-alive
	mechanism because it could:  (1) cause perfectly good
	connections to break during transient Internet
	failures; (2) consume unnecessary bandwidth ("if no one
	is using the connection, who cares if it is still
	good?"); and (3) cost money for an Internet path that
	charges for packets.

	A TCP keep-alive mechanism should only be invoked in
	network servers that might otherwise hang indefinitely
	and consume resources unnecessarily if a client crashes
	or aborts a connection during a network partition.

barmar@think.COM (Barry Margolin) (05/25/89)

In article <8905241247.AA00833@expire.lcs.mit.edu> rws@EXPO.LCS.MIT.EDU writes:
>"indiscriminant" use (e.g. automatically in all X connections) is evil.

Agreed.  xperfmon, xclock, and other clients that produce frequent
automatic output don't need keepalives, but xterm, emacs, and most
other event-driven applications generally do.  Servers probably don't
need to use keepalives, either.

>	A TCP keep-alive mechanism should only be invoked in
>	network servers that might otherwise hang indefinitely
>	and consume resources unnecessarily if a client crashes
>	or aborts a connection during a network partition.

Since only the client application knows its interaction style, only it
knows whether it needs keepalives.  This implies that there needs to
be an option to the X stream-creation routine to specify this.  If
this doesn't fit into the current Xlib design, then it could be done
with a new Xlib function to turn keepalives on and off.  Needless to
say, this option would be advisory only, since not all OSes and
transport protocols may support this notion.

I definitely think that this is a case where keepalives are warranted.
I'm getting sick of having to hunt down all my xterms whenever my
server crashes.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

rws@EXPO.LCS.MIT.EDU (05/25/89)

    This implies that there needs to
    be an option to the X stream-creation routine to specify this.

At present, there appears to be enough contention about use of keepalives
that I would prefer to "ignore" them at the Xlib level entirely.  Clients
that wish to (ab)use SO_KEEPALIVE and other OS "features" can use the
XConnectionNumber() directly.
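A minimal sketch of that do-it-yourself approach, in C with BSD sockets. The helper names `set_keepalive` and `get_keepalive` are hypothetical, not part of Xlib; in an Xlib client the descriptor would come from `XConnectionNumber(dpy)`. A failure is handed back to the caller rather than treated as fatal, in line with the "advisory only" behaviour Barry described, and nothing is turned on unless the client asks, matching the RFC's default-off requirement.

```c
#include <sys/socket.h>

/* Hypothetical helper: turn SO_KEEPALIVE on or off for one connection.
 * Advisory only: returns 0 on success, -1 if the OS refused. */
int set_keepalive(int fd, int on)
{
    int val = on ? 1 : 0;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof val);
}

/* Hypothetical helper: report the current setting
 * (1 = on, 0 = off, -1 = error). */
int get_keepalive(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}

/* Usage in an Xlib client (needs <X11/Xlib.h> and <stdio.h>):
 *   Display *dpy = XOpenDisplay(NULL);
 *   if (dpy && set_keepalive(XConnectionNumber(dpy), 1) < 0)
 *       perror("SO_KEEPALIVE");   // not fatal: carry on without it
 */
```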

barmar@THINK.COM (Barry Margolin) (05/25/89)

    Date: Wed, 24 May 89 16:06:03 -0400
    From: rws@expo.lcs.mit.edu

	This implies that there needs to
	be an option to the X stream-creation routine to specify this.

    At present, there appears to be enough contention about use of keepalives
    that I would prefer to "ignore" them at the Xlib level entirely.  Clients
    that wish to (ab)use SO_KEEPALIVE and other OS "features" can use the
    XConnectionNumber() directly.

Well, I'd prefer a more portable mechanism.  An Xlib-based keepalive
interface could turn a timeout into an error event, so the check would
fit naturally into the application's event loop.

                                                barmar
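A rough sketch of what such a check might look like in a poll()-based main loop. The enum and function below are hypothetical, not Xlib API. The idea is that a dead connection, whether closed cleanly or dropped after unanswered keepalive probes (which BSD-derived stacks report as ETIMEDOUT on the next read), becomes just another event for the loop to dispatch on.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical "connection events" for an application's main loop. */
enum conn_event { CONN_IDLE, CONN_DATA, CONN_CLOSED, CONN_ERROR };

/* Wait up to timeout_ms for fd to become readable, then classify it.
 * CONN_DATA:  up to buflen bytes stored in buf, count in *nread.
 * CONN_ERROR: *err holds errno (e.g. ETIMEDOUT after failed keepalive
 *             probes, ECONNRESET, or EINTR from poll itself). */
enum conn_event wait_conn_event(int fd, int timeout_ms,
                                char *buf, size_t buflen,
                                ssize_t *nread, int *err)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);
    if (r == 0)
        return CONN_IDLE;                 /* nothing happened */
    if (r < 0) {
        *err = errno;
        return CONN_ERROR;
    }

    /* Readable, hung up, or in error: read() tells us which. */
    ssize_t n = read(fd, buf, buflen);
    if (n > 0) {
        *nread = n;
        return CONN_DATA;
    }
    if (n == 0)
        return CONN_CLOSED;               /* orderly close (EOF) */
    *err = errno;                         /* connection-level failure */
    return CONN_ERROR;
}
```

In a real Xlib client the bytes belong to Xlib, so the read would happen inside Xlib rather than in application code; the sketch only shows the shape of the check.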

rbj@DSYS.ICST.NBS.GOV (Root Boy Jim) (05/26/89)

? From: rws@expo.lcs.mit.edu

?     This implies that there needs to
?     be an option to the X stream-creation routine to specify this.

? At present, there appears to be enough contention about use of keepalives
? that I would prefer to "ignore" them at the Xlib level entirely.  Clients
? that wish to (ab)use SO_KEEPALIVE and other OS "features" can use the
? XConnectionNumber() directly.

RWS,
	I saw your query on the TCP/IP list. There seemed to be several
heavyweights aligned against keep-alives, and I don't really know
enuf to oppose them. However, I do have several comments.

1) As to the objection that KA's would interact poorly with long-haul
networks, I agree. However, quite a few X clients talk to a server on the
same network, and many of these talk to the same machine (note: I am
hereby cautioning people against using UNIX domain sockets, i.e. unix:0,
until they work perfectly. Use "localhost:0" or "`$hostname`:0" instead).
In this case, KA's might be appropriate.

2) In the case of an xclock that is producing frequent output (especially
with a second hand), are KA's sent anyway? My SunOS 3.5 says the KA
timer value is 45 seconds. Pretty long time.

3) Several people have mentioned that this feature might be nice on
xterms. Perhaps a "-keepalive" option could be added either to xterm
or as a standard option. At least then everyone can do it the same way.

4) I am not sure that Barmar's idea of mapping KA failure into an
X event helps. Doesn't the OS kill the connection? Or is that left
up to the guy who receives the SIGPIPE?
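On point 2: keepalive probes are only sent after a connection has been idle for the keepalive time, and any traffic restarts that timer, so an xclock redrawing its second hand every second would not normally generate probes at all. The idle time was traditionally a system-wide setting; later systems such as Linux added per-socket options `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`. A minimal sketch, assuming those Linux-specific options (they do not exist on SunOS 3.5); the helper name `tune_keepalive` is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical, Linux-specific helper: enable keepalive on fd and tune it
 * so a silent peer is dropped after roughly idle_s + intvl_s * count
 * seconds.  Returns 0 on success, -1 if an option is missing or refused. */
int tune_keepalive(int fd, int idle_s, int intvl_s, int count)
{
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* Seconds of idleness before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    /* Unanswered probes before the kernel drops the connection. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
#else
    (void)fd; (void)idle_s; (void)intvl_s; (void)count;
    return -1;   /* per-socket tuning not available on this system */
#endif
}
```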

	Root Boy Jim is what I am
	Are you what you are or what?
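On point 4: in BSD-derived TCP implementations, when keepalive probes go unanswered the kernel drops the connection itself and records ETIMEDOUT as the socket's error, which the application sees on its next read or write. Writing to a connection that has already been torn down raises SIGPIPE, whose default action kills the process, so a client that wants to report the failure (or turn it into an X event) has to ignore or catch the signal. A minimal sketch, assuming plain POSIX calls; the helper names are hypothetical and this is not a description of what Xlib itself does.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide, so a write on a dead connection fails
 * with EPIPE instead of terminating the client. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Hypothetical helper: write all of buf.  Returns 0 on success, or the
 * errno that ended the connection (EPIPE, ECONNRESET, ETIMEDOUT, ...). */
int write_or_errno(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted: just retry */
            return errno;        /* connection is gone */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```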

diamant@hpfclp.SDE.HP.COM (John Diamant) (06/01/89)

> I definitely think that this is a case where keepalives are warranted.
> I'm getting sick of having to hunt down all my xterms whenever my
> server crashes.

I'm told that KEEPALIVE wouldn't solve your problem here anyway.  It only
solves the problem on remote connections when the server machine actually
crashes (panic, powerfail, whatever, or is disconnected from the network).  If
the server dies and the machine continues to run, the operating system will
close all open file descriptors (including sockets).  The problem must be a
lack of handling of the socket closure in the client that causes the hung
processes, not the fact that it remained open (it may be that the process
doesn't notice that the socket is closed until it tries to write to it).


John Diamant
Software Engineering Systems Division
Hewlett-Packard Co.		ARPA Internet: diamant@hpfclp.sde.hp.com
Fort Collins, CO		UUCP:  {hplabs,hpfcla}!hpfclp!diamant

barmar@think.COM (Barry Margolin) (06/05/89)

In article <9740091@hpfclp.SDE.HP.COM> diamant@hpfclp.SDE.HP.COM (John Diamant) writes:
>> I definitely think that this is a case where keepalives are warranted.
>> I'm getting sick of having to hunt down all my xterms whenever my
>> server crashes.
>I'm told that KEEPALIVE wouldn't solve your problem here anyway.  It only
>solves the problem on remote connections when the server machine actually
>crashes (panic, powerfail, whatever, or is disconnected from the network).  

Read my lips: "whenever my server crashes".  My server is a Symbolics
3640 Lisp Machine, and from time to time it crashes with a hard disk
error.  I usually warm boot it, and this reinitializes the network
software, thus getting rid of all the TCP connections without sending
RST packets (the warm boot software doesn't want to trust that the old
TCB's haven't been corrupted by the crash).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar