tom@litle.litle.COM (tom hampton) (06/14/90)
We have a problem with a socket-based server application which needs to
restart quickly. What we find instead is that it may take a minute
before the passively binding program can reopen a socket after both
the active and passive binders have terminated.

Is this because

1) We have a bug in ISC 2.2 TCP/IP
2) We are using a "well known address" and these things don't recycle
   instantly after closing (just a wild guess...)
3) We aren't closing things properly? We sure think we are, but
   we notice that the problem gets even worse after an abnormal
   termination...

-- 
Tom Hampton, Mgr. New Technology, Litle & Co. | POB A218, Hanover, NH 03755 603 643 1832
Design is about figuring out what you won't be able to do.
tom@litle.com tom@litle.uucp {backbone}!dartvax.dartmouth.edu!litle!tom
markr@seqp4.ORG (Mark Roddy) (06/15/90)
In <4847@litle.litle.COM> tom@litle.litle.COM (tom hampton) writes:

[How Long Must I Wait?]

>We have a problem with a socket-based server application which needs to
>restart quickly. What we find instead is that it may take a minute
>before the passively binding program can reopen a socket after both
>the active and passive binders have terminated.

>2) We are using a "well known address" and these things don't recycle
>   instantly after closing (just a wild guess...)

In fact you must wait:

    CHORUS: Two Times the Maximum Segment Lifetime!

Which is the minute or two you describe.

The solution is to not use the well known address for the connection -
TCP allows you to accept a connection on a different port address than
the destination address in the connection request.

This is the mechanism used by most *nix TCP/IP services.

The well-known address is always available to service the
next incoming connection request.

-- 
-Mark Roddy Sequoia Systems, Inc. (508) 480-0800 x1284
markr@seqp4.uucp m2c!seqp4!markr
smb@ulysses.att.com (Steven Bellovin) (06/16/90)
In article <515@seqp4.UUCP>, markr@seqp4.ORG (Mark Roddy) writes:
> The solution is to not use the well known address for the connection -
> TCP allows you to accept a connection on a different port address than
> the destination address in the connection request.
>
> This is the mechanism used by most *nix TCP/IP services.
>
> The well-known address is always available to service the
> next incoming connection request.

No, that's not quite correct. A TCP connection is defined by the
4-tuple <localhost,localport,remotehost,remoteport>. Servers define
the first two parts -- on Berkeley-derived systems, via the bind() call.
When a client connects, a connection is instantiated with a fixed pair
of values for <remotehost,remoteport>. The server's partial connection
-- or rather, the under-specified TCB for its passive open -- can still
exist; that's an implementation issue.

What happens on Berkeley-derived systems is that you get a new file
descriptor in the server for each connection; the port numbers don't
change. However, the kernel will not let you bind even a server port
if any TCBs exist with the same <localhost,localport> portion, even if
all of the others have <remotehost,remoteport> filled in. You can
override this via setsockopt() with the SO_REUSEADDR option. Inetd
does exactly that.
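
A minimal sketch of the 4-tuple point above, written in Python rather than the C of the period. The function name show_address_pairs and the use of the loopback address with a kernel-chosen port are assumptions made for illustration; nothing here comes from the original posts. It shows that the socket returned by accept() keeps the listener's local port and only gains a foreign address/port pair.

```python
import socket


def show_address_pairs():
    """Return the local/foreign address pairs of a listener and one accepted socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the kernel for any free port (stand-in for a well-known port).
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # The client's connect() fills in the foreign half of the connection.
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    info = {
        "listener_local": listener.getsockname(),
        "conn_local": conn.getsockname(),     # same port as the listener
        "conn_foreign": conn.getpeername(),   # the client's address and port
        "client_local": client.getsockname(),
    }
    for s in (conn, client, listener):
        s.close()
    return info


if __name__ == "__main__":
    for name, addr in show_address_pairs().items():
        print(name, addr)
```

The listener and the accepted socket report the same local port, which is why a lingering connection can still conflict with a later bind() to that port.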
thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen) (06/16/90)
markr@seqp4.ORG (Mark Roddy) writes:
>In <4847@litle.litle.COM> tom@litle.litle.COM (tom hampton) writes:
>>2) We are using a "well known address" and these things don't recycle
>>   instantly after closing (just a wild guess...)
>In fact you must wait:
>    CHORUS: Two Times the Maximum Segment Lifetime!
>Which is the minute or two you describe.
>The solution is to not use the well known address for the connection -
>TCP allows you to accept a connection on a different port address than
>the destination address in the connection request.
>This is the mechanism used by most *nix TCP/IP services.

This turns out not to be the case, at least in the Berkeley
implementation of TCP/IP (also known as Berkeley sockets). This is
what happens:

The server process creates a socket and uses bind(2) to associate it
with the well known port number for its service. However, a socket has
two address/port pairs, one local and one foreign. Bind(2) only sets
the local part, so that the socket will accept connections from any
client. (Each client uses connect(2) to set the foreign part on its
socket and make a connection.)

When a connection is established, a new socket is created and passed
to the server process (returned from accept(2)). This new socket has
its foreign address part set to the client's.

When a TCP connection ends, the party which sends the last packet in
the protocol (the FIN ACK) has to wait a short time after that to see
if it was lost (if so, the other party resends its last packet, the
first sends FIN ACK and waits again). That's your minute. Note that
it's the party that closes first that gets to wait.

When you try to start a new server, it has to bind(2) the same
address. The system will not allow that if an existing protocol
control block conflicts with the address. In 4.3-tahoe, conflict is
defined as follows:

A) The address pairs are the same. In UNIX, this may happen if some
   server subprocesses haven't closed the original socket descriptor
   (Berkeley rlogind, e.g.).

B) The new address pair would match some packets which ``belong'' to
   an existing control block. In 4.3-tahoe, this test is not performed
   if either 1) SO_REUSEADDR has been set with setsockopt(2) or
   2) listen(2) has been called on the socket and the protocol is
   connection-oriented (as TCP is). If the test is done, server-client
   connections in the wait state will block the bind(2), even though
   the file descriptor has been closed.

So: Check if all child processes of the server have gone away; if not,
take care that they close the original socket before they start
processing. Also, call listen(2) before bind(2) if your implementation
allows it, otherwise set SO_REUSEADDR (ditto), otherwise complain to
your vendor.

It is strange that the 4.3 IPC tutorials all show the use of bind(2)
before listen(2), when the kernel works ``better'' with the opposite
order.

-- 
Lars Mathiesen, DIKU, U of Copenhagen, Denmark [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers. thorinn@diku.dk
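
A rough sketch of the SO_REUSEADDR advice, again in Python rather than C. The helper names make_listener and demo_restart, the loopback address, and the kernel-chosen port are hypothetical choices for illustration only. The server side closes its connection first, so it is the side left in the wait state; the "restarted" listener then binds the same port at once because the option was set before bind(). Exact SO_REUSEADDR semantics vary between systems, so treat this as a demonstration under those assumptions rather than a guarantee.

```python
import socket


def make_listener(port=0, host="127.0.0.1", backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind(); that is when the conflict
    # check against existing control blocks is made.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv


def demo_restart():
    """Serve one connection, shut down, and rebind the same port immediately."""
    first = make_listener()
    port = first.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()

    conn.close()    # server side closes first, so it is the side that waits
    client.close()
    first.close()   # the original listening socket is gone as well

    # The "restarted" server binds the same local port straight away.
    second = make_listener(port)
    rebound = second.getsockname()[1] == port
    second.close()
    return rebound


if __name__ == "__main__":
    print("rebound same port:", demo_restart())
```

A server that forks children would additionally have each child close its copy of the listening socket, as recommended above, so that case A cannot arise.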