[comp.protocols.tcp-ip] How long before I can reopen a closed socket?

tom@litle.litle.COM (tom hampton) (06/14/90)

We have a problem with a socket-based server application which needs to
restart quickly.  What we find instead is that it may take a minute
before the passively binding program can reopen a socket after both
the active and passive binders have terminated.

Is this because

1) We have a bug in ISC 2.2 TCP/IP

2) We are using a "well known address" and these things don't recycle
   instantly after closing (just a wild guess...)

3) We aren't closing things properly?  We sure think we are, but
   we notice that the problem gets even worse after an abnormal
   termination...

-- 
===============================================================================
 Tom Hampton, Mgr. New Technology, Litle & Co. | POB A218, Hanover, NH 03755
 603 643 1832 
-------------------------------------------------------------------------------
 Design is about figuring out what you won't be able to do.
-------------------------------------------------------------------------------
tom@litle.com  tom@litle.uucp  {backbone}!dartvax.dartmouth.edu!litle!tom
===============================================================================

markr@seqp4.ORG (Mark Roddy) (06/15/90)

In <4847@litle.litle.COM> tom@litle.litle.COM (tom hampton) writes:

[How Long Must I Wait?]

>We have a problem with a socket-based server application which needs to
>restart quickly.  What we find instead is that it may take a minute
>before the passively binding program can reopen a socket after both
>the active and passive binders have terminated.

>2) We are using a "well known address" and these things don't recycle
>   instantly after closing (just a wild guess...)

In fact you must wait:

	CHORUS: Two Times the Maximum Segment Lifetime!

Which is the minute or two you describe.

The solution is to not use the well known address for the connection -
TCP allows you to accept a connection on a different port address than
the destination address in the connection request.

This is the mechanism used by most *nix TCP/IP services. 

The well-known address is always available to service the 
next incoming connection request.
-- 
-Mark Roddy
Sequoia Systems, Inc.          (508) 480-0800 x1284
markr@seqp4.uucp               m2c!seqp4!markr

smb@ulysses.att.com (Steven Bellovin) (06/16/90)

In article <515@seqp4.UUCP>, markr@seqp4.ORG (Mark Roddy) writes:
> The solution is to not use the well known address for the connection -
> TCP allows you to accept a connection on a different port address than
> the destination address in the connection request.
> 
> This is the mechanism used by most *nix TCP/IP services. 
> 
> The well-known address is always available to service the 
> next incoming connection request.

No, that's not quite correct.  A TCP connection is defined by the
4-tuple <localhost,localport,remotehost,remoteport>.  Servers define
the first two parts -- on Berkeley-derived systems, via the bind() call.
When a client connects, a connection is instantiated with a fixed
pair of values for <remotehost,remoteport>.  The server's partial
connection -- or rather, the under-specified TCB for its passive
open -- can still exist; that's an implementation issue.

What happens on Berkeley-derived systems is that you get a new file
descriptor in the server for each connection; the port numbers
don't change.  However, the kernel will not let you bind even a
server port if any TCBs exist with the same <localhost,localport>
portion, even if all of the others have <remotehost,remoteport>
filled in.  You can override this via setsockopt() with the
SO_REUSEADDR option.  Inetd does exactly that.
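
A minimal sketch of the setsockopt() approach described above, written
in Python rather than C for brevity. The helper name open_listener, the
loopback address and the backlog value are illustrative assumptions,
not anything taken from inetd or from this thread. The one point it is
meant to show is that the option is set before bind().

```python
import socket

def open_listener(port, host="127.0.0.1", backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR set before bind().

    open_listener is a hypothetical helper name used for illustration.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be in place when bind() runs, because that
        # is where the kernel checks for conflicting control blocks.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, port))
        s.listen(backlog)
    except OSError:
        s.close()
        raise
    return s
```

Passing port 0 asks the kernel for any free port, which is convenient
for trying the helper out; a real server would pass its well-known
port number instead.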

thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen) (06/16/90)

markr@seqp4.ORG (Mark Roddy) writes:

>In <4847@litle.litle.COM> tom@litle.litle.COM (tom hampton) writes:
>>2) We are using a "well known address" and these things don't recycle
>>   instantly after closing (just a wild guess...)

>In fact you must wait:

>	CHORUS: Two Times the Maximum Segment Lifetime!

>Which is the minute or two you describe.

>The solution is to not use the well known address for the connection -
>TCP allows you to accept a connection on a different port address than
>the destination address in the connection request.

>This is the mechanism used by most *nix TCP/IP services. 

This turns out not to be the case, at least in the Berkeley
implementation of TCP/IP (also known as Berkeley sockets). This is
what happens: The server process creates a socket and uses bind(2) to
associate it with the well known port number for its service. However,
a socket has two address/port pairs, one local and one foreign.
Bind(2) only sets the local part, so that the socket will accept
connections from any client. (Each client uses connect(2) to set the
foreign part on its socket and make a connection).

When a connection is established, a new socket is created and passed
to the server process (returned from accept(2)). This new socket has
its foreign address part set to the client's.
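
The description above can be checked with a small loopback experiment.
This is only a sketch in Python (not the C interface the man pages
describe), and the function name and dictionary keys are made up for
illustration. It shows that the socket returned by accept() keeps the
listener's local address and port, and that its foreign part is the
client's address.

```python
import socket

def accept_keeps_local_port():
    """Return the address pairs seen by a listener, an accepted socket
    and a client on the loopback interface (illustrative helper)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    conn = None
    try:
        # bind() sets only the local part; port 0 lets the kernel pick
        # a free port so the sketch does not collide with anything.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        # connect() sets the foreign part on the client's socket.
        client.connect(listener.getsockname())
        # accept() hands back a new socket for this one connection.
        conn, _ = listener.accept()
        return {
            "listener_local": listener.getsockname(),
            "accepted_local": conn.getsockname(),
            "accepted_foreign": conn.getpeername(),
            "client_local": client.getsockname(),
        }
    finally:
        for s in (conn, client, listener):
            if s is not None:
                s.close()
```

In the returned dictionary, "accepted_local" equals "listener_local",
so no second port is involved on the server side; the connection is
told apart from others only by its foreign part.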

When a TCP connection ends, the party which sends the last packet in
the protocol (the FIN ACK) has to wait a short time after that to see
if it was lost (if so, the other party resends its last packet, the
first sends FIN ACK and waits again). That's your minute. Note that
it's the party that closes first that gets to wait.

When you try to start a new server, it has to bind(2) the same
address. The system will not allow that if an existing protocol
control block conflicts with the address. In 4.3-tahoe, conflict is
defined as follows:
	A) The address pairs are the same. In UNIX, this may happen if
	   some server subprocesses haven't closed the original socket
	   descriptor (Berkeley rlogind, e.g.).
	B) The new address pair would match some packets which
	   ``belong'' to an existing control block. In 4.3-tahoe, this
	   test is not performed if either
		1) SO_REUSEADDR has been set with setsockopt(2) or
		2) listen(2) has been called on the socket and the
		   protocol is connection-oriented (as TCP is).
	   If the test is done, server-client connections in the wait
	   state will block the bind(2), even though the file
	   descriptor has been closed.

So: Check if all child processes of the server have gone away; if not,
take care that they close the original socket before they start
processing. Also, call listen(2) before bind(2) if your implementation
allows it, otherwise set SO_REUSEADDR (ditto), otherwise complain to
your vendor.
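
The restart scenario can be tried out on the loopback interface with a
sketch like the following. It is in Python rather than C, and the names
try_bind and restart_experiment are hypothetical. The server side
closes its end first, so the server's kernel is the one left holding
the connection in the wait state; the sketch then tries to bind the
same port again, once without and once with SO_REUSEADDR. Both server
instances set the option, which is the usual practice and avoids
depending on how a particular kernel treats sockets left behind by a
server that never set it. Whether the plain bind fails depends on the
implementation, so that result is only reported, not assumed.

```python
import socket

def try_bind(port, reuse):
    """Try to bind and listen on 127.0.0.1:port; return True on success."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
        s.listen(1)
        return True
    except OSError:
        return False
    finally:
        s.close()

def restart_experiment():
    """Leave a server-side connection in the wait state, then rebind.

    Returns (plain_ok, reuse_ok): whether a bind without and with
    SO_REUSEADDR succeeded right after the old server went away.
    """
    # First server instance; port 0 lets the kernel pick a free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(("127.0.0.1", port))
    conn, _ = listener.accept()

    conn.close()      # the server closes first, so it gets to wait
    client.recv(1)    # returns b"" once the server's FIN has arrived
    client.close()
    listener.close()  # the old server is now completely gone

    plain_ok = try_bind(port, reuse=False)
    reuse_ok = try_bind(port, reuse=True)
    return plain_ok, reuse_ok
```

On systems that behave as described above, the first value comes back
False for roughly the duration of the wait state, while the second is
True immediately.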

It is strange that the 4.3 IPC tutorials all show the use of bind(2)
before listen(2), when the kernel works ``better'' with the opposite
order.

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk