[comp.protocols.tcp-ip] TCP/IP close connection TIME_WAIT ?

toppin@melpar.UUCP (Doug Toppin) (08/23/89)

We are using IBM Xenix 2 on the PC/AT with Network Research Corp
implementation of TCP/IP (called Fusion) and have hit a snag.
We often get the error EWOULDBLOCK on a heavily used socket (many
opens and closes).  In the Sept. MIPS magazine David Betz writes:
"after a close, the resources used by that connection are not freed
immediately... Since this time interval is in the tens of seconds,
eventually all the resources are tied up in this TIME_WAIT."
My questions are:
* is the time value a constant?
* is it tunable?
* is the value part of the TCP/IP specification?
* is it possible to detect that you are using the last available resource?
* is it possible to allocate more of these resources?
thanks
Doug Toppin
uunet!melpar!toppin

hwajin@wrs.wrs.com (Hwajin Bae) (08/24/89)

In article <227@melpar.UUCP> toppin@melpar.UUCP (Doug Toppin) writes:
>We are using IBM Xenix 2 on the PC/AT with Network Research Corp
>implementation of TCP/IP (called Fusion) and have hit a snag.
>We often get the error EWOULDBLOCK on a heavily used socket (many
>opens and closes).  In the Sept. MIPS magazine David Betz writes:
>"after a close, the resources used by that connection are not freed
>immediately... Since this time interval is in the tens of seconds,
>eventually all the resources are tied up in this TIME_WAIT."
>My questions are:
>* is the time value a constant?
	The TCP protocol specification (RFC 793) defines 2*MSL (twice the
	Maximum Segment Lifetime) as the time-out for passing from the
	TIME_WAIT state to the CLOSED state.
	The TCB is not deleted until this time-out expires.
	BSD 4.3-tahoe TCP/IP uses an MSL of 30 seconds.

>* is it tunable?
	If you have the source, yes.  If you have an operating system
	that lets you change operating system parameters on the fly, yes.

>* is the value part of the TCP/IP specification?
	In RFC 793, MSL is defined to be 2 minutes -- the value is
	essentially an arbitrary engineering choice.

>* is it possible to detect that you are using the last available resource?
	If you have the netstat program, you can do "netstat -a" to see a
	list of PCB's.  Count the TCP PCB's and subtract that count from
	the maximum number of sockets configured into your kernel.

>* is it possible to allocate more of these resources?
	You should be able to tune the kernel by using the kernel
	configuration package.  Each Unix system has its own kernel tuning
	mechanism, so you will need to read the documentation on your
	system to figure out how to re-build your kernel.  Most System V
	derived Unix OS's use "conf.c" and "config.h" files that are created
	by the "config" program from the database files "master" and "dfile".

One thing that you can try to get around this TIME_WAIT state in TCP
implementations is to set the SO_LINGER option on your SOCK_STREAM socket
with a linger time-out value of 0.  When you close a socket that
has the SO_LINGER option on and the linger time is zero, 4.3 BSD based
TCP/IP will close the connection immediately.  In 4.2 BSD based TCP
implementations the SO_DONTLINGER option can be used.

	#include <sys/socket.h>

	.....

#ifdef BSD_43
	struct linger linger;

	linger.l_onoff = 1;		/* turn lingering on...     */
	linger.l_linger = 0;		/* ...with a zero time-out  */
	setsockopt (sock, SOL_SOCKET, SO_LINGER, (char *) &linger,
		sizeof (linger));
#else
	int on = 1;

	setsockopt (sock, SOL_SOCKET, SO_DONTLINGER, (char *) &on,
		sizeof (on));
#endif /* BSD_43 */
-- 
Hwajin Bae (hwajin@wrs.com)
Wind River Systems
1351 Ocean Ave.  Emeryville, CA 94608 USA
Tel: 415/428-2623       Fax: 415/428-0540

almquist@JESSICA.STANFORD.EDU (Philip Almquist) (08/29/89)

Hwajin,
	Your message about how to shorten the amount of time a TCP
connection spends in TIME_WAIT state omitted what I think is a rather
crucial point: why shortening that time might be a very bad idea.

	For most users of the TCP protocol, one of its more important
properties is that it delivers data reliably.  Vint Cerf will correct
me if I'm wrong, but my impression is that TIME_WAIT state was not a
devious plot on the part of the protocol designers to tie up system
resources; rather, it was included in the protocol because it is
necessary to insure that delivery is indeed reliable.

						Philip

smb@ulysses.homer.nj.att.com (Steven M. Bellovin) (08/29/89)

In article <8908290927.AA26586@ucbvax.Berkeley.EDU>, almquist@JESSICA.STANFORD.EDU (Philip Almquist) writes:
> but my impression is that TIME_WAIT state was not a
> devious plot on the part of the protocol designers to tie up system
> resources; rather, it was included in the protocol because it is
> necessary to insure that delivery is indeed reliable.

Quite true.  The problem is that a host that has already closed its
direction of transmission must retain knowledge of the connection
until the other side has closed successfully.  I don't have a copy
of the RFC handy, so I won't mention the state names, but the
general scenario goes like this....

Recall, first, that each direction of transmission may be closed
independently.  Furthermore, everything must be ACKed, including
specifically close requests (known in the spec as FIN bits).  Suppose
that host A sends a FIN to host B, thereby ending its transmission.
B replies with an ACK, but continues sending data for an arbitrarily
long period.  Eventually, it too sends a FIN to host A; host A replies
with an ACK.

Let us consider now who knows what.  A has long-since finished transmitting;
it cannot send any more data.  Furthermore, it knows that B is done, too;
it's even acknowledged that.  Can A discard all knowledge of the connection?
No!  What if the ACK going to B gets lost?  B's transmitter will time out
and resend the FIN; after all, B doesn't know which packet was
dropped.  If A has discarded knowledge of the connection, it would
have to send a reset (RST) in response to the repeated FIN.  This is
inappropriate.  Accordingly, A goes to TIMEWAIT state instead, thus
retaining the appropriate state; if it sees a repeated FIN, it can simply repeat
the ACK.  TIMEWAIT persists for twice the maximum segment lifetime;
i.e., long enough to be sure that B has either seen the ACK or has
concluded that the connection is hopelessly broken.

Interestingly enough, B does not go into TIMEWAIT; the reasoning is left
as an exercise for the reader.  (N.B.  If both sides send simultaneous
FINs, both sides will end up in TIMEWAIT.)

sandy@TOVE.UMD.EDU (Sandy Murphy) (08/29/89)

Please (please,please) correct me if I am wrong in this.

If Host A is in the TIME_WAIT state, then A has:
	sent a FIN,
	received a FIN,
	received an ACK (of its FIN)
	and sent an ACK (of Host B's FIN).
Therefore, Host B must have:
	sent a FIN,
	received A's FIN,
	and sent an ACK (of A's FIN).
Host B may not yet have received the ACK of its FIN.  So Host B is in state
CLOSING or LAST_ACK.  Host B has already passed
all of A's data up to the user.  But B doesn't know if A has delivered all of B's
data to its user.  The TIME_WAIT state is to ensure that B can get an ACK
so it knows all the data was delivered (see page 22 of the specs).

	Actually, the TIME_WAIT state is not long enough to ENSURE this.  If 
A's ACK of B's FIN gets lost, B will retransmit its FIN (at least).  If this gets lost
as well, it is possible for the 2MSL timer to expire before B retransmits again and
the FIN arrives at A.  Chances are B will receive a RST in that case, because A
is CLOSED when the FIN arrives.

	Note also that both sides of the connection do not necessarily go 
through the TIME_WAIT state.  If Host A closes after it has received a FIN,
then it can send a FIN,ACK.  An ACK of this FIN means that Host B received
the ACK it needs, and A can go directly to CLOSED.

	In this sequence (ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED)
the specs don't say to send an ACK.  I'm just supposing that the FIN segment it
says to send should include an ACK.  Right?

--Sandy Murphy
  University of Maryland

karn@jupiter (Phil R. Karn) (08/31/89)

>[Sandy Murphy's discussion of TCP connection closing]
>	Actually, the TIME_WAIT state is not long enough to ENSURE this.  If 
>A's ACK of B's FIN gets lost, B will retransmit its FIN (at least).  If this gets lost
>as well, it is possible for the 2MSL timer to expire before B retransmits again and
>the FIN arrives at A.  Chances are B will receive a RST in that case, because A
>is CLOSED when the FIN arrives.

You're quite right. In fact, whenever your network has a nonzero packet
loss probability, you can NEVER be absolutely sure that the connection
will close gracefully on both ends. This is not only true with TCP, it's
the case for ANY protocol. Check out the "two-army problem" on page 397
of Tanenbaum's "Computer Networks" text (second edition) for an
explanation.

Phil

CERF@A.ISI.EDU (09/01/89)

Phil and Sandy have done a good job of explaining the mechanism
and rationale for TIME_WAIT. During the design phase of TCP,
tremendous effort was put into exploring, through case
analysis, various sequences of events in various orders. 

The "two armies" problem is related to the Byzantine
Generals problem in the literature on synchronization.
There are some wonderfully complicated variations in
which there are more than two generals and some of them lie.

In any case, no amount of waiting and acking will guarantee
anything - but the states are there to reduce the possibility
that a successful session is considered a failure for lack
of an ACK. Generally, though, such designs fail safe in that
a truly unsuccessful session is not accidentally considered
successful though a successful one may be mislabelled a
failure.

Vint