ian@hpopd.HP.COM (Ian Watson) (07/17/90)
I sometimes get a hang for a long while (say an hour) when a client closes a sockets connection. The server (SCO Unix 3.2) shows the TCP port state, according to 'netstat -i', as FIN_WAIT_1 before the connection eventually dies completely. The server program has gone away by the time the connection close was requested by the client, so I guess it's just at the TCP level that the connection is almost closed but not quite. Anyone have any ideas?

Ian Watson    ian@hpopd.HP.COM    hplabs!hpopd!ian
gwilliam@SH.CS.NET (George Williams) (07/20/90)
Hello,
The close sequence in TCP is a symmetrical exchange. That is, the side issuing
close generates the following protocol exchange:

  close initiator                        TCP close recipient
  ---------------                        -------------------
  CLOSE REQ =======>
             fin(1)
             ------------------------>
             <------------------------
             fin(1) ack
                                         usually a notification
                                         event; must ensure no
                                         data left in the pipe
                                         <======= CLOSE REQ
             <------------------------
             fin(2)
             ------------------------>
             fin(2) ack
  event notification
The above is a full-duplex close operation on a socket/port pair.
There are wait states associated with the above that I omitted. But I think
it should be noted:
() The close sequence ideally operates on TCP socket pairs.
It was designed to allow for the draining of data in transit
or in the pipeline. The sender of close cannot send data but
is responsible for all data the other side has in transit.
This minimizes loss of data during close between two non-cooperating
or non-peer applications. It even allows for simultaneous close
operations, since TCP queues the fin/acks until deliverable. That
is probably the wait you are seeing.
() You can be left with half-open connections if this is not
implemented as spec'd. The RFC goes into detail in this area,
and there are implementation tips.
() Depending on the OS interface to TCP, one might be able to get away
with exiting without fully closing a connection, i.e. without waiting for
notification that the other side has closed. I would not do this unless
I wrote both sides, or am talking to peer applications that share a common
higher-level protocol.
Some workarounds for violations of the above that I have seen include
the OS generating an abort/close when a process exits, and timers on
TCP wait states, to mention a few.
This used to be one of the most bug-infested areas of TCP we ever ran across,
depending on the vendor and the application/OS interface.
If you are programming for multiple connections or concurrency, this really is
a problem depending on whose TCP you use.
Additionally, that last 'fin ack' is not itself acknowledged, so it is a 'hole'
in the protocol. If it gets dropped (I have seen this happen) via routers or bad
bridge connections, there is no recovery. So most vendors put in a timer (2 min
or 5 min, etc.) as a backup.