[comp.unix.ultrix] Sockets that don't know when to leave...

andy@jhunix.HCF.JHU.EDU (Andy S Poling) (03/25/91)

Hmm.  We just upgraded a uVax II to Ultrix 4.1 rev 52 (which included a
mandatory update) and are now having a rather strange problem with UDP
sockets...

When certain daemons (UREP's SNA-over-UDP daemon for instance) are killed,
the UDP sockets that they opened don't die with them.  This is a BIG problem
because the UDP socket these daemons use must be bound to a certain local
address (port).  So once a daemon has run and has either exited or has been
killed that particular port is no longer available for binding to a socket
(we get EADDRINUSE - errno 48).  When one uses "netstat -a" to see what
sockets are in use there will indeed be a UDP socket open with that address
assigned.  Furthermore netstat usually shows that there is data in the recv
queue for that socket.
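
The failure described above can be reproduced in miniature. The following is a rough sketch, not the daemon's actual code: the helper names try_bind_udp and demo_address_in_use are made up for illustration, and the symbolic errno.EADDRINUSE is used because the numeric value (48 on Ultrix) differs between systems.

```python
import errno
import socket


def try_bind_udp(host, port):
    """Try to bind a UDP socket; return (socket, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, None


def demo_address_in_use():
    """Bind the same UDP address twice and return the second bind's errno."""
    # Port 0 asks the kernel for any free port, so the sketch does not
    # depend on a particular port being available.
    first, _ = try_bind_udp("127.0.0.1", 0)
    port = first.getsockname()[1]
    # While the first socket still exists, a second bind to the same
    # address is normally refused.
    second, err = try_bind_udp("127.0.0.1", port)
    if second is not None:
        second.close()
    first.close()
    return err
```

On a system that behaves normally, the port becomes bindable again as soon as the first socket is closed; the complaint here is that the kernel appears to keep the socket after its owner has died.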

To me this does not seem correct - and it is not consistent: it does not
always happen.  It was not a problem with our previous MUCH older version of
Ultrix.  Am I wrong?  What can be done to eradicate a socket which belongs
to no one (short of rebooting...)?

-Andy

PS: don't use the BITNET mail address below.  It won't reach me until we get
this problem fixed...

--
Andy Poling                              Internet: andy@jhunix.hcf.jhu.edu
UNIX Systems Programmer                  Bitnet: ANDY@JHUNIX
Homewood Academic Computing              Voice: (301)338-8096    
Johns Hopkins University                 UUCP: uunet!mimsy!aplcen!jhunix!andy

mellon@nigiri.pa.dec.com (Ted Lemon) (03/29/91)

I've also noticed this, with TCP sockets.   Once the other end of the
socket closes it, the connection times out and goes away, so you can
reclaim the address.   I'm not sure that this behaviour is unique to
Ultrix, but it's certainly obnoxious.   Careful reading of the RFCs
might reveal an explanation for why this happens...

			       _MelloN_

mwp@ubeaut.enet.dec.com (Michael Paddon) (04/02/91)

From article <7817@jhunix.HCF.JHU.EDU>, by andy@jhunix.HCF.JHU.EDU (Andy S Poling):
> When certain daemons (UREP's SNA-over-UDP daemon for instance) are killed,
> the UDP sockets that they opened don't die with them.  This is a BIG problem
> because the UDP socket these daemons use must be bound to a certain local
> address (port).  So once a daemon has run and has either exited or has been
> killed that particular port is no longer available for binding to a socket
> (we get EADDRINUSE - errno 48).  When one uses "netstat -a" to see what
> sockets are in use there will indeed be a UDP socket open with that address
> assigned.  Furthermore netstat usually shows that there is data in the recv
> queue for that socket.

I haven't seen this happen with UDP sockets, but it is common with TCP
(i.e. SOCK_STREAM) sockets. In the TCP world there is good reason for it,
as the connection should be shut down in an orderly fashion; however,
there is no good reason to keep the socket hanging around in a datagram
paradigm.

Anyway, there is a workaround. Set the socket level option SO_REUSEADDR
with setsockopt(2). This will allow multiple sockets to be bound to the
same local <host,port> tuple.
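
A minimal sketch of the workaround, assuming a UDP daemon that binds one fixed local address. The function name bind_udp_reuseaddr is hypothetical. Whether a second bind to an address already held by another socket actually succeeds depends on the operating system (some require the option on both sockets, some require a different option for exact duplicates), so treat this as an illustration of the call order rather than a guarantee.

```python
import socket


def bind_udp_reuseaddr(host, port):
    """Create a UDP socket with SO_REUSEADDR set, then bind it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The option must be set before bind(); the kernel consults it
    # when it checks whether the requested address is already in use.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock
```

A daemon would call this once at startup in place of a plain socket()/bind() pair.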

What does it mean when two sockets have the same address? In the TCP domain,
connections are specified by the tuple
	<local_host,local_port,remote_host,remote_port>
so there is no ambiguity introduced. If both sockets are marked passive
(via listen(2)) it is arbitrary which socket will accept an incoming
connection.
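
To make the four-part tuple concrete, here is a small sketch on the loopback interface. The names connection_tuple and demo_two_connections are invented for this example. Two clients connect to one listener; the accepted sockets share the same local host and port but differ in the remote port, so the full tuples remain distinct.

```python
import socket


def connection_tuple(sock):
    """Return (local_host, local_port, remote_host, remote_port)."""
    local_host, local_port = sock.getsockname()[:2]
    remote_host, remote_port = sock.getpeername()[:2]
    return (local_host, local_port, remote_host, remote_port)


def demo_two_connections():
    """Accept two loopback connections and return their tuples."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel choose
    listener.listen(2)
    server_addr = listener.getsockname()

    # Each client gets its own ephemeral local port from the kernel.
    clients = [socket.create_connection(server_addr) for _ in range(2)]
    accepted = [listener.accept()[0] for _ in clients]

    tuples = [connection_tuple(s) for s in accepted]
    for s in clients + accepted + [listener]:
        s.close()
    return tuples
```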

In the UDP domain, incoming datagrams addressed to that <host,port> will
be assigned to an arbitrary socket. Actually, the choice of socket isn't
really random, but that is as reasonable a model as any for a programmer
using sockets. In practice, the more server processes there are using the
same UDP address, the better the throughput you can get. This technique
is used by nfsd(8), among others.

So there are some good reasons for reusing addresses. The only thing you
lose when you set SO_REUSEADDR is the mutual exclusion that stops a second
server from starting while one is already running (because its bind
fails). Since this functionality can be provided in other ways, it is no
real problem.
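
One of the "other ways" could be an advisory lock on a well-known file. This is only a sketch of that idea, not something the thread prescribes: the function name acquire_single_instance_lock and the lock file path are hypothetical, and it assumes a system that provides flock(2).

```python
import fcntl
import os


def acquire_single_instance_lock(path):
    """Return an open fd holding an exclusive lock, or None if it is taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting
        # when another open file description already holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    # Record our pid so an administrator can see who holds the lock.
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode() + b"\n")
    return fd
```

The daemon keeps the returned descriptor open for its whole lifetime. The kernel drops the lock when the process exits or is killed, so a dead server does not block the next one from starting.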

					Michael

-------------------------------------------------------------------
|                     |     Internet: mwp@ubeaut.enet.dec.com     |
|   Michael Paddon    |     ACSnet:   mwp@munnari.oz.au           |
|                     |     Voice:    +61 3 895 9392              |
-------------------------------------------------------------------