[comp.unix.questions] How do you trap 1/2 a tcp connection dying ?

aem@aber-cs.UUCP (Alec D.E. Muffett) (11/06/90)

    Hi there - I've got a problem concerning sockets and select().  I'm
using sockets (AF_INET, SOCK_STREAM, "tcp") to communicate between a
server and unassociated clients meeting on a pre-specified port number.

    Basically the client connects to the server, which sits in an infinite
loop that goes a bit like this:-

for (;;)
{
	[set up read_bitmap from an array of socket fds]
	....
	num_active = select(num_fds, &read_bitmap, NULL, NULL, NULL);
	...
	[read from ready fds and act on data received]
}

    Read_bitmap is an 'fd_set' (see <sys/types.h> on Ultrix 4.0), with
bits 0->n set corresponding to the file descriptors to watch.  The fifth
argument to select() is a NULL pointer to struct timeval, which makes
the select() sleep until something happens.

    When select() returns, read_bitmap has only the bits corresponding
to active sockets set.  Num_active is a count of those active sockets.
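    For concreteness, the skeleton looks something like this (sockfds and
nsocks are just placeholder names here, and most of the error checking is
left out):

#include <sys/types.h>
#include <sys/time.h>

/*
 * Skeleton only: "sockfds" and "nsocks" stand for the array of
 * connected socket descriptors and its length.
 */
void
serve(int *sockfds, int nsocks)
{
	fd_set read_bitmap;
	int i, num_fds, num_active;

	for (;;) {
		FD_ZERO(&read_bitmap);
		num_fds = 0;
		for (i = 0; i < nsocks; i++) {
			FD_SET(sockfds[i], &read_bitmap);
			if (sockfds[i] >= num_fds)
				num_fds = sockfds[i] + 1;	/* highest fd plus one */
		}

		/* NULL timeout pointer: sleep until at least one fd is readable */
		num_active = select(num_fds, &read_bitmap, (fd_set *)NULL,
		    (fd_set *)NULL, (struct timeval *)NULL);
		if (num_active <= 0)
			continue;	/* interrupted; real code should look at errno */

		for (i = 0; i < nsocks; i++)
			if (FD_ISSET(sockfds[i], &read_bitmap)) {
				/* read from sockfds[i] and act on the data */
			}
	}
}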

    Everything is hunky-dory with this situation so long as it all shuts
down in an orderly manner: the client tells the server to wind down
the server end of the connection, and then shuts down its own.  This
internal protocol is the way I've done it in the past, and it works.

    HOWEVER, if the client ups and dies (kill -9, untrapped SEGV, etc)
without telling the server it's gonna die, when the client vanishes the
server goes berserk setting a permanent 'read condition' on that
particular fd and it recv()'s the last block of data sent to that fd
over and over again, once for each iteration of the for loop.  Select()
returns (int) 1 immediately and the loop goes on and on....

    What I want to do is trap this situation.  I've done a minimal
server and created the situation over and over again, and I can't find a
sensible method of trapping it.  SIGURG is not generated and of the
other bitmaps which select() generates, write_fds is not helpful and
except_fds is not set under these circumstances.

    So the question is: How do I trap the fact that the other end of a
connected INET/SOCK_STREAM/tcp socket has vanished so that I may close it?

Thanx in advance,
		alec
--
JANET	aem@uk.ac.aber or aem@uk.ac.aber.cs
INET:	aem@cs.aber.ac.uk
UUCP:	...!mcsun!ukc!aber-cs!aem
ARPA:	aem%uk.ac.aber.cs@nsfnet-relay.ac.uk,aem%uk.ac.aber@nsfnet-relay.ac.uk
BITNET:	<play around with aem%aber@ukacrl, ok?>
SNAIL:	Alec Muffett, Computer Unit, Llandinam Building, UCW Campus,
	Aberystwyth, UK, SY23 3DB

chris@mimsy.umd.edu (Chris Torek) (11/06/90)

In article <2088@aber-cs.UUCP> aem@aber-cs.UUCP (Alec D.E. Muffett) writes:
>... if the client ups and dies (kill -9, untrapped SEGV, etc) without
>telling the server it's gonna die, when the client vanishes the server
>goes berserk setting a permanent 'read condition' [from select()] on
>that particular fd and it recv()'s the last block of data sent to that fd
>over and over again, once for each iteration of the for loop.

Wanna bet? :-)

select() indeed returns 1 (or more) and indicates that reading the
socket will not block.  This does NOT mean that reading the socket
will succeed.  It only means it will not block---your program will
return from a read() or recv() or recvfrom() or recvmsg() immediately.

When the server knows the client is gone (on a TCP socket, this happens
when it receives a FIN or RST---here it will receive a FIN), it does
the same thing as with a pipe when the last writer vanishes.  Reads
produce the remaining data until the socket buffers have been drained;
then they return EOF.

>    So the question is: How do I trap the fact that the other end of a
>connected INET/SOCK_STREAM/tcp socket has vanished so that I may close it?

Whenever read/recv returns 0 (EOF), the other end has gone away.
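Something along these lines (handle_data() and remove_fd() are made-up
names for whatever your server already does with incoming data and with
its table of descriptors):

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

extern void handle_data(int fd, char *buf, int len);	/* your own routine */
extern void remove_fd(int fd);				/* drop fd from watched set */

/* Call this for each fd that select() marked readable. */
void
service_fd(int fd)
{
	char buf[1024];
	int n;

	n = recv(fd, buf, sizeof buf, 0);
	if (n > 0) {
		handle_data(fd, buf, n);	/* ordinary data */
	} else {
		/*
		 * n == 0 is EOF: the peer closed (or was killed, and the
		 * kernel sent a FIN on its behalf).  n < 0 is a real error,
		 * e.g. ECONNRESET.  Either way, stop watching the fd and
		 * close it, or select() will keep marking it readable.
		 */
		remove_fd(fd);
		close(fd);
	}
}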
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)

In article <27457@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
> In article <2088@aber-cs.UUCP> aem@aber-cs.UUCP (Alec D.E. Muffett) writes:
> >... if the client ups and dies (kill -9, untrapped SEGV, etc) without
> >telling the server it's gonna die, when the client vanishes the server
> >goes berserk setting a permanent 'read condition' [from select()] on
> >that particular fd and it recv()'s the last block of data sent to that fd
> >over and over again, once for each iteration of the for loop.
> Wanna bet? :-)

I think what Chris means is ``Gaaargh! Hasn't anyone told you that I/O
system calls don't necessarily return the full amount of data you asked
for? That you *have* to check their return values? You're probably
checking the recv() against -1 while it's returning 0! Of course it
doesn't bother to wipe out the last block of data, which you blithely
assume has been read anew!'' but he is, as usual, too polite to say so.

:-)
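The difference, in a sketch (act_on() is a made-up stand-in for whatever
the server does with the bytes it reads):

#include <sys/types.h>
#include <sys/socket.h>

extern void act_on(char *buf, int len);		/* hypothetical */

/*
 * Buggy: only -1 counts as "client gone", and the byte count is
 * ignored, so once recv() starts returning 0 (EOF) the old contents
 * of buf get processed again on every pass through the loop.
 */
int
read_client_buggy(int fd, char *buf, int bufsize)
{
	if (recv(fd, buf, bufsize, 0) == -1)
		return -1;
	act_on(buf, bufsize);			/* may be stale data! */
	return 0;
}

/*
 * Fixed: look at what recv() actually returned.
 */
int
read_client(int fd, char *buf, int bufsize)
{
	int n = recv(fd, buf, bufsize, 0);

	if (n <= 0)
		return -1;			/* 0 = EOF, -1 = error */
	act_on(buf, n);				/* only the n bytes just read */
	return 0;
}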

---Dan

seanf@sco.COM (Sean Fagan) (11/08/90)

In article <12323:Nov603:18:1990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>I think what Chris means is ``Gaaargh! Hasn't anyone told you that I/O
>system calls don't necessarily return the full amount of data you asked
>for? That you *have* to check their return values? You're probably
>checking the recv() against -1 while it's returning 0!

And, of course, when you close the connection, you have to check to make
sure that, some unknown number of hours ago, the server didn't just
disappear, or you didn't overflow your quota, or daylight savings didn't
come into action, causing some packet you sent a few thousand packets ago to
suddenly be invalid.

-- 
-----------------+
Sean Eric Fagan  | "*Never* knock on Death's door:  ring the bell and 
seanf@sco.COM    |   run away!  Death hates that!"
uunet!sco!seanf  |     -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.