robert@SPAM.ISTC.SRI.COM.UUCP (06/11/87)
I have encountered a problem with TCP sockets under Sun 3.2 OS, and I was wondering if anyone on the list has encountered the same problem, or knows what could be causing it.

I have an application which listens on TCP sockets, in the AF_INET family, for connections from other processes. The listening socket is set to non-blocking with ioctl() (FIONBIO) after the call to socket() (AF_INET, SOCK_STREAM, 0). After the ioctl(), setsockopt() (SOL_SOCKET, SO_REUSEADDR) is called, followed by bind(), followed by listen() with a backlog parameter of 5. When a connection is 'heard', accept() is called, and the file descriptor it returns is used for the 'full' connection to the client process. This file descriptor is also set to non-blocking with ioctl() (FIONBIO), after which it is read from.

The problem is this: if the client process goes away abruptly, the server socket does not know about it. Further, if select() is called on the (now supposedly 'dead') file descriptor for reading, it indicates that there is data there. If the fd is then read, 0 bytes are received; no error condition is indicated by recv(). I have seen up to 600 repeated select() and recv() calls without getting a -1 from recv().

My questions:

1. Is this a bug in the 4.2BSD networking code (it has been a 'feature' of the Sun kernel since OS 2.2 or so)?
2. Is it incorrect coding on my part in establishing the socket?
3. Is it a result of the fact that exceptfd is not implemented for select() on 4.2BSD?
4. Is it caused by my use of the non-blocking socket?

Currently I close down the socket after 100 0-byte recv()'s, but I'd like to have a better idea of why this is happening, and how it can be fixed. Any clues will be gratefully accepted.

Robert Allen, robert@spam.istc.sri.com
415-859-2143
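For anyone trying to reproduce this, the setup described above can be sketched roughly as follows. This is not the actual application code: the helper names make_listener() and read_client() and the read_status enum are hypothetical, and the sketch is written against a present-day BSD-style sockets API rather than Sun 3.2 specifically. It also classifies a 0-byte recv() as end-of-file, which is the conventional reading on a stream socket (the peer has closed its end), instead of counting 100 of them; whether that interpretation applies here is exactly what the questions above are asking.

```c
/* Sketch only: make_listener(), read_client() and enum read_status are
 * hypothetical names introduced for illustration. */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum read_status { READ_DATA, READ_AGAIN, READ_EOF, READ_ERROR };

/* Create a non-blocking listening socket in the order given in the post:
 * socket(), ioctl(FIONBIO), setsockopt(SO_REUSEADDR), bind(), listen(5).
 * Returns the descriptor, or -1 on failure. Port 0 lets the kernel choose. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (ioctl(fd, FIONBIO, &on) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Do one recv() on a connected, non-blocking stream socket and classify
 * the result. Assumes len > 0, so a return of 0 is read as "peer closed"
 * rather than as "no data yet". The byte count (or -1) goes to *nread. */
enum read_status read_client(int fd, char *buf, size_t len, ssize_t *nread)
{
    ssize_t n = recv(fd, buf, len, 0);

    *nread = n;
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;    /* conventional end-of-file on a stream socket */
    if (errno == EWOULDBLOCK || errno == EAGAIN || errno == EINTR)
        return READ_AGAIN;  /* nothing to read right now; select() again */
    return READ_ERROR;      /* a real error, reported via errno */
}
```

A caller would select() on the accepted descriptor, call read_client() when it is reported readable, and close() the descriptor on READ_EOF or READ_ERROR. On a non-blocking socket, "no data yet" shows up as -1 with EWOULDBLOCK, so under this reading a return of 0 is never a reason to retry.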