LordBah@cup.portal.com (Jeffrey J Vanepps) (03/10/91)
I have a problem telling when the other side of a socket connection has exited. Normally, when select(2) reports that a socket is readable and a subsequent read/recv returns zero, that signifies a closed connection. In my case, the application can't use this technique. Here's the situation:

Program R is a communications router. It accepts connections from data producers and data consumers. Normally, it waits until data is available from some producer, reads the data, and then writes the data to each consumer who requested the type of data being generated by that producer. However, data is not allowed to be lost or thrown away, so if a producer produces data for which there is not yet a consumer, then R does not read that data. It is left in the socket buffer, eventually blocking the producer.

Now, after this has happened, and there is a buffer full of data waiting to be read from a producer, R has no way to tell that the producer process has exited. Normally when one side of a socket connection exits, the other side is told via select(2) that there is data to be read, but the recv(2) call returns zero bytes. But in the case described above, R can't try to recv because it has nowhere to put the data.

This is on a Sparc 1+ and IPC, SunOS 4.1.1.

Some attempted solutions:

- Always reading all available data and queueing it (in memory or on disk) is not acceptable. Data volumes under consideration are too high. We basically want the producer to block while there is no consumer.
- The FIONREAD ioctl(2) call always says that there are many bytes to read, since there are many bytes left in the buffer.
- getsockopt(SO_ERROR) never returns any error.
- getpeername(2) still thinks that the socket is connected.
- select(2) still thinks that the socket is writable, even though the other side has exited.
- No exceptional condition is ever apparent to select(2). Given what I've seen written about exceptional conditions, I didn't expect this to work.
- I don't receive SIGURG or even SIGPIPE when the other side closes.

So, is there any method provided by the system for determining whether the other side of a socket has closed? I'd rather not do any type of handshaking, because throughput is also an issue with R.

-- 
--------------------------------------------------------------------
Jeff Van Epps     amusing!lordbah@bisco.kodak.com
                  lordbah@cup.portal.com
                  sun!portal!cup.portal.com!lordbah
torek@elf.ee.lbl.gov (Chris Torek) (03/10/91)
In article <39989@cup.portal.com> LordBah@cup.portal.com (Jeffrey J Vanepps) writes:
>Program R is a communications router. It accepts connections from data
>producers and data consumers. Normally, it waits until data is available
>from some producer, reads the data, and then writes the data to each
>consumer who requested the type of data being generated by that producer.
>However, data is not allowed to be lost or thrown away, so if a producer
>produces data for which there is not yet a consumer, then R does not
>read that data. It is left in the socket buffer, eventually blocking
>the producer.
>
>Now, after this has happened, and there is a buffer full of data waiting
>to be read from a producer, R has no way to tell that the producer process
>has exited. Normally when one side of a socket connection exits, the
>other side is told via select(2) that there is data to be read, but the
>recv(2) call returns zero bytes. But in the case described above, R can't
>try to recv because it has nowhere to put the data.

There is something missing from the above problem specification, because as given, there is nothing wrong, nothing to fix. If data may never be discarded, then:

    producer P runs, outputs data of type T
    producer P exits
    several hours later, consumer C requests data of type T

There also appears to be a potential race condition:

    producer P runs, outputs data of type T
    consumer C1 runs, requests data of type T, gets first two items
        (which thus disappear from the queue)
    consumer C2 runs, requests data of type T; now both C1 and C2
        get the remaining items.

C2 has lost out due to a race.

If the missing goal above is to have router R spawn new producers, then:

  - R can be the parent of *all* producers (and thus use wait3() or wait() to detect vanished ones), or
  - each producer P can also have a second socket to router R, one which exists only to identify P to R initially and then has no further data written to it. When P exits, this socket will see an end-of-file condition.

If producers and consumers run on different machines than R, some sort of keep-alive mechanism may be required as well, since a broken connection and an idle one are otherwise completely identical.

-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA          Domain: torek@ee.lbl.gov
fkittred@bbn.com (Fletcher Kittredge) (03/22/91)
In article <39989@cup.portal.com> LordBah@cup.portal.com (Jeffrey J Vanepps) writes:
>I have a problem with telling when the other side of a socket connection
>has exited. Normally a read/recv which returns zero after select(2) says
>that there is data to be read signifies a closed connection. In my case,
>the application can't use this technique. Here's the situation:

So why don't you set SO_KEEPALIVE on the socket and respond to the SIGIO?

If that is a problem for you, you could have R write to the producer's socket. If the send fails (with ENOTCONN, or EPIPE/ECONNRESET once the connection has been reset), then take whatever action is necessary. Throughput should not be a problem, since the write won't block, and the producer can ignore the message.

Note that doing your own polling is considered by many to be a more elegant solution than setting SO_KEEPALIVE. SO_KEEPALIVE causes additional network load.

regards,
fletcher

Fletcher Kittredge
Platforms and Tools Group, BBN Software Products
10 Fawcett Street, Cambridge, MA. 02138
617-873-3465 / fkittred@bbn.com / fkittred@das.harvard.edu
pww@bnr.ca (Peter Whittaker) (03/23/91)
In article <63357@bbn.BBN.COM> fkittred@spca.bbn.com (Fletcher Kittredge) writes:
>In article <39989@cup.portal.com> LordBah@cup.portal.com (Jeffrey J Vanepps) writes:
>>I have a problem with telling when the other side of a socket connection
>>has exited. Normally a read/recv which returns zero after select(2) says
>>that there is data to be read signifies a closed connection. In my case,
>>the application can't use this technique. Here's the situation:
>
>So why don't you set SO_KEEPALIVE on the socket and respond to the SIGIO?
>

oooo, my least favourite topic :->

>Note that doing your own polling is considered by many to be a more elegant
>solution than setting SO_KEEPALIVE. SO_KEEPALIVE causes additional network
>load.

That's not all it does! Beware: certain OSes have broken KEEPALIVE handling, notably HP-UX pre-7.0.5 (it seems to be fixed in 7.0.5), and (maybe) SunOS pre-4.1.1.

The nature of the problem is this (assume a TCP connection between any Sun or HP server and an HP-UX 6.5 client):

The client sets KEEPALIVE; the server crashes; the server's kernel tries to close the connection gracefully and sends a RST; the client ACKs the RST, and should then wait ~15 minutes before sending its own RST, which the server-side kernel should ACK, whereupon the connection is closed.

What actually happens: the client receives the RST, ACKs it, then starts waiting the 15 minutes. Before the 15 minutes are up, THE CLIENT SENDS a KEEPALIVE (i.e. an ACK!). The server ACKs the KEEPALIVE. Result? The connection lives forever!

Note that this buggy behaviour depends on the server ACKing the KEEPALIVE even though it has sent a RST! Unfortunately, no one seems to have accounted for that possibility! (Not surprising: once you send a RST, the 'other side' should - one would think - stop KEEPALIVING! Of course, KEEPALIVE is not TCP, i.e. not part of the TCP specification proper!)

Beware the SO_KEEPALIVE, my son, and keep your vorpal ready! (In other words, use select().)

-- 
Peter Whittaker                 Open Systems Integration
pww@bnr.ca                      Bell Northern Research
Ph: +1 613 765 2064             P.O. Box 3511, Station C
FAX: +1 613 763 3283            Ottawa, Ontario, K1Y 4H7