mnl%IDTSUN1.E-TECHNIK.TH-DARMSTADT.DE@BRL.MIL (Michael N. Lipp) (12/06/90)
Hello,

I have a program that establishes a TCP-connection with another machine,
requests the server to send some packets of data and then does a

    while (read (fd, &packet, sizeof (packet)) == sizeof (packet)) { ... }

This program hangs frequently. I made it QUIT and found it hanging in the
read. As this program frequently connects to diskless machines that are
switched off at night, I assume that the connection comes down while
the program is reading.

I am wondering: shouldn't read return with an error status if the connection
breaks? As it apparently does not, what is the most reasonable fix?
A blocking read with timeout comes to my mind, but what is the best
way to do this?

Thanks
Michael

--
Michael N. Lipp
Institut fuer Datentechnik
Merckstr. 25
D-6100 Darmstadt (Germany)
XDATMNLX@DDATHD21.BITNET (local: mnl@idtsun1)
Phone: 49-6151-163776  Fax: 49-6151-164976
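[Editor's sketch] The "blocking read with timeout" asked about above is often built from select() followed by read(). The following is a minimal sketch under that assumption, not a method given by any poster in this thread. The helper name read_with_timeout and the -2 return code for a timeout are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and return codes are assumptions):
 * wait up to timeout_sec seconds for fd to become readable, then
 * perform a single read().
 * Returns: >0 bytes read, 0 on end of file, -1 on error (errno set),
 *          -2 if nothing arrived before the timeout.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    assert(buf != NULL);

    do {
        /* select() may modify both arguments, so reset them each time.
         * Note: after an interrupting signal the full timeout starts over. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;      /* select() itself failed */
    if (ready == 0)
        return -2;      /* timed out: peer is silent, possibly dead */

    /* Readable: may still return fewer than len bytes, or 0 at EOF. */
    return read(fd, buf, len);
}
```

A caller would treat -2 as "no data within the limit" and decide for itself whether to retry or give up; a timeout alone does not prove the other machine is down.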
barmar@think.com (Barry Margolin) (12/06/90)
In article <25205@adm.brl.mil> mnl%IDTSUN1.E-TECHNIK.TH-DARMSTADT.DE@BRL.MIL (Michael N. Lipp) writes:
>I have a program that establishes a TCP-connection with another machine,
>requests the server to send some packets of data and then does a
>
>while (read (fd, &packet, sizeof (packet)) == sizeof (packet)) { ... }
>
>This program hangs frequently. I made it QUIT and found it hanging in the
>read. As this program frequently connects to diskless machines that are
>switched off at night, I assume that the connection comes down while
>the program is reading.
>
>I am wondering: shouldn't read return with an error status if the connection
>breaks? As it apparently does not, what is the most reasonable fix?
>A blocking read with timeout comes to my mind, but what is the best
>way to do this?

Are the diskless machines simply switched off, or are they shut down with
software? If they're just switched off, then they won't be able to send
the appropriate "close this connection" packets (either a FIN or a RST) on
these connections.

Unfortunately, there's no reliable way to determine whether another machine
is up or down on many network media (Ethernet, in particular). Lack of
communication can result from a number of other causes: network congestion,
router/bridge failure, a flaky cable or connector, etc.

If you're willing to assume that incommunicado means dead you can use a
keepalive, an empty packet that is sent periodically in order to elicit an
acknowledgement. If you're using Unix sockets, the SO_KEEPALIVE option can
be enabled to automate this.

By the way, there's another bug in your code, in the "== sizeof (packet)".
Read() is permitted to return fewer bytes than you asked for; the third
argument is only a maximum. You should use something like

    while ((count = read(...)) > 0) { ... }

--
Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
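[Editor's sketch] The two suggestions in the reply above (enable SO_KEEPALIVE, and do not assume read() returns a whole packet) might look roughly like this. It is a sketch under assumptions: the helper names enable_keepalive and read_full are hypothetical, and the loop that accumulates a fixed-size record is one way to act on the short-read warning, not code from the original post.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to probe an idle connection.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_keepalive(int sock)
{
    int on = 1;
    return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Hypothetical helper: keep calling read() until len bytes have
 * arrived, since a single read() may return fewer than requested.
 * Returns the number of bytes actually read (less than len only if
 * end of file was reached first), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;              /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

With these, the original loop condition could compare read_full(fd, &packet, sizeof (packet)) against sizeof (packet), so a packet that arrives in pieces is no longer mistaken for the end of the data.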
damenf@motcid.UUCP (Frederick Damen) (12/12/90)
In article <1990Dec6.055353.23846@Think.COM> barmar@think.com (Barry Margolin) writes:
>If you're willing to assume that incommunicado means dead you can use a
>keepalive, an empty packet that is sent periodically in order to elicit an
>acknowledgement. If you're using Unix sockets, the SO_KEEPALIVE option can
>be enabled to automate this.

After RTFMs, I have a few questions and assumptions that need confirming:

1) After reading some related information on SIGPIPE and running a test program
   it seems as though SIGPIPE is only raised on a pipe/socket that has been
   written to. In most/all the documentation on SIGPIPE that I have seen it
   always refers to writing to the pipe/socket. In a test program that I have
   written, the read(2) command will return a 0 if there have not been any
   writes to that end of the socket; the read(2) command will cause a SIGPIPE
   if that end of the socket has been previously written to. This happens
   with or without SO_KEEPALIVE set.

   Q: Is SIGPIPE only raised for the end of the socket/pipe that has been
      written to?

2) The SunOS 4.0.1 man page for setsockopt(2) says:

      SO_KEEPALIVE enables the periodic transmission of messages on a
      connected socket. Should the connected party fail to respond to
      these messages, the connection is considered broken and processes
      using the socket are notified using a SIGPIPE signal.

   Q: What is the period of these messages?

   Q: When is the SIGPIPE sent:
        After n(n=1) messages are not responded to?
        When the next I/O operation is performed on this socket after a nonresponse?

   Q: Define processes *using* the socket.
      Is this:
        Processes that have written to the socket?
        Processes that have an open file descriptor for this socket?
        Processes at both ends of socket connection?
        Processes that are currently performing an I/O operation on the socket?

3) After the signal handler for SIGPIPE is called how do/should you tell which
   socket caused the SIGPIPE?

I am on a Sun 3/80 running SunOS 4.0.1.
I am using AF_INET, SOCK_STREAM.

I have RTFM and then some. I have written some programs using sockets
and understand(?) the basics.

Thanks in advance for any answers or rtfm(references to f___ manuals) that might
be more enlightening.

Fred

--
Fred Damen
Motorola, Inc., Cellular Infrastructure Division
1501 W. Shure Drive, Arlington Heights, IL 60004
708 632-4641
...!uunet!motcid!damenf
barmar@think.com (Barry Margolin) (12/12/90)
In article <5727@navy40.UUCP> damenf@motcid.UUCP (Frederick Damen) writes:
>1) After reading some related information on SIGPIPE and running a test program
>   it seems as though SIGPIPE is only raised on a pipe/socket that has been
>   written to.

Someone please correct me if I'm wrong (my answers are mostly educated
guesses), but I think Unix keepalives are implemented by periodically
retransmitting a packet with the sequence number of the last one sent. The
other host acknowledges having received all bytes up to that point, and
this acknowledgement serves as the indication that it is still alive. But
if you haven't yet written anything, then there is nothing to retransmit.

>   Q: When is the SIGPIPE sent:
>        After n(n=1) messages are not responded to?
>        When the next I/O operation is performed on this socket after a nonresponse?

After n (n > 1, possibly a settable kernel parameter) messages are not
responded to. There would be no reason to use a signal if it waited for an
I/O operation to be performed, as it could simply return an error in that
case.

>   Q: Define processes *using* the socket.
>      Is this:
>        Processes that have written to the socket?
>        Processes that have an open file descriptor for this socket?
>        Processes at both ends of socket connection?
>        Processes that are currently performing an I/O operation on the socket?

I would expect the second definition. The third definition is unlikely,
because the process at the other end of the socket isn't likely to be on
this host, so it's not possible to send a Unix signal to it (it might not
even be on a Unix system); also if the keepalive is accurately detecting a
crashed host, the process at the other end doesn't even exist. The fourth
definition is also unlikely for the reason I gave in the answer to the
previous question. And the first definition seems unlikely because I don't
think the kernel keeps track of which processes have written to a socket;
there's a single buffer that all file descriptors for the socket reference.

>3) After the signal handler for SIGPIPE is called how do/should you tell which
>   socket caused the SIGPIPE?

I don't think there's a reliable way. I think the intent of the keepalive
mechanism was to provide a way for the process to be killed automatically
if the other end died. It doesn't provide for much fine control. It's
probably the case that trying to read on a socket that caused a SIGPIPE
will get an error, but I wouldn't stake much on it.

--
Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
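[Editor's sketch] On the question of telling which socket was involved: for the write-triggered case, a widely used alternative (not something proposed by the posters here) is to ignore SIGPIPE so that the failing write() returns -1 with errno set to EPIPE, which identifies the descriptor directly. Whether this also covers the keepalive-driven SIGPIPE described in the SunOS 4.0.1 man page is not established here. The helper names ignore_sigpipe and checked_write are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE process-wide so that writing to
 * a broken pipe or socket fails with EPIPE instead of raising a signal.
 * Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper: write all len bytes to fd.
 * Returns 0 if everything was written, 1 if the write failed with
 * EPIPE (the reader on this particular fd is gone), -1 on any other
 * error (errno set). */
int checked_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL);

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry */
            return errno == EPIPE ? 1 : -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

Because the error comes back from the call that was made on a specific descriptor, the program never has to work out inside a signal handler which connection failed.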
terryl@sail.LABS.TEK.COM (12/14/90)
In article <1990Dec12.060545.7673@Think.COM> barmar@think.com (Barry Margolin) writes:
>In article <5727@navy40.UUCP> damenf@motcid.UUCP (Frederick Damen) writes:
>>1) After reading some related information on SIGPIPE and running a test program
>>   it seems as though SIGPIPE is only raised on a pipe/socket that has been
>>   written to.
>
>Someone please correct me if I'm wrong (my answers are mostly educated
>guesses), but I think Unix keepalives are implemented by periodically
>retransmitting a packet with the sequence number of the last one sent. The
>other host acknowledges having received all bytes up to that point, and
>this acknowledgement serves as the indication that it is still alive. But
>if you haven't yet written anything, then there is nothing to retransmit

OK, consider yourself corrected!!! (-:

What actually happens is that the kernel sends out a packet with a sequence
number of send unacknowledged - 1, which should have been the last byte sent
and already acknowledged, which is what I guess Barry was trying to say.....

__________________________________________________________
Terry Laskodi of Tektronix
"There's a permanent crease in your right and wrong."
Sly and the Family Stone, "Stand!"
__________________________________________________________