litwin@ROBOTICS.JPL.NASA.GOV (Todd Litwin) (01/29/91)
I have a program that uses TCP/IP sockets and needs to know quickly, within a second or so, if the physical connection between the two systems is broken. It appears that the operating system is very tolerant of physical disruptions, and won't time out the connection and formally break it even if the problem lasts several minutes. I'm using setsockopt() to turn on SO_KEEPALIVE, but this doesn't help either.

Is there any way that I can force a socket to disconnect after a second or so of failure to communicate (short of sending my own heartbeats)?

I am running under SunOS 4.0.2, but will also need to move a version of this software to the Silicon Graphics world and to the VxWorks real-time operating system. Any suggestions would be greatly appreciated.

Todd Litwin
Jet Propulsion Laboratory
(818) 354-5028
litwin@robotics.jpl.nasa.gov
romkey@ASYLUM.SF.CA.US (John Romkey) (01/30/91)
You can't always tell within a second or two whether the physical connection between two systems is broken. Sometimes the break is a router crashing. Sometimes it's an AT&T fiberoptic cable cut by a backhoe in upstate New York when you're in Dallas and the computer you're talking to is in San Francisco.

Most applications want to be tolerant of order-of-several-minutes disruption of communications, because there are too many real world transient conditions that aren't readily distinguishable from long term failures.

- john romkey
Epilogue Technology
USENET/UUCP/Internet: romkey@asylum.sf.ca.us
FAX: 415 594-1141
BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) (01/30/91)
> I have a program that uses TCP/IP sockets and needs to know quickly,
> within a second or so, if the physical connection between the two
> systems is broken.

Foo. tcp/ip is designed for reliability over many media. You have no guarantee that your packet will even get to its destination within a second, even if the network is working perfectly.

If you really need to know that quickly whether the network has gone away, tcp/ip is not a suitable protocol to be using.

BillW
-------
henry@zoo.toronto.edu (Henry Spencer) (01/31/91)
In article <9101291553.AA06606@litwin.jpl.nasa.gov.> litwin@ROBOTICS.JPL.NASA.GOV (Todd Litwin) writes:
>I have a program that uses TCP/IP sockets and needs to know quickly, within a
>second or so, if the physical connection between the two systems is broken.

This basically can't be done; it's easy to get transient interruptions that last longer than that, and there is no reliable way to distinguish them from a real break in the link. If you're willing to consider even such a hiccup as a failure, then you need some sort of keepalive protocol at a higher level. TCP/IP is deliberately very tolerant of outages.

>... I'm using setsockopt() to turn on SO_KEEPALIVE, but this
>doesn't help, either....

SO_KEEPALIVE is a kludge; its timeout period is non-adjustable and quite long. You're going to have to do it yourself.
--
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 | henry@zoo.toronto.edu   utzoo!henry
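The "do it yourself" keepalive suggested above can be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the class name HeartbeatWatchdog, its methods, and the one-second default are all assumptions. The application would call heard_from_peer() whenever any data or heartbeat message arrives, and poll peer_presumed_dead() from its main loop. Note that it treats even a brief hiccup as a failure, which is exactly the trade-off described above.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical helper: flags a peer as dead after a period of silence.

    It cannot tell a transient outage from a real break; it only reports
    that nothing has been heard for longer than timeout_s seconds.
    """

    def __init__(self, timeout_s=1.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock            # injectable so the logic can be tested
        self._last_heard = clock()     # treat creation time as "just heard"

    def heard_from_peer(self):
        """Call whenever a heartbeat (or any data) arrives from the peer."""
        self._last_heard = self._clock()

    def silent_for(self):
        """Seconds elapsed since the peer was last heard from."""
        return self._clock() - self._last_heard

    def peer_presumed_dead(self):
        """True once the silence exceeds the configured timeout."""
        return self.silent_for() > self.timeout_s
```

The monotonic clock is used so that wall-clock adjustments do not produce false alarms.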
mleech@bwdlh131.bnr.ca (Marcus Leech) (01/31/91)
In article <1991Jan30.172337.7084@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
|> In article <9101291553.AA06606@litwin.jpl.nasa.gov.> litwin@ROBOTICS.JPL.NASA.GOV (Todd Litwin) writes:
|>
|> SO_KEEPALIVE is a kludge; its timeout period is non-adjustable and quite
|> long. You're going to have to do it yourself.

I'll second that, and add that I have successfully used application-level "keep-alives" (which should properly be called "make-deads") to detect server processes going away. This *only* works if you have very tight control over where your packets are routed. It happens that my applications have servers "in the next room", so network prop delays are quite predictable (barring technicians severing cables ;-( ).
--
Marcus Leech, 4Y11      Bell-Northern Research  |opinions expressed
mleech@bnr.ca           P.O. Box 3511, Stn. C   |are my own, and not
VE3MDL@VE3JF.ON.CAN.NA  Ottawa, ON, CAN K1Y 4H7 |necessarily BNR's
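An application-level "make-dead" probe of the kind described above might look something like this sketch. The PING/PONG message format, the function name probe_peer, and the timeout and retry values are assumptions made for illustration; nothing in the thread specifies them. It also assumes the peer answers each PING promptly and that the reply arrives in a single read, which is only reasonable on a short, predictable path like the "next room" case.

```python
import socket


def probe_peer(sock, timeout_s=0.25, attempts=3):
    """Send PING up to `attempts` times; return True on the first PONG.

    Returns False if every attempt times out or the socket reports an error.
    The caller's original socket timeout is restored before returning.
    """
    previous = sock.gettimeout()
    try:
        sock.settimeout(timeout_s)
        for _ in range(attempts):
            try:
                sock.sendall(b"PING\n")          # hypothetical probe message
                if sock.recv(16).startswith(b"PONG"):
                    return True
            except socket.timeout:
                continue                          # no answer in time: retry
            except OSError:
                return False                      # reset, broken pipe, etc.
        return False
    finally:
        sock.settimeout(previous)
```

With the defaults the worst case is about three quarters of a second, which leaves room for a few retries inside a one-second budget. A False result means only that no reply came back in time, not that the link is definitely down.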
nelson@sun.soe.clarkson.edu (Russ Nelson) (01/31/91)
In article <12657895596.14.BILLW@mathom.cisco.com> BILLW@MATHOM.CISCO.COM (William "Chops" Westfield) writes:
> > I have a program that uses TCP/IP sockets and needs to know quickly,
> > within a second or so, if the physical connection between the two
> > systems is broken.
>
> Foo. tcp/ip is designed for reliability over many media. You have no
> guarantee that your packet will even get to its destination within a
> second, even if the network is working perfectly.
>
> If you really need to know that quickly whether the network has gone
> away, tcp/ip is not a suitable protocol to be using.
You didn't foo him very well, Bill. Yes, he shouldn't be using TCP/IP.
But he *can* use IP. It's just a matter of protocol design. If you
*really* want to know if your network has gone away in a second, you
obviously have to have a network whose packets can make a round trip
in less than a second.
Moreover, we need to communicate in *much* less than a second, because
we have to be able to retry several times. We also need to be able to
limit traffic on the network, so that we can guarantee a certain probability
that no return packet *really* means dead machines.
And if the protocol is designed well, it could constantly update a
probability measure of connection downness.
So, he can't do it on an arbitrary LAN (or LANs), nor guarantee a
100% correct answer, but he *can* do it over IP.
--
--russ (nelson@clutx [.bitnet | .clarkson.edu]) FAX 315-268-7600
It's better to get mugged than to live a life of fear -- Freeman Dyson
I joined the League for Programming Freedom, and I hope you'll join too.
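One way to read the "probability measure of connection downness" suggested above is as a simple Bayesian update over consecutive unanswered probes. The sketch below is only an illustration under strong assumptions that are not in the thread: each probe round trip on a healthy network is lost independently with probability p_loss, a dead peer never answers, and prior_down is the assumed chance that the link was down before probing started. The function names are hypothetical.

```python
def false_alarm_probability(p_loss, misses):
    """Chance that a healthy link loses `misses` probes in a row.

    Assumes independent losses, each with probability p_loss.
    """
    return p_loss ** misses


def posterior_down(prior_down, p_loss, misses):
    """P(link is down | `misses` consecutive unanswered probes).

    Bayes' rule, assuming a dead link never answers and a healthy one
    drops each probe independently with probability p_loss.
    """
    if not 0.0 < prior_down < 1.0:
        raise ValueError("prior_down must be strictly between 0 and 1")
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    healthy_but_silent = (1.0 - prior_down) * false_alarm_probability(p_loss, misses)
    return prior_down / (prior_down + healthy_but_silent)
```

Under these assumptions each extra retry multiplies the false-alarm probability by p_loss, which is why the round trip has to be much shorter than a second: the retries all have to fit inside the one-second budget. Real losses tend to come in bursts, so the independence assumption is optimistic.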