mike@brl-tgr (Mike Muuss) (04/24/85)
The recently re-posted John Quarterman "fix" for hanging FIN_WAIT_2 connections is *bad*, in a subtle and dangerous way. We had it installed for several weeks before we noticed why it was bad. I have commented on the list before about this, but it is worth repeating.

A connection may operate correctly in FIN_WAIT_2 state forever. A good example of this is the following command line:

	rsh tgr batchnews < /dev/null | unbatchnews

The RSH will sense EOF on stdin immediately, and do an advisory close on the TCP connection *TO* machine "tgr", yet data will continue to flow *FROM* machine "tgr" for hours. This connection operates in FIN_WAIT_2 state the whole time -- THIS IS CORRECT FUNCTION.

If you choose to install JSQ's fix anyway, just remember that any RSH that you run from a shell file had better run quickly (i.e., within the FIN_WAIT_2 timeout you pick), or the connection will be broken for you.

Best,
 -Mike Muuss
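The half-close behavior described above can be reproduced on a loopback connection. The following is only a minimal sketch, not anything taken from rsh itself: the names `serve` and `half_close_demo`, the chunk contents, and the delay are all hypothetical. The client closes its sending side right away, as rsh does on EOF from /dev/null, and then keeps reading while the server goes on sending. On a typical TCP stack the client end should sit in FIN_WAIT_2 for the whole transfer.

```python
import socket
import threading
import time


def serve(listener, chunks, delay):
    """Hypothetical server: wait for the client's FIN, then keep sending."""
    conn, _ = listener.accept()
    with conn:
        # recv() returns b"" once the client's FIN has arrived.
        while conn.recv(1024):
            pass
        # The client has closed its sending side, but we can still send to it.
        for chunk in chunks:
            time.sleep(delay)  # stand-in for a slow, long-running remote command
            conn.sendall(chunk)
    # Leaving the with-block closes the socket and sends the server's FIN.


def half_close_demo(chunks=(b"article 1\n", b"article 2\n", b"article 3\n"),
                    delay=0.1):
    """Half-close the client side at once, then read until the server closes."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve, args=(listener, chunks, delay))
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    # Send our FIN now, like rsh seeing EOF on stdin. The receive side of
    # the socket stays open.
    client.shutdown(socket.SHUT_WR)

    received = b""
    while True:
        data = client.recv(1024)
        if not data:  # the server has finished and closed its side
            break
        received += data

    client.close()
    worker.join()
    listener.close()
    return received


if __name__ == "__main__":
    print(half_close_demo().decode(), end="")
```

Raising `delay` stretches the time the client spends half-closed. A timer that tears down connections sitting in FIN_WAIT_2 would cut such a transfer short once the delay exceeded the timeout.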
jsq@ut-sally.UUCP (John Quarterman) (04/24/85)
Ok, folks, I have tried the scenario Muuss and Hemminger say breaks my kludge, and they are right: it does. We do that kind of connection so rarely that I never noticed. We have connections with TOPS-20 hosts so frequently that I *had* to have something, and not with a six hour timeout, either.

I am currently trying Hemminger's fix in our systems. I don't expect to have any problems with it, as he has evidently run it for some time. On the off chance that I do find some problem with it, I will report it. Otherwise, I recommend using Hemminger's fix, not my kludge.

Now, I notice that I neglected to include my usual disclaimer in my recent posting that I used my kludge and gave it out to others only because nobody had apparently found anything better. I will assume that this omission accounts for the flamage level of the followups. However, note that I never used the word "fix", I did use the word "kludge", and I clearly stated that the kludge violated the TCP spec. A simple note from one of you who tried the kludge to me reporting the problem with it at the time it was found would have sufficed, eh?
-- 
John Quarterman, jsq@ut-sally.ARPA, {ihnp4,seismo,ctvax}!ut-sally!jsq