sfp@APLPY.ARPA (Steven Parr) (09/29/87)
Hi,

We are having problems with TCP connections getting hung in the LAST_ACK state. I believe the fault lies with the software at the other end of the connection, so we are in the process of getting an update of that software. However, purchasing takes time, and in the meantime we keep retransmitting a FIN every second or two until the next reboot.

So my question is this: does anyone know of a way to force closed a half-open connection such as this? It seems to me that you should be able to change the state to FIN_WAIT_2 or TIME_WAIT, and the connection should go ahead and close itself on the next expiration of the timer. Has anyone tried anything like this? Any suggestions on how to go about it? (It looks like adb may be useful, but I know almost nothing about it.)

If it matters, we have a Pyramid running release 3.1 (without source).

Thanks in advance,
-Steve Parr
sfp@aplpy.arpa
mar@ATHENA.MIT.EDU (09/30/87)
I assume that Pyramid release 3.1 has the Berkeley 4.2 network layer. In this case you can force the connections closed. I wrote a program to do this a while back, but it's full of Unix source I shouldn't redistribute to someone without a source license...

To do it by hand, run

	netstat -A

and find the address of the PCB for the connection (this is the hex number in the first column). Then start adb with

	adb -w -k /vmunix /dev/kmem

and zero out the short word 8 bytes past the address of the PCB. (8 is the offset on the VAX; it may vary on the Pyramid. You can check it by looking at struct tcpcb in <netinet/tcp_var.h> and finding the offset to t_state.)

	address+8/w 0

This forces the state of this connection to CLOSED. The next time a timer fires for that connection, it will notice that it is in the closed state and deallocate it. You can exit adb with

	$q

I suspect that this would work on 4.3-based network layers also, although the bug that requires it shouldn't exist there.

-Mark
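As a rough illustration of the by-hand procedure above, the following sketch only prints the adb write command for a given PCB address; it does not run adb or touch the kernel. The function name state_clear_cmd and the sample address are made up for illustration, and the default offset of 8 is the VAX value from the message, which may differ on other machines.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the adb command that zeroes t_state.
# $1 = PCB address in hex, taken from the first column of `netstat -A`
# $2 = byte offset of t_state in hex (8 on the VAX per the message above;
#      check <netinet/tcp_var.h> on other machines)
state_clear_cmd() {
    pcb=$1
    off=${2:-8}
    # adb syntax: address+offset/w value  writes one 16-bit word
    printf '%s+%s/w 0\n' "$pcb" "$off"
}

# Example with a made-up PCB address:
state_clear_cmd 80f3a40c
```

Printing the command first gives a chance to compare the address against the netstat output before anything is written to kernel memory.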
andrew@mitisft.UUCP (10/01/87)
While I don't have any suggestions for closing existing half-open connections (although I think someone posted something a while back), I do have a scenario which I have seen cause this, which can be traced to an ambiguity in the RFC...

Scenario:

1) Server sends FIN, gets ACK, enters FIN_WAIT_2.

2) Client sends a bunch of data.

3) Server's window size goes to zero due to normal flow control.

4) Client closes connection.
   At this point, client has data buffered, and needs a window update.
   FIN hasn't been sent since data is pending.

5) Client is now in LAST_ACK. However, he ignores window updates, looking
   only for ACK of FIN he hasn't sent! The connection is effectively
   idle.

Now, the RFC says all data should be sent after a close (pgs 49 & 61), and that when a segment arrives in LAST_ACK state only the ACK of FIN should be checked for (pg 73).

4.3 seems to have "fixed" this problem by both flushing data on a close and putting a timer on FIN_WAIT_2, along with having just about everybody use "linger mode", where the close delays till the data drains (not the default).

I fixed it by looking at window updates during LAST_ACK; not exactly spec, but harmless (apparently) in the normal cases....

Andrew
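The stall in step 5 can be sketched with a toy model. This is not kernel code: the function name can_send_fin and its arguments are invented for illustration, and it simplifies by requiring all buffered data to fit in the window at once.

```shell
#!/bin/sh
# Toy model of step 5: a closing sender with buffered data and a zero window.
# can_send_fin PENDING_BYTES WINDOW ACCEPT_WINDOW_UPDATES NEW_WINDOW
# Prints "yes" if the buffered data can drain so the FIN can finally go out.
can_send_fin() {
    pending=$1 window=$2 accept_updates=$3 new_window=$4
    # A window update from the peer only helps if this state processes it.
    if [ "$accept_updates" = yes ]; then
        window=$new_window
    fi
    if [ "$pending" -le "$window" ]; then
        echo yes
    else
        echo no
    fi
}

can_send_fin 512 0 no 1024    # update ignored in LAST_ACK: prints "no"
can_send_fin 512 0 yes 1024   # update honored: prints "yes"
```

The first call models the hung case, where the window stays at zero forever; the second models the fix of looking at window updates during LAST_ACK.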
mkhaw@teknowledge-vaxc.UUCP (10/01/87)
Here's a /bin/sh driven adb script posted to the net a while back that forces a socket to close:

<--- cut here --->
#! /bin/sh
# original from cdjohns@NSWC-G.ARPA
#
# TIMETODEATH expressed in decimal instead of hex
#	-- mkhaw@teknowledge-vaxc.arpa

# Use this script to force sockets in FIN_WAIT_2 state to close.
# It works by setting the 2MSL timer in the TCP Protocol Control Block (PCB)
# to a non-zero value.  The kernel then begins to decrement this value until
# it reaches zero, at which point the kernel forces a close on the socket and
# deletes the TCP PCB.  If both sides of the connection are hung, clearing one
# side will possibly clear the other.

# MSLOFFSET is the offset in the tcpcb record for the 2MSL timer.
# <netinet/tcp_var.h> describes the tcpcb record.
# This value is the number of bytes offset, expressed in hexadecimal.
MSLOFFSET=10

# TIMETODEATH is the number of half seconds until the connection is
# closed.  This value is expressed in decimal and must be greater
# than zero.
TIMETODEATH=06

# Display netstat to get PCB addresses (first column).
echo 'Active connections
PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)'
netstat -A | fgrep FIN_WAIT_2
echo
echo -n 'PCB address to terminate? '
read addr
echo

# Use adb on kernel to display the PCB of the specified address
adb -k /vmunix /dev/mem << SHAR_EOF
$addr\$<tcpcb
\$q
SHAR_EOF

# Check to see if this was the correct address and PCB.  state should be
# 8 for LAST_ACK, 9 for FIN_WAIT_2
echo
echo 'state = 9 = FIN_WAIT_2'
echo -n 'Is this the correct PCB (y/n)? '
read ans
echo
case $ans in
	[Yy]*)	;;
	*)	echo 'No Changes.'
		exit ;;
esac

# Use adb on kernel to set the 2MSL timer for the PCB
adb -k -w /vmunix /dev/mem << SHAR_EOF
$addr+$MSLOFFSET/w 0t$TIMETODEATH
\$q
SHAR_EOF

# Use these lines in place of the above for testing the script.
#adb -k /vmunix /dev/mem << SHAR_EOF
#$addr+$MSLOFFSET/x
#\$q
#SHAR_EOF

# Double quotes so that the `expr` substitution is actually performed.
echo
echo "Connection will be terminated in `expr $TIMETODEATH / 2` seconds."
echo
<--- cut here --->

Mike Khaw
-- 
internet: mkhaw@teknowledge-vaxc.arpa
usenet: {uunet|sun|ucbvax|decwrl|uw-beaver}!mkhaw%teknowledge-vaxc.arpa
USnail: Teknowledge Inc, 1850 Embarcadero Rd, POB 10119, Palo Alto, CA 94303
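For anyone checking MSLOFFSET on another machine, the arithmetic behind the value can be sketched as below. It assumes the 4.2BSD VAX layout of struct tcpcb: two 4-byte pointers, then the short t_state, then the array of short timers with the 2MSL timer at index 3. Those sizes and the index are assumptions to verify against <netinet/tcp_var.h> and <netinet/tcp_timer.h> on the system in question; the function names are made up for illustration.

```shell
#!/bin/sh
# Assumed layout (4.2BSD on a VAX); verify against your own headers.
PTR_SIZE=4        # bytes per pointer (seg_next, seg_prev)
SHORT_SIZE=2      # bytes per short
TCPT_2MSL=3       # assumed index of the 2MSL timer in t_timer[]

# msl_offset: print the byte offset of t_timer[TCPT_2MSL] in hex, as adb expects.
msl_offset() {
    state_off=`expr 2 \* $PTR_SIZE`                     # t_state follows two pointers
    timer_off=`expr $state_off + $SHORT_SIZE`           # t_timer[0] follows t_state
    off=`expr $timer_off + $TCPT_2MSL \* $SHORT_SIZE`   # step to t_timer[TCPT_2MSL]
    printf '%x\n' "$off"
}

# half_seconds_to_seconds N: convert a TIMETODEATH value to whole seconds.
half_seconds_to_seconds() {
    expr "$1" / 2
}

msl_offset                    # prints 10 (hex), matching MSLOFFSET above
half_seconds_to_seconds 06    # prints 3
```

Under these assumptions t_state lands at offset 8, which agrees with the VAX offset given earlier in the thread for zeroing the state directly.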
thomson@uthub.UUCP (10/05/87)
In article <247@mitisft.Convergent.COM> andrew@mitisft.Convergent.COM (Andrew Knutsen) writes:
>	While I don't have any suggestions for closing existing half-open
>connections (although I think someone posted something a while back), I
>do have a scenario which I have seen cause this, which can be traced to
>an ambiguity in the RFC...
...
>4) Client closes connection.
>   At this point, client has data buffered, and needs a window update.
>   FIN hasn't been sent since data is pending.
>
>5) Client is now in LAST_ACK. However, he ignores window updates, looking
>   only for ACK of FIN he hasn't sent! The connection is effectively
>   idle.
>
>	Now, the RFC says all data should be sent after a close (pgs 49 & 61),
>and that when a segment arrives in LAST_ACK state only the ACK of FIN should
>be checked for (pg 73).

The problem is really with the implementation, not the RFC. A TCP is not supposed to enter LAST_ACK until it has sent the FIN. From pg. 61, it should remain in CLOSE_WAIT state

	"... until all preceding SENDs have been segmentized; then
	send a FIN segment, enter [ LAST_ACK ] state".

The actual document said "enter CLOSING state", obviously a typo.

Having said all that, it may well be that the easiest way to handle this is to accept window updates while in LAST_ACK.
-- 
Brian Thomson, CSRI Univ. of Toronto
utcsri!uthub!thomson, thomson@hub.toronto.edu
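The pg. 61 rule quoted above can be sketched as a toy decision function. This is only an illustration of the rule as described in this message, not an excerpt from any implementation; the name state_after_close is invented.

```shell
#!/bin/sh
# Toy model of a CLOSE call processed while in CLOSE_WAIT.
# state_after_close UNSENT_BYTES
# Prints the state the TCP should be in once the close has been processed.
state_after_close() {
    unsent=$1
    if [ "$unsent" -gt 0 ]; then
        # Data still queued: no FIN yet, so stay in CLOSE_WAIT, where
        # incoming segments (including window updates) are still handled.
        echo CLOSE_WAIT
    else
        # Everything segmentized: the FIN goes out, then the state changes.
        echo LAST_ACK
    fi
}

state_after_close 512   # prints CLOSE_WAIT
state_after_close 0     # prints LAST_ACK
```

With this ordering the hung case never arises, because a TCP that is still waiting for window space has not yet left CLOSE_WAIT.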