[comp.sys.ti.explorer] flaky net connections

jwz@teak.berkeley.edu (Jamie Zawinski) (09/01/89)

Often when I try to connect to another machine (via copy file, or telnet, 
or whatever) the connection times out.  Much of the time, hitting resume 
or reinvoking the command does no good; I get the "not responding" error
again right away, without another attempt to connect being made at all.

I've found that (sometimes) calling (NAME:REFRESH-CACHE NIL) will disrupt
whatever caching is going on enough that it will actually try to connect
again, but that doesn't always work, and is especially unreliable if the
connection is to an IP host.  Calling (IP:RESET T) usually fixes things,
but that takes a long time, and has the nasty side effect of closing any
other open connections...

So does anyone know a better way of dealing with this situation?

		-- Jamie

snicoud@ATC.BOEING.COM (Stephen Nicoud) (09/01/89)

    Date: 31 Aug 89 17:59:05 GMT
    From: jwz%teak.Berkeley.EDU@ucbvax.berkeley.edu.ARPANET  (Jamie Zawinski)


    Often when I try to connect to another machine (via copy file, or telnet, 
    or whatever) the connection times out.  Much of the time, hitting resume 
    or reinvoking the command does no good; I get the "not responding" error
    again right away, without another attempt to connect being made at all.

    I've found that (sometimes) calling (NAME:REFRESH-CACHE NIL) will disrupt
    whatever caching is going on enough that it will actually try to connect
    again, but that doesn't always work, and is especially unreliable if the
    connection is to an IP host.  Calling (IP:RESET T) usually fixes things,
    but that takes a long time, and has the nasty side effect of closing any
    other open connections...

    So does anyone know a better way of dealing with this situation?

		    -- Jamie

Yes, I've run into this as well.

Here's what I think happens.  If anyone else can correct my mistakes,
please do. 

When a connection fails, a :connection-failures property is put on the
HOST object.  I think a TCP/IP connection object is stored somewhere in
that property.  It hangs around for about four minutes (I think), which
just happens to be the timeout for getting rid of the TCP/IP connection
object.  Apparently, retries within those four minutes try to reuse the
connection object and fail immediately.  After the four minutes, the
object is no longer "around" (whatever that means) and connect attempts
proceed.

Removing the :connection-failures property from the host allows the
system to proceed without waiting out the four minutes.  I use a
function which takes a host argument and does just that.

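(defun free-up (host)
  "Remove the :connection-failures property from HOST so the next
connect attempt is not short-circuited.  HOST is anything that
si:parse-host accepts."
  (remprop (si:parse-host host) ':connection-failures))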
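
To check whether a host is currently carrying the property before
clearing it, something like the following might help.  This is an
untested sketch: show-failures is a made-up name, and it assumes that
GET accepts the host object the same way REMPROP does in free-up.

```lisp
;; Hypothetical helper, not part of the system.
;; Assumption: GET works on a host object just as REMPROP does in FREE-UP.
(defun show-failures (host)
  ;; Returns whatever is stored under :connection-failures,
  ;; or NIL if the host has no such property at the moment.
  (get (si:parse-host host) ':connection-failures))
```

If (show-failures "some-host") returns non-NIL, then (free-up "some-host")
should clear it, where "some-host" stands in for the name of the machine
that is refusing to connect.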

It's not a fully detailed answer, but maybe it will help.