[comp.protocols.iso.dev-environ] Connections that are never released

PWW@BNR.CA (Peter Whittaker, P.W.) (11/13/90)

Hello, I'm posting this note to both the QUIPU and ISODE discussion
groups in the hope of finding someone who has sufficient familiarity
with the lower levels of ISODE to tell us why our DSA (Layer 7 application,
directory server) and its network connection behave the way they do....

The problem is that because of certain memory leaks in our DSA
(not sure if these are due to QUIPU, or to our own 'enhancements',
but we're workin' on it :->) we kill and restart our DSA every morning.

Invariably a DUA (directory client) is still connected to the DSA when we
kill it (we've been using the QUIPU DUAs, and building our own - neat
graphical beasts, etc...), so the TCP port the DSA was using is never released.

A quick netstat -n reveals that the address the DSA would like to use is
in FIN_WAIT_2, while the DUA is in CLOSE_WAIT.  The connection is not
released until the DUA unbinds or is killed, so the DSA cannot
come up, as its desired port is in use.

The Layer four routines for servers set SO_KEEPALIVE, but that shouldn't
enable the socket to stay up even though its calling process is dead,
should it?
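
For reference, the FIN_WAIT_2 entry is kernel state that outlives the dead
process; SO_KEEPALIVE is not what holds it there.  A common workaround for
the "port in use" symptom is to set SO_REUSEADDR on the listening socket
before bind().  The Python sketch below illustrates that idea only: it is
not ISODE's transport code, whether ISODE already sets this option is not
established in this thread, and the name listen_reusable is hypothetical.

```python
import socket

def listen_reusable(host, port, backlog=5):
    """Open a listening TCP socket that may rebind a recently used port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind().  On common BSD-derived and Linux stacks it
    # lets bind() succeed while old connections on this port still linger
    # in FIN_WAIT_2 or TIME_WAIT.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s
```

A restarted server would call listen_reusable() with its usual port.  The
stale connection still exists on the client side until the client closes it.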

Anyone have any ideas, clues (better yet: fixes!! :->), or pointers
to places to look?

Thanks,

Peter Whittaker      [~~~~~~~~~~~~~~~~~~~~~~~~~~]   Open Systems Integration
pww@bnr.ca           [                          ]   Bell Northern Research
Ph: +1 613 765 2064  [                          ]   P.O. Box 3511, Station C
FAX:+1 613 763 3283  [__________________________]   Ottawa, Ontario, K1Y 4H7

jdr@RAINIER.UDEV.CDC.COM (11/14/90)

Peter and John,

 I have also been writing DUAs etc.  The only way I have found to release
 half-open connections to the DSA is to have the DUAs xselect their bind
 connections to the DSA.  That way the DUAs are awakened by the collapse
 of the bind connections when the DSA fails.  The DUAs can then clean up
 the connections.  I don't believe DISH and friends do so.
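
 The idea above can be sketched roughly as follows.  This is an
 illustration in Python, using the standard select.select as a stand-in
 and assuming xselect behaves like select(2); the function names
 peer_has_closed and release_if_collapsed are hypothetical and are not
 part of ISODE or QUIPU.

```python
import select
import socket

def peer_has_closed(sock, timeout=0.0):
    """Return True if the peer has closed (or reset) its end of sock."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False            # nothing to read; connection looks alive
    try:
        # MSG_PEEK leaves any real pending data in the receive buffer.
        data = sock.recv(1, socket.MSG_PEEK)
    except ConnectionError:
        return True             # a reset also counts as a collapse
    return data == b""          # zero-length read means the peer sent FIN

def release_if_collapsed(sock):
    """Close our end if the peer is gone, so we do not sit in CLOSE_WAIT."""
    if peer_has_closed(sock):
        sock.close()
        return True
    return False
```

 A client would include its bind connection in whatever select loop it
 already runs and call release_if_collapsed() when that socket becomes
 readable.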


 Thanks, Jim Reed
	 Control Data Corporation

 PS:  I'm running ISODE-6.6 and have fixed some of the QUIPU leakage in
      that release.  I can forward my humble corrections to you if you're
      interested.
  
      Colin Robbins says he has posted, or will soon post, corrections to QUIPU.
      Also, a new version of QUIPU is due soon with such corrections and
      "updates" to the QUIPU encoders/decoders.  Perhaps we should wait
      for the new versions.  I'm worried though that problems with PEPSY
      (the new ASN.1 compiler) could delay the next version.

tebbutt@rhino.ncsl.nist.gov (John Tebbutt) (11/14/90)

Peter,

We have been experiencing similar problems with ISODE connections. If a process
goes down, we have to kill all other processes that were connected to it before
we can restart the process. Even then, it can take several attempts before the 
"congestion at TSAP" clears and the process will run.

I have done no research into this at present, deciding instead to put up with it
as an inconvenience for the time being, but like you I would be very interested
to hear from anybody who has an explanation. Is this a non-issue ? Should we be
expecting such behavior ? Does anybody else regard this as a problem ?

	In eager anticipation,

		JT

c.robbins@xtel.co.uk (Colin Robbins) (11/14/90)

   > I also have been writing DUAs etc.  The only way I have found to release 
   > half open connections to the DSA is to have the DUAs xselect their bind
   > connections to the DSA.  That way the DUAs are awakened by the collapse
   > of the bind connections when the DSA fails.  The DUAs can then cleanup
   > the connections.  I don't believe DISH and friends do so.

DISH does not do this.

We investigated this over a year ago, when we first noticed it, and
could find no satisfactory solution.
It would appear that if a client gets into the FIN_WAIT_2 state, and
the server goes away, there is no way that the client can exit cleanly.
(Of course 'kill -9' does the trick!).
I should point out I am not a TCP/IP expert, so this may be totally wrong !


   > Thanks, Jim Reed
   >	 Control Data Corporation

   > PS:  I'm running ISODE-6.6 and have fixed some of the QUIPU leakage in
   >      that release.  I can forward my humble corrections to you if you're
   >      interested.

We believe that we have fixed all the memory leak problems in Quipu in
the 6.7 release we are using internally.
If you could mail quipu-support details of any changes you have made,
we will take a look to make sure.

I should point out that there is not a significant memory problem in
the 6.0/6.1 versions of Quipu.  The problems were introduced in code
being developed for the 7.0 version of Quipu.

   >      Colin Robbins says he has or will soon post corrections QUIPU.

Details of the 6.1 upgrade, and two other patches have been mailed to
the quipu list since the 6.0 release.
I will resend details to the Quipu list in the next message.

   >      Also, a new version of QUIPU is due soon with such corrections and
   >      "updates" to the QUIPU encoders/decoders.  Perhaps we should wait
   >      for the new versions.  I'm worried though that problems with PEPSY
   >      (the new ASN.1 compiler) could delay the next version.

Time will tell !


Colin

PWW@BNR.CA (Peter Whittaker, P.W.) (11/15/90)

John,

According to the post from Colin, there is no easy solution (:-<).

It is a big problem though:  we are going to release an API so that users
can write their own DUAs (it'll be a skinnier version of the QUIPU API).
Unless we kludge in an automatic timeout, we have no guarantee that the DSA
will be without clients if and when it is taken down for servicing, and that
means that we might have problems bringing it back up.

The automatic timeout is kludgeable, but I'd rather stay away from that:
some applications are bound to be dependent on fast response time once
connected - having our users handicap their applications is unacceptable.

I haven't investigated the xselect option that Jim mentioned in his post,
but if that can be incorporated into our API, it might be an acceptable
alternative:  having the API return a descriptive X.500 error message to
the DUA application would allow the DUA to handle things in a manner
appropriate to its own needs, i.e. shutting down, timing out for X amount
of time, etc....  In any case the connection would have been cleared at both
ends.
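
A rough sketch of what such an API hook might look like, written in Python
for brevity.  Everything here is hypothetical: the names BoundConnection
and DirectoryUnavailable are invented for illustration (the error is
modeled loosely on X.500's "unavailable" service problem), and none of it
is the QUIPU API or our own.

```python
import select
import socket

class DirectoryUnavailable(Exception):
    """Hypothetical API error: the DSA end of the bind connection is gone."""

class BoundConnection:
    """Hypothetical per-bind wrapper an API layer might keep."""

    def __init__(self, sock):
        self.sock = sock

    def check(self):
        """Close our end and raise DirectoryUnavailable if the DSA has gone."""
        if self.sock is None:
            raise DirectoryUnavailable("not bound")
        readable, _, _ = select.select([self.sock], [], [], 0)
        if not readable:
            return                      # nothing pending; assume still bound
        try:
            gone = self.sock.recv(1, socket.MSG_PEEK) == b""
        except ConnectionError:
            gone = True
        if gone:
            # Release our end first, so the connection is cleared here too,
            # then let the application decide what to do next.
            self.sock.close()
            self.sock = None
            raise DirectoryUnavailable("DSA closed the bind connection")
```

The DUA application would catch DirectoryUnavailable around its directory
operations and then shut down, wait and rebind, or do whatever suits it.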

Peter Whittaker      [~~~~~~~~~~~~~~~~~~~~~~~~~~]   Open Systems Integration
pww@bnr.ca           [                          ]   Bell Northern Research
Ph: +1 613 765 2064  [                          ]   P.O. Box 3511, Station C
FAX:+1 613 763 3283  [__________________________]   Ottawa, Ontario, K1Y 4H7

>
> Peter,
>
> We have been experiencing similar problems with ISODE connections. If a process
> goes down, we have to kill all other processes that were connected to it before
> we can restart the process. Even then, it can take several attempts before the
> "congestion at TSAP" clears and the process will run.
>
> I have done no research into this at present, deciding instead to put up with it
> as an inconvenience for the time being, but like you I would be very interested
> to hear from anybody who has an explanation. Is this a non-issue ? Should we be
> expecting such behavior ? Does anybody else regard this as a problem ?
>
>         In eager anticipation,
>
>                 JT
>

J.Crowcroft@CS.UCL.AC.UK (Jon Crowcroft) (11/16/90)

 >shows some FIN_WAIT_2 connections hanging, but they don't disrupt
 >anything.

 >>The Layer four routines for servers set SO_KEEPALIVE, but that shouldn't

 >   I hear this should be avoided..?

yes, me too - why is TP? setting this socket option?

 jon