[comp.databases] Hang-up signals in Informix 4gl

bochner@speed.harvard.EDU (Harry Bochner) (04/12/90)

I've been having the following problem with INFORMIX-4gl RDS (2.10.03F
on a sun4):
Our users use rlogin from (Encore) annex boxes. From time to time a user is
forcibly logged out, for instance because of a failure in the annex box. The
host machine gets a hangup signal, which properly kills the login shell, but
it doesn't kill the user's 4gl processes: instead of dying, fglgo falls into
a tight loop (probably trying to report the error?), and eats up all the CPU
time it can grab. These runaway jobs keep running, degrading system
performance, until they are killed manually (or the system crashes).

I figure I can probably write an interrupt handler in C, or perhaps just set
the handler for the hangup signal to be the same as the one for interrupt, but
I've been trying to minimize the amount of C code I use, so I'm wondering if
anyone can suggest another solution.

Harry Bochner
bochner@endor.harvard.edu

prc@erbe.se (Robert Claeson) (04/12/90)

In article <2536@husc6.harvard.edu>, bochner@speed.harvard.EDU (Harry Bochner) writes:

<On Informix RDS>

> Our users use rlogin from (Encore) annex boxes. From time to time a user is
> forcibly logged out, for instance because of a failure in the annex
> box. The host machine gets a hangup signal, which properly kills the
> login shell, but it doesn't kill the user's 4gl processes: instead of
> dying, fglgo falls into a tight loop (probably trying to report the error?),
> and eats up all the CPU time it can grab.

I've seen this happen with Oracle as well. What I think happens is that
those guys have thought of the case where someone dials into a system and
then hangs up without exiting the application first. That makes the read()
system call return 0 (or was it -1?). However, when essentially the same
thing happens with a network connection, be it rlogin or telnet, read()
returns -1 (or was it 0?) and sets errno to some value.

When Informix or Oracle (and, I believe, a fair number of other packages)
see the -1 being returned, they think "heck, something went wrong,
let's try it again, and again, and again...". Mind you, many applications
run with signals such as SIGHUP and SIGINT disabled by default, and rely
on the return values from read() and write() instead.
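
To illustrate the retry problem described above, here is a minimal sketch of
a read wrapper that does not spin. It is not what Informix or Oracle actually
do; the function name read_once_or_give_up is hypothetical, and the policy
(retry only on EINTR, treat 0 and every other error as final) is an
assumption about what a well-behaved application would want.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read up to len bytes from fd.
 * Returns >0 for data, 0 for end-of-file (for example after a hangup),
 * and -1 for an error that should not be retried (errno is left set).
 */
ssize_t read_once_or_give_up(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);

        if (n >= 0)
            return n;       /* data, or 0 meaning end-of-file */
        if (errno == EINTR)
            continue;       /* interrupted by a signal: one more try is safe */
        return -1;          /* EIO, EBADF, ...: retrying would just spin */
    }
}
```

The caller is expected to stop reading and exit when this returns 0 or -1,
rather than calling it again.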

-- 
          Robert Claeson      E-mail: rclaeson@erbe.se
	  ERBE DATA AB

sullivan@aqdata.uucp (Michael T. Sullivan) (04/13/90)

:From article <2536@husc6.harvard.edu>, by bochner@speed.harvard.EDU (Harry Bochner):
>
> box. The host machine gets a hangup signal, which properly kills the login
> shell, but it doesn't kill the user's 4gl processes: instead of dying, fglgo
> falls into a tight loop (probably trying to report the error?), and eats up
> all the CPU time it can

Gee, sounds like a bug that's been around for years and has a customer upset...
-- 
Michael Sullivan          uunet!jarthur!aqdata!sullivan
aQdata, Inc.              sullivan@aqdata.uucp
San Dimas, CA             +1 714 599 9992

rja@unify.uucp (Rick Anderson) (04/13/90)

In article <2536@husc6.harvard.edu> bochner@harvard.EDU (Harry Bochner) writes:
>I've been having the following problem with INFORMIX-4gl RDS (2.10.03F
>on a sun4):
>Our users use rlogin from (Encore) annex boxes. From time to time a user is
>forcibly logged out, for instance because of a failure in the annex box. The
>host machine gets a hangup signal, which properly kills the login shell, but
>it doesn't kill the user's 4gl processes: instead of dying, fglgo falls into
>a tight loop (probably trying to report the error?), and eats up all the CPU
>time it can grab. These runaway jobs keep running, and degrading system
>performance, until they are killed manually (or the system crashes).
>
>[deleted...]

Some of our customers have experienced similar events on other platforms,
almost all of them running through annex boxes or other multiplexing-type
exchanges.  The problem has been identified (in our case, anyway) as the
UNIX 'read(2)' function being interrupted by SIGHUP but returning a value
of '0' instead of a value of '-1' with 'errno=EINTR'.
On some platforms, we have been told that we should check 'errno' whenever
a value of '0' or '-1' is returned from 'read(2)'.
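
As a rough sketch of that advice (not Unify's actual code), the check might
look like the following. The enum and the function name checked_read are
hypothetical. The one detail worth noting is that errno has to be cleared
before the call, because a successful read() does not reset it, so a stale
EINTR from an earlier call could otherwise be misread.

```c
#include <errno.h>
#include <unistd.h>

enum read_status { READ_DATA, READ_EOF, READ_INTERRUPTED, READ_ERROR };

/*
 * Hypothetical helper: one read() plus a classification of the outcome.
 * *nread receives the byte count when data was read, otherwise 0.
 */
enum read_status checked_read(int fd, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n;
    int err;

    errno = 0;              /* read() does not clear errno on success */
    n = read(fd, buf, len);
    err = errno;

    *nread = n > 0 ? n : 0;
    if (n > 0)
        return READ_DATA;
    if (err == EINTR)
        return READ_INTERRUPTED;    /* covers both 0 and -1 with EINTR */
    if (n == 0)
        return READ_EOF;
    return READ_ERROR;
}
```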

-- 
Richard J. Anderson		                       internet: rja@unify.UUCP
Unify Corporation		                 ...!{csusac,pyramid}!unify!rja
3870 Rosin Court                                          voice: (916) 920-9092
Sacramento, CA 95834                                        fax: (916) 921-5340