bochner@speed.harvard.EDU (Harry Bochner) (04/12/90)
I've been having the following problem with INFORMIX-4gl RDS (2.10.03F on a sun4):

Our users use rlogin from (Encore) annex boxes. From time to time a user is forcibly logged out, for instance because of a failure in the annex box. The host machine gets a hangup signal, which properly kills the login shell, but it doesn't kill the user's 4gl processes: instead of dying, fglgo falls into a tight loop (probably trying to report the error?) and eats up all the CPU time it can grab. These runaway jobs keep running and degrading system performance until they are killed manually (or the system crashes).

I figure I can probably write an interrupt handler in C, or perhaps just set the handler for the hangup signal to be the same as the one for interrupt, but I've been trying to minimize the amount of C code I use, so I'm wondering if anyone can suggest another solution.

Harry Bochner
bochner@endor.harvard.edu
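The second option mentioned above (giving SIGHUP the same handler as SIGINT) might look roughly like this in C. It is only a sketch: `on_hangup` and `install_hangup_handler` are made-up names, the handler does nothing but exit, and how such a function would be hooked into the 4GL runner is not something this thread describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Hypothetical handler: terminate at once when the line goes away.
   _exit() is used because it is async-signal-safe; exit() is not. */
static void on_hangup(int signo)
{
    (void)signo;
    _exit(1);
}

/* Install the same handler for SIGHUP and SIGINT.
   Returns 0 on success, -1 if either sigaction() call fails. */
int install_hangup_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_hangup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGHUP, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    return 0;
}
```

Calling `install_hangup_handler()` once at startup would be enough. If the application needs to release locks or roll back work before exiting, the handler would instead have to set a flag that the main loop checks, which this sketch does not attempt.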
prc@erbe.se (Robert Claeson) (04/12/90)
In article <2536@husc6.harvard.edu>, bochner@speed.harvard.EDU (Harry Bochner) writes:

<On Informix RDS>

> Our users use rlogin from (Encore) annex boxes. From time to time a user is
> forcibly logged out, for instance because of a failure in the annex
> box. The host machine gets a hangup signal, which properly kills the
> login shell, but it doesn't kill the user's 4gl processes: instead of
> dying, fglgo falls into a tight loop (probably trying to report the error?),
> and eats up all the CPU time it can grab.

I've seen this happen with Oracle as well. What I think happens is that those guys have thought of the case where someone dials into a system and then hangs up without exiting the application first. That makes the read() system call return 0 (or was it -1?). However, when essentially the same thing happens with a network connection, be it rlogin or telnet, read() returns -1 (or was it 0?) and sets errno to some value. When Informix or Oracle (and I believe a fair number of other packages) see the -1 being returned, they think "heck, something went wrong, let's try it again, and again, and again...".

Mind you, many applications run with signals such as SIGHUP and SIGINT disabled by default, and rely on the return values from read() and write() instead.

--
Robert Claeson E-mail: rclaeson@erbe.se
ERBE DATA AB
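To make the "try it again, and again" failure mode concrete, here is a rough sketch of a read wrapper that retries only when the call was genuinely interrupted and gives up on everything else. The name `read_or_give_up` is invented for illustration; nothing here is taken from Informix or Oracle code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from fd.
   Returns >0 for data, 0 for end-of-file (e.g. the line was hung up),
   and -1 for a hard error with errno left as read() set it.
   Only EINTR is retried, so a dead connection cannot cause a spin. */
ssize_t read_or_give_up(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    for (;;) {
        ssize_t n = read(fd, buf, len);

        if (n >= 0)
            return n;       /* data, or 0 = EOF: caller should shut down */
        if (errno != EINTR)
            return -1;      /* anything but an interrupted call is fatal */
        /* EINTR: a signal arrived before any data was read; try again */
    }
}
```

A caller would treat both 0 and -1 as "the terminal is gone" and exit, rather than looping back to read again.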
sullivan@aqdata.uucp (Michael T. Sullivan) (04/13/90)
From article <2536@husc6.harvard.edu>, by bochner@speed.harvard.EDU (Harry Bochner):

> box. The host machine gets a hangup signal, which properly kills the login
> shell, but it doesn't kill the user's 4gl processes: instead of dying, fglgo
> falls into a tight loop (probably trying to report the error?), and eats up
> all the CPU time it can

Gee, sounds like a bug that's been around for years and has a customer upset...

--
Michael Sullivan uunet!jarthur!aqdata!sullivan
aQdata, Inc. sullivan@aqdata.uucp
San Dimas, CA +1 714 599 9992
rja@unify.uucp (Rick Anderson) (04/13/90)
In article <2536@husc6.harvard.edu> bochner@harvard.EDU (Harry Bochner) writes:

>I've been having the following problem with INFORMIX-4gl RDS (2.10.03F
>on a sun4):
>Our users use rlogin from (Encore) annex boxes. From time to time a user is
>forcibly logged out, for instance because of a failure in the annex
>box. The host machine gets a hangup signal, which properly kills the login
>shell, but it doesn't kill the user's 4gl processes: instead of dying, fglgo
>falls into a tight loop (probably trying to report the error?), and eats up
>all the CPU time it can grab. These runaway jobs keep running, and degrading
>system performance, until they are killed manually (or the system crashes).
>
>[deleted...]

Some of our customers have experienced similar events on other platforms, almost all of them running through annex boxes or other multiplexing-type exchanges. The problem has been identified (in our case, anyway) as the UNIX read(2) call being interrupted (by SIGHUP) but returning 0 instead of returning -1 with errno set to EINTR. On some platforms, we have been told that we should check errno whenever read(2) returns either 0 or -1.

--
Richard J. Anderson internet: rja@unify.UUCP
Unify Corporation ...!{csusac,pyramid}!unify!rja
3870 Rosin Court voice: (916) 920-9092
Sacramento, CA 95834 fax: (916) 921-5340
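The advice to check errno on a return of either 0 or -1 could be sketched as below. The enum and the `checked_read` helper are hypothetical names. Note the assumption baked in: errno is cleared before the call because read() does not reset it on success, and whether a platform actually leaves EINTR in errno alongside a 0 return is the platform-specific behaviour reported above, not something POSIX promises.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum read_status { READ_DATA, READ_EOF, READ_INTERRUPTED, READ_ERROR };

/* Hypothetical helper: perform one read() and classify the outcome,
   looking at errno for both the 0 and the -1 return values.
   The raw return value of read() is stored in *nread. */
enum read_status checked_read(int fd, void *buf, size_t len, ssize_t *nread)
{
    assert(buf != NULL && nread != NULL);

    errno = 0;                      /* so a stale value is not misread */
    *nread = read(fd, buf, len);

    if (*nread > 0)
        return READ_DATA;
    if (errno == EINTR)
        return READ_INTERRUPTED;    /* interrupted, whether 0 or -1 came back */
    if (*nread == 0 && errno == 0)
        return READ_EOF;            /* clean end-of-file */
    return READ_ERROR;              /* -1, or 0 with an unexpected errno */
}
```

An application would typically exit on `READ_EOF` and `READ_ERROR`, and on `READ_INTERRUPTED` check whether a hangup was the cause before deciding to retry.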