jaa@codas.att.com (James Anderson) (01/31/89)
This is a problem that has been bugging me for a while. Maybe someone else has run into it and solved it.

Affected: Informix 3.30 Perform screens and Informix SQL 2.10 Perform screens.

A user brings up the Perform screen and enters Add or Update mode. Something happens to the dial-in port they are on and they get disconnected from the system. The Informix process starts to run wild, grabbing as many usr and sys ticks as it can get (you can tell by reading sar). A who shows the user still logged in with no idle time. A stat on the tty usually shows 0 idle read time and a write idle time (sometimes no write idle).

The getty defs for the tty have HUPCL in them. Nothing in the profile scripts changes this.

Question: What is going on, and what can be done to prevent it?

The current solution is to look at ps for a perform or sperform that has a large time (>10.00) and kill them and their parents.

Thanks for any help.

James Anderson
jaa@codas.att.com
jdt@sfsup.UUCP (J Tais) (01/31/89)
In article <34891@codas.att.com>, jaa@codas.att.com (James Anderson) writes:
> A user brings up the perform screen, enters the Add or Update mode.
> Something happens to the dialin port they are on and they get disconnected
> from the system. The Informix process starts to run wild grabbing as much
> usr and sys tics it can get (can tell by reading sar).
> a Who shows the user still logged in with no idle time. A stat on the tty
> usually shows 0 idle read time and a write idle time (sometimes no write idle).

I remember a similar problem on my last project, but it happened when users were rlogin'ed over TCP/IP and running perform on the remote machine. We had a persistent problem with rogue perform processes grabbing all kinds of cpu time when users disconnected in abnormal ways or were terminated by the idle-line watcher. However, I think the problem went away when we upgraded to the next release of [Wollongong] TCP/IP. I don't really know the answer, but you're definitely not alone.

> The getty defs for the tty have HUPCL in them. There is no changing of this
> in any of the profile scripts.
>
> Question: What is going on, and what can be done to prevent it?
>
> The current solution is to look at ps for a perform or sperform that has a
> large time (>10.00) and kill them and their parents.

Yes, we set up a shell script to do just that. Actually, it identified pseudo-ttys with no logged-in user and killed their procs. It might be worth hacking up a C function to call using 'on beginning' to make sure signal handling for SIGHUP is set the way you want it. Since they seem to trap SIGINT, maybe there's some other signal handling going on. I dunno.

Perform is a strange beast. It seems to handle boundary conditions very poorly. I suspect there are fixed-size tables for user functions, lookups, etc.; when you exceed them, unpredictable things start to happen! I have seen perform screens stop functioning when new user-defined functions were added; I have also seen lookups fail to work when a screen had too many of them. Never took the time to diagnose exactly when these things stop working.

Also, never could get ESQL to work from within sperform. Had to run SQL calls in a child. Is this a feature or a bug?

John Tais
jdt@sfsup.att.com
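A minimal sketch in C of the SIGHUP check suggested above, assuming a POSIX system that provides sigaction(). The function names are hypothetical and nothing here comes from Informix; how such a function would be linked into [s]perform and called from 'on beginning' is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether a signal is currently ignored.
 * Returns 1 if ignored, 0 if not, -1 if the disposition cannot be read.
 */
int signal_is_ignored(int signo)
{
    struct sigaction cur;

    assert(signo > 0);                 /* a non-positive signal number is a caller bug */
    if (sigaction(signo, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_IGN;
}

/*
 * Hypothetical helper: put a signal back to its default action.
 * For SIGHUP the default action terminates the process on hangup.
 * Returns 0 on success, -1 on failure (as sigaction() does).
 */
int reset_signal_to_default(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```

Calling signal_is_ignored(SIGHUP) first would show whether something upstream has set the signal to be ignored; reset_signal_to_default(SIGHUP) then restores the default. Whether Perform later overrides the disposition again is unknown.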
mjm@attibr.UUCP (Mike Matthews) (02/04/89)
> Perform is a strange beast. Seems to handle boundary conditions very poorly.
> I suspect there are fixed size tables for user functions, lookups, etc; when
> you exceed them unpredictable things start to happen! I have seen perform
> screens stop functioning when a new user-defined functions were added; also
> seen lookup's fail to work when a screen had too many of them. Never took
> the time to diagnose exactly when these things stop working.

Perform will not allow lookups of more than 12 tables, which is described in the 2.10 manual.

Perform and Ace have definite limits that neither compiler (saceprep or formbuild) is intelligent enough to warn about. The aftereffects can be quite painful and often do not appear until later versions are installed. We had a Perform application where a programmer had ignored the above-mentioned limit, but the application ran anyway until the Informix version was upgraded. The perform screen started exhibiting the same behavior as described in the HUPCL problem mentioned, essentially bringing a 3B2 to its knees, much to the surprise of the system administrators doing the upgrade.

An ace report that results in a row greater than PAGE_SIZE - (32 + 4) will bomb with any one of many strange messages. Why can't saceprep tally the row size and produce a warning giving a clue to the subsequent run-time problem?

> Also, never could get ESQL to work from within sperform. Had to run SQL
> calls in a child. Is this a feature or a bug?

The perform language is explicitly defined in the manual. I don't think any extra-language constructs or references are supported. Don't you have any manuals???

Mike Matthews
ATT International
Tech Support
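The row-size limit quoted above, PAGE_SIZE - (32 + 4), is simple enough to check ahead of time. The sketch below, in C, shows the kind of tally a compiler could make. The function name and the macro are hypothetical, the actual value of PAGE_SIZE is not given in this thread, and what the 32 and the 4 account for is not stated either, so the page size is left as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* The "32 + 4" from the post above; its meaning is not given there. */
#define ACE_ROW_OVERHEAD ((size_t)(32 + 4))

/*
 * Hypothetical check: returns 1 if a row of row_bytes stays within
 * page_size - (32 + 4), and 0 if it is greater than that limit.
 */
int ace_row_fits(size_t row_bytes, size_t page_size)
{
    assert(page_size > 0);             /* a zero page size is a caller bug */

    /* Avoid unsigned wrap-around when the page is smaller than the overhead. */
    if (page_size <= ACE_ROW_OVERHEAD)
        return 0;
    return row_bytes <= page_size - ACE_ROW_OVERHEAD;
}
```

With an assumed page size of 2048 bytes (an illustrative figure only), the limit works out to 2048 - 36 = 2012 bytes, so a 2012-byte row passes and a 2013-byte row does not.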
prc@maxim.ERBE.SE (Robert Claeson) (02/05/89)
In article <4723@sfsup.UUCP>, jdt@sfsup.UUCP (J Tais) writes:
> In article <34891@codas.att.com>, jaa@codas.att.com (James Anderson) writes:
> > A user brings up the perform screen, enters the Add or Update mode.
> > Something happens to the dialin port they are on and they get disconnected
> > from the system. The Informix process starts to run wild grabbing as much
> > usr and sys tics it can get (can tell by reading sar).
> > a Who shows the user still logged in with no idle time. A stat on the tty
> > usually shows 0 idle read time and a write idle time (sometimes no write idle).

> I remember a similar problem on my last project, but it happened when users
> were rlogin'ed over TCP/IP and running perform on the remote machine. We
> had a persistent problem with rogue perform processes grabbing all kinds of
> cpu time when users disconnected in abnormal ways or were terminated by the
> idle-line watcher.

I've seen this behaviour in far too many software packages -- Informix and Oracle are just two of them.

What I think happens is that these packages ignore SIGHUP and rely on the return code from the write() and read() system calls to determine when a user has been disconnected. On many machines, the return code is 0 when the disconnect occurs on a dialup port (meaning "0 characters read/written") and -1 when a network connection is disconnected (meaning "error"; errno is set to some reasonable value).

So these packages examine the return code, see a 0 or a -1, and the program logic decides "heck, sumthin' went wrong, let's try it again". And off we go. Some packages interpret the 0 return code as a hangup indication, while they think that -1 is some kind of error and that the fix is to retry the operation until it succeeds.

Note that I don't say that this is what happens in all packages. I just happen to know that this is the way it happens in some packages. In fact, I haven't got the faintest idea what Oracle and Informix do. I've just seen it happen to both of them, but in Oracle's case only when the disconnect occurred on a TELNET/rlogin connection.

--
Robert Claeson, ERBE DATA AB, P.O. Box 77, S-175 22 Jarfalla, Sweden
"No problems." -- Alf
Tel: +46 758-202 50    EUnet: rclaeson@ERBE.SE    uucp: uunet!erbe.se!rclaeson
Fax: +46 758-197 20    Internet: rclaeson@ERBE.SE    BITNET: rclaeson@ERBE.SE
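For contrast with the retry-forever logic described above, here is a minimal sketch in C of a read wrapper that cannot spin on a dead line: it retries only when the call was interrupted by a signal (EINTR), treats a return of 0 as end-of-file or hangup, and treats any other -1 as fatal. It assumes a POSIX system and a blocking descriptor. The names are hypothetical, and this is not a description of what Informix or Oracle actually do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result codes for the sketch. */
enum read_status { READ_OK, READ_HANGUP, READ_ERROR };

/*
 * Read up to len bytes (len must be > 0) from fd into buf.
 * On READ_OK, *nread holds the byte count; on READ_HANGUP it is 0;
 * on READ_ERROR it is -1 and errno describes the failure.
 */
enum read_status read_once(int fd, char *buf, size_t len, ssize_t *nread)
{
    assert(buf != NULL && nread != NULL && len > 0);

    for (;;) {
        ssize_t n = read(fd, buf, len);

        if (n > 0) {                   /* got data */
            *nread = n;
            return READ_OK;
        }
        if (n == 0) {                  /* end-of-file: the other side is gone */
            *nread = 0;
            return READ_HANGUP;
        }
        if (errno == EINTR)            /* interrupted by a signal: safe to retry */
            continue;

        *nread = -1;                   /* any other error: give up, do not loop */
        return READ_ERROR;
    }
}
```

A caller would exit or clean up on READ_HANGUP and READ_ERROR alike. The point of the design is that neither outcome leads back into the loop.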
jdt@sfsup.UUCP (Happy Informix User) (02/08/89)
In article <130@attibr.UUCP>, mjm@attibr.UUCP (Mike Matthews) writes:
>
>I write:
> > Perform is a strange beast. Seems to handle boundary conditions very poorly.
> > I suspect there are fixed size tables for user functions, lookups, etc; when
> > you exceed them unpredictable things start to happen! I have seen perform
> > screens stop functioning when a new user-defined functions were added; also
> > seen lookup's fail to work when a screen had too many of them. Never took
> > the time to diagnose exactly when these things stop working.

> Perform will not allow lookups of more than 12 tables which is described
> in the 2.10 manual.

We were definitely looking up fewer than 12 tables. 12 fields? I dunno. Besides, it's not in the 3.3 manual. Not much is.

> Perform and Ace have definite limits that neither compiler ( saceprep or
> formbuild ) are intelligent to warn about. The after effects can be quite
> painful and often do not appear until later versions are installed.
> We had a Perform application where a programmer had ignored the above mentioned
> limit but the application ran any way until the Informix version was upgraded.
> The perform screen started exhibiting the same behavior as described in the
> HUPCL problem mentioned, essentially bringing a 3B2 to its knees, much to the
> surprise of the system administrators doing the upgrade.

I find the causal relationship here suspicious, but with Informix, who knows.

> An ace report that results in a row greater than PAGE_SIZE -
> ( 32 + 4 ) will bomb with any one of many strange messages. Why can't saceprep
> tally the row size and produce a warning giving a clue to the subsequent
> run-time problem?

I would assume that the form compilers and the form interpreters were written by separate groups to a given design spec... unfortunately, one or both of them took shortcuts and imposed undocumented limitations on the language.

> > Also, never could get ESQL to work from within sperform. Had to run SQL
> > calls in a child. Is this a feature or a bug?

> The perform language is explicitly defined in the manual. I don't think
> any extra-language constructs or references are supported. Don't you have
> any manuals???

Have you ever attempted to develop something with Informix, or are you just some kind of tech support parrot who keeps saying "RTFM"? Well, let me tell you, I have full Informix 3.3/SQL/4GL manuals, and even so, you tend to get beyond what they cover pretty easily.

As for the above, you obviously do not understand the question I am asking. I said 'sperform', not 'perform screens' or 'forms' or '.frm files' or whatever you care to call them. We wished to execute certain DB operations in our C functions called from the form; however, ESQL wouldn't work. You DO know that you can link your own C functions into [s]perform, right? Haven't YOU ever done this? Check your manual...

> Mike Matthews
> ATT International
> Tech Support

I don't need you telling me RTFM; I can always get that from Informix! :-)

John Tais
AT&T-BL Summit NJ
jdt@sfsup.att.com

P.S. I DO like Informix. Honest.