belkin@teecs.UUCP (Hershel Belkin) (06/28/90)
I have what appears to me to be a strange situation which occasionally occurs on my system. If anyone can shed some light on what may be happening, I'd appreciate some e-mail! ...

Every so often I find a shell process (sh or ksh) which has somehow become dis-associated with its logon session. By this I mean that the shell's PPID is "1", and the user is no longer logged on. How or why that happened is not really my concern now.

What does puzzle me is that when this happens, the shell process eats huge gobs of CPU time! Running monitor shows it using all available CPU (system) at all times, so that there is no idle CPU time on the system! As well, monitor shows a large count of "Involuntary context switches". I can find no evidence of any disc (or other) I/O associated with the process.

Can anyone explain what the process is doing??? (Killing it always helps :-)

--
+-----------------------------------------------+-------------------------+
| Hershel Belkin               hp9000/825(HP-UX)| UUCP: teecs!belkin      |
| Test Equipment Engineering Computing Services | Phone: 416 246-2647     |
| Litton Systems Canada Limited (Toronto)       | FAX: 416 246-5233       |
+-----------------------------------------------+-------------------------+
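A minimal sketch for spotting the kind of process described above: a sh or ksh whose parent is PID 1. It assumes a ps that accepts the POSIX -e and -o options, which the HP-UX ps of the time may not have offered. The function name orphan_shells is hypothetical, not something from the original post.

```shell
#!/bin/sh
# Read "PID PPID COMMAND" lines on stdin and print the PID and command of
# every sh or ksh whose parent is PID 1.
# A leading "-" (login shell) or a directory path in COMMAND is ignored
# when matching the name.
orphan_shells() {
    awk '{
        n = $3
        sub(/.*\//, "", n)   # drop any leading directory
        sub(/^-/, "", n)     # drop the login-shell dash
        if ($2 == 1 && (n == "sh" || n == "ksh")) print $1, $3
    }'
}

# Typical use (assumes a POSIX-style ps):
#   ps -e -o pid= -o ppid= -o comm= | orphan_shells
```

Note that a PPID of 1 alone does not prove a process is stuck; the CPU time it is accumulating still has to be checked separately.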
wsinpdb@lso.win.tue.nl (Paul de Bra) (07/02/90)
In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
>...
>Every so often I find a shell process (sh or ksh) which has somehow
>become dis-associated with its logon session. By this I mean that
>the shell's PPID is "1", and the user is no longer logged on.
>...

This is the infamous trap/signal/eof bug. I don't know the exact details, but some combination of trapping and sending signals and having end-of-file on standard input causes an infinite loop in the Bourne and Korn shells, at least in some Unix versions which haven't fixed the bug.

Anyone know the full scoop?

Paul.
celvin@EE.Surrey.Ac.UK (Chris Elvin) (07/03/90)
In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
>I have what appears to me to be a strange situation which occasionally
>occurs on my system. If anyone can shed some light on what may be
>happening, I'd appreciate some e-mail! ...
>
>Every so often I find a shell process (sh or ksh) which has somehow
>become dis-associated with its logon session. By this I mean that
>the shell's PPID is "1", and the user is no longer logged on.

My guess is that the user has written a shell script which he/she kicks off as a background job. The script is a loop, so it never terminates. When the user logs off, the script is not killed, but its PPID is changed to 1 because its parent (the user's login shell) dies. This is the case for SunOS 4.0.3, and I suspect that it's pretty general for all dialects of *NIX.

I have experienced such scripts from users checking every 5 seconds whether their 'friends' are logged in. I positively discourage this sort of script, and show the users how to kill the script automatically on logout and to leave a longer 'sleep time' in the loop.

Hope this helps

Chris
--
Chris Elvin    C.Elvin@EE.Surrey.Ac.UK
"what happens if I press this big red button"
Dept of Elec. Eng, University of Surrey, Guildford, Surrey, GU2 5XH. England
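The two remedies mentioned above (stop the script at logout, and sleep longer between checks) might look roughly like this. It is a sketch in modern POSIX sh, not the script Chris gave his users. The names parent_alive and watch_friend and the 60-second default are assumptions made for illustration.

```shell
#!/bin/sh
# Succeed while process $1 still exists.
# "kill -0" delivers no signal; it only checks that the PID can be signalled.
parent_alive() {
    kill -0 "$1" 2>/dev/null
}

# Report whether user $2 is logged in, once every $3 seconds (default 60),
# and stop as soon as the launching shell (PID $1) has gone away.
watch_friend() {
    parent=$1
    friend=$2
    interval=${3:-60}
    while parent_alive "$parent"; do
        who | grep "^$friend " >/dev/null && echo "$friend is logged in"
        sleep "$interval"
    done
}

# Possible use from a login shell ("alice" is a made-up user name):
#   watch_friend $$ alice 60 &
#   watcher=$!
#   trap 'kill "$watcher" 2>/dev/null' 0   # also kill the watcher at logout
```

Passing $$ explicitly matters because a function run in the background still sees the launching shell's PID in $$, so the loop can tell when that shell has exited instead of carrying on with PPID 1.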
bob@wyse.wyse.com (Bob McGowen x4312 dept208) (07/03/90)
In article <1277@tuewsd.win.tue.nl> wsinpdb@lso.win.tue.nl (Paul de Bra) writes:
>In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
>>...
>>Every so often I find a shell process (sh or ksh) which has somehow
---
>>the shell's PPID is "1", and the user is no longer logged on.
>>...
>
>This is the infamous trap/signal/eof bug, which I don't know exactly,
---
>Anyone know the full scoop?

This is an educated guess:

    trap "" 0

The signal "0" is a pseudo signal the shell uses to run a trap when it exits, which for a login shell means the user has logged off. I use it to generate a clear screen function when my users log off (trap clear 0).

It could also be that someone is trying to get a particular function to run as they log off and has an infinite loop somewhere. I just tried:

    trap infin 0

where infin is:

    while :
    do
        :
    done

and got massive "usage" of free CPU time, so this may actually be the problem.

Bob McGowan (standard disclaimer, these are my own ...)
Product Support, Wyse Technology, San Jose, CA
..!uunet!wyse!bob
bob@wyse.com
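The trap-on-0 behaviour described above can be tried safely in a child shell. This is a sketch using modern POSIX sh features (functions and $((...)) arithmetic, which the 1990 Bourne shell lacked). The function names are hypothetical, and the loop is deliberately bounded so that it ends, unlike the infin script.

```shell
#!/bin/sh
# Run a child shell that sets a trap on 0; the trap action runs only
# when that child shell exits, after its normal commands.
exit_trap_demo() {
    sh -c 'trap "echo trap-ran" 0; echo body-done'
}

# Same idea, but the trap action is a loop. It is bounded to $1 iterations
# here; with "while :" in its place the exiting shell would spin forever.
looping_trap_demo() {
    sh -c '
        spin() {
            i=0
            while [ "$i" -lt "$1" ]; do i=$((i + 1)); done
            echo "looped $i times"
        }
        trap "spin $1" 0
    ' sh "$1"
}

# Example:
#   exit_trap_demo
#   looping_trap_demo 3
```

The point of the second function is that the shell cannot finish exiting until the trap action returns, so an unbounded loop there leaves a CPU-bound shell behind after its user has gone.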
rjk@sawmill.sawmill.uucp (Richard Kuhns) (07/10/90)
In article <1277@tuewsd.win.tue.nl> wsinpdb@lso.win.tue.nl (Paul de Bra) writes:

   Organization: Eindhoven University of Technology, The Netherlands

   In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
   >...
   >Every so often I find a shell process (sh or ksh) which has somehow
   >become dis-associated with its logon session. By this I mean that
   >the shell's PPID is "1", and the user is no longer logged on.
   >...

   This is the infamous trap/signal/eof bug, which I don't know exactly,
   but some combination of trapping and sending signals and having
   end-of-file on standard input causes an infinite loop in the Bourne
   and Korn shell, at least in some Unix versions which haven't fixed the bug.

   Anyone know the full scoop?

   Paul.

I don't know if this is the *full* scoop or not, but some versions of the Korn shell and Bourne shell have problems if a program leaves the terminal (stdin) in non-blocking mode when you've set `ignoreeof'. The following little program will exercise this bug, if that's what it is.

Rich Kuhns
newton.physics.purdue.edu!sawmill!rjk

==============================cut here==============================
/* killksh.c */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    const char *progname = argv[0];
    int flags;

    (void)argc;

    /* Fetch the current file status flags for stdin (fd 0). */
    if ((flags = fcntl(0, F_GETFL, 0)) < 0) {
        fprintf(stderr, "%s: can't get flags (errno=%d)\n", progname, errno);
        exit(1);
    }
    /*
     * Turn on non-blocking reads and deliberately leave them on at exit.
     * On System V-style systems a read with O_NDELAY set and no input
     * pending returns 0, which looks the same as end-of-file.
     */
    if (fcntl(0, F_SETFL, flags | O_NDELAY) < 0) {
        fprintf(stderr, "%s: can't set flags (errno=%d)\n", progname, errno);
        exit(1);
    }
    fprintf(stderr, "shell should crash in 3 seconds\n");
    sleep(3);
    exit(0);
}
leo@ehviea.ine.philips.nl (Leo de Wit) (07/18/90)
In article <1277@tuewsd.win.tue.nl> wsinpdb@lso.win.tue.nl (Paul de Bra) writes:
|In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
|>...
|>Every so often I find a shell process (sh or ksh) which has somehow
|>become dis-associated with its logon session. By this I mean that
|>the shell's PPID is "1", and the user is no longer logged on.
|>...
|
|This is the infamous trap/signal/eof bug, which I don't know exactly,
|but some combination of trapping and sending signals and having
|end-of-file on standard input causes an infinite loop in the Bourne
|and Korn shell, at least in some Unix versions which haven't fixed the bug.
|
|Anyone know the full scoop?
|
|Paul.

Not the full scoop I'm afraid, but every bit may help ...

Some years ago I had a comparable problem with a trap 0 command in the Bourne shell (this was Ultrix 1.0 if I remember correctly). The trap went something like:

    trap "rm -f tmpfile;exit" 0

and the error message when the shell exited (I don't remember whether it really went away) was:

    longjmp botch

(on this Pyramid this string is also in the binary /bin/sh).

From the context it seems as if the signal handling was implemented with setjmp/longjmp pairs, and that after the first longjmp (the one caused by the end-of-file-on-stdin condition) the jmpbuf had become invalid (marked as such by the shell?). This was detected when the exit inside the trap string was performed (which probably triggered the trap again). All guesswork, I'm afraid.

I was not able to reproduce this error on this machine, but some experimenting shows that inside a trap that same trap is disabled, much the same way that a signal is blocked while being handled (BSD). This would explain why I can't reproduce it here.

Leo.
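The observation that a trap is disabled while its own action runs can be checked with a small experiment along these lines. It is a sketch for a modern POSIX sh. The function name count_exit_trap_runs is hypothetical, and the trap action echoes a marker instead of removing a temporary file.

```shell
#!/bin/sh
# Run a child shell whose trap on 0 calls exit itself, as in
#   trap "rm -f tmpfile;exit" 0
# and print how many times the trap action actually ran.
count_exit_trap_runs() {
    sh -c 'trap "echo in-trap; exit" 0; echo body' | grep -c '^in-trap$'
}

# Example:
#   count_exit_trap_runs
```

A count of 1 means the exit inside the trap did not fire the trap a second time, which matches the behaviour Leo found by experiment. A shell without that protection would be expected to print a larger number or never return.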