[comp.unix.wizards] Phantom CPU gobbler?!

belkin@teecs.UUCP (Hershel Belkin) (06/28/90)

I have what appears to me to be a strange situation which occasionally
occurs on my system.  If anyone can shed some light on what may be
happening, I'd appreciate some e-mail! ...

Every so often I find a shell process (sh or ksh) that has somehow
become disassociated from its login session.  By this I mean that
the shell's PPID is "1", and the user is no longer logged on.

How or why that happened is not really my concern now.  What does
puzzle me is that when this happens, the shell process eats huge
gobs of CPU time!  Running monitor shows it using all available
CPU (system) at all times, so that there is no idle CPU time on
the system!  As well, monitor shows a large count of "Involuntary
context switches".  I can find no evidence of any disc (or other) I/O
associated with the process.  Can anyone explain what the process
is doing???  (Killing it always helps :-)

-- 
+-----------------------------------------------+-------------------------+
| Hershel Belkin               hp9000/825(HP-UX)|      UUCP: teecs!belkin |
| Test Equipment Engineering Computing Services |     Phone: 416 246-2647 |
| Litton Systems Canada Limited       (Toronto) |       FAX: 416 246-5233 |
+-----------------------------------------------+-------------------------+

wsinpdb@lso.win.tue.nl (Paul de Bra) (07/02/90)

In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
>...
>Every so often I find a shell process (sh or ksh) that has somehow
>become disassociated from its login session.  By this I mean that
>the shell's PPID is "1", and the user is no longer logged on.
>...

This is the infamous trap/signal/EOF bug.  I don't know the details
exactly, but some combination of trapping and sending signals and
hitting end-of-file on standard input causes an infinite loop in the
Bourne and Korn shells, at least in some Unix versions that haven't
fixed the bug.

Anyone know the full scoop?

Paul.

celvin@EE.Surrey.Ac.UK (Chris Elvin) (07/03/90)

In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
>I have what appears to me to be a strange situation which occasionally
>occurs on my system.  If anyone can shed some light on what may be
>happening, I'd appreciate some e-mail! ...
>
>Every so often I find a shell process (sh or ksh) that has somehow
>become disassociated from its login session.  By this I mean that
>the shell's PPID is "1", and the user is no longer logged on.

My guess is that the user has written a shell script that he/she kicks
off as a background job.  The script loops, so it never terminates.

When the user logs off, the script is not killed, but its PPID changes
to 1 because its parent (the user's login shell) has died.  This is the
case for SunOS 4.0.3, and I suspect it's pretty general for all dialects
of *NIX.

I have seen such scripts written by users to check every 5 seconds
whether their 'friends' are logged in.  I actively discourage this sort
of script, and show the users how to kill the script automatically on
logout and to use a longer 'sleep time' in the loop.

		Hope this helps

		Chris

-- 
Chris Elvin
C.Elvin@EE.Surrey.Ac.UK        "what happens if I press this big red button"
Dept of Elec. Eng, University of Surrey, Guildford, Surrey, GU2 5XH. England

bob@wyse.wyse.com (Bob McGowen x4312 dept208) (07/03/90)

In article <1277@tuewsd.win.tue.nl> wsinpdb@lso.win.tue.nl (Paul de Bra) writes:
>In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
>>...
>>Every so often I find a shell process (sh or ksh) that has somehow
---
>>the shell's PPID is "1", and the user is no longer logged on.
>>...
>
>This is the infamous trap/signal/EOF bug.  I don't know the details
---
>Anyone know the full scoop?

This is an educated guess:

	trap "" 0

The signal "0" is a pseudo-signal: the shell runs its trap when the
shell exits, which for a login shell means the user has logged off.  I
use it to clear the screen when my users log off (trap clear 0).  It
could also be that someone is trying to get a particular function to
run as they log off and has an infinite loop somewhere.  I just tried:

	trap infin 0

where infin is:

	while :
	do
		:
	done

and got massive "usage" of free CPU time, so this may actually be the
problem.

Bob McGowan  (standard disclaimer, these are my own ...)
Product Support, Wyse Technology, San Jose, CA
..!uunet!wyse!bob
bob@wyse.com

rjk@sawmill.sawmill.uucp (Richard Kuhns) (07/10/90)

In article <1277@tuewsd.win.tue.nl> wsinpdb@lso.win.tue.nl (Paul de Bra) writes:
   Sender: wsinpdb@win.tue.nl (Paul de Bra)
   Organization: Eindhoven University of Technology, The Netherlands
   Lines: 15

   In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
   >...
   >Every so often I find a shell process (sh or ksh) that has somehow
   >become disassociated from its login session.  By this I mean that
   >the shell's PPID is "1", and the user is no longer logged on.
   >...

   This is the infamous trap/signal/EOF bug.  I don't know the details
   exactly, but some combination of trapping and sending signals and
   hitting end-of-file on standard input causes an infinite loop in the
   Bourne and Korn shells, at least in some Unix versions that haven't
   fixed the bug.

   Anyone know the full scoop?

   Paul.

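I don't know if this is the *full* scoop or not, but some versions of
the Korn shell and Bourne shell have problems if a program leaves the
terminal (stdin) in non-blocking mode when you've set `ignoreeof'.  In
that mode a read() on the terminal returns at once when no input is
waiting (on System V, with a count of 0, which looks just like
end-of-file), so a shell that ignores end-of-file simply reads again,
over and over.  The following little program will exercise this bug,
if that's what it is.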


Rich Kuhns
newton.physics.purdue.edu!sawmill!rjk
==============================cut here==============================
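/* killksh.c: leave stdin in non-blocking (O_NDELAY) mode, then exit */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    char	*progname;
    int		flags;

    (void)argc;
    progname = *argv;
    /* read the current file status flags of standard input */
    if ((flags = fcntl(0, F_GETFL, 0)) < 0) {
	fprintf(stderr, "%s:can't get flags (errno=%d)\n", progname, errno);
	exit(1);
    }
    /* add O_NDELAY; the flag lives in the open file shared with the
       parent shell, so it is still set after this program exits */
    if (fcntl(0, F_SETFL, flags|O_NDELAY) < 0) {
	fprintf(stderr, "%s:can't set flags (errno=%d)\n", progname, errno);
	exit(1);
    }
    fprintf(stderr, "shell should crash in 3 seconds\n");
    sleep(3);
    exit(0);
}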

leo@ehviea.ine.philips.nl (Leo de Wit) (07/18/90)

In article <1277@tuewsd.win.tue.nl> wsinpdb@lso.win.tue.nl (Paul de Bra) writes:
|In article <960004@teecs.UUCP> belkin@teecs.UUCP (Hershel Belkin) writes:
|>...
|>Every so often I find a shell process (sh or ksh) that has somehow
|>become disassociated from its login session.  By this I mean that
|>the shell's PPID is "1", and the user is no longer logged on.
|>...
|
|This is the infamous trap/signal/EOF bug.  I don't know the details
|exactly, but some combination of trapping and sending signals and
|hitting end-of-file on standard input causes an infinite loop in the
|Bourne and Korn shells, at least in some Unix versions that haven't
|fixed the bug.
|
|Anyone know the full scoop?
|
|Paul.

Not the full scoop I'm afraid, but every bit may help ...

Some years ago I had a comparable problem with a trap 0 command in the
Bourne shell (this was Ultrix 1.0 if I remember correctly).

The trap went something like:

trap "rm -f tmpfile;exit" 0

and the error message printed when the shell exited (I don't remember
whether the process really went away) was:

longjmp botch

(On this Pyramid the string also appears in the /bin/sh binary.)

From the context it seems as if signal handling was implemented
with setjmp/longjmp pairs, and that after the first longjmp (the one
caused by the end-of-file condition on stdin) the jmp_buf had become
invalid (perhaps marked as such by the shell).  This was detected when
the exit inside the trap string was performed, which probably triggered
the trap again.  All guesswork, I'm afraid.

I was not able to reproduce this error on this machine, but some
experimenting shows that inside a trap that same trap is disabled, much
the same way that a signal is blocked while it is being handled (BSD).
This would explain why I can't reproduce it here.

    Leo.