[comp.unix.ultrix] killpg

gordon@prls.UUCP (Gordon Vickers) (10/26/87)

     I am currently running 1.2, though I've had this problem since 1.0.
     I have a very simple interface to the killpg(2) call, but it fails
   if the parent process was started from rc.local.

    Specifically, my application program xxx must run 24 hrs/day, so
   I have an appropriate entry in my /etc/rc.local to ensure it gets
   started.

     xxx is owned by root and has the setuid bit on.  Once started, it
   spawns some number of child processes and then sleeps until all
   of the children have terminated.

     The killpg(2) call has worked flawlessly every time except on those
   occasions when xxx was started from /etc/rc.local.

     When the call fails, the error indicates that there is no such process.

     I have, of course, been the superuser while using the call.

     Any help in resolving this would be most appreciated, since killpg()
   provides the only easy-to-use way of shutting down my application.


 Gordon P. Vickers, (408) 991-5370,             =======================
 Signetics Corp.                                || Ultrix-32 ver 1.2 ||
 PO Box 3409  M/S 69                            ||     VAX 11/750    ||
 Sunnyvale, California,  USA  94086             =======================
 {pyramid, philabs}!prls!gordon
----- ALL DISCLAIMERS APPLY. In fact I have sent this whole mess by mistake.

mouse@uunet.UU.NET (der Mouse) (11/18/87)

In article <10364@felix.UUCP>, gordon@prls.UUCP (Gordon Vickers) writes:
> I am currently running 1.2 though I've had this problem since 1.0.  I
> have a very simple interface to the killpg(2) call but it fails if
> the parent process was started in rc.local.
> [...]
> The killpg(2) call has worked flawlessly every time except on those
> occasions when xxx was started from /etc/rc.local.

This is not peculiar to Ultrix.

I have been checking mtXinu 4.3+NFS source on this subject; I expect
that this code is nearly identical in all Berkeley derivatives.  The
pronouncements below about what is and isn't done are thus based on the
mtXinu 4.3+NFS code.

Where do you get the process group you send the signal to?  Do you
simply assume it is identical to the process ID of the parent process
or do you use getpgrp()?  If the former, bad boy - fix it, 'cause
otherwise it'll bite you eventually (like when you use this technique
on a process started as part of a pipeline).  If the latter, note the
returned process group (it's 0 for rc.local processes, right?).  Note
also that /bin/sh, which interprets rc.local, does not grok process
groups.  All children of a sh inherit the sh's process group, which was
given to it by its parent. In the case of rc.local, this runs all the
way back to init.
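
A minimal sketch of the "ask, don't assume" approach, under some loud
assumptions: it targets a present-day POSIX system, where getpgid(pid)
stands in for the 4.3BSD getpgrp(pid) call, and the helper name
signal_group_of is hypothetical.  It also refuses to signal the caller's
own group or any group ID of 1 or less, which is a design choice of this
sketch and not something the kernel does for you.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Signal the whole process group that `target` currently belongs to.
 * Returns 0 on success, -1 with errno set on failure.
 * (Hypothetical helper; not part of any library.)
 */
int signal_group_of(pid_t target, int sig)
{
    /* Ask the kernel for the group instead of assuming pgid == pid. */
    pid_t pgid = getpgid(target);

    if (pgid < 0)
        return -1;              /* errno set by getpgid(), e.g. ESRCH */

    /*
     * Guard against the failure mode discussed in this thread: a group
     * ID that would make the signal land on the sender's own group.
     */
    if (pgid <= 1 || pgid == getpgrp()) {
        errno = EINVAL;
        return -1;
    }

    return killpg(pgid, sig);
}
```

A shutdown tool would call something like signal_group_of(daemon_pid,
SIGTERM), where daemon_pid is however you learned the parent's pid (a
pid file, say; that part is outside the sketch).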

Init's process group is never set and is therefore zero.  Sending a
signal to process group 0 with killpg(2) is interpreted as a request to
send to the process group of the sending process.  Thus, you wind up
sending the signal to the process group of the process doing the
sending instead of to the group you wanted to reach.
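
The "group 0 means my own group" rule can be observed on current
systems too.  The sketch below assumes a system such as Linux, whose
manual documents killpg(0, sig) as signalling the caller's own process
group (POSIX itself leaves a group ID of 1 or less undefined).  The
function name is hypothetical.  The child first moves into a group of
its own so that the experiment cannot signal anything else.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void note_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/*
 * Returns 1 if killpg(0, SIGUSR1) was delivered to the process that
 * sent it, 0 if it was not, and -1 if the experiment could not be run.
 */
int group_zero_reaches_sender(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct sigaction sa;
        sigset_t usr1;

        sa.sa_handler = note_signal;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigemptyset(&usr1);
        sigaddset(&usr1, SIGUSR1);

        /* Isolate ourselves before sending anything to "group 0". */
        if (setpgid(0, 0) != 0 ||
            sigaction(SIGUSR1, &sa, NULL) != 0 ||
            sigprocmask(SIG_UNBLOCK, &usr1, NULL) != 0)
            _exit(2);

        if (killpg(0, SIGUSR1) != 0)
            _exit(2);

        _exit(got_signal ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```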

So how do you reliably kill the whole tree?

Recommendation one:  At startup, have the parent process check whether
it was started from rc.local and, if so, set its process group to match
its process ID.  To check for rc.local, you can test for a process group
ID of zero, for a missing control terminal (try to open /dev/tty), for
both, or for something else if it occurs to you.
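
A rough sketch of recommendation one, assuming a present-day POSIX
system where setpgid(0, 0) does the job that setpgrp(0, getpid()) does
on 4.3BSD-style kernels.  Both function names are hypothetical, and the
"started from rc.local" test is only the heuristic described above, not
a reliable detector.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if this process has a control terminal, 0 otherwise. */
int has_controlling_tty(void)
{
    int fd = open("/dev/tty", O_RDWR);

    if (fd < 0)
        return 0;       /* open fails when there is no control terminal */
    close(fd);
    return 1;
}

/*
 * If we look like we were started at boot (process group of zero, or
 * no control terminal), make our process group equal to our pid so
 * that killpg(our_pid, sig) reaches us and our children.
 * Returns 0 on success or when nothing needed doing, -1 on error.
 */
int claim_own_group_if_boot_started(void)
{
    if (getpgrp() != 0 && has_controlling_tty())
        return 0;       /* looks interactive: keep the inherited group */

    if (getpgrp() == getpid())
        return 0;       /* already the leader of our own group */

    return setpgid(0, 0);
}
```

Call claim_own_group_if_boot_started() before forking any children, so
that they inherit the new group.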

Recommendation two:  Have your killing program know a little about the
internals of the kernel, a la ps, and have it run through the process
tree finding and killing the child processes.

Recommendation three:  Have the child processes periodically check to
see whether the parent process has died (do getppid() at startup and
periodically try sending signal 0 to that pid), and make them go away
if so.
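
Recommendation three might look roughly like this.  The function names
are hypothetical; the only system call doing real work is kill() with
signal 0, which checks for existence and permission without delivering
anything.  One caveat that is an assumption of mine and not from the
post: if the parent's pid is reused by an unrelated process, the check
will wrongly report the parent as alive.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a process with this pid exists, 0 if it does not. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;

    /* EPERM means the process exists but we may not signal it. */
    return errno == EPERM;
}

/*
 * Call this periodically from a child's main loop, passing the value
 * that getppid() returned when the child started.
 */
void exit_if_orphaned(pid_t original_parent)
{
    if (!process_exists(original_parent))
        _exit(0);
}
```

A child would save getppid() once at startup and then call
exit_if_orphaned(saved_parent) each time around its work loop.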

Recommendation four:  Have the parent process trap the signal and have
it kill all the children before dying.
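
One way recommendation four could be sketched, assuming SIGTERM is the
shutdown signal and that the parent keeps its own table of child pids.
Every name here is hypothetical, and the workers just idle in pause()
as a stand-in for real work.  SIGTERM is blocked around fork() so that
a child cannot catch it with the parent's handler before resetting to
the default action.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16

static pid_t children[MAX_CHILDREN];
static int nchildren = 0;
static volatile sig_atomic_t shutdown_requested = 0;

static void on_sigterm(int sig)
{
    (void)sig;
    shutdown_requested = 1;     /* only set a flag inside the handler */
}

int install_shutdown_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Fork one idle worker and remember its pid.  Returns the pid or -1. */
pid_t spawn_worker(void)
{
    sigset_t term, old;
    pid_t pid;

    if (nchildren >= MAX_CHILDREN)
        return -1;

    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    sigprocmask(SIG_BLOCK, &term, &old);

    pid = fork();
    if (pid == 0) {
        signal(SIGTERM, SIG_DFL);       /* workers die on SIGTERM */
        sigprocmask(SIG_UNBLOCK, &term, NULL);
        for (;;)
            pause();
    }
    if (pid > 0)
        children[nchildren++] = pid;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}

/* Signal every recorded child, then reap them.  Returns how many were reaped. */
int kill_and_reap_children(int sig)
{
    int i, status, reaped = 0;

    for (i = 0; i < nchildren; i++)
        kill(children[i], sig);
    for (i = 0; i < nchildren; i++)
        if (waitpid(children[i], &status, 0) == children[i])
            reaped++;
    nchildren = 0;
    return reaped;
}

/* Sleep until SIGTERM has been seen, then take the children down. */
int supervise(void)
{
    sigset_t term, old, wait_mask;

    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    sigprocmask(SIG_BLOCK, &term, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGTERM);

    while (!shutdown_requested)
        sigsuspend(&wait_mask);         /* atomically unblock and wait */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return kill_and_reap_children(SIGTERM);
}
```

The parent would call install_shutdown_handler(), spawn its workers,
and then sit in supervise(); a plain kill of the parent's pid is then
enough to stop everything, with no process groups involved.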

If none of the above are satisfactory, send me mail explaining in
greater detail and I'll see if I can come up with any better ideas.

					der Mouse

				(mouse@mcgill-power:56 you 

chuck@felix.UUCP (12/03/87)

-In an article <recently>, mouse&uunet (der Mouse) writes: 
      In article <10364@felix.UUCP>, gordon@prls.UUCP
      (Gordon Vickers) writes:
           I am currently running 1.2 though I've had
       this problem since 1.0.  I have a very simple
       interface to the killpg(2) call but it fails if
       the parent process was started in rc.local.
       [...]
            The killpg(2) call has worked flawlessly
       every time except on those occasions when xxx
       was started from /etc/rc.local.
-
-This is not peculiar to Ultrix.
-I have been checking mtXinu 4.3+NFS source on this subject; I expect
-that this code is nearly identical in all Berkeley derivatives.  The
-pronouncements below about what is and isn't done are thus based on the
-mtXinu 4.3+NFS code.

     Three or four weeks ago, someone sent me the appropriate fix to my
  problem.  Since it has been a while, I no longer have the message,
  so I don't know who deserves the credit (Sorry Mr/Mrs/Ms However).

     One line of code fixed the problem.  Had I been more familiar with
  the available library routines, perhaps I would not have had the
  difficulty.

     The line that cured it all:
         setpgrp(0, getpid());

    Pretty easy, huh?

    BTW: Thanks to ALL that responded; even the inappropriate suggestions
         contained interesting ideas and useful knowledge.


 Gordon P. Vickers, (408) 991-5370,             =======================
 Signetics Corp.                                || Ultrix-32 ver 1.2 ||
 PO Box 3409  M/S 69                            ||     VAX 11/750    ||
 Sunnyvale, California,  USA  94086             =======================
 {pyramid, philabs}!prls!gordon
----- ALL DISCLAIMERS APPLY. In fact I have sent this whole mess by mistake.