[comp.unix.questions] SUMMARY: defunct processes even after signal

kannan@cerc.wvu.wvnet.edu (R. Kannan) (09/22/89)

Hi,


	Thank you for the help we got in response to our posting on
	"defunct process in spite of signal and wait calls".


	Here is a summary of the replies I got (edited):



From: netcom!hue@apple.com (Jonathan Hue )

Don't use signal, use sigvec.  What's probably happening is that you're
getting a signal in that small window where the signal handler hasn't been
reinstalled, so you're missing it.  The default action of sigvec is to hold
off the signal while you're in its handler, so you won't lose it.


-Jonathan
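
To make the sigvec advice concrete, here is a minimal sketch. It uses
sigaction(), the POSIX interface that later replaced sigvec(), as a stand-in,
since sigvec() is missing on many current systems. The names on_signal,
install_handler and demo_handler_persists are made up for illustration, and
SIGUSR1 is used only so the behaviour can be checked without forking. The
point it shows: the handler stays installed after the first delivery, and the
signal is held off while its own handler runs.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t hits = 0;   /* incremented by the handler */

/* Hypothetical handler: it only counts deliveries. */
static void on_signal(int sig)
{
    (void)sig;
    hits++;
}

/* Install on_signal for signo; returns 0 on success, -1 on error. */
static int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);   /* signo itself is still blocked inside the handler */
    sa.sa_flags = SA_RESTART;   /* no SA_RESETHAND, so the handler stays installed */
    return sigaction(signo, &sa, NULL);
}

/* Raise SIGUSR1 twice and return how many deliveries the handler saw. */
static int demo_handler_persists(void)
{
    hits = 0;
    if (install_handler(SIGUSR1) != 0)
        return -1;
    raise(SIGUSR1);
    raise(SIGUSR1);   /* would kill the process if the handler had been reset */
    return (int)hits;
}
```

Calling demo_handler_persists() from main() should report two deliveries; with
a handler that is reset to SIG_DFL after the first signal, the second raise()
would terminate the process instead.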

From: cpcahil@virtech.UUCP (Conor P. Cahill)

It *sounds like* you are losing some of the SIGCHLDs.  Since you are
running under SunOS, I would recommend using the BSD signal handling
mechanisms, which should handle the problem.



--
+-----------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil             703-430-9247    !



From: "Matt Crawford" <matt@oddjob.uchicago.edu>
The system doesn't promise to give you a SIGCHLD for each child that
exits.  Two or more may exit in the time it takes you to service one
signal.  Therefore, use:

/* SIGCHLD handler: reap every child that has already exited. */
int ChildInt()
{
      while (wait3((union wait *)0, WNOHANG, (struct rusage *)0) > 0)
        /*nothing*/ ;
}

as your signal handler.
________________________________________________________
Matt Crawford                   matt@oddjob.uchicago.edu


From: servio!penneyj@uunet.UU.NET (D. Jason Penney)
We've had this problem.  Some analysis has shown that a single SIGCHLD is
generated whether one OR MORE processes die.  This means that when you handle
the signal, you may need to reap multiple child processes.  The
magic (this is for BSD-like systems such as SunOS) looks something like,

while ((childid = wait3(&childstat, WNOHANG | WUNTRACED, &theStats)) > 0) {
  /* print messages, pick your nose, whatever */
}

The wait3() call allows you to pick up one or more dead children before you
return from your SIGCHLD interrupt.

By the way, the comment on the net about using sigvec() instead of signal()
is VERY cogent -- signal() resets the SIGCHLD signal to SIG_DFL while you're
in the handler, so a race condition can arise where a child dies while you're
clearing another one.

I hope this helps.
--
D. Jason Penney                  Ph: (503) 629-8383
Beaverton, OR 97006              uucp: ...uunet!servio!penneyj
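
Putting the two wait3() replies together, here is a small self-contained
sketch of the "reap everything in one handler call" idea. It is not the code
from either reply: the handler is installed with sigaction() rather than
sigvec(), wait3() is called with a plain int status as on current systems, and
the names reap_children, spawn_and_reap and reaped are invented for the
example. It forks n children that exit immediately and counts how many the
handler collects.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

static volatile sig_atomic_t reaped = 0;   /* children collected so far */

/* SIGCHLD handler: one signal may stand for several exits, so loop. */
static void reap_children(int sig)
{
    int saved_errno = errno;   /* wait3() may clobber errno */
    int status;

    (void)sig;
    while (wait3(&status, WNOHANG, (struct rusage *)0) > 0)
        reaped++;
    errno = saved_errno;
}

/* Fork n children that exit at once; return how many were reaped, or -1. */
static int spawn_and_reap(int n)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    int i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    /* Block SIGCHLD so the counter check below cannot race the handler. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {      /* fork failed: only wait for the children we have */
            n = i;
            break;
        }
        if (pid == 0)
            _exit(0);       /* child: exit immediately */
    }

    while (reaped < n)
        sigsuspend(&wait_mask);   /* unblock SIGCHLD and sleep until a signal */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return (int)reaped;
}
```

Because the children exit almost simultaneously, several exits are usually
folded into one SIGCHLD; the loop in the handler is what keeps any of them from
being left behind as defunct processes.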


========


	We chose to use wait3 as our solution, after confirming that it is
available on all of the following hardware/software platforms:

	SunOS, HP SYSV, SGI(??), DEC/ULTRIX
        on

	Sun4, HP 9000, Personal IRIS and VAX 3200.


Result:

      No more defunct processes under similar test conditions.
      One more problem/bug surmounted.


We appreciate all the help we got, and regardless of our choice, 
we learnt something valuable from every response we got.


--kannan