kannan@cerc.wvu.wvnet.edu (R. Kannan) (09/22/89)
Hi,

Thank you for the help we got for our posting on "defunct process in spite of signal and wait calls". Here is the summary of replies I got (EDITED):

From: netcom!hue@apple.com (Jonathan Hue)

Don't use signal, use sigvec. What's probably happening is that you're getting a signal in that small window where the signal handler isn't reset, so you're missing it. The default action of sigvec is to hold off the signal while you're in its handler, so you won't lose it.

-Jonathan

From: cpcahil@virtech.UUCP (Conor P. Cahill)

It *sounds like* you are losing some of the SIGCHLDs. Since you are running under SunOS, I would recommend using the BSD signal handling mechanisms, which should handle the problem.

--
Conor P. Cahill     uunet!virtech!cpcahil     703-430-9247

From: "Matt Crawford" <matt@oddjob.uchicago.edu>

The system doesn't promise to give you a SIGCHLD for each child that exits. Two or more may exit in the time it takes you to service one signal. Therefore, use:

    int ChildInt()
    {
        while (wait3((union wait *)0, WNOHANG, (struct rusage *)0) > 0)
            /* nothing */ ;
    }

as your signal handler.

Matt Crawford     matt@oddjob.uchicago.edu

From: servio!penneyj@uunet.UU.NET (D. Jason Penney)

We've had this problem. Some analysis has shown that if one OR MORE processes die, a single SIGCHLD is generated. This means that when you process your signal, you may need to get rid of multiple child processes. The magic (this is BSD-like SunOS, now) looks something like:

    while ((childid = wait3(&childstat, WNOHANG | WUNTRACED, &theStats)) > 0) {
        /* print messages, pick your nose, whatever */
    }

The wait3() call allows you to pick up one or more dead children before you return from your SIGCHLD interrupt.
By the way, the comment on the net about using sigvec() instead of signal() is VERY cogent -- signal() resets the SIGCHLD handler to SIG_DFL while you're in the handler, so a race condition can arise where a child dies while you're clearing another one.

I hope this helps.

-- D. Jason Penney     Ph: (503) 629-8383
Beaverton, OR 97006     uucp: ...uunet!servio!penneyj

========

We chose to use wait3 as a solution, after confirming that it is available on all of the following H/S platforms: SunOS, HP SysV, SGI(??), and DEC/Ultrix, on Sun4, HP 9000, Personal Iris, and VAX 3200.

Result: no more defunct processes under similar test conditions. One more problem/bug surmounted.

We appreciate all the help we got, and regardless of our choice, we learnt something valuable from every response.

--kannan
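
For anyone reading this on a current system, the two pieces of advice above (install the handler reliably, and reap in a non-blocking loop) can be sketched as follows. This is only a sketch, assuming a POSIX system: it swaps in sigaction() and waitpid() for the sigvec() and wait3() calls named in the replies. The names on_sigchld, install_sigchld_handler and spawn_and_reap, and the reaped counter, are hypothetical and exist only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Demo-only counter of children reaped so far; written only by the handler. */
static volatile sig_atomic_t reaped = 0;

/* SIGCHLD handler: one signal may stand for several exits, so keep
 * reaping until no exited child is left. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;            /* waitpid() may change errno */
    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Install the handler with sigaction(), which keeps it installed and
 * blocks SIGCHLD while it runs. Returns 0 on success, -1 on error. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;           /* restart interrupted system calls */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical demo helper: fork n children that exit at once, then
 * sleep until the handler has reaped them all. Returns the number reaped. */
static int spawn_and_reap(int n)
{
    sigset_t block, old, waitmask;
    int i, started = 0, base;

    assert(n >= 0);

    /* Block SIGCHLD so the counter check below cannot race the handler. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    waitmask = old;
    sigdelset(&waitmask, SIGCHLD);
    base = reaped;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                   /* child: exit immediately */
        if (pid > 0)
            started++;
    }

    /* sigsuspend() atomically unblocks SIGCHLD and waits for a signal. */
    while (reaped - base < started)
        sigsuspend(&waitmask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped - base;
}
```

A caller would run install_sigchld_handler() once at startup and then fork children as usual. Even if several children exit before the handler gets to run, the loop inside it should collect all of them, so none is left defunct.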