murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) (05/12/89)
Here's one I can't figure out...

I'm writing a routine (like reapchildren(), found all over the place
in BSD code), using the SIGCLD signal to invoke it.  Its job is to
execute a wait(2) on all of the children that have died, getting their
exit status, etc.

Now for the problem:

It seems (according to the man page for signal(2)) that during the
signal handler for SIGCLD (reapchildren()), any further SIGCLD's will
be ignored.  By the same token, I can't do a non-blocking wait() in
System V, so I can't repeatedly call wait() (I might have other
children that haven't exit()-ed yet).

Here is what happens:

If I have spawned a few children, and one just did an exit(), then my
reapchildren() signal handler gets called.  It does a wait(), gets the
exit status, and gets out of there (after resetting the SIGCLD
signal).  If by chance a second child dies while we are within the
reapchildren() handler, I never get notified of it...

How can I get around this?  If only I had wait3()...

... Erik
--
Erik Murrey
Lehigh University
murrey@csee.Lehigh.EDU
erik@mpx.com
panos@boulder.Colorado.EDU (Panos Tsirigotis) (05/12/89)
In article <565@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) writes:
>
>Here's one I can't figure out...
>
>I'm writing a routine (like reapchildren(), found all over the place
>in BSD code), using the SIGCLD signal to invoke it.  Its job is to
>execute a wait(2) on all of the children that have died, getting their
>exit status, etc.
>
>Now for the problem:
>
>It seems (according to the man page for signal(2)) that during the
>signal handler for SIGCLD (reapchildren()), any further SIGCLD's will
>be ignored.  By the same token, I can't do a non-blocking wait() in
>System V, so I can't repeatedly call wait() (I might have other
>children that haven't exit()-ed yet)
>
>Here is what happens:
>
>If I have spawned a few children, and one just did an exit(), then my
>reapchildren() signal handler gets called.  It does a wait(), gets the
>exit status, and gets out of there (after resetting the SIGCLD
>signal).  If by chance a second child dies while we are within the
>reapchildren() handler, I never get notified of it...

I can see 2 solutions:

First solution:
    Have each call to reapchildren() reap just 1 child.  Since you got
    a SIGCLD you know that at least 1 child is dead.  The first thing
    reapchildren() should do is to reset the signal handler for SIGCLD,
    so that if another child dies while in reapchildren(),
    reapchildren() will be called recursively.
    This solution has 2 problems:
        a) There is a race between restoring the signal handler and
           receiving another SIGCLD.
        b) More than 1 child may have died but you only get 1 SIGCLD.

Second solution:
    Before doing the wait in reapchildren(), arrange to have a timer
    interrupt in case wait blocks.  The code would look like this:

        alarm( 1 ) ;            /* no setitimer in System V */
        pid = wait( &status ) ;
        if ( pid == -1 )        /* you may want to check errno for EINTR too */
        {
            /* timer expired */
        }
        else
        {
            alarm( 0 ) ;        /* reset timer */
            /* do whatever processing is needed */
        }

I think the second solution is better since it is more reliable.

Panos

----------------------------------------------------
| Email address: panos@boulder.colorado.edu        |
----------------------------------------------------
chris@mimsy.UUCP (Chris Torek) (05/12/89)
In article <565@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) writes:
>If I have spawned a few children, and one just did an exit(), then my
>reapchildren() signal handler gets called.  It does a wait(), gets the
>exit status, and gets out of there (after resetting the SIGCLD
>signal).  If by chance a second child dies while we are within the
>reapchildren() handler, I never get notified of it...

Your analysis is right, but this is not in fact what happens.

In System V (all releases for all hardware platforms, for once),
SIGCLD works differently from every other signal.  The default
action for SIGCLD is to do nothing; the `ignore' action for SIGCLD
is to discard exiting children; and the `catch' action is as usual.
But when the signal is set from any previous disposition to `catch',
if there are any pending exited children, a new SIGCLD signal is
generated.  Loosely:

    /* signal system call */
    if (sig == SIGCLD)
        switch (action) {
        case SIG_DFL:   /* default action is to ignore */
            ... nothing special ...
            break;
        case SIG_IGN:   /* ignore action is to flush */
            ... flush any children ...
            break;
        default:        /* catch: regenerate */
            ... set handler ...
            if (there are children)
                psig(u.u_procp, SIGCLD);
            break;
        }

This means that a catcher routine written as

    catch()
    {
        int w, status;

        signal(SIGCLD, catch);
        w = wait(&status);
    }

recurses infinitely as soon as one child exits.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris
kucharsk@uts.amdahl.com (William Kucharski) (05/12/89)
In article <17457@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:

(Explanation of SYSV SIGCLD deleted...)

>This means that a catcher routine written as
>    catch()
>    {
>        int w, status;
>        signal(SIGCLD, catch);
>        w = wait(&status);
>    }
>recurses infinitely as soon as one child exits.

Yep.  The signal(SIGCLD, catch) has to go _after_ the wait(2), unlike the
way you would program a signal handler for "normal" signals...
--
William Kucharski

ARPA: kucharsk@uts.amdahl.com
UUCP: ...!{ames,decwrl,sun,uunet}!amdahl!kucharsk

Disclaimer:  The opinions expressed above are my own, and may not agree with
             those of any other sentient being, not to mention those of my
             employer.  So there.
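The ordering described above, wait(2) first and signal() last, might look
like the following sketch.  It is an illustration, not code from any
poster: the names reapchild(), reaped, last_pid and last_status are
hypothetical, and the fallback definition of SIGCLD as SIGCHLD is an
assumption for systems that only define the latter.  Each call reaps one
child and relies on the System V regeneration behaviour described in this
thread to be called again for any others; on systems where signal() keeps
the handler installed, the re-arm is redundant but harmless, and no
regeneration should be assumed.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef SIGCLD
#define SIGCLD SIGCHLD          /* assumption: same signal under its POSIX name */
#endif

/* Hypothetical bookkeeping, written by the handler and read by the main code. */
static volatile sig_atomic_t reaped = 0;
static volatile sig_atomic_t last_pid = 0;
static volatile sig_atomic_t last_status = 0;

/* Hypothetical handler: reap one child, then re-arm the handler. */
void reapchild(int sig)
{
    int status;
    int saved_errno = errno;    /* do not disturb the interrupted code */
    pid_t pid;

    (void)sig;

    pid = wait(&status);        /* reap first ... */
    if (pid > 0) {
        last_pid = pid;
        last_status = status;
        reaped++;
    }

    /*
     * ... and re-arm last.  Under the System V semantics described in
     * the thread, re-arming while an exited child is still unreaped
     * would raise SIGCLD again at once, hence the order.
     */
    signal(SIGCLD, reapchild);

    errno = saved_errno;
}
```

The handler would be installed once from the main program with
signal(SIGCLD, reapchild) before any children are created.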
murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) (05/13/89)
In article <17457@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <565@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU
>(Erik Murrey) writes:
>> ...
> ...
>This means that a catcher routine written as
>
>    catch()
>    {
>        int w, status;
>
>        signal(SIGCLD, catch);
>        w = wait(&status);
>    }
>
>recurses infinitely as soon as one child exits.

Yes, and in fact I tried this once before, and it did go into an
endless loop after the first catch().

So my solution is to reset the sigcld signal after the wait().  Doesn't
this leave a window where a child can die, and I won't ever know about
it?

At least they could have given me a SIG_HOLD...

. . . .
--
Erik Murrey
Lehigh University
murrey@csee.Lehigh.EDU
erik@mpx.com
chris@mimsy.UUCP (Chris Torek) (05/13/89)
In article <567@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) writes:
>So my solution is to reset the sigcld signal after the wait().  Doesn't
>this leave a window where a child can die, and I won't ever know about
>it?

No.  Reread <17457@mimsy.UUCP>.  Any `lost' SIGCLD is `regenerated'.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris