john@polyof.UUCP ( John Buck ) (08/09/88)
I recently noticed the following bug (or maybe "feature") in the way wait(2) works in conjunction with the special System 5 signal handling functions, namely sigset(SIGCLD, ...). Consider the following piece of code:

    #include <stdio.h>
    #include <signal.h>

    main()
    {
        int catchchild();
        int i, j;

        sigset(SIGCLD, catchchild);
        if((i = fork()) == 0){
            sleep(3);
            exit(1);
        }
        if((j = fork()) == 0){
            sleep(4);
            exit(2);
        }
        pause();
    }

    catchchild()
    {
        /* Upon entry here (via a SIGCLD from the first fork() above),
         * SIGCLD is automatically placed on "hold" */
        int pid, status;

        for(;;){
            pid = wait(&status);
            if(pid == -1)
                break;
            printf("pid %d status = %d\n", pid, status);
        }
        printf("out of for loop, no more procs...\n");
    }
    /* Upon exit of this routine, SIGCLD is taken off hold, and any more that
     * come in can happen, but more importantly, any single SIGCLD that happened
     * while we were in the catchchild() routine will be triggered soon. */

The intent here is that the for(;;) loop executes until there are no more children left. Well, tell that to System 5! What it does is:

    1) picks up the first proc that terminates
    2) prints the message saying it picked it up
    3) goes back to wait for more
    4) sleeps in wait() for ever and ever...

despite the fact that the second proc finishes and is in ZOMBIE state.

It seems the code in psignal() checks to see if the signal being sent (in this case SIGCLD) is on "hold", which it is, since I'm in catchchild(). If the signal is on "hold", it does not wake up the process (i.e., no setrun() is executed). It does, however, set the "signal happened" bit in the p_sig mask. The routine psignal() does not check to see if someone is waiting for the process to finish (actually, exit() should probably check, then let wait() know).

I am not proposing a fix (although I could); in fact, I'm not sure it's a bug. After all, I did tell it that I didn't want to hear about SIGCLD while in catchchild(). I did not, however, tell it not to wake up my wait() call when something died. Does anyone have any comments?
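For comparison, here is a minimal sketch of a handler that never sleeps in wait() while the signal is held. It assumes a system that provides POSIX waitpid() with WNOHANG, sigaction() and sigsuspend(); none of these is part of the sigset() interface discussed above, and they may not exist on the System 5 releases in question. The names run_demo() and reaped are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Count of children collected so far; modified only inside the handler. */
static volatile sig_atomic_t reaped = 0;

/* Handler: collect every child that has ALREADY exited, then return.
 * WNOHANG makes waitpid() return 0 when children exist but none has
 * terminated yet, so the handler cannot block with the signal held. */
static void catchchild(int sig)
{
    int saved_errno = errno;
    int status;

    (void)sig;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Hypothetical driver mirroring the two-child test case.
 * Returns the number of children reaped, or -1 on error. */
int run_demo(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old, waitmask;
    pid_t pid;
    int i, result;

    reaped = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = catchchild;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGCHLD except while inside sigsuspend(), so the counter
     * test below cannot race with the handler and miss a wakeup. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;
    waitmask = old;
    sigdelset(&waitmask, SIGCHLD);

    for (i = 0; i < 2; i++) {
        pid = fork();
        if (pid == -1) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0) {
            sleep(i + 1);       /* children exit one second apart */
            _exit(i + 1);
        }
    }

    /* Sleep until the handler has collected both children. */
    while (reaped < 2)
        sigsuspend(&waitmask);

    result = reaped;
    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

Calling run_demo() from main() takes about two seconds and should return 2 on such a system. The point of the design is that the handler only picks up children that are already dead; the second child is collected by a later delivery of the signal rather than by a wait() that sleeps while the signal is on hold.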
(PS Yes, I know the algorithm above is dumb, and I found this problem out by mistake, but I found it interesting.)

John Buck
john@polyof.poly.edu
john@POLYGRAF.BITNET
(516)-755-4206