[comp.bugs.sys5] Bug

john@polyof.UUCP ( John Buck ) (08/09/88)

I recently noticed the following bug(?, or maybe "feature") with the
way wait(2) works in conjunction with the special System 5 signal handling
functions, namely, sigset(SIGCLD,...).

Consider the following piece of code:

#include <stdio.h>
#include <signal.h>

main()
{
	int catchchild();
	int i, j;

	sigset(SIGCLD, catchchild);
	if((i = fork()) == 0){
		sleep(3);
		exit(1);
	}
	if((j = fork()) == 0){
		sleep(4);
		exit(2);
	}
	pause();
}
catchchild()
{
	/* Upon entry here (via a SIGCLD from the first fork() above),
	 * SIGCLD is automatically placed on "hold"
	 */
	int pid, status;
	for(;;){
		pid = wait(&status);
		if(pid == -1)
			break;
		printf("pid %d status = %d\n", pid, status);
	}
	printf("out of for loop, no more procs...\n");
}
/* Upon exit from this routine, SIGCLD is taken off hold, so any further
 * SIGCLDs can be delivered; more importantly, a single SIGCLD that arrived
 * while we were in the catchchild() routine will be delivered soon.
 */


The intent here is that the for(;;) loop executes until there are no more
children left.  Well, tell that to System 5!  What it does is:
	1) picks up the first proc that terminates
	2) prints the message saying it picked it up
	3) goes back to wait for more
	4) sleeps in wait() for ever and ever...
	   despite the fact that the second proc finishes and is in ZOMBIE
	   state.

It seems the code in psignal() checks whether the signal being sent
(in this case SIGCLD) is on "hold" (which it is, since I'm in catchchild()).
If the signal is on "hold", it does not "wakeup" the process (i.e., no setrun()
is executed).
It does, however, set the "signal happened" bit in the p_sig mask.
The routine psignal() does not check to see if someone is waiting for
the process to finish (actually, exit() should probably check, then let
wait() know.)
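
The behavior described above can be sketched as a toy model. This is not
kernel source: the struct, its fields, and toy_psignal() are illustrative
names that only mirror the description in this post (a pending bit that is
always set, and a wakeup that is skipped while the signal is held).

```c
#include <assert.h>

/* Toy model only; names are illustrative, not taken from kernel source. */
struct toy_proc {
	unsigned long p_sig;	/* "signal happened" bits */
	unsigned long p_hold;	/* signals currently on "hold" */
	int asleep;		/* 1 while blocked, e.g. inside wait() */
};

#define TOY_SIGCLD 18		/* the exact number is incidental here */
#define SIGBIT(s) (1UL << ((s) - 1))

/* Post a signal: always record it, but wake the sleeper only if the
 * signal is not held.  Returns 1 if the process was woken, else 0. */
int toy_psignal(struct toy_proc *p, int sig)
{
	assert(sig >= 1 && sig <= 32);
	p->p_sig |= SIGBIT(sig);
	if (p->p_hold & SIGBIT(sig))
		return 0;	/* held: no setrun(), sleeper stays asleep */
	p->asleep = 0;		/* stands in for setrun() */
	return 1;
}
```

With SIGCLD held and the process asleep in wait(), toy_psignal() records the
signal but leaves asleep at 1, which matches step 4 of the list above.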

I am not proposing a fix (although I could); in fact, I'm not sure it's a bug.
After all, I did tell it that I didn't want to hear about SIGCLD while
in catchchild().  I did not, however, tell it not to wake up my wait() call
when something died.
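
For comparison, here is a minimal sketch of a reaping loop that cannot hang
this way. It assumes a POSIX system that provides waitpid() with WNOHANG,
which the System 5 release discussed here may not have. The names
reap_available() and demo_two_children() are made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: collect every child that has already exited,
 * without ever sleeping.  Returns the number of children reaped. */
int reap_available(void)
{
	int count = 0;

	/* With WNOHANG, waitpid() returns 0 instead of blocking when
	 * children exist but none has terminated yet; -1 means there
	 * are no children at all (or an error occurred). */
	while (waitpid(-1, NULL, WNOHANG) > 0)
		count++;
	return count;
}

/* Demo: start two short-lived children, then poll until both are reaped. */
int demo_two_children(void)
{
	int i, total = 0;

	for (i = 0; i < 2; i++) {
		pid_t pid = fork();
		if (pid == -1)
			return -1;
		if (pid == 0)
			_exit(i + 1);	/* child: exit at once with status 1 or 2 */
	}
	while (total < 2) {
		total += reap_available();
		if (total < 2)
			sleep(1);	/* give the children time to exit */
	}
	return total;
}
```

Called from a SIGCLD handler in place of the for(;;) loop, a helper like
reap_available() would return as soon as no terminated child is left, rather
than sleeping in wait() for one whose signal is on hold.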

Does anyone have any comments?
(P.S. Yes, I know the algorithm above is dumb, and I found this problem
 by accident, but I found it interesting.)

John Buck
john@polyof.poly.edu
john@POLYGRAF.BITNET
(516)-755-4206