[comp.unix.wizards] System V / SIGCLD questions...

murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) (05/12/89)

Here's one I can't figure out...

I'm writing a routine (like reapchildren(), found all over the place
in BSD code), using the SIGCLD signal to invoke it.  Its job is to
execute a wait(2) on all of the children that have died, getting their
exit status, etc.

Now for the problem:

It seems (according to the man page for signal(2)) that during the
signal handler for SIGCLD (reapchildren()), any further SIGCLD's will
be ignored.  By the same token, I can't do a non-blocking wait() in
System V, so I can't repeatedly call wait() (I might have other
children that haven't exit()-ed yet)

Here is what happens:

If I have spawned a few children, and one just did an exit(), then my
reapchildren() signal handler gets called.  It does a wait(), gets the
exit status, and gets out of there (after resetting the SIGCLD
signal).  If by chance a second child dies while we are within the
reapchildren() handler, I never get notified of it...

How can I get around this?   If only I had wait3()...
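
For comparison, here is a minimal sketch of the BSD-style reaper the post is wishing for. It assumes a system that does provide wait3() with the modern int * status prototype, which is exactly what the System V of the original question lacks. The names reap_all_nonblocking() and spawn_and_reap() are made up for illustration, and the _DEFAULT_SOURCE define is a glibc-specific assumption.

```c
#define _DEFAULT_SOURCE 1       /* assumption: glibc needs this to declare wait3() */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * BSD-style reaper: collect every child that has already exited,
 * without blocking on children that are still running.
 * Returns the number of children reaped by this call.
 */
int reap_all_nonblocking(void)
{
    int status;
    int count = 0;
    pid_t pid;

    /* With WNOHANG, wait3() returns 0 instead of blocking when children
       exist but none has exited yet, and -1 when there are no children. */
    while ((pid = wait3(&status, WNOHANG, (struct rusage *)0)) > 0)
        count++;        /* a real program would record pid and status here */

    return count;
}

/* Demo (hypothetical helper): fork n short-lived children, then poll
   until all of them have been reaped or about 5 seconds have passed. */
int spawn_and_reap(int n)
{
    int i, tries, total = 0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);   /* child: exit immediately */
        if (pid < 0)
            return -1;
    }
    for (tries = 0; total < n && tries < 500; tries++) {
        total += reap_all_nonblocking();
        if (total < n)
            usleep(10000);
    }
    return total;
}
```

Called from a signal handler, a loop like reap_all_nonblocking() picks up every child that has exited so far, so it does not matter how many signals were merged into one.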


... Erik
-- 
Erik Murrey
Lehigh University
murrey@csee.Lehigh.EDU
erik@mpx.com

panos@boulder.Colorado.EDU (Panos Tsirigotis) (05/12/89)

In article <565@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) writes:
>
>Here's one I can't figure out...
>
>I'm writing a routine (like reapchildren(), found all over the place
>in BSD code), using the SIGCLD signal to invoke it.  Its job is to
>execute a wait(2) on all of the children that have died, getting their
>exit status, etc.
>
>Now for the problem:
>
>It seems (according to the man page for signal(2)) that during the
>signal handler for SIGCLD (reapchildren()), any further SIGCLD's will
>be ignored.  By the same token, I can't do a non-blocking wait() in
>System V, so I can't repeatedly call wait() (I might have other
>children that haven't exit()-ed yet)
>
>Here is what happens:
>
>If I have spawned a few children, and one just did an exit(), then my
>reapchildren() signal handler gets called.  It does a wait(), gets the
>exit status, and gets out of there (after resetting the SIGCLD
>signal).  If by chance a second child dies while we are within the
>reapchildren() handler, I never get notified of it...

I can see 2 solutions:
First solution:
Have each call to reapchildren() reap just 1 child. Since you got a
SIGCLD you know that at least 1 child is dead. The first thing
reapchildren() should do is to reset the signal handler for SIGCLD
so if another child dies while in reapchildren(), reapchildren() will
be called recursively. This solution has 2 problems:
a) There is a race between restoring the signal handler and receiving
another SIGCLD.
b) More than 1 child may have died but you only get 1 SIGCLD.

Second solution:
Before doing the wait in reapchildren() arrange to have a timer interrupt
in case wait blocks:
The code would look like this:
	alarm( 1 ) ;		/* no setitimer in System V; SIGALRM must be caught, or it kills the process */
	pid = wait( &status ) ;
	if ( pid == -1 )	/* you may want to check errno for EINTR too */
	{
		/* timer expired */
	}
	else
	{
		alarm( 0 ) ;	/* cancel the timer */
		/* do whatever processing is needed */
	}

I think the second solution is better since it is more reliable.

Panos

----------------------------------------------------
| Email address: panos@boulder.colorado.edu        |
----------------------------------------------------

chris@mimsy.UUCP (Chris Torek) (05/12/89)

In article <565@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU
(Erik Murrey) writes:
>If I have spawned a few children, and one just did an exit(), then my
>reapchildren() signal handler gets called.  It does a wait(), gets the
>exit status, and gets out of there (after resetting the SIGCLD
>signal).  If by chance a second child dies while we are within the
>reapchildren() handler, I never get notified of it...

Your analysis is right, but this is not in fact what happens.  In
System V (all releases for all hardware platforms, for once), SIGCLD
works differently from every other signal.  The default action for
SIGCLD is to do nothing; the `ignore' action for SIGCLD is to discard
exiting children; and the `catch' action is as usual.  But when the
signal is set from any previous disposition to `catch', if there are
any pending exited children, a new SIGCLD signal is generated.

Loosely:

/* signal system call */
	if (sig == SIGCLD)
		switch (action) {
		case SIG_DFL:	/* default action is to ignore */
			... nothing special ...
			break;
		case SIG_IGN:	/* ignore action is to flush */
			... flush any children ...
			break;
		default:	/* catch: regenerate */
			... set handler ...
			if (there are children)
				psig(u.u_procp, SIGCLD);
			break;
		}

This means that a catcher routine written as

	catch()
	{
		int w, status;

		signal(SIGCLD, catch);
		w = wait(&status);
	}

recurses infinitely as soon as one child exits.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

kucharsk@uts.amdahl.com (William Kucharski) (05/12/89)

In article <17457@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
 (Explanation of SYSV SIGCLD deleted...)

 >This means that a catcher routine written as
 >	catch()
 >	{
 >		int w, status;
 >		signal(SIGCLD, catch);
 >		w = wait(&status);
 >	}
 >recurses infinitely as soon as one child exits.

Yep.  The signal(SIGCLD, catch) has to go _after_ the wait(2), unlike the
way you would program a signal handler for "normal" signals...
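
Put together, the ordering described above could be sketched as follows. This is only an illustration: reapchild(), reaped and demo_one_child() are made-up names, and the SIGCLD fallback define is an assumption for systems that only spell the signal SIGCHLD. The regeneration of SIGCLD on re-arming is the System V behaviour described earlier in the thread; other systems may not do it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SIGCLD
#define SIGCLD SIGCHLD          /* assumption: no SIGCLD here, use SIGCHLD */
#endif

static volatile sig_atomic_t reaped;    /* children collected by the handler */

static void reapchild(int sig)
{
    int status;

    (void)sig;
    /* Reap exactly one child. This should not block, because the
       signal was raised by a child that has already exited. */
    if (wait(&status) > 0)
        reaped++;

    /* Re-arm last. Per the thread, on System V this call raises a fresh
       SIGCLD if more exited children are pending, so the handler runs
       again for each of them. Re-arming before the wait() would raise
       it for the child we have not reaped yet, and recurse. */
    signal(SIGCLD, reapchild);
}

/* Demo (hypothetical): fork one child that exits at once and let the
   handler reap it. Returns the number of children the handler reaped. */
int demo_one_child(void)
{
    pid_t pid;
    int tries;

    reaped = 0;
    signal(SIGCLD, reapchild);

    pid = fork();
    if (pid == 0)
        _exit(0);               /* child: exit immediately */
    if (pid < 0)
        return -1;

    /* Give the handler up to about 5 seconds to run. */
    for (tries = 0; reaped < 1 && tries < 500; tries++)
        usleep(10000);

    signal(SIGCLD, SIG_DFL);
    return reaped;
}
```

The demo uses a single child on purpose. With several children exiting at once, one wait() per signal is only enough on systems that regenerate SIGCLD when the handler is re-armed.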
-- 
					William Kucharski

ARPA: kucharsk@uts.amdahl.com
UUCP: ...!{ames,decwrl,sun,uunet}!amdahl!kucharsk

Disclaimer:  The opinions expressed above are my own, and may not agree with
	     those of any other sentient being, not to mention those of my 
	     employer.  So there.

murrey@lehi3b15.csee.Lehigh.EDU (Erik Murrey) (05/13/89)

In article <17457@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <565@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU
>(Erik Murrey) writes:
>> ...
> ...
>This means that a catcher routine written as
>
>	catch()
>	{
>		int w, status;
>
>		signal(SIGCLD, catch);
>		w = wait(&status);
>	}
>
>recurses infinitely as soon as one child exits.

Yes, and in fact I tried this once before, and it did go into
an endless loop after the first catch().

So my solution is to reset the sigcld signal after the wait(). Doesn't
this leave a window where a child can die, and I won't ever know about
it?

At least they could have given me a SIG_HOLD...

-- 
Erik Murrey
Lehigh University
murrey@csee.Lehigh.EDU
erik@mpx.com

chris@mimsy.UUCP (Chris Torek) (05/13/89)

In article <567@lehi3b15.csee.Lehigh.EDU> murrey@lehi3b15.csee.Lehigh.EDU
(Erik Murrey) writes:
>So my solution is to reset the sigcld signal after the wait(). Doesn't
>this leave a window where a child can die, and I won't ever know about
>it?

No.  Reread <17457@mimsy.UUCP>.  Any `lost' SIGCLD is `regenerated'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris