[comp.unix.wizards] Wait, Select, and a SIGCHLD Race Condition

stuart@cs.rochester.edu (12/11/87)

I need advice (or sympathy) for handling a race condition in 4.3BSD
flavored UNIX.  Briefly, I want to use wait3 to reap all the dead or
stopped children of a process, then use select to wait for the first
new IO or child activity.  Sketch something like this:

  while (0 < (pid = wait3(..., WNOHANG, ...))) {
    /* do something with child */
  }

/* XXX Race condition is here */

  numfds = select(...);
  if (numfds < 0) {
    if (errno == EINTR)
      /* caught a signal, what kind was it, etc */
  }

There is a race condition between reaping children and starting the
select.  A child can change status and the SIGCHLD can be delivered
*before* I enter select; I don't notice it, enter select, and
hang forever.  Even if I have a handler for SIGCHLD that sets a flag
and I check that flag immediately before calling select, there is still
a (small) window of vulnerability.

Ideally, I would like to set the signal mask to block SIGCHLD and have
select release the signal *after* starting to wait.  That would allow
me to ensure that *all* dead children are noticed.  However, select
does not release any signals as far as I can tell.  Berkeley truly
improved the signal handling features going to 4.3, but the (improved)
features don't seem to let me write this code safely.  (In particular,
the sigblock, signal, sigpause, signal, sigsetmask idiom is of no help
here.)

I would appreciate advice on how to safely avoid this race condition
given 4.3BSD features.  I suspect that it's not possible, but would be
delighted to learn otherwise (see next paragraph for an equivocation
for "not possible").  It's not essential that the skeleton code look
like that given above;  all that's needed is that I/O and child
activity is processed as soon as *either* is available.  Neither kind
of activity is guaranteed to happen, and some events may already have
happened, which must not be ignored.

There *is* a kludge that I can fall back on, but I would really like to
avoid it:  Put a maximum on the timeout given to select and check for
more children when select times out.  Even if I miss a SIGCHLD, I would
still reap the child.  This is doable, but a pain, because I am
managing timer requests in addition to IO and child requests in the
same package;  keeping the real timeouts straight from the kludge
timeouts (which might coincide!) is real ugly.  The whole point of this
package is to multiplex lots of requests and AVOID POLLING.  The kludge
is, of course, nothing but polling.

Stu Friedberg  {ames,cmcl2,rutgers}!rochester!stuart  stuart@cs.rochester.edu

matt@oddjob.uchicago.EDU (Keeper of the Sacred Tablets) (12/12/87)

Suppose you set all your fd's of interest to be asynchronous and you
handle the SIGIO instead of doing select?  Then it's signal vs.
signals instead of signals vs. select.  A blocking select would
translate into a sigpause().
________________________________________________________
Matt	     University		matt@oddjob.uchicago.edu
Crawford     of Chicago     {astrovax,ihnp4}!oddjob!matt

rpd@F.GP.CS.CMU.EDU (Richard Draves) (12/12/87)

One solution (this idea is from GNU Emacs) is to put a large timeout
on the select.  Make a SIGCHLD handler that clears the timeout.
Then if the handler is called before entering select, the select won't
block.

Rich

wen-king@cit-vlsi.Caltech.Edu (Wen-King Su) (12/12/87)

In article <5105@sol.ARPA> stuart@cs.rochester.edu writes:
>I need advice (or sympathy) for handling a race condition in 4.3BSD
>flavored UNIX.  Briefly, I want to use wait3 to reap all the dead or
>stopped children of a process, then use select to wait for the first
>new IO or child activity.  Sketch something like this:

This method uses the timeout in a non-polled way.  The ORIGINAL_TIMER
is the timer you are using.  The ZERO_TIMER is a structure with zero
in all its fields.  Now, instead of passing &ORIGINAL_TIMER to select,
you put its address in a pointer and pass that pointer instead so you
can change the pointer on the fly whenever a child signal is received.

  on SIGCHLD interrupt, set time_ptr = &ZERO_TIMER;

  do {  time_ptr = &ORIGINAL_TIMER;
	/* with WNOHANG, 0 means no child has changed state; -1 means no children */
	if (0 >= (pid = wait3(..., WNOHANG, ...))) break;
	....  } while (1);

  numfds = select(...,time_ptr);
  if(time_ptr == &ZERO_TIMER) { go back to wait for children }
  if (numfds < 0) { if (errno == EINTR) { ... } else { ... } }

/*------------------------------------------------------------------------*\
| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |
\*------------------------------------------------------------------------*/

rpd@F.GP.CS.CMU.EDU (Richard Draves) (12/12/87)

>  on SIGCHLD interrupt, set time_ptr = &ZERO_TIMER;
...
>  numfds = select(...,time_ptr);
...
>| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |

This method doesn't quite work.  The problem is if the signal comes in after
the select procedure call but before the trap to the kernel, then the
select will end up blocking because the kernel will get ORIGINAL_TIMER
instead of ZERO_TIMER as the argument to the trap.  The signal handler
must atomically zero out the actual timer structure.

Rich

matt@ncr-sd.SanDiego.NCR.COM (Matt Costello) (12/12/87)

In article <5105@sol.ARPA> stuart@cs.rochester.edu writes:
>I need advice (or sympathy) for handling a race condition in 4.3BSD
>flavored UNIX.  Briefly, I want to use wait3 to reap all the dead or
>stopped children of a process, then use select to wait for the first
>new IO or child activity.

I've two methods I use to get around the race conditions in signals.
They are:

1.  If you are not using SIGALRM for something else, have your timeout
    routine re-enable the SIGALRM on 1 second intervals until it is
    turned off in the outer level code.  If the original signal hits
    the timing hole then the second (or third) won't.
    The beauty of this is that it is usable in any version of UNIX, since
    it uses no features specific to BSD or USG.

    To avoid missing any child processes with SIGCHLD:

	onedied() {
		signal(SIGCHLD,SIG_DFL); /* will infinite loop otherwise */
		signal(SIGALRM,onedied); alarm(1);
	}

		signal(SIGCHLD,onedied);
		/* race condition is here... */
		numfds = select();	  /* or read(), or msgrcv() */
		alarm(0);


2.  For select() or any operation where the process is waiting on incoming
    IO, you can have the signal routine send a dummy message that will
    cause the select() to return immediately.  Rather than aborting the
    operation find some way to make it terminate normally.  This works
    wonderfully for SYSV message queues since it is perfectly legal to
    send a zero length message.
-- 
Matt Costello	<matt.costello@SanDiego.NCR.COM>
+1 619 485 2926	<matt.costello%SanDiego.NCR.COM@Relay.CS.NET>
		{sdcsvax,cbosgd,pyramid,nosc.ARPA}!ncr-sd!matt

wen-king@cit-vlsi.Caltech.Edu (Wen-King Su) (12/12/87)

In article <496@PT.CS.CMU.EDU> rpd@F.GP.CS.CMU.EDU (Richard Draves) writes:
>>  on SIGCHLD interrupt, set time_ptr = &ZERO_TIMER;
>...
>>  numfds = select(...,time_ptr);
>...
>>| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |
>
>This method doesn't quite work.  The problem is if the signal comes in after
>the select procedure call but before the trap to the kernel, then the
>select will end up blocking because the kernel will get ORIGINAL_TIMER
>instead of ZERO_TIMER as the argument to the trap.  The signal handler
>must atomically zero out the actual timer structure.
>
>Rich

OOPS, you are right.  How about this:

  on SIGCHLD interrupt, set time_struct = ZERO_TIMER;
...
  numfds = select(...,&time_struct);

It should cover all possibilities.  I knew there was a way, I just
couldn't remember all the details.

/*------------------------------------------------------------------------*\
| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |
\*------------------------------------------------------------------------*/

jamesa%betelgeuse@Sun.COM (James D. Allen) (12/14/87)

In article <5105@sol.ARPA>, stuart@cs.rochester.edu writes:
> I need advice (or sympathy) for handling a race condition in 4.3BSD
> flavored UNIX.  Briefly, I want to use wait3 to reap all the dead or
> stopped children of a process, then use select to wait for the first
> new IO or child activity.  Sketch something like this:
> 
>   while (0 < (pid = wait3(..., WNOHANG, ...))) {
>     /* do something with child */
>   }
> /* XXX Race condition is here */
>   numfds = select(...);
	et cetera

This race condition is intriguing because it arises from "instruction
atomization" in the application program, and NOT due to limitations in the
Unix system interface.  Two correct solutions have been posted which take
advantage of a Unix feature (FASYNC in one case, select timeout in the
other).  It's worth noting, however, that the application code can detect
and fix the race without OS assistance, as shown in the crude fragment:

sig_catcher()
{
	caught_signal = TRUE;
	if (possible_race)
		longjmp(jmp_buffer, 1);
}
wait_for_event()
{
	...
	if (setjmp(jmp_buffer)) {
	    signalled:
		possible_race = FALSE;
	  /* process the signal */
		caught_signal = FALSE;
	} else {
		possible_race = TRUE;
		if (caught_signal)
			goto signalled;
		numfds = select(...);
	/*
	 * No longer need to check EINTR!!
	 *	if (numfds < 0 && errno == EINTR)
	 *		goto signalled;
	 */
		possible_race = FALSE;
	  /* process the select */
	}
	...
}

I don't know how to do this without `longjmp' (or something even uglier).

> 
> Stu Friedberg  {ames,cmcl2,rutgers}!rochester!stuart  stuart@cs.rochester.edu

James Allen      {ucbvax,hplabs,seismo}!sun!betelgeuse!jamesa

chris@mimsy.UUCP (Chris Torek) (12/15/87)

In article <36350@sun.uucp> jamesa%betelgeuse@Sun.COM (James D. Allen) writes:
>This race condition is intriguing because it arises from "instruction
>atomization" in the application program, and NOT due to limitations in the
>Unix system interface.

All user-level race conditions result from this; it can be (but does
not have to be) considered a deficiency in the interface.  This very
problem is precisely why I want a signal mask in select's arguments:
so that none of the three kludges are necessary.  (Some of you may
recall my recent diatribe on the subject.)

In summary:

	FASYNC
	select timeout
	longjmp

The first method works only if you are willing to recode everything
to use signals (and you still need select!---but with immediate
timeout); the select timeout works but is somewhat tricky to code;
the longjmp works but only if you have exactly one jump.  I would
probably use the select timeout trick myself.  Here it is in its
full glory:

struct select_goo {
	int	*nfds;			/* points to select's nfds argument */
	struct	timeval *timeout;	/* ... to timeout argument */
	struct	select_goo *inner;	/* previous select, if recursing */
} *select_goo;

catch_sigchld()
{

	child_changed = 1;
	if (select_goo) {
		*select_goo->nfds = 0;	/* saves work in select() */
		timerclear(select_goo->timeout);
	}
}

...
	int nfds, omask, cc;
	fd_set rfd, wfd, xfd;
	struct timeval to;
	struct select_goo goo;

	/* copy these since select or SIGCHLD will modify them */
	nfds = Nfds, rfd = Rfd, wfd = Wfd, xfd = Xfd, to = Timeout;

	/* hold the signal while we build the goo */
	omask = sigblock(sigmask(SIGCHLD));
	if (child_changed) {
		/* a signal already occurred, so skip all this */
		(void) sigsetmask(omask);
		return (0);
	}
	goo.nfds = &nfds;
	goo.timeout = &to;
	goo.inner = select_goo;
	select_goo = &goo;

	/* now safe to release the signal */
	(void) sigsetmask(omask);

	cc = select(nfds, &rfd, &wfd, &xfd, &to);

	/* either the select selected, or the signal hit,
	   or the timeout expired */

	/* undo the goo */
	omask = sigblock(sigmask(SIGCHLD));
	select_goo = goo.inner;

	/* figure out which event occurred */
	if (child_changed) {
		/* the signal hit */
	} else if (cc == 0) {
		/* the timeout expired */
	} else {
		/* the select selected */
	}
	(void) sigsetmask(omask);
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

robf2@pyuxf.UUCP (robert fair) (12/16/87)

In article <10829@brl-adm.ARPA>, matt@oddjob.uchicago.EDU (Keeper of the Sacred Tablets) writes:
> >Newsgroups: comp.unix.wizards
> >Organization: Up against the wall of SCIENCE
> 
> Suppose you set all your fd's of interest to be asynchronous and you
> handle the SIGIO instead of doing select?  Then it's signal vs.
> signals instead of signals vs. select.  A blocking select would
> translate into a sigpause().
> ________________________________________________________
> Matt	     University		matt@oddjob.uchicago.edu
> Crawford     of Chicago     {astrovax,ihnp4}!oddjob!matt

Well, so far over 15 copies of this posting have reached us here at Bellcore.
Is anyone else suffering from the Xerox effect ?

Rob Fair
ihnp4!pyuxww!pyuxf!robf2

rml@hpfcdc.HP.COM (Bob Lenk) (12/18/87)

> In summary:
> 
> 	FASYNC
> 	select timeout
> 	longjmp

A fourth (not necessarily better) alternative is to have an extra pipe
or socket that the signal handler writes to, and select on it for
reading.  One reply alluded to this general type of mechanism, but
didn't tie it into specific BSD features.

One warning about the longjmp() method is that it can be troublesome in
the presence of other signals.  If SIGCHLD happens to interrupt another
signal handler, that handler can be aborted.  This can be avoided by
setting the SIGCHLD bit in sv_mask for all other handlers that might be
invoked, further complicating things.

		Bob Lenk
		{ihnp4, hplabs}!hpfcla!rml