bowen@sgi.com (Jerre Bowen) (03/17/90)
From: bowen@sgi.com (Jerre Bowen)

Folks:

I'm wondering if there is an easy way in POSIX to be absolutely
certain that a process which calls a library routine that forks and
waits on a child does not lose any SIGCLDs. I apologize for the length
of this article.

Here's the scenario:

    #include <signal.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void cldhandler(int sig);
    void forkit(void);

    pid_t pid;

    int main(void)
    {
        sigset_t mtmask;
        struct sigaction action;

        sigemptyset(&mtmask);   /* sigsuspend with no sigs blocked */

        /* SIGCLD handler runs with SIGCLD blocked */
        sigemptyset(&action.sa_mask);
        sigaddset(&action.sa_mask, SIGCLD);
        action.sa_handler = cldhandler;
        action.sa_flags = 0;
        sigaction(SIGCLD, &action, NULL);

        if ((pid = fork()) == 0) {
            sleep(1);
            exit(0);
        } else {
            forkit();
            sigsuspend(&mtmask);    /* will parent awaken? */
        }
        return 0;
    }

    void cldhandler(int sig)
    {
        int status;

        waitpid(pid, &status, (WNOHANG|WUNTRACED));
    }

    void forkit(void)
    {
        struct sigaction act, oact;

        act.sa_handler = SIG_DFL;
        sigemptyset(&act.sa_mask);
        act.sa_flags = 0;
        sigaction(SIGCLD, &act, &oact); /* default handling for SIGCLD */

        /* <process forks and execs a program which runs for at least 1 sec> */
        /* <process does a waitpid() on its child process> */

        sigaction(SIGCLD, &oact, NULL); /* reinstall prior handling */
    }

The problem here is that the original child of the parent will exit
while forkit() is executing, and since SIGCLD is SIG_DFL'ed during
that time, a zombie *will* be created, but the SIGCLD will *not* be
delivered. The parent then suspends waiting for the SIGCLD indicating
that its child exited, which of course never arrives. (Obviously, I am
primarily concerned about the case where forkit() is a library
routine, and the user has no idea what the routine is doing with
signals, and *shouldn't* need to either.)

SysV solves this problem in signal() and sigset() by checking for
zombied children at the bottom of the kernel code and, if any exist,
re-raising a SIGCLD, thus creating the impression that it is
impossible to lose a SIGCLD.
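The window in the scenario above can be reproduced in isolation. What
follows is only a minimal sketch, assuming a system that provides
waitid() with WNOWAIT; the names on_sigchld, got_sigchld and
zombie_survives_default_sigchld are hypothetical and do not come from
any library. It lets a child die while SIGCHLD is SIG_DFL, then
installs a handler and checks two things: whether the handler ran
(which is up to the implementation) and whether the zombie can still
be reaped by polling with waitpid() and WNOHANG.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; records whether SIGCHLD was actually delivered. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    got_sigchld = 1;
}

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a child that exited while SIGCHLD was SIG_DFL can still
 * be reaped with waitpid() afterwards, 0 if not, -1 on error.
 * If delivered is non-NULL, *delivered reports whether the handler
 * installed after the child's death was invoked.
 */
int zombie_survives_default_sigchld(int *delivered)
{
    struct sigaction dfl, act, saved;
    siginfo_t info;
    int status, rc;
    pid_t child, reaped;

    got_sigchld = 0;

    /* Same step as forkit(): default handling for SIGCHLD. */
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    if (sigaction(SIGCHLD, &dfl, &saved) == -1)
        return -1;

    child = fork();
    if (child == -1) {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }
    if (child == 0)
        _exit(0);               /* child exits at once */

    /* Wait until the child has exited, but leave the zombie in place. */
    do {
        rc = waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT);
    } while (rc == -1 && errno == EINTR);
    if (rc == -1) {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }

    /* Install a handler only now, after the SIGCHLD has been discarded. */
    act.sa_handler = on_sigchld;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGCHLD, &act, NULL) == -1) {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }

    if (delivered != NULL)
        *delivered = got_sigchld;   /* depends on the implementation */

    /* Polling still finds the zombie even if no signal arrived. */
    reaped = waitpid(child, &status, WNOHANG);

    sigaction(SIGCHLD, &saved, NULL);   /* put back the caller's handling */
    return reaped == child;
}
```

On an implementation that does not re-raise the signal, *delivered
should come back 0 while the function still returns 1. That is the
situation in which the parent in the scenario would sleep forever in
sigsuspend() even though its child is waiting to be reaped.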

BSD requires the user to get around the problem of lost SIGCHLDs by
calling wait3(WNOHANG) until no more children remain to be reaped
whenever one SIGCHLD is received. But in a BSD version of the above
code, you never get any SIGCHLD, so the parent hangs.

POSIX has provided waitpid in order to allow library routines such as
system(3) and popen/pclose(3), which need to fork and wait for child
processes, to be implemented reliably even in the case that the
calling program has child processes that may terminate while in the
library routines. But the above program example shows that a
conforming implementation still does not necessarily allow an
application program to depend on facilities like system(3).

The reason is that POSIX explicitly leaves undefined the question of
whether SIGCHLD is raised when a process with a terminated child for
which it has not waited establishes a handler for SIGCHLD (see section
3.3.1.3 paragraph 3(e)). One way in which an implementation can make
the above program work properly is to raise SIGCHLD in this case
(i.e. whenever a process with an outstanding zombie calls sigaction to
set a handler for SIGCHLD).

Is there a compelling reason for the standard not to require this
behavior? Granted, the implementor has the ability to make things work
correctly. But if the behavior isn't required, the writer of
conforming applications can't depend on it. Is there some other,
better solution to the problem posed by the sample program?

Thanks --
Jerre Bowen (bowen@sgi.com)

Volume-Number: Volume 18, Number 79