[comp.unix.questions] defunct processes even after signal

kannan@cerc.wvu.wvnet.edu (R. Kannan) (09/20/89)

Hi,

	We are faced with a strange problem. Even though we have the
	following SIGCHLD handler,

	int ChildInt ()
	{
		wait((union wait *)0);
	}

	....
	....

	signal(SIGCHLD, ChildInt);

	....
	....

	when we fork at a very rapid rate, we end up with some defunct
	(zombie) processes.


	This is a Sun 4 environment. Could someone explain to us what
	could be wrong, and suggest possible solutions?


Thanks very much,

kannan

cpcahil@virtech.UUCP (Conor P. Cahill) (09/20/89)

In article <230@cerc.wvu.wvnet.edu.edu>, kannan@cerc.wvu.wvnet.edu (R. Kannan) writes:
> 	We are faced with a strange problem. Even though we have the 
>         int ChildInt ()  {   wait((union wait *)0); }
> 
>     		signal(SIGCHLD, ChildInt);
>   
>       when we really try to fork at a very rapid rate, we end up with
>       some defunct (ZOMBIE ) processes.

> 	This is a Sun 4 environment. Could someone explain to us what
>         could be wrong, and suggest possible solutions?

It *sounds like* you are losing some of the SIGCHLDs.  Since you are
running under SunOS, I would recommend using the BSD signal handling
mechanisms, which should handle the problem.



-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      	703-430-9247	|
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+

guy@auspex.auspex.com (Guy Harris) (09/23/89)

>It *sounds like* you are losing some of the SIGCHLDs.  Since you are
>running under SunOS, I would recommend using the BSD signal handling
>mechanisms, which should handle the problem.

Since he's running under SunOS, unless he's building his code in the
System V environment he *is* using the BSD signal handling mechanisms,
even if he's using "signal()".  (Yes, "signal()" in BSD, and the BSD
environment of SunOS, has BSD rather than V7 semantics.)
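
To make the two sets of semantics concrete, here is a rough sketch that
spells each one out with sigaction().  It assumes a system that provides
POSIX sigaction(); the names install_bsd_style(), install_v7_style(),
still_installed() and on_signal() are invented for this illustration and
are not part of any library.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX sigaction() interface */

#include <signal.h>
#include <string.h>

/* Counts handler invocations; only ever touched from the handler. */
volatile sig_atomic_t hits = 0;

/* Hypothetical handler used only for this illustration. */
void on_signal(int signo)
{
    (void)signo;
    hits++;
}

/*
 * BSD-style semantics: the handler stays installed after delivery, the
 * signal is blocked while its handler runs (sigaction()'s default), and
 * interrupted system calls are restarted.
 */
int install_bsd_style(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/*
 * V7-style semantics: the disposition is reset to SIG_DFL when the
 * signal is delivered, and the signal is not blocked in the handler.
 */
int install_v7_style(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    return sigaction(signo, &sa, NULL);
}

/* Returns nonzero if `handler` is still the disposition for `signo`. */
int still_installed(int signo, void (*handler)(int))
{
    struct sigaction cur;

    if (sigaction(signo, NULL, &cur) == -1)
        return 0;
    return cur.sa_handler == handler;
}
```

With the V7-style installation the handler has to reinstall itself each
time it runs, and a signal arriving before it does so gets the default
action; the BSD-style installation does not have that window.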

The problem is that there isn't any guarantee in BSD that one SIGCHLD is
delivered for each child process.  The SIGCHLD handler should loop until
there are no zombies to be picked up.  For example, this is the SIGCHLD
handler used in the "script" command (simplified a bit):

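	#include <sys/types.h>
	#include <sys/time.h>
	#include <sys/resource.h>
	#include <sys/wait.h>

	void
	finish()
	{
		union wait status;

		/* collect every child that has already exited */
		while (wait3(&status, WNOHANG, (struct rusage *)0) > 0)
			;
	}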

The WNOHANG makes sure it doesn't block waiting for children that
haven't exited yet.
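
For anyone trying this on a POSIX system, the same loop can be sketched
with waitpid() and sigaction() standing in for wait3() and signal().
This is only an illustration under that assumption: reap_children(),
install_reaper() and spawn_and_reap() are made-up names, and the
fork-and-exit children are there just to provoke the rapid-fork case
from the original question.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX signal interfaces */

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children collected so far; written only by the handler. */
static volatile sig_atomic_t reaped = 0;

/*
 * SIGCHLD handler.  One SIGCHLD may stand for several exited children,
 * so keep collecting until waitpid() reports that no zombie is left.
 */
static void reap_children(int signo)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Install the handler; returns 0 on success, -1 on error. */
static int install_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Fork n children that exit immediately, wait until the handler has
 * collected all of them, and return how many were collected.
 */
int spawn_and_reap(int n)
{
    sigset_t block, old, wait_mask;
    int i, started = 0;

    reaped = 0;
    if (install_reaper() == -1)
        return -1;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(0);           /* child: exit at once */
        if (pid > 0)
            started++;
    }

    /*
     * Block SIGCHLD while testing the counter, so that a signal cannot
     * arrive between the test and the call to sigsuspend().
     */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (reaped < started)
        sigsuspend(&wait_mask);
    sigprocmask(SIG_SETMASK, &old, NULL);

    return reaped;
}
```

Calling spawn_and_reap(20) from main() should return 20 and leave no
defunct processes behind.  If the handler is changed to make a single
wait call per signal, as in the original ChildInt(), it can collect
fewer children than were started, and the wait loop above would then
never finish.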