[comp.unix.xenix] Getting rid of a <defunct> process

kory@avatar.UUCP (Kory Hamzeh) (02/11/89)

I have written an application which forks and execs off many subtasks.
The main process (the parent which does all of the forks) can not
do a wait() because I can't get blocked for anything. Well, this results
in a lot of "<defunct>" processes in the process table when the child
exits.

Is there any way to get rid of these processes? If not, will they take
up any core space? I assume they will take up a slot in the process table.

A short-term solution would be to keep a list of all of the pids which
were created and occasionally do a kill(pid, 0) to see if the process is
still alive. If not, then do a wait().

The OS that I am using is XENIX/386 System V Version 2.3.1.

Any help would be greatly appreciated.

Thank you,
kory


-- 
-------------------------------------------------------------------------------
Kory Hamzeh			    UUCP:     ..!uunet!psivax!quad1!avatar!kory
				    INTERNET: avatar!kory@quad.com

madd@bu-cs.BU.EDU (Jim Frost) (02/13/89)

In article <102@avatar.UUCP> kory@avatar.UUCP (Kory Hamzeh) writes:
|I have written an application which forks and execs off many subtasks.
|The main process (the parent which does all of the forks) can not
|do a wait() because I can't get blocked for anything. Well, this results
|in a lot of "<defunct>" processes in the process table when the child
|exits.
|
|Is there any way to get rid of these processes? If not, will they take
|up any core space? I assume they will take up a slot in the process table.

What I usually do is:

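	fireman() /* catches falling children */
	{ union wait wstatus;

	  /* wait3() returns 0 when children exist but none has exited yet,
	   * so loop only while it returns a pid (> 0), or this spins */
	  while (wait3(&wstatus, WNOHANG, NULL) > 0)
	    ;
	}

	main()
	{
	  signal(SIGCHLD, fireman);

	  /*...*/
	}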

<defunct> processes (also called "zombies") definitely take up process
table entries and probably take up many more resources.  Exactly what
a zombie process holds depends on the operating system implementation.

jim frost
madd@bu-it.bu.edu

dyer@spdcc.COM (Steve Dyer) (02/13/89)

In article <102@avatar.UUCP> kory@avatar.UUCP (Kory Hamzeh) writes:
>I have written an application which forks and execs off many subtasks.
>The main process (the parent which does all of the forks) can not
>do a wait() because I can't get blocked for anything. Well, this results
>in a lot of "<defunct>" processes in the process table when the child
>exits. Is there any way to get rid of these processes? If not, will they take
>up any core space? I assume they will take up a slot in the process table.

Defunct, or "zombie", processes take up a process slot, but no other resources.
Nevertheless, process slots themselves may be scarce resources on a busy
system (especially if you expect to run a lot of programs like the one
you describe.)

You can change the behavior of wait and exit by having the parent
process (the one doing the forking) call signal(SIGCLD, SIG_IGN):
now, child processes which exit will not leave zombies around, and
the wait system call in the parent will not return until all children
have died.  If the parent never performs a wait(), and you're not
interested in the exit status of your children, this will give you
the behavior you desire.
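
A small sketch of that behavior, assuming a system where ignoring the child
signal has the System V semantics described above (POSIX later specified the
same thing for SIGCHLD, which is the name used here; SIGCLD is the System V
spelling). The function name is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if an ignored SIGCHLD left no zombie to
 * wait for, 0 if a child status was still delivered, -1 on error. */
int ignored_children_leave_no_zombie(void)
{
    pid_t pid;

    if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
        return -1;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(7);                        /* child: this status is discarded */

    /* wait() blocks until the last child is gone, then fails with ECHILD */
    if (wait(NULL) == -1 && errno == ECHILD)
        return 1;
    return 0;
}
```

Note that the exit status 7 is lost for good, which is the price of this
approach.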

I've never really liked this SYSV behavior, since it overloads signal
semantics with stuff they were never designed to have, but it's there
and it works...

-- 
Steve Dyer
dyer@ursa-major.spdcc.com aka {ima,harvard,rayssd,linus,m2c}!spdcc!dyer
dyer@arktouros.mit.edu

jamesa@arabian.Sun.COM (JD Allen) (02/13/89)

In article <102@avatar.UUCP>, kory@avatar.UUCP (Kory Hamzeh) writes:
> I have written an application which forks and execs off many subtasks.
> The main process (the parent which does all of the forks) can not
> do a wait() because I can't get blocked for anything. Well, this results
> in a lot of "<defunct>" processes in the process table when the child
> exits.

	Simply replace:

		if (fork() == 0)
			run_child_code();

	With a lintable version of:

		if (fork() == 0)
			fork() ? _exit(0) : run_child_code();
		else
			wait(0);	/* reap the "dummy" process */

	When your orphaned grandchild exits it will now be reaped by the Grim
	Reaper.  "Bastards" don't become "zombies".  (Holy metaphor, Batman!)

	(On the other hand, "morally", if a process is worth running, its exit
	status is worth inspecting.  You can get it "asynchronously" with
	SIGCHLD and/or wait3().)
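
A fuller sketch of the double fork with error checks, assuming POSIX waitpid()
and an execv()-style launch. spawn_detached() is a hypothetical helper name,
and the exit codes are arbitrary choices.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run path/argv in a grandchild that the caller never
 * has to wait for.  Returns 0 once the intermediate child has been reaped,
 * -1 on failure. */
int spawn_detached(const char *path, char *const argv[])
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {                    /* intermediate child */
        pid_t grandchild = fork();

        if (grandchild == -1)
            _exit(1);                    /* second fork failed */
        if (grandchild > 0)
            _exit(0);                    /* exit at once, orphaning grandchild */
        execv(path, argv);               /* grandchild becomes the program */
        _exit(127);                      /* only reached if execv() failed */
    }
    /* Returns quickly: the intermediate child exits right after forking. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent only learns whether the second fork worked. The grandchild's own
exit status goes to whichever process adopts it, not to the caller.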

	-- James Allen

gregg@ihlpb.ATT.COM (Wonderly) (02/13/89)

From article <102@avatar.UUCP>, by kory@avatar.UUCP (Kory Hamzeh):
> I have written an application which forks and execs off many subtasks.
> The main process (the parent which does all of the forks) can not
> do a wait() because I can't get blocked for anything. Well, this results
> in a lot of "<defunct>" processes in the process table when the child
> exits.
> 
> Is there any way to get rid of these processes? If not, will they take
> up any core space? I assume they will take up a slot in the process table.

If you have SIGCLD, you have two choices.  If you absolutely do not wish to
know anything about the children, then just use

    signal (SIGCLD, SIG_IGN);

to tell the kernel to implicitly wait on them for you.  When they die, they
will be removed...  If you wish to know something, then use something like

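    routine ()
    {
        int status, pid;

        while ((pid = wait (&status)) == -1 && errno == EINTR)
            continue;
        if (pid != -1 && status)    /* status is only valid if wait() succeeded */
            printf ("Process %d died with status 0x%04x\n", pid, status);
        signal (SIGCLD, routine);   /* System V resets the handler; re-arm after wait() */
    }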

    main (...)
    {
    ...
        signal (SIGCLD , routine);
    ...
    }

to look at the return status and catch possible problems.  I would suggest
that you do check return status because subtle problems can go unnoticed if
you do not.

If you do not have SIGCLD, use

    junk()
    {}
        
    main()
    {
    ...
        try_wait();
    ...
    }

    try_wait()
    {
        int amt, status, pid, (*ftn)();

        amt = alarm (0);
        ftn = signal (SIGALRM, junk);
        (void) alarm (1);

        pid = wait (&status);
        (void) alarm (0);

        (void) signal (SIGALRM, ftn);
        (void) alarm (amt);

        if (pid != -1 && status)
            printf ("Process %d died with status 0x%04x\n", pid, status);
    ...
    }
-- 
Gregg Wonderly                             DOMAIN: gregg@ihlpb.att.com
AT&T Bell Laboratories                     UUCP:   att!ihlpb!gregg

dg@lakart.UUCP (David Goodenough) (02/14/89)

From article <102@avatar.UUCP>, by kory@avatar.UUCP (Kory Hamzeh):
> I have written an application which forks and execs off many subtasks.
> The main process (the parent which does all of the forks) can not
> do a wait() because I can't get blocked for anything. Well, this results
> in a lot of "<defunct>" processes in the process table when the child
> exits.

This may provide a workable solution, but there is a disadvantage:
It will slow down the fork process, because two fork calls are required
instead of 1. However, that may or may not be an issue.
You may need to muck with the calling parameters passed to forknew; I've
assumed usage of execv(), so modify as appropriate for whatever you're
using.

forknew(prog, argv)
char *prog, **argv;
 {
    int pid1, pid2;

    if (fork() != 0)		/* parent */
     {
	wait(&pid1);		/* wait for child */
	return;			/* and carry on */
     }
    if (fork() != 0)		/* child forks AGAIN, */
      exit(0);			/* child exits, so parent's wait()
				 * doesn't block */
    execv(prog, argv);		/* grandchild actually does the execv, but
				 * since it has no parent it gets inherited
				 * by init (PID 1) which will detect its
				 * passing, and do a wait */
    _exit(1);			/* only reached if execv fails */
 }

This works on BSD, and may well work on other systems.

Comments anyone?
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

allbery@ncoast.ORG (Brandon S. Allbery) (02/18/89)

As quoted from <102@avatar.UUCP> by kory@avatar.UUCP (Kory Hamzeh):
+---------------
| I have written an application which forks and execs off many subtasks.
| The main process (the parent which does all of the forks) can not
| do a wait() because I can't get blocked for anything. Well, this results
| in a lot of "<defunct>" processes in the process table when the child
| exits.
+---------------

Add a call to signal(SIGCLD, SIG_IGN).  This tells the system to not leave
zombies around for child processes.

++Brandon
-- 
Brandon S. Allbery, moderator of comp.sources.misc	     allbery@ncoast.org
uunet!hal.cwru.edu!ncoast!allbery		    ncoast!allbery@hal.cwru.edu
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>
NCoast Public Access UN*X - (216) 781-6201, 300/1200/2400 baud, login: makeuser

logan@vsedev.VSE.COM (James Logan III) (02/21/89)

In article <2565@spdcc.SPDCC.COM> dyer@ursa-major.spdcc.COM (Steve Dyer) writes:
# In article <102@avatar.UUCP> kory@avatar.UUCP (Kory Hamzeh) writes:
# >I have written an application which forks and execs off many subtasks.
# >The main process (the parent which does all of the forks) can not
# >do a wait() because I can't get blocked for anything. Well, this results
# >in a lot of "<defunct>" processes in the process table when the child
# >exits. Is there any way to get rid of these processes? If not, will they take
# >up any core space? I assume they will take up a slot in the process table.
# 
# You can change the behavior of wait and exit by having the parent
# process (the one doing the forking) call signal(SIGCLD, SIG_IGN):
# now, child processes which exit will not leave zombies around, and
# the wait system call in the parent will not return until all children
# have died.  If the parent never performs a wait(), and you're not
# interested in the exit status of your children, this will give you
# the behavior you desire.

Unless you do this under XENIX...  I had a problem with this a
couple of months ago.  When I ignored the SIGCLD signal and
called system("some_program"), the defunct processes were still
hanging around.  

I had to fork, call system() in the child, and then ignore SIGCLD
in the parent to solve the problem (if I remember correctly).  I
believe the problem is in the SCO Bourne shell since this
does not occur on any other System V machine I've used.   

			-Jim
-- 
Jim Logan                           logan@vsedev.vse.com
VSE Software Development Lab        uucp:  ..!uunet!vsedev!logan
(703) 418-0002                      inet:  logan%vsedev.vse.com@uunet.uu.net