[net.unix-wizards] UNIX question

pjs@uw-june (Philip J. Schneider) (12/06/85)

I have a C program which, during the course of its execution, spawns(forks)
child processes. I should mention that a process is spawned, it lives
for a while, then exits, and then sometime later the same thing happens,
and so on.  This all happens within the lifetime of the parent, and I
would like to do this an arbitrary number of times.  Sometimes, two or
more child processes exist at once, but the upper limit on child
processes that exist concurrently is low, and a group of such children
exit before the next process begins.

Since UNIX only allows one a certain number of processes at a time,
eventually during the course of execution of the parent I run out of
processes.  If I temporarily stop the parent process execution and
do a 'ps', the child processes show up in the list with a 'Z' status.
They do not completely disappear until the parent process exits.  As some
of you probably already know, these useless ex-processes can't even
be completely gotten rid of with a 'kill' command.  The result is that
these processes are taking up my process quota, even though they are
dead in all practical terms (in that they finished their work and exited
properly).  Of course, they do go away completely once the parent exits.

I can certainly understand why one is allowed only a limited number of
active processes at any time.  My processes, however, are not at all
active once they have exited, and I feel that once a process exits,
I should have my quota "credited" so that I can get more.

Clearly, my problem is how to get around this situation.  I could
(possibly) get a higher limit on my process quota, but this would
only mean that running out of processes will happen a little later.

My question: Is there any way to kill off these zombies so I can get
	     more processes ?  Or, failing that, is there any other
	     way to do what I want ?


Please respond by e-mail if you can help at all, or if you need more
details.  Thanks in advance.
-- 

  Philip Schneider                               
  University of Washington Computer Science                
  pjs@{uw-june.arpa,washington.arpa}                      
  {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs           

zemon@fritz.UUCP (Art Zemon) (12/10/85)

In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>
>
>I have a C program which, during the course of its execution, spawns(forks)
>child processes. I should mention that a process is spawned, it lives
>for a while, then exits, and then sometime later the same thing happens,
>and so on.  This all happens within the lifetime of the parent, and I
>would like to do this an arbitrary number of times.  Sometimes, two or
>more child processes exist at once, but the upper limit on child
>processes that exist concurrently is low, and a group of such children
>exit before the next process begins.
>
>Since UNIX only allows one a certain number of processes at a time,
>eventually during the course of execution of the parent I run out of
>processes.  If I temporarily stop the parent process execution and
>do a 'ps', the child processes show up in the list with a 'Z' status.
>They do not completely disappear until the parent process exits.  As some
>of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command.  The result is that
>these processes are taking up my process quota, even though they are
>dead in all practical terms (in that they finished their work and exited
>properly).  Of course, they do go away completely once the parent exits.

The child processes have gone away properly and are waiting for
someone (some process) to collect their exit statuses with a
wait() or a wait3().  They cannot be kill()-ed because they are
already dead.

They go away after the parent exits because init inherits them
and does enough wait()s to get rid of all of them.
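
As a minimal sketch of the point above (not code from the original
program), the parent can collect each child's status as soon as it
is done with that child.  The function name spawn_and_reap and its
code parameter are made up for illustration, and waitpid() is used
here as the later POSIX relative of wait() so that only this one
child is collected:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork one short-lived child and reap it.
 * Returns the child's exit code, or -1 on any failure.
 * The name and the code parameter are illustrative only. */
int spawn_and_reap(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed, e.g. process limit hit */
    if (pid == 0)
        _exit(code);            /* child: "does its work" and exits */

    /* Parent: until this call returns, the child is a zombie. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Once the status has been collected the process table slot is free
again, so calling this any number of times should not use up the
process quota.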
-- 
	-- Art Zemon
	   FileNet Corp.
	   ...! {decvax, ihnp4, ucbvax} !trwrb!felix!zemon

lasse@daab.UUCP (Lars Hammarstrand) (12/10/85)

>I have a C program which, during the course of its execution, spawns(forks)
>child processes .....
>
>... do a 'ps', the child processes show up in the list with a 'Z' status.
>
>.. of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command.....



If you don't want your children to end up as (Z)ombie processes, the parent
process has to execute a wait(2) on each child that has been killed or stopped.

See also: signal(2) and kill(2).


	Lars Hammarstrand.

ahb@ccice5.UUCP (Al Brumm) (12/12/85)

In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>My question: Is there any way to kill off these zombies so I can get
>	     more processes ?  Or, failing that, is there any other
>	     way to do what I want ?

A clean way to handle this problem on Sys3 was to use the following
system call in the parent process:
	signal(SIGCLD, SIG_IGN);

Then when a child process exited, a zombie would not be created.
Note that this would not allow you to examine the child's exit
status.  However, you could examine the exit status by doing the 
following:
	int
	sigcld()
	{
		int pid, status;
		pid = wait(&status);
		.
		. (do stuff)
		.
	}
	main()
	{
		int	(*sigcld)();

		signal(SIGCLD, sigcld);
	}

The example immediately above is also possible in 4.2BSD,
only SIGCLD is called SIGCHLD.

Then again, there is always the double fork() trick which goes
something like this:
	if (fork()) {			/* parent */
		wait((int *)0);		/* no zombies please */
	}
	else {
		if (fork()) {		/* child */
			exit(0);	/* satisfy parent's wait */
		}
		else {			/* grandchild */
			do_stuff();		/* since my parent exit'ed */
			.			/* I am inherited by init */
			.
			.
		}
	}
The above trick is used quite heavily by the UNET servers.

chris@umcp-cs.UUCP (Chris Torek) (12/14/85)

In article <974@ccice5.UUCP> ahb@ccice5.UUCP (Al Brumm) writes:

> A clean way to [ignore children] on Sys3 was to use the following
> system call in the parent process:
>	signal(SIGCLD, SIG_IGN);

Cute... maybe I will add this hack to our kernel.  One question:
Is SIGCLD always reset to SIG_DFL on exec?  If not, since ignored
signals normally remain ignored, it could break other programs
which expect to collect children; and programs that ignore SIGCLD
would have to carefully un-ignore it just after forks.

> Note that this would not allow you to examine the child's exit
> status.  However, you could examine the exit status by doing the 
> following:
>	int
>	sigcld()
>	{
>		int pid, status;
>		pid = wait(&status);
>		...
>	}
>	main()
>	{
>		int	(*sigcld)();
>
>		signal(SIGCLD, sigcld);
>	}

Well, the `int (*sigcld)()' declaration is wrong and (in this case)
unnecessary; it should be `int sigcld()' if anything.  But that is
not all that is amiss.  In V7, 3BSD, and 4BSD, and I suspect also
in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
signals are not queued, and without the `jobs library' of 4.1BSD,
or the signal facilities of 4.2, this code cannot be made to operate
reliably.  It *will fail*, someday, no doubt at the worst possible
moment.

The problem is that several children may exit in quick succession.
Only one SIGCLD signal will be delivered, since the parent process
will (just this once) not manage to run before all have exited.
The sigcld handler has no way of determining how many children are
to be processed.

In 4.1BSD and later, the solution is a new `system call', wait3().
This call has two optional parameters, WNOHANG and WUNTRACED.
WNOHANG tells the kernel not to wait for existing children to exit.
Instead, wait3 returns 0 in this case, allowing the signal handler
to finish up, having now collected all exited children.  (WUNTRACED
exists only for C-shell style job control with stopped processes,
and is irrelevant here.)

Unfortunately, this solution is still incomplete.  There are race
conditions unless the child exit signal is withheld (but not ignored)
for the duration of the child collection routine, and can be withheld
during process creation (in case the created process exits before
the parent finishes updating data structures).  This is the case
under the 4.1BSD `jobs' library, and in all 4.2 and 4.3 systems.

Anyway, what it all boils down to is that process control is
unreliable in many versions of Unix, but can be made reliable in
4.1, 4.2, and 4.3BSD.  If there is any way to reliably handle
process exit and `job control' style processing in System III and
System V, I am not aware of it---though that should be unsurprising
since I have never used them.  If it is possible in the latest AT&T
Unixes, I would like to know how.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

lcc.richard@locus.ucla.edu (Richard Mathews) (12/15/85)

There have been several responses to this which indicated that zombies can
be cleaned up by having the parent call wait(2) (or in Berkeley compatible
systems, wait3(2)).  Under System V there is an alternate method that can
be used.  If the parent ignores SIGCLD, then the manual states that "the
calling process's child processes will not create zombie processes when
they terminate" (see signal(2)).  In reality (at least on a VAX) they do
create zombies, but the parent automatically cleans them up.  The effect is
the same.
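
A small way to observe this, sketched under the assumption of a
system where ignoring the child signal has the effect described
(System V, and later systems that follow the same rule).  SIGCHLD is
used as the portable spelling of SIGCLD, and the function name is
invented for the example:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ignore SIGCHLD, fork one child, then call wait().
 * Returns 1 if wait() fails with ECHILD (nothing left to collect),
 * 0 if a status was collected after all, -1 on any other error. */
int ignore_children_demo(void)
{
    int status, result;
    pid_t pid;

    if (signal(SIGCHLD, SIG_IGN) == SIG_ERR)
        return -1;
    pid = fork();
    if (pid < 0) {
        signal(SIGCHLD, SIG_DFL);
        return -1;
    }
    if (pid == 0)
        _exit(3);               /* child exits; status is discarded */

    /* With SIGCHLD ignored, wait() blocks until the child is gone
     * and then reports that there is no child to wait for. */
    if (wait(&status) == -1)
        result = (errno == ECHILD) ? 1 : -1;
    else
        result = 0;

    signal(SIGCHLD, SIG_DFL);   /* restore the default disposition */
    return result;
}
```

The price is the one already noted in this thread: the exit status
of the child (3 here) is never seen by the parent.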

Richard M. Mathews
Locus Computing Corporation		       lcc.richard@LOCUS.UCLA.EDU
					       lcc.richard@UCLA-CS
			  {ihnp4,ucivax,trwrb}!lcc!richard
       {randvax,sdcrdcf,ucbvax,trwspp}!ucla-cs!lcc!richard

dave@onfcanim.UUCP (Dave Martindale) (12/15/85)

In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Anyway, what it all boils down to is that process control is
>unreliable in many versions of Unix, but can be made reliable in
>4.1, 4.2, and 4.3BSD.  If there is any way to reliably handle
>process exit and `job control' style processing in System III and
>System V, I am not aware of it---though that should be unsurprising
>since I have never used them.  If it is possible in the latest AT&T
>Unixes, I would like to know how.

(My only experience with system V is on an IRIS workstation, which is system
V with some Berkeley stuff.  But the signal mechanism seems to be from
System V - there is none of the Berkeley sigmask stuff.)

On the IRIS, if you read the fine print, you will find that SIGCLD
doesn't behave like a "normal" signal.  It seems that SIGCLD is generated
by the presence of a zombie child, not the event of a child terminating.
This was brought home to me in a program that had a single child.  When
the child terminated, the SIGCLD handler (due to me not understanding
what was going on) re-enabled the signal before waiting for the child.
Immediately, another SIGCLD was delivered, and so on until the stack
overflowed.

So, if you had a second child exit while handling the first SIGCLD, no
problem - you'll get another SIGCLD as soon as you re-enable the signal.

It is also unlike a "normal" signal in that the "default" action is
for nothing to happen, while a "normal" signal causes some action beyond
the control of the process.

The essential difference, I think, is simply that V7 signals had no
"memory" - when one was delivered, either you caught it or you ignored
it and it went away, but you couldn't "hold" it.  4.2 signal handling
knows about signals that are held for later delivery.  SV doesn't
have this in general for signals, but in the case of SIGCLD the
existence of the zombie process provides the "memory".

	Dave Martindale

rich@rexago1.UUCP (K. Richard Magill) (12/16/85)

In article <14767@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>>If there is any way to reliably handle
>>process exit and `job control' style processing in System III and
>>System V, I am not aware of it---
>It seems that SIGCLD is generated
>by the presence of a zombie child, not the event of a child terminating.

In the spirit of information sharing...

With respect to a 3b2/300 running SV.2.2...

I guess I can't/shouldn't copy the manual but SIGCLD is generated by the
death of a child and reset when  caught.  ...  behaves as other  signals
with the  exception  that  successive SIGCLD's  are  queued  instead  of
successively interrupting the catching function.

A rudimentary job control has been accomplished on SV.2.2.  It  consists
of a parent process that calls  subprocesses which are $SHELL.  All  you
can do from the  parent, shl, is create,  delete, block, unblock,  list,
etc.  ie,  job control only.   not really  a shell.   Typing your  SWTCH
character from a child gets you back to shl, with the child  effectively
bg'd, from which  you may  create a  new subshell.   Of  course you  are
limited to eight subprocesses.  All of this is accomplished using pseudo
terminals, (sxt's), a new  control character defined in  termio.c_cc[7],
SWTCH, a new control  mode, in (termio.c_cflag  & 0x10000), LOBLK  which
blocks output of the current layer.

I should add that  shl cannot be  your login shell  and doesn't work  if
exec'd from your login shell.  I do use it.  It's not csh but it's better
than sh alone.

Ksh is purported to do csh style job control on 3b2 but I have yet to see
it work.  The copies  I have seen tend  to lock up terminals  frequently
when you try.

AGAIN!   This is  3b2/300 SV.2.2.   I  know that  the pc7300  SV  pc7300
version 3 does NOT  have these features and  I can't speak for  anything
else.

K. Richard Magill
Have I violated copyright?  Have I said something stupid?

lcc.rich-wiz@locus.ucla.edu (Richard Mathews) (12/18/85)

> From: Chris Torek <chris%umcp-cs.uucp@BRL.ARPA>

> ... But that is
> not all that is amiss.  In V7, 3BSD, and 4BSD, and I suspect also
> in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
> signals are not queued...

> The problem is that several children may exit in quick succession.
> Only one SIGCLD signal will be delivered, since the parent process
> will (just this once) not manage to run before all have exited.
> The sigcld handler has no way of determining how many children are
> to be processed.

In System V, SIGCLDs are queued (well, sort of).  See the signal(2)
manual page.  In reality what Sys V does is this (at least on a VAX):

The SIGCLD action gets reset to SIG_DFL when the signal is caught.
The signal handler must reestablish itself as the handler for SIGCLD.
System V assumes that this is done just before the handler returns.
When you call signal(SIGCLD, func), the system checks for any zombies
and sends a SIGCLD to the parent if there are any zombie children.  Thus
it looks as if SIGCLDs are queued (unfortunately, the manual lies and
just says "the signal-catching function will be continually reentered
until the queue is empty").
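
The ordering this implies (wait first, re-establish the handler
last) might look like the sketch below.  The names are made up, and
SIGCHLD stands in for SIGCLD.  On systems whose signal() does not
reset the handler, the final signal() call is merely redundant, and
such systems will not re-send the signal for remaining zombies, so
the example deliberately uses a single child:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t last_status = -1;

/* System V style handler: collect the zombie first, re-arm last.
 * Re-arming before the wait() is what causes the endless re-entry
 * described earlier in this thread. */
static void sysv_style_handler(int sig)
{
    int status;

    (void)sig;
    if (wait(&status) > 0 && WIFEXITED(status))
        last_status = WEXITSTATUS(status);
    signal(SIGCHLD, sysv_style_handler);    /* re-establish handler */
}

/* Fork one child that exits with `code`; return the status the
 * handler recorded, or -1 on error. */
int catch_one_child(int code)
{
    void (*prev)(int);
    pid_t pid;
    int result;

    last_status = -1;
    prev = signal(SIGCHLD, sysv_style_handler);
    if (prev == SIG_ERR)
        return -1;
    pid = fork();
    if (pid < 0) {
        signal(SIGCHLD, prev);
        return -1;
    }
    if (pid == 0)
        _exit(code);

    while (last_status == -1)
        usleep(1000);           /* poll until the handler has run */
    result = last_status;
    signal(SIGCHLD, prev);      /* put the previous handler back */
    return result;
}
```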

Richard M. Mathews
Locus Computing Corporation		       lcc.richard@LOCUS.UCLA.EDU
					       lcc.richard@UCLA-CS
			  {ihnp4,ucivax,trwrb}!lcc!richard
       {randvax,sdcrdcf,ucbvax,trwspp}!ucla-cs!lcc!richard

daemon@houligan.UUCP (12/18/85)

In <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
> I have a C program which, during the course of its execution,
> spawns(forks) child processes. ...  Since UNIX only allows one a
> certain number of processes at a time, eventually during the course of
> execution of the parent I run out of processes.  If I temporarily stop
> the parent process execution and do a 'ps', the child processes show up
> in the list with a 'Z' status.  They do not completely disappear until
> the parent process exits.
>
> My question: Is there any way to kill off these zombies so I can get
> 	     more processes ?  Or, failing that, is there any other
> 	     way to do what I want ?
>
> Philip Schneider                               
> University of Washington Computer Science                
> pjs@{uw-june.arpa,washington.arpa}                      
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs           

The problem (and fortunately, the solution) is simple.  A process, once
terminated, becomes a "zombie" (Z status from "ps") until its parent
(as determined by its PPID) "wait"s for it.  Thus, it is the parent
process' responsibility to "clean up after" its children (kinda like
real life, eh?)

You can do one of two things, depending on your situation, to handle
this correctly.

1.  If the parent process does not have anything better to do while the
    children are out playing, it can just "wait" for them to finish.

2.  You can cause the parent to "double-fork".  This will make it a
    "grand-parent" for a time, just long enough for the "parent" to
    fork the child, and then terminate (exit).  Then, when the
    "grand-parent" waits for the "parent", it will be VERY quick, and
    should not impact the "grand-parent" (original spawning process)
    much, in terms of slowing down the execution.  Then, the "child"
    will become an "orphan", and when it terminates, the system "init"
    process (PID = 1) will clean up after it.

Implementation of the "double-fork" is simple (error detection omitted for
clarity).

/* grand-parent */	switch (fork()) {
/* parent */		case 0:
/* parent */			switch (fork()) {
/* child */			case 0:
/* child */				/* do the "child" part */
/* child */				exit(0); /* must not fall into the wait below */
/* parent */			default:
/* parent */				exit(0); /* orphan the child */
/* parent */			}
/* grand-parent */	default:
/* grand-parent */		wait((int *)0); /* wait for "parent" */
/* grand-parent */	}
/* grand-parent */	/* proceed with normal processing */

Obviously, the "wait" and "fork" calls need to be checked for errors,
and you may want to use "_exit" instead, in the "parent", so it doesn't
flush <stdio> buffers, etc.  These are left as exercises for the
reader.
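
One possible answer to that exercise, as a sketch only: the function
names spawn_detached and detached_noop are hypothetical, waitpid()
is used in place of wait() so that only the intermediate process is
collected, and the grandchild is assumed to be adopted by init (or
whatever process the system designates for orphans).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder job for the example; a real one would do the work. */
void detached_noop(void)
{
}

/* Run job() in a grandchild that the original process never has to
 * wait for.  Returns 0 once the intermediate process has been
 * collected successfully, -1 on error. */
int spawn_detached(void (*job)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* first fork failed */
    if (pid == 0) {                 /* intermediate "parent" */
        pid_t gpid = fork();

        if (gpid < 0)
            _exit(1);               /* report failure to grand-parent */
        if (gpid == 0) {            /* grandchild does the real work */
            job();
            _exit(0);
        }
        _exit(0);                   /* orphan the grandchild at once */
    }

    /* Original process: this wait is short because the intermediate
     * process exits immediately after its own fork. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

_exit() is used in both forked processes so that neither flushes a
copy of the original process's <stdio> buffers.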

--tgi
	while (--tgi)	/* my mind continues to decay */
		;	/* even though I do nothing.. */    

{brl-bmd,ccvaxa,pur-ee,sun}!csd-gould!midas!tgi (Craig Strickland @ Gould)
305/587-2900 x5014  CompuServe: 76545,1007   Source: BDQ615   MCIMail: 272-3350

(echo ".ft B"; echo ".ps 999"; echo "$disclaimer") | troff -t	# :-)

chris@umcp-cs.UUCP (Chris Torek) (12/19/85)

In article <824@brl-tgr.ARPA> lcc.rich-wiz@locus.ucla.edu (Richard Mathews)
writes:

>> From: Chris Torek <chris%umcp-cs.uucp@BRL.ARPA>

>> In V7, 3BSD, and 4BSD, and I suspect also in Sys III and V (and
>> Vr2 and Vr2V2), and probably in V8 as well, signals are not queued...

> In System V, SIGCLDs are queued (well, sort of).  See the signal(2)
> manual page.  In reality what Sys V does is this (at least on a VAX):

[description deleted]

In other words, System V arranges for the delivery of a SIGCLD, in
the process changing things back to SIG_DFL, so that exactly
one is sent, and one more will be sent when the signal handler
restores SIGCLD catching if and only if there is at least one more
child process.  To put it another way, the signals themselves are
not queued, but child process exit is not the only trigger for
SIGCLD; exited children are already queued, so the effect is the
same.

Implemented properly, that will guarantee reliable operation.  Ok.
One down, 31 to go :-).  ---Signals, of course.  What else?  (Well,
all right, I will give them credit for not breaking everything in
the name of advancement.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

kimball@bsdpkh.UUCP (Rick Kimball) (12/21/85)

> My question: Is there any way to kill off these zombies so I can get
> 	     more processes ?  Or, failing that, is there any other
> 	     way to do what I want ?
>
> Philip Schneider                               
> University of Washington Computer Science                
> pjs@{uw-june.arpa,washington.arpa}                      
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs           

If the return code from the child isn't important the following code 
will show you how to have a "fork-a-thon". 

--------- Cut Here ------

/* fork example ( parent doesn't care about child's return codes ) */

#include <stdio.h>
#include <string.h>               /* strlen() */
#include <signal.h>

#define CLS "\033[H\033[2J"       /* clear screen for vt100 */

main(argc,argv)
int   argc;
char   *argv[];
{
   int   Processes_forked = 0,    /* counter must start at zero */
         len,
         random_line,
         pid;
   char   message_buffer[80];

   signal(SIGCLD, SIG_IGN);

   write(1,CLS,strlen(CLS));

   while ( 1 ) {
      pid = fork();
      switch(pid) {
      case -1:         /* fork failed no process created   */
            len = sprintf(message_buffer,
                  "\033[24;0Htoo many");
            write(1,message_buffer,len);
            sleep(10);
            break;
      case 0:         /* child process                      */
            pid=getpid();
            random_line = pid % 22;
            len = sprintf(message_buffer, "\033[%d;0H#%5d", random_line, pid);
            write(1,message_buffer,len);
            sleep(2);
            len = sprintf(message_buffer, "\033[%d;0H      ", random_line);
            write(1,message_buffer,len);
            exit(0);
      default:         /* parent process                     */
            len = sprintf(message_buffer, "\033[24;0H%d forked",
               ++Processes_forked);
            write(1,message_buffer,len);
            break;
      }
   }
}
-------- Cut Here ----------
Rick Kimball 
UUCP: ihnp4!bsdpkh!kimball