[net.unix] Unix Question

pjs@uw-june (Philip J. Schneider) (12/06/85)

I have a C program which, during the course of its execution, spawns (forks)
child processes. I should mention that a process is spawned, it lives
for a while, then exits, and then sometime later the same thing happens,
and so on.  This all happens within the lifetime of the parent, and I
would like to do this an arbitrary number of times.  Sometimes, two or
more child processes exist at once, but the upper limit on child
processes that exist concurrently is low, and a group of such children
exit before the next process begins.

Since UNIX allows a user only a certain number of processes at a time,
eventually during the course of execution of the parent I run out of
processes.  If I temporarily stop the parent process execution and
do a 'ps', the child processes show up in the list with a 'Z' status.
They do not completely disappear until the parent process exits.  As some
of you probably already know, these useless ex-processes can't even
be completely gotten rid of with a 'kill' command.  The result is that
these processes are taking up my process quota, even though they are
dead in all practical terms (in that they finished their work and exited
properly).  Of course, they do go away completely once the parent exits.

I can certainly understand why one is allowed only a limited number of
active processes at any time.  My processes, however, are not at all
active once they have exited, and I feel that once a process exits,
I should have my quota "credited" so that I can get more.

Clearly, my problem is how to get around this situation.  I could
(possibly) get a higher limit on my process quota, but this would
only mean that running out of processes will happen a little later.

My question: Is there any way to kill off these zombies so I can get
	     more processes ?  Or, failing that, is there any other
	     way to do what I want ?


Please respond by e-mail if you can help at all, or if you need more
details.  Thanks in advance.
-- 

  Philip Schneider                               
  University of Washington Computer Science                
  pjs@{uw-june.arpa,washington.arpa}                      
  {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs

johnl@ima.UUCP (12/08/85)

/* Written  7:38 pm  Dec  5, 1985 by pjs@uw-june> in ima:net.unix */
> I have a C program which, during the course of its execution, spawns(forks)
> child processes. I should mention that a process is spawned, it lives
> for a while, then exits, and then sometime later the same thing happens,
> and so on.  This all happens within the lifetime of the parent, and I
> would like to do this an arbitrary number of times.  Sometimes, two or
> more child processes exist at once, but the upper limit on child
> processes that exist concurrently is low, and a group of such children
> exit before the next process begins.
> 
> Since UNIX only allows one a certain number of processes at a time,
> eventually during the course of execution of the parent I run out of
> processes.

The problem here is a minor misunderstanding of how fork() and wait()
interact.  Each time a process dies, it has some status to return to its
parent when the parent wait()s for it.  A zombie process is one that has
died but whose parent hasn't yet waited for it, so the way to get rid of
zombies is to make sure that somebody collects them with a wait().

If you know at some point that all of your subprocesses have died, you can
just wait for all of them by calling wait() until it returns -1 with error
code ECHILD, then go ahead and spawn any more children.

The other possibility is to take advantage of the fact that when a process'
parent dies, the orphan is handed to init, the top level process.  Use
code like this (real code should have error checking, but you get the idea):

	...
	pid = fork();
	if(pid != 0)
		while(wait(0) != pid)	/* parent here waits for child */
			;
	else {	/* child */
		if(fork() != 0)
			exit(0);	/* child exits right away */
		/* grandchild here is inherited by init, can go off and */
		/* do what it wants */
	}

Init spends most of its time in a wait() loop and can be counted on to
collect the orphaned grandchild when it exits.  The child is collected promptly
by the wait call in the parent, so there will be no zombie problem.

Pedantically,
John Levine, ima!johnl

PS:  This is all in the manual, but perhaps not so crystal clear.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/09/85)

> My question: Is there any way to kill off these zombies so I can get
> 	     more processes ?  Or, failing that, is there any other
> 	     way to do what I want ?

Sure, have the parent wait() on terminated children.
If for some reason you have to avoid blocking, you
could have the wait() done in a SIGCLD signal handler.
Or, you could keep track of the child PIDs and probe
their state every so often via kill() with "signal" 0,
waiting on those that return failure from the kill().

zemon@fritz.UUCP (Art Zemon) (12/10/85)

In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>
>
>I have a C program which, during the course of its execution, spawns(forks)
>child processes. I should mention that a process is spawned, it lives
>for a while, then exits, and then sometime later the same thing happens,
>and so on.  This all happens within the lifetime of the parent, and I
>would like to do this an arbitrary number of times.  Sometimes, two or
>more child processes exist at once, but the upper limit on child
>processes that exist concurrently is low, and a group of such children
>exit before the next process begins.
>
>Since UNIX only allows one a certain number of processes at a time,
>eventually during the course of execution of the parent I run out of
>processes.  If I temporarily stop the parent process execution and
>do a 'ps', the child processes show up in the list with a 'Z' status.
>They do not completely disappear until the parent process exits.  As some
>of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command.  The result is that
>these processes are taking up my process quota, even though they are
>dead in all practical terms (in that they finished their work and exited
>properly).  Of course, they do go away completely once the parent exits.

The child processes have gone away properly and are waiting for
someone (some process) to collect their exit statuses with a
wait() or a wait3().  They cannot be kill()-ed because they are
already dead.

They go away after the parent exits because init inherits them
and does enough wait()s to get rid of all of them.
-- 
	-- Art Zemon
	   FileNet Corp.
	   ...! {decvax, ihnp4, ucbvax} !trwrb!felix!zemon

lasse@daab.UUCP (Lars Hammarstrand) (12/10/85)

>I have a C program which, during the course of its execution, spawns(forks)
>child processes .....
>
>... do a 'ps', the child processes show up in the list with a 'Z' status.
>
>.. of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command.....



If you don't want your children to end up as (Z)ombie processes, the parent
process has to execute a wait(2) for each child that has exited or been killed.

See also: signal(2) and kill(2).


	Lars Hammarstrand.

ron@BRL.ARPA (Ron Natalie) (12/11/85)

Processes that die stay around until their status gets "inherited."
If the parent process (the one that did the fork) is still alive,
it must execute a wait system call to get the information.  If the
parent dies without waiting for the child, then the child gets
inherited by the "orphanage" process, init, which once the system
is running is always waiting for processes to die.

Killing these ZOMBIE (dead, but not inherited) processes is ineffective
since they are already dead.  They count against you because until they
are inherited, they consume one of a finite number of process slots on
the system, which is what the process limit is protecting.

You should fix the parent program such that it either waits for the
dead children, or use the following frequently used kludge:

	FORK
	if CHILD then
		FORK
		if CHILD then
			EXECUTE SUBPROCESS CODE
		else
			EXIT
		endif
	else
		WAIT
	endif

Here the process forks a second process which forks the spawned job.
The middle process dies, making the spawned job an orphan who will
be eaten by init.

=Ron

ahb@ccice5.UUCP (Al Brumm) (12/12/85)

In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>My question: Is there any way to kill off these zombies so I can get
>	     more processes ?  Or, failing that, is there any other
>	     way to do what I want ?

A clean way to handle this problem on Sys3 was to use the following
system call in the parent process:
	signal(SIGCLD, SIG_IGN);

Then when a child process exited, a zombie would not be created.
Note that this would not allow you to examine the child's exit
status.  However, you could examine the exit status by doing the 
following:
	int
	sigcld()
	{
		int pid, status;
		pid = wait(&status);
		.
		. (do stuff)
		.
	}
	main()
	{
		int	(*sigcld)();

		signal(SIGCLD, sigcld);
	}

The example immediately above is also possible in 4.2BSD,
only SIGCLD is called SIGCHLD.

Then again, there is always the double fork() trick which goes
something like this:
	if (fork()) {			/* parent */
		wait((int *)0);		/* no zombies please */
	}
	else {
		if (fork()) {		/* child */
			exit(0);	/* satisfy parent's wait */
		}
		else {			/* grandchild */
			do_stuff();		/* since my parent exit'ed */
			.			/* I am inherited by init */
			.
			.
		}
	}
The above trick is used quite heavily by the UNET servers.

rml@hpfcla.UUCP (12/12/85)

> > My question: Is there any way to kill off these zombies so I can get
> >          more processes ?  Or, failing that, is there any other
> >          way to do what I want ?
>
>               ...
>
> Or, you could keep track of the child PIDs and probe
> their state every so often via kill() with "signal" 0,
> waiting on those that return failure from the kill().

This will work on 4.x-based systems, but not on most others.  Kill does
not support "signal" 0 in many earlier systems.  In System III and V,
kill does support "signal" 0, but does not fail on attempts to send
signals to zombies.

> A clean way to handle this problem on Sys3 was to use the following
> system call in the parent process:
>       signal(SIGCLD, SIG_IGN);
>
> Then when a child process exited, a zombie would not be created.

This applies to System V as well.  It is not, however, part of the SVID.

> Is SIGCLD always reset to SIG_DFL on exec?  If not, since ignored
> signals normally remain ignored, it could break other programs
> which expect to collect children; and programs that ignore SIGCLD
> would have to carefully un-ignore it just after forks.

SIGCLD is not reset from SIG_IGN to SIG_DFL on exec.  Yes, this
means that programs which ignore it need to be careful before spawning
other programs.  The same is true, by the way, of programs which mask
out signals in BSD systems.

>                         In V7, 3BSD, and 4BSD, and I suspect also
> in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
> signals are not queued, and without the `jobs library' of 4.1BSD,
> or the signal facilities of 4.2, this code cannot be made to operate
> reliably.  It *will fail*, someday, no doubt at the worst possible
> moment.
>
> The problem is that several children may exit in quick succession.
> Only one SIGCLD signal will be delivered, since the parent process
> will (just this once) not manage to run before all have exited.
> The sigcld handler has no way of determining how many children are
> to be processed.

It turns out that SIGCLD can be used reliably in System III and V.
What is missing from the example is a call within the signal handler
to re-install itself.

>       int
>       sigcld()
>       {
>               int pid, status;
>               pid = wait(&status);
>               ...
>>>             signal(SIGCLD, sigcld);         /* add this line */
>       }

The signal(2) system call checks to see if any zombie child(ren) are
present and sends the calling process another SIGCLD if there are.
The signal handler is thus invoked recursively, once per zombie.
Note that the reinstallation of the handler must follow the call to
wait, or infinite recursion results.

Unfortunately in System III SIGCLD was not reset-when-caught, so this
call might have been left out, allowing children to be missed. This
was changed in System V; SIGCLD is reset to SIG_DFL when caught.  Note
that there is no loss of reliability from the reset to SIG_DFL; since
SIGCLD is ignored by default, this is equivalent to masking out the
signal until the handler is reinstalled.  Unfortunately both System III
and V fail to document these semantics of signal(2), and instead have
an incorrect explanation on the signal(2) page which states that SIGCLD
signals are queued internally.  We at HP implemented some systems (HP9000
series 500 releases <= 4.02) which queued the signals as AT&T documents;
current HP systems are all compatible with the System V code.

BTW, I find BSD's wait3 with WNOHANG to be a more intuitive mechanism.

                        Bob Lenk
                        {hplabs, ihnp4}!hpfcla!rml

chris@umcp-cs.UUCP (Chris Torek) (12/14/85)

In article <974@ccice5.UUCP> ahb@ccice5.UUCP (Al Brumm) writes:

> A clean way to [ignore children] on Sys3 was to use the following
> system call in the parent process:
>	signal(SIGCLD, SIG_IGN);

Cute... maybe I will add this hack to our kernel.  One question:
Is SIGCLD always reset to SIG_DFL on exec?  If not, since ignored
signals normally remain ignored, it could break other programs
which expect to collect children; and programs that ignore SIGCLD
would have to carefully un-ignore it just after forks.

> Note that this would not allow you to examine the child's exit
> status.  However, you could examine the exit status by doing the 
> following:
>	int
>	sigcld()
>	{
>		int pid, status;
>		pid = wait(&status);
>		...
>	}
>	main()
>	{
>		int	(*sigcld)();
>
>		signal(SIGCLD, sigcld);
>	}

Well, the `int (*sigcld)()' declaration is wrong and (in this case)
unnecessary; it should be `int sigcld()' if anything.  But that is
not all that is amiss.  In V7, 3BSD, and 4BSD, and I suspect also
in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
signals are not queued, and without the `jobs library' of 4.1BSD,
or the signal facilities of 4.2, this code cannot be made to operate
reliably.  It *will fail*, someday, no doubt at the worst possible
moment.

The problem is that several children may exit in quick succession.
Only one SIGCLD signal will be delivered, since the parent process
will (just this once) not manage to run before all have exited.
The sigcld handler has no way of determining how many children are
to be processed.

In 4.1BSD and later, the solution is a new `system call', wait3().
This call takes an options argument with two flags, WNOHANG and WUNTRACED.
WNOHANG tells the kernel not to wait for children that are still running.
Instead, wait3 returns 0 in this case, allowing the signal handler
to finish up, having now collected all exited children.  (WUNTRACED
exists only for C-shell style job control with stopped processes,
and is irrelevant here.)

Unfortunately, this solution is still incomplete.  There are race
conditions unless the child exit signal is withheld (but not ignored)
for the duration of the child collection routine, and can be withheld
during process creation (in case the created process exits before
the parent finishes updating data structures).  This is the case
under the 4.1BSD `jobs' library, and in all 4.2 and 4.3 systems.

Anyway, what it all boils down to is that process control is
unreliable in many versions of Unix, but can be made reliable in
4.1, 4.2, and 4.3BSD.  If there is any way to reliably handle
process exit and `job control' style processing in System III and
System V, I am not aware of it---though that should be unsurprising
since I have never used them.  If it is possible in the latest AT&T
Unixes, I would like to know how.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

dave@onfcanim.UUCP (Dave Martindale) (12/15/85)

In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Anyway, what it all boils down to is that process control is
>unreliable in many versions of Unix, but can be made reliable in
>4.1, 4.2, and 4.3BSD.  If there is any way to reliably handle
>process exit and `job control' style processing in System III and
>System V, I am not aware of it---though that should be unsurprising
>since I have never used them.  If it is possible in the latest AT&T
>Unixes, I would like to know how.

(My only experience with system V is on an IRIS workstation, which is system
V with some Berkeley stuff.  But the signal mechanism seems to be from
System V - there is none of the Berkeley sigmask stuff.)

On the IRIS, if you read the fine print, you will find that SIGCLD
doesn't behave like a "normal" signal.  It seems that SIGCLD is generated
by the presence of a zombie child, not the event of a child terminating.
This was brought home to me in a program that had a single child.  When
the child terminated, the SIGCLD handler (due to me not understanding
what was going on) re-enabled the signal before waiting for the child.
Immediately, another SIGCLD was delivered, and so on until the stack
overflowed.

So, if you had a second child exit while handling the first SIGCLD, no
problem - you'll get another SIGCLD as soon as you re-enable the signal.

It is also unlike a "normal" signal in that the "default" action is
for nothing to happen, while a "normal" signal causes some action beyond
the control of the process.

The essential difference, I think, is simply that V7 signals had no
"memory" - when one was delivered, either you caught it or you ignored
it and it went away, but you couldn't "hold" it.  4.2 signal handling
knows about signals that are held for later delivery.  SV doesn't
have this in general for signals, but in the case of SIGCLD the
existence of the zombie process provides the "memory".

	Dave Martindale

rich@rexago1.UUCP (K. Richard Magill) (12/16/85)

In article <14767@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>>If there is any way to reliably handle
>>process exit and `job control' style processing in System III and
>>System V, I am not aware of it---
>It seems that SIGCLD is generated
>by the presence of a zombie child, not the event of a child terminating.

In the spirit of information sharing...

With respect to a 3b2/300 running SV.2.2...

I guess I can't/shouldn't copy the manual but SIGCLD is generated by the
death of a child and reset when  caught.  ...  behaves as other  signals
with the  exception  that  successive SIGCLD's  are  queued  instead  of
successively interrupting the catching function.

A rudimentary job control has been accomplished on SV.2.2.  It  consists
of a parent process that calls  subprocesses which are $SHELL.  All  you
can do from the  parent, shl, is create,  delete, block, unblock,  list,
etc.  ie,  job control only.   not really  a shell.   Typing your  SWTCH
character from a child gets you back to shl, with the child  effectively
bg'd, from which  you may  create a  new subshell.   Of  course you  are
limited to eight subprocesses.  All of this is accomplished using pseudo
terminals, (sxt's), a new  control character defined in  termio.c_cc[7],
SWTCH, a new control  mode, in (termio.c_cflag  & 0x10000), LOBLK  which
blocks output of the current layer.

I should add that  shl cannot be  your login shell  and doesn't work  if
exec'd from your login shell.  I do use it.  It's not csh but it's  better
than sh alone.

Ksh is purported to do csh style job control on 3b2 but I have yet to see
it work.  The copies  I have seen tend  to lock up terminals  frequently
when you try.

AGAIN!  This is 3b2/300 SV.2.2.  I know that the pc7300 SV version 3
does NOT have these features, and I can't speak for anything else.

K. Richard Magill
Have I violated copyright?  Have I said something stupid?

daemon@houligan.UUCP (12/18/85)

In <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
> I have a C program which, during the course of its execution,
> spawns(forks) child processes. ...  Since UNIX only allows one a
> certain number of processes at a time, eventually during the course of
> execution of the parent I run out of processes.  If I temporarily stop
> the parent process execution and do a 'ps', the child processes show up
> in the list with a 'Z' status.  They do not completely disappear until
> the parent process exits.
>
> My question: Is there any way to kill off these zombies so I can get
> 	     more processes ?  Or, failing that, is there any other
> 	     way to do what I want ?
>
> Philip Schneider                               
> University of Washington Computer Science                
> pjs@{uw-june.arpa,washington.arpa}                      
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs           

The problem (and fortunately, the solution) is simple.  A process, once
terminated, becomes a "zombie" (Z status from "ps") until its parent
(as determined by its PPID) "wait"s for it.  Thus, it is the parent
process' responsibility to "clean up after" its children (kinda like
real life, eh?)

You can do one of two things, depending on your situation, to handle
this correctly.

1.  If the parent process does not have anything better to do while the
    children are out playing, it can just "wait" for them to finish.

2.  You can cause the parent to "double-fork".  This will make it a
    "grand-parent" for a time, just long enough for the "parent" to
    fork the child, and then terminate (exit).  Then, when the
    "grand-parent" waits for the "parent", it will be VERY quick, and
    should not impact the "grand-parent" (original spawning process)
    much, in terms of slowing down the execution.  Then, the "child"
    will become an "orphan", and when it terminates, the system "init"
    process (PID = 1) will clean up after it.

Implementation of the "double-fork" is simple (error detection omitted for
clarity).

/* grand-parent */	switch (fork()) {
/* parent */		case 0:
/* parent */			switch (fork()) {
/* child */			case 0:
/* child */				/* do the "child" part */
/* child */				exit(0); /* don't fall into grand-parent code */
/* parent */			default:
/* parent */				exit(0); /* orphan the child */
/* parent */			}
/* grand-parent */	default:
/* grand-parent */		wait(0); /* wait for "parent" */
/* grand-parent */	}
/* grand-parent */	/* proceed with normal processing */

Obviously, the "wait" and "fork" calls need to be checked for errors,
and you may want to use "_exit" instead, in the "parent", so it doesn't
flush <stdio> buffers, etc.  These are left as exercises for the
reader.

--tgi
	while (--tgi)	/* my mind continues to decay */
		;	/* even though I do nothing.. */    

{brl-bmd,ccvaxa,pur-ee,sun}!csd-gould!midas!tgi (Craig Strickland @ Gould)
305/587-2900 x5014  CompuServe: 76545,1007   Source: BDQ615   MCIMail: 272-3350

(echo ".ft B"; echo ".ps 999"; echo "$disclaimer") | troff -t	# :-)

kimball@bsdpkh.UUCP (Rick Kimball) (12/21/85)

> My question: Is there any way to kill off these zombies so I can get
> 	     more processes ?  Or, failing that, is there any other
> 	     way to do what I want ?
>
> Philip Schneider                               
> University of Washington Computer Science                
> pjs@{uw-june.arpa,washington.arpa}                      
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs           

If the return code from the child isn't important the following code 
will show you how to have a "fork-a-thon". 

--------- Cut Here ------

/* fork example ( parent doesn't care about child's return codes ) */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>

#ifndef SIGCLD
#define SIGCLD SIGCHLD   /* SIGCLD is the System V name; BSD/POSIX call it SIGCHLD */
#endif

#define CLS "\033[H\033[2J"       /* clear screen for vt100 */

int main(void)
{
   int   Processes_forked = 0,   /* count must start at zero */
         len,
         random_line,
         pid;
   char   message_buffer[80];

   signal(SIGCLD, SIG_IGN);      /* exited children are discarded, not left as zombies */

   write(1,CLS,strlen(CLS));

   while ( 1 ) {
      pid = fork();
      switch(pid) {
      case -1:         /* fork failed no process created   */
            len = sprintf(message_buffer,
                  "\033[24;0Htoo many");
            write(1,message_buffer,len);
            sleep(10);
            break;
      case 0:         /* child process                      */
            pid=getpid();
            random_line = pid % 22;
            len = sprintf(message_buffer, "\033[%d;0H#%5d", random_line, pid);
            write(1,message_buffer,len);
            sleep(2);
            len = sprintf(message_buffer, "\033[%d;0H      ", random_line);
            write(1,message_buffer,len);
            exit(0);
      default:         /* parent process                     */
            len = sprintf(message_buffer, "\033[24;0H%d forked",
               ++Processes_forked);
            write(1,message_buffer,len);
            break;
      }
   }
}
-------- Cut Here ----------
Rick Kimball 
UUCP: ihnp4!bsdpkh!kimball

arturo@humming.UUCP (Arturo Perez) (12/25/85)

In article <12600002@hpfcls.UUCP> rml@hpfcla.UUCP writes:
>> 
>> The problem is that several children may exit in quick succession.
>> Only one SIGCLD signal will be delivered, since the parent process
>> will (just this once) not manage to run before all have exited.
>> The sigcld handler has no way of determining how many children are
>> to be processed.
>
>It turns out that SIGCLD can be used reliably in System III and V.
>What is missing from the example is a call within the signal handler
>to re-install itself. 
>
>>	int
>>	sigcld()
>>	{
>>		int pid, status;
>>		pid = wait(&status);
>>		...
>>>>		signal(SIGCLD, sigcld);		/* add this line */
>>	}
>
>The signal(2) system call checks to see if any zombie child(ren) are
>present and sends the calling process another SIGCLD if there are.
>The signal handler is thus invoked recursively, once per zombie.
>Note that the reinstallation of the handler must follow the call to
>wait, or infinite recursion results.
>			Bob Lenk
>			{hplabs, ihnp4}!hpfcla!rml


This isn't correct. The problem is that the implicit 'signal(SIGCLD,
SIG_DFL)' is done AFTER the signal trapping function returns. Thus,
if you call signal from within the trapping function it doesn't do
you any good. At least, this is the way it works on our SYSV/BSD
hybrids.

cc743805@sjuvax.UUCP (conway) (11/08/86)

I'll make this short and sweet:

     How can one change the date/time stamp of a file?

     I want to be able to put any date/time on a file that I
     have in my directory.  Is this possible?  If this question
     has been discussed before, please forgive me. I don't
     usually read this group.

Chuck Conway
--
______________________________________________________________
| Chuck Conway, St. Joseph's University                      |
| {bpa|burdvax|princeton|allegra}!sjuvax!cc743805            |
| cc743805@sjuvax.UUCP                                       |
|------------------------------------------------------------|