pjs@uw-june (Philip J. Schneider) (12/06/85)
I have a C program which, during the course of its execution, spawns(forks)
child processes. I should mention that a process is spawned, it lives
for a while, then exits, and then sometime later the same thing happens,
and so on. This all happens within the lifetime of the parent, and I
would like to do this an arbitrary number of times. Sometimes, two or
more child processes exist at once, but the upper limit on child
processes that exist concurrently is low, and a group of such children
exit before the next process begins.

Since UNIX only allows one a certain number of processes at a time,
eventually during the course of execution of the parent I run out of
processes. If I temporarily stop the parent process execution and
do a 'ps', the child processes show up in the list with a 'Z' status.
They do not completely disappear until the parent process exits. As some
of you probably already know, these useless ex-processes can't even
be completely gotten rid of with a 'kill' command. The result is that
these processes are taking up my process quota, even though they are
dead in all practical terms (in that they finished their work and exited
properly). Of course, they do go away completely once the parent exits.

I can certainly understand why one is allowed only a limited number of
active processes at any time. My processes, however, are not at all
active once they have exited, and I feel that once a process exits, I
should have my quota "credited" so that I can get more.

Clearly, my problem is how to get around this situation. I could
(possibly) get a higher limit on my process quota, but this would only
mean that running out of processes will happen a little later.

My question: Is there any way to kill off these zombies so I can get
             more processes ? Or, failing that, is there any other
             way to do what I want ?

Please respond by e-mail if you can help at all, or if you need more
details. Thanks in advance.
-- Philip Schneider University of Washington Computer Science pjs@{uw-june.arpa,washington.arpa} {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs
zemon@fritz.UUCP (Art Zemon) (12/10/85)
In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>I have a C program which, during the course of its execution, spawns(forks)
>child processes. I should mention that a process is spawned, it lives
>for a while, then exits, and then sometime later the same thing happens,
>and so on. This all happens within the lifetime of the parent, and I
>would like to do this an arbitrary number of times. Sometimes, two or
>more child processes exist at once, but the upper limit on child
>processes that exist concurrently is low, and a group of such children
>exit before the next process begins.
>
>Since UNIX only allows one a certain number of processes at a time,
>eventually during the course of execution of the parent I run out of
>processes. If I temporarily stop the parent process execution and
>do a 'ps', the child processes show up in the list with a 'Z' status.
>They do not completely disappear until the parent process exits. As some
>of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command. The result is that
>these processes are taking up my process quota, even though they are
>dead in all practical terms (in that they finished their work and exited
>properly). Of course, they do go away completely once the parent exits.

The child processes have gone away properly and are waiting for
someone (some process) to collect their exit statuses with a
wait() or a wait3(). They cannot be kill()-ed because they are
already dead.

They go away after the parent exits because init inherits them
and does enough wait()s to get rid of all of them.
--
-- Art Zemon
   FileNet Corp.
   ...! {decvax, ihnp4, ucbvax} !trwrb!felix!zemon
lasse@daab.UUCP (Lars Hammarstrand) (12/10/85)
>I have a C program which, during the course of its execution, spawns(forks)
>child processes .....
>
>... do a 'ps', the child processes show up in the list with a 'Z' status.
>
>.. of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command.....

If you don't want your children to end up as (Z)ombie processes, the
parent process has to execute a wait(2) on each child that has been
killed or stopped.

See also: signal(2) and kill(2).

Lars Hammarstrand.
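
On a current POSIX system, the wait(2) advice in the replies above can be
sketched roughly as follows. This is a minimal illustration and not code
from the thread: spawn_and_reap() is a hypothetical helper, and waitpid()
is assumed in place of the plain wait() of 1985 so that the parent
collects one specific child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately with `code`, then reap it.
 * Returns the child's exit status, or -1 on any failure.
 * (spawn_and_reap is a hypothetical name used only for this sketch.)
 */
int spawn_and_reap(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed: no child was created */
    if (pid == 0)
        _exit(code);            /* child: terminate at once */

    /* Parent: the dead child stays a zombie until this call collects it. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_reap(7) should hand back 7 and leave no 'Z' entry
behind, because the process-table slot is released as soon as waitpid()
returns.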
ahb@ccice5.UUCP (Al Brumm) (12/12/85)
In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>My question: Is there any way to kill off these zombies so I can get
>             more processes ? Or, failing that, is there any other
>             way to do what I want ?

A clean way to handle this problem on Sys3 was to use the following
system call in the parent process:

    signal(SIGCLD, SIG_IGN);

Then when a child process exited, a zombie would not be created.
Note that this would not allow you to examine the child's exit
status. However, you could examine the exit status by doing the
following:

    int
    sigcld()
    {
        int pid, status;

        pid = wait(&status);
        .
        . (do stuff)
        .
    }

    main()
    {
        int (*sigcld)();

        signal(SIGCLD, sigcld);
    }

The example immediately above is also possible in 4.2BSD, only
SIGCLD is called SIGCHLD.

Then again, there is always the double fork() trick which goes
something like this:

    if (fork()) {
        /* parent */
        wait((int *)0);         /* no zombies please */
    } else {
        if (fork()) {
            /* child */
            exit(0);            /* satisfy parent's wait */
        } else {
            /* grandchild */
            do_stuff();         /* since my parent exit'ed */
            .                   /* I am inherited by init */
            .
            .
        }
    }

The above trick is used quite heavily by the UNET servers.
chris@umcp-cs.UUCP (Chris Torek) (12/14/85)
In article <974@ccice5.UUCP> ahb@ccice5.UUCP (Al Brumm) writes:
> A clean way to [ignore children] on Sys3 was to use the following
> system call in the parent process:
>     signal(SIGCLD, SIG_IGN);

Cute... maybe I will add this hack to our kernel. One question:
Is SIGCLD always reset to SIG_DFL on exec? If not, since ignored
signals normally remain ignored, it could break other programs which
expect to collect children; and programs that ignore SIGCLD would
have to carefully un-ignore it just after forks.

> Note that this would not allow you to examine the child's exit
> status. However, you could examine the exit status by doing the
> following:
>     int
>     sigcld()
>     {
>         int pid, status;
>         pid = wait(&status);
>         ...
>     }
>     main()
>     {
>         int (*sigcld)();
>
>         signal(SIGCLD, sigcld);
>     }

Well, the `int (*sigcld)()' declaration is wrong and (in this case)
unnecessary; it should be `int sigcld()' if anything. But that is
not all that is amiss. In V7, 3BSD, and 4BSD, and I suspect also
in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
signals are not queued, and without the `jobs library' of 4.1BSD,
or the signal facilities of 4.2, this code cannot be made to operate
reliably. It *will fail*, someday, no doubt at the worst possible
moment.

The problem is that several children may exit in quick succession.
Only one SIGCLD signal will be delivered, since the parent process
will (just this once) not manage to run before all have exited.
The sigcld handler has no way of determining how many children are
to be processed.

In 4.1BSD and later, the solution is a new `system call', wait3().
This call has two optional parameters, WNOHANG and WUNTRACED.
WNOHANG tells the kernel not to wait for existing children to exit.
Instead, wait3 returns 0 in this case, allowing the signal handler
to finish up, having now collected all exited children. (WUNTRACED
exists only for C-shell style job control with stopped processes,
and is irrelevant here.)
Unfortunately, this solution is still incomplete. There are race
conditions unless the child exit signal is withheld (but not ignored)
for the duration of the child collection routine, and can be withheld
during process creation (in case the created process exits before
the parent finishes updating data structures). This is the case
under the 4.1BSD `jobs' library, and in all 4.2 and 4.3 systems.

Anyway, what it all boils down to is that process control is
unreliable in many versions of Unix, but can be made reliable in
4.1, 4.2, and 4.3BSD. If there is any way to reliably handle
process exit and `job control' style processing in System III and
System V, I am not aware of it---though that should be unsurprising
since I have never used them. If it is possible in the latest AT&T
Unixes, I would like to know how.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs    ARPA: chris@mimsy.umd.edu
lcc.richard@locus.ucla.edu (Richard Mathews) (12/15/85)
There have been several responses to this which indicated that zombies can be cleaned up by having the parent call wait(2) (or in Berkeley compatible systems, wait3(2)). Under System V there is an alternate method that can be used. If the parent ignores SIGCLD, then the manual states that "the calling process's child processes will not create zombie processes when they terminate" (see signal(2)). In reality (at least on a VAX) they do create zombies, but the parent automatically cleans them up. The effect is the same. Richard M. Mathews Locus Computing Corporation lcc.richard@LOCUS.UCLA.EDU lcc.richard@UCLA-CS {ihnp4,ucivax,trwrb}!lcc!richard {randvax,sdcrdcf,ucbvax,trwspp}!ucla-cs!lcc!richard
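
The trade-off described above (no zombies, but no exit status either) can
be observed with a short sketch. It assumes a system that follows the
later POSIX/XSI rule for an ignored SIGCHLD, where wait() blocks until
every child is gone and then fails with ECHILD; the System V release
discussed in this thread may differ in detail. The function name
ignored_children_leave_no_status() is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * With SIGCHLD ignored, fork one child and then try to wait for it.
 * Returns 1 if wait() reports "no children" (ECHILD), 0 if a status
 * was still delivered, and -1 if the setup itself failed.
 */
int ignored_children_leave_no_status(void)
{
    void (*old)(int);
    pid_t pid;
    int result;

    old = signal(SIGCHLD, SIG_IGN);     /* SIGCLD on System V */
    if (old == SIG_ERR)
        return -1;

    pid = fork();
    if (pid < 0) {
        signal(SIGCHLD, old);
        return -1;
    }
    if (pid == 0)
        _exit(3);                       /* this status is discarded */

    /* Blocks until the child is gone, then fails: nothing to collect. */
    errno = 0;
    result = (wait(NULL) == -1 && errno == ECHILD);

    signal(SIGCHLD, old);               /* put the old disposition back */
    return result;
}
```

A return value of 1 means the child's exit code (3 in this sketch) was
never available to the parent, which is the price of not having to call
wait() for every child.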
dave@onfcanim.UUCP (Dave Martindale) (12/15/85)
In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Anyway, what it all boils down to is that process control is
>unreliable in many versions of Unix, but can be made reliable in
>4.1, 4.2, and 4.3BSD. If there is any way to reliably handle
>process exit and `job control' style processing in System III and
>System V, I am not aware of it---though that should be unsurprising
>since I have never used them. If it is possible in the latest AT&T
>Unixes, I would like to know how.

(My only experience with system V is on an IRIS workstation, which is
system V with some Berkeley stuff. But the signal mechanism seems to
be from System V - there is none of the Berkeley sigmask stuff.)

On the IRIS, if you read the fine print, you will find that SIGCLD
doesn't behave like a "normal" signal. It seems that SIGCLD is generated
by the presence of a zombie child, not the event of a child terminating.

This was brought home to me in a program that had a single child. When
the child terminated, the SIGCLD handler (due to me not understanding
what was going on) re-enabled the signal before waiting for the child.
Immediately, another SIGCLD was delivered, and so on until the stack
overflowed.

So, if you had a second child exit while handling the first SIGCLD, no
problem - you'll get another SIGCLD as soon as you re-enable the signal.

It is also unlike a "normal" signal in that the "default" action is for
nothing to happen, while a "normal" signal causes some action beyond
the control of the process.

The essential difference, I think, is simply that V7 signals had no
"memory" - when one was delivered, either you caught it or you ignored
it and it went away, but you couldn't "hold" it. 4.2 signal handling
knows about signals that are held for later delivery. SV doesn't have
this in general for signals, but in the case of SIGCLD the existence of
the zombie process provides the "memory".

Dave Martindale
rich@rexago1.UUCP (K. Richard Magill) (12/16/85)
In article <14767@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>>If there is any way to reliably handle
>>process exit and `job control' style processing in System III and
>>System V, I am not aware of it---
>It seems that SIGCLD is generated
>by the presence of a zombie child, not the event of a child terminating.

In the spirit of information sharing...

With respect to a 3b2/300 running SV.2.2...

I guess I can't/shouldn't copy the manual but SIGCLD is generated by the
death of a child and reset when caught. ... behaves as other signals
with the exception that successive SIGCLD's are queued instead of
successively interrupting the catching function.

A rudimentary job control has been accomplished on SV.2.2. It consists
of a parent process that calls subprocesses which are $SHELL. All you
can do from the parent, shl, is create, delete, block, unblock, list,
etc., i.e., job control only. Not really a shell. Typing your SWTCH
character from a child gets you back to shl, with the child effectively
bg'd, from which you may create a new subshell. Of course you are
limited to eight subprocesses.

All of this is accomplished using pseudo terminals (sxt's); a new
control character, SWTCH, defined in termio.c_cc[7]; and a new control
mode, LOBLK, in (termio.c_cflag & 0x10000), which blocks output of the
current layer.

I should add that shl cannot be your login shell and doesn't work if
exec'd from your login shell.

I do use it. It's not csh but it's better than sh alone. Ksh is
purported to do csh style job control on 3b2 but I have yet to see it
work. The copies I have seen tend to lock up terminals frequently when
you try.

AGAIN! This is 3b2/300 SV.2.2. I know that the pc7300 SV pc7300
version 3 does NOT have these features and I can't speak for anything
else.

K. Richard Magill

Have I violated copyright? Have I said something stupid?
lcc.rich-wiz@locus.ucla.edu (Richard Mathews) (12/18/85)
> From: Chris Torek <chris%umcp-cs.uucp@BRL.ARPA>
> ... But that is
> not all that is amiss. In V7, 3BSD, and 4BSD, and I suspect also
> in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
> signals are not queued...
> The problem is that several children may exit in quick succession.
> Only one SIGCLD signal will be delivered, since the parent process
> will (just this once) not manage to run before all have exited.
> The sigcld handler has no way of determining how many children are
> to be processed.

In System V, SIGCLDs are queued (well, sort of). See the signal(2)
manual page. In reality what Sys V does is this (at least on a VAX):

The SIGCLD action gets reset to SIG_DFL when the signal is caught. The
signal handler must reestablish itself as the handler for SIGCLD.
System V assumes that this is done just before the handler returns.
When you call signal(SIGCLD, func), the system checks for any zombies
and sends a SIGCLD to the parent if there are any zombie children.
Thus it looks as if SIGCLDs are queued (unfortunately, the manual lies
and just says "the signal-catching function will be continually
reentered until the queue is empty").

Richard M. Mathews
Locus Computing Corporation
lcc.richard@LOCUS.UCLA.EDU
lcc.richard@UCLA-CS
{ihnp4,ucivax,trwrb}!lcc!richard
{randvax,sdcrdcf,ucbvax,trwspp}!ucla-cs!lcc!richard
daemon@houligan.UUCP (12/18/85)
In <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
> I have a C program which, during the course of its execution,
> spawns(forks) child processes. ... Since UNIX only allows one a
> certain number of processes at a time, eventually during the course of
> execution of the parent I run out of processes. If I temporarily stop
> the parent process execution and do a 'ps', the child processes show up
> in the list with a 'Z' status. They do not completely disappear until
> the parent process exits.
>
> My question: Is there any way to kill off these zombies so I can get
>              more processes ? Or, failing that, is there any other
>              way to do what I want ?
>
> Philip Schneider
> University of Washington Computer Science
> pjs@{uw-june.arpa,washington.arpa}
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs

The problem (and fortunately, the solution) is simple. A process, once
terminated, becomes a "zombie" (Z status from "ps") until its parent (as
determined by its PPID) "wait"s for it. Thus, it is the parent process'
responsibility to "clean up after" its children (kinda like real life,
eh?)

You can do one of two things, depending on your situation, to handle
this correctly.

1. If the parent process does not have anything better to do while
   the children are out playing, it can just "wait" for them to
   finish.

2. You can cause the parent to "double-fork". This will make it a
   "grand-parent" for a time, just long enough for the "parent" to
   fork the child, and then terminate (exit). Then, when the
   "grand-parent" waits for the "parent", it will be VERY quick, and
   should not impact the "grand-parent" (original spawning process)
   much, in terms of slowing down the execution. Then, the "child"
   will become an "orphan", and when it terminates, the system "init"
   process (PID = 1) will clean up after it.

Implementation of the "double-fork" is simple (error detection omitted
for clarity).
/* grand-parent */  switch (fork()) {
/* parent */        case 0:
/* parent */            switch (fork()) {
/* child */             case 0:
/* child */                 /* do the "child" part */
/* child */                 break;
/* parent */            default:
/* parent */                exit(0);    /* orphan the child */
/* parent */            }
/* grand-parent */  default:
/* grand-parent */      wait(0);        /* wait for "parent" */
/* grand-parent */  }
/* grand-parent */  /* proceed with normal processing */

Obviously, the "wait" and "fork" calls need to be checked for errors,
and you may want to use "_exit" instead, in the "parent", so it doesn't
flush <stdio> buffers, etc. These are left as exercises for the reader.

--tgi
        while (--tgi)   /* my mind continues to decay */
            ;           /* even though I do nothing.. */

{brl-bmd,ccvaxa,pur-ee,sun}!csd-gould!midas!tgi (Craig Strickland @ Gould)
305/587-2900 x5014  CompuServe: 76545,1007  Source: BDQ615  MCIMail: 272-3350

(echo ".ft B"; echo ".ps 999"; echo "$disclaimer") | troff -t   # :-)
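
The exercises left to the reader (error checks, and _exit() in the
intermediate process) could be filled in along these lines. This is a
hedged sketch rather than the poster's own code: spawn_detached() is a
hypothetical helper, do_stuff() is a placeholder borrowed from the
earlier double-fork example in this thread, and waitpid() is assumed in
place of wait().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the real work of the detached process (hypothetical). */
static void do_stuff(void)
{
    sleep(1);
}

/*
 * Run job() in a grandchild that the caller never has to wait for.
 * Returns 0 once the intermediate "parent" has been reaped, -1 on error.
 */
int spawn_detached(void (*job)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* first fork failed */

    if (pid == 0) {
        /* Intermediate "parent": fork again and leave immediately. */
        pid_t gpid = fork();

        if (gpid < 0)
            _exit(1);               /* report the failure upward */
        if (gpid > 0)
            _exit(0);               /* orphan the grandchild; no stdio flush */

        /* Grandchild: inherited by init (or another system reaper). */
        job();
        _exit(0);
    }

    /* "Grand-parent": this wait is quick, the intermediate exits at once. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After spawn_detached(do_stuff) returns 0, the caller has no children
left to collect, so nothing it started can linger as a zombie under it.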
chris@umcp-cs.UUCP (Chris Torek) (12/19/85)
In article <824@brl-tgr.ARPA> lcc.rich-wiz@locus.ucla.edu (Richard Mathews) writes:
>> From: Chris Torek <chris%umcp-cs.uucp@BRL.ARPA>
>> In V7, 3BSD, and 4BSD, and I suspect also in Sys III and V (and
>> Vr2 and Vr2V2), and probably in V8 as well, signals are not queued...
> In System V, SIGCLDs are queued (well, sort of). See the signal(2)
> manual page. In reality what Sys V does is this (at least on a VAX):
[description deleted]

In other words, System V arranges for the delivery of a SIGCLD, in
the process changing things back to SIG_DFL, so that exactly
one is sent, and one more will be sent when the signal handler
restores SIGCLD catching if and only if there is at least one more
child process. To put it another way, the signals themselves are
not queued, but child process exit is not the only trigger for
SIGCLD; exited children are already queued, so the effect is the
same. Implemented properly, that will guarantee reliable operation.

Ok. One down, 31 to go :-).

---Signals, of course. What else?

(Well, all right, I will give them credit for not breaking everything
in the name of advancement.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs    ARPA: chris@mimsy.umd.edu
kimball@bsdpkh.UUCP (Rick Kimball) (12/21/85)
> My question: Is there any way to kill off these zombies so I can get
>              more processes ? Or, failing that, is there any other
>              way to do what I want ?
>
> Philip Schneider
> University of Washington Computer Science
> pjs@{uw-june.arpa,washington.arpa}
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs

If the return code from the child isn't important the following code
will show you how to have a "fork-a-thon".

--------- Cut Here ------
/* fork example ( parent doesn't care about child's return codes ) */

#include <stdio.h>
#include <string.h>
#include <signal.h>

#define CLS "\033[H\033[2J"     /* clear screen for vt100 */

main(argc,argv)
int argc;
char *argv[];
{
    int Processes_forked = 0, len, random_line, pid;
    char message_buffer[80];

    signal(SIGCLD, SIG_IGN);    /* exited children leave no zombies */

    write(1,CLS,strlen(CLS));

    while ( 1 ) {
        pid = fork();

        switch(pid) {
        case -1: /* fork failed no process created */
            len = sprintf(message_buffer, "\033[24;0Htoo many");
            write(1,message_buffer,len);
            sleep(10);
            break;

        case 0: /* child process */
            pid=getpid();
            random_line = pid % 22;
            len = sprintf(message_buffer, "\033[%d;0H#%5d",
                          random_line, pid);
            write(1,message_buffer,len);
            sleep(2);
            /* overwrite the six-character "#%5d" entry with blanks */
            len = sprintf(message_buffer, "\033[%d;0H      ",
                          random_line);
            write(1,message_buffer,len);
            exit(0);

        default: /* parent process */
            len = sprintf(message_buffer, "\033[24;0H%d forked",
                          ++Processes_forked);
            write(1,message_buffer,len);
            break;
        }
    }
}
-------- Cut Here ----------

Rick Kimball
UUCP: ihnp4!bsdpkh!kimball