pjs@uw-june (Philip J. Schneider) (12/06/85)
I have a C program which, during the course of its execution, spawns(forks) child processes. I should mention that a process is spawned, it lives for a while, then exits, and then sometime later the same thing happens, and so on. This all happens within the lifetime of the parent, and I would like to do this an arbitrary number of times. Sometimes, two or more child processes exist at once, but the upper limit on child processes that exist concurrently is low, and a group of such children exit before the next process begins.

Since UNIX only allows one a certain number of processes at a time, eventually during the course of execution of the parent I run out of processes. If I temporarily stop the parent process execution and do a 'ps', the child processes show up in the list with a 'Z' status. They do not completely disappear until the parent process exits. As some of you probably already know, these useless ex-processes can't even be completely gotten rid of with a 'kill' command. The result is that these processes are taking up my process quota, even though they are dead in all practical terms (in that they finished their work and exited properly). Of course, they do go away completely once the parent exits.

I can certainly understand why one is allowed only a limited number of active processes at any time. My processes, however, are not at all active once they have exited, and I feel that once a process exits, I should have my quota "credited" so that I can get more.

Clearly, my problem is how to get around this situation. I could (possibly) get a higher limit on my process quota, but this would only mean that running out of processes will happen a little later.

My question: Is there any way to kill off these zombies so I can get more processes ? Or, failing that, is there any other way to do what I want ?

Please respond by e-mail if you can help at all, or if you need more details. Thanks in advance.
-- Philip Schneider University of Washington Computer Science pjs@{uw-june.arpa,washington.arpa} {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs
johnl@ima.UUCP (12/08/85)
/* Written 7:38 pm Dec 5, 1985 by pjs@uw-june> in ima:net.unix */
> I have a C program which, during the course of its execution, spawns(forks)
> child processes. I should mention that a process is spawned, it lives
> for a while, then exits, and then sometime later the same thing happens,
> and so on. This all happens within the lifetime of the parent, and I
> would like to do this an arbitrary number of times. Sometimes, two or
> more child processes exist at once, but the upper limit on child
> processes that exist concurrently is low, and a group of such children
> exit before the next process begins.
>
> Since UNIX only allows one a certain number of processes at a time,
> eventually during the course of execution of the parent I run out of
> processes.

The problem here is a minor misunderstanding of how fork() and wait() interact. Each time a process dies, it has some status to return to its parent when the parent wait()s for it. A zombie process is one that has died but whose parent hasn't yet waited for it, so the way to get rid of zombies is to make sure that somebody collects them with a wait().

If you know at some point that all of your subprocesses have died, you can just wait for all of them by calling wait() until it returns -1 with error code ECHILD, then go ahead and spawn any more children.

The other possibility is to take advantage of the fact that when a process' parent dies, the orphan is handed to init, the top level process. Use code like this (real code should have error checking, but you get the idea):

    ...
    pid = fork();
    if(pid != 0)
        while(wait(0) != pid)   /* parent here waits for child */
            ;
    else {                      /* child */
        if(fork() != 0)
            exit(0);            /* child exits right away */
        /* grandchild here is inherited by init, can go off and */
        /* do what it wants */
    }

Init spends most of its time in a wait() loop and can be counted on to collect the orphaned grandchild when it exits.
The child is collected promptly by the wait call in the parent, so there will be no zombie problem. Pedantically, John Levine, ima!johnl PS: This is all in the manual, but perhaps not so crystal clear.
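As a minimal sketch of the first approach described above (calling wait() until it fails with ECHILD), the loop might look like the following. The helper name reap_all_children is hypothetical, and the retry on EINTR is an assumption about a POSIX-style environment rather than anything stated in the post.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not from the original post): block until every
 * child of the calling process has been collected.
 * Returns the number of children reaped, or -1 on an unexpected error.
 */
int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        pid_t pid = wait(NULL);     /* exit status is discarded in this sketch */

        if (pid > 0) {
            reaped++;               /* one child collected */
            continue;
        }
        if (errno == ECHILD)
            return reaped;          /* no children left: safe to fork again */
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        return -1;                  /* anything else is unexpected */
    }
}
```

A parent would call reap_all_children() at a point where it knows its current batch of children has finished (or is willing to block until it has), and then carry on forking the next batch.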
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/09/85)
> My question: Is there any way to kill off these zombies so I can get
>              more processes ? Or, failing that, is there any other
>              way to do what I want ?

Sure, have the parent wait() on terminated children. If for some reason you have to avoid blocking, you could have the wait() done in a SIGCLD signal handler.

Or, you could keep track of the child PIDs and probe their state every so often via kill() with "signal" 0, waiting on those that return failure from the kill().
zemon@fritz.UUCP (Art Zemon) (12/10/85)
In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>
>I have a C program which, during the course of its execution, spawns(forks)
>child processes. I should mention that a process is spawned, it lives
>for a while, then exits, and then sometime later the same thing happens,
>and so on. This all happens within the lifetime of the parent, and I
>would like to do this an arbitrary number of times. Sometimes, two or
>more child processes exist at once, but the upper limit on child
>processes that exist concurrently is low, and a group of such children
>exit before the next process begins.
>
>Since UNIX only allows one a certain number of processes at a time,
>eventually during the course of execution of the parent I run out of
>processes. If I temporarily stop the parent process execution and
>do a 'ps', the child processes show up in the list with a 'Z' status.
>They do not completely disappear until the parent process exits. As some
>of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command. The result is that
>these processes are taking up my process quota, even though they are
>dead in all practical terms (in that they finished their work and exited
>properly). Of course, they do go away completely once the parent exits.

The child processes have gone away properly and are waiting for someone (some process) to collect their exit statuses with a wait() or a wait3(). They cannot be kill()-ed because they are already dead.

They go away after the parent exits because init inherits them and does enough wait()s to get rid of all of them.
--
    -- Art Zemon
       FileNet Corp.
       ...! {decvax, ihnp4, ucbvax} !trwrb!felix!zemon
lasse@daab.UUCP (Lars Hammarstrand) (12/10/85)
>I have a C program which, during the course of its execution, spawns(forks)
>child processes .....
>
>... do a 'ps', the child processes show up in the list with a 'Z' status.
>
>.. of you probably already know, these useless ex-processes can't even
>be completely gotten rid of with a 'kill' command.....

If you don't want your children to end up as (Z)ombie processes, the parent process has to execute a wait(2) on each child that has been killed or stopped.

See also: signal(2) and kill(2).

Lars Hammarstrand.
ron@BRL.ARPA (Ron Natalie) (12/11/85)
Processes that die stay around until their status gets "inherited." If the parent process (the one that did the fork) is still alive, it must execute a wait system call to get the information. If the parent dies without waiting for the child, then the child gets inherited by the "orphanage" process, init, which once the system is running is always waiting for processes to die.

Killing these ZOMBIE (dead, but not inherited) processes is ineffective since they are already dead. They count against you because until they are inherited, they consume one of a finite number of process slots on the system, which is what the process limit is protecting.

You should fix the parent program such that it either waits for the dead children, or use the following frequently used kludge:

    FORK
    if CHILD then
        FORK
        if CHILD then
            EXECUTE SUBPROCESS CODE
        else
            EXIT
        endif
    else
        WAIT
    endif

Here the process forks a second process which forks the spawned job. The middle process dies, making the spawned job an orphan who will be eaten by init.

=Ron
ahb@ccice5.UUCP (Al Brumm) (12/12/85)
In article <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
>My question: Is there any way to kill off these zombies so I can get
>             more processes ? Or, failing that, is there any other
>             way to do what I want ?

A clean way to handle this problem on Sys3 was to use the following system call in the parent process:

    signal(SIGCLD, SIG_IGN);

Then when a child process exited, a zombie would not be created. Note that this would not allow you to examine the child's exit status. However, you could examine the exit status by doing the following:

    int
    sigcld()
    {
        int pid, status;

        pid = wait(&status);
        .
        .   (do stuff)
        .
    }

    main()
    {
        int (*sigcld)();

        signal(SIGCLD, sigcld);
    }

The example immediately above is also possible in 4.2BSD, only SIGCLD is called SIGCHLD.

Then again, there is always the double fork() trick which goes something like this:

    if (fork()) {                   /* parent */
        wait((int *)0);             /* no zombies please */
    } else {
        if (fork()) {               /* child */
            exit(0);                /* satisfy parent's wait */
        } else {                    /* grandchild */
            do_stuff();             /* since my parent exit'ed */
            .                       /* I am inherited by init */
            .
            .
        }
    }

The above trick is used quite heavily by the UNET servers.
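A minimal sketch of the signal(SIGCLD, SIG_IGN) approach mentioned in this post, assuming a POSIX.1-2001-style system where the signal is spelled SIGCHLD and where ignoring it causes exited children to be discarded rather than kept as zombies. The helper name spawn_unwatched is hypothetical, and older systems may behave differently, as the rest of the thread discusses.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork n short-lived children whose exit status
 * nobody will read. Returns 0 if all n were forked and wait() then
 * reported ECHILD (nothing was left to collect), -1 otherwise.
 */
int spawn_unwatched(int n)
{
    void (*old)(int);
    int i, result = -1;

    /* Assumption: SIGCHLD is the BSD/POSIX spelling of SIGCLD. */
    old = signal(SIGCHLD, SIG_IGN);
    if (old == SIG_ERR)
        return -1;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;                  /* out of processes, or another error */
        if (pid == 0)
            _exit(0);               /* child: stand-in for real work */
    }

    /*
     * With SIGCHLD ignored, wait() blocks until every child has gone
     * and then fails with ECHILD, because no zombie was ever kept.
     */
    if (wait(NULL) == -1 && errno == ECHILD && i == n)
        result = 0;

    signal(SIGCHLD, old);           /* restore the previous disposition */
    return result;
}
```

The price is the one stated above: the parent never sees any child's exit status.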
rml@hpfcla.UUCP (12/12/85)
> > My question: Is there any way to kill off these zombies so I can get
> >              more processes ? Or, failing that, is there any other
> >              way to do what I want ?
>
> ...
>
> Or, you could keep track of the child PIDs and probe
> their state every so often via kill() with "signal" 0,
> waiting on those that return failure from the kill().

This will work on 4.x-based systems, but not on most others. Kill does not support "signal" 0 in many earlier systems. In System III and V, kill does support "signal" 0, but does not fail on attempts to send signals to zombies.

> A clean way to handle this problem on Sys3 was to use the following
> system call in the parent process:
>     signal(SIGCLD, SIG_IGN);
>
> Then when a child process exited, a zombie would not be created.

This applies to System V as well. It is not, however, part of the SVID.

> Is SIGCLD always reset to SIG_DFL on exec? If not, since ignored
> signals normally remain ignored, it could break other programs
> which expect to collect children; and programs that ignore SIGCLD
> would have to carefully un-ignore it just after forks.

SIGCLD is not reset from SIG_IGN to SIG_DFL on exec. Yes, this means that programs which ignore it need to be careful before spawning other programs. The same is true, by the way, of programs which mask out signals in BSD systems.

> In V7, 3BSD, and 4BSD, and I suspect also
> in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well,
> signals are not queued, and without the `jobs library' of 4.1BSD,
> or the signal facilities of 4.2, this code cannot be made to operate
> reliably. It *will fail*, someday, no doubt at the worst possible
> moment.
>
> The problem is that several children may exit in quick succession.
> Only one SIGCLD signal will be delivered, since the parent process
> will (just this once) not manage to run before all have exited.
> The sigcld handler has no way of determining how many children are
> to be processed.
It turns out that SIGCLD can be used reliably in System III and V. What is missing from the example is a call within the signal handler to re-install itself.

>   int
>   sigcld()
>   {
>       int pid, status;
>       pid = wait(&status);
>       ...
>>>     signal(SIGCLD, sigcld);     /* add this line */
>   }

The signal(2) system call checks to see if any zombie child(ren) are present and sends the calling process another SIGCLD if there are. The signal handler is thus invoked recursively, once per zombie. Note that the reinstallation of the handler must follow the call to wait, or infinite recursion results.

Unfortunately in System III SIGCLD was not reset-when-caught, so this call might have been left out, allowing children to be missed. This was changed in System V; SIGCLD is reset to SIG_DFL when caught. Note that there is no loss of reliability from the reset to SIG_DFL; since SIGCLD is ignored by default, this is equivalent to masking out the signal until the handler is reinstalled.

Unfortunately both System III and V fail to document these semantics of signal(2), and instead have an incorrect explanation on the signal(2) page which states that SIGCLD signals are queued internally. We at HP implemented some systems (HP9000 series 500 releases <= 4.02) which queued the signals as AT&T documents; current HP systems are all compatible with the System V code.

BTW, I find BSD's wait3 with WNOHANG to be a more intuitive mechanism.

        Bob Lenk
        {hplabs, ihnp4}!hpfcla!rml
chris@umcp-cs.UUCP (Chris Torek) (12/14/85)
In article <974@ccice5.UUCP> ahb@ccice5.UUCP (Al Brumm) writes:
> A clean way to [ignore children] on Sys3 was to use the following
> system call in the parent process:
>     signal(SIGCLD, SIG_IGN);

Cute... maybe I will add this hack to our kernel. One question: Is SIGCLD always reset to SIG_DFL on exec? If not, since ignored signals normally remain ignored, it could break other programs which expect to collect children; and programs that ignore SIGCLD would have to carefully un-ignore it just after forks.

> Note that this would not allow you to examine the child's exit
> status. However, you could examine the exit status by doing the
> following:
>     int
>     sigcld()
>     {
>         int pid, status;
>         pid = wait(&status);
>         ...
>     }
>     main()
>     {
>         int (*sigcld)();
>
>         signal(SIGCLD, sigcld);
>     }

Well, the `int (*sigcld)()' declaration is wrong and (in this case) unnecessary; it should be `int sigcld()' if anything. But that is not all that is amiss. In V7, 3BSD, and 4BSD, and I suspect also in Sys III and V (and Vr2 and Vr2V2), and probably in V8 as well, signals are not queued, and without the `jobs library' of 4.1BSD, or the signal facilities of 4.2, this code cannot be made to operate reliably. It *will fail*, someday, no doubt at the worst possible moment.

The problem is that several children may exit in quick succession. Only one SIGCLD signal will be delivered, since the parent process will (just this once) not manage to run before all have exited. The sigcld handler has no way of determining how many children are to be processed.

In 4.1BSD and later, the solution is a new `system call', wait3(). This call has two optional parameters, WNOHANG and WUNTRACED. WNOHANG tells the kernel not to wait for existing children to exit. Instead, wait3 returns 0 in this case, allowing the signal handler to finish up, having now collected all exited children. (WUNTRACED exists only for C-shell style job control with stopped processes, and is irrelevant here.)
Unfortunately, this solution is still incomplete. There are race conditions unless the child exit signal is withheld (but not ignored) for the duration of the child collection routine, and can be withheld during process creation (in case the created process exits before the parent finishes updating data structures). This is the case under the 4.1BSD `jobs' library, and in all 4.2 and 4.3 systems. Anyway, what it all boils down to is that process control is unreliable in many versions of Unix, but can be made reliable in 4.1, 4.2, and 4.3BSD. If there is any way to reliably handle process exit and `job control' style processing in System III and System V, I am not aware of it---though that should be unsurprising since I have never used them. If it is possible in the latest AT&T Unixes, I would like to know how. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
dave@onfcanim.UUCP (Dave Martindale) (12/15/85)
In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Anyway, what it all boils down to is that process control is
>unreliable in many versions of Unix, but can be made reliable in
>4.1, 4.2, and 4.3BSD. If there is any way to reliably handle
>process exit and `job control' style processing in System III and
>System V, I am not aware of it---though that should be unsurprising
>since I have never used them. If it is possible in the latest AT&T
>Unixes, I would like to know how.

(My only experience with system V is on an IRIS workstation, which is system V with some Berkeley stuff. But the signal mechanism seems to be from System V - there is none of the Berkeley sigmask stuff.)

On the IRIS, if you read the fine print, you will find that SIGCLD doesn't behave like a "normal" signal. It seems that SIGCLD is generated by the presence of a zombie child, not the event of a child terminating. This was brought home to me in a program that had a single child. When the child terminated, the SIGCLD handler (due to me not understanding what was going on) re-enabled the signal before waiting for the child. Immediately, another SIGCLD was delivered, and so on until the stack overflowed.

So, if you had a second child exit while handling the first SIGCLD, no problem - you'll get another SIGCLD as soon as you re-enable the signal. It is also unlike a "normal" signal in that the "default" action is for nothing to happen, while a "normal" signal causes some action beyond the control of the process.

The essential difference, I think, is simply that V7 signals had no "memory" - when one was delivered, either you caught it or you ignored it and it went away, but you couldn't "hold" it. 4.2 signal handling knows about signals that are held for later delivery. SV doesn't have this in general for signals, but in the case of SIGCLD the existence of the zombie process provides the "memory".

    Dave Martindale
rich@rexago1.UUCP (K. Richard Magill) (12/16/85)
In article <14767@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>In article <2548@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>>If there is any way to reliably handle
>>process exit and `job control' style processing in System III and
>>System V, I am not aware of it---
>It seems that SIGCLD is generated
>by the presence of a zombie child, not the event of a child terminating.

In the spirit of information sharing...

With respect to a 3b2/300 running SV.2.2...

I guess I can't/shouldn't copy the manual but SIGCLD is generated by the death of a child and reset when caught. ... behaves as other signals with the exception that successive SIGCLD's are queued instead of successively interrupting the catching function.

A rudimentary job control has been accomplished on SV.2.2. It consists of a parent process that calls subprocesses which are $SHELL. All you can do from the parent, shl, is create, delete, block, unblock, list, etc. ie, job control only. not really a shell. Typing your SWTCH character from a child gets you back to shl, with the child effectively bg'd, from which you may create a new subshell. Of course you are limited to eight subprocesses.

All of this is accomplished using pseudo terminals, (sxt's), a new control character defined in termio.c_cc[7], SWTCH, a new control mode, in (termio.c_cflag & 0x10000), LOBLK which blocks output of the current layer.

I should add that shl cannot be your login shell and doesn't work if exec'd from your login shell.

I do use it. It's not csh but it's better than sh alone. Ksh is purported to do csh style job control on 3b2 but I have yet to see it work. The copies I have seen tend to lock up terminals frequently when you try.

AGAIN! This is 3b2/300 SV.2.2. I know that the pc7300 SV pc7300 version 3 does NOT have these features and I can't speak for anything else.

K. Richard Magill

Have I violated copyright? Have I said something stupid?
daemon@houligan.UUCP (12/18/85)
In <156@uw-june> pjs@uw-june (Philip J. Schneider) writes:
> I have a C program which, during the course of its execution,
> spawns(forks) child processes. ... Since UNIX only allows one a
> certain number of processes at a time, eventually during the course of
> execution of the parent I run out of processes. If I temporarily stop
> the parent process execution and do a 'ps', the child processes show up
> in the list with a 'Z' status. They do not completely disappear until
> the parent process exits.
>
> My question: Is there any way to kill off these zombies so I can get
>              more processes ? Or, failing that, is there any other
>              way to do what I want ?
>
> Philip Schneider
> University of Washington Computer Science
> pjs@{uw-june.arpa,washington.arpa}
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs

The problem (and fortunately, the solution) is simple. A process, once terminated, becomes a "zombie" (Z status from "ps") until its parent (as determined by its PPID) "wait"s for it. Thus, it is the parent process' responsibility to "clean up after" its children (kinda like real life, eh?)

You can do one of two things, depending on your situation, to handle this correctly.

1. If the parent process does not have anything better to do while the children are out playing, it can just "wait" for them to finish.

2. You can cause the parent to "double-fork". This will make it a "grand-parent" for a time, just long enough for the "parent" to fork the child, and then terminate (exit). Then, when the "grand-parent" waits for the "parent", it will be VERY quick, and should not impact the "grand-parent" (original spawning process) much, in terms of slowing down the execution. Then, the "child" will become an "orphan", and when it terminates, the system "init" process (PID = 1) will clean up after it.

Implementation of the "double-fork" is simple (error detection omitted for clarity).
/* grand-parent */  switch (fork()) {
/* parent */        case 0:
/* parent */            switch (fork()) {
/* child */             case 0:
/* child */                 /* do the "child" part */
/* child */                 break;
/* parent */            default:
/* parent */                exit(0);    /* orphan the child */
/* parent */            }
/* child */             break;          /* don't fall through into the wait */
/* grand-parent */  default:
/* grand-parent */      wait(0);        /* wait for "parent" */
/* grand-parent */  }
/* grand-parent */  /* proceed with normal processing */

Obviously, the "wait" and "fork" calls need to be checked for errors, and you may want to use "_exit" instead, in the "parent", so it doesn't flush <stdio> buffers, etc. These are left as exercises for the reader.

--tgi

    while (--tgi)   /* my mind continues to decay */
        ;           /* even though I do nothing.. */

{brl-bmd,ccvaxa,pur-ee,sun}!csd-gould!midas!tgi (Craig Strickland @ Gould)
305/587-2900 x5014 CompuServe: 76545,1007 Source: BDQ615 MCIMail: 272-3350

(echo ".ft B"; echo ".ps 999"; echo "$disclaimer") | troff -t # :-)
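One possible take on the exercises left to the reader above: a double fork with the fork() and wait calls checked, and _exit() used in the intermediate "parent" so that it does not flush stdio buffers inherited from the "grand-parent". This is only a sketch; the function name fork_detached and its fork-like return convention are hypothetical, and waitpid() is used in place of wait() so that only the intermediate process is collected.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: double fork.
 * Returns 1 in the original process ("grand-parent") on success,
 *         0 in the detached "child", which init will inherit,
 *        -1 in the original process if either fork or the wait failed.
 */
int fork_detached(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* first fork failed */

    if (pid == 0) {                 /* intermediate "parent" */
        pid_t gpid = fork();

        if (gpid < 0)
            _exit(1);               /* report the failure upwards */
        if (gpid > 0)
            _exit(0);               /* orphan the child, no stdio flush */
        return 0;                   /* detached "child" carries on */
    }

    /* "grand-parent": collect the intermediate process, retrying on EINTR */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```

The caller treats it much like fork(): when fork_detached() returns 0 it does the "child" part and finishes with _exit(), and when it returns 1 it proceeds with normal processing, with no zombie left behind.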
kimball@bsdpkh.UUCP (Rick Kimball) (12/21/85)
> My question: Is there any way to kill off these zombies so I can get
>              more processes ? Or, failing that, is there any other
>              way to do what I want ?
>
> Philip Schneider
> University of Washington Computer Science
> pjs@{uw-june.arpa,washington.arpa}
> {ihnp4,decvax,cornell}!uw-beaver!uw-june!pjs

If the return code from the child isn't important the following code will show you how to have a "fork-a-thon".

--------- Cut Here ------
/* fork example ( parent doesn't care about child's return codes ) */

#include <stdio.h>
#include <string.h>     /* strlen() */
#include <signal.h>

#define CLS "\033[H\033[2J"     /* clear screen for vt100 */

main(argc,argv)
int argc;
char *argv[];
{
    int Processes_forked = 0;   /* must start at zero: it is incremented below */
    int len, random_line, pid;
    char message_buffer[80];

    signal(SIGCLD, SIG_IGN);    /* exited children are discarded, no zombies */

    write(1,CLS,strlen(CLS));

    while ( 1 ) {
        pid = fork();
        switch(pid) {
        case -1:    /* fork failed no process created */
            len = sprintf(message_buffer, "\033[24;0Htoo many");
            write(1,message_buffer,len);
            sleep(10);
            break;

        case 0:     /* child process */
            pid=getpid();
            random_line = pid % 22;
            len = sprintf(message_buffer, "\033[%d;0H#%5d", random_line, pid);
            write(1,message_buffer,len);
            sleep(2);
            len = sprintf(message_buffer, "\033[%d;0H ", random_line);
            write(1,message_buffer,len);
            exit(0);    /* exit() needs a status argument */

        default:    /* parent process */
            len = sprintf(message_buffer, "\033[24;0H%d forked", ++Processes_forked);
            write(1,message_buffer,len);
            break;
        }
    }
}
-------- Cut Here ----------

Rick Kimball
UUCP: ihnp4!bsdpkh!kimball
arturo@humming.UUCP (Arturo Perez) (12/25/85)
In article <12600002@hpfcls.UUCP> rml@hpfcla.UUCP writes:
>>
>> The problem is that several children may exit in quick succession.
>> Only one SIGCLD signal will be delivered, since the parent process
>> will (just this once) not manage to run before all have exited.
>> The sigcld handler has no way of determining how many children are
>> to be processed.
>
>It turns out that SIGCLD can be used reliably in System III and V.
>What is missing from the example is a call within the signal handler
>to re-install itself.
>
>>  int
>>  sigcld()
>>  {
>>      int pid, status;
>>      pid = wait(&status);
>>      ...
>>>>    signal(SIGCLD, sigcld);     /* add this line */
>>  }
>
>The signal(2) system call checks to see if any zombie child(ren) are
>present and sends the calling process another SIGCLD if there are.
>The signal handler is thus invoked recursively, once per zombie.
>Note that the reinstallation of the handler must follow the call to
>wait, or infinite recursion results.
>           Bob Lenk
>           {hplabs, ihnp4}!hpfcla!rml

This isn't correct. The problem is that the implicit 'signal(SIGCLD, SIG_DFL)' is done AFTER the signal trapping function returns. Thus, if you call signal from within the trapping function it doesn't do you any good.

At least, this is the way it works on our SYSV/BSD hybrids.
cc743805@sjuvax.UUCP (conway) (11/08/86)
I'll make this short and sweet: How can one change the date/time stamp of a file? I want to be able to put any date/time on a file that I have in my directory. Is this possible? If this question has been discussed before, please forgive me. I don't usually read this group. Chuck Conway -- ______________________________________________________________ | Chuck Conway, St. Joseph's University | | {bpa|burdvax|princeton|allegra}!sjuvax!cc743805 | | cc743805@sjuvax.UUCP | |------------------------------------------------------------|