tonyb@tektronix.TEK.COM (Tony Birnseth) (01/14/87)
Description: In an old report, wendt@megaron.UUCP reports the following problem. > Vi doesn't wait for spawned children, on things like > > 50,75!cat >> foo > > I had something like this mapped to a keystroke and was > using it to partition my mailbox. After thirty or so > spawns, it says "no more processes" and goes into a mode > where it simply echoes every character you type at it. > All the defunct processes stay around till you exit. Index: ex_unix.c Fix: I investigated this problem as a result of a bug report I received here at Tek. When ex wants to write a range of lines to a subprocess, it calls a routine called filter() which fork()'s and writes the range from the buffer to the 'io' stream. This child process then exits without being wait()'d for. After writing the range to 'io', a routine is called which fork()/execl()'s a 'shell -c cmd', using 'io' as stdin to the shell. The zombie processes are left as a result of the fork() in filter() not being waited for. Between the beta and real releases of 4.3BSD, Berkeley fixed the symptom of the problem, but not the cause. In the Berkeley fix, the 4.2 code was changed to loop on wait() until a -1 was returned indicating that no more children remained. There is no check to verify that 'errno == ECHILD' or checks for other error conditions. I contend that this change was not necessary and introduces a situation where if the first fork() in filter() succeeds, but the fork() in unixex() fails (forking the shell), the zombie process remains. The only change that was really necessary was to add a call to waitfor() after the call to setrupt() if forking to write the 'io' stream. Of course it's entirely possible that I'm missing something here. I think the fix should be done correctly by waiting for each child after it is created. Diffs follow. To install the correct changes on pre 4.3bsd, just add the call to waitfor() after setrupt() in filter() inside of the 'if(mode & 2)' conditional block. Tony Birnseth tonyb@tektronix.tek.com -------------------------- Diffs Follow For 4.3BSD ----------------------------- *** /tmp/,RCSt1025777 Tue Jan 13 13:17:17 1987 --- ex_unix.c Tue Jan 13 13:17:11 1987 *************** *** 269,274 close(pvec[1]); io = pvec[0]; setrupt(); } f = unixex("-c", uxb, (mode & 2) ? pvec[0] : 0, mode); if (mode == 3) { --- 269,281 ----- close(pvec[1]); io = pvec[0]; setrupt(); + #ifndef BAD_BSD_FIX + /* + * It's necessary to wait for this child here so we don't run + * out of process slots or memory in unixex() below. + */ + waitfor(); + #endif /* BAD_BSD_FIX */ } f = unixex("-c", uxb, (mode & 2) ? pvec[0] : 0, mode); if (mode == 3) { *************** *** 327,332 close(pvec[1]); } /* * Wait for the process (pid an external) to complete. */ --- 334,340 ----- close(pvec[1]); } + #ifdef BAD_BSD_FIX /* * Wait for the process (pid an external) to complete. */ *************** *** 341,346 } while (rpid != -1); status = (status >> 8) & 0377; } /* * The end of a recover operation. If the process --- 349,378 ----- } while (rpid != -1); status = (status >> 8) & 0377; } + #else /* ! BAD_BSD_FIX */ + /* + * Wait for the process (pid an external) to complete. + * The above patch treats the symptoms, rather than the cause of zombie + * children not being waited for. The code below is from the 4.3 beta + * release. It was not necessary to change waitfor() at all. What + * was needed was to add a call to waitfor() in filter() after it forks + * to write the range of lines to the pipe. + * + * With the above change to waitfor(), it is still possible to have + * a zombie process laying around. If the fork() in filter() succeeds, + * but the fork() in unixex() fails from lack of proc slots or memory, + * the zombie left from filter() still remains. The change should have + * been made in filter(), not waitfor(). + */ + waitfor() + { + + do + rpid = wait(&status); + while (rpid != pid && rpid != -1); + status = (status >> 8) & 0377; + } + #endif /* BAD_BSD_FIX */ /* * The end of a recover operation. If the process
thorinn@diku.UUCP (01/19/87)
In article <10069@tektronix.TEK.COM> tonyb@tektronix.TEK.COM (Tony Birnseth) writes: > The only change that was really necessary was to add a call > to waitfor() after the call to setrupt() if forking to write the 'io' > stream. > Of course it's entirely possible that I'm missing something here. I > think the fix should be done correctly by waiting for each child after > it is created. > I think that you missed one important point: If the region written to the pipe contains more than 4K bytes, the first child won't exit if no one reads the pipe. So the correct thing would be to remember the pid of the writer and wait for it *after* the reader exits -- even if the writer didn't get every- thing into the pipe, it will get a SIGPIPE and die when the pipe is closed. -- Lars Mathiesen, DIKU, U of Copenhagen, Denmark ..mcvax!diku!thorinn Institute of Datalogy -- we're scientists, not engineers.