[comp.bugs.4bsd] ex doesn't wait for all its children

tonyb@tektronix.TEK.COM (Tony Birnseth) (01/14/87)

Description:
	In an old report, wendt@megaron.UUCP reports the following problem.

	> Vi doesn't wait for spawned children, on things like
	> 
	> 50,75!cat >> foo
	> 
	> I had something like this mapped to a keystroke and was
	> using it to partition my mailbox.  After thirty or so
	> spawns, it says "no more processes" and goes into a mode
	> where it simply echoes every character you type at it.
	> All the defunct processes stay around till you exit.

Index: ex_unix.c

Fix:
	I investigated this problem as a result of a bug report I received 
	here at Tek.  When ex wants to write a range of lines to a subprocess, 
	it calls a routine called filter() which fork()'s and writes the range 
	from the buffer to the 'io' stream.  This child process then exits 
	without being wait()'d for.  After writing the range to 'io', a routine 
	is called which fork()/execl()'s a 'shell -c cmd', using 'io' as stdin 
	to the shell.
	
	The zombie processes are left as a result of the fork() in filter() 
	not being waited for.  Between the beta and real releases of 4.3BSD, 
	Berkeley fixed the symptom of the problem, but not the cause.  

	In the Berkeley fix, the 4.2 code was changed to loop on wait() until 
	a -1 was returned indicating that no more children remained.  There 
	is no check to verify that 'errno == ECHILD' or checks for other error 
	conditions.
	
	I contend that this change was not necessary and introduces a 
	situation where if the first fork() in filter() succeeds, but the 
	fork() in unixex() fails (forking the shell), the zombie process 
	remains.  The only change that was really necessary was to add a call 
	to waitfor() after the call to setrupt() if forking to write the 'io' 
	stream.

	Of course it's entirely possible that I'm missing something here.  I 
	think the fix should be done correctly by waiting for each child after
	it is created.
	
	Diffs follow.  To install the correct changes on pre 4.3bsd, just add
	the call to waitfor() after setrupt() in filter() inside of the
	'if(mode & 2)' conditional block.

	Tony Birnseth
	tonyb@tektronix.tek.com

-------------------------- Diffs Follow For 4.3BSD -----------------------------

*** /tmp/,RCSt1025777	Tue Jan 13 13:17:17 1987
--- ex_unix.c	Tue Jan 13 13:17:11 1987
***************
*** 269,274
  		close(pvec[1]);
  		io = pvec[0];
  		setrupt();
  	}
  	f = unixex("-c", uxb, (mode & 2) ? pvec[0] : 0, mode);
  	if (mode == 3) {

--- 269,281 -----
  		close(pvec[1]);
  		io = pvec[0];
  		setrupt();
+ #ifndef BAD_BSD_FIX
+ 		/*
+ 		 * It's necessary to wait for this child here so we don't run
+ 		 * out of process slots or memory in unixex() below.
+ 		 */
+ 		waitfor();
+ #endif /* BAD_BSD_FIX */
  	}
  	f = unixex("-c", uxb, (mode & 2) ? pvec[0] : 0, mode);
  	if (mode == 3) {
***************
*** 327,332
  	close(pvec[1]);
  }
  
  /*
   * Wait for the process (pid an external) to complete.
   */

--- 334,340 -----
  	close(pvec[1]);
  }
  
+ #ifdef BAD_BSD_FIX
  /*
   * Wait for the process (pid an external) to complete.
   */
***************
*** 341,346
  	} while (rpid != -1);
  	status = (status >> 8) & 0377;
  }
  
  /*
   * The end of a recover operation.  If the process

--- 349,378 -----
  	} while (rpid != -1);
  	status = (status >> 8) & 0377;
  }
+ #else /* ! BAD_BSD_FIX */
+ /*
+  * Wait for the process (pid an external) to complete.
+  * 	The above patch treats the symptoms, rather than the cause of zombie
+  *	children not being waited for.  The code below is from the 4.3 beta
+  *	release.  It was not necessary to change waitfor() at all.  What
+  *	was needed was to add a call to waitfor() in filter() after it forks
+  *	to write the range of lines to the pipe.
+  *
+  *	With the above change to waitfor(), it is still possible to have 
+  *	a zombie process laying around.  If the fork() in filter() succeeds, 
+  *	but the fork() in unixex() fails from lack of proc slots or memory, 
+  *	the zombie left from filter() still remains.  The change should have
+  *	been made in filter(), not waitfor().
+  */
+ waitfor()
+ {
+ 
+ 	do
+ 		rpid = wait(&status);
+ 	while (rpid != pid && rpid != -1);
+ 	status = (status >> 8) & 0377;
+ }
+ #endif /* BAD_BSD_FIX */
  
  /*
   * The end of a recover operation.  If the process

thorinn@diku.UUCP (01/19/87)

In article <10069@tektronix.TEK.COM> tonyb@tektronix.TEK.COM (Tony Birnseth) writes:
>	The only change that was really necessary was to add a call 
>	to waitfor() after the call to setrupt() if forking to write the 'io' 
>	stream.

>	Of course it's entirely possible that I'm missing something here.  I 
>	think the fix should be done correctly by waiting for each child after
>	it is created.
>
  I think that you missed one important point: If the region written to the
pipe contains more than 4K bytes, the first child won't exit if no one reads
the pipe. So the correct thing would be to remember the pid of the writer and
wait for it *after* the reader exits -- even if the writer didn't get every-
thing into the pipe, it will get a SIGPIPE and die when the pipe is closed.
--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark		..mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.