andy@istbt.UUCP (Andy Greener) (02/28/85)
We recently had an intermittent problem when using the "shell out" in pg (Sys V.2 on a VAX 750). Depending on the length of the preceding pipeline (the problem only occurs when pg is the tail of a pipeline), pg would resume without waiting for the child shell to complete. The result: two processes trying to read from the terminal.

On investigating the pg source, I found that it does a "wait(0)" without testing the returned pid (this "mechanism" is common in Sys V code as far as I can tell). It relies on the wait being interrupted by the death-of-a-child signal when the sub-process terminates. However, the Sys V shell makes the last process in a pipeline the parent of all the others, so if the pipeline contains a few processes, the first may already have died. The wait(0) then returns immediately and pg carries on its merry way.

It seems that this practice of relying on a signal to terminate the wait is inherently dangerous, especially as the signal(2) manual section states, in reference to SIGCLD and SIGPWR: "Their use in new programs is strongly discouraged."

Does one hand at AT&T know what the other is doing? It is almost beyond comprehension that such a major screw-up has seen the light of day. It will undoubtedly produce unforeseen effects elsewhere (e.g., Sys V tar forks mkdir and uses the same wait mechanism; we had problems here too). Do they do any testing of their "products"?

My impression of Sys V.2 is that it was rushed out the door; we have come across other problems that are really only of interest to other Sys V sites. I won't bore you with the details here, but I will respond to mailed requests, and if there's lots of interest I'll post to the net.

Andy Greener
Imperial Software Technology
London, ENGLAND.
{mcvax, qtlon, inset, root44}!ist!andy
"UNIX System V: from now on consider it sub-standard"
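The failure described above can be reproduced in a few lines. The sketch below is hypothetical (the function name and the pipe-based timing are illustrative assumptions, not code from pg): a "stranger" child stands in for an earlier pipeline stage that the shell made the process's child, and the real "shell escape" child is held on a pipe so that it cannot exit first. A bare wait() then hands back the stranger's pid, which is what lets a pg-style wait(0) return early.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demonstration, not pg's code.
 * Returns 1 if a bare wait() reaps a child other than the one we
 * meant to wait for, 0 if it reaps the right one, -1 on error.
 */
int bare_wait_reaps_wrong_child(void)
{
    int fds[2];
    pid_t stranger, ours, got;
    char c;

    /* Stand-in for an earlier pipeline stage that the shell made our child. */
    stranger = fork();
    if (stranger == -1)
        return -1;
    if (stranger == 0)
        _exit(0);

    if (pipe(fds) == -1)
        return -1;

    /* The "shell escape" child: it cannot exit until we close fds[1]. */
    ours = fork();
    if (ours == -1)
        return -1;
    if (ours == 0) {
        close(fds[1]);
        while (read(fds[0], &c, 1) > 0)
            ;
        _exit(0);
    }
    close(fds[0]);

    got = wait(NULL);          /* the pg-style wait: pid not checked */

    close(fds[1]);             /* now let our child finish ... */
    waitpid(ours, NULL, 0);    /* ... and reap it by pid */
    return got != ours;
}
```

The sketch assumes SIGCHLD has its default disposition; if it were set to SIG_IGN, children would be reaped automatically and the bare wait() would behave differently.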
guy@rlgvax.UUCP (Guy Harris) (03/08/85)
Code which starts up a child process and does a general "wait" for any children without checking that the child they're interested in is what exited is broken, due, as you mentioned, to the fact that the shell's way of setting up pipelines creates unexpected edges in the family tree of processes. This was discovered in the 4.2BSD "crypt" a while ago, so such broken code is not restricted to System V. Any such code out there should be redone, and all future code which waits for children *must* check that the process which exited is the process that was being waited for.

I don't know how seriously to take the comment about SIGCLD and SIGPWR, considering that 1) the System III manual said they'd go away, and they're still here, and 2) System V's "init" uses both of those signals.
--
Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy
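A minimal sketch of the check described above, written with plain wait(2) in the style code of that era would need. The names wait_for and demo_exit_code are hypothetical. The loop discards the status of any other child it happens to reap, and it gives up when wait() fails with anything other than EINTR (for example ECHILD), so it cannot spin forever on a pid that is not, or is no longer, a child.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until the child `pid` exits, storing its
 * status in *status. Returns pid on success, -1 on error (errno set).
 * Statuses of other children reaped along the way are discarded.
 */
pid_t wait_for(pid_t pid, int *status)
{
    pid_t w;

    for (;;) {
        w = wait(status);
        if (w == pid)
            return pid;            /* the child we actually started */
        if (w == -1 && errno != EINTR)
            return -1;             /* e.g. ECHILD: nothing left to wait for */
        /* some other child, or an interrupted wait: keep going */
    }
}

/* Hypothetical demo: an unrelated child (like an earlier pipeline stage)
   and our own child both exist; wait_for() still picks out ours.
   Returns our child's exit code, or -1 on error. */
int demo_exit_code(int code)
{
    pid_t stranger, ours;
    int status;

    stranger = fork();
    if (stranger == -1)
        return -1;
    if (stranger == 0)
        _exit(99);
    ours = fork();
    if (ours == -1)
        return -1;
    if (ours == 0)
        _exit(code);
    if (wait_for(ours, &status) != ours)
        return -1;
    /* Fails harmlessly with ECHILD if wait_for() already reaped it. */
    waitpid(stranger, NULL, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Throwing away other children's statuses is the price of using plain wait(); a program that cares about those statuses would have to record them instead of discarding them.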
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (03/08/85)
> Any such code out there should be redone, and all future code which
> waits for children *must* check that the process which exited is the
> process that was being waited for.

Of course, if there are no remaining children, the parent should also quit trying to wait.

One does sometimes wonder where the people who write this stuff in the first place learned to program. I have lost count of the bugs I have found (and trivially fixed) in UNIX System V utilities. (Before people start flaming: I have found a similar situation in all other versions of UNIX that I have encountered.)
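The "quit trying to wait" rule corresponds to checking for ECHILD, which wait() reports when the caller has no children left. A minimal sketch, with hypothetical names: reap_all_children() drains every remaining child and stops at ECHILD instead of looping, and spawn_short_lived() is only a demo helper.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: reap every remaining child and stop as soon as
 * wait() reports ECHILD, i.e. there is nobody left to wait for.
 * Returns the number of children reaped, or -1 on an unexpected error.
 */
int reap_all_children(void)
{
    int n = 0;

    for (;;) {
        if (wait(NULL) != -1) {
            n++;
            continue;
        }
        if (errno == EINTR)
            continue;          /* interrupted by a signal: try again */
        if (errno == ECHILD)
            return n;          /* no children left: stop waiting */
        return -1;
    }
}

/* Hypothetical demo helper: start n children that exit immediately.
   Returns the number actually started. */
int spawn_short_lived(int n)
{
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t p = fork();
        if (p == 0)
            _exit(0);
        if (p > 0)
            started++;
    }
    return started;
}
```

Note that a drain loop like this also waits for any children the shell handed over, so it fits only when waiting for every child is really the intent; to wait for one particular child, check the pid.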
jsdy@hadron.UUCP (Joseph S. D. Yao) (03/12/85)
[Re: checking the return value of wait(2) for pid/-1.]

> Any such code out there should be redone, and all future code which
> waits for children *must* check that the process which exited is the
> process that was being waited for.

This was being said some time around the era of V E R S I O N 5! All versions of UNIX that I can recall, back to that era, had both this feature and this bug in most of their code. (I'm not familiar with UNIX before 1974/V5; the problem may even predate that.)
;-S
Joe Yao
hadron!jsdy@seismo.{ARPA,UUCP}
leiby@masscomp.UUCP (Mike Leibensperger) (03/12/85)
In article <562@rlgvax.UUCP> guy@rlgvax.UUCP (Guy Harris) writes:
>Code which starts up a child process and does a general "wait" for any
>children without checking that the child they're interested in is what
>exited is broken, due, as you mentioned, to the fact that the shell's
>way of setting up pipelines creates unexpected edges in the family
>tree of processes.

Does anyone know what the rationale was/is for having the shell create pipelines in this seemingly ludicrous way (i.e., the Nth process in the pipeline is the parent of the (N-1)st)? After all, why should a program need to concern itself with the process behaviour of previous programs in its pipeline? It just rubs me the wrong way. If someone could enlighten me about the rationale for this change, and about when the change first appeared, I would appreciate it. The 'r' key is sufficient....
--
Mike Leibensperger
Masscomp; 1 Technology Park; Westford, MA 01886
{decvax,harpo,tektronix}!masscomp!leiby
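The layout being questioned can be sketched for a two-stage pipeline. The function below is hypothetical (it is not the Bourne shell's source and says nothing about the rationale): the process that will exec the last command forks the first one, so after exec the last program has a child it never created. That inherited child is exactly what an unchecked wait() trips over.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch of a two-stage pipeline "left | right" laid out the
 * way this thread describes: the process that runs `right` forks `left`,
 * so `right` ends up as the parent of `left`. The caller (the "shell")
 * waits only for `right`. Returns right's exit status, or -1 on error.
 */
int pipeline2(char *const left[], char *const right[])
{
    int status;
    pid_t last = fork();

    if (last == -1)
        return -1;
    if (last == 0) {
        int fds[2];
        pid_t first;

        if (pipe(fds) == -1)
            _exit(127);
        first = fork();
        if (first == -1)
            _exit(127);
        if (first == 0) {
            /* Front end of the pipe: a child of the back end. */
            dup2(fds[1], STDOUT_FILENO);
            close(fds[0]);
            close(fds[1]);
            execvp(left[0], left);
            perror(left[0]);
            _exit(127);
        }
        /* Back end: whatever program it execs inherits `first` as a child. */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        perror(right[0]);
        _exit(127);
    }
    if (waitpid(last, &status, 0) != last)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Assuming echo and grep are on the PATH, running "echo hello" into "grep -q hello" this way should return 0. If the program run as `right` did a bare wait() for a child of its own, it could get `left` back instead.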
jack@boring.UUCP (03/15/85)
Well, for the benefit of all V7 users that are still out there:
There is a bug like this in Version 7 tar.
When it has to make a new directory, it does something like
    if( fork() == 0) {
        exec("/bin/mkdir",....)
        exit();
    }
    while( wait(NULL) > 0 );    <--- GRRRRR
Now, guess what happens when you start up tar in the receiving
end of a pipe.......
(For those who don't like guessing: the shell sets up pipes so
that the front end is a child of the back end. This means that,
besides the mkdir, tar will have another child: the process at the
front end of the pipe. So once the mkdir has been reaped, the next
wait will take a *very* long time........)
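One way to avoid the trap is to wait for the mkdir child by pid. The sketch below is hypothetical (it is not V7 tar's code, and the name make_dir_via_child is illustrative). It uses waitpid(), which comes from POSIX.1 and was not available in V7; on those systems the equivalent is a wait() loop that compares the returned pid with the mkdir child's pid and gives up on -1. It also assumes mkdir lives at /bin/mkdir.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch (not V7 tar's code): run /bin/mkdir on `path` and
 * wait for exactly that child with waitpid(), so a child inherited from
 * the shell (the front end of a pipe) is neither waited for nor reaped.
 * Returns mkdir's exit status, or -1 on error.
 */
int make_dir_via_child(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl("/bin/mkdir", "mkdir", path, (char *)NULL);
        perror("/bin/mkdir");
        _exit(127);
    }
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because only the mkdir child is waited for, an inherited child such as the front end of a pipe keeps running and keeps its exit status for whoever reaps it later, and tar can go straight back to reading its input.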
--
Jack Jansen, {decvax|philabs|seismo}!mcvax!jack
It's wrong to wish on space hardware.