[net.bugs.usg] Shell out bug in pg

andy@istbt.UUCP (Andy Greener) (02/28/85)

We recently had an intermittent problem when using the "shell out" in
pg (Sys V.2 on Vax 750). Depending on the length of the preceding 
pipeline (the problem only occurs when pg is the tail of a pipeline)
pg would resume without waiting for the child shell to complete - 
result: two processes trying to read from the terminal.

On investigating in the pg source I found that it does a "wait(0)"
without testing the return pid (this "mechanism" is common in Sys V
code as far as I can tell). It is relying on the wait being interrupted
by the death-of-a-child signal when the sub-process terminates.
However, the Sys V shell makes the last process in a pipeline the parent
of all the others, so if the pipeline contains a few processes the
first may already have died, resulting in the wait(0) returning immediately
and pg carrying on its merry way.
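
To make the failure concrete, here is a minimal sketch of the situation, not the actual pg source. The function name unchecked_wait_demo and the two stand-in children are hypothetical: one child plays an earlier pipeline stage that exits at once, the other plays the interactive shell and is simulated with sleep(1).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if a single unchecked wait() came back
 * for a child other than the "shell", 0 if it came back for the shell,
 * and -1 if fork() failed.
 */
int unchecked_wait_demo(void)
{
    pid_t upstream, shell, got;

    upstream = fork();              /* stand-in for an earlier pipeline stage */
    if (upstream == -1)
        return -1;
    if (upstream == 0)
        _exit(0);                   /* finishes immediately */

    shell = fork();                 /* stand-in for the "shell out" child */
    if (shell == -1)
        return -1;
    if (shell == 0) {
        sleep(1);                   /* pretend the user is still typing */
        _exit(0);
    }

    got = wait((int *)0);           /* pg-style wait: pid is never compared */

    while (wait((int *)0) > 0)      /* tidy up whatever is still running */
        ;
    return got != shell;
}
```

Under these assumptions the first wait() reports the upstream child, so a caller that ignores the returned pid would carry on while the shell is still running.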

It seems that this practice of relying on a signal to terminate the
wait is inherently dangerous, especially as the signal(2) man section
states in reference to SIGCLD and SIGPWR:

	"Their use in new programs is strongly discouraged"

Does one hand at AT&T know what the other is doing? It is almost beyond
comprehension that such a major screw up has seen the light of day.
It undoubtedly will produce unforeseen effects elsewhere (e.g. Sys V tar
forks mkdir, and uses the same wait mechanism - we had problems here too).
Do they do any testing of their "products"? My impression of
Sys V.2 is that it was rushed out the door; there are other problems
which we have come across that are really only of interest to other
Sys V sites. I won't bore you with the details here, but I will respond
to mailed requests and if there's lots of interest I'll post to the net.

			Andy Greener	Imperial Software Technology
					London, ENGLAND.

					{mcvax, qtlon, inset, root44}!ist!andy

	"UNIX System V: from now on consider it sub-standard"

guy@rlgvax.UUCP (Guy Harris) (03/08/85)

Code which starts up a child process and does a general "wait" for any
children without checking that the child they're interested in is what
exited is broken, due, as you mentioned, to the fact that the shell's
way of setting up pipelines creates unexpected edges in the family
tree of processes.  This was discovered in the 4.2BSD "crypt" a while
ago, so such broken code is not restricted to System V.

Any such code out there should be redone, and all future code which
waits for children *must* check that the process which exited is the
process that was being waited for.
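
As an illustration only, a pid-checked wait might look something like the sketch below. The helper name run_and_wait is hypothetical, and it uses execv() and plain wait() so that it stays close to what the systems under discussion provide.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `path` with `argv` and wait for that
 * particular child. Returns its wait status, or -1 if fork() failed
 * or wait() reported an error before the child was seen.
 */
int run_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t child, w;

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        execv(path, argv);
        _exit(127);                 /* only reached if the exec failed */
    }

    for (;;) {
        w = wait(&status);
        if (w == child)
            return status;          /* the process we actually started */
        if (w == -1 && errno != EINTR)
            return -1;              /* ECHILD (no children left) or other error */
        /* some other child exited, or a signal interrupted us: keep waiting */
    }
}
```

Note that this loop reaps and discards the status of any unrelated child it meets along the way. On later systems that provide waitpid(2), waitpid(child, &status, 0) waits for the one pid directly and leaves other children alone.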

I don't know how seriously to take the comment about SIGCLD and SIGPWR,
considering 1) the System III manual said they'd go away, and they're still
here and 2) System V's "init" uses both of those signals.
-- 
	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (03/08/85)

> Any such code out there should be redone, and all future code which
> waits for children *must* check that the process which exited is the
> process that was being waited for.

Of course, if there are no remaining children, the parent should
also quit trying to wait.

One does sometimes wonder where the people who write this stuff in
the first place learned to program.  I have lost count of the bugs
I have found (and trivially fixed) in UNIX System V utilities.
(Before people start flaming, I have found a similar situation in
all other versions of UNIX that I have encountered.)

jsdy@hadron.UUCP (Joseph S. D. Yao) (03/12/85)

[Re: checking the return value of wait(2) for pid/-1.]
> Any such code out there should be redone, and all future code which
> waits for children *must* check that the process which exited is the
> process that was being waited for.

This was being said some time around the era of   V E R S I O N   5!
All versions of UNIX that I can recall, back to that era, had both this
feature and this bug in most of their code.  (I'm not familiar with UNIX
before 1974/V5 -- the problem may even predate that.)
;-S

Joe Yao		hadron!jsdy@seismo.{ARPA,UUCP}

leiby@masscomp.UUCP (Mike Leibensperger) (03/12/85)

In article <562@rlgvax.UUCP> guy@rlgvax.UUCP (Guy Harris) writes:
>Code which starts up a child process and does a general "wait" for any
>children without checking that the child they're interested in is what
>exited is broken, due, as you mentioned, to the fact that the shell's
>way of setting up pipelines creates unexpected edges in the family
>tree of processes.

Does anyone know what the rationale was/is for having the shell create
pipelines in this seemingly ludicrous way (i.e. the Nth process in the
pipeline is parent of the (N-1)st)?  After all, why should a program 
need to concern itself with the process behaviour of previous programs 
in its pipeline?  It just rubs me the wrong way.

If someone could enlighten me about the rationale for this change, and
about when the change first appeared, I would appreciate it.  The 'r'
key is sufficient....
--
Mike Leibensperger
Masscomp; 1 Technology Park; Westford, MA 01886
{decvax,harpo,tektronix}!masscomp!leiby

jack@boring.UUCP (03/15/85)

Well, for the benefit of all V7 users that are still out there:
There is a bug like this in Version 7 tar.
When it has to make a new directory, it does something like

	if( fork() == 0) {
		exec("/bin/mkdir",....)
		exit();
	}
	while( wait(NULL) > 0 );	<--- GRRRRR

Now, guess what happens when you start up tar in the receiving
end of a pipe.......

(For those who don't like guessing: The shell sets up pipes so
that the front end is a child of the back end. This means that,
besides the mkdir, tar will have another child, being the first
end of the pipe. This means that the second wait will take a
*very* long time........)
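
For anyone who wants to see the effect without building a real pipeline, the following is a rough sketch, not the tar source. The function name drain_all_demo and its upstream_secs parameter are made up for the illustration: one child stands in for the front end of the pipe and sleeps, the other stands in for mkdir and exits at once.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns the number of seconds spent in a
 * "wait until no children are left" loop, or -1 if fork() failed.
 */
long drain_all_demo(unsigned int upstream_secs)
{
    pid_t upstream, helper;
    time_t start;

    upstream = fork();              /* stand-in for the front end of the pipe */
    if (upstream == -1)
        return -1;
    if (upstream == 0) {
        sleep(upstream_secs);
        _exit(0);
    }

    helper = fork();                /* stand-in for /bin/mkdir */
    if (helper == -1)
        return -1;
    if (helper == 0)
        _exit(0);                   /* done immediately */

    start = time((time_t *)0);
    while (wait((int *)0) > 0)      /* same shape as the loop shown above */
        ;
    return (long)(time((time_t *)0) - start);
}
```

The helper is finished almost instantly, yet the loop does not return until the upstream child has gone as well, so the time spent grows with upstream_secs rather than with the helper's run time.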

-- 
	Jack Jansen, {decvax|philabs|seismo}!mcvax!jack
It's wrong to wish on space hardware.