[comp.sys.sun] 4.1 and ksh88d

marg@cunixf.cc.columbia.edu (Margarita Suarez) (07/12/90)

We upgraded our 3 Sun-4/280's to SunOS 4.1 in early June.  We also have
compiled ksh88d using the patches posted here by Doug Kingston
(dpk@morgan.com).  We noticed problems, however, when once in a while
programs like "more" and "less" at the end of a pipeline would appear to
hang the terminal.  Symptoms: all input to the terminal was ignored,
although the process could be killed from another terminal.

We finally traced the problem to some questionable code in xec.c.  The
shell would fork a kid for each member of the pipeline, and try to assign
them to the same process group as the "leader" of the pipeline.  However,
if by chance the tail process of the pipeline called setpgid() with the
pid of the pipeline "leader" *before* the "leader" had established itself
as the process group leader, the setpgid() would fail with an EPERM
error.

The problem was that, in the event of an error, the process group would be
set to the current process's own pid, and setpgid() was called again with
this new pgrp.  This confuses ioctl() terribly, because now we have a
piece of a pipeline which is not in the same process group as the rest of
the pipeline.  That explains why all keyboard input was ignored, etc.

To fix, check for the case where setpgid() returns EPERM, and if so, sleep
a bit and try again until the process group leader has been properly
established.
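
A minimal sketch of that retry, not the actual change to xec.c.  The
helper name join_pgrp_with_retry, the 10 ms pause, and the retry cap are
all assumptions made for illustration; the cap is there so a pipeline
whose leader has already died cannot spin forever.

```c
#include <errno.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not the ksh88d code: try to join process group
 * `pgid`, retrying while setpgid() fails with EPERM (the group may not
 * exist yet because the leader has not run).  Returns 0 on success, or
 * -1 on any other error or after `max_tries` attempts, with errno set. */
int join_pgrp_with_retry(pid_t pgid, int max_tries)
{
    struct timespec pause_time = { 0, 10 * 1000 * 1000 };  /* 10 ms, arbitrary */
    int i;

    for (i = 0; i < max_tries; i++) {
        if (setpgid(0, pgid) == 0)
            return 0;                   /* joined the leader's group */
        if (errno != EPERM)
            return -1;                  /* some other failure: do not retry */
        nanosleep(&pause_time, NULL);   /* give the leader time to run */
    }
    errno = EPERM;                      /* gave up waiting for the leader */
    return -1;
}
```

A tail process would call something like join_pgrp_with_retry(leader_pid,
50) instead of falling back to its own pid when the first setpgid() fails.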

By the way, does anyone know where ksh bug reports should be sent?  Also,
out of curiosity, how many people are actually running ksh88d under 4.1,
and have you had many problems?

Margarita M. Suarez
Unix Systems Group

 VOICE:  w:212-854-5434 h:212-932-3023
INTERNET:  marg@cunixf.cc.columbia.edu
 BITNET:            marg@cunixf.bitnet
  UUCP:  !rutgers!columbia!cunixf!marg

chet@po.cwru.edu (07/17/90)

In article <9833@brazos.Rice.edu> marg@cunixf.cc.columbia.edu (Margarita Suarez) writes:

[An explanation of why the last process in a pipeline would sometimes hang
under ksh88d, traced to a setpgid() race condition in which a process in
the pipeline tries to set its pgrp before the process group `leader' has
established the process group.  The result is a pipeline with pieces in
different process groups.]

>To fix, check for the case where setpgid returns EPERM, and if so, sleep
>a bit and try again until the process group leader has been properly
>established.

A better fix is to do the setpgid() in both the parent and child, so the
setpgid() succeeds before anything else happens.  It doesn't matter which
succeeds, as long as the pgrp is set before anything else depends on it
being so.
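
A rough sketch of that approach, under assumptions: spawn_in_pgrp,
idle_child, and demo_same_group are hypothetical names, and idle_child
merely stands in for exec'ing a real command.  Because the parent calls
setpgid() on the leader before it forks the next stage, the group already
exists by the time any later child tries to join it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for exec'ing a pipeline stage: just wait to be killed. */
static void idle_child(void) { pause(); }

/* Hypothetical helper: fork a child and put it in process group `pgid`
 * (0 means "start a new group led by the child").  Parent and child both
 * call setpgid(), so the group is set whichever of them runs first.
 * Returns the child's pid in the parent, or -1 if fork() fails. */
pid_t spawn_in_pgrp(pid_t pgid, void (*child_body)(void))
{
    pid_t kid = fork();

    if (kid < 0)
        return -1;
    if (kid == 0) {
        (void)setpgid(0, pgid);     /* child side */
        child_body();
        _exit(0);
    }
    (void)setpgid(kid, pgid);       /* parent side, same effect */
    return kid;
}

/* Build a two-stage "pipeline"; return 1 if both stages share a pgrp. */
int demo_same_group(void)
{
    pid_t lead = spawn_in_pgrp(0, idle_child);
    pid_t tail;
    int ok;

    if (lead < 0)
        return 0;
    tail = spawn_in_pgrp(lead, idle_child);
    if (tail < 0) {
        kill(lead, SIGKILL);
        waitpid(lead, NULL, 0);
        return 0;
    }
    ok = (getpgid(lead) == lead && getpgid(tail) == lead);
    kill(lead, SIGKILL);
    kill(tail, SIGKILL);
    waitpid(lead, NULL, 0);
    waitpid(tail, NULL, 0);
    return ok;
}
```

The return values of setpgid() are deliberately ignored here: one of the
two calls may lose the race and fail harmlessly, for example if the child
has already exec'd by the time the parent gets to it.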

Chet Ramey				"...but worst of all, young man,
Network Services Group			 you've got Industrial Disease!"
Case Western Reserve University	
chet@ins.CWRU.Edu