[comp.unix.wizards] csh pgrp problem

richard@aiai.ed.ac.uk (Richard Tobin) (08/10/89)

For historical reasons, we use a modified csh, which we call tcsh, but
which isn't the "standard" one used elsewhere.

Running under SunOS 4, we occasionally encounter an annoying problem:
a pipeline (eg cat /etc/passwd | more) will stop, with the message

   Stopped (tty output)

I believe I've found the problem; what I want to know is whether there's
a simple fix, perhaps in more recent versions of csh than we have here.

I had some problems when I first compiled this shell for SunOS 4, and
the simplest solution seemed to be to #undef VFORK, since fork() in
SunOS 4 does copy-on-write.

What seems to be happening is that the shell forks twice (once for cat
and once for more).  Each child sets its process group to the jobid,
which is cat's process id.  The first child sets the terminal process
group to the same thing.  However, there's nothing to guarantee that
the first child sets the terminal process group before the second child
starts running, and perhaps once in 20 times it doesn't.  In these
cases the ioctls performed by more cause a SIGTTOU.

Presumably using vfork() forces things to happen in the right order.

Thanks in advance,
  Richard

-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

pjb@tcom.stc.co.uk (Peter J. Bishop) (08/10/89)

In article <712@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin) writes:
) Running under SunOS 4, we occasionally encounter an annoying problem:
) a pipeline (eg cat /etc/passwd | more) will stop, with the message
) 
)    Stopped (tty output)
) 
) I had some problems when I first compiled this shell for SunOS 4, and
) the simplest solution seemed to be to #undef VFORK, since fork() in
) SunOS 4 does copy-on-write.

I am also encountering this problem, using /usr/new/csh on a DEC MicroVAX
running Ultrix 3.0.

As I have no way of recompiling this shell, does anyone know of a fix that I
could implement, or should I raise the problem with DEC s/w support?

Thanks,

-- 
Peter Bishop.  <pjb@tcom.stc.co.uk> || ...!mcvax!ukc!stc!pjb
STC TNDD, Access Systems Engineering, 20-22 Edinburgh Way, Harlow. Essex
Phone : +44 279 626626 x2795

hedrick@geneva.rutgers.edu (Charles Hedrick) (08/11/89)

On a Sun 4 under SunOS 4.0, you must #include <vfork.h> in order to use
vfork reliably.  If you don't do this in csh, you get weird results.
That may explain your problem.

eloranta@tukki.jyu.fi (Jussi Eloranta) (08/11/89)

In article <712@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>For historical reasons, we use a modified csh, which we call tcsh, but
---
>Running under SunOS 4, we occasionally encounter an annoying problem:
>a pipeline (eg cat /etc/passwd | more) will stop, with the message
>
>   Stopped (tty output)
>

We have the same problem here.... (with tcsh from tut.cis.ohio-state.edu)
Does anyone have a fixed version? We don't (yet) have the sources so I would
need binaries...

-jme


-- 
============================================================================
Jussi Eloranta               Internet(/Bitnet):
University of Jyvaskyla,     eloranta@tukki.jyu.fi
Finland                      [128.214.7.5]

chris@mimsy.UUCP (Chris Torek) (08/11/89)

In article <712@skye.ed.ac.uk> richard@aiai.ed.ac.uk (Richard Tobin) writes:
>I believe I've found the problem; what I want to know is whether there's
>a simple fix ... the shell forks twice (once for cat and once for more).
>Each child sets its process group to the jobid, which is cat's process id.
>The first child sets the terminal process group to the same thing.
>However, there's nothing to guarantee that the first child sets the
>terminal process group before the second child starts running ...
>
>Presumably using vfork() forces things to happen in the right order.

This analysis is correct (congratulations: discovering this bug is
rather tricky---the POSIX folks noticed it eventually, but it took
quite a while).

The accepted solution is to set the terminal's process group k+1 times
when there are k children in a pipeline (or k times with the current
system): once in each child and once in the parent.  Setting the pgroup
to whatever it is already is harmless, and this ensures that the pgroup
is set by the time it needs to be.

(Most of the mess would go away if process groups were allocated
by the system, rather than by user code.  But it is too late for
POSIX.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

mojo@legato (Joseph Moran) (08/13/89)

In article <712@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>Running under SunOS 4, we occasionally encounter an annoying problem:
>a pipeline (eg cat /etc/passwd | more) will stop, with the message
>
>   Stopped (tty output)
>
>I believe I've found the problem; what I want to know is whether there's
>a simple fix, perhaps in more recent versions of csh than we have here.

Unfortunately, the `simple' fix I know of is to continue to use vfork
with csh...

>I had some problems when I first compiled this shell for SunOS 4, and
>the simplest solution seemed to be to #undef VFORK, since fork() in
>SunOS 4 does copy-on-write.

Here's some history on this stuff.  While I was working at Sun, I did
most of the work on the new VM system.  When the new VM project was
started, we believed that we could just have the vfork system call do a
standard copy-on-write fork for binary compatibility.  Then we could
retire vfork from the C library since vfork was a hack marked for
deletion.

When I ran a prototype new VM kernel on my workstation, I occasionally
ran into the "Stopped (tty output)" problem when using csh and pipes.  I
spent some time tracking this mess down.  I found that if I compiled
csh with VFORK not defined, csh would occasionally fail the
same way that it did running on the new VM system with vfork replaced
by fork.  From this I concluded that I was seeing the result of a
long-standing csh bug that was never noticed at Berkeley (where both csh and
vfork originated) since vfork was always used there for csh.  Folklore
has it that vfork was created solely for csh because of the performance
costs of csh doing Unix fork's in a paged environment without
copy-on-write.

After tracking down the race condition in setting the process group
stuff in csh, I decided that it was too hard for me personally to fix
(I was doing kernel VM work, not csh support).  As time went on, we
found more places that depended on the subtle effects of vfork.
Eventually it was decided that SunOS needed to continue to support
vfork even after we had a copy-on-write fork just because of a few
$%$#$!* programs that either took advantage of the vfork semantics
(e.g., csh using vfork to keep exec hash statistics) or accidentally
depended on them (e.g., the csh process group problem when not using
vfork).

>What seems to be happening is that the shell forks twice (once for cat
>and once for more).  Each child sets its process group to the jobid,
>which is cat's process id.  The first child sets the terminal process
>group to the same thing.  However, there's nothing to guarantee that
>the first child sets the terminal process group before the second child
>starts running, and perhaps once in 20 times it doesn't.  In these
>cases the ioctls performed by more cause a SIGTTOU.

Yes - this is the problem that I found.  And this is one of the reasons why
SunOS 4.0 csh still uses vfork even though fork now uses copy-on-write.

>Presumably using vfork() forces things to happen in the right order.

Exactly - when using vfork the child process gets to run first and
"borrow the address space" of the parent until the child exec's or
exit's.  After the child exec's or exit's, the parent gets to run after
it gets its address space back from the child process.
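
A small C sketch of that ordering, assuming a system that still
provides vfork().  The function name vfork_in_own_group is hypothetical.
Strictly speaking, POSIX left calling anything other than exec or
_exit in a vfork child undefined, so the setpgid() call here mirrors
the historic BSD usage described above rather than portable practice.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: start argv[0] in its own process group via vfork().
 * Returns the child's pid, or -1 if vfork() fails.  The parent stays
 * suspended until the child execs or exits, so by the time this
 * function returns the child's setpgid() has already happened.
 */
static pid_t vfork_in_own_group(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child, running on the parent's borrowed address space. */
        setpgid(0, 0);          /* become leader of a new group */
        execvp(argv[0], argv);
        _exit(127);             /* exec failed; never return from here */
    }

    /* Parent resumes only after the child's exec or _exit. */
    return pid;
}
```

With plain fork() the parent could return before the child had reached
setpgid(), which is the window the csh bug falls into.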

I think that the general lesson to be learned here is to not introduce
"temporary hack system calls", because they can be hard to get rid of
later once some important program(s) either accidentally or
consciously depend on the (subtle effects of that) hack.

Joseph Moran
Legato Systems Inc.
260 Sheridan Avenue
Palo Alto, CA  94306
(415) 329-7886
mojo@legato.com or {sun,uunet}!legato!mojo

terry@sunquest.UUCP (Terry Friedrichsen) (08/18/89)

In article <920@legato.LEGATO.COM>, mojo@legato (Joseph Moran) writes:
> Folklore
> has it that vfork was created solely for csh because of the performance
> costs of csh doing Unix fork's in a paged environment without
> copy-on-write.

And WHY was there no copy-on-write?  In "Design and Implementation of
4.3 BSD" (title paraphrased from memory), the authors write that copy-on-write
was considered and abandoned because a microcode bug in one model of VAX
made it questionable that copy-on-write could be reliably implemented.

They don't identify the model, though, so it's hard to say whether it would
have been better to write off that VAX instead of writing vfork().

Terry R. Friedrichsen
TERRY@SDSC.EDU  (alternate address)

Disclaimer:  the company doesn't read my mail, so it can't possibly know
		what I'm saying!

chris@mimsy.UUCP (Chris Torek) (08/19/89)

In article <184@sunquest.UUCP> terry@sunquest.UUCP (Terry Friedrichsen) writes:
>And WHY was there no copy-on-write?  In "Design and Implementation of
>4.3 BSD" (title paraphrased from memory), the authors write that copy-on-write
>was considered and abandoned because a microcode bug in one model of VAX
>made it questionable that copy-on-write could be reliably implemented.
>
>They don't identify the model, though, so it's hard to say whether it would
>have been better to write off that VAX instead of writing vfork().

As I heard it, the model was the 750, and it had something to do with
peculiar addressing modes and stack pages (maybe something like movc3 with
an (sp)+ argument :-) ).  Instruction restart after a fault did not
always work right.

If this is correct, it would explain why not `write off that VAX': until
recently, monet.berkeley.edu (a VAX-11/750) was *the* BSD development
machine.  (It also had---and still has---only 2 MB of memory.  The
new development machine is about 6 VAX MIPS---a 750 is about .6 or .7
---and has 16 MB.  Somehow I suspect 4.4BSD will have a more cavalier
attitude towards CPU and memory usage :-/ . . . .)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris