wales@ucla-cs.UUCP (02/09/85)
Here at UCLA, we have noted a problem with ^Z and processes that change the terminal modes. We are running LOCUS (a distributed system built on top of 4.1BSD), but the problem in question seems to exist in vanilla 4.1 as well. I would like to know whether anyone else has seen this problem and can propose a fix for it.

The problem happens when you fork a process from /bin/csh, and that process in turn forks a second process which sets non-standard TTY modes. (For example, a user runs a mail-sending program, which in turn forks a full-screen text editor.)

The symptom of the problem is that, when you type a ^Z to suspend the pair of processes and go back to your shell, the TTY modes are occasionally not restored properly. Instead, you sometimes end up back in the shell with the same "non-standard" modes as were set by the "grandchild" process (e.g., "cbreak" mode with echo off). If you then put the suspended process group back into the foreground (via "%" or "fg"), it will immediately stop again, and this time you are returned to the shell with the correct TTY modes set.

Here is my theory as to what is happening. (For convenience' sake, I will call the process directly forked from the shell "process A", and the second process -- forked from process A -- I will call "process B".)

(1) Process "A" does not have a signal-catcher for the SIGTSTP signal (which is what ^Z generates). Process "B", on the other hand, has a signal-catcher for SIGTSTP which restores the original TTY modes before stopping. Presumably, process "B" uses code similar to that shown in the "jobs(3)" manual article.

(2) When the user types a ^Z, both process "A" and process "B" receive a SIGTSTP signal.

    (a) When "B" gets a SIGTSTP, it restores the TTY modes and throws another SIGTSTP to both itself and "A" (via a "kill(0,SIGTSTP)" call).
    This means that "A" is getting SIGTSTP'ed twice -- once from the ^Z, and once from "B" -- but I don't THINK this should be causing any problems here, since both signals should get recorded before "A" gets scheduled again.

    (b) When "A" gets a SIGTSTP, it stops right away (since it has no SIGTSTP signal-catcher).

    (c) The shell catches the SIGCHLD signal which is generated when "A" stops, notes that "A" has stopped (the shell, of course, neither knows nor cares anything about "B"), and reclaims the TTY via a TIOCSPGRP "ioctl" call.

    Ideally, process "B" should restore the TTY modes before anything else happens. HOWEVER, if the scheduler starts up "A" before "B" -- AND subsequently starts up the shell to process the SIGCHLD signal before "B" can change the modes -- then the shell is left with B's funny TTY modes. ("B" in this case will get slapped with a SIGTTOU signal when it tries to change the TTY modes, because it is no longer in the TTY's controlling process group once the shell starts up again. This SIGTTOU, by the way, appears to get sent irrespective of whether "stty tostop" is in effect on the TTY in question.)

(3) When the user puts "A" back into the foreground (via "%" or "fg"):

    (a) Process "A", at the time the ^Z was typed, was waiting for "B" to finish -- and once restarted, it will resume said waiting.

    (b) Process "B" will go ahead and change the TTY modes back to their original state, and then send a SIGTSTP to its process group -- stopping both itself and "A".

    (c) The shell will get a SIGCHLD signal, discover that its child "A" has stopped, and start up again. This time, however, the TTY modes will be correct.

MY QUESTIONS FOR THE GROUP:

(1) Has anyone else (on either a 4.1 or a 4.2 system) ever seen this problem?

(2) Is the above analysis of what is causing the problem correct? If not, what in fact is happening?

(3) Has anyone ever fixed their kernel to get around the problem?
    If there is a fix in 4.2, could someone please describe the basic idea behind the fix and point me to the appropriate part of the kernel? (I do have access to the 4.2 sources, by the way.)

(4) Does anyone know of a way to fix the application program(s) in question (process "B" in my example above -- the program that plays with the TTY modes) so that this problem will not occur, even assuming no changes to the 4.1 kernel?

Please, by the way -- no flames about why we aren't running 4.2. When the LOCUS project was started, 4.2 was naught but a gleam in Bill Joy's eye :-} -- and when 4.2 did finally come out, LOCUS was already in production, and the effort required to reimplement it on top of 4.2 would have been infeasibly monumental. As a result, comments of the form

    This was fixed in 4.2, so what's the problem?

are useful only to the extent that the fix can be back-ported into our existing, 4.1-based LOCUS system.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rich Wales
University of California, Los Angeles (UCLA)
Computer Science Department
3531 Boelter Hall
Los Angeles, California 90024 // USA
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Phone: (213) 825-5683 // +1 213 825 5683
ARPANET: wales@UCLA-LOCUS.ARPA
UUCP: ...!{cepu,ihnp4,trwspp,ucbvax}!ucla-cs!wales
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
lepreau@utah-cs.UUCP (Jay Lepreau) (02/12/85)
1. Yes, this problem exists on both 4.1 and 4.2 and I've seen it (in "ded").

2. Your analysis is pretty much what I came up with (but not quite so clearly!).

3. I wouldn't really call it a kernel problem, though, because:

4. I solved it in my application by disabling t_*suspc when in a funny tty mode in the child (B). This allows the child to read the ^Z itself, reset its tty modes, and then issue the TSTP to the pgrp, avoiding having the kernel issue the TSTP to the whole pgrp, with the attendant problems. (Note that this problem doesn't occur in raw mode for the same reason my solution works, so all those raw mode screen applications don't see it. Must be cbreak.)

Anyone have other ways?

Jay Lepreau, lepreau@utah-cs, {ihnp4,decvax}!utah-cs!lepreau
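
A minimal sketch of the approach in point 4, for readers who want to see its shape in code. It is not the code from "ded". It assumes the POSIX termios interface (which postdates this thread) rather than the 4.1BSD calls; on 4.1BSD the corresponding step would presumably be to disable the suspend characters in struct ltchars via the TIOCSLTC ioctl. The function names (make_editor_modes, is_suspend_key, suspend_from_editor) are hypothetical, and process "B" is assumed to leave SIGTSTP at its default action.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: derive B's "funny" modes from the saved ones.
 * Cbreak-like input, echo off, and NO suspend character, so the tty
 * driver never turns ^Z into a SIGTSTP for the whole process group. */
void make_editor_modes(const struct termios *saved, struct termios *funny)
{
    assert(saved != NULL && funny != NULL);
    *funny = *saved;
    funny->c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    funny->c_cc[VMIN] = 1;               /* return after one byte */
    funny->c_cc[VTIME] = 0;              /* no read timeout */
    funny->c_cc[VSUSP] = _POSIX_VDISABLE; /* ^Z now arrives as data */
}

/* Hypothetical helper: is c the suspend character the user had set
 * before B changed the modes? */
bool is_suspend_key(const struct termios *saved, unsigned char c)
{
    return saved->c_cc[VSUSP] != _POSIX_VDISABLE &&
           c == saved->c_cc[VSUSP];
}

/* Hypothetical helper: call when B's input loop reads the suspend key.
 * The modes are restored BEFORE any process in the group is stopped,
 * so the shell cannot regain the tty while the funny modes are set. */
int suspend_from_editor(int fd, const struct termios *saved,
                        const struct termios *funny)
{
    if (tcsetattr(fd, TCSADRAIN, saved) == -1)
        return -1;
    if (kill(0, SIGTSTP) == -1)          /* stops "A" and "B" together */
        return -1;
    /* Execution continues here after the user types "fg". */
    return tcsetattr(fd, TCSADRAIN, funny);
}
```

In use, "B" would save the modes with tcgetattr(), install the result of make_editor_modes() with tcsetattr(), and call suspend_from_editor() whenever is_suspend_key() is true for a byte it has read. The point of the design is ordering: since the driver no longer generates the signal, "A" cannot stop (and wake the shell) before "B" has put the modes back.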