das@eplunix.UUCP (David Steffens) (01/11/90)
This problem was found when porting a version of TCSH which works
under 4.3bsd (uVAX II, VAX750) and SunOS4.0.3 (SUN4) to RTU4.1A
on a Concurrent (nee Masscomp) SLS5450.  Compilation was done
in the bsd (as opposed to att) universe.  The problem appeared in
backquote evaluation -- echo `tty` would fail (nothing printed)
about 15-20% of the time.  Concurrent's own version of CSH always worked.

On every other system I tested, the attached program always prints:

        /dev/ttyXX
        SIGCHLD

Well, almost always.  On a SUN4 about 20% of the time it will print:

        SIGCHLD
        /dev/ttyXX

But the net result is always the same -- the parent reliably
reads and prints the message from its child.

Under RTU4.1A (bsd universe) on a Concurrent SLS5450, things are
quite different.  The following results are obtained in the
indicated proportions based on about 1000 repetitions:

        85%     /dev/ttyp0
                SIGCHLD

        10%     /dev/ttyp0
                SIGCHLD
                backq: read error -- Interrupted system call

         5%     SIGCHLD
                backq: read error -- Interrupted system call

        once    /dev/ttyp0
                SIGCHLD
                backq: wait -- Interrupted system call

It seems that the SIGCHLD from the child is occasionally interrupting
one of the two pipe reads in the parent.  If the second read is
interrupted, nothing is lost because the read loop handles EOF and
error similarly.  But if the first read is interrupted, the loop
terminates prematurely and the message is lost.

Lest you think this is just an academic exercise, this code is
intended to simulate 4.3bsd csh/sh.glob.c quite closely.
Instrumentation installed directly in tcsh/sh.glob.c gives similar results.

The RTU4.1A signal stuff appears to be a mish-mash of _three_
different implementations -- the original v7 implementation as
carried over to SYSV, an early 4.1bsd (!) implementation, and a
recent 4.2bsd implementation.  The latter is the one I'm _supposed_
to get when in the bsd universe.  The _behavior_ seems more like
the old v7 implementation, however!

Is it proper behavior for SIGCHLD to interrupt a pipe read?
Can this occur on other flavors of UNIX, or is RTU just plain brain-damaged?

Assuming no bug fix from Concurrent, there seem to be two work-arounds:

1) block SIGCHLD before entering the read loop; release after EOF.
2) On read error, retry the read if errno == EINTR (aka v7!)

Which is to be preferred?  Is there anything better?
advTHANKSance.

--------------------
/* backq.c -- simulate csh/tcsh backquote processing, DAS JAN-90 */
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>

char *whoami;

#ifdef masscomp
void exit();
#endif

#ifdef sun
void
#endif
catch()
{
        (void) printf("SIGCHLD\n");
}

main(argc, argv)
int argc;
char *argv[];
{
        int n, pid, pfd[2];
        char buf[BUFSIZ];
        extern char **environ;

        whoami = *argv++; --argc;
        (void) signal(SIGCHLD, catch);
        if (pipe(pfd) < 0)
                fatalperror("can't open pipe");
        if ((pid = fork()) < 0)
                fatalperror("can't fork");
        if (pid == 0) { /* child */
                (void) close(pfd[0]);
                if (pfd[1] != 1) {
                        (void) close(1);
                        (void) dup(pfd[1]);
                }
                (void) execle("/bin/tty", "tty", 0, environ);
                (void) execle("/usr/bin/tty", "tty", 0, environ);
                fatalperror("can't exec");
        }
        /* parent */
        (void) close(pfd[1]);
        if (pfd[0] != 0) {
                (void) close(0);
                (void) dup(pfd[0]);
        }
        do {
                n = read(0, buf, sizeof(buf));
                if (n < 0)
                        fatalperror("read error");
                else if (n > 0)
#ifdef masscomp
                        (void) write(1, buf, (unsigned)n);
#else
                        (void) write(1, buf, n);
#endif
        } while (n > 0);
        n = wait((union wait *)0);
        if (n < 0)
                fatalperror("wait");
        if (n != pid)
                (void) printf("%s: expecting pid %d, got pid %d\n",
                        whoami, pid, n);
        exit(0);
        /* NOTREACHED */
}

/* Print system error message and die. */
fatalperror(msg)
char *msg;
{
        extern int errno;
        extern char *sys_errlist[];
        char *syserr = sys_errlist[errno];

        (void) printf("%s: %s", whoami, msg);
        (void) printf(" -- %s\n", syserr);
        exit(1);
        /* NOTREACHED */
}
-- 
{harvard,mit-eddie,think}!eplunix!das   David Allan Steffens
243 Charles St., Boston, MA 02114       Eaton-Peabody Laboratory
(617) 573-3748                          Mass. Eye & Ear Infirmary
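
The two work-arounds listed in the post could be sketched roughly as
follows.  This is only a sketch under stated assumptions, not the
csh/tcsh code: the helper names read_retry and drain_blocking_sigchld
are hypothetical, and it assumes a POSIX system with sigprocmask()
rather than the older bsd sigblock()/sigsetmask() calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Work-around 2 (hypothetical helper): retry a read that failed only
 * because a signal arrived before any data was transferred.
 */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}

/*
 * Work-around 1 (hypothetical helper): block SIGCHLD, copy fd_in to
 * fd_out until EOF, then restore the previous signal mask.
 * Returns the number of bytes copied, or -1 on error.  A short write
 * is treated as an error to keep the sketch small.
 */
long drain_blocking_sigchld(int fd_in, int fd_out)
{
    sigset_t block, saved;
    char buf[512];
    ssize_t n;
    long total = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &saved) < 0)
        return -1;

    /* read_retry also covers signals other than SIGCHLD */
    while ((n = read_retry(fd_in, buf, sizeof buf)) > 0) {
        if (write(fd_out, buf, (size_t)n) != n) {
            n = -1;
            break;
        }
        total += n;
    }

    /* a SIGCHLD held while blocked is delivered once the mask is restored */
    (void) sigprocmask(SIG_SETMASK, &saved, NULL);
    return n < 0 ? -1 : total;
}
```

In backq.c, the read loop could call read_retry(0, buf, sizeof(buf))
in place of read(), or the whole loop could be replaced by
drain_blocking_sigchld(0, 1).  The two approaches are not exclusive;
the sketch combines them so that an interrupted read is retried
whichever signal caused it.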