[comp.sys.concurrent] read on a pipe vs. SIGCHLD

das@harvard.harvard.edu (David Steffens) (02/12/90)

The following article was submitted to comp.unix.wizards a week or so ago.
As I have not yet had a response, I thought it might be worth posting
to comp.sys.masscomp since the problem was found on our Concurrent SLS5450.
----------
This problem was found while porting a version of TCSH,
which works under 4.3bsd (uVAX II, VAX750) and SunOS 4.0.3 (SUN4),
to RTU4.1A on a Concurrent (nee Masscomp) SLS5450.
Compilation was done in the bsd (as opposed to att) universe.

The problem appeared in backquote evaluation --
	echo `tty`
would fail (nothing printed) about 15-20% of the time.
Concurrent's own version of CSH always worked.

On every other system I tested, the attached program always prints:
	/dev/ttyXX
	SIGCHLD
Well, almost always.  On a SUN4 about 20% of the time it will print:
	SIGCHLD
	/dev/ttyXX
But the net result is always the same -- the parent reliably reads
and prints the message from its child.

Under RTU4.1A (bsd universe) on a Concurrent SLS5450,
things are quite different.  The following results are obtained
in the indicated proportions based on about 1000 repetitions:
85%	/dev/ttyp0
	SIGCHLD

10%	/dev/ttyp0
	SIGCHLD
	backq: read error -- Interrupted system call

5%	SIGCHLD
	backq: read error -- Interrupted system call

once	/dev/ttyp0
	SIGCHLD
	backq: wait -- Interrupted system call

It seems that the SIGCHLD from the child is occasionally interrupting
one of the parent's two pipe reads (the first normally returns the output
of tty, the second returns EOF).  If the second read is interrupted,
nothing is lost: the read loop stops on an error just as it does on EOF,
and by then the message has already been copied out.  But if the first
read is interrupted, the loop terminates before any data has been read
and the message is lost.  Lest you think this is just an academic exercise,
this code is intended to simulate 4.3bsd csh/sh.glob.c quite closely.
Instrumentation installed directly in tcsh/sh.glob.c gives similar results.
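A read loop that treats EINTR as "try again" rather than as the end of
input would avoid losing the message.  What follows is only a sketch,
assuming a POSIX-style read(), write() and errno; copy_until_eof is a
hypothetical helper name, not code taken from csh/sh.glob.c.

```c
#include <errno.h>
#include <stdio.h>      /* BUFSIZ */
#include <unistd.h>

/*
 * copy_until_eof is a hypothetical helper, not part of csh or backq.c.
 * It copies everything readable from fd `in` to fd `out` until EOF.
 * A read() or write() that fails with EINTR (for example because a
 * SIGCHLD handler ran) is retried, so a signal arriving before the
 * first byte no longer ends the loop early.  Returns the number of
 * bytes copied, or -1 on a genuine I/O error.
 */
long copy_until_eof(int in, int out)
{
    char buf[BUFSIZ];
    long total = 0;

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;              /* EOF: the writer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted, nothing lost: retry */
            return -1;                 /* a real read error */
        }

        /* Write what was read, allowing for short or interrupted writes. */
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            p += w;
            n -= w;
            total += w;
        }
    }
}
```

In backq.c the parent's do-while loop could then shrink to a single call,
copy_until_eof(0, 1), with fatalperror() called only if it returns -1.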

The RTU4.1A signal stuff appears to be a mish-mash of _three_ different
implementations -- the original v7 implementation as carried over to SYSV,
an early 4.1bsd (!) implementation, and a recent 4.2bsd implementation.
The latter is the one I'm _supposed_ to get when in the bsd universe.
The _behavior_ seems more like the old v7 implementation, however!
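As I understand it, the practical difference here is what happens to a
slow system call, such as a read on a pipe, when a caught signal arrives:
under v7 semantics the call fails with EINTR, while 4.2bsd restarts it
transparently, which would explain why the same tcsh works on 4.3bsd and
SunOS.  On a system that provides the POSIX sigaction() call (RTU4.1A may
not), the restarting behavior can be requested explicitly with SA_RESTART
rather than depending on which signal() you happen to get.  A minimal
sketch; on_sigchld, got_sigchld and install_sigchld_restart are
hypothetical names, and the handler only sets a flag because printf() is
not safe to call from a signal handler.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction() and SA_RESTART */
#include <signal.h>
#include <string.h>

/* Set by the handler; hypothetical name used only for this sketch. */
volatile sig_atomic_t got_sigchld = 0;

/* Record the signal; stdio is not async-signal-safe, so no printf here. */
void on_sigchld(int sig)
{
    (void) sig;
    got_sigchld = 1;
}

/*
 * Install on_sigchld for SIGCHLD and ask that system calls interrupted
 * by it be restarted (4.2bsd-style) instead of failing with EINTR
 * (v7-style).  Returns 0 on success, -1 on failure.
 */
int install_sigchld_restart(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}
```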

Is it proper behavior for SIGCHLD to interrupt a pipe read?
Can this occur on other flavors of UNIX, or is RTU just plain brain-damaged?
Assuming no bug fix from Concurrent, there seem to be two work-arounds:
	1) block SIGCHLD before entering the read loop; release after EOF.
	2) On read error, retry the read if errno == EINTR (aka v7!)
Which is to be preferred?  Is there anything better?  advTHANKSance.
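Work-around 1 might look something like the following sketch.  It
assumes POSIX sigprocmask(); on a 4.2bsd-style system the equivalent
calls would be sigblock(sigmask(SIGCHLD)) and sigsetmask().  The helper
name read_with_sigchld_blocked is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose sigprocmask() and friends */
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * read_with_sigchld_blocked is a hypothetical helper sketching work-around 1.
 * It reads from fd `in` into buf until EOF, error, or a full buffer, with
 * SIGCHLD blocked so that the child's exit cannot interrupt read().  Any
 * SIGCHLD that arrives meanwhile stays pending and is delivered when the
 * caller's signal mask is restored.  Returns bytes read, or -1 on error.
 */
long read_with_sigchld_blocked(int in, char *buf, size_t size)
{
    sigset_t block, old;
    size_t total = 0;
    ssize_t n = 0;
    int saved_errno;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;

    while (total < size) {
        n = read(in, buf + total, size - total);
        if (n <= 0)
            break;            /* 0 is EOF; < 0 is an error not caused by SIGCHLD */
        total += (size_t)n;
    }

    saved_errno = errno;      /* keep read()'s errno across sigprocmask() */
    (void) sigprocmask(SIG_SETMASK, &old, NULL);  /* pending SIGCHLD delivered here */
    errno = saved_errno;
    return n < 0 ? -1 : (long)total;
}
```

One difference from work-around 2: blocking holds off only SIGCHLD, so
another caught signal could still interrupt the read, whereas an EINTR
retry loop copes with any signal.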
----------
/* backq.c -- simulate csh/tcsh backquote processing, DAS JAN-90 */
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>

char *whoami;

#ifdef masscomp
void exit();
#endif

#ifdef sun
void
#endif
catch()
{
	(void) printf("SIGCHLD\n");
}

main(argc, argv)
	int argc;
	char *argv[];
{
	int n, pid, pfd[2];
	char buf[BUFSIZ];
	extern char **environ;

	whoami = *argv++;
	--argc;

	(void) signal(SIGCHLD, catch);

	if (pipe(pfd) < 0)
		fatalperror("can't open pipe");

	if ((pid = fork()) < 0)
		fatalperror("can't fork");

	if (pid == 0) { /* child */
		(void) close(pfd[0]);
		if (pfd[1] != 1) {
			(void) close(1);
			(void) dup(pfd[1]);
		}

		/* the argument list must end with a null pointer of type char * */
		(void) execle("/bin/tty", "tty", (char *)0, environ);
		(void) execle("/usr/bin/tty", "tty", (char *)0, environ);
		fatalperror("can't exec");
	}

	/* parent */
	(void) close(pfd[1]);
	if (pfd[0] != 0) {
		(void) close(0);
		(void) dup(pfd[0]);
	}

	do {
		n = read(0, buf, sizeof(buf));

		if (n < 0)
			fatalperror("read error");
		else if (n > 0)
#ifdef masscomp
			(void) write(1, buf, (unsigned)n);
#else
			(void) write(1, buf, n);
#endif
	} while (n > 0);

	n = wait((union wait *)0);
	if (n < 0)
		fatalperror("wait");
	if (n != pid)
		(void) printf("%s: expecting pid %d, got pid %d\n",
			whoami, pid, n);

	exit(0);
	/* NOTREACHED */
}

/* Print system error message and die. */
fatalperror(msg)
	char *msg;
{
	extern int errno;
	extern char *sys_errlist[];
	char *syserr = sys_errlist[errno];

	(void) printf("%s: %s", whoami, msg);
	(void) printf(" -- %s\n", syserr);
	exit(1);
	/* NOTREACHED */
}
----------
{harvard,mit-eddie,think}!eplunix!das	David Allan Steffens
243 Charles St., Boston, MA 02114	Eaton-Peabody Laboratory
(617) 573-3748				Mass. Eye & Ear Infirmary

Articles to: concurrent@soma.bcm.tmc.edu or uunet!soma.bcm.tmc.edu!concurrent
Administrative stuff: concurrent-request@soma.bcm.tmc.edu
Stan Barber, Moderator