[net.bugs.uucp] Bizarre code in UUCP

mcg@tekecs.UUCP (06/14/83)

I just finished battling a strange problem with UUCP over 4.1a TCP/IP
sockets. It was caused by some code in 'cico.c' whose reasoning I could
not fully understand. (Jump to the end of the message to see the
questions and my fix.)


The problem is this:

	Symptoms:

	Two systems are talking to each other over a TCP/IP channel.
	When one system is lightly loaded and the other is heavily
	loaded, the LCK..sysname lock file is left behind on the loaded
	system, preventing further polling until it is manually
	removed.

	Cause:
	The following two pieces of code are the culprits:

	
	/* the very end of the mainline in cico.c ... */

	alarm(MAXMSGTIME);
	omsg('O', "OOOOO", Ofn);
	DEBUG(4, "send OO %d,", ret);
	if (!setjmp(Sjbuf)) {
		for (;;) {
			omsg('O', "OOOOO", Ofn);
			ret = imsg(msg, Ifn);
			if (ret != 0)
				break;
			if (msg[0] == 'O')
				break;
		}
	}
	alarm(0);


	There is a window between the two omsg() calls during which
	the lightly loaded system may have sent BOTH his omsg()'s
	and called imsg(). He (the lightly loaded system) gets the
	"OOOOOO" from the heavily loaded system from the first
	omsg() call, and exits, implicitly closing his end of the
	channel.

	In the meantime, the heavily loaded system has finally gotten
	around to executing the second omsg() call, and gets an error
	because there is nothing/nobody to write to. In 4.1A, writing
	to a socket which the other end has closed causes a SIGPIPE!

	UUCP doesn't catch SIGPIPE, and uucico dies suddenly, silently,
	and mysteriously, without a chance to clean up.
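
	For illustration, here is a minimal sketch of how a write to a
	vanished peer can be turned into an ordinary error return
	instead of a fatal signal. It assumes a modern POSIX system
	(whether 4.1a behaves identically is an assumption), and
	write_to_peer() is a hypothetical helper, not something found
	in cico.c.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of cico.c: write a message to a peer
 * that may already have closed its end of the channel.
 * Returns 0 on success, -1 if the write failed or was short
 * (errno is EPIPE when the peer has closed).
 */
int
write_to_peer(int fd, const char *msg)
{
	size_t len;

	assert(msg != NULL);
	len = strlen(msg);

	/* With SIGPIPE ignored, write() fails instead of killing us. */
	signal(SIGPIPE, SIG_IGN);

	if (write(fd, msg, len) != (ssize_t)len)
		return -1;
	return 0;
}
```

	With the signal ignored, the caller sees -1 and can still remove
	its lock file before exiting.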

	Second Problem:

	Assume that the above problem didn't occur, or was fixed.
	After the ending handshake, the routine cleanup() is called.
	There is some code in cleanup() as follows:

	cleanup(code)
	int code;
	{
		....
	/* toward the end of cleanup(), in cico.c */

		if (Role == MASTER) {
			write(Ofn, EOTMSG, strlen(EOTMSG));
		}

	There is the same problem here, i.e. uucico is writing to
	a neighbor who may very well be dead and gone. A SIGPIPE
	will occur here as well, if implicit delays have allowed the
	other side to actually close.


	THE QUESTION(S):

	1) Why do the two uucicos LOOP, sending "OOOOOO"s to each other?
	What's the point?

	2) Why is there an initial call to omsg(), when it is immediately
	called again, right before the imsg()? Is this necessary?

	3) In cleanup(), what is the real purpose of the EOTMSG?
	Is this intended to cause the other system to turn the line off?
	Is it really needed?

	My Fix:

	A bit of a kludge, I'm afraid. I did not want to change the code
	under normal circumstances, fearing I would introduce an
	unforeseen incompatibility with other uucicos. Thus, I merely
	conditionally execute the first omsg('O', "OOOOO"), and the
	write(..., EOTMSG), executing them ONLY on a NON-TCP/IP channel.
	This solved my problem.

	Also, to some extent this problem is caused by silly 4.1A
	sending SIGPIPE in these circumstances, which seems completely
	unreasonable.

Does anyone have any answers to my questions? My feeling is that it
is a historical (hysterical) accident.

S. McGeady
{decvax,ucbvax,zehntel}!tektronix!tekecs!mcg

P.S.:
	I am copying this to the unix-wizards and bugs.uucp lists.
	Feel free to reply to the list as well as me if you think others
	would be interested.