[net.bugs.uucp] dual -> ihnp4 UUCP lossage on 10/17

fair@dual.UUCP (Erik E. Fair) (10/18/84)

Last night when ihnp4 called us, we had 17 messages queued for them.
Something was wrong with the UUCP spool directory on ihnp4 (I surmise
this from the LOGFILE and the nasty notes I got back from UUCP) and
ihnp4 rejected all attempts to send it stuff with ``access to remote
path/file denied.'' As a result, all 17 messages were dumped down the
proverbial bit bucket.  While the problem appears to have been fixed
relatively quickly (they called us again this morning, and all went
well; a tribute to Gary Murakami and the other people who run ihnp4),
17 people somewhere are going to wonder who dropped their mail on the
floor.

This is one of the reasons that UUCP is considered ``unreliable.''
When something goes wrong in the transport, there is no way to get
back to the original requestor if the transaction was mail, because
the mail program is not an active part of the actual transport.

Now for Peter Honeyman: since ihnp4 is running honey danber UUCP, the
question comes up about the UUCP spool directory and what UUCP should do
when it isn't accessible for some reason. Currently, when you try to
initiate a transfer, UUCP will check the path to see if it can write on
it (both through access(2) and the USERFILE). If the path is
inaccessible, UUCP will send an error code back to the requesting
remote, which will in turn delete the request and mail a note to
whoever it thinks is the requestor, saying the transfer failed. For actual
``uucp'' requests, this is reasonable. For mail (uux), it sucks.
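
As an illustration of the two-part check described above, here is a
minimal sketch, not the actual UUCP source. The names prefix_allowed()
and can_receive_into() are hypothetical, and the prefix list is only a
stand-in for the USERFILE, whose real format and matching rules are not
reproduced here. The access(2) call is the standard one.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the USERFILE lookup: returns 1 if `path`
 * begins with one of the allowed prefixes, 0 otherwise.  This is a
 * plain string-prefix match, an assumption made for illustration.
 */
int
prefix_allowed(const char *path, const char *const *prefixes, int nprefixes)
{
	int i;

	for (i = 0; i < nprefixes; i++) {
		size_t len = strlen(prefixes[i]);

		if (strncmp(path, prefixes[i], len) == 0)
			return 1;
	}
	return 0;
}

/*
 * Returns 1 only if both checks pass: the prefix list permits the
 * directory, and access(2) reports it writable for the real user ID
 * of the calling process.  A missing directory fails the second check.
 */
int
can_receive_into(const char *dir, const char *const *prefixes, int nprefixes)
{
	assert(dir != NULL);
	if (!prefix_allowed(dir, prefixes, nprefixes))
		return 0;
	return access(dir, W_OK) == 0;
}
```

When either check fails, the receiving side has nothing better to offer
than a refusal, which is why a damaged spool directory turns every
incoming request into an error.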

My inclination is to say that if the requesting side gets back an error
on path/file access, and the destination was the remote system's UUCP
spool directory, UUCP should:

	1) notify ``uucp'' by mail that a request failed.
		(i.e. put a human being in the loop)

	2) Keep the request around, on the assumption that the remote
		has a temporary problem (consequently trying again later)

Given this view of things, clearly my System V UUCP was at fault (for
deleting the requests). I'm curious, though: what would honey danber do
if the positions had been reversed?
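
The proposed policy can be sketched as a small decision function. This
is only an illustration under stated assumptions: on_access_denied(),
the enum names, and the REMOTE_SPOOL path are all hypothetical, and how
a real UUCP would recognize a spool-bound request is not shown.

```c
#include <assert.h>
#include <string.h>

enum action {
	DELETE_AND_NOTIFY_REQUESTOR,	/* the behaviour described above */
	KEEP_AND_NOTIFY_UUCP		/* the behaviour proposed above */
};

/* Assumed location of the remote spool directory; real sites vary. */
#define REMOTE_SPOOL "/usr/spool/uucp"

/*
 * Decide what to do with a request that the remote refused with a
 * path/file access error.  `dest` is the destination path named in
 * the request.
 */
enum action
on_access_denied(const char *dest)
{
	size_t n = strlen(REMOTE_SPOOL);

	assert(dest != NULL);
	/* Match the spool directory itself or anything beneath it. */
	if (strncmp(dest, REMOTE_SPOOL, n) == 0 &&
	    (dest[n] == '\0' || dest[n] == '/'))
		return KEEP_AND_NOTIFY_UUCP;
	return DELETE_AND_NOTIFY_REQUESTOR;
}
```

Under this sketch an ordinary ``uucp'' copy into a user's directory is
still deleted and reported to the requestor, while anything bound for
the remote spool stays queued for a later attempt.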

	Erik E. Fair	ucbvax!fair	fair@ucb-arpa.ARPA

	dual!fair@BERKELEY.ARPA
	{ihnp4,ucbvax,hplabs,decwrl,cbosgd,sun,nsc,apple,pyramid}!dual!fair
	Dual Systems Corporation, Berkeley, California

P.S.	This can also be considered notice that other systems might have
	had problems with UUCP to ihnp4 last night.

mp@allegra.UUCP (Mark Plotnick) (10/20/84)

Here's a quick and dirty modification to the 4.2bsd uucp that will save
failed mail files in /usr/spool/uucp/failed.  It came in handy when
ihnp4 and several other machines got their directory protections
screwed up and refused all file transfers.

*** cntrl.c.bsd	Sun Feb 19 10:41:49 1984
--- cntrl.c	Wed Apr 18 20:11:04 1984
***************
*** 607,612
  	sprintf(str, "file %s, system %s\n%s\n",
  		file, sys, msg);
  	mailst(user, str, "");
  	return;
  }
  

--- 607,615 -----
  	sprintf(str, "file %s, system %s\n%s\n",
  		file, sys, msg);
  	mailst(user, str, "");
+ 	/* mp: save failed files */
+ 	if (*msgcode == 'N' && strncmp(file, "D.", 2)==0)
+ 		xcp(file, "/usr/spool/uucp/failed");
  	return;
  }
  

gjm@ihnp4.UUCP (Gary J. Murakami) (10/21/84)

The problem was not the fault of ihnp4, nor of uucp on ihnp4.
The problem was my fault, and I apologize for the loss of mail.

Due to a heavier than usual load of mail on ihnp4, I turned on an
experimental "Distributed UUCP" to help drain the queue on ihnp4.  This
experiment runs on ihnp1, adding extra CPU cycles and device bandwidth
while using a Newcastle type of Distributed UNIX to access the spool on
ihnp4.

This works fine for master transfer of files out, but there are problems
with permissions after switching roles to slave.  With System V, an
intricate procedure for suid programs is necessary to create
directories.  The remote file server doesn't quite handle this
correctly.

This only ran for one pass at the queue on ihnp4, so the number of
systems affected should have been small (sorry, Erik).  However, I'll
find and fix the problem before using this again on real work.

-Gary