[ont.uucp] help! uucp losing files

dave@lsuc.uucp (David Sherman) (01/27/88)

Our uucp seems to be losing files fairly often.  It's a v7
uucp, to which we have source and into which I've hacked in
most of the bug fixes posted to the net over the years.

Scenario: mail is sent to remote!whatever!whoever.  uucp constructs
three files:
	C.remoteA1234	- with the instructions
	D.remoteB1233	- with the mail
	D.lsucX4059	- with the "rmail whatever!whoever" details

Now, the mail is a largish file. (Doesn't have to be huge for
this to happen.)  During the transfer, uucico craps out with
	BAD READ (expected 'S' got FAIL)
where the S is sometimes a C.  OK, so maybe it's a phone line
problem.  It's not reproducible, though it happens several times
a day with different sites (we shuffle several Mb a day around).

So, it should try again next time, right?  But on the next call, I
get mail from uucp telling me "file D.remoteB1233 can't access" on
lsuc.  Yes indeedy, uucico is REMOVING the file even though the
transfer didn't succeed.

Anyone know why? I delved into cntrl.c, where there are numerous
calls to unlinkdf(), but as far as I can make out the code under
"case SNDFILE:" is correct.

Incidentally, this problem is affecting news as well; I suspect
some news batches may not be making it downstream from us for
the same reason.  It happens with numerous different sites, so
it's not the fault of any one site we talk to.

Any help would be greatly appreciated.

David Sherman
The Law Society of Upper Canada
(416) 947-3466
-- 
{ uunet!mnetor  pyramid!utai  decvax!utcsri  ihnp4!utzoo } !lsuc!dave

dave@lsuc.uucp (David Sherman) (03/07/88)

Back on January 26 I posted a plea for help. Our uucp was
regularly failing on connections to many sites; following
the BAD READ failure, I'd get "file xxx can't access" mail on
the next connection.

As it turns out, the problem affects v7-vintage uucicos when
they talk to 4.3BSD uucp sites.

After working through all the replies and suggestions, and getting
particularly useful help from kwlalonde@watmath and rick@uunet,
I have the answer.  It's a two-part answer.

If you are running a v7-vintage uucp, you want fix (1), and
your neighbours running 4.3BSD want fix (2).  If you are running
4.3BSD uucp yourself, you want fix (2).

(1) in v7 uucp sources, cntrl.c,

	if (msg[1] == 'Y') {
		...
		unlink(W_DFILE);
		RMESG(RQSTCMPT, msg);
		goto process;
	}

    Reverse the unlink and RMESG lines.  Reason:
    If the RMESG fails, the remote end may not have received W_DFILE,
    but by then the file is already gone and the transfer can't be
    retried.  Change it to call RMESG first, then do the unlink.
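
    The effect of the reordering in (1) can be sketched in isolation.
    This is a toy model, not the cntrl.c code: struct queued_file,
    read_completion(), remove_data_file() and both finish_send_*()
    functions are hypothetical stand-ins, and it assumes (as the
    explanation above implies) that a failed RMESG abandons the
    conversation and returns FAIL.

```c
/* Toy model of fix (1).  All names here are hypothetical stand-ins,
 * not the real uucp routines. */
#define SUCCESS 0
#define FAIL    (-1)

struct queued_file {
    int present;            /* 1 while the D. file is still in the spool */
};

/* Stand-in for RMESG: returns 0 if the remote's completion message
 * was read, nonzero if the read failed (the BAD READ case). */
static int read_completion(int line_ok)
{
    return line_ok ? 0 : 1;
}

/* Stand-in for the unlink of W_DFILE. */
static void remove_data_file(struct queued_file *q)
{
    q->present = 0;
}

/* Original order: remove the file, then wait for confirmation.
 * If the read fails, the file is already gone. */
int finish_send_original(struct queued_file *q, int line_ok)
{
    remove_data_file(q);
    if (read_completion(line_ok) != 0)
        return FAIL;
    return SUCCESS;
}

/* Fixed order: wait for confirmation, then remove the file.
 * If the read fails, the file stays queued for the next call. */
int finish_send_fixed(struct queued_file *q, int line_ok)
{
    if (read_completion(line_ok) != 0)
        return FAIL;
    remove_data_file(q);
    return SUCCESS;
}
```

    With line_ok set to 0, finish_send_original() returns FAIL and
    leaves present at 0, which corresponds to the "can't access" mail
    on the next call; finish_send_fixed() returns FAIL with present
    still 1, so the transfer can be retried.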

(2) From rick:
    Yes, there is a bug in the virgin 4.3bsd uucp in the 'g' protocol
    driver. It will produce the symptoms you describe.

*** pk1.c	Fri Nov  7 17:51:10 1986
--- ../nuucp/pk1.c	Sun Nov  2 21:12:49 1986
***************
*** 196,202 ****
  		return;
  	}
  	if (k && pksizes[k] == pk->p_rsize) {
! 		pk->p_rpr = (h->cntl >> 3) & MOD8;
  		pksack(pk);
  		bp = pk->p_ipool;
  		if (bp == NULL) {
--- 196,203 ----
  		return;
  	}
  	if (k && pksizes[k] == pk->p_rsize) {
! 		pk->p_rpr = h->cntl & MOD8;
! 		DEBUG(7, "end pksack 0%o\n", pk->p_rpr);
  		pksack(pk);
  		bp = pk->p_ipool;
  		if (bp == NULL) {
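
    A minimal sketch of what the one-line change in the diff does,
    assuming the usual description of the 'g' protocol control byte
    for data packets: two type bits, then a three-bit sequence
    number, then a three-bit acknowledgement in the low bits.  The
    helper names make_cntl(), cntl_seq() and cntl_ack() are
    hypothetical, and MOD8 is defined here on the assumption that it
    is the same three-bit mask pk1.c uses.

```c
/* Illustration only; these helpers are hypothetical, not from pk1.c. */
#define MOD8 07     /* three-bit mask, assumed to match pk1.c */

/* Build a control byte from type, sequence number and acknowledgement. */
unsigned int make_cntl(unsigned int type, unsigned int seq, unsigned int ack)
{
    return ((type & 03) << 6) | ((seq & MOD8) << 3) | (ack & MOD8);
}

/* What the unpatched line computes: the middle three bits,
 * i.e. the sender's own sequence number under the layout assumed here. */
unsigned int cntl_seq(unsigned int cntl)
{
    return (cntl >> 3) & MOD8;
}

/* What the patched line computes: the low three bits,
 * i.e. the acknowledgement under the layout assumed here. */
unsigned int cntl_ack(unsigned int cntl)
{
    return cntl & MOD8;
}
```

    For a control byte built as make_cntl(2, 5, 3), cntl_seq() gives 5
    and cntl_ack() gives 3.  Under this reading, the unpatched code
    stores the sequence field in p_rpr where the acknowledgement field
    belongs, so pksack() would act on the wrong packet number.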


When I posted the query, I thought the problem "had" to be us
because it appeared when talking to all kinds of other sites.
I've now prevailed upon all of those sites (maccs, mnetor, utflis,
sickkids and some others) to patch their uucico's, and the problem has
entirely disappeared.

If you're running 4.3BSD, I strongly recommend you implement Rick's fix.
(Just feed this article through patch -d /usr/src/cmd/uucp and recompile.)

David Sherman
The Law Society of Upper Canada
Toronto
-- 
{ uunet!mnetor  pyramid!utai  decvax!utcsri  ihnp4!utzoo } !lsuc!dave