[comp.unix.wizards] UUCP [ultrix] guru needed!

grr@cbmvax.cbm.UUCP (George Robbins) (12/25/86)

In article <1269@ncc.UUCP> lyndon@ncc.UUCP (Lyndon Nerenberg) writes:
>> 
>This problem seems to be generic to Ultrix UUCP. I have a h*** of a time
>passing traffic to systems running it. We run CTIX (Sys_V), and I've also
>tried it from V7 without much luck. It does seem to talk to 4.2 quite
>nicely.
>
>Again, in our case the big problem seems to be TIMEOUTs. We can usually
>push between 20 and 40 packets across, then the sender (us) starts timing
>out and resending. After a while it just gives up.
>
>One day (in frustration), I compiled the code on a non-VAX machine and
>tried it. Same result, so it doesn't look like it's a hardware related
>problem.
>
>Like the man said, HELP!
>-- 
>Lyndon Nerenberg (VE6BBM)      Systems Group - A Div. of Nexus Computing Corp.

I'm glad other people are wondering if ultrix has uucp problems.  It's easy to
convince oneself that there is some kind of problem in ultrix uucp, it's a hell
of a lot harder to prove it.

It smells a lot like a problem in error recovery somewhere.  We see the same
timeout symptom.  Of course this sort of thing only manifests itself when
your phone lines get marginal to start with.  I've also seen it on a direct
connection between our ultrix system and a box running SVr1 uucp.  

I've tried the Ultrix 1.1 and 1.2 uucp's, with no particular difference.  I've
patched uucico to bump the retry count from 10 to 100+, without much help.  I've
checked the object code against some of the reported 4.3 uucp bugs without
finding anything obvious.

Any other ultrix users whose neighbors are getting fed up with them?
Anybody with ultrix source who can do some diffs against 4.2 bsd uucp?
Anybody who has a program to analyze traces of uucp protocol exchanges?

I was hoping Fred Avolino would have a magic answer, but it looks like he's
not having problems...
-- 
George Robbins - now working for,	uucp: {ihnp4|seismo|rutgers}!cbmvax!grr
but no way officially representing	arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)

grr@cbmvax.cbm.UUCP (George Robbins) (12/27/86)

In article <1176@cbmvax.cbmvax.cbm.UUCP> grr@cbmvax.UUCP (George Robbins) writes:
>I'm glad other people are wondering if ultrix has uucp problems.  It's easy to
>convince oneself that there is some kind of problem in ultrix uucp, it's a hell
>of a lot harder to prove it.

Nothing like spending XMAS over a hot protocol analyzer...

Anyway, it looks like Ultrix does have a problem in the protocol that turns
certain kinds of errors into a no-recovery situation.
=========
Here's what happens:

Ultrix sends packet N.
Other sends RR packet N to acknowledge.
Ultrix sends packet N+1.
Packet N+1 falls in the bit bucket (sync char gets trashed).
Other times out.
Other sends RR packet N to acknowledge last packet seen.
Ultrix sees packet N already acknowledged - sends nothing!!!
Other times out.
Other sends RR packet N to acknowledge last packet seen.
Ultrix sees packet N already acknowledged - sends nothing!!!
.
.
.
Other decides retry count exceeded - gives up.
Ultrix times out - fatal.
==========
The other system can't do anything about packet N+1 because it's
never seen it.  If it had and there was an error in it, it could
have rejected it, but *only* if it recognizes the bad packet.

Ultrix is happy because the other keeps sending it nice acknowledgment
packets, even if they don't say anything new.  It doesn't time out or
get errors because they are such nice packets.
==========

Now the kicker - as far as I can tell this is the way both Berkeley and
AT&T unix play the game!  It looks pretty easy to fix, but am I missing
something?  Just add a test in pksack that says if you get an ack for
a packet already acknowledged, and you've already sent another packet,
then set the retransmit flag.

Well, am I confused?  Am I the 91st person to discover this problem this
year?  Any uu.experts out there who want to comment?


milo@ndmath.UUCP (12/29/86)

As long as everyone else is venting on Ultrix uucp...I had a serious problem
with a system here.

Apparently the privileges of the version I had were somehow messed up... I
created all the uucp files and set the permissions on all the directories as they are
supposed to be (I have done several UUCP setups on other machines), but the
ULTRIX UUCP would error out on every file transfer, claiming that the remote
machine didn't have enough privileges.  I turned the security features in the
UUCP down as low as they would go and still had trouble.

Anyone else have this problem?  And how did you solve it?

Greg Corson
seismo!iuvax!kangaro!milo

rick@seismo.CSS.GOV (Rick Adams) (12/30/86)

Yes, it's a bug. It was fixed in 4.3BSD. Here is a rough idea
of how it was fixed (Jim Bloom found and fixed this one).

I'm not sure that the test for Reacks needs to wait for 4. I think 2
would probably be adequate. However, Jim may know of a case I don't.

As a general rule, the 4.3BSD 'g' protocol driver is in better shape
than ANY uucp available (including Honey DanBer, [gasp]). At a minimum, it's
readable (cryptic, but readable).

---rick

4.2BSD pk0.c:
  	case RJ:
  		pk->p_state |= RXMIT;
  		pk->p_msg |= M_RR;
  	case RR:
  		pk->p_rpr = val;
! 		if (pksack(pk)==0) {
! 			WAKEUP(&pk->p_ps);
  		}
  		break;

4.3BSD pk0.c:
  	case RJ:
  		pk->p_state |= RXMIT;
  		pk->p_msg |= M_RR;
+ 		pk->p_rpr = val;
+ 		(void) pksack(pk);
+ 		break;
  	case RR:
  		pk->p_rpr = val;
! 		if (pk->p_rpr == pk->p_ps) {
! 			DEBUG(9, "Reack count is %d\n", ++Reacks);
! 			if (Reacks >= 4) {
! 				DEBUG(6, "Reack overflow on %d\n", val);
! 				pk->p_state |= RXMIT;
! 				pk->p_msg |= M_RR;
! 				Reacks = 0;
! 			}
! 		} else {
! 			Reacks = 0;
! 			(void) pksack(pk);
  		}
  		break;