grr@cbmvax.cbm.UUCP (George Robbins) (12/25/86)
In article <1269@ncc.UUCP> lyndon@ncc.UUCP (Lyndon Nerenberg) writes:
>
>This problem seems to be generic to Ultrix UUCP. I have a h*** of a time
>passing traffic to systems running it. We run CTIX (Sys_V), and I've also
>tried it from V7 without much luck. It does seem to talk to 4.2 quite
>nicely.
>
>Again, in our case the big problem seems to be TIMEOUTs. We can usually
>push between 20 and 40 packets across, then the sender (us) starts timing
>out and resending. After a while it just gives up.
>
>One day (in frustration), I compiled the code on a non-VAX machine and
>tried it. Same result, so it doesn't look like it's a hardware related
>problem.
>
>Like the man said, HELP!
>--
>Lyndon Nerenberg (VE6BBM) Systems Group - A Div. of Nexus Computing Corp.

I'm glad other people are wondering if ultrix has uucp problems. It's easy to
convince oneself that there is some kind of problem in ultrix uucp, it's a hell
of a lot harder to prove it.

It smells a lot like a problem in error recovery somewhere. We see the same
timeout symptom. Of course this sort of thing only manifests itself when your
phone lines get marginal to start with. I've also seen it on a direct
connection between our ultrix system and a box running SVr1 uucp.

I've tried the Ultrix 1.1 and 1.2 uucp's, no particular difference. I've
patched uucico to bump the retry count from 10 to 100+, not much help. I've
checked the object code against some of the reported 4.3 uucp bugs without
finding anything obvious.

Any other ultrix users whose neighbors are getting fed up with them?

Anybody with ultrix source who can do some diffs against 4.2 bsd uucp?

Anybody who has a program to analyze traces of uucp protocol exchanges?

I was hoping Fred Avolino would have a magic answer, but it looks like he's
not having problems...
--
George Robbins - now working for,     uucp: {ihnp4|seismo|rutgers}!cbmvax!grr
but no way officially representing    arpa: cbmvax!grr@seismo.css.GOV
Commodore, Engineering Department     fone: 215-431-9255 (only by moonlite)
grr@cbmvax.cbm.UUCP (George Robbins) (12/27/86)
In article <1176@cbmvax.cbmvax.cbm.UUCP> grr@cbmvax.UUCP (George Robbins) writes:
>I'm glad other people are wondering if ultrix has uucp problems. It's easy to
>convince oneself that there is some kind of problem in ultrix uucp, it's a hell
>of a lot harder to prove it.

Nothing like spending XMAS over a hot protocol analyzer...

Anyway, it looks like Ultrix does have a problem in the protocol that makes
certain kinds of errors result in a no-recovery situation.

=========

Here's what happens:

    Ultrix sends packet N.
    Other sends RR packet N to acknowledge.
    Ultrix sends a packet N+1.
    Packet N+1 falls in bit-bucket. (sync char gets trashed)
    Other times out.
    Other sends RR packet N to acknowledge last packet seen.
    Ultrix sees packet N already acknowledged - sends nothing!!!
    Other times out.
    Other sends RR packet N to acknowledge last packet seen.
    Ultrix sees packet N already acknowledged - sends nothing!!!
    .
    .
    .
    Other decides retry count exceeded - gives up.
    Ultrix times out - fatal.

==========

The other system can't do anything about packet N+1 because it has never seen
it. If it had seen it and there was an error in it, it could have rejected it,
but *only* if it recognized the bad packet.

Ultrix is happy because the other keeps sending it nice acknowledgment
packets, even if they don't say anything new. It doesn't time out or get
errors because they are such nice packets.

==========

Now the kicker - as far as I can tell this is the way both Berkeley and AT&T
unix play the game!

It looks pretty easy to fix, but am I missing something? Just add a test in
pksack that says: if you get an ack for a packet already acknowledged, and
you've already sent another packet, then set the retransmit flag.

Well, am I confused? Am I the 91st person to discover this problem this year?
Any uu.experts out there who want to comment?
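The stall traced above, and the effect of the suggested duplicate-ack test, can be played out in a few lines of C. This is only a toy model under stated assumptions: `struct sender`, `handle_rr` and `simulate_lost_packet` are made-up names, sequence numbers are plain increasing integers rather than the small wrapping counters the real 'g' protocol uses, and a retransmitted packet is assumed to arrive intact. It is not the pksack code from any real uucp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified sender state; not the real struct from pk0.c. */
struct sender {
    int  last_acked;         /* highest sequence number acknowledged so far */
    int  last_sent;          /* highest sequence number transmitted so far */
    bool rexmit_on_dup_ack;  /* true = apply the test suggested in the post */
};

/* Handle an RR (acknowledgment) for sequence number seq.
 * Returns true if the sender transmits something in response. */
static bool handle_rr(struct sender *s, int seq)
{
    assert(seq <= s->last_sent);   /* an ack for an unsent packet is a bug */

    if (seq > s->last_acked) {     /* new ack: window advances, sender goes on */
        s->last_acked = seq;
        return true;
    }
    /* Duplicate ack. With the suggested test, an outstanding packet is
     * retransmitted; without it, the sender stays silent. */
    if (s->rexmit_on_dup_ack && s->last_sent > s->last_acked)
        return true;
    return false;
}

/* Packet 1 is sent and acked, packet 2 is sent and lost. The receiver then
 * times out and re-sends RR 1 up to max_retries times.
 * Returns the number of receiver timeouts before the sender retransmits,
 * or -1 if the receiver exhausts its retries first. */
static int simulate_lost_packet(bool with_fix, int max_retries)
{
    struct sender s = { .last_acked = 0, .last_sent = 0,
                        .rexmit_on_dup_ack = with_fix };

    s.last_sent = 1;
    (void) handle_rr(&s, 1);       /* normal ack for packet 1 */
    s.last_sent = 2;               /* packet 2 goes into the bit-bucket */

    for (int timeouts = 1; timeouts <= max_retries; timeouts++) {
        if (handle_rr(&s, 1))      /* receiver re-acks the last packet seen */
            return timeouts;
    }
    return -1;
}
```

Calling `simulate_lost_packet(false, 10)` models the behaviour in the trace, and `simulate_lost_packet(true, 10)` models the suggested test; raising `max_retries` in the first case only delays the failure, which fits the earlier observation that bumping the retry count did not help.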
milo@ndmath.UUCP (12/29/86)
As long as everyone else is venting on Ultrix uucp... I had a serious problem
with a system here. Apparently the privileges of the version I had were somehow
messed up. I created all the uucp files and set the permissions on all the
directories as they are supposed to be (I have done several UUCP setups on
other machines), but the ULTRIX UUCP would error out on every file transfer,
claiming that the remote machine didn't have enough privileges. I got the
security features in the UUCP down as low as they would go and still had
trouble.

Anyone else have this problem? And how did you solve it?

Greg Corson
seismo!iuvax!kangaro!milo
rick@seismo.CSS.GOV (Rick Adams) (12/30/86)
Yes, it's a bug. It was fixed in 4.3BSD.

Here is a rough idea of how it was fixed (Jim Bloom found and fixed
this one). I'm not sure that the test for Reacks needs to wait for 4.
I think 2 would probably be adequate. However, Jim may know of a case
I don't.

As a general rule, the 4.3bsd 'g' protocol driver is in better
shape than ANY uucp available (including Honey DanBer, [gasp]). At
a minimum, it's at least readable (cryptic, but readable).

---rick

4.2BSD pk0.c:

      case RJ:
          pk->p_state |= RXMIT;
          pk->p_msg |= M_RR;
      case RR:
          pk->p_rpr = val;
!         if (pksack(pk)==0) {
!             WAKEUP(&pk->p_ps);
          }
          break;

4.3BSD pk0.c:

      case RJ:
          pk->p_state |= RXMIT;
          pk->p_msg |= M_RR;
+         pk->p_rpr = val;
+         (void) pksack(pk);
+         break;
      case RR:
          pk->p_rpr = val;
!         if (pk->p_rpr == pk->p_ps) {
!             DEBUG(9, "Reack count is %d\n", ++Reacks);
!             if (Reacks >= 4) {
!                 DEBUG(6, "Reack overflow on %d\n", val);
!                 pk->p_state |= RXMIT;
!                 pk->p_msg |= M_RR;
!                 Reacks = 0;
!             }
!         } else {
!             Reacks = 0;
!             (void) pksack(pk);
          }
          break;
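
For anyone who wants to experiment with the reack threshold (4 versus 2) without the full driver, the counting logic of the 4.3BSD RR case can be lifted into a standalone sketch. Everything here is an assumption made for illustration: `struct pk_sketch`, `sketch_rr`, and the `SK_RXMIT` / `SK_M_RR` bit values are invented stand-ins, the global `Reacks` is folded into the struct, the threshold is passed as a parameter, and the calls to `pksack` and `DEBUG` are left out. Only the comparison and the counter handling follow the posted code.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values; NOT the definitions from the real pk.h. */
#define SK_RXMIT 0x1
#define SK_M_RR  0x2

/* Hypothetical cut-down stand-in for the driver's packet state. */
struct pk_sketch {
    int p_rpr;    /* value carried by the most recent RR, as in the post */
    int p_ps;     /* value p_rpr is compared against, as in the post */
    int p_state;
    int p_msg;
    int reacks;   /* stands in for the global Reacks counter */
};

/* Mirror of the posted RR case. Returns true when the repeated-ack limit
 * is reached and the retransmit flags are set. */
static bool sketch_rr(struct pk_sketch *pk, int val, int limit)
{
    assert(limit > 0);

    pk->p_rpr = val;
    if (pk->p_rpr == pk->p_ps) {
        /* Same comparison as the posted code: count the repeat. */
        if (++pk->reacks >= limit) {
            pk->p_state |= SK_RXMIT;
            pk->p_msg |= SK_M_RR;
            pk->reacks = 0;
            return true;
        }
        return false;
    }
    /* Comparison failed: reset the counter. The real code also calls
     * pksack(pk) here, which this sketch omits. */
    pk->reacks = 0;
    return false;
}
```

With `limit` set to 4 the flags are raised on the fourth consecutive matching RR; with 2, on the second. Whether 2 is safe in the real driver is exactly the open question above, and this sketch does not settle it.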