grr@cbmvax.UUCP (George Robbins) (01/28/88)
Now that we have this nice Ultrix group, I figure it's time to repost this
little Ultrix uucp discussion from last January. The problem described was
present in Ultrix 1.1 and 1.2. Someone from DEC (Marc T?) called me, and from
his comments it seemed that it was too late to get the fix into 2.0; I don't
really know that I even convinced him that there was a problem.

I'm still running 1.2, so I can't be any more authoritative about later
releases or verify that the patch at the tail end of this is applicable to
other releases. I'm hoping the rumors of 2.2 by Valentine's Day are true,
since I'm about ready to upgrade, although I'd have installed 4.3 BSD long
ago if only it came with DECnet and LAT support 8-)...

I've been running the patched version on cbmvax (a 750, later 785) for a year
with no particular problems, and it definitely changed things from absolutely
wretched to tolerable.

digested antiquities:

From: grr@cbmvax.cbm.UUCP (George Robbins)
Subject: Re: UUCP [ultrix] guru needed!
Date: 25 Dec 86 01:47:51 GMT

In article <1269@ncc.UUCP> lyndon@ncc.UUCP (Lyndon Nerenberg) writes:
>>
>This problem seems to be generic to Ultrix UUCP. I have a h*** of a time
>passing traffic to systems running it. We run CTIX (Sys_V), and I've also
>tried it from V7 without much luck. It does seem to talk to 4.2 quite
>nicely.
>
>Again, in our case the big problem seems to be TIMEOUTs. We can usually
>push between 20 and 40 packets across, then the sender (us) starts timing
>out and resending. After a while it just gives up.
>
>One day (in frustration), I compiled the code on a non-VAX machine and
>tried it. Same result, so it doesn't look like it's a hardware related
>problem.
>
>Like the man said, HELP!
>--
>Lyndon Nerenberg (VE6BBM) Systems Group - A Div. of Nexus Computing Corp.

I'm glad other people are wondering if ultrix has uucp problems.
It's easy to convince oneself that there is some kind of problem in ultrix
uucp; it's a hell of a lot harder to prove it. It smells a lot like a problem
in error recovery somewhere.

We see the same timeout symptom. Of course this sort of thing only manifests
itself when your phone lines get marginal to start with. I've also seen it on
a direct connection between our ultrix system and a box running SVr1 uucp.

I've tried the Ultrix 1.1 and 1.2 uucp's, no particular difference. I've
patched uucico to bump the retry count from 10 to 100+, not much help. I've
checked the object code against some of the reported 4.3 uucp bugs without
finding anything obvious.

From: grr@cbmvax.cbm.UUCP (George Robbins)
Subject: Re: UUCP [ultrix] guru needed!
Date: 27 Dec 86 07:31:47 GMT

Nothing like spending XMAS over a hot protocol analyzer...

Anyway, it looks like Ultrix does have a problem in the protocol that makes
certain kinds of errors result in a no-recovery situation.

=========
Here's what happens:

	Ultrix sends packet N.
	Other sends RR packet N to acknowledge.
	Ultrix sends a packet N+1.
	Packet N+1 falls in bit-bucket. (sync char gets trashed)
	Other times out.
	Other sends RR packet N to acknowledge last packet seen.
	Ultrix sees packet N already acknowledged - sends nothing!!!
	Other times out.
	Other sends RR packet N to acknowledge last packet seen.
	Ultrix sees packet N already acknowledged - sends nothing!!!
	.
	.
	Other decides retry count exceeded - gives up.
	Ultrix times out - fatal.
==========

The other system can't do anything about packet N+1 because it has never seen
it. If it had seen it and there was an error in it, it could have rejected it,
but *only* if it recognizes the bad packet.

Ultrix is happy because the other keeps sending it nice acknowledgment
packets, even if they don't say anything new. It doesn't time out or get
errors because they are such nice packets.
==========

Now the kicker - as far as I can tell this is the way both Berkeley and AT&T
unix play the game!
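The exchange above can be replayed as a toy model. This is only a sketch in Python, not taken from any uucp source: the names `simulate` and `retransmit_on_dup_ack` are hypothetical, the default of 10 retries is borrowed from the retry count mentioned earlier, and the model assumes that a retransmitted packet N+1 arrives intact. The flag stands for a sender that treats a repeated RR for an already-acknowledged packet as a cue to resend.

```python
def simulate(retransmit_on_dup_ack, max_retries=10):
    """Replay the lost-packet exchange; return (outcome, RRs sent by receiver)."""
    last_acked = 0          # sender: packet N, already acknowledged
    outstanding = 1         # sender: packet N+1, sent but lost on the line
    receiver_last_good = 0  # receiver: last packet it actually saw (N)
    rr_sent = 0

    for _ in range(max_retries):
        # Receiver times out waiting for data and re-acknowledges packet N.
        rr = receiver_last_good
        rr_sent += 1

        # Sender inspects the RR: is it a repeat while a packet is unacked?
        duplicate = (rr == last_acked) and (outstanding is not None)
        if duplicate and retransmit_on_dup_ack:
            # Resend N+1 (assumed to arrive intact this time).
            receiver_last_good = outstanding
            last_acked = outstanding
            outstanding = None
            return "recovered", rr_sent
        # Otherwise the sender sees nothing new and sends nothing.

    # Receiver exhausted its retries without ever seeing N+1.
    return "receiver gave up", rr_sent


if __name__ == "__main__":
    print(simulate(retransmit_on_dup_ack=False))
    print(simulate(retransmit_on_dup_ack=True))
```

With the flag off, the receiver burns through every retry without progress, which also suggests why raising the retry count alone would not help: the sender never has a reason to resend.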
It looks pretty easy to fix, but am I missing something? Just add a test in
pksack that says if you get an ack for a packet already acknowledged, and
you've already sent another packet, then set the retransmit flag.

From: rick@seismo.CSS.GOV (Rick Adams)
Subject: Re: UUCP [ultrix] guru needed!
Date: 29 Dec 86 21:18:03 GMT

Yes, it's a bug. It was fixed in 4.3BSD. Here is a rough idea of how it was
fixed (Jim Bloom found and fixed this one). I'm not sure that the test for
Reacks needs to wait for 4. I think 2 would probably be adequate. However,
Jim may know of a case I don't.

As a general rule, the 4.3bsd 'g' protocol driver is in better shape than ANY
uucp available (including Honey DanBer, [gasp]). At a minimum, it's at least
readable (cryptic, but readable).

<<<<< source fix omitted >>>>>

From: grr@cbmvax.cbm.UUCP (George Robbins)
Subject: Re: UUCP [ultrix] guru needed! [patch included]
Date: 7 Jan 87 08:44:53 GMT

Part of the problem is that some person at DEC changed the timeouts in pkcget
from 10 and 20 seconds to 25 and 30 seconds, in hopes of making things more
robust. This actually increased the probability of the remote system timing
out before the local system and encountering the underlying protocol problem.

The following is a simple patch, applicable to ultrix 1.1 and 1.2, to reset
the timeouts to the correct values:

1) you must be root (or the setuid bits will go away!)
2) uucico must not be running
3) you should know what you are doing...
4) copy the old uucico to some safe place
5) make sure the numbers match

Script started on Wed Jan 7 03:21:27 198
# cd /usr/lib/uucp
# cp uucico uucico.nopatch
# adb -w uucico
pkcget+57?x
_pkcget+57:	19d0
?w 0ad0
_pkcget+57:	19d0	=	ad0
pkcget+5c?x
_pkcget+5c:	1ed0
?w 14d0
_pkcget+5c:	1ed0	=	14d0
# ^d
script done on Wed Jan 7 03:24:39 198

This seems to have solved most of my problems, but I would be interested in
any reports or comments.
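For anyone checking that "the numbers match", the words shown by adb can be decoded by hand. The sketch below is in Python and is one reading of the bytes, not something stated in the patch itself: it assumes each word is the start of a VAX MOVL instruction (opcode d0) whose first operand is a short literal, and that adb prints 16-bit words in little-endian order, so the low byte is the opcode. The helper name `decode_timeout_word` is hypothetical.

```python
MOVL_OPCODE = 0xD0        # VAX MOVL (move longword) opcode
SHORT_LITERAL_MAX = 0x3F  # operand specifiers 0x00-0x3F encode literals 0-63


def decode_timeout_word(word):
    """Return the literal held in a 16-bit word as printed by adb's ?x.

    Assumes the word is a MOVL opcode followed by a short-literal operand.
    """
    opcode = word & 0xFF          # low byte comes first in memory on a VAX
    operand = (word >> 8) & 0xFF  # next byte is the first operand specifier
    if opcode != MOVL_OPCODE:
        raise ValueError("not a MOVL opcode: %#x" % opcode)
    if operand > SHORT_LITERAL_MAX:
        raise ValueError("operand is not a short literal: %#x" % operand)
    return operand


if __name__ == "__main__":
    for old, new in ((0x19D0, 0x0AD0), (0x1ED0, 0x14D0)):
        print("%04x -> %04x: %d s -> %d s" % (
            old, new, decode_timeout_word(old), decode_timeout_word(new)))
```

Under those assumptions the two patched words go from 25 to 10 and from 30 to 20, which agrees with the timeout values described above.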