guy@sun.uucp (Guy Harris) (04/25/85)
> > >We are encountering a very similar problem when our Pyramid (BSD4.2
> > >uucp) tries to converse with System V's (VME10 and EXORmacs using
> > >dialup or direct lines). They *always* TIMEOUT. We don't have this
> > >problem when conversing with other BSD4.2s...
> >
> > I believe that this is a different problem. When System V came out,
> > early Unisoft ports of every flavor had a bug in uucp, whereby they
> > could only talk to other Unisoft-System-V machines. You have described...

In the first problem (Pyramid to VME10/EXORmacs), is the only symptom a
TIMEOUT at the end of the session - i.e., does everything else work
correctly? If so, it's probably a long-standing bug in UUCP's hangup
sequence which only showed up with the 4.2BSD UUCP, because that UUCP is
the only one which bothers to note timeouts in the hangup sequence (with
the possible exception of howdy doody, I mean honey danber).

The bug is that when the master runs out of work, it sends an HY, which
the slave responds to with another HY. Unfortunately, the "if you get an
HY, respond with an HY" code is NOT conditional on being the slave; the
master then sends another HY back to the slave, which is expecting an
OOOOOO from the master instead. Hence the timeout.

Unfortunately, either of the two possible fixes (making the slave wait
for the master's second HY, or making the master not send the second HY)
can cause timeouts on the other end, but hey, that's their problem.
4.3BSD's UUCP fixed the master not to send the second HY (having the
slave expect the second HY causes timeouts on your side if the other
side doesn't send it). I think honey danber also put in the same fix.

I don't know why, but although the S5 UUCP code seems to indicate that
it sends the second HY, I've seen timeouts with the 4.3 UUCP when talking
to S5 systems. I think something else changed in the S5 hangup sequence
so that it causes timeouts.
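
[Editor's sketch] The hangup exchange described above can be modelled
in a few lines of C. This is only a toy model of the message ordering as
the post describes it, not code from any UUCP source; the names
reply_to_hy, slave_gets_closing and role_checked are hypothetical and
exist only for illustration.

```c
#include <string.h>

enum role { MASTER, SLAVE };

/*
 * Hypothetical helper: the message a side sends after receiving "HY".
 * role_checked == 0 models the unconditional "echo HY" described above.
 * role_checked == 1 models a master that does not send a second HY.
 */
static const char *reply_to_hy(enum role who, int role_checked)
{
    if (who == MASTER && role_checked)
        return "OOOOOO";        /* master goes straight to the closing message */
    return "HY";                /* echo the HY back */
}

/*
 * Returns 1 if the slave receives the OOOOOO it is waiting for,
 * 0 if it receives something else (and would therefore time out).
 */
static int slave_gets_closing(int role_checked)
{
    /* The master has already sent its first HY; the slave answers it. */
    const char *from_slave = reply_to_hy(SLAVE, role_checked);
    /* The master now reacts to the slave's HY. */
    const char *from_master = reply_to_hy(MASTER, role_checked);

    return strcmp(from_slave, "HY") == 0 &&
           strcmp(from_master, "OOOOOO") == 0;
}
```

Calling slave_gets_closing(0) models the unconditional echo, where the
slave sees a second HY instead of OOOOOO; slave_gets_closing(1) models
the master-side change attributed above to 4.3BSD. The model leaves out
the other half of the trade-off: a peer that waits for the second HY
would time out against the changed master.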
> I also believe this is the (same) different problem, but I believe it
> is the "notorious" bug in uucp's chksum routine (in pk0.c). Almost all
> 68K ports have run into this, at least if they started with the MIT C
> compiler or other compilers that work correctly....

A while ago somebody sped up the VAX UUCP "g" protocol checksum routine
by rewriting it to declare some register variables as "int" or "long"
rather than "short" (because the 4.xBSD compiler fixes a bug in the S3
VAX compiler where it generated "poor, and sometimes incorrect, code" for
register variables with <32 bits by not putting such variables in
registers) and then hand-tweaking the assembler code. Similar
hand-tweaked protocol routines were done for the PDP-11 and 68000 (both
of which can do a 16-bit rotate best by using a code sequence which you
can't convince the Ritchie or MIT compilers to generate).

A side benefit of assembler-language checksum routines is that they
don't break if your compiler handles casts differently... (that's how we
solved the checksum problem at CCI).

	Guy Harris
wes@fritz.UUCP (Wes Chalfant) (05/04/85)
Many Unisoft ports did have the problem described -- the bug was in pk0.c (it dealt with some bad assumptions about unsigned casts and comparisons). The version of pk0.c distributed with Berkeley 4.2 contained a chksum() routine that should be portable to almost anything. There are probably more casts and masks in that routine now than are really needed. The System V rev 2.2 sources that we have for the VAX have a version of pk0.c that is different from both version 7 (the bug) and 4.2 -- I haven't tried running it but it looks like it should work. Motorola should be coming out with a compatible version soon (if they haven't already) -- you should probably contact them about getting the full release or at least the new uucp, if you suspect that the chksum() routine is your problem.
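
[Editor's sketch] To illustrate why casts and masks matter here, the
following is a minimal C sketch of a 16-bit rotate-and-add checksum in
the general style of the "g" protocol's chksum(). It is written from the
description in this thread, assuming C99 fixed-width types, and is not
the pk0.c source from any release; the names rot16_left1 and
chksum_sketch are hypothetical. Every intermediate value is unsigned and
masked to 16 bits, so the result should not depend on whether char or
short is signed or on how a given compiler widens them.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 16-bit value left by one bit using only unsigned arithmetic. */
static uint16_t rot16_left1(uint16_t v)
{
    uint32_t w = v;
    return (uint16_t)(((w << 1) | (w >> 15)) & 0xFFFFu);
}

/*
 * Hypothetical sketch of a rotate-and-add checksum over n bytes.
 * Not the actual pk0.c routine; all arithmetic is masked to 16 bits.
 */
static uint16_t chksum_sketch(const unsigned char *s, size_t n)
{
    uint16_t sum = 0xFFFFu;
    uint16_t x = 0;

    while (n > 0) {
        uint16_t t;

        sum = rot16_left1(sum);
        t = sum;
        /* Mask the byte so a sign-extended char cannot leak high bits. */
        sum = (uint16_t)((sum + (*s++ & 0xFFu)) & 0xFFFFu);
        x = (uint16_t)((x + (sum ^ (uint16_t)(n & 0xFFFFu))) & 0xFFFFu);
        /* Unsigned comparison: true when the addition wrapped or added 0. */
        if (sum <= t)
            sum ^= x;
        n--;
    }
    return sum;
}
```

The comparison against t is the step that an unsigned-versus-signed
mix-up would break: with a signed 16-bit sum, values of 0x8000 and above
compare as negative, and the branch is taken at different times on
different compilers.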