[net.bugs.uucp] UUCP checksum problems

guy@sun.uucp (Guy Harris) (04/25/85)

> > >We are encountering a very similar problem when our Pyramid (BSD4.2
> > >uucp) tries to converse with System V's (VME10 and EXORmacs using
> > >dialup or direct lines).  They *always* TIMEOUT.  We don't have this 
> > >problem when conversing with other BSD4.2s...  
> > 
> > I believe that this is a different problem.  When System V came out,
> > early Unisoft ports of every flavor had a bug in uucp, whereby they
> > could only talk to other Unisoft-System-V machines.  You have described...

In the first problem (Pyramid to VME10/EXORmacs), is the only symptom a
TIMEOUT at the end of the session - i.e., does everything else work
correctly?  If so, it's probably a long-standing bug in UUCP's hangup
sequence which only showed up with the 4.2BSD UUCP, because that UUCP is the
only one which bothers to note timeouts in the hangup sequence (with the
possible exception of howdy doody, I mean honey danber).  The bug is that
when the master runs out of work, it sends an HY, which the slave responds
to with another HY.  Unfortunately, the "if you get an HY, respond with an
HY" code is NOT conditional on being the slave; the master then sends
another HY back to the slave, which is expecting an OOOOOO from the master
instead.  Hence the timeout.  Unfortunately, either of the two possible
fixes (making the slave wait for the master's second HY, or making the
master not send the second HY) can cause timeouts on the other end, but hey,
that's their problem.  4.3BSD's UUCP changed the master so that it no longer
sends the second HY (having the slave expect the second HY causes timeouts
on your side if the other side doesn't send it).  I think honey danber also
put in the same
fix.  I don't know why, but although the S5 UUCP code seems to indicate that
it sends the second HY, I've seen timeouts with the 4.3 UUCP when talking to
S5 systems.  I think something else changed in the S5 hangup sequence so
that it causes timeouts.
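
To make the ordering concrete, here is a toy model of the exchange described
above.  It is only a sketch, not UUCP code: the function names and the
master_echoes_hy flag are made up for illustration, and the "timeout" is
modeled simply as the slave seeing something other than OOOOOO.

```c
#include <string.h>

/* Message names as used in the description above. */
#define MSG_HY "HY"
#define MSG_OO "OOOOOO"

/* What the master sends after it has received the slave's HY.
 * echo_hy != 0 models the old behaviour (answer any HY with an HY,
 * regardless of role); echo_hy == 0 models the fix where the master
 * does not send the second HY.  Hypothetical helper. */
const char *master_after_slave_hy(int echo_hy)
{
    return echo_hy ? MSG_HY : MSG_OO;
}

/* The slave has answered the first HY and now waits for OOOOOO.
 * Returns 1 if the hangup completes, 0 if the slave would time out. */
int slave_hangup_ok(const char *from_master)
{
    return strcmp(from_master, MSG_OO) == 0;
}

/* Steps: 1. master sends HY, 2. slave replies HY,
 *        3. master sends its next message, 4. slave checks it. */
int hangup_completes(int master_echoes_hy)
{
    const char *third = master_after_slave_hy(master_echoes_hy);
    return slave_hangup_ok(third);
}
```

With the flag set, the slave sees a second HY where it wanted OOOOOO and the
model reports a timeout; with it clear, the hangup completes.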

> I also believe this is the (same) different problem, but I believe it
> is the "notorious" bug in uucp's chksum routine (in pk0.c).  Almost all
> 68K ports have run into this, at least if they started with the MIT C
> compiler or other compilers that work correctly....

A while ago somebody sped up the VAX UUCP "g" protocol checksum routine by
rewriting it to declare some register variables as "int" or "long" rather
than "short", and then hand-tweaking the assembler code.  (The 4.xBSD
compiler doesn't put variables narrower than 32 bits in registers at all;
that is how it fixes a bug in the S3 VAX compiler, which generated "poor,
and sometimes incorrect, code" for such register variables.)  Similar
hand-tweaked protocol routines were done for the PDP-11 and 68000 (both of
which can do a 16-bit rotate best by using a code sequence which you can't
convince the Ritchie or MIT compilers to generate).  A side benefit of
assembler-language checksum routines is that they don't break if your
compiler handles casts differently... (that's how we solved the checksum
problem at CCI).
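
For anyone who would rather stay in C, the 16-bit rotate can be written with
fixed-width types and explicit masks so that it doesn't depend on how wide
"int" is.  The following is a sketch only: it is modeled on the general
rotate-and-add shape of the "g" protocol checksum as I understand it, the
names rotl16() and chksum16() are made up, and the starting value and order
of steps are assumptions, so don't treat it as wire-compatible with any
particular pk0.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 16-bit value left by one bit.  The mask and cast keep the
 * result to 16 bits whatever the width of int. */
uint16_t rotl16(uint16_t v)
{
    return (uint16_t)(((v << 1) | (v >> 15)) & 0xFFFFu);
}

/* Rotate-and-add checksum over n bytes (illustrative, see lead-in). */
uint16_t chksum16(const unsigned char *s, size_t n)
{
    uint16_t sum = 0xFFFFu;   /* assumed starting value */
    uint16_t x = 0;
    uint16_t t;

    while (n > 0) {
        sum = rotl16(sum);
        t = sum;
        /* add the next byte, masked to 8 bits, modulo 2^16 */
        sum = (uint16_t)((sum + (*s++ & 0xFFu)) & 0xFFFFu);
        /* fold the remaining count into a second accumulator */
        x = (uint16_t)((x + (sum ^ (uint16_t)n)) & 0xFFFFu);
        /* unsigned 16-bit compare: true when the add wrapped
         * or added nothing */
        if (sum <= t)
            sum ^= x;
        n--;
    }
    return sum;
}
```

Because every intermediate value is forced back to uint16_t, the comparison
is always between two 16-bit unsigned quantities, which is the part that
tends to go wrong when a compiler treats casts differently.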

	Guy Harris

wes@fritz.UUCP (Wes Chalfant) (05/04/85)

	Many Unisoft ports did have the problem described -- the bug was
in pk0.c (it came from some bad assumptions about unsigned casts and
comparisons).  The version of pk0.c distributed with Berkeley 4.2 contained
a chksum() routine that should be portable to almost anything.  There
are probably more casts and masks in that routine now than are really needed.
The System V rev 2.2 sources that we have for the VAX have a version of pk0.c
that differs from both the Version 7 one (which has the bug) and the 4.2 one
-- I haven't tried running it, but it looks like it should work.  Motorola
should be coming out
with a compatible version soon (if they haven't already) -- you should
probably contact them about getting the full release or at least the new
uucp, if you suspect that the chksum() routine is your problem.
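
To illustrate the kind of cast-and-compare assumption that can bite here
(I am not claiming this is the exact line that was wrong in anybody's
pk0.c), here is a small sketch.  The function names are made up, and it
assumes a machine where "int" is wider than 16 bits; on a 16-bit-int
machine the two functions agree.

```c
#include <stdint.h>

/* Cast a signed 16-bit value to plain "unsigned" before comparing.
 * The value is first sign-extended to int, so a negative sum becomes
 * a very large unsigned number on machines with 32-bit int. */
int wide_compare(int16_t sum, uint16_t t)
{
    return (unsigned)sum <= t;
}

/* Cast to a 16-bit unsigned type instead, so only the low 16 bits
 * take part in the comparison. */
int narrow_compare(int16_t sum, uint16_t t)
{
    return (uint16_t)sum <= t;
}
```

With sum holding the bit pattern 0x8000 and t equal to 0x9000,
narrow_compare() says "less than or equal" and wide_compare() says it
isn't, so code written for one behaviour computes a different checksum
under the other.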