[comp.dcom.modems] Trouble running uucp over trailblazer connection

reinier@parcom.nl (Reinier van den Born) (11/15/90)
I have a uucp vs. trailblazer problem that I would like some help with.
I have posted this to both comp.mail.uucp and comp.dcom.modems, but
unfortunately we don't receive the latter here so please reply by mail.
If anything interesting turns up I will post a summary.
I am not a regular reader of any of these groups so I apologize if this
problem has been discussed here before.

I am afraid things get a bit technical below, so I suppose it will only be 
interesting to uucp-gurus, trailblazer-lovers, and PK-addicts (:-). 
However I would like to hear from sites that have/had similar problems.
Therefore I will start with a superficial problem description before
getting into detail.

The configuration:
	- Sony NWS-1750 workstation running NEWS 3.2 (BSD4.3 variant)
	- TELEBIT TrailBlazer Plus running UUCP support
	- a dialin connection to hp4nl which is the uucp backbone for the
	  Netherlands (connects to TrailBlazers)

What happens:
	Almost any file transfer (> some kbytes) fails due to a checksum error
	happening at an apparently random place/time.

Conditions:
	This only happens when using the UUCP support of the TrailBlazer.
	Otherwise everything works just fine, except that the connection appears
	to be even slower than a 2400 baud modem.
	(As a result we currently use such a modem and are considering getting
	 rid of the TrailBlazer. ;-( )
	No difference in behaviour is noticeable when talking to the TrailBlazer
	at 9600 or 19200 baud.

Note:
	During the short period we had a Sony NWS-1850 workstation
	(which has a separate I/O processor and is therefore much faster)
	the problems occurred considerably less often.

------ stop reading here if you are not a guru, lover, or addict ---------

The problem lies in the packet (PK) communication protocol employed
by uucico and supported (whatever that means) by the trailblazer.
I gratefully used a protocol description by G.L. Chesson (Bell Labs) to
get as far as I have now. Terminology is taken from there.

I got my hands on some uucico sources and after some hacking I discovered
that the checksum error was caused by control packets appearing
at a random position in a data packet. After the data packet
uucico finds exactly n*6 bytes "noise" (where n is the number of control
packets present and 6 the size of such a packet) before recognising the
next data packet. Of course, sometimes the "noise" contains a DLE (^P),
which causes even more errors.
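
For readers without the protocol description at hand, here is a minimal
Python sketch of the 6-byte control packet. It assumes the header layout
DLE k c0 c1 C x, with k = 9 for control packets, a 16-bit check value of
0xAAAA minus C sent low byte first, and x the exclusive-or of the four
bytes before it. These rules are recalled from the protocol description,
not taken from the uucico sources; they do agree with the k, c0 and c1
values in the trace further down. The function names are made up for
this example.

```python
DLE = 0x10        # ^P, first byte of every packet header
CONTROL_K = 9     # k value marking a control packet (header only, no data)
MAGIC = 0xAAAA    # constant in the assumed control-packet checksum rule

def build_control_packet(c):
    """Return the 6 header bytes DLE k c0 c1 C x for control byte c."""
    check = (MAGIC - c) & 0xFFFF
    c0, c1 = check & 0xFF, check >> 8      # low byte first, then high byte
    x = CONTROL_K ^ c0 ^ c1 ^ c            # XOR of the four preceding bytes
    return bytes([DLE, CONTROL_K, c0, c1, c, x])

def is_control_packet(h):
    """True if h is exactly one well-formed 6-byte control packet."""
    if len(h) != 6 or h[0] != DLE or h[1] != CONTROL_K:
        return False
    return bytes(h) == build_control_packet(h[4])

rr = build_control_packet(0o47)            # the C value seen in the trace
print(" ".join(f"{b:02x}" for b in rr))    # 10 09 83 aa 27 07
```

Note that every control packet itself starts with a DLE, so a control
packet that straddles the end of a data segment leaves a DLE in the
trailing "noise".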

Uucico has some means to recover from transmission errors.
A checksum error causes uucico to request retransmission of the last packets.
However the result is exactly the same, i.e., the control packet(s)
appear again at exactly the same location(s). (However, when uucico is started
again to transmit the same file, the error will be found at another place
or may not occur at all.)

The misaligned control packet found is always an RR packet. The sequence
number of the last correctly received packet found in the RR packet is
consistent with that found in the data packets received. (So it isn't a 
packet originating from our machine that is inadvertently echoed)

So we get, for instance, something like this (C in octal, others hex)
	:
	receive C = 227
	send    C = 042
	receive C = 237
	send    C = 043
	receive C = 247 - checksum error
		control packet found: k =  0x9, c0 = 0x83, c1 = 0xaa, C = 047
	send    C = 023
		6 bytes noise
	receive C = 257
	receive C = 267
	receive C = 247 - checksum error
		control packet found: k =  0x9, c0 = 0x83, c1 = 0xaa, C = 047
	send    C = 023
		6 bytes noise
	:
	etc.
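
The C values in this trace can be decoded with a few lines of Python.
The sketch assumes the control byte is laid out as TT XXX YYY (2 + 3 + 3
bits), where for data packets XXX is the sequence number and YYY the
last correctly received sequence number, and for control packets
(TT = 0) XXX is the message type and YYY the sequence number. The type
numbers in the table are recalled from the protocol description and
should be checked against it; decode_c is a hypothetical helper.

```python
# Control message types (XXX field when TT == 0), as assumed above.
CONTROL_NAMES = {1: "CLOSE", 2: "RJ", 3: "SRJ", 4: "RR",
                 5: "INITC", 6: "INITB", 7: "INITA"}

def decode_c(c):
    """Split a control byte into its TT, XXX and YYY fields and describe it."""
    tt, xxx, yyy = (c >> 6) & 3, (c >> 3) & 7, c & 7
    if tt == 0:
        return f"{CONTROL_NAMES.get(xxx, '?')} ack={yyy}"
    kind = {1: "ALT", 2: "DATA", 3: "SHORT"}[tt]
    return f"{kind} seq={xxx} ack={yyy}"

for c in (0o227, 0o042, 0o237, 0o043, 0o247, 0o047, 0o023):
    print(f"{c:03o}  {decode_c(c)}")
```

Read this way, the incoming data packets carry sequence numbers 2, 3, 4
and acknowledge our packet 7; our own replies are RR 2, RR 3 and then
RJ 3 after the checksum error. The stray 047 is an RR for sequence 7,
which matches the acknowledgement field of the incoming data packets
and not any packet sent from this side.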

After some further hacking I had uucico remove the control
packet from the data packet. (This works only in some cases: the control
packet must not land in the header or end up partly in the noise after
the data part.) To my surprise communication continues normally after a
recovery.
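
The idea of removing a stray control packet can be sketched in Python as
follows. This is not the actual change made to uucico, only an
illustration under stated assumptions: the data segment and the trailing
noise are available as one byte string, a control packet has the form
DLE 9 c0 c1 C x with check value 0xAAAA minus C (low byte first) and x
the exclusive-or of the four bytes before it, and the stray packet lies
completely inside that byte string. All names are hypothetical.

```python
DLE, CONTROL_K, MAGIC = 0x10, 9, 0xAAAA

def looks_like_control(h):
    """True if the 6 bytes h form a self-consistent control packet."""
    if len(h) != 6 or h[0] != DLE or h[1] != CONTROL_K:
        return False
    k, c0, c1, c, x = h[1], h[2], h[3], h[4], h[5]
    return x == k ^ c0 ^ c1 ^ c and (c1 << 8 | c0) == (MAGIC - c) & 0xFFFF

def strip_control_packets(raw):
    """Return (data with embedded control packets removed, their C bytes)."""
    data, found = bytearray(), []
    i = 0
    while i < len(raw):
        chunk = raw[i:i + 6]
        if looks_like_control(chunk):
            found.append(chunk[4])     # remember the control byte
            i += 6                     # skip the whole stray packet
        else:
            data.append(raw[i])
            i += 1
    return bytes(data), found

payload = bytes(range(64))                            # stand-in for real data
stray = bytes([0x10, 0x09, 0x83, 0xAA, 0x27, 0x07])   # the RR from the trace
damaged = payload[:20] + stray + payload[20:]         # 64 data + 6 "noise"
repaired, controls = strip_control_packets(damaged)
print(repaired == payload, [oct(c) for c in controls])
```

Genuine data could by coincidence contain six bytes that pass this test,
so the repaired segment should still be checked against the checksum in
the data packet header before it is accepted.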

Considering these facts I come to the following conclusions:
- something is dead wrong here as those control packets should
  never end up in data packets.
- the fact that hp4nl and its trailblazers maintain a large number of
  fast uucp connections and never heard of such problems suggests that
  the problem doesn't come from there.
- the fact that if UUCP support is switched off, or if the error is
  corrected, normal communication is possible, suggests that
  the TrailBlazer is the bad guy.
- the fact that Sony workstations are uncommon and ours is probably the only
  one in the Netherlands connected to hp4nl suggests that the Sony itself
  causes the problem. Problems with the serial device drivers have been
  reported. However this is mere circumstantial evidence. The fact that the
  faster Sony had fewer problems, however, might be a stronger indication.

The only explanation I can think of at the moment is the following:
- hp4nl sends the RR packets synchronously with the data packets
- somehow one of the TrailBlazers doesn't recognize such a packet and passes
  it straight through (as it would with any data not fitting the protocol??)
  Thus it gets out of sync.
- it then ends up at a random place in the protocol buffers of our
  TrailBlazer.
- when our TrailBlazer receives the resend request from our uucico
  it just sends its buffers again, repeating exactly the same error.
- a faster Sony will service the TrailBlazer, and therefore hp4nl, faster and
  so cause fewer RR packets to be sent

I should note that I know little of TrailBlazers, especially the extent of
the protocol support, so this explanation may be way off.

For the time being I will try to make uucico recover from these errors
as well as possible. It seems difficult to repair them all, but, say,
98% seems feasible. This will still leave a reduced transmission speed
(uucico sleeps 1 second before continuing after an error), an occasional
disruption of the communication, and the bad taste of an improper solution.

So any suggestions as to the cause or a proper solution of the problem are
welcome.

Reinier van den Born
Parallel Computing
E-mail: reinier@parcom.nl
Phone:  +31-20-233274
S-nail: Postbus 16775, 1001 RG Amsterdam, The Netherlands

-- 
Reinier van den Born
Parallel Computing, Amsterdam.
+31-20-233274
E-mail: reinier@parcom.nl