[comp.sys.pyramid] Problems with ucb uucp.

duncan@comp.vuw.ac.nz (Duncan McEwan) (06/17/88)
Sorry about the length of this posting.  For some time now I have been
experiencing a puzzling problem with the ucb uucico, and it is very difficult
to explain briefly.

Sometimes when a site calls us, they log on successfully and manage to
negotiate the 'g' protocol, but then fail to synchronise the protocol, which
causes them to abort the call.

To illustrate what I mean, here is a sample from an AUDIT file of
one such call.

xxxxxx csvaxa (5/27-19:14-22718) DEBUG (Remote Enabled)
omsg <\020ROK\000>
Rmtname csvaxa, Role SLAVE,  Ifn - 0, Loginuser - xxxxxx
wmesg 'P' tfg
omsg <\020Ptfg\000>
rmesg - 'U' imsg sync<\020>got it
imsg input<Ug\000>got 2
got Ug
send 073
pkcget: alarm 4001
Alarm while reading SYN - RXMIT

[Repeated 7 times]

send 073
pkcget: alarm 22010
Alarm while reading SYN - RXMIT
send 073
pkgetpack failed after 10 tries
xxxxxx csvaxa (5/27-19:18-22718) FAILED (startup)

From my slight knowledge of the workings of uucp, the two sides are attempting
to exchange the INITA/INITB/INITC messages that comprise the synchronisation
phase of the 'g' protocol.
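
For anyone following the traces, here is a rough sketch of how the octal
values after "send" and "rec h->cntl" can be read. It is not taken from the
uucico source. It assumes the commonly documented 'g' protocol control byte
layout of TT XXX YYY (two type bits, then two three-bit fields), with control
packet types 1 to 7 meaning CLOSE, RJ, SRJ, RR, INITC, INITB and INITA. The
names decode_cntl and CONTROL_NAMES are made up for this illustration.

```python
# Sketch only: decode a 'g' protocol control byte as printed (in octal)
# in the AUDIT output. The TT XXX YYY layout is an assumption taken from
# published descriptions of the protocol, not from the OSx sources.

CONTROL_NAMES = {
    1: "CLOSE", 2: "RJ", 3: "SRJ", 4: "RR",
    5: "INITC", 6: "INITB", 7: "INITA",
}

DATA_KINDS = {1: "ALTCHAN", 2: "DATA", 3: "SHORTDATA"}


def decode_cntl(octal_text):
    """Return a readable description of one octal control byte."""
    byte = int(octal_text, 8)
    tt = (byte >> 6) & 0o3    # packet type bits
    xxx = (byte >> 3) & 0o7   # control message type, or sequence number
    yyy = byte & 0o7          # control value, or acknowledged sequence
    if tt == 0:
        name = CONTROL_NAMES.get(xxx, "UNKNOWN")
        return f"{name} {yyy}"
    return f"{DATA_KINDS[tt]} seq={xxx} ack={yyy}"


if __name__ == "__main__":
    # Values that appear in the AUDIT traces in this posting.
    for value in ("073", "077", "061", "053", "057",
                  "0210", "041", "0211", "040"):
        print(value, "->", decode_cntl(value))
```

Read this way, "send 073" in the failed call above is our INITA, and the
absence of any "rec h->cntl" line suggests our side never accepted an INITA
from the remote.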

This does not just happen with one site that calls us.  The one above is a VAX
running Ultrix, but I have also seen it occur with a System V machine, so I
don't think I can blame the remote system.  Also, I have seen a similar thing
happen once with another Pyramid that is trying to establish a connection to us
(more on that later).

At first I suspected noise on the phone line, but one time I had a line
monitor on the tty line to the modem when the problem occurred.  I was
able to see the data that the modem passed to our machine.  The sequence
I saw on the line monitor was as follows:

  - An INITA message was passed in both directions
  - An INITB message came from the remote end
  - After a timeout period we sent another INITA to the remote
  - After some other timeout period, we received another INITB from the remote.

This continued until one side gave up.  The messages all looked uncorrupted (I
partially decoded them -- though I must admit I did not check the checksums).
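
For completeness, the check on a captured control packet header could be
sketched as below. This rests on assumptions from published descriptions of
the 'g' protocol rather than on the OSx sources: a six-byte header of
DLE, K, C0, C1, C, X, with K equal to 9 for control packets, a 16-bit check
value of 0xAAAA minus the control byte stored low byte first, and X being the
XOR of K, C0, C1 and C. The function names are hypothetical.

```python
# Sketch only: validate a 6-byte 'g' protocol control packet header.
# The header layout and checksum rule are assumptions, as described above.

DLE = 0x10  # \020, the byte that starts every packet header


def build_control_header(cntl):
    """Build the header we would expect for a given control byte."""
    check = (0xAAAA - cntl) & 0xFFFF
    k, c0, c1 = 9, check & 0xFF, check >> 8
    return bytes([DLE, k, c0, c1, cntl, k ^ c0 ^ c1 ^ cntl])


def check_control_header(header):
    """Return a list of problems found in a captured header (empty if none)."""
    if len(header) != 6:
        return ["header is not 6 bytes long"]
    dle, k, c0, c1, cntl, xor = header
    problems = []
    if dle != DLE:
        problems.append("first byte is not DLE (020)")
    if k != 9:
        problems.append("K is not 9, so this is not a control packet")
    if xor != (k ^ c0 ^ c1 ^ cntl):
        problems.append("XOR byte mismatch")
    if (c0 | (c1 << 8)) != ((0xAAAA - cntl) & 0xFFFF):
        problems.append("checksum mismatch")
    return problems


if __name__ == "__main__":
    good = build_control_header(0o73)   # INITA with value 3
    print(check_control_header(good))
    damaged = bytes([good[0], good[1], good[2] ^ 0x01]) + good[3:]
    print(check_control_header(damaged))
```

Feeding the bytes seen on the line monitor through a check like this would
show whether the INITA and INITB messages really arrived intact.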

This suggests to me that the remote side saw our INITA, but our uucico never
saw theirs, which seems to indicate that the blame is ours.  Could this be a
hardware problem (ITP losing characters), or is it possible that uucico gets
itself confused and misinterprets characters?

One other fact to add to this strangeness.  The problem mostly seems
to happen immediately after the selection of the 'g' protocol.  But
in one case that I have an AUDIT file saved for (the case where another
Pyramid was trying to call us), the problem occurred *after* the uucicos
had negotiated the first file transfer.  In that case the (abbreviated)
AUDIT looks like this.

apmpyr apmpyr (5/13-11:49-21523) DEBUG (Remote Enabled)
omsg <\020ROK\000>
Rmtname apmpyr, Role SLAVE,  Ifn - 0, Loginuser - apmpyr
wmesg 'P' tfg
omsg <\020Ptfg\000>
rmesg - 'U' imsg sync<\020>got it
imsg input<Ug\000>got 2
got Ug
send 073
rec h->cntl 077
send 061
state - 01
rec h->cntl 061
send 053
state - 03
rec h->cntl 057
state - 010
Proto started g
Protocol Ug
apmpyr apmpyr (5/13-11:50-21523) OK (startup ttyi13 2400 baud)
*** TOP ***  -  role=SLAVE
rmesg - '^@' rec h->cntl 0210
send 041
got S D.apmpyrBoR72 D.apmpyrSoR72 lonam - D.apmpyrBoR72 0666
PROCESS: msg - S D.apmpyrBoR72 D.apmpyrSoR72 lonam - D.apmpyrBoR72 0666
SNDFILE:
apmpyr apmpyr (5/13-11:50-21523) REQUESTED (S D.apmpyrBoR72 D.apmpyrSoR72 lonam)
msg - S
expfile type - 0
chkpth ok Rmtname - apmpyr
wmesg 'S' Y
send 0211
rec h->cntl 040
Reack count is 1
state - 010
pkcget: alarm 4001
Alarm while reading SYN - RXMIT
send 0211
pkcget: alarm 7002

... etc.

Note the problem does not happen all the time, but I have been unable
to establish any correlation between when it does happen and factors
such as load average, number of logged on users, etc.

Finally, a few bits of information that may be relevant.

We are running OSx 4.0 on a Pyramid 90x with 1 ITP and 2 IOCs (Disk/Tape
and Ethernet).  Looking through our PTF log, I could not see any PTFs
relating to ucb uucp, so I assume we are still running the version that
was distributed with OSx 4.0.

If anyone has any clues as to what is going on, I will be happy to hear
from them.

Thanks in advance.

Duncan.