duncan@comp.vuw.ac.nz (Duncan McEwan) (06/17/88)
Sorry about the length of this posting. For some time now I have been experiencing a puzzling problem with the ucb uucico, and it is difficult to explain briefly.

Sometimes when a site calls us, it logs in successfully and negotiates the 'g' protocol, but then fails to synchronise the protocol and aborts the call. To illustrate what I mean, here is a sample from an AUDIT file of one such call.

    xxxxxx csvaxa (5/27-19:14-22718) DEBUG (Remote Enabled)
    omsg <\020ROK\000>
    Rmtname csvaxa, Role SLAVE, Ifn - 0, Loginuser - xxxxxx
    wmesg 'P' tfg
    omsg <\020Ptfg\000>
    rmesg - 'U' imsg sync<\020>got it
    imsg input<Ug\000>got 2
    got Ug
    send 073
    pkcget: alarm 4001
    Alarm while reading SYN - RXMIT
    [Repeated 7 times]
    send 073
    pkcget: alarm 22010
    Alarm while reading SYN - RXMIT
    send 073
    pkgetpack failed after 10 tries
    xxxxxx csvaxa (5/27-19:18-22718) FAILED (startup)

From my slight knowledge of the workings of uucp, the two sides are attempting to exchange the INITA/INITB/INITC messages that make up the synchronisation phase of the 'g' protocol.

This does not happen with just one of the sites that call us. The one above is a VAX running Ultrix, but I have also seen it occur with some System V machine, so I don't think I can blame the remote system. I have also seen a similar thing happen once with another Pyramid that was trying to establish a connection to us (more on that later).

At first I suspected noise on the phone line, but on one occasion I had a line monitor on the tty line to the modem when the problem occurred, so I was able to see the data that the modem passed to our machine. The sequence I saw on the line monitor was as follows:

- An INITA message was passed in both directions
- An INITB message came from the remote end
- After a timeout period we sent another INITA to the remote
- After some other timeout period, we received another INITB from the remote.

This continued until one side gave up.

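As a rough aid to reading the AUDIT output and the line monitor captures, here is a minimal Python sketch. It assumes the commonly documented 'g' protocol layout: a control byte of the form TTXXXYYY, and a six-byte header (DLE, K, checksum low, checksum high, control, XOR) in which control packets use K = 9 and a checksum of 0xAAAA minus the control byte. The function and table names are made up for illustration and are not taken from the uucico source.

```python
# Sketch only: decode 'g' protocol control bytes as printed (in octal) in the
# AUDIT file, and check a control packet header seen on a line monitor.
# Assumes the commonly documented TTXXXYYY layout; all names are illustrative.

CONTROL_TYPES = {1: "CLOSE", 2: "RJ", 3: "SRJ", 4: "RR",
                 5: "INITC", 6: "INITB", 7: "INITA"}
PACKET_KINDS = {0: "control", 1: "alternate data", 2: "data", 3: "short data"}

DLE = 0o20          # every header starts with DLE (\020)
CONTROL_K = 9       # K value assumed for control packets


def decode_control_byte(octal_text):
    """Turn an octal string such as '073' into a readable description."""
    value = int(octal_text, 8)
    tt = (value >> 6) & 0o3    # packet kind
    xxx = (value >> 3) & 0o7   # control type, or sequence number for data
    yyy = value & 0o7          # control value, or acknowledged sequence number
    if tt == 0:
        return f"{CONTROL_TYPES.get(xxx, 'unknown')} value={yyy}"
    return f"{PACKET_KINDS[tt]} seq={xxx} ack={yyy}"


def control_header(control):
    """Build the six header bytes expected for a control packet."""
    check = (0xAAAA - control) & 0xFFFF
    c0, c1 = check & 0xFF, check >> 8
    xor = CONTROL_K ^ c0 ^ c1 ^ control
    return bytes([DLE, CONTROL_K, c0, c1, control, xor])


def header_is_consistent(header):
    """Check the XOR byte and the control packet checksum of a header."""
    if len(header) != 6 or header[0] != DLE:
        return False
    _, k, c0, c1, control, xor = header
    if xor != k ^ c0 ^ c1 ^ control:
        return False
    if k == CONTROL_K:
        return (c1 << 8 | c0) == (0xAAAA - control) & 0xFFFF
    return True   # data packet checksums cover the data and are not checked here


if __name__ == "__main__":
    for text in ("073", "061", "053"):
        print(text, "->", decode_control_byte(text))
    print(control_header(0o73).hex(" "))
```

Under these assumptions, the "send 073" in the log above decodes as an INITA carrying the value 3, and bytes copied from a line monitor can be passed to header_is_consistent() to see whether a captured control packet header is intact.
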
The messages all looked uncorrupted (I partially decoded them -- though I must admit I did not check the checksums). This suggests to me that the remote side saw our INITA but our uucico never saw theirs, which seems to indicate that the blame is ours. Could this be a hardware problem (the ITP losing characters), or is it possible that uucico gets itself confused and misinterprets characters?

One other fact adds to the strangeness. The problem mostly seems to happen immediately after the selection of the 'g' protocol. But in one case for which I have an AUDIT file saved (the case where another Pyramid was trying to call us), the problem occurred *after* the uucicos had negotiated the first file transfer. In that case the (abbreviated) AUDIT looks like this.

    apmpyr apmpyr (5/13-11:49-21523) DEBUG (Remote Enabled)
    omsg <\020ROK\000>
    Rmtname apmpyr, Role SLAVE, Ifn - 0, Loginuser - apmpyr
    wmesg 'P' tfg
    omsg <\020Ptfg\000>
    rmesg - 'U' imsg sync<\020>got it
    imsg input<Ug\000>got 2
    got Ug
    send 073
    rec h->cntl 077
    send 061
    state - 01
    rec h->cntl 061
    send 053
    state - 03
    rec h->cntl 057
    state - 010
    Proto started g
    Protocol Ug
    apmpyr apmpyr (5/13-11:50-21523) OK (startup ttyi13 2400 baud)
    *** TOP *** - role=SLAVE
    rmesg - '^@' rec h->cntl 0210
    send 041
    got S D.apmpyrBoR72 D.apmpyrSoR72 lonam - D.apmpyrBoR72 0666
    PROCESS: msg - S D.apmpyrBoR72 D.apmpyrSoR72 lonam - D.apmpyrBoR72 0666
    SNDFILE:
    apmpyr apmpyr (5/13-11:50-21523) REQUESTED (S D.apmpyrBoR72 D.apmpyrSoR72 lonam)
    msg - S
    expfile type - 0
    chkpth ok Rmtname - apmpyr
    wmesg 'S' Y
    send 0211
    rec h->cntl 040
    Reack count is 1
    state - 010
    pkcget: alarm 4001
    Alarm while reading SYN - RXMIT
    send 0211
    pkcget: alarm 7002
    ... etc.

Note that the problem does not happen all the time, but I have been unable to establish any correlation between when it does happen and factors such as load average, number of logged-on users, etc.

Finally, a few bits of information that may be relevant.

We are running OSx 4.0 on a Pyramid 90x with 1 ITP and 2 IOCs (Disk/Tape and Ethernet). Looking through our PTF log, I could not see any PTFs relating to ucb uucp, so I assume we are still running the version that was distributed with OSx 4.0.

If anyone has any clues as to what is going on, I will be happy to hear from them. Thanks in advance.

Duncan.