kevin@perle.UUCP (Kevin Pickard) (03/16/91)
Pyramid 98XE (DualPort OSx 4.4)

We have been having some problems with uucico on our system for almost a
year and a half now. Pyramid has failed to find any solution to the
problem and refuses to look at it any further. Over this time they have
passed us from one technical support person to another--each time with
the same result. We have upgraded our OS, put in debug versions of the
code and provided reams of line traces and debug output. The problem
persists. Unfortunately our support with Pyramid was through a
third-party vendor and they no longer exist. Pyramid now feels they no
longer need to provide a solution for the problem. Hence we are
appealing to the collective knowledge of the net for some help.

What happens is that during a UUCP session with another host uucico gets
confused and does not respond to a message during a file transfer. It
goes into a recovery mode and gets really messed up and eventually the
other host gives up on us and drops the line. When the connection is
later re-established things continue normally with the failed file for a
while only to eventually fail again (sometimes on the same file,
sometimes on a following one). This repeats over and over again. All
data is eventually transferred but it takes many connections, a lot of
errors and a lot of time. This obviously lowers the throughput.

The problem occurs regardless of the type of other host (NCR, Bell, PC
and recently AT&T). The problem occurs regardless of the modem type
(Hayes, Telebit and US Robotics) although it is more pronounced with
higher speed modems. The problem is also more pronounced when the
Pyramid system is under load.

Through the addition of debug statements in the kernel and uucico it has
been shown that when the connection fails, the message that was not
replied to by uucico was received by uucico fully intact. The message
was traced coming out of the modem by using a Data Line Monitor and
uucico was modified to print out all received messages.
The message output matched and the message content itself was confirmed
to be correct.

The failure usually starts with uucico indicating 'pkcget: alarm 4001'
just after it gets the last byte of the message. It does not recover
from this and continues to get further such alarms (i.e. 'pkcget: alarm
7002', 'pkcget: alarm 10003', etc.). Pyramid has indicated that this is
some kind of timeout condition. But the message has been read in
completely when this occurs and there is no idle time on the line. And
uucico on the Pyramid does not see the message when it is then resent a
number of times.

I recently described this problem to someone I know at a neighbouring
site (hi Ron!). He said that this looked just like a problem he saw on
his Pyramid system back in 1986. Fortunately he had a source licence for
the UUCP code and he hacked a line out of the code and the problem went
away. Unfortunately he no longer has the Pyramid so we can not get a
copy of the hacked binary from him. To add insult to injury, Pyramid has
also refused to make the one line change to a version of the code for us
(we do not have source).

The change itself is in the file pk0.c and simply involves the removal
of a single line of code. When Ron made this change he said he did not
understand why it worked in his case, only that it did work. The change
was made in code he had on August 22, 1986 and was around line 393 in
pk0.c. The affected code is as follows (the removed line is marked <==):

        if (pk->p_state&RXMIT) {
                pk->p_nxtps = next[pk->p_rpr];          <==
        }
        x = pk->p_nxtps;
        bstate = pk->p_os[x];

I have no idea whether this change will work in our case.

If anyone recognizes this problem or has any idea as to how we can fix
it we would greatly appreciate hearing from you. We are currently at a
dead end.

Thanks.

-- 
Kevin Pickard
UUCP: ...!uunet!mnetor!perle!kevin
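
For readers without the source, the effect of the marked line can be
modelled in isolation. The following is only a sketch under stated
assumptions, not the Pyramid code: the struct layout, the value of
RXMIT, the contents of the next[] table (a successor table for 3-bit
sequence numbers) and the reading of p_rpr as "last packet the remote
acknowledged" are all guesses made for illustration. The
rewind_on_rxmit parameter is hypothetical and exists only so both
variants can be compared.

```c
#include <assert.h>

#define RXMIT 1   /* hypothetical flag value; the real one is in the UUCP headers */

/* Assumed successor table for sequence numbers 0..7. */
static const int next[8] = {1, 2, 3, 4, 5, 6, 7, 0};

/* Cut-down stand-in for the packet driver state (layout is an assumption). */
struct pack {
    int p_state;   /* state flags, RXMIT among them */
    int p_rpr;     /* assumed: last sequence number acknowledged by the remote */
    int p_nxtps;   /* next output slot the sender will look at */
};

/*
 * Return the output slot examined next.
 * rewind_on_rxmit = 1 models the code with the marked line present,
 * rewind_on_rxmit = 0 models the code with the line removed.
 */
static int pick_slot(struct pack *pk, int rewind_on_rxmit)
{
    assert(pk->p_rpr >= 0 && pk->p_rpr < 8);
    if ((pk->p_state & RXMIT) && rewind_on_rxmit) {
        /* Go back to the packet following the last acknowledged one. */
        pk->p_nxtps = next[pk->p_rpr];
    }
    return pk->p_nxtps;
}
```

In this model, with RXMIT set, p_rpr = 2 and p_nxtps = 5, pick_slot()
returns 3 when the line is present and 5 when it is removed. In other
words, under these assumptions the line rewinds the sender to the first
unacknowledged packet whenever a retransmit is flagged, and removing it
lets the sender carry on from where it was.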