root (11/19/82)
Our "uucico" program has a "bug" which is driving me up the wall. The problem is this: we are currently connected to 4 sites on the net. One of the links is reported to be very noisy, and unfortunately this is the link over which a huge volume of data is transferred. This combination of a noisy link and heavy data transfer causes the "uucico" process to hang intermittently on some error condition (this has never happened on the other three links). Yes, it simply hangs: it does not time out, and it does not exit.

The problem with debugging such a beast is that of recreating the error condition caused by the noisy line. Almost invariably it works when we are testing it. But when we put it back into production, it works for a few days, and then the problem starts to come back every now and then.

The failure pattern I have collected so far is as follows: it seems that it always fails in "SLAVE" mode, waiting for a file to be transferred from the other site. I do not have enough debugging output yet to locate where the process is hanging. (stderr is closed when it is started from "cron". To collect debugging output, I wrote a 4 line program which attaches stderr to "/tmp/uuLOG" and then does "execl("/usr/lib/uucp/uucico", "uucico", "-r1", "-x7",0);". Now it is a matter of waiting for it to fail again.)

Since "/tmp/uuLOG" has not recorded any failure yet, I can only make a guess based on the information in "../spool/uucp/LOGFILE". It seems that "uucico" usually hangs somewhere in the routine "pkread(S)" in pk0.c. In fact, I am suspicious of lines such as "while (pkaccept(pk) == 0) ;". Or is the "time out" done somewhere else?? (i.e. gio.c or pk1.c??)

Has anyone had a similar problem, and/or can anyone provide some information on this problem?? Please mail replies to ..!allegra!sbcs!andrew or ..!peri!sbcs!andrew

Andrew W. Chang
Dept. of Computer Science
State University of New York
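
For anyone who wants to try the same logging trick: below is a minimal sketch of the kind of wrapper described above. It is not the original 4 line program. It assumes a modern POSIX system (open() with O_CREAT and O_APPEND, and dup2()), and the function names attach_stderr and run_uucico_logged are made up for illustration. The uucico path and the "-r1" and "-x7" arguments are the ones quoted in the post.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: point file descriptor 2 (stderr) at `path`,
 * appending to any existing log. Returns 0 on success, -1 on failure. */
int attach_stderr(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* Under cron, descriptor 2 may be closed, so open() can hand back
     * descriptor 2 directly; in that case there is nothing to duplicate. */
    if (fd != STDERR_FILENO) {
        if (dup2(fd, STDERR_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}

/* Hypothetical helper: attach stderr to the log, then replace this
 * process with uucico. Returns -1 only if something went wrong. */
int run_uucico_logged(const char *logpath)
{
    if (attach_stderr(logpath) < 0)
        return -1;
    execl("/usr/lib/uucp/uucico", "uucico", "-r1", "-x7", (char *)0);
    return -1; /* execl() returns only on error */
}
```

A main() that simply does "return run_uucico_logged("/tmp/uuLOG") < 0;" would make this a drop-in program for the cron entry. Appending rather than truncating keeps the debugging output from earlier runs, which matters when the hang shows up only once every few days.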