root@corum.UUCP (System Administration) (09/11/88)
Sigh... I'm running News B2.11.14 on a relatively small Xenix system at OS release 2.1.3. I have News set up to spool incoming news into .rnews for later unspooling at non-peak hours, in order to massively reduce the load on the system during the day. The batches are NOT compressed. I call rnews -U from news' crontab at irregular intervals, mostly during the late night and early morning hours. I also have an entry which generates UUCP/USENET traffic reports every morning at 0730 (for the use of the mn.traffic newsgroup).

The problem is that I can log on later in the day and, doing a ps -funews, see that, for example, an rnew 88* process has 5 seconds of CPU time on it and has been sitting there since 0530! Inews has also been sitting around with a little bit of time on it since 0730. If I just leave things the way they are, they will simply sit there, with more inews processes appearing every day at 0730, until news runs out of processes (max count), the system is rebooted, or I intervene by killing off either the rnews procs or the inews procs, whichever occurs first.

The only thing I can really think of is that somehow the two processes (inews and rnews) are locking each other out and are in some sort of race condition. In fact, I was under the impression that a .rnews.lock file was supposed to be generated and checked by these two programs, but during these trying times (i.e., when the two procs are "racing"?) I haven't noticed any .rnews.lock file sitting around.

I would really hate to have to either run inews manually or add code to the shell that invokes it to check whether rnews is running first; besides, that would only solve the problem of inews starting while rnews is already running. It wouldn't stop rnews from firing up after inews. Perhaps if I touch a .rnews.lock file, even though inews is supposed to do this already?

So, what's a poor sysadmin to do? Any suggestions? (Patches??)

(Telling me to get either c-news or news 3.0 won't help, since they are apparently unreleased as of yet, unless I missed the announcement.)

derek
--
Derek Terveer	root@corum.UUCP	..!clyde!lily!corum!root

gordon@sneaky.TANDY.COM (Gordon Burditt) (09/14/88)
News 2.11.14 has a deadlock problem on systems like Xenix where the file locking is mandatory (non-advisory). Xenix uses the locking() call in place of lockf() (a #define in a header file), and has LOCKF defined.

The situation:

"rnews -U" locks the lib/seq file while it is running, including the time while it is waiting for children to finish. "rnews -U" forks a "rnews -S -p <batch>" to run, then waits for it to finish. That rnews forks a child rnews to process individual articles, and a "compress -d" if the batch happened to be compressed. The child rnews may need to access the lib/seq file to post an article locally. In particular, this happens if the incoming article is an "ihave" and some of the articles in its list aren't present on the receiving system yet, so it needs to generate a "sendme" article locally.

"rnews -U" is waiting for parent "rnews -S" while holding lib/seq locked.
Parent "rnews -S" is waiting for child "rnews -S".
Child "rnews -S" is trying to access lib/seq, but it is blocked until "rnews -U" lets go of its lock.

DEADLOCK!

Other processes will stack up behind this deadlock, making a mess to clean up, or eventually running the system out of swap space, process slots, or open file table entries.

Many systems may avoid ever seeing this by running expire at a time when articles do not come in, thus never giving rnews -U any work to do.

The fix:

This is a kludge. This change causes "rnews -U" to lock only the portion of the file after the first 512 bytes, instead of the whole file. Since lib/seq is not likely to require more than 511 digits of article ID number for quite some time, even on Portal, this will prevent the locking from interfering with access to lib/seq, but it will still permit "rnews -U" to lock out another "rnews -U". The file descriptor is not used for anything but locking, so no repositioning of the file pointer after the locking is required.
The fact that the lib/seq file is well under 512 bytes long doesn't bother the locking() call at all.

*** inews.old	Wed Aug 24 06:55:13 1988
--- inews.c	Mon Sep 12 23:12:20 1988
***************
*** 1435,1440 ****
--- 1435,1441 ----
  		xerror("opendir can't open .:%s", errmsg(errno));
  #ifdef	LOCKF
  	LockFd = xfopen(SEQFILE, "r+w");
+ 	lseek(fileno(LockFd), 512L, 0);
  	if (lockf(fileno(LockFd), F_TLOCK, 0L) < 0) {
  		if (errno != EAGAIN && errno != EACCES)
  #else	/* !LOCKF */

Note that there are some additional "problems" associated with news used with mandatory locking. For example, if you try to fire up "rn" (or likely just about any newsreader) while expire is running, it will hang until expire finishes, because the active file is locked. (You can SIGINT out of it, though.) I haven't decided whether this is a bug or a feature.

Gordon L. Burditt
...!ninja!sneaky!gordon