[news.software.b] concurrent execution of rnews and inews

root@corum.UUCP (System Administration) (09/11/88)

Sigh... I'm running News B2.11.14 on a relatively small Xenix system at OS
release 2.1.3.

I have News set up for spooling incoming news into .rnews for later unspooling
at non-peak hours in order to massively reduce the load on the system during
the day.  The batches are NOT compressed.

I call rnews -U from news' crontab at irregular intervals -- mostly during the
late night and early morning hours.  I also have an entry which generates
UUCP/USENET traffic reports every morning at 0730 (for the use of the
mn.traffic newsgroup).

The problem is that I can log on later in the day, do a ps -funews, and see
that, for example, an rnew 88* process has 5 seconds of CPU time on it and
has been sitting there since 0530!  Inews has also been sitting around with a
little bit of time on it since 0730.  If I just leave things the way they are,
they will simply sit there, with more inews processes appearing every day at
0730 until news runs out of processes (max count), the system is rebooted,
or I intervene by killing off either the rnews procs or the inews procs,
whichever occurs first.

The only thing I can really think of is that somehow the two processes (inews
and rnews) are locking each other out and are in some sort of race condition.
In fact, I was under the impression that a .rnews.lock file was supposed to
be generated and checked by these two programs, but during these trying times
(i.e., when the two procs are "racing"?) I haven't noticed any .rnews.lock
file sitting around.

I would really hate to have to either run inews manually or add code to the
shell that invokes it to check to see if rnews is running first; besides that
would only solve the problem of inews running if rnews is already running.  It
wouldn't alter rnews firing up after inews.  

Perhaps if I touch a .rnews.lock file, even though inews is supposed to do
this already?

So, what's a poor sysadmin to do?

Any suggestions?  (patches??)  (Telling me to get either c-news or news 3.0
won't help, since they are apparently unreleased as yet, unless I missed
the announcement.)

derek
-- 
Derek Terveer	root@corum.UUCP		..!clyde!lily!corum!root

gordon@sneaky.TANDY.COM (Gordon Burditt) (09/14/88)

News 2.11.14 has a deadlock problem on systems like Xenix where the file
locking is non-advisory (mandatory).  Xenix uses the locking() call in
place of lockf() (via a #define in a header file), and has LOCKF defined.

The situation:

"rnews -U" locks the lib/seq file while it is running, including the time
while it is waiting for children to finish.  "rnews -U" forks a 
"rnews -S -p <batch>" to run, then waits for it to finish.  That rnews
forks a child rnews to process individual articles, and a "compress -d"
if the batch happened to be compressed.

The child rnews may need to access the lib/seq file to post an article 
locally.  In particular, this happens if the incoming article is an "ihave",
and some of the articles in its list aren't present on the receiving system
yet, so it needs to generate a "sendme" article locally.  

"rnews -U" is waiting for parent "rnews -S" while holding lib/seq locked.
parent "rnews -S" is waiting for child "rnews -S".
child "rnews -S" is trying to access lib/seq, but it is blocked until 
	"rnews -U" lets go of its lock.

DEADLOCK!  Other processes will stack up behind this deadlock, making a mess 
to clean up, or eventually running the system out of swap space, process
slots, or open file table entries.
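
The wait chain above can be reproduced in miniature.  What follows is only a
sketch, not code from News: grandchild_blocked() is a hypothetical name, the
three processes are stand-ins for "rnews -U", parent "rnews -S" and child
"rnews -S", and it assumes a modern POSIX lockf(), which is advisory.  To avoid
actually hanging, the innermost process uses F_TLOCK and reports that it
would have blocked instead of waiting forever.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the innermost process finds the file
 * locked by its waiting ancestor (the deadlock condition), 0 if it could
 * lock it, and -1 on any other error. */
int grandchild_blocked(const char *path)
{
    int status, result = -1;
    pid_t pid;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    /* Stand-in for "rnews -U": lock the whole file, offset 0 onward. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1 || lockf(fd, F_TLOCK, 0) < 0) {
        close(fd);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Stand-in for parent "rnews -S": fork a worker and wait for it. */
        int gstatus;
        pid_t gpid = fork();

        if (gpid < 0)
            _exit(2);
        if (gpid == 0) {
            /* Stand-in for child "rnews -S": needs the start of the file. */
            int gfd = open(path, O_RDWR);

            if (gfd < 0)
                _exit(2);
            if (lockf(gfd, F_TLOCK, 1) == 0)
                _exit(0);               /* got the lock: no deadlock */
            _exit((errno == EAGAIN || errno == EACCES) ? 1 : 2);
        }
        if (waitpid(gpid, &gstatus, 0) < 0 || !WIFEXITED(gstatus))
            _exit(2);
        _exit(WEXITSTATUS(gstatus));
    }
    /* The lock holder waits for its descendants while still holding the lock. */
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) != 2)
        result = WEXITSTATUS(status);
    close(fd);                          /* closing the file drops the lock */
    return result;
}
```

Called on any small scratch file, this should return 1.  With a blocking
F_LOCK (or a mandatory lock and a plain read) in the innermost process, the
three processes would simply sit there, as described.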

Many systems may avoid ever seeing this by running expire at a time when
articles do not come in, thus never giving rnews -U any work to do.  

The fix:

This is a kludge.  This change causes "rnews -U" to lock only the portion of
the file after the first 512 bytes, instead of the whole file.  Since lib/seq 
is not likely to require more than 511 digits of article id number for quite 
some time, even on Portal, this will prevent the locking from interfering with
access to lib/seq, but it will still permit "rnews -U" to lock out another 
"rnews -U".  The file descriptor is not used for anything but locking, so no 
repositioning of the file pointer after the locking is required.  The fact
that the lib/seq file is well under 512 bytes long doesn't bother the 
locking() call at all.  

*** inews.old	Wed Aug 24 06:55:13 1988
--- inews.c	Mon Sep 12 23:12:20 1988
***************
*** 1435,1440 ****
--- 1435,1441 ----
  		xerror("opendir can't open .:%s", errmsg(errno));
  #ifdef	LOCKF
  	LockFd = xfopen(SEQFILE, "r+w");
+ 	lseek(fileno(LockFd), 512L, 0);
  	if (lockf(fileno(LockFd), F_TLOCK, 0L) < 0) {
  		if (errno != EAGAIN && errno != EACCES)
  #else	/* !LOCKF */
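
For anyone who wants to try the idea outside of inews, here is a minimal
stand-alone sketch of the same trick: seek to offset 512, then lock from there
to the end of the file.  The helper names try_lock_region() and
probe_in_child() are made up for illustration and are not part of News.  It
assumes a modern POSIX lockf(), which is advisory, so it only shows that the
two byte ranges do not collide; it does not show the read/write blocking that
Xenix locking() adds.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: try to lock len bytes starting at start, without
 * blocking.  len == 0 means "from start to end of file and beyond".
 * Returns 0 if locked, 1 if another process holds it, -1 on other errors. */
int try_lock_region(int fd, off_t start, off_t len)
{
    assert(start >= 0 && len >= 0);
    if (lseek(fd, start, SEEK_SET) == (off_t)-1)
        return -1;
    if (lockf(fd, F_TLOCK, len) == 0)
        return 0;
    return (errno == EAGAIN || errno == EACCES) ? 1 : -1;
}

/* Hypothetical helper: run try_lock_region() in a separate process, since
 * these locks are owned per process.  Returns 0 if the child got the lock,
 * 1 if it was busy, 2 on any error. */
int probe_in_child(const char *path, off_t start, off_t len)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return 2;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        int r = (fd < 0) ? -1 : try_lock_region(fd, start, len);

        _exit(r < 0 ? 2 : r);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return 2;
    return WEXITSTATUS(status);
}
```

With the tail locked in one process via try_lock_region(fd, 512, 0), a second
process should find probe_in_child(path, 512, 0) busy (the "rnews -U" versus
"rnews -U" case) while probe_in_child(path, 0, 512) still succeeds (the
sequence number itself stays reachable).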

Note that there are some additional "problems" associated with news used
with mandatory locking.  For example, if you try to fire up "rn" (or likely
just about any newsreader) while expire is running, it will hang until
expire finishes, because the active file is locked.  (You can SIGINT out 
of it, though).  I haven't decided whether this is a bug or a feature.

					Gordon L. Burditt
					...!ninja!sneaky!gordon