[net.unix-wizards] unbatcher

croft@su-safe.ARPA (04/21/86)

From: Bill Croft <croft@su-safe.ARPA>

Right, that IS what we've been doing, this is the crontab line:

15 * * * * /bin/su news < /usr/lib/news/getnews.sh

(I momentarily forgot how cron interprets the first field;  you're
right, it's XX:15 only, not XX:00, XX:15, XX:30, and XX:45.)
So it's still a mystery why the unbatchers themselves would hang.
I once mailed you a short news input file that I think caused the
very first logjam that we experienced.  I'll see if I can find it
in my archives.
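
(For anyone else tripped up by the same field: the minute field takes a
single value or a comma list, so hourly versus quarter-hourly looks like
this.  The command is the one from the crontab line above; a sketch, not
a recommendation either way.)

```
# Minute field "15" fires once per hour, at XX:15:
15 * * * * /bin/su news < /usr/lib/news/getnews.sh

# A comma list in the minute field fires every quarter hour:
0,15,30,45 * * * * /bin/su news < /usr/lib/news/getnews.sh
```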

mogul@su-gregorio.arpa (04/21/86)

From: Jeffrey Mogul <mogul@su-gregorio.arpa>

I think I know why the unbatcher hangs.  [Note: the consequence of this
is that after a few hours, there are so many unbatchers running that the
"news" user is not allowed to run any more processes.  The way "rcp"
works is to cause a shell to be exec'ed at the destination end, the
shell forks a copy of "rcp", and apparently hangs at that point
because the system won't allow "news" to fork any more processes.
I suspect the shell either doesn't check the return from "fork()",
or more likely is busy-waiting until fork() works.  One note: if you
say in /etc/passwd that news uses csh, not sh (the default), then
I think your connection doesn't hang.  But I'm not positive.]

Anyway, back to unbatch hanging: what happens is that unbatch causes
/usr/bin/rnews to run.  That script renices itself to +10; I think
/usr/lib/rnews does the same thing because when I took that line out
of the script, rnews still ran at nice 10.

On navajo, at least, we have users who run compute-bound jobs
that last for days (last week, one turkey ran the same program 5
times in parallel).  They start at nice 0 and the system demotes
them to nice 4 after a while, but the nice 10 rnews never gets
anywhere.  If you renice the rnews to 0, or renice the compute-bound
jobs to 11, then rnews runs fine and things progress.  Alas, since
the rnews program is execed for every message, the former solution
(renicing rnews) is impractical because you have to do it once
per message.

I think rnews shouldn't renice itself.  If you are worried about
overload, have unbatch exit if the load is above some threshold.
The current situation leads to extreme resource stress.

Also, I think the news distribution setup needs better flow control.
Glacier creates a copy of each article that is batched but not
yet delivered to all receiving sites.  Since some sites (e.g.,
ISL) are often down for days, this means that Glacier can potentially
have double copies of several days of new news.  If, in addition,
Glacier has recently been reconnected to DECWRL after a few days of
that link being down, the resulting pulse of new news can soak
up >10Mb of disk space.  The batching script on Glacier checks
for space before creating the batch, but if things constipate then
that space is permanently tied up, and the 1500K yellow zone
can be swamped by a day or two of incoming news.

I think we are soon going to be spending more time processing news
than it deserves.  I hope Greg can get the bugs out of his NFS
kernel soon; then we should try having a few uVaxen (one in CIS,
one in MJH, one at SUMEX?) maintain the news, and everyone else
just remote-mount /usr/spool/news so that the timesharing hosts
don't waste their time or disk space on this.

-Jeff

mac@tflop.UUCP (04/23/86)

In article <387@Shasta.ARPA> mogul@su-gregorio.arpa writes:
>From: Jeffrey Mogul <mogul@su-gregorio.arpa>
>
>I think I know why the unbatcher hangs.  [Note: the consequence of this
>is that after a few hours, there are so many unbatchers running that the
>"news" user is not allowed to run any more processes.  The way "rcp"
>works is to cause a shell to be exec'ed at the destination end, the
>shell forks a copy of "rcp", and apparently hangs at that point
>because the system won't allow "news" to fork any more processes.
>I suspect the shell either doesn't check the return from "fork()",
>or more likely is busy-waiting until fork() works.  
>-Jeff

	It seems you guys are etherneted together ('rcp').

	You talk about NFS and mounting /usr/spool/news on all your machines.

	Why not use NNTP (Network News Transfer Protocol)?
	This protocol allows you to run rn on a remote machine, which fetches
	each article and the other files it needs (such as active) as required.
	To the user on the remote machine, it appears as if the news is local,
	albeit somewhat delayed, although not noticeably.  Certainly this would
	increase the utilization of your ethernet, but only as much as NFS would.
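
	(A sketch of what an NNTP session looks like on the wire; the
	response codes follow RFC 977, but the group counts and article
	number below are invented:)

```
S: 200 news server ready - posting allowed
C: GROUP net.unix-wizards
S: 211 120 1 120 net.unix-wizards
C: ARTICLE 120
S: 220 120 <387@Shasta.ARPA> article retrieved - head and body follow
S: ...headers, a blank line, then the body...
S: .
C: QUIT
S: 205 closing connection
```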

	It is available via anonymous FTP from bezerkely, as nntp.tar.

-- 
---------------------------------+--------------------------------------------
| Michael Mc Namara              | Let the words be yours, I'm done with mine.
| dual!vecpyr!tflop!mac          | May your life proceed by its own design.
---------------------------------+--------------------------------------------