[news.sysadmin] Abysmal disk usage in unbatching news - suggestions?

duane@anasaz.UUCP (Duane Morse) (05/25/89)

Our NCR Tower 32/600 (Unix System 5.2) spends hours unbatching news.
We're running with patch level 17, spooling stuff to .rnews
(SPOOLNEWS is defined), and running rnews -U late at night.

Looking at the code, I find that a history file is scanned for each
message in a batch of messages (looking for duplicate postings). Even
though we only keep about 7 days' worth of news, this means the
history files (0 to 9) are each about 200K long; that's a lot of disk
to scan on a per-message basis!
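
Roughly, each check boils down to something like the following (a
simplified sketch, not the actual rnews source; the real code first
hashes the Message-ID to pick one of history.d/0 through history.d/9,
and the file name and ID shown are just examples):

#include <stdio.h>
#include <string.h>

/* Scan one history.d file for a Message-ID; 1 = already seen. */
int
seen_before(file, msgid)
char *file;			/* e.g. "/usr/lib/news/history.d/3" */
char *msgid;			/* e.g. "<123@somewhere.UUCP>" */
{
	FILE *fp;
	char line[BUFSIZ];
	int n = strlen(msgid);

	if ((fp = fopen(file, "r")) == NULL)
		return 0;
	while (fgets(line, sizeof line, fp) != NULL) {
		/* each history line begins with the Message-ID */
		if (strncmp(line, msgid, n) == 0) {
			fclose(fp);
			return 1;
		}
	}
	fclose(fp);
	return 0;
}

So every line of a 200K file gets read and compared for every incoming
article that hashes to that file.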

How do big System V sites deal with this? Does rnews -U run around
the clock there?

I'm considering modifying rnews and expire to use 101 files (00 through 99
and XX for the bizarre message IDs) instead of the basic 10. Does this
seem reasonable? Anybody have a better idea?
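
Something like this is what I have in mind for picking the file
(purely hypothetical; nothing like it exists in the distribution, and
the function name is made up):

/*
 * Hash a Message-ID into "00" .. "99"; IDs with nothing usable
 * to hash would fall into the separate "XX" file instead.
 */
void
history_bucket(msgid, name)
char *msgid;			/* e.g. "<123@somewhere.UUCP>" */
char *name;			/* filled in with "00" .. "99" */
{
	unsigned sum = 0;

	while (*msgid != '\0')
		sum += (unsigned char)*msgid++;
	name[0] = '0' + (sum / 10) % 10;
	name[1] = '0' + sum % 10;
	name[2] = '\0';
}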
-- 

Duane Morse	...{asuvax or mcdphx}!anasaz!duane
(602) 861-7609

karl@cheops.cis.ohio-state.edu (Karl Kleinpaste) (05/25/89)

duane@anasaz.UUCP writes:
   How do big System V sites deal with this? Does rnews -U run around
   the clock there?
   ... Anybody have a better idea?

A far better solution is to convince your SysV-compiled system to cope
with DBM files.  To do so, pick up copies of a dbm library from, e.g.,
the X distributions, or the once-posted mdbm library (which was my
solution).  Build yourself a /usr/lib/libdbm.a, add #define DBM to
defs.h and -DDBM and -ldbm to your Makefile, and remake the Known
Universe.  Re-install the result, and run expire -r to rebuild your
history files in DBM format.  I got at least a 4x throughput increase
in [ir]news by doing so, on a (now defunct) 3B2/400 running SysVRel3.0.
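
The win is that the per-article duplicate check becomes one keyed
fetch instead of a sequential scan.  Roughly (a sketch against the
classic dbm interface, not the actual news code; function name and
history path here are illustrative, and your library's header may be
named differently):

#include <string.h>
#include <dbm.h>		/* datum, dbminit(), fetch() */

/* 1 = Message-ID is already in the history dbm files. */
int
seen_before(histfile, msgid)
char *histfile;			/* e.g. "/usr/lib/news/history" */
char *msgid;
{
	datum key, val;

	if (dbminit(histfile) < 0)	/* opens histfile.dir and .pag */
		return 0;
	key.dptr = msgid;
	key.dsize = strlen(msgid) + 1;
	val = fetch(key);
	return val.dptr != NULL;
}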

The ancient cruft of SysV's /usr/lib/news/history.d is a crime against
Man, Nature, and Computer Science.

--Karl

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (05/26/89)

In article <26@anasaz.UUCP> duane@anasaz.UUCP (Duane Morse) writes:
>
>Looking at the code, I find that a history file is scanned for each
>message in a batch of messages (looking for duplicate postings). Even

>How do big System V sites deal with this? Does rnews -U run around
>the clock there?

Sure, use dbm().  If you don't have it, get dbz() (from me if you don't have
any other source).



-- 
  Jon Zeeff			zeeff@b-tech.ann-arbor.mi.us
  Ann Arbor, MI			sharkey!b-tech!zeeff

larry@focsys.UUCP (Larry Williamson) (05/27/89)

In article <9389@b-tech.ann-arbor.mi.us> Jon Zeeff writes:
>Sure, use dbm().  If you don't have it, get dbz() (from me if you don't have
>any other source).

Does anyone in or near Waterloo have a copy of either of these?

I scanned the archives on watmath, but I could not find any reference
to either package.

If it is too large to mail, then I can call your site directly.

-larry

-- 
Larry Williamson  -- Focus Systems -- Waterloo, Ontario
                  watmath!focsys!larry  (519) 746-4918

jerry@olivey.olivetti.com (Jerry Aguirre) (05/31/89)

In article <KARL.89May25110233@cheops.cis.ohio-state.edu> karl@cheops.cis.ohio-state.edu (Karl Kleinpaste) writes:
>solution).  Build yourself a /usr/lib/libdbm.a, add #define DBM to
>defs.h and -DDBM and -ldbm to your Makefile, and remake the Known
>Universe.  Re-install the result, and run expire -r to rebuild your
>history files in DBM format.  I got at least a 4x throughput increase
>in [ir]news by doing so, on a (now defunct) 3B2/400 running SysVRel3.0.

Actually you should only have to run expire -R (upper case).  This will
just rebuild the dbm copy of the history file from the text version and
is a lot faster.
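
In essence -R is just one pass over the text history, stuffing each
Message-ID into the dbm files.  A rough sketch (not the actual expire
source; the history path and the offset-valued datum are only
illustrative):

#include <stdio.h>
#include <string.h>
#include <dbm.h>		/* datum, dbminit(), store() */

/* Rebuild the dbm copy from the text history file. */
int
main()
{
	char line[BUFSIZ], *tab;
	long off;
	datum key, val;
	FILE *fp = fopen("/usr/lib/news/history", "r");

	if (fp == NULL || dbminit("/usr/lib/news/history") < 0)
		return 1;
	for (;;) {
		off = ftell(fp);	/* offset of this history line */
		if (fgets(line, sizeof line, fp) == NULL)
			break;
		if ((tab = strchr(line, '\t')) == NULL)
			continue;
		*tab = '\0';		/* key is the Message-ID field */
		key.dptr = line;
		key.dsize = strlen(line) + 1;
		val.dptr = (char *)&off;
		val.dsize = sizeof off;
		store(key, val);	/* value is the offset into history */
	}
	fclose(fp);
	return 0;
}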

The -r option will have to scan the entire set of news articles, parse
each file, and sort the output by date.  Depending on the amount of news
you keep, this can take many hours.

And, of course, you can delete the 0-9 files.