[news.software.b] C expire - deferring history rebuild

jef@ace.ee.lbl.gov (Jef Poskanzer) (04/12/90)

We run C expire with B news.  The rebuild phase of the expire seems
to take a long time: more than half of the total run.  It occurred to me that deferring
the full rebuild until, say, every other Sunday morning might be a good
thing, if we could be sure that the only problem with the interim
history files was that they were larger than they need to be.

C expire comes with the following flag:

     -r        suppress history rebuild.  Mostly for emergencies.
               (This leaves the history file out of date and
               larger than necessary, but improves speed and
               eliminates the need for several megabytes of
               temporary storage.)

What this does is set the "rebuild" boolean to 0, and that means that
(a) no linking and unlinking of history.n* and history.o* gets done,
and (b) no dbm hacking gets done.  Basically, the history files don't
get modified at all.  So yes, this leaves the history files out of date
with respect to the news spool.

Now, it seems like it would be pretty straightforward to modify this
flag (or add a new flag) to do the dbm hacking on the existing history
files.  This would leave history out of date with respect to history.pag
and .dir.  But no one looks at history directly, everyone uses the dbm
files, which would be up to date with respect to the news spool.  (They
would also be larger than they need to be, since dbm's delete() doesn't
reclaim space, but that's ok.)

Am I out to lunch here?  Am I missing something important, like maybe
doing all that dbm hacking would take even longer than a rebuild?  Should
we upgrade to a full C news system with the upcoming dbz support?  And
what about Naomi?
---
Jef

  Jef Poskanzer  jef@well.sf.ca.us  {ucbvax, apple, hplabs}!well!jef
 "Sarcasm I now see to be, in general, the language of the Devil; for
   which reason I have, long since, as good as renounced it."
                        -- Thomas Carlyle

henry@utzoo.uucp (Henry Spencer) (04/12/90)

In article <5377@helios.ee.lbl.gov> jef@ace.ee.lbl.gov (Jef Poskanzer) writes:
>Now, it seems like it would be pretty straightforward to modify this
>flag (or add a new flag) to do the dbm hacking on the existing history
>files.  This would leave history out of date with respect to history.pag
>and .dir.  But no one looks at history directly...

Actually, they do:  the dbm file is only an index into the history file.
And expire in particular works from the history file and ignores the old
dbm file.
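To make the "only an index" point concrete, here is a minimal toy model. It is not B or C news code. Python's stdlib `dbm.dumb` stands in for the real dbm library, the history line layout is illustrative, and the names `HISTORY_LINES`, `build` and `lookup` are hypothetical. The index stores only a byte offset for each message-ID, and every real answer still comes from reading the history file at that offset.

```python
import dbm.dumb
import os
import tempfile

# Hypothetical history lines: message-ID, arrival date, article file.
# The field layout is illustrative, not the exact B/C news format.
HISTORY_LINES = [
    "<1@a.example>\t04/01/90\tcomp.misc/101\n",
    "<2@b.example>\t04/05/90\tnews.software.b/77\n",
    "<3@c.example>\t04/11/90\tmisc.test/5\n",
]


def build(dirpath):
    """Write a history file plus a dbm index of message-ID -> byte offset."""
    hist_path = os.path.join(dirpath, "history")
    # dbm.dumb keeps its data in history.dir / history.dat next to history.
    with open(hist_path, "wb") as hist, dbm.dumb.open(hist_path, "n") as index:
        for line in HISTORY_LINES:
            msgid = line.split("\t", 1)[0]
            index[msgid] = str(hist.tell())  # only where the line starts
            hist.write(line.encode())
    return hist_path


def lookup(hist_path, msgid):
    """Find the offset via the index, then read the real entry from history."""
    with dbm.dumb.open(hist_path, "r") as index:
        offset = index.get(msgid)  # bytes such as b"52", or None if absent
    if offset is None:
        return None
    with open(hist_path, "rb") as hist:
        hist.seek(int(offset))
        return hist.readline().decode().rstrip("\n")


with tempfile.TemporaryDirectory() as d:
    path = build(d)
    entry = lookup(path, "<2@b.example>")
    missing = lookup(path, "<9@z.example>")
```

Because the index holds nothing but offsets, it is useless without the history file. A program that scans history from top to bottom, as expire does, never needs to consult it at all.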

However, in this case the dbm hacking is not necessary at all.  There is
not much point in deleting things from the dbm file without deleting
them from the history file, and not deleting things from the history
file just means that (a) expire takes a bit longer because it tries to
expire things that are already expired, and (b) the history file gets
larger.  Having old entries in the history file is otherwise harmless.
This probably ought to be documented better.
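The "harmless, just slower" behaviour can be sketched with a second toy model. It is not C expire's actual logic: the spool is modeled as a Python set of article paths, and `expire`, `history` and `spool` are hypothetical names. Because no rebuild rewrites history, every run re-examines entries that an earlier run already expired. It finds nothing to remove for them and simply counts the wasted check.

```python
from datetime import date

# Hypothetical history entries: (message-ID, arrival date, article file).
history = [
    ("<1@a.example>", date(1990, 4, 1), "comp.misc/101"),
    ("<2@b.example>", date(1990, 4, 5), "news.software.b/77"),
    ("<3@c.example>", date(1990, 4, 11), "misc.test/5"),
]
# Toy spool: the set of article files that currently exist.
spool = {"comp.misc/101", "news.software.b/77", "misc.test/5"}


def expire(history, spool, cutoff):
    """Remove articles that arrived before cutoff; history is never rewritten.

    Returns (removed, wasted): wasted counts entries whose article was
    already gone, i.e. the extra work caused by skipping the rebuild.
    """
    removed, wasted = 0, 0
    for _msgid, arrived, path in history:
        if arrived >= cutoff:
            continue
        if path in spool:
            spool.remove(path)
            removed += 1
        else:
            wasted += 1  # already expired: harmless, just extra work
    return removed, wasted


first = expire(history, spool, date(1990, 4, 6))   # removes two articles
second = expire(history, spool, date(1990, 4, 6))  # re-checks the same two
```

The second run changes nothing in the spool. It only spends time on two stale entries, and the history list stays at its full length, which matches points (a) and (b) above.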

I looked at tinkering with the dbm files (etc) without doing a complete
rebuild, and basically decided it wasn't worth the trouble.

>Should we upgrade to a full C news system with the upcoming dbz support? ...

Well, yes, but for other reasons. :-)

>And what about Naomi?

She's still trying to figure out NNTP. :-) :-) :-)
-- 
With features like this,      |     Henry Spencer at U of Toronto Zoology
who needs bugs?               | uunet!attcan!utzoo!henry henry@zoo.toronto.edu