[news.software.b] Using the latest DBZ from C-news in B-news

gary@dgcad.sv.dg.com (Gary Bridgewater) (12/07/90)

I was intrigued by the dynamic sizing of the .pag file reported to be in
the latest DBZ in C-news so I decided to implement and test it.  It has
been running for a week now - through three expires - and I am pleased.
Not that it is much (if any) faster than the previous version - it
seems nearly the same, which is not surprising since B expire is I/O bound.
What is very nice is that the history.pag file is now much smaller:

    -rw-r--r--   1 news     news    12506869 Dec  6 16:24 history
    -rw-r--r--   1 news     news          75 Dec  4 18:21 history.dir
    -rw-r--r--   1 news     news     2099264 Dec  6 16:24 history.pag

(I keep 38 days of history.)
The history.pag file used to be about the same size as history.  It may
have been sparse, but I think the allocation unit on my disk was bigger
than any gaps, so it really did occupy that much space.  More to the
point, it wouldn't fit in memory when using INCORE.  This made expire
thrash quite badly and run ~6 hours on an unbusy system, so I had gone
back to running without INCORE, which took 4+ hours :-(.
Using the newest DBZ and INCORE, my expire time is around 3.5 hours.  Also,
the newest DBZ has a few performance enhancements that seem to make the
system work a bit faster.  So, as I said, I am pleased.

It is probably possible to just replace an existing DBZ with the newest one.
However, I decided to take advantage of some additional functionality in
this DBZ to add the auto-tuning of the history.pag hash value.  I don't
have diffs and my line numbers won't match yours anyway but below are the
relevant pieces of expire.c that have been changed for the new DBZ. 

There are no changes to any other pieces of B news - other than relinking
inews and NNTP with the new dbz.o and restarting them after the obligatory
expire -R.

Again, you probably don't have to do this to just use it as a DBM replacement.
And some of this may conflict with patches 18 or 19, which I haven't gotten
around to thinking about.  If I were into thinking about it, I would just
implement C-news expire (probably will).  (And I am experimenting with
real C-news on another host so, please, no flames about that.)

I have #ifdef'd the DBZ changes using INCORE.  Explanations are
set off by lines containing "{{{" and "}}}".  Whether DBZ actually keeps
the table in core is controllable separately from the INCORE define, via
both a compile-time option (-DDBZ_INCORE) and a runtime switch (-N).

//expire.c - patch level 17 + local stuff
...
{{{
    Around line 39 - near the top - set up the DBZ values I want - yours
    _probably_ will differ.  You must read the dbz documentation (:->)
    and the comments in dbz.c to figure out what you want if you want
    the most out of this.
    Make DBZ's incore mode switchable via a -DDBZ_INCORE compile
    definition while still using DBZ.  The default is to incore it.
    Note that my compiler likes ANSI-style function declarations.
}}}

#ifdef INCORE
#define dbz_SIZE 300007L
#define dbz_FIELDSEP '\t'
#define dbz_CMAP '='
#define dbz_TAGMASK 0x7f000000
#ifdef DBZ_INCORE
#define dbz_INCORE_VAL DBZ_INCORE
#else
#define dbz_INCORE_VAL 1
#endif

int dbz_INCORE=dbz_INCORE_VAL;

int dbzincore(int);
int dbzagain(char *, char *);
int dbzfresh(char *, long, int, int, int);
int dbmclose();
#endif


{{{
    around line 281 - at the end of switch processing - add a new
    expire switch "-N" to turn off DBZ INCORE if memory is tight for
    some reason
}}}

#ifdef INCORE
                case 'N':       /* don't do incore */
                        dbz_INCORE = 0;
                        break;
#endif
                default:
#ifdef INCORE
                        printf("Usage: expire [ -v [level] ] [-e days ] [-i] [-a] [-r] [-h] [-p] [-u] [-f username] [-n newsgroups] [-H] [-N]\n");
#else
                        printf("Usage: expire [ -v [level] ] [-e days ] [-i] [-a] [-r] [-h] [-p] [-u] [-f username] [-n newsgroups] [-H]\n");
#endif


{{{
    around line 379 - in expire() - let the new dbz routines reopen the
    database or create it using the magic values defined above.
    This enables the smart hash sizing.
    I thought about pushing all this into initdbm() but it didn't
    seem any easier or cleaner so I just punted it.
}}}

#ifndef INCORE
                (void) close(creat(PAGFILE, 0666));
                (void) close(creat(DIRFILE, 0666));
                initdbm(NARTFILE);
#else
                (void) dbzincore(dbz_INCORE);
                if ( dbzagain( NARTFILE, ARTFILE ) < 0 )
                    if ( dbzfresh( NARTFILE, dbz_SIZE, dbz_FIELDSEP, dbz_CMAP, dbz_TAGMASK ) < 0 )
                        xerror("Cannot create %s with dbzfresh or dbzagain", NARTFILE);
#endif


{{{
    about line 1200 - in rebuilddbm() - do the same thing.
    Note there are three separate patches shown here for clarity(?).
    With this patch you don't have to make the initial database using
    the DBZ utility - just do a "normal" expire -R and you will get it.
}}}

#ifndef INCORE
        (void) sprintf(namebuf, "%s.dir", ARTFILE);
        (void) close(creat(namebuf, 0666));
        (void) sprintf(namebuf, "%s.pag", ARTFILE);
        (void) close(creat(namebuf, 0666));
#endif
        (void) sprintf(namebuf, "%s", ARTFILE);

        fd = fopen(namebuf, "r");
        if (fd == NULL) {
                perror(namebuf);
                xxit(2);
        }

#ifndef INCORE
        initdbm(namebuf);
#else
        (void) dbzincore(dbz_INCORE);
        if ( dbzagain( namebuf, namebuf ) < 0 )
                if ( dbzfresh( namebuf, dbz_SIZE, dbz_FIELDSEP, dbz_CMAP, dbz_TAGMASK ) < 0 )
                        xerror("Cannot re-create %s with dbzfresh or dbzagain", namebuf);
#endif
        while (fpos=ftell(fd), fgets(lb, BUFSIZ, fd) != NULL) {
                p = index(lb, '\t');
                if (p)
                        *p = 0;
                remember(lb, fpos);
        }
#ifdef INCORE
        (void) dbmclose();
#endif


{{{
    And, finally, about line 1326 - in xxit() - at the end of expire -
    do a dbmclose(), just in case, to save it if it is in core.
}}}

#if defined(DBM) && defined(INCORE)
                (void) dbmclose();
#endif
        rmlock();
        exit(i);
}