[news.software.b] Dbz and news 2.11

ahby@bungia.Bungia.MN.ORG (Shane P. McCarron) (02/03/89)

Some time ago the dbz sources were posted for use with news 2.11.
Patches have been posted, and a number of sites are using these
routines to make news run faster on machines without dbm libraries.  I
seem to remember some controversy surrounding the use of these
routines.  Something about their containing AT&T proprietary stuff?  I
realize that by asking this I am going to get a hundred answers, all
of them different.  What I am really looking for is ONE AUTHORITATIVE
ANSWER.  Can anyone out there say for sure whether this code is freely
distributable or not?
--
Shane P. McCarron				E-Mail: ahby@bungia.MN.ORG
Systems Analyst, NAPS International		AT&T: +1 (612) 224-9239

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (02/05/89)

In article <20@bungia.Bungia.MN.ORG> ahby@bungia.MN.ORG (Shane P. McCarron) writes:
>Some time ago the dbz sources were posted for use with news 2.11.
>Patches have been posted, and a number of sites are using these
>routines to make news run faster on machines without dbm libraries.  I
>seem to remember some controversy surrounding the use of these
>routines.  Something about their containing AT&T proprietary stuff?  I
>realize that by asking this I am going to get a hundred answers, all
>of them different.  What I am really looking for is ONE AUTHORITATIVE
>ANSWER.  Can anyone out there say for sure whether this code is freely
>distributable or not?

As the author, I suppose I can supply an authorative answer.  Dbz 
contains no AT&T code at all.  The only thing in it not written by me 
is the hashing, and that is taken from pathalias (with permission).  

Here it is if anyone missed it:


/*
dbz.c  V1.5

Copyright 1988 Jon Zeeff (umix!b-tech!zeeff)
You can use this code in any manner, as long as you leave my name on it
and don't hold me responsible for any problems with it.

Hacked on by gdb@ninja.UUCP (David Butler); Sun Jun  5 00:27:08 CDT 1988

These routines replace dbm as used by the usenet news software
(it's not a full dbm replacement by any means).  It's fast and
simple.  It contains no AT&T code.

BSD sites will notice some savings in disk space.  Sys V sites without
dbm will notice much faster operation.

Note: .pag files created by version 1.[0-3] need to be recreated before
      using this version.

This code relies on the fact that news stores a pointer to the history
file as the dbm data.  It doesn't store another copy of the key like
dbm does so it saves disk space.  All you can do is fetch() and
store() data.

Just make news with the DBM option and link with dbz.o.
*/

/*
   Set this to the something several times larger than the maximum # of
   lines in a history file.  It should be a prime number.
*/
#define INDEX_SIZE 99991L

#define DEBUG if (0) printf	          /* for no debugging */
/* #define DEBUG if (1) printf		  /* for debugging */
/* #define DEBUG if (debug) printf        /* for "rn" type debugging */

/* Note: let the optimizer remove any if(0) or if(1) code above */

#include <fcntl.h>
#include <string.h>
#include <ctype.h>

long lseek();

static long get_ptr();
static int put_ptr();
static void lcase();
static long hash();
static void crcinit();

typedef struct {
	char *dptr;
	int dsize;
} datum;

static int Data_fd;		/* points to /usr/lib/news/[n]history */
static int Index_fd = -1;	/* points to /usr/lib/news/[n]history.pag */

int 
dbminit(name)
char *name;
{
        char index_file[1024];	/* index file name */
	void crcinit();

	if (Index_fd >= 0) {
		DEBUG("dbminit: dbminit aready called once\n");
		return(-1);    /* init already called once */
	}

	strcpy(index_file, name);
	strcat(index_file, ".pag");

	/* if we don't have permission to open the pag file for read/write */
	/* then we may never attempt a store().  If we do, it will fail */

	if ((Index_fd = open(index_file, O_RDWR)) < 0 &&
	    (Index_fd = open(index_file, O_RDONLY)) < 0 &&
	    (Index_fd = open(index_file, O_CREAT | O_RDWR, 0644)) < 0) 
	{
		DEBUG("dbminit: Index_file open failed\n");
		return(-1);
	}

	if ((Data_fd = open(name, O_RDONLY)) < 0) {

		/* The only time the data file does not exist is when */
		/* expire is running (as "news") and it is ok to create it */
		/* If we don't have "permission" the create will fail, too */

		if (close(creat(name, 0644)) < 0 ||
		    (Data_fd = open(name, O_RDONLY)) < 0) {
			DEBUG("dbminit: Data_file open failed\n");
			close(Index_fd);
			Index_fd = -1;
			return(-1);
		}
	}

	crcinit();	/* initialize the crc table */
	DEBUG("dbminit: succeeded\n");
	return(0);
}

int
dbmclose()
{
	if (Index_fd >= 0) {
		close(Index_fd);
		Index_fd = -1;
		close(Data_fd);
	}
	DEBUG("dbmclose: succeeded\n");
	return(0);
}

/* get an entry from the database */
datum
fetch(key)
datum key;
{
	long index_ptr;
	long index_size = INDEX_SIZE;
        char buffer[1024];
	static long data_ptr;
	datum output;
	long hash();
	long get_ptr();
	void lcase();

	DEBUG("fetch: (%s)\n", key.dptr);

	for (index_ptr = hash(key.dptr, key.dsize);
	    --index_size && (data_ptr = get_ptr(index_ptr)) >= 0L;
	    index_ptr = ++index_ptr % INDEX_SIZE) {

		lseek(Data_fd, data_ptr, 0);
		read(Data_fd, buffer, (unsigned)key.dsize);

		/* key should be lcase version of article id and no tab */

		lcase(buffer, key.dsize);
		if (buffer[key.dsize - 1] == '\t') {
			buffer[key.dsize - 1] = '\0';
		}

		DEBUG("fetch: buffer (%s)\n", buffer);
		if (strncmp(key.dptr, buffer, key.dsize) == 0) {
			/* we found it */
			output.dptr = (char *)&data_ptr;
			output.dsize = sizeof(long);
			DEBUG("fetch: successful\n");
			return(output);
		}
	}

	/* we didn't find it */

	output.dptr = (char *)0;
	output.dsize = 0;
	DEBUG("fetch: failed\n");
	return(output);
}

/* add an entry to the database */
store(key, data)
datum key;
datum data;
{
	/* lint complains about a possible pointer alignment problem here */
	/* it is not a problem because dptr is the first element of a */
	/* structure and should be aligned for anything */
	DEBUG("store: (%s, %ld)\n", key.dptr, *((long *)data.dptr));
	return(put_ptr(hash(key.dptr, key.dsize), *((long *)data.dptr)));
}

/* get a data file pointer from the specified location in the index file */

static long
get_ptr(index_ptr)
long index_ptr;
{
	long data_ptr;

	DEBUG("get_ptr: (%ld)\n", index_ptr);

	/* seek to where it should be */
	lseek(Index_fd, (long)(index_ptr * sizeof(long)) ,0);

	/* read it */
	if (read(Index_fd, (char *)&data_ptr, sizeof(long)) != sizeof(long) ||
	    data_ptr == 0L) {
		DEBUG("get_ptr: failed\n");
		return(-1L);
	}
	DEBUG("get_ptr: succeeded\n");
	return(--data_ptr);	/* remove the offset we added in put_ptr */
}

/* put a data file pointer into the specified location in the index file */
/* move down further if slots are full (linear probing) */
static
put_ptr(index_ptr, data_ptr)
long index_ptr;
long data_ptr;
{
	long get_ptr();
	long index_size = INDEX_SIZE;

	/* find an empty slot */
	while (--index_size && get_ptr(index_ptr) >= 0L) {
		index_ptr = ++index_ptr % INDEX_SIZE;
	}

	if (index_size == 0L) {
		DEBUG("put_ptr: hash table overflow - failed\n");
		return(-1);
	}

	/* seek to spot */
	lseek(Index_fd, (long)(index_ptr * sizeof(long)), 0);

	++data_ptr;   /* add one so that we can use 0 as no pointer */

	/* write in data */
	if (write(Index_fd, (char *)&data_ptr, sizeof(long)) != sizeof(long)) {
		DEBUG("put_ptr: write failed\n");
		return(-1);
	}

	DEBUG("put_ptr: succeeded\n");
	return(0);
}

static void
lcase(s, n)
register char *s;
register int n;
{
	for (; n > 0; --n, ++s) {
		*s = tolower(*s);
	}
}

/* This is a simplified version of the pathalias hashing function.
 * Thanks to Steve Belovin and Peter Honeyman
 *
 * hash a string into a long int.  31 bit crc (from andrew appel).
 * the crc table is computed at run time by crcinit() -- we could
 * precompute, but it takes 1 clock tick on a 750.
 *
 * This fast table calculation works only if POLY is a prime polynomial
 * in the field of integers modulo 2.  Since the coefficients of a
 * 32-bit polynomial won't fit in a 32-bit word, the high-order bit is
 * implicit.  IT MUST ALSO BE THE CASE that the coefficients of orders
 * 31 down to 25 are zero.  Happily, we have candidates, from
 * E. J.  Watson, "Primitive Polynomials (Mod 2)", Math. Comp. 16 (1962):
 *	x^32 + x^7 + x^5 + x^3 + x^2 + x^1 + x^0
 *	x^31 + x^3 + x^0
 *
 * We reverse the bits to get:
 *	111101010000000000000000000000001 but drop the last 1
 *         f   5   0   0   0   0   0   0
 *	010010000000000000000000000000001 ditto, for 31-bit crc
 *	   4   8   0   0   0   0   0   0
 */

#define POLY 0x48000000L	/* 31-bit polynomial (avoids sign problems) */

static long CrcTable[128];

static void
crcinit()
{	register int i, j;
	register long sum;

	for (i = 0; i < 128; ++i) {
		sum = 0L;
		for (j = 7 - 1; j >= 0; --j)
			if (i & (1 << j))
				sum ^= POLY >> j;
		CrcTable[i] = sum;
	}
	DEBUG("crcinit: done\n");
}

static long
hash(name, size)
register char *name;
register int size;
{
	register long sum = 0L;

	while (size--) {
		sum = (sum >> 7) ^ CrcTable[(sum ^ (*name++)) & 0x7f];
	}
	DEBUG("hash: returns (%ld)\n", sum % INDEX_SIZE);
	return(sum % INDEX_SIZE);
}

-- 
  Jon Zeeff			zeeff@b-tech.ann-arbor.mi.us
  Ann Arbor, MI			mailrus!b-tech!zeeff

bill@twwells.uucp (T. William Wells) (02/06/89)

In article <5095@b-tech.ann-arbor.mi.us> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
: As the author, I suppose I can supply an authorative answer.  Dbz
: contains no AT&T code at all.  The only thing in it not written by me
: is the hashing, and that is taken from pathalias (with permission).
:
: Here it is if anyone missed it:
:
: dbz.c  V1.5

Any reason I should switch from 1.3?

---
Bill
{ uunet!proxftl | novavax } !twwells!bill

asgard@cpro.uucp (J.R. Stoner) (02/06/89)

From article <373@twwells.uucp>, (T. William Wells):
; In article <5095@b-tech.ann-arbor.mi.us> (Jon Zeeff) writes:
; : As the author, I suppose I can supply an authorative answer.  Dbz
; : contains no AT&T code at all.  The only thing in it not written by me
; : is the hashing, and that is taken from pathalias (with permission).
; : Here it is if anyone missed it:
; : dbz.c  V1.5
[ Code elided ]
; Any reason I should switch from 1.3?

Before you respond maybe you can address a concern of mine that has to do
with dbz 1.3 and possibly dbz 1.5 to wit:

I had previously been using dbz 1.3 with 2.11.13 news to speed up the news
system (which it did) but started seeing problems with expire when the
history file grew past some as yet unidentified boundary.  It seems that the
files are properly removed at the expiration time but the reconstructed
history file seemed to increasingly have nulls in front of the first line.
Needless to say, before I increased my ulimit (a Microport SysV/AT 2.2) to
something reasonable the history file would then not stay within bounds and
get truncated.  I liked the speedup but did not like having to blow off the
history file and reconstruct it by hand periodically.

If this is a known problem that is addressed in 1.5 then I would want to
re-install dbz (also hoping it will work with C news or whatever is coming
down since B news is about to end).  If this is something I did wrong in my
localize.sh then maybe some sample of the right thing to do would be
appreciated.
-- 
"To prevent having to tell fools to RTFM don't let on you WTFM in the first
place." - J.R. (May the Source be With You) Stoner
asgard@cpro.uucp	asgard@wotan.uucp	asgard@well.uucp
...decwrl!pacbell!cpro!asgard

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (02/09/89)

Dbz version 1.5 does have some misc fixes and a better hash algorithm 
than V1.3 had.  You'll know it if your system is affected - you can get 
multiple copies of articles.  I've never had anyone mention nulls 
being put into the history file though.  


-- 
  Jon Zeeff			zeeff@b-tech.ann-arbor.mi.us
  Ann Arbor, MI			mailrus!b-tech!zeeff