[net.micro.amiga] seeks and dbm

chris@umcp-cs.UUCP (Chris Torek) (05/19/86)

In article <297@maynard.UUCP> campbell@maynard.UUCP (Larry Campbell) writes:
>... the manual describes the [lseek] offset as a long (NOT a "simple
>integer" nor a "magic cookie").  Second, if the offsets AREN'T integers,
>then almost any database library or package won't work (like dbm(3)),
>because they often hash keys into offsets in the index file.

It is true that the current dbm code assumes that lseek offsets
are simple `long's; but it would be trivial to get dbm to run on
a system where this were not the case, as long as there were a
simple mapping from an integer index to a `block' of some fixed
size.  The dbm scheme (`extensible hashing') is to convert a string
(`datum') into a 32 bit hash index, then to start with no bits of
the hash, and increase the number of bits used until it has reached
a `leaf' value for that hash index.  There is a bit map of size
2^{max-n-bits-ever-used}-1 that has a bit set for each value that
is no longer a leaf.  When you reach a leaf, the datum is either
in the corresponding block, or not present in the database.

The only really tricky thing is storing a datum, because if a
leaf block fills up, it must be turned into a non-leaf block.
If the hash values are `good', approximately half of the objects
in the block will move to a new leaf (the other half will stay
because they have a zero in the next bit of their hash).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

chris@umcp-cs.UUCP (Chris Torek) (05/21/86)

In article <1584@umcp-cs.UUCP> I wrote:
>... The dbm scheme (`extensible hashing') ....

So as not to mislead anyone:  geowhiz!netzer!prairie!dan tells me that
the method used in the dbm library is not quite the same as that which
is normally referred to as `extensible hashing'.  Make that `The dbm
scheme (a simplified form of extensible hashing) ...'.  I freely (and
somehwat gleefully :-) ) admit to a near total lack of knowledge about
databases---a large and messy field, I understand :-).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu