chris@umcp-cs.UUCP (Chris Torek) (05/19/86)
In article <297@maynard.UUCP> campbell@maynard.UUCP (Larry Campbell) writes: >... the manual describes the [lseek] offset as a long (NOT a "simple >integer" nor a "magic cookie"). Second, if the offsets AREN'T integers, >then almost any database library or package won't work (like dbm(3)), >because they often hash keys into offsets in the index file. It is true that the current dbm code assumes that lseek offsets are simple `long's; but it would be trivial to get dbm to run on a system where this were not the case, as long as there were a simple mapping from an integer index to a `block' of some fixed size. The dbm scheme (`extensible hashing') is to convert a string (`datum') into a 32 bit hash index, then to start with no bits of the hash, and increase the number of bits used until it has reached a `leaf' value for that hash index. There is a bit map of size 2^{max-n-bits-ever-used}-1 that has a bit set for each value that is no longer a leaf. When you reach a leaf, the datum is either in the corresponding block, or not present in the database. The only really tricky thing is storing a datum, because if a leaf block fills up, it must be turned into a non-leaf block. If the hash values are `good', approximately half of the objects in the block will move to a new leaf (the other half will stay because they have a zero in the next bit of their hash). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
chris@umcp-cs.UUCP (Chris Torek) (05/21/86)
In article <1584@umcp-cs.UUCP> I wrote: >... The dbm scheme (`extensible hashing') .... So as not to mislead anyone: geowhiz!netzer!prairie!dan tells me that the method used in the dbm library is not quite the same as that which is normally referred to as `extensible hashing'. Make that `The dbm scheme (a simplified form of extensible hashing) ...'. I freely (and somehwat gleefully :-) ) admit to a near total lack of knowledge about databases---a large and messy field, I understand :-). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516) UUCP: seismo!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu