[comp.bugs.4bsd] 4.3BSD ndbm

whm@arizona.UUCP (05/27/87)

The man page for ndbm(3) says:

    The sum of the sizes of a key/content pair must not exceed the internal
    block size (currently 4096 bytes).

However, it appears that the actual limit is 1024, arising from #define
PBLKSIZ 1024 in <ndbm.h>.  We found that changing PBLKSIZ to 4096 seems
to work just fine, but it's not clear what's in error, the man page or the
code.

					Bill Mitchell
					whm@arizona.edu
					{allegra,cmcl2,ihnp4,noao}!arizona!whm

allyn@sdcsvax.UCSD.EDU (Allyn Fratkin) (05/27/87)

In article <1739@megaron.arizona.edu>, whm@arizona.edu (Bill Mitchell) writes:
> However, it appears that the actual limit is 1024, arising from #define
> PBLKSIZ 1024 in <ndbm.h>.  We found that changing PBLKSIZ to 4096 seems
> to work just fine, but it's not clear what's in error, the man page or the
> code.

Some time ago, I was looking at the code for ndbm, and I concluded that
somehow the values for the defines of DBLKSIZ and PBLKSIZ are reversed.
It makes little sense (to me) to have DBLKSIZ be so large since most
databases don't get that big.  Flame me if I'm wrong (somebody will), but 
isn't the .dir file basically a bitmap (used in the hash calculation)
of the available file pages?  How many people regularly have databases that 
are 4096 pages long?

Incidentally, I ported the ndbm package to an IBM PC running PC/IX and I 
changed PBLKSIZ to 3072 and DBLKSIZ to 512.  It works fine.
-- 
 From the virtual mind of Allyn Fratkin            allyn@sdcsvax.ucsd.edu    or
                          EMU Project              {ucbvax, decvax, ihnp4}
                          U.C. San Diego                         !sdcsvax!allyn

chris@mimsy.UUCP (05/28/87)

>In article <1739@megaron.arizona.edu> whm@arizona.edu (Bill Mitchell) writes:
>>... it appears that the actual [key+content length restriction] is
>>1024 [bytes], arising from #define PBLKSIZ 1024 in <ndbm.h>.

In article <3231@sdcsvax.UCSD.EDU> allyn@sdcsvax.UCSD.EDU (Allyn
Fratkin) writes:
>It makes little sense (to me) to have DBLKSIZ be so large since most
>databases don't get that big. ... isn't the .dir file basically a
>bitmap (used in the hash calculation) of the available file pages?

More precisely, it is a sort of binary tree describing which pages
have been split.  It takes

   2 ** ceil(log2(n+1)) - 1
   
bits to describe an n-page database.  Our old hashed host files
are 2020352 bytes, or 1973 pages, requiring 2048 bits.  The 4096
DBLKSIZ means that one can describe a 32767 page, or ~33Mbyte,
database without ever having to swap bitmap blocks.
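
As a rough check of the arithmetic above, the formula can be sketched
in C.  The function names and the DBLKSIZ_BYTES constant are
illustrative assumptions, not anything taken from the ndbm source; the
sketch only evaluates the formula and does not model how ndbm actually
walks its bitmap.

```c
#include <assert.h>

/* Assumed value of DBLKSIZ from the thread (4096 bytes in 4.3BSD). */
#define DBLKSIZ_BYTES 4096UL

/*
 * Bits needed to describe an n-page database according to the formula
 * quoted in the post: 2 ** ceil(log2(n+1)) - 1.
 * Hypothetical helper, computed with integers only.
 */
unsigned long dir_bits_for_pages(unsigned long n)
{
    unsigned long p = 1;

    assert(n > 0);
    while (p < n + 1)       /* p becomes the smallest power of 2 >= n+1 */
        p <<= 1;
    return p - 1;
}

/* Nonzero if the bits for n pages fit in a single bitmap block. */
int fits_in_one_dir_block(unsigned long n)
{
    return dir_bits_for_pages(n) <= DBLKSIZ_BYTES * 8;
}
```

For the 1973-page host file the formula gives 2047 bits, which the
post rounds to 2048, and 32767 pages is the largest count that still
fits in one 4096-byte block under this formula.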

In practise, it takes fewer bits, as only ones are stored.  One
could have a database grow to 64MB before a second bitmap block
was required.

Increasing PBLKSIZ means that larger databases need fewer bits
(since each bit describes a larger `page'), but increases the
memory-to-memory copying overhead involved in adding and deleting
items.  On a 4K-block file system, it will improve I/O bandwidth.
In short, it would probably be a win to change DBLKSIZ to 1024 and
PBLKSIZ to 4096.  Alas, all old databases would have to be
reconstructed.  (Mdbm does not suffer from this problem: the data
and map block sizes are set at database creation, and are stored
in the database itself.)
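
To put numbers on "larger databases need fewer bits", here is a small
sketch comparing the two page sizes for the 2020352-byte host file
mentioned earlier.  It assumes the data would pack into
ceil(bytes / page size) pages, which is only an estimate since the
real page count depends on how the hash splits; the helper names are
made up for illustration.

```c
#include <assert.h>

/* Pages needed to hold `bytes` at a given page size, rounded up. */
unsigned long pages_for(unsigned long bytes, unsigned long pblksiz)
{
    assert(pblksiz > 0);
    return (bytes + pblksiz - 1) / pblksiz;
}

/* Bits per the formula quoted in the post: 2 ** ceil(log2(n+1)) - 1. */
unsigned long map_bits(unsigned long n)
{
    unsigned long p = 1;

    while (p < n + 1)       /* smallest power of 2 >= n+1 */
        p <<= 1;
    return p - 1;
}
```

Under that assumption, pages_for(2020352, 1024) is 1973 pages and
map_bits gives 2047 bits, while pages_for(2020352, 4096) is 494 pages
and map_bits gives 511 bits.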

Incidentally, the actual key+content length restriction is
PBLKSIZ - 3 * sizeof (short), or 1018 bytes in 4.3BSD.
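
A caller could check this limit before storing a pair, along the
following lines.  This is only a sketch: the helper names are
hypothetical, the page size is passed in rather than taken from
<ndbm.h>, and the 1018 figure assumes a 2-byte short as on the VAX.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest key+content size that fits in one page, per the expression
 * in the post: PBLKSIZ - 3 * sizeof (short).
 */
size_t max_pair_size(size_t pblksiz)
{
    assert(pblksiz > 3 * sizeof(short));
    return pblksiz - 3 * sizeof(short);
}

/* Nonzero if a key of keysize bytes plus contentsize bytes of data fits. */
int pair_fits(size_t keysize, size_t contentsize, size_t pblksiz)
{
    return keysize + contentsize <= max_pair_size(pblksiz);
}
```

With a page size of 1024 and a 2-byte short, max_pair_size returns
1018, so an 18-byte key with 1000 bytes of content fits and a 19-byte
key with the same content does not.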
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris