[comp.lang.perl] Large, sparse dbm databases

olson@sax.cs.uiuc.edu (Robert Olson) (02/07/91)

I am using a dbm database (via dbmopen) to create a rather large,
sparse database. My data fits nicely in the form...
	$db{key1}
	$db{key1, key2}
	$db{key1, key2, key3}
	etc
where there are relatively few values of each of the first couple
keys. 

(Perhaps an example is in order. The data is information about classes
gleaned from the g++ compiler.

	$db{"class"} = "X,Y,Z"	
	$db{"class", "X"} = "method,field"
	$db{"class", "X", "field" } = "field1,field2,field3"
	$db{"class", "X", "field", "field1"} = "type,offset"
	$db{"class", "X", "field", "field1", "type" } = "int"

As you can guess, there are a large number of different keys in the
second and fourth positions and relatively few in the first and third.
)
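
For readers unfamiliar with the multi-subscript notation used above: Perl joins the subscripts of `$db{a, b, c}` with the value of `$;` (by default the character "\034", which prints as `^\`), so the dbm file only ever sees a single flat string key. The sketch below illustrates this with a plain in-memory hash standing in for the dbmopen'ed one. It uses Perl 5 syntax (`my`), and the variable names are illustrative only.

```perl
use strict;
use warnings;

# A plain in-memory hash stands in for the dbmopen'ed one (assumption:
# the key handling is the same; only the storage differs).
my %db;

# Multi-subscript store, as in the post.
$db{"class", "X", "field"} = "field1,field2,field3";

# The same key built by hand: subscripts joined with $; ("\034" by default).
my $flat = join($;, "class", "X", "field");
print "same entry\n" if exists $db{$flat};

# Make the separator visible the way the error message in the post shows it.
(my $shown = $flat) =~ s/\034/^\\/g;
print "$shown\n";
```

This is why the failing key in the error message appears with `^\` between its parts.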

Some of the values in the db are lists, which I just represent as
comma-separated lists of strings. Given the data I create the database
from, it is easiest to store the initial value of a list in the
database and append succeeding values to it as I read them from the
input.

Problem: I get errors such as

  dbm store returned -1, errno 28, key "class^\InfrastructureManager^\method"
	at db.pl line 42, <> line 810.

I get errno 28 to be No space left on device. This shouldn't have been
the case; there were >50M free on the disk. BUT, I guess this is
telling:

% ls -sl class*
  28 -rw-rw-r--  1 olson       61440 Feb  6 22:25 classinfo.dir
4840 -rw-rw-r--  1 olson    491334656 Feb  6 22:25 classinfo.pag

Not much real storage, but lotsa holes.... (Note: this is for the
preallocated version (see below). The unpreallocated versions showed
similar behavior but with the apparent size only in the 100M range).
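
The gap between the two columns in the `ls -sl` listing (blocks allocated versus byte size) is the signature of a file with holes. A minimal sketch of the same check from Perl follows; it creates its own sparse temporary file rather than touching a real dbm file. The 512-byte block unit is an assumption (it is common but system-specific), and `stat` may return an empty block count on some platforms.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a file with a hole: seek far past the start, then write one byte.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
seek($fh, 10 * 1024 * 1024, 0) or die "seek: $!";
print {$fh} "x" or die "print: $!";
close($fh) or die "close: $!";

my @st = stat($path) or die "stat: $!";
my $size   = $st[7];     # apparent size in bytes
my $blocks = $st[12];    # blocks actually allocated (system-specific unit)

print "apparent size: $size bytes\n";
if (defined $blocks && length $blocks) {
    # ASSUMPTION: 512-byte blocks; check your system's documentation.
    print "allocated:     ", $blocks * 512, " bytes (assuming 512-byte blocks)\n";
}
```

On a filesystem that supports holes, the allocated figure should come out far smaller than the apparent size, as it does for classinfo.pag above.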

The first thing I tried was to make each key a constant length, so
that instead of $db{"class", "X"} I had $db{"class", "X", '', '', ''}.

This didn't seem to change much. Suspecting that appending to database
elements (eg $db{"class"} = $db{"class"} . ',' . $elt) was hurting me,
I tried preallocating the value via
	$db{"class"} = ' ' x 100; $db{"class"} = $firstValue
but that didn't seem to help either.

From what I did it may be obvious that I do not understand the
workings of dbm files...

Any suggestions?

--bob

PS. Apologies if this gets posted twice; the first one didn't show up on the
nntp server...
--
Bob Olson			University of Illinois at Urbana/Champaign
Internet: rolson@uiuc.edu	UUCP:  {uunet|convex|pur-ee}!uiucdcs!olson
UIUC NeXT Campus Consultant	NeXT mail: olson@fazer.champaign.il.us
"You can't win a game of chess with an action figure!" AMA #522687 DoD #28

flee@cs.psu.edu (Felix Lee) (02/08/91)

>  dbm store returned -1, errno 28, key "class^\InfrastructureManager^\method"
>	at db.pl line 42, <> line 810.
>I get errno 28 to be No space left on device. This shouldn't have been

You're running into ndbm limitations.  Ndbm returns ENOSPC when your
key+data exceeds PBLKSIZ bytes (in <ndbm.h>).  This may be mentioned
in the "BUGS" section of your ndbm(3) man page.  (The SunOS man page
claims 4096 bytes, but <ndbm.h> says 1024.)

Ndbm also runs into problems when all the data that hashes to the same
32-bit integer overflows a PBLKSIZ block, but I don't think it returns
ENOSPC in this case.
--
Felix Lee	flee@cs.psu.edu
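
The limit described above can be guarded against before the store is attempted. The following is a rough sketch, not ndbm's own logic: `store_if_fits` and `append_item` are hypothetical helpers, a plain hash stands in for the dbmopen'ed one, and 1024 is simply the PBLKSIZ figure quoted from <ndbm.h> above. The real usable limit is likely somewhat lower, since each page also carries bookkeeping data.

```perl
use strict;
use warnings;

# ASSUMPTION: 1024 is the PBLKSIZ value quoted above; check your own <ndbm.h>.
my $PAIR_LIMIT = 1024;

# Hypothetical helper: store only if key+value fits; return 1 on success, 0 if not.
sub store_if_fits {
    my ($db, $key, $value) = @_;
    return 0 if length($key) + length($value) > $PAIR_LIMIT;
    $db->{$key} = $value;
    return 1;
}

# Hypothetical helper: append one item to a comma-separated list value.
sub append_item {
    my ($db, $key, $item) = @_;
    my $new = exists $db->{$key} ? "$db->{$key},$item" : $item;
    return store_if_fits($db, $key, $new);
}

my %db;    # plain hash standing in for the dbmopen'ed one
my $key = join($;, "class", "InfrastructureManager", "method");

# Keep appending until the pair would no longer fit.
my $n = 0;
$n++ while append_item(\%db, $key, "method$n");
print "stopped after $n items, value length ", length($db{$key}), "\n";
```

This also shows why appending to list values triggers the error: each append grows the value, and at some point key plus data crosses the limit.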

olson@sax.cs.uiuc.edu (Bob Olson) (02/08/91)

Yep, that seems to be it. Credit also to Randal who emailed a similar
answer. 

My solution?  Tokenize all strings inserted into the database
using another dbm database for the string->integer conversion and an
array for the integer->string conversion. It works quite well, and the
resulting databases are a LOT smaller, with a seemingly small
performance hit. The database ends up consisting of entries like
	1^\3^\7^\190 --> 4,2,10,5
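
A minimal sketch of the tokenizing scheme described above, under stated assumptions: a plain hash (`%token_of`) stands in for the second dbm database, tokens are assigned sequentially from 0, and list values are tokenized as well as keys (which the sample entry suggests). All helper names are hypothetical, and this is not the code from the post.

```perl
use strict;
use warnings;

my %token_of;     # string -> integer; stands in for the second dbm database
my @string_of;    # integer -> string

# Return the integer token for a string, assigning the next free one if new.
sub tokenize {
    my ($s) = @_;
    unless (exists $token_of{$s}) {
        push @string_of, $s;
        $token_of{$s} = $#string_of;
    }
    return $token_of{$s};
}

# Build a tokenized key (joined with $;) or a tokenized comma-separated list.
sub token_key  { return join($;,  map { tokenize($_) } @_) }
sub token_list { return join(',', map { tokenize($_) } @_) }

# Turn a stored list of tokens back into the original strings.
sub expand_list { return map { $string_of[$_] } split /,/, $_[0] }

my %db;    # plain hash standing in for the main dbm database
my $k = token_key("class", "InfrastructureManager", "method");
$db{$k} = token_list("open", "close", "reset");

my @methods = expand_list($db{$k});
print join(" ", @methods), "\n";
```

Each long string is then stored once, in the token table, and every key or list that mentions it carries only a short integer. That keeps key+data pairs well under the page limit and shrinks the database.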

--bob