olson@sax.cs.uiuc.edu (Robert Olson) (02/07/91)
I am using a dbm database (via dbmopen) to create a rather large,
sparse database. My data fits nicely in the form...
	$db{key1}
	$db{key1, key2}
	$db{key1, key2, key3}
	etc
where each of the first couple of keys takes relatively few distinct
values.
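For readers new to this notation: in Perl, a comma-separated list inside hash braces is joined into a single string key with the subscript separator `$;`, which defaults to "\034". That control character is what shows up as `^\` in the error message quoted below. A minimal sketch using an ordinary hash in place of the dbmopen'ed one:

```perl
use strict;
use warnings;

# An ordinary hash stands in for the dbmopen'ed %db in this sketch.
my %db;

# $db{"class", "X"} is shorthand for $db{join($;, "class", "X")}.
$db{"class", "X"} = "method,field";

my $joined = join($;, "class", "X");
print exists $db{$joined} ? "same key\n" : "different key\n";

# $; defaults to "\034" (octal 34, Ctrl-\, displayed as ^\).
printf "separator is chr(%d)\n", ord($;);
```

So every multi-part key is really one flat string, and its length counts fully against whatever per-record limit the dbm library imposes.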
(Perhaps an example is in order. The data is information about classes
gleaned from the g++ compiler.
	$db{"class"} = "X,Y,Z"	
	$db{"class", "X"} = "method,field"
	$db{"class", "X", "field" } = "field1,field2,field3"
	$db{"class", "X", "field", "field1"} = "type,offset"
	$db{"class", "X", "field", "field1", "type" } = "int"
As you can guess, there is a large number of different keys in the
second and fourth positions and relatively few in the first and third.
)
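To make the layout concrete, here is a hedged sketch that loads the sample entries above into an ordinary hash and walks from a class down to each field's type by splitting the comma-separated list at each level. The helper name `field_types` is hypothetical, not part of the original program.

```perl
use strict;
use warnings;

my %db;   # ordinary hash standing in for the dbm file
$db{"class"}                                 = "X,Y,Z";
$db{"class", "X"}                            = "method,field";
$db{"class", "X", "field"}                   = "field1,field2,field3";
$db{"class", "X", "field", "field1"}         = "type,offset";
$db{"class", "X", "field", "field1", "type"} = "int";

# Hypothetical helper: return (field => type) pairs for one class,
# following the comma-separated lists one level at a time.
sub field_types {
    my ($db, $class) = @_;
    my %types;
    my $fields = $db->{"class", $class, "field"};
    return %types unless defined $fields;
    for my $field (split /,/, $fields) {
        my $type = $db->{"class", $class, "field", $field, "type"};
        $types{$field} = $type if defined $type;   # only field1 has a type here
    }
    return %types;
}

my %t = field_types(\%db, "X");
print "$_: $t{$_}\n" for sort keys %t;
```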
Some of the values in the db are lists, which I represent as
comma-separated lists of strings. Given the data I build the database
from, it is easiest to store the initial value of a list in the
database and then append succeeding values to it as I read them from
the input.
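The append pattern described above can be sketched as follows, assuming an ordinary hash in place of the dbm one; `append_elt` is a hypothetical name. Note that each call rewrites the whole stored value, so the record for a popular key keeps growing with its list.

```perl
use strict;
use warnings;

my %db;   # stands in for the dbmopen'ed hash

# Hypothetical helper: start a comma-separated list on first use,
# otherwise append one element. Every call stores the full, longer value.
sub append_elt {
    my ($db, $key, $elt) = @_;
    if (defined $db->{$key}) {
        $db->{$key} = $db->{$key} . ',' . $elt;
    } else {
        $db->{$key} = $elt;
    }
}

append_elt(\%db, join($;, "class", "X", "field"), $_) for qw(field1 field2 field3);
print $db{"class", "X", "field"}, "\n";
```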
Problem: I get errors such as
  dbm store returned -1, errno 28, key "class^\InfrastructureManager^\method"
	at db.pl line 42, <> line 810.
Errno 28 is ENOSPC, "No space left on device". That shouldn't have
been the case; there were more than 50M free on the disk. BUT, I guess
this is telling:
% ls -sl class*
  28 -rw-rw-r--  1 olson       61440 Feb  6 22:25 classinfo.dir
4840 -rw-rw-r--  1 olson    491334656 Feb  6 22:25 classinfo.pag
Not much real storage, but lotsa holes.... (Note: this is for the
preallocated version, described below. The unpreallocated versions
showed similar behavior, but with an apparent size only in the 100M range.)
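The gap between the two numbers above is the difference between apparent size (what `ls -l` prints) and blocks actually allocated (what `ls -s` reflects); ndbm writes each page at an offset derived from the key's hash, leaving unwritten holes. A minimal sketch of the same comparison via Perl's `stat`, assuming `st_blocks` counts 512-byte units (common but system-dependent; `ls -s` may itself report 1K blocks). The sub name `sizes` is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return (apparent size, allocated bytes) for a file.
# Assumes st_blocks (field 12) is in 512-byte units.
sub sizes {
    my ($path) = @_;
    my @st = stat($path) or die "stat $path: $!";
    return ($st[7], $st[12] * 512);
}

# Make a sparse file: seek far past the end and write a single byte.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
seek($fh, 10_000_000, 0) or die "seek: $!";
print {$fh} "x";
close $fh or die "close: $!";

my ($apparent, $allocated) = sizes($path);
printf "apparent %d bytes, allocated %d bytes\n", $apparent, $allocated;
```

On a filesystem that supports holes, the allocated figure should come out far below the apparent one, just like the .pag file.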
The first thing I tried was to make each key a constant length, so
that instead of $db{"class", "X"} I had $db{"class", "X", '', '', ''}.
This didn't seem to change much. Suspecting that appending to database
elements (e.g. $db{"class"} = $db{"class"} . ',' . $elt) was hurting me,
I tried preallocating the value via
	$db{"class"} = ' ' x 100; $db{"class"} = $firstValue;
but that didn't seem to help either.
From what I did it may be obvious that I do not understand the
workings of dbm files...
Any suggestions?
--bob
PS. Apologies if this gets posted twice; the first one didn't show up on the
nntp server...
--
Bob Olson			University of Illinois at Urbana/Champaign
Internet: rolson@uiuc.edu	UUCP:  {uunet|convex|pur-ee}!uiucdcs!olson
UIUC NeXT Campus Consultant	NeXT mail: olson@fazer.champaign.il.us
"You can't win a game of chess with an action figure!" AMA #522687 DoD #28
flee@cs.psu.edu (Felix Lee) (02/08/91)
> dbm store returned -1, errno 28, key "class^\InfrastructureManager^\method"
> 	at db.pl line 42, <> line 810.
>I get errno 28 to be No space left on device. This shouldn't have been
You're running into ndbm limitations. Ndbm returns ENOSPC when your
key+data exceeds PBLKSIZ bytes (in <ndbm.h>). This may be mentioned in
the "BUGS" section of your ndbm(3) man page. (The SunOS man page claims
4096 bytes, but <ndbm.h> says 1024.)
Ndbm also runs into problems when all the data that hashes to the same
32-bit integer overflows a PBLKSIZ block, but I don't think it returns
ENOSPC in this case.
--
Felix Lee	flee@cs.psu.edu
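Given that explanation, one way to get a clearer failure is to check the key+data length before storing. A hedged sketch: it assumes PBLKSIZ is 1024 as in the <ndbm.h> quoted above (check your own header), assumes plain byte strings, and reserves a small guessed margin because part of each page holds bookkeeping. `checked_store` and `$MARGIN` are hypothetical names.

```perl
use strict;
use warnings;

# Assumed values: PBLKSIZ from <ndbm.h>; the margin is a guess for
# per-page bookkeeping, not a documented constant.
my $PBLKSIZ = 1024;
my $MARGIN  = 16;

# Hypothetical wrapper: refuse a pair that cannot fit in one page,
# rather than letting the dbm store fail later with ENOSPC.
sub checked_store {
    my ($db, $key, $value) = @_;
    my $size = length($key) + length($value);
    die sprintf("pair too large for one ndbm page: %d bytes\n", $size)
        if $size > $PBLKSIZ - $MARGIN;
    $db->{$key} = $value;
    return 1;
}

my %db;   # ordinary hash standing in for the dbm file
checked_store(\%db, join($;, "class", "X"), "method,field");

# A long appended list (600 elements) exceeds the assumed limit.
my $ok = eval {
    checked_store(\%db, join($;, "class", "Y", "method"), join(',', ('m') x 600));
    1;
};
print $ok ? "stored\n" : "rejected: $@";
```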
olson@sax.cs.uiuc.edu (Bob Olson) (02/08/91)
Yep, that seems to be it. Credit also to Randal, who emailed a similar answer.
My solution? Tokenize all strings inserted into the database, using another dbm database for the string->integer conversion and an array for the integer->string conversion. It works quite well, and the resulting databases are a LOT smaller, with a seemingly small performance hit. The database ends up consisting of entries like
	1^\3^\7^\190 --> 4,2,10,5
--bob
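The tokenizing scheme described in that reply might look roughly like this. It is a sketch, not the poster's code: an ordinary hash `%tok` stands in for the second dbm file (string to integer), an array `@str` holds integer to string, and tokens start at 1 by assumption. `tokenize`, `tkey`, `tlist` and `untlist` are hypothetical names.

```perl
use strict;
use warnings;

my %tok;              # string -> integer (stands in for the second dbm file)
my @str = (undef);    # integer -> string; slot 0 unused so tokens start at 1

# Return the token for a string, assigning the next integer on first sight.
sub tokenize {
    my ($s) = @_;
    unless (exists $tok{$s}) {
        push @str, $s;
        $tok{$s} = $#str;
    }
    return $tok{$s};
}

# Hypothetical helpers: tokenized multi-part key, tokenized list value,
# and decoding of a stored list back to strings.
sub tkey    { join $;, map { tokenize($_) } @_ }
sub tlist   { join ',', map { tokenize($_) } @_ }
sub untlist { map { $str[$_] } split /,/, $_[0] }

my %db;   # the main database, now holding short numeric keys and values
$db{ tkey("class", "X", "field") } = tlist("field1", "field2", "field3");

my @fields = untlist($db{ tkey("class", "X", "field") });
print "@fields\n";
```

Here the stored pair becomes "1^\2^\3" mapped to "4,5,6". Note that `tkey` also assigns tokens for strings seen only during lookups, which a real version might want to avoid.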