sundar@wheaties.ai.mit.edu (Sundar Narasimhan) (06/09/88)
Hi: I have a question regarding the size of dbm and ndbm databases.
In the bugs section of the man pages for these databases there is a
comment indicating that the database files (the .dir and .pag files)
cannot be copied, tar'ed, etc., without filling the holes in them.
Does this comment also apply to the dump program that does file
system backups?

Also, I have about 1000 records, each ~600 bytes.  The ndbm database
created from these records is rather big (see below):

	-rw-rw-rw-  1 sundar     196608 Jun  8 03:59 1984/WREQ.dir
	-rw-rw-rw-  1 sundar 1604270080 Jun  8 03:59 1984/WREQ.pag

Is this normal for ndbm databases?  How does this compare with
databases created with the dbm routines?

Although the size is large, the database is fast for lookups.
However, I need a search capability that may have to iterate over
all the records, and this tends to be slow.  Is there any version of
these routines that is faster than the standard routines in
4.2/4.3BSD or SunOS 3.5?

One last question, regarding reclamation of file space upon deletion:
why is this hard to do, and has anyone added this capability?

	-Sundar (sundar@wheaties.ai.mit.edu)
chris@mimsy.UUCP (Chris Torek) (06/14/88)
In article <16103@brl-adm.ARPA> sundar@wheaties.ai.mit.edu (Sundar
Narasimhan) writes:
>In the bugs section of the man pages [for dbm and ndbm] there is a
>comment indicating that these database files ... cannot be copied,
>tar'ed etc without filling the holes in them.

(Note that this `merely' wastes disk space.  Also, it is not hard to
write a variant of `cp' that recreates holes.)

>Does this comment also apply to the dump program that does file system
>backups?

(ref V7/4BSD dump+restor[e])  No.

>Is [huge size] normal for ndbm databases?

The .pag files grow more or less by powers of two (imagine a tree with
the nodes numbered thus:

	level	tree			hash mask (in binary)
	0:	0			0000
	1:	0 1			0001
	2:	0 2 1 3			0011
	3:	0 4 2 6 1 5 3 7		0111
		-------	-------
		these	these
		hash to	hash to
		XXX0	XXX1

etc.).  In a fresh database, you start at level 0, and to store an
item, you compute a hash number for it, then use no bits; all items
thus wind up in node 0.  When node zero gets full, you `split' to
level 1, recomputing the hash for each item in 0; you now use one bit
to move about half the data to node 1.  If node 0 gets full again,
you start using two bits, and (since the things in it all end with a
zero bit) about half the items move to node 2; if node 1 gets full,
you start using two bits there and about half move to 3.  It takes
one bit per node per level to remember what was split; this bit can
be named by (hash & mask) + mask.

If you get (un)lucky with the hash function, or have enough data, the
hashing could lean on one path through the tree, making the apparent
database size much larger.
The heart of the whole thing is the loop

	hash = calculate_hash(item);
	for (hash_mask = 0;; hash_mask = (hash_mask << 1) + 1) {
		split_memory_bit = (hash & hash_mask) + hash_mask;
		if (!bit_is_set(split_memory_bit, split_map_table))
			break;
	}
	block = hash & hash_mask;
	/* item is either in node `block' or not there at all */
	/* read that node and search for it sequentially */

and, of course, the hashing function.  (Pseudocode for the split
routine is left as an exercise for the student :-) .)

In dbm and ndbm, tuples are stored as (key,datum) with `key' being
the item hashed above, and the key and its datum being stored next
to each other.  The format of each node (block) is:

    (s = sizeof(short), o_d => `offset of datum', o_k => `offset of key')

	0s:	item_count (always even)
	1s:	offset of key 0
	2s:	offset of datum 0
	3s:	o_k_1
	4s:	o_d_1
	...
	(2n+1)s: o_k_n
	(2n+2)s: o_d_n
	<free space>
	o_d_n:	datum n
	o_k_n:	key n
	...
	o_d_0:	datum 0
	o_k_0:	key 0
	<end of .pag file block>

The length of any item can be computed as (offset of previous item) -
(offset of this item), where the `offset' of item number -1 is the
block size.

>How does this compare with databases created with dbm routines ?

In 4.3BSD, dbm is just ndbm with a single static `DBM' variable to
hold the (single) database.  The 4.3BSD ndbm is a tweaked version of
the V7 dbm, with the block sizes expanded.  A larger block size means
that fewer splits occur (fewer tree levels used), and larger items
can be stored, but more items must be searched with each lookup.
Fewer splits is desirable to keep the split table size down; if the
whole table fits in memory, only one disk read ever needs to be done
to find an item.  Larger items are desirable for the obvious reason.

>Although the size is large, the database is fast for lookups. However,
>I have a need for a search capability that may need to iterate over
>all the records. This tends to be slow. Is there any version of these
>routines that is faster than the std.
>routines in 4.2/3/Sun OS 3.5?

Not as far as I know.

In my `multi-keyed' dbm, the format of a block is different.  Items
are still hashed as always, but instead of storing (key,datum) pairs,
when an item is to be a key, I store its datum's hash value and its
datum's `in index' in the item's `out hash' and `out index'; when an
item is to be a datum, it acquires an `in index'.  (Actually, all
items get in indices always.)  Each item also has a link (reference)
count, since it might be used as a datum by any number of keys.  But
the loop I described as the heart of the algorithm must run
twice---once to hash and find the key, and again to find the datum
given the key's `out hash' and `out index'---so lookups generally
take two disk reads instead of one.

>One last question, regarding reclamation of file space upon deletion.
>Why is this hard to do?

You would need to `un-split' blocks.  That would not be hard, but
without `ftruncate' you could never shorten the file, and even with
ftruncate you cannot get the kernel to replace a disk block with a
hole, so it really would not save space after all.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris