massar@think.UUCP (07/15/86)
Does anyone out there understand dbm?  The manual page says:

	The sum of the sizes of a key/content pair must not exceed
	the internal block size (currently 1024 bytes).  Moreover all
	key/content pairs that hash together must fit on a single
	block.  Store will return an error in the event that a disk
	block fills with inseparable data.

However, after looking at the source, and having had some experience
using dbm, I am not convinced that the last sentence is really
implemented.  Further, if it is implemented, what exactly does it
mean?  Will my already-existing data be destroyed as dbm 'fills the
disk block with inseparable data', or will dbm simply refuse to write
the data that would cause the disk block to fill up?

The reason I am suspicious is that after creating a large hash table,
and apparently not getting any errors, certain entries in the database
don't exist, for no reason I understand.

While I am at it, does anyone know of a dbm-like implementation
without this limitation on filling up disk blocks?
-- 
-- JP Massar, Thinking Machines Corporation, Cambridge, MA
-- ihnp4!think!massar, massar@think.com
-- 617-876-1111
jordan@nike.UUCP (07/16/86)
> The reason I am suspicious is that after creating a large hash table,
> and apparently not getting any errors, certain entries in the database
> don't exist, for no reason I understand.

dbm was a hack to get 4.2 out the door -- 4.3 has a new implementation
that does the right thing (i.e., dbm_open() returns a DBM *) and the
size restriction was raised to 4096 bytes ... however, I think there
are still some inconsistencies in the man page.

/jordan
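[The handle-style interface jordan describes can be sketched with
Python's stdlib dbm family.  This uses dbm.dumb, which is always
available but, unlike a real 4.3BSD ndbm, does NOT enforce the
block-size limit; the C-equivalent calls in the comments are for
orientation only, and the path name is made up. -- a hedged sketch,
not the 4.3 code:]

```python
# Sketch of the ndbm-style "handle" interface (dbm_open() returning a
# DBM *), illustrated with Python's dbm.dumb.  Caveat: dbm.dumb does
# not enforce ndbm's 4096-byte key/content block limit.
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "testdb")  # hypothetical path

db = dbm.dumb.open(path, "c")    # like dbm_open(path, O_RDWR|O_CREAT, 0666)
db[b"massar"] = b"think!massar"  # like dbm_store(db, key, content, DBM_REPLACE)
print(db[b"massar"])             # like dbm_fetch(db, key)
db.close()                       # like dbm_close(db)
```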
chris@umcp-cs.UUCP (07/17/86)
In article <385@nike.UUCP> jordan@nike.UUCP (Jordan Hayes) writes:
>dbm was a hack to get 4.2 out the door ...

Not quite: dbm was in V7, and was apparently written by Ken Thompson
in some connection with a chess program.

>4.3 has a new implementation that does the right thing (i.e.,
>dbm_open() returns a DBM *) and the size restriction was raised to
>4096 bytes ...

Why is this `the right thing'?  My `mdbm' implementation allows both
the data and the bitmap block size to be specified at database
creation.  A larger data block allows storing larger items, and/or
more items that hash to the same value, but also decreases `store'
performance.

In any case, to answer the original question: when store() runs out of
room in a data block, it splits the block by using one more hash bit.
This normally results in two nearly-half-full blocks, and leaves
enough room for the new item.  In pathological cases, dbm may have to
split blocks some number of times before discovering that there is no
way to fit the new item into its eventual destination.  This happens
only when several large items hash to the same value, which is rather
rare; in that case the new item is left out of the database and an
error is returned.  The database might then appear extremely large (as
much as 2^32 bytes), but should still be properly formed internally.

Of course, if the operating system rejects any particular transfer,
for whatever reason, the database can become corrupted.  In such cases
store should also return an error.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs
ARPA:	chris@mimsy.umd.edu
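[Chris's split-on-overflow description can be modeled in a few lines.
This is a toy sketch, not dbm's code: the block size, the sum-of-bytes
hash, and the function names are all made up for illustration; real
dbm works on 1024-byte disk blocks with its own hash function.]

```python
# Toy model of dbm's store(): an overfull block is split by using one
# more hash bit, until every resulting block fits -- or until the
# colliding items turn out to be "inseparable" (they share all hash
# bits), in which case store() reports an error.
BLOCK_SIZE = 32  # bytes per block (stand-in; real dbm uses 1024)

def hash_fn(key):
    # Stand-in hash: sum of the key's bytes.  Chosen so that "ab" and
    # "ba" deliberately collide on every bit, to show the error path.
    return sum(key.encode())

def block_bytes(block):
    return sum(len(k) + len(v) for k, v in block)

def split_until_fit(block, bit=0, max_bits=32):
    """Recursively split an overfull block on successive hash bits;
    returns the list of resulting blocks, or raises OverflowError
    when the items hash together on every available bit."""
    if block_bytes(block) <= BLOCK_SIZE:
        return [block]
    if bit >= max_bits:
        raise OverflowError("disk block filled with inseparable data")
    zeros = [kv for kv in block if not (hash_fn(kv[0]) >> bit) & 1]
    ones  = [kv for kv in block if (hash_fn(kv[0]) >> bit) & 1]
    return split_until_fit(zeros, bit + 1) + split_until_fit(ones, bit + 1)

# Normal case: two items overflow one block, but differ in hash bit 0,
# so one split yields two blocks that each fit.
print(split_until_fit([("a", "x" * 16), ("b", "y" * 16)]))

# Pathological case: "ab" and "ba" collide on every bit of this hash,
# so no number of splits can separate them -- store() must give up.
try:
    split_until_fit([("ab", "x" * 20), ("ba", "y" * 20)])
except OverflowError as e:
    print("store error:", e)
```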
shannon@sun.uucp (Bill Shannon) (07/19/86)
> dbm was a hack to get 4.2 out the door
dbm has nothing to do with getting 4.2 out the door. dbm has
been around since V7, although AT&T seems to have lost their
copy...
ian@sq.UUCP (07/21/86)
In article <5273@sun.uucp> shannon@sun.uucp (Bill Shannon) writes:
>> dbm was a hack to get 4.2 out the door
>
>dbm has nothing to do with getting 4.2 out the door.  dbm has
>been around since V7, although AT&T seems to have lost their
>copy...

AT&T (USG) never got a copy from Bell, since System III -> V -> Vr2 ->
∞ doesn't come from V7, but from V6.  (I think we've been over this
before; apologies to those who've heard it before :=) )
guy@sun.uucp (Guy Harris) (07/22/86)
> AT&T (USG) never got a copy (of "dbm") from Bell, since System III ->
> V -> Vr2 -> ∞ doesn't come from V7, but from V6.  (I think we've been
> over this before; apologies to those who've heard it before :=) )

If UNIX/TS 1.0 -> PWB/UNIX 2.0 -> UNIX 3.0 -> UNIX 3.0.1 (S3) -> ...
came from V6, how come it has:

	1) The V7 file system
	2) The V7 "stat" structure
	3) The V7 "lseek" system call
	4) Environment variables
	5) The Bourne shell

etc., when V6 had *none* of those?  The USG/USDL UNIX series came from
a UNIX somewhere between what was sent out the door as V6 and what was
sent out the door as V7; it was closer to V7 than to V6, though.  (The
fact that it lacked "dbm" and a few other things is hardly as
important as the fact that it had the V7 features listed above.)

I'm not sure what you mean by "AT&T" and "Bell" here.  It's not
relevant, though; the Research group that did V6, V7, etc. and the
USG/USDL that did S3, S5, etc. were both part of Bell Laboratories, so
they *could* have gotten a copy of "dbm" if they'd wanted to.  The
pipeline between Research and USG/USDL was less than smooth (consider
that a bug in "fgrep", fixed in an addendum to V7, is *still* not
fixed in S5!), but it could conceivably have been made to flow better.
-- 
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)