doug@zaphod.prime.com (08/04/89)
I have a general question about ndbm. It seems that if you fetch the same data record twice you get null data. This seems like faulty behavior. If you store to the key you can then fetch the record again. Can anyone explain why this is so and how one can reset the db without closing and reopening the db? Thanks Douglas Rand Internet: doug@primerd.prime.com Snail: Prime Computer, 500 Old Conn Path, MS10C-17, Framingham, Ma 01701 Disclaimer: PRIME doesn't believe a word I say, and fewer that I write.
chris@mimsy.UUCP (Chris Torek) (08/05/89)
In article <34200002@zaphod> doug@zaphod.prime.com writes: >I have a general question about ndbm. It seems that if you fetch the same >data record twice you get null data. This seems like faulty behavior. It would be, but this does not happen on any machine on which we have seen ndbm used (VAX, Tahoe, Sun 2, Sun 3, Sun 4, DECstation 3100, etc). What does happen is that data stored by store() can come back in different places in memory from different fetch() operations, and that the dptr value returned by fetch points into a private buffer that is overwritten by further calls to fetch() and other operations. You must copy the data out of this private buffer if you want them to hang around across calls. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
doug@zaphod.prime.com (08/07/89)
I judged that the storage used in the dptr value was probably reused. I hadn't made assumptions about the persistance of datum -> dptr. What happens is something like: datum *x; x = dbm_fetch(thedb, thekey); x -> dptr[x -> dsize] = '\0'; printf("%s\n", x -> dptr); // Something is printed x = dbm_fetch(thedb, thekey); x -> dptr[x -> dsize] = '\0'; printf("%s\n", x -> dptr); // Null is printed ------------------------------------------------------------------------------ Douglas Rand Internet: doug@primerd.prime.com Snail: Prime Computer, 500 Old Conn Path, MS10C-17, Framingham, Ma 01701 Disclaimer: PRIME doesn't believe a word I say, and fewer that I write.
chris@mimsy.UUCP (Chris Torek) (08/08/89)
In article <34200003@zaphod> doug@zaphod.prime.com writes: >What happens is something like: > > datum *x; > x = dbm_fetch(thedb, thekey); > x -> dptr[x -> dsize] = '\0'; > printf("%s\n", x -> dptr); // Something is printed > > x = dbm_fetch(thedb, thekey); > x -> dptr[x -> dsize] = '\0'; > printf("%s\n", x -> dptr); // Null is printed This code will not even compile without warnings, and is clearly not the code you used, since dbm_fetch returns a `datum' and not a `datum *'. At a guess, I would say the original code is rather more like this: datum getkey() { datum key; char buf[SOMESIZE]; key.dptr = buf; key.dsize = nnn; return (key); } ... thekey = getkey(); x = dbm_fetch(thedb, thekey); x.dptr[x.dsize] = '\0'; printf("%s\n", x.dptr); x = dbm_fetch(thedb, thekey); x.dptr[x.dsize] = '\0'; printf("%s\n", x.dptr); in which it is just chance that the *first* call worked; or else, the original code is something like this: thekey = dbm_fetch(thedb, otherkey); <same as before> -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
chris@mimsy.UUCP (Chris Torek) (08/08/89)
In article <18933@mimsy.UUCP> I wrote: >[someone else's code was probably something like this]: > x = dbm_fetch(thedb, thekey); > x.dptr[x.dsize] = '\0'; > printf("%s\n", x.dptr); and---O fool that I was---did not even notice that this corrupts the memory buffer returned from dbm_fetch. You are NOT allowed to scribble on the memory pointed to by x.dptr! In this particular case, the result is to clobber the key that follows the key `thekey' in the ndbm buffer, or, if that is the last datum, the next byte in the buffer, which is most likely unused, but could be one byte of a two-byte index. In any case, the operation is not legal, and while a subsequent fetch of the same key is likely to work, nothing is guaranteed. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
doug@zaphod.prime.com (08/09/89)
Chris - thanks very much. It was not stated anywhere in the man page (aren't man pages just the niftyest place to go for information?) that the storage is sacred and an obvious hack was to modify the value in place. I've fixed my code to do a bcopy to a buffer before playing with the data and now it's all copacetic. ------------------------------------------------------------------------------- Douglas Rand Internet: doug@primerd.prime.com Snail: Prime Computer, 500 Old Conn Path, MS10C-17, Framingham, Ma 01701 Disclaimer: PRIME doesn't believe a word I say, and fewer that I write.
madd@bu-cs.BU.EDU (Jim Frost) (08/12/89)
My peeve about ndbm is that it doesn't really allow "combined key/data of up to 4096 bytes". It'll let you *try*, but with larger key/data combinations, you'll get large numbers of failures on similar keys. We found that you don't get reliable insertion unless the records are pretty short (circa 128 bytes; 1024 failed a lot and 2048+ failed much of the time; for our use anything smaller than 1k was pretty useless). The simple way to get around this is to use ndbm to maintain indeces into a file of records, which works great, but I still tore my hair out before doing this. If someone in-the-know could explain this behavior to me, I would be grateful. jim frost software tool & die madd@std.com
chris@mimsy.UUCP (Chris Torek) (08/12/89)
In article <36321@bu-cs.BU.EDU> madd@bu-cs.BU.EDU (Jim Frost) writes: >My peeve about ndbm is that it doesn't really allow "combined key/data >of up to 4096 bytes". It'll let you *try*, but with larger key/data >combinations, you'll get large numbers of failures on similar keys. >We found that you don't get reliable insertion unless the records are >pretty short (circa 128 bytes; 1024 failed a lot and 2048+ failed much >of the time; for our use anything smaller than 1k was pretty useless). All strings that hash to the same 32-bit value wind up in the same block. It sounds as though you had strings that defeated the hash function. This is not supposed to happen, but you could fiddle with ndbm_calchash() and see if you could get `more unique' values. WARNING: Any change made to the hash function invalidates all old databases. Vendors should tread with care. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
madd@bu-cs.BU.EDU (Jim Frost) (08/13/89)
In article <19044@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: |All strings that hash to the same 32-bit value wind up in the same |block. It sounds as though you had strings that defeated the hash |function. This is not supposed to happen, but you could fiddle with |ndbm_calchash() and see if you could get `more unique' values. I'd be interested in finding out what the hash algorithm is. We were using keys something like: 0001diagram3244.a The first 4 digits were used because records could be larger than 4096 bytes and we just chained them together by serializing a portion of the key. The remainder was used as a key to the complete record. Since records could be more-or-less arbitrarily long, we tried to use larger data sizes. The first try was to use 2048, giving us records far smaller than the 4096 key/data limit. Insertion failures happened very quickly (within a few hundred insertions) even when the serialized portion of the key wasn't being used very much. Dropping the size of the data portion helped a lot; it worked most of the time when the data was less than 128 bytes long. While I admit that we were abusing the library (I was on a real tight schedule at the time or I would have done it differently at the outset), the man page didn't even imply that this kind of thing might fail, much less fail regularly. Our solution was to use ndbm to find a page number of the first page in a data page file, which worked effectively and was certainly more in line with the way ndbm should be used :-). I still have a lot of questions regarding ndbm, most of which I could answer for myself if I had any way of looking at the code. jim frost software tool & die madd@std.com