heiby@mcdchg.chg.mcd.mot.com (Ron Heiby) (05/14/91)
I've looked through the News 2.11 patch 19 source and believe I've found every place where the history DBM file is referenced. When an article arrives on the system, the dbm file is queried to see if the message-id already exists (case insensitive check). If not, then the article winds up getting installed on the system, a line is written to the text history file, and an entry is made to the DBM history file using the message-id as the key and the file offset into the text history file of the line describing the article as the data.

While it might be really handy for some software that has a message-id and wants to convert that to pathname(s) or such to have that info in the DBM file that way, I can't find any software that actually makes use of that information. I've written a fairly complete expire Perl script, which I'm about ready to start testing. It occurs to me, though, that if the actual *data* stored in the DBM file isn't used by anyone, just the *key* (whether or not the key exists), then it doesn't really matter whether that information is updated by expire. If no one uses it, expire could simply delete keys for the ancient articles and leave all the others untouched. I would think that this would make for a noticeable speed improvement.

Am I all wet, here? Do I actually have to maintain that correspondence between DBM and text forms of the history file? If so, why? What software makes use of the data stored, rather than the fact that some data exists?

BTW, has anyone done any speed comparisons between "good old" dbm and GNU dbm? It seems like on my system, the standard 2.11 expire.c runs about three times slower if libgdbm.a (version 1.3) is linked in.
--
Ron Heiby, heiby@chg.mcd.mot.com    Moderator: comp.newprod
"Wrong is wrong, even when it helps you." Popeye
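For illustration, the arrangement described in the post above (a text history file plus a keyed index holding each line's file offset) can be sketched in Python. This is a minimal sketch and not the News 2.11 code: Python's dbm.dumb module stands in for the DBM library, the history line is simplified to "message-id<TAB>fields", and the names add_article and lookup are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def add_article(index, history_path, message_id, fields):
    """Record an article unless its message-id is already in the index.

    Returns True if the article was new, False if it was a duplicate.
    """
    key = message_id.lower().encode()      # case-insensitive duplicate check
    if key in index:
        return False
    with open(history_path, "ab") as hist:
        offset = hist.seek(0, os.SEEK_END)  # where the new line will start
        hist.write(f"{message_id}\t{fields}\n".encode())
    index[key] = str(offset).encode()      # data = offset into the text file
    return True


def lookup(index, history_path, message_id):
    """Return the history line for a message-id, or None if it is unknown."""
    key = message_id.lower().encode()
    if key not in index:
        return None
    with open(history_path, "rb") as hist:
        hist.seek(int(index[key].decode()))
        return hist.readline().decode().rstrip("\n")


# Small demonstration in a scratch directory (all values are made up).
workdir = tempfile.mkdtemp()
history_path = os.path.join(workdir, "history")
index = dbm.dumb.open(os.path.join(workdir, "history_index"), "c")

first = add_article(index, history_path, "<1@a.example>", "news.test/1")
second = add_article(index, history_path, "<2@b.example>", "news.test/2")
duplicate = add_article(index, history_path, "<1@A.EXAMPLE>", "news.test/3")
line = lookup(index, history_path, "<2@b.example>")
index.close()
```

Duplicate suppression only needs the "key in index" test. The stored offset matters only to callers like lookup, which is the distinction the question turns on.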
henry@zoo.toronto.edu (Henry Spencer) (05/14/91)
In article <62955@mcdchg.chg.mcd.mot.com> heiby@mcdchg.chg.mcd.mot.com (Ron Heiby) writes:
>...and an entry is made to the DBM history file
>using the message-id as the key and the file offset into the text
>history file of the line describing the article as the data.
>
>While it might be really handy for some software that has a message-id
>and wants to convert that to pathname(s) or such to have that info in
>the DBM file that way, I can't find any software that actually makes
>use of that information...

Some of the readers do. They want to look up articles by message ID; the only quick way to do that is to use the dbm/dbz index to locate the article's history line.

>BTW, has anyone done any speed comparisons between "good old" dbm and
>GNU dbm? ...

It's kind of pointless, since dbz blows the doors off both of them for this application, and keeps much smaller files to boot.
--
And the bean-counter replied,  | Henry Spencer @ U of Toronto Zoology
"beans are more important".    | henry@zoo.toronto.edu utzoo!henry
ian@airs.com (Ian Lance Taylor) (05/15/91)
heiby@mcdchg.chg.mcd.mot.com (Ron Heiby) writes:
>While it might be really handy for some software that has a message-id
>and wants to convert that to pathname(s) or such to have that info in
>the DBM file that way, I can't find any software that actually makes
>use of that information.

GNUS does, at least.

Since the subject has come up, I'd like to mention that at least once a week I curse the fact that I can't look up the articles that appear in the References: line (I can do this using GNUS, but on my system it's too slow for me). I hope more newsreaders will add such a feature in the future.
--
Ian Taylor ian@airs.com uunet!airs!ian
First person to identify this quote wins a free e-mail message:
``Nobody believed him, so out of politeness to his listeners he pretended to be joking.''
jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) (05/16/91)
In article <62955@mcdchg.chg.mcd.mot.com> heiby@mcdchg.chg.mcd.mot.com (Ron Heiby) writes:
>While it might be really handy for some software that has a message-id
>and wants to convert that to pathname(s) or such to have that info in
>the DBM file that way, I can't find any software that actually makes
>use of that information. I've written a fairly complete expire Perl

As mentioned, some news readers do make use of lookup by ID. The idea is that it is possible to read the article mentioned in the References line. Of course the trend currently is to include the entire article being referenced, eliminating the need to find the "parent" article. :-)

>script, which I'm about ready to start testing. It occurs to me,
>though, that if the actual *data* stored in the DBM file isn't used by
>anyone, just the *key* (whether or not the key exists), then it
>doesn't really matter whether that information is updated by expire.
>If no one uses it, expire could simply delete keys for the ancient
>articles and leave all the others untouched. I would think that this
>would make for a noticeable speed improvement.

Yes, you could store 0 bytes of data and fulfill what is required for duplicate suppression. That should result in a smaller history.pag file. But consider, you are storing perhaps 30 bytes of key and 4 bytes of data. Cutting back from 34 to 30 bytes is not going to make a significant improvement.

I have a "newalias" program for handling updates to my mail alias file that does dbm adds and deletes instead of rebuilding the entire thing from scratch. It runs about 10 times faster that way. But the history.pag file is a different case. Even if we ignore the period of inconsistency of the pointers into the text file that would exist while it was being updated, there is still a more significant problem. The history.pag file depends on being "sparse" and, as it says in the documentation, deleting an entry does not free the disk block.

If you went along deleting entries, the distribution would eventually result in every disk block in the history.pag file actually being allocated. In other words, the physical size of the history.pag file would grow to equal its logical size. Given how the logical size of the history.pag file shocks people until they find out it is not really that big, I think this would not be a good idea. It might be OK for a few times, but at some point one would want to rebuild from scratch.

I have been using the dbz package with my B news and it works great. The history.pag file is lots smaller and the expire is about 10 times faster. The dbz package takes advantage of the fact that both the key and the data are in the text file, so it only needs to store the offset. Given that the key is lots bigger than the offset, this is a bigger win than not storing the offset.

Jerry Aguirre
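The offset-only idea in the last paragraph above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the dbz file format or API: the index is an in-memory dict, zlib.crc32 is an arbitrary choice of hash, the history line is simplified to "message-id<TAB>fields", and the class name OffsetOnlyIndex is hypothetical. The point it shows is that the index never stores the message-id itself; a candidate offset is confirmed by reading the key back from the text file.

```python
import os
import tempfile
import zlib


class OffsetOnlyIndex:
    """Toy index: hash of the message-id -> offsets into the text history file."""

    def __init__(self, history_path):
        self.history_path = history_path
        self.buckets = {}                      # hash value -> list of offsets
        open(history_path, "ab").close()       # make sure the file exists

    def _hash(self, message_id):
        return zlib.crc32(message_id.lower().encode())

    def add(self, message_id, fields):
        """Append a history line and remember only where it starts."""
        with open(self.history_path, "ab") as hist:
            offset = hist.seek(0, os.SEEK_END)
            hist.write(f"{message_id}\t{fields}\n".encode())
        self.buckets.setdefault(self._hash(message_id), []).append(offset)

    def fetch(self, message_id):
        """Return the history line for message_id, or None if not present."""
        with open(self.history_path, "rb") as hist:
            for offset in self.buckets.get(self._hash(message_id), []):
                hist.seek(offset)
                line = hist.readline().decode().rstrip("\n")
                stored_id = line.split("\t", 1)[0]
                # The key lives only in the text file, so compare it there;
                # this also rejects any hash collision.
                if stored_id.lower() == message_id.lower():
                    return line
        return None


# Small demonstration in a scratch directory (all values are made up).
idx = OffsetOnlyIndex(os.path.join(tempfile.mkdtemp(), "history"))
idx.add("<1@a.example>", "news.test/1")
idx.add("<2@b.example>", "news.test/2")
found = idx.fetch("<2@B.EXAMPLE>")
missing = idx.fetch("<3@c.example>")
```

Each lookup costs a read of the text file, but every index entry shrinks to a single offset no matter how long the message-id is.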