gary@dgcad.SV.DG.COM (Gary Bridgewater) (09/26/89)
I switched to dbz in my B 2.11.17 and have noticed a pretty good performance
improvement for several weeks now.  Suddenly, last Thursday, my expire
times jumped from an hour or so to 4+ hours (I keep a 30-day history).
Then, starting Friday, I noticed that processing incoming news was taking
between 2 and 3 minutes per article, whether it was a duplicate or not.  I
spent the whole weekend fiddling and getting farther and farther behind -
2 minutes to process an article means you only get to process 720 articles
a day.

Finally, tonight, I decided I had better rethink dbz, so I went into the
code and found:

	/*
	 * Set this to something several times larger than the maximum
	 * # of lines in a history file.  It should be a prime number.
	 */
	#define INDEX_SIZE 99991L

My history file is sitting at 5Mb, and an average history line is ~40
bytes, which works out to about 120,000 lines.  OOOPS!  I bumped
INDEX_SIZE up to 1000003, did an expire -R (30 minutes), and am now
processing articles at a rate of 4-5/minute.

Another symptom of this: your server nntpd's start chewing up CPU time.
--
Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or {amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.
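[To see why an undersized index is so catastrophic, here is a toy
linear-probing hash table - illustrative only; dbz's on-disk index differs
in detail, but the probe-chain blowup is the same idea.  With 120,000
entries crammed toward 99,991 slots, nearly every operation degenerates
into a long scan.]

	/*
	 * Toy demonstration - NOT dbz's actual data structure - of how
	 * probe chains explode as a hash table approaches full.
	 */
	#include <stdio.h>
	#include <stdlib.h>

	#define TABLE 99991L		/* dbz's default INDEX_SIZE */
	#define NKEYS 95000L		/* ~95% full, as a big history gets */

	int main(void)
	{
		char *slot = calloc(TABLE, 1);
		long i, probes = 0;

		if (slot == NULL)
			return 1;
		srand(1);
		for (i = 0; i < NKEYS; i++) {
			unsigned long h = (((unsigned long)rand() << 15) ^
			    (unsigned long)rand()) % TABLE;

			while (slot[h]) {	/* probe past collisions */
				h = (h + 1) % TABLE;
				probes++;
			}
			slot[h] = 1;
		}
		printf("average probes per insert: %.1f\n",
		    (double)probes / NKEYS);
		free(slot);
		return 0;
	}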
karl@ddsw1.MCS.COM (Karl Denninger) (09/27/89)
In article <1139@svx.SV.DG.COM> gary@svx.SV.DG.COM (Gary Bridgewater) writes:
>I switched to dbz in my B 2.11.17 and have noticed a pretty good performance
>improvement for several weeks now.  Suddenly, last Thursday, my expire
>times jumped from an hour or so to 4+ hours (I keep a 30-day history).
 ...
>Finally, tonight, I decided I had better rethink dbz, so I went into the
>code and found:
>
>	/*
>	 * Set this to something several times larger than the maximum
>	 * # of lines in a history file.  It should be a prime number.
>	 */
>	#define INDEX_SIZE 99991L
>
>My history file is sitting at 5Mb, and an average history line is ~40
>bytes, which works out to about 120,000 lines.  OOOPS!  I bumped
>INDEX_SIZE up to 1000003, did an expire -R (30 minutes), and am now
>processing articles at a rate of 4-5/minute.

Ok, how about this one?  Dbz also appears to have a nasty habit of not
noticing if you have a duplicate under some conditions.  That is, articles
which are still in the history file at times show up again if they are
received twice!

This didn't start happening until we changed to dbz from dbm.  Is there a
fix for it?  We're running "C" News....
--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.    "Quality Solutions at a Fair Price"
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (09/27/89)
>Dbz also appears to have a nasty habit of not noticing if you have a
>duplicate under some conditions.  That is, articles which are still in the
>history file at times show up again if they are received twice!
>
>This didn't start happening until we changed to dbz from dbm.  Is there a
>fix for it?  We're running "C" News....

Do you have the lowercasing of article IDs set right?  Can anyone confirm
this?
--
Branch Technology  |  zeeff@b-tech.ann-arbor.mi.us  |  Ann Arbor, MI
karl@ddsw1.MCS.COM (Karl Denninger) (09/29/89)
In article <9668@b-tech.ann-arbor.mi.us> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>>Dbz also appears to have a nasty habit of not noticing if you have a
>>duplicate under some conditions.  That is, articles which are still in the
>>history file at times show up again if they are received twice!
>>
>>This didn't start happening until we changed to dbz from dbm.  Is there a
>>fix for it?  We're running "C" News....
>
>Do you have the lowercasing of article IDs set right?  Can anyone confirm
>this?

Sure do.  We're using dbz 1.5; I got the new one in the mail from you, but
it is missing the ".h" file... thus I can't compile that one.

I have changed the "LIMIT" parameter to something really gross (1000003L,
as suggested) from the default and rebuilt the history (again).  We'll see
if the problem disappears.  We did have somewhere in the area of 70k
entries in there before....

Also on the "strange" list: a number of articles that should have expired
did not.  It would appear that history entries are being lost rather than
simply overwritten!
--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.    "Quality Solutions at a Fair Price"
todd@ivucsb.sba.ca.us (Todd Day) (09/30/89)
karl@ddsw1.MCS.COM (Karl Denninger) writes:
~Dbz also appears to have a nasty habit of not noticing if you have a
~duplicate under some conditions. That is, articles which are still in the
~history file at times show up again if they are received twice!
Are you using the dbz from contrib/dbz? If not, you should switch.
A good check is to try "nm /usr/lib/libdbz.a" or whatever you call
the dbz library.  If "rfc822ize" shows up, then you are probably using
the proper dbz library.
If you look at the dbz source in contrib/dbz, the line with the B news
kludge regarding lowercase() is commented out. This is the key. I did
a check on all the duplicate articles hitting my site, and they all
had uppercase after the "@" sign (usually .COM or .UUCP or .EDU). The
problem is that relaynews calls the dbz store() function with the
uppercase version, but the bad version of dbz does its duplicate check
against a lowercased version, so the two never match.  If you comment
out the "lowercase" line from the dbz source, it should work.
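To make the mismatch concrete, here is a toy illustration (these are
stand-ins for dbz's internals, not the real dbz 1.5 code):

	/*
	 * Illustrative only.  The point: a key stored as-is but looked
	 * up lowercased hashes to a different slot, so the duplicate
	 * is never seen.
	 */
	#include <ctype.h>
	#include <stdio.h>

	static void lowercase(char *s)
	{
		for (; *s != '\0'; s++)
			if (isupper((unsigned char)*s))
				*s = tolower((unsigned char)*s);
	}

	static unsigned long hashkey(const char *s)	/* toy hash */
	{
		unsigned long h = 0;

		while (*s != '\0')
			h = h * 31 + (unsigned char)*s++;
		return h;
	}

	int main(void)
	{
		char id[] = "<1234@FOO.COM>";	/* as relaynews stores it */
		char lookup[] = "<1234@FOO.COM>";

		lowercase(lookup);	/* as the bad dbz checks it */
		printf("stored under %lu, checked under %lu\n",
		    hashkey(id), hashkey(lookup));
		return 0;
	}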
--
Todd Day | todd@ivucsb.sba.ca.us | ivucsb!todd@anise.acc.com
"Ya know, some day these scientists are going to invent something
that can outsmart a rabbit" -- Bugs Bunny
karl@ficc.uu.net (Karl Lehenbauer) (10/12/89)
>>Dbz also appears to have a nasty habit of not noticing if you have a
>>duplicate under some conditions.  That is, articles which are still in the
>>history file at times show up again if they are received twice!

Beware that under Sys V/386, the C optimizer breaks dbz, at least under 3.0.
The misbehavior is dbz saying articles are not duplicates when they actually
are, and the history file ends up looking sick.
--
-- uunet!ficc!karl  "The last thing one knows in constructing a work is
what to put first." -- Pascal
karl@ddsw1.MCS.COM (Karl Denninger) (10/12/89)
In article <6512@ficc.uu.net> karl@ficc.uu.net (Karl Lehenbauer) writes:
>>>Dbz also appears to have a nasty habit of not noticing if you have a
>>>duplicate under some conditions.  That is, articles which are still in the
>>>history file at times show up again if they are received twice!
>
>Beware that under Sys V/386, the C optimizer breaks dbz, at least under 3.0.
>The misbehavior is dbz saying articles are not duplicates when they actually
>are, and the history file ends up looking sick.

Ok... but I'm running Xenix 2.3.2!  And yes, I compiled that module (by
hand) without optimization.

Any other good guesses?  It is STILL happening, even now that I have
turned up the hash value to something ridiculous (but still a prime, like
it says).
--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.    "Quality Solutions at a Fair Price"
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/13/89)
>>Beware that under Sys V/386, the C optimizer breaks dbz

Is there any simple change that will make the optimizer work correctly?
Consider using gcc.
--
Branch Technology  |  zeeff@b-tech.ann-arbor.mi.us  |  Ann Arbor, MI
bill@twwells.com (T. William Wells) (10/14/89)
In article <9680@b-tech.ann-arbor.mi.us> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
: >>Beware that under Sys V/386, the C optimizer breaks dbz
:
: Is there any simple change that will make the optimizer work correctly?
: Consider using gcc.
^ presuming you mean GNU
The Green Hills compiler, also called gcc, works as well.  It came with
Microport SysV/386 3.0e.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
epsilon@wet.UUCP (Eric P. Scott) (10/15/89)
In article <1989Oct14.062717.15420@twwells.com> bill@twwells.com
(T. William Wells) writes:
>The Green Hills compiler, also called gcc, works as well.  It came with
>Microport SysV/386 3.0e.

Beat me to it.  :-)  We've been running dbz 1.5 compiled with Green Hills
since February with no problems.  I guess it's time to recompile, though;
INDEX_SIZE is at the default 99991 and our history file has 87140 records!
Any suggestions how big I should make it?

					-=EPS=-
bill@twwells.com (T. William Wells) (10/16/89)
In article <675@wet.UUCP> epsilon@wet.UUCP (Eric P. Scott) writes:
: In article <1989Oct14.062717.15420@twwells.com> bill@twwells.com
: (T. William Wells) writes:
: >The Green Hills compiler, also called gcc, works as well.  It came with
: >Microport SysV/386 3.0e.
:
: Beat me to it.  :-)  We've been running dbz 1.5 compiled with
: Green Hills since February with no problems.  I guess it's time
: to recompile, though; INDEX_SIZE is at the default 99991 and our
: history file has 87140 records!  Any suggestions how big I
: should make it?

You can figure that the newsfeed doubles in volume every year.  This
might, by now, be an overestimate, but you probably won't go wrong making
that assumption.  Figure how long (in years) you don't want to be
bothered with readjusting the size, take its log base two, add one, and
multiply by your current average size.  Because the newsfeed flow isn't
smooth, multiply that by one plus three over your average expiration
time.  (Three, you ask?  An empirical constant: I've seen the newsfeed
volume double for a period that long.  Your mileage will almost certainly
vary. :-)

As I recall, you need a prime number.  So, you have to find some prime
number larger than the one you just computed.  If you just want to throw
darts to find a prime number, you can use the "factor" program (on
Microport, anyway).  Alternately, there are factoring programs on the
net.  Or you can look in a table of primes, likely available in the
reference section of your local library.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
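[Taking the doubling assumption at face value, the arithmetic for a
history the size of Eric's can be sketched as below.  The numbers and the
program are illustrative, not part of any news distribution, and the
result still needs rounding up to a prime, as Bill notes.]

	/*
	 * Back-of-the-envelope INDEX_SIZE sizing under the "volume
	 * doubles every year" assumption.
	 */
	#include <stdio.h>

	int main(void)
	{
		long records = 87140;	/* history records today */
		int years = 2;		/* how long before retuning */
		long needed = records << years;	/* one doubling per year */

		printf("expected records in %d years: %ld\n",
		    years, needed);
		printf("pick a prime somewhat above %ld\n", needed);
		return 0;
	}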
henry@utzoo.uucp (Henry Spencer) (10/16/89)
In article <1989Oct16.043012.2938@twwells.com> bill@twwells.com (T. William Wells) writes:
>: ... INDEX_SIZE is at the default 99991 and our
>: history file has 87140 records!  Any suggestions how big I
>: should make it?
>
>... Figure how long (in years) you don't want to
>be bothered with readjusting the size, take its log base two...

In case anyone is interested, one of the reasons why an "official" dbz for
C News is being delayed is that I'm experimenting with a variant which
grows the table automatically when it starts to get full.
--
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
moraes@cs.toronto.edu (Mark Moraes) (10/17/89)
In news.software.b you write:
>As I recall, you need a prime number.  So, you have to find some prime
>number larger than the one you just computed.  If you just want to throw
>darts to find a prime number, you can use the "factor" program (on
>Microport, anyway).  Alternately, there are factoring programs on the
>net.  Or you can look in a table of primes, likely available in the
>reference section of your local library.

On BSD machines, /usr/games/primes is a useful source -- just remember to
pipe the output through head or a pager -- it runs on forever otherwise...

According to our version, suitable numbers for the next few years are:

	100003     200003     400009     800011
	1600033    3200003    6400013    12800009
	25600013   51200027   102400007  204800017
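[For sites without factor(1) or /usr/games/primes, a few lines of C do
the same dart-throwing; trial division is plenty fast at these sizes.
This is a sketch written for this thread, not an existing tool.]

	/*
	 * Print the first prime >= the number given on the command
	 * line, by trial division.
	 */
	#include <stdio.h>
	#include <stdlib.h>

	static int is_prime(long n)
	{
		long d;

		if (n < 2)
			return 0;
		for (d = 2; d * d <= n; d++)
			if (n % d == 0)
				return 0;
		return 1;
	}

	int main(int argc, char **argv)
	{
		long n = (argc > 1) ? atol(argv[1]) : 100000L;

		while (!is_prime(n))
			n++;
		printf("%ld\n", n);
		return 0;
	}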
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/17/89)
>In case anyone is interested, one of the reasons why an "official" dbz for
>C News is being delayed is that I'm experimenting with a variant which
>grows the table automatically when it starts to get full.

Since expire has to rebuild it anyway, I propose that expire time is the
right moment to check whether the size is too small and adjust it (versus
in the middle of things).  You could also use the current size of the
.pag file to work out what the table size is now.
--
Branch Technology  |  zeeff@b-tech.ann-arbor.mi.us  |  Ann Arbor, MI
epsilon@wet.UUCP (Eric P. Scott) (10/19/89)
In article <1989Oct16.043012.2938@twwells.com> bill@twwells.com (T. William Wells) writes:
>You can figure that the newsfeed doubles in volume every year.  This
>might, by now, be an overestimate, but you probably won't go wrong
>making that assumption.

Actually I can.  SVR3 only has 16 bits of inode numbers!  (I get to use
65488.)  The best I can do if things get really tight is to split off
comp onto its own filesystem.  Right now I have about 22,000 inodes free
(2 weeks, no inet groups).  If you're right, I better start making
"imminent death" predictions.  Hopefully SVR4 will be out before that's
necessary.  I really don't want to drop expiration below 2 weeks.

					-=EPS=-
bill@twwells.com (T. William Wells) (10/22/89)
In article <688@wet.UUCP> epsilon@wet.UUCP (Eric P. Scott) writes:
: In article <1989Oct16.043012.2938@twwells.com> bill@twwells.com
: (T. William Wells) writes:
: >You can figure that the newsfeed doubles in volume every year.  This
: >might, by now, be an overestimate, but you probably won't go wrong
: >making that assumption.
:
: Actually I can.

Um.  What I meant is that it probably won't actually double each year,
though it seems to have a doubling time not much larger than that.  So,
for the purpose of figuring the history file size, assuming that it
doubles once per year is conservative.

: SVR3 only has 16 bits of inode numbers!

This, on the other hand, is quite another kettle of fish.

: (I get to use
: 65488.)  The best I can do if things get really tight is to split
: off comp onto its own filesystem.

Urk.  But if you gotta, you gotta.

: Right now I have about 22,000
: inodes free (2 weeks, no inet groups).  If you're right, I better
: start making "imminent death" predictions.

I suspect that, before this becomes a real problem, someone will have a
better solution.

: Hopefully SVR4 will be out before that's necessary.

We can hope.  And we can also hope that it won't be such a pig that my
poor little 16MHz '386 with 8M RAM can't run it.

: I really
: don't want to drop expiration below 2 weeks.

Fortunately for me, I am satisfied to expire at 3 days, since I save any
articles I'm interested in long before that time runs out.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
stevesc@microsoft.UUCP (Steve Schonberger) (10/27/89)
>: >You can figure that the newsfeed doubles in volume every year.  This
>: >might, by now, be an overestimate, but you probably won't go wrong
>: >making that assumption.
>
>: SVR3 only has 16 bits of inode numbers!
>This, on the other hand, is quite another kettle of fish.
>: (I get to use
>: 65488.)  The best I can do if things get really tight is to split
>: off comp onto its own filesystem.
>Urk.  But if you gotta, you gotta.

I'm sure that for the sake of compatibility with machines that are
limited to 64k inodes, someone will come up with a solution.  Right now
it's ugly in the extreme to have different parts of the spool on
different filesystems, because of the hard links between crossposted
articles.  But the patch that allows news to run on systems like VMS
that don't allow hard links could be extended to use that trick (kludge,
rather) where hard links can't be used, and still use hard links where
they will work.  I'm not up enough on the innards of news to undertake
such a project, but I'm sure that someone with the need will come up
with a solution, and post it to the benefit of all.

Splitting off comp is fairly safe, since not much is crossposted between
comp and elsewhere, but as the volume gets larger that might not be such
an adequate solution.

Another possible kludge would be to make a custom link() for news that
duplicates the file when the paths are on different filesystems, and
calls the system link() when they're on the same filesystem.  It's an
uglier kludge, but easier to implement quick and dirty; a sketch of the
idea follows below.

Do C News or later revisions of B News address these potential problems
in any way?
--
Steve Schonberger	microsoft!stevesc@uunet.uu.net
"Working under pressure is the sugar that we crave" --A. Lamb
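[A minimal sketch of that custom link(), assuming ordinary Unix calls;
newslink() and copyfile() are names invented for this example, not
anything in B News or C News.]

	#include <errno.h>
	#include <fcntl.h>
	#include <unistd.h>

	/* Copy `from` to a new file `to`; return 0 on success. */
	static int copyfile(const char *from, const char *to)
	{
		char buf[8192];
		ssize_t n;
		int in, out;

		if ((in = open(from, O_RDONLY)) < 0)
			return -1;
		if ((out = open(to, O_WRONLY | O_CREAT | O_EXCL, 0644)) < 0) {
			close(in);
			return -1;
		}
		while ((n = read(in, buf, sizeof buf)) > 0)
			if (write(out, buf, n) != n)
				n = -1;	/* write error: force failure */
		close(in);
		return (close(out) == 0 && n == 0) ? 0 : -1;
	}

	/* Hard-link where possible, copy across filesystems. */
	int newslink(const char *from, const char *to)
	{
		if (link(from, to) == 0)
			return 0;
		if (errno != EXDEV)	/* only fall back on cross-device */
			return -1;
		return copyfile(from, to);
	}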
henry@utzoo.uucp (Henry Spencer) (10/27/89)
In article <8236@microsoft.UUCP> stevesc@microsoft.UUCP (Steve Schonberger) writes:
>>: SVR3 only has 16 bits of inode numbers!
>
>... the patch that allows news to run on systems like VMS that don't
>allow hard links could be extended to use that trick ...
>... Do C News or later revisions of B News address these potential
>problems in any way?

Well, sort of.  C News will automatically try to make a symbolic link
(which is essentially what the VMS hack is, these days) if a hard link
fails.  And expire has an option to deal with this.  It's still rather
clumsy, however.  I'm afraid the real solution is bigger inode numbers.
--
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
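[The fallback Henry describes amounts to a couple of system calls; a
sketch follows - not the actual C News source, which is surely more
careful about error handling, and linkarticle() is a name made up here.]

	#include <unistd.h>

	/* Try a hard link first; fall back to a symbolic link. */
	int linkarticle(const char *master, const char *crosspost)
	{
		if (link(master, crosspost) == 0)
			return 0;
		return symlink(master, crosspost);
	}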