[news.software.b] dbz caveat

gary@dgcad.SV.DG.COM (Gary Bridgewater) (09/26/89)

I switched to dbz in my B 2.11.17 and have noticed a pretty good performance
improvement for several weeks now. Suddenly, last Thursday my expire
times jumped from an hour or so to 4+ hours (I keep a 30 day history).
Then, starting Friday, I noticed that processing incoming news was taking
between 2 and 3 minutes per article whether it was a duplicate or not. I spent
the whole weekend fiddling and getting farther and farther behind - 2 minutes
to process an article means you only get to process 720 articles a day.
Finally, tonight, I decided I better rethink DBZ so I went into the code
and found
    /*
       Set this to something several times larger than the maximum # of
       lines in a history file.  It should be a prime number.
    */
    #define INDEX_SIZE 99991L

My history file is sitting at 5MB, and an average history line is ~40 bytes ->
~125,000 lines. Oops! I bumped INDEX_SIZE up to 1000003, did an expire -R
(30 minutes), and am now processing articles at a rate of 4-5/minute.
Another symptom of this is your server's nntpd processes starting to chew
up CPU time.
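
For anyone who wants to sanity-check their own setup, here is the
arithmetic above as a trivial C sketch (hypothetical, not news code;
plug in your own history size and line length):

    #include <stdio.h>

    int main(void)
    {
        long histbytes = 5000000L;   /* history file: ~5MB */
        long avgline   = 40L;        /* ~40 bytes per history line */
        long tabsize   = 99991L;     /* the default INDEX_SIZE */
        long lines     = histbytes / avgline;

        printf("~%ld history lines vs. %ld hash slots\n", lines, tabsize);
        /* ~125,000 lines swamp a 99991-slot table; pick a prime
           several times larger, e.g. 1000003 */
        return 0;
    }
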
-- 
Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or 
{amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.

karl@ddsw1.MCS.COM (Karl Denninger) (09/27/89)

In article <1139@svx.SV.DG.COM> gary@svx.SV.DG.COM (Gary Bridgewater) writes:
>I switched to dbz in my B 2.11.17 and have noticed a pretty good performance
>improvement for several weeks now. Suddenly, last Thursday my expire
>times jumped from an hour or so to 4+ hours (I keep a 30 day history).
...
>Finally, tonight, I decided I better rethink DBZ so I went into the code
>and found
>    /*
>       Set this to something several times larger than the maximum # of
>       lines in a history file.  It should be a prime number.
>    */
>    #define INDEX_SIZE 99991L
>
>My history file is sitting at 5MB, and an average history line is ~40 bytes ->
>~125,000 lines. Oops! I bumped INDEX_SIZE up to 1000003, did an expire -R
>(30 minutes), and am now processing articles at a rate of 4-5/minute.

Ok, how about this one?

Dbz also appears to have a nasty habit of not noticing if you have a
duplicate under some conditions.  That is, articles which are still in the
history file at times show up again if they are received twice!

This didn't start happening until we changed to dbz from dbm.  Is there a
fix for it?  We're running "C" News....

--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.		"Quality Solutions at a Fair Price"

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (09/27/89)

>Dbz also appears to have a nasty habit of not noticing if you have a
>duplicate under some conditions.  That is, articles which are still in the
>history file at times show up again if they are received twice!
>
>This didn't start happening until we changed to dbz from dbm.  Is there a
>fix for it?  We're running "C" News....

Do you have the lowercasing of article ids set right?  Can anyone confirm
this?

-- 
Branch Technology            |  zeeff@b-tech.ann-arbor.mi.us
                             |  Ann Arbor, MI

karl@ddsw1.MCS.COM (Karl Denninger) (09/29/89)

In article <9668@b-tech.ann-arbor.mi.us> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>>Dbz also appears to have a nasty habit of not noticing if you have a
>>duplicate under some conditions.  That is, articles which are still in the
>>history file at times show up again if they are received twice!
>>
>>This didn't start happening until we changed to dbz from dbm.  Is there a
>>fix for it?  We're running "C" News....
>
>Do you have the lowercasing of article ids set right?  Can anyone confirm
>this?

Sure do.  We're using dbz 1.5; I got the new one in the mail from you, but
it is missing the ".h" file, so I can't compile that one.

I have changed the "LIMIT" parameter to something really gross (1000003L as
suggested) from the default and rebuilt the history (again).  We'll see if
the problem disappears.  We did have somewhere in the area of 70k entries in
there before....

Also on the "strange" list:

A number of articles that should have expired did not.  It would appear that
history entries are being lost rather than simply overwritten!

--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.		"Quality Solutions at a Fair Price"

todd@ivucsb.sba.ca.us (Todd Day) (09/30/89)

karl@ddsw1.MCS.COM (Karl Denninger) writes:
~Dbz also appears to have a nasty habit of not noticing if you have a
~duplicate under some conditions.  That is, articles which are still in the
~history file at times show up again if they are received twice!

Are you using the dbz from contrib/dbz?  If not, you should switch.
A good check is to run "nm /usr/lib/libdbm.a" (or whatever you call
the dbz library).  If "rfc822ize" shows up in the output, you are
probably using the proper dbz library.

If you look at the dbz source in contrib/dbz, the line with the B News
kludge regarding lowercase() is commented out.  This is the key.  I
checked all the duplicate articles hitting my site, and they all had
uppercase after the "@" sign (usually .COM or .UUCP or .EDU).  The
problem is that relaynews calls the dbz store() function with the
uppercase version, but the bad version of dbz does its check against a
lowercase version.  If you comment out the "lowercase" line in the dbz
source, it should work.
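
To illustrate the mismatch (a hypothetical sketch, not the actual dbz
source): if store() hashes the message-ID exactly as given but the
duplicate check lowercases it first, the two sides use different slots
and the duplicate is never seen.

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    #define TABSIZE 101
    static char *slot[TABSIZE];

    static unsigned hash(const char *s)
    {
        unsigned h = 0;
        while (*s)
            h = h * 31 + (unsigned char)*s++;
        return h % TABSIZE;
    }

    /* store the ID in the caller's case, as relaynews does */
    static void store(const char *id)
    {
        slot[hash(id)] = strdup(id);
    }

    /* but the duplicate check lowercases first */
    static int seen(const char *id)
    {
        char low[256];
        int i;
        for (i = 0; id[i] != '\0' && i < 255; i++)
            low[i] = tolower((unsigned char)id[i]);
        low[i] = '\0';
        return slot[hash(low)] != NULL
            && strcmp(slot[hash(low)], low) == 0;
    }

    int main(void)
    {
        store("<1@FOO.COM>");
        printf("%d\n", seen("<1@FOO.COM>"));  /* prints 0: duplicate missed */
        return 0;
    }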

-- 

Todd Day  |  todd@ivucsb.sba.ca.us  |  ivucsb!todd@anise.acc.com
"Ya know, some day these scientists are going to invent something
	that can outsmart a rabbit" -- Bugs Bunny

karl@ficc.uu.net (Karl Lehenbauer) (10/12/89)

>>Dbz also appears to have a nasty habit of not noticing if you have a
>>duplicate under some conditions.  That is, articles which are still in the
>>history file at times show up again if they are received twice!

Beware that under Sys V/386, the C optimizer breaks dbz, at least under 3.0.
The misbehavior is dbz claiming articles are not duplicates when they
actually are, and the history file ends up looking sick.
-- 
-- uunet!ficc!karl	"The last thing one knows in constructing a work 
			 is what to put first."  -- Pascal

karl@ddsw1.MCS.COM (Karl Denninger) (10/12/89)

In article <6512@ficc.uu.net> karl@ficc.uu.net (Karl Lehenbauer) writes:
>>>Dbz also appears to have a nasty habit of not noticing if you have a
>>>duplicate under some conditions.  That is, articles which are still in the
>>>history file at times show up again if they are received twice!
>
>Beware that under Sys V/386, the C optimizer breaks dbz, at least under 3.0.
>The misbehavior is dbz claiming articles are not duplicates when they
>actually are, and the history file ends up looking sick.

Ok... but I'm running Xenix 2.3.2!

And yes, I compiled that module (by hand) without optimization.

Any other good guesses?

It is STILL happening, even now that I have turned up the hash value to
something ridiculous (but still a prime, as the comment says).

--
Karl Denninger (karl@ddsw1.MCS.COM, <well-connected>!ddsw1!karl)
Public Access Data Line: [+1 312 566-8911], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.		"Quality Solutions at a Fair Price"

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/13/89)

>>Beware that under Sys V/386, the C optimizer breaks dbz

Is there any simple change that will make the optimizer work correctly?
Consider using gcc.


-- 
Branch Technology                  |  zeeff@b-tech.ann-arbor.mi.us
                                   |  Ann Arbor, MI

bill@twwells.com (T. William Wells) (10/14/89)

In article <9680@b-tech.ann-arbor.mi.us> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
: >>Beware that under Sys V/386, the C optimizer breaks dbz
:
: Is there any simple change that will make the optimizer work correctly?
: Consider using gcc.
                 ^ presuming you mean GNU

The Green Hills compiler, which is also installed as gcc, works as
well.  It came with Microport SysV/386 3.0e.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

epsilon@wet.UUCP (Eric P. Scott) (10/15/89)

In article <1989Oct14.062717.15420@twwells.com> bill@twwells.com
	(T. William Wells) writes:
>The Green Hills compiler, which is also installed as gcc, works as
>well.  It came with Microport SysV/386 3.0e.

Beat me to it.  :-)  We've been running dbz 1.5 compiled with
Green Hills since February with no problems.  I guess it's time
to recompile, though; INDEX_SIZE is at the default 99991 and our
history file has 87140 records!  Any suggestions how big I
should make it?
					-=EPS=-

bill@twwells.com (T. William Wells) (10/16/89)

In article <675@wet.UUCP> epsilon@wet.UUCP (Eric P. Scott) writes:
: In article <1989Oct14.062717.15420@twwells.com> bill@twwells.com
:       (T. William Wells) writes:
: >The Green Hills compiler, which is also installed as gcc, works as
: >well.  It came with Microport SysV/386 3.0e.
:
: Beat me to it.  :-)  We've been running dbz 1.5 compiled with
: Green Hills since February with no problems.  I guess it's time
: to recompile, though; INDEX_SIZE is at the default 99991 and our
: history file has 87140 records!  Any suggestions how big I
: should make it?

You can figure that the newsfeed doubles in volume every year. This
might, by now, be an overestimate, but you probably won't go wrong
making that assumption. Figure how long (in years) you don't want to
be bothered with readjusting the size, raise two to that power (one
doubling per year), and multiply by your current average size. Because
the newsfeed flow isn't smooth, multiply that by one plus three days
over your average expiration time in days. (Three days, you ask? An
empirical constant: I've seen the newsfeed volume run at double for a
burst about that long. Your mileage will almost certainly vary. :-)
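
In code, the estimate looks roughly like this (a sketch under the
assumptions above, not anything from dbz; the example numbers are from
this thread):

    #include <stdio.h>
    #include <math.h>

    long suggest_size(long current, double years, double expire_days)
    {
        double grown = current * pow(2.0, years);   /* one doubling per year */
        double burst = 1.0 + 3.0 / expire_days;     /* headroom for bursts */
        return (long)(grown * burst);
    }

    int main(void)
    {
        /* e.g. 87140 records today, 3 years of peace, 14-day expiry */
        printf("%ld\n", suggest_size(87140L, 3.0, 14.0));
        /* round the result up to the next prime before using it */
        return 0;
    }

(Link with -lm.)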

As I recall, you need a prime number. So you have to find some prime
number larger than the one you just computed. If you just want to
throw darts to find one, you can use the "factor" program (on
Microport, anyway). Alternatively, there are factoring programs on the
net, or you can look in a table of primes, likely available in the
reference section of your local library. A quick sketch of the trial
division is below.
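
If you'd rather not hunt for a table of primes, trial division is
plenty fast at these sizes (a standalone sketch, not part of any news
distribution):

    #include <stdio.h>

    static int is_prime(long n)
    {
        long d;
        if (n < 2)
            return 0;
        if (n % 2 == 0)
            return n == 2;
        for (d = 3; d * d <= n; d += 2)
            if (n % d == 0)
                return 0;
        return 1;
    }

    int main(void)
    {
        long n = 1000000L;      /* start at your computed size */
        while (!is_prime(n))
            n++;
        printf("%ld\n", n);     /* prints 1000003 */
        return 0;
    }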

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

henry@utzoo.uucp (Henry Spencer) (10/16/89)

In article <1989Oct16.043012.2938@twwells.com> bill@twwells.com (T. William Wells) writes:
>: ... INDEX_SIZE is at the default 99991 and our
>: history file has 87140 records!  Any suggestions how big I
>: should make it?
>
>... Figure how long (in years) you don't want to
>be bothered with readjusting the size, raise two to that power...

In case anyone is interested, one of the reasons why an "official" dbz for
C News is being delayed is that I'm experimenting with a variant which
grows the table automatically when it starts to get full.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

moraes@cs.toronto.edu (Mark Moraes) (10/17/89)

In news.software.b you write:

>As I recall, you need a prime number. So, you have to find some prime
>number larger than the one you just computed. If you just want to
>throw darts to find a prime number, you can use the "factor" program
>(on Microport, anyway). Alternately, there are factoring programs on
>the net. Or you can look in a table of primes, likely available in the
>reference section of your local library.

On BSD machines, /usr/games/primes is a useful source -- just remember
to pipe the output through head or a pager -- it runs on forever
otherwise...

According to our version, numbers for the next few years:

100003
200003
400009
800011
1600033
3200003
6400013
12800009
25600013
51200027
102400007
204800017

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/17/89)

>In case anyone is interested, one of the reasons why an "official" dbz for
>C News is being delayed is that I'm experimenting with a variant which
>grows the table automatically when it starts to get full.

Since expire has to rebuild it anyway, I propose that rebuild time is
the moment to check whether the size is too small and adjust it (rather
than in the middle of things).  You could also use the current size of
the .pag file to record what the table size currently is; a sketch of
that check follows.
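
Something like this, say (a hypothetical sketch; BYTES_PER_SLOT and the
paths are assumptions, not dbz's real layout):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    #define BYTES_PER_SLOT 4    /* assumed size of one table slot */

    /* infer the current table size from the .pag file */
    long table_slots(const char *pagfile)
    {
        struct stat st;
        if (stat(pagfile, &st) < 0)
            return -1;
        return (long)st.st_size / BYTES_PER_SLOT;
    }

    int main(void)
    {
        long slots   = table_slots("/usr/lib/news/history.pag");
        long entries = 87140L;  /* counted while rewriting history */

        if (slots > 0 && entries * 2 > slots)
            printf("table too small (%ld entries in %ld slots); rebuild bigger\n",
                   entries, slots);
        return 0;
    }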


-- 
Branch Technology                  |  zeeff@b-tech.ann-arbor.mi.us
                                   |  Ann Arbor, MI

epsilon@wet.UUCP (Eric P. Scott) (10/19/89)

In article <1989Oct16.043012.2938@twwells.com> bill@twwells.com
	(T. William Wells) writes:
>You can figure that the newsfeed doubles in volume every year. This
>might, now, be an overestimate, but you probably won't go wrong
>making that assumption.

Actually I can.  SVR3 only has 16 bits of inodes!  (I get to use
65488) The best I can do if things get really tight is to split
off comp onto its own filesystem.  Right now I have about 22,000
inodes free (2 weeks, no inet groups).  If you're right, I better
start making "imminent death" predictions.

Hopefully SVR4 will be out before that's necessary.  I really
don't want to drop expiration below 2 weeks.

					-=EPS=-

bill@twwells.com (T. William Wells) (10/22/89)

In article <688@wet.UUCP> epsilon@wet.UUCP (Eric P. Scott) writes:
: In article <1989Oct16.043012.2938@twwells.com> bill@twwells.com
:       (T. William Wells) writes:
: >You can figure that the newsfeed doubles in volume every year. This
: >might, now, be an overestimate, but you probably won't go wrong
: >making that assumption.
:
: Actually I can.

Um. What I meant is that it probably won't actually double each
year, though it seems to have a doubling time not much larger than
that. So, for the purpose of figuring the history file size,
assuming that it doubles once per year is conservative.

:                  SVR3 only has 16 bits of inodes!

This, on the other hand, is quite another kettle of fish.

:                                                    (I get to use
: 65488) The best I can do if things get really tight is to split
: off comp onto its own filesystem.

Urk. But if you gotta, you gotta.

:                                    Right now I have about 22,000
: inodes free (2 weeks, no inet groups).  If you're right, I better
: start making "imminent death" predictions.

I suspect that, before this becomes a real problem, someone will
have a better solution.

: Hopefully SVR4 will be out before that's necessary.

We can hope. And we can also hope that it won't be such a pig that
my poor little 16MHz '386 with 8M RAM can't run it.

:                                                      I really
: don't want to drop expiration below 2 weeks.

Fortunately for me, I am satisfied to expire at 3 days, since I
save any articles I'm interested in long before that time runs
out.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

stevesc@microsoft.UUCP (Steve Schonberger) (10/27/89)

>: >You can figure that the newsfeed doubles in volume every year. This
>: >might, now, be an overestimate, but you probably won't go wrong
>: >making that assumption.
>
>:                  SVR3 only has 16 bits of inodes!

>This, on the other hand, is quite another kettle of fish.

>:                                                    (I get to use
>: 65488) The best I can do if things get really tight is to split
>: off comp onto its own filesystem.

>Urk. But if you gotta, you gotta.

I'm sure that for the sake of compatibility with machines that are
limited to 64k inodes, someone will come up with a solution.  Right now
it's ugly in the extreme to have the spool split across different
filesystems, because of the hard links between crossposted articles.
But the patch that allows news to run on systems like VMS that don't
allow hard links could be extended to use that trick (kludge, rather)
where hard links can't be used, and still use hard links where they
will work.  I'm not up enough on the innards of news to undertake such
a project, but I'm sure that someone with the need will come up with a
solution and post it to the benefit of all.

Splitting off comp is fairly safe, since not much is crossposted
between comp and elsewhere, but as the volume grows that might not
remain an adequate solution.  Another possible kludge would be a custom
link() for news that duplicates the file when the paths are on
different filesystems and calls the system link() when they're on the
same one; a sketch of that idea follows below.  It's an uglier kludge,
but easier to implement quick and dirty.  Do C News or later revisions
of B News address these potential problems in any way?
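
Here is roughly what that custom link() might look like (a sketch of
the kludge described above, not code from any news release):

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    int newslink(const char *from, const char *to)
    {
        char buf[8192];
        ssize_t n;
        int in, out;

        if (link(from, to) == 0)
            return 0;               /* same filesystem: cheap hard link */
        if (errno != EXDEV)
            return -1;              /* failed for some other reason */

        /* EXDEV: different filesystems, so duplicate the article */
        if ((in = open(from, O_RDONLY)) < 0)
            return -1;
        if ((out = creat(to, 0644)) < 0) {
            close(in);
            return -1;
        }
        while ((n = read(in, buf, sizeof buf)) > 0) {
            if (write(out, buf, (size_t)n) != n) {
                n = -1;
                break;
            }
        }
        close(in);
        close(out);
        return n < 0 ? -1 : 0;
    }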

-- 
	Steve Schonberger	microsoft!stevesc@uunet.uu.net
	"Working under pressure is the sugar that we crave" --A. Lamb

henry@utzoo.uucp (Henry Spencer) (10/27/89)

In article <8236@microsoft.UUCP> stevesc@microsoft.UUCP (Steve Schonberger) writes:
>>:                  SVR3 only has 16 bits of inodes!
>
>... the patch that allows news to run on systems like VMS that don't
>allow hard links could be extended to use that trick ...
>... Do C News or later revisions of B News address these potential
>problems in any way?

Well, sort of.  C News will automatically try to make a symbolic link
(which is essentially what the VMS hack is, these days) if a hard link
fails.  And expire has an option to deal with this.  It's still rather
clumsy, however.
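
In code terms the fallback is roughly (a sketch of the idea, not the
actual C News source):

    #include <unistd.h>

    /* try a hard link; on failure, fall back to a symbolic link */
    int linkart(const char *from, const char *to)
    {
        if (link(from, to) == 0)
            return 0;
        return symlink(from, to);
    }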

I'm afraid the real solution is bigger inode numbers.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu