[news.software.b] Cnews active min field AND nn database expire

storm@texas.dk (Kim F. Storm) (07/04/89)

lmb@vicom.com (Larry Blair) writes:

>henry@utzoo.uucp (Henry Spencer) writes:
>> Any sensible news reader has
>> to be prepared to go in and read the directory itself to find out what
>> articles are really present.  Given that, what use is min?

>Apparently the newly posted NN newsreader isn't sensible.  It maintains
>an article database on which a crude expire is performed, based on the
>min field. 

I agree with Henry Spencer that reading the spool directory is the best
way to do things, and I am starting an investigation on how this can be
done in nn (which currently uses the min field to determine whether expire 
is necessary on the database).

But before I can utilize this in nn, NNTP must be extended to provide
the following information:

 - Modification time on each news directory (to know whether it has changed)
 - A list of all article numbers in the group (basically an ls on the
   directory with all non-numeric entries eliminated).
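
As a rough illustration of the two items above for a locally mounted
spool, here is a minimal Python sketch. The function name
group_snapshot is hypothetical and is not part of nn, C news, or NNTP.
It assumes each newsgroup is one directory whose articles are files
named by their article number.

```python
import os

def group_snapshot(group_dir):
    """Return (mtime, sorted article numbers) for one newsgroup directory.

    Hypothetical helper: the directory's modification time tells the
    caller whether anything was added or removed, and the listing is an
    'ls' with every non-numeric entry dropped.
    """
    mtime = os.stat(group_dir).st_mtime
    numbers = sorted(
        int(name)
        for name in os.listdir(group_dir)
        if name.isascii() and name.isdigit()
    )
    return mtime, numbers
```

A caller could remember the mtime from the previous run and skip the
listing entirely when it has not changed.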

Actually, I would very much appreciate it if the active file contained
a time stamp for each group (instead of min), so that a reader could
see whether the group has changed since it last read the active file.
That would eliminate the need for the first change to NNTP above:
just get the active file.

For the time being, nn will work with C news, but expire on the database
must be initiated manually (e.g. from cron).

> A full rebuild of the database is too expensive to do every
>night.

But nn's full expire (-E) functionality could be dramatically improved
by using the already available information from the database rather
than recollecting the information from the spool directory/via NNTP.
The default (non -E) expire will reuse the database information,
but it is based on the min/max numbers in the active file and will not
check for expired articles in that range (that is a bug).
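
To make the difference concrete, here is a small Python sketch of the
two strategies. It is not nn's code; the function names and the idea of
the database as a plain list of article numbers are assumptions made
for illustration only.

```python
def expire_by_min(db_articles, active_min):
    """Crude expire: keep every database entry numbered >= the active min.

    Entries for articles that have vanished from inside the min..max
    range are NOT detected (this is the bug described above).
    """
    return [n for n in db_articles if n >= active_min]

def expire_by_listing(db_articles, present):
    """Keep only entries whose article is actually present in the spool.

    'present' is any iterable of article numbers found in the directory.
    """
    present = set(present)
    return [n for n in db_articles if n in present]
```

With database entries 499, 500, 501 and 45000, an active min of 500,
and a directory holding only 500 and 45000, the first function keeps
the stale entry 501 while the second drops it.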

To conclude:  Henry is right to call news readers that rely on the
min (and max) field broken, but then he (Cnews et al.) and NNTP
must provide the proper tools to fix this:

	Put last update (expire/new article) time stamps in the active file.

	NNTP must provide a *true* LIST-ARTICLE-NUMBERS-IN-GROUP command.

(nn can do without both, but it will incur severe performance penalties -
and that was the intent of neither nn nor Cnews, right?)

-- 
Kim F. Storm        storm@texas.dk        Tel +45 429 174 00
Texas Instruments, Marielundvej 46E, DK-2730 Herlev, Denmark
	  No news is good news, but nn is better!

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (07/05/89)

In article <352@texas.dk>, storm@texas.dk (Kim F. Storm) writes:
> lmb@vicom.com (Larry Blair) writes:
> 
> >henry@utzoo.uucp (Henry Spencer) writes:
> >> Any sensible news reader has
> >> to be prepared to go in and read the directory itself to find out what
> >> articles are really present.  Given that, what use is min?
> 
> >Apparently the newly posted NN newsreader isn't sensible.  It maintains
> >an article database on which a crude expire is performed, based on the
> >min field. 
> 
> I agree with Henry Spencer that reading the spool directory is the best
> way to do things, and I am starting an investigation on how this can be
> done in nn (which currently uses the min field to determine whether expire 
> is necessary on the database).
> ...
>
> Kim F. Storm        storm@texas.dk        Tel +45 429 174 00
> Texas Instruments, Marielundvej 46E, DK-2730 Herlev, Denmark


Reading the spool tree directly may be fine if it is small.  It is a
dangerous idea if the tree is big.

Sgi.sgi.com expires at 32 days, and some newsgroups (esp. internal ones)
have thousands of active articles.  Most people here nfs mount /usr/lib/new
and /usr/spool/news, and execute readers in the former.  (Posting uses Mark
Callow's postnews-mail hack.  sgi:/etc/rmtab has 620 lines.)  This works
fine, except when several people decide to `find /`, or as has happened
recently, tell nn to do whatever it does to initialize itself.  Whether
measured in ethernet traffic, load on bridges and routers, load on the NFS
server, or latency for everyone else, the results are not pretty.

Pity a system with slower file or network systems.


Vernon Schryver
Silicon Graphics
vjs@sgi.com

amanda@intercon.uu.net (Amanda Walker) (07/06/89)

In article <352@texas.dk>, storm@texas.dk (Kim F. Storm) writes:
> But before I can utilize this in nn, NNTP must be expanded to provide
> the following information:
> 
>  - Modification time on each news directory (to know whether it has changed)
>  - A list of all article numbers in the group (basically an ls on the
>    directory with all non-numeric entries eliminated).
> 

#1 could be tricky, but you can fake #2 with

	XHDR Message-ID n-m

where n is the lowest article number and m is the high number from the
active file.  n can be obtained from the active file in B news, from your
own database if you're willing to keep track, or, if all else fails, use '0'.
This will give you message-ids as well, of course, but that can come in
handy too.
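
A minimal Python sketch of pulling the article numbers out of such a
reply follows. It assumes the server answers XHDR with one
"number value" line per existing article and a lone "." as terminator,
and that the caller has already read the status line. The function name
is hypothetical, and XHDR is an extension that not every server offers.

```python
def article_numbers_from_xhdr(lines):
    """Extract article numbers from the data lines of an XHDR reply.

    'lines' are the text lines following the status line. Parsing stops
    at the "." terminator; lines that do not start with a number are
    skipped rather than treated as errors.
    """
    numbers = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":
            break
        number, _, _value = line.partition(" ")
        if number.isascii() and number.isdigit():
            numbers.append(int(number))
    return numbers
```

The message-ids are simply discarded here; a reader that wants them
could return (number, value) pairs instead.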

--
Amanda Walker  <amanda@intercon.uu.net>
InterCon Systems Corporation

storm@texas.dk (Kim F. Storm) (07/06/89)

In article <37422@sgi.SGI.COM> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:

>Reading the spool tree directly may be fine if it is small.  It is a
>dangerous idea if the tree is big.

Suppose an rn user starts reading a group he has never read before, and
(with Cnews) the active file says that article numbers in that group
range from 0 to 45000.  Any news reader just relying on the active file
will start asking for articles 1, 2, 3, 4, 5, ... to 45000, i.e. it
will issue 45000 stat or open calls, only to find out that the
only existing articles are in the range 44950 to 45000.

If the news reader had read the directory first, it would have found
that it only contained the files 44950, 44951, ... 45000, and could
restrict itself to open just those files.

Even when the active file contains a 'valid' min article value
(!Cnews), it may still be better to read the directory before
accessing the articles.  Suppose the `min' article is an old article
with a very long expiration date; the directory may then actually
contain just three files:  500, 44999, 45000.  A news reader relying
on the active file would still do ~45000 stat/open calls.
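
The arithmetic can be sketched in a few lines of Python. Both function
names are hypothetical; the directory contents are passed in as a list
of names so the comparison does not depend on a real spool.

```python
def probes_from_active(active_min, active_max):
    """Count the stat/open attempts made when every number in
    active_min..active_max is tried in turn."""
    return active_max - active_min + 1

def articles_from_directory(names):
    """Return the article numbers found by a single directory read;
    only these files need to be opened afterwards."""
    return sorted(int(n) for n in names if n.isascii() and n.isdigit())
```

For the example above, probes_from_active(500, 45000) gives 44501
attempts, while articles_from_directory(["500", "44999", "45000"])
yields three numbers after one directory read.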

BTW, nn is different:  In all cases, only the nnmaster program
maintaining the nn database will have to access the news spool
directories to see which articles are really there.  The nn news
reader gets the `result' of these efforts, and never tries to
access a file that is not in the spool directory (unless it has been
expired or cancelled without notifying the nnmaster).

>Most people here nfs mount /usr/lib/new
>and /usr/spool/news, and execute readers in the former.  This works
>fine, except when several people decide to `find /`, or as has happened
>recently, tell nn to do whatever it does to initialize itself.

But that does not happen every day!?  Once nn has initialized its
database, the network traffic related to news reading will be much
lower than with other news readers, because only the selected
articles need to be accessed from the news server.

>Whether
>measured in ethernet traffic, load on bridges and routers, load on the NFS
>server, or latency for everyone else, the results are not pretty.

What about moving nn's database to the machine containing the news
files?  Then nnmaster would not have to go via the network to access the
articles, and all nn users could then share the nn database via NFS?
Of course this puts a bit more load on the net when reading news, but
it will still be a lot less than with ordinary news readers.
-- 
Kim F. Storm        storm@texas.dk        Tel +45 429 174 00
Texas Instruments, Marielundvej 46E, DK-2730 Herlev, Denmark
	  No news is good news, but nn is better!