[news.software.b] Cnews artnum in active file

jv@mh.nl (Johan Vromans) (08/17/90)

I have noted that in the active file, the lower article number always
remains 1. Is this normal?

E.g.:

   comp.os.vms 0000002238 0000000001 y
                          ^^^^^^^^^^
Lowest article number (currently) is 1746, not 1.
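
For reference, here is a minimal C sketch of pulling the fields out of such a line. It assumes the four whitespace-separated fields shown above: group, highest number, lowest number, flag. The struct and function names are hypothetical and are not taken from C news.

```c
#include <stdio.h>

/* Hypothetical representation of one active-file line, e.g.
   "comp.os.vms 0000002238 0000000001 y". */
struct active_entry {
    char group[256];
    long high;   /* highest article number assigned so far */
    long low;    /* lower number; stays 1 unless something updates it */
    char flag;   /* posting flag ('y' in the example above) */
};

/* Returns 1 if all four fields were read, 0 otherwise. */
int parse_active_line(const char *line, struct active_entry *e)
{
    /* %255s bounds the group buffer; %ld reads the zero-padded
       numbers as plain decimal ("0000002238" -> 2238). */
    return sscanf(line, "%255s %ld %ld %c",
                  e->group, &e->high, &e->low, &e->flag) == 4;
}
```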

	Johan
-- 
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62911/62500
------------------------ "Arms are made for hugging" -------------------------

henry@zoo.toronto.edu (Henry Spencer) (08/17/90)

In article <1990Aug16.185023.26200@squirrel.mh.nl> Johan Vromans <jv@mh.nl> writes:
>I have noted that in the active file, the lower article number always
>remains 1. Is this normal?

This is normal unless you have a crontab entry that occasionally runs
upact (more portable) or updatemin (rather faster).  This is currently
an ill-documented option; it will probably become standard when I get
around to changing it.  We currently run updatemin weekly, for the benefit
of some stupid reader software.  (The lower number is basically an
inadequate kludge that smarter software should never look at, but there
is a lot of dumb software in the world, sigh...)
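
As an illustration only, the kind of scan that recomputes the lower number might look like the following. It reads the group's spool directory and takes the smallest all-digit file name. This sketch assumes one file per article, named by its number. It is not the actual upact or updatemin code, and lowest_article() and article_number() are hypothetical names.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical: article number encoded by a spool file name, or -1 if
   the name is not all decimal digits (".", "..", subgroup dirs, ...). */
long article_number(const char *name)
{
    if (*name == '\0')
        return -1;
    for (const char *p = name; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return -1;
    return strtol(name, NULL, 10);
}

/* Hypothetical: lowest article number present in one group directory,
   0 if it holds no articles, -1 if it cannot be opened. */
long lowest_article(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    long low = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        long n = article_number(ent->d_name);
        if (n > 0 && (low == 0 || n < low))
            low = n;
    }
    closedir(d);
    return low;
}
```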
-- 
It is not possible to both understand  | Henry Spencer at U of Toronto Zoology
and appreciate Intel CPUs. -D.Wolfskill|  henry@zoo.toronto.edu   utzoo!henry

del@thrush.mlb.semi.harris.com (Don Lewis) (08/17/90)

In article <1990Aug17.034849.17801@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1990Aug16.185023.26200@squirrel.mh.nl> Johan Vromans <jv@mh.nl> writes:
>>I have noted that in the active file, the lower article number always
>>remains 1. Is this normal?
>
>This is normal unless you have a crontab entry that occasionally runs
>upact (more portable) or updatemin (rather faster).  This is currently
>an ill-documented option; it will probably become standard when I get
>around to changing it.  We currently run updatemin weekly, for the benefit
>of some stupid reader software.  (The lower number is basically an
>inadequate kludge that smarter software should never look at, but there
>is a lot of dumb software in the world, sigh...)

Updatemin is fast enough that we run it daily, right after expire.
--
Don "Truck" Lewis                      Harris Semiconductor
Internet:  del@mlb.semi.harris.com     PO Box 883   MS 62A-028
Phone:     (407) 729-5205              Melbourne, FL  32901

tale@turing.cs.rpi.edu (David C Lawrence) (08/17/90)

In article <1990Aug17.034849.17801@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

   This is normal unless you have a crontab entry that occasionally runs
   upact (more portable) or updatemin (rather faster).

Or optionally put it in doexpire so it is executed as soon as expire is.

   (The lower number is basically an inadequate kludge that smarter
   software should never look at, but there is a lot of dumb software
   in the world, sigh...) 

I agree; the lowest article scheme fails here due to Expires:
sometimes making large holes in groups.  I think I will review the
latest NNTP protocol for a way to put in "this group has n articles in
it" information in it after a readdir and check for S_IFREG files.
This of course isn't terribly efficient, but is accurate right at the
time of doing it.  It's a real loser when trying to present a summary
of groups and relative article volumes.
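
The readdir-plus-S_IFREG count described above might look like this in C. count_regular_files() is a hypothetical name. The sketch assumes one directory per group and counts every regular file, so any non-article files in that directory are counted as well.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical: count regular files in a group directory.  The stat()
   on every entry is what makes this comparatively expensive.
   Returns -1 if the directory cannot be opened. */
long count_regular_files(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    long count = 0;
    char path[4096];
    struct dirent *ent;
    struct stat st;
    while ((ent = readdir(d)) != NULL) {
        /* build "dir/name"; skip entries whose path would not fit */
        int len = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            count++;
    }
    closedir(d);
    return count;
}
```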

The nice thing about the min field is that it gives me a usually pretty
close estimate of how many articles are in the group, and it takes the same
amount of time to figure out no matter how huge the group is.  Since I would
rather have this information slightly wrong sometimes than not at all,
I run updatemin.  Unless I am missing something obvious, which could
well be the case at the moment, I don't see a good way for smart
newsreaders to come up with that information without doing the
costly operation above each time they want it.  Of course, some programs
like trn and nn keep their own databases (boy, do I love all this
space used on my disk) and run their own daemons, so they can keep an
up-to-date cache of this information somewhere.  Right now, the mthreads
daemon (for trn) will only expire things from its database once a day.
--
   (setq mail '("tale@cs.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))

brad@looking.on.ca (Brad Templeton) (08/17/90)

In article <1990Aug17.034849.17801@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>of some stupid reader software.  (The lower number is basically an
>inadequate kludge that smarter software should never look at, but there
>is a lot of dumb software in the world, sigh...)

Want to explain this, Henry?

Programs do need the minimum -- for creating reasonable sized bitmaps, for
example.
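
The following hypothetical read-article bitmap takes its size straight from min and max, which shows what the sizing concern means in practice. Using the numbers from the start of this thread (max 2238), a correct min of 1746 needs 62 bytes, while a min stuck at 1 needs 280. The gap grows with the group's article numbers. The names are illustrative and do not come from any particular reader.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical bitmap of read articles covering numbers low..high. */
struct bitmap {
    long low, high;
    unsigned char *bits;
};

/* Allocate one bit per article number in low..high; 0 on failure. */
int bitmap_init(struct bitmap *bm, long low, long high)
{
    if (low < 1 || high < low)
        return 0;
    bm->low = low;
    bm->high = high;
    size_t nbits = (size_t)(high - low + 1);
    bm->bits = calloc((nbits + CHAR_BIT - 1) / CHAR_BIT, 1);
    return bm->bits != NULL;
}

void bitmap_set(struct bitmap *bm, long art)
{
    if (art < bm->low || art > bm->high)
        return;  /* outside the range the bitmap was sized for */
    size_t i = (size_t)(art - bm->low);
    bm->bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

int bitmap_test(const struct bitmap *bm, long art)
{
    if (art < bm->low || art > bm->high)
        return 0;
    size_t i = (size_t)(art - bm->low);
    return (bm->bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

void bitmap_free(struct bitmap *bm)
{
    free(bm->bits);
    bm->bits = NULL;
}
```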

They can either figure out the minimum (by doing opendir on the spool
directory) or they can get it from the active file, which they already
read.

So you can calculate it 300 times per day in every reading session, or
once, in an upact type program.

So why is this dumb?
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

henry@zoo.toronto.edu (Henry Spencer) (08/17/90)

In article <1990Aug17.071243.16518@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>>of some stupid reader software.  (The lower number is basically an
>>inadequate kludge that smarter software should never look at, but there
>>is a lot of dumb software in the world, sigh...)
>
>Programs do need the minimum -- for creating reasonable sized bitmaps, for
>example.

Programs should do a directory sweep to find out what articles are *actually
present* rather than making the -- unwise and often wrong -- assumption that
there is a nearly-contiguous sequence between min and max.  The code to do
this has to be present anyway, since no reader in its right mind finds the
next available article by a straight linear search.  Directory reading is
cheap and quick.  (There is admittedly a problem with doing this over NNTP,
which is a serious flaw in NNTP but is no excuse when NNTP is not involved.)
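
Here is a sketch of the directory-sweep approach. It assumes that every all-digit file name in the group directory is an article. The code reads the directory once, sorts the numbers, and then finds the next article with a binary search instead of trying cur+1, cur+2, and so on. present_articles() and next_article() are hypothetical names.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical: sorted, malloc'd array of article numbers actually present
   in dir; *count receives its length.  NULL on open or memory failure. */
long *present_articles(const char *dir, size_t *count)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return NULL;
    size_t n = 0, cap = 64;
    long *arts = malloc(cap * sizeof *arts);
    struct dirent *ent;
    while (arts != NULL && (ent = readdir(d)) != NULL) {
        const char *p = ent->d_name;
        while (isdigit((unsigned char)*p))
            p++;
        if (p == ent->d_name || *p != '\0')
            continue;                       /* not an all-digit name */
        if (n == cap) {
            long *bigger = realloc(arts, 2 * cap * sizeof *arts);
            if (bigger == NULL) { free(arts); arts = NULL; break; }
            arts = bigger;
            cap *= 2;
        }
        arts[n++] = strtol(ent->d_name, NULL, 10);
    }
    closedir(d);
    if (arts != NULL) {
        qsort(arts, n, sizeof *arts, cmp_long);
        *count = n;
    }
    return arts;
}

/* Smallest present article number greater than cur, or -1 if none. */
long next_article(const long *arts, size_t n, long cur)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (arts[mid] <= cur)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < n ? arts[lo] : -1;
}
```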

>So you can calculate it 300 times per day in every reading session, or
>once, in an upact type program.

The right way to do it is indeed to do it once, but to record useful
summary information rather than just a single number.  Some of the new
fancy newsreaders are starting to do that.
-- 
It is not possible to both understand  | Henry Spencer at U of Toronto Zoology
and appreciate Intel CPUs. -D.Wolfskill|  henry@zoo.toronto.edu   utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (08/17/90)

In article <SS^%M2&@rpi.edu> tale@turing.cs.rpi.edu (David C Lawrence) writes:
>... I think I will review the
>latest NNTP protocol for a way to put in "this group has n articles in
>it" information in it after a readdir and check for S_IFREG files.

You don't need to bother with the (relatively expensive) S_IFREG check,
actually, if you report an estimate rather than a guaranteed-accurate
number, and filter out non-numeric names.
-- 
It is not possible to both understand  | Henry Spencer at U of Toronto Zoology
and appreciate Intel CPUs. -D.Wolfskill|  henry@zoo.toronto.edu   utzoo!henry

brian@ucsd.Edu (Brian Kantor) (08/18/90)

In article <SS^%M2&@rpi.edu> tale@turing.cs.rpi.edu (David C Lawrence) writes:
>... I think I will review the
>latest NNTP protocol for a way to put in "this group has n articles in
>it" information in it after a readdir and check for S_IFREG files.

Uh, that's already there, dude.  Chapter and verse:

RFC 977                                                    February 1986
Network News Transfer Protocol

3.2.  The GROUP command

3.2.1.  GROUP

   GROUP ggg

   The required parameter ggg is the name of the newsgroup to be
   selected (e.g. "net.news").  A list of valid newsgroups may be
   obtained from the LIST command.

   The successful selection response will return the article numbers of
   the first and last articles in the group, and an estimate of the
   number of articles on file in the group.  It is not necessary that
   the estimate be correct, although that is helpful; it must only be
   equal to or larger than the actual number of articles on file.  (Some
   implementations will actually count the number of articles on file.
   Others will just subtract first article number from last to get an
   estimate.)

   When a valid group is selected by means of this command, the
   internally maintained "current article pointer" is set to the first
   article in the group.  If an invalid group is specified, the
   previously selected group and article remain selected.  If an empty
   newsgroup is selected, the "current article pointer" is in an
   indeterminate state and should not be used.

   Note that the name of the newsgroup is not case-dependent.  It must
   otherwise match a newsgroup obtained from the LIST command or an
   error will result.

3.2.2.  Responses

   211 n f l s group selected
           (n = estimated number of articles in group,
           f = first article number in the group,
           l = last article number in the group,
           s = name of the group.)
   411 no such news group

brad@looking.on.ca (Brad Templeton) (08/18/90)

In article <1990Aug17.163437.2013@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>there is a nearly-contiguous sequence between min and max.  The code to do
>this has to be present anyway, since no reader in its right mind finds the
>next available article by a straight linear search.  Directory reading is

Actually, many readers do exactly that.  By and large, many sites refuse
to accept long expiry dates on most groups (thanks in part to the help of
C news), so this is not that big a loss, particularly with caches.

I'll tell you why I don't do it: opendir isn't fully standard yet,
and every variant feature you use is another porting headache.  This may
be an irrational fear -- opendir or a standard 16-byte-record directory
format can be found almost everywhere nowadays.  But one just grows to
fear such moves, when a loop of opens is sure to work and generally isn't
far off, either.
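
The "loop of opens" can be written using only standard C stdio. next_article_by_probe() is a hypothetical name. Every missing number in the gap between the current article and the next surviving one costs one failed fopen().

```c
#include <stdio.h>

/* Hypothetical: probe dir/cur+1, dir/cur+2, ... up to high and return the
   first article number that opens, or -1 if none does. */
long next_article_by_probe(const char *dir, long cur, long high)
{
    char path[4096];
    for (long n = cur + 1; n <= high; n++) {
        int len = snprintf(path, sizeof path, "%s/%ld", dir, n);
        if (len < 0 || (size_t)len >= sizeof path)
            return -1;                      /* path would not fit */
        FILE *fp = fopen(path, "r");
        if (fp != NULL) {
            fclose(fp);
            return n;
        }
    }
    return -1;
}
```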
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

dave@galaxia.Newport.RI.US (News Administrator) (08/20/90)

In article <1990Aug17.071243.16518@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>In article <1990Aug17.034849.17801@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>>of some stupid reader software.  (The lower number is basically an
>>inadequate kludge that smarter software should never look at, but there
>>is a lot of dumb software in the world, sigh...)
>
>Programs do need the minimum -- for creating reasonable sized bitmaps, for
>example.

Why do you assume that you can create a reasonably sized bitmap based on
the min and max article numbers?  If I have a high volume group that
happens to contain a few articles with long expiration dates, I can still
get what would look like a huge group based on max-min, but in fact it might
currently contain significantly fewer articles than that.  At one point back in the
days of 2.10.1 (i.e. before the min field was introduced), I was concerned
about overflowing the bitmap array so I wrote a set of functions that
replaced all of the bitmap related macros and used a dynamically created
linked list as the data structure instead of using a statically created
array.  Obviously, calling a function and doing a linked list lookup is not
as fast as having a macro that does a few shifts and an array lookup, but I
challenge anybody to tell the difference between the two when they are
reading news with vnews/rn/trn/etc.  Maybe a really high performance
machine doing some kind of weird news processing in a tight loop could tell
the difference, but not a user who is generating a single bitmap access for
each article that gets displayed on their screen.  The linked list approach
has the really nice advantage of being very difficult to overflow.
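
Brierley's own code is not shown in the post, so what follows is only a hypothetical illustration of the idea. It uses a sorted linked list of read-article ranges in place of a fixed-size bitmap. The list's size depends on how many separate ranges there are, not on max - min, so a few long-lived articles cannot blow it up.

```c
#include <stdlib.h>

/* Hypothetical: sorted list of disjoint, non-adjacent read ranges. */
struct range {
    long lo, hi;
    struct range *next;
};

int is_read(const struct range *list, long art)
{
    for (; list != NULL && list->lo <= art; list = list->next)
        if (art <= list->hi)
            return 1;
    return 0;
}

/* Mark art as read; returns 0 only if memory runs out. */
int mark_read(struct range **list, long art)
{
    struct range **pp = list;
    /* skip ranges that end before art and do not touch it */
    while (*pp != NULL && (*pp)->hi < art - 1)
        pp = &(*pp)->next;
    struct range *r = *pp;
    if (r != NULL && r->lo <= art + 1) {
        /* art lies in or next to this range: widen it */
        if (art < r->lo) r->lo = art;
        if (art > r->hi) r->hi = art;
        /* merge with the following range if the two now touch */
        struct range *nx = r->next;
        if (nx != NULL && nx->lo <= r->hi + 1) {
            if (nx->hi > r->hi) r->hi = nx->hi;
            r->next = nx->next;
            free(nx);
        }
        return 1;
    }
    struct range *nr = malloc(sizeof *nr);
    if (nr == NULL)
        return 0;
    nr->lo = nr->hi = art;
    nr->next = r;
    *pp = nr;
    return 1;
}

void free_ranges(struct range *list)
{
    while (list != NULL) {
        struct range *next = list->next;
        free(list);
        list = next;
    }
}
```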

I am still using the linked list approach in some programs I have that
analyze .newsrc files and they work quite nicely.  Since these programs are
not actually reading news articles, just analyzing .newsrc files, they are
primarily doing "bitmap" manipulations and I do not feel that they are
suffering any serious performance degradation from using the linked list
functions.
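
The following hypothetical helper is an example of the kind of .newsrc analysis mentioned. It reads one .newsrc line and counts how many article numbers that line marks as read. The sketch assumes the usual "group: 1-1745,1750" form, with ':' for subscribed and '!' for unsubscribed groups.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: count article numbers marked read on one .newsrc line and
   set *subscribed from the ':' or '!' mark.  Returns -1 if malformed. */
long newsrc_read_count(const char *line, int *subscribed)
{
    const char *mark = strpbrk(line, ":!");
    if (mark == NULL)
        return -1;
    *subscribed = (*mark == ':');
    long total = 0;
    const char *p = mark + 1;
    char *end;
    for (;;) {
        long lo = strtol(p, &end, 10);   /* skips leading blanks */
        if (end == p)
            break;                       /* no more numbers */
        long hi = lo;
        p = end;
        if (*p == '-') {                 /* a range "lo-hi" */
            hi = strtol(p + 1, &end, 10);
            if (end == p + 1)
                return -1;               /* malformed "N-" */
            p = end;
        }
        if (hi >= lo)
            total += hi - lo + 1;
        if (*p != ',')
            break;
        p++;
    }
    return total;
}
```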

If anybody would like a copy of my code let me know and I will send it out.
-- 
David H. Brierley
Home: dave@galaxia.Newport.RI.US    Work: dhb@quahog.ssd.ray.com
Be excellent to each other.