[news.software.b] New .newsrc format was Dynamic "smart" expiration?

storm@texas.dk (Kim F. Storm) (01/05/90)

brad@looking.on.ca (Brad Templeton) writes:

>Perhaps something simple like:

>groupname[:!] [fieldname=data;]*

>with fields delimited by something like colons or semicolons, and the
>default field (ie anything starting with a digit) is the 'seen articles list'
>-- thus degenerating to the current format.   Leave : and ! as the delim
>after the group name, but add extra fields for other kinds of subscription.
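Both this proposal and the existing format keep the 'seen articles list' as the default field: a comma-separated list of article numbers and inclusive ranges, like the 1-2000,2002 in a sample later in this thread. As a rough illustration, here is a minimal Python sketch of reading, updating and writing such a list. The function names are hypothetical, and it assumes the list is well formed (no validation of reversed or overlapping ranges on input).

```python
def parse_article_list(text):
    """Turn a seen-articles list like "1-2000,2002" into (low, high) pairs."""
    ranges = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        low, _, high = item.partition("-")
        ranges.append((int(low), int(high or low)))  # single article: low == high
    return ranges


def is_read(ranges, article):
    """True if `article` falls inside any of the inclusive ranges."""
    return any(low <= article <= high for low, high in ranges)


def mark_read(ranges, article):
    """Return a sorted range list with `article` added, merging neighbours."""
    merged = []
    for low, high in sorted(ranges + [(article, article)]):
        if merged and low <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def format_article_list(ranges):
    """Write ranges back out in .newsrc style."""
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)


seen = parse_article_list("1-2000,2002")
print(is_read(seen, 2001))                         # False
print(format_article_list(mark_read(seen, 2001)))  # 1-2002
```

Merging on every update keeps the line short, which matters given how long .newsrc lines already get.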

Further requirements:

Specify that the order of fields is arbitrary.

Require that a news reader MUST preserve the fields it does not
understand itself!

Why not just require that the list of read articles should be first on
the line? Allowing boolean fields as well, we could have the format:

	groupname[:!] [list][;fieldname[=data]]*

Maybe we need a clearing house for field names, or the RFC could
contain a list of reserved names?

Then I would like to reserve the following fields:

select=list
	List of selected articles (for nn's "save selections between
	invocations" feature).

new (boolean)
	If present, the user has never seen the group, because all
	articles in the group so far have been cross-postings read in
	other groups.  (In nn you are not presented with a new group
	until there is actually something posted to it, but it still
	has to mark cross-posted articles in the group as read until
	that happens).
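To make the proposed layout groupname[:!] [list][;fieldname[=data]]* concrete, here is a minimal Python sketch of a reader that parses such a line and writes it back, keeping fields it does not understand in their original order (the MUST-preserve rule above). It is an illustration only: the class and function names are hypothetical, the "xyzzy" field stands in for any field unknown to this reader, and an exact round trip is only guaranteed for lines already written in the canonical spacing.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Group name, then ':' (subscribed) or '!' (unsubscribed), then the rest.
LINE_RE = re.compile(r"^([^\s:!]+)([:!])(.*)$")


@dataclass
class NewsrcEntry:
    group: str
    subscribed: bool
    seen: str  # raw seen-articles list, e.g. "1-2000,2002"
    # (name, value) pairs in file order; value is None for boolean fields
    fields: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def get(self, name):
        """Field value, True for a boolean field, None if absent."""
        for key, value in self.fields:
            if key == name:
                return value if value is not None else True
        return None


def parse_line(line):
    m = LINE_RE.match(line.rstrip("\n"))
    if not m:
        raise ValueError("not a .newsrc line: %r" % line)
    group, mark, rest = m.groups()
    seen, *extras = rest.split(";")  # the read-article list always comes first
    fields = []
    for part in extras:
        part = part.strip()
        if part:
            name, eq, value = part.partition("=")
            fields.append((name.strip(), value.strip() if eq else None))
    return NewsrcEntry(group, mark == ":", seen.strip(), fields)


def format_line(entry):
    # Unknown fields are written back untouched.
    out = entry.group + (":" if entry.subscribed else "!")
    if entry.seen:
        out += " " + entry.seen
    for name, value in entry.fields:
        out += ";" + (name if value is None else name + "=" + value)
    return out


line = "comp.sources.unix: 1-1200,1205;select=1201-1204;new;xyzzy=42"
e = parse_line(line)
print(e.get("select"), e.get("new"))  # 1201-1204 True
print(format_line(e) == line)         # True
```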

In general I want as little manual editing of .newsrc as possible (to
avoid the risk of a novice user destroying it), and in particular 
the individual setup of each news reader should not be done in
.newsrc, i.e. it should not be burdened with news reader specific
data such as kill-information, macros, and the like.

A rather severe problem I have encountered recently (when converting
nn to use .newsrc instead of its own rc file), is that rn (IMHO) uses
an unpleasant hack to keep track of unsubscribed groups which do not
occur in .newsrc:

It keeps a time-stamp and a seek offset in the active file in a
separate file.  This works just fine until the day when somebody
chooses to rebuild the active file (or just sorts it) [the consequences
of this are left as an exercise to the reader :-].  It also makes
simultaneous use of rn and other news readers problematic if they
don't know about rn's hack.  For example, nn will treat the missing
groups as new groups (with default subscription), and add them to
.newsrc.
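The failure mode can be made concrete with a simplified Python sketch of the general idea as described above (this is not rn's actual code): a reader remembers the size of the active file at the end of a session and treats every line beyond that byte offset as a new group. The three-line active file is made up for illustration, with fields group, high, low and flag.

```python
def groups_after_offset(active, offset):
    """Names of groups on lines at or after byte `offset` of the active file."""
    tail = active.encode("ascii")[offset:].decode("ascii")
    return [line.split()[0] for line in tail.splitlines() if line.strip()]


old = "comp.unix 00100 00001 y\nnews.admin 07600 00001 y\n"
saved_offset = len(old.encode("ascii"))  # remembered between sessions

# A newgroup appends a line: the remembered offset finds it correctly.
appended = old + "alt.new 00001 00001 y\n"
print(groups_after_offset(appended, saved_offset))  # ['alt.new']

# Someone sorts the active file: the offset now lands in the middle of a line.
resorted = "".join(sorted(appended.splitlines(keepends=True)))
print(groups_after_offset(resorted, saved_offset))  # ['s.admin']
```

After the sort the genuinely new group is missed and a bogus name is reported instead, and nothing in the saved offset lets the reader detect that.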

If we start changing the .newsrc file format (maybe we should call it
.newrc :-), I would like to see a clear definition of how unsubscribed
groups are represented, and for the reason given above, the
time/offset scheme is not appropriate in my opinion (btw, how does rrn do
this without direct access to the active file?)

-- 
Kim F. Storm        storm@texas.dk        Tel +45 429 174 00
Texas Instruments, Marielundvej 46E, DK-2730 Herlev, Denmark
	  No news is good news, but nn is better!

moraes@cs.toronto.edu (Mark Moraes) (01/05/90)

storm@texas.dk (Kim F. Storm) writes:
>It keeps a time-stamp and a seek offset in the active file in a
>separate file.  This works just fine until the day when somebody
>chooses to rebuild the active file (or just sorts it) [the consequences
>of this are left as an exercise to the reader :-].  It also makes
>simultaneous use of rn and other news readers problematic if they
>don't know about rn's hack.  For example, nn will treat the missing
>groups as new groups (with default subscription), and add them to
>.newsrc.

This was what the C News better.way patch and the active.times file
were meant to fix. They offered a slightly cleaner and unambiguous way
of determining newsgroup creation -- the active.times file is always
sorted in order of newsgroup creation at your site, simply because
newgroup will always append to it. (It also stores some more
information: for each group, the time it was created and its creator.)
It lets the time of newsgroup creation be determined by the only
authoritative source -- the transport system. (Otherwise, rn had an
interesting technique of stat'ing article 1 in every newsgroup to see
if it was newly created, I think.) The only support they require from
the transport system is that whatever/whoever processes the newgroup
control message makes the appropriate addition to the active.times
file. tail on our active.times file shows:

comp.sys.ncr 630434048 news@cs.purdue.EDU (News Knower)
alt.suicide.holiday 630486601 usenet@well.UUCP
bit.listserv.word-pc 631323032 WHV@PSUVM.BITNET (Bill Verity)
rec.sport.pro-wrestling 631354979 njs@scifi.UUCP (Nicholas J. Simicich)
list.unitex 631398794 lamy@csri.toronto.edu (Jean-Francois Lamy)
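As a rough illustration of why an append-only, creation-ordered file helps, here is a minimal Python sketch that finds groups created since a reader's last check. It assumes each line is "group time creator" with the time in seconds since the Unix epoch, as the sample above appears to be; the function name is hypothetical, and where a reader would store its last-check time is exactly the open question.

```python
def new_groups(active_times, since):
    """Groups created after `since`, oldest first, as (name, time, creator).

    Because newgroup only ever appends, the file is in creation order, so
    we can scan backwards and stop at the first group that is not newer.
    """
    found = []
    for line in reversed(active_times.splitlines()):
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, created = parts[0], int(parts[1])
        if created <= since:
            break
        found.append((name, created, parts[2] if len(parts) > 2 else ""))
    found.reverse()
    return found


SAMPLE = """\
comp.sys.ncr 630434048 news@cs.purdue.EDU (News Knower)
alt.suicide.holiday 630486601 usenet@well.UUCP
bit.listserv.word-pc 631323032 WHV@PSUVM.BITNET (Bill Verity)
rec.sport.pro-wrestling 631354979 njs@scifi.UUCP (Nicholas J. Simicich)
list.unitex 631398794 lamy@csri.toronto.edu (Jean-Francois Lamy)
"""

for name, created, creator in new_groups(SAMPLE, 631323032):
    print(name, creator)
```

Unlike a seek offset into the active file, a timestamp stays meaningful whatever is later done to the active file.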

With the active.times file and all people that use it running a
patched rn, you can hopefully do what you like with the active file
then. (This is the theory -- I think rn may still go and trash the
.newsrc when it finds the soft pointers invalid even though it is not
supposed to -- I've never dared to experiment:-) rn is really touchy
about the active file changing - I learnt the hard way not to read
news early on Saturday mornings when updact ran...

I remember that Geoff Collyer once mentioned that if we could
make rn less sensitive to active file changes, the active file could
be sorted in reverse order by the second field (i.e. highest volume
newsgroup at the top), which might improve news transport performance
further.

loverso@Xylogics.COM (John Robert LoVerso) (01/06/90)

How about taking this on from the other side?  Instead of insisting
that all future news readers conform to the .newsrc that you can
define today (just as most existing news readers use the .newsrc
originated by readnews!), why not just let each newsreader use the
bookkeeping method it wants?  If you need a way of letting the
news system `know' which groups have been read (for either smart
expiration or arbitron-like statistics), then have an explicit
accounting mechanism by which the newsreader informs the system
what's gone on.  And by all means standardize the format of the
accounting record!  Something that could decode into information
along the lines of:
	user joe in alt.air read 5 saved 2 skipped 3 discarded 113
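The record above describes the information rather than a syntax. One hypothetical way to pin it down is a flat list of keyword/value pairs after the user and group, which is trivial both to produce and to read back. The following Python sketch assumes that layout; the function names, the fixed counter list, and the rule that unrecognised counters are kept are all assumptions for illustration.

```python
COUNTERS = ("read", "saved", "skipped", "discarded")


def format_record(user, group, counts):
    """Render one session's counts in the shape of the example record."""
    words = ["user", user, "in", group]
    for name in COUNTERS:
        words += [name, str(counts.get(name, 0))]
    return " ".join(words)


def parse_record(line):
    """Inverse of format_record; keeps counters it does not recognise."""
    words = line.split()
    if len(words) < 4 or words[0] != "user" or words[2] != "in" or len(words) % 2:
        raise ValueError("malformed accounting record: %r" % line)
    counts = {name: int(value) for name, value in zip(words[4::2], words[5::2])}
    return words[1], words[3], counts


rec = format_record("joe", "alt.air", {"read": 5, "saved": 2, "skipped": 3, "discarded": 113})
print(rec)  # user joe in alt.air read 5 saved 2 skipped 3 discarded 113
```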

A nice thing about this is that most of that information is known
in nntpd, and so you could provide pseudo-statistics for unmodified
NNTP clients that way (it might only know about supposedly `read'
articles).  

But, no matter what, a newsreader shouldn't be shackled to using
a particular bookkeeping method.

John
-- 
John Robert LoVerso			Xylogics, Inc.  617/272-8140x284
loverso@Xylogics.COM			Annex Terminal Server Development Group

brad@looking.on.ca (Brad Templeton) (01/06/90)

We may not want to force the readers to all use the same format, but if
not we have to define standard mechanisms for calling the programs to convert
to the standard format etc. etc.

That is, if you want to have programs that do things like filter/KILL in the
background, or measure readership, or expire read articles etc. etc.

Having a translation mechanism all programs know how to use is clumsy.
Isn't it simpler for the reader to use whatever it wants internally, but
at the end of the session write out the standard format?

As long as the format's extensible.

While it's bulkier, we could just define a standard companion file for
the .newsrc that is extensible, and have it take the place of the
.rnlast, .newsrclas and other associated hodgepodge files.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

storm@texas.dk (Kim F. Storm) (01/06/90)

moraes@cs.toronto.edu (Mark Moraes) writes:

>>It keeps a time-stamp and a seek offset in the active file in a
>>separate file.

>This was what the C News better.way patch, and the active.times file
>were meant to fix. They offered a slightly cleaner and unambiguous way
>of determining newsgroup creation -- the active.times file is always
>sorted in order of newsgroup creation at your site, simply because
>newgroup will always append to it.

Very nice, but it still leaves non-C News sites out in the dark.

And I still wonder how rrn manages to do this (if it does?).

>With the active.times file and all people that use it running a
>patched rn, you can hopefully do what you like with the active file
>then. 

Ok, but then somebody must document - maybe as part of newsrc(5) -
where the seek-offset & mod-time (or whatever) into active.times
is stored, and what the format is.

-- 
Kim F. Storm        storm@texas.dk        Tel +45 429 174 00
Texas Instruments, Marielundvej 46E, DK-2730 Herlev, Denmark
	  No news is good news, but nn is better!

peter@ficc.uu.net (Peter da Silva) (01/07/90)

> why not just let each newsreader use the
> bookkeeping method it wants.

Because then you can't use more than one news reader program. I use readnews,
rn, and vnews at different times and for different reasons. I'm sure glad they
all use .newsrc.

But I'd be willing to hack them to accept and maintain extra fields at the end
of a .newsrc line:

	noise.froup: arti,ic-les; flag=value; flag=; ...

It's called upward compatibility...
-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \ Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>
\_.--._/
      v  "Have you hugged your wolf today?" `-_-'

brad@looking.on.ca (Brad Templeton) (01/08/90)

Actually, since it's usually easier to hack software to ignore lines
than partial lines, and because we don't want to see .newsrc lines getting
longer than the crazy lengths they can already get, I propose that any
options go on lines by themselves, after the normal line.

The standard would be something like "Any line starting with white space
is an option line, pertaining to the most recent newsgroup."

Still requires that the existing programs be modified, but it's not too
painful a hack, I think.

Within these lines, options could be present in B news header format,
something news programs all know how to read.

Global options could appear before the first newsgroup. A file might
look like this:

------
news.admin: 1-7600
	Filtered: 7580
	Kill: chuq@apple.com
	Rnsoft: 5663
news.software.b: 1-2000,2002
alt.sex: 1-54324
-------
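Here is a minimal Python sketch of how a reader might load this layout. It assumes, as the proposal suggests but does not spell out, that global options are indented option lines appearing before the first group line, and that option names are everything before the first colon. The function name, the returned structure, and the "Sort" option used in the second example are hypothetical. An old reader that merely skips lines beginning with white space would still see an ordinary .newsrc.

```python
def parse_newsrc(text):
    """Split a .newsrc with option lines into (global_options, groups).

    groups is a list of (group_line, options); options is a list of
    (name, value) pairs in file order.
    """
    global_options, groups = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Option line in header format: "Name: value".
            name, sep, value = line.strip().partition(":")
            if not sep:
                raise ValueError("option line without ':': %r" % line)
            target = groups[-1][1] if groups else global_options
            target.append((name.strip(), value.strip()))
        else:
            groups.append((line, []))
    return global_options, groups


SAMPLE = (
    "news.admin: 1-7600\n"
    "\tFiltered: 7580\n"
    "\tKill: chuq@apple.com\n"
    "\tRnsoft: 5663\n"
    "news.software.b: 1-2000,2002\n"
    "alt.sex: 1-54324\n"
)

global_options, groups = parse_newsrc(SAMPLE)
for group_line, options in groups:
    print(group_line, options)

# A hypothetical global option placed before the first group:
print(parse_newsrc("\tSort: yes\ncomp.unix: 1-5\n"))
```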

and so on.  If you like you could allow multiple options per line, but that
doesn't actually gain you a lot.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473