[news.software.b] Expire by Date:

news@massey.ac.nz (USENET News System) (03/14/91)

I like c-news a lot, but really miss the ability to expire articles
by the Date: line, rather than the arrival time, as b-news allowed.
It's much slower, of course, but I can think of two situations where
this is very important.  The first is when heaps of old articles appear,
like in comp.os.minix recently.  5Mb of garbage must be gotten rid of,
but it's not nice to have to expire recent articles along with the
garbage.  The second is just a larger example of the first.  We're a
leaf node with one feed.  I keep articles about 10 days and run the news
partition as close to full as I can without constant intervention.  If
we lose our feed for a few days for whatever reason (and it happens
more often than one would hope), we then get say two days of news all 
at once when things come right.  With b-news, I'd switch expiry over
to date posted for a while, and the late articles expire roughly the
same time they would have normally and without bothering any other
articles.  With c-news, I'm forced to set the expiry time for all
groups very low to get rid of this `lump'.  Otherwise, ten days later,
the partition will fill, stopping the feed for two days until the
lump is expired, and then we receive a new lump and off we go again.

Am I the only one who feels this is important?  Are there plans to add
this feature to c-news?  If not, what is the rationale?

Thanx much,
-- 
K.Spagnolo@massey.ac.nz

henry@zoo.toronto.edu (Henry Spencer) (03/15/91)

In article <1991Mar14.012332.20774@massey.ac.nz> K.Spagnolo@massey.ac.nz (Ken Spagnolo) writes:
>... I can think of two situations where
>this is very important.  The first is when heaps of old articles appear,
>like in comp.os.minix recently.  5Mb of garbage must be gotten rid of...

This is being solved in a better way:  the garbage will be thrown away
on arrival rather than waiting for expire to do it.

>If we lose our feed for a few days for whatever reason (and it happens
>more often than one would hope), we then get say two days of news all 
>at once when things come right...
>... I'm forced to set the expiry time for all
>groups very low to get rid of this `lump'.

My own philosophy on this one tends to be "if your system doesn't have
enough resources in reserve to handle surges, then running news is a
poor idea".

Flipping the expiry criterion back and forth between arrival date and
posting date exacts a hideous penalty in execution time, because to do
expiry by posting date requires *reading* tens of thousands of articles
every time, to discover their posting dates.  One of the original motives
behind doing a new expire -- which is sort of what got C News started --
was getting away from scanning every article, which was intolerably slow
even when traffic was an order of magnitude lower.  The only way to make
this practical, really, would be to store the posting date centrally.
I'd rather avoid revamping the history-file format *again*.
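
For illustration only: a minimal sketch of what storing the posting date
centrally would buy.  It assumes a hypothetical history line of the form
"message-id TAB arrived~expires~posted TAB files"; the third "~posted"
field is invented here and is not the real C News history format.  With
that field present, expire can choose either date without opening a
single article.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60

@dataclass
class HistoryEntry:
    msgid: str
    arrived: int   # arrival time, seconds since the epoch
    posted: int    # posting date, seconds since the epoch (hypothetical field)
    files: list    # article file names

def parse_history_line(line):
    """Parse a hypothetical 'msgid TAB arrived~expires~posted TAB files' line."""
    msgid, dates, files = line.rstrip("\n").split("\t")
    arrived, _expires, posted = dates.split("~")
    return HistoryEntry(msgid, int(arrived), int(posted), files.split())

def expired(entry, now, days, by_posting_date):
    """Age test using only the history record, never the article itself."""
    stamp = entry.posted if by_posting_date else entry.arrived
    return now - stamp > days * DAY
```

A "lump" article that arrived three days ago but was posted two weeks
earlier survives a 10-day expire by arrival date and goes away under a
10-day expire by posting date.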
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) (03/15/91)

In article <1991Mar14.194554.12750@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>My own philosophy on this one tends to be "if your system doesn't have
>enough resources in reserve to handle surges, then running news is a
>poor idea".

While I tend to agree in general, I think there is merit in the two cases
mentioned.  In the case of being down, or having your incoming feed
down, the amount of news that can queue up can be as much as your feed
keeps.  The same thing can happen with recirculated old articles.  I
could, right now and very easily, put 362 Meg of old news onto the net.
Could your system handle that kind of "surge"?

>was getting away from scanning every article, which was intolerably slow
>even when traffic was an order of magnitude lower.  The only way to make
>this practical, really, would be to store the posting date centrally.
>I'd rather avoid revamping the history-file format *again*.

What is one more <magic character>668997597 in the history file among
friends? :-)

One idea I considered a while back was to force each article file's
modification time, using utimes(2), to be the posting date.  That would reduce
the cost to stat-ing the file instead of reading and parsing its
headers, and with no additional storage overhead.  It would also provide
for a fairly efficient method for news readers to sort the presentation
by posting date.
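
A minimal sketch of the utimes(2) idea, written in Python (os.utime wraps
the same call).  It assumes the Date: header has already been pulled out
of the article at arrival time; the function names are hypothetical.  The
point is that the later expiry test needs only a stat(), not a read of
the headers.

```python
import os
from email.utils import parsedate_to_datetime

DAY = 24 * 60 * 60

def stamp_posting_date(path, date_header):
    """At arrival, set the article's atime and mtime to its Date: header."""
    posted = parsedate_to_datetime(date_header).timestamp()
    os.utime(path, (posted, posted))
    return posted

def expired_by_mtime(path, now, days):
    """Decide expiry from stat() alone, without opening the article."""
    return now - os.stat(path).st_mtime > days * DAY
```

A newsreader could likewise sort a group by st_mtime to present articles
in posting order.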

				Jerry Aguirre

flee@cs.psu.edu (Felix Lee) (03/15/91)

> I could, right now and very easily, put 362 Meg of old news onto the
> net.  Could your system handle that kind of "surge"?

Not yet, but soon.  I've figured out how to implement steady-state
news, and it might even be efficient.  Maybe in a few months.
--
Felix Lee	flee@cs.psu.edu

kurt@rufus.almaden.ibm.com (Kurt Shoens) (03/16/91)

I think the underlying problem is that expire gives one the wrong control
knob.  I don't usually want to remove articles older than 15 days (say).
I want to make enough space available for the news that's coming in.
Normally, expiring by the number of days an article has lived on my
system has the desired effect.  But when the news volume ramps up
for some reason, it becomes more apparent that expiring by age is not
what I need.

What I would rather do is give expire two objectives:  get me back B
blocks of free space and I free inodes.  Then, expire should
essentially rank the articles that I currently have and delete the
least precious (typically, the oldest, but you have to take into
account the Expires:  header) until the objectives have been met.  If
the news flow slows down because of, e.g., Spring Break, then I get to
keep a little more.  If it picks up, I keep a little less.
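
One way the "least precious first" idea could look, as a sketch only.
The ranking policy is an assumption, not anything C News does: articles
whose Expires: date has passed go first, then ordinary articles oldest
first, and articles with a future Expires: date last.  Cross-posted
links are ignored, so every article is counted as one freed inode.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Article:
    path: str
    arrived: int                     # arrival time, seconds since the epoch
    blocks: int                      # disk blocks the article occupies
    expires: Optional[int] = None    # Expires: header as a timestamp, if any

def choose_victims(articles, now, want_blocks, want_inodes):
    """Return articles to delete, least precious first, until both goals are met."""
    def rank(a):
        if a.expires is not None and a.expires <= now:
            return (0, a.expires)    # explicitly expired: cheapest to lose
        if a.expires is not None:
            return (2, a.arrived)    # poster asked for longer life: keep longest
        return (1, a.arrived)        # everything else: oldest arrival first

    victims, blocks, inodes = [], 0, 0
    for a in sorted(articles, key=rank):
        if blocks >= want_blocks and inodes >= want_inodes:
            break
        victims.append(a)
        blocks += a.blocks
        inodes += 1                  # assumes one file (inode) per article
    return victims
```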

With this sort of control, I don't think that folks would be flipping
between posting date and arrival date as the expiration criterion.

Or does CNews expire already support what I'm suggesting?
--
Kurt Shoens

rmtodd@servalan.uucp (Richard Todd) (03/16/91)

kurt@rufus.almaden.ibm.com (Kurt Shoens) writes:

>What I would rather do is give expire two objectives:  get me back B
>blocks of free space and I free inodes.  Then, expire should
>essentially rank the articles that I currently have and delete the
>least precious (typically, the oldest, but you have to take into
>account the Expires:  header) until the objectives have been met.  If
>the news flow slows down because of, e.g., Spring Break, then I get to
>keep a little more.  If it picks up, I keep a little less.

>With this sort of control, I don't think that folks would be flipping
>between posting date and arrival date as the expiration criterion.

>Or does CNews expire already support what I'm suggesting?

Hmm.  I once implemented something sort of along the lines you suggest,
back when I was trying to fit all of Unix plus a small newsfeed on the 80M
internal drive on my Mac (can you say "cramped", boys and girls?).  It took
a rather brute-force approach, simply having a bunch of progressively
tighter "explist" files which a modified version of C News's
$NEWSBIN/expire/doexpire would run through, invoking expire with each
explist file until a certain pre-set amount of free space was cleared up.
Like I said, brute force.  One nice thing about the system is that since
the explists are completely arbitrary, you can tailor the expiry behaviour
to some extent (i.e. make it expire news.groups to the bone before starting
in on comp.unix.lizards :-) I eventually had a shell script set up to
automatically generate all my explists from a single master file.  The 
scheme did a fairly good job of tracking variations in the newsflow,
adjusting the amount of expiration done as required.  (Learned some 
interesting things that way, too, like that newsflow drops substantially 
on the weekends--enough so that the "auto-adjusted" expire times always
increased by at least a day).  
  I've still got the code lying about, even though I haven't had much
cause to fiddle with expire times since I got a bigger disk.  Let me know
if you're interested.
  For the record, I recall that someone else on the net (Chip Salzenberg,
maybe?) thought up the idea of progressive expires for adaptive handling
of expiration.  His scheme was somewhat more elaborate, in that it would
actually compute a new explist on each pass, instead of relying on explists
already created in advance.   I went for simple instead of elaborate...
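
The progressive scheme, rendered in Python rather than as a modified
doexpire shell script, might look roughly like this.  It assumes the
explists are ordered loosest to tightest, and that the local expire
accepts an explist file name as its argument; that default runner and the
file names are assumptions, so the runner and the free-space probe can be
swapped out.

```python
import shutil
import subprocess

def progressive_expire(explists, spool, want_free, run=None, free=None):
    """Run ever-tighter explists until `want_free` bytes are free in `spool`.

    Returns the explists that were actually used.
    """
    if run is None:
        def run(explist):
            # Assumed invocation: expire reads its control file from argv.
            subprocess.run(["expire", explist], check=True)
    if free is None:
        def free():
            return shutil.disk_usage(spool).free
    used = []
    for explist in explists:
        if free() >= want_free:
            break                    # target reached; stop tightening
        run(explist)
        used.append(explist)
    return used
```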
--
Richard Todd	rmtodd@uokmax.ecn.uoknor.edu  rmtodd@chinet.chi.il.us
	rmtodd@servalan.uucp

henry@zoo.toronto.edu (Henry Spencer) (03/17/91)

In article <50464@olivea.atc.olivetti.com> jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) writes:
>could, right now and very easily, put 362 Meg of old news onto the net.
>Could your system handle that kind of "surge"?

No, and neither could yours, since your proposed changes just make it
expire earlier -- they don't eliminate it on arrival, which is what is
really needed.

>One idea I considered a while back was to force the articles modified
>time stamp, using utimes(2), to be the posting date.  That would reduce
>the cost to stat-ing the file instead of reading and parsing its
>headers...

Still pretty expensive, unfortunately.  Name lookups cost a lot, even on
systems with namei caches.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

flee@cs.psu.edu (Felix Lee) (03/18/91)

>What I would rather do is give expire two objectives: get me back B
>blocks of free space and I free inodes.

Working on it.  You will be able to say something like
	keep 15M free /news/spool
and a continuously-running expire process will try to ensure that
there's always at least 15 megabytes of free space in /news/spool.
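
A sketch of the bookkeeping such a rule implies.  The "keep 15M free"
syntax is only Felix's "something like" example, and reading M as 2^20
bytes is an assumption.  A continuously running expire would presumably
re-check the shortfall periodically and delete articles only while it is
nonzero.

```python
import re

UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_keep(line):
    """Parse a rule like 'keep 15M free /news/spool' into (bytes, directory)."""
    m = re.fullmatch(r"\s*keep\s+(\d+)([KMG]?)\s+free\s+(\S+)\s*", line)
    if m is None:
        raise ValueError("not a keep rule: " + line)
    amount, unit, directory = m.groups()
    return int(amount) * UNITS[unit], directory

def shortfall(want_free, free_now):
    """Bytes still to be reclaimed; zero means expire can go back to sleep."""
    return max(0, want_free - free_now)
```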
--
Felix Lee	flee@cs.psu.edu

henry@zoo.toronto.edu (Henry Spencer) (03/20/91)

In article <580@rufus.UUCP> shoens@ibm.com writes:
>What I would rather do is give expire two objectives:  get me back B
>blocks of free space and I free inodes.  Then, expire should
>essentially rank the articles that I currently have and delete the
>least precious ...
>Or does CNews expire already support what I'm suggesting?

No provision for it at present.  I thought about this a bit, long ago,
but getting a precise definition of "least precious", in the presence
of complications like different expiry times for different groups, is
tricky.  I decided that I didn't know what the policy should be and
so I wouldn't try.
-- 
"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
believe ill of the modern world."-R.Peto|  henry@zoo.toronto.edu  utzoo!henry

jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) (03/20/91)

In article <1991Mar17.012032.9351@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <50464@olivea.atc.olivetti.com> jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) writes:
>>could, right now and very easily, put 362 Meg of old news onto the net.

>No, and neither could yours, since your proposed changes just make it
>expire earlier -- they don't eliminate it on arrival, which is what is
>really needed.

Well, actually, I think it could.  NNTP is going to stop accepting
xfers when my free disk gets down to 5 Meg.  When the regular expire
does not free up enough space the script will run "expire -n junk -p -e
7 -E 60".  That will get rid of the old postings.  The second expire
runs fairly quickly as it only has to look at the "junk" postings, not
every bit of news.  Granted, there would be a hiccup in the flow, but I
expect more of that would be from my feeds' problems rather than mine.
Of course if the feeds were via UUCP ....

But back to the issue of handling old articles.  I am a little leery of
just trashing them.  Suppose the problem is not with the articles but
rather with the system date.  Every once in a while the service guys
will run something that clobbers the machine date real good.  (NTP has
helped reduce this problem.)  I dislike the idea of the system just
trashing the articles though B news's technique of putting them in junk
for two weeks is not that great either.

If one does trash them then should one add them to the history file?
If not then they can be resent and trashed several times.  If they are
then one can not get them again after the system date is corrected.

How about adding old article IDs to the history file but with the
posting date instead of the arrival date?  That way they will expire
thru the normal process and even the ID will flush out of the history
file.  One has to parse the posting date anyway and this would only
apply to old articles so it would not affect normal operation.  One
could even install the articles in the normal groups with the knowledge
that they will not outlast the next regular expire.  (Of course they
should not get forwarded on.)
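
A minimal sketch of that proposal.  The staleness threshold (14 days,
borrowed loosely from B news's two weeks in junk) and the history line
layout "msgid TAB stamp~- TAB files" are assumptions for illustration.

```python
DAY = 24 * 60 * 60

def history_record(msgid, files, arrived, posted, stale_days=14):
    """Build a history line for an incoming article, plus a forward flag.

    Articles posted more than `stale_days` before they arrived are recorded
    under their posting date, so the regular expire removes them (and later
    their history entry) promptly; such articles are not passed on.
    """
    stale = arrived - posted > stale_days * DAY
    stamp = posted if stale else arrived
    line = "%s\t%d~-\t%s" % (msgid, stamp, " ".join(files))
    return line, not stale
```

Because the message-ID still goes into the history file, a resent copy is
rejected as a duplicate until the entry itself ages out.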

					Jerry Aguirre