[news.software.b] expire

forrie@morwyn.UUCP (Forrie Aldrich) (06/06/91)

I'm flabbergasted.  For some reason my EXPIRE command isn't expiring
a lot of different articles... in particular I have noticed that some
of the articles that are crossposted into groups that I don't get here
don't expire... I have to manually delete them.  This can't be right, and
I would appreciate some advice here... 

The version of news I have is:  Bnews 2.11 patchlevel 19 ... which is
the latest and greatest if I am not mistaken...

Please respond via email to:

... uunet!virgin!unhtel!morwyn!forrie

as I don't regularly get the news.*.* groups on my node.

Thanks in advance...

Forrie
-- 

--------------------=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--------------------
Forrest Aldrich, Jr.|   (a reliable path here someday)   |forrie@morwyn.UUCP
                    |           <email paths>            | 
CREATIVE CONNECTIONS|  uunet!virgin!unhtel!morwyn!forrie |Graphic Illustration
------------------\-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=/------------------
                   \___ PO Box 1541 - Dover, NH  03820 ___/                   

rusty@anasaz.UUCP (Rusty Carruth) (06/20/91)

Some time back someone remarked about the glut of "expire" replacements,
and how they felt that people were going astray by not using the
"supplied" expire program.  (massively interpreted/filtered observation
there - please note that this was/is NOT intended as a flame, nor am I
intending to be casting aspersions on ANYBODY's work!  I am VERY thankful
for all the software others have supplied which supports news!)

I've finally come to the point that I am able to make a semi-rational
statement as to why I believe that this happens.

When I run expire, I am usually wanting to free up some given amount
of disk space (for incoming news to land in, for example).  I rarely
am thinking of limiting the length of time articles hang around in
a newsgroup - I simply want to get some free disk space without
angering my users too much.

So, when I "write" an expire table, what do I enter?  Retention times.
And, if I don't get enough space from the expire, I either have to
make the times shorter (and remember to go back and change it after
expire has finished), manually remove some stuff, or ...

I currently administer 2 news machines.  One is in the process of
being converted to cnews, the other is still on Bnews (the one I'm
posting from, as it happens).  I've been using "reap", with some mods,
here on "this machine" (anasaz) for some time with pretty good results,
except that I've got a VERY complex reap list, which makes adding new
news groups a pain.

The algorithm I'm hoping to implement runs something like this:

  Set a goal for the amount of free space you want now.
  lump newsgroups into one of 3 categories: junk, good, archive (archive
      is not currently being done)
  set "high" and "low" limits for each newsgroup (see below for their use)
  set "rate" values for each newsgroup (also see below)

  for each junk group, expire anything older than 1 day and see if you have
  freed up the space desired; if so, stop; if not, continue to:

  for each good newsgroup:
  
     expire any articles older than the "high" limit for this newsgroup

     if avail space > desired space, stop

  end for

  if we still need more space:

  for each newsgroup  (note - junk groups included)

    if oldest article is older than "low limit" then

      expire "rate" days of articles (i.e. if rate = 1, expire 1 day's worth)

    if avail space > desired space, stop

  end for
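
The outline above can be sketched in Python roughly as follows.  This is
only an illustration under stated assumptions: the Group and Article
records, the plan_expiry name, and the in-memory article list are all
hypothetical (a real version would take dates and sizes from the history
file or the spool), and capping the last phase so it never cuts below the
"low" limit is one reading of the outline, not something it states.  The
function removes nothing; it only returns the articles it would delete.

```python
from dataclasses import dataclass

DAY = 86400  # seconds per day


@dataclass
class Group:            # hypothetical record, one per newsgroup
    name: str
    kind: str           # "junk" or "good" ("archive" is not handled)
    high: int           # days; phase 2 expires anything older than this
    low: int            # days; phase 3 only acts if the oldest article is older
    rate: int           # days' worth of articles to expire in phase 3


@dataclass
class Article:          # hypothetical record, one per stored article
    group: str
    arrived: float      # arrival time, seconds since the epoch
    size: int           # bytes


def plan_expiry(groups, articles, free_now, goal, now):
    """Return the articles to remove, in the order they were chosen."""
    doomed, alive, free = [], list(articles), free_now

    def expire_before(name, cutoff):
        # Move every live article of this group older than cutoff to doomed.
        nonlocal alive, free
        hit = [a for a in alive if a.group == name and a.arrived < cutoff]
        alive = [a for a in alive if not (a.group == name and a.arrived < cutoff)]
        doomed.extend(hit)
        free += sum(a.size for a in hit)

    # Phase 1: junk groups lose everything older than one day.
    for g in groups:
        if g.kind == "junk":
            expire_before(g.name, now - DAY)
    if free >= goal:
        return doomed

    # Phase 2: good groups lose everything older than their "high" limit.
    for g in groups:
        if g.kind == "good":
            expire_before(g.name, now - g.high * DAY)
            if free >= goal:
                return doomed

    # Phase 3: every group (junk included) gives up "rate" days of its
    # oldest articles, but only if its oldest article is past the "low" limit.
    for g in groups:
        dates = [a.arrived for a in alive if a.group == g.name]
        floor = now - g.low * DAY
        if dates and min(dates) < floor:
            # Assumption: never cut below the "low" limit in one step.
            expire_before(g.name, min(min(dates) + g.rate * DAY, floor))
        if free >= goal:
            break
    return doomed
```

As in the outline, the last phase is a single sweep over the groups, so the
plan can come back short of the goal; a caller could simply run it again.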

Another person here has an idea based upon priorities and such, but it seemed
even harder to implement than my hare-brained idea :-)

Reading the doc for expire, it looks like I could add another field to the
middle field of the history line which contains the SIZE of the file (thus
saving me from having to scan the entire directory structure to calculate
file sizes).  Would there be any massive problems with doing this?

(Note that my intention is to run the above algorithm from top to bottom,
THEN actually remove the files, thus allowing me to traverse the tree only
once)

Also, I take it that a '-' in the second subfield means that the
article has been expired?

Anybody crazy enough to help me on this insane project?

Would the "powers-that-be" be interested in including my version of
"param_expire" (or whatever in the world it turns out to be called)
in future Cnews's (as an optional method for expiration)?  (Assuming
that I get it finished this century...)

Is this even a good idea, in other folks' minds?  (PLEASE, if you reply
to this question, notice and address the issue of <why it is we run expire 
in the first place>  (see paragraph 3 above)).

"Raving wildly, Rusty hits the "s" key in rn"  :-)

Rusty


{ames!ncar!noao!asuvax,mcdphx}!anasaz!rusty      anasaz!rusty\ 
73 de Rusty Carruth, N7IKQ  (602) 870-3330   anasaz.UUCP!rusty>@asuvax
P.O. Box 27001, Tempe, AZ 85285             rusty%anasaz.UUCP/   \.eas.asu.edu

adeboer@gjetor.geac.COM (Anthony DeBoer) (06/21/91)

In article <4313@anasaz.UUCP> rusty@anasaz.UUCP (Rusty Carruth) writes:
> [ proposed spec for freespace-based expire program ]
>
>(Note that my intention is to run the above algorithm from top to bottom,
>THEN actually remove the files, thus allowing me to traverse the tree only
>once)

Actually, C news expire never traverses the /usr/spool/news tree.  It reads
through the history file, decides what to do with each line, and reaches into
the spool directories only to unlink() (or archive) articles.  It will
normally also (i.e., unless you use the -r option) rewrite the history file to
reflect the deletions.

If I was sitting down to write your program, I'd use expire to do the dirty
work (it's already written, it's fast, and it works), feeding more-or-less
severe explist files to it and then checking freespace to see if the next pass
should be taken or not.  You could either write a series of explist files
manually, calling them explist.1, explist.2, and so on, each reflecting one
pass of your algorithm, or write a program (which could be an awk script) to
generate the n-th version from a master file containing additional parameters.
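
A rough Python sketch of the "generate the n-th version from a master
file" idea, in place of the awk script.  Everything here is an assumption
made for illustration: the explist_for_level name, the master table of
(pattern, base days, minimum days), the rule of halving retention at each
level, and the four-field output layout (group pattern, an "x" flag,
retention in days, "-" for no archiving), which should be checked against
the expire documentation before use.

```python
def explist_for_level(master, level):
    """Build explist text for one severity level.

    master is a list of (pattern, base_days, min_days) tuples, most
    specific pattern first.  Level 0 uses the base retention; each higher
    level halves it, never dropping below min_days.
    """
    lines = []
    for pattern, base_days, min_days in master:
        days = max(min_days, base_days >> level)
        # Assumed field layout: pattern, flag, retention days, archive field.
        lines.append(f"{pattern}\tx\t{days}\t-")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    master = [("junk,control", 2, 1), ("all", 14, 2)]
    for n in range(3):
        print(f"# explist.{n}")
        print(explist_for_level(master, n), end="")
```

Writing each result to explist.0, explist.1, and so on gives the series of
files described above without maintaining them by hand.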

A shell script based on the existing "doexpire" script could handle taking the
appropriate number of passes every time cron invokes it (and you could have it
start up periodically during the day and check if you're really tight on space
and do a pass or so, and feed it a different parameter on the cron command
line at night to do a proper cleanup).  If you want to get fancy, have it save
the "severity level" it ran at the last time, and use this when you start off.
If there's a lot of freespace, back off a level or two, then start with an
expire pass at the appropriate level.
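
The pass-taking logic might look something like this in Python.  It is a
sketch, not a replacement for doexpire: run_passes, free_space, run_pass
and the slack factor are hypothetical names, and the two callables are
passed in so that the sketch makes no claims about how expire itself is
invoked.

```python
def run_passes(free_space, run_pass, goal, start_level, max_level, slack=2.0):
    """Run expire passes of increasing severity until there is enough room.

    free_space() returns the bytes currently free.
    run_pass(level) performs one expire pass at that severity level.
    Returns the level of the last pass, to be saved for the next run.
    """
    level = start_level
    # Lots of free space (here: slack times the goal)?  Back off a level.
    if free_space() >= slack * goal:
        level = max(0, level - 1)
    while True:
        run_pass(level)
        if free_space() >= goal or level >= max_level:
            return level
        level += 1


if __name__ == "__main__":
    # Simulated disk: each level frees a fixed number of bytes.
    state = {"free": 100}
    gains = {0: 10, 1: 50, 2: 500}

    def fake_pass(level):
        state["free"] += gains[level]

    print(run_passes(lambda: state["free"], fake_pass, 300, 1, 2))
```

In real use free_space could be built on something like
shutil.disk_usage(spool).free, and run_pass could start expire with the
explist for that level; the returned level is what would be written to a
small state file between cron runs.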

(BTW, if you want to do a run to delete only "junk" groups, you could feed
expire an explist that tells it that all groups except the ones listed stick
around for 999 days, for example.)

Just as a disclaimer, even though my gut feeling was originally that I needed
something like this on my system, it's turned out that a pretty-near-vanilla C
News is working quite happily here, so I've never sat down to actually
implement such a thing.  The only real problem I've had with news is that
newsrun, spacefor, and relaynews were conspiring to use up all my inodes,
which I've patched, and Henry tells me they're looking at doing a proper fix
in the next major release.
-- 
Anthony DeBoer  NAUI#Z8800                             adeboer@gjetor.geac.com
Geac Canada Ltd., Toronto                             uunet!geac!gjetor!adeboer

flee@cs.psu.edu (Felix Lee) (06/22/91)

>[...], feeding more-or-less severe explist files to it and then
>checking freespace to see if the next pass should be taken or not.

This is called "progressive expire".  Several people have implemented
various forms of this, posted to alt.sources and such.

I've been sporadically working on implementing pure space-based
expiry.  You set a target amount of space free or space used in
whatever newsgroups you like, and in a single pass enough articles are
removed to satisfy the constraints.

The advantage of this is that expiry can be a continuous process.  As
you receive news you can remove a corresponding amount of old news so
your disk space usage remains at a steady state.  This should let you
run smoothly with tight space, especially with some cooperation from
"spacefor".

The disadvantage of this is that it's probably going to be a little
more expensive than simple date-based expiry.
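
To make the steady-state idea concrete, here is a small Python sketch.  It
is not the implementation described above: the SpaceBudget class and its
single byte budget are hypothetical, it knows nothing about per-newsgroup
targets, and it only reports which paths should be removed.

```python
import heapq


class SpaceBudget:
    """Keep total article bytes under a budget by evicting the oldest."""

    def __init__(self, budget):
        self.budget = budget    # maximum bytes the spool may hold
        self.used = 0           # bytes currently accounted for
        self._heap = []         # (arrived, path, size), oldest on top

    def receive(self, arrived, path, size):
        """Record a new article; return the paths that should be removed."""
        heapq.heappush(self._heap, (arrived, path, size))
        self.used += size
        evicted = []
        # Drop the oldest articles until we fit, keeping at least one.
        while self.used > self.budget and len(self._heap) > 1:
            _, old_path, old_size = heapq.heappop(self._heap)
            self.used -= old_size
            evicted.append(old_path)
        return evicted


if __name__ == "__main__":
    spool = SpaceBudget(1000)
    for when, path in enumerate(["a/1", "a/2", "b/1"], start=1):
        print(path, "->", spool.receive(when, path, 400))
```

Each arrival costs a heap operation and possibly a few removals, which
gives some feel for why this is a little more expensive than a simple date
comparison.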
--
Felix Lee	flee@cs.psu.edu

henry@zoo.toronto.edu (Henry Spencer) (06/25/91)

In article <4313@anasaz.UUCP> rusty@anasaz.UUCP (Rusty Carruth) writes:
>When I run expire, I am usually wanting to free up some given amount
>of disk space (for incoming news to land in, for example).  I rarely
>am thinking of limiting the length of time articles hang around in
>a newsgroup ...

We actually have two different user communities here, with the distinction
a function of how tight your disk space is.  Those of us with reasonably
ample resources (for the moment!) do tend to think about hang-around time.

>  lump newsgroups into one of 3 categories: junk, good, archive (archive
>      is not currently being done)
>  set "high" and "low" limits for each newsgroup (see below for their use)
>  set "rate" values for each newsgroup (also see below)
> ...
>Another person here has an idea based upon priorities and such, but it seemed
>even harder to implement than my hare-brained idea :-)

The main reason why we didn't attempt something like a space-based expire
in C News was the problem of defining what the policy should be.  The more
I tried to write a description, the more complex it got, and the less
obvious it was that people could understand it and that it would meet
their needs.  What you've defined is a plausible approach if your groups
can be split into those three categories easily.

>Reading the doc for expire, it looks like I could add another field to the
>middle field of the history line which contains the SIZE of the file (thus
>saving me from having to scan the entire directory structure to calculate
>file sizes).  Would there be any massive problems with doing this?

I thought very seriously about doing exactly this, in fact, and the only
reason it wasn't done was that in the end I didn't have a use for it.  I
think nothing should mind; nothing in C News depends on having exactly two
subfields, although it's possible that other stuff (NNTP?) does.  Putting
a size in as a third subfield is a reasonable idea, although please do it
in bytes -- the concept of "block" is not portable.

>Also, I take it that a '-' in the second subfield means that the
>article has been expired?

No, no.  Please read the documentation!  It means that no explicit expiry
date was supplied.
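
A minimal Python sketch of reading the middle field with both points in
mind.  The parse_middle name and the Middle record are hypothetical, the
"~" subfield separator is an assumption to be checked against the history
file documentation, and the third subfield is the proposed size in bytes,
which no existing history file will contain.

```python
from typing import NamedTuple, Optional


class Middle(NamedTuple):
    arrived: str             # arrival date, kept as written
    expires: Optional[str]   # None when the subfield is "-"
    size: Optional[int]      # bytes; None when there is no third subfield


def parse_middle(field, sep="~"):
    """Split the middle field of a history line into its subfields."""
    parts = field.split(sep)
    arrived = parts[0]
    # "-" means no explicit expiry date was supplied, not "already expired".
    expires = parts[1] if len(parts) > 1 and parts[1] != "-" else None
    # Proposed extension: an optional third subfield holding the size in bytes.
    size = int(parts[2]) if len(parts) > 2 else None
    return Middle(arrived, expires, size)


if __name__ == "__main__":
    print(parse_middle("677808000~-"))
    print(parse_middle("677808000~-~2048"))
```

Code written this way keeps working on lines with only two subfields, so
old and new history lines can coexist.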

>Would the "powers-that-be" be interested in including my version of
>"param_expire" (or whatever in the world it turns out to be called)
>in future Cnews's (as an optional method for expiration)? ...

It's not out of the question, but I'd like to see more attention to a
sophisticated policy mechanism.  As you've specified it so far, it could
be done without too much trouble using iterative running of the existing
expire with tighter (possibly mechanically generated) explists.  Not as
quick as a single pass, but probably acceptably fast for most sites, and
it would be much simpler to set up.
-- 
"We're thinking about upgrading from    | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."              |  henry@zoo.toronto.edu  utzoo!henry