[news.software.b] dexpire

dexpire@ftp.ee.lbl.gov (Craig Leres) (01/12/91)

I finally got fed up with having to babysit my spool partition and wrote
a dynamic expire. Interested parties are invited to participate in an
alpha test. Following successful completion, I plan to post dexpire to
alt.sources and also make it available via anonymous ftp.

I think my code is pretty solid but it's not completely inconceivable
that it could trash your spool partition. This means that you shouldn't
ask to be in on the alpha unless you're pretty sure you can deal with
any problems that develop. We run cnews so I need some bnews people to
help me figure out the details of using dexpire with bnews.

There are two obvious quantities this program could consider: time and
disk space. That is, decide which articles to delete based on their
relative ages or based on the relative disk consumption of their
newsgroups.

I chose to base dexpire's decisions on time since it seems more
intuitive to me. After all, if a bunch of huge articles show up, it
makes more sense to me that the length of time all articles are kept
goes down a bit instead of massively gouging one or two newsgroups.
Also, if you base your decisions on disk space, you either have to
stat() all the articles in the spool partition or do ugly things like
try to cache disk usage information.
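
As a rough illustration of the time-based approach (a sketch only, not
dexpire's actual code, which is written in C): the manual entry below
says that the ratio of class priorities determines how long articles
are kept, that priority zero means never expire, and that a class may
carry an optional minimum number of days. The function name, the class
names and the numbers here are hypothetical.

```python
def keep_days(standard_days, priority, top_priority, min_days=0):
    """Days to keep articles in a class, scaled by its relative priority.

    standard_days is the retention of the highest priority class.
    Returns None for priority 0, meaning "never expire".
    """
    if priority == 0:
        return None
    scaled = standard_days * priority / top_priority
    # The optional minimum keeps low priority classes from dropping to ~0.
    return max(scaled, min_days)


# Hypothetical classes: name -> (priority, minimum days)
classes = {"high": (100, 0), "half": (50, 0), "low": (10, 2), "keep": (0, 0)}

# When a burst of large articles arrives, only standard_days shrinks
# (say from 14 to 12); every class gives up time in proportion.
for standard_days in (14, 12):
    for name, (priority, min_days) in classes.items():
        print(standard_days, name, keep_days(standard_days, priority, 100, min_days))
```

With a standard of 14 days, the class with half the top priority is
kept for 7 days, and the low priority class is held at its 2 day
minimum rather than the 1.4 days its ratio alone would give.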

Dexpire. It's fast. It's rational. Manual entry and README are
appended.

		Craig
------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	dexpire.lpr
#	README
# This archive created: Fri Jan 11 23:08:50 1991
export PATH; PATH=/bin:$PATH
echo shar: extracting "'dexpire.lpr'" '(6218 characters)'
if test -f 'dexpire.lpr'
then
	echo shar: will not over-write existing file "'dexpire.lpr'"
else
sed 's/^X//' << \SHAR_EOF > 'dexpire.lpr'
X
XNAME
X     dexpire - dynamic expire for netnews
X
XSYNOPSIS
X     dexpire [ -dnv ] [ -a active ] [ -c dexplist ]
X         [ -s spool_dir ] [ -f Kbytes ]
X
XDESCRIPTION
X     Dexpire deletes old news articles. Collections of newsgroups
X     (called "classes") are assigned priorities. These priorities
X     are used to dynamically determine how long articles in  each
X     class  may  be kept so that a specified amount of disk space
X     is made available.
X
X     Unlike expire(8), dexpire does not rebuild the history file.
X     The  administrator  must  arrange  to do this some other way
X     (see below).
X
XOPTIONS
X     The -d flag causes internal data structures  to  be  dumped.
X     This can be useful when debugging a new control file.
X
X     The -n flag prevents  dexpire  from  actually  removing  any
X     articles. It can be informative to use this flag in conjunc-
X     tion with -v to see what dexpire would do if turned loose.
X
X     The -v flag causes verbose information to  be  displayed  to
X     stdout. This flag may be used more than once to get more and
X     more detailed information.
X
X     The -a flag specifies an alternate active file.
X
X     The -c flag specifies an alternate control file.
X
X     The -s flag specifies an alternate spool directory.
X
X     The -f flag is used to specify the desired  number  of  free
X     Kbytes  upon  exit.   The default is 4000 Kbytes. A trailing
X     'M' specifies Mbytes, e.g.  "-f 4M" means 4096 Kbytes.
X
XFILE FORMATS
X     The control file, dexplist, configures  dexpire  and  has  a
X     format  similar  to the explist file used by expire(8). Com-
X     ment lines begin with '#'. The first field specifies one  or
X     more  newsgroups and/or newsgroup trees (multiples should be
X     separated by commas). The special keyword  all  matches  any
X     group  and  usually  appears as the last rule in the control
X     file.
X
X     The second field is a single letter that specifies that  the
X     line  applies  only to moderated (m), unmoderated (u), or to
X     either kind of newsgroup (x).
X
X     The third field specifies the priority of  the  group;  high
X     priority  groups  are  kept longer than low priority groups.
X     The priorities are relative to each other which  means  that
X     the  ratio  of  priorities  determines how long articles are
X     kept.  For example if one group has  half  the  priority  of
X     another  group,  articles  in it are only kept half as long.
X     Groups with priority zero (0) are never expired.
X
X     The optional fourth field specifies a minimum number of days
X     to  keep articles. It's most commonly used with low priority
X     groups. However, overzealous use of this  feature  leads  to
X     the kind of problems dexpire was written to avoid.
X
X     The first line of the control file that applies to a  given
X     newsgroup is used to determine its class.
X
XIMPLEMENTATION
X     Here's what dexpire does when it runs.  First, it checks  to
X     see  how  much space is free. (If there is nothing to do, it
X     exits.) Next, it reads  the  control  rules  from  dexplist.
X     Using  these rules, it reads the active file and places each
X     newsgroup into its appropriate class. It also keeps track of
X     the first and last articles in each newsgroup. Next, the age
X     of the oldest article in each newsgroup and class is  deter-
X     mined.   The  class ages are used to calculate the number of
X     days to keep the highest (or  "standard")  class.  (This  is
X     also  known as the "standard" number of days.) No article is
X     kept for more than the standard number of days; articles  in
X     lower  priority  classes  are  kept for less time.  Finally,
X     passes are made over the classes, starting with  the  lowest
X     priority and working up. If a pass completes without freeing
X     enough disk space, the number of standard days is lowered by
X     a  small  amount  and  a new pass is started. The process is
X     repeated as many times as necessary  to  free  the  required
X     amount of disk space.
X
X     Note that dexpire keeps track of how much space it has freed
X     by  adding  up  the  block counts from each article deleted.
X     This prevents it from getting confused by other activity  in
X     the spool partition.
X
XHISTORY REBUILD
X     Currently there's no good way to rebuild the  history  file.
X     We  run  regular  expire(8)  once a week with a control file
X     that specifies to keep history entries for at least 30  days
X     and  to  unconditionally  delete articles that are more than
X     120 days old. But since expire(8) insists  on  reading  each
X     and  every  article, this is more expensive than it needs to
X     be (for our purposes, anyway). A better solution would be to
X     write  a  utility  to  rebuild  the  history file (or add an
X     option to expire(8)).
X
X     It's usually a good idea  to  run  updatemin  afterwards  to
X     update  the "minimum" fields in the active file.  Otherwise,
X     later dexpire runs may waste cpu time and  some  newsreaders
X     (e.g. rn) may get confused.
X
XFILES
X     /usr/new/lib/news/active - newsgroup article information
X     /usr/new/lib/news/dexplist - control file
X     /usr/spool/news - news spool partition
X
XSEE ALSO
X     expire(8)
X
XAUTHOR
X     Craig Leres - leres@ee.lbl.gov
X
XBUGS
X     A large number of batched articles (or other  files  in  the
X     spool  partition)  can  cause  dexpire to free up more space
X     than is necessary. Perhaps it should be smart enough to  see
X     how much space is used in the in.coming directory.
X
X     Since dexpire uses stat(2) to determine the age of  articles
X     (instead  of  reading  headers)  it's possible to fool it by
X     modifying articles in the spool partition.   However,  since
X     it's  interested  in  the oldest article in each class, this
X     shouldn't cause problems unless the oldest article in  every
X     group of a particular class has the wrong timestamp.
X
X     Explicit Expires headers are completely ignored.
X
X     There should be an option to  consider  inodes  rather  than
X     Kbytes since inodes are sometimes the critical resource.
X
X     Currently, dexpire only knows how to use  statfs(2)  to  get
X     filesystem statistics.
X
SHAR_EOF
if test 6218 -ne "`wc -c < 'dexpire.lpr'`"
then
	echo shar: error transmitting "'dexpire.lpr'" '(should have been 6218 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'README'" '(4084 characters)'
if test -f 'README'
then
	echo shar: will not over-write existing file "'README'"
else
sed 's/^X//' << \SHAR_EOF > 'README'
X@(#) $Header: article,v 1.1 91/01/11 23:52:42 leres Exp $ (LBL)
X
X    README for dexpire
X
XHere is the dexpire distribution. Hopefully, you should find the
Xfollowing files:
X
X	Makefile	- compilation rules
X	README		- this file
X	dexpire.8	- manual entry
X	dexpire.c	- main program
X	disk.c		- disk usage routines
X	file.c		- active and dexplist parsers
X	util.c		- random utility routines
X	version.c	- release version number and date
X	dexpire.h	- configuration
X	disk.h		- forward declarations
X	file.h		- forward declarations
X	util.h		- forward declarations
X	patchlevel.h	- patchlevel (just in case there are bugs)
X	dexplist	- sample control file
X	dodexpire	- sample dexpire script
X	blocktest.c	- block size test program
X
X    Installation Instructions
X
XWe are a cnews site and so these instructions are biased towards
Xcnews.  This package is known to compile and nominally run under SunOS
X3.5 and SunOS 4.1 on Sun 3's and Sun 4's (and under Ultrix, thanks to
XStan Barber). Dexpire uses statfs() to determine disk usage. If you
Xdon't have statfs(), you'll have to write your own version of disk.c.
XSince disk_usage() is only invoked once, it would be acceptable to
Xfork() and parse the output of /bin/df. If you're trying to build on a
XSequent, you need to add "-lseq" to "LIBS" in the Makefile to link in
Xgetopt(3).
X
XFirst test to make sure that dexpire's assumptions about the filesystem
Xblock size are correct:
X
X    make blocktest
X
XThis program checks the units of the st_blocks field of the stat
Xstructure. Dexpire assumes that the units are in 512 byte blocks
X(perhaps rounded up to the next even block because the filesystem
Xfragment size is 1024 bytes). If blocktest doesn't successfully build
Xand report "success," running dexpire might be dangerous. (I'm not
Xpositive there are any Unix systems this test will fail on but it helps
Xme sleep better).
X
XNext, configure dexpire.h. Although the location of the spool
Xdirectory, active and dexplist files can be changed with flags, it's
Xusually more convenient to have the builtins correct. It shouldn't be
Xnecessary to change the DTIME or TOGO limits. MAX_FREE is a safety and
Xmight need to be increased if you have a really, really small spool
Xpartition.
X
XNow configure the Makefile. If you don't have gcc, comment out the CC
Xline. It might also be necessary to change the target in the install
Xrule.
X
XNow configure a dexplist file. If you currently use the cnews expire,
Xit's pretty easy to convert an explist file to the dexplist format;
Xbasically, edit it down so that you only have the first 3 fields.
XOtherwise, make a copy of the sample dexplist and start hacking.
X
XNow compile and test. It's strongly recommended that you use -n until
Xyou're sure everything works ok. Try something like:
X
X    dexpire -vn -f 10000
X
XPut the output on a file so you can examine it at your leisure. Make
Xsure the reported disk statistics match what /bin/df says. The first
Xpass should always find at least one article that could be deleted. The
Xend of the run should report a reasonable number of articles and the
Xcorrect number of bytes to be deleted.
X
XThe -d flag can be useful in debugging a new dexplist file. It's a good
Xidea to check the output of:
X
X    dexpire -vdn -f 10000
X
Xto make sure newsgroups end up in the classes you expect.
X
XIf your new dexpire policy differs from your old expire setup, it isn't
Xunusual for there to be hundreds of consecutive unproductive passes.
XBut since dexpire caches article timestamps, these extra passes only
Xuse up a little extra cpu time; and after the new policy catches up,
Xthings will settle down to 10 to 20 passes.
X
XOur news node is a Sun 3/50 with a CDC Wren V running SunOS 3.5. Our
Xspool partition is about 350 Mbytes. It usually takes about 12 minutes
Xto do a daily dexpire (including updatemin). Run time depends mostly
Xon the number of articles deleted.
X
XPlease send comments, suggestions, bug reports, etc to:
X
X    Craig Leres
X    leres@ee.lbl.gov (ucbvax!leres for uucp weenies)
X    Lawrence Berkeley Laboratory
X    One Cyclotron Road
X    Mail Stop 46A-1123
X    Berkeley, California 94720
SHAR_EOF
if test 4084 -ne "`wc -c < 'README'`"
then
	echo shar: error transmitting "'README'" '(should have been 4084 characters)'
fi
fi # end of overwriting check
#	End of shell archive
exit 0
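
The pass loop described under IMPLEMENTATION in the manual entry above
can be sketched roughly as follows. This is an illustration in Python
under stated assumptions, not the dexpire source: the starting number
of standard days is simply passed in (the manual entry does not give
the formula used to derive it from the class ages), the step size is
arbitrary, the loop stops as soon as enough space has been accounted
for, and per-class minimum days are left out. All names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    name: str
    age_days: float  # age taken from the file timestamp
    kbytes: int      # disk space the article occupies


def plan_passes(classes, need_kbytes, start_days, step=0.5):
    """classes is a list of (priority, [Article, ...]) pairs.

    Returns (standard_days, freed_kbytes, articles_to_delete).
    """
    if need_kbytes <= 0:
        return start_days, 0, []  # enough space already free
    top = max(priority for priority, _ in classes)
    # Lowest priority first; priority zero classes are never expired.
    order = sorted((c for c in classes if c[0] > 0), key=lambda c: c[0])
    doomed, freed, standard = [], 0, start_days
    while standard > 0:
        for priority, articles in order:
            limit = standard * priority / top  # days this class is kept
            for art in articles:
                if art not in doomed and art.age_days > limit:
                    doomed.append(art)
                    # Count space from the deleted articles themselves,
                    # so other spool activity cannot skew the total.
                    freed += art.kbytes
                    if freed >= need_kbytes:
                        return standard, freed, doomed
        standard -= step  # unproductive pass: tighten and try again
    return standard, freed, doomed


classes = [
    (100, [Article("comp/1", 10, 100), Article("comp/2", 5, 100)]),
    (50, [Article("talk/1", 6, 100), Article("talk/2", 2, 100)]),
]
standard, freed, doomed = plan_passes(classes, need_kbytes=200, start_days=10)
print(standard, freed, [a.name for a in doomed])
```

In this made-up example the first pass (10 standard days) only catches
talk/1, so the standard is lowered to 9.5 days and the second pass
picks up comp/1, which reaches the 200 Kbyte target.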