[news.admin] Disk space based news expiring program for testing

brad@looking.UUCP (Brad Templeton) (03/31/88)

: Here is the news deleting program I promised, designed to keep your spool
: directories at a fixed size.  It has been working here for a few weeks
: with no hitches and no running out of space.  Try it here in Ontario,
: and send me any bugs you find or mods you make.  Thanks.
:
: This is a shar archive.  Extract with sh, not csh.
: The rest of this file will extract:
: expscr docs sizexp.c sizerm.c gofast
echo Extracting expscr
sed 's/^X//' > expscr << 'E-O-F'
Xcd /u/news
X/usr/lib/news/sizexp s=1800 /usr/lib/news/gofast | sort -n -r | /usr/lib/news/sizerm +v b=4500 >>/tmp/expire
E-O-F
echo Extracting docs
sed 's/^X//' > docs << 'E-O-F'
X
XNotes on the space-based spool directory expiring program:
X
XThis program keeps the most recent N blocks or the most recent N
Xinodes in a directory or list of directories.  It releases all other
Xfiles.  This is the way expire should work: it holds the news spools
Xat a constant size, which is what you want.  A constant age isn't of
Xmuch value.
X
XIf you're used to coming in to see
X	no space on dev 1/72
X(or whatever) on your console because news got stopped up for a couple
Xof days, then this program is for you.  No matter what the news flow
Xpatterns, your news disk usage (or inode usage) will stay fixed if you
Xrun this program often enough.
X
XThis program works only based on the mtime of the various news article
Xfiles.  No attempt is made to read the files and examine Date-Received
Xheaders.  No attempt is made to look at "Expires" header items.  If you're
Xthis space conscious, you don't want people to be able to make some articles
Xlast longer.  This feature could be added, at a fair penalty of speed.
XNote that if an article is modified by somebody (a rare event) it is as
Xthough it had freshly arrived.
X
XFor articles that expire early, the regular news expire will get rid of them.
X
XThis program simply releases the files.  It doesn't alter news history or
XDBM files at all.  The regular news expire seems able to deal with files
Xbeing deleted, and manages the history file just fine.  Just run it from
Xtime to time, or as often as you used to.
X
XThis system has the ability to expire some groups faster than others --
Xyou can be quite selective.  You can also arrange so that big files expire
Xfaster than small ones, which lets you keep a lot more articles around at
Xthe cost of losing biggies a bit sooner.
X
XIn the unix style, this system is made up from two programs that work as
Xsoftware tools.
X
X-----------
X
XThe first program, "sizexp", takes a list of directories on the standard
Xinput, or from a file named as its argument, along with special time
Xmodifiers for each.  It generates a list of all the news articles in the
Xdirectories, along with their (modified) modification times and some file
Xsystem information to keep track of cross-postings.
X
XThis list is then passed to "sort" to sort by the arrival date, so that
Xthe newest articles appear first, and the oldest last.
X
XThe sorted list is then passed to "sizerm" which counts through the
Xlist until it has got N blocks, where N is your desired news size.  It
Xthen releases everything that follows in the list, except for links to
Xstuff that was kept.  Neat and fast.
X
X-----------
XHere's a sample expire script with verbose output, 4.5 megs of news kept,
Xand large articles expired just slightly faster than small ones:
X
Xcd /u/news
X/usr/lib/news/sizexp s=1800 /usr/lib/news/gofast | sort -n -r | /usr/lib/news/sizerm +v b=4500 >>/tmp/expire
X-----------
X
XSIZEXP:
X
X	SIZEXP has one option.
X
X		s=size_modifier
X
X	If you use this option, each file will have "size_modifier" seconds
X	added to its age for every block in the file.  For example, 3600
X	will add one hour for each block to the age of every file.  This
X	means that a 25 block file expires 1 day earlier than a typical
X	1 block file.   Somewhere around 2000 is probably a good number,
X	depending on how long you keep news.
X
XThe standard input to SIZEXP is a list of directories to scan for news
Xarticles.  All directories will be scanned recursively as deep as they
Xgo.  No directory will be visited more than once, however.  This means that
Xyou can list a subdirectory early in the list, and follow later with the
Xparent directory with no problems. 
X
XAll directories must reside in the same filesystem.
X
XThe list should contain directories, one per line.  Optionally, each directory
Xmay be followed by whitespace and an integer or real number specifying the
Xnumber of days to add to the age of files in that directory.  For example,
Xif you want talk.bizarre to go 6 days earlier, you would say:
X
X	talk/bizarre 6
X
XOn the other hand, the number can also be negative.  If you want
Xrec.humor.funny to last two extra days, you can say:
X
X	rec/humor/funny -2
X
XIf you want a group to stay around forever, you could say:
X
X	comp/sources/unix -10000
X
XAnd likewise, to kill a group quickly, you could say:
X	
X	talk/politics/misc 200
X
XNote that since directories are only done once, what comes first takes
Xpriority over directories listed later.  A typical file might consist
Xof all the special directories, followed by the single entry of ".",
Xindicating the current directory.  (It's wise to cd to the news directory
Xbefore starting, as that's more efficient.)
X
XOnly files whose names consist entirely of digits are listed in the output.
X
XSIZERM:
X	This is the program that actually deletes the files.  It takes
X	the sorted output of SIZEXP as input.  It outputs a few statistics,
X	or more, if asked.
X
X Options:
X
X	b=maxblocks
X		Sets the maximum size of the spool directories
X	i=maxinodes
X		Sets the maximum number of inodes in the spool directories
X	+d
X		Asks to display files to remove, but not actually remove them.
X	+v
X		List all files being deleted
X
X	Sizerm is simple.  It scans down the list, counting filespace until
X	the maximum allocation limit or inode limit is reached.  It releases
X	all other files in the list past this point.  If the file that
X	takes the quota over the space limit is large, slightly more than
X	the limit will be kept.  This is not serious, since news articles
X	are limited to around 50K.
X
XTo set things up:
X
X	Go into the sources and check any defines you may wish to change.
X	The main one is BLOCKSIZE.  This should be your file system's true
X	block size.  Some systems have a true blocksize of 1K or 2K, but
X	have programs like "du" and "ls" report sizes in 512 byte blocks.
X
X	Also set the max inodes to store.  The 15000 that you get in a
X	64K address space should be enough for anybody, but if you have more
X	address space, go for it.
X
X	Compile the two programs with 
X		cc sizexp.c -o sizexp
X		cc sizerm.c -o sizerm
X
X	Install these somewhere appropriate like /usr/lib/news
X
X	Set up a shell script like the one above, and put it in the cron.
X	Run it at least once a day.  If you're tight, run it several times
X	a day.  Your script should cd to the news directory for efficiency.
X
X	In the cron, su news before executing the script, for security.
X
X	Create a directory list.  It can be as simple as "." or as complex
X	as you like.  Modify it as needed.  Use this list as input to
X	the pipeline.
X
X	Keep running the old expire, although perhaps not as frequently.
X
X	Examine the output for a few days until you're sure it's all working.
X
XA typical "gofast" file is found in this distribution.
E-O-F
echo Extracting sizexp.c
sed 's/^X//' > sizexp.c << 'E-O-F'
X
X/*
X * This program scans a list of directories for news files.  It writes out
X * (to stdout) a list of article files found at any level in the listed
X * directories, with times (modified), link counts, file sizes in blocks and
X * inode numbers.  This file is to be sorted and processed with sizerm.
X * File times can be modified with a special number so that certain directories
X * expire more quickly than others.  In addition, file times can be modified
X * according to their size, using the s= option.   For example, s=3600 adds
X * one hour for every block to the age of every file.
X *
X * The list of directories can be given as an argument, or if no argument
X * is given, read from stdin.  Directory names can be followed by a real
X * number, indicating the number of days to add to the age of every file
X * in the listed directory.
X *
X * No directory is done twice, so one may list a subdirectory (probably with
X * a modifier to age it more quickly) and later list the parent directory,
X * probably without the modifier.
X */
X
X/*
X * To compile for your system, set the two #defines below as needed.
X */
X
X/* size of a true filesystem block */
X#define BLOCKSIZE 1024
X/* maximum number of news directories */
X#define MAX_DIRS 2000
X
X#include <stdio.h>
X#include <sys/types.h>
X#include <sys/stat.h>
X#include <sys/ndir.h>
X
X/* number of seconds in a day */
X#define DAYLEN (60*60*24)
X
Xino_t known_dirs[MAX_DIRS];
Xint kd_count = 0;	/* number of entries used in known_dirs */
X
Xint blockmod = 0;
Xextern long atol();
Xextern double atof();
X
Xmain( argc, argv )
Xint argc;	/* arg count */
Xchar **argv;	/* arg vector */
X{
X	int argnum;
X	char *strchr();
X	FILE *dirfile;
X	char *dirfname;
X	int blocks;
X
X	dirfname = (char *)0;
X
X	for( argnum = 1; argnum < argc; argnum++ ) {
X		char *argline;
X		char *argstr;		/* argument string */
X		int argval;
X		int isplus;		/* boolean tells +arg vs -arg */
X		argline = argv[argnum];
X
X		if (argstr = strchr(argline, '=')) {
X			argstr++;
X			argval = atoi(argstr);
X			switch( argline[0] ) {
X				case 's':
X					/* handle sizemod option */
X					blockmod = argval;
X					break;
X				default:
X					error( "Bad Option %s\n", argline );
X				}
X			}
X		else if( (isplus = argline[0] == '+') || argline[0] == '-' ) {
X			switch( argline[1] ) {
X				default:
X					error( "Bad Option %s\n", argline );
X				}
X			}
X		else {
X			/* code for untagged option */
X			dirfname = argline;
X			}
X		}
X	if( dirfname ) {
X		if( dirfile = fopen( dirfname, "r" ) )
X			scan_news( dirfile );
X	 	else
X			error( "Could not open %s\n", dirfname );
X		}
X	 else
X		scan_news( stdin );
X}
X
Xchar delims[] = " \t,:\n";
X
X		/* this could be done with FIND and LS, but it would not
X		   be very fast, would it? */
X
Xscan_news( dirlist )
XFILE *dirlist;
X{
X	char linbuf[256];
X	char *dirname;
X	long modifier;
X	char *modptr;		/* pointer to time modifier */
X	extern char *strtok();
X
X
X	while( fgets( linbuf, sizeof(linbuf), dirlist ) ) {
X		/* skip blank and comment lines */
X		if( linbuf[0] == '\n' || linbuf[0] == '#' )
X			continue;
X		dirname = strtok( linbuf, delims );
X		if( !dirname )
X			continue;		/* blank except for delims */
X
X		modptr = strtok( (char*)0, delims );
X		if( modptr )
X			modifier = (long)(atof( modptr ) * DAYLEN);
X		 else
X			modifier = 0L;
X		dodir( dirname, modifier, (ino_t)0 );
X		}
X
X}
X
Xdodir( dirname, modifier, dirinode )
Xchar *dirname;
Xlong modifier;
Xino_t dirinode;
X{
X	DIR  *dirfile;
X	struct direct *entry;
X	struct stat statbuf;
X	char fullpath[256];
X	int blocks;
X	int i;
X
X	if( dirinode == 0 ) {
X		if( stat( dirname, &statbuf ) != 0 ) {
X			printf( "Warning: Could not stat %s\n", dirname );
X			return;
X			}
X		dirinode = statbuf.st_ino;
X		}
X
X	/* check to see if we've visited this directory before */
X
X	for( i = 0; i < kd_count; i++ )
X		if( dirinode == known_dirs[i] )
X			return;		/* this directory done */
X
X	if( kd_count < MAX_DIRS )
X		known_dirs[kd_count++] = dirinode;
X	
X	dirfile = opendir( dirname );
X	if( !dirfile ) {
X		printf( "Warning: Could not open %s\n", dirname );
X		return;
X		}
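X
X	while( entry = readdir( dirfile ) ) {
X		sprintf( fullpath, "%s/%s", dirname, entry->d_name );
X		/* skip entries we cannot stat, so that stale data left
X		   over from the previous entry is never used */
X		if( stat( fullpath, &statbuf ) != 0 )
X			continue;
X
X		/* if it's a directory, go down it.  Else if a file with
X		   a numeric name, examine it */
X		if( (statbuf.st_mode & S_IFMT) == S_IFDIR ) {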
X			if( entry->d_name[0] != '.' )
X				dodir( fullpath, modifier, statbuf.st_ino );
X			continue;
X			}
X		 else if( ((statbuf.st_mode & S_IFMT) == S_IFREG ||
X				!(statbuf.st_mode & S_IFMT) ) &&
X		 		strspn(entry->d_name, "0123456789") ==
X				strlen(entry->d_name)) {
X
X			blocks = (statbuf.st_size + BLOCKSIZE - 1) / BLOCKSIZE;
X			/* write out the record for sorting; the size penalty
X			   is computed as a long so it cannot overflow an int */
X			printf( "%ld %d %d %ld %s\n",
X				(long)statbuf.st_mtime - modifier
X					- (long)blocks * blockmod,
X				(int)statbuf.st_nlink, blocks,
X				(long)statbuf.st_ino,
X				fullpath );
X			}
X		}
X	closedir( dirfile );
X}
X
Xerror( form, a, b, c, d, e, f, g, h )
Xchar *form;
X{
X	fprintf( stderr, form, a,b,c,d,e,f,g,h );
X	exit(1);
X}
X
E-O-F
echo Extracting sizerm.c
sed 's/^X//' > sizerm.c << 'E-O-F'
X
X#include <stdio.h>
X#include <sys/types.h>
X
Xextern long atol();	/* undeclared, atol() would be taken to return int */
X
X/*
X * To compile for your system, set the following variables and defines.
X */
X
X/*
X * max number of inodes to remember of cross-posted articles.  Make this the
X * number of inode numbers that will conveniently fit in memory
X */
X#define MAX_INODES 15000
X
X/* default values for b= and i= options.  */
Xlong max_blocks = 4000;		/* 4000 1K blocks default */
X
X/* generally, this number will put no limit on the number of inodes.  You
X   may want to change it to just under the number in your news filesystem */
Xino_t max_inodes = 65535;	/* no limit on inodes */
X
X
X#define MAXPATH 256
X#define FALSE 0
X#define TRUE 1
X
X/* array to keep inodes we know about */
X
Xino_t known_inodes[MAX_INODES];
Xint ki_count = 0;	/* count of known inodes (signed: duplicate() counts down past 0) */
X
X/* option flags */
Xint display = FALSE;
Xint verbose = FALSE;
X
Xmain( argc, argv )
Xint argc;	/* arg count */
Xchar **argv;	/* arg vector */
X{
X	int argnum;
X	char *strchr();
X
X	for( argnum = 1; argnum < argc; argnum++ ) {
X		char *argline;
X		char *argstr;		/* argument string */
X		long argval;
X		int isplus;		/* boolean tells +arg vs -arg */
X		argline = argv[argnum];
X
X		if (argstr = strchr(argline, '=')) {
X			argstr++;
X			argval = atol(argstr);
X			switch( argline[0] ) {
X				case 'b':
X					max_blocks = argval;
X					break;
X				case 'i':
X					max_inodes = (ino_t)argval;
X					break;
X				default:
X					error( "Bad Option %s\n", argline );
X				}
X			}
X		else if( (isplus = argline[0] == '+') || argline[0] == '-' ) {
X			switch( argline[1] ) {
X				case 'v': /* verbose */
X					verbose = isplus;
X					break;
X				case 'd': /* display */
X					display = isplus;
X					break;
X				default:
X					error( "Bad Option %s\n", argline );
X				}
X			}
X		else {
X			/* code for untagged option */
X			error( "Unknown option %s\n", argline );
X			}
X		}
X	/* body of program */
X	process();
X}
X
Xprocess()
X{
X	int link_count;
X	int bl_size;
X	long inode_l;
X	ino_t inode;
X	char namebuf[MAXPATH];
X	int i;
X	int deleting;
X	long totalarts, delarts;
X	long keptspace;
X	long thedate;
X	ino_t keptnodes;
X	extern char *ctime();
X
X	deleting = FALSE;
X	keptnodes = 0;
X	keptspace = 0;
X	totalarts = delarts = 0;
X
X
X	/* %255s keeps a long path from overrunning namebuf[MAXPATH] */
X	while( scanf( "%ld %d %d %ld %255s\n", &thedate, &link_count, &bl_size,
X				&inode_l, namebuf ) == 5 ) {
X		inode = (ino_t)inode_l;
X
X		totalarts++;
X
X		/* if the link count is > 1, check for duplicate */
X		if( link_count > 1 ) {
X			if( duplicate(inode) )
X				continue;
X			 else if( ki_count < MAX_INODES && !deleting )
X				known_inodes[ki_count++] = inode;
X			}
X		if( deleting ) {
X			delarts++;
X			if( display || verbose ) {
X				printf( "Delete: %s size %d Blocks\n", namebuf,
X							bl_size );
X				}
X			if( !display )
X				if( unlink(namebuf) )
X					printf("Could not delete %s\n",namebuf);
X			}
X		 else {
X			keptspace += bl_size;
X			keptnodes++;
X
X			if( keptspace >= max_blocks || keptnodes >= max_inodes){
X				printf("Deleting before %s\n",ctime(&thedate));
X				deleting = TRUE;
X				}
X			}
X		}
X	printf( "Kept %ld blocks, %ld inodes\n", (long)keptspace,
X				(long)keptnodes );
X	printf("Kept %ld articles, deleted %ld articles\n", totalarts-delarts,
X				delarts);
X			
X}
X
Xduplicate(inode)
Xino_t inode;
X{
X	int i;
X	for( i = ki_count-1; i >= 0; i-- )
X		if( known_inodes[i] == inode )
X			return TRUE;
X	return FALSE;
X}
X
X
Xerror( form, a, b, c, d, e, f, g, h )
Xchar *form;
X{
X	fprintf( stderr, form, a,b,c,d,e,f,g,h );
X	exit(1);
X}
X
E-O-F
echo Extracting gofast
sed 's/^X//' > gofast << 'E-O-F'
Xcan/ai 6
Xcomp/ai 6
Xcomp/arch 6
Xcomp/bugs/2bsd 6
Xcomp/bugs/4bsd 6
Xcomp/bugs/misc 6
Xcomp/bugs/sys5 6
Xcomp/cog-eng 6
Xcomp/databases 6
Xcomp/binaries/mac 6
Xcomp/dcom/modems 0
Xcomp/dcom 6
Xcomp/doc 6
Xcomp/emacs 6
Xcomp/graphics 6
Xcomp/lang/ada 6
Xcomp/lang/apl 6
Xcomp/lang/fortran 6
Xcomp/lang/lisp 6
Xcomp/lang/misc 6
Xcomp/lang/modula2 6
Xcomp/lang/prolog 6
Xcomp/lsi 6
Xcomp/mail/headers 6
Xcomp/mail/misc 6
Xcomp/mail/uucp 6
Xcomp/org 6
Xcomp/os/cpm 6
Xcomp/os/eunice 6
Xcomp/os/minix 6
Xcomp/os/misc 6
Xcomp/os/os9 6
Xcomp/periphs 6
Xcomp/protocols 6
Xcomp/sources/d 6
Xcomp/sources/wanted 6
Xcomp/std 6
Xcomp/sys/atari 0
Xcomp/sys/cbm 0
Xcomp/sys/ibm 0
Xcomp/sys/intel 0
Xcomp/sys 6
Xcomp/terminals 6
Xcomp/unix/questions 6
Xcomp/unix/wizards 6
Xcontrol 6
Xmisc/consumers 6
Xmisc/forsale 6
Xmisc/handicap 6
Xmisc/invest 6
Xmisc/legal 6
Xmisc/psi 6
Xmisc/taxes 6
Xmisc/wanted 6
Xmisc/headlines 6
Xnews/config 6
Xnews/lists 6
Xnews/newsites 6
Xnews/software 6
Xrec/arts/books 6
Xrec/arts/poems 6
Xrec/arts/wobegon 6
Xrec/bicycles 6
Xrec/birds 6
Xrec/boats 6
Xrec/food/drink 6
Xrec/food/recipes 6
Xrec/food/veg 6
Xrec/games/empire 6
Xrec/games/frp 6
Xrec/games/go 6
Xrec/games/hack 6
Xrec/games/pbm 6
Xrec/games/rogue 6
Xrec/games/trivia 6
Xrec/games/video 6
Xrec/gardens 6
Xrec/guns 6
Xrec/ham-radio 6
Xrec/mag/otherrealms 0
Xrec/mag 6
Xrec/misc 6
Xrec/music/folk 6
Xrec/music/gdead 6
Xrec/music/gaffa 6
Xrec/pets 6
Xrec/photo 6
Xrec/puzzles 6
Xrec/railroad 6
Xrec/skiing 6
Xrec/skydiving 6
Xrec/sport/basketball 6
Xrec/sport/football 6
Xrec/sport/hockey 6
Xrec/sport/misc 6
Xrec/video 6
Xrec/woodworking 6
Xsci/bio 6
Xsci/crypt 6
Xsci/electronics 6
Xsci/lang 6
Xsci/math/stat 6
Xsci/math/symbolic 6
Xsci/med 6
Xsci/research 6
Xsoc/misc 6
Xsoc/motss 6
Xsoc/net-people 6
Xsoc/roots 6
Xsoc/men 6
Xtalk 6
Xuw/ai/learning 6
Xuw/cgl 6
Xuw/cs488 6
Xuw/cs685 6
Xuw/cs686 6
Xuw/dsgroup/misc 6
Xuw/grad/cs/topics 6
Xuw/harmony 6
Xuw/x-windows 6
Xuw/icr 6
Xuw/logic 6
Xuw/lpaig 6
Xuw/pami 6
Xuw/shoshin 6
Xuw/tex 6
Xuw/vlsi 6
Xuw/vms 6
Xrec/humor/funny -5
Xrec/humor -1
Xnews/admin -1
Xjunk 6
X.
E-O-F
exit 0
-- 
Brad Templeton, Looking Glass Software Ltd. - Waterloo, Ontario 519/884-7473