[net.sources] History rebuild for 2.10.1 expire?

chuqui@nsc.UUCP (Chuq Von Rospach) (04/05/84)

Ok, here it is folks. This version of expire is designed to run under
2.10.1. It has all of the bug fixes posted to expire so far (at least,
those that I have seen), as well as enhancements to allow it to rebuild the
history file, run without it completely, and delete articles based on
posting date or posting account. 
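
For the curious, the expiry rule that the man page below describes for -e, -i, -I and -p can be sketched roughly as follows. This is a standalone illustration rather than code lifted from expire.c: the struct and function names are made up for the example, and DAYS is assumed to be the number of seconds in a day.

```c
#include <assert.h>
#include <time.h>

#define DAYS (60L * 60L * 24L)	/* assumed: seconds per day */

/* Hypothetical stand-in for the header fields the decision needs. */
struct art {
	time_t received;	/* when the article arrived locally */
	time_t posted;		/* when it was originally posted */
	time_t expires;		/* explicit expiration date, 0 if none */
};

/*
 * Return the time at which the article becomes expirable.
 * ignore:  0 = honor explicit dates, 1 = -i, 2 = -I.
 * usepost: nonzero means age from the posting date (-p).
 */
time_t
expiry_time(const struct art *a, long days, int ignore, int usepost)
{
	time_t dflt = (usepost ? a->posted : a->received) + days * DAYS;

	if (a->expires == 0 || ignore == 2)
		return dflt;		/* no explicit date, or -I */
	if (ignore == 1 && dflt < a->expires)
		return dflt;		/* -i: only if that is sooner */
	return a->expires;
}
```

An article is removed once the current time reaches the returned value. The -f option skips the date test altogether and matches on the From: line instead.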

One thing that is in this expire that hasn't been posted is that it will
now allow -e<days> as well as -e <days> and -v<level> as well as -v
<level> because I got tired of trying to remember which way the various
options wanted to be set up. If you want to integrate this into a version of
expire you are running, look at the 'v' and 'e' cases in the parameter
handling.
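
If you would rather lift the idea than the code, the attached-or-separate handling boils down to something like the sketch below. The helper name optvalue and its index-based interface are hypothetical (expire.c itself walks argv in place inside the switch), so treat it as an outline of the technique only.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of expire.c: fetch the numeric value of
 * the option at argv[*ip], accepting "-e20" as well as "-e 20".
 * Advances *ip when a separate value argument is consumed and returns
 * dflt when no value is supplied at all.
 */
long
optvalue(int argc, char **argv, int *ip, long dflt)
{
	const char *arg = argv[*ip];

	/* Attached form: digits follow the option letter directly. */
	if (arg[0] != '\0' && arg[1] != '\0' && isdigit((unsigned char)arg[2]))
		return atol(arg + 2);

	/* Separate form: the next argument exists and is not an option. */
	if (*ip + 1 < argc && argv[*ip + 1][0] != '-') {
		(*ip)++;
		return atol(argv[*ip]);
	}

	return dflt;	/* bare option, e.g. a lone -v */
}
```

The same routine serves both -e and -v, since the only difference between them is the default used when no value is given.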

The rest of this message is taken up by a shell archive containing two
files. Expire.c is the source, and expire.8 is the man page which has been
updated to resemble reality a little more closely. Comments, enhancements,
and encouragements welcome.

----- cut here and feed to sh (not csh!) ----

#! /bin/sh
# The rest of this file is a shell script which will extract:
# expire.8 expire.c
echo x - expire.8
cat >expire.8 <<'!!ChuquiCo!!Software!!'
.TH EXPIRE 8
.SH NAME
expire \- remove outdated news articles
.SH SYNOPSIS
.BR /usr/lib/news/expire
[
.BI \-n
.BR newsgroups
]
[
.BI \-i
]
[
.BI  \-I
]
[
.BI \-v
[
.BR level
]
]
[
.BI \-e
.BR days
] 
[ 
.BI \-a
] 
[
.BI \-h
]
.br
.BR /usr/lib/news/expire
.BI \-p
[
.BI \-e
.BR days
]
.br
.BR /usr/lib/news/expire
.BR \-r
.br
.BR /usr/lib/news/expire 
.BR \-f
.BI account
[
.BI \-e
.BR days
]
.SH DESCRIPTION
.PP
.I Expire
is normally started up by
.IR cron (8)
every night to remove all expired news.
If no newsgroups are specified, the default is to expire
.BR all .
.PP
Articles whose specified expiration date has already passed
are considered expirable.
The
.B \-a
option causes expire to archive articles in /usr/spool/oldnews.
Otherwise, the articles are unlinked.
.PP
The
.B \-v
option causes expire to be more verbose.
It can be given a verbosity level (default 1) as in
.B \-v3
for even more output.
This is useful if articles aren't being expired and you want to know why.
.PP
The
.B \-e
flag gives the number of days to use for a default expiration date.
If not given, an installation dependent default (often 2 weeks) is used.
.PP
The
.B \-i
and
.B \-I
flags
tell
.B expire
to ignore any expiration date explicitly given on articles.
This can be used when disk space is really tight.
The
.B \-I
flag will always ignore expiration dates,
while the
.B \-i
flag will only ignore the date if ignoring it would expire the article sooner.
.I WARNING:
If you have articles archived by giving them expiration dates far into the
future, these options might remove these files anyway.
.PP
The
.B \-h
flag will cause expire to ignore the history file if it exists. It will
instead search the entire spool directory, open each file in it and check
the header for expirations. This is a couple of orders of magnitude less
efficient than using a history file, but if your history file has been
garbled, or articles have dropped out of it for some reason, this will
track them down and remove them.
.PP
The 
.B \-r
flag will cause expire to rebuild the history file. It does this by
searching the entire spool directory and building a new history entry for
each article found. It is smart enough to put links in different topics
together based upon the header information. It will report any article for
which it cannot resolve all links. Rebuilding the history file also rebuilds
the associated DBM files. The rebuilt history covers only articles still in
the spool, so the record of messages you have already seen is lost and
duplicate articles can become a problem after rebuilding.
.PP
The 
.B \-p
flag causes expire to use the posting date rather than the received date to
expire articles. This comes in very handy if your system is inundated by a
number of messages that had been resent by another system and sneaked past
your DBM files.
.PP
The
.B \-f
flag allows you to define a specific user and expire all messages sent by
that person. The next parameter is used as the name and is in the same
format as the From: line in the header (i.e. account@site.UUCP). This can be
of service when a notes site burps and throws a lot of mail out, since
normally all notes messages funnel into the net through a single account.
It is also handy if there is a user on the network you really dislike.
.SH "SEE ALSO"
checknews(1),
inews(1),
readnews(1),
recnews(8),
sendnews(8),
uurec(8)


!!ChuquiCo!!Software!!
echo x - expire.c
cat >expire.c <<'!!ChuquiCo!!Software!!'
# ifndef NOSCCS
static char *sccsid = "@(#)expire.c	1.8	4/4/84";
static char *cpyrid = "@(#)Copyright (C) 1984 by National Semiconductor Corp.";
# endif

/*
 * expire - expire daemon runs around and nails all articles that
 *		 have expired.
 *
 * Note: This version of expire contains some code to implement new
 * history features, e.g. to work without a history file or to rebuild
 * the history file.  This includes rebuilding the dbm-format files.
 *
 * 		Modifications by:
 * Steven M. Kramer	MITRE Corp.	(smk@linus.UUCP)  5/22/83
 * (who added back the -r and -h options for 2.10 B news)
 *
 * Chuq Von Rospach	NSC		(chuqui@nsc.UUCP) 2/18/84
 *    added code from cbosdg!mark to delete things based on posting date
 *    or posting site and account so that messages being regurgitated back
 *    onto the net can be zapped. Also included minor bug fixes including
 *    fixes for expires looping on large article ID numbers caused by a 
 *    failing malloc() call.
 * 
 * Chuq Von Rospach 	NSC	(nsc!chuqui)	84/4/4
 *    Added code so that -e20 works as well as -e 20 and so that
 *    -v2 works as well as -v 2 so that I don't have to remember which
 *    is which. Isn't standardization wonderful?
 */

#include "params.h"
#include <dir.h>

/*	Number of array entries to allocate at a time.	*/
#define SPACE_INCREMENT	100

extern char	groupdir[BUFSIZ], rcbuf[BUFLEN];
extern char	ACTIVE[];
extern char	SPOOL[];
extern char	ARTFILE[];
extern char	filename[];
char	NARTFILE[BUFSIZ], OARTFILE[BUFSIZ];
char	OLDNEWS[BUFLEN];
int	verbose = 0;
int	ignorexp = 0;
int	doarchive = 0;
int	nohistory = 0;
int	rebuild = 0;
int	usepost = 0;
int 	frflag = 0;
char	baduser[BUFLEN];
int 	i;			/* used to clear h.unrec[] */

/*
 * This code uses realloc to get more of the multhist array.
 * It should malloc the mh_ident field, but that's an exercise for YOU
 * to do!
 */
struct multhist {
	char	mh_ident[BUFLEN];
	char	*mh_file;
} *multhist;
unsigned int multhist_size;
char *calloc();
char *realloc();

typedef struct {
	char *dptr;
	int dsize;
} datum;

long	expincr;
long	atol();
time_t	cgtdate(), time();
FILE *popen();

main(argc, argv)
int	argc;
char	**argv;
{
	register FILE *fp = NULL;
	struct hbuf h;
	register time_t now, newtime;
	char	ngpat[LBUFLEN];
	char	afline[BUFLEN];
	char	*p1, *p2, *p3;
	FILE	*ohfd, *nhfd;
	DIR	*ngdirp = NULL;	/* tested before the first opendir() */
	static struct direct *ngdir;
	char fn[BUFLEN];
	datum	key;
	extern struct passwd *getpwnam();
	struct passwd *pwd;

	if ((pwd = getpwnam(NEWSU))!= NULL) {
		setgid(pwd->pw_gid) ; setuid(pwd->pw_uid);
	}

	pathinit();
	umask(N_UMASK);
	expincr = DFLTEXP;
	ngpat[0] = '\0';
	while (argc > 1) {
		switch (argv[1][1]) {
		case 'v':
			if (isdigit(argv[1][2]))
				verbose = argv[1][2] - '0';
			else if (argc > 2 && argv[2][0] != '-')  {
			        argv++;
				argc--;
				verbose = atoi(argv[1]);
			} else
				verbose = 1;
			if (verbose < 3)
				setbuf(stdout, NULL);
			break;
		case 'e':	/* Use this as default expiration time */
			if (isdigit(argv[1][2])) {
			    register i;
			    expincr = 0;
			    for (i = 2; isdigit(argv[1][i]); i++)
				expincr = ((expincr * 10)+(argv[1][i] - '0'));
			    expincr *= DAYS;
			} else if (argc > 2 && argv[2][0] != '-') {
				argv++;
				argc--;
				expincr = atol(argv[1]) * DAYS;
			} 
			break;
		case 'I':	/* Always ignore explicit expiration dates */
			ignorexp = 2;
			break;
		case 'i':	/* Ignore explicit date only if that expires sooner */
			ignorexp = 1;
			break;
		case 'n':
			if (argc > 2) {
				argv++;
				argc--;
				while (argc > 1 && argv[1][0] != '-') {
					strcat(ngpat, argv[1]);
					ngcat(ngpat);
					argv++;
					argc--;
				}
				argv--;
				argc++;
			}
			break;
		case 'a':	/* archive expired articles */
			doarchive++;
			break;
		case 'h':	/* ignore history */
			nohistory++;
			break;
		case 'r':	/* rebuild history file */
			rebuild++;
			nohistory++;
			break;
		case 'p':
			usepost++;
			break;
		case 'f':
			frflag++;
			if (argc > 2) {
				strcpy(baduser, argv[2]);
				argv++; argc--;
			}
			break;
		default:
			printf("Usage: expire [ -v [level] ] [-e days ] [-i] [-I] [-a] [-r] [-h] [-p] [-f account] [-n newsgroups]\n");
			exit(1);
		}
		argc--; 
		argv++;
	}
	if (ngpat[0] == 0)
		strcpy(ngpat, "all,");
	now = time(0);
	if (chdir(SPOOL))
		xerror("Cannot chdir %s", SPOOL);

	sprintf(OARTFILE, "%s/%s", LIB, "ohistory");
	sprintf(ARTFILE, "%s/%s", LIB, "history");
	sprintf(NARTFILE, "%s/%s", LIB, "nhistory");
#ifdef DBM
	if (!rebuild)
		dbminit(ARTFILE);
#endif
	if (verbose)
		printf("expire: nohistory %d, rebuild %d, doarchive %d\n",
			nohistory, rebuild, doarchive);

	if (nohistory) {
		ohfd = xfopen(ACTIVE, "r");
		if (rebuild) {
			/* Allocate initial space for multiple newsgroup (for an
			   article) array */
			multhist = (struct multhist *)
				calloc (SPACE_INCREMENT,
					sizeof (struct multhist));
			multhist_size = SPACE_INCREMENT;

			sprintf(afline, "sort -T /tmp +2n >%s", NARTFILE);
			if ((nhfd = popen(afline, "w")) == NULL)
				xerror("Cannot exec %s", NARTFILE);
		} else
			nhfd = xfopen("/dev/null", "w");
	} else {
		ohfd = xfopen(ARTFILE, "r");
		nhfd = xfopen(NARTFILE, "w");
	}
	for (i = 0; i < NUNREC; i++)
	    h.unrec[i] = (char *) NULL;

	while (TRUE) {
		if (nohistory) {
			do {
				if (ngdir == NULL) {
					if ( ngdirp != NULL ) {
						closedir(ngdirp);
						/* avoid a second closedir() if this group is skipped */
						ngdirp = NULL;
					}
					if (fgets(afline, BUFLEN, ohfd) == NULL)
						goto out;
					strcpy(groupdir, afline);
					p1 = index(groupdir, ' ');
					if (p1 == NULL)
						p1 = index(groupdir, '\n');
					if (p1 != NULL)
						*p1 = '\0';
					ngcat(groupdir);
					if (!ngmatch(groupdir, ngpat))
						continue;
					ngdel(groupdir);

					/* Change a group name from
					   a.b.c to a/b/c */
					for (p1=groupdir; *p1; p1++)
						if (*p1 == '.')
							*p1 = '/';

					if ((ngdirp = opendir(groupdir)) == NULL)
						continue;

				}
				ngdir = readdir(ngdirp);
			/*	Continue looking if not an article.	*/
			} while ( ngdir == NULL || !islegal(fn,groupdir,ngdir->d_name));

			p2 = fn;
			if (verbose > 2)
				printf("article: %s\n", fn);
		} else {
			if (fgets(afline, BUFLEN, ohfd) == NULL)
				break;
			if (verbose > 2)
				printf("article: %s", afline);
			p1 = index(afline, '\t');
			if (p1)
				p2 = index(p1 + 1, '\t');
			else
				continue;
			if (!p2)
				continue;
			p2++;
			strcpy(groupdir, p2);
			p3 = index(groupdir, '/');
			if (p3)
				*p3 = 0;
			else {
				/*
				 * Nothing after the 2nd tab.  This happens
				 * when a control message is stored in the
				 * history file.  Use the date in the history
				 * file to decide expiration.
				 */
				h.expdate[0] = 0;
				strcpy(h.recdate, p1+1);
				goto checkdate;
			}
			ngcat(groupdir);
			if (!ngmatch(groupdir, ngpat)) {
				fputs(afline, nhfd);
				continue;
			}
			ngdel(groupdir);
			strcpy(fn, p2);
			p1 = index(fn, ' ');
			if (p1 == 0)
				p1 = index(fn, '\n');
			if (p1)
				*p1 = 0;
		}

		strcpy(filename, dirname(fn));
		if (access(filename, 4)
		|| (fp = fopen(filename, "r")) == NULL) {
			if (verbose > 3)
				printf("Can't open %s.\n", filename);
			continue;
		}
		/*
		 * Prevent memory exhaustion on PDP-11's -- cumulative 
		 * malloc()'s of unrecognized header lines, if not free()'d,
		 * could use up all of free memory
		 */
		for (i = 0; i < NUNREC;i++)
		    if (h.unrec[i] != (char *) NULL)
			free(h.unrec[i]);
		    else
			break;
		if (hread(&h, fp, TRUE) == NULL) {
			if (verbose)
				printf("Garbled article %s.\n", filename);
			fclose(fp);
			continue;
		}
		if (rebuild) {
			register char	*cp;
			register struct multhist *mhp;

			/* Format of filename until now was /SPOOL/a/b/c/4
			   and this code changes it to a.b.c/4 (the correct
			   kind of entry in the history file).  */
			strcpy (filename, filename + strlen(SPOOL)+1);
			for (p1 = filename; p1 != NULL && *p1 != '\0'; p1++)
				if (*p1 == '/' && p1 != rindex (p1, '/'))
					*p1 = '.';

			if ((cp = index(h.nbuf, NGDELIM)) == NULL) {
saveit:
				fprintf(nhfd, "%s\t%s\t%s \n", h.ident, h.recdate, filename);
				fclose(fp);
				continue;
			}
			for (mhp = multhist; mhp < multhist+multhist_size && mhp->mh_ident[0] != '\0'; mhp++) {
				if (mhp->mh_file == NULL)
					continue;
				if (strcmp(mhp->mh_ident, h.ident) != 0)
					continue;
				if (index(mhp->mh_file, ' ') != NULL)
					cp = index(++cp, NGDELIM);
				strcat(filename, " ");
				strcat(filename, mhp->mh_file);
				free(mhp->mh_file);
				mhp->mh_file = NULL;
				if (cp == NULL || *cp == '\0' || (cp = index(++cp, NGDELIM)) == NULL)
					goto saveit;
				else
					break;
			}

			/* Here is where we realloc the multhist space rather
			   than the old way of static allocation.  It's
			   really trivial.  We just clear out the space
			   in case it was reused.  The old static array was
			   guaranteed to be cleared since it was cleared when
			   the process started.  */
			if (mhp >= multhist + multhist_size)
			{
				multhist = (struct multhist *)
					realloc (multhist,
					  sizeof (struct multhist) *
					  (SPACE_INCREMENT + multhist_size));
				if (multhist == NULL)
					xerror("Too many articles with multiple newsgroups");
				for (mhp = multhist + multhist_size;
				  mhp < multhist+multhist_size+SPACE_INCREMENT;
					mhp++)
				{
					mhp->mh_ident[0] = '\0';
					mhp->mh_file = NULL;
				}
				mhp = multhist + multhist_size;
				multhist_size += SPACE_INCREMENT;
			}

			/* This should be malloc'd, but then we have to
			   find an alternate way to look thru the
			   multhist array than mhp->mh_ident[0] (which
			   would not be allocated).	*/
			strcpy(mhp->mh_ident, h.ident);
			cp = malloc(strlen(filename) + 1);
			if ( cp == NULL)
				xerror("Out of memory");
			strcpy(cp, filename);
			mhp->mh_file = cp;
			fclose(fp);
			continue;
		}

		fclose(fp);
checkdate:
		if (h.expdate[0])
			h.exptime = cgtdate(h.expdate);
		newtime = cgtdate(usepost ? h.subdate : h.recdate) + expincr;
		if (!h.expdate[0] || ignorexp == 2 || 
		    (ignorexp == 1 && newtime < h.exptime))
			h.exptime = newtime;
		if (frflag ? strcmp(baduser,h.from)==0 : now >= h.exptime) {
#ifdef DEBUG
			printf("cancel %s\n", filename);
#else
			if (verbose)
				printf("cancel %s\n", filename);
			ulall(p2, &h);
# ifdef DBM
			key.dptr = h.ident;
			key.dsize = strlen(key.dptr) +1;
			delete(key);
# endif
#endif
		} else {
			fputs(afline, nhfd);
			if (verbose > 2)
				printf("Good article %s\n", rcbuf);
		}
	}

out:
	if (rebuild) {
		register struct multhist *mhp;
		for (mhp = multhist; mhp < multhist+multhist_size && mhp->mh_ident[0] != '\0'; mhp++)
			/* should "never" happen */
			if (mhp->mh_file != NULL )
				printf("Article: %s %s Cannot find all links\n", mhp->mh_ident, mhp->mh_file);
		pclose(nhfd);
		free (multhist);
	}

	if (rebuild || !nohistory) {
		unlink(OARTFILE);
		link(ARTFILE, OARTFILE);
		unlink(ARTFILE);
		link(NARTFILE, ARTFILE);
		unlink(NARTFILE);
#ifdef DBM
		if (rebuild)
			rebuilddbm ( );
#endif
	}
	exit(0);
}

/* Unlink (using tail recursion) all the articles in 'artlist'. */
ulall(artlist, h)
char	*artlist;
struct hbuf *h;
{
	char	*p ,*temp;
	int	last = 0;
	char	newname[BUFLEN];
	time_t	timep[2];
	char *fn;

	if (nohistory) {
		last = 1;
	} else {
		while (*artlist == ' ' || *artlist == '\n' || *artlist == ',')
			artlist++;
		if (*artlist == 0)
			return;
		p = index(artlist, ' ');
		if (p == 0) {
			last = 1;
			p = index(artlist, '\n');
		}
		if (p == 0) {
			last = 1;
			p = index(artlist, ',');
		}
		if (p == 0) {
			last = 1;
			fn = dirname(artlist);
			unlink(fn);
			return;
		}
		if (p)
			*p = 0;
	}
	fn = dirname(artlist);
	if (doarchive && access(OLDNEWS, 0) == 0) {
		temp = fn + strlen(SPOOL) + 1;
		sprintf(newname, "%s/%s", OLDNEWS, temp);
		if (verbose > 1)
			printf("link %s to %s\n", fn, newname);
		if (link(fn, newname) == -1) {
			if (mkparents(newname) == 0)
				link(fn, newname);
		}
		timep[0] = timep[1] = cgtdate(h->subdate);
		utime(newname, timep);
	}

	if (verbose)
		printf("unlink %s\n", fn);
	unlink(fn);
	if (!last)
		ulall(p + 1, h);
}


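/*
 * Print a printf-style message (callers pass at most one string
 * argument, e.g. "Cannot chdir %s") and exit.
 */
xerror(message, arg1, arg2)
char	*message;
char	*arg1, *arg2;
{
	printf("expire: ");
	printf(message, arg1, arg2);
	printf(".\n");
	fflush(stdout);
	exit(1);
}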

/*
 * If any parent directories of this dir don't exist, create them.
 */
mkparents(dirname)
char *dirname;
{
	char buf[200], sysbuf[200];
	register char *p;
	int rc;
	struct passwd *pw;

	strcpy(buf, dirname);
	p = rindex(buf, '/');
	if (p)
		*p = '\0';
	if (exists(buf))
		return (0);
	mkparents(buf);
	sprintf(sysbuf, "mkdir %s", buf);
	rc = system(sysbuf);
	sprintf(sysbuf, "%s", buf); 
	if (verbose)
		printf("mkdir %s, rc %d\n", sysbuf, rc);
	chmod(sysbuf, 0755);
	if ((pw = getpwnam(NEWSU)) != NULL)
		chown(sysbuf, pw->pw_uid, pw->pw_gid);
	return rc;
}


/*	Make sure this file is a legal article. */
islegal (fullname, path, name)
	register char *fullname;
	register char *path;
	register char *name;
{
	struct stat buffer;

	sprintf (fullname, "%s/%s", path, name);

	/*	make sure the article is numeric.	*/
	while (*name != '\0')
		if (!isascii (*name) || !isdigit (*name))
			return 0;
		else
			name++;

	/*	Now make sure we don't have a group like net.micro.432,
		which is numeric but not a regular file -- i.e., check
		for being a regular file.	*/
	if ((stat (fullname, &buffer) == 0) &&
		((buffer.st_mode & S_IFMT) == S_IFREG))
	{
		/* Now that we found a legal group in a/b/c/4
		   notation, switch it to a.b.c/4 notation.  */
		for (name = fullname; name != NULL && *name != '\0'; name++)
			if (*name == '/' && name != rindex (name, '/'))
				*name = '.';

		return 1;
	}
	return 0;
}



#ifdef DBM
/*
 * This is taken mostly intact from ../cvt/cvt.hist.c and is used at the
 * end by the options that make a new history file.
 * Routine to convert history file to dbm file.  The old 3 field
 * history file is still kept there, because we need it for expire
 * and for a human readable copy.  But we keep a dbm hashed copy
 * around by message ID so we can answer the yes/no question "have
 * we already seen this message".  The content is the ftell offset
 * into the real history file when we get the article - you can't
 * really do much with this because the file gets compacted.
 */



FILE *fd;

char namebuf[BUFSIZ];
char lb[BUFSIZ];

rebuilddbm( )
{
	register char *p, *q;
	long fpos;
	datum lhs, rhs;
	int rv;

	umask(0);
	sprintf(namebuf, "%s.dir", ARTFILE);
	close(creat(namebuf, 0644));
	sprintf(namebuf, "%s.pag", ARTFILE);
	close(creat(namebuf, 0644));
	sprintf(namebuf, "%s", ARTFILE);

	fd = fopen(namebuf, "r");
	if (fd == NULL) {
		perror(namebuf);
		exit(2);
	}

	dbminit(namebuf);
	while (fpos=ftell(fd), fgets(lb, BUFSIZ, fd) != NULL) {
		p = index(lb, '\t');
		if (p)
			*p = 0;
		lhs.dptr = lb;
		lhs.dsize = strlen(lb) + 1;
		rhs.dptr = (char *) &fpos;
		rhs.dsize = sizeof fpos;
		rv = store(lhs, rhs);
		if (rv < 0)
			fprintf(stderr, "store(%s) failed\n", lb);
	}
	exit(0);
}
#endif /* DBM */
!!ChuquiCo!!Software!!
exit
-- 
From under the bar at Callahan's:		Chuq Von Rospach
{amd70,fortune,hplabs,menlo70}!nsc!chuqui	(408) 733-2600 x242

A toast! To absent friends... {clink}

chuqui@nsc.UUCP (Chuq Von Rospach) (04/24/84)

<*sigh* Here we go again folks....>

I have had a couple of reports that parts of the expire program don't seem
to work properly. The common denominator in this is USG. I run a BSD
system, and I have no way of testing the thing on anything else. If you are
running a USG system, consider the -r and -h options flakey and/or broken.
I am looking through things to see if I can find any blatant BSDisms, but
unless I get lucky I doubt I'll find too much. If someone with access to
USG wants to fix this thing and post the changes, I'll make sure they 
get integrated into my copy and try to make sure they get into future
releases as well... The -r and -h options DO WORK on BSD systems (both 4.1
and 4.2).
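
For anyone who wants to help hunt, one place worth a look (a guess, not a diagnosis) is the string and directory handling: expire.c calls index() and rindex(), which are the BSD names for what USG systems call strchr() and strrchr(), and it includes <dir.h> for the directory-reading routines. As an illustration, here is the spool-path rewrite that the -r and -h code performs, restated using only strrchr() so that it depends on nothing BSD-specific. The function name spool_to_hist is made up for this sketch.

```c
#include <string.h>

/*
 * Hypothetical restatement of the path rewrite in expire.c:
 * turn "a/b/c/4" into "a.b.c/4" in place, i.e. every '/' except the
 * last one becomes '.'.  Uses strrchr() rather than rindex().
 */
char *
spool_to_hist(char *path)
{
	char *last = strrchr(path, '/');	/* NULL if there is no '/' */
	char *p;

	for (p = path; *p != '\0'; p++)
		if (*p == '/' && p != last)
			*p = '.';
	return path;
}
```

If params.h on your system already maps index and rindex onto the USG names, the problem lies elsewhere and the directory library is the next suspect.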


chuq

-- 
From under the bar at Callahan's:		Chuq Von Rospach
{amd70,fortune,hplabs,menlo70}!nsc!chuqui	(408) 733-2600 x242

ninety nine dead baboons, sitting in my living room...