feldman@tymix.UUCP (Steve Feldman) (04/03/84)
I realize this has been asked before, but has anyone fixed the history rebuild stuff in 2.10.1 expire.c? If so, please mail me a copy or post it to the net. (It's probably of sufficiently general interest.) Thanks, Steve Feldman
chuqui@nsc.UUCP (Chuq Von Rospach) (04/04/84)
*sigh* I have seen one too many articles asking for features on expire that have been posted before. This is ***NOT*** to be considered a flame at those doing the requesting!!! I have a version of expire for 2.10.1 that includes all of the bells and whistles that have been posted and the bug fixes that have been posted that actually fixed bugs. The only thing I haven't done yet is update the man page. As soon as I do that, I'll post the whole kitten-kaboodle to net.sources. chuq -- From under the bar at Callahan's: Chuq Von Rospach {amd70,fortune,hplabs,menlo70}!nsc!chuqui (408) 733-2600 x242 A toast! To absent friends... {clink}
chuqui@nsc.UUCP (Chuq Von Rospach) (04/05/84)
Ok, here it is folks. This version of expire is designed to run under 2.10.1. It has all of the bug fixes posted to expire so far (at least, those that I have seen), as well as enhancements to allow it to rebuild the history file, run without it completely, and delete articles based on posting date or posting account. One thing that is in this expire that hasn't been posted is that it will now allow -e<days> as well as -e <days> and -v<level> as well as -v <level> because I got tired of trying to remember which way the various options wanted to be set up. If you want to integrate this into a version of expire you are running, look at the 'v' and 'e' cases in the parameter handling. The rest of this message is taken up by a shell archive containing two files. Expire.c is the source, and expire.8 is the man page which has been updated to resemble reality a little more closely. Comments, enhancements, and encouragements welcome. ----- cut here and feed to sh (not csh!) ---- #! /bin/sh # The rest of this file is a shell script which will extract: # expire.8 expire.c echo x - expire.8 cat >expire.8 <<'!!ChuquiCo!!Software!!' .TH EXPIRE 8 .SH NAME expire \- remove outdated news articles .SH SYNOPSIS .BR /usr/lib/news/expire [ .BI \-n .BR newsgroups ] [ .BI \-i ] [ .BI \-I ] [ .BI \-v [ .BR level ] ] [ .BI \-e .BR days ] [ .BI \-a ] .br .BR /usr/lib/news/expire .BI \-p [ .BI \-e .BR days ] .br .BR /usr/lib/news/expire .BR \-r .br .BR /usr/lib/news/expire .BR \-f .BI account [ .BI \-e .BR days ] .SH DESCRIPTION .PP .I Expire is normally started up by .IR cron (8) every night to remove all expired news. If no newsgroups are specified, the default is to expire .BR all . .PP Articles whose specified expiration date has already passed are considered expirable. The .B \-a option causes expire to archive articles in /usr/spool/oldnews. Otherwise, the articles are unlinked. .PP The .B \-v option causes expire to be more verbose. 
It can be given a verbosity level (default 1) as in .B \-v3 for even more output. This is useful if articles aren't being expired and you want to know why. .PP The .B \-e flag gives the number of days to use for a default expiration date. If not given, an installation-dependent default (often 2 weeks) is used. .PP The .B \-i and .B \-I flags tell .B expire to ignore any expiration date explicitly given on articles. This can be used when disk space is really tight. The .B \-I flag will always ignore expiration dates, while the .B \-i flag will only ignore the date if ignoring it would expire the article sooner. .I WARNING: If you have articles archived by giving them expiration dates far into the future, these options might remove these files anyway. .PP The .B \-h flag will cause expire to ignore the history file if it exists. It will instead search the entire spool directory, open each file in it and check the header for expirations. This is much less efficient than using a history file by a couple of orders of magnitude, but if you have your history file garbaged or have things drop out of the history file for some reason this will track them down and remove them. .PP The .B \-r flag will cause expire to rebuild the history file. It does this by searching the entire spool directory and building a new history entry for each article found. It is smart enough to put links in different topics together based upon the header information. It will report messages that it cannot resolve all links on. Rebuilding the history file also rebuilds the associated DBM files. This causes you to lose the history of messages you have seen, so duplicate articles can become a problem after rebuilding. .PP The .B \-p flag causes expire to use the posting date rather than the received date to expire articles. This comes in very handy if your system is inundated by a number of messages that had been resent by another system and sneaked past your DBM files. 
.PP The .B \-f flag allows you to define a specific user and expire all messages sent by that person. The next parameter is used as the name and is in the same format as the From: line in the header (i.e. account@site.UUCP). This can be of service when a notes site burps and throws a lot of mail out, since normally all notes messages funnel into the net through a single account. It is also handy if there is a user on the network you really dislike. .SH "SEE ALSO" checknews(1), inews(1), readnews(1), recnews(8), sendnews(8), uurec(8) !!ChuquiCo!!Software!! echo x - expire.c cat >expire.c <<'!!ChuquiCo!!Software!!' # ifndef NOSCCS static char *sccsid = "@(#)expire.c 1.8 4/4/84"; static char *cpyrid = "@(#)Copyright (C) 1984 by National Semiconductor Corp."; # endif /* * expire - expire daemon runs around and nails all articles that * have expired. * * Note: This version of expire contains some code to implement new * history features, e.g. to work without a history file or to rebuild * the history file. This include rebuilding the dbm-format files. * * Modifications by: * Steven M. Kramer MITRE Corp. (smk@linus.UUCP) 5/22/83 * (who added back the -r and -h options for 2.10 B news) * * Chuq Von Rospach NSC (chuqui@nsc.UUCP) 2/18/84 * added code from cbosdg!mark to delete things based on posting date * or posting site and account so that messages being regurgitated back * onto the net can be zapped. Also included minor bug fixes including * fixes for expires looping on large article ID numbers caused by a * failing malloc() call. * * Chuq Von Rospach NSC (nsc!chuqui) 84/4/4 * Added code so that -e20 works as well as -e 20 and so that * -v2 works as well as -v 2 so that I don't have to remember which * is which. Isn't standardization wonderful? */ #include "params.h" #include <dir.h> /* Number of array entries to allocate at a time. 
*/ #define SPACE_INCREMENT 100 extern char groupdir[BUFSIZ], rcbuf[BUFLEN]; extern char ACTIVE[]; extern char SPOOL[]; extern char ARTFILE[]; extern char filename[]; char NARTFILE[BUFSIZ], OARTFILE[BUFSIZ]; char OLDNEWS[BUFLEN]; int verbose = 0; int ignorexp = 0; int doarchive = 0; int nohistory = 0; int rebuild = 0; int usepost = 0; int frflag = 0; char baduser[BUFLEN]; int i; /* used to clear h.unrec[] */ /* * This code uses realloc to get more of the multhist array. * It should malloc the mh_ident field, but that's an exercise for YOU * to do! */ struct multhist { char mh_ident[BUFLEN]; char *mh_file; } *multhist; unsigned int multhist_size; char *calloc(); char *realloc(); typedef struct { char *dptr; int dsize; } datum; long expincr; long atol(); time_t cgtdate(), time(); FILE *popen(); main(argc, argv) int argc; char **argv; { register FILE *fp = NULL; struct hbuf h; register time_t now, newtime; char ngpat[LBUFLEN]; char afline[BUFLEN]; char *p1, *p2, *p3; FILE *ohfd, *nhfd; DIR *ngdirp; static struct direct *ngdir; char fn[BUFLEN]; datum key; extern struct passwd *getpwnam(); struct passwd *pwd; if ((pwd = getpwnam(NEWSU))!= NULL) { setgid(pwd->pw_gid) ; setuid(pwd->pw_uid); } pathinit(); umask(N_UMASK); expincr = DFLTEXP; ngpat[0] = '\0'; while (argc > 1) { switch (argv[1][1]) { case 'v': if (isdigit(argv[1][2])) verbose = argv[1][2] - '0'; else if (argc > 2 && argv[2][0] != '-') { argv++; argc--; verbose = atoi(argv[1]); } else verbose = 1; if (verbose < 3) setbuf(stdout, NULL); break; case 'e': /* Use this as default expiration time */ if (isdigit(argv[1][2])) { register i; expincr = 0; for (i = 2; isdigit(argv[1][i]); i++) expincr = ((expincr * 10)+(argv[1][i] - '0')); expincr *= DAYS; } else if (argc > 2 && argv[2][0] != '-') { argv++; argc--; expincr = atol(argv[1]) * DAYS; } break; case 'I': /* Ignore any existing expiration date */ ignorexp = 2; break; case 'i': /* Ignore any existing expiration date */ ignorexp = 1; break; case 'n': if (argc > 2) { 
argv++; argc--; while (argc > 1 && argv[1][0] != '-') { strcat(ngpat, argv[1]); ngcat(ngpat); argv++; argc--; } argv--; argc++; } break; case 'a': /* archive expired articles */ doarchive++; break; case 'h': /* ignore history */ nohistory++; break; case 'r': /* rebuild history file */ rebuild++; nohistory++; break; case 'p': usepost++; break; case 'f': frflag++; if (argc > 2) { strcpy(baduser, argv[2]); argv++; argc--; } break; default: printf("Usage: expire [ -v [level] ] [-e days ] [-i] [-a] [-r] [-h] [-n newsgroups]\n"); exit(1); } argc--; argv++; } if (ngpat[0] == 0) strcpy(ngpat, "all,"); now = time(0); if (chdir(SPOOL)) xerror("Cannot chdir %s", SPOOL); sprintf(OARTFILE, "%s/%s", LIB, "ohistory"); sprintf(ARTFILE, "%s/%s", LIB, "history"); sprintf(NARTFILE, "%s/%s", LIB, "nhistory"); #ifdef DBM if (!rebuild) dbminit(ARTFILE); #endif if (verbose) printf("expire: nohistory %d, rebuild %d, doarchive %d\n", nohistory, rebuild, doarchive); if (nohistory) { ohfd = xfopen(ACTIVE, "r"); if (rebuild) { /* Allocate initial space for multiple newsgroup (for an article) array */ multhist = (struct multhist *) calloc (SPACE_INCREMENT, sizeof (struct multhist)); multhist_size = SPACE_INCREMENT; sprintf(afline, "sort -T /tmp +2n >%s", NARTFILE); if ((nhfd = popen(afline, "w")) == NULL) xerror("Cannot exec %s", NARTFILE); } else nhfd = xfopen("/dev/null", "w"); } else { ohfd = xfopen(ARTFILE, "r"); nhfd = xfopen(NARTFILE, "w"); } for (i = 0; i < NUNREC; i++) h.unrec[i] = (char *) NULL; while (TRUE) { if (nohistory) { do { if (ngdir == NULL) { if ( ngdirp != NULL ) closedir(ngdirp); if (fgets(afline, BUFLEN, ohfd) == NULL) goto out; strcpy(groupdir, afline); p1 = index(groupdir, ' '); if (p1 == NULL) p1 = index(groupdir, '\n'); if (p1 != NULL) *p1 = NULL; ngcat(groupdir); if (!ngmatch(groupdir, ngpat)) continue; ngdel(groupdir); /* Change a group name from a.b.c to a/b/c */ for (p1=groupdir; *p1; p1++) if (*p1 == '.') *p1 = '/'; if ((ngdirp = opendir(groupdir)) == NULL) 
continue; } ngdir = readdir(ngdirp); /* Continue looking if not an article. */ } while ( ngdir == NULL || !islegal(fn,groupdir,ngdir->d_name)); p2 = fn; if (verbose > 2) printf("article: %s\n", fn); } else { if (fgets(afline, BUFLEN, ohfd) == NULL) break; if (verbose > 2) printf("article: %s", afline); p1 = index(afline, '\t'); if (p1) p2 = index(p1 + 1, '\t'); else continue; if (!p2) continue; p2++; strcpy(groupdir, p2); p3 = index(groupdir, '/'); if (p3) *p3 = 0; else { /* * Nothing after the 2nd tab. This happens * when a control message is stored in the * history file. Use the date in the history * file to decide expiration. */ h.expdate[0] = 0; strcpy(h.recdate, p1+1); goto checkdate; } ngcat(groupdir); if (!ngmatch(groupdir, ngpat)) { fputs(afline, nhfd); continue; } ngdel(groupdir); strcpy(fn, p2); p1 = index(fn, ' '); if (p1 == 0) p1 = index(fn, '\n'); if (p1) *p1 = 0; } strcpy(filename, dirname(fn)); if (access(filename, 4) || (fp = fopen(filename, "r")) == NULL) { if (verbose > 3) printf("Can't open %s.\n", filename); continue; } /* * Prevent memory exhaustion on PDP-11's -- cumulative * malloc()'s of unrecognized header lings, if not free()'d, * could use up all of free memory */ for (i = 0; i < NUNREC;i++) if (h.unrec[i] != (char *) NULL) free(h.unrec[i]); else break; if (hread(&h, fp, TRUE) == NULL) { if (verbose) printf("Garbled article %s.\n", filename); fclose(fp); continue; } if (rebuild) { register char *cp; register struct multhist *mhp; /* Format of filename until now was /SPOOL/a/b/c.4 and this code changes it to a.b.c/4 (the correct kind of entry in the history file. 
*/ strcpy (filename, filename + strlen(SPOOL)+1); for (p1 = filename; p1 != NULL && *p1 != '\0'; p1++) if (*p1 == '/' && p1 != rindex (p1, '/')) *p1 = '.'; if ((cp = index(h.nbuf, NGDELIM)) == NULL) { saveit: fprintf(nhfd, "%s\t%s\t%s \n", h.ident, h.recdate, filename); fclose(fp); continue; } for (mhp = multhist; mhp < multhist+multhist_size && mhp->mh_ident[0] != NULL; mhp++) { if (mhp->mh_file == NULL) continue; if (strcmp(mhp->mh_ident, h.ident) != 0) continue; if (index(mhp->mh_file, ' ') != NULL) cp = index(++cp, NGDELIM); strcat(filename, " "); strcat(filename, mhp->mh_file); free(mhp->mh_file); mhp->mh_file = NULL; if (*cp == NULL || (cp = index(++cp, NGDELIM)) == NULL) goto saveit; else break; } /* Here is where we realloc the multhist space rather than the old way of static allocation. It's really trivial. We just clear out the space in case it was reused. The old static array was guaranteed to be cleared since it was cleared when the process started. */ if (mhp >= multhist + multhist_size) { multhist = (struct multhist *) realloc (multhist, sizeof (struct multhist) * (SPACE_INCREMENT + multhist_size)); if (multhist == NULL) xerror("Too many articles with multiple newsgroups"); for (mhp = multhist + multhist_size; mhp < multhist+multhist_size+SPACE_INCREMENT; mhp++) { mhp->mh_ident[0] = '\0'; mhp->mh_file = NULL; } mhp = multhist + multhist_size; multhist_size += SPACE_INCREMENT; } /* This should be malloc'd, but then we have to find an alternate way to look thru the multhist array than mhp->mh_ident[0] (which would not be allocated). */ strcpy(mhp->mh_ident, h.ident); cp = malloc(strlen(filename) + 1); if ( cp == NULL) xerror("Out of memory"); strcpy(cp, filename); mhp->mh_file = cp; fclose(fp); continue; } fclose(fp); checkdate: if (h.expdate[0]) h.exptime = cgtdate(h.expdate); newtime = cgtdate(usepost ? h.subdate : h.recdate) + expincr; if (!h.expdate[0] || ignorexp == 2 || (ignorexp == 1 && newtime < h.exptime)) h.exptime = newtime; if (frflag ? 
strcmp(baduser,h.from)==0 : now >= h.exptime) { #ifdef DEBUG printf("cancel %s\n", filename); #else if (verbose) printf("cancel %s\n", filename); ulall(p2, &h); # ifdef DBM key.dptr = h.ident; key.dsize = strlen(key.dptr) +1; delete(key); # endif #endif } else { fputs(afline, nhfd); if (verbose > 2) printf("Good article %s\n", rcbuf); } } out: if (rebuild) { register struct multhist *mhp; for (mhp = multhist; mhp < multhist+multhist_size && mhp->mh_ident[0] != NULL; mhp++) /* should "never" happen */ if (mhp->mh_file != NULL ) printf("Article: %s %s Cannot find all links\n", mhp->mh_ident, mhp->mh_file); pclose(nhfd); free (multhist); } if (rebuild || !nohistory) { unlink(OARTFILE); link(ARTFILE, OARTFILE); unlink(ARTFILE); link(NARTFILE, ARTFILE); unlink(NARTFILE); #ifdef DBM if (rebuild) rebuilddbm ( ); #endif } exit(0); } /* Unlink (using tail recursion) all the articles in 'artlist'. */ ulall(artlist, h) char *artlist; struct hbuf *h; { char *p ,*temp; int last = 0; char newname[BUFLEN]; time_t timep[2]; char *fn; if (nohistory) { last = 1; } else { while (*artlist == ' ' || *artlist == '\n' || *artlist == ',') artlist++; if (*artlist == 0) return; p = index(artlist, ' '); if (p == 0) { last = 1; p = index(artlist, '\n'); } if (p == 0) { last = 1; p = index(artlist, ','); } if (p == 0) { last = 1; fn = dirname(artlist); unlink(fn); return; } if (p) *p = 0; } fn = dirname(artlist); if (doarchive && access(OLDNEWS, 0) == 0) { temp = fn + strlen(SPOOL) + 1; sprintf(newname, "%s/%s", OLDNEWS, temp); if (verbose > 1) printf("link %s to %s\n", fn, newname); if (link(fn, newname) == -1) { if (mkparents(newname) == 0) link(fn, newname); } timep[0] = timep[1] = cgtdate(h->subdate); utime(newname, timep); } if (verbose) printf("unlink %s\n", fn); unlink(fn); if (!last) ulall(p + 1, h); } xerror(message) char *message; { printf("expire: %s.\n", message); fflush(stdout); exit(1); } /* * If any parent directories of this dir don't exist, create them. 
*/ mkparents(dirname) char *dirname; { char buf[200], sysbuf[200]; register char *p; int rc; struct passwd *pw; strcpy(buf, dirname); p = rindex(buf, '/'); if (p) *p = '\0'; if (exists(buf)) return (0); mkparents(buf); sprintf(sysbuf, "mkdir %s", buf); rc = system(sysbuf); sprintf(sysbuf, "%s", buf); if (verbose) printf("mkdir %s, rc %d\n", sysbuf, rc); chmod(sysbuf, 0755); if ((pw = getpwnam(NEWSU)) != NULL) chown(sysbuf, pw->pw_uid, pw->pw_gid); return rc; } /* Make sure this file is a legal article. */ islegal (fullname, path, name) register char *fullname; register char *path; register char *name; { struct stat buffer; sprintf (fullname, "%s/%s", path, name); /* make sure the article is numeric. */ while (*name != '\0') if (!isascii (*name) || !isdigit (*name)) return 0; else name++; /* Now make sure we don't have a group like net.micro.432, which is numeric but not a regular file -- i.e., check for being a regular file. */ if ((stat (fullname, &buffer) == 0) && ((buffer.st_mode & S_IFMT) == S_IFREG)) { /* Now that we foun a legal group in a/b/c/4 notation, switch it to a.b.c/4 notation. */ for (name = fullname; name != NULL && *name != '\0'; name++) if (*name == '/' && name != rindex (name, '/')) *name = '.'; return 1; } return 0; } #ifdef DBM /* * This is taken mostly intact from ../cvt/cvt.hist.c and is used at the * end by the options that make a new history file. * Routine to convert history file to dbm file. The old 3 field * history file is still kept there, because we need it for expire * and for a human readable copy. But we keep a dbm hashed copy * around by message ID so we can answer the yes/no question "have * we already seen this message". The content is the ftell offset * into the real history file when we get the article - you can't * really do much with this because the file gets compacted. 
*/ FILE *fd; char namebuf[BUFSIZ]; char lb[BUFSIZ]; rebuilddbm( ) { register char *p, *q; long fpos; datum lhs, rhs; int rv; umask(0); sprintf(namebuf, "%s.dir", ARTFILE); close(creat(namebuf, 0644)); sprintf(namebuf, "%s.pag", ARTFILE); close(creat(namebuf, 0644)); sprintf(namebuf, "%s", ARTFILE); fd = fopen(namebuf, "r"); if (fd == NULL) { perror(namebuf); exit(2); } dbminit(namebuf); while (fpos=ftell(fd), fgets(lb, BUFSIZ, fd) != NULL) { p = index(lb, '\t'); if (p) *p = 0; lhs.dptr = lb; lhs.dsize = strlen(lb) + 1; rhs.dptr = (char *) &fpos; rhs.dsize = sizeof fpos; rv = store(lhs, rhs); if (rv < 0) fprintf(stderr, "store(%s) failed\n", lb); } exit(0); } #endif DBM !!ChuquiCo!!Software!! exit -- From under the bar at Callahan's: Chuq Von Rospach {amd70,fortune,hplabs,menlo70}!nsc!chuqui (408) 733-2600 x242 A toast! To absent friends... {clink}
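Two pieces of the posted expire are easy to lose in the surrounding code: the option handling that accepts -e20 as well as -e 20, and the rule that decides which expiration time wins under -i and -I. The following is a minimal sketch of both, written in ANSI C rather than the K&R style of the posting. The function names option_value and effective_expiry, and their parameters, are hypothetical and do not appear in expire.c; the logic is meant to mirror the 'e' case in main() and the test after the checkdate: label, not to replace them.

```c
#include <ctype.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: fetch the numeric value of an option that may be
 * attached ("-e20") or given as the following argument ("-e", "20").
 * arg is the option itself; next is the following argument or NULL.
 * Returns dflt when no value is present. *used_next is set to 1 when
 * the caller must skip the following argument.
 */
long option_value(const char *arg, const char *next, long dflt, int *used_next)
{
    *used_next = 0;
    if (arg[0] != '\0' && arg[1] != '\0' && isdigit((unsigned char)arg[2]))
        return atol(arg + 2);            /* attached form: -e20 */
    if (next != NULL && next[0] != '-') {
        *used_next = 1;                  /* separate form: -e 20 */
        return atol(next);
    }
    return dflt;                         /* bare option, keep the default */
}

/*
 * Hypothetical helper: pick the expiration time for one article.
 * base is the received date (or the posting date under -p), expincr is
 * the default lifetime in seconds, and mode is 0 (honor an explicit
 * expiration date), 1 (-i) or 2 (-I).
 */
time_t effective_expiry(time_t base, long expincr,
                        int has_explicit, time_t explicit_exp, int mode)
{
    time_t dflt = base + expincr;

    if (!has_explicit || mode == 2)
        return dflt;                     /* no explicit date, or -I */
    if (mode == 1 && dflt < explicit_exp)
        return dflt;                     /* -i: only if that expires sooner */
    return explicit_exp;
}
```

An article is then expirable when the current time is at or past the value effective_expiry returns, which is the same comparison the posted code makes against h.exptime.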
jerry@oliveb.UUCP (Jerry Aguirre) (04/05/84)
I have fixed it and I am currently testing it out. I needed it to build the dbm(3) database which we had not been using until then. Also, some of our machines had their history damaged and needed rebuilding. Using the dbm seems to speed up rnews by a factor of 3 or more. News used to still be processing after 11am but is now finished by 8am. The modifications to expire for rebuild/nohistory are not complex. I basically did a popen(3) of a find(1) to generate the names of all the files (find SPOOLDIR -type f -name '[1-9]*'). It was actually simpler than both the old code and the code for working with a history file. I ran into mucho problems with the hread function doing mallocs of unrecognized header lines which were never freed (a previously published bug). As I can find no other reference to the unrecognized header lines I took the simple solution of commenting the malloc out. I also added an option to keep expired IDs in the dbm database for 90 days. This solves the problem of articles which time warp into your system again after they have been expired. If everything works out I plan to install it on our other 4 sites and after that post the changes to the net. The code used popen already so that should not be a portability problem. Does everybody have the find program? Jerry Aguirre {hplabs|fortune|ios|tolerant|allegra|tymix}!oliveb!jerry
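The popen-of-find approach described above can be sketched roughly as follows. This is not Jerry's code, which had not been posted at this point; the function name each_article, the callback signature, and the buffer sizes are assumptions made for illustration, and the sketch uses ANSI C and snprintf rather than the K&R style of the 2.10.1 sources. Only the find command line itself is taken from the post.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose under strict compilers */
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: run find(1) over spooldir and call visit() once
 * for every regular file whose name starts with a digit 1-9.
 * Returns the number of names read, or -1 if the command could not be
 * built or run. spooldir is pasted into a shell command, so names
 * containing a single quote are rejected rather than escaped.
 */
int each_article(const char *spooldir,
                 void (*visit)(const char *path, void *ctx), void *ctx)
{
    char cmd[1024], line[1024];
    FILE *pp;
    size_t len;
    int n, count = 0;

    if (strchr(spooldir, '\'') != NULL)
        return -1;
    n = snprintf(cmd, sizeof cmd,
                 "find '%s' -type f -name '[1-9]*' -print", spooldir);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;                       /* command would be truncated */

    if ((pp = popen(cmd, "r")) == NULL)
        return -1;
    while (fgets(line, sizeof line, pp) != NULL) {
        len = strlen(line);
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';        /* strip the newline find prints */
        if (visit != NULL)
            visit(line, ctx);
        count++;
    }
    if (pclose(pp) != 0)
        return -1;                       /* find failed or exited non-zero */
    return count;
}
```

Note that the pattern '[1-9]*' also matches names such as 1abc, so a caller would still want a check along the lines of islegal() in the posted expire.c before treating a path as an article.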
chuqui@nsc.UUCP (Chuq Von Rospach) (04/24/84)
<*sigh* Here we go again folks....> I have had a couple of reports that parts of the expire program don't seem to work properly. The common denominator in this is USG. I run a BSD system, and I have no way of testing the thing on anything else. If you are running a USG system, consider the -r and -h options flakey and/or broken. I am looking through things to see if I can find any blatant BSDisms, but unless I get lucky I doubt I'll find too much. If someone with access to USG wants to fix this thing and post the changes, I'll make sure they get integrated into my copy and try to make sure they get into future releases as well... The -r and -h options DO WORK on BSD systems (both 4.1 and 4.2). chuq -- From under the bar at Callahan's: Chuq Von Rospach {amd70,fortune,hplabs,menlo70}!nsc!chuqui (408) 733-2600 x242 ninety nine dead baboons, sitting in my living room...