rs@uunet.UU.NET (Rich Salz) (08/07/87)
Submitted-by: sundc!hqda-ai!merlin@seismo.CSS.GOV (David S. Hayes)
Posting-number: Volume 10, Issue 100
Archive-name: agef

[ This is like "du" for files that are, e.g., 1-5 days old, 6-10, etc.
  As David says, it's useful for Usenet administration. --r$ ]

This doesn't seem like much, but it is very helpful when working
with news spooling directories. I run this over my news every
night, to see where my disk space is being used.

Rich: when you get this, please send an acknowledgement back so I
know it didn't get lost in transit.

David S. Hayes, The Merlin of Avalon    PhoneNet: (202) 694-6900
UUCP: *!seismo!sundc!hqda-ai!merlin     ARPA: merlin%hqda-ai@seismo.css.gov

# This is a shell archive. Remove anything before this line, then
# unpack it by saving it in a file and typing "sh file". (Files
# unpacked will be owned by you and have default permissions.)
#
# This archive contains:
# README Makefile agef.8 agef.c customize.h direct.c hash.c hash.h patchlevel.h
#
#!/bin/sh
echo x - README
cat > "README" << '//E*O*F README//'
This is the third revision of the AGEF (please pronounce
this AGE-F, for "age files") program, which was initially posted to
net.sources March 2, 1987. I expected this to be a sleeper, but
many people seem to use it, and I have received far more comments
and suggestions than I anticipated.

CHANGES:

Multiply-linked inodes are only counted once. As each inode
is examined, the device/inode pair is entered into a hash table.
The hashing code is included; you don't need any library support
for it.

AGEF has been dependent on the UCB directory reading
routines. Public-domain routines for System V have been released
to the Usenet (comp.sources.unix) by Doug Gwyn (gwyn@brl.mil).
AGEF has been modified to use them. If you don't have them,
they're worth your trouble to get. Still, you may be able to use
the System III configuration of the Makefile as a stopgap measure.

The age categories may now be entered on the command line.
Use the -d (days) switch.
The program can now age by inode change time (-c), file
modification time (-m), or time of last access (-a).

THANKS:

I am particularly indebted to the following people.

Paul Czarnecki (harvard!munsell!pac) suggested the display of
sizes in megabytes when the numbers get too big, the use of
st_blocks to show actual disk blocks used, and gave me the code for
user-specified age categories.

Anders Andersson (enea!kuling!andersa) suggested the method
of handling multiply-linked inodes. His suggestion neatly prevented
double-counting, and also allows the handling of "." and ".." as
arguments. AGEF previously choked on those.

Paul Czarnecki, Anders Andersson, Karl Nyberg, Andrew Partan,
and Joel McClung acted as my alpha-testers. Cyrus Rahman, Sid
Shapiro, Lyndon Nerenberg, and Lloyd Taylor were the beta-test
crew. My thanks to all of them.

I am pleased to see that my work has been useful. If you
find bugs in it, I'd like to hear about them.

Happy hacking,

David S. Hayes, The Merlin of Avalon
Phone: (202) 694-6900
UUCP:  *!mimsy!hqda-ai!merlin
ARPA:  merlin%hqda-ai@mimsy.umd.EDU
Smail: merlin@hqda-ai.UUCP
//E*O*F README//

echo x - Makefile
cat > "Makefile" << '//E*O*F Makefile//'
# Build AGEF v3
#
# SCCS ID: @(#)Makefile 1.6 7/9/87
#
# Define the type of system we're working with. Three
# choices:
#
# 1. BSD Unix 4.2 or 4.3. Directory read support in the
# standard library, so we don't have to do much. Select BSD.
#
# 2. System V. I depend on Doug Gwyn's directory reading
# routines. They were posted to Usenet "comp.sources" early in
# May 1987. They're worth the effort to get, if you don't have
# them already. Select SYS_V. Be sure to define NLIB to be the
# 'cc' option to include the directory library.
#
# 3. System III, or machines without any directory read
# packages. I have a minimal kludge. Select SYS_III.
#
# Case 1:
SYS= -DBSD
NLIB=
# Case 2:
#SYS= -DSYS_V
#NLIB= -lndir
# Case 3:
#SYS= -DSYS_III
#NLIB=

# Standard things you may wish to change:
#
# INSTALL   directory to install agef in

INSTALL = /usr/local/bin

# The following OPTIONS may be defined:
#
# LSTAT     we have the lstat(2) system call (BSD only)
# HSTATS    print hashing statistics at end of run
#
# Define LSTAT, HSTATS here
OPTIONS = -DLSTAT

# END OF USER-DEFINED OPTIONS

CFLAGS= -O $(SYS) $(OPTIONS)
SRCS= agef.c hash.c direct.c \
	hash.h customize.h patchlevel.h
OBJS= agef.o hash.o

install: agef
	install -m 0511 agef $(INSTALL)

clean:
	rm -f $(OBJS) agef *~

agef: $(OBJS)
	cc -o agef $(CFLAGS) $(OBJS) $(NLIB)

agef.o: agef.c direct.c hash.h customize.h patchlevel.h

hash.o: hash.c hash.h customize.h patchlevel.h
//E*O*F Makefile//

echo x - agef.8
cat > "agef.8" << '//E*O*F agef.8//'
.\"SCCS ID @(#)agef.8 1.6 7/9/87
.TH AGEF 8 "28 March 1987"
.SH SYNOPSIS
.B agef
[-m | -a | -c ] [-l] [-d days-list]
.I file file ...
.SH DESCRIPTION
.B Agef
is a tool intended to help manage the expiration of Usenet news.
It displays a table of file sizes and counts, sorted by age.
Each argument has one line in the table.
The columns show the number of files, and their total size.
Normally, each argument would be a directory.
.B Agef
displays the total for all files in the directory, and
recursively through all subdirectories.
If no arguments are given,
.B agef
examines the current directory.
.PP
.B Agef
works with inodes, not files.
Each inode examined is remembered internally.
If it is subsequently encountered again, it is ignored.
This will occur in the case of news articles cross-posted
to several different newsgroups.
There is only one file, but there is a link to it in the
directory of every newsgroup where the article was posted.
.PP
Because
.B agef
is designed to work with news articles, it does not count the
sizes of directory files in its tallies.
.PP
File ages are based on the modification time (default).
This value is set whenever the file is written.
Changes in ownership and permission do not affect it.
.SH OPTIONS
.IP -l
Do not follow symbolic links.
This applies to BSD systems only.
Without this switch,
.B agef
counts the file the link refers to.
With -l, it counts the link itself.
.IP -m
Use date of last modification.
.IP -c
Use date of last inode change.
.IP -a
Use date of last access.
.IP "-d \fIdays-list\fP"
Specify the age categories.
.I days-list
is a list of comma-separated numbers.
The default is "7,14,30,45".
.B Agef
will add two additional columns.
The first counts files older than the oldest specified time.
The second is a total for all files.
The times must be specified in ascending numerical sequence.
.ne 15v
.SH EXAMPLE
.nf
% cd /usr/man
% agef -a -d 7,14,21 man[1-8]
    7 days    14 days    21 days    Over 21      Total  Name
---------- ---------- ---------- ---------- ----------  ----
  25  195k    4   26k   11   76k  398 1398k  438 1695k  man1
   2    5k                        133  394k  135  399k  man2
  10   34k               1    5k  784 1218k  795 1257k  man3
                                   61  351k   61  351k  man4
   3    5k               3   12k   66  300k   72  317k  man5
                                   28   50k   28   50k  man6
                                   10   45k   10   45k  man7
   5   26k               3   20k  152  434k  160  480k  man8

  45  265k    4   26k   18  113k 1632 4190k 1699 4594k  Grand Total
.fi
.SH AUTHOR
David S. Hayes, Site Manager, US Army Artificial Intelligence Center.
This program is in the public domain.
This manual page describes version 3, which is a considerable
improvement over the original.
Much of the credit for this goes to Paul Czarnecki and
Anders Andersson for their suggestions and bug fixes.
.SH BUGS
This program uses the directory reading routines of 4.2BSD.
Suitable directory routines are available from the Usenet
comp.sources archives, courtesy of Doug Gwyn (gwyn@brl.mil),
to allow this program to run under System V.
.LP
.B Agef
uses the
.I st_blocks
value from
.I stat(2)
to determine file size.
Files under 4.2BSD may contain "holes", that is, a 1-megabyte file
may not actually have enough disk blocks allocated to hold a megabyte.
The sizes reported are indicative of the actual number of disk
blocks used.
This is not a bug, just a word of caution.
.LP
Other bugs may be reported to the author via e-mail to
.sp
.nf
.in +.5i
Internet:       merlin%hqda-ai@seismo.CSS.GOV
UUCP:           seismo!sundc!hqda-ai!merlin
Smart mailers:  merlin@hqda-ai.UUCP
//E*O*F agef.8//

echo x - agef.c
cat > "agef.c" << '//E*O*F agef.c//'
/* agef

   SCCS ID @(#)agef.c 1.6 7/9/87

   David S. Hayes, Site Manager
   Army Artificial Intelligence Center
   Pentagon HQDA (DACS-DMA)
   Washington, DC 20310-0200

   Phone: 202-694-6900
   Email: merlin@hqda-ai.UUCP
          merlin%hqda-ai@seismo.CSS.GOV

   +=======================================+
   | This program is in the public domain. |
   +=======================================+

   This program determines the amount of disk space taken up by
   files in a named directory. The space is broken down according
   to the age of the files. The typical use for this program is to
   determine what the aging breakdown of news articles is, so that
   appropriate expiration times can be established.

   Call via

        agef fn1 fn2 fn3 ...

   If any of the given filenames is a directory (the usual case),
   agef will recursively descend into it, and the output line will
   reflect the accumulated totals for all files in the directory.
*/

#include "patchlevel.h"

#include <ctype.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <stdio.h>

#include "customize.h"
#include "hash.h"

#define SECS_DAY    (24L * 60L * 60L)   /* seconds in one day */
#define TOTAL       (n_ages-1)          /* column number of total
                                         * column */
#define SAME        0                   /* for strcmp */
#define MAXUNS      ((unsigned) -1L)
#define MAXAGES     40                  /* max number of age columns */
#define BLOCKSIZE   512                 /* size of a disk block */

#define K(x)        (((x) + 1) >> 1)    /* convert stat(2) blocks into
                                         * k's. On my machine, a block
                                         * is 512 bytes.
                                         */

extern char *optarg;            /* from getopt(3) */
extern int  optind, opterr;

char    *Program;               /* our name */
short   sw_follow_links = 1;    /* follow symbolic links */

/* Types of inode times for sw_time. */
#define MODIFIED    1
#define CHANGED     2
#define ACCESSED    3
short   sw_time = MODIFIED;

short   sw_summary;             /* print Grand Total line */

short   n_ages = 0;             /* how many age categories */
unsigned ages[MAXAGES];         /* age categories */
int     inodes[MAXAGES];        /* inode count */
long    sizes[MAXAGES];         /* block count */

char    topdir[NAMELEN];        /* our starting directory */
long    today, time();          /* today's date */

main(argc, argv)
    int     argc;
    char    *argv[];
{
    int     i, j;
    int     option;
    int     total_inodes[MAXAGES];      /* for grand totals */
    long    total_sizes[MAXAGES];

    Program = *argv;            /* save our name for error messages */

    /* Pick up options from command line */
    while ((option = getopt(argc, argv, "lsmacd:")) != EOF) {
        switch (option) {
        case 'l':               /* agef.8 documents -l; */
        case 's':               /* -s is still accepted */
            sw_follow_links = 0;
            break;

        case 'm':
            sw_time = MODIFIED;
            break;

        case 'a':
            sw_time = ACCESSED;
            break;

        case 'c':
            sw_time = CHANGED;
            break;

        case 'd':
            n_ages = 0;
            while (*optarg) {
                ages[n_ages] = atoi(optarg);    /* get day number */
                if (ages[n_ages] == 0)  /* check, was it a number */
                    break;              /* no, exit the while loop */
                n_ages++;
                while (isdigit(*optarg))        /* advance over the
                                                 * digits */
                    optarg++;
                if (*optarg == ',')     /* skip a comma separator */
                    optarg++;
                if (n_ages > (MAXAGES - 2)) {
                    fprintf(stderr, "too many ages, max is %d\n",
                            MAXAGES - 2);
                    exit(-1);
                };
            };
            ages[n_ages++] = MAXUNS;    /* "Over" column */
            ages[n_ages++] = MAXUNS;    /* "Total" column */
            break;
        };
    };

    /* If user didn't specify ages, make up some that sound good. */
    if (!n_ages) {
        n_ages = 6;
        ages[0] = 7;
        ages[1] = 14;
        ages[2] = 30;
        ages[3] = 45;
        ages[4] = MAXUNS;
        ages[5] = MAXUNS;
    };

    /* If user didn't specify targets, inspect current directory. */
    if (optind >= argc) {
        optind = 1;             /* the scan loop starts at optind */
        argc = 2;
        argv[1] = ".";
    };

    /* Count targets only, not option words. */
    sw_summary = (argc - optind) > 1;   /* should we do a grand total?
                                         */

    getwd(topdir);              /* find out where we are */
    today = time(0) / SECS_DAY;

    make_headers();             /* print column headers */

    /* Zero out grand totals */
    for (i = 0; i < n_ages; i++)
        total_inodes[i] = total_sizes[i] = 0;

    /* Inspect each argument */
    for (i = optind; i < argc; i++) {
        for (j = 0; j < n_ages; j++)
            inodes[j] = sizes[j] = 0;

        chdir(topdir);          /* be sure to start from the same place */
        get_data(argv[i]);      /* this may change our cwd */

        display(argv[i], inodes, sizes);
        for (j = 0; j < n_ages; j++) {
            total_inodes[j] += inodes[j];
            total_sizes[j] += sizes[j];
        };
    };

    if (sw_summary) {
        putchar('\n');          /* blank line */
        display("Grand Total", total_inodes, total_sizes);
    };

#ifdef HSTATS
    fflush(stdout);
    h_stats();
#endif
    exit(0);
};

/*
 * Get the aged data on a file whose name is given. If the file is a
 * directory, go down into it, and get the data from all files inside.
 */
get_data(path)
    char    *path;
{
    struct stat stb;
    int     i;
    int     status;             /* result of stat or lstat */
    long    age;                /* file age in days */

#ifdef LSTAT
    if (sw_follow_links)
        status = stat(path, &stb);      /* follows symbolic links */
    else
        status = lstat(path, &stb);     /* doesn't follow symbolic links */
#else
    status = stat(path, &stb);
#endif

    /* If the stat failed, stb holds garbage; skip this name. */
    if (status < 0) {
        fprintf(stderr, "%s: can't stat %s\n", Program, path);
        return;
    };

    /* Don't do it again if we've already done it once. */
    if (h_enter(stb.st_dev, stb.st_ino) == OLD)
        return;

    if ((stb.st_mode & S_IFMT) == S_IFDIR)
        down(path);
    if ((stb.st_mode & S_IFMT) == S_IFREG) {
        switch (sw_time) {
        case MODIFIED:
            age = today - stb.st_mtime / SECS_DAY;
            break;
        case CHANGED:
            age = today - stb.st_ctime / SECS_DAY;
            break;
        case ACCESSED:
            age = today - stb.st_atime / SECS_DAY;
            break;
        };

        for (i = 0; i < TOTAL; i++) {
            if (age <= ages[i]) {
                inodes[i]++;
                sizes[i] += K(stb.st_blocks);
                break;
            };
        };
        inodes[TOTAL]++;
        sizes[TOTAL] += K(stb.st_blocks);
    };
}

/*
 * We ran into a subdirectory. Go down into it, and read everything
 * in there.
 */
down(subdir)
    char    *subdir;
{
    OPEN    *dp;                /* stream from a directory */
    char    cwd[NAMELEN];
    READ    *file;              /* directory entry */

    if ((dp = opendir(subdir)) == NULL) {
        fprintf(stderr, "%s: can't read %s/%s\n", Program, topdir, subdir);
        return;
    };

    getwd(cwd);                 /* remember where we are */
    chdir(subdir);              /* go into subdir */
    for (file = readdir(dp); file != NULL; file = readdir(dp))
        if (strcmp(NAME(*file), "..") != SAME)
            get_data(NAME(*file));

    chdir(cwd);                 /* go back where we were */
    closedir(dp);               /* shut down the directory */
}

/*
 * Print one line of the table.
 */
display(name, inodes, sizes)
    char    *name;
    int     inodes[];
    long    sizes[];
{
    char    tmpstr[30];
    int     i;

    for (i = 0; i < n_ages; i++) {
        tmpstr[0] = '\0';
        if (inodes[i] || i == TOTAL) {
            if (sizes[i] < 10000)
                sprintf(tmpstr, "%d %4ldk", inodes[i], sizes[i]);
            else
                sprintf(tmpstr, FLOAT_FORMAT, inodes[i], sizes[i] / 1000.0);
        };
        printf("%10s ", tmpstr);
    };
    printf(" %s\n", name);
}

/*
 * Print column headers, given the ages.
 */
make_headers()
{
    char    header[15];
    int     i;

    for (i = 0; i < TOTAL; i++) {
        if (ages[i] == MAXUNS)
            sprintf(header, "Over %d", ages[i - 1]);
        else
            sprintf(header, "%d %s", ages[i], ages[i] > 1 ? "days" : "day");
        printf("%10s ", header);
    };
    printf("     Total  Name\n");

    for (i = 0; i < n_ages; i++)
        printf("---------- ");
    printf(" ----\n");
}
//E*O*F agef.c//

echo x - customize.h
cat > "customize.h" << '//E*O*F customize.h//'
/* agef

   SCCS ID @(#)customize.h 1.6 7/9/87

   This is the customizations file. It changes our ideas of
   how to read directories.
*/

#define FLOAT_FORMAT "%d %#4.1fM"       /* if your printf does %# */
/*#define FLOAT_FORMAT "%d %4.1fM"      /* if it doesn't do %# */

#define NAMELEN 512             /* max size of a full pathname */

#ifdef BSD
# include <sys/dir.h>
# define OPEN DIR
# define READ struct direct
# define NAME(x) ((x).d_name)
#else
#ifdef SYS_V
/* Customize this. This is part of Doug Gwyn's package for */
/* reading directories. If you've put this file somewhere */
/* else, edit the next line.
 */
# include <sys/dirent.h>
# define OPEN struct direct
# define READ struct dirent
# define NAME(x) ((x).d_name)
#else
#ifdef SYS_III
# define OPEN FILE
# define READ struct direct
# define NAME(x) ((x).d_name)
# define INO(x) ((x).d_ino)
# include "direct.c"
#endif
#endif
#endif
//E*O*F customize.h//

echo x - direct.c
cat > "direct.c" << '//E*O*F direct.c//'
/* direct.c SCCS ID @(#)direct.c 1.6 7/9/87
 *
 * My own substitution for the berkeley reading routines,
 * for use on System III machines that don't have any other
 * alternative.
 */

#define NAMELENGTH 14
#define opendir(name) fopen(name, "r")
#define closedir(fp) fclose(fp)

struct dir_entry {              /* What the system uses internally. */
    ino_t   d_ino;
    char    d_name[NAMELENGTH];
};

struct direct {                 /* What these routines return. */
    ino_t   d_ino;
    char    d_name[NAMELENGTH];
    char    terminator;
};

/*
 * Read a directory, returning the next (non-empty) slot.
 */
READ *
readdir(dp)
    OPEN    *dp;
{
    static READ direct;

    /* This read depends on direct being similar to dir_entry. */
    while (fread(&direct, sizeof(struct dir_entry), 1, dp) != 0) {
        direct.terminator = '\0';
        if (INO(direct) != 0)
            return &direct;
    };

    return (READ *) NULL;
}
//E*O*F direct.c//

echo x - hash.c
cat > "hash.c" << '//E*O*F hash.c//'
/* hash.c SCCS ID @(#)hash.c 1.6 7/9/87
 * Hash table routines for AGEF. These routines keep the program from
 * counting the same inode twice. This can happen in the case of a
 * file with multiple links, as in a news article posted to several
 * groups. The use of a hashing scheme was suggested by Anders
 * Andersson of Uppsala University, Sweden. (enea!kuling!andersa)
 */

/* hash.c change history:
   28 March 1987   David S. Hayes (merlin@hqda-ai.UUCP)
        Initial version.
*/

#include <stdio.h>
#include <sys/types.h>
#include "hash.h"

static struct htable *tables[TABLES];

/* These are for statistical use later on.
 */
static int  hs_tables = 0,      /* number of tables allocated */
            hs_duplicates = 0,  /* number of OLD's returned */
            hs_buckets = 0,     /* number of buckets allocated */
            hs_extensions = 0,  /* number of bucket extensions */
            hs_searches = 0,    /* number of searches */
            hs_compares = 0,    /* total key comparisons */
            hs_longsearch = 0;  /* longest search */

/*
 * This routine takes in a device/inode, and tells whether it's been
 * entered in the table before. If it hasn't, then the inode is added
 * to the table. A separate table is maintained for each major device
 * number, so separate file systems each have their own table.
 */
h_enter(dev, ino)
    dev_t   dev;
    ino_t   ino;
{
    static struct htable *tablep = (struct htable *) 0;
    register struct hbucket *bucketp;
    register ino_t *keyp;
    int     i;

    hs_searches++;              /* stat, total number of calls */

    /*
     * Find the hash table for this device. We keep the table pointer
     * around between calls to h_enter, so that we don't have to locate
     * the correct hash table every time we're called. I don't expect
     * to jump from device to device very often.
     */
    if (!tablep || tablep->device != dev) {
        for (i = 0; tables[i] && tables[i]->device != dev;)
            i++;
        if (!tables[i]) {
            tables[i] = (struct htable *) malloc(sizeof(struct htable));
            if (tables[i] == NULL) {
                perror("can't malloc hash table");
                return NEW;
            };
            bzero(tables[i], sizeof(struct htable));
            tables[i]->device = dev;
            hs_tables++;        /* stat, new table allocated */
        };
        tablep = tables[i];
    };

    /* Which bucket is this inode assigned to? */
    bucketp = &tablep->buckets[ino % BUCKETS];

    /*
     * Now check the key list for that bucket. Just a simple linear
     * search. Keys occupy slots 0 through filled-1.
     */
    keyp = bucketp->keys;
    for (i = 0; i < bucketp->filled && *keyp != ino;)
        i++, keyp++;
    hs_compares += i + 1;       /* stat, total key comparisons */

    /* Stopping short of the end means the loop hit a match. */
    if (i < bucketp->filled) {
        hs_duplicates++;        /* stat, duplicate inodes */
        return OLD;
    };

    /* Longest search. Only new entries could be the longest. */
    if (bucketp->filled >= hs_longsearch)
        hs_longsearch = bucketp->filled + 1;

    /* Make room at the end of the bucket's key list. */
    if (bucketp->filled == bucketp->length) {
        /* No room, extend the key list. */
        if (!bucketp->length) {
            bucketp->keys = (ino_t *) calloc(EXTEND, sizeof(ino_t));
            if (bucketp->keys == NULL) {
                perror("can't malloc hash bucket");
                return NEW;
            };
            hs_buckets++;
        } else {
            bucketp->keys = (ino_t *)
                realloc(bucketp->keys,
                        (EXTEND + bucketp->length) * sizeof(ino_t));
            if (bucketp->keys == NULL) {
                perror("can't extend hash bucket");
                return NEW;
            };
            hs_extensions++;
        };
        bucketp->length += EXTEND;
    };

    /*
     * Store in the first free slot. The earlier pre-increment skipped
     * slot 0 and could write one element past the end of the list.
     */
    bucketp->keys[bucketp->filled++] = ino;
    return NEW;
}

/* Buffer statistics functions. Print 'em out. */
#ifdef HSTATS
void
h_stats()
{
    fprintf(stderr, "\nHash table management statistics:\n");
    fprintf(stderr, " Tables allocated: %d\n", hs_tables);
    fprintf(stderr, " Buckets used: %d\n", hs_buckets);
    fprintf(stderr, " Bucket extensions: %d\n\n", hs_extensions);
    fprintf(stderr, " Total searches: %d\n", hs_searches);
    fprintf(stderr, " Duplicate keys found: %d\n", hs_duplicates);
    if (hs_searches)
        fprintf(stderr, " Average key search: %d\n",
                hs_compares / hs_searches);
    fprintf(stderr, " Longest key search: %d\n", hs_longsearch);
    fflush(stderr);
}
#endif
//E*O*F hash.c//

echo x - hash.h
cat > "hash.h" << '//E*O*F hash.h//'
/* Defines for the agef hashing functions.
   SCCS ID @(#)hash.h 1.6 7/9/87
*/

#define BUCKETS 257             /* buckets per hash table */
#define TABLES  50              /* hash tables */
#define EXTEND  100             /* how much space to add to a bucket */

struct hbucket {
    int     length;             /* key space allocated */
    int     filled;             /* key space used */
    ino_t   *keys;
};

struct htable {
    dev_t   device;             /* device this table is for */
    struct hbucket buckets[BUCKETS];    /* the buckets of the table */
};

#define OLD 0                   /* inode was in hash already */
#define NEW 1                   /* inode has been added to hash */
//E*O*F hash.h//

echo x - patchlevel.h
cat > "patchlevel.h" << '//E*O*F patchlevel.h//'
/* Patchlevel for AGEF.
   SCCS ID @(#)patchlevel.h 1.6 7/9/87
*/

#define PATCHLEVEL V3.0
//E*O*F patchlevel.h//

echo Possible errors detected by \'wc\' [hopefully none]:
temp=/tmp/shar$$
trap "rm -f $temp; exit" 0 1 2 3 15
cat > $temp <<\!!!
      50     333    2136 README
      69     252    1458 Makefile
     115     603    3716 agef.8
     323    1088    7614 agef.c
      41     148     900 customize.h
      46     128     970 direct.c
     143     597    4283 hash.c
      22     106     625 hash.h
       6      13      92 patchlevel.h
     815    3268   21794 total
!!!
wc README Makefile agef.8 agef.c customize.h direct.c hash.c hash.h patchlevel.h | sed 's=[^ ]*/==' | diff -b $temp -
exit 0
--
David S. Hayes, The Merlin of Avalon    PhoneNet: (202) 694-6900
UUCP: *!seismo!sundc!hqda-ai!merlin     ARPA: merlin%hqda-ai@seismo.css.gov

--
Rich $alz                       "Anger is an energy"
Cronus Project, BBN Labs        rsalz@bbn.com
Moderator, comp.sources.unix    sources@uunet.uu.net
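
A note for readers who want to try the age-category logic on its own: the -d days-list handling described in agef.8 can be sketched as below. This is only an illustration, not the author's code. The names parse_days(), pick_column() and MAX_COLS are made up for the example, it assumes an ANSI C compiler rather than the K&R dialect of the archive, and unlike agef.c it rejects a malformed or non-ascending list instead of silently stopping at the first bad number.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_COLS 40     /* hypothetical; same ceiling as MAXAGES in agef.c */

/*
 * Parse a comma-separated days-list such as "7,14,30,45" into limits[].
 * One UINT_MAX sentinel is appended for the "Over" column.  Returns the
 * number of age columns including that sentinel (the Total column is
 * not counted), or -1 if the list is empty, malformed, or not strictly
 * ascending.
 */
int parse_days(const char *list, unsigned limits[], int max_cols)
{
    int n = 0;

    assert(list != NULL && limits != NULL && max_cols >= 2);

    while (*list != '\0') {
        char *end;
        unsigned long days;

        if (!isdigit((unsigned char) *list))
            return -1;                  /* not a number */
        days = strtoul(list, &end, 10);
        if (days == 0 || days > (unsigned long) INT_MAX)
            return -1;                  /* zero or absurdly large */
        if (n > 0 && days <= limits[n - 1])
            return -1;                  /* must be ascending */
        if (n >= max_cols - 1)
            return -1;                  /* no room left for "Over" */
        limits[n++] = (unsigned) days;

        list = end;
        if (*list == ',')
            list++;                     /* skip one separator */
        else if (*list != '\0')
            return -1;                  /* junk after the number */
    }
    if (n == 0)
        return -1;                      /* empty list */

    limits[n++] = UINT_MAX;             /* sentinel for the "Over" column */
    return n;
}

/*
 * Return the column a file of the given age (in days) is counted in:
 * the first column whose limit is at least the age, else the last
 * ("Over") column.  Negative ages land in column 0.
 */
int pick_column(long age_days, const unsigned limits[], int n_cols)
{
    int i;

    for (i = 0; i < n_cols - 1; i++)
        if (age_days <= (long) limits[i])
            return i;
    return n_cols - 1;
}
```

With the default list "7,14,30,45", parse_days() reports five columns; a file 8 days old falls in column 1 (the "14 days" column) and one 46 days old falls in the last ("Over 45") column.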