rs@uunet.UU.NET (Rich Salz) (08/07/87)
Submitted-by: sundc!hqda-ai!merlin@seismo.CSS.GOV (David S. Hayes)
Posting-number: Volume 10, Issue 100
Archive-name: agef
[ This is like "du" for files that are, e.g., 1-5 days old, 6-10, etc.
As David says, it's useful for Usenet administration. --r$ ]
This doesn't seem like much, but it is very helpful when working
with news spooling directories. I run this over my news every
night, to see where my disk space is being used.
Rich: when you get this, please send an knowledgement back so I
know it didn't get lost in transit.
David S. Hayes, The Merlin of Avalon PhoneNet: (202) 694-6900
UUCP: *!seismo!sundc!hqda-ai!merlin ARPA: merlin%hqda-ai@seismo.css.gov
# This is a shell archive. Remove anything before this line, then
# unpack it by saving it in a file and typing "sh file". (Files
# unpacked will be owned by you and have default permissions.)
#
# This archive contains:
# README Makefile agef.8 agef.c customize.h direct.c hash.c hash.h patchlevel.h
#
#!/bin/sh
echo x - README
cat > "README" << '//E*O*F README//'
This is the third revision of the AGEF (please pronounce this
AGE-F, for "age files") program, which was initially posted to
net.sources March 2, 1987. I expected this to be a sleeper, but
many people seem to use it, and I have received far more comments
and suggestions than I anticipated.
CHANGES:
Multiply-linked inodes are only counted once. As each inode
is examined, the device/inode pair in entered into a hash table.
The hashing code is included; you don't need any library support
for it.
AGEF has been dependent on the UCB directory reading
routines. Public-domain routines for System V have been released
to the Usenet (comp.sources.unix) by Doug Gwyn (gwyn@brl.mil).
AGEF has been modified to use them. If you don't have them,
they're worth your trouble to get. Still, you may be able to use
the System III configuration of the Makefile as a stopgap measure.
The age categories may now be entered on the command line.
Use the -d (days) switch.
The program can now age by inode change time (-c), file
modification time (-m), or time of last access (-a).
THANKS:
I am particularly indebted to the following people. Paul
Czarnecki (harvard!munsell!pac) suggested the display of sizes in
megabytes when the numbers get too big, the use of st_blocks to
show actual disk blocks used, and gave me the code for
user-specified age categories.
Anders Andersson (enea!kuling!andersa) suggested the method
of handling multiply-linked inodes. His suggestion neatly
prevented double-counting, and also allows the handling of "." and
".." as arguments. AGEF previously choked on those.
Paul Czarnecki, Anders Andersson, Karl Nyberg, Andrew Partan,
and Joel McClung acted as my alpha-testers. Cyrus Rahman, Sid
Shapiro, Lyndon Nerenberg, and Lloyd Taylor were the beta-test
crew. My thanks to all of them.
I am pleased to see that my work has been useful. If you
find bugs in it, I'd like to hear about them. Happy hacking,
David S. Hayes, The Merlin of Avalon
Phone: (202) 694-6900
UUCP: *!mimsy!hqda-ai!merlin
ARPA: merlin%hqda-ai@mimsy.umd.EDU
Smail: merlin@hqda-ai.UUCP
//E*O*F README//
echo x - Makefile
cat > "Makefile" << '//E*O*F Makefile//'
# Build AGEF v3
#
# SCCS ID: @(#)Makefile 1.6 7/9/87
#
# Define the type of system we're working with. Three
# choices:
#
# 1. BSD Unix 4.2 or 4.3. Directory read support in the
# standard library, so we don't have to do much. Select BSD.
#
# 2. System V. I depend on Doug Gwyn's directory reading
# routines. They were posted to Usenet "comp.sources" early in
# May 1987. They're worth the effort to get, if you don't have
# them already. Select SYS_V. Be sure to define NLIB to be the
# 'cc' option to include the directory library.
#
# 3. System III, or machines without any directory read
# packages. I have a minimal kludge. Select SYS_III.
#
# Case 1:
SYS= -DBSD
NLIB=
# Case 2:
#SYS= -DSYS_V
#NLIB= -lndir
# Case 3:
#SYS= -DSYS_III
#NLIB=
# Standard things you may wish to change:
#
# INSTALL directory to install agef in
INSTALL = /usr/local/bin
# The following OPTIONS may be defined:
#
# LSTAT we have the lstat(2) system call (BSD only)
# HSTATS print hashing statistics at end of run
#
# Define LSTAT, HSTATS here
OPTIONS = -DLSTAT
# END OF USER-DEFINED OPTIONS
CFLAGS= -O $(SYS) $(OPTIONS)
SRCS= agef.c hash.c direct.c \
hash.h customize.h patchlevel.h
OBJS= agef.o hash.o
install: agef
install -m 0511 agef $(INSTALL)
clean:
rm -f $(OBJS) agef *~
agef: $(OBJS)
cc -o agef $(CFLAGS) $(OBJS) $(NLIB)
agef.o: agef.c direct.c hash.h customize.h patchlevel.h
hash.o: hash.c hash.h customize.h patchlevel.h
//E*O*F Makefile//
echo x - agef.8
cat > "agef.8" << '//E*O*F agef.8//'
.\"SCCS ID @(#)agef.8 1.6 7/9/87
.TH AGEF 8 "28 March 1987"
.SH SYNOPSIS
.B agef
[-m | -a | -c ]
[-l]
[-d days-list]
.I file file ...
.SH DESCRPITION
.B Agef
is a tool intended to help manage the expiration of Usenet news.
It displays a table of file sizes and counts, sorted by age.
Each argument has one line in the table.
The columns show the number of files, and their total size.
Normally, each argument would be a directory.
.B Agef
displays the total for all files in the directory,
and recursively through all subdirectories.
If no arguments are given,
.B agef
examines the current directory.
.PP
.B Agef
works with inodes, not files.
Each inode examined is remembered internally.
If it is subsequently encountered again, it is ignored.
This will occur in the case of news articles cross-posted to
several different newsgroups. There is only one file, but there
is a link to it in the directory of every newsgroup where the
article was posted.
.PP
Because
.B agef
is designed to work with news articles, it does not count
the sizes of directory files in its tallies.
.PP
File ages are based on the modification time (default).
This value is set whenever the file is written.
Changes in ownership and permission do not affect it.
.SH OPTIONS
.IP -l
Do not follow symbolic links.
This applies to BSD systems only.
Without this switch,
.B agef
counts the file the link refers to.
With -l, it counts the link itself.
.IP -m
Use date of last modification.
.IP -c
Use date of last inode change.
.IP -a
Use date of last access.
.IP "-d \fIdays-list\fP"
Specify the age categories.
.I days-list
is a list of comma-separated numbers.
The default is "7,14,30,45".
.B Agef
will add two additional columns.
The first counts files older than the oldest specified time.
The second is a total for all files.
The times must be specified in ascending numerical sequence.
.ne 15v
.SH EXAMPLE
.nf
% cd /usr/man
% agef -a -d 7,14,21 man[1-8]
7 days 14 days 21 days Over 21 Total Name
---------- ---------- ---------- ---------- ---------- ----
25 195k 4 26k 11 76k 398 1398k 438 1695k man1
2 5k 133 394k 135 399k man2
10 34k 1 5k 784 1218k 795 1257k man3
61 351k 61 351k man4
3 5k 3 12k 66 300k 72 317k man5
28 50k 28 50k man6
10 45k 10 45k man7
5 26k 3 20k 152 434k 160 480k man8
45 265k 4 26k 18 113k 1632 4190k 1699 4594k Grand Total
.fi
.SH AUTHOR
David S. Hayes, Site Manager, US Army Artificial Intelligence
Center.
This program is in the public domain.
This manual page describes version 3, which is a considerable
improvement over the original.
Much of the credit for this goes to Paul Czarnecki and
Anders Andersson for their suggestions and bug fixes.
.SH BUGS
This program uses the directory reading routines of 4.2BSD.
Suitable directory routines are available from the
Usenet comp.sources archives, courtesy of Doug Gwyn (doug@brl.mil),
to allow this program to run under System V.
.LP
.B Agef
uses the
.I st_blocks
value from
.I stat(2)
to determine file size.
Files under 4.2BSD may contain "holes", that is, a 1-megabyte file
may not actually have enough disk blocks allocated to hold a
megabyte.
The sizes reported are indicative of the actual number of disk
blocks used.
This is not a bug, just a word of caution.
.LP
Other bugs may be reported to the author via e-mail to
.sp
.nf
.in +.5i
Internet: merlin%hqda-ai@seismo.CSS.GOV
UUCP: seismo!sundc!hqda-ai!merlin
Smart mailers: merlin@hqda-ai.UUCP
//E*O*F agef.8//
echo x - agef.c
cat > "agef.c" << '//E*O*F agef.c//'
/* agef
SCCS ID @(#)agef.c 1.6 7/9/87
David S. Hayes, Site Manager
Army Artificial Intelligence Center
Pentagon HQDA (DACS-DMA)
Washington, DC 20310-0200
Phone: 202-694-6900
Email: merlin@hqda-ai.UUCP merlin%hqda-ai@seismo.CSS.GOV
+=======================================+
| This program is in the public domain. |
+=======================================+
This program scans determines the amount of disk space taken up
by files in a named directory. The space is broken down
according to the age of the files. The typical use for this
program is to determine what the aging breakdown of news
articles is, so that appropriate expiration times can be
established.
Call via
agef fn1 fn2 fn3 ...
If any of the given filenames is a directory (the usual case),
agef will recursively descend into it, and the output line will
reflect the accumulated totals for all files in the directory.
*/
#include "patchlevel.h"
#include <ctype.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <stdio.h>
#include "customize.h"
#include "hash.h"
#define SECS_DAY (24L * 60L * 60L) /* seconds in one day */
#define TOTAL (n_ages-1) /* column number of total
* column */
#define SAME 0 /* for strcmp */
#define MAXUNS ((unsigned) -1L)
#define MAXAGES 40 /* max number of age columns */
#define BLOCKSIZE 512 /* size of a disk block */
#define K(x) ((x+1) >> 1) /* convert stat(2) blocks into
* k's. On my machine, a block
* is 512 bytes. */
extern char *optarg; /* from getopt(3) */
extern int optind,
opterr;
char *Program; /* our name */
short sw_follow_links = 1; /* follow symbolic links */
/* Types of inode times for sw_time. */
#define MODIFIED 1
#define CHANGED 2
#define ACCESSED 3
short sw_time = MODIFIED;
short sw_summary; /* print Grand Total line */
short n_ages = 0; /* how many age categories */
unsigned ages[MAXAGES]; /* age categories */
int inodes[MAXAGES];/* inode count */
long sizes[MAXAGES]; /* block count */
char topdir[NAMELEN];/* our starting directory */
long today,
time(); /* today's date */
main(argc, argv)
int argc;
char *argv[];
{
int i,
j;
int option;
int total_inodes[MAXAGES]; /* for grand totals */
long total_sizes[MAXAGES];
Program = *argv; /* save our name for error messages */
/* Pick up options from command line */
while ((option = getopt(argc, argv, "smacd:")) != EOF) {
switch (option) {
case 's':
sw_follow_links = 0;
break;
case 'm':
sw_time = MODIFIED;
break;
case 'a':
sw_time = ACCESSED;
break;
case 'c':
sw_time = CHANGED;
break;
case 'd':
n_ages = 0;
while (*optarg) {
ages[n_ages] = atoi(optarg); /* get day number */
if (ages[n_ages] == 0) /* check, was it a number */
break; /* no, exit the while loop */
n_ages++;
while (isdigit(*optarg)) /* advance over the
* digits */
optarg++;
if (*optarg == ',') /* skip a comma separator */
optarg++;
if (n_ages > (MAXAGES - 2)) {
fprintf(stderr, "too many ages, max is %d\n", MAXAGES - 2);
exit(-1);
};
};
ages[n_ages++] = MAXUNS; /* "Over" column */
ages[n_ages++] = MAXUNS; /* "Total" column */
break;
};
};
/* If user didn't specify ages, make up some that sound good. */
if (!n_ages) {
n_ages = 6;
ages[0] = 7;
ages[1] = 14;
ages[2] = 30;
ages[3] = 45;
ages[4] = MAXUNS;
ages[5] = MAXUNS;
};
/* If user didn't specify targets, inspect current directory. */
if (optind >= argc) {
argc = 2;
argv[1] = ".";
};
sw_summary = argc > 2; /* should we do a grant total? */
getwd(topdir); /* find out where we are */
today = time(0) / SECS_DAY;
make_headers(); /* print column heades */
/* Zero out grand totals */
for (i = 0; i < n_ages; i++)
total_inodes[i] = total_sizes[i] = 0;
/* Inspect each argument */
for (i = optind; i < argc; i++) {
for (j = 0; j < n_ages; j++)
inodes[j] = sizes[j] = 0;
chdir(topdir); /* be sure to start from the same place */
get_data(argv[i]); /* this may change our cwd */
display(argv[i], inodes, sizes);
for (j = 0; j < n_ages; j++) {
total_inodes[j] += inodes[j];
total_sizes[j] += sizes[j];
};
};
if (sw_summary) {
putchar('\n'); /* blank line */
display("Grand Total", total_inodes, total_sizes);
};
#ifdef HSTATS
fflush(stdout);
h_stats();
#endif
exit(0);
};
/*
* Get the aged data on a file whose name is given. If the file is a
* directory, go down into it, and get the data from all files inside.
*/
get_data(path)
char *path;
{
struct stat stb;
int i;
long age; /* file age in days */
#ifdef LSTAT
if (sw_follow_links)
stat(path, &stb); /* follows symbolic links */
else
lstat(path, &stb); /* doesn't follow symbolic links */
#else
stat(path, &stb);
#endif
/* Don't do it again if we've already done it once. */
if (h_enter(stb.st_dev, stb.st_ino) == OLD)
return;
if ((stb.st_mode & S_IFMT) == S_IFDIR)
down(path);
if ((stb.st_mode & S_IFMT) == S_IFREG) {
switch (sw_time) {
case MODIFIED:
age = today - stb.st_mtime / SECS_DAY;
break;
case CHANGED:
age = today - stb.st_ctime / SECS_DAY;
break;
case ACCESSED:
age = today - stb.st_atime / SECS_DAY;
break;
};
for (i = 0; i < TOTAL; i++) {
if (age <= ages[i]) {
inodes[i]++;
sizes[i] += K(stb.st_blocks);
break;
};
};
inodes[TOTAL]++;
sizes[TOTAL] += K(stb.st_blocks);
};
}
/*
* We ran into a subdirectory. Go down into it, and read everything
* in there.
*/
down(subdir)
char *subdir;
{
OPEN *dp; /* stream from a directory */
char cwd[NAMELEN];
READ *file; /* directory entry */
if ((dp = opendir(subdir)) == NULL) {
fprintf(stderr, "%s: can't read %s/%s\n", Program, topdir, subdir);
return;
};
getwd(cwd); /* remember where we are */
chdir(subdir); /* go into subdir */
for (file = readdir(dp); file != NULL; file = readdir(dp))
if (strcmp(NAME(*file), "..") != SAME)
get_data(NAME(*file));
chdir(cwd); /* go back where we were */
closedir(dp); /* shut down the directory */
}
/*
* Print one line of the table.
*/
display(name, inodes, sizes)
char *name;
int inodes[];
long sizes[];
{
char tmpstr[30];
int i;
for (i = 0; i < n_ages; i++) {
tmpstr[0] = '\0';
if (inodes[i] || i == TOTAL) {
if (sizes[i] < 10000)
sprintf(tmpstr, "%d %4ldk", inodes[i], sizes[i]);
else
sprintf(tmpstr, FLOAT_FORMAT, inodes[i], sizes[i] / 1000.0);
};
printf("%10s ", tmpstr);
};
printf(" %s\n", name);
}
/*
* Print column headers, given the ages.
*/
make_headers()
{
char header[15];
int i;
for (i = 0; i < TOTAL; i++) {
if (ages[i] == MAXUNS)
sprintf(header, "Over %d", ages[i - 1]);
else
sprintf(header, "%d %s", ages[i], ages[i] > 1 ? "days" : "day");
printf("%10s ", header);
};
printf(" Total Name\n");
for (i = 0; i < n_ages; i++)
printf("---------- ");
printf(" ----\n");
}
//E*O*F agef.c//
echo x - customize.h
cat > "customize.h" << '//E*O*F customize.h//'
/* agef
SCCS ID @(#)customize.h 1.6 7/9/87
This is the customizations file. It changes our ideas of
how to read directories.
*/
#define FLOAT_FORMAT "%d %#4.1fM" /* if your printf does %# */
/*#define FLOAT_FORMAT "%d %4.1fM" /* if it doesn't do %# */
#define NAMELEN 512 /* max size of a full pathname */
#ifdef BSD
# include <sys/dir.h>
# define OPEN DIR
# define READ struct direct
# define NAME(x) ((x).d_name)
#else
#ifdef SYS_V
/* Customize this. This is part of Doug Gwyn's package for */
/* reading directories. If you've put this file somewhere */
/* else, edit the next line. */
# include <sys/dirent.h>
# define OPEN struct direct
# define READ struct dirent
# define NAME(x) ((x).d_name)
#else
#ifdef SYS_III
# define OPEN FILE
# define READ struct direct
# define NAME(x) ((x).d_name)
# define INO(x) ((x).d_ino)
# include "direct.c"
#endif
#endif
#endif
//E*O*F customize.h//
echo x - direct.c
cat > "direct.c" << '//E*O*F direct.c//'
/* direct.c
SCCS ID @(#)direct.c 1.6 7/9/87
*
* My own substitution for the berkeley reading routines,
* for use on System III machines that don't have any other
* alternative.
*/
#define NAMELENGTH 14
#define opendir(name) fopen(name, "r")
#define closedir(fp) fclose(fp)
struct dir_entry { /* What the system uses internally. */
ino_t d_ino;
char d_name[NAMELENGTH];
};
struct direct { /* What these routines return. */
ino_t d_ino;
char d_name[NAMELENGTH];
char terminator;
};
/*
* Read a directory, returning the next (non-empty) slot.
*/
READ *
readdir(dp)
OPEN *dp;
{
static READ direct;
/* This read depends on direct being similar to dir_entry. */
while (fread(&direct, sizeof(struct dir_entry), 1, dp) != 0) {
direct.terminator = '\0';
if (INO(direct) != 0)
return &direct;
};
return (READ *) NULL;
}
//E*O*F direct.c//
echo x - hash.c
cat > "hash.c" << '//E*O*F hash.c//'
/* hash.c
SCCS ID @(#)hash.c 1.6 7/9/87
* Hash table routines for AGEF. These routines keep the program from
* counting the same inode twice. This can happen in the case of a
* file with multiple links, as in a news article posted to several
* groups. The use of a hashing scheme was suggested by Anders
* Andersson of Uppsala University, Sweden. (enea!kuling!andersa)
*/
/* hash.c change history:
28 March 1987 David S. Hayes (merlin@hqda-ai.UUCP)
Initial version.
*/
#include <stdio.h>
#include <sys/types.h>
#include "hash.h"
static struct htable *tables[TABLES];
/* These are for statistical use later on. */
static int hs_tables = 0, /* number of tables allocated */
hs_duplicates = 0, /* number of OLD's returned */
hs_buckets = 0, /* number of buckets allocated */
hs_extensions = 0, /* number of bucket extensions */
hs_searches = 0,/* number of searches */
hs_compares = 0,/* total key comparisons */
hs_longsearch = 0; /* longest search */
/*
* This routine takes in a device/inode, and tells whether it's been
* entered in the table before. If it hasn't, then the inode is added
* to the table. A separate table is maintained for each major device
* number, so separate file systems each have their own table.
*/
h_enter(dev, ino)
dev_t dev;
ino_t ino;
{
static struct htable *tablep = (struct htable *) 0;
register struct hbucket *bucketp;
register ino_t *keyp;
int i;
hs_searches++; /* stat, total number of calls */
/*
* Find the hash table for this device. We keep the table pointer
* around between calls to h_enter, so that we don't have to locate
* the correct hash table every time we're called. I don't expect
* to jump from device to device very often.
*/
if (!tablep || tablep->device != dev) {
for (i = 0; tables[i] && tables[i]->device != dev;)
i++;
if (!tables[i]) {
tables[i] = (struct htable *) malloc(sizeof(struct htable));
if (tables[i] == NULL) {
perror("can't malloc hash table");
return NEW;
};
bzero(tables[i], sizeof(struct htable));
tables[i]->device = dev;
hs_tables++; /* stat, new table allocated */
};
tablep = tables[i];
};
/* Which bucket is this inode assigned to? */
bucketp = &tablep->buckets[ino % BUCKETS];
/*
* Now check the key list for that bucket. Just a simple linear
* search.
*/
keyp = bucketp->keys;
for (i = 0; i < bucketp->filled && *keyp != ino;)
i++, keyp++;
hs_compares += i + 1; /* stat, total key comparisons */
if (i && *keyp == ino) {
hs_duplicates++; /* stat, duplicate inodes */
return OLD;
};
/* Longest search. Only new entries could be the longest. */
if (bucketp->filled >= hs_longsearch)
hs_longsearch = bucketp->filled + 1;
/* Make room at the end of the bucket's key list. */
if (bucketp->filled == bucketp->length) {
/* No room, extend the key list. */
if (!bucketp->length) {
bucketp->keys = (ino_t *) calloc(EXTEND, sizeof(ino_t));
if (bucketp->keys == NULL) {
perror("can't malloc hash bucket");
return NEW;
};
hs_buckets++;
} else {
bucketp->keys = (ino_t *)
realloc(bucketp->keys,
(EXTEND + bucketp->length) * sizeof(ino_t));
if (bucketp->keys == NULL) {
perror("can't extend hash bucket");
return NEW;
};
hs_extensions++;
};
bucketp->length += EXTEND;
};
bucketp->keys[++(bucketp->filled)] = ino;
return NEW;
}
/* Buffer statistics functions. Print 'em out. */
#ifdef HSTATS
void
h_stats()
{
fprintf(stderr, "\nHash table management statistics:\n");
fprintf(stderr, " Tables allocated: %d\n", hs_tables);
fprintf(stderr, " Buckets used: %d\n", hs_buckets);
fprintf(stderr, " Bucket extensions: %d\n\n", hs_extensions);
fprintf(stderr, " Total searches: %d\n", hs_searches);
fprintf(stderr, " Duplicate keys found: %d\n", hs_duplicates);
if (hs_searches)
fprintf(stderr, " Average key search: %d\n",
hs_compares / hs_searches);
fprintf(stderr, " Longest key search: %d\n", hs_longsearch);
fflush(stderr);
}
#endif
//E*O*F hash.c//
echo x - hash.h
cat > "hash.h" << '//E*O*F hash.h//'
/* Defines for the agef hashing functions.
SCCS ID @(#)hash.h 1.6 7/9/87
*/
#define BUCKETS 257 /* buckets per hash table */
#define TABLES 50 /* hash tables */
#define EXTEND 100 /* how much space to add to a bucket */
struct hbucket {
int length; /* key space allocated */
int filled; /* key space used */
ino_t *keys;
};
struct htable {
dev_t device; /* device this table is for */
struct hbucket buckets[BUCKETS]; /* the buckets of the table */
};
#define OLD 0 /* inode was in hash already */
#define NEW 1 /* inode has been added to hash */
//E*O*F hash.h//
echo x - patchlevel.h
cat > "patchlevel.h" << '//E*O*F patchlevel.h//'
/* Patchlevel for AGEF.
SCCS ID @(#)patchlevel.h 1.6 7/9/87
*/
#define PATCHLEVEL V3.0
//E*O*F patchlevel.h//
echo Possible errors detected by \'wc\' [hopefully none]:
temp=/tmp/shar$$
trap "rm -f $temp; exit" 0 1 2 3 15
cat > $temp <<\!!!
50 333 2136 README
69 252 1458 Makefile
115 603 3716 agef.8
323 1088 7614 agef.c
41 148 900 customize.h
46 128 970 direct.c
143 597 4283 hash.c
22 106 625 hash.h
6 13 92 patchlevel.h
815 3268 21794 total
!!!
wc README Makefile agef.8 agef.c customize.h direct.c hash.c hash.h patchlevel.h | sed 's=[^ ]*/==' | diff -b $temp -
exit 0
--
David S. Hayes, The Merlin of Avalon PhoneNet: (202) 694-6900
UUCP: *!seismo!sundc!hqda-ai!merlin ARPA: merlin%hqda-ai@seismo.css.gov
--
Rich $alz "Anger is an energy"
Cronus Project, BBN Labs rsalz@bbn.com
Moderator, comp.sources.unix sources@uunet.uu.net