[comp.sources.misc] v10i065: Watching disk usage

jv@mh.nl (Johan Vromans) (02/14/90)

Posting-number: Volume 10, Issue 65
Submitted-by: jv@mh.nl (Johan Vromans)
Archive-name: dusage.pl

Guarding disk space is one of the problems of system management. 

Some time ago I converted an old awk/sed/sh script to keep track of
disk usage to a new perl program, added new features, options; even
wrote a manual page.

Using a list of pathnames, this program filters the output of du(1) to
find the amount of disk space used for each of the paths (actually, it
collects all values in one single du run). It adds the new value to
the list, shifting old values up. It then generates a nice report of
the amount of disk space occupied in each of the specified paths,
together with the amount it grew (or shrinked) since the previous run,
and since 7 runs ago. When run daily, this gives daily and weekly
figures.

#---------------------------------- cut here ----------------------------------
# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by Johan Vromans <jv@mh.nl> on Sat Feb  3 21:36:57 1990
#
# This archive contains:
#	README		dusage.1	dusage.pl	
#
# Error checking via wc(1) will be performed.

LANG=""; export LANG

echo x - README
cat >README <<'@EOF'
Guarding disk space is one of the problems of system management. 

Some time ago I converted an old awk/sed/sh script to keep track of
disk usage to a new perl program, added new features, options; even
wrote a manual page.

Using a list of pathnames, this program filters the output of du(1) to
find the amount of disk space used for each of the paths (actually, it
collects all values in one single du run). It adds the new value to
the list, shifting old values up. It then generates a nice report of
the amount of disk space occupied in each of the specified paths,
together with the amount it grew (or shrinked) since the previous run,
and since 7 runs ago. When run daily, this gives daily and weekly
figures.

# This program requires perl version 3.0, patchlevel 4 or higher.

# Copyright 1990 Johan Vromans, all rights reserved.
# Peaceware. This program may be used, modified and distributed as long as
# this copyright notice remains part of the source. It may not be sold, or 
# be used to harm any living creature including the world and the universe.
@EOF
set `wc -lwc <README`
if test $1$2$3 != 211931064
then
	echo ERROR: wc results of README are $* should be 21 193 1064
fi

chmod 644 README

echo x - dusage.1
cat >dusage.1 <<'@EOF'
.TH DUSAGE 1
.SH NAME
dusage \- provide disk usage statistics
.SH SYNOPSIS
.B dusage
.RB [ \-afghruD ]
.RI "[\fB\-i\fR" " input" ]
.RI "[\fB\-p\fR" " dir" ]
.RI [ "control file" ]
.SH DESCRIPTION
.I Dusage
is a perl script which produces disk usage statistics. These
statistics include the number of blocks, the increment since the previous run
(which is assumed to be yesterday if run daily), and the increment
since 7 runs ago (which could be interpreted as a week if run daily).
.I Dusage
is driven by a 
.IR "control file" ,
which describes the names of the files (directories) to be reported,
and which also contains the results of previous runs.
.PP
When
.I dusage
is run, it reads the
.IR "control file" ,
[optionally] gathers new disk usage values by calling
.IR du (1),
prints the report, and [optionally] updates the
.I control file
with the new information.
.PP
Filenames in the control file may have wildcards. In this case, the
wildcards are expanded, and all entries reported. Both the expanded
names as the wildcard info are maintained in the control file. New
files in these directories will automatically show up, deleted files
will disappear when they have run out of data in the control file (but
see the 
.B \-r
option).
.br
Wildcard expansion only adds filenames which are not already on the list.
.PP
The control file may also contain filenames preceded with an
exclamation mark ``!''; these entries are skipped. This is meaningful
in conjunction with wildcards, to exclude entries which result from a
wildcard expansion.
.PP
The control file may have lines starting with a dash ``\-'',
which causes the report to start on a new page. Any text following the
dash is placed in the page header, immediately following the text
``Disk usage statistics''.
.PP
The available command line options are:
.TP 5
.B \-D
Turns on debugging, which yields lots of trace information.
.TP
.B \-a
Reports the statistics for this and all previous runs, as opposed to
the normal case, which is to generate the statistics for this run, and
the differences between the previous and 7th previous run.
.TP
.B \-f
Reports file statistics also. Default is to only report directories.
.TP
.B \-g
Gathers new data by calling 
.IR du (1).
.TP
.B \-h
Provides a help message. No work is done.
.TP
.BI \-i " input"
Uses
.I input
as data obtained by calling
.IR du (1).
.TP
.BI \-p " dir"
All filenames in the control file are interpreted relative to this
directory.
.TP
.B \-r
Retains entries which don't have any data anymore. If this option is
not used, entries without data are not reported, and removed from the
control file.
.TP
.B \-u
Update the control file with new values.
.PP
The default name for the control file is
.BR .du.ctl ,
optionally preceded by the name supplied with the
.B \-p
option.
.SH EXAMPLES
Given the following control file:
.sp
.nf
.ne 3
.in +.5i
\- for manual page
maildir
maildir/*
!maildir/unimportant
src
.in
.fi
.sp
This will generate the following (example) report when running the
command ``dusage -gu controlfile'':
.sp
.nf
.ne 7
.in +.5i
Disk usage statistics for manual page      Wed Jan 10 13:38

 blocks    +day     +week  directory
-------  -------  -------  --------------------------------
   6518                    maildir
      2                    maildir/dirent
    498                    src
.in
.fi
.sp
After updating the control file, it will contain:
.sp
.nf
.ne 4
.in +.5i
\- for manual page
maildir 6518::::::
maildir/dirent  2::::::
maildir/*
!maildir/unimportant
src     498::::::
.in
.fi
.sp
The names in the control file are separated by the values with a TAB;
the values are separated with colons. Also, the entries found by
expanding the wildcard are added. If the wildcard expansion had
generated a name ``maildir/unimportant'' it would have been skipped.
.br
When the program is rerun after one day, it could print the following
report:
.sp
.nf
.ne 7
.in +.5i
Disk usage statistics for manual page      Wed Jan 10 13:38

 blocks    +day     +week  directory
-------  -------  -------  --------------------------------
   6524       +6           maildir
      2        0           maildir/dirent
    486      -12           src
.in
.fi
.sp
The control file will contain:
.sp
.nf
.ne 4
.in +.5i
\- for manual page
maildir 6524:6518:::::
maildir/dirent  2:2:::::
maildir/*
!maildir/unimportant
src     486:498:::::
.in
.fi
.sp
It takes very little fantasy to imagine what will happen on subsequent
runs...
.PP
When the contents of the control file are to be changed, e.g. to add
new filenames, a normal text editor can be used. Just add or remove
lines, and they will be taken into account automatically.
.PP
When run without 
.B \-g
or
.B \-u
options, it actually reproduces the report from the previous run.
.PP
When multiple runs are required, save the output of
.IR du (1)
in a file, and pass this file to
.I dusage
using the 
.BI \-i "file"
option.
.SH BUGS
Running the same control file with different values of the 
.B \-f
and
.B \-r
options may cause strange results.
.SH AUTHOR
Johan Vromans, Multihouse Research, Gouda, The Netherlands.
.sp
Send bugs and remarks to <jv@mh.nl> .
@EOF
set `wc -lwc <dusage.1`
if test $1$2$3 != 2048505139
then
	echo ERROR: wc results of dusage.1 are $* should be 204 850 5139
fi

chmod 444 dusage.1

echo x - dusage.pl
sed 's/^@//' >dusage.pl <<'@EOF'
#!/usr/bin/perl

# This program requires perl version 3.0, patchlevel 4 or higher.

# Copyright 1990 Johan Vromans, all rights reserved.
# Peaceware. This program may be used, modified and distributed as long as
# this copyright notice remains part of the source. It may not be sold, or 
# be used to harm any living creature including the world and the universe.

$my_name = $0;

################ usage ################

sub usage {
  local ($help) = shift (@_);
  local ($usg) = "usage: $my_name [-afghruD][-i input][-p dir] ctlfile";
  die "$usg\nstopped" unless $help;
  print STDERR "$usg\n";
  print STDERR <<EndOfHelp

    -D          - provide debugging info
    -a          - provide all statis
    -f          - also report file statistics
    -g          - gather new data
    -h          - this help message
    -i input    - input data as obtained by 'du dir' [def = 'du dir']
    -p dir      - path to which files in the control file are relative
    -r          - do not discard entries which don't have data
    -u          - update the control file with new values
    ctlfile     - file which controls which dirs to report [def = dir/.du.ctl]
EndOfHelp
  ;
  exit 1;
}

################ main stream ################

&do_get_options;		# process options
&do_parse_ctl;			# read the control file
&do_gather if $gather;		# gather new info
&do_report_and_update;		# report and update

################ end of main stream ################

################ other subroutines ################

sub do_get_options {

  # Default values for options

  $debug = 0;
  $noupdate = 1;
  $retain = 0;
  $gather = 0;
  $allfiles = 0;
  $allstats = 0;

  # Command line options. We use a modified version of getopts.pl.

  &usage (0) if &Getopts ("Dafghi:p:ru");
  &usage (1) if $opt_h;
  &usage (0) if $#ARGV > 0;

  $debug    |= $opt_D if defined $opt_D;	# -D -> debug
  $allstats |= $opt_a if defined $opt_a;	# -a -> all stats
  $allfiles |= $opt_f if defined $opt_f;	# -f -> report all files
  $gather   |= $opt_g if defined $opt_g;	# -g -> gather new data
  $retain   |= $opt_r if defined $opt_r;	# -r -> retain old entries
  $noupdate = !$opt_u if defined $opt_u;	# -u -> update the control file
  $du        = $opt_i if defined $opt_i;	# -i input file
  if ( defined $opt_p ) {			# -p path
    $root = $opt_p;
    $root = $` while ($root =~ m|/$|);
    $prefix = "$root/";
    $root = "/" if $root eq "";
  }
  else {
    $prefix = $root = "";
  }
  $table    = ($#ARGV == 0) ? shift (@ARGV) : "$prefix.du.ctl";
  $runtype = $allfiles ? "file" : "directory";
  if ($debug) {
    print STDERR "@(#)@ dusage	1.7 - dusage.pl\n";
    print STDERR "Options:";
    print STDERR " debug" if $debug;	# silly, isn't it...
    print STDERR $noupdate ? " no" : " ", "update";
    print STDERR $retain ? " " : " no", "retain";
    print STDERR $gather ? " " : " no", "gather";
    print STDERR $allstats ? " " : " no", "allstats";
    print STDERR "\n";
    print STDERR "Root = $root [prefix = $prefix]\n";
    print STDERR "Control file = $table\n";
    print STDERR "Input data = $du\n" if defined $du;
    print STDERR "Run type = $runtype\n";
    print STDERR "\n";
  }
}

sub do_parse_ctl {

  # Parsing the control file.
  #
  # This file contains the names of the (sub)directories to tally,
  # and the values dereived from previous runs.
  # The names of the directories are relative to the $root.
  # The name may contain '*' or '?' characters, and will be globbed if so.
  # An entry starting with ! is excluded.
  #
  # To add a new dir, just add the name. The special name '.' may 
  # be used to denote the $root directory. If used, '-p' must be
  # specified.
  #
  # Upon completion:
  #  - %oldblocks is filled with the previous values,
  #    colon separated, for each directory.
  #  - @targets contains a list of names to be looked for. These include
  #    break indications and globs info, which will be stripped from
  #    the actual search list.

  open (tb, "<$table") || die "Cannot open control file $table, stopped";
  @targets = ();
  %oldblocks = ();
  %newblocks = ();

  while ($tb = <tb>) {
    chop ($tb);

    # preferred syntax: <dir><TAB><size>:<size>:....
    # allowable	      <dir><TAB><size> <size> ...
    # possible	      <dir>

    if ( $tb =~ /^-/ ) {	# break
      push (@targets, "$tb");
      printf STDERR "tb: *break* $tb\n" if $debug;
      next;
    }

    if ( $tb =~ /^!/ ) {	# exclude
      $excl = $';		#';
      @a = grep ($_ ne $excl, @targets);
      @targets = @a;
      push (@targets, "*$tb");
      printf STDERR "tb: *excl* $tb\n" if $debug;
      next;
    }

    if ($tb =~ /^(.+)\t([\d: ]+)/) {
      $name = $1;
      @blocks = split (/[ :]/, $2);
    }
    else {
      $name = $tb;
      @blocks = ("","","","","","","","");
    }

    if ($name eq ".") {
      if ( $root eq "" ) {
	printf STDERR "Warning: \".\" in control file w/o \"-p path\" - ignored\n";
	next;
      }
      $name = $root;
    } else {
      $name = $prefix . $name unless ord($name) == ord ("/");
    }

    # Check for globs ...
    if ( $gather && $name =~ /\*|\?/ ) {
      print STDERR "glob: $name\n" if $debug;
      foreach $n ( <${name}> ) {
	next unless $allfiles || -d $n;
	# Globs never overwrite existing entries
	if ( !defined $oldblocks{$n} ) {
	  $oldblocks{$n} = ":::::::";
	  push (@targets, $n);
	}
	printf STDERR "glob: -> $n\n" if $debug;
      }
      # Put on the globs list, and terminate this entry
      push (@targets, "*$name");
      next;
    }

    push (@targets, "$name");
    # Entry may be rewritten (in case of globs)
    $oldblocks{$name} = join (":", @blocks[0..7]);

    print STDERR "tb: $name\t$oldblocks{$name}\n" if $debug;
  }
  close (tb);
}

sub do_gather {

  # Build a targets match string, and an optimized list of directories to
  # search.
  $targets = "//";
  @list = ();
  $last = "///";
  foreach $name (sort (@targets)) {
    next if $name =~ /^[-*]/;
    next unless $allfiles || -d $name;
    $targets .= "$name//"; 
    next if ($name =~ m|^$last/|);
    push (@list, $name);
    $last = $name;
  }

  print STDERR "targets: $targets\n" if $debug;
  print STDERR "list: @list\n" if $debug;
  print STDERR "reports: @targets\n" if $debug;

  $du = "du " . ($allfiles ? "-a" : "") . " @list|"
    unless defined $du; # in which case we have a data file

  # Process the data. If a name is found in the target list, 
  # %newblocks will be set to the new blocks value.

  open (du, "$du") || die "Cannot get data from $du, stopped";
  while ($du = <du>) {
    chop ($du);
    ($blocks,$name) = split (/\t/, $du);
    if (($i = index ($targets, "//$name//")) >= 0) {
      # tally and remove entry from search list
      $newblocks{$name} = $blocks;
      print STDERR "du: $name $blocks\n" if $debug;
      substr ($targets, $i, length($name) + 2) = "";
    }
  }
  close (du);
}


# Report generation

format std_hdr =
Disk usage statistics@<<<<<<<<<<<<<<<<<<<<<@<<<<<<<<<<<<<<<
$subtitle, $date

 blocks    +day     +week  @<<<<<<<<<<<<<<<
$runtype
-------  -------  -------  --------------------------------
@.
format std_out =
@@>>>>>> @>>>>>>> @>>>>>>>  ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
$blocks, $d_day, $d_week, $name
@.

format all_hdr =
Disk usage statistics@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<           @<<<<<<<<<<<<<<<
$subtitle, $date

 --0--    --1--    --2--    --3--    --4--    --5--    --6--    --7--   @<<<<<<<<<<<<<<<
$runtype
-------  -------  -------  -------  -------  -------  -------  -------  --------------------------------
@.
format all_out =
@@>>>>>> @>>>>>>> @>>>>>>> @>>>>>>> @>>>>>>> @>>>>>>> @>>>>>>> @>>>>>>>  ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
$a[0],  $a[1],   $a[2],   $a[3],   $a[4],   $a[5],   $a[6],   $a[7],    $name
@.

sub do_report_and_update {

  # Prepare update of the control file
  if ( !$noupdate ) {
    if ( !open (tb, ">$table") ) {
      print STDERR "Warning: cannot update control file $table - continuing\n";
      $noupdate = 1;
    }
  }

  if ( $allstats ) {
    $^ = "all_hdr";
    $~ = "all_out";
  }
  else {
    $^ = "std_hdr";
    $~ = "std_out";
  }
  $date = `date`;
  chop ($date);

  # In one pass the report is generated, and the control file rewritten.

  foreach $name (@targets) {
    if ($name =~ /^-/ ) {
      $subtitle = $';				#';
      print tb "$name\n" unless $noupdate;
      print STDERR "tb: $name\n" if $debug;
      $- = -1;
      next;
    }
    if ($name  =~ /^\*$prefix/ ) {
      print tb "$'\n" unless $noupdate;		#';
      print STDERR "tb: $'\n" if $debug;	#';
      next;
    }
    @a = split (/:/, $oldblocks{$name});
    unshift (@a, $newblocks{$name}) if $gather;
    $name = "." if $name eq $root;
    $name = $' if $name =~ /^$prefix/;		#';
    if ($#a < 0) {	# no data?
      if ($retain) {
	@a = ("","","","","","","","");
      }
      else {
	# Discard
	print STDERR "--: $name\n" if $debug;
	next;
      }
    }
    print STDERR "Warning: ", 1+$#a, " entries for $name\n"
      if ($debug && $#a != 8);
    $line = "$name\t" . join(":",@a[0..7]) . "\n";
    print tb $line unless $noupdate;
    print STDERR "tb: $line" if $debug;

    $blocks = $a[0];
    if ( !$allstats ) {
      $d_day = $d_week = "";
      if ($blocks ne "") {
	if ($a[1] ne "") {		# dayly delta
	  $d_day = $blocks - $a[1];
	  $d_day = "+" . $d_day if $d_day > 0;
	}
	if ($a[7] ne "") {		# weekly delta
	  $d_week = $blocks - $a[7];
	  $d_week = "+" . $d_week if $d_week > 0;
	}
      }
    }
    write;
  }

  # Close control file, if opened
  close (tb) unless $noupdate;
}

# Modified version of getopts ...

sub Getopts {
    local($argumentative) = @_;
    local(@args,$_,$first,$rest);
    local($opterr) = 0;

    @args = split( / */, $argumentative );
    while(($_ = $ARGV[0]) =~ /^-(.)(.*)/) {
	($first,$rest) = ($1,$2);
	$pos = index($argumentative,$first);
	if($pos >= $[) {
	    if($args[$pos+1] eq ':') {
		shift(@ARGV);
		if($rest eq '') {
		    $rest = shift(@ARGV);
		}
		eval "\$opt_$first = \$rest;";
	    }
	    else {
		eval "\$opt_$first = 1";
		if($rest eq '') {
		    shift(@ARGV);
		}
		else {
		    $ARGV[0] = "-$rest";
		}
	    }
	}
	else {
	    print stderr "Unknown option: $first\n";
	    $opterr++;
	    if($rest ne '') {
		$ARGV[0] = "-$rest";
	    }
	    else {
		shift(@ARGV);
	    }
	}
    }
    return $opterr;
}
@EOF
set `wc -lwc <dusage.pl`
if test $1$2$3 != 379157810356
then
	echo ERROR: wc results of dusage.pl are $* should be 379 1578 10356
fi

chmod 444 dusage.pl

exit 0
--
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------