[alt.sources] directory clusters

jv@squirrel.mh.nl (Johan Vromans) (03/05/90)

I usually maintain several releases of programs on-line, and find the
following program very useful.

It inspects the contents of two directories, and links files which are
identical. This cuts down on the disk usage without losing
information.

For example: suppose you have two directories called NEW and OLD, with the
following files:
	NEW/one		OLD/one		same name, but different
	NEW/two		OLD/two		same contents
	NEW/three	OLD/four	unique

Then running lndir.pl will remove NEW/two and replace it by a link to
OLD/two. The other files will be untouched.

Multiple directories can be clustered, e.g.:

	lndir perl3.0.4 perl3.0.0
	lndir perl3.0.6 perl3.0.4
	lndir perl3.0.8 perl3.0.6
	lndir perl3.0.12 perl3.0.8

etc. Specify the new directory first.

Have fun!
Comments are welcome - send them to jv@mh.nl .

------ start of lndir.pl -- ascii -- complete ------
#!/usr/bin/perl
#
#	@(#)lndir.pl	1.4
#
# Compare two directories, and link all files which are identical.
# For a cluster of directories, specify the NEW directory first
#
# It is not recursive (yet).
#
# This program requires Perl version 3.0, patchlevel 8 or later.

# Check args && fetch names.
if ( $#ARGV != $[ + 1 
    || (!( -d ($newdir = shift(@ARGV)) && -d ($refdir = shift(@ARGV))))) {
  print STDERR "Usage: $0 new-dir ref-dir\n";
  exit 1;
}

# Get all files in the new dir. Use readdir since it is faster than
# globbing, but also because we need to include .-files.
opendir (newdir, $newdir) || die "$!: $newdir\nStopped";
@all = readdir (newdir);
closedir (newdir);

$|=1;

for $file ( @all ) {
  $ok = 1;

  # Get stat info for both files
  @stat1 = stat("$newdir/$file");
  if ($#stat1 < $[) {		#];
    printf "$newdir/$file ... $! - skipped\n";
    $ok = 0;
  }
  @stat2 = stat("$refdir/$file");
  if ($#stat2 < $[) {		#];
    printf "$refdir/$file ... $! - skipped\n";
    $ok = 0;
  }
  next unless $ok;

  printf "$newdir/$file ... ";
  if ( ! -f "$newdir/$file" ) {
    printf "not a plain file - skipped\n";
    next;
  }
  if ( ! -f "$refdir/$file" ) {
    printf "not in $refdir\n";
    next;
  }

  # Quick check on size, if equal: use cmp
  if (($stat1[7] != $stat2[7])
      || system ("cmp -s $newdir/$file $refdir/$file")) {
    printf "differ\n";
    next;
  }

  # Already linked? Compare dev/inode numbers
  if (($stat1[0] == $stat2[0]) && ($stat1[1] == $stat2[1])) {
    printf "identical to $refdir/$file\n";
    next;
  }

  # Okay, let's link
  if (unlink ("$newdir/$file") && link ("$refdir/$file", "$newdir/$file")) {
    printf "linked to $refdir/$file\n";
    next;
  }

  printf "$!\n";
}
------ end of lndir.pl -- ascii -- complete ------
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------