[comp.lang.perl] Dcmp - recursive directory comparison

lgy@phys.washington.edu (Laurence G. Yaffe) (03/13/91)

    I frequently use a recursive directory comparison program to compare
the contents of entire directory trees.  I recently decided to rewrite
my previous shell based 'dcmp' program in Perl (and of course, add various
bells and whistles).  Appended below is the result (my first serious Perl
programming).  If others find it useful, great.  Comments on Perl programming
style welcome.

    By default, this program reports on files present in one directory tree,
but absent in the other, and files present in both trees but with differing
contents, ownerships or permissions.  'Diff' may run on differing text files
to display differences.  Its output is designed to be terse but informative.
(I find the sysV "dircmp" unpleasantly verbose and insufficiently flexible.)
It's wrapped with its own man page (but not shar'ed).

Enjoy

--------------------------------------------------------------------------
Laurence G. Yaffe		Internet: lgy@newton.phys.washington.edu
University of Washington	Bitnet:   yaffe@uwaphast.bitnet
--------------------------------------------------------------------------
#!/usr/local/bin/perl
'di';
'ig00';
#
# Usage: dcmp [options] dir1 dir2 [pattern]
#
# Compare all (or selected files in) two directories trees
#
#    Options:	-r	Don't recursively descend into subdirectories.
#		-a	Don't report absent files.
#		-o	Don't report differing ownerships.
#		-p	Don't report differing permissions.
#		-t	Don't report differing file types.
#		-f	Don't follow symbolic links.  (Default is to follow.)
#		-n	Don't run 'diff' on differing text files, w/o querying.
#		-y	Do run 'diff' on differing text files, w/o querying.
#		-q	Query about running 'diff', even if stdin/out not ttys.
#		-v	Run 'verbosely'.
#
# Laurence G. Yaffe
# lgy@phys.washington.edu
# 3/12/91
#

# Check args && fetch directory names.

require "stat.pl" ;
require "getopts.pl" ;

$options = 'afnopqrtvy' ;

(&Getopts ($options) && (@ARGV >= 2))
    ||
    die "Usage: $0 [-$options] dir1 dir2 [pattern]\n";

$statcmd	= ($opt_f) ? 'lstat' : 'stat' ;
$rpt_absent	= !$opt_a ;
$rpt_owner	= !$opt_o ;
$rpt_perm	= !$opt_p ;
$rpt_type	= !$opt_t ;
$recurse	= !$opt_r ;
$diff_all	=  $opt_y ;
$verbose	=  $opt_v ;
$query		=  $opt_q || (!$opt_y && !$opt_n && (-t STDIN && -t STDOUT)) ;
$pat		= splice (@ARGV,2,1) || '^' ;

&dcmp (@ARGV) ;

sub gripe
    {
    local ($cond,$file,$left,$right) = @_ ;
    local ($lt,$rt) ;

    if (length ($file) < 20)	{ $lt  = -16 ; $rt = -19 ; }
    else			{ $lt  = -7  ; $rt =  28 ; }

    printf ("  %-7s  %-${lt}s %${rt}s  %s\n", $cond, $left, $file, $right) ;
    }

sub dcmp
    {
    local ($dir1, $dir2) = @_;
    local (@recurse,$_) ;
    undef (%files) ;

    return unless opendir(DIR1,$dir1) || !warn "  Can't open directory $dir1\n";
    return unless opendir(DIR2,$dir2) || !warn "  Can't open directory $dir2\n";

    $rem = (length ($dir1) > 37) ? 0 : 30 ;
    printf ("Comparing %-37s %-$rem.${rem}s\n", $dir1, $dir2) ;

    while ($_ = readdir (DIR1)) { $files {$_}++  ; }
    while ($_ = readdir (DIR2)) { $files {$_}+=2 ; }

    closedir (DIR1) ;
    closedir (DIR2) ;

    if ($rpt_absent)
	{
	foreach (sort grep ($files {$_} != 3 && /$pat/o, keys (%files)))
	    {
	    if ($files{$_} == 1) { &gripe ('absent:', $_, '::::::', '') ;}
	    else		 { &gripe ('absent:', $_, '', '::::::') ;}
	    }
	}
    $dir1 .= '/' ;
    $dir2 .= '/' ;

    compare:
    foreach $file (sort grep ($files {$_} == 3, keys (%files)))
	{
	next if $file eq "." ;
	next if $file eq ".." ;

	$SIG {'INT'} = 'interrupt' ;

	$flag = &comp ($dir1, $dir2, $file) ;

	push (@recurse, $file) if ($flag > 0) ;

	&interrupt if ($flag < 0) ;

	sub interrupt
	    {
	    if (-t STDIN && -t STDOUT)
		{
		print "\nInterrupt: quit, continue, ascend? " ;

		$reply = <STDIN> ;

		next if $reply =~ /^c/io ;
		@recurse = () ;
		last compare if $reply =~ /^a/io ;
		}
	    exit 1 ;
	    }
	}
    if ($recurse)
	{
	foreach $file (@recurse)
	    {
	    &dcmp ($dir1.$file, $dir2.$file, '') ;
	    }
	}
    }

sub comp
    {
    local ($dir1, $dir2, $file) = @_ ;
    local ($file1) = $dir1 . $file ;
    local ($file2) = $dir2 . $file ;

    ($type1, $mod1, $uid1, $gid1, $size1) = &filetype ($file1) ;
    ($type2, $mod2, $uid2, $gid2, $size2) = &filetype ($file2) ;

    if ($file =~ /$pat/o)
	{
	$ne = 0 ;

	$ne = &gripe ('owners:', $file, $uid1.'/'.$gid1, $uid2.'/'.$gid2)
	    if ($rpt_owner && ($uid1 != $uid2 || $gid1 != $gid2)) ;

	$ne = &gripe ('types:',$file,sprintf("%lo",$mod1),sprintf("%lo",$mod2))
	    if ($rpt_type && ($mod1 | 07777) != ($mod2 | 07777)) ;

	$ne = &gripe ('perms:',$file,sprintf("%lo",$mod1),sprintf("%lo",$mod2))
	    if ($rpt_perm && ($mod1 & 07777) != ($mod2 & 07777)
			  && ($mod1 | 07777) == ($mod2 | 07777)) ;

	if ($type1 eq 'f' && $type2 eq 'f')
	    {
	    $differ = ($size1 != $size2) || &cmp ($file1, $file2) ;

	    if ($differ)
		{
		if ($differ > 0 && ($do_diff = $diff_all) || $query)
		    {
		    if ($query)
			{
			printf ("  %-25s %s ? ", 'diff [ny]', $file) ;
			$reply = <STDIN> ;

			exit 0  if $reply =~ /^q/io ;
			$do_diff = $reply =~ /^y/io ;

			return 0 if !$do_diff ;
			}
		    }
		else { $do_diff = 0 ; }

		if ($do_diff)
		    {
		    return -1 if (system ("diff $file1 $file2") & 255) ;
		    }
		else { &gripe ('differ:', $file, '', '') ; }
		}
	    elsif ($verbose && !$ne)
		{
		&gripe ('same:', $file, '', '') ;
		}
	    return 0 ;
	    }
	}
    ($type1 eq 'd' && $type2 eq 'd') ;
    }

sub cmp
    {
    local ($FILE1, $FILE2) = @_ ;

    if    (!open FILE1) { warn "  Can't open $FILE1\n" ; }
    elsif (!open FILE2) { warn "  Can't open $FILE2\n" ; }
    else
	{
	while ()
	    {
	    $len1 = sysread (FILE1, $buf1, 16384) ;
	    $len2 = sysread (FILE2, $buf2, 16384) ;

	    if ((!defined $len1) || (!defined $len2))
		{
		next if $! =~ /^Interrupted/ ;
		warn "  System read error: $!\n" ;
		last ;
		}
	    last if ($len1 == 0 && $len2 == 0) ;
	    next if ($len1 == $len2 && $buf1 eq $buf2) ;

	    $wierd = ($buf1 =~ tr/\000\200-\255/\000\200-\255/) ;

	    return $wierd ? -1 : 1 ;
	    }
	}
    0 ;
    }

sub filetype
    {
    local ($file) = @_ ;
    local (@st) = eval "$statcmd (\$file)" ;
    local ($type) = -d _ ? 'd' :
		    -f _ ? 'f' :
		    -p _ ? 'p' :
		    -S _ ? 'S' : 's' ;

    return ($type, $st[$ST_MODE], $st[$ST_UID], $st[$ST_GID], $st[$ST_SIZE]) ;
    }
########################

.00;

'di
.nr nl 0-1
.nr % 0
'; __END__

.TH DCMP 1 "March 7, 1991"
.AT 3
.SH NAME
dcmp \- recursive directory comparison
.SH SYNOPSIS
.B dcmp [options] dir1 dir2 [pattern]
.SH DESCRIPTION
.I Dcmp
compares all (or selected files within) two directories trees.
By default,
.I dcmp
reports files which are present in one directory tree but not the other,
and files present in both trees with differing contents, permissions, or
ownerships.
When run from a terminal,
the user is queried about each differing text file,
and may choose to view the differences (produced by 'diff').
The optional
.I pattern
may be any regular expression (in the style of 'ed' or 'perl').
If given, only files with names matching the pattern are compared.

The default bahavior may be modified by the following options:
    -r	Don't recursively descend into subdirectories.
    -a	Don't report absent files.
    -o	Don't report differing ownerships.
    -p	Don't report differing permissions.
    -t	Don't report differing file types.
    -f	Don't follow symbolic links.
      	(Default is to follow symbolic links.)
    -n	Don't run 'diff' on differing text files.
    -y	Do run 'diff' on differing text files.
    -q	Query about running 'diff', even if stdin
      	or stdout are not connected to ttys.
    -v	Run 'verbosely'.  Report identical files.

.SH ENVIRONMENT
No Environment variables used.
.SH FILES
None.
.SH AUTHOR
Laurence G. Yaffe	(lgy@phys.washington.edu)
.SH "SEE ALSO"
dircmp(1)
.SH DIAGNOSTICS
Complains if files or directories are unreadable.

--
--------------------------------------------------------------------------
Laurence G. Yaffe		Internet: lgy@newton.phys.washington.edu
University of Washington	Bitnet:   yaffe@uwaphast.bitnet

rbj@uunet.UU.NET (Root Boy Jim) (03/13/91)

<lgy.668812268@newton> lgy@phys.washington.edu (Laurence G. Yaffe) writes:
>
>    I frequently use a recursive directory comparison program to compare
>the contents of entire directory trees.  I recently decided to rewrite
>my previous shell based 'dcmp' program in Perl (and of course, add various
>bells and whistles).  Appended below is the result (my first serious Perl
>programming).  If others find it useful, great.  Comments on Perl programming
>style welcome.

Funny how things tend to get invented at about the same time.
I am planning to post my version soon. It does have one
other feature, in that it removes identical versions
from one half of the tree.

>    By default, this program reports on files present in one directory tree,
>but absent in the other, and files present in both trees but with differing
>contents, ownerships or permissions.  'Diff' may run on differing text files
>to display differences.  Its output is designed to be terse but informative.
>(I find the sysV "dircmp" unpleasantly verbose and insufficiently flexible.)
>It's wrapped with its own man page (but not shar'ed).

So get GNU diff instead. It does recursive comparison nicer.

I will look at your version more thoroughly soon. However, one
thing SCREAMS for attention. DON'T FOLLOW SYMBOLIC LINKS BY DEFAULT!
This is a dangerous thing to do in any program that deals
with the filesystem as a recursive structure.
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane

lgy@phys.washington.edu (Laurence G. Yaffe) (03/13/91)

rbj@uunet.UU.NET (Root Boy Jim) writes:

>I will look at your version more thoroughly soon. However, one
>thing SCREAMS for attention. DON'T FOLLOW SYMBOLIC LINKS BY DEFAULT!
>This is a dangerous thing to do in any program that deals
>with the filesystem as a recursive structure.

    Some people like to live dangerously! :-)
    Still, your point is well taken.

--
--------------------------------------------------------------------------
Laurence G. Yaffe		Internet: lgy@newton.phys.washington.edu
University of Washington	Bitnet:   yaffe@uwaphast.bitnet