[alt.sources] simple tarfilter in perl

ronald@robobar.co.uk (Ronald S H Khoo) (01/24/91)

Archive-Name: perl-sources/mtf.pl

goer@quads.uchicago.edu (Richard L. Goerwitz) writes:

> X#  PURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
> X#  to facilitate installation of tar'd archives on systems subject to
> X#  the System V 14-character filename limit.

[ introduces his Icon program "mtf" ]
I can't comment on that program because I don't speak Icon....

> X#  Final word of caution:  Try not to use mtf on binaries.  It cannot
> X#  possibly preserve the correct format and alignment of strings in an
> X#  executable.

Things written in perl tend to be binary friendly.  Here's a less fully
featured mtf in perl.  Mine's a simple filter, no arguments.  Report
goes to stderr, redirect with your shell to taste.

Actually, there's no reason it shouldn't have been written as a complete
tar replacement program, it wouldn't have been much bigger.   Has anyone
thought of re-implementing most of /bin in perl ?  It would make for
a much smaller system distribution kit :-)

ObPerlQuestion:  If I want to be able to map both forwards and backwards
		 key->value and value->key, is there a less memory intensive
		 way of doing it other than having two separate assoc
		 arrays (as in %map and %revmap below), which can get expensive
		 if the values are large ?

#! /usr/bin/perl
# filter a tar stream converting file path components to <= 14 chars for SysV
# bugs: doesn't preserve null padding at end, use dd if you need it :-)
# Ronald Khoo <ronald@robobar.co.uk> hacked this together because
# Richard Goerwitz <goer@sophist.uchicago.edu> posted a nice one in Icon.
# His has more features but this one is binary clean and I Can Understand It:-)
# normal usage: zcat < dist.tar.Z | this_script 2>transcript | tar xf -
# leaves the filename mapping on "transcript".
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}' # OK, so I don't use the args yet...
	if 0;

die "Get a newer Perl\n" if $] < 3.044; # is this when checksums first made it?
# open(STDERR, ">/dev/null");	# uncomment this line for silent operation.
$output = 1;			# set to zero for no output
$stop_at_null = 1; 		# set to zero when hacking at broken tarfiles.
$maxlen = 14;			# 14 is max length of Sys V R < 4 files

# people stuff all kindsa junk in $tar_the_rest, but these are enuff...
$tar_hdr = "a100a8a8a8a12a12a8a1a100a*";
$tar_name		=  0;	$tar_mode		=  1;
$tar_uid		=  2;	$tar_gid		=  3;
$tar_size		=  4;	$tar_mtime		=  5;
$tar_chksum		=  6;	$tar_linkflag		=  7;
$tar_linkname		=  8;	$tar_the_rest		=  9;

$nullblock = "\0" x 512;
$bad = $null = 0;

while (($nread = read(STDIN, $hdr, 512)) == 512) {
	@H[0..9] = unpack($tar_hdr, $hdr);
	$nhdr = pack($tar_hdr, @H[0..5], " " x 8, @H[7..9]);
	$c = sprintf( "% 6o\0 ", unpack("%16C*", $nhdr));
	($name = $H[$tar_name]) =~ s/\0.*//;
	($linkname = $H[$tar_linkname]) =~ s/\0.*//;
#	($omode = $H[$tar_mode]) =~ s/^\s+//;
#	$mode = oct($omode);
	($osize = $H[$tar_size]) =~ s/^\s+//;
	$size = oct($osize);
	if (length($name) && 0+$c == 0+$H[6]) {
		if ($bad || $null) {
			$bad && print STDERR "$bad bad + ";
			print STDERR "$null null blocks skipped.\n";
		}
		$skipping = $bad = $null = 0;
		$blocks = int($size / 512) + (($size % 512) ? 1 : 0);
		if (($newnm = &munge($name)) ne $name) {
			print STDERR "(renamed to $newnm) ";
			$nhdr = pack($tar_hdr,$newnm,@H[1..5]," " x 8,@H[7..9]);
			$c = sprintf( "% 6o\0 ", unpack("%16C*", $nhdr));
			print pack($tar_hdr, $newnm, @H[1..5], $c, @H[7..9]);
		} else
			{ print $hdr; }
		if ($blocks == 0 && $name =~ m|/$|) {
		    print STDERR "$name: is a directory\n";
		} elsif (0+$H[$tar_linkflag]) {
		    print STDERR "$name: linked to $linkname\n";
		} else {
		    print STDERR "$name: $size bytes ($blocks tar blocks)\n";
		}
# try to gain a little efficiency by doing large reads....
# 16 blks is supposedly good for BSD files, I don't have BSD but so what :-)
		while ($blocks > 16) {
			$blocks -= 16;
			read(STDIN, $hdr, 8192)==8192 || die "Premature EOF\n";
			print $hdr if $output;
		}
		for (1..$blocks) {
			read(STDIN, $hdr, 512) == 512 || die "Premature EOF\n";
			print $hdr if $output;
		}
	} else {
		$isnull = ($hdr eq $nullblock);
		print STDERR "Skipping ... " if (! $isnull && $skipping++ == 0);
		$isnull ? ($stop_at_null? &quit: $null++): $bad++;
		print $hdr if $output;
	}
}
$bad && print STDERR "$bad bad + ";
($bad || $null) && print STDERR "$null null blocks skipped at the end.\n";
$nread && print STDERR "Partial block ($nread) bytes ignored at the end.\n";
exit 1;

sub quit { print $nullblock x 2 if $output; exit 0; }

sub munge { # munge a whole path
	local($", $orig, $head, $tail, @out) = ("/", @_);
	$head = (substr($orig, 0, 1) eq "/")? "/": "";
	$tail = (substr($orig, -1) eq "/")? "/": "";
	@in = split('/', $orig);
	while (defined($next = shift @in)) {
		next unless length($next);
		push(@out, (length($next) > $maxlen) ? &cmunge($next) : $next);
	}
	return $head . "@out" . $tail;
}

sub cmunge { # munge one component of a path
	local($aa, $name, $trunc, $suff) = ("00", @_);
	return $map{$name} if $map{$name};
	if (substr($name, -2, 1) eq ".") {
		$trunc = substr($name, 0, $maxlen - 4);
		$suff =  substr($name, -2);
	} else {
		$trunc = substr($name, 0, $maxlen - 2);
		$suff = "";
	}
	$aa++ while ($revmap{"$trunc$aa$suff"});
	$revmap{"$trunc$aa$suff"} = $name;
	$map{$name} = "$trunc$aa$suff";
}
__END__
Just another Perl Wannabe,
-- 
Ronald Khoo <ronald@robobar.co.uk> +44 81 991 1142 (O) +44 71 229 7741 (H)