[comp.lang.perl] simple tarfilter in perl

ronald@robobar.co.uk (Ronald S H Khoo) (01/24/91)

Archive-Name: perl-sources/mtf.pl

goer@quads.uchicago.edu (Richard L. Goerwitz) writes:

> X#  PURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
> X#  to facilitate installation of tar'd archives on systems subject to
> X#  the System V 14-character filename limit.

[ introduces his Icon program "mtf" ]
I can't comment on that program because I don't speak Icon....

> X#  Final word of caution:  Try not to use mtf on binaries.  It cannot
> X#  possibly preserve the correct format and alignment of strings in an
> X#  executable.

Things written in perl tend to be binary friendly.  Here's a less fully
featured mtf in perl.  Mine's a simple filter, no arguments.  Report
goes to stderr, redirect with your shell to taste.

Actually, there's no reason it shouldn't have been written as a complete
tar replacement program, it wouldn't have been much bigger.   Has anyone
thought of re-implementing most of /bin in perl ?  It would make for
a much smaller system distribution kit :-)

ObPerlQuestion:  If I want to be able to map both forwards and backwards
		 key->value and value->key, is there a less memory intensive
		 way of doing it other than having two separate assoc
		 arrays (as in %map and %revmap below), which can get expensive
		 if the values are large ?

#! /usr/bin/perl
# filter a tar stream converting file path components to <= 14 chars for SysV
# bugs: doesn't preserve null padding at end, use dd if you need it :-)
# Ronald Khoo <ronald@robobar.co.uk> hacked this together because
# Richard Goerwitz <goer@sophist.uchicago.edu> posted a nice one in Icon.
# His has more features but this one is binary clean and I Can Understand It:-)
# normal usage: zcat < dist.tar.Z | this_script 2>transcript | tar xf -
# leaves the filename mapping on "transcript".
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}' # OK, so I don't use the args yet...
	if 0;

die "Get a newer Perl\n" if $] < 3.044; # is this when checksums first made it?
# open(STDERR, ">/dev/null");	# uncomment this line for silent operation.
$output = 1;			# set to zero for no output
$stop_at_null = 1; 		# set to zero when hacking at broken tarfiles.
$maxlen = 14;			# 14 is max length of Sys V R < 4 files

# people stuff all kindsa junk in $tar_the_rest, but these are enuff...
$tar_hdr = "a100a8a8a8a12a12a8a1a100a*";
$tar_name		=  0;	$tar_mode		=  1;
$tar_uid		=  2;	$tar_gid		=  3;
$tar_size		=  4;	$tar_mtime		=  5;
$tar_chksum		=  6;	$tar_linkflag		=  7;
$tar_linkname		=  8;	$tar_the_rest		=  9;

$nullblock = "\0" x 512;
$bad = $null = 0;

while (($nread = read(STDIN, $hdr, 512)) == 512) {
	@H[0..9] = unpack($tar_hdr, $hdr);
	$nhdr = pack($tar_hdr, @H[0..5], " " x 8, @H[7..9]);
	$c = sprintf( "% 6o\0 ", unpack("%16C*", $nhdr));
	($name = $H[$tar_name]) =~ s/\0.*//;
	($linkname = $H[$tar_linkname]) =~ s/\0.*//;
#	($omode = $H[$tar_mode]) =~ s/^\s+//;
#	$mode = oct($omode);
	($osize = $H[$tar_size]) =~ s/^\s+//;
	$size = oct($osize);
	if (length($name) && 0+$c == 0+$H[6]) {
		if ($bad || $null) {
			$bad && print STDERR "$bad bad + ";
			print STDERR "$null null blocks skipped.\n";
		}
		$skipping = $bad = $null = 0;
		$blocks = int($size / 512) + (($size % 512) ? 1 : 0);
		if (($newnm = &munge($name)) ne $name) {
			print STDERR "(renamed to $newnm) ";
			$nhdr = pack($tar_hdr,$newnm,@H[1..5]," " x 8,@H[7..9]);
			$c = sprintf( "% 6o\0 ", unpack("%16C*", $nhdr));
			print pack($tar_hdr, $newnm, @H[1..5], $c, @H[7..9]);
		} else
			{ print $hdr; }
		if ($blocks == 0 && $name =~ m|/$|) {
		    print STDERR "$name: is a directory\n";
		} elsif (0+$H[$tar_linkflag]) {
		    print STDERR "$name: linked to $linkname\n";
		} else {
		    print STDERR "$name: $size bytes ($blocks tar blocks)\n";
		}
# try to gain a little efficiency by doing large reads....
# 16 blks is supposedly good for BSD files, I don't have BSD but so what :-)
		while ($blocks > 16) {
			$blocks -= 16;
			read(STDIN, $hdr, 8192)==8192 || die "Premature EOF\n";
			print $hdr if $output;
		}
		for (1..$blocks) {
			read(STDIN, $hdr, 512) == 512 || die "Premature EOF\n";
			print $hdr if $output;
		}
	} else {
		$isnull = ($hdr eq $nullblock);
		print STDERR "Skipping ... " if (! $isnull && $skipping++ == 0);
		$isnull ? ($stop_at_null? &quit: $null++): $bad++;
		print $hdr if $output;
	}
}
$bad && print STDERR "$bad bad + ";
($bad || $null) && print STDERR "$null null blocks skipped at the end.\n";
$nread && print STDERR "Partial block ($nread) bytes ignored at the end.\n";
exit 1;

sub quit { print $nullblock x 2 if $output; exit 0; }

sub munge { # munge a whole path
	local($", $orig, $head, $tail, @out) = ("/", @_);
	$head = (substr($orig, 0, 1) eq "/")? "/": "";
	$tail = (substr($orig, -1) eq "/")? "/": "";
	@in = split('/', $orig);
	while (defined($next = shift @in)) {
		next unless length($next);
		push(@out, (length($next) > $maxlen) ? &cmunge($next) : $next);
	}
	return $head . "@out" . $tail;
}

sub cmunge { # munge one component of a path
	local($aa, $name, $trunc, $suff) = ("00", @_);
	return $map{$name} if $map{$name};
	if (substr($name, -2, 1) eq ".") {
		$trunc = substr($name, 0, $maxlen - 4);
		$suff =  substr($name, -2);
	} else {
		$trunc = substr($name, 0, $maxlen - 2);
		$suff = "";
	}
	$aa++ while ($revmap{"$trunc$aa$suff"});
	$revmap{"$trunc$aa$suff"} = $name;
	$map{$name} = "$trunc$aa$suff";
}
__END__
Just another Perl Wannabe,
-- 
Ronald Khoo <ronald@robobar.co.uk> +44 81 991 1142 (O) +44 71 229 7741 (H)

aem@mthvax.cs.miami.edu (a.e.mossberg) (01/24/91)

In <1991Jan23.175822.24238@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:

>Actually, there's no reason it shouldn't have been written as a complete
>tar replacement program, it wouldn't have been much bigger.   Has anyone
>thought of re-implementing most of /bin in perl ?  It would make for
>a much smaller system distribution kit :-)

Hmmm.. perhaps the kernel should have the perl interpreter built-in?

da de daaaa! Perlix!

heh heh aem
-- 
aem@mthvax.cs.miami.edu .......................................................
Out of life comes death and out of death life, out of the young the old,
and out of the old the young, out of waking sleep and out of sleep waking, 
the stream of creation and dissolution never stops.	- Heraclitus

chris@utgard.uucp (Chris Anderson) (01/25/91)

In article <1991Jan24.134708.24086@mthvax.cs.miami.edu> aem@mthvax.cs.miami.edu writes:
>Hmmm.. perhaps the kernel should have the perl interpreter built-in?
>
>da de daaaa! Perlix!

Naah.  Just have /vmunix.pl.

Chris
-- 
+---------------------------------------------------------------+
|  Chris Anderson, QMA, Inc.  utgard!chris@csusac.ecs.csus.edu  |
|      My employer doesn't listen to me... why should you?      |
+---------------------------------------------------------------+

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (01/25/91)

As quoted from <1991Jan23.175822.24238@robobar.co.uk> by ronald@robobar.co.uk (Ronald S H Khoo):
+---------------
| Actually, there's no reason it shouldn't have been written as a complete
| tar replacement program, it wouldn't have been much bigger.   Has anyone
+---------------

In fact, I have a set of routines, based in concept but not in code, on
perltar.pl as posted to comp.lang.perl some time ago.  It can read archives
and dynamically determines whether they are tar, cpio, ar, xar (Xenix/PDP-11
ar), and it can write all of those formats as well.  I haven't posted them
because I'm still making sure they're fully operational, but I can post them
if anyone wants them.  Adding name fixing and/or resynchronization should be
relatively easy; adding more archive formats is also fairly simple, thanks to
the fact that it's in Perl.  ;-)

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY