[comp.lang.perl] perl tar file routines

eichin@athena.mit.edu (Mark W. Eichin) (07/16/90)

Gee, if I had posted this a few weeks ago when I wrote it, you might have been
saved some time. I guess it's time to inflict it upon the world of Perl...

Story: SIMTEL20 is far away from almost *everything* on the network. I
typically get connections at around 4800 *baud* from MIT via NEARNET, BBN, and
what is left of MILNET... I decided I wanted a full set of the CPM archives
locally. The first problem was feeding ftp a list of names and having it get
the files; the second was not having enough space around to *hold* a full
tape's worth of data (45Meg cartridge.)

The first problem was solved admirably by Khun Yee <clipper@csd.uwo.ca> whose
"nftp.pl" you can find from around 31 May 1990. I hacked it into a library.
The second problem was solved by writing some perl code to take a string and
write it out to a file as a tar record, so I could slurp the files over from
SIMTEL20 and pushing them out directly to tape.

Following is "perltar.perl", with the appropriate subroutines. (just 
	do 'perltar.perl'; 
and then use them. Comments on my subroutine style are welcomed; is there a
better way to make perl code *modular* and have rudimentary abstraction?)

When writing, to write an EOF at the end of the tape, remember to do a 
	print pack("x8192",1);
before you close your output. (I had trouble with print pack("x8192"); should
I have?)
			Mark Eichin
			<eichin@athena.mit.edu>
			MIT Student Information Processing Board

# perltar.perl
# Perl functions for dealing with tar files
#
# write tar header (given name)
# write file

sub TBLOCK {512;}
sub NAMSIZ {100;}
	
### $chksum =~ s/^\s+//;

# int tar_oct(string) 
# takes: an octal string w/leading spaces
# gives: integer value
sub tar_oct {
	local($ll)=$_[0];
	oct((@tmp=split(" ",$ll))[0]);
}

# string hum_time(int)
# takes: integer time value (UNIX seconds since EPOCH)
# gives: "human readable" time (the way tar does it.)
sub hum_time { 
	local($mt)=$_[0];
	local($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
		localtime($mt);
        sprintf("%s %2d %02d:%2d %4d",
		("Jan","Feb","Mar","Apr","May","Jun",
		 "Jul","Aug","Sep","Oct","Nov","Dec")[$mon],
		$mday,$hour,$min,$year+1900);
}

# int sum_chars(string)
# takes: a string of characters
# gives: integer sum of ASCII values of the characters
sub sum_chars {
	local($chrs)=$_[0];
	local($len)=length($chrs);
	local(@bytes)=unpack("C$len",$chrs);
	local($i,$cnt); $cnt=0;
	foreach $i (@bytes) { $cnt += $i; }
	$cnt;
}


# status,tar_struct read_tar_header(string)
# takes: 512 byte tar block
# gives: status={1:error,0:ok},tar data structure=
#        ($name,$mode,$uid,$gid,$size,$mtime,$chksum,$linkflag,$linkname)
sub read_tar_header {
	local($tar_block)=$_[0];
	local($tar_header)='A100A8A8A8A12A12a8A1A100';
	local(@line)=unpack($tar_header,$tar_block);
	local($zblock) = $tar_block; # unpack('A257',$tar_block);
	local($ckver)=&sum_chars($zblock)-&sum_chars($line[6])
		+&sum_chars(" " x 8);
	local($nchksum)=&tar_oct($line[6]);
	return (1, "directory checksum error $ckver != $nchksum <$line[7]>")
		if($ckver != $nchksum);
	for(1..6) { $line[$_] = &tar_oct($line[$_]); }
	(0, @line);
}

# string build_tar_header(tar_struct)
# takes: tar data structure (see above)
# gives: a string suitable for writing to a tar file.
sub build_tar_header {
	local($name,$mode,$uid,$gid,$size,$mtime,$chksum,$linkflag,$linkname)
		=@_;
	local($tar_header)='a100a8a8a8a12a12a8aa100x255';
	local($zblock)=pack($tar_header,$name,
				sprintf("%6o ",$mode),
				sprintf("%6o ",$uid),
				sprintf("%6o ",$gid),
				sprintf("%11o ",$size),
				sprintf("%11o ",$mtime),
				" " x 8,
				$linkflag, $linkname);

	$chksum = &sum_chars($zblock);
#DBG#	print "cks: <$chksum>\n";
	substr($zblock,100+8+8+8+12+12,6)=sprintf("%6o",$chksum);
	$zblock;
}

# string mode_string(int)
# takes: UNIX mode bits
# gives: rwxrwxrwx pattern
sub mode_string {
	local($bits)=$_[0];
	local(@flags)=('r','w','x','r','w','x','r','w','x');
	local($i);
	for($i=8;$i>=0;$i--) {
		$flags[$i]='-' unless($bits & 1);
		$bits >>= 1;
	}
	$flags[8]='t' if($bits & 1);
	$flags[5]='s' if($bits & 2);
	$flags[2]='s' if($bits & 4);
	pack("aaaaaaaaa",@flags);
}

# void bigprint(tar_struct)
sub bigprint {
	local($name,$mode,$uid,$gid,$size,$mtime,$chksum,$linkflag,$linkname)
		= @_;
	print "block: <$hblock>\n";
	print "Name: <$name>\n";
	$modestr = &mode_string($mode); print "mode: <$modestr>\n";
	print "uid:  <$uid>\n";
	print "gid:  <$gid>\n";
	print "size: <$size>\n";
	$htime = &hum_time($mtime); print "mtime: <$mtime> ($htime)\n";
	print "chksum: <$nchksum>\n";
	print "linkflag: <$linkflag>\n";
	print "linkname: <$linkname>\n";
}

# void tar_print(tar_struct)
sub tar_print {
	local($name,$mode,$uid,$gid,$size,$mtime,$chksum,$linkflag,$linkname)
		= @_;
	$modestr = &mode_string($mode); 
	$htime = &hum_time($mtime);
###	print "$modestr$uid/$gid $size $htime $name\n";
	if($linkflag == 0) {
		print sprintf("%s%3d/%d %6d %s %s\n",
			      $modestr,$uid,$gid,$size,$htime,$name);
	} elsif($linkflag == '1') {
		print sprintf("%s%3d/%d %6d %s %s linked to %s\n",
			      $modestr,$uid,$gid,$size,$htime,$name,$linkname);
	} elsif($linkflag == '2') {
		print sprintf("%s%3d/%d %6d %s %s symbolic link to %s\n",
			      $modestr,$uid,$gid,$size,$htime,$name,$linkname);
	}		
}

# void skip_recs(FILE,int)
# takes: file descriptor, size
sub skip_recs {
	local($FILE,$size)=@_;
	local($dump,$rlen);
#DBG#	print "looking for $size bytes in $FILE\n";
	while($size>0) {
		$rlen=read($FILE,$dump,&TBLOCK);
		if($rlen!=&TBLOCK) {
			die "short read $rlen (skip)";
		}
		$size -= &TBLOCK;
#DBG#		print "#";
	}
#DBG#	print "\n";
}

# open_tar_file(string FILE, string name, int len)
# simple hook for saving out files as tarfiles
# takes: filehandle to put tarfile to, name of file and length it will be
# gives: nothing.
sub open_tar_file {
	local($FILE,$name,$len)=@_;
	print $FILE (&build_tar_header($name,0644,$>,$),$len,time,0,"",""));
}

# tar_stuff(string INFILE, string OUTFILE, string NAME)
# takes: input from INFILE until EOF
# gives: a tar image of it on OUTFILE
# first lazy version: just suck it all into memory...
sub tar_stuff {
	local($INFILE,$OUTFILE,$NAME) = @_;
	local($buf,$contents);
	while (read($INFILE,$buf,1024)>0) {
	    $contents .= $buf;
	}
	&open_tar_file($OUTFILE,$NAME,length($contents));
	print $OUTFILE $contents;
	$fixit = 512-(length($contents) % 512);
	print $OUTFILE pack("x$fixit",1) unless ($fixit == 512);
}

# tar_mem(string data, string OUTFILE, string NAME)
# takes: input from data
# gives: a tar image of it on OUTFILE
# first lazy version: just suck it all into memory...
sub tar_mem {
	local($contents,$OUTFILE,$NAME) = @_;
	&open_tar_file($OUTFILE,$NAME,length($contents));
	print $OUTFILE $contents;
	$fixit = 512-(length($contents) % 512);
	print $OUTFILE pack("x$fixit",1) unless ($fixit == 512);
}
1;
# end of perltar.perl
			Mark Eichin
			<eichin@athena.mit.edu>
			MIT Student Information Processing Board

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (07/17/90)

In article <1990Jul15.235710.14293@uvaarpa.Virginia.EDU> eichin@athena.mit.edu writes:
: Following is "perltar.perl", with the appropriate subroutines. (just 
: 	do 'perltar.perl'; 
: and then use them. Comments on my subroutine style are welcomed; is there a
: better way to make perl code *modular* and have rudimentary abstraction?)

Yes, consider using the "package" declaration, which lets you have a separate
namespace for your code.  It basically gives you the ability to differentiate
between public and private names.  Rudimentary, but nice.

: When writing, to write an EOF at the end of the tape, remember to do a 
: 	print pack("x8192",1);
: before you close your output. (I had trouble with print pack("x8192"); should
: I have?)

Hmm, well, maybe.  Packing nothing to get a null string is probably not the
most efficient way.  Nowadays "\0" x 8192 is probably more efficient.  But
if you do use pack that way, since the second argument is a LIST, you can
say pack("x8192",()) and pack a null list, which is more satisfactory
in some indefinite way.

Larry