[comp.sources.misc] v16i085: mtf - Map tar filenames, Part01/02

goer@midway.uchicago.edu (Richard L. Goerwitz) (01/29/91)

Submitted-by: goer@midway.uchicago.edu (Richard L. Goerwitz)
Posting-number: Volume 16, Issue 85
Archive-name: mtf/part01

Tar archives often contain filenames longer than 14 characters, along
with source code that requires those names to be preserved exactly.
This utility, mtf, runs through the tar headers, finds all overlong
filenames, shortens them, updates references to them in any text
files it finds, and then rewrites the tar header checksums.
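As a rough illustration of the two computations described above, here is a minimal Icon sketch written for this note.  It is not code from mtf.icn, and the procedure names shorten(), tar_checksum() and octal() are hypothetical.  shorten() follows the renaming rule used in mtf's mappiece(), but only for one-character extensions such as .c; the real program also keeps extensions given with -e and bumps the counter to avoid collisions.  tar_checksum() assumes a 512-byte header block and, like mtf's get_checksum(), treats the 8-byte chksum field (bytes 149-156) as blanks before summing.

```icon
# Hypothetical helpers sketching what mtf does; not part of mtf.icn.

# Shorten an overlong name to 14 chars, with counter n.
procedure shorten(name, n)
    if *name <= 14 then return name
    # keep a one-char extension: 9 chars + 3-digit counter + ".x" = 14
    if name[-2] == "." then
        return name[1:10] || right(n, 3, "0") || name[-2:0]
    # otherwise: 12 chars + 2-digit counter = 14
    return name[1:13] || right(n, 2, "0")
end

# Checksum of a 512-byte tar header: blank-fill the chksum
# field (bytes 149-156), then add up the value of every byte.
procedure tar_checksum(hdr)
    local sum
    hdr := hdr[1:149] || repl(" ", 8) || hdr[157:0]
    sum := 0
    every sum +:= ord(!hdr)
    return sum
end

# Render a non-negative integer in octal, as tar header fields need.
procedure octal(i)
    local s
    s := ""
    repeat {
        s := (i % 8) || s
        i /:= 8
        if i = 0 then return s
    }
end
```

For example, shorten("too.long.file.name", 1) gives "too.long.fil01" and shorten("file.with.extension.c", 1) gives "file.with001.c".  Mtf writes the octal checksum followed by a NUL and a space, right-justified in the 8-byte field.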

-Richard

---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 01/20/1991 23:34 UTC by goer@sophist.uchicago.edu
# Source directory /u/richard/Mtf
#
# existing files will NOT be overwritten unless -c is specified
# This format requires very little intelligence at unshar time.
# "if test", "cat", "rm", "echo", "true", and "sed" may be needed.
#
# This is part 1 of a multipart archive                                    
# do not concatenate these parts, unpack them in order with /bin/sh        
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#  16721 -r--r--r-- mtf.icn
#   3341 -rw-r--r-- README
#    659 -rw-r--r-- Makefile.dist
#
if test -r _shar_seq_.tmp; then
	echo 'Must unpack archives in sequence!'
	echo Please unpack part `cat _shar_seq_.tmp` next
	exit 1
fi
# ============= mtf.icn ==============
if test -f 'mtf.icn' -a X"$1" != X"-c"; then
	echo 'x - skipping mtf.icn (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting mtf.icn (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'mtf.icn' &&
X#############################################################################
X#
X#	NAME:	mtf3.icn
X#
X#	TITLE:	map tar file
X#
X#	AUTHOR:	Richard Goerwitz
X#
X#	VERSION: 3.3
X#
X#############################################################################
X#
X#  This and future versions of mtf are hereby placed in the public domain -RLG
X#
X#############################################################################
X#
X#  PURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
X#  Handles both header blocks and the archive itself.  Mtf is intended
X#  to facilitate installation of tar'd archives on systems subject to
X#  the System V 14-character filename limit.
X#
X#  USAGE:  mtf inputfile [-r reportfile] [-e .extensions] [-x exceptions]
X#
X#  "Inputfile" is a tar archive.  "Reportfile" is a file containing a
X#  list of files already mapped by mtf in a previous run (used to
X#  avoid clashes with filenames in use outside the current archive).
X#  The -e switch precedes a list of filename .extensions which mtf is
X#  supposed to leave unscathed by the mapping process
X#  (single-character extensions such as .c and .o are automatically
X#  preserved; -e allows the user to specify additional extensions,
X#  such as .pxl, .cpi, and .icn).  The final switch, -x, precedes a
X#  list of strings which should not be mapped at all.  Use this switch
X#  if, say, you have a C file with a structure.field combination such
X#  as "thisisveryverybig.hashptr" in an archive that contains a file
X#  called "thisisveryverybig.h," and you want to avoid mapping that
X#  portion of the struct name which matches the name of the overlong
X#  file (to wit, "mtf inputfile -x thisisveryverybig.hashptr").  To
X#  prevent mapping of any string (including overlong filenames) begin-
X#  ning, say, with "thisisvery," use "mtf inputfile -x thisisvery."
X#  Be careful with this option, or you might end up defeating the
X#  whole point of using mtf in the first place.
X#
X#  OUTPUT FORMAT:  Mtf writes a mapped tar archive to stdout.
X#  When finished, it leaves a file called "map.report" (or map.report2,
X#  map.report3, etc., if that name is taken) in the current directory
X#  which records what filenames were mapped and how.  Rename
X#  and save this file, and use it as the "reportfile" argument to any
X#  subsequent runs of mtf in this same directory.  Even if you don't
X#  plan to run mtf again, this file should still be examined, just to
X#  be sure that the new filenames are acceptable, and to see if
X#  perhaps additional .extensions and/or exceptions should be
X#  specified.
X#
X#  BUGS:  Mtf only maps filenames found in the main tar headers.
X#  Because of this, mtf cannot accept nested tar archives.  If you try
X#  to map a tar archive within a tar file, mtf will abort with a nasty
X#  message about screwing up your files.  Please note that, unless you
X#  give mtf a "reportfile" to consider, it knows nothing about files
X#  existing outside the archive.  Hence, if an input archive refers to
X#  an overlong filename in another archive, mtf naturally will not
X#  know to shorten it.  Mtf will, in fact, have no way of knowing that
X#  it is a filename, and not, say, an identifier in a C program.
X#  Final word of caution:  Try not to use mtf on binaries.  It cannot
X#  possibly preserve the correct format and alignment of strings in an
X#  executable.  Same goes for compressed files.  Mtf can't map
X#  filenames that it can't read!
X#
X####################################################################
X
X
Xglobal filenametbl, chunkset, short_chunkset   # see procedure mappiece(s)
Xglobal extensions, no_nos                      # ditto
X
Xrecord hblock(name,junk,size,mtime,chksum,     # tar header struct;
X              linkflag,linkname,therest)       # see readtarhdr(s)
X
X
Xprocedure main(a)
X
X    usage := "usage:  mtf inputfile [-r reportfile] " ||
X	     "[-e .extensions] [-x exceptions]"
X
X    *a = 0 & stop(usage)
X
X    infile := pop(a)                 # keep the name for messages below
X    intext := open_input_file(infile)
X
X    i := 0
X    extensions := []; no_nos := []
X    while (i +:= 1) <= *a do {
X	case a[i] of {
X	    "-r"    :    readin_old_map_report(a[i+:=1])
X	    "-e"    :    current_list := extensions
X	    "-x"    :    current_list := no_nos
X	    # a bare argument must follow -e or -x
X	    default :    put(\current_list,a[i]) | stop(usage)
X	}
X    }
X
X    every !extensions ?:= (=".", tab(0))
X
X    # Run through all the headers in the input file, filling
X    # (global) filenametbl with the names of overlong files;
X    # make_table_of_filenames fails if there are no such files.
X    make_table_of_filenames(intext) | {
X	write(&errout,"mtf:  no overlong path names to map")
X	infile ? (tab(find(".tar")+4), pos(0)) |
X	  write(&errout,"(Is ",infile," even a tar archive?)")
X 	exit(1)
X    } 
X
X    # Now that a table of overlong filenames exists, go back
X    # through the text, remapping all occurrences of these names
X    # to new, 14-char values; also, reset header checksums, and
X    # reformat text into correctly padded 512-byte blocks.  Ter-
X    # minate output with 512 nulls.
X    seek(intext,1)
X    every writes(output_mapped_headers_and_texts(intext))
X
X    close(intext)
X    write_report()   # Record mapped file and dir names for future ref.
X    exit(0)
X    
Xend
X
X
X
Xprocedure open_input_file(s)
X    intext := open("" ~== s,"r") |
X	stop("mtf:  can't open ",s)
X    find("UNIX",&features) |
X	stop("mtf:  I'm not tested on non-Unix systems.")
X    s[-2:0] == ".Z" &
X        stop("mtf:  sorry, can't accept compressed files")
X    return intext
Xend
X
X
X
Xprocedure readin_old_map_report(s)
X
X    initial {
X	filenametbl := table()
X	chunkset := set()
X	short_chunkset := set()
X    }
X
X    mapfile := open_input_file(s)
X    while line := read(mapfile) do {
X	line ? {
X	    if chunk := tab(many(~' \t')) & tab(upto(~' \t')) &
X		lchunk := move(14) & pos(0) then {
X		filenametbl[chunk] := lchunk
X		insert(chunkset,chunk)
X		insert(short_chunkset,chunk[1:16])
X	    }
X	    # test each line (chunk and lchunk keep old values across lines)
X	    else stop("mtf:  report file, ",s," seems mangled.")
X	}
X    }
X
Xend
X
X
X
Xprocedure make_table_of_filenames(intext)
X
X    local header # chunkset is global
X
X    # search headers for overlong filenames; for now
X    # ignore everything else
X    while header := readtarhdr(reads(intext,512)) do {
X	# tab upto the next header block
X	tab_nxt_hdr(intext,trim_str(header.size),1)
X	# record overlong filenames in several global tables, sets
X	fixpath(trim_str(header.name))
X    }
X    *\chunkset ~= 0 | fail
X    return &null
X
Xend
X
X
X
Xprocedure output_mapped_headers_and_texts(intext)
X
X    # Remember that filenametbl, chunkset, and short_chunkset
X    # (which are used by various procedures below) are global.
X    local header, newtext, full_block, block, lastblock
X
X    # Read in headers, one at a time.
X    while header := readtarhdr(reads(intext,512)) do {
X
X	# Replace overlong filenames with shorter ones, according to
X	# the conversions specified in the global hash table filenametbl
X	# (which were generated by fixpath() on the first pass).
X      	header.name := left(map_filenams(header.name),100,"\x00")
X	header.linkname := left(map_filenams(header.linkname),100,"\x00")
X
X	# Use header.size field to determine the size of the subsequent text.
X	# Read in the text as one string.  Map overlong filenames found in it
X	# to shorter names as specified in the global hash table filenametbl.
X	newtext := map_filenams(tab_nxt_hdr(intext,trim_str(header.size)))
X
X	# Now, find the length of newtext, and insert it into the size field.
X	header.size := right(exbase10(*newtext,8) || " ",12," ")
X
X	# Calculate the checksum of the newly retouched header.
X	header.chksum := right(exbase10(get_checksum(header),8)||"\x00 ",8," ")
X
X	# Finally, join all the header fields into a new block and write it out
X	full_block := ""; every full_block ||:= !header
X	suspend left(full_block,512,"\x00")
X
X	# Now we're ready to write out the text, padding the final block
X	# out to an even 512 bytes if necessary; the next header must start
X	# right at the beginning of a 512-byte block.
X	newtext ? {
X	    while block := move(512)
X	    do suspend block
X	    pos(0) & next
X            lastblock := left(tab(0),512,"\x00")
X	    suspend lastblock
X	}
X    }
X    # Write out a final null-filled block.  Some tar programs write
X    # out 1024 nulls at the end, since the POSIX ustar format calls
X    # for two null-filled blocks to mark the end of an archive.
X    return repl("\x00",512)
X
Xend
X
X
X
Xprocedure trim_str(s)
X
X    # Knock out spaces, nulls from those crazy tar header
X    # block fields (some of which end in a space and a null,
X    # some just a space, and some just a null [anyone know
X    # why?]).
X    return s ? {
X	(tab(many(' ')) | &null) &
X	    trim(tab(find("\x00")|0))
X    } \ 1
X
Xend 
X
X
X
Xprocedure tab_nxt_hdr(f,size_str,firstpass)
X
X    # Tab upto the next header block.  Return the bypassed text
X    # as a string if not the first pass.
X
X    local hs, next_header_offset
X
X    hs := integer("8r" || size_str)
X    next_header_offset := (hs / 512) * 512
X    hs % 512 ~= 0 & next_header_offset +:= 512
X    if 0 = next_header_offset then return ""
X    else {
X	# if this is pass no. 1 don't bother returning a value; we're
X	# just collecting long filenames;
X	if \firstpass then {
X	    seek(f,where(f)+next_header_offset)
X	    return
X	}
X	else {
X	    return reads(f,next_header_offset)[1:hs+1] |
X		stop("mtf:  error reading in ",
X		     string(next_header_offset)," bytes.")
X	}
X    }
X
Xend
X
X
X
Xprocedure fixpath(s)
X
X    # Fixpath is a misnomer of sorts, since it is used on
X    # the first pass only, and merely examines each filename
X    # in a path, using the procedure mappiece to record any
X    # overlong ones in the global table filenametbl and in
X    # the global sets chunkset and short_chunkset; no fixing
X    # is actually done here.
X
X    s2 := ""
X    s ? {
X	while piece := tab(find("/")+1)
X	do s2 ||:= mappiece(piece) 
X	s2 ||:= mappiece(tab(0))
X    }
X    return s2
X
Xend
X
X
X
Xprocedure mappiece(s)
X
X    # Check s (the name of a file or dir as recorded in the tar header
X    # being examined) to see if it is over 14 chars long.  If so,
X    # generate a unique 14-char version of the name, and store
X    # both values in the global hashtable filenametbl.  Also store
X    # the original (overlong) file name in chunkset.  Store the
X    # first fifteen chars of the original file name in short_chunkset.
X    # Sorry about all of the tables and sets.  It actually makes for
X    # a reasonably efficient program.  Doing away with both sets,
X    # while possible, causes a tenfold drop in execution speed!
X    
X    # global filenametbl, chunkset, short_chunkset, extensions
X    local j, ending
X
X    initial {
X	/filenametbl := table()
X	/chunkset := set()
X	/short_chunkset := set()
X    }
X   
X    chunk := trim(s,'/')
X    if chunk ? (tab(find(".tar")+4), pos(0)) then {
X	write(&errout, "mtf:  Sorry, I can't let you do this.\n",
X	               "      You've nested a tar archive within\n",
X	               "      another tar archive, which makes it\n",
X	               "      likely I'll f your filenames ubar.")
X	exit(2)
X    }
X    if *chunk > 14 then {
X	i := 0
X
X	if /filenametbl[chunk] then {
X	# if we have not seen this file, then...
X	    repeat {
X		# ...find a new unique 14-character name for it;
X		# preserve important suffixes like ".Z," ".c," etc.
X		# First, check to see if the original filename (chunk)
X		# ends in an important extension...
X		if chunk ?
X		    (tab(find(".")),
X		     ending := move(1) || tab(match(!extensions)|any(&ascii)),
X		     pos(0)
X		     )
X		# ...If so, then leave the extension alone; mess with the
X		# middle part of the filename (e.g. file.with.extension.c ->
X		# file.with001.c).
X		then {
X		    j := (15 - *ending - 3)
X		    lchunk:= chunk[1:j] || right(string(i+:=1),3,"0") || ending
X		}
X		# If no important extension is present, then reformat the
X		# end of the file (e.g. too.long.file.name -> too.long.fil01).
X		else lchunk := chunk[1:13] || right(string(i+:=1),2,"0")
X
X		# If the resulting shorter file name has already been used...
X		if lchunk == !filenametbl
X		# ...then go back and find another (i.e. increment i & try
X		# again; else break from the repeat loop, and...
X		then next else break
X	    }
X            # ...record both the old filename (chunk) and its new,
X	    # mapped name (lchunk) in filenametbl.  Also record the
X	    # mapped names in chunkset and short_chunkset.
X	    filenametbl[chunk] := lchunk
X	    insert(chunkset,chunk)
X	    insert(short_chunkset,chunk[1:16])
X	}
X    }
X
X    # If the filename is overlong, return lchunk (the shortened
X    # name), else return the original name (chunk).  If the name,
X    # as passed to the current function, contained a trailing /
X    # (i.e. if s[-1]=="/"), then put the / back.  This could be
X    # done more elegantly.
X    return (\lchunk | chunk) || ((s[-1] == "/") | "")
X
Xend
X
X
X
Xprocedure readtarhdr(s)
X
X    # Read the silly tar header into a record.  Note that, as was
X    # complained about above, some of the fields end in a null, some
X    # in a space, and some in a space and a null.  The procedure
X    # trim_str() may be (and in fact often _is_) used to remove this
X    # extra garbage.
X
X    this_block := hblock()
X    s ? {
X	this_block.name     := move(100)    # <- to be looked at later
X	this_block.junk     := move(8+8+8)  # skip the permissions, uid, etc.
X	this_block.size     := move(12)     # <- to be looked at later
X	this_block.mtime    := move(12)
X	this_block.chksum   := move(8)      # <- to be looked at later
X	this_block.linkflag := move(1)
X	this_block.linkname := move(100)    # <- to be looked at later
X	this_block.therest  := tab(0)
X    }
X    integer(this_block.size) | fail  # If it's not an integer, we've hit
X                                     # the final (null-filled) block.
X    return this_block
X
Xend
X
X
X
Xprocedure map_filenams(s)
X
X    # Chunkset is global, and contains all the overlong filenames
X    # found in the first pass through the input file; here the aim
X    # is to map these filenames to the shortened variants as stored
X    # in filenametbl (GLOBAL).
X
X    local s2, tmp_chunk_tbl, tmp_lst
X    static new_chunklist
X    initial {
X
X        # Make sure filenames are sorted, longest first.  Say we
X        # have a file called long_file_name_here.1 and one called
X        # long_file_name_here.1a.  We want to check for the longer
X        # one first.  Otherwise the portion of the second file which
X        # matches the first file will get remapped.
X        tmp_chunk_tbl := table()
X        every el := !chunkset
X        do insert(tmp_chunk_tbl,el,*el)
X        tmp_lst := sort(tmp_chunk_tbl,4)
X        new_chunklist := list()
X        every put(new_chunklist,tmp_lst[*tmp_lst-1 to 1 by -2])
X
X    }
X
X    s2 := ""
X    s ? {
X	until pos(0) do {
X	    # first narrow the possibilities, using short_chunkset
X	    if member(short_chunkset,&subject[&pos:&pos+15])
X            # then try to map from a long to a shorter 14-char filename
X	    then {
X		if match(ch := !new_chunklist) & not match(!no_nos)
X		then s2 ||:= filenametbl[=ch]
X		else s2 ||:= move(1)
X	    }
X	    else s2 ||:= move(1)
X	}
X    }
X    return s2
X
Xend
X
X
X#  From the IPL.  Thanks, Ralph -
X#  Author:  Ralph E. Griswold
X#  Date:  June 10, 1988
X#  exbase10(i,j) convert base-10 integer i to base j
X#  The maximum base allowed is 36.
X
Xprocedure exbase10(i,j)
X
X   static digits
X   local s, d, sign
X   initial digits := &digits || &lcase
X   if i = 0 then return 0
X   if i < 0 then {
X      sign := "-"
X      i := -i
X      }
X   else sign := ""
X   s := ""
X   while i > 0 do {
X      d := i % j
X      if d > 9 then d := digits[d + 1]
X      s := d || s
X      i /:= j
X      }
X   return sign || s
X
Xend
X
X# end IPL material
X
X
Xprocedure get_checksum(r)
X 
X    # Calculates the new value of the checksum field for the
X    # current header block.  Note that the specification says
X    # that, when calculating this value, the chksum field must
X    # be blank-filled.
X
X    sum := 0
X    r.chksum := "        "
X    every field := !r
X    do every sum +:= ord(!field)
X    return sum
X
Xend
X
X
X
Xprocedure write_report()
X
X    # This procedure writes out a list of filenames which were
X    # remapped (because they exceeded the SysV 14-char limit),
X    # and then notifies the user of the existence of this file.
X
X    local outtext, stbl, i, j, mapfile_name
X
X    # Get a unique name for the map.report (thereby preventing
X    # us from overwriting an older one).
X    mapfile_name := "map.report"; j := 1
X    until not close(open(mapfile_name,"r"))
X    do mapfile_name := (mapfile_name[1:11] || string(j+:=1))
X
X    # Fall back on /tmp; either way, outtext must get the open file.
X    outtext := (open(mapfile_name,"w") |
X		open(mapfile_name := "/tmp/map.report","w")) |
X	stop("mtf:  Can't find a place to put map.report!")
X    stbl := sort(filenametbl,3)
X    every i := 1 to *stbl -1 by 2 do {
X	# pad names to 35 columns, but never truncate a longer one
X	match(!no_nos,stbl[i]) |
X	    write(outtext,left(stbl[i],(35 < *stbl[i]) | 35," ")," ",stbl[i+1])
X    }
X    write(&errout,"\nmtf:  ",mapfile_name," contains the list of changes.")
X    write(&errout,"      Please save this list!")
X    close(outtext)
X    return &null
X
Xend
SHAR_EOF
true || echo 'restore of mtf.icn failed'
rm -f _shar_wnt_.tmp
fi
# ============= README ==============
if test -f 'README' -a X"$1" != X"-c"; then
	echo 'x - skipping README (File already exists)'
	rm -f _shar_wnt_.tmp
else
> _shar_wnt_.tmp
echo 'x - extracting README (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'README' &&
XNAME:  mtf
X
XLANGUAGE:  Icon
X
XAUTHOR:  Richard Goerwitz (goer@sophist.uchicago.edu)
X
XPURPOSE:  Maps 15+ char. filenames in a tar archive to 14 chars.
XHandles both header blocks and the archive itself.  Mtf is intended to
Xfacilitate installation of tar'd archives on systems subject to a
X14-character filename limit.
X
XINSTALLATION:  Copy Makefile.dist to Makefile and run make.  If all
Xgoes well, and you have root privileges, edit the Makefile to reflect
Xyour local file structure, and run "make install".
X
XUSAGE:  mtf inputfile [-r reportfile] [-e .extensions] [-x exceptions]
X
X"Inputfile" is a tar archive.  "Reportfile" is a file containing a list
Xof files already mapped by mtf in a previous run (used to avoid
Xclashes with filenames in use outside the current archive).  The -e
Xswitch precedes a list of filename .extensions which mtf is supposed
SHAR_EOF
true || echo 'restore of README failed'
fi
echo 'End of  part 1'
echo 'File README is continued in part 2'
echo 2 > _shar_seq_.tmp
exit 0

exit 0 # Just in case...
-- 
Kent Landfield                   INTERNET: kent@sparky.IMD.Sterling.COM
Sterling Software, IMD           UUCP:     uunet!sparky!kent
Phone:    (402) 291-8300         FAX:      (402) 291-4362
Please send comp.sources.misc-related mail to kent@uunet.uu.net.