[comp.lang.icon] BSD -> SysV filename conversion

goer@SOPHIST.UCHICAGO.EDU (Richard Goerwitz) (06/29/90)

Yes, more code.  Isn't this newsgroup fun?

This is a repost of my tar archive converter, now much expanded.  I
frequently find that the main hassle in porting BSD software to SysV
comes down to renaming files, and then weeding through the source to
find references to the old, overlong filenames.  This software
essentially automates the process.

Note:  This is NOT just a filename renamer.  Yes, it renames files.
It also goes through any text files in the archive and inserts the
new filenames there as well.  Note also:  The mapping algorithm is
smart.  It will keep one-letter extensions like .c, and will, at the
user's option, keep any other extensions as well (e.g. .icn, .tex,
.uu).  With most archives, you should be able to just run this
program, redirect its output to a file, unarchive that file, and then
forget about filename-length problems.
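
For the curious, here is a rough sketch of the renaming rule in plain
sh.  It is not the Icon code from the archive below.  The function name
"shorten" and its argument order are made up for illustration, and it
leaves out the collision check (mtf3 keeps bumping the counter until the
new name is unused).  It assumes the counter is a plain integer with no
leading zeros.

```shell
#!/bin/sh
# Illustration only: "shorten" is a hypothetical helper, not part of mtf3.
# usage: shorten NAME COUNTER [EXT ...]   (EXT given without the leading dot)
shorten() {
    name=$1 n=$2
    shift 2
    ending=
    # A one-character extension (.c, .o, .Z ...) is always preserved.
    case $name in
        *.?) ending=.${name##*.} ;;
    esac
    # So is any extension the caller lists explicitly.
    for ext in "$@"; do
        case $name in
            *."$ext") ending=.$ext ;;
        esac
    done
    if [ -n "$ending" ]; then
        # Keep the extension: prefix + 3-digit counter + extension = 14 chars.
        keep=$((11 - ${#ending}))
        printf "%.${keep}s%03d%s\n" "$name" "$n" "$ending"
    else
        # No extension worth keeping: 12-char prefix + 2-digit counter.
        printf '%.12s%02d\n' "$name" "$n"
    fi
}

shorten file.with.extension.c 1        # file.with001.c
shorten too.long.file.name 1           # too.long.fil01
shorten long_program_name.icn 1 icn    # long_pr001.icn
```

The third call shows what the -e switch buys you: with "icn" listed,
the extension survives and the counter goes in the middle of the name.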

No one complained about the shar archive last time, so I'll shar
this one as well.  As usual, send me bug reports.

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer



#!/bin/sh
# This is a shell archive (shar 3.24)
# made 06/28/1990 18:46 UTC by richard@zenu (goer@sophist.uchicago.edu)
#
# existing files WILL be overwritten
# This format requires very little intelligence at unshar time.
# "echo" and "sed" will be needed.
#
# ============= mtf3.icn ==============
echo "x - extracting mtf3.icn (Text)"
sed 's/^X//' << 'SHAR_EOF' > mtf3.icn &&
X#############################################################################
X#
X#	NAME:	mtf3.icn
X#
X#	TITLE:	map tar file
X#
X#	AUTHOR:	Richard Goerwitz
X#
X#	DATE:	5/23/90  (version 3.0 - beta)
X#
X#############################################################################
X#
X#  Copyright (c) 1990, Richard L. Goerwitz, III
X#
X#  This software is intended for free and unrestricted distribution.
X#  I place only two conditions on its use:  1) That you clearly mark
X#  any additions or changes you make to the source code, and 2) that
X#  you do not delete this message therefrom.  In order to protect
X#  myself from spurious litigation, it must also be stated here that,
X#  because this is free software, I, Richard Goerwitz, make no claim
X#  about the applicability or fitness of this software for any
X#  purpose, and expressly disclaim any responsibility for any damages
X#  that might be incurred in conjunction with its use.
X#
X###########################################################################
X#
X#  PURPOSE: Maps 15+ char. filenames in a tar archive to 14 chars.
X#  Handles both header blocks and the archive itself.  Mtf is intended
X#  to facilitate installation of tar'd archives on systems subject to
X#  the System V 14-character filename limit.
X#
X#  USAGE:  mtf3 inputfile [-r reportfile] [-e .extensions] [-x exceptions]
X#
X#  "Inputfile" is a tar archive.  "Reportfile" is a file containing a
X#  list of files already mapped by mtf in a previous run (used to
X#  avoid clashes with filenames in use outside the current archive).
X#  The -e switch precedes a list of filename .extensions which mtf is
X#  supposed to leave unscathed by the mapping process
X#  (single-character extensions such as .c and .o are automatically
X#  preserved; -e allows the user to specify additional extensions,
X#  such as .pxl, .cpi, and .icn).  The final switch, -x, precedes a
X#  list of strings which should not be mapped at all.  Use this switch
X#  if, say, you have a C file with a structure.field combination such
X#  as "thisisveryverybig.hashptr" in an archive that contains a file
X#  called "thisisveryverybig.h", and you want to avoid mapping that
X#  portion of the struct name which matches the name of the overlong
X#  file (to wit, "mtf3 inputfile -x thisisveryverybig.hashptr").  To
X#  prevent mapping of any string (including overlong filenames)
X#  beginning, say, with "thisisvery", use "mtf3 inputfile -x thisisvery".
X#  Be careful with this option, or you might end up defeating the
X#  whole point of using mtf in the first place.
X#
X#  OUTPUT FORMAT:  Mtf writes a mapped tar archive to the stdout.
X#  When finished, it leaves a file called "map.report" in the current
X#  directory which records what filenames were mapped and how.  Rename
X#  and save this file, and use it as the "reportfile" argument to any
X#  subsequent runs of mtf in this same directory.  Even if you don't
X#  plan to run mtf again, this file should still be examined, just to
X#  be sure that the new filenames are acceptable, and to see if
X#  perhaps additional .extensions and/or exceptions should be
X#  specified.
X#
X#  BUGS:  Mtf only maps filenames found in the main tar headers.
X#  Because of this, mtf cannot accept nested tar archives.  If you try
X#  to map a tar archive within a tar file, mtf will abort with a nasty
X#  message about screwing up your files.  Please note that, unless you
X#  give mtf a "reportfile" to consider, it knows nothing about files
X#  existing outside the archive.  Hence, if an input archive refers to
X#  an overlong filename in another archive, mtf naturally will not
X#  know to shorten it.  Mtf will, in fact, have no way of knowing that
X#  it is a filename, and not, say, an identifier in a C program.
X#  Final word of caution:  Try not to use mtf on binaries.  It cannot
X#  possibly preserve the correct format and alignment of strings in an
X#  executable.  Same goes for compressed files.  Mtf can't map
X#  filenames that it can't read!
X#
X####################################################################
X
X
Xglobal filenametbl, chunkset, short_chunkset   # see procedure mappiece(s)
Xglobal extensions, no_nos                      # ditto
X
Xrecord hblock(name,junk,size,mtime,chksum,     # tar header struct;
X              linkflag,linkname,therest)       # see readtarhdr(s)
X
X
Xprocedure main(a)
X
X    usage := "usage:  mtf3 inputfile [-r reportfile] " ||
X	     "[-e .extensions] [-x exceptions]"
X
X    *a = 0 & stop(usage)
X
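X    # Leave a[1] (the input file name) on the list; it is needed again
X    # below for the "is this even a tar archive?" hint.
X    intext := open_input_file(a[1])
X
X    i := 1     # switches start at a[2]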
X    extensions := []; no_nos := []
X    while (i +:= 1) <= *a do {
X	case a[i] of {
X	    "-r"    :    readin_old_map_report(a[i+:=1])
X	    "-e"    :    current_list := extensions
X	    "-x"    :    current_list := no_nos
X	    default :    put(current_list,a[i])
X	}
X    }
X
X    every !extensions ?:= (=".", tab(0))
X    
X    # Run through all the headers in the input file, filling
X    # (global) filenametbl with the names of overlong files;
X    # make_table_of_filenames fails if there are no such files.
X    make_table_of_filenames(intext) | {
X	write(&errout,"mtf:  no overlong path names to map") 
X	a[1] ? (tab(find(".tar")+4), pos(0)) |
X	  write(&errout,"(Is ",a[1]," even a tar archive?)")
X 	exit(1)
X    } 
X
X    # Now that a table of overlong filenames exists, go back
X    # through the text, remapping all occurrences of these names
X    # to new, 14-char values; also, reset header checksums, and
X    # reformat text into correctly padded 512-byte blocks.  Ter-
X    # minate output with 512 nulls.
X    seek(intext,1)
X    every writes(output_mapped_headers_and_texts(intext))
X
X    close(intext)
X    write_report()   # Record mapped file and dir names for future ref.
X    exit(0)
X    
Xend
X
X
X
Xprocedure open_input_file(s)
X    intext := open("" ~== s,"r") |
X	stop("mtf:  can't open ",s)
X    find("UNIX",&features) |
X	stop("mtf:  I'm not tested on non-Unix systems.")
X    s[-2:0] == ".Z" &
X        stop("mtf:  sorry, can't accept compressed files")
X    return intext
Xend
X
X
X
Xprocedure readin_old_map_report(s)
X
X    initial {
X	filenametbl := table()
X	chunkset := set()
X	short_chunkset := set()
X    }
X
X    mapfile := open_input_file(s)
X    while line := read(mapfile) do {
X	line ? {	
X	    if chunk := tab(many(~' \t')) & tab(upto(~' \t')) &
X		lchunk := move(14) & pos(0) then {
X		filenametbl[chunk] := lchunk
X		insert(chunkset,chunk)
X		insert(short_chunkset,chunk[1:16])
X	    }
X	if /chunk | /lchunk
X	then stop("mtf:  report file, ",s," seems mangled.")
X	}
X    }
X
Xend
X
X
X
Xprocedure make_table_of_filenames(intext)
X
X    local header # chunkset is global
X
X    # search headers for overlong filenames; for now
X    # ignore everything else
X    while header := readtarhdr(reads(intext,512)) do {
X	# tab upto the next header block
X	tab_nxt_hdr(intext,trim_str(header.size),1)
X	# record overlong filenames in several global tables, sets
X	fixpath(trim_str(header.name))
X    }
X    *\chunkset ~= 0 | fail
X    return &null
X
Xend
X
X
X
Xprocedure output_mapped_headers_and_texts(intext)
X
X    # Remember that filenametbl, chunkset, and short_chunkset
X    # (which are used by various procedures below) are global.
X    local header, newtext, full_block, block, lastblock
X
X    # Read in headers, one at a time.
X    while header := readtarhdr(reads(intext,512)) do {
X
X	# Replace overlong filenames with shorter ones, according to
X	# the conversions specified in the global hash table filenametbl
X	# (which were generated by fixpath() on the first pass).
X      	header.name := left(map_filenams(header.name),100,"\x00")
X	header.linkname := left(map_filenams(header.linkname),100,"\x00")
X
X	# Use header.size field to determine the size of the subsequent text.
X	# Read in the text as one string.  Map overlong filenames found in it
X 	# to shorter names as specified in the global hash table filenametbl.
X	newtext := map_filenams(tab_nxt_hdr(intext,trim_str(header.size)))
X
X	# Now, find the length of newtext, and insert it into the size field.
X	header.size := right(exbase10(*newtext,8) || " ",12," ")
X
X	# Calculate the checksum of the newly retouched header.
X	header.chksum := right(exbase10(get_checksum(header),8)||"\x00 ",8," ")
X
X	# Finally, join all the header fields into a new block and write it out
X	full_block := ""; every full_block ||:= !header
X	suspend left(full_block,512,"\x00")
X
X	# Now we're ready to write out the text, padding the final block
X	# out to an even 512 bytes if necessary; the next header must start
X	# right at the beginning of a 512-byte block.
X	newtext ? {
X	    while block := move(512)
X	    do suspend block
X	    pos(0) & next
X            lastblock := left(tab(0),512,"\x00")
X	    suspend lastblock
X	}
X    }
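X    # Write out a final null-filled block.  The tar format calls for
X    # two such blocks (1024 nulls) at the end of an archive, which is
X    # why some tar programs write 1024; a single block is usually
X    # accepted by readers.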
X    return repl("\x00",512)
X
Xend
X
X
X
Xprocedure trim_str(s)
X
X    # Knock out spaces, nulls from those crazy tar header
X    # block fields (some of which end in a space and a null,
X    # some just a space, and some just a null [anyone know
X    # why?]).
X    return s ? {
X	(tab(many(' ')) | &null) &
X	    trim(tab(find("\x00")|0))
X    } \ 1
X
Xend 
X
X
X
Xprocedure tab_nxt_hdr(f,size_str,firstpass)
X
X    # Tab upto the next header block.  Return the bypassed text
X    # as a string if not the first pass.
X
X    local hs, next_header_offset
X
X    hs := integer("8r" || size_str)
X    next_header_offset := (hs / 512) * 512
X    hs % 512 ~= 0 & next_header_offset +:= 512
X    if 0 = next_header_offset then return ""
X    else {
X	# if this is pass no. 1 don't bother returning a value; we're
X	# just collecting long filenames;
X	if \firstpass then {
X	    seek(f,where(f)+next_header_offset)
X	    return
X	}
X	else {
X	    return reads(f,next_header_offset)[1:hs+1] |
X		stop("mtf:  error reading in ",
X		     string(next_header_offset)," bytes.")
X	}
X    }
X
Xend
X
X
X
Xprocedure fixpath(s)
X
X    # Fixpath is a misnomer of sorts, since it is used on
X    # the first pass only, and merely examines each filename
X    # in a path, using the procedure mappiece to record any
X    # overlong ones in the global table filenametbl and in
X    # the global sets chunkset and short_chunkset; no fixing
X    # is actually done here.
X
X    s2 := ""
X    s ? {
X	while piece := tab(find("/")+1)
X	do s2 ||:= mappiece(piece) 
X	s2 ||:= mappiece(tab(0))
X    }
X    return s2
X
Xend
X
X
X
Xprocedure mappiece(s)
X
X    # Check s (the name of a file or dir as recorded in the tar header
X    # being examined) to see if it is over 14 chars long.  If so,
X    # generate a unique 14-char version of the name, and store
X    # both values in the global hashtable filenametbl.  Also store
X    # the original (overlong) file name in chunkset.  Store the
X    # first fifteen chars of the original file name in short_chunkset.
X    # Sorry about all of the tables and sets.  It actually makes for
X    # a reasonably efficient program.  Doing away with both sets,
X    # while possible, causes a tenfold drop in execution speed!
X    
X    # global filenametbl, chunkset, short_chunkset, extensions
X    local j, ending
X
X    initial {
X	/filenametbl := table()
X	/chunkset := set()
X	/short_chunkset := set()
X    }
X   
X    chunk := trim(s,'/')
X    if chunk ? (tab(find(".tar")+4), pos(0)) then {
X	write(&errout, "mtf:  Sorry, I can't let you do this.\n",
X	               "      You've nested a tar archive within\n",
X	               "      another tar archive, which makes it\n",
X	               "      likely I'll f your filenames ubar.")
X	exit(2)
X    }
X    if *chunk > 14 then {
X	i := 0
X
X	if /filenametbl[chunk] then {
X	# if we have not seen this file, then...
X	    repeat {
X		# ...find a new unique 14-character name for it;
X		# preserve important suffixes like ".Z", ".c", etc.
X		# First, check to see if the original filename (chunk)
X		# ends in an important extension...
X		if chunk ?
X		    (tab(find(".")),
X		     ending := move(1) || tab(match(!extensions)|any(&ascii)),
X		     pos(0)
X		     )
X		# ...If so, then leave the extension alone; mess with the
X		# middle part of the filename (e.g. file.with.extension.c ->
X		# file.with001.c).
X		then {
X		    j := (15 - *ending - 3)
X		    lchunk:= chunk[1:j] || right(string(i+:=1),3,"0") || ending
X		}
X		# If no important extension is present, then reformat the
X		# end of the file (e.g. too.long.file.name -> too.long.fil01).
X		else lchunk := chunk[1:13] || right(string(i+:=1),2,"0")
X
X		# If the resulting shorter file name has already been used...
X		if lchunk == !filenametbl
X		# ...then go back and find another (i.e. increment i & try
X		# again); else break from the repeat loop, and...
X		then next else break
X	    }
X            # ...record both the old filename (chunk) and its new,
X	    # mapped name (lchunk) in filenametbl.  Also record the
X	    # original name in chunkset, and its first 15 chars in
X	    # short_chunkset.
X	    filenametbl[chunk] := lchunk
X	    insert(chunkset,chunk)
X	    insert(short_chunkset,chunk[1:16])
X	}
X    }
X
X    # If the filename is overlong, return lchunk (the shortened
X    # name), else return the original name (chunk).  If the name,
X    # as passed to the current function, contained a trailing /
X    # (i.e. if s[-1]=="/"), then put the / back.  This could be
X    # done more elegantly.
X    return (\lchunk | \filenametbl[chunk] | chunk) || ((s[-1] == "/") | "")
X
Xend
X
X
X
Xprocedure readtarhdr(s)
X
X    # Read the silly tar header into a record.  Note that, as was
X    # complained about above, some of the fields end in a null, some
X    # in a space, and some in a space and a null.  The procedure
X    # trim_str() may be (and in fact often _is_) used to remove this
X    # extra garbage.
X
X    this_block := hblock()
X    s ? {
X	this_block.name     := move(100)    # <- to be looked at later
X	this_block.junk     := move(8+8+8)  # skip the permissions, uid, etc.
X	this_block.size     := move(12)     # <- to be looked at later
X	this_block.mtime    := move(12)
X	this_block.chksum   := move(8)      # <- to be looked at later
X	this_block.linkflag := move(1)
X	this_block.linkname := move(100)    # <- to be looked at later
X	this_block.therest  := tab(0)
X    }
X    integer(this_block.size) | fail  # If it's not an integer, we've hit
X                                     # the final (null-filled) block.
X    return this_block
X
Xend
X
X
X
Xprocedure map_filenams(s)
X
X    # Chunkset is global, and contains all the overlong filenames
X    # found in the first pass through the input file; here the aim
X    # is to map these filenames to the shortened variants as stored
X    # in filenametbl (GLOBAL).
X
X    local s2, tmp_chunk_tbl, tmp_lst
X    static new_chunklist
X    initial {
X
X        # Make sure filenames are sorted, longest first.  Say we
X        # have a file called long_file_name_here.1 and one called
X        # long_file_name_here.1a.  We want to check for the longer
X        # one first.  Otherwise the portion of the second file which
X        # matches the first file will get remapped.
X        tmp_chunk_tbl := table()
X        every el := !chunkset
X        do insert(tmp_chunk_tbl,el,*el)
X        tmp_lst := sort(tmp_chunk_tbl,4)
X        new_chunklist := list()
X        every put(new_chunklist,tmp_lst[*tmp_lst-1 to 1 by -2])
X
X    }
X
X    s2 := ""
X    s ? {
X	until pos(0) do {
X	    # first narrow the possibilities, using short_chunkset
X	    if member(short_chunkset,&subject[&pos:&pos+15])
X            # then try to map from a long to a shorter 14-char filename
X	    then {
X		if match(ch := !new_chunklist) & not match(!no_nos)
X		then s2 ||:= filenametbl[=ch]
X		else s2 ||:= move(1)
X	    }
X	    else s2 ||:= move(1)
X	}
X    }
X    return s2
X
Xend
X
X
X#  From the IPL.  Thanks, Ralph -
X#  Author:  Ralph E. Griswold
X#  Date:  June 10, 1988
X#  exbase10(i,j) convert base-10 integer i to base j
X#  The maximum base allowed is 36.
X
Xprocedure exbase10(i,j)
X
X   static digits
X   local s, d, sign
X   initial digits := &digits || &lcase
X   if i = 0 then return 0
X   if i < 0 then {
X      sign := "-"
X      i := -i
X      }
X   else sign := ""
X   s := ""
X   while i > 0 do {
X      d := i % j
X      if d > 9 then d := digits[d + 1]
X      s := d || s
X      i /:= j
X      }
X   return sign || s
X
Xend
X
X# end IPL material
X
X
Xprocedure get_checksum(r)
X 
X    # Calculates the new value of the checksum field for the
X    # current header block.  Note that the specification says
X    # that, when calculating this value, the chksum field must
X    # be blank-filled.
X
X    sum := 0
X    r.chksum := "        "
X    every field := !r
X    do every sum +:= ord(!field)
X    return sum
X
Xend
X
X
X
Xprocedure write_report()
X
X    # This procedure writes out a list of filenames which were
X    # remapped (because they exceeded the SysV 14-char limit),
X    # and then notifies the user of the existence of this file.
X
X    local outtext, stbl, i, j, mapfile_name
X
X    # Get a unique name for the map.report (thereby preventing
X    # us from overwriting an older one).
X    mapfile_name := "map.report"; j := 1
X    until not close(open(mapfile_name,"r"))
X    do mapfile_name := (mapfile_name[1:11] || string(j+:=1))
X
X    (outtext := open(mapfile_name,"w")) |
X	(outtext := open(mapfile_name := "/tmp/map.report","w")) |
X	     stop("mtf:  Can't find a place to put map.report!")
X    stbl := sort(filenametbl,3)
X    every i := 1 to *stbl -1 by 2 do {
X	# pad names to 35 columns, but never truncate a longer one
X	match(!no_nos,stbl[i]) |
X	    write(outtext,left(stbl[i],(35 < *stbl[i]) | 35," ")," ",stbl[i+1])
X    }
X    write(&errout,"\nmtf:  ",mapfile_name," contains the list of changes.")
X    write(&errout,"      Please save this list!")
X    close(outtext)
X    return &null
X
Xend
SHAR_EOF
exit 0