[comp.unix.wizards] Woes of absolute path names in tar

anw@nott-cs.UUCP (06/04/88)

Many years ago, some kind soul [may even have been me, :-(, I tend to forget
such things] installed a shell script on our PDP 11 [probably when it was a
34 running V6, perhaps when it was a 70 running V7 -- now it's a 44 running
V7, gloom] to copy file hierarchies.  Well, time passes;  and as the discs
fill up, I archive elderly subdirectories:

		copy old-dir /nicedisc/anw/archive
		: check that all is well, then ...
		rm -rf old-dir

Repeat as necessary.  More time passes.  Yesterday, we re-organised our
discs a bit, and suddenly there is oodles of space around my home directory.
Wouldn't it be nice to get all that stuff back in the right place?

		mkdir old-dir
		copy /nicedisc/anw/archive old-dir
		: Aaarrrggghhh!  Lots of error messages

Yikes!  Look in the archive directory.  Funny, all those files are still there,
and "ls -l" shows they haven't been altered, but "cat" one or two, and they're
empty.  More tests -- they aren't actually empty, they've been NULlified.
All my beloved old files!  Hope this wasn't too long ago, might not be able
to find a useful dump tape.  Phew, they are on that morning's dump tape, all
correct, and 10 mins later my sanity and the files are restored.

	What happened?  Well, you've guessed from the "Subject:" line, but I
had to scrat around for a bit.  Here is "/usr/bin/copy", as was:

		echo copying from $1 to $2
		tar cvf - $1 | (cd $2; tar xfp -)

[I don't think this is my code, even from 10 years ago, if only 'cos I
always put quotes around parameters.  Well, nearly always.]  Someone has
been reading the "tar" manual entry;  but this is disastrous.  If "$1"
begins with "/", the right-hand "tar" overwrites the directory the left-
hand "tar" is reading from, zapping most of the files, but restoring the
modified times, etc., with potentially terrible consequences, as I found out.
There are other bugs as well, but to cut a long story short, "/usr/bin/copy"
*now* looks like this:

	error () { echo $0: "$@" 1>&2; exit 1; }

	case $# in
		0)	echo usage: $0 fromdir todir; exit 0 ;;
		1)	error must supply target directory ;;
		2)	;;
		*)	error too many parameters
	esac

	[ -d "$1" -a -d "$2" ] ||
		error parameters must be existing directories

	case "$1" in
		/*)	error first param must not begin with /
	esac

	FROM=`(cd "$1"; pwd)`
	TO=`(cd "$2"; pwd)`
	case "$TO" in
		"$FROM")	error must not copy directory to itself ;;
		"$FROM"/*)	error must not copy directory to sub-dir
	esac

	echo copying from "$1" to "$2"
	sleep 10
	tar cvf - "$1" | (cd "$2"; tar xfp -)

Still not perfect, but at least somewhat safer.  And the morals are:
	a) Good dumping strategies are essential.
	b) Even your most ancient and trusted tools can suddenly bite you.
	c) Those of you with whizzo "cpio"s, and other bells and whistles,
	   and much more careful "tar"s, and Suns, and Crays, and what
	   have-you, please spare a thought for your less fortunate
	   brethren and cistern.
I'd better stop before I get too "comp.RISK"-ish.  Thank you for listening.

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK
anw@maths.nott.ac.uk

mouse@mcgill-vision.UUCP (der Mouse) (06/18/88)

In article <564@tuck.nott-cs.UUCP>, anw@nott-cs.UUCP writes:
[stuff was archived with]
> 		copy old-dir /nicedisc/anw/archive
> 		: check that all is well, then ...
> 		rm -rf old-dir
[when attempting to restore]
> 		mkdir old-dir
> 		copy /nicedisc/anw/archive old-dir
> 		: Aaarrrggghhh!  Lots of error messages
[This was because "copy" was....]
> 		echo copying from $1 to $2
> 		tar cvf - $1 | (cd $2; tar xfp -)

> [...] this is disastrous.  If "$1" begins with "/", the right-hand
> "tar" overwrites the directory the left-hand "tar" is reading from,

> There are other bugs as well, but to cut a long story short,
> "/usr/bin/copy" *now* looks like this:

[long script.  Many checks, in particular $1 must not begin with a
slash.  Ultimately....]
> 	echo copying from "$1" to "$2"
> 	sleep 10
> 	tar cvf - "$1" | (cd "$2"; tar xfp -)

This still has problems.  If, for example, I want to copy /foo/bar/baz
to /newfoo/bar/gleep, and I do it thus....

	% copy /foo/bar/baz /newfoo/bar/gleep
copy: [error message about from directory must not begin with / here]
me: why in the name of poslfit not?  Oh well....
	% cd /
	% copy foo/bar/baz newfoo/bar/gleep
...pause while it does it
	%

I then find that it has actually copied foo/bar/baz/* to
newfoo/bar/gleep/foo/bar/baz/* instead of newfoo/bar/gleep/* as I
expected.
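
The effect described above comes from tar recording each member under the name it was given on the command line. A minimal sketch of that, assuming a scratch directory made with mktemp -d (not available on every old system) and hypothetical names foo/bar/baz and file:

```shell
# Scratch tree standing in for /foo/bar/baz (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/foo/bar/baz"
echo hello > "$work/foo/bar/baz/file"

# Naming the directory by its path records that path in every member name.
by_path=$(cd "$work" && tar cf - foo/bar/baz | tar tf -)

# Archiving "." from inside the directory records names relative to it.
by_dot=$(cd "$work/foo/bar/baz" && tar cf - . | tar tf -)

printf 'by path:\n%s\n\nfrom inside:\n%s\n' "$by_path" "$by_dot"
```

The first listing carries the foo/bar/baz prefix on every member, so extracting it in another directory recreates that prefix there. The second listing has only names under "./".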

Now, all this aggravation, including the original one about the leading
slash, could have been avoided if only you'd done....

	( cd "$1" ; tar cf - . ) | ( cd "$2" ; tar xvf - )
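
A hedged variant of the one-liner above, wrapped in a function so it can be tried on a scratch tree. The name copy_tree and the scratch directories are hypothetical, the "p" flag is carried over from the original script, and "&&" replaces ";" so that neither tar runs in the wrong directory if a cd fails:

```shell
# copy_tree FROM TO: hypothetical helper, not the /usr/bin/copy from the thread.
copy_tree() {
	# "&&" stops tar from running in the wrong directory if cd fails.
	( cd "$1" && tar cf - . ) | ( cd "$2" && tar xpf - )
}

# Scratch source and target directories (assumes mktemp -d is available).
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
echo data > "$work/src/sub/file"

# An absolute source path is harmless here: the archive only holds "./" names.
copy_tree "$work/src" "$work/dst"
ls -R "$work/dst"
```

Because the archive is built from "." inside the source, the contents land directly under the target, with no copy of the source path recreated beneath it.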

The code

FROM=`(cd "$1"; pwd)`
TO=`(cd "$2"; pwd)`

which appears to be intended to guard the subsequent checks against the
target being the same as, or a subdirectory of, the source, has other
problems.  In particular, it assumes that all the directories in the
chain leading to $1 and $2 are readable, which is not necessarily true
(one or more of them may well be execute-only, causing pwd to fail).

As insurance against this sort of fun, *my* tar (plug plug) won't
extract rooted pathnames without a special flag.  (It has other nice
features, like conforming to tar(5).  No, stock tar doesn't, at least
not the ones available to me: 4.3BSD, mtXinu 4.3+NFS, Sun 3.5, and Iris
(version unknown).)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

bzs@bu-cs.BU.EDU (Barry Shein) (06/18/88)

I once built a tape utility (under contract) which had the option of
passing all pathnames thru a user-provided awk program before being
used. This not only solves absolute pathname problems (trivially) but
also things like (relatively) foreign or just bogus names and the
possibility of just constructing arbitrarily fancy file selectors
(assuming that there's a way to give negative feedback; an empty
reply, just a newline, would do to say "skip this file").

Anyhow, the worst is absolute pathnames, but the suggestion above
should be little more than adding a flag and a popen() to tar.

	-Barry Shein, Boston University
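
The pathname-filter idea above can be sketched outside tar. This is only an illustration under assumptions: the filter program, the rewrite function and the sample names are all hypothetical, and a real implementation would sit inside the archiver (for instance behind a popen()) rather than in a shell loop.

```shell
# Hypothetical filter: one pathname in, one rewritten pathname out;
# an empty line means "skip this file".
filter='
/(^|\/)core$/ { print ""; next }   # skip core files
{ sub(/^\/+/, ""); print }         # strip leading slashes
'

# Pass a single name through the awk program and print the reply.
rewrite() { printf '%s\n' "$1" | awk "$filter"; }

for name in /etc/passwd usr/src/cmd/tar.c /tmp/core; do
	new=$(rewrite "$name")
	if [ -n "$new" ]; then
		echo "$name -> $new"
	else
		echo "$name -> (skipped)"
	fi
done
```

With this filter, rooted names come back relative, relative names pass through unchanged, and anything named core gets the empty reply and is skipped.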