[net.unix-wizards] Flames on system backups

CHUQUI@MIT-MC (03/22/83)

From:  Charles F. Von Rospach <CHUQUI @ MIT-MC>


Does anybody out there know of a decent backup system available for Unix?
I have been trying to get something running with dump and tar, and I am
extremely disgusted with the quality of this software. My basic requirements
would be (in relative order of preference):

o	Incremental backup capabilities

o	Multiple filesystems

o	Multiple volumes

o	Reasonably straightforward and easy restores on the file,
	directory, and system level (in that order).

o	Decent documentation

o	Decent handling of the mag tape device.

o	internal control of the tape library, with volume number prompting

o	on-line backups

The main problems I am having arise because my filesystem structure is not
what most Unix people would consider 'normal'. I currently have 8 mounted
filesystems (including root and /usr), and some of them are fairly large.
Before people tell me to change this, let me add that this cannot change
because of the way some of the users have designed their software, and 
getting them to change is not reasonable. 

In attempting to get dump to work on this system in a reasonable way, I
have found the following problems:

o	dump does not allow multiple filesystems on a tape. This means that
	I have to keep 8 sets of backups (with the associated 8 tape swaps
	per day) or figure out some way around this. At one time, I was
	writing to nrmt8 and doing forward and backward spaces to get to 
	dump images on the tape, but restoration from this kind of tape was
	almost impossible.

o	dump does not recognize end of tape. It takes the given density and
	length of tape and estimates the number of bytes it can fit. When it
	figures it is close, it asks for a new volume. This creates a few
	problems. I have to run dump giving it what will be the smallest size
	tape it will see, or it aborts. If it hits end of tape, there is no
	way for me to get it to checkpoint at the last completed file before
	going to the next volume. Dump's estimates also seem a little wishful,
	since I am currently having to tell dump that my 2400 foot tapes are
	1800 feet to keep it from running off the edge.

o	restorations on dump are painful. If I need to restore a directory,
	I almost have to restore the full system onto a scratch partition
	and move it. I don't know that I will be able to keep that much spare
	disk around. Restoring a file into a file called by its inode number is
	simply ridiculous.

o	Dump's documentation, to put it lightly, eats it. Under diagnostics,
	it defines them as 'many, and varied'. That doesn't tell me what to
	expect, and it doesn't tell my non-programmer operators what to do
	when something goes wrong. That may seem cute to whoever wrote it, but
	in a commercial installation it just gets people like myself irate.

o	Dump also returns with non-standard exit codes. When dump exits
	normally, it does so with an exit code 1, instead of 0. This is
	also not documented. I have noticed in my experiments exit codes of
	2, 3, 4, and 1. None of these are documented. If you are going to
	send a special exit code for some reason, PLEASE tell us what it is for.
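
The length-and-density estimate described in the second item above can be worked through as plain arithmetic. This is only a sketch and not dump's actual code: the function name, the 10240-byte record size, and the 0.6 inch inter-record gap are all assumptions chosen for illustration.

```shell
# tape_capacity_bytes FEET BPI RECORD_BYTES GAP_MILS
# Rough capacity of one reel: every record occupies RECORD_BYTES/BPI inches
# of tape plus one inter-record gap. Lengths are kept in thousandths of an
# inch (mils) so that integer arithmetic is enough.
tape_capacity_bytes() {
    feet=$1 bpi=$2 record=$3 gap=$4
    tape_mils=$(( feet * 12 * 1000 ))
    record_mils=$(( record * 1000 / bpi + gap ))
    # Whole records that fit on the reel, times the bytes in each record.
    echo $(( tape_mils / record_mils * record ))
}
```

Under these assumptions, `tape_capacity_bytes 2400 1600 10240 600` gives 42127360 and `tape_capacity_bytes 1800 1600 10240 600` gives 31590400, so declaring a 2400 foot tape as 1800 feet gives up about a quarter of the estimated space as a safety margin.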

Tar is a little better because it allows me to save and restore across filesystems
and by using filenames. It does not have incremental capabilities, and while
I could front end it fairly easily, the overhead of 200 to 2000 'tar r <file>'
calls on an 11/750 would be laughable. Also, since Tar doesn't support multiple
volumes, I have another end of tape problem.
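
The cost of one 'tar r' call per file could in principle be avoided by collecting the changed files first and handing the whole list to a single tar run. A minimal sketch, assuming a tar that accepts -T (a GNU tar and bsdtar extension, not something the tar discussed here is known to offer) and a hypothetical timestamp file that records when the previous backup was taken; the function name is likewise made up:

```shell
# incremental_tar STAMP ARCHIVE DIR
# Archive every regular file under DIR that was modified after STAMP,
# using one tar invocation instead of one "tar r" call per file.
incremental_tar() {
    stamp=$1 archive=$2 dir=$3
    list=$(mktemp) || return 1
    # Names of regular files newer than the timestamp file, one per line.
    find "$dir" -type f -newer "$stamp" -print > "$list"
    # -T makes tar read the member names from the list file.
    status=0
    tar -cf "$archive" -T "$list" || status=$?
    rm -f "$list"
    return "$status"
}
```

The caller would touch the timestamp file once a backup has succeeded. File names containing newlines would confuse the list, and this does nothing about the multi-volume problem.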


Anyone out there have this problem and solve it already? Anyone out there want
to justify some of these design 'features'? Maybe there is something I have
missed, but Unix is the first system I have worked on that seems to have put 
almost no time in on such basic system requirements as system backups and
print drivers (another flame altogether).

chuck (chuqui at mit-mc)

guy (04/09/83)

We have substantially modified the V7/S3 "restor" to solve *some* of the
problems mentioned:

o	dump does not allow multiple filesystems on a tape.

Our "restor" has a "F" (so sue me, I'm too tired to fight UNIX's one-character
option orientation) option, which says the file on the tape to restore from
is file number "n", so that you can say

	restor XfF /dev/rmt8 3 ...

to restore from the third file on the tape (I will explain the "X" later).

o	dump does not recognize end of tape.

This may involve beating the UNIX magtape driver up a bit (believe me, with
the exception of the 4.1BSD driver, it deserves it; how many of you have
had to hack the magtape driver(s) to give it special function commands to
skip blocks/files, or write an EOF, or rewind, or rewind/unload, or...?  To
quote Monty Python, "I know I have.").

o	restorations on dump are painful.

*FIXED*!  The "X" option to our "restor" restores a file under its own name;
furthermore, if you specify a directory to "restor" with the "X" option
it restores the directory by creating it (fork and exec of "mkdir") and
restoring all its contents recursively.  I.e., if you have a directory
/usr/foo/bar, if you say

	restor X foo/bar

on the "/usr" dump it will create a directory "foo/bar" in the current
directory (so cd to "/usr" if you want "/usr/foo/bar") and pull in all
its contents.  It will create all directories needed to restore a file
by pathname, just like "tar" (which isn't surprising, considering that's
where I ripped the code off from).

We also have a "T" option which does a "dumpdir"; since "dumpdir" is just
a mutant version of "restor" (check the code out!), and since all the
stuff added to "restor" to make "dumpdir" was needed for "X", we put in
a "T".  Also, there is a "c" option which works like "r" only it works
as a read/compare instead of a write; i.e., it should be run immediately
after a "dump" so you know that you can trust the dump you just made.

Also beware of using the S3 "restor" with the "r" flag; it does *not*
maintain the s_tfree and s_tinode fields in the superblock, so you will
have to "fsck" the file system after that if you want the "ustat" system
call, and all the programs that depend on it (like "df", the RJE software,
and "ed") to work.  4.1BSD also maintains the s_tfree and s_tinode fields
(although they don't provide the "ustat" call), and they fixed "restor" to
update it when restoring; our "restor" actually came from the 4.1BSD one
and therefore does this.  It also fixes a couple of bugs involving dump tapes
with files that happen to have a bit pattern that matches a dump tape header
block (this fix also came from 4.1BSD), and a number of other bugs that we
ran into along the way.

					Guy Harris
					RLG Corporation
					{seismo,mcnc,we13}!rlgvax!guy

kar (04/11/83)

This is in reply to a recent submission arguing that neither dump nor tar is
suitable as a means of backing up a system with lots of file systems.  The
objections below are quoted from the original article.

	"dump does not allow multiple filesystems on a tape."
We dump several filesystems to a single tape when we know that each of the
dumps is going to be short (i.e. they'll all fit).  Restoring was a problem
until we made the simple addition of the F option, which specifies which file
on a reel contains the dump.  This has been described in a previous submission.

	"dump does not recognize end of tape."
On our old V7 and 4.1bsd systems, our problem was the reverse: the last several
hundred feet of tape went unused, rather than the tape running out.  Dump is
very conservative about its estimated tape usage; if you are running off of the
end of the reel, your tape drive might be writing long interrecord gaps, or
something like that.

	I fixed this on V7, but have not cared to mess with it on our 4.1bsd
system.  The naive approach involves changing the tape driver, and having it
return an explicit error code when the end of tape marker is sensed. 
Curiously, the hooks for doing this are all there; there is a field in the
buffer header for an error code that, if not filled in with something specific
by the driver, gets loaded with a generic error number by (I think) the buffer
manager.

	Unfortunately, this does not work for "dump".  The error code, strictly
speaking, should be returned for the first write AFTER the marker goes by,
as the previous one was actually written properly.  Dump, however, needs to
know whether it will be possible to write another block just AFTER it finishes
writing the previous one, so it can decide whether to continue writing out
files or to write the updated bit maps.  One could have the end of tape error
code returned on the write that passes the marker, but then the error code
isn't really an error.  We put in an ioctl instead that returned the status
of the drive (which, by the way, must settle down first).  Then dump can
determine (without actually trying it) whether another attempt to write will
succeed.

	"restorations on dump are painful."
The use of "dumpdir" has been discussed in previous followups, and, with a
little good old-fashioned hacking, can be at least partially automated.

	"Dumps documentation, to put it lightly, easts it."
No comment.

	"Dump also returns with non-standard exit codes."
Yes, and it played hob with one of my programs (described below) until I
fixed it.  I used to know what the number meant; I think it is the number of
filesystems actually dumped, but I'm not sure.
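
A script that has to live with an unmodified dump could wrap the call and translate the exit status. The sketch below is an assumption-laden illustration: the name dump_ok is invented, and treating status 1 as success rests only on the observation quoted above from the original article. The meaning of the other codes is not known from this discussion, so they are simply passed through.

```shell
# dump_ok COMMAND [ARGS...]
# Run a dump command and treat exit status 0 or 1 as success, since the
# original article reports that dump exits with 1 on a normal run.
dump_ok() {
    status=0
    "$@" || status=$?
    case $status in
        0|1) return 0 ;;
        *)   echo "dump_ok: '$*' exited with undocumented status $status" >&2
             return "$status" ;;
    esac
}
```

A caller would write something like `dump_ok dump 0uf /dev/rmt8 /dev/usr || exit 1`, where the device and filesystem names are only examples.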

--------

	A while back, in an effort to automate our filesave procedures, I wrote
a program called "save", which reads a table describing when various types of
dumps are to be made on assorted file systems.  I also modified "dump" and
"restor" to write a label on each tape, including multi-reel sets.

	What we do now is to run "save".  It scans its file and figures out
(based on the day of the week) which filesystems are to be dumped and what
tape sets are to be used.  It prompts the operator to mount the proper tape
and waits for a response.  Dump then checks the label on the tape and gives
the bozo another chance if s/he mounted the wrong one.

	The table we use (redundancy edited out) is shown below:

set v1root-m 3 2
set v1usr-m 3 0 v1root-m
set v1acct-m 3 0 v1root-m
set v1acct-w 4 0
work 1 mon, reset v1acct-w, dump v1root-m root b, dump v1usr-m usr 0, dump v1acct-m acct 0
work mon, dump v1acct-w acct 1 root 0 usr 1

	The "set" lines identify tape sets that are used for dumps.  For example,
the first line says that there are three "v1root-m" tapes, which are
labelled "v1root-m1", "v1root-m2", and "v1root-m3".  The next time a dump is
taken on one of these, number two should be used.  The second line identifies
another set of tapes, whose numbering should match the numbering on "v1root-m".
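
To make the labelling rule concrete, here is a small sketch of how a "set" line could be turned into the label of the next tape. It is not the code of "save"; the function names are invented, and the wrap-around from the last tape back to number one is an assumption. Lines whose number is 0 (those that follow another set, or have been reset) are deliberately not handled.

```shell
# next_tape_label "set NAME COUNT NEXT"
# Print the label of the tape to mount next, e.g. v1root-m2.
next_tape_label() {
    # Split the line into words: keyword, set name, tape count, next number.
    set -- $1
    [ "$1" = "set" ] && [ "$#" -ge 4 ] || return 1
    name=$2 next=$4
    # A next number of 0 needs information this sketch does not have.
    [ "$next" -ge 1 ] 2>/dev/null || return 1
    echo "${name}${next}"
}

# advance_tape_number COUNT CURRENT
# Print the number that follows CURRENT, wrapping from COUNT back to 1.
advance_tape_number() {
    echo $(( $2 % $1 + 1 ))
}
```

For the first line of the table, `next_tape_label 'set v1root-m 3 2'` prints v1root-m2, and `advance_tape_number 3 2` prints 3.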

	The "work" lines identify work to be done on various days of the week.
The first one says that on the 1st Monday of the month, reset the counter for
tape set "v1acct-w", then dump "/dev/root" on the next "v1root-m" tape in
self-booting format, dump "/dev/usr" at level 0 on the next "v1usr-m" tape set,
etc.  The second "work" line says that on all other Mondays, the devices
"/dev/acct", "/dev/root", "/dev/usr" are all to be dumped on the next tape in
the "v1acct-w" set, at level 1, 0, and 1 respectively.
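
The choice between the two "work" lines comes down to whether a given Monday is the first one of the month, which is the case exactly when the day of the month is 7 or less. A sketch of that test follows; the function name and the three result strings are invented for illustration and are not taken from "save".

```shell
# work_kind DAY_OF_MONTH WEEKDAY
# Decide which "work" line of the example table applies on a given day.
# WEEKDAY is a lowercase three-letter name such as mon.
work_kind() {
    dom=$1 dow=$2
    if [ "$dow" != "mon" ]; then
        echo none              # the example table only schedules Mondays
    elif [ "$dom" -le 7 ]; then
        echo first-monday      # matches "work 1 mon"
    else
        echo other-monday      # matches "work mon"
    fi
}
```

The day of the month could come from `date +%d`; the weekday name would have to be produced in the same lowercase form.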

	By now, you get the idea.  I wrote this a long time ago, and have not
done anything much to it since, so I don't want any flames about my code;
nevertheless, I am willing to post the source if there is any interest.

donn (04/14/83)

References: ittvax.670 uiucdcs.1842 ritcv.273 rlgvax.202

Although this has been quite an active discussion there are a couple of
points that no one has brought up.

Maintaining Multiple Filesystem Backups on One Tape:

We do this with our 4.1 BSD software on our VAX 11/750 on a daily
basis, without the need for "restor F".  We use different strategies
for short and long dumps.  Short dumps are those that don't use
multiple volumes; in particular our day and week dumps are usually so
short that all the dumps for all the filesystems fit on one 2400 foot
tape.  To make these we just have a shell script that dumps each
filesystem onto the no-rewind hi-density tape device.  Multi-volume
dumps get their own tapes.  We have had problems in the past where an
entire box of tapes was unacceptable for dump using 2400 foot
estimates; we just told dump to try shorter estimates for a week and
went to a different brand of tape thereafter.  (Why settle for less?)
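
A script of the kind described for short dumps might look roughly like this. It is a guess at the general shape, not the script actually in use: the function name and the DUMP override (there only so the loop can be tried without a tape drive) are assumptions, as is the old-style "uf" key.

```shell
# dump_all LEVEL TAPE FILESYSTEM...
# Dump each filesystem in turn onto the same no-rewind tape device, so the
# dumps end up as consecutive files on one reel.
dump_all() {
    level=$1 tape=$2
    shift 2
    for fs in "$@"; do
        # Key: level digit, u = record the dump date, f = tape device follows.
        status=0
        ${DUMP:-dump} "${level}uf" "$tape" "$fs" || status=$?
        # Report rather than interpret the status; earlier articles note
        # that dump's exit codes are undocumented.
        echo "dump of $fs exited with status $status" >&2
    done
}
```

Setting `DUMP="echo dump"` before calling `dump_all 1 /dev/nrmt8 /dev/usr /dev/acct` prints the commands instead of running them; the device and filesystem names here are borrowed from other articles in this discussion and are only examples.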

Restores from multiple-file tapes are done with job control and the
"mt" program.  When restoring individual files you just:  use "mt fsf
nnn" to move forward nnn files to the file that contains the dump you
want to look at; start "restor x" on your files; stop "restor" with ^Z
when it asks you to mount the tape volume; use "mt" to reposition the
tape after it rewinds; then continue "restor" and it will find your
file(s).  You can be clever and have "restor" use the no-rewind tape
device, in which case it won't rewind the tape:  use "mt bsf 1" to move
back to the previous file on the tape and "mt fsf 1" to return to the
beginning of the desired file.
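
The two positioning steps can be written down as small helpers. This is a sketch under assumptions: the function names are invented, the "-f" flag for naming the device follows later versions of "mt" and may differ on a given system, the initial rewind is added so the skip count is always relative to the start of the reel, and the MT override exists only for dry runs.

```shell
# skip_to_dump TAPE N
# Rewind, then move forward N files to reach the wanted dump.
skip_to_dump() {
    ${MT:-mt} -f "$1" rewind &&
    ${MT:-mt} -f "$1" fsf "$2"
}

# reposition_dump TAPE
# With the no-rewind device: back up one file and come forward one file,
# which is the sequence described above for returning to the start of
# the dump.
reposition_dump() {
    ${MT:-mt} -f "$1" bsf 1 &&
    ${MT:-mt} -f "$1" fsf 1
}
```

One would call `skip_to_dump` before starting "restor x", and `reposition_dump` while "restor" is stopped at its volume prompt.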

Problems With Restoring Directories and Handling Tape Errors:

Kirk McKusick at Berkeley has changed "dump" and "restor" for 4.2 BSD
so that they act much more like "tar".  Dumping and restoring are both
faster; not by an order of magnitude but "about as fast as tar", due to
the improved filesystem layout.  You restore to a MOUNTED filesystem.
You can give directories as arguments and they will be recursively
extracted.  Files are always extracted under their real names, not some
inode number.  (In fact the inode number is no longer relevant, since
files are created using the normal file creation mechanisms even on
full filesystem restores.  This means that you can dump and restore a
filesystem to a SMALLER partition.) All your old tapes will be
compatible with 4.2 BSD even though the filesystem layout is completely
different.  If you get a tape error while restoring, the program takes
advantage of the (relatively) smart tape driver and the "redundant
information" which the manual entry talks about, and attempts to help
you.  And, best of all, the program is now called "restore"!  The only
disadvantage is that you will never get it to run without virtual
memory (or lots of memory period), so it is not portable to smaller
(dinky!) machines.  Latest projections for the first 4.2 tapes:
beginning of May(be).

Donn Seeley  UCSD Chemistry Dept. RRCF  ucbvax!sdcsvax!sdchema!donn
             (619) 452-4016             sdamos!donn@nprdc

kar (04/20/83)

Regarding the discussion of how to restore from tapes that have multiple dumps
on them:

	Many people (ourselves included) have implemented an F option in 
"restor" to automatically space the tape to the proper dump when several have
been recorded on the reel.  Others retort that this is not necessary; all you
need to do is "mt fsf N", do the restor, ^Z, "mt fsf N", %1, then enter the
reel number, and you're all set! (How intuitively obvious!)

	Here's what we used to do on our V7 systems (no ^Z, thank you!) before
we got 4.1bsd:

dd if=/dev/nrmt0 of=/dev/null bs=20b files=N	# skip N files on the tape
(sleep 60; dd if=/dev/nrmt0 of=/dev/null bs=20b files=N) &
restor xf /dev/rmt0 file ...

	The "restor" command would read the bit maps and directories from the
tape and rewind it.  It would then print the i-numbers and names of the files
it found on the tape, and ask the operator to "mount the desired volume".  The
operator simply waited until the "sleep" interval elapsed, at which point the
second "dd" positioned the tape back to the right spot.  Hitting "return"
completed the successful restore!

	Not afraid to admit to being a hacker:

	Ken Reek, Rochester Institute of Technology
	ucbvax!allegra!rochester!ritcv!kar