[comp.sys.sun] A dump/restore HAZARD involving consecutive level 0 dumps

whm@uunet.uu.net (Bill Mitchell) (03/23/91)

This note describes a problem involving consecutive level 0 dumps that we
discovered the hard way.  Sun has been notified of this problem and has
assigned a bug number, 1052964.  We have requested an official workaround,
but haven't received one as yet.  We ran into this problem under 4.0.3 and
I assume it still exists in 4.1.1.

We use a fairly simple backup scheme here: level 0 dumps every month or so
and level 1 dumps daily.  We maintain off-site archives and to avoid
making trips to our storage facility to retrieve the most recent level 0
dumps, we do two sets of level 0 dumps, one right after the other, with
the system in single-user mode.  One set is sent off-site and one is kept
on-site.  A set of level 1 dumps is sent off-site on a weekly basis.

We used this scheme for a couple of years without any trouble, but one day
we ran into a problem.  For the sake of discussion, let's say that we did
level 0 dumps on 2/25 with the first set at 6:00pm and the second
(identical) set at 7:30pm.  It happened that the 6:00pm set was on-site
and we did a full restore from that set.  We then tried to restore a level
1 dump on top of the just-restored filesystem, but restore squawked with
"incremental tape too high".  The problem was that when the level 1 dump
was made, it was noted on the tape that it was based on a level 0 dump
done at 7:30pm on 2/25, but the restoresymtable built by the level 0
restoration said the level 0 dump was done at 6:00pm on 2/25.  restore
correctly detected an apparent mismatch and wouldn't let us proceed.  

restore is clearly doing the right thing, but there should also be some
way to tell restore to "trust me and use the tape".  In our particular
case we recovered by hacking the date recorded in restoresymtable so that
it matched the date on the level 1 tape.  (FYI: my notes indicate that the
time and date were the third and second words, respectively, from the end
of restoresymtable.)
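
For anyone wanting to look at those trailing words before touching them,
here is a minimal sketch in Python.  It assumes 4-byte big-endian words,
which is a guess about the Sun on-disk layout and is not confirmed
anywhere in this thread; the function names are made up for illustration.
It works on a copy of the file contents held in memory, so nothing is
written back unless you do that yourself.

```python
import struct

WORD_SIZE = 4      # assumption: 32-bit words
WORD_FORMAT = ">i"  # assumption: big-endian signed integers


def word_from_end(data: bytes, n: int) -> int:
    """Return the n-th word counting back from the end (n=1 is the last)."""
    start = len(data) - n * WORD_SIZE
    if n < 1 or start < 0:
        raise ValueError("data too short for the requested word")
    return struct.unpack_from(WORD_FORMAT, data, start)[0]


def patch_word_from_end(data: bytes, n: int, value: int) -> bytes:
    """Return a copy of data with the n-th word from the end set to value."""
    start = len(data) - n * WORD_SIZE
    if n < 1 or start < 0:
        raise ValueError("data too short for the requested word")
    patched = bytearray(data)
    struct.pack_into(WORD_FORMAT, patched, start, value)
    return bytes(patched)
```

Print word_from_end(data, 3) and word_from_end(data, 2) first and check
that they look like plausible timestamps before patching either one, and
keep a backup of the untouched restoresymtable.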

In the course of typing this note it occurred to me that it might be
possible to avoid this problem by not specifying dump's "u" flag on the
second dump, but I haven't tried this to see if it works.

If the problem occurs you can always do a restore of all the files on the
level 1, either plopping them in on top of the level 0 files or putting
them in a separate hierarchy and manually merging them, but either one is
pretty ugly.

If you're careful to send the first set off-site and keep the second set
on-site, you won't run into this problem unless you have to fall back to
the off-site archives.

If you'd like to see Sun address this issue, I encourage you to give them
a call.

Bill Mitchell				whm@sunquest.com
Sunquest Information Systems		sunquest!whm@arizona.edu
930 N. Finance Center Dr.               {arizona,uunet}!sunquest!whm
Tucson, AZ, 85710                       sunquest!whm@uunet.uu.net
602-885-7700

iand@krite.labtam.oz.au (Ian Donaldson) (03/26/91)

sunquest!whm@uunet.uu.net (Bill Mitchell) writes:
>In the course of typing this note it occurred to me that it might be
>possible to avoid this problem by not specifying dump's "u" flag on the
>second dump, but I haven't tried this to see if it works.

A reasonable solution to this is to allow a command line argument to
specify an alternate /etc/dumpdates file.  This way you can have multiple
consistent dumps done simultaneously.

Unfortunately, dump up to 4.3BSD-Reno doesn't have such an option (it
would be nice!).

A workaround is to maintain multiple /etc/dumpdates files yourself, using
a wrapper script for the alternate dump sequences: it mv's /etc/dumpdates
to a safe place, moves the alternate one in for the alternate dump, moves
it out again afterwards, and then restores the original.
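
The move-in, move-out sequence described above could be sketched roughly
as follows in Python.  This is only an illustration: the function name,
the ".saved" suffix, and the run_dump callable (which stands in for
whatever actually invokes dump) are all hypothetical, and the paths are
passed in rather than hard-coded.

```python
import os
import shutil


def run_with_alternate_dumpdates(dumpdates, alternate, run_dump):
    """Run run_dump() with `alternate` temporarily installed as `dumpdates`."""
    saved = dumpdates + ".saved"  # hypothetical safe place for the original
    if not os.path.exists(alternate):
        # First run of the alternate sequence: start with an empty history.
        open(alternate, "w").close()
    shutil.move(dumpdates, saved)
    try:
        shutil.copy2(alternate, dumpdates)
        run_dump()
        # Keep whatever the alternate dump recorded for next time.
        shutil.copy2(dumpdates, alternate)
    finally:
        # Always put the original file back, even if the dump failed.
        shutil.move(saved, dumpdates)
```

The try/finally is there so that a failed dump cannot leave the alternate
history sitting in the place of the real one.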

Ian D

whm@uunet.uu.net (Bill Mitchell) (03/27/91)

In article <2100@brchh104.bnr.ca>, I described a problem that prevents restore
from properly using an incremental tape that one would expect to be valid.
In that article I wrote:

> In the course of typing this note it occurred to me that it might be
> possible to avoid this problem by not specifying dump's "u" flag on the
> second dump, but I haven't tried this to see if it works.

Today I tried this approach and it *did not* work.

Also, in the original article I said that using the first of two
consecutive level 0 dumps with an incremental dump yielded the message
"incremental tape too high".  I got that wrong; the message is
"Incremental volume too low."  (If you do two consecutive level 0 dumps,
omit "u" on the second dump, restore the second dump, and then try to put
on an incremental, the message is "Incremental volume too high".)

An old version of the source for restore seems to indicate that restore
expects the date match to be exact.  That is, if a level 0 dump is done at
10:12 on 3/1/91, the level 1 dump must contain information that indicates
that it was based on a level 0 dump done at 10:12 on 3/1/91.
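
To make the exact-match behavior concrete, here is a small Python model
of the check as I understand it from the two cases reported above.  It is
not taken from the restore source; the function name is made up, and the
direction of the two messages is inferred purely from those two cases.

```python
from datetime import datetime


def check_incremental(restored_dump_date: datetime,
                      tape_base_date: datetime) -> str:
    """Model of the date check: the dates must match exactly.

    restored_dump_date: date of the dump already restored (as recorded in
                        restoresymtable).
    tape_base_date:     date of the dump the incremental says it is
                        based on.
    """
    if restored_dump_date == tape_base_date:
        return "ok"
    if restored_dump_date < tape_base_date:
        return "Incremental volume too low"
    return "Incremental volume too high"


# The two level 0 dumps from the example in the first article.
first_level0 = datetime(1991, 2, 25, 18, 0, 0)
second_level0 = datetime(1991, 2, 25, 19, 30, 0)
```

With these values, restoring the 6:00pm set and then applying a level 1
based on the 7:30pm set gives "too low", and the reverse combination
gives "too high".  A difference of a single second is enough to fail.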

I presume one could work around this with a script that sets the system
date appropriately based on the contents of /etc/dumpdates and then starts
the second dump, but the date needs to be accurate to the second.  I don't
think it would be a snap to do this reliably.
