[comp.sys.sun] single user while dumping

jorgnsn@qucis.queensu.ca (11/23/88)

We just got an Exabyte 8 mm tape drive to do unattended dumps during the
night.  Since we have a couple of night-owl users who are active at all
hours, we were also wondering whether dumps run on an active filesystem
can be trusted.  So I tried a few experiments.

I started dump, then suspended it at different points and created or
deleted files, then continued the dump and tried a full restore with the
resulting tape.  The upshot is that it is possible, though unlikely, that
activity on the filesystem could invalidate dumped data other than the
active files.  Every time I deleted a directory during one of dump's first
three passes, dump would continue without complaint, but restore would
give me something like:

    <filename>: not found on tape
    .
    .        (similar messages)
    .
    expected next file 14340, got 14337
    .
    .        (similar messages)
    .
    cannot find directory inode 4
    abort? [yn] 

I would say `n', and then restore would dump core.  Many files, with no
obvious connection to the deleted directory, would be missing from the
restored filesystem.  These files were also missing from the listing you
get with restore t or restore i, so you would be able to tell you had a
bad dump if you checked the listing.
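
The listing check described above could be automated along these lines.
This is only a sketch: it assumes the output of restore t has been saved to
a file, and that each entry in it looks like an inode number followed by a
path name (lines of any other shape are ignored).  The function names are
made up for illustration.

```python
def paths_in_listing(listing_text):
    """Collect the path names from a saved `restore t` listing."""
    paths = set()
    for line in listing_text.splitlines():
        fields = line.strip().split(None, 1)
        # Assumed entry shape: "<inode number> <path>"; skip anything else,
        # such as header lines.
        if len(fields) == 2 and fields[0].isdigit():
            paths.add(fields[1])
    return paths


def missing_from_dump(expected_paths, listing_text):
    """Return the expected paths that the listing does not mention, sorted."""
    return sorted(set(expected_paths) - paths_in_listing(listing_text))
```

Feeding it a list of files known to exist before the dump (written in the
same ./relative form the listing uses) gives back the ones the tape does not
have; an empty result is what you would hope to see.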

I thought that actually suspending dump to delete the directory might be
too severe a test, since from dump's point of view the directory is there
one split second, and completely gone the next.  So I also tried to run
dump on one terminal as root while I deleted directories on another as an
ordinary user.  The timing has to be just so (I tried to hit the return
key on my ``rm -r'' command just after the "DUMP:  mapping (Pass II)
[directories]" message), and the deleted directory has to be from the
right spot on the disk, but one time out of three, I did manage to spoil a
dump.

So now we're planning to unmount the filesystem on its server before
dumping it.  It looks like if you unmount the filesystem on the server
while it is still mounted on clients, users on the clients just get
messages about stale NFS handles when they try to look at the missing
filesystem (if anyone knows of any more drastic results of leaving a
filesystem mounted on clients when it is not mounted on the server, please
let me know).
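
A rough sketch of the unmount, dump, remount sequence, written so that the
filesystem is remounted even when the dump itself fails.  The function name
and the `run` parameter are invented for this example, the dump command is
supplied by the caller rather than assumed here, and `mount <mount point>`
relies on the filesystem being listed in fstab.

```python
import subprocess


def dump_unmounted(mount_point, dump_cmd, run=subprocess.check_call):
    """Unmount mount_point, run dump_cmd, then always remount.

    dump_cmd is the caller's own argument list for dump.
    run is injectable so the sequence can be tried without touching a disk.
    """
    # If the unmount fails (e.g. the filesystem is busy), stop here:
    # nothing was unmounted, so there is nothing to dump or remount.
    run(["umount", mount_point])
    try:
        run(dump_cmd)
    finally:
        # Remount whether or not the dump succeeded.
        run(["mount", mount_point])
```

A dry run that only prints the commands might look like
`dump_unmounted("/home", ["dump", "0uf", "/dev/nrst0", "/dev/rsd0g"], run=print)`,
where the mount point and both device names are placeholders.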

John Jorgensen
jorgnsn@qucis.queensu.ca
jorgnsn@qucis.bitnet

knutson%sw.MCC.COM@mcc.com (Jim Knutson) (11/29/88)

Don't judge the amount of time it takes to do a level 0 in single-user
mode by the time it takes in multi-user mode.  Multi-user mode dumps can
often take 4 to 8 times as long to finish depending on how busy the
machine is.  Servers are often busy, resulting in disk contention between
the dump program and file service daemons.

Also, depending on your situation, you might want to consider doing a
level 0 once a month in single-user mode and the weekly dumps as level 1
in multi-user mode.  This should give you enough coverage to recover from
catastrophic disk failure with a clean level 0, as well as to retrieve
files from the multi-user dumps.
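
One way to pin that schedule down, as a sketch only: the choice of Sunday as
the dump day and of the first Sunday of the month for the level 0 are
assumptions made for the example, not part of the suggestion above.

```python
import datetime


def dump_plan(day, dump_weekday=6):
    """Return (level, mode) for the given date, or None if no dump is due.

    dump_weekday uses Python's numbering (Monday=0 ... Sunday=6).
    """
    if day.weekday() != dump_weekday:
        return None
    # The first occurrence of a weekday in any month falls on day 1-7.
    if day.day <= 7:
        return (0, "single-user")
    return (1, "multi-user")
```

For example, `dump_plan(datetime.date(1988, 12, 4))` calls for the monthly
level 0 in single-user mode, and the following Sunday for a level 1 in
multi-user mode.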

Jim Knutson
knutson@mcc.com
cs.utexas.edu!milano!knutson