jorgnsn@qucis.queensu.ca (11/23/88)
We just got an Exabyte 8 mm tape drive to do unattended dumps during the night. Since we have a couple of night-owl users who are active at all hours, we were also wondering whether dumps run on an active filesystem can be trusted.

So I tried a few experiments. I started dump, then suspended it at different points and created or deleted files, then continued the dump and tried a full restore with the resulting tape. The upshot is that it is possible, though unlikely, that activity on the filesystem could invalidate dumped data other than the active files.

Every time I deleted a directory during one of dump's first three passes, dump would continue without complaint, but restore would give me something like:

    <filename>: not found on tape
        .
        .  (similar messages)
        .
    expected next file 14340, got 14337
        .
        .  (similar messages)
        .
    cannot find directory inode 4
    abort? [yn]

I would say `n', and then restore would dump core. Many files, with no obvious connection to the deleted directory, would be missing from the restored file system. These files were also missing from the listing you get with restore t or restore i, so you would be able to tell you had a bad dump if you checked the listing.

I thought that actually suspending dump to delete the directory might be too severe a test, since from dump's point of view the directory is there one split second, and completely gone the next. So I also tried to run dump on one terminal as root while I deleted directories on another as an ordinary user. The timing has to be just so (I tried to hit the return key on my ``rm -r'' command just after the "DUMP: mapping (Pass II) [directories]" message), and the deleted directory has to be from the right spot on the disk, but one time out of three, I did manage to spoil a dump.

So now we're planning to unmount the filesystem on its server before dumping it.
It looks like if you unmount the filesystem on the server while it is still mounted on clients, users on the clients just get messages about stale NFS handles when they try to look at the missing filesystem (if anyone knows of any more drastic results of leaving a filesystem mounted on clients when it is not mounted on the server, please let me know).

John Jorgensen
jorgnsn@qucis.queensu.ca
jorgnsn@qucis.bitnet
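The listing check mentioned above (comparing what restore t reports against what should be on the tape) could be automated along these lines. This is only a rough sketch: it assumes each entry in the listing is a line of the form "<inode> <path>", and the function names, the sample listing, and the expected paths are all made up for illustration. It has not been run against real restore output.

```python
import re

def parse_restore_listing(text):
    """Collect the paths named in a `restore t` style listing.

    Assumption: each entry line looks like "<inode> <path>".
    Header lines and error messages do not start with a number,
    so they are skipped.
    """
    paths = set()
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S.*)$", line)
        if m:
            paths.add(m.group(2).rstrip())
    return paths

def missing_from_tape(expected_paths, listing_text):
    """Return the expected paths that the listing does not mention."""
    on_tape = parse_restore_listing(listing_text)
    return sorted(set(expected_paths) - on_tape)

if __name__ == "__main__":
    # Hypothetical listing text, e.g. captured from `restore t`.
    listing = (
        "         2\t.\n"
        "         3\t./lost+found\n"
        "     14337\t./src/main.c\n"
    )
    # Hypothetical list of paths recorded before the dump started.
    expected = [".", "./lost+found", "./src/main.c", "./docs/notes"]
    for path in missing_from_tape(expected, listing):
        print("not on tape:", path)
```

On an active filesystem, files created or removed during the dump will also show up as differences, so a non-empty result is a reason to look closer rather than proof of a bad tape.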
knutson%sw.MCC.COM@mcc.com (Jim Knutson) (11/29/88)
Don't judge the amount of time it takes to do a level 0 in single-user mode by the time it takes in multi-user mode. Multi-user mode dumps can often take 4 to 8 times as long to finish depending on how busy the machine is. Servers are often busy resulting in disk contention between the dump program and file service daemons.

Also, depending on your situation, you might want to consider doing a level 0 once a month in single-user mode and the weekly dumps as level 1 in multi-user mode. This should give you enough coverage to recover from catastrophic disk failure with a clean level 0 as well as file retrieval from the multi-user dumps.

Jim Knutson
knutson@mcc.com
cs.utexas.edu!milano!knutson
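To make the coverage argument concrete: with a monthly level 0 and weekly level 1 dumps, a full recovery needs the most recent level 0 plus the most recent level 1 taken after it, since each level 1 holds everything changed since that level 0. A small sketch of that selection follows; the function name and the dates are hypothetical, and it only handles levels 0 and 1.

```python
from datetime import date

def tapes_needed(history):
    """Pick the dumps to restore, in order, for a full recovery.

    history: list of (date, level) tuples with level 0 or 1.
    Returns the latest level 0, followed by the latest level 1
    taken after it (if there is one).
    """
    level0 = [d for d in history if d[1] == 0]
    if not level0:
        return []  # no full dump, nothing to build on
    base = max(level0)  # tuples compare by date first
    later = [d for d in history if d[1] == 1 and d[0] > base[0]]
    return [base] + ([max(later)] if later else [])

if __name__ == "__main__":
    # Hypothetical schedule: one level 0, then three weekly level 1s.
    history = [
        (date(1988, 11, 5), 0),
        (date(1988, 11, 12), 1),
        (date(1988, 11, 19), 1),
        (date(1988, 11, 26), 1),
    ]
    for when, level in tapes_needed(history):
        print("restore level", level, "dump from", when.isoformat())
```

The earlier level 1 tapes are not needed for a full recovery, but they are still useful for retrieving an older version of a single file.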