jerry@olivey.olivetti.com (Jerry Aguirre) (01/26/91)
Those familiar with using dump and restore will have noticed the difference
in speed between them.  The dump procedure, especially with the current
multi-buffered version, usually sings along at close to full tape speed.
Restore, on the other hand, is a real dog, taking up to 10 times as long
for the same amount of data.

Has anyone done any evaluation of why there is such an extreme difference
in speed?  Granted that creating files involves more overhead than dumping
them, restore still seems very slow.  As restore operates on the mounted
file system, it has the advantage of accessing a buffered file system with
write-behind.

My particular theory is that the disk buffering algorithms are precisely
wrong for restore.  By this I mean they keep in the buffers the data that
will never be needed again and flush the data that will.  I plan to do
some experimentation and would appreciate hearing any ideas you might
offer.

				Jerry Aguirre
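As a purely illustrative aside (it models no real kernel's buffer cache),
Jerry's theory can be demonstrated with a toy simulation: an LRU cache
shared between a small set of "metadata" blocks that are reused for every
file and a stream of file-data blocks that are each written once and never
touched again, which is roughly restore's access pattern.  All the sizes
in the C sketch below are arbitrary assumptions.

    /* Toy LRU buffer-cache simulation -- illustrative only, not a model
     * of any particular kernel.  All sizes are arbitrary assumptions.  */
    #include <stdio.h>

    #define CACHE_SLOTS   64          /* blocks the "buffer cache" holds      */
    #define META_BLOCKS   16          /* metadata blocks reused for each file */
    #define FILES         500
    #define DATA_PER_FILE 128         /* one-shot data blocks per file        */

    static long slot_id[CACHE_SLOTS]; /* which block occupies each slot       */
    static long slot_use[CACHE_SLOTS];/* last-use clock, for LRU eviction     */
    static long clock_now;

    /* Reference a block; return 1 on a cache hit, 0 on a miss. */
    static int touch(long id)
    {
        int i, victim = 0;

        clock_now++;
        for (i = 0; i < CACHE_SLOTS; i++)
            if (slot_id[i] == id) {
                slot_use[i] = clock_now;
                return 1;
            }
        for (i = 1; i < CACHE_SLOTS; i++)       /* miss: evict the LRU slot */
            if (slot_use[i] < slot_use[victim])
                victim = i;
        slot_id[victim] = id;
        slot_use[victim] = clock_now;
        return 0;
    }

    int main(void)
    {
        long f, d, m, meta_hits = 0, meta_refs = 0, next_data = 1000000;
        int i;

        for (i = 0; i < CACHE_SLOTS; i++)
            slot_id[i] = -1;

        for (f = 0; f < FILES; f++) {
            for (m = 0; m < META_BLOCKS; m++) { /* reused metadata blocks */
                meta_hits += touch(m);
                meta_refs++;
            }
            for (d = 0; d < DATA_PER_FILE; d++) /* write-once file data   */
                touch(next_data++);
        }
        printf("metadata hit rate: %.1f%%\n", 100.0 * meta_hits / meta_refs);
        return 0;
    }

With the numbers above, the reusable blocks are flushed by the write-once
data and their hit rate collapses toward zero; shrink DATA_PER_FILE below
CACHE_SLOTS and it climbs back toward 100%.  That is the "keeps the data
that will never be needed again and flushes the data that will" behaviour
Jerry suspects.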
das@eplunix.UUCP (David Steffens) (01/30/91)
All right, my $0.02 on this issue.

Who cares how slow restore is?  How often do you have to do a full restore
of a filesystem or a whole disk?  Once or twice a year?  If it's more often
than that, then you have a REAL problem and maybe you ought to spend your
time and energy fixing THAT!

And none of your fancy programming tricks for me, thank you.  I'd much
rather have a SLOW restore that was guaranteed to WORK than one that was
FAST and had unknown bugs because of some magic algorithm that wasn't
tested under all possible conditions.  My users and I would rather wait
longer for a reliable restoration of our files than have incomplete or
inaccurate results in a hurry.

Reminds me of the 4.2bsd restore which claimed to have a checkpoint option
that supposedly allowed the restore to be stopped and restarted.  Never
could get it to work correctly for us.  Wasted an awful lot of time on
that one.

I also remember the Ultrix 1.1 dump which DEC tried to "improve".
Unfortunately, one of their "optimizations" had a small, undiscovered
side-effect -- the highest-numbered inode on the filesystem was never
written on the dump tape.  Produced no end of fun during the restore if
said inode happened to be a directory.

I don't wish to repeat these experiences!  Repeat after me, the three most
important performance characteristics of dump and restore are:
RELIABILITY, RELIABILITY and RELIABILITY.
--
David Allan Steffens       | I believe in learning from past mistakes...
Eaton-Peabody Laboratory   | ...but does a good education require so many?
Mass. Eye & Ear Infirmary, 243 Charles Street, Boston, MA 02114
{harvard,mit-eddie,think}!eplunix!das      (617) 573-3748 (1400-1900h EST)
jfh@rpp386.cactus.org (John F Haugh II) (01/30/91)
In article <1013@eplunix.UUCP> das@eplunix.UUCP (David Steffens) writes:
>Who cares how slow restore is?  How often do you have to do a full
>restore of a filesystem or a whole disk?  Once or twice a year?
>If it's more often than that, then you have a REAL problem
>and maybe you ought to spend your time and energy fixing THAT!

There are quite a few reasons in an EDP environment for restoring files.
For example, before a large irreversible process, it is common to dump the
entire database partition so it can be restored if the process is found to
have completed incorrectly.  This is very common for such operations as
payroll, monthly account closing, quarterly stuff, etc.

The answer to questions like "How often do you do X" often comes down to
"Often enough that we can't stand it any longer."
--
John F. Haugh II                        UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                 Domain: jfh@rpp386.cactus.org
"13 of 17 valedictorians in Boston High Schools last spring were immigrants
 or children of immigrants" -- US News and World Report, May 15, 1990
jc@skyking.UUCP (J.C. Webber III) (01/30/91)
In <2880@redstar.cs.qmw.ac.uk> liam@cs.qmw.ac.uk (William Roberts;) writes:
>Restore suffers from the fact that files are stored in inode-number order:
>this is not the ideal order for creating files as it thrashes the namei-cache
>because the files are recreated randomly all over the place.  We reorganised
>our machine once and used dump/restore to move our /usr/spool and /usr/mail
>partitions around: /usr/spool contains lots of tiny files called things like
>/usr/spool/news/comp/unix/internals/5342 and this took an incredibly long
>time to restore.  /usr/mail contains several hundred files but no
>subdirectories and restored in about the same sort of time as it took to dump.

I have the infamous lost inode problem on my system (after installing
Bnews), so periodically I need to recover my /usr/spool partition.  What I
have been doing is using "find . -print | cpio -pudmv /new.slice" (run from
/usr/spool) to move the files to a different partition while I clean up the
/usr/spool slice.  I do an rm -r * on /usr/spool, umount it, fsck it,
remount it and then cpio all the files back from the backup partition.

My purpose in doing all this rather than just a simple fsck is an attempt
to recover *all* stray inodes sprinkled throughout the /usr/spool slice.
My assumption is that cpio'ing these files back will cause them to have
their inodes reassigned and laid out sequentially on the disk.  Is this a
correct assumption?  It seems to last longer (between "out of inode"
messages) than simply fsck'ing the partition.

BTW, does anyone know of a reasonable fix for this problem?  I don't have
kernel sources, but I can do some kernel parameter tweaking and rebuilding
using the sysconf utilities on my system.  I have a CounterPoint System19k
68020 SystemV.3.0.  I've read some postings about how this bug manifests
itself, but haven't seen anything about how to fix it except to contact
the vendor.  Well, that won't work; CounterPoint is out of business.

thx...jc
--
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\  J.C. Webber III             jc@skyking.UUCP                                /
/  R&D Lab Manager             (...uunet!mips!skyking!jc)                     \
\  Mips Computer Systems       {ames,decwrl,pyramid,prls}!mips!skyking!jc     /
/  (408)524-8260                                                              \
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
das@eplunix.UUCP (David Steffens) (01/31/91)
In article <19012@rpp386.cactus.org>, jfh@rpp386.cactus.org (John F Haugh II) says:
> In article <1013@eplunix.UUCP> das@eplunix.UUCP (David Steffens) writes:
>> Who cares how slow restore is?  How often do you have to do a full
>> restore of a filesystem or a whole disk?  Once or twice a year?
> There are quite a few reasons in an EDP environment for restoring
> files.  For example, before a large irreversible process, it is
> common to dump the entire database partition so it can be restored
> if the process is found to have completed incorrectly.

OK, so it would seem that the remainder of my posting (which you didn't
quote) is even more relevant in your case.  Wouldn't you:

| ...much rather have a SLOW restore that was guaranteed to WORK
| than one that was FAST and had unknown bugs because of some
| magic algorithm that wasn't tested under all possible conditions?

And therefore, don't you agree that:

| the three most important performance characteristics
| of dump and restore are: RELIABILITY, RELIABILITY and RELIABILITY?
--
David Allan Steffens       | I believe in learning from past mistakes...
Eaton-Peabody Laboratory   | ...but does a good education require so many?
Mass. Eye & Ear Infirmary, 243 Charles Street, Boston, MA 02114
{harvard,mit-eddie,think}!eplunix!das      (617) 573-3748 (1400-1900h EST)
martin@mwtech.UUCP (Martin Weitzel) (01/31/91)
In article <19012@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
:In article <1013@eplunix.UUCP> das@eplunix.UUCP (David Steffens) writes:
:>Who cares how slow restore is?  How often do you have to do a full
:>restore of a filesystem or a whole disk?  Once or twice a year?
:>If it's more often than that, then you have a REAL problem
:>and maybe you ought to spend your time and energy fixing THAT!
:
:There are quite a few reasons in an EDP environment for restoring
:files.  For example, before a large irreversible process, it is
:common to dump the entire database partition so it can be restored
                                                    ^^^
:if the process is found to have completed incorrectly.  This is
:very common for such operations as payroll, monthly account closing,
:quarterly stuff, etc.

But you are talking about a dump here that can be restored.  CAN be.
Not MUST be.  Restoring the whole thing only becomes necessary IF the
process has completed incorrectly.  If that were the normal case rather
than the exception, you would be right, but then I dare say the software
is seriously flawed if some operation frequently completes incorrectly.

One case where it is normal for some operation to fail frequently is
during program development and testing.  But then there should be enough
space to keep several sets of test data on disk.  Otherwise the
development system is badly chosen and you would be better off buying a
larger disk.  Programmers are expensive; their costs accumulate over time.
Buying some additional hardware costs only once!

(Reminds me of the time when I had to insert a special floppy if I wanted
to copy a file, since on that machine, an IBM 5110 single-user BASIC "PC",
you couldn't copy files with the built-in software.  It was rather
counterproductive, since the copy program used the same memory as the
BASIC source, and if you forgot to save the source, the work of the last
hours could be gone :-().

:The answer to questions like "How often do you do X" often comes
:down to "Often enough that we can't stand it any longer."

Sure, but in any case the question should be allowed: "Why do you have to
do it so often?"  What would you say if someone complained that formatting
and verifying a 380 MB disk takes him half a day and that's simply too
much time every day?  Wouldn't you ask him WHY he is formatting the disk
every day?
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
das@eplunix.UUCP (David Steffens) (02/03/91)
In article <19019@rpp386.cactus.org>, jfh@rpp386.cactus.org (John F Haugh II) says:
> ... The most likely real-world ordering is probably reliability,
> usability and performance... but you can't ignore all else -
> performance and usability really do count for quite a bit,
> because you are dealing with perceptions
> which affect how the user views the software...

If we were talking about any other unix utility (users, perhaps? :-),
I would probably agree with you.  But we're not.  We're talking about
restore, and to a lesser extent dump.  A similar, albeit weaker, case can
be made for tar and cpio, since these utilities are also frequently used
in situations that demand extremely high reliability, e.g. archiving.

Unlike most unix utilities, there's only ONE reason to use restore...
to recover lost data.  If restore fails to perform this function, then NO
other performance characteristics matter.  Partial or inaccurate recovery
of data in a hurry is simply not very useful, IMHO.

> ... No, the psychology of computer users is such that any process
> which is SLOW will be avoided...

Users have nothing to do with it -- dump/restore is a sysadmin task.
And as an experienced sysadmin, I can tell you from hard-won personal
experience that a procedure that is slow but reliable is almost always to
be preferred to one that is faster but screws up occasionally.  The reason
should be obvious -- in the long run, the time required to detect and
correct failures will probably far exceed the time saved.

> ... Although RELIABILITY is paramount, any process
> which operators are inclined to skip is of no value - they
> will pick the first less reliable process which is markedly faster...

The I Ching says: Inferior people should not be employed.  :-)
--
David Allan Steffens       | I believe in learning from past mistakes...
Eaton-Peabody Laboratory   | ...but does a good education require so many?
Mass. Eye & Ear Infirmary, 243 Charles Street, Boston, MA 02114
{harvard,mit-eddie,think}!eplunix!das      (617) 573-3748 (1400-1900h EST)
das@eplunix.UUCP (David Steffens) (02/05/91)
In article <19023@rpp386.cactus.org>, jfh@rpp386.cactus.org (John F Haugh II) says:
> Furthermore, you will find little consolation from your boss who doesn't
> understand why 300 EDP employees are sitting on their hands while you
> restore that =one= file they use so frequently.

Your boss will be even _more_ annoyed with you when he discovers that you
_cannot_ restore that file because you (or your OS vendor) mucked around
with dump/restore in order to "improve performance", successfully trading
reliability for some piddling increase in speed!
--
David Allan Steffens       | I believe in learning from past mistakes...
Eaton-Peabody Laboratory   | ...but does a good education require so many?
Mass. Eye & Ear Infirmary, 243 Charles Street, Boston, MA 02114
{harvard,mit-eddie,think}!eplunix!das      (617) 573-3748 (1400-1900h EST)
jfh@rpp386.cactus.org (John F Haugh II) (02/05/91)
In article <1022@eplunix.UUCP> das@eplunix.UUCP (David Steffens) writes:
>Your boss will be even _more_ annoyed with you when he discovers
>that you _cannot_ restore that file because you (or your OS vendor)
>mucked around with dump/restore in order to "improve performance",
>successfully trading reliability for some piddling increase in speed!

In my original response I noted that reliability is the number one
concern.  This means that performance, which is a significant concern
because of human factors, should be improved wherever that can be done
without impacting reliability.  There are quite a few programming
techniques which could be heaved at dump and restore which would greatly
increase performance or usability without impacting reliability at all.
The increases in performance which I've seen made to dump/restore with
zero decrease in reliability range from 2x to 10x.

As I stated earlier as well, the best written dump/restore type of utility
I've used was free software from Archive Corp that was included with a
tape drive I had purchased for an MS-DOS PC.  It included double buffering
to drive the tape and disk at their limits and a point-and-shoot interface
for navigating the tape.  In terms of reliability, usability and
performance, this was a 4-star product.  By comparison, dump/restore is
3 stars for reliability, and 1 star each for performance and usability,
IMNSHO.

As would be predicted, the users of that particular PC were very willing
to back up and restore their own files, given the ease and speed with
which the task could be accomplished.
--
John F. Haugh II                        UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                 Domain: jfh@rpp386.cactus.org
"I've never written a device driver, but I have written a device driver
 manual" -- Robert Hartman, IDE Corp.
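A minimal sketch of the kind of read/write overlap John is describing --
not the Archive utility's actual design, which isn't public.  One simple
way to keep both devices busy is to split the copy across two processes
joined by a pipe: one does nothing but read the disk, the other does
nothing but write the tape, so a tape write can proceed while the next
disk read is in flight.  The device paths and the 32 KB block size below
are assumptions.

    /* Two-process copy: overlap disk reads with tape writes via a pipe.
     * Device paths and block size are illustrative assumptions.         */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLK (32 * 1024)

    int main(void)
    {
        int pfd[2];
        char buf[BLK];
        ssize_t n;

        if (pipe(pfd) < 0) {
            perror("pipe");
            exit(1);
        }

        if (fork() == 0) {              /* child: the disk reader */
            int disk = open("/dev/rdsk/c0d0s6", O_RDONLY);  /* hypothetical */
            if (disk < 0) { perror("disk"); exit(1); }
            close(pfd[0]);
            while ((n = read(disk, buf, BLK)) > 0)
                write(pfd[1], buf, n);  /* hand the block to the writer */
            exit(0);
        }

        /* parent: the tape writer, running concurrently with the reader */
        int tape = open("/dev/rmt/0", O_WRONLY);            /* hypothetical */
        if (tape < 0) { perror("tape"); exit(1); }
        close(pfd[1]);
        while ((n = read(pfd[0], buf, BLK)) > 0)
            write(tape, buf, n);        /* overlaps the reader's next read */
        wait(NULL);
        return 0;
    }

The pipe's kernel buffer is what actually decouples the two sides; BSD
dump gets a roughly similar overlap by running several slave processes
that take turns writing the tape.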
xtdn@levels.sait.edu.au (02/07/91)
OK, so I've seen lots of postings suggesting that the reason restore is so
slow (compared to dump) is because reliability is more important than
efficiency.  But that argument is just nonsense.  And it completely fails
to explain why dump (which, after all, needs to be just as reliable) is so
much faster than restore.

Now I don't know why there's such a difference in performance, but I do
suspect that perhaps it's deliberate.  I think it's a reasonable
assumption that (sensible) people do backups much more often than they do
restores.  Given that, I also think it makes good sense to optimise dump,
even to the point that restore suffers in performance.

One such optimisation could be to write the raw disk to tape (actually
you'd only dump those blocks that contain data that you want backed up,
but the point is that you'd be reading from the raw disk).  This would be
quite fast because you wouldn't be opening each file (which takes time),
or reading each file sequentially -- see how much disk head movement you
avoid?  Now such a tape would consist of a number of chunks, each chunk
detailing the file, the file offset, and the data to write at that offset.
The restore process then becomes a matter of reading the next chunk,
opening and seeking the file, and then writing the data.  All that head
movement, opening files, seeking to the right spot, and later, closing
files, would certainly slow down the process.

I already said that I don't know how dump/restore works, but I would
almost be willing to bet that it's something like the scheme I just
outlined.  Maybe someone who does know could tell us what really happens?


David Newall, who no longer works       Phone:  +61 8 344 2008
for SA Institute of Technology          E-mail: xtdn@lux.sait.edu.au
            "Life is uncertain:  Eat dessert first"
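Purely to make the outlined scheme concrete -- this is not the real dump
tape format, and the header structure, sizes and device name below are
invented for illustration -- a restore loop over such a chunk-oriented
tape might look like the following.  Every chunk costs an open, a seek and
a write through the filesystem, which is exactly the overhead the dump
side avoids by reading the raw disk.

    /* Sketch of a restore loop for the hypothetical "chunk" tape format
     * described above.  Header layout, sizes and the tape device name
     * are invented; real dump/restore tapes are organised differently.  */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    struct chunk_hdr {              /* hypothetical per-chunk header */
        char path[256];             /* file this chunk belongs to    */
        long offset;                /* where in the file it goes     */
        long length;                /* bytes of data that follow     */
    };

    int main(void)
    {
        int tape = open("/dev/rmt/0", O_RDONLY);    /* hypothetical */
        struct chunk_hdr h;
        char buf[64 * 1024];

        if (tape < 0) { perror("tape"); exit(1); }

        while (read(tape, &h, sizeof h) == sizeof h) {
            /* per-chunk filesystem overhead: namei, allocation, seeking */
            int fd = open(h.path, O_WRONLY | O_CREAT, 0644);
            if (fd < 0) { perror(h.path); exit(1); }
            lseek(fd, h.offset, SEEK_SET);

            long left = h.length;
            while (left > 0) {
                long want = left < (long)sizeof buf ? left : (long)sizeof buf;
                ssize_t n = read(tape, buf, want);
                if (n <= 0) { fprintf(stderr, "short tape read\n"); exit(1); }
                write(fd, buf, n);
                left -= n;
            }
            close(fd);
        }
        close(tape);
        return 0;
    }

One wrinkle the sketch ignores is ordering: directories have to exist
before any chunk that lands inside them, which is one reason the real
restore reads all of the directory information off the tape before it
touches file data.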
greywolf@unisoft.UUCP (The Grey Wolf) (02/08/91)
In article <15866.27b02da2@levels.sait.edu.au> xtdn@levels.sait.edu.au writes:
>One such optimisation could be to write the raw disk to tape (actually you'd
>only dump those blocks that contain data that you want backed up, but the
>point is that you'd be reading from the raw disk).  This would be quite fast
>because you wouldn't be opening each file (which takes time), or reading
>each file sequentially -- see how much disk head movement you avoid?  Now
>such a tape would consist of a number of chunks, each chunk detailing the
>file, the file offset, and the data to write at that offset.  The restore
>process then becomes a matter of reading the next chunk, opening and seeking
>the file, and then writing the data.  All that head movement, opening files,
>seeking to the right spot, and later, closing files, would certainly slow
>down the process.
>
>I already said that I don't know how dump/restore works, but I would almost
>be willing to bet that it's something like the scheme I just outlined.  Maybe
>someone who does know could tell us what really happens?

You're not terribly far off, with the exception that UNIX doesn't keep a
timestamp for individual blocks -- only inodes hold the timestamp, and
there's no way to tell whether a particular block in the file has been
updated (this would be terribly inefficient anyway -- chances are that if
you've blown away a file, only having the changed blocks would be useless).

Dump works by reading the disk partition directly -- it performs all the
directory/file mapping on its own by reading the on-disk inode list for
that partition.  It looks in /etc/dumpdates to determine how recently
changes have happened and, by looking at the inodes, makes an internal map
of those inodes which have been affected within the requested period of
time (with a "level 0" dump, everything since the beginning of time --
4:00 pm, New Year's Eve, 1969 on the American West Coast ... (-:).  It
then starts mapping the directories in, dumping the directory information
out and finally dumping the contents of the files.  Wandering through the
filesystem by oneself and performing only the necessary operations is
going to be much faster than sitting and going through the kernel's
filesystem overhead.

[ Side note: I *hate* operators who cannot think to keep track of the
  inode number of the file that is being dumped when they do multiple
  tape dumps!  Makes restores a *pain*. ]

Restore, on the other hand, is a dog.  Why?  It *has* to be.  When files
are getting restored, one cannot simply re-write the raw disk; the
filesystem overhead cannot be avoided on anything less than a full
restore.  Even there, a reason for avoiding a raw data dump (via dd(1)
(yes, I know that's not what dd stands for)) is that full backup/restores
serve to reduce disk fragmentation by putting everything back more or less
contiguously.  (We used to have to do this periodically back at the lab
because class users had a tendency to produce lots and lots of little
files.  The /users file system would fragment ridiculously quickly over
the semester.  I think fragmentation reached about 5%, which is very
high.)

It's also kind of convenient that if a normal user wishes to effect a
partial restore, he/she generally can, without having to be placed into a
special group or be given super-user privileges.

>David Newall, who no longer works       Phone:  +61 8 344 2008
>for SA Institute of Technology          E-mail: xtdn@lux.sait.edu.au
>            "Life is uncertain:  Eat dessert first"
--
thought:  I ain't so damb dumn!     | Your brand new kernel just dump core on you
war:      Invalid argument          | And fsck can't find root inode 2
                                    | Don't worry -- be happy...
	...!{ucbvax,acad,uunet,amdahl,pyramid}!unisoft!greywolf
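To put a little code behind greywolf's description of dump's first pass,
here is a heavily simplified sketch of scanning the raw partition's inode
list and marking everything modified since the last dump.  The inode
structure, the table offset and the device name are invented
simplifications (a real filesystem spreads inodes across cylinder groups,
and real dump takes the date from /etc/dumpdates); the point is only that
the whole pass is sequential reads of the raw device, with no namei or
per-file open.

    /* Simplified sketch of dump's first pass: read the on-disk inode list
     * from the raw partition and flag inodes changed since the last dump.
     * Struct layout, NINODES, INO_OFFSET and the device path are invented
     * assumptions, not any real filesystem's format.                      */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define NINODES    65536
    #define INO_OFFSET 8192L        /* assumed byte offset of the inode table */

    struct dinode {                 /* simplified on-disk inode */
        unsigned short di_mode;     /* 0 means the inode is unallocated */
        long           di_size;
        time_t         di_mtime;
        time_t         di_ctime;
    };

    static char dumpmap[NINODES];   /* 1 = this inode goes on the tape */

    int main(void)
    {
        time_t since = 0;           /* level 0: the beginning of time; higher
                                     * levels would take this from /etc/dumpdates */
        int raw = open("/dev/rdsk/c0d0s6", O_RDONLY);   /* hypothetical */
        struct dinode di;
        long ino, marked = 0;

        if (raw < 0) { perror("raw device"); exit(1); }

        lseek(raw, INO_OFFSET, SEEK_SET);
        for (ino = 1; ino < NINODES; ino++) {
            if (read(raw, &di, sizeof di) != sizeof di)
                break;
            if (di.di_mode == 0)
                continue;                       /* not allocated */
            if (di.di_mtime >= since || di.di_ctime >= since) {
                dumpmap[ino] = 1;               /* changed: map it for dumping */
                marked++;
            }
        }
        printf("%ld inodes would be dumped\n", marked);
        close(raw);
        return 0;
    }

Later passes then walk the marked directories and files in inode-number
order, which is the very ordering William Roberts pointed out above makes
the eventual restore thrash the namei cache.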