jerry@olivey.olivetti.com (Jerry Aguirre) (01/26/91)
Those familiar with using dump and restore will have noticed the
difference in speed between them. The dump procedure, especially with
the current multi-buffered version, usually sings along at close to
full tape speed. Restore, on the other hand, is a real dog, taking up
to 10 times as long for the same amount of data.

Has anyone done any evaluations of why there is such an extreme
difference in speed? Granted that creating files involves more
overhead than dumping them, restore still seems very slow. As restore
operates on the mounted file system, it has the advantage of accessing
a buffered file system with write-behind.

My particular theory is that the disk buffering algorithms are
precisely wrong for restore. By this I mean they keep in the buffers
the data that will never be needed again and flush the data that will.
I plan to do some experimentation and would appreciate hearing any
ideas you might offer.

				Jerry Aguirre
liam@cs.qmw.ac.uk (William Roberts;) (01/28/91)
In <50235@olivea.atc.olivetti.com> jerry@olivey.olivetti.com (Jerry Aguirre) writes:

>Has anyone done any evaluations of why there is such an extreem
>difference in speed? Granted that creating files involves more overhead
>than dumping them restore still seems very slow. As restore operates on
>the mounted file system it has the advantage of accessing a buffered
>file system with write behind.

>My particular theory is that the disk buffering algorithms are precisely
>wrong for restore. By this I mean they keep in the buffers the data
>that will never be needed again and flush the data that will. I plan to
>do some experimentation and would appreciate hearing any ideas you might
>offer.

Restore suffers from the fact that files are stored in inode-number
order: this is not the ideal order for creating files, as it thrashes
the namei cache because the files are recreated randomly all over the
place.

We reorganised our machine once and used dump/restore to move our
/usr/spool and /usr/mail partitions around: /usr/spool contains lots
of tiny files called things like
/usr/spool/news/comp/unix/internals/5342 and this took an incredibly
long time to restore. /usr/mail contains several hundred files but no
subdirectories and restored in about the same sort of time as it took
to dump.

Restore suffers the same accidents of history as a lot of other system
utilities. It dates back to sub 1-megabyte memory machines (maybe 64K
separate I/D space PDPs) and so it uses pathetically small buffers. If
you want to speed up restore, steal maybe 2 megabytes of memory as
file buffers, fill that with file images to be restored, and then
write them to disk in something resembling a depth-first traversal of
the directory tree. This costs a little bit of ingenuity but doesn't
involve any kernel changes.
--
William Roberts                 ARPA: liam@cs.qmw.ac.uk
Queen Mary & Westfield College  UUCP: liam@qmw-cs.UUCP
Mile End Road                   AppleLink: UK0087
LONDON, E1 4NS, UK              Tel: 071-975 5250 (Fax: 081-980 6533)
slevy@poincare.geom.umn.edu (Stuart Levy) (01/29/91)
In article <2880@redstar.cs.qmw.ac.uk> liam@cs.qmw.ac.uk (William Roberts;) writes:
>In <50235@olivea.atc.olivetti.com> jerry@olivey.olivetti.com (Jerry Aguirre)
>writes:
>
>>Has anyone done any evaluations of why there is such an extreem
>>difference in speed? Granted that creating files involves more overhead
>>than dumping them restore still seems very slow. As restore operates on
>>the mounted file system it has the advantage of accessing a buffered
>>file system with write behind.

Aha, that's the problem. On BSD-derived systems (e.g. Suns) at least,
there are lots of synchronous operations done as files are created,
space allocated, and directories modified. The point is to make the
filesystem more robust -- a system crash in mid-update doesn't leave
corrupted directories, or blocks in the free list that also point into
a file, or etc. Write-behind still applies to file *data*, but the
restore bottleneck is in creating all those files.

A few years ago I hacked the BSD filesystem code so, when a mode was
set, most of those synchronous bwrite() calls would get changed to
delayed bdwrite()s. It helped -- restore performance rose by about a
factor of 2, still much slower than dump but more nearly tolerable.
Of course this was unsafe, and the mode was only enabled when we were
restoring a wrecked filesystem.

	Stuart Levy, Geometry Group, University of Minnesota
	slevy@geom.umn.edu
rbj@uunet.UU.NET (Root Boy Jim) (02/01/91)
In article <173@skyking.UUCP> jc@skyking.UUCP (J.C. Webber III) writes:
?What I have been doing is using "find . print|cpio -pudmv /new.slice
?/usr/spool" to move the files to a different partition while I clean
?up the /usr/spool slice. I do a rm -r * on /usr/spool, umount it,
?fsck it, remount it and the cpio all the files back from the backup
?partition.
Why don't you just newfs (or mkfs) rather than removing everything
and fscking? BTW, you might want to try making the filesystem with
more inodes than usual. This might not solve your problem, but
it might make it less frequent. Get a better OS if you can. Good Luck.
--
Root Boy Jim Cottrell <rbj@uunet.uu.net>
Close the gap of the dark year in between
andys@ulysses.att.com (Andy Sherman) (02/04/91)
In article <1013@eplunix.UUCP> das@eplunix.UUCP (David Steffens) writes:
>All right, my $0.02 on this issue.
>
>Who cares how slow restore is? How often do you do have to do
>full restore on a filesystem or a whole disk? Once or twice a year?
>If it's more often than that, then you have a REAL problem
>and maybe you ought to spend your time and energy fixing THAT!

The frequency of file system restores, even in the best run shops, is
directly proportional to the number of disks spinning. We have
something like 100 disks in our computer center (*NOT* counting the
things attached to workstations and PCs). MTBFs are 30000 hours on
about a third of them and 100000 hours on the rest. Go ahead and do
the math. We are statistically doomed to an awful lot of disk
failures just based on the volume.

Now not every failure is a catastrophic failure requiring a full
restore of the file system, but I suspect that more than one or two a
year will be. With users clamoring for us to get their data back on
line, restore performance is a concern. Reliability is a more urgent
concern, of course, but you try fighting these folks off.....
--
Andy Sherman/AT&T Bell Laboratories/Murray Hill, NJ
AUDIBLE: (201) 582-5928  READABLE: andys@ulysses.att.com or att!ulysses!andys
What? Me speak for AT&T? You must be joking!
torek@elf.ee.lbl.gov (Chris Torek) (02/18/91)
In article <2880@redstar.cs.qmw.ac.uk> liam@cs.qmw.ac.uk (William Roberts) writes:
>Restore suffers from the fact that files are stored in inode-number order:
>this is not the ideal order for createing files as it thrashes the namei-cache
>because the files are recreated randomly all over the place.

Well, no and yes. While the files are indeed in inode order, and the
restore program (as opposed to the old `restor' program) does recreate
them in this order, the Fast File System tends to set things up so
that all the files in any one directory are in the same cylinder group
as that directory. Depending on cylinder group sizes this may or may
not overload the name cache, since only the directory parts of the
names are cached (each trailing name is unique within its directory,
but the directory must be searched anyway to verify this first).

More important are two other facts:

  - Each directory must be scanned entirely (to make sure the name is
    unique);
  - Directory operations are synchronous.

The latter is usually the performance-killer since the directory
blocks tend to remain in the buffer cache. Directory writes are done
synchronously to make crash recovery possible. Ordered (but otherwise
delayed) writes should give the same effect with a much smaller
performance penalty; this is being investigated.

>/usr/spool/news/comp/unix/internals/5342 and this took an incredibly long time
>to restore. /usr/mail contains several hundred files but no subdirectories and
>restored in about the same sort of time as it took to dump.

The presence or absence of subdirectories is largely irrelevant: the
problem is the large number of files. One big file restores much
faster than several dozen small files, even though both take the same
amount of space, because one big file equals one synchronous directory
write (preceded by one synchronous inode write) followed by many
asynchronous data writes.
If you do many full file system restores, it would probably be worth
your effort to make a kernel that does delayed writes for inode and
directory operations, and run it (or enable delayed writes on each
file system in question) each time you do such a restore. If the
system crashes, you can just start over.
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA	Domain: torek@ee.lbl.gov
bzs@world.std.com (Barry Shein) (02/19/91)
We've suffered thru this slow restore problem here, so it's on my mind.

Anyone have any thoughts on the idea of writing a restore which
restores a standard dump thru the raw device? This would be for the
first, zero level dump image (I guess, perhaps it would be easy enough
to apply incrementals this way also tho they're usually less of a
problem.)

It would seem that would bypass the synchronous directory problem w/o
too much disruption to, eg, the kernel. And one could still use good
ol' restore on the same tapes if there were any doubts or problems. I
like the conservative nature of this approach: your tapes and backup
procedures remain unchanged, only restoral is affected.

Mostly a matter of simulating the file system at user-level and
deciding which block it would throw things into as they come off of
tape.

Thoughts? Think it would be a lot faster?
--
        -Barry Shein

Software Tool & Die    | bzs@world.std.com          | uunet!world!bzs
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
torek@elf.ee.lbl.gov (Chris Torek) (02/19/91)
In article <BZS.91Feb18182857@world.std.com> bzs@world.std.com (Barry Shein) writes:
>Anyone have any thoughts on the idea of writing a restore which
>restores a standard dump thru the raw device?

Funny how these things come full circle: the original `restor'
scribbled directly on the raw device. Kirk threw it out when he
designed the 4.2BSD Fast File System, and not just because it only
wrote 4.1BSD-style file systems. It also required that you restore
onto the same *size* file system.

You could, of course, move the kernel FFS code into user space (or
maybe find a copy of Kirk's original implementation, which lived in
user space) and make restore talk to that. If you really wanted to
reimplement wheels, you could make your user FFS run via RPC+XDR over
the network (sockets/pipes).
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA	Domain: torek@ee.lbl.gov
bzs@world.std.com (Barry Shein) (03/04/91)
>Who cares how slow restore is? How often do you do have to do
>full restore on a filesystem or a whole disk? Once or twice a year?

This is a value judgement that may or may not be true in other
people's facilities. C'mon, not everyone does exactly what you do for
a living.

>If it's more often than that, then you have a REAL problem
>and maybe you ought to spend your time and energy fixing THAT!

No, not clear. I worked in a place where they used huge scratch files
and at the time scratch space was at a premium (this is in the days of
washing machine drives.) What they did was take turns running to a
point in their computations (which took days) and then yielding to the
next group (say, over the weekend.) This involved backing up and
restoring all the files for each swap.

In theory it was no big deal and the stop/start had already been honed
down to a simple procedure in the code (given a signal it would write
out its state and exit.) The only painful part was slow-moving tapes:
each hour of compute time was precious and the switch-over could take
a couple of hours or more. Just adding more disks wasn't available in
the short-term since these were fixed grant contracts and, alas, using
a coupla grad students who were already paid for made a fair amount of
sense (besides, to get more money would invariably involve promising
more work, diminishing returns.)

Now, there are other ways to do this and they were used (e.g. "dd"),
but that begs the question.

But I think to just cast off a reasonable question with "no reasonable
person would ever want this" often just belies a limit of one's own
imagination. It's a bad knee-jerk in systems work (particularly
because systems people are usually woefully ignorant and even callous
about what their systems are actually used for, and tend to consider
any feature they're personally not interested in as "unnecessary"; I
consider that to be the dark side of the systems religion.)
-- -Barry Shein Software Tool & Die | bzs@world.std.com | uunet!world!bzs Purveyors to the Trade | Voice: 617-739-0202 | Login: 617-739-WRLD
lm@slovax.Berkeley.EDU (Larry McVoy) (03/04/91)
I believe that a key component to the slowness of restore is the
synchronous nature of directory operations in the Unix file system.
For example, a create, something that occurs quite often in restore
:-), is synchronous. It has to be; those are the semantics of a Unix
file system (can you say lock files?). It actually has to be atomic
and completed when the system call returns to the user; the fact that
it is synchronous is an implementation issue that has been much
discussed in comp.arch. I took the point of view that it was a "good
thing" and somebody from Japan took the point of view that it was
"too slow".

Before everyone starts complaining, think back to the days that you
had to repair file systems with fsdb (remember that? If not, be
quiet). The correct fix, in my opinion, is hardware, not software.
Use NVRAM and reclaim the directory pages from that. The semantics
remain and you get the performance back.
---
Larry McVoy, Sun Microsystems     (415) 336-7627 ...!sun!lm or lm@sun.com
torek@elf.ee.lbl.gov (Chris Torek) (03/04/91)
In article <480@appserv.Eng.Sun.COM> lm@Eng.Sun.COM writes:
>I believe that a key component to the slowness of restore is the
>synchronous nature of directory operations in the Unix file system.
>For example, a create, something that occurs quite often in restore
>:-), is synchronous. It has to be, those are the semantics of a Unix
>file system (can you say lock files?).

(Funny to hear someone from Sun arguing for Unix FS semantics :-) )

Seriously, `synchronous' is more restrictive than necessary.
Directory operations must be ordered. They need not be complete by
the time the call returns. If they are properly ordered, the inode
will exist before the directory entry, and the directory entry will
exist before the first file block appears, so that fsync() will
guarantee that the file exists and is in permanent storage.
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA	Domain: torek@ee.lbl.gov
terryl@sail.LABS.TEK.COM (03/05/91)
In article <BZS.91Mar3135546@world.std.com> bzs@world.std.com (Barry Shein) writes:
>But I think to just cast off a reasonable question with "no reasonable
>person would ever want this" often just belies a limit of one's own
>imagination. It's a bad knee-jerk in systems work (particularly
>because systems people are usually woefully ignorant and even callous
>about what their systems are actually used for, and tend to consider
>any feature they're personally not interested in as "unnecessary", I
>consider that to be the dark side of the systems religion.)

Barry hit the nail squarely on the head, so to speak, and I'll do a
little confessional thing here to demonstrate why (good for the soul,
and all that rubbish...)

I do a lot of systems work in my job, the majority of which is
kernel-related work, although I do have some user level experience
(have to use what I develop, ya know...). Anyways, rewind back to the
early '80's. We rolled our own hardware for what would now be similar
to the original Sun (not a clone, but functionally equivalent). I was
heavily involved in the kernel work, first on V7 Unix(tm), and then
on 4.2 BSD later in the decade.

Our disk drive of choice was a Micropolis drive with an embedded
controller. The early drives had some "interesting" failure modes, so
to speak. Once a particular failure mode appeared (and there were
several), the drive was basically out to lunch. Since I had one of
the systems built early in the game, and since I heavily used my
system to continue development, it was a royal pain in the keester.
So I put a LOT of error recovery in the device driver, and things
were hunky dory, and it was released to our general user community.

Now for the confessional part: If it wasn't MY system that was
experiencing the difficulties, I doubt that all of that error
recovery would have made it into the device driver. Let's face it,
that kind of stuff is not all that interesting to do, anyways.......
BTW, I finally powered off my old system about a year ago, `cause I
needed the space it was occupying on my desk in my office, and you
wouldn't believe how emotional it was (NO smileys here). It was just
like telling a parent to abandon a child, because the child was no
longer useful to the parent. I've worked through my guilt now!!!! (-:
__________________________________________________________
Terry Laskodi       "There's a permanent crease
     of                in your right and wrong."
Tektronix           Sly and the Family Stone, "Stand!"
__________________________________________________________
lerman@stpstn.UUCP (Ken Lerman) (03/05/91)
In article <480@appserv.Eng.Sun.COM> lm@Eng.Sun.COM writes:
>
>I believe that a key component to the slowness of restore is the synchronous
>nature of directory operations in the Unix file system. For example, a create,
...
>---
>Larry McVoy, Sun Microsystems (415) 336-7627 ...!sun!lm or lm@sun.com

I was always taught that there is no point in debating how many angels
can dance on the head of the pin when one can just go count them. :-)

Has anyone out there with the appropriate source done some measurement
of where the time goes in restore? How many reads/writes does it do?
How long does each take? Do those figures seem reasonable? ...etc...

Just a humble suggestion from someone who neither has the problem nor
the solution. :-)

Ken
lm@slovax.Eng.Sun.COM (Larry McVoy) (03/06/91)
In article <6511@stpstn.UUCP> lerman@stpstn.UUCP (Ken Lerman) writes:
>In article <480@appserv.Eng.Sun.COM> lm@Eng.Sun.COM writes:
>>
>>I believe that a key component to the slowness of restore is the synchronous
>>nature of directory operations in the Unix file system. For example, a create,
>Has anyone out there with the appropriate source done some measurement
>of where the time goes in restore? How many reads/writes does it do?
>How long does each take? Do those figures seem reasonable? ...etc...

Well, I didn't want to tip my hand, but someone at Sun actually tried
turning off the sync writes (dir ops) while restoring a system. A
speed up of 4X is what I remember, but I might be a little off. Your
mileage may vary.

NVRAM in the disk interface is the easy answer; an option to mount is
the sleazy answer.
---
Larry McVoy, Sun Microsystems     (415) 336-7627 ...!sun!lm or lm@sun.com
stefano@angmar.sublink.ORG (Stefano Longano) (03/06/91)
In <480@appserv.Eng.Sun.COM> lm@slovax.Berkeley.EDU (Larry McVoy) writes:
>I believe that a key component to the slowness of restore is the synchronous
>nature of directory operations in the Unix file system. For example, a create,
>something that occurs quite often in restore :-), is synchronous. It has to
>be, those are the semantics of a Unix file system (can you say lock files?).

These operations need not be synchronous to retain the semantics of
the Unix file system. You could read the paper presented at the Summer
'90 USENIX Technical Conference by Mendel Rosenblum and John
Ousterhout about the Sprite Log-structured File System. This paper is
available for anonymous FTP from sprite.berkeley.edu.
--
Stefano Longano            WW WW       EMAIL : stefano@angmar.sublink.ORG
Viale Trento 1         ||wwwwwwwwwww|| Happy are those who dream dreams
38068 Rovereto (TN)    ||    ---    || and are ready to pay the price
Tel : +39 (464) 436042 ||____|_|____|| to make them come true
grr@cbmvax.commodore.com (George Robbins) (03/07/91)
In article <485@appserv.Eng.Sun.COM> lm@slovax.Eng.Sun.COM (Larry McVoy) writes:
> In article <6511@stpstn.UUCP> lerman@stpstn.UUCP (Ken Lerman) writes:
> >In article <480@appserv.Eng.Sun.COM> lm@Eng.Sun.COM writes:
> >>
> >>I believe that a key component to the slowness of restore is the synchronous
> >>nature of directory operations in the Unix file system. For example, a create,
> >Has anyone out there with the appropriate source done some measurement
> >of where the time goes in restore? How many reads/writes does it do?
> >How long does each take? Do those figures seem reasonable? ...etc...
>
> Well, I didn't want to tip my hand, but someone at Sun actually tried turning
> off the sync writes (dir ops) while restoring a system. A speed up of 4X
> is what I remember, but I might be a little off. Your mileage may vary.
>
> NVRAM in the disk interface is the easy answer, a option to mount is the
> sleazy answer.

I don't see what's easy about NVRAM: it's expensive and still requires
some new software action on restart that unix doesn't do presently.
The mount option isn't sleazy; it just represents putting some options
at a very key point where the "one size fits all" philosophy is
getting painful.

I've felt for a long time that an option at the mount point for 100%
synchronous writes (for floppies) was pretty obvious; providing a
similar option for non-synchronous operation, for either restores or
"don't care" temporary filesystems, seems painless. I shouldn't
mention the never-sync option to confine writes to a rom-based
filesystem to the buffer pool...
---
A hardware type who gets very bored waiting to restore ~500 MB news
partitions.
--
George Robbins - now working for,    uucp: {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:  domain: grr@cbmvax.commodore.com
Commodore, Engineering Department    phone: 215-431-9349 (only by moonlite)