field@elvis.cs.pitt.edu (Gus) (10/14/90)
Last night I tried dumping a 300 MB file system to tape (via rdump) from
a Decstation 3100 (Ultrix 3.1) to a 1/4" SCSI tape drive hanging off
a Sun3 (4.0.3). rdump reported this would take 24 hours! Is anyone
else using a similar setup and getting reasonable backup times? How
can I get the 3100 to stream to the remote tape?

Thanks
Brian
-----
field@cs.pitt.edu
grr@cbmvax.commodore.com (George Robbins) (10/14/90)
In article <8844@pitt.UUCP> field@elvis.cs.pitt.edu (Gus) writes:
>
> Last night I tried dumping a 300 MB file system to tape (via rdump) from
> a Decstation 3100 (Ultrix 3.1) to a 1/4" SCSI tape drive hanging off
> a Sun3 (4.0.3). rdump reported this would take 24 hours! Is anyone
> else using a similar setup and getting reasonable backup times? How
> can I get the 3100 to stream to the remote tape?

When I tried this, the dump wasn't as slow as the prediction...

--
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)
mjr@hussar.dco.dec.com (Marcus J. Ranum) (10/15/90)
In article <8844@pitt.UUCP> field@elvis.cs.pitt.edu (Gus) writes:
>Last night I tried dumping a 300 MB file system to tape (via rdump) from
>a Decstation 3100 (Ultrix 3.1) to a 1/4" SCSI tape drive hanging off
>a Sun3 (4.0.3). rdump reported this would take 24 hours!

Sometimes rdump exaggerates a little with its time estimates, but it
*IS* pretty slow. I've heard (haven't measured it) that using dd to
block the transfer can be much faster, EG:

        dump 0f - /filesystem | rsh tapehost dd of=/dev/tapedrive

I'm not sure why this is the case, to tell the truth. The rmt protocol
sends a return value after each remote read/write, which should make it
slower than the rsh/dd combination (which doesn't check errors or returns
on the write) but I can't imagine it would make it that much slower.

Has anyone ever measured this? This is a subject of some interest to me.

mjr.
--
coffeecoffeecoffeecoffeecoffeecoffeecoffeecoffeecoffeecoffeecoffeecoffeecoffee
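
The effect of the per-write acknowledgement described above can be estimated with a small model. This is only a sketch: it assumes rdump issues one acknowledged remote write per 10 KB dump block, and the transfer rate and round-trip time are made-up figures, not measurements from any of the machines in this thread.

```python
def stop_and_wait_seconds(total_bytes, block_bytes, bytes_per_sec, round_trip_sec):
    """Each block is sent, then the sender idles until the reply arrives."""
    blocks = -(-total_bytes // block_bytes)  # ceiling division
    return blocks * (block_bytes / bytes_per_sec + round_trip_sec)


def streaming_seconds(total_bytes, bytes_per_sec, round_trip_sec):
    """Blocks are sent back to back; one round trip is paid at the very end."""
    return total_bytes / bytes_per_sec + round_trip_sec


# All figures below are illustrative assumptions, not measurements.
TOTAL = 300 * 1024 * 1024   # the 300 MB file system from the question
BLOCK = 10 * 1024           # 10 KB blocks (assumed size of each remote write)
RATE = 500 * 1024           # assumed usable end-to-end rate, bytes per second
RTT = 0.005                 # assumed 5 ms wait for each acknowledgement

acked = stop_and_wait_seconds(TOTAL, BLOCK, RATE, RTT)
piped = streaming_seconds(TOTAL, RATE, RTT)

print(f"acknowledged writes: {acked / 60:.1f} min")
print(f"streamed pipe:       {piped / 60:.1f} min")
print(f"slowdown factor:     {acked / piped:.2f}")
```

Under these assumptions the acknowledgements add a modest overhead rather than an order of magnitude, which matches the doubt expressed above. The model ignores anything that happens when the tape drive stops streaming, so it says nothing about that cost.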
idallen@watcgl.waterloo.edu (Ian! D. Allen [CGL]) (10/15/90)
Here's stuff on dump/rdump I sent to comp.unix.ultrix last summer.
We run Ultrix 3.1 and 3.1C.

From idallen Thu Jun 7 21:42:39 1990
To: comp.unix.ultrix
Subject: Why isn't dump maximally efficient with TK70 tapes?

DECsystem 5400, Ultrix 3.1C, RA90 disk, one user (me).
Watch the elapsed real times here.

Here's a plain root dump to tape (TK70):

# time dump 0 /
DUMP: Date of this level 0 dump: Thu Jun 7 21:17:43 1990
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/rra0a (/) to /dev/rmt0h
DUMP: Mapping (Pass I) [regular files]
DUMP: Mapping (Pass II) [directories]
DUMP: Estimates based on 1200 feet of tape at a density of 10240 BPI...
DUMP: This dump will occupy 1103 (10240 byte) blocks on 0.13 tape(s).
DUMP: Dumping (Pass III) [directories]
DUMP: Dumping (Pass IV) [regular files]
DUMP: 57.43% done, finished in 0:03
DUMP: 1103 tape blocks were dumped on 1 tape(s)
DUMP: Tape rewinding
DUMP: Dump is done
0% real=9:29 usr=0.3 sys=1.9 rd=0 wr=4 mem=56 pg=3 rec=17 sw=0 sig=0 cs=2776

Here's the identical root dump piped to dd to tape:

recorder# mt rew
recorder# time sh -c "dump 0f - / | dd bs=32k rbuf=2 wbuf=2 of=/dev/rmt0h"
DUMP: Date of this level 0 dump: Thu Jun 7 21:28:18 1990
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/rra0a (/) to standard output
DUMP: Mapping (Pass I) [regular files]
DUMP: Mapping (Pass II) [directories]
DUMP: Estimated 11295744 bytes output to Standard Output
DUMP: Dumping (Pass III) [directories]
DUMP: Dumping (Pass IV) [regular files]
DUMP: 11295744 bytes were dumped to Standard Output
DUMP: Dump is done
0+2780 records in
0+2780 records out
4% real=3:34 usr=0.7 sys=8.7 rd=1 wr=8 mem=37 pg=2 rec=17 sw=0 sig=0 cs=10111

That's almost three times faster! Why can't dump be as good as dd?
Dumps are of major importance; I would have thought that dump would
be the most clever user of the tape drive. I can't believe this.
Am I missing something? I must be missing something.
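
As a quick check of the "almost three times" figure, the two transcripts above can be turned into effective throughput. This is a sketch that uses only the byte counts and elapsed times printed in those runs; the helper name `seconds` is made up for the example, and the elapsed times include whatever rewind time each run incurred.

```python
def seconds(mmss):
    """Convert an elapsed time such as '9:29' into whole seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


# (bytes written, elapsed seconds), taken from the two transcripts above.
runs = {
    "dump direct to tape": (1103 * 10240, seconds("9:29")),
    "dump piped to dd": (11295744, seconds("3:34")),
}

for name, (nbytes, elapsed) in runs.items():
    print(f"{name}: {nbytes / elapsed / 1024:.1f} KB/s over {elapsed} s")

speedup = runs["dump direct to tape"][1] / runs["dump piped to dd"][1]
print(f"elapsed-time ratio: {speedup:.2f}")
```

Both rates come out at a few tens of kilobytes per second, so the comparison is between two slow transfers, one of which is slower than the other.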
From idallen Fri Jun 8 02:46:42 1990
Subject: Fun with dump

Ultrix dump of root to nowhere:

bandicoot# time dump 0f - / >/dev/null
[dump stuff deleted]
16% real=0:22 usr=0.9 sys=2.7 rd=8 wr=4 mem=332 pg=0 rec=0 sw=0 sig=0 cs=959

Ultrix rdump of root to nowhere:

bandicoot# time /bin/rdump -0f bandicoot:/dev/null /
[dump stuff deleted]
39% real=0:55 usr=2.8 sys=19.1 rd=2 wr=6 mem=282 pg=3 rec=60 sw=0 sig=0 cs=4533

Ultrix rdump of root to a real tape:

bandicoot# time rdump -0f recorder:/dev/nrmt0h /
[dump stuff deleted]
[I hit break after 6 minutes when dump estimated the dump would
take another 20 minutes]

Ultrix dump of root to rsh/dd to a tape:

bandicoot# time dump 0f - / | rsh rec dd bs=32k rbuf=2 wbuf=2 of=/dev/rmt0h
[dump stuff deleted]
7% real=1:31 usr=1.0 sys=6.0 rd=2 wr=4 mem=351 pg=0 rec=3 sw=0 sig=0 cs=4900
15% real=2:48 usr=2.5 sys=24.1 rd=15 wr=7 mem=206 pg=0 rec=3 sw=0 sig=0 cs=10300

What I learned: Don't use rdump. It's an order of magnitude slower
than a pipe to dd. In fact, even dump is slower than dump to stdout
piped into dd with wbuf=2, because of bugs in the Ultrix nbuf code.
At least Ultrix dd handles multiple tapes and multi-buffer writes;
isn't that convenient?

From idallen Fri Jun 8 04:19:02 1990
Subject: More fun with dump on Ultrix

You'd think that the dump command would have the smarts in it to write
tapes efficiently. Wrong. I wrote a simple program that reads stdin,
builds a 32K buffer, and writes it out using Ultrix double-buffer I/O.
I used it on 198525952 bytes of /usr file system on our DS5400, sent to
a TK70 295Mb tape cartridge:

# time sh -c "dump 0f - /usr | ./a.out >/dev/rmt0h"
[dump info deleted]
5% real=43:01 usr=11.3 sys=131.8 rd=4 wr=8 mem=63 pg=2 rec=18 sw=0 sig=0 cs=138864

43 minutes elapsed time. Compare that with what the default gets you:

# dump 0 /usr
[dump info deleted]
DUMP: Estimates based on 1200 feet of tape at a density of 10240 BPI...
DUMP: This dump will occupy 19400 (10240 byte) blocks on 2.29 tape(s).

Woops.
This dump won't even fit on the tape using the defaults. Even if I
kludged the tape size to make it seem to fit, it would still take
3 *hours* to dump.

Ultrix dump also uses double-buffer I/O, but it specifies 8 buffers
instead of just 2. The software release notes for Ultrix 3.1C suggest
2 is better than more than 2, and this sure bears that out.

From idallen Fri Jun 8 17:08:17 1990
To: comp.unix.ultrix
Subject: More undocumented performance issues with dump

Dump to stdout (a tape):

# time dump 0bf 32k - / >/dev/nrmt0h
DUMP: 11318272 bytes were dumped to Standard Output
DUMP: Dump is done
0% real=15:34 usr=0.3 sys=1.6 rd=0 wr=4 mem=92 pg=2 rec=18 sw=0 sig=0 cs=2058

Dump to the same tape directly:

# time dump 0bf 32k /dev/nrmt0h /
DUMP: 345 tape blocks were dumped on 1 tape(s)
DUMP: Dump is done
0% real=7:16 usr=0.4 sys=1.6 rd=0 wr=4 mem=94 pg=2 rec=18 sw=0 sig=0 cs=2019

Ultrix dump assumes that any output to stdout is to a pipe; it doesn't
do the same stat() [fstat()] tests to determine device type that it does
when you give the file name on the command line.

From idallen Fri Jun 8 17:16:46 1990
To: jpe@egr.duke.edu
Subject: Re: Why isn't dump maximally efficient with TK70 tapes?

> Problem #2 -- according to my man pages for "dd" the rbuf and wbuf options
> cannot be used at the same time. Besides, the default wbuf value is 8
> for devices that support it.

Indeed, you, the source, and the man page are correct. The example in
section 1.1.13 of the Ultrix 3.1C release notes is wrong, and I copied
it. Silly me -- I thought the release notes knew something the man page
did not. The first option wins and over-rides following [rw]buf values.

What is not wrong is the statement in 1.1.13 "to get the most performance
gain, use a value of 2 with rbuf and wbuf options". The default 8
buffers cause *worse* performance than specifying 2.

> Problem #3 -- The block size you specified to "dd" was wrong. Dump writes
> in 10k blocks, not 32k. Also you need to specify the obs (instead of bs)
> and specify a cbs equal to the obs. This will buffer the input to the
> output block size, then write it to tape. Restore will read a tape
> created this way, I doubt if it can read yours.

No, I wanted to write 32k blocks; it's faster and more efficient.
I write dump tapes far more than I read them; I wanted to speed up
the writing. Restore reads such tapes just fine if unblocked first:

# dd if=/dev/rmt0h bs=32k rbuf=2 | restore -if -

You're right about the failure to buffer up to the output buffer size,
but I don't want to pay the price of using dd to make the conversion --
it's way too slow. See my latest note in comp.unix.ultrix about a
little program that buffers up to 32k and writes.

> Problem #4 -- You should note instead the system times and percentage of
> CPU used. On my VAXserver 3600 these times jumped dramatically in order
> to give me a few seconds real-time savings. Also when I used a no-rewind
> device "dump" was actually faster than the "dd pipe." On a CPU-loaded
> system you might not have such a big win..

I observed a factor of three in real-time performance; more than a few
seconds and of importance to us. The tape rewind added 12 seconds to
any times I posted.

Looking at the source to dd, I see that if one doesn't use "bs=", it
copies the data painfully from the input buffer to the output buffer
one byte at a time. No wonder that eats cpu. I wrote a simple program
to buffer input and write it out in 32k chunks; this works much better
than dd, but it won't handle multiple volumes. See my comp.unix.ultrix
posting.

> Problem #5 -- What happens if one of your partitions becomes larger than
> a TK70?

Ultrix dd handles multi-volumes.

I think the problem is just that dump uses too many buffers and in Ultrix
3.1C that makes things worse rather than better. Or perhaps dump's
buffers aren't aligned on page boundaries, and dd's are. (See Ultrix
Version 3.1C Release Notes section 1.1.12.)
--
-IAN! (Ian! D. Allen) idallen@watcgl.uwaterloo.ca idallen@watcgl.waterloo.edu
[129.97.128.64] Computer Graphics Lab/University of Waterloo/Ontario/Canada
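
The "simple program" mentioned in the post above was not included, so the following is only a sketch of the same idea in Python, not the original program. It collects input into 32k blocks and hands each full block to a single write call, so that a tape driver would see one record per block. It does not attempt Ultrix multi-buffer I/O or multiple volumes. The names `reblock` and `run_as_filter` are made up for this example, and the 10240-byte reads in the demonstration merely imitate a pipe delivering dump-sized pieces.

```python
import io
import os
import sys

BLOCK_SIZE = 32 * 1024  # 32k output blocks, as in the thread


def reblock(read_chunk, write_block, block_size=BLOCK_SIZE):
    """Copy input to output in blocks of exactly block_size bytes.

    read_chunk(n) returns up to n bytes, or b"" at end of input.
    write_block(data) is called once per output block.
    The final block may be short. Returns the number of blocks written.
    """
    pending = bytearray()
    blocks = 0
    while True:
        chunk = read_chunk(block_size - len(pending))
        if not chunk:
            break
        pending += chunk
        if len(pending) == block_size:
            write_block(bytes(pending))
            pending.clear()
            blocks += 1
    if pending:
        write_block(bytes(pending))
        blocks += 1
    return blocks


def run_as_filter():
    """Wire reblock() to stdin and stdout using raw file descriptors."""
    in_fd, out_fd = sys.stdin.fileno(), sys.stdout.fileno()

    def write_block(data):
        # One write() per block; treat a partial write as an error.
        written = os.write(out_fd, data)
        if written != len(data):
            raise OSError(f"short write: {written} of {len(data)} bytes")

    return reblock(lambda n: os.read(in_fd, n), write_block)


# Demonstration on an in-memory stream read in pieces of at most 10240 bytes.
source = io.BytesIO(bytes(100_000))
sizes = []
count = reblock(lambda n: source.read(min(n, 10240)), lambda b: sizes.append(len(b)))
print(count, sizes)
```

To use it as a real filter, the demonstration lines at the bottom would be replaced by a call to `run_as_filter()`, with the script placed in the pipeline where `./a.out` appears in the post.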
grr@cbmvax.commodore.com (George Robbins) (10/15/90)
In article <1990Oct15.034337.21119@watcgl.waterloo.edu> idallen@watcgl.waterloo.edu (Ian! D. Allen [CGL]) writes:
> Here's stuff on dump/rdump I sent to comp.unix.ultrix last summer.
> We run Ultrix 3.1 and 3.1C.
... ... ... ...
>
> I think the problem is just that dump uses too many buffers and in Ultrix
> 3.1C that makes things worse rather than better. Or perhaps dump's
> buffers aren't aligned on page boundaries, and dd's are. (See Ultrix
> Version 3.1C Release Notes section 1.1.12.)

Somewhere in the Ultrix 4.0 Release notes, it admits that multi-buffer
I/O sucks if the buffers aren't "properly aligned". This might explain
why dump gets slow, but leaves it as an open question why they didn't
bother to fix the underlying problems in dump or multi-buffer I/O...

Perhaps using Ultrix 1.2 dump is the ticket? Before they kludged in the
multi-buffered I/O and the generic syscalls? Where did I put that release
tape... 8-)

BTW, dump really seems to work just fine on my configuration,
5810/HSC50/TA78, but I don't have much to compare it with.

BTW, I'd still like to get the patches to BSD 4.3 dump to sleaze over the
differences between the 4.3 include files and what resolves out of the
ultrix [gvi]node equates. Somebody last year claimed that this was
"easy", but perhaps this was a theoretical assertion?

--
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)