glenn@uunet.uu.net (Glenn Herteg) (12/14/90)
In v9n397, wsrcc!wolfgang@uunet.uu.net (Wolfgang S. Rupprecht) writes:
>SCSI itself has a similar limit. That's why one can't get more than 126
>blocks of 512 bytes in one tape read or write.

Ideas like this have tended to propagate into the lore about the parameters you should specify to user-level tape commands. For example,

    setenv TAPE /dev/nrst8
    tar cvbfle 126 $TAPE tree

has often been considered the way to "efficiently" create a QIC-24 tape archive. However, regardless of whether such a limitation exists at the hardware level, current SunOS releases (I use 4.0.1 on a 3/50) do a good job of hiding it from the user. For a long time I, too, didn't understand this, and I often waited hours as my 1/4" cartridge drive sawed back and forth. Recently, though, I have run experiments which prove that much larger user block sizes work just fine, and are FAR FASTER. For example,

    dd if=diskfile of=$TAPE bs=1000b

can be used to transfer the given diskfile (if its size is a multiple of 512 bytes). This block size is a big improvement over "bs=126b". Reading the tape back afterwards with

    dd if=$TAPE bs=1000b | cmp - diskfile

proves that the data was written correctly. (I don't know how much of a performance difference that part makes, but note that I often access files from a remote-mounted filesystem [Wren, 3/60] in such transfers.)

Now my only questions are: given that the hardware value is not the limit, what is the actual limit, and what is the optimal block size to specify to tar, dd, and similar commands? Certainly the optimal size must be a tradeoff between the speed of the *disk* (and/or network connection) you're reading from or writing to and the time penalty for stopping and restarting the tape drive. You want to overlap disk and tape i/o to advantage, just as network analysts have found that optimal network throughput is achieved not with huge blocks, but by balancing the time spent generating the data against the time spent communicating it.
The best performance comes when both the CPU and the network are simultaneously active, not when one has to wait for the other to finish handling a large block. In the case of a QIC tape, however, the cost of starting and stopping the streaming action seems largely to outweigh the cost of non-overlapped computation and communication. So now that the truth is revealed, has anyone done more extensive testing, and could they provide some guidance, so we can all collectively save years of wasted time?
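For anyone who wants to repeat the experiment, here is a rough timing sketch (my own illustration, not a tested procedure). It reads a scratch file back through dd at the traditional 126-block size and at a much larger one; to measure a real drive you would point "of=" at $TAPE (e.g. /dev/nrst8) instead of /dev/null, and rewind between runs.

```shell
#!/bin/sh
# Hypothetical timing sketch: compare dd throughput at two block sizes.
# Copies between a scratch file and /dev/null so it is safe to run;
# substitute of=$TAPE (and "mt rewind" between passes) for a real drive.
scratch=/tmp/blk.$$
dd if=/dev/zero of="$scratch" bs=512 count=2000 2>/dev/null  # ~1 Mb test file
tested=0
for bs in 126b 1000b; do
    echo "bs=$bs"
    time dd if="$scratch" of=/dev/null bs="$bs" 2>/dev/null
    tested=$((tested + 1))
done
rm -f "$scratch"
```

The absolute numbers will be dominated by the buffer cache here; the point is the method, not the figures.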
dan@breeze.bellcore.com (Daniel Strick) (01/01/91)
The SCSI transfer count limit for a single sequential-access read/write command is 2^24. If the tape is QIC, the units are 512-byte records. You won't hit this limit.

The magic number 126 was probably chosen because the traditional mag tape minphys() limit is 63 kb. (Raw device drivers gratuitously split up large i/o requests into chunks of the minphys() size. The usual motivation is a limited i/o dma memory map. See the documentation for physio() in the manual on writing device drivers.)

The 63 kb limit for mag tape is actually arbitrary (and arguably stupid, because then you can't read 9-track tapes with 64 kb records, and such tapes do exist). SunOS installation manuals have more recently recommended the use of 100 kb buffer sizes with SCSI cartridge tape, suggesting that the minphys limit was changed in the st driver (can't tell without looking at the source).

Possibly someone did a few performance tests and discovered that the particular system on his/her desk ran those particular tests faster at that buffer size. It is also possible that someone arbitrarily decreed that bigger was better on average. There is some justification for this attitude (since otherwise you have to repeat the performance tests for each possible system configuration), but bigger doesn't always win. For example, modern SCSI tape and disk systems have lots of internal data buffering and can overlap i/o operations. They may stream quite well when you use small buffer sizes, and a large buffer size may prevent continuous streaming. It depends on the specific system and pattern of i/o activity. There is no universally optimum buffer size.

Dan Strick, aka dan@bellcore.com or bellcore!dan, (201)829-4624
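To make the splitting concrete, here is a small arithmetic sketch (my own illustration, not SunOS driver code): it mimics how a minphys()-style clamp of 63 kb would carve up the 100 kb buffer the newer manuals recommend.

```shell
#!/bin/sh
# Sketch of how a minphys()-style clamp splits one raw i/o request
# into driver-sized transfers.  63 kb is the traditional mag tape
# limit cited above; 100 kb is the newer recommended buffer size.
minphys_chunks() {
    limit=$((63 * 1024))
    remaining=$1
    n=0
    while [ "$remaining" -gt 0 ]; do
        xfer=$remaining
        [ "$xfer" -gt "$limit" ] && xfer=$limit
        n=$((n + 1))
        remaining=$((remaining - xfer))
    done
    echo "$n"
}
chunks=$(minphys_chunks $((100 * 1024)))
echo "a 100 kb request becomes $chunks transfers"
```

So under the old clamp a 100 kb write is really a 63 kb transfer followed by a 37 kb one, with a tape stop/start in between unless the drive's own buffering hides it.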
henry@zoo.toronto.edu (Henry Spencer) (01/01/91)
In article <986@brchh104.bnr.ca> dan@breeze.bellcore.com (Daniel Strick) writes:
>The 63 kb limit for mag tape is actually arbitrary ...

Actually, no: it came from tape controllers with 16-bit count registers, which were still relatively common not long ago. (The Xylogics 472 that a lot of Sun 3s shipped with has a 16-bit count, for example.) Agreed that it is silly to impose such limits on hardware that doesn't need them, like properly-implemented SCSI controllers.

"The average pointer, statistically,     | Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier | henry@zoo.toronto.edu utzoo!henry
fischer@iesd.auc.dk (Lars P. Fischer) (01/09/91)
>>>>> On 31 Dec 90 17:28:40 GMT, dan@breeze.bellcore.com (Daniel Strick) said:
Daniel> There is no universally optimum buffer size.
True. It's strange. People keep telling me that I should use the default,
that all sort of horrible things will happen otherwise, that the default
is faster anyway, etc.
On all the various platforms I've tried, I have found something faster,
and I've yet to see problems. I often use 2000 for block size, and it
tends to be *much* faster than the default (a factor of three or so). It
worked in '85, it worked three days ago, and it probably still does.
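A sketch of the sort of invocation Lars describes, on the assumption (mine, not his) that "2000" means a tar blocking factor of 2000 512-byte blocks, i.e. roughly 1 Mb per write. It archives to a scratch file so it can be tried anywhere; substitute $TAPE for the archive name to drive a real tape.

```shell
#!/bin/sh
# Hypothetical sketch: tar with a blocking factor of 2000 (~1 Mb records)
# instead of the small default.  Uses a scratch file in place of $TAPE.
dir=/tmp/tree.$$
arch=/tmp/arch.$$
mkdir -p "$dir" && echo hello > "$dir/file"
tar cbf 2000 "$arch" -C /tmp "tree.$$"     # create with b=2000
listed=$(tar tbf 2000 "$arch")             # read back with the same factor
echo "$listed"
rm -rf "$dir" "$arch"
```

Note that whatever blocking factor you write with, you (or whoever reads the tape later) must remember it, since a read with too small a factor will fail.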