[comp.sys.sun] Myths about tape block sizes

glenn@uunet.uu.net (Glenn Herteg) (12/14/90)

In v9n397, wsrcc!wolfgang@uunet.uu.net (Wolfgang S. Rupprecht) writes:
>SCSI itself has a similar limit.  That's why one can't get more than 126
>blocks of 512 bytes in one tape read or write.

Ideas like this have tended to propagate into the lore about parameters
you should specify to user-level tape commands.  For example,

	setenv TAPE /dev/nrst8
	tar cvbfle 126 $TAPE tree

has often been considered the way to "efficiently" create a QIC-24 tape
archive.  However, regardless of whether such a limitation exists at the
hardware level, current SunOS releases (I use 4.0.1 on a 3/50) do a good
job of hiding this from the user.  For a long time I, too, didn't
understand this, and I often waited hours as my 1/4" cartridge drive sawed
back and forth.  Recently, though, I have run experiments which prove that
much larger user block sizes work just fine, and FAR FASTER.  For example,

	dd if=diskfile of=$TAPE bs=1000b

can be used to transfer the given diskfile (if its size is a multiple of
512 bytes).  This block size is a big improvement over "bs=126b".  Reading
the tape back afterwards with

	dd if=$TAPE bs=1000b | cmp - diskfile

proves that the data was written correctly.  (I don't know how much of a
performance difference it makes, but note that I often access files from a
remote-mounted filesystem [Wren, 3/60] in such transfers.)

Now that we know the hardware value is not the limit, my only questions
are: what is the actual limit, and what is the optimal block size to
specify on tar, dd, and similar commands?  Certainly the optimal size must
be a tradeoff between the speed of the *disk* (and/or network connection)
you're reading from / writing to, and the time penalty for stopping and
starting the tape drive.  You want to advantageously overlap disk and tape
i/o, just as network analysts have found that optimal network throughput
is achieved not by huge blocks, but by balancing the time spent in
generating the data with the time spent in communicating it.  The best
performance comes when both the CPU and the network are simultaneously
active, not when one has to wait for the other to finish handling a large
block.  In the case of a QIC tape, however, the cost of starting and
stopping the streaming action to a large extent seems to outweigh the cost
of non-overlapping computation and communication.

So now that the truth is revealed, has anyone done more extensive testing,
and could they provide some guidance to all of us so we can collectively
save years of wasted time?
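In the spirit of the experiments above, a crude timing loop is one way to
start.  This is only a sketch: it writes to a scratch file rather than a
real tape device (substitute your own /dev/nrst* device to test actual
hardware), and the test-file size and block sizes tried are arbitrary
choices, not recommendations.

	#!/bin/sh
	# Compare dd throughput at several block sizes.  Uses a scratch
	# file as the "tape"; point OUT at a real no-rewind tape device
	# (e.g. /dev/nrst8) to measure actual drive behavior.
	IN=/tmp/bs-test.in
	OUT=/tmp/bs-test.out
	# Build a 1 MB test file (2048 records of 512 bytes).
	dd if=/dev/zero of=$IN bs=512 count=2048 2>/dev/null
	for bs in 126b 1000b 2000b; do
	    echo "bs=$bs:"
	    time dd if=$IN of=$OUT bs=$bs 2>/dev/null
	done
	rm -f $IN $OUT

Note that a file-to-file copy only measures the software overhead of the
block size; the start/stop penalty discussed above shows up only when OUT
is a real streaming tape drive.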

dan@breeze.bellcore.com (Daniel Strick) (01/01/91)

The SCSI transfer count limit for a single sequential-access read/write
command is 2^24.  If the tape is QIC, the units are 512-byte records.  You
won't hit this limit.
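To see why you won't hit it, a quick back-of-the-envelope calculation
(done here in a modern POSIX sh; the 2^24 count and 512-byte record size
are from the limit quoted above) gives the ceiling in bytes:

	# Maximum transfer in one SCSI sequential-access command,
	# assuming 512-byte fixed-length (QIC) records:
	blocks=`expr 16777216`          # 2^24 records
	bytes=`expr $blocks \* 512`     # 8589934592 bytes, i.e. 8 GB
	echo "$blocks records = $bytes bytes"

Eight gigabytes per command is more than two orders of magnitude beyond
what a QIC cartridge of the era holds, so the per-command limit is never
the binding constraint.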

The magic number 126 was probably chosen because the traditional mag tape
minphys() limit is 63 kb.  (Raw device drivers gratuitously split up large
i/o requests into chunks of the minphys() size.  The usual motivation is a
limited i/o dma memory map.  See the documentation for physio() in the
manual on writing device drivers.)

The 63 kb limit for mag tape is actually arbitrary (and arguably stupid,
because it means you can't read 9-track tapes with 64 kb records, and such
tapes do exist).  SunOS installation manuals have more recently
recommended the use of 100 kb buffer sizes with SCSI cartridge tape,
suggesting that the minphys limit was changed in the st driver (can't tell
without looking at the source).  Possibly someone did a few performance
tests and discovered that the particular system on his/her desk ran those
particular performance tests faster at that buffer size.

It is also possible that someone arbitrarily decreed that bigger was
better on average.  There is some justification for this attitude (since
otherwise you have to repeat performance tests for each possible system
configuration), but bigger doesn't always win.  For example, modern SCSI
tape and disk systems have lots of internal data buffers and can overlap
i/o operations.  They may stream quite well when you use small buffer
sizes.  A large buffer size may prevent continuous streaming.  It depends
on the specific system and pattern of i/o activity.  There is no
universally optimum buffer size.

Dan Strick, aka dan@bellcore.com or bellcore!dan, (201)829-4624

henry@zoo.toronto.edu (Henry Spencer) (01/01/91)

In article <986@brchh104.bnr.ca> dan@breeze.bellcore.com (Daniel Strick) writes:
>The 63 kb limit for mag tape is actually arbitrary ...

Actually, no:  it came from tape controllers with 16-bit count registers,
which were still relatively common not long ago.  (The Xylogics 472 that a
lot of Sun 3s shipped with has a 16-bit count, for example.)

Agreed that it is silly to impose such limits on hardware that doesn't
need them, like properly-implemented SCSI controllers.

"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry

fischer@iesd.auc.dk (Lars P. Fischer) (01/09/91)

>>>>> On 31 Dec 90 17:28:40 GMT, dan@breeze.bellcore.com (Daniel Strick) said:

Daniel> There is no universally optimum buffer size.

True.  It's strange.  People keep telling me that I should use the
default, that all sorts of horrible things will happen otherwise, that the
default is faster anyway, etc.

On all the various platforms I've tried, I have found something faster,
and I've yet to see problems.  I often use 2000 for the block size, and it
tends to be *much* faster than the default (a factor of three or so).  It
worked in '85, it worked three days ago, and it probably still does.