[comp.sys.sun] Optimization for caching controllers

lmb@vicom.com (Larry Blair) (05/20/89)

In a response to "Are fileservers a waste of money", I talked about using
a caching disk controller, and said:

> Be sure to do a tunefs -a 16 to get the full caching advantage.

I received a mail message from Glenn P. Davis <ames!unidata.UCAR.EDU!davis>

> Please explain this.

After sending him this reply, I thought that it might be of interest to
other sun-spots readers.

Using a non-caching disk controller and BSD's traditional one-sector-at-a-
time transfers, it is impossible to read successive physical blocks from a
disk without having to wait for the disk to rotate completely around for
each sector.  BSD automatically tries to compensate for this by laying the
sectors out in a hopscotch arrangement: the sectors used to write a file
are separated by a distance that is, one hopes, wide enough that when the
file is read back, each successive read is started before the next sector
needed has passed the head.
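
To make the rotational-delay argument concrete, here is a minimal sketch of
a toy timing model.  The geometry (32 sectors per track) and the per-request
overhead (2 sector-times) are assumptions picked for illustration, not
measurements from any real drive or controller, and read_time() is a
hypothetical helper.

```python
def read_time(positions, sectors_per_track, overhead):
    """Time, in sector-times, to read the sectors at `positions` in order.

    Toy model (all assumptions): the head starts at sector 0, one sector
    passes under the head per time unit, and the host needs `overhead`
    sector-times after each transfer before it can issue the next request.
    """
    t = 0
    for i, pos in enumerate(positions):
        if i > 0:
            t += overhead                      # host is busy; the disk keeps spinning
        wait = (pos - t) % sectors_per_track   # rotate until `pos` reaches the head
        t += wait + 1                          # rotational wait plus one transfer
    return t


SECTORS_PER_TRACK = 32   # hypothetical geometry
OVERHEAD = 2             # hypothetical per-request delay, in sector-times

contiguous = list(range(8))                          # sectors 0, 1, 2, ...
staggered = [i * (1 + OVERHEAD) for i in range(8)]   # sectors 0, 3, 6, ...

print(read_time(contiguous, SECTORS_PER_TRACK, OVERHEAD))  # 232
print(read_time(staggered, SECTORS_PER_TRACK, OVERHEAD))   # 22
```

In this model the contiguous layout costs nearly a full revolution per
sector, while the staggered layout never misses the next sector.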

With caching controllers, reading a particular sector will cause that
sector, along with the following sectors on the track, to be read into the
cache in one pass.  [Actually, it is a little bit more complex than that,
but the concept is sufficient for this discussion.]  If the file was
written using successive physical sectors, the likelihood of cache hits on
subsequent reads is pretty high.  Tunefs(8) allows you to adjust how the
sectors on the disk are utilized.
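
The effect of layout on a read-ahead cache can be sketched the same way.
This is a simplified model, not a description of how the Interphase or
Ciprico firmware actually works: it assumes that a miss loads the requested
sector plus the sectors that follow it, up to a hypothetical read-ahead
length, replacing whatever was cached before.

```python
def media_reads(positions, readahead):
    """Count how many requests have to go to the disk itself.

    Assumption: on a miss, the controller caches the requested sector and
    the following `readahead - 1` sectors, discarding the previous contents.
    """
    cached = range(0)   # empty cache
    misses = 0
    for pos in positions:
        if pos not in cached:
            misses += 1
            cached = range(pos, pos + readahead)
    return misses


READAHEAD = 16   # hypothetical read-ahead length, in sectors

print(media_reads(range(16), READAHEAD))        # contiguous file: 1
print(media_reads(range(0, 48, 3), READAHEAD))  # every third sector: 3
```

The same 16 sectors cost one media access when laid out contiguously and
three when spread out with gaps, which is why the layout matters once a
cache is in the path.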

Now for the voodoo.  Many people using Interphase or Ciprico controllers
simply do a "tunefs -d 0", which eliminates all compensation for rotational
delay.  After thinking about that, I decided to do a little testing.  SunOS
reads and writes are based on an 8K logical block size, which comes to 16
sectors of 512 bytes.  Of course, not all accesses are going to be for the
full 8K, but none are going to be larger.  I suspected that inserting a
rotational delay between each group of 16 sectors might be an improvement.

I spent most of a day with an Interphase 4200 and a CDC FSD-340 playing.
I found that I got better results for both reading and writing with "-a
16" than I did for "-d 0".  I don't have the exact results around anymore,
but I do remember that "-d 0" actually resulted in a slower write time
than the default delay.  Using "-a 16" resulted in a reduction from 25
seconds (for the default settings) to 9 seconds for my read test.

These tests were hardly definitive.  I suspect that along with "-a 16",
the rotational delay gap should be adjusted.  I didn't have enough time to
test it.


Larry Blair   ames!vsi1!lmb   lmb@vicom.com

lmb@vicom.com (Larry Blair) (06/08/89)

In Sun-spots v7n303, I wrote:

> Using a non-caching disk controller and BSD's traditional one sector at a
> time transfers, it is impossible to read successive physical blocks from a
> disk without having to wait for the disk to rotate completely around for
> each sector.  BSD automatically tries to compensate for this by utilizing the
> sectors in a hopscotch arrangement that separates the sectors used to
> write a file by a distance that is hopefully wide enough so that when the
> file is read back, each successive read will be started before the next
> sector needed has passed the head.
> 
> ............
>
> Now for the voodoo.  Many people using the Interphase or Ciprico simply do
> a "tunefs -d 0", which eliminates all compensation for rotational delay.
> After thinking about that, I decided to do a little testing.  SunOS reads
> and writes are based on an 8K logical block size, which comes to 16
> sectors.  Of course, not all accesses are going to be for the full 8K, but
> none are going to be larger.  I suspected that inserting a rotational
> delay between each group of 16 sectors might be an improvement.

Art Hays <lsr-vax!art@uunet.UU.NET> wrote the following to me:

> I thought with a logical block size of 8192 (16 512 byte sectors)
> only one disk transaction would be required to read all 16 sectors.
> In other words, the controller does an 8192-byte DMA to read the logical
> block.  If this is correct, aren't you using 'sector' above sometimes when
> you really mean 'logical block'?
> 
> The man page for 'tunefs' says 'maxcontig' is the number of
> contiguous blocks laid out before forcing a rotational delay.  IF by
> 'blocks' they mean a LOGICAL block, then '-a 16' would put a rotational
> delay after a string of 16 LOGICAL blocks.  For 8k logical block size,
> this would be after 16*8k or 128KBytes.  Since this is the size of the
> FIFO in the 753/7053, this would seem appropriate.

The term "block" is used inconsistently throughout the Sun documentation.
If you run icheck(8), you'll see from its output that block means "sector"
there.  Art is probably correct, however.  The Interphase 4200 also has a
128K cache, so the value I got by experimentation may be right for reasons
that I misattributed.  In fact, a look through the source for Interphase's
driver seems to indicate that the basic DMA transfer is 8K.
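
Under Art's reading, the layout that "-a" and "-d" ask for can be pictured
with a small sketch.  It is only a toy: the real allocator also deals with
cylinder groups and rotational positions, and here the gap is expressed in
whole block-sized slots as a simplification.  block_offsets() is a
hypothetical helper, not part of any Sun tool.

```python
def block_offsets(n_blocks, maxcontig, gap):
    """Disk position (in block-sized slots) of each logical block of a file.

    Assumption: `maxcontig` blocks are laid out back to back, then `gap`
    slots are skipped before the next run starts.
    """
    offsets = []
    pos = 0
    for i in range(n_blocks):
        if i > 0 and i % maxcontig == 0:
            pos += gap          # forced rotational delay between runs
        offsets.append(pos)
        pos += 1
    return offsets


print(block_offsets(6, 2, 1))   # [0, 1, 3, 4, 6, 7]
print(block_offsets(6, 2, 0))   # [0, 1, 2, 3, 4, 5]  (no delay at all)
```

With maxcontig set to 16 and 8K blocks, each run in this picture is 128K
long, and a gap appears only between runs.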

My experiment consisted of reading and writing files of 2MB.  I chose that
size to keep filesystem overhead (seeking to the inode, etc.) small relative
to the transfer, while staying in the same cylinder group.  If I ever get a
chance, I'll repeat the experiment with smaller file sizes.  If the "blocks"
in tunefs are 8K in size, the advantages of caching should diminish when
the files are below 128K and disappear when they go below 8K.
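
A quick way to see where those thresholds come from is to count blocks and
contiguous runs for a few file sizes.  This sketch assumes 8K blocks and
"-a 16", ignores fragments and indirect blocks, and uses a hypothetical
helper name.

```python
BLOCK_SIZE = 8 * 1024   # assumed logical block size, in bytes


def blocks_and_runs(file_size, maxcontig, block_size=BLOCK_SIZE):
    """Return (number of blocks, number of contiguous runs) for a file."""
    blocks = -(-file_size // block_size)   # ceiling division
    runs = -(-blocks // maxcontig)         # ceiling division
    return blocks, runs


for size in (2 * 1024 * 1024, 128 * 1024, 8 * 1024):
    print(size, blocks_and_runs(size, 16))
# 2097152 (256, 16)
# 131072 (16, 1)
# 8192 (1, 1)
```

A 2MB file spans 16 runs of 128K, a 128K file fits in a single run, and an
8K file is a single block with nothing following it to prefetch.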

I'm still not entirely sure why -a 16 worked faster than -d 0.  There
really shouldn't be any difference.  Since 128K is nearly 8 tracks, I
would expect the head-to-head switching time (which Interphase tells me
is significant) to overshadow any rotational delays inserted between 128K
reads.

Btw, the Ciprico has a 512K cache, so -a 256 would be the value to use
there.
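
For anyone deriving a maxcontig value from a cache size, the arithmetic can
be checked with a short sketch.  It assumes that one maxcontig unit is
either an 8K logical block or a 512-byte sector, since the documentation is
ambiguous on this point; the function name is hypothetical.

```python
def maxcontig_for_cache(cache_bytes, unit_bytes):
    """Largest number of units whose total size fits in the cache."""
    return cache_bytes // unit_bytes


K = 1024

for cache in (128 * K, 512 * K):
    print(cache // K,
          maxcontig_for_cache(cache, 8 * K),   # units are 8K blocks
          maxcontig_for_cache(cache, 512))     # units are 512-byte sectors
# 128 16 256
# 512 64 1024
```

By this arithmetic a 512K cache holds 64 blocks of 8K; a value of 256 would
correspond to 2K units, so the figure is worth rechecking against the
controller documentation before use.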

Anyone have any comments?  Anyone have any source?

 
Larry Blair   ames!vsi1!lmb   lmb@vicom.com