[comp.sys.ibm.pc] What is the best way to determine cache size?

emb978@leah.Albany.Edu (Eric M. Boehm) (02/11/90)

I currently use a disk cache on my 386/25. I had previously set it to 1
MB. However, this is ridiculously large (so as to be counterproductive).
I downsized it to 384 KB. This is probably still much too large. I
received a suggestion that 64 KB is probably a better size (since that
is the size of hardware caches on most machines with hardware caches).
That doesn't seem like a real good reason to me.

So, I am looking for method(s) to determine what the cache size should
be. As I said, I have a 386/25 (with 4 megs of memory). The excess
memory is currently being used for a ramdisk. 

I believe this will be of interest to others and that many people will
have an opinion on this, so please send e-mail and I will summarize to
the net in a week (assuming I get any responses).

Thanks in advance.
-- 
Eric M. Boehm
EMB978@leah.Albany.EDU
EMB978@ALBNYVMS.BITNET

MICHELBI@oregon.uoregon.edu (Michel Biedermann) (02/12/90)

In article <2525@leah.Albany.Edu>, emb978@leah.Albany.Edu (Eric M. Boehm) writes:
> I currently use a disk cache on my 386/25. I had previously set it to 1
> MB. However, this is ridiculously large (so as to be counterproductive).
> I downsized it to 384 KB. This is probably still much too large. I
> received a suggestion that 64 KB is probably a better size (since that
> is the size of hardware caches on most machines with hardware caches).
> That doesn't seem like a real good reason to me.

If I read your message correctly, I think you are confusing two different
types of caches: a memory cache versus a disk cache.

Most 386 computers, in order to perform at 0 wait states without using very
fast RAM for all of main memory, require a small cache, usually around 64K,
of super-fast SRAM (not DRAM).  The main difference lies in the memory
access speed, which is approximately 80-90ns for DRAM versus 15-25ns for
SRAM.
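
(Rough arithmetic, assuming the usual two-clock 386 bus cycle: at 25 MHz one
clock is 40ns, so a zero-wait-state access has to complete in roughly 80ns
minus address and decode overhead.  80-90ns DRAM can't manage that without
wait states; 15-25ns SRAM can, which is why the small cache is built from
SRAM.)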

In the case of a disk cache, standard DRAM is used to reduce the bottleneck
between memory and the drive, not between the CPU and memory as described
above.  In this case, you can indeed have a disk cache as big as you can
afford the memory for.  There is a small penalty, mind you: with very big
disk caches (>1024K), if the information requested is not in the cache, the
computer may actually lose some time looking for it.  I have found a disk
cache of 512K to be a happy medium, although I have occasionally used one
much bigger (1024K) when I switch between programs a lot (DESQview or
OS/2).  Zenith's latest EISA controller has a 1MB cache built onto the
controller itself.  This allows an apparent disk access time of *3-6ms*
even though the drive itself is only rated at 16ms, which is standard for
ESDI drives.

I hope this helps without being too technical...

Please excuse me if I misinterpreted your question.

Michel Biedermann	michelbi@oregon.uoregon.edu
U. of Oregon
ZENITH Student Rep. 

kabra437@pallas.athenanet.com (Ken Abrams) (02/12/90)

In article <2525@leah.Albany.Edu> emb978@leah.Albany.Edu (Eric M. Boehm) writes:
>I currently use a disk cache on my 386/25. I had previously set it to 1
>MB. However, this is ridiculously large (so as to be counterproductive).
>I downsized it to 384 KB. This is probably still much too large. I

Just EXACTLY what evidence do you have (speculation aside) that a large
cache is "counterproductive"?  I am using a 2M cache with SMARTDRIVE under
MS-DOS 4.01 and see absolutely NO indication that it is in any way too
large (quite to the contrary, as a matter of fact).  I think the reason
that hardware caches are much smaller is based almost entirely on COST.

>I believe this will be of interest to others and that many people will
>have an opinion on this, so please send e-mail and I will summarize to
>the net in a week (assuming I get any responses).
>Eric M. Boehm
>EMB978@leah.Albany.EDU
>EMB978@ALBNYVMS.BITNET

I would send mail but my news node is still having trouble handling 
addresses in the @ format and that is all you supplied.  I get SO
tired of re-addressing bounced mail just to reformat the address.
Anyway, I think this subject is of MUCH greater general interest than
a lot of the machine-specific (or program-specific) stuff that generates
repeated news posts.

-- 
========================================================
Ken Abrams                     uunet!pallas!kabra437
Illinois Bell                  kabra437@athenanet.com
Springfield                    (voice) 217-753-7965

scotts@cpqhou.UUCP (Scott Shaffer) (02/12/90)

> In article <2525@leah.Albany.Edu>, emb978@leah.Albany.Edu (Eric M. Boehm) writes:
> > I currently use a disk cache on my 386/25. I had previously set it to 1
> > MB. However, this is ridiculously large (so as to be counterproductive).
> > I downsized it to 384 KB. This is probably still much too large. I
> > received a suggestion that 64 KB is probably a better size (since that
> > is the size of hardware caches on most machines with hardware caches).
> > That doesn't seem like a real good reason to me.
> 

Well, I think the answer lies in which disk cache you are using and what
algorithm it uses.  It is a fact that the larger the cache, the larger the
look-up table must be; this table is what the driver uses to determine
whether a request is already in a cache buffer or has to be read from the
disk.  If you have a large disk, and the algorithm caches, say, 1-8 sectors
at a time, then it is possible that a large cache will actually show a
performance degradation compared to a smaller one, because searching the
table takes a little time.  However, if the program uses an algorithm that
caches entire tracks (like Compaq's), then there is *NO* performance
degradation, even with caches of 16MB.  Thus, the answer should be to
a) figure out what algorithm the cache uses, and b) figure out how much
memory you are willing to give up for the cache.  I personally run a 4MB
cache because I have the memory and use it for nothing else (I also run a
2MB VDISK and 9MB EMM).
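
To make the table-search cost concrete, here is a minimal sketch in C of
the sector-granularity case (made-up code, not any particular driver's):
one entry per cached sector, searched linearly, so a bigger cache means a
longer walk through the table on every request.

#include <stddef.h>

/* One entry per cached sector; a 4MB cache of 512-byte sectors needs
   about 8192 of these. */
struct cache_entry {
    unsigned long  sector;    /* absolute sector number on disk       */
    unsigned char *data;      /* where that sector lives in the cache */
};

/* Linear search: cost grows directly with the number of cached sectors. */
unsigned char *lookup_sector(struct cache_entry *table, size_t entries,
                             unsigned long sector)
{
    size_t i;

    for (i = 0; i < entries; i++)
        if (table[i].sector == sector)
            return table[i].data;   /* cache hit                  */
    return NULL;                    /* cache miss: go to the disk */
}

A track-oriented table keys on whole tracks instead, so the same amount of
cache memory needs far fewer entries to search.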

The algorithm that caches whole tracks works on the assumption that most
disk accesses will be sequential (sector to sector), so caching the whole
track will give significantly better performance for a program that is
going to read multiple sectors from the same track (this is most common in
database applications and CAD work).  If, however, you access different
tracks, then you will have cache misses.  One person noted that this will
actually take more time (it will), but it will be negligible, because even
if the cache table is 20KB (for large caches only), the time it takes to
CMP AX,[cache] your way through that 20KB is VERY little on a 386/25.

Obviously, if you can add to the controller card's memory cache, that will
be the best solution (unfortunately, this probably isn't an option).


+==========================================================================+
| Scott Shaffer    |  Compaq Computer Corporation @ Houston TX             |
| Systems Engr     | (These opinions do not necessarily reflect those of my|
|		   |  employer, friends or any living person.)		   |
+==========================================================================+
"Well son, regret is a funny thing; it's better to regret something you
 have done, than to regret something you haven't done."

rick@NRC.COM (Rick Wagner) (02/14/90)

In article <537@cpqhou.UUCP> scotts@cpqhou.UUCP (Scott Shaffer) writes:
>> In article <2525@leah.Albany.Edu>, emb978@leah.Albany.Edu (Eric M. Boehm) writes:

(stuff deleted):

>
>The algorithm that caches whole tracks works on the assumption that most
>disk accesses will be sequential (sector to sector), so caching the whole
>track will give significantly better performance for a program that is
>going to read multiple sectors from the same track (this is most common in
>database applications and CAD work).  If, however, you access different
>tracks, then you will have cache misses.  One person noted that this will
>actually take more time (it will), but it will be negligible, because even
>if the cache table is 20KB (for large caches only), the time it takes to
>CMP AX,[cache] your way through that 20KB is VERY little on a 386/25.
>

If your cache software uses the track-buffering method, then you should
definitely consider using a disk optimizer.  This will ensure that your
data is laid out as sequentially as possible, using up fewer of your
buffers.  The cache program I wrote uses track buffers, and is happiest
with an optimized disk.

While it is true that in random-access applications (e.g. databases)
you may get a lot of misses on the data fields, the key file, which
is repeatedly accessed, as well as the DOS FAT table, will be in
cache.  Caching these blocks will "buy" you a lot, even if you can't
hold the entire database itself.

>Obviously, if you can add to the controller card's memory cache, that will
>be the best solution (unfortunately, this probably isn't an option).

This is a rather broad statement; I'm not sure I would completely
agree.  I have not looked at the caching hard disk controllers, so I
don't know the quality of their design.  But, given equal amounts of
memory to use for cache, I would guess a well-written cache utility
running on a fast (16+ MHz '386) would hold its own against the same
machine using a less efficient algorithm on the controller.  At least
in a DOS environment, the CPU would not be doing anything useful
while waiting for the controller to search its cache and transfer the
data.
	
	--rick
-- 
===============================================================================
Rick Wagner						Network Research Corp.
rick@nrc.com						2380 North Rose Ave.
(805) 485-2700	FAX: (805) 485-8204			Oxnard, CA 93030
Don't hate yourself in the morning; sleep til noon.

cs4g6ag@maccs.dcss.mcmaster.ca (Stephen M. Dunn) (02/27/90)

In article <537@cpqhou.UUCP> scotts@cpqhou.UUCP (Scott Shaffer) writes:
$Well, I think the answer lies in which disk cache you are using and what
$algorithm it uses.  It is a fact that the larger the cache, the larger the
$look-up table must be; this table is what the driver uses to determine
$whether a request is already in a cache buffer or has to be read from the
$disk.  If you have a large disk, and the algorithm caches, say, 1-8 sectors
$at a time, then it is possible that a large cache will actually show a
$performance degradation compared to a smaller one, because searching the
$table takes a little time.  However, if the program uses an algorithm that
$caches entire tracks (like Compaq's), then there is *NO* performance
$degradation, even with caches of 16MB.  Thus, the answer should be to
$a) figure out what algorithm the cache uses, and b) figure out how much
$memory you are willing to give up for the cache.  I personally run a 4MB
$cache because I have the memory and use it for nothing else (I also run a
$2MB VDISK and 9MB EMM).

   Yes, it's a fact that for a large cache, a larger look-up table must
be used.  However, a large look-up table does not necessarily mean long
look-up times, and neither does caching individual sectors.  The smart way
to implement a look-up table for a disk cache is through hashing, and a
well-designed hashing algorithm will show about the same look-up time on
average for a large table as for a small one.
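
   For what it's worth, here is a rough sketch of the hashing idea (my own
toy code, not PC-Cache's or anybody else's): the sector number is hashed
straight to a bucket, so an average lookup walks only a short chain no
matter how big the table as a whole gets.

#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 1024                /* power of two makes the hash cheap */

struct centry {
    unsigned long  sector;           /* absolute sector number      */
    unsigned char *data;             /* cached copy of that sector  */
    struct centry *next;             /* other entries in the bucket */
};

static struct centry *bucket[NBUCKETS];

/* Average cost is the average chain length (total entries / NBUCKETS),
   not the total number of cached sectors. */
unsigned char *cache_lookup(unsigned long sector)
{
    struct centry *e;

    for (e = bucket[sector & (NBUCKETS - 1)]; e != NULL; e = e->next)
        if (e->sector == sector)
            return e->data;          /* hit                         */
    return NULL;                     /* miss: read from the disk    */
}

void cache_insert(unsigned long sector, unsigned char *data)
{
    struct centry *e = malloc(sizeof *e);

    if (e == NULL)
        return;                      /* no memory: just don't cache */
    e->sector = sector;
    e->data   = data;
    e->next   = bucket[sector & (NBUCKETS - 1)];
    bucket[sector & (NBUCKETS - 1)] = e;
}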

$The algorithm that caches whole tracks works on the assumption that most
$disk accesses will be sequential (sector to sector), so caching the whole
$track will give significantly better performance for a program that is
$going to read multiple sectors from the same track (this is most common in
$database applications and CAD work).  If, however, you access different
$tracks, then you will have cache misses.  One person noted that this will
$actually take more time (it will), but it will be negligible, because even
$if the cache table is 20KB (for large caches only), the time it takes to
$CMP AX,[cache] your way through that 20KB is VERY little on a 386/25.

   Even caches which don't read in the entire track may well do look-ahead
reads ... PC-Cache, for example, will do "batch copies" if you want.  This
means that when you access one sector, it will read in up to the remainder
of the current track (not including sectors earlier than this one, I
believe), based on the assumption of primarily sequential access, yet it is
a sector-oriented cache.  IBMCACHE.SYS will also read multiple sectors if
you want it to, and I would imagine that at least some versions of PC-Kwik
will do this too, since PC-Cache is licensed from them.
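
   As a toy illustration of that kind of look-ahead (the geometry and the
helper below are made up for the example, not taken from PC-Cache or
PC-Kwik): on a miss, fetch from the requested sector through the end of
its track, and nothing earlier.

#include <stdio.h>

#define SECTORS_PER_TRACK 17   /* e.g. a classic MFM geometry */

/* Range a track-oriented look-ahead would fetch: the requested sector
   through the end of its track, but not the sectors before it. */
static void lookahead_range(unsigned long sector,
                            unsigned long *first, unsigned long *count)
{
    unsigned long track_start = sector - (sector % SECTORS_PER_TRACK);

    *first = sector;
    *count = SECTORS_PER_TRACK - (sector - track_start);
}

int main(void)
{
    unsigned long first, count;

    lookahead_range(40UL, &first, &count);
    printf("read %lu sector(s) starting at sector %lu\n", count, first);
    return 0;   /* prints: read 11 sector(s) starting at sector 40 */
}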

$Obviously, if you can add to the controller card's memory cache, that will
$be the best solution (unfortunately, this probably isn't an option).

   Some software (don't ask me what packages specifically) won't work
correctly with caching controllers or with memory-resident disk caches.
For such software, you're better off with a software cache, because you
can remove it at will (at worst, you'd have to reboot, and some caches can
be removed without rebooting).
-- 
Stephen M. Dunn                               cs4g6ag@maccs.dcss.mcmaster.ca
          <std_disclaimer.h> = "\nI'm only an undergraduate!!!\n";
****************************************************************************
               I Think I'm Going Bald - Caress of Steel, Rush