emb978@leah.Albany.Edu (Eric M. Boehm) (02/19/90)
Included below are the responses I received to my question "What is the best way to determine cache size". I was hoping for a procedure to arrive at some magical number, but as several people pointed out, it all depends on what applications you are using. Basically, you have to use trial and error.

A few misconceptions I had were cleared up. The hardware cache on motherboards is a memory cache, used to get 0 wait states without filling the machine with expensive static RAM or very fast (40ns) dynamic RAM. It decreases the bottleneck between the CPU and RAM, while a disk cache is meant to decrease the bottleneck between RAM and the hard disk. Comparing the two is probably comparing apples and oranges. I had thought a large cache would be counterproductive because you would need to reload the entire cache if the data wasn't currently in it. From the discussions below, that doesn't seem to be the case.

I tried some measurements after the first few responses. I am using PC-CACHE (from PC TOOLS 5.5) set to read 16 sectors ahead. With 384K, I was getting about 55% hits; at 512K, about 75% hits; at 1024K, about 77-85% hits.

I tried to set up QUICKCACHE II (available from simtel20). It hung during the installation process while trying to determine buffer size (I think the cache on the controller was throwing off its tests). I installed it manually, but I didn't like the program because it used 77K of memory from the 640K region, while PC-CACHE uses about 24K.

Finally, I had very good results by keeping the ramdisk large and moving entire applications or source code to it. Compilation is very fast. For the work I am doing, I don't think I would get that much more from a larger cache; in fact, the best size may lie between 512K and 1024K.

Thanks to all who responded. Their help and information were much appreciated.
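For what it's worth, the trial-and-error process can also be mimicked offline: replay a log of sector accesses through a simulated LRU cache at several sizes and compare hit rates. The sketch below uses an invented synthetic workload (the hot-set size and the 80/20 access mix are assumptions, not measurements from my system), so the printed percentages only illustrate the shape of the curve:

```python
import random
from collections import OrderedDict

def lru_hit_rate(trace, cache_sectors):
    """Replay a sector-access trace through an LRU cache of the given
    capacity and return the fraction of accesses served from the cache."""
    cache = OrderedDict()
    hits = 0
    for sector in trace:
        if sector in cache:
            hits += 1
            cache.move_to_end(sector)      # mark as most recently used
        else:
            cache[sector] = True
            if len(cache) > cache_sectors:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

# Synthetic workload: 80% of accesses fall on a "hot" set of 1500 sectors
# (~750K), 20% are scattered cold reads. These numbers are made up.
random.seed(1)
trace = [random.randrange(1500) if random.random() < 0.8
         else random.randrange(1500, 20000) for _ in range(20000)]

# 1K = two 512-byte sectors, so a cache of N kilobytes holds 2*N sectors.
rates = {kb: lru_hit_rate(trace, kb * 2) for kb in (384, 512, 1024)}
for kb in sorted(rates):
    print(f"{kb:5}K cache: {rates[kb]:.0%} hits")
```

Because LRU has the inclusion property (a bigger cache always holds everything a smaller one does), the hit rate can only go up with size; the interesting question, as the replies below make clear, is where it stops going up for *your* workload.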
------------------------------------------------------------------------------
From: Stephen Trier <trier@SCL.CWRU.Edu>

Cache size is very dependent on what applications you are using. For example, I use Mush for my e-mail, and Mush does a lot of backward searches and re-reads of a mail folder. This makes it a great application for a cache. My folders are from 10K to 144K in length, so I'd probably use a 128K cache, or, if I felt generous with the RAM, 256K. On the other hand, WordPerfect doesn't use the disk much, and when it does, it's mostly just for straight reads and writes of a file. Because of that, a cache won't help much.

There are two viewpoints on caching: One holds that a cache is like a special, high-speed RAMdisk that uses virtual memory to look huge. From that viewpoint, you should go to a 3 or 4 Meg cache. You might get very fast performance, since entire applications will be buffered in the cache. (I'd love to do some C compiles with a cache that size.) The other argument is that a cache is for repeated reads of one or two files, and should therefore be about as large as the largest file (or application) you will use. On my system, that would be about 256K.

The only sure test is to try it both ways. Since you have memory to spare, try a 512K cache. However, I would also try moving that cache up to 3.5 Megs or so and dropping that RAMdisk down proportionately. You might be very pleased with the performance, especially the *second* time you run an application or do a compile. ;-)

From: mcdonald@aries.scs.uiuc.edu (Doug McDonald)

Having both a ramdisk and a cache complicates matters enormously. You might be better off using the ramdisk for all your most common programs, etc. Then again, you might be better off with a huge cache, though these things DO have a maximum useful size. What you have to do is simply TRY all your normal activities with various choices. Also note - too many BUFFERS in your config.sys file is BAD if you use a cache.
You might get an improvement by going down to somewhere between 3 and 10. You probably need 2 buffers for each commonly used drive letter. I usually use c:, e:, and one at a time of d:, f:, g:, and h:, so mine works best at roughly 7. I personally use TeX a lot, and find that it works best with a disk cache of 640K, though 192K is much better than none at all. 192K is enough to be optimal for compiling things with Microsoft C. I do not use the rest of my 4 megs as a ramdisk, but rather to run genuine 80386 32-bit programs.

From: chao@cory.berkeley.edu (Chia-Chi Chao)

I use PC-CACHE from PC TOOLS, and it has a /measures option which displays the number of disk accesses and actual physical accesses, along with the percentage of accesses saved. I don't think there's a program which automatically determines the correct cache size for you. Also, if you use a disk cache, I think you can decrease the BUFFERS value to the minimum to save some memory space.

From: nghiem@emx.utexas.edu (Alex Nghiem)

I simply set the cache to the size of the application that I use frequently. Let's say I use Microsoft Works often, and the complete Microsoft Works takes up 185K on the hard disk. Setting the cache to 256K provides ample space to cache all of Microsoft Works and allows space for whatever swap files it may use.

From: MICHELBI@oregon.uoregon.edu (Michel Biedermann)

If I read your message correctly, I think you are confusing two different types of caches: memory versus disk cache. Most 386 computers, in order to perform at 0 wait states without using very fast RAM throughout, require a small cache, usually around 64K, of super-fast SRAM (not DRAM). The main difference lies in the memory access speed, which is approximately 80-90ns for DRAM versus 15-25ns for SRAM. In the case of a disk cache, standard DRAM is used to decrease the bottleneck between the memory and the drive, not the CPU and the memory as seen above.
In this case, you can indeed have a disk cache as big as you can afford. There is a small penalty, mind you: with very big disk caches (>1024K), if the information requested is not in the cache, the computer may actually lose some time looking for it. I have found a disk cache of 512K to be a happy medium, although I have occasionally used one much bigger (1024K) when I switch between programs a lot (DESQview or OS/2). Zenith's latest EISA controller has a 1MB cache built onto the drive controller. This allows an apparent disk access time of *3-6ms* even though the drive itself is only rated at 16ms, which is standard for ESDI drives.

From: kabra437@pallas.athenanet.com (Ken Abrams)

Just EXACTLY what evidence do you have (speculation aside) that a large cache is "counterproductive"? I am using a 2M cache with SMARTDRIVE under MS-DOS 4.01 and see absolutely NO indication that it is in any way too large (quite to the contrary, as a matter of fact). I think the reason that hardware caches are much smaller is based almost entirely on COST.

From: jdudeck@polyslo.CalPoly.EDU (John R. Dudeck)

The rule of thumb for disk cache size is: make the cache as big as you can without taking away from memory that your programs will use. If you want to try to do measurements, you probably will never get anything conclusive. The reason for this is the way in which a cache works. It holds the disk data that has been read, in the hopes that you will try to read the same data again, thus removing the need for a repeated disk access to that data. If you keep reading the same data over and over, the cache will always get a hit, and you will have fantastic speed improvements over no cache at all. If you read different data each time you read, there will be no improvement at all. It all depends on how much your work rereads the same data without intervening reads of other data. This can be determined more easily on the back of an envelope than by doing benchmark measurements!
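Dudeck's back-of-the-envelope point can be stated precisely: if the cache is big enough to hold everything read so far, the best possible hit rate is simply the fraction of accesses that are rereads. A small sketch (both traces are invented for illustration):

```python
def reread_fraction(trace):
    """Fraction of accesses that re-read a block seen earlier in the
    trace: an upper bound on the hit rate of any cache."""
    seen = set()
    hits = 0
    for block in trace:
        if block in seen:
            hits += 1
        seen.add(block)
    return hits / len(trace)

looping = [b % 50 for b in range(1000)]   # cycles over the same 50 blocks
streaming = list(range(1000))             # never touches a block twice

print(reread_fraction(looping))    # → 0.95: nearly every access is a reread
print(reread_fraction(streaming))  # → 0.0: no cache of any size can help
```

A workload's reread fraction is exactly the envelope calculation Dudeck suggests: if your work streams through fresh data, no cache size will save you; if it loops over the same files, almost any reasonable size will.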
From: scotts@cpqhou.UUCP (Scott Shaffer)

Well, I think the answer lies in what disk cache you are using and what algorithm it uses. The larger the cache, the larger the look-up table must be, since this table is what the driver uses to determine what is in the cache and whether to serve a request from a cache buffer or read it from the disk. If you have a large disk, and the algorithm caches, say, 1-8 sectors at a time, then it is possible that a large cache will actually show a performance degradation compared to a smaller one, because searching the table takes a little time. However, if the program uses an algorithm that caches entire tracks (like Compaq's), then there is *NO* performance degradation, even with caches of 16MB. Thus, the answer should be to a) figure out what algorithm the cache uses, and b) figure out how much memory you are willing to give up for the cache. I personally run 4MB of cache because I have the memory and use it for nothing else (I also run a 2MB VDISK and 9MB EMM). The algorithm that caches whole tracks rests on the assumption that most disk accesses will be sequential (sector to sector), so caching the whole track will result in significantly better performance for a program that is going to read multiple sectors from the same track (this is most common in database applications and CAD work). If, however, you access different tracks, then you will have cache misses. One person noted that this will actually take more time (it will), but it will be negligible: even if the cache table is 20KB (for large caches only), the time it takes to CMP AX,[cache] through that 20KB will be VERY little on 386/25 machines. Obviously, if you can add to the controller card's memory cache, that will be the best solution (unfortunately, this probably isn't an option).

From: jpn@genrad.com (John P. Nelson)

This is not necessarily true.
A 1 megabyte cache is about right for my development environment (which is a compile/edit/debug cycle). The rule of thumb is that you would like to fit all of your "working" data into the cache, including the programs you are running. (Of course, if you run a single program all day, you don't need to have that program in cache.)

>I received a suggestion that 64 KB is probably a better size (since that
>is the size of hardware caches on most machines with hardware caches).

You are thinking of a different type of cache. That is a MEMORY cache, not a disk cache. A memory cache is made up of very fast memory, and caches accesses to normal, slower memory. A disk cache generally needs to be larger, because the amount of data you access from disk is generally larger than the amount of memory being used by a program. This is particularly true since disk is cached with a 512-byte (sector) granularity, while memory caches generally work on lines of only 4 or 8 bytes.

It never hurts to use a larger cache, except that the memory is unavailable to programs for other uses. As I said, what you want to do is keep your entire working set of data (and programs) in cache memory. If your cache program has "clear cache" and "display cache usage" functions, you can clear the cache, perform your normal work operations, then check the cache usage. If the cache is 100% used, then you don't have an excess of cache memory allocated. Of course, you may not need to allocate quite as much memory as is implied by the above procedure, because the most frequently accessed data will be cached in preference to infrequently used data. In that case, you simply have to estimate how large your "frequently used" data is and allocate slightly more than that. It depends on your application! If you allocate 1 byte less than the "working set" of data that you use, your cache will tend to thrash.
As an example, take an application that frequently accesses a 64K file sequentially, on a system that has 63K of cache. The file is accessed from beginning to end: the first 63K of the file is stored in cache. Now the last 1K of the file is accessed, but there is no more cache available, so the cache driver discards the oldest (least recently used) data - the first 1K of the file - and loads the last 1K of the file into that block. Now watch what happens the next time the file is accessed. The first block of the file is not in cache, so the disk is accessed. Because there are no free cache blocks, the oldest block is discarded, which just happens to be the SECOND 1K of the file. The ENTIRE FILE will be fetched from disk in the same manner, every time. In other words, the cache is TOTALLY USELESS!

From: ddb@ns.network.com (David Dyer-Bennet)

I find 384K barely sufficient, 1 meg about OK. I'd use 2 meg, except I run some programs that make good use of the expanded memory for other things. Your comparison to memory cache sizes is not terribly relevant; for running DOS, a memory cache really only covers about 640K of memory, whereas your disk is probably 40 megabytes at least. Most caches I've used provide some way to report their hit ratio (what percentage of reads find the data in cache). If you can get this information from yours, monitor the hit ratio periodically while running with different cache sizes. The right size will depend a lot on exactly what your work mix is, so don't expect a "right" answer from the net to be just a number :-)

From: rick@NRC.COM (Rick Wagner)

If your cache software uses the track-buffering method, then you should definitely consider using a disk optimizer. This will ensure that sequential data stays together, using up fewer of your buffers. The cache program I wrote uses track buffers, and is happiest with an optimized disk. While it is true that in random-access applications (i.e. databases) you may get a lot of misses on the data fields, the key file, which is repeatedly accessed, as well as the DOS FAT, will be in cache. Caching these blocks will "buy" you a lot, even if you can't hold the entire database itself.

From: Stephen M. Dunn <cs4g6ag@maccs.dcss.mcmaster.ca>

How big your cache should be depends on what you're doing with your computer. For example, I have mine set up to use all 384K of expanded memory, and to read up to 16 sectors in advance. Working with my favourite editor (90K) and Turbo C's TCC 2.0 on C programs of 20-60K bytes of source code, I get around 55% cache hits. Working with the same editor and Clipper on an application with about 10 source files averaging 10-15K, I get about 70% cache hits. Using a relatively small database application, I get 85-90% cache hits (including loading the program - 200K). So, as you can see, it varies. So far today, I've loaded some comm software, switched to WP, gone back to the comm software, and shelled out a couple of times to do disk directories, and my hit rate is 66%.

What the sizes for your cache and ramdisk should be depends on what you're doing. If you're doing stuff that will all (or almost all) fit in the ramdisk at once, then you should have a really big ramdisk and a small cache. If you can't really use the ramdisk much, then it makes sense to have it only as big as you need and to have a much larger cache. My brother has a 1M cache in his machine and finds that for software development, it makes a huge difference to performance. The more frequently you perform disk writes, the more ramdisk and less cache you want. But if most of what you're doing is disk reads, and you're likely to be reading stuff that won't all fit into a ramdisk, you'd be better off increasing the size of the cache. That is the general picture - there are no hard and fast rules for this.
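Nelson's sequential-scan example above (a 64K file against a 63K cache) is easy to reproduce in a toy LRU simulation, treating the file as 64 one-kilobyte blocks:

```python
from collections import OrderedDict

def lru_pass_hits(n_blocks, capacity, passes):
    """Read blocks 0..n_blocks-1 sequentially `passes` times through an
    LRU cache holding `capacity` blocks; return the hit count per pass."""
    cache = OrderedDict()
    per_pass = []
    for _ in range(passes):
        hits = 0
        for block in range(n_blocks):
            if block in cache:
                hits += 1
                cache.move_to_end(block)   # mark as most recently used
            else:
                cache[block] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
        per_pass.append(hits)
    return per_pass

print(lru_pass_hits(64, 63, 3))  # → [0, 0, 0]: one block short, every pass misses
print(lru_pass_hits(64, 64, 3))  # → [0, 64, 64]: one block more, every re-read hits
```

Being one block short of the working set turns every subsequent pass into all misses, while one block more turns them into all hits - which is why the "working set plus a little slack" sizing advice in these replies matters so much.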
Just in case you don't know this, though, don't use CORETEST or something similar to try to determine your data transfer rate, since it reads a small number of tracks over and over again, and after the first time they'll all be in the cache. (Just out of interest, I tried this once on my machine: without the cache, it measured the data transfer rate at just over 200K/s; with the cache, it jumped to 5M/s.)

--
Eric M. Boehm
EMB978@leah.Albany.EDU
EMB978@ALBNYVMS.BITNET
cs4g6ag@maccs.dcss.mcmaster.ca (Stephen M. Dunn) (02/27/90)
In article <2559@leah.Albany.Edu> emb978@leah.Albany.Edu (Eric M. Boehm) writes:
$Included below are the responses I received to my question "What is the
$best way to determine cache size".
[...]
$There are two viewpoints on caching: One holds that a cache is like a
$special, high-speed RAMdisk that uses virtual memory to look huge. From
$that viewpoint, you should go to a 3 or 4 Meg cache. You might get very
$fast performance, since entire applications will be buffered onto the cache.
$(I'd love to do some C compilers with a cache that size.) The other argument
$is that a cache is for repeated reads of one or two files, and should
$therefore be about as large as the largest file (or application) you will use.
Well, the "ramdisk" view is pretty accurate _if_ you're only reading files
and not writing them, but once you start writing, a ramdisk will be so much
faster since it only involves a write to memory, while a disk cache requires
a write to the disk as well. Almost all caches are write-through designs, which
means that as soon as you write something, it goes onto disk immediately.
The only exception to this that I know of is Super PC-Kwik, which is a
write-back design that will perform background writes while you're doing
something else with your PC.
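The practical difference shows up in how many physical writes actually reach the disk. A minimal sketch - the write log is invented, and modelling write-back as a single flush at the end is a simplification (a real write-back cache like Super PC-Kwik flushes dirty sectors in the background):

```python
def physical_writes(write_log, write_back):
    """Count physical disk writes for a sequence of logical sector writes.
    Write-through: every logical write goes straight to disk.
    Write-back: repeated writes to a sector are coalesced; each distinct
    dirty sector is flushed to disk once."""
    if not write_back:
        return len(write_log)       # one disk write per logical write
    return len(set(write_log))      # one flush per distinct dirty sector

log = [5, 5, 9, 5, 9, 12, 5]  # e.g. an editor repeatedly rewriting a few sectors

print(physical_writes(log, write_back=False))  # → 7
print(physical_writes(log, write_back=True))   # → 3
```

The write-through design trades that extra disk traffic for safety: nothing is ever sitting only in RAM when the power goes off.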
The repeated reads argument also has some validity, especially as it
applies to the directory and file allocation information if you're using
several files and opening and closing them a fair bit.
$From: jdudeck@polyslo.CalPoly.EDU (John R. Dudeck)
$If you want to try to do measurements, you probably will never get anything
$conclusive. The reason for this is the way in which a cache works. It holds
$the disk data that has been read, in the hopes that you will try to read the
$same data again, thus removing the need for a repeated disk access to that
$data. If you keep reading the same data over and over, the cache will
$always get a hit, and you will have fantastic speed improvements over no
$cache at all. If you read different data each time you read, there will
$be no improvement at all. It all depends on how much your work rereads
$the same data without intervening reads of other data. This can be
$determined easier on the back of an envelope than by doing benchmark
$measurements!
So the point is that benchmark-type activity is meaningless, and I
agree. So what you do is set up a disk cache and then do whatever you
normally do with the machine, and have a look at the cache statistics
at the end of the day. Or, for smaller-scale timing, set up a cache,
compile and link a program of the size you typically work with, and see
how long it takes with no cache and with various cache sizes.
If you keep in mind how a cache works, you can effectively test to see
how much of a performance difference it makes ... it's only if you
try doing the same thing over and over again that your results get
screwy.
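That screwiness is easy to see in a toy model: give disk and cache accesses made-up costs, run the same "benchmark" twice, and watch the second (warm) run report cache speed rather than disk speed. All numbers here are illustrative assumptions, not measurements:

```python
DISK_MS, CACHE_MS = 20.0, 0.1   # assumed access times, purely illustrative

def run_ms(trace, cache):
    """Total time to service a trace of block reads, filling the cache
    (an unbounded set, for simplicity) as we go."""
    total = 0.0
    for block in trace:
        if block in cache:
            total += CACHE_MS
        else:
            total += DISK_MS
            cache.add(block)
    return total

bench = list(range(50))       # a benchmark that re-reads the same 50 tracks
cache = set()
cold = run_ms(bench, cache)   # first run: every access misses
warm = run_ms(bench, cache)   # second run: served entirely from cache

print(f"cold run: {cold:.0f} ms, warm run: {warm:.1f} ms")
```

Averaging over repeated runs would report something close to the warm figure, which measures the cache, not the disk - the same effect as the CORETEST numbers mentioned earlier in the thread.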
--
Stephen M. Dunn cs4g6ag@maccs.dcss.mcmaster.ca
<std_disclaimer.h> = "\nI'm only an undergraduate!!!\n";
****************************************************************************
I Think I'm Going Bald - Caress of Steel, Rush