joel@Solbourne.COM (Joel Boney) (10/26/90)
>> (richard oxbrow) writes:
>>
>> | [The blurb that i am currently looking at says that it has an
>> | 8k cache on board (6k instruction and 2k data).]
>>
>> Okay, someone enlighten me, why that mix of cache.

I missed the original mail, but I assume the question above pertained
to the KAP chip that is in the new S4000 SPARC desktop from Solbourne.
Below is my attempt to "enlighten" you as to why we picked these cache
sizes.

There are several basic reasons why this is a good choice of cache
sizes for this chip:

1. RISC chips make many more instruction accesses than they make data
   accesses. This is especially true for the SPARC architecture due to
   the large number of on-chip registers (register windows). We have
   long data traces of segments of the SPECmarks (including OS
   activity) that indicate the percentages for integer code look
   something like this:

        I-fetch    = 82%
        Data Read  = 12%
        Data Write =  6%

   Published data for CISC machines indicate their ratio to be as high
   as 50% I-fetch and 50% data accesses.

2. RISC machines need 1 or more new instructions per clock to keep the
   pipelines flowing.

3. The time it takes to load/store a cache line from/to memory can
   vary depending on the type of access and the cache line sizes. For
   the S4000 the I-cache line size is 32 bytes and the D-cache line
   size is 8 bytes. Therefore an instruction cache miss takes more
   time to refill the cache line than a data cache miss.

A simple example might be the best justification for our choices.
Given the above percentages of access types and assuming a cache hit
takes 1 clock, the average number of clocks per access is:

    avg_clocks = .82*((ihit*1)  + ((1.0-ihit) *instr_access_clocks))
               + .12*((drhit*1) + ((1.0-drhit)*data_read_access_clocks))
               + .06*((dwhit*1) + ((1.0-dwhit)*data_write_access_clocks))

where:

    ihit  = instruction cache hit rate
    drhit = data cache hit rate for data reads
    dwhit = data cache hit rate for data writes
    ...access_clocks = number of clocks to complete the access for a
                       cache miss

Now all we need to calculate the average clocks per access is the hit
rates and the time it takes to complete accesses that miss in the
caches. The following hit rates were actually measured on some long
traces. The miss access times are representative of a desktop system.
(NOTE: These are not the actual access times of the S4000, as I feel
this is proprietary information.) The 2K and 4K caches are 2-way set
associative and the 6K cache is 3-way set associative. The instruction
caches have 32-byte line sizes and the data caches have 8-byte line
sizes:

    ihit:   4K = .958       instr_access_clocks      = 22
            6K = .968

    drhit:  2K = .853       data_read_access_clocks  = 10
            4K = .888

    dwhit:  2K = .747       data_write_access_clocks = 8
            4K = .787

So now let's calculate the average number of clocks per access.

    4K-ICache/4K-DCache:
        avg_clocks = .82*(.958 + .042*22)
                   + .18*(.888 + .112*10)
                   + .06*(.787 + .213*8)
                   = 2.05 clocks

    6K-ICache/2K-DCache:
        avg_clocks = .82*(.968 + .032*22)
                   + .18*(.853 + .147*10)
                   + .06*(.747 + .253*8)
                   = 1.96 clocks

So for the case of the S4000 the 6K/2K split is better than a 4K/4K
split.

I suspect that if you were to do a similar analysis of a CISC machine
(e.g., 68040) you would find that the optimal cache split would be 4K
I-Cache/4K D-Cache due to the larger ratio of data accesses to
instruction accesses. The existence of second level caches might also
change the mix.
For example, the IBM RS6000 has no first level data cache but has a
fast second level combined cache.

Selecting the correct cache sizes for a chip, given silicon size
limitations and variable memory access times, is always a complex
problem composed of about 80% collecting and interpreting data and
about 20% (educated?) guessing.

DISCLAIMER: For those of you who make your living doing these things,
I apologize for simplifying the above example. I ignored several
secondary effects, such as what happens if a data cache write-back is
going on when an I-cache miss occurs, or whether a data write miss is
write-allocated or not, etc.

-- joel boney (Solbourne Computer)
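
The averaging formula in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the access mix, hit rates, and miss penalties are copied from the post, while the names (`avg_clocks`, `POSTED_MIX`, `CISC_LIKE_MIX`, and so on) are hypothetical. The 50/50 "CISC-like" mix assumes data accesses keep the same 2:1 read/write ratio and reuses the RISC-measured hit rates, neither of which the post states. Note that the worked arithmetic in the post weights the data-read term by .18 rather than the .12 given in the formula; the sketch follows the formula, so it yields about 1.93 and 1.82 clocks instead of the posted 2.05 and 1.96. The ordering of the two splits is the same either way.

```python
# Access mix and miss penalties as given in the post. The post describes
# the penalties as representative values, not the real S4000 timings.
POSTED_MIX = {"ifetch": 0.82, "dread": 0.12, "dwrite": 0.06}
MISS_CLOCKS = {"ifetch": 22, "dread": 10, "dwrite": 8}
HIT_CLOCKS = 1  # the post assumes a cache hit takes 1 clock

# Measured hit rates from the post, keyed by cache split.
HIT_RATES = {
    "4K-I/4K-D": {"ifetch": 0.958, "dread": 0.888, "dwrite": 0.787},
    "6K-I/2K-D": {"ifetch": 0.968, "dread": 0.853, "dwrite": 0.747},
}

# ASSUMPTION (not from the post): a 50% I-fetch mix with data accesses
# split 2:1 between reads and writes, as in the posted RISC mix.
CISC_LIKE_MIX = {"ifetch": 0.50, "dread": 0.50 * 2 / 3, "dwrite": 0.50 / 3}


def avg_clocks(mix, hit_rates):
    """Weighted average clocks per access, following the post's formula."""
    total = 0.0
    for kind, weight in mix.items():
        hit = hit_rates[kind]
        total += weight * (hit * HIT_CLOCKS + (1.0 - hit) * MISS_CLOCKS[kind])
    return total


for mix_name, mix in (("posted mix", POSTED_MIX), ("50/50 mix", CISC_LIKE_MIX)):
    for split, rates in HIT_RATES.items():
        print(f"{mix_name:10s} {split}: {avg_clocks(mix, rates):.2f} clocks")
```

Under the 50/50 mix the 4K/4K split comes out slightly ahead, which is consistent with the suspicion voiced in the post, though real CISC hit rates and miss penalties would differ from the ones reused here.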
gideony@microsoft.UUCP (Gideon YUVAL) (11/01/90)
In article <1990Oct26.151427.14084@Solbourne.COM> joel@Solbourne.COM (Joel Boney) writes:
> We have long data traces of segments of the SPECmarks (including
> OS activity) that indicate the percentages for integer code look
> something like this:
>
>       I-fetch    = 82%
>       Data Read  = 12%
>       Data Write =  6%

Do you have these numbers for the individual SPEC benchmarks? And are
these traces available to the public, by any chance?

-- Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax:206-883-8101;TWX:160520)