[comp.arch] 64 bit sparc chip sets

joel@Solbourne.COM (Joel Boney) (10/26/90)

>>
>> (richard oxbrow) writes:
>> 
>> | 	[The blurb that i am currently looking at says that it has an
>> | 	8k cache on board (6k instruction and 2k data).]
>> 
>>   Okay, someone enlighten me, why that mix of cache.

I missed the original mail, but I assume the question above pertained to
the KAP chip that is in the new S4000 SPARC desktop from Solbourne.
Below is my attempt to "enlighten" you as to why we picked these cache sizes.

There are several basic reasons why this is a good choice of cache
sizes for this chip:

  1. RISC chips make many more instruction accesses than data
     accesses. This is especially true for the SPARC architecture
     because of its large number of on-chip registers (register windows).
     We have long data traces of segments of the SPECmarks (including
     OS activity) that indicate the access mix for integer code looks
     something like this:

	I-fetch    = 82%
	Data Read  = 12%
	Data Write = 6%

     Published data for CISC machines show data accesses making up as
     much as 50% of all accesses, i.e., 50% I-fetch and 50% data.

  2. RISC machines need 1 or more new instructions per clock to keep
     the pipelines flowing.

  3. The time it takes to load/store a cache line from/to memory can
     vary depending on the type of access and the cache line sizes.
     For the S4000 the I-cache line size is 32 bytes and the D-cache
     line size is 8 bytes. Therefore an instruction cache miss takes
     more time to refill the cache line than a data cache miss. 
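Point 3 can be made concrete with a simple refill-time model. The Python sketch below is hypothetical: the fixed first-access latency, the 4-byte bus width, and the 2 clocks per bus transfer (LATENCY_CLOCKS, BUS_BYTES, CLOCKS_PER_TRANSFER) are assumptions chosen for illustration, not S4000 timings. With those values a 32-byte line takes 22 clocks to refill and an 8-byte line takes 10, which happens to match the representative miss costs used in the example later in this post; that is a property of the chosen parameters, not how those figures were obtained.

```python
# Hypothetical line-refill model (NOT the S4000's actual memory system):
# a fixed latency before the first word, then the line streamed over the bus.
LATENCY_CLOCKS = 6       # assumed clocks before the first word arrives
BUS_BYTES = 4            # assumed 32-bit memory bus
CLOCKS_PER_TRANSFER = 2  # assumed clocks per bus transfer


def refill_clocks(line_bytes: int) -> int:
    """Clocks to refill one cache line under the assumed model."""
    transfers = -(-line_bytes // BUS_BYTES)  # ceiling division
    return LATENCY_CLOCKS + transfers * CLOCKS_PER_TRANSFER


icache_refill = refill_clocks(32)  # 32-byte I-cache line
dcache_refill = refill_clocks(8)   # 8-byte D-cache line
print(icache_refill, dcache_refill)  # 22 10
```

The longer line pays for more bus transfers on every miss, which is why an I-cache miss costs more than a D-cache miss even though both pay the same fixed latency.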
  
A simple example might be the best justification for our choices:

Given the above mix of access types, and assuming a cache hit
takes 1 clock, the average number of clocks per access is:

  avg_clocks = .82*((ihit*1)  + ((1.0-ihit) *instr_access_clocks)) +
               .12*((drhit*1) + ((1.0-drhit)*data_read_access_clocks)) +
               .06*((dwhit*1) + ((1.0-dwhit)*data_write_access_clocks)) 

          where:
	    ihit  = instruction cache hit rate
	    drhit = data cache hit rate for data reads
	    dwhit = data cache hit rate for data writes
	    ...access_clocks = number of clocks to complete the access 
                               for a cache miss
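The avg_clocks formula translates directly into a function. This Python sketch is an illustration, not Solbourne code; the mix parameter (defaulting to the 82/12/6 integer-code mix) and the check that the three fractions sum to 1 are additions of this sketch.

```python
def avg_clocks(ihit: float, drhit: float, dwhit: float,
               instr_access_clocks: float,
               data_read_access_clocks: float,
               data_write_access_clocks: float,
               mix: tuple = (0.82, 0.12, 0.06)) -> float:
    """Average clocks per access; a hit costs 1 clock, a miss the given clocks."""
    f_ifetch, f_dread, f_dwrite = mix
    # The three fractions must cover every access exactly once.
    if abs(f_ifetch + f_dread + f_dwrite - 1.0) > 1e-9:
        raise ValueError("access mix must sum to 1")
    return (f_ifetch * (ihit * 1 + (1.0 - ihit) * instr_access_clocks) +
            f_dread * (drhit * 1 + (1.0 - drhit) * data_read_access_clocks) +
            f_dwrite * (dwhit * 1 + (1.0 - dwhit) * data_write_access_clocks))


# Sanity check: if every access hits, the average is exactly 1 clock.
print(round(avg_clocks(1.0, 1.0, 1.0, 22, 10, 8), 6))  # 1.0
```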


  Now all we need to calculate the average clocks per access are the
  hit rates and the time it takes to complete accesses that miss in
  the caches. The following hit rates were actually measured on some
  long traces. The miss access times are representative of a desktop system.
  (NOTE: These are not the actual access times of the S4000, as I consider
  those proprietary information.) The 2K and 4K caches are 2-way set
  associative and the 6K cache is 3-way set associative. The
  instruction caches have 32-byte line sizes and the data caches have
  8-byte line sizes:

    ihit:   4K = .958    instr_access_clocks      = 22
            6K = .968

    drhit:  2K = .853    data_read_access_clocks  = 10
            4K = .888

    dwhit:  2K = .747    data_write_access_clocks = 8
            4K = .787

  So now let's calculate the average number of clocks per access.

  4K-ICache/4K-DCache:

    avg_clocks = .82*(.958 + .042*22) +
                 .12*(.888 + .112*10) +
                 .06*(.787 + .213*8)  =  1.93 clocks

  6K-ICache/2K-DCache:

    avg_clocks = .82*(.968 + .032*22) +
                 .12*(.853 + .147*10) +
                 .06*(.747 + .253*8)  =  1.82 clocks
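The two hand calculations can be checked with a short Python sketch. It uses the hit rates listed above, the representative miss costs of 22, 10, and 8 clocks, and the .82/.12/.06 access mix from the avg_clocks formula, so the three weights sum to 1. The names MIX, MISS_CLOCKS, and CONFIGS are illustrative.

```python
# Access mix and representative (not actual S4000) miss costs from the post.
MIX = {"ifetch": 0.82, "dread": 0.12, "dwrite": 0.06}
MISS_CLOCKS = {"ifetch": 22, "dread": 10, "dwrite": 8}

# Measured hit rates for the two candidate splits.
CONFIGS = {
    "4K-ICache/4K-DCache": {"ifetch": 0.958, "dread": 0.888, "dwrite": 0.787},
    "6K-ICache/2K-DCache": {"ifetch": 0.968, "dread": 0.853, "dwrite": 0.747},
}


def avg_clocks(hit_rates: dict) -> float:
    """Weighted clocks per access; a hit costs 1 clock."""
    return sum(MIX[k] * (hit_rates[k] + (1.0 - hit_rates[k]) * MISS_CLOCKS[k])
               for k in MIX)


for name, hits in CONFIGS.items():
    print(f"{name}: {avg_clocks(hits):.2f} clocks")
# 4K-ICache/4K-DCache: 1.93 clocks
# 6K-ICache/2K-DCache: 1.82 clocks
```

With these weights the 6K/2K split comes out about 0.11 clocks per access ahead of 4K/4K.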

So for the case of the S4000 the 6K/2K split is better than a 4K/4K
split. I suspect that if you were to do a similar analysis of a CISC 
machine (e.g., 68040) you would find that the optimal cache split would 
be 4K I-Cache/4K D-Cache due to the larger ratio of data accesses to
instruction accesses. The existence of second-level caches might also
change the mix. For example, the IBM RS6000 has no first-level data cache
but has a fast second-level combined cache.
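The prediction about CISC machines can be probed, very roughly, by varying the I-fetch share in the same formula. This Python sketch holds the SPARC-measured hit rates and the representative miss costs fixed and, as an assumption, keeps data accesses at the 2:1 read:write ratio of the trace (12% : 6%). Real CISC hit rates would differ, so this only shows how the formula responds to the access mix, not what a 68040 would actually see. The helper names (stream_cost, config_cost, split_4k_4k, split_6k_2k) are hypothetical.

```python
def stream_cost(hit: float, miss_clocks: float) -> float:
    """Clocks per access for one stream: hit*1 + (1 - hit)*miss_clocks."""
    return hit + (1.0 - hit) * miss_clocks


def config_cost(ihit, drhit, dwhit, ifetch_share, read_fraction=2 / 3):
    """Average clocks per access for a given I-fetch share.

    ASSUMPTION: data accesses keep the 2:1 read:write ratio of the SPARC
    trace, and the SPARC-measured hit rates still apply.
    """
    data_share = 1.0 - ifetch_share
    return (ifetch_share * stream_cost(ihit, 22) +
            data_share * read_fraction * stream_cost(drhit, 10) +
            data_share * (1.0 - read_fraction) * stream_cost(dwhit, 8))


def split_4k_4k(f):
    return config_cost(0.958, 0.888, 0.787, f)


def split_6k_2k(f):
    return config_cost(0.968, 0.853, 0.747, f)


# Both costs are linear in the I-fetch share, so the crossover has a closed form.
d0 = split_6k_2k(0.0) - split_4k_4k(0.0)  # difference with no I-fetches
d1 = split_6k_2k(1.0) - split_4k_4k(1.0)  # difference with only I-fetches
break_even = d0 / (d0 - d1)

for f in (0.50, 0.82):
    better = "6K/2K" if split_6k_2k(f) < split_4k_4k(f) else "4K/4K"
    print(f"I-fetch share {f:.0%}: {better} is better")
print(f"break-even I-fetch share ~ {break_even:.2f}")
# I-fetch share 50%: 4K/4K is better
# I-fetch share 82%: 6K/2K is better
# break-even I-fetch share ~ 0.59
```

Under these assumptions the two splits break even at an I-fetch share of about 59%: above it the 6K/2K split wins, below it (e.g., a 50/50 mix) the 4K/4K split does.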

Selecting the correct cache sizes for a chip, given silicon size 
limitations and variable memory access times, is always a complex 
problem composed of about 80% collecting and interpreting data and 
about 20% (educated?) guessing.

DISCLAIMER: For those of you who make your living doing these things,
I apologize for simplifying the above example. I ignored several
secondary effects, such as what happens if a data cache write-back
is in progress when an I-cache miss occurs, or whether a data write
miss is write-allocated or not, etc.

	-- joel boney (Solbourne Computer)
		

gideony@microsoft.UUCP (Gideon YUVAL) (11/01/90)

In article <1990Oct26.151427.14084@Solbourne.COM> joel@Solbourne.COM (Joel Boney) writes:
>     We have long data traces of segments of the SPECmarks (including
>     OS activity) that indicate the percentages for integer code looks
>     something like this:
>
>	I-fetch    = 82%
>	Data Read  = 12%
>	Data Write = 6%

Do you have these numbers for the individual SPEC benchmarks?
And are these traces available to the public, by any chance?

-- 
Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax:206-883-8101;TWX:160520)