[comp.arch] data-dependent cache misses & gc's

wilson@uicbert.eecs.uic.edu (Paul Wilson) (01/26/91)

mash@mips.COM (John Mashey) writes:
>In article <MOSS.91Jan21172107@ibis.cs.umass.edu> moss@cs.umass.edu writes:
>>MIPS systems have a tool that reorganizes code for better *cache* behavior.
>>The same idea would presumably work for paging (and TLB) stress reduction ...

>Yes,  although as it stands, the tool only does it for code, not data;
>for many programs, code impact on paging and TLB is much less than
>data impacts .... or in some cases, the code impact is horrible, but
>there's nothing in the world that can be done about it except to have
>huge caches, and even better, fast cache refill.  (Consider the kind of
>program that's >200MB of code, much of it in a giant single loop,
>leading to a high I-cache miss rate.)

One source of data-dependent cache misses is data allocation itself.  In
garbage-collected systems with a high rate of allocation (as is generally
true in Lisp, Smalltalk, and ML), you get LOTS of cache misses unless your
cache is big enough to hold the youngest generation of your garbage
collector.  (If you don't have a generational gc, it's pretty hopeless.)

So it turns out that cache performance for Lisp or Smalltalk is dominated
by the allocator's reference patterns.  The main "data dependence" is really
just the rate of data object creation.
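
To make that reference pattern concrete, here's a minimal sketch of the
bump-pointer allocation a generational collector typically does.  (The
names and sizes -- NURSERY_BYTES, alloc(), minor_collection() -- are mine,
purely for illustration; real collectors differ in the details.)  The
point is that the allocator marches linearly through the entire youngest
generation between collections, touching every cache block in it:

/* Minimal sketch of bump-pointer allocation in a generational gc.
 * NURSERY_BYTES, alloc(), and minor_collection() are invented names
 * for illustration, not any particular collector's interface.
 */
#include <stddef.h>

#define NURSERY_BYTES (512 * 1024)        /* youngest generation */

static char   nursery[NURSERY_BYTES];
static size_t bump = 0;                   /* next free byte */

static void minor_collection(void)
{
    /* A real collector would copy survivors to an older generation
     * here; for this sketch we just reset the allocation pointer. */
    bump = 0;
}

void *alloc(size_t nbytes)
{
    void *p;

    nbytes = (nbytes + 7) & ~(size_t)7;   /* keep 8-byte alignment */
    if (bump + nbytes > NURSERY_BYTES)
        minor_collection();
    p = &nursery[bump];   /* first touch of this block since the last  */
    bump += nbytes;       /* collection -- a miss unless the whole     */
    return p;             /* youngest generation fits in the cache     */
}

Every allocation touches memory that hasn't been referenced since the
previous collection, so the miss rate tracks the allocation rate more or
less independently of what the program actually computes.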

Interestingly, I've found that this impacts direct-mapped caches much more
severely than set-associative caches.  With set-associative caches you get
a very nice drop in miss rates once the cache is big enough to hold the
youngest generation, but with direct-mapped caches you don't.

This is because allocating lots of data (and not reclaiming and reusing
that memory almost immediately) violates the basic locality assumption
that direct-mapped caches are built on.  The assumption is that very few
locations are heavily referenced, and that those usually map to cache
blocks that no other heavily-referenced block maps to.  If instead you
have a lot of blocks that are each referenced on a medium time scale,
you lose big.  (This is because misses are a function of the MINIMUM of
the reference frequencies of conflicting blocks.  If two blocks map to
the same cache block, then the less-often-referenced one determines the
number of misses, because the other block can be touched any number of
times between references to the less-often-referenced one.)
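
If that's still mud, here's a toy simulation of the two-conflicting-blocks
case (the numbers and names are made up for illustration; this is just a
sketch, not anything out of the paper).  Block A is touched on every
reference and block B once every K references, and both map to the same
cache set.  A direct-mapped cache thrashes between them, so the miss count
tracks B's (lower) frequency; a 2-way set-associative cache keeps both and
misses only twice, cold:

/* Toy demonstration of the "minimum of conflicting reference
 * frequencies" effect.  All parameters are made up for illustration.
 */
#include <stdio.h>

#define REFS 100000
#define K    10                 /* block B is touched once per K refs */

int main(void)
{
    /* one cache set: direct-mapped holds 1 tag, 2-way holds 2 tags */
    int  dm_tag = -1;
    int  sa_tag[2] = { -1, -1 };
    long dm_miss = 0, sa_miss = 0;
    int  i, tag;

    for (i = 0; i < REFS; i++) {
        tag = (i % K == 0) ? 1 : 0;        /* 1 = block B, 0 = block A */

        /* direct-mapped: only one block can live in the set */
        if (dm_tag != tag) { dm_miss++; dm_tag = tag; }

        /* 2-way set-associative, LRU (slot 0 = most recently used) */
        if (sa_tag[0] != tag) {
            if (sa_tag[1] != tag)
                sa_miss++;                 /* in neither way: a miss */
            sa_tag[1] = sa_tag[0];         /* LRU replacement/update */
            sa_tag[0] = tag;
        }
    }
    printf("direct-mapped misses: %ld\n", dm_miss);
    printf("2-way (LRU) misses:   %ld\n", sa_miss);
    return 0;
}

The direct-mapped count should come out at roughly 2*REFS/K -- i.e. it's
set by the LESS frequently referenced block -- while the 2-way count is
just the two cold misses.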

This is probably as clear as mud, but I have yet another paper that
describes these principles, along with my experiments to back them up.
It has a cute picture that illustrates this graphically.  I can send a
hardcopy to anybody who asks.

Recently I noticed that Mark Hill's cache results for the SPEC XLisp
benchmark are consistent with this -- set-associative caches do much
better for Lisp than direct-mapped caches.

(There may be hacks around this involving split caches or weird striping 
of memory usage;  that's what I'm looking at now.)

   -- PRW

-- 
Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680