wilson@uicbert.eecs.uic.edu (Paul Wilson) (01/26/91)
mash@mips.COM (John Mashey) writes:
>In article <MOSS.91Jan21172107@ibis.cs.umass.edu> moss@cs.umass.edu writes:
>>MIPS systems have a tool that reorganizes code for better *cache* behavior.
>>The same idea would presumably work for paging (and TLB) stress reduction ...
>Yes, although as it stands, the tool only does it for code, not data;
>for many programs, code impact on paging and TLB is much less than
>data impacts .... or in some cases, the code impact is horrible, but
>there's nothing in the world that can be done about it except to have
>huge caches, and even better, fast cache refill. (Consider the kind of
>program that's >200MB of code, much of it in a giant single loop,
>leading to a high I-cache miss rate.)

One example of data-dependent cache misses is the misses caused by data
allocation.  In garbage-collected systems with a high rate of allocation
(as is generally true in Lisp, Smalltalk, and ML), you get LOTS of cache
misses unless your cache is big enough to hold the youngest generation of
your garbage collector.  (If you don't have a generational gc, it's pretty
hopeless.)  So it turns out that cache performance for Lisp or Smalltalk is
dominated by the allocator's reference patterns; the main "data dependence"
is really just the rate of data object creation.

Interestingly, I've found that this impacts direct-mapped caches much more
severely than set-associative caches.  With set-associative caches you get
a very nice drop in miss rates once the youngest generation fits in the
cache, but with direct-mapped caches you don't.  This is because allocating
lots of data (and not reclaiming most of it almost immediately) violates
the basic locality assumption that direct-mapped caches are based on: that
there will be very few heavily-referenced locations, and that they'll
usually map to cache blocks that no other heavily-referenced block maps to.
If you have a lot of blocks that are referenced at a medium time-scale,
then you lose big.  (This is because misses are a function of the MINIMUM
of the reference frequencies of conflicting blocks.  If two blocks map to
the same cache block, the less-often-referenced one determines the number
of misses, because the other block can be touched any number of times
between references to the less-often-referenced one.)

This is probably as clear as mud, but I have yet another paper that
describes these principles, with experiments to back them up.  It has a
cute picture that illustrates this graphically; I can send a hardcopy to
anybody who asks.  Recently I noticed that Mark Hill's cache results for
the SPEC XLisp benchmark are consistent with this -- set-associative
caches do much better for Lisp than direct-mapped caches.  (There may be
hacks around this involving split caches or weird striping of memory
usage; that's what I'm looking at now.)

   -- PRW

--
Paul R. Wilson                   Software Systems Laboratory
lab ph.: (312) 996-9216          U. of Illin. at C. EECS Dept. (M/C 154)
wilson@bert.eecs.uic.edu         Box 4348  Chicago, IL 60680