[comp.arch] Cache lookahead

franka@mmintl.UUCP (Frank Adams) (03/17/88)

This isn't a new idea, but I've never seen it discussed in a RISC context,
and it seems to have some unique advantages there.

The idea is that in a cached machine, the software often knows what it is
going to be doing next.  If access to memory is not already your critical
path, you can have instructions which reference a memory location, either
code or data, but don't do anything with it.  Then, when the program gets
around to looking at it, it is already in cache.
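(To make this concrete, here is a minimal sketch in C of what such a
"touch" might look like; prefetch() and do_other_work() are invented
names standing in for whatever instruction and surrounding work the
machine and program actually have:)

    /* prefetch() is a hypothetical primitive: reference addr so it
     * lands in cache, but discard the value.  On a machine with no
     * such instruction, a dummy load serves the same purpose. */
    static void prefetch(const void *addr)
    {
        /* The volatile read forces the memory reference to occur
           even though the value is thrown away. */
        volatile const char *p = (volatile const char *)addr;
        (void)*p;
    }

    extern void do_other_work(void);  /* stands in for useful work */

    int use_later(int *table, int i)
    {
        prefetch(&table[i]);  /* start the cache fill now...        */
        do_other_work();      /* ...and overlap it with other work. */
        return table[i];      /* by now the line should be cached   */
    }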

The reason this seems particularly attractive in RISC is that you often have
unused instruction slots anyway, spent waiting out load or branch latency.  A
reference like this seems like a natural thing to fill such a slot with.

Of course, a sophisticated compiler is needed to take proper advantage of
this.  In the case of data, it may make more sense to just load it into a
register instead of merely "referencing" it, but this doesn't apply to code.
(And not even to data, sometimes.  If the access to the data is going to be
as an array reference in a loop, preloading the first few elements into
registers doesn't really help.)
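(In the loop case, the natural alternative is to touch a fixed distance
ahead of the current element on each iteration; a sketch using the
prefetch() primitive from above, with the distance of 8 elements purely
illustrative:)

    #define AHEAD 8  /* illustrative; would be tuned to memory latency */

    long sum(const long *a, int n)
    {
        long s = 0;
        int i;
        for (i = 0; i < n; i++) {
            if (i + AHEAD < n)
                prefetch(&a[i + AHEAD]);  /* touch a future element  */
            s += a[i];                    /* work on the current one */
        }
        return s;
    }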

Has anyone done anything like this?  Any idea what the costs are, and how
they compare to the benefits?
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

himel@mips.COM (Mark I. Himelstein) (03/20/88)

In article <2769@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>The idea is that in a cached machine, the software often knows what it is
>going to be doing next.  If access to memory is not already your critical
>path, you can have instructions which reference a memory location, either
>code or data, but don't do anything with it.  Then, when the program gets
>around to looking at it, it is already in cache.

The MIPS M500/M800/M1000 machines have physical caches fronting slow memory
(otherwise why have the caches :-). We have a maximum of 8K 32-bit words of
data cache and the same for instruction cache. Our cache lines are a single
word wide.

Some factors I'd be concerned with regarding this suggestion:
	- increasing the number of collisions within the process
		by adding memory accesses before their time.
	- increasing the inter-process cache collision rate when every
		switch entails changing more cache entries.
	- dirtying a cache page with respect to a memory page when
		you don't need to. If you reuse the memory page before
		you actually use it, the operating system will have to clean
		the cache.

(by collision I mean two memory references that use the same cache entry)

Wider cache lines could make the above factors worse. A very very good
compiler might be able to minimize the first factor but not the last two.
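(To make the collision concern concrete: in a direct-mapped cache, two
references collide exactly when their addresses agree in the index bits.
A sketch of the arithmetic, assuming an 8K-word, one-word-per-line cache
like the one described above:)

    /* Index of a byte address in a direct-mapped cache of
     * 8192 one-word (4-byte) lines: word address mod 8192. */
    #define CACHE_LINES 8192

    unsigned cache_index(unsigned long byte_addr)
    {
        return (unsigned)((byte_addr >> 2) % CACHE_LINES);
    }

    /* Touching B evicts A (and vice versa) whenever
     * cache_index(A) == cache_index(B), so prefetching early
     * widens the window in which such an eviction can hurt. */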

-- 
-Mark I. Himelstein DISCLAIMER: I speak only for myself.
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!himel,   DDD:408-720-1700, x251
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

himel@mips.COM (Mark I. Himelstein) (03/20/88)

In article <1902@gumby.mips.COM> himel@gumby.UUCP (Mark I. Himelstein) writes:
>Some factors I'd be concerned with regarding this suggestion:
>	- dirtying a cache page with respect to a memory page when
>		you don't need to. If you reuse the memory page before
>		you actually use it, the operating system will have to clean
>		the cache.

Sorry, too many 'you's; this should read:

	- dirtying a cache page with respect to a memory page when
		you don't need to. If the operating system reuses the 
		memory page before you actually use the referenced 
		location, the operating system will have to clean the 
		cache.
-- 
-Mark I. Himelstein DISCLAIMER: I speak only for myself.
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!himel,   DDD:408-720-1700, x251
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

aglew@ccvaxa.UUCP (03/21/88)

I have a paper from Purdue EE entitled "Compiler Driven Cache Policy"
which is basically about providing compiler control of cache.
Their policy is called SCP - Software Cache Policy.

Authors are Chi, C., and Dietz, H.G.
June 1987.

Chi talks about machines that have advisory bits, like "read this and
keep it in cache", but his algorithm generalizes to machines that use explicit
instructions. It is not general to all codes, although that is promised for
a later report.

---

Someone from the CEDAR group at the U of I gave a talk showing that, in
his research, software controlled caching was not a win.

The trouble is, this is not a disproof - all we need is one existence
proof.

hankd@pur-ee.UUCP (Hank Dietz) (03/23/88)

In article <28200127@ccvaxa>, aglew@ccvaxa.UUCP writes:
> I have a paper from Purdue EE entitled "Compiler Driven Cache Policy"
> which is basically about providing compiler control of cache.
> Their policy is called SCP - Software Cache Policy.
> Authors are Chi, C., and Dietz, H.G.
> June 1987.
...
> Someone from the CEDAR group at the U of I gave a talk showing that, in
> his research, software controlled caching was not a win.
> The trouble is, this is not a disproof - all we need is one existence
> proof.

Chi-Hung Chi is my PhD student at Purdue EE.  His thesis work is a complete
graph-based model for static (compile-time) memory management, especially
management of cache and registers.  His work differs from most other
software cache management schemes in that:

1. It is based on minimizing TOTAL REFERENCE TIME, NOT on MAXIMIZING
   cache HIT RATIO (this makes a very big difference!) and

2. It is complete cache management, rather than simple prefetch
   prediction.
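To see why point 1 differs from maximizing hit ratio, consider a toy
example (all cycle counts made up for illustration): an item referenced
exactly twice is better off bypassing the cache even though caching it
would score a hit.  In C:

    #include <stdio.h>

    #define HIT   1   /* cycles: cache hit (made-up numbers)  */
    #define WORD  6   /* cycles: uncached single-word fetch   */
    #define FILL 12   /* cycles: 4-word cache line fill       */

    int main(void)
    {
        /* An item referenced exactly twice:                    */
        int cached   = FILL + HIT;   /* miss+fill, then hit: 13 */
        int bypassed = WORD + WORD;  /* two word fetches:    12 */

        /* Hit ratio prefers caching (1 hit vs. 0); total
           reference time prefers the bypass (12 < 13).         */
        printf("cached = %d, bypassed = %d cycles\n", cached, bypassed);
        return 0;
    }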

An excellent register-allocation version of his technique appears in the
proceedings of HICSS 88:  "Register Allocation for GaAs-based Computer
Systems."  As for the cache management, we have many materials available on
this, but the following is the abstract of our latest paper "Improving Cache
Performance By Selective Cache Bypass" (sent to SUPERCOMPUTING 88):

	In traditional cache-based computers, all memory references are made
through cache.  However, a significant number of items which are referenced
in a program are referenced so infrequently that other cache traffic is
certain to "bump" these items from cache before they are referenced again.
In such cases, not only is there no benefit in placing the item in cache,
but there is the additional overhead of "bumping" some other item out of
cache to make room for this useless cache entry.  Where a cache line is
larger than a processor word, there is an additional penalty in loading the
entire line from memory into cache, whereas the reference could have been
satisfied with a single word fetch.  Simulations have shown that these
effects typically degrade cache-based system performance (average reference
time) by 10% to 30%.

	This performance loss is due to cache pollution; by simply forcing
"polluting" references to directly reference main memory -- bypassing the
cache -- much of this performance can be regained.  The technique proposed
in this paper involves the use of hardware which, under program control,
will determine whether each reference should be through the cache or
bypassing the cache and referencing main memory directly.  Several
inexpensive heuristics for the compiler to determine how to make each
reference are given.
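(A sketch of the flavor of such a heuristic, not the paper's actual
algorithm; the names and cost model are invented for illustration and
reuse the toy cycle counts HIT, WORD, and FILL from above:)

    struct ref {
        double expected_reuses;  /* compiler's static estimate of
                                    re-uses before eviction       */
    };

    /* Mark a reference for bypass when its expected re-use does
     * not pay for the cost of filling a cache line. */
    int should_bypass(const struct ref *r)
    {
        double cached   = FILL + r->expected_reuses * HIT;
        double bypassed = (1 + r->expected_reuses) * WORD;
        return cached > bypassed;
    }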
     _        
    | \   __     _
    |  | |  |   / |  Compiler-oriented          Hank Dietz &
 _/ |  | |  |  /  |  Architecture               Chi-Hung Chi
/   |--| |__| |__/   Researcher from
\_  |  | | \  |      Purdue
  \ |    |  \  \
                \

aglew@ccvaxa.UUCP (03/24/88)

If I had an address I would have sent email, but I'd like to thank
Hank Dietz for his note. It's good to see that some researchers
(1) read comp.arch, and (2) aren't so paranoid about publication
rules that they will post here.

I hope to see more of it.