franka@mmintl.UUCP (Frank Adams) (03/17/88)
This isn't a new idea, but I've never seen it discussed in a RISC context,
and it seems to have some unique advantages there.

The idea is that in a cached machine, the software often knows what it is
going to be doing next.  If access to memory is not already your critical
path, you can have instructions which reference a memory location, either
code or data, but don't do anything with it.  Then, when the program gets
around to looking at it, it is already in cache.

The reason this seems particularly attractive in RISC is that you often
have unused instruction slots anyhow, waiting for latency.  This seems like
a natural thing to fill such a slot with.  Of course, a sophisticated
compiler is needed to take proper advantage of this.

In the case of data, it may make more sense to just load it into a register
instead of merely "referencing" it, but this doesn't apply to code.  (And
not even to data, sometimes.  If the access to the data is going to be as
an array reference in a loop, preloading the first few elements into
registers doesn't really help.)

Has anyone done anything like this?  Any idea what the costs are, and how
they compare to the benefits?
--
Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108
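To make the proposal concrete, here is a minimal sketch in modern C, using
GCC's __builtin_prefetch intrinsic to stand in for a "reference it, but do
nothing with it" instruction; the look-ahead distance AHEAD is an
illustrative tuning knob, not anything from the post:

    #include <stddef.h>

    /* Sum an array while touching data a few iterations ahead, so each
     * line is (with luck) already in cache by the time it is actually
     * used.  __builtin_prefetch is a GCC/Clang intrinsic; a machine
     * without one could issue a plain dummy load into an unused
     * register instead, spending one otherwise-idle issue slot. */
    long sum_with_prefetch(const long *a, size_t n)
    {
        const size_t AHEAD = 8;   /* illustrative look-ahead distance */
        long total = 0;

        for (size_t i = 0; i < n; i++) {
            if (i + AHEAD < n)
                __builtin_prefetch(&a[i + AHEAD], /* rw */ 0,
                                   /* locality */ 1);
            total += a[i];
        }
        return total;
    }

Note that this also illustrates the caveat above: for an array walked in a
loop, touching data a fixed distance ahead helps where preloading the first
few elements into registers would not.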
himel@mips.COM (Mark I. Himelstein) (03/20/88)
In article <2769@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>The idea is that in a cached machine, the software often knows what it is
>going to be doing next.  If access to memory is not already your critical
>path, you can have instructions which reference a memory location, either
>code or data, but don't do anything with it.  Then, when the program gets
>around to looking at it, it is already in cache.

The MIPS M500/M800/M1000 has physical caches fronting slow memory
(otherwise why have the caches :-).  We have a max of 8K 32-bit words of
data cache and the same for instruction cache.  Our caches are a single
line wide.

Some factors I'd be concerned with regarding this suggestion:

	- increasing the number of collisions within the process by
	  adding memory accesses before their time.

	- increasing the inter-process cache collision rate when every
	  switch entails changing more cache entries.

	- dirtying a cache page with respect to a memory page when you
	  don't need to.  If you reuse the memory page before you
	  actually use it, the operating system will have to clean the
	  cache.

(By collision I mean two memory references that use the same cache entry.)

Wider cache lines could make the above factors worse.  A very, very good
compiler might be able to minimize the first factor, but not the last two.
--
-Mark I. Himelstein	DISCLAIMER: I speak only for myself.
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!himel, DDD:408-720-1700, x251
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
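To make the collision concern concrete, here is a small sketch assuming a
direct-mapped cache of 8K one-word (32-bit) lines, matching the figures
above; the constants and the cache_index helper are illustrative, not a
description of the actual MIPS hardware:

    #include <stdint.h>
    #include <stdio.h>

    /* Direct-mapped cache with 8K (2^13) one-word (4-byte) lines.  Two
     * addresses collide when they map to the same line index, i.e. when
     * they differ only in bits above the index field.  A prefetch of a
     * colliding address evicts data the program still needs. */
    #define LINE_BITS   2            /* 4-byte lines */
    #define INDEX_BITS  13           /* 8K lines     */

    static unsigned cache_index(uint32_t addr)
    {
        return (addr >> LINE_BITS) & ((1u << INDEX_BITS) - 1);
    }

    int main(void)
    {
        uint32_t live     = 0x00010040;  /* data still in active use */
        uint32_t prefetch = live + (1u << (LINE_BITS + INDEX_BITS));
                                         /* 32KB away: same index    */

        if (cache_index(live) == cache_index(prefetch))
            printf("prefetch of %#x would evict the line holding %#x\n",
                   prefetch, live);
        return 0;
    }

Prefetching "before its time" widens the window in which such a collision
can bump a line the program was still counting on.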
himel@mips.COM (Mark I. Himelstein) (03/20/88)
In article <1902@gumby.mips.COM> himel@gumby.UUCP (Mark I. Himelstein) writes:
>Some factors I'd be concerned with regarding this suggestion:
>	- dirtying a cache page with respect to a memory page when
>	  you don't need to.  If you reuse the memory page before
>	  you actually use it, the operating system will have to clean
>	  the cache.

Sorry, too many 'you's; this should read:

	- dirtying a cache page with respect to a memory page when you
	  don't need to.  If the operating system reuses the memory page
	  before you actually use the referenced location, the operating
	  system will have to clean the cache.
--
-Mark I. Himelstein	DISCLAIMER: I speak only for myself.
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!himel, DDD:408-720-1700, x251
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
aglew@ccvaxa.UUCP (03/21/88)
I have a paper from Purdue EE entitled "Compiler Driven Cache Policy"
which is basically about providing compiler control of cache.  Their
policy is called SCP - Software Cache Policy.  Authors are Chi, C., and
Dietz, H.G., June 1987.

Chi talks about machines that have advisory bits, like "read this and
keep in cache", but his algorithm is general to machines that use explicit
instructions.  It is not general to all codes, although this is promised
for a later report.

---

Someone from the CEDAR group at the U of I gave a talk showing that, in
his research, software controlled caching was not a win.  The trouble is,
this is not a disproof - all we need is one existence proof.
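One way to picture the advisory-bit idea, hedged: the third ("locality")
argument of GCC's __builtin_prefetch acts much like a per-reference
"keep in cache" bit, while a dedicated prefetch opcode is the
explicit-instruction style.  Mapping this onto SCP is my reading, not the
paper's, and the hot/cold split below is an illustrative assumption:

    /* Advisory cache hints sketched with the GCC/Clang prefetch
     * intrinsic.  Locality 3 says "keep this in cache"; locality 0
     * says "touched once, no need to keep it". */
    void scan(const int *hot, const int *cold, int *out, int n)
    {
        for (int i = 0; i < n; i++) {
            if (i + 8 < n) {
                __builtin_prefetch(&hot[i + 8],  0, 3); /* reused: keep */
                __builtin_prefetch(&cold[i + 8], 0, 0); /* streamed once */
            }
            out[i] = hot[i] + cold[i];
        }
    }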
hankd@pur-ee.UUCP (Hank Dietz) (03/23/88)
In article <28200127@ccvaxa>, aglew@ccvaxa.UUCP writes:
> I have a paper from Purdue EE entitled "Compiler Driven Cache Policy"
> which is basically about providing compiler control of cache.
> Their policy is called SCP - Software Cache Policy.
> Authors are Chi, C., and Dietz, H.G.
> June 1987.
...
> Someone from the CEDAR group at the U of I gave a talk showing that, in
> his research, software controlled caching was not a win.
> The trouble is, this is not a disproof - all we need is one existence
> proof.

Chi-Hung Chi is my PhD student at Purdue EE.  His thesis work is a
complete graph-based model for static (compile-time) memory management,
especially management of cache and registers.  His work differs from most
other software cache management schemes in that:

1.  It is based on minimizing TOTAL REFERENCE TIME, NOT on MAXIMIZING
    cache HIT RATIO (this makes a very big difference!) and

2.  It is complete cache management, rather than simple prefetch
    prediction.

An excellent register-allocation version of his technique appears in the
proceedings of HICSS 88: "Register Allocation for GaAs-based Computer
Systems."  As for the cache management, we have many materials available
on this, but the following is the abstract of our latest paper "Improving
Cache Performance By Selective Cache Bypass" (sent to SUPERCOMPUTING 88):

    In traditional cache-based computers, all memory references are made
    through cache.  However, a significant number of items which are
    referenced in a program are referenced so infrequently that other
    cache traffic is certain to "bump" these items from cache before
    they are referenced again.  In such cases, not only is there no
    benefit in placing the item in cache, but there is the additional
    overhead of "bumping" some other item out of cache to make room for
    this useless cache entry.  Where a cache line is larger than a
    processor word, there is an additional penalty in loading the entire
    line from memory into cache, whereas the reference could have been
    satisfied with a single word fetch.  Simulations have shown that
    these effects typically degrade cache-based system performance
    (average reference time) by 10% to 30%.  This performance loss is
    due to cache pollution; by simply forcing "polluting" references to
    directly reference main memory -- bypassing the cache -- much of
    this performance can be regained.  The technique proposed in this
    paper involves the use of hardware which, under program control,
    will determine whether each reference should be through the cache or
    bypassing the cache and referencing main memory directly.  Several
    inexpensive heuristics for the compiler to determine how to make
    each reference are given.
--
Hank Dietz & Chi-Hung Chi		Compiler-oriented Architecture
					Researchers from Purdue
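For flavor, here is a toy version of what one such "inexpensive heuristic"
might look like: bypass the cache when a reference's estimated reuse
distance exceeds cache capacity.  This is a guess at the general shape,
not the algorithm from the paper; the names and constants are illustrative:

    #include <stdbool.h>

    /* Toy compile-time heuristic in the spirit of selective cache
     * bypass: if the estimated reuse distance of a reference (distinct
     * words the program touches before this datum is used again)
     * exceeds the cache capacity, the line is certain to be bumped
     * before it is reused, so caching it only pollutes -- reference
     * main memory directly instead. */

    #define CACHE_WORDS (8 * 1024)   /* e.g. the 8K-word caches above */

    bool should_bypass_cache(long reuse_distance_words)
    {
        if (reuse_distance_words < 0)     /* never referenced again */
            return true;
        return reuse_distance_words > CACHE_WORDS;
    }

A compiler could compute (or conservatively estimate) the reuse distance
per static reference and emit the bypassing form of the load for those
references the predicate flags.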
aglew@ccvaxa.UUCP (03/24/88)
If I had an address I would have sent email, but I'd like to thank Hank Dietz for his note. It's good to see that some researchers (1) read comp.arch, and (2) aren't so paranoid about publication rules that they will post here. I hope to see more of it.