chuck@dartvax.UUCP (Chuck Simmons) (11/15/84)
I don't know much about architecture, so try not to be too harsh with me
as I parade my ignorance...

All the cacheing schemes I have ever heard of seem to work on a demand
paged basis.  When a piece of information is asked for, it is sucked into
the cache, replacing the least recently used piece of data in the cache.
(Here I am not making any distinctions between a cache used to speed up
main memory and a cache used to speed up accesses to a disk.)  One
problem with this approach is that when the processor realizes that a
desired piece of data is not in cache, then the processor must wait until
the data can be read into cache.

Now, it seems to me that human brains don't work this way.  Rather, human
brains are more associative: within the brain, each piece of information
points to other pieces of information that are often used in association
with it.  Thus, while part of the brain is cogitating on one hunk of data,
other parts of the brain can start swapping in pieces of data that will
soon be needed.

Now suppose we had a cache that was much more under a programmer's
control.  To be concrete, suppose we have a cache of say, 32 elements
each containing 32 words.  And suppose our processor has a load cache
instruction with syntax:

    load cache <cache address> <memory address>

When this instruction was executed, the instruction execution processor
would quickly tell the cache processor to pick up some data from memory.
Now, while the cache processor is picking up the data, the instruction
processor continues executing instructions which are already in cache.

Using this instruction, I could imagine a very careful programmer helping
the system to obtain cache hit rates of at least 90%.  For example, at
the beginning of each 32 word chunk of code, the programmer might load
the next 32 word chunk of code into cache, a chunk of code that might be
branched to, and a few memory locations that would soon be accessed.
(A concrete sketch follows at the end of this post.)

So maybe someone out there can tell me about interesting cache designs,
or tell me why a cache such as I have described wouldn't work (the cache
processor conflicts with the instruction processor?  a compiler could
never use the load cache instruction effectively?  no programmer in her
right mind would want to bother with the instruction?).  You bring the
flames, I'll bring the marshmallows.

dartvax!chuck
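To make the proposal concrete, here is a sketch in C of how a careful
programmer might drive such a cache.  The load_cache() primitive is
hypothetical; it stands for the proposed instruction and is assumed to
return at once while the cache processor fetches a 32-word line in the
background.

    #define LINE_WORDS  32
    #define CACHE_LINES 32

    /* Hypothetical primitive standing in for the proposed
     * "load cache <cache address> <memory address>" instruction. */
    extern void load_cache(int cache_line, const int *mem_addr);

    /* Sum a big array, starting the fetch of each line while the
     * instruction processor is still summing the previous one. */
    long sum(const int *a, int n_lines)
    {
        long total = 0;
        int line, i;

        load_cache(0, a);                      /* prime the first line   */
        for (line = 0; line < n_lines; line++) {
            if (line + 1 < n_lines)            /* overlap the next fetch */
                load_cache((line + 1) % CACHE_LINES,
                           a + (line + 1) * LINE_WORDS);
            for (i = 0; i < LINE_WORDS; i++)   /* data already in cache  */
                total += a[line * LINE_WORDS + i];
        }
        return total;
    }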
kissell@flairvax.UUCP (Kevin Kissell) (11/17/84)
Well, the 68020 has cache-control instructions of a sort: you can freeze
the on-chip cache.  That's probably worthwhile, because that cache is a
little small.  In general, though, I believe a cache big enough to get a
90+% hit ratio without intervention is preferable to dedicated cacheing
instructions.
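For a rough sense of why the hit ratio is what matters, here is a sketch
in C of the standard effective-access-time arithmetic; the access times
are round numbers assumed purely for illustration, not figures for any
real machine.

    #include <stdio.h>

    /* effective access time = h * t_cache + (1 - h) * t_memory */
    int main(void)
    {
        double t_cache = 100.0;    /* ns, assumed cache access time  */
        double t_mem   = 600.0;    /* ns, assumed main memory access */
        double h;

        for (h = 0.70; h <= 0.99; h += 0.05)
            printf("hit ratio %.2f -> %.0f ns effective\n",
                   h, h * t_cache + (1.0 - h) * t_mem);
        return 0;
    }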
Kevin D. Kissell
Fairchild Advanced Processor Development
uucp: {ihnp4 decvax}!decwrl!\
                             >flairvax!kissell
    {ucbvax sdcrdcf}!hplabs!/
abc@brl-tgr.ARPA (Brint Cooper) (11/20/84)
> All the cacheing schemes I have ever heard of seem to work on a demand
> paged basis.  When a piece of information is asked for, it is sucked
> into the cache, replacing the least recently used piece of data in the
> cache.

Not exactly.  There is usually an entire "block" of data in the cache.

> One problem with this approach is that when the processor realizes
> that a desired piece of data is not in cache, then the processor must
> wait until the data can be read into cache.

But only a few microseconds.  The time is so short, the OS isn't even
aware of it.
...
> Now suppose we had a cache that was much more under a programmer's
> control.  To be concrete, suppose we have a cache of say, 32 elements
> each containing 32 words.  And suppose our processor has a load cache
> instruction with syntax:
>
>     load cache <cache address> <memory address>
>
> When this instruction was executed, the instruction execution processor
> would quickly tell the cache processor to pick up some data from memory.
> Now, while the cache processor is picking up the data, the instruction
> processor continues executing instructions which are already in cache.
>
> Using this instruction, I could imagine a very careful programmer
> helping the system to obtain cache hit rates of at least 90%.
>
> So maybe someone out there can tell me about interesting cache designs,
> or tell me why a cache such as I have described wouldn't work...

It probably would work, but it isn't needed.  Further, this is the sort
of drudgery from which programmers must be liberated.

Cache memory is based upon the "principle of locality," which says,
roughly, that the next memory location to be referenced by the program
is probably quite close to the last one referenced.  Therefore, with a
"reasonable" cache size and a fast cache, high "hit ratios" should be
possible.  (A small illustration follows below.)  The exact numbers, of
course, depend upon the program being run and the design parameters.

Brint
(301) 278-6883    AV: 283-6883    FTS: 939-6883
ArpaNet:  abc@brl
UUCP:     ...!{decvax,cbosgd}!brl-bmd!abc
Postal:   Dr Brinton Cooper
          U.S. Army Ballistic Research Laboratory
          Attn: AMXBR-SECAD (Cooper)
          Aberdeen Proving Ground, Md  21005
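As a small illustration of that principle, here is a sketch in C (the
array size is an arbitrary assumption): both routines compute the same
sum, but the first walks memory sequentially, so nearly every reference
hits a block already in the cache, while the second strides across the
array and loses that advantage.

    #define N 512

    static int m[N][N];

    long sum_row_major(void)        /* sequential: cache-friendly      */
    {
        long s = 0;
        int i, j;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                s += m[i][j];       /* next element is in the same     */
        return s;                   /* block as the last, most times   */
    }

    long sum_column_major(void)     /* strided: defeats locality       */
    {
        long s = 0;
        int i, j;
        for (j = 0; j < N; j++)
            for (i = 0; i < N; i++)
                s += m[i][j];       /* each reference lands in a       */
        return s;                   /* different block                 */
    }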
bcase@uiucdcs.UUCP (11/22/84)
[bug lunch]

Your software controlled cache is a very good idea.  It is so good that
it is being implemented by smart compilers everywhere.  If you imagine
that the register file of a machine is a cache that must be managed by
software, and that at any time the things in the register file also have
home memory locations associated with them, then the code generator
(compiler or human being, but the most interesting case is the compiler)
or optimizer (as you wish) does the cache management by emitting the
proper load and store instructions.  Thus, with a really smart, globally
optimizing compiler, a large register file starts to perform like
(better than?) a good (great?) data cache.

This is one interpretation of your idea; another is honest-to-goodness
cache hardware controlled by software.  The IBM 801 had some such
instructions, but I am not qualified to comment.

My point is that your idea is a good one, but it is not really new.
(Sigh, there are no new ideas, only old ones.  Many times I thought I
had thought of something new, only to read about it in some paper
somewhere.)  One good place to start reading about this idea is the
paper by Sites, "How to Use 1000 Registers."  Although his ideas sound
a little like the RISC register file, the essence is there.  Then, if
you are into compilers, read about register allocation by coloring and
spilling (a paper by some guys at IBM; I can get the reference for you
if you are interested) and about register allocation by priority-based
coloring (by some Stanford guys, Chow and Hennessy).

I don't know if anyone has done any work with software controlled
instruction caches, but it should be looked into (although it sounds a
little scary!).
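Here is a sketch in C of that register-file-as-cache view (the counter
routine is invented for the example): the second version is exactly the
transformation a register allocator performs, keeping the value in a
register for the duration of the loop and writing it back to its home
memory location once.

    void count_naive(int *counter, const int *data, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            if (data[i] > 0)
                (*counter)++;        /* a load and a store per hit      */
    }

    void count_allocated(int *counter, const int *data, int n)
    {
        register int c = *counter;   /* "cache" the value in a register */
        int i;
        for (i = 0; i < n; i++)
            if (data[i] > 0)
                c++;                 /* no memory traffic in the loop   */
        *counter = c;                /* write it back "home" once       */
    }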
srm@nsc.UUCP (Richard Mateosian) (11/23/84)
> Well, the 68020 has cache-control instructions of a sort: you can
> freeze the on-chip cache.

There is no way that a user-mode program can use that feature, even
through a system call.
-- 
Richard Mateosian
{amd,decwrl,fortune,hplabs,ihnp4}!nsc!srm
nsc!srm@decwrl.ARPA
dgreen@ucla-cs.UUCP (11/27/84)
Regarding predictive hardware caches, and indeed almost everything else
you wanted to know about the performance of hardware caches, please
refer to the following survey article:

    Alan J. Smith, "Cache Memories," ACM Computing Surveys,
    vol. 14, no. 3, September 1982.

He outlines three cache prefetch algorithms:

1. Always prefetch.  Whenever you access memory block x, prefetch
   memory block x+1.

2. Prefetch on misses.  Whenever you attempt to access memory block x,
   and x is missing from the cache, prefetch memory block x+1.

3. Tagged prefetch.  "We associate with each line a single bit called
   the tag, which is set to one whenever the line is accessed by a
   program.  It is initially zero and is reset to zero when the line is
   removed from the cache.  Any line brought to the cache by a prefetch
   operation retains its tag of zero.  When a tag changes from 0 to 1
   (i.e., when the line is referenced for the first time after
   prefetching or is demand-fetched), a prefetch is initiated for the
   next sequential line.  The idea is very similar to prefetching on
   misses only, except that a miss which did not occur because the line
   was prefetched (i.e., had there not been a prefetch, there would have
   been a miss to this line) also initiates a prefetch."

"... always prefetching cut the miss ratio by 50 to 90 percent ... and
tagged prefetch was almost equally effective.  Prefetching only on
misses was less than half as good ..."  (All quotes from Smith.)

Dan Greening
Computer Science Department, UCLA
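Here is a sketch in C of the tagged prefetch policy; the direct-mapped
organization and the sizes are assumptions made for brevity, not taken
from Smith's paper.

    #define LINES      64            /* assumed cache size: 64 lines    */
    #define LINE_SHIFT 5             /* 32 words per line               */

    struct line {
        long lineno;                 /* which memory line is resident   */
        int  valid;
        int  tag;                    /* Smith's one-bit referenced tag  */
    };
    static struct line cache[LINES];

    static void fetch(long lineno, int is_prefetch)
    {
        struct line *l = &cache[lineno % LINES];
        l->lineno = lineno;
        l->valid  = 1;
        l->tag    = !is_prefetch;    /* prefetched lines keep tag == 0  */
    }

    void reference(long addr)
    {
        long lineno = addr >> LINE_SHIFT;
        struct line *l = &cache[lineno % LINES];

        if (!l->valid || l->lineno != lineno) {
            fetch(lineno, 0);        /* demand miss: tag set to 1 ...   */
            fetch(lineno + 1, 1);    /* ... and prefetch the next line  */
        } else if (l->tag == 0) {
            l->tag = 1;              /* first touch of a prefetched     */
            fetch(lineno + 1, 1);    /* line also prefetches the next   */
        }
        /* a hit on an already-tagged line initiates nothing */
    }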
kissell@flairvax.UUCP (Kevin Kissell) (11/27/84)
Cache control by a programmer or compiler differs from register file
management in that cacheing affects both data and program memory
accesses, while register management has a direct effect only on data
memory accesses.  The exception is a machine whose registers are
addressable as memory by the instruction unit, but not many contemporary
designs have that property.  (As I recall, DEC 10s and 20s did.)
Kevin D. Kissell
Fairchild Advanced Processor Development
uucp: {ihnp4 decvax}!decwrl!\
                             >flairvax!kissell
    {ucbvax sdcrdcf}!hplabs!/
paul@ISM780B.UUCP (11/29/84)
> Now suppose we had a cache that was much more under a programmer's
> control.  To be concrete, suppose we have a cache of say, 32 elements
> each containing 32 words.  And suppose our processor has a load cache
> instruction with syntax:
>
>     load cache <cache address> <memory address>
>
> When this instruction was executed, the instruction execution processor
> would quickly tell the cache processor to pick up some data from memory.
> Now, while the cache processor is picking up the data, the instruction
> processor continues executing instructions which are already in cache.

You may have just re-invented registers, and part of the rationale for a
RISC machine with tons of registers and memory addressing restricted to
explicit load and store instructions.  Current supercomputers tend to be
built this way already.

> ...For example, at
> the beginning of each 32 word chunk of code, the programmer might load
> the next 32 word chunk of code into cache, a chunk of code that might
> be branched to, and a few memory locations that would soon be accessed.

Automatic prefetching of instructions is a common way of speeding up a
CPU; the bigger the machine, the bigger the prefetch cache.  It is
sometimes suggested that branch instructions should allow the programmer
(or compiler) to indicate which way the branch is most likely to go;
this would make "automatic" instruction prefetching "virtually"
equivalent to explicit instruction cache control.

Paul Perkins -- INTERACTIVE Systems
USENET: {ihnp4|allegra|harvard|amd}!ima!paul
        decvax!cca!ima!paul
        harpo!inmet!ima!paul
        ucbvax!cbosgd!ima!paul
        {uscvax|ucla-vax|vortex}!ism780!paul

Disclaimer: This message is provided AS IS.  The reader assumes all risk
as to the accuracy of the content and its usefulness for any particular
purpose.  Opinions expressed are my own and not necessarily those of my
employer or anyone else.  Void where taxed or prohibited.
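Here is a sketch in C of the branch-hint idea, using GCC's
__builtin_expect as a latter-day stand-in for such an annotation; the
buffer structure and refill() routine are invented for the example.

    struct buf { int count; unsigned char *ptr; };
    extern int refill(struct buf *b);   /* slow path, made up here */

    /* Tell the compiler (and through it, the prefetcher) which way
     * the branch usually goes, so the common arm is the one whose
     * instructions are fetched ahead of time. */
    #define UNLIKELY(x) __builtin_expect(!!(x), 0)

    int get_byte(struct buf *b)
    {
        if (UNLIKELY(b->count == 0))    /* rare arm: not prefetched   */
            return refill(b);
        b->count--;                     /* common arm: straight-line  */
        return *b->ptr++;               /* code the prefetcher has    */
    }                                   /* already brought in         */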