davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/01/90)
In the discussion of relative cache sizes, I remembered something which may have been discussed here several years ago. Perhaps someone will be able to mention any recent developments. The feature is an intelligent I-cache, which stores only instructions which are the target of jumps. The basis of this is that pipelines make cache less effective for long inline runs of code. In fact, in most cases there is nothing to be gained by caching these instructions, unless the memory bandwidth is really overloaded, such as by many DMA devices or multiple CPUs.

The implementation was to OR the pipeline-empty signal (or wait, or other similar signal) with the 'not in cache' signal, and use that value to determine whether the instructions should be cached. This is particularly a gain when the size of a loop or nested loops is larger than the cache, when procedures are being called, etc. It can eliminate the memory latency on calls to common procedures, and make large loops run as if they were completely in cache.

If anyone has any info on recent work (if any) I'd like to hear it. If there are any good papers I should look up I'd like to see them, too. Obviously this must either be harder to do than I think, or provide less benefit, or everyone would be doing it.

I can make the same argument for separate caches for data being written and read: a large cache for locations read and a small write-back cache for data written would seem (without analysis I haven't done) to offer serious reductions in the number of times the CPU waits for memory. Feel free to correct me; this just came to me while thinking about the smart I-cache.

-- 
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    The Twin Peaks Halloween costume: stark naked in a body bag
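[Editorial note: the allocate-only-on-pipeline-restart idea above can be sketched as a toy simulation. Everything here is hypothetical and illustrative; the class, the direct-mapped organization, and the trace are assumptions for exposition, not a description of any real machine.]

```python
# Toy sketch of a "smart" I-cache that allocates a line only when the
# fetch is a branch target (i.e. the pipeline is restarting), standing in
# for the OR of the pipeline-empty and not-in-cache signals.

class BranchTargetCache:
    def __init__(self, lines=8):
        # Direct-mapped: one tag slot per line, all initially empty.
        self.lines = lines
        self.tags = [None] * lines

    def fetch(self, addr, is_branch_target):
        """Return True on a hit. Allocate only on branch-target misses;
        inline (sequential) misses just stream from memory uncached."""
        idx = addr % self.lines
        if self.tags[idx] == addr:
            return True
        if is_branch_target:        # pipeline restarting: worth caching
            self.tags[idx] = addr
        return False                # inline miss: no allocation

# Toy trace: a 4-instruction loop at addresses 100..103, executed twice.
# Only address 100 (the loop head) is a branch target.
btc = BranchTargetCache()
trace = [(100, True), (101, False), (102, False), (103, False)] * 2
hits = sum(btc.fetch(a, t) for a, t in trace)
print(hits)  # the loop head hits on the second iteration
```

On this trace only the loop-head fetch ever allocates, so the cache holds exactly the instruction whose miss would stall the pipeline, which is the point of the scheme.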
jesup@cbmvax.commodore.com (Randell Jesup) (11/21/90)
In article <2823@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>	The feature is intelligent I-cache, which only stores instructions
>which are the target of jumps. The basis of this is that pipelines make
>cache less effective for long inline runs of code. In fact, in most
>cases there is nothing to be gained by caching these instructions,
>unless the memory bandwidth is really overloaded, such as by many DMA
>devices or multiple CPUs.

	This is known as a "Branch Target Cache" or BTC. Both the RPM-40 and the AMD 29000 have BTCs (you can probably dig up some of the RPM-40 design team around CR&D - try Dave Nagy, Janet Moseley, or Dave McGonagle (who has a company at the RPI incubator center nowadays)).

	If you can feed instructions into the CPU from memory (or second-level cache) at full speed, then all you need to cover is the branch latency to restart the pipeline from memory, and a BTC covers this nicely.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup
Thus spake the Master Ninjei:
	"If your application does not run correctly, do not blame the operating system."  (From "The Zen of Programming")  ;-)
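[Editorial note: Jesup's argument, that a BTC only needs to hide the pipeline-restart latency after taken branches, can be put as back-of-the-envelope arithmetic. The latency, branch frequency, and hit-rate numbers below are made up for illustration.]

```python
# Illustrative CPI model: sequential fetch runs at full speed, so the
# only fetch penalty is refilling the pipeline after a taken branch.
branch_latency = 3    # cycles to restart the pipeline from memory (assumed)
branch_freq = 0.2     # fraction of instructions that are taken branches (assumed)
btc_hit_rate = 0.9    # fraction of branch targets found in the BTC (assumed)

# Without a BTC, every taken branch pays the full restart latency.
cpi_no_btc = 1 + branch_freq * branch_latency

# With a BTC, only the branch targets that miss in it pay the latency.
cpi_btc = 1 + branch_freq * (1 - btc_hit_rate) * branch_latency

print(cpi_no_btc, cpi_btc)
```

With these assumed numbers the penalty drops from 0.6 to 0.06 cycles per instruction, which is the sense in which a small BTC "covers this nicely" even though inline code is never cached at all.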