carters@ajpo.sei.cmu.edu (Scott Carter) (12/16/90)
Regarding the need for I-cache flushes to support self-modifying code for various purposes (OO thunks, blit routines, etc.), a few thoughts:

It seems that one would really prefer to invalidate just the cache lines which have been replaced by new code, rather than flush the entire I-cache and lose valuable context. (Flushing the entire cache could be implemented by just clearing the bit line of the valid bit, which should add little HW.)

In the 88K with external caches only, the invalidate path already exists in the chip: just map a page into operand space with coherence enabled, and it could easily cause a snooping invalidate via the M-bus, no? The change would be that an 88200 in I-cache mode would no longer be able to ignore M-bus coherence ops, which will in turn cause more contention on the Icache tag RAM and hence a performance hit, but in a uniprocessor you'd only see these cycles for actual writes into code space, which had better be pretty rare.

In a processor with e.g. an on-chip Icache, there still probably needs to be a path from the operand side to Ifetch because of indirect jumps. To keep the branch latency down I imagine there's usually a direct path from a register file read port to the instruction address mux, so we could add an instruction like Icache_Invalidate (reg) without adding any new bussing, but the controls are a bit odd. Might need to have a cycle or two latency on this instruction.

For processors with direct-mapped I-caches, a hack which is probably feasible, if ugly, is to link in a code space section which consists of nothing but return instructions, one per line, which covers all the classes in the I-cache. E.g. on a Mips R3000 this wastes 64KB in the code image, and gives you an Icache flush at the cost of seven instructions and one spurious I-cache miss per line/block (icache refill block) per block you need to invalidate. Not that bad, really.
Scott Carter - McDonnell Douglas Electronic Systems Company
carter%csvax.decnet@mdcgwy.mdc.com (preferred and faster) - or - carters@ajpo.sei.cmu.edu
(714)-896-3097
The opinions expressed herein are solely those of the author, and are not necessarily those of McDonnell Douglas.
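The direct-mapped hack in the post above depends on address aliasing: a line in the block of return instructions evicts whatever code line shares its cache index. The following is only a software model of that mapping, not R3000 code. The 64KB size is taken from the post; the 16-byte refill block, the requirement that the return block start on a cache-size boundary, and every name here (`ifetch`, `sled_alias`, and so on) are assumptions made for illustration.

```c
#include <stdint.h>

#define CACHE_BYTES 65536u  /* I-cache size, the 64KB figure from the post */
#define LINE_BYTES  16u     /* refill block size: an assumed, hypothetical value */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

/* Model state: one tag and one valid bit per direct-mapped line. */
static uint32_t tags[NUM_LINES];
static int      valid[NUM_LINES];

/* Which cache line an address maps to. */
static uint32_t line_index(uint32_t addr)
{
    return (addr % CACHE_BYTES) / LINE_BYTES;
}

/* The tag stored alongside that line. */
static uint32_t line_tag(uint32_t addr)
{
    return addr / CACHE_BYTES;
}

/* Model one instruction fetch: returns 1 on a hit, 0 on a miss.
 * A miss refills the line, replacing whatever was there before. */
static int ifetch(uint32_t addr)
{
    uint32_t i = line_index(addr);
    if (valid[i] && tags[i] == line_tag(addr))
        return 1;
    valid[i] = 1;
    tags[i] = line_tag(addr);
    return 0;
}

/* Address inside the block of return instructions that shares a cache
 * line with addr. Assumes sled_base is a multiple of CACHE_BYTES. */
static uint32_t sled_alias(uint32_t sled_base, uint32_t addr)
{
    return sled_base + line_index(addr) * LINE_BYTES;
}
```

In this model, fetching from `sled_alias(base, p)` forces a miss that overwrites the line holding `p`, so the next fetch of `p` misses and reloads from memory. On real hardware the fetch would be a call to the return instruction at that address; the seven-instruction cost quoted in the post is not modeled here.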
Bruce.Hoult@bbs.actrix.gen.nz (12/18/90)
Scott Carter writes:
>For processors with direct-mapped I-caches, a hack which is probably feasible,
>if ugly, is to link in a code space section which consists of nothing but
>return instructions, one per line, which covers all the classes in the I-cache.
>e.g. on a Mips R3000 this wastes 64KB in the code image, and gives you an
>Icache flush at the cost of seven instructions and one spurious I-cache miss
>per line/block (icache refill block) per block you need to invalidate. Not
>that bad, really.

Why not use the same number of NOPs (or whatever harmless instruction reads the most bytes of instruction stream in the fewest cycles -- for example, on the 6502 something like LDA #0000 was 50% faster than NOPs)? It'll be much quicker to execute.

That's how some friends and I saved the cost of memory refresh hardware on a home-brew computer -- just get an interrupt routine to execute a page (256 bytes for the 64 Kbit chips we used at the time) of NOPs every few ms. This only used a few percent of the processor time.

--
Bruce.Hoult@bbs.actrix.gen.nz Twisted pair: +64 4 772 116
BIX: brucehoult Last Resort: PO Box 4145 Wellington, NZ
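The cost of sweeping a page with filler instructions, as described in the post above, can be estimated with a little arithmetic. This is a rough sketch only: it uses the documented 6502 timings (NOP is 1 byte and 2 cycles, LDA immediate is 2 bytes and 2 cycles), while the 1 MHz clock and 4 ms period in the usage note are hypothetical values, since the post does not give the home-brew machine's clock rate or exact refresh interval. The function names are also made up for illustration.

```c
#include <stdint.h>

/* Cycles needed to execute a straight-line run of one instruction type
 * that covers `bytes` consecutive addresses. */
static uint32_t sweep_cycles(uint32_t bytes, uint32_t insn_bytes,
                             uint32_t insn_cycles)
{
    return (bytes / insn_bytes) * insn_cycles;
}

/* Share of CPU time spent on the sweep, in parts per thousand, when the
 * sweep runs once every period_us microseconds on a clock_hz clock. */
static uint32_t overhead_permille(uint32_t cycles, uint32_t clock_hz,
                                  uint32_t period_us)
{
    uint64_t cycles_per_period = (uint64_t)clock_hz * period_us / 1000000u;
    return (uint32_t)((uint64_t)cycles * 1000u / cycles_per_period);
}
```

For a 256-byte page, `sweep_cycles(256, 1, 2)` gives 512 cycles for NOPs and `sweep_cycles(256, 2, 2)` gives 256 cycles for LDA immediate. With the hypothetical 1 MHz clock and 4 ms period, `overhead_permille` works out to 128 and 64 parts per thousand respectively; a faster clock or longer period lowers both figures.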