[comp.arch] I-cache flush for SMC

carters@ajpo.sei.cmu.edu (Scott Carter) (12/16/90)

Regarding the need for I-cache flushes to support self-modifying code for
various purposes (OO thunks, blit routines, etc.), a few thoughts:

It seems that one would really prefer to invalidate just the cache lines which
have been replaced by new code, rather than flush the entire I-cache and lose
valuable context.  (Flushing the entire cache could be implemented by just
clearing the bit line of the valid bit, which should add little HW.)  In
the 88K with external caches only, the invalidate path already exists in the
chip: just map a page into operand space with coherence enabled, and it could
easily cause a snooping invalidate via the M-bus, no?  The change would be that
an 88200 in I-cache mode would no longer be able to ignore M-bus coherence ops,
which will in turn cause more contention on the I-cache tag RAM and hence a
performance hit.  But in a uniprocessor you'd only see these cycles for actual
writes into code space, which had better be pretty rare.
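
To make the trade-off concrete, here is a rough software model of the two
operations: clearing every valid bit versus invalidating only the lines that
cover a rewritten address range.  It is a sketch only.  The geometry
(16-byte lines, 256 lines, direct-mapped) and all the names in it are
hypothetical, and it says nothing about how an 88200 actually does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry, chosen only for illustration. */
#define LINE_BYTES 16u
#define NUM_LINES  256u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} icache_model;

/* Fetch through the model: returns true on a hit, fills the line on a miss. */
static bool icache_fetch(icache_model *c, uint32_t addr) {
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;
    if (c->valid[idx] && c->tag[idx] == tag)
        return true;
    c->valid[idx] = true;
    c->tag[idx]   = tag;
    return false;
}

/* Full flush: clear every valid bit, losing all cached context. */
static void icache_flush_all(icache_model *c) {
    memset(c->valid, 0, sizeof c->valid);
}

/* Selective invalidate: drop only lines holding bytes in [start, start+len). */
static void icache_invalidate_range(icache_model *c, uint32_t start,
                                    uint32_t len) {
    assert(len > 0);
    uint32_t first = start / LINE_BYTES;
    uint32_t last  = (start + len - 1) / LINE_BYTES;
    for (uint32_t line = first; line <= last; line++) {
        uint32_t idx = line % NUM_LINES;
        if (c->valid[idx] && c->tag[idx] == line / NUM_LINES)
            c->valid[idx] = false;
    }
}

/* Count how many lines are still valid. */
static unsigned icache_valid_count(const icache_model *c) {
    unsigned n = 0;
    for (uint32_t i = 0; i < NUM_LINES; i++)
        n += c->valid[i] ? 1u : 0u;
    return n;
}
```

With the cache warm, rewriting 20 bytes at 0x40 costs two lines under
icache_invalidate_range() and all 256 under icache_flush_all().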

In a processor with, e.g., an on-chip I-cache, there still probably needs to
be a path from the operand side to Ifetch because of indirect jumps.  To keep
the branch latency down I imagine there's usually a direct path from a
register file read port to the instruction address mux, so we could add an
instruction like Icache_Invalidate (reg) without adding any new bussing,
though the controls are a bit odd.  The instruction might need a cycle or two
of latency.

For processors with direct-mapped I-caches, a hack which is probably feasible,
if ugly, is to link in a code space section which consists of nothing but
return instructions, one per line, which covers all the classes in the I-cache.
E.g., on a MIPS R3000 this wastes 64KB in the code image, and gives you an
I-cache flush at the cost of seven instructions and one spurious I-cache miss
per I-cache refill block you need to invalidate.  Not that bad, really.
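
A rough model of the trick, for anyone who wants to see the address
arithmetic.  The 64KB cache size is the figure above.  The 16-byte line
size, the link address SLED_BASE, and all the function names are made up for
illustration.  "Calling" the return instruction is modelled as a single
fetch, and a real machine would also have a delay slot to think about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_BYTES 0x10000u     /* 64KB, the figure from the post */
#define LINE_BYTES  16u          /* assumed line size */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)
#define SLED_BASE   0x00400000u  /* hypothetical, a multiple of CACHE_BYTES */

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} dm_icache;

/* Direct-mapped fetch: true on a hit; a miss displaces the old occupant. */
static bool dm_fetch(dm_icache *c, uint32_t addr) {
    uint32_t idx = (addr % CACHE_BYTES) / LINE_BYTES;
    uint32_t tag = addr / CACHE_BYTES;
    if (c->valid[idx] && c->tag[idx] == tag)
        return true;
    c->valid[idx] = true;
    c->tag[idx]   = tag;
    return false;
}

/* Address of the return instruction that maps to the same line as target. */
static uint32_t sled_alias(uint32_t target) {
    return SLED_BASE + (target % CACHE_BYTES) / LINE_BYTES * LINE_BYTES;
}

/* Model of calling into the sled: the fetch evicts target's line. */
static void evict_via_sled(dm_icache *c, uint32_t target) {
    /* The target must not itself live in the sled's cache-sized region. */
    assert(target / CACHE_BYTES != SLED_BASE / CACHE_BYTES);
    (void)dm_fetch(c, sled_alias(target));
}
```

After evict_via_sled(), the next fetch of the modified code misses and goes
to memory, which is the whole point.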

Scott Carter - McDonnell Douglas Electronic Systems Company
carter%csvax.decnet@mdcgwy.mdc.com (preferred and faster) - or -
carters@ajpo.sei.cmu.edu		 (714)-896-3097
The opinions expressed herein are solely those of the author, and are not
necessarily those of McDonnell Douglas.

Bruce.Hoult@bbs.actrix.gen.nz (12/18/90)

Scott Carter writes:

>For processors with direct-mapped I-caches, a hack which is probably feasible,
>if ugly, is to link in a code space section which consists of nothing but
>return instructions, one per line, which covers all the classes in the I-cache.
>E.g., on a MIPS R3000 this wastes 64KB in the code image, and gives you an
>I-cache flush at the cost of seven instructions and one spurious I-cache miss
>per I-cache refill block you need to invalidate.  Not that bad, really.

Why not use the same number of NOPs (or whatever harmless instruction reads
the most bytes of instruction stream in the fewest cycles -- for example on
the 6502 something like LDA #0000 was 50% faster than NOPs)?  It'll be much
quicker to execute.  That's how some friends and I saved the cost of memory
refresh hardware on a home-brew computer -- just get an interrupt routine to
execute a page (256 bytes for the 64 Kbit chips we used at the time) of NOPs
every few ms.  This only used a few percent of the processor time.
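
The overhead is easy to estimate: time spent in the NOP page divided by the
interval between interrupts.  A small sketch of that arithmetic follows.
Only the 256-instruction page comes from the description above.  The
function name is hypothetical, and the cycle count, clock rate and interval
are inputs you would have to supply for a particular machine.  Interrupt
entry and exit are ignored.

```c
#include <assert.h>

/* Fraction of CPU time spent executing the refresh page.
 * instrs           number of instructions in the page (256 in the post)
 * cycles_per_instr cycles each one takes (machine-dependent assumption)
 * clock_hz         CPU clock in Hz (assumption)
 * period_s         time between refresh interrupts in seconds (assumption)
 */
static double refresh_overhead(unsigned instrs, unsigned cycles_per_instr,
                               double clock_hz, double period_s) {
    assert(clock_hz > 0.0 && period_s > 0.0);
    double busy_s = (double)instrs * (double)cycles_per_instr / clock_hz;
    return busy_s / period_s;
}
```

For instance, refresh_overhead(256, 2, 2e6, 4e-3) models 2-cycle NOPs on a
2 MHz clock with an interrupt every 4 ms.  Those three numbers are purely
illustrative.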
-- 
Bruce.Hoult@bbs.actrix.gen.nz   Twisted pair: +64 4 772 116
BIX: brucehoult                 Last Resort:  PO Box 4145 Wellington, NZ