[comp.arch] Why only half a cache?

daveb@geac.UUCP (11/16/87)

  Why half a cache? Isn't that like half an Ass?

  In the last article, I outlined the ICL approach to stack-top,
data and instruction caches.  Now comes the obvious problem:

  If these stores are really made of **ordinary** fast memory, what
happens when you have to write them back to the main store?  What
happens when you execute a subroutine-call, for example?

  This was one of the problems which led to the use of "traditional"
caching in the first big ICLs.  It appeared that the gains from the
faster memory would be easily eaten up in the losses when one needed
to write the data back to slow main store.  One of the ideas
suggested by Mr. Bill Byrne (and enlarged upon by me, to little
avail), was to bifurcate the caches.
  Assume a process about to call a system service (like "date").
The routine pushes a parameter list onto the stack (and thereby
updates the cache, but not main store) and calls the system service.
At this point the instruction cache needs to be reloaded, so there
is little reason to try to delay the write-back any further. 
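The behaviour described above can be sketched as a tiny model (in
Python; all names here are mine, not ICL's): pushes update only the
cache, and the write-back to slow main store is deferred until the
service call forces a flush.

```python
# Hypothetical sketch of a write-back stack cache: updates stay in the
# cache until an explicit flush (e.g. at a system-service call).

class StackCache:
    def __init__(self, main_store):
        self.main_store = main_store   # backing dict: address -> value
        self.lines = {}                # cached lines, address -> value
        self.dirty = set()             # addresses modified since last flush

    def push(self, address, value):
        # A push updates only the cache, not the main store.
        self.lines[address] = value
        self.dirty.add(address)

    def flush(self):
        # Write every dirty line back to the slow main store,
        # then invalidate the cache contents.
        for address in self.dirty:
            self.main_store[address] = self.lines[address]
        self.dirty.clear()
        self.lines.clear()

store = {}
cache = StackCache(store)
cache.push(0x100, "param-list")       # updates the cache only
assert 0x100 not in store             # main store untouched so far
cache.flush()                         # the service call forces write-back
assert store[0x100] == "param-list"
```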
  On a machine with only one (each) stack, data and instruction
cache, the world is going to stop.  On the proposed machine with two
cache sets, the second process is free to proceed, after a mini-
process-switch which amounts to the swapping of a small number of
registers. The second process starts executing and the caches of the
first process are flushed back to and reloaded from the main store.
The flushing of the instruction cache is cheapest: it is simply
marked invalid.  The flushing of the stack cache is a bit slower: a
block-aligned DMA is started, the cache is marked invalid, and a pair
of wrap-around pointers is reset.  The flushing of the data cache may
be as simple or complex as the designer desires: this is the only
"real" (ie, content-addressable) cache in the system, and may in
fact be doing delayed write-through anyway...
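As a sketch of the switch itself (Python again, every name here being
an assumption of mine): with two cache sets, the mini-process-switch
amounts to flipping which set is active, after which the old set can
be flushed — the instruction cache simply invalidated, the stack cache
written back as a block.

```python
# Hypothetical sketch: two cache sets let the second process run while
# the first process's caches are flushed back to main store.

class CacheSet:
    def __init__(self):
        self.icache_valid = True
        self.stack_dirty = []          # (address, value) lines to write back

    def flush(self, main_store):
        # Instruction cache: cheapest, just mark it invalid.
        self.icache_valid = False
        # Stack cache: write the dirty block back (the "DMA").
        for addr, val in self.stack_dirty:
            main_store[addr] = val
        self.stack_dirty = []

class Cpu:
    def __init__(self):
        self.sets = [CacheSet(), CacheSet()]
        self.active = 0                # index of the running cache set

    def mini_switch(self, main_store):
        # The mini-process-switch: swap a small number of registers
        # (modelled as flipping the active index), then flush the
        # old set while the other process runs.
        old = self.sets[self.active]
        self.active ^= 1
        old.flush(main_store)

store = {}
cpu = Cpu()
cpu.sets[0].stack_dirty = [(0x200, 42)]
cpu.mini_switch(store)
assert cpu.active == 1                # second process now runs
assert store[0x200] == 42             # first process's stack flushed
```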

  When the other process uses up its timeslice, runs out of stack
cache, runs out of its instruction cache or is delayed for a
data-cache fetch from an unavailable page, the first process is
checked for runnability by a low-cost (ie, small & stupid) process,
and, if runnable, is switched back in.  Otherwise the cpu stands idle
until the second process is switched/relocated, etc.
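The "small & stupid" check can be sketched in a few lines (a guess at
the shape of it, not ICL's logic): when the second process blocks or
exhausts a resource, the first is tested for runnability, and if it is
not runnable the cpu simply idles.

```python
# Hypothetical sketch of the low-cost runnability check made when the
# other process blocks or uses up its timeslice / cache.

def pick_next(first_process):
    # Small and stupid on purpose: either the first process is
    # runnable again, or the cpu stands idle (None).
    if first_process["runnable"]:
        return first_process["id"]
    return None

assert pick_next({"id": 1, "runnable": True}) == 1   # switch back in
assert pick_next({"id": 1, "runnable": False}) is None  # cpu idles
```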

  Several instructions must bypass these caches, most notably the
"test and set" instruction used for process/processor
synchronization; but in the main, most instructions benefit.

  Back-of-the-envelope calculations implied that the speedup would
be considerable, based on the known period of time the processors
were standing idle or processing process switches.  Regrettably, no
statistical or modeling work was done.  Bill was well respected,
but he was in this funny colony over by the US, you understand...

 --dave (this funny colony has a mainframer now: geac) c-b
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.