daveb@geac.UUCP (11/16/87)
Why half a cache? Isn't that like half an Ass?

In the last article, I outlined the ICL approach to stack-top, data and instruction caches. Now comes the obvious problem: if these stores are really made of *ordinary* fast memory, what happens when you have to write them back to the main store? What happens when you execute a subroutine call, for example?

This was one of the problems which led to the use of "traditional" caching in the first big ICLs. It appeared that the gains from the faster memory would be easily eaten up by the losses incurred when the data had to be written back to the slow main store.

One of the ideas suggested by Mr. Bill Byrne (and enlarged upon by me, to little avail) was to bifurcate the caches. Assume a process about to call a system service (like "date"). The routine pushes a parameter list onto the stack (thereby updating the cache, but not main store) and calls the system service. At this point the instruction cache needs to be reloaded, so there is little reason to delay the write-back any further. On a machine with only one stack, data and instruction cache apiece, the world is going to stop. On the proposed machine with two cache sets, the second process is free to proceed after a mini process-switch, which amounts to the swapping of a small number of registers. The second process starts executing while the caches of the first process are flushed back to, and reloaded from, the main store.

The flushing of the instruction cache is cheapest: it is simply marked invalid. The flushing of the stack cache is a bit slower: a block-aligned DMA is started, the cache is marked invalid, and a pair of wrap-around pointers are reset. The flushing of the data cache may be as simple or as complex as the designer desires: this is the only "real" (ie, content-addressable) cache in the system, and may in fact be doing delayed write-through anyway...
When the other process uses up its timeslice, runs out of stack cache, runs out of instruction cache, or is delayed by a data-cache fetch from an unavailable page, the first process is checked for runnability by a low-cost (ie, small & stupid) process, and if runnable is switched back in. Otherwise the cpu stands idle until the second process is switched in, relocated, etc.

A few instructions need to be excepted from being processed through these caches, most notably the "test and set" instruction used for process/processor synchronization, but in the main, most instructions benefit. Back-of-the-envelope calculations implied that the speedup would be considerable, based on the known proportion of time the processors stood idle or processed process switches. Regrettably, no statistical or modeling work was done. Bill was well respected, but he was in this funny colony over by the US, you understand...

 --dave (this funny colony has a mainframer now: geac) c-b
-- 
David Collier-Brown.                 |  {mnetor|yetti|utgpu}!geac!daveb
Geac Computers International Inc.,   |  Computer Science loses its
350 Steelcase Road, Markham, Ontario,|  memory (if not its mind)
CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.