[comp.arch] Stack Cache with O/S driven copyback ?

mlord@bwdls58.UUCP (Mark Lord) (01/15/91)

How smart are the stack-cache designs that have been looked at?
I'm curious whether a proposal such as the following has already
been researched.
-------
The procedure call stack is probably the most heavily used chunk of memory
in most time-sharing systems, such as *nix.  It therefore follows that
hardware optimization of this resource might provide significant
performance improvements in just about any RISC/CISC system.

Specifically, how about an operating system controlled cache, devoted to
caching the call stack of the current task and/or interrupt handler?

This cache would have the following key characteristics:

	1) copyback cache - line size of 64 bits. (determine optimum size ?)
	2) one-way direct mapping.
	3) O/S writes start & size registers for memory range to be cached,
		during context switches.
	4) cache size >= average task stack size  (4-8K ?)
	5) copyback has lowest bus priority - gets done only when nothing
		else wants the bus (reads, writes, dma, other caches).
	6) at context switch time, O/S uses hardware mechanism to
		clear the copyback flags for locations beyond top of stack.
		This could eliminate a lot of unnecessary copybacks.
		The same strategy could also be used by interrupt handlers,
		if the cache is to be shared.
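
As a rough illustration of characteristics 1) through 5), here is a toy
Python model.  Everything in it is made up for the sketch and is not taken
from any real design: the class name StackCache, its method names, the use of
a dict to stand in for main memory, and the simplification that each 64-bit
line holds a single value.

```python
LINE_BYTES = 8  # item 1: 64-bit lines


class StackCache:
    """Toy direct-mapped copyback cache covering one O/S-selected range."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory              # dict standing in for main memory
        self.start = 0                    # item 3: range registers
        self.size = 0
        self.tags = [None] * num_lines    # line address held in each slot
        self.data = [None] * num_lines
        self.dirty = [False] * num_lines  # the "copyback flags"
        self.copybacks = 0

    def set_range(self, start, size):
        """Item 3: the O/S writes the start and size registers."""
        self.start, self.size = start, size

    def in_range(self, addr):
        return self.start <= addr < self.start + self.size

    def _slot(self, line_addr):
        # Item 2: direct mapping, one possible slot per line address.
        return (line_addr // LINE_BYTES) % self.num_lines

    def _copy_back(self, slot):
        self.memory[self.tags[slot]] = self.data[slot]
        self.dirty[slot] = False
        self.copybacks += 1

    def _fill(self, line_addr):
        slot = self._slot(line_addr)
        if self.tags[slot] != line_addr:
            if self.dirty[slot]:
                self._copy_back(slot)     # forced: the slot is needed now
            self.tags[slot] = line_addr
            self.data[slot] = self.memory.get(line_addr, 0)
        return slot

    def write(self, addr, value):
        line_addr = addr - addr % LINE_BYTES
        if not self.in_range(addr):
            self.memory[line_addr] = value  # outside the stack: bypass
            return
        slot = self._fill(line_addr)
        self.data[slot] = value
        self.dirty[slot] = True             # memory is now stale

    def read(self, addr):
        line_addr = addr - addr % LINE_BYTES
        if not self.in_range(addr):
            return self.memory.get(line_addr, 0)
        return self.data[self._fill(line_addr)]

    def idle_bus_cycle(self):
        """Item 5: copy back one dirty line when nothing else wants the bus."""
        for slot in range(self.num_lines):
            if self.dirty[slot]:
                self._copy_back(slot)
                return True
        return False
```

A write inside the cached range touches memory only when its slot is later
reclaimed or when idle_bus_cycle() gets around to it; writes outside the
range go straight to memory in this model.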

So the idea is that, at each context switch, the O/S purges the cache of any
pending copybacks for items no longer "on the stack", and then sets a new
range of addresses to be cached for the next task.  Stack locations do not
get copied back to memory until absolutely necessary (that is, when some other
location needs the same slot of the cache, which has to be cleared first),
unless there are free bus cycles that would otherwise go unused.
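
The context-switch step might look something like the following Python
sketch.  The function name context_switch, the representation of pending
copybacks as a set of line addresses, and the assumption that the stack grows
downward (so anything below the stack pointer is dead) are all choices made
for this illustration only.

```python
LINE_BYTES = 8  # 64-bit lines, as in item 1)


def context_switch(dirty_lines, old_sp, new_start, new_size):
    """Drop pending copybacks for dead stack lines, then pick a new range.

    dirty_lines: set of line-aligned addresses awaiting copyback.
    old_sp:      stack pointer of the outgoing task (assumed: stack grows
                 downward, so addresses below old_sp are no longer in use).
    Returns (lines that still need copying back, number discarded,
             (start, size) for the incoming task's range registers).
    """
    # A line stays live if any byte of it lies at or above the stack pointer.
    live = {a for a in dirty_lines if a + LINE_BYTES > old_sp}
    discarded = len(dirty_lines) - len(live)
    return live, discarded, (new_start, new_size)


# Four dirty lines; the outgoing task has since popped back up to 0x0FF0.
pending = {0x0FE0, 0x0FE8, 0x0FF0, 0x0FF8}
live, discarded, new_range = context_switch(pending, 0x0FF0, 0x8000, 4096)
```

Here the two lines below the stack pointer never reach the bus at all, which
is the saving that item 6) is after.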

I've seen discussion of stack caching here in the past, but I don't recall
seeing suggestions 3) and 6) discussed.  Without them, stack caching appears
to gain nothing over simply using a larger general purpose cache.  But with
them.. ?

Undoubtedly this has problems, but perhaps others here can help iron them out.
-- 
 ___Mark S. Lord__________________________________________
| ..uunet!bnrgate!mlord%bmerh724 | Climb Free Or Die (NH) |
| MLORD@BNR.CA   Ottawa, Ontario | Personal views only.   |
|________________________________|________________________|

ddr@cs.edinburgh.ac.uk (Doug Rogers) (01/15/91)

In article <5229@bwdls58.UUCP>, mlord@bwdls58.UUCP (Mark Lord) writes:
> How smart are the stack-cache designs that have been looked at?
> I'm curious whether a proposal such as the following has already
> been researched.
> 
> Specifically, how about an operating system controlled cache, devoted to
> caching the call stack of the current task and/or interrupt handler?
> 
> This cache would have the following key characteristics:
> 
> 	5) copyback has lowest bus priority - gets done only when nothing
> 		else wants the bus (reads, writes, dma, other caches).

The problem is maintaining cache consistency in multi-processor systems.
Within the Futurebus spec, one user has the right to modify at any time.
Clearly, if there were only one copy of the information, it would not matter
how slowly the information is passed back.  But what happens if, while this is
going on, another cache wishes to gain access to this information?  (This
would include the data cache for the same processor.)
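
To make the concern concrete, here is a minimal Python sketch of the window in
which memory is stale.  It assumes no snooping or other coherence mechanism at
all, and the names (memory, stack_cache, cpu_write, other_master_read,
copy_back, demo) are invented for the illustration.

```python
memory = {0x1000: 1}   # main memory: one stack location, initial value 1
stack_cache = {}       # dirty lines held by the stack cache, not yet copied back


def cpu_write(addr, value):
    """Processor writes to its stack; only the stack cache sees it."""
    stack_cache[addr] = value


def other_master_read(addr):
    """Another cache or bus master reads memory directly (no snooping assumed)."""
    return memory[addr]


def copy_back():
    """The deferred copyback finally gets the bus."""
    memory.update(stack_cache)
    stack_cache.clear()


def demo():
    cpu_write(0x1000, 2)
    stale = other_master_read(0x1000)   # read before the copyback happens
    copy_back()
    fresh = other_master_read(0x1000)   # read after the copyback
    return stale, fresh
```

The longer the copyback waits for a free bus cycle, the longer the first read
in demo() can return the old value.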

-- 
Douglas Rogers                     JANET: ddr@uk.ac.ed.lfcs
Department of Computer Science     UUCP:  ..!mcvax!ukc!lfcs!ddr
University of Edinburgh            ARPA:  ddr%lfcs.ed.ac.uk@nsfnet-relay.ac.uk
Edinburgh EH9 3JZ, UK.             Tel:   031-650 5172 (direct line)

mlord@bwdls58.bnr.ca (Mark Lord) (01/16/91)

In article <4513@skye.cs.ed.ac.uk> ddr@cs.edinburgh.ac.uk (Doug Rogers) writes:
<In article <5229@bwdls58.UUCP>, mlord@bwdls58.UUCP (Mark Lord) writes:
<> 
<> Specifically, how about an operating system controlled cache, devoted to
<> caching the call stack of the current task and/or interrupt handler?
<> 
<> 	5) copyback has lowest bus priority - gets done only when nothing
<> 		else wants the bus (reads, writes, dma, other caches).
<
<The problem is maintaining cache consistency in multi-processor systems.
<Within the Futurebus spec, one user has the right to modify at any time.
<Clearly, if there were only one copy of the information, it would not matter
<how slowly the information is passed back.  But what happens if, while this is
<going on, another cache wishes to gain access to this information?  (This
<would include the data cache for the same processor.)

I'm not particularly worried about the multi-processor case for this stack
cache.  Shared memory variables tend not to be procedure locals on a call stack,
as it becomes tricky to communicate their current addresses to other processors.

Also, note that this proposal is in addition to regular data caches, which
can handle multi-processor consistency using one's favorite means for the REST
of memory, just not the process STACKs.  We could agree to document this very
unlikely deficiency and let the O/S programmers read our notes before they
try to set up globals in stack variables which are shared between CPUs.  :)
-- 
 ___Mark S. Lord__________________________________________
| ..uunet!bnrgate!mlord%bmerh724 | Climb Free Or Die (NH) |
| MLORD@BNR.CA   Ottawa, Ontario | Personal views only.   |
|________________________________|________________________|