[comp.arch] More Register vs. cache

qzhe1@cs.aukuni.ac.nz (Qun Zheng ) (01/13/91)

Hi, netters,

I've enjoyed reading all the debates about registers vs. cache.  Here is
something to add.

I've just read an article in ACM Computer Architecture News, Vol. 17, No. 6,
Dec 1989:

    "Register Window Architecture for Multitasking Applications"

	by  D. Quammen, D.R. Miller, and D. Tabak
	from    George Mason University, et al.

It seems to me they work on real-time concurrent programming with Ada or the
like, where lots of tasks/processes run concurrently and communicate with each
other, and lots of task switches occur.  They also mention that lots of
"context switches" occur in object-oriented programming.  So they suggest that
registers/caches should support not only procedure calls, but also context
switches and Inter-Process Communication (IPC).  Today's caches and register
windows/stacks fail to support the latter uses.

Hence, they have designed an on-chip register structure called Threaded
Register Windows.  Basically, it's a pool of 64 on-chip non-overlapped register
windows with 16 32-bit registers per window (an experimental choice).  Each
window has a unique window-id and can be allocated for various purposes among
all concurrent tasks.  Windows can be dynamically linked (threaded) to form
stacks/queues via instructions like CALL/RETURN, CALLI/RETURNI (I for
interrupt), PUSH/POP, and ENQUEUE/DEQUEUE.  Low core (2^16 bytes = 64 KB) is
reserved for register window spilling.
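
As a rough illustration (not taken from the article), the pool and the
threading of windows into a queue might be modelled like this in Python.  The
class and method names (WindowPool, WindowQueue, alloc, free) are made up, the
free-list policy is an assumption, and spilling to low core is not modelled:
alloc simply fails when all 64 windows are in use.

```python
NUM_WINDOWS = 64      # pool size given in the article
REGS_PER_WINDOW = 16  # 32-bit registers per window
NIL = None            # hypothetical "no link" marker


class WindowPool:
    """Toy model of a pool of non-overlapped register windows."""

    def __init__(self):
        # regs[w][r] is register r of the window with window-id w.
        self.regs = [[0] * REGS_PER_WINDOW for _ in range(NUM_WINDOWS)]
        # link[w] threads window w to the next window in its stack/queue.
        self.link = [NIL] * NUM_WINDOWS
        self.free_ids = list(range(NUM_WINDOWS))

    def alloc(self):
        """Hand out a free window-id (a real design would spill instead)."""
        if not self.free_ids:
            raise RuntimeError("pool exhausted; spilling is not modelled here")
        wid = self.free_ids.pop(0)
        self.link[wid] = NIL
        return wid

    def free(self, wid):
        """Clear a window and return its id to the free list."""
        self.regs[wid] = [0] * REGS_PER_WINDOW
        self.link[wid] = NIL
        self.free_ids.append(wid)


class WindowQueue:
    """Windows threaded into a FIFO, roughly what ENQUEUE/DEQUEUE might do."""

    def __init__(self, pool):
        self.pool = pool
        self.head = NIL  # oldest window, next to be dequeued
        self.tail = NIL  # newest window

    def enqueue(self, values):
        """Allocate a window, fill it with up to 16 values, link it at the tail."""
        wid = self.pool.alloc()
        for i, v in enumerate(values[:REGS_PER_WINDOW]):
            self.pool.regs[wid][i] = v & 0xFFFFFFFF  # keep to 32 bits
        if self.tail is NIL:
            self.head = wid
        else:
            self.pool.link[self.tail] = wid
        self.tail = wid
        return wid

    def dequeue(self):
        """Unlink the head window, return its registers, and free it."""
        if self.head is NIL:
            raise IndexError("queue is empty")
        wid = self.head
        data = list(self.pool.regs[wid])
        self.head = self.pool.link[wid]
        if self.head is NIL:
            self.tail = NIL
        self.pool.free(wid)
        return data
```

Only the link array changes when a window joins or leaves the queue; the
register contents stay where they are, which is the point of threading
windows instead of copying them.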

Their architecture doesn't have a kernel mode (at least, none is mentioned in
the article or its diagrams).  This might be OK for a specific real-time control
environment, but sounds no good for general-purpose computing.  Also, their
design doesn't say how it would work with virtual memory.

For each task, 7 windows are directly addressable using a 7-bit "register
number": the first 3 bits choose the window, and the next 4 bits choose the
register within the window.  They didn't show actual instruction formats.
Each task has a 64-bit PSW, with 6 window-ids and related indices, CCs and
interrupt masks.  Along with a global window, the 6 windows in the PSW are:

    caller's window (OLD) and callee's window (CUR), so parameters are passed
	in registers without overlapping :-). On a CALL, a new window
	is allocated.  A window is spilled into memory only when all 64
	windows are used up.  RETURN deallocates the window.  Windows for
	previous unreturned callers are managed as a stack, described
	below.  Only the top two windows (OLD and CUR) of the stack are
	directly addressable by "register number".

    a Map Window (MW), which is just an ordinary window assigned for various
	house-keeping jobs.  The first register of the window is used as
	two 16-bit double links (threads) to link up previous MWs.  The
	remaining 15 registers keep track of the window stack of unreturned
	callers: a 6-bit window-id and a 26-bit return address
	in one 32-bit register.  So only register accesses are needed on
	CALL/RETURN if no spill occurs.  An MW is spilled into memory only when
	all of its 15 windows have been spilled into memory.  Again, only the
	top window (MW) of the house-keeping stack (previous MWs) is directly
	addressable via "register number".

    A normal procedure activity stack might still be needed, since
	a single window (CUR) might not be enough to hold all local variables
	and parameters.  The traditional stack can be implemented with its
	top in on-chip registers.  The PSW has SQW1 and SQW2 (stack/queue
	windows), which can be used for this purpose.

	But a more interesting way to use SQW1 and SQW2 is to implement an
	on-chip queue/pipe for IPC :-). A queue of windows is used as a pipe
	between two tasks, so time-consuming data copying between memory
	spaces is avoided.  Again, only the head and tail windows (SQW1 &
	SQW2) are directly addressable.

    The last directly addressable window is called the Object Window (OW),
	holding procedure-global, task-local/specific information, e.g. task-id,
	task control block (PCB), etc.  They have illustrated ways to use these
	windows to speed up task creation (with a cactus stack, something like
	Burroughs?), task switching, etc.
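
The two bit layouts described above (the 7-bit "register number" and the
32-bit MW entry) can be sketched in Python.  Since the article shows no actual
formats, the field order is my assumption: the window selector in the high 3
bits of the register number, and the window-id in the high 6 bits of an MW
entry.  The function names are hypothetical.

```python
def decode_register_number(regnum):
    """Split a 7-bit register number into (window selector, register index)."""
    if not 0 <= regnum < 128:
        raise ValueError("register number must fit in 7 bits")
    window_sel = regnum >> 4  # assumed: high 3 bits select the window
    reg_index = regnum & 0xF  # low 4 bits pick one of 16 registers
    return window_sel, reg_index


def pack_mw_entry(window_id, return_addr):
    """Pack a 6-bit window-id and a 26-bit return address into 32 bits."""
    if not 0 <= window_id < 64:
        raise ValueError("window-id must fit in 6 bits")
    if not 0 <= return_addr < (1 << 26):
        raise ValueError("return address must fit in 26 bits")
    # Assumed layout: window-id in bits 31..26, return address in bits 25..0.
    return (window_id << 26) | return_addr


def unpack_mw_entry(entry):
    """Recover (window-id, return address) from a packed 32-bit MW entry."""
    return (entry >> 26) & 0x3F, entry & 0x3FFFFFF
```

Three selector bits allow eight values, one more than the seven directly
addressable windows need.  A 6-bit window-id is exactly enough to name any of
the 64 windows in the pool.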

Well, well.  Complicated, isn't it?  It seems to me it could be a
high-performance processor for real-time control.  But I wonder whether OS
designers for general-purpose computing would let users directly access MW, OW,
and SQW1/2 for IPC queues.  Well, they might have their own ideas for fixing
that.

For the fun of architecture adventure, I want to know whether we can use a cache
to do all these tricks while making it cleaner?

I have tried to contact the authors at the email addresses given in the
article, but got no response.  Here are their addresses:

    quammen@gmuvax2.gmu.edu
    dtabak@gmuvax.gmu.edu
    rmiller@gmuvax2.gmu.edu

Does anyone on the net know about their progress?

Chuck
-----