qzhe1@cs.aukuni.ac.nz (Qun Zheng ) (01/13/91)
Hi, netters,
I've enjoyed reading the debates about registers vs. cache. Here is something
to add.
I've just read an article in ACM Computer Architecture News, Vol. 17, No. 6,
Dec 1989:
"Register Window Architecture for Multitasking Applications"
by D. Quammen, D.R. Miller, and D. Tabak
from George Mason University, et al.
It seems to me they work on real-time concurrent programming in Ada or similar
languages, where many tasks/processes run concurrently and communicate with
each other, and task switches occur frequently. They also mention that
object-oriented programming involves many "context switches".
So they suggest that registers/caches should support not only procedure calls
but also context switches and Inter-Process Communication (IPC). Today's caches
and register windows/stacks fail to support the latter two.
Hence they have designed an on-chip register structure called Threaded Register
Windows. Basically, it's a pool of 64 on-chip non-overlapping register windows
with 16 32-bit registers per window (an experimental choice). Each window has a
unique window-id and can be allocated for various purposes among all concurrent
tasks. Windows can be dynamically linked (threaded) to form stacks/queues
via instructions like CALL/RETURN, CALLI/RETURNI (I for interrupt), PUSH/POP,
and ENQUEUE/DEQUEUE. The low 2^16 bytes (64 KB) of memory are reserved for
register window spilling.
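To make the pool idea concrete, here is a toy Python model of it. It is only a sketch under my own assumptions, not the authors' hardware design: the class WindowPool, the link field, and the push/pop helpers are hypothetical names, and the article's actual threading mechanism may differ. The constants (64 windows, 16 registers, 64 KB spill area) come from the article.

```python
NUM_WINDOWS = 64            # pool size given in the article
REGS_PER_WINDOW = 16        # 32-bit registers per window
SPILL_AREA_BYTES = 2 ** 16  # low memory reserved for window spilling


class WindowPool:
    """Toy model (hypothetical): a free list of window-ids plus one link per window."""

    def __init__(self):
        self.free = list(range(NUM_WINDOWS))
        self.regs = [[0] * REGS_PER_WINDOW for _ in range(NUM_WINDOWS)]
        # Assumed representation of a "thread": the id of the next window, or None.
        self.link = [None] * NUM_WINDOWS

    def allocate(self):
        """Return a free window-id, or None when a spill would be needed."""
        return self.free.pop() if self.free else None

    def release(self, wid):
        """Return a window to the pool and clear its link."""
        self.link[wid] = None
        self.free.append(wid)

    def push(self, top, wid):
        """Thread window `wid` on top of the stack whose top is `top`."""
        self.link[wid] = top
        return wid

    def pop(self, top):
        """Unthread the top window; return (popped id, new top)."""
        new_top = self.link[top]
        self.link[top] = None
        return top, new_top


WINDOW_BYTES = REGS_PER_WINDOW * 4                  # 16 registers of 4 bytes each
SPILL_CAPACITY = SPILL_AREA_BYTES // WINDOW_BYTES   # whole windows that fit in the spill area
```

With these numbers a window is 64 bytes, so the 64 KB spill area has room for 1024 spilled windows, i.e. 16 times the on-chip pool.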
Their architecture doesn't have a kernel mode (at least, none is mentioned in
the article or its diagrams). This might be OK for a specific real-time control
environment, but it sounds no good for general-purpose computing. Also, their
design doesn't say how to incorporate virtual memory.
For each task, 7 windows are directly addressable using a 7-bit
"register number": the first 3 bits choose the window, and the next 4 bits
choose the register within the window. They didn't show actual instruction
formats.
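The 7-bit "register number" split can be sketched in a few lines of Python. This is only an illustration: the function name decode_register_number is mine, and I am assuming "first 3 bits" means the high-order bits, which the article summary above does not actually confirm. Note that 3 bits give 8 selector values for 7 windows; I don't know what the eighth value means, so the sketch just returns the raw selector.

```python
WINDOW_SELECT_BITS = 3    # selects one of the directly addressable windows
REGISTER_INDEX_BITS = 4   # selects one of the 16 registers in that window


def decode_register_number(regnum):
    """Split a 7-bit register number into (window selector, register index).

    Assumption: the window selector occupies the high-order 3 bits.
    """
    total_bits = WINDOW_SELECT_BITS + REGISTER_INDEX_BITS
    if not 0 <= regnum < (1 << total_bits):
        raise ValueError("register number must fit in 7 bits")
    window_select = regnum >> REGISTER_INDEX_BITS
    register_index = regnum & ((1 << REGISTER_INDEX_BITS) - 1)
    return window_select, register_index
```

For example, under that assumption the register number 0b0100011 selects window 2, register 3.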
Each task has a 64-bit PSW, with 6 window-ids and related indices, CCs, and
interrupt masks. Along with a global window, the 6 windows in the PSW are:
the caller's window (OLD) and the callee's window (CUR), so parameters are
passed in registers without overlapping :-). On a CALL, a new window
is allocated. A window is spilled to memory only when all 64
windows are used up. RETURN deallocates the window. Windows for
previous unreturned callers are managed as a stack, described
below. Only the top two windows (OLD and CUR) of the stack are directly
addressable by "register number".
a Map Window (MW), which is just an ordinary window assigned to various
housekeeping jobs. The first register of the window holds
two 16-bit links (threads) that doubly link it with previous MWs. The
remaining 15 registers keep track of the window stack of unreturned
callers: each packs a 6-bit window-id and a 26-bit return address
into one 32-bit register. So only register accesses are needed on CALL/RETURN
if no spill occurs. An MW is spilled to memory only when all 15 of its
windows have been spilled to memory. Again, only the top window (MW) of
the housekeeping stack (previous MWs) is directly addressable via
"register number".
A normal procedure activation stack might still be needed, since
a single window (CUR) might not be enough to hold all local variables
and parameters. The traditional stack can be implemented with its
top in on-chip registers. The PSW has SQW1 and SQW2 (stack/queue windows),
which can be used for this purpose.
But a more interesting way to use SQW1 and SQW2 is to implement an on-chip
queue/pipe for IPC :-). A queue of windows is used as a pipe between two
tasks, so time-consuming data copying between memory spaces is avoided.
Again, only the head and tail windows (SQW1 & SQW2) are directly
addressable.
The last directly addressable window is called the Object Window (OW),
holding procedure-global, task-local information, e.g. task-id,
task control block (PCB), etc. They illustrate ways to use these
windows to speed up task creation (with a cactus stack, something like
Burroughs?), task switching, etc.
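Going back to the Map Window entries: a 6-bit window-id plus a 26-bit return address exactly fills one 32-bit register, and 6 bits is just enough to name any of the 64 windows. Here is a small Python sketch of that packing. The function names and the field order (window-id in the high bits) are my assumptions; the article summary above does not give the actual layout.

```python
WINDOW_ID_BITS = 6      # enough to name any of the 64 windows
RETURN_ADDR_BITS = 26   # remaining bits of the 32-bit register


def pack_mw_entry(window_id, return_addr):
    """Pack one Map Window entry into a 32-bit value.

    Assumption: window-id in the high 6 bits, return address in the low 26.
    """
    if not 0 <= window_id < (1 << WINDOW_ID_BITS):
        raise ValueError("window-id must fit in 6 bits")
    if not 0 <= return_addr < (1 << RETURN_ADDR_BITS):
        raise ValueError("return address must fit in 26 bits")
    return (window_id << RETURN_ADDR_BITS) | return_addr


def unpack_mw_entry(entry):
    """Recover (window_id, return_addr) from a packed 32-bit entry."""
    window_id = entry >> RETURN_ADDR_BITS
    return_addr = entry & ((1 << RETURN_ADDR_BITS) - 1)
    return window_id, return_addr
```

On a CALL, one such entry would be written into the next free MW register, and on a RETURN it would be read back, which is why no memory access is needed as long as nothing has spilled.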
Well, well. Complicated, isn't it? It seems to me it could be a high-performance
processor for real-time control. But I wonder whether OS designers for
general-purpose computing would let users directly access MW, OW, and SQW1/2 for
IPC queues. Well, they might have their own ideas for fixing things up.
For the fun of architecture adventure, I want to know: can we use a cache to do
all these tricks while making it cleaner?
I have tried to contact the authors at the email addresses given in the
article, but got no response. Here are their addresses:
quammen@gmuvax2.gmu.edu
dtabak@gmuvax.gmu.edu
rmiller@gmuvax2.gmu.edu
Does anyone on the net know of their progress?
Chuck
-----