qzhe1@cs.aukuni.ac.nz (Qun Zheng ) (01/13/91)
Hi, netters,

I enjoy reading all the debates about registers vs. cache. Here I've got something to add. I've just read an article in ACM Computer Architecture News, Vol. 17, No. 6, Dec 1989: "Register Window Architecture for Multitasking Applications" by D. Quammen, D.R. Miller, and D. Tabak from George Mason University, et al.

It seems to me they work on real-time concurrent programming with Ada or the like, where lots of tasks/processes run concurrently and communicate with each other, and task switches occur frequently. They also mention that lots of "context switches" occur in object-oriented programming. So they suggest that registers/cache should support not only procedure calls but also context switches and Inter-Process Communication (IPC). Today's caches and register windows/stacks fail to support the latter applications.

Hence they have designed an on-chip register structure called Threaded Register Windows. Basically, it's a pool of 64 on-chip non-overlapped register windows with 16 32-bit registers per window (an experimental choice). Each window has a unique window-id and can be allocated for various purposes among all concurrent tasks. Windows can be dynamically linked (threaded) to form stacks/queues via instructions like CALL/RETURN, CALLI/RETURNI (I for interrupt), PUSH/POP, and ENQUEUE/DEQUEUE. The low 2^16 bytes (64 KB) of memory are reserved for register window spilling.

Their architecture doesn't have a kernel mode (at least, none is mentioned in the article or its diagrams). This might be OK for a specific real-time control environment, but it doesn't sound good for general-purpose computing. Also, they didn't mention how the design would work with virtual memory.

For each task, 7 windows are directly addressable using a 7-bit "register number": the first 3 bits choose the window, and the next 4 bits choose the register within the window. They didn't show actual instruction formats.
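
To make the 7-bit "register number" concrete, here is a minimal Python sketch of how it might be split into a window selector and a register index. The article did not show instruction formats, so treating the "first 3 bits" as the high-order bits is my assumption, and the function and constant names are made up for illustration.

```python
# Sketch only: the bit order (window selector in the high 3 bits) is an
# assumption; the article did not show actual instruction formats.
WINDOW_SELECT_BITS = 3   # chooses one of the directly addressable windows
REGISTER_BITS = 4        # chooses one of 16 registers within that window


def decode_register_number(regnum):
    """Split a 7-bit register number into (window_slot, register_index)."""
    if not 0 <= regnum < (1 << (WINDOW_SELECT_BITS + REGISTER_BITS)):
        raise ValueError("register number must fit in 7 bits")
    window_slot = regnum >> REGISTER_BITS
    register_index = regnum & ((1 << REGISTER_BITS) - 1)
    return window_slot, register_index


def encode_register_number(window_slot, register_index):
    """Build a 7-bit register number from a window slot and register index."""
    if not 0 <= window_slot < (1 << WINDOW_SELECT_BITS):
        raise ValueError("window slot must fit in 3 bits")
    if not 0 <= register_index < (1 << REGISTER_BITS):
        raise ValueError("register index must fit in 4 bits")
    return (window_slot << REGISTER_BITS) | register_index
```

Note that 3 bits can name 8 slots while only 7 windows are described as directly addressable, so under this reading one selector value is left over; the article summary above does not say what it is used for.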

Each task has a 64-bit PSW holding 6 window-ids and related indices, CCs, and interrupt masks. Along with a global window, the 6 windows in the PSW are:

- The caller's window (OLD) and the callee's window (CUR), so parameters are passed in registers without overlapping :-). On a CALL, a new window is allocated. A window is spilled to memory only when all 64 windows are used up. RETURN deallocates the window. Windows for previous unreturned callers are managed as a stack, described below. Only the top two windows of the stack (OLD and CUR) are directly addressable by "register number".

- A Map Window (MW), which is just an ordinary window assigned to various housekeeping jobs. The first register of the window is used as two 16-bit double links (threads) to link up previous MWs. The remaining 15 registers keep track of the window stack of unreturned callers: each 32-bit register holds a 6-bit window-id and a 26-bit return address. So only register accesses are needed on CALL/RETURN if no spill occurs. A MW is spilled to memory only when all 15 of its windows have been spilled. Again, only the top window (MW) of the housekeeping stack (previous MWs) is directly addressable via "register number".

- SQW1 and SQW2 (stack/queue windows). A normal procedure activity stack might still be needed, because a single window (CUR) might not be enough to hold all local variables and parameters. The traditional stack can be implemented with its top in on-chip registers, and SQW1 and SQW2 can be used for this purpose. But a more interesting way to use SQW1 and SQW2 is to implement an on-chip queue/pipe for IPC :-). A queue of windows is used as a pipe between two tasks, so time-consuming data copying between memory spaces is avoided. Again, only the head and tail windows (SQW1 & SQW2) are directly addressable.

- The Object Window (OW), the last directly addressable window, holding procedure-global, task-local/specific information, e.g. task-id, task control block (PCB), etc.
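
As an illustration of the Map Window bookkeeping described above (a 6-bit window-id plus a 26-bit return address per 32-bit register, 15 such registers per MW), here is a small Python sketch. Putting the window-id in the high 6 bits is my assumption, as are the names `pack_entry`, `unpack_entry`, and `MapWindow`; the article summary does not give the field layout, and spilling is not modelled.

```python
# Sketch only: field placement within the 32-bit word is an assumption.
WINDOW_ID_BITS = 6         # 64 windows in the on-chip pool
RETURN_ADDR_BITS = 26      # 6 + 26 = 32 bits, one register per entry
ENTRIES_PER_MAP_WINDOW = 15  # register 0 holds the links to previous MWs


def pack_entry(window_id, return_addr):
    """Pack a caller's window-id and return address into one 32-bit word."""
    if not 0 <= window_id < (1 << WINDOW_ID_BITS):
        raise ValueError("window-id must fit in 6 bits")
    if not 0 <= return_addr < (1 << RETURN_ADDR_BITS):
        raise ValueError("return address must fit in 26 bits")
    return (window_id << RETURN_ADDR_BITS) | return_addr


def unpack_entry(word):
    """Recover (window_id, return_addr) from a packed 32-bit word."""
    return word >> RETURN_ADDR_BITS, word & ((1 << RETURN_ADDR_BITS) - 1)


class MapWindow:
    """One Map Window: a link register plus up to 15 caller entries."""

    def __init__(self):
        self.links = 0      # stands in for the two 16-bit links in register 0
        self.entries = []   # packed words, most recent caller last

    def push(self, window_id, return_addr):
        """Record a caller on CALL; a full MW would need a new MW linked in."""
        if len(self.entries) == ENTRIES_PER_MAP_WINDOW:
            raise OverflowError("map window full")
        self.entries.append(pack_entry(window_id, return_addr))

    def pop(self):
        """Retrieve the most recent caller on RETURN."""
        return unpack_entry(self.entries.pop())
```

In this sketch a CALL is a `push` and a RETURN is a `pop`, which mirrors the point that only register accesses are needed as long as nothing has spilled.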

They have illustrated ways to use these windows to speed up task creation (with a cactus stack, something like Burroughs?), task switching, etc.

Well, well. Complicated, isn't it? It seems to me it could be a high-performance processor for real-time control. But I wonder whether OS designers for general-purpose computing would let users directly access MW, OW, and SQW1/2 for IPC queues. Well, they might have their own ideas for fixing things up.

For the fun of architecture adventure, I want to know whether we can use a cache to do all these tricks while making it cleaner.

I have tried to contact the authors at the email addresses given in the article, but got no response. Here are their addresses:

quammen@gmuvax2.gmu.edu
dtabak@gmuvax.gmu.edu
rmiller@gmuvax2.gmu.edu

Does anyone on the net know about their progress?

Chuck
-----