nathan@orstcs.UUCP (03/17/84)
RISCs and Caches, again.

The distinction being made between registers and cache entries is a bit specious. On most existing architectures, of course, the CPU "doesn't know about" the cache, so those performance optimizations are not available. Assume now that we wish to design a machine with the cache an integral part of the machine.

The first observation is that there are real performance advantages to splitting the cache into an "instruction cache" and a "data cache". That way instruction fetches don't displace data items, and vice versa. A further refinement is to recognize that data accesses tend to be of two types: stack-related, and "other". Stack-related accesses tend to be very localized and are ideal candidates for caching. Making the data cache respond only to stack (and stack-relative) operations might improve performance quite a bit. On such microprocessors as the M68000 this information (stack vs. other) is not available outside the package, making this impossible; it may be partially feasible on a 16000.

The trick is that, instead of general-purpose registers, short stack/frame offsets are used. Perhaps the first 16 words of the stack are accessible with instructions no larger than the traditional register instructions. Words further down would use longer offsets, as is done in current machines. Since the machine expects a short-offset stack word to be in the cache, a simultaneous main-memory access is NOT begun unless the cache actually "misses".

It seems to me that this scheme provides all the advantages of the Berkeley RISC's overlapping-register trick (you can if you WANT to, etc.) along with the 9900's quick task switching. Of course, things run a little slower while the new section of cache fills, but you get the effect of lots of registers and you NEVER save any off unnecessarily. Tricks like "store only into cache" and the like could of course be used, at the risk of the designer's sanity.....
The first and most obvious difficulty is the "write-through" problem, which on a traditional cache would slow things to a crawl. Various tricks are available: instruction variants are one; a FIFO with "write-through" is another. The latter would be a FIFO which is fed memory address/data pairs to be written to memory. Such writes would have lower priority than memory reads. (Presumably any word in the FIFO would also be in the cache; is that right?) When a write is made to an address that is already in the FIFO, that entry's data is replaced and (perhaps) it loses its "place in line". The write FIFO shouldn't need to be very large; certainly not over 32 words.

Similarly, the stack cache would not need to be nearly as large as traditional caches; 64 or 128 words would probably be plenty. A larger cache makes for very quick procedure calls, though.

I'll bet all this was tried by Burroughs 15 years ago, right?
------------------------------------------------
from the vicious cycle of:
>>----->>-------------( Nathan C. Myers )-----------------\
 |                                                        |
 |   ...!decvax!tektronix!ogcvax!hp-pcd!orstcs!nathan     |
 |   nathan.oregon-state@RAND-RELAY                       |
 \________________________________________________________/