[net.arch] Stacks & Caches

nathan@orstcs.UUCP (03/17/84)
RISCs and Caches, again.

This distinction being made between registers and cache entries
is a bit specious.  On most existing architectures, of course,
the CPU "doesn't know about" the cache, so of course performance
optimizations are not available.  Assume now that we wish to design
a machine with the cache an integeral part of the machine.

The first observation is that there are real performance advantages to
splitting up the cache into an "instruction cache" and a "data cache".
This way instruction fetches don't displace data items, and vice versa.

A further refinement is to recognize that data accesses tend to be of 
two types: stack-related, and "other".  Stack related operations tend
to be very localized and are ideal candidates for caching.  Making the
data cache respond only to stack (and stack-relative) operations
might improve performance quite a bit.  On such microprocessors 
as the M68000 this information (stack vs. other) is not available 
outside the package, making this impossible; it may be partially 
feasible on a 16000.

The trick is that, instead of general-purpose registers, short stack/frame
offsets are used.  Perhaps the first 16 words of stack are accessible
with instructions no larger than the traditional register commands.  Words 
further down would use longer offsets as is done in current machines.
Since the machine expects a short-offset stack word to be in cache,
a simultaneous main-memory access is NOT begun unless the cache actually
"missses".

It seems to me that this scheme provides all the advantages of the
Berkeley RISC's overlapping register trick (you can if you WANT to, etc.)
alnng with the 9900's quick task switching.  Of course, things run a little
slower while the new section of cache fills, but you have the effect of
lots of registers but you NEVER save any off unnecessarily.
Tricks like "store only into cache" and the like could of course be
used, at the risk of the designer's sanity.....

The first and most obvious difficulty is the "write-through" problem,
which on a traditional cache would slow things to a crawl.  Various 
tricks are available: instruction variants are one; A FIFO with "write-
through" is another.  The latter would be a FIFO which is fed memory
address/data pairs to be written to memory.  Such writes would have 
lower priority than memory reads.  (presumably any word in the FIFO would 
also be in the cache; is that right?)  When a write is made to an address 
that is already in the FIFO, its contents are replaced and (perhaps) 
it loses its "place in line".  The Write FIFO shouldn't need to be
very large; certainly not over 32 words.  Similarly, the Stack cache 
would not need to be nearly as large as traditional caches; 64 or 128 
words would probably be plenty.  A larger cache makes for very quick
procedure calls, though.

I'll bet all this was tried by Burroughs 15 years ago, right?

                  ------------------------------------------------
 
                            from the vicious cycle of:
 
         >>----->>-------------( Nathan C. Myers )-----------------\
               /                                                    |
              |  ...!decvax!tektronix!ogcvax!hp-pcd!orstcs!nathan   |
              |           nathan.oregon-state@RAND-RELAY            |
               \___________________________________________________/