[comp.arch] Decoded I-Cache

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (05/04/90)

In article <26407B03.9044@paris.ics.uci.edu> baxter@ics.uci.edu 
	(Ira Baxter) writes:
>... why couldn't one decode an instruction
>on fetch from main memory, and simply store the decoding information
>along with the PC value into the I-cache rather than storing the instruction
>there?  A cache with a 50nS hit, 300nS miss, 95% hit rate would deliver
>roughly 16 million *decoded* instructions per second to the CPU.
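A quick check of the quoted arithmetic, assuming the usual
single-level blocking-cache model (average fetch time = hit rate *
hit time + miss rate * miss time):

    #include <stdio.h>

    int main(void)
    {
        /* 0.95 * 50ns + 0.05 * 300ns = 62.5ns per decoded fetch */
        double t_eff = 0.95 * 50e-9 + 0.05 * 300e-9;
        printf("%.1f ns/fetch, %.1f MIPS\n", t_eff * 1e9, 1e-6 / t_eff);
        return 0;
    }

which prints 62.5 ns/fetch and 16.0 MIPS, so the quoted figure holds.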

AT&T's CRISP machine worked this way. There are three important
points:

- loops get a very high hit rate.

- branching can sometimes be overlapped. The trick is to have
  branching fields in each I-cache word. Branch instructions are
  written into the I-cache entry of the preceding instruction.  A branch
  decision, made far enough ahead, can be represented in the cache by
  zapping those fields (converting the branch from Conditional to
  Absolute).  A sketch of such a cache word follows this list.

- this is essentially superscalar processing, with cache-style
  buffering between the decode and execute stages. That decoupling
  causes problems, because many decisions have to be deferred.  For
  example, the decoder can't fetch values from registers, because the
  values may not exist yet. So, the decoder can't eliminate the
  difference between Register and Immediate operands. Similarly, the
  decoder can't initiate memory operations (except to immediate
  addresses). This means that the "decoding" doesn't reduce the number
  of operand flavors the execute stage must handle, unless you require
  some values to be invariant. For example, one could make the stack
  pointer changeable only during call/return, in which case the
  decoder can turn SP-relative addresses into absolute addresses.
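
To make the last two points concrete, here is a rough sketch (in C)
of what one decoded I-cache word might hold.  The field names and
layout are my own invention for illustration, not CRISP's actual
format; the point is that the branch rides along in the preceding
instruction's entry, and that register operands stay symbolic while
SP-relative ones can be pre-resolved to absolute addresses.

    #include <stdint.h>

    /* Operand as the execute stage sees it.  The decoder can't read
       the register file (the value may not exist yet), so OP_REG
       keeps the register number.  An SP-relative operand can be
       rewritten to OP_ABS only under the rule that SP changes solely
       at call/return. */
    enum operand_kind { OP_REG, OP_IMM, OP_ABS };

    struct operand {
        enum operand_kind kind;
        uint32_t value;              /* reg number, immediate, or address */
    };

    /* One decoded I-cache word, tagged by PC.  The branch that follows
       the instruction is folded into this entry; once the branch
       decision is known far enough ahead, "zapping" the cond field to
       BR_ALWAYS or BR_NEVER makes the branch unconditional in the
       cache. */
    enum branch_cond { BR_NEVER, BR_COND, BR_ALWAYS };

    struct decoded_word {
        uint32_t         tag_pc;       /* PC this entry was decoded from */
        uint8_t          opcode;       /* pre-decoded operation          */
        struct operand   src1, src2, dst;
        enum branch_cond cond;         /* folded from the next insn      */
        uint32_t         taken_pc;     /* where to fetch if taken        */
        uint32_t         fallthru_pc;  /* where to fetch otherwise       */
    };

With an entry like that, the execute unit never sees a separate branch
instruction at all; resolving a conditional early is just a write to
the cond field of an entry already sitting in the cache.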

-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science