lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (05/04/90)
In article <26407B03.9044@paris.ics.uci.edu> baxter@ics.uci.edu (Ira Baxter) writes:
>... why couldn't one decode an instruction
>on fetch from main memory, and simply store the decoding information
>along with the PC value into the I-cache rather than storing the instruction
>there? A cache with a 50nS hit, 300nS miss, 95% hit rate would deliver
>roughly 16 million *decoded* instructions per second to the CPU.

AT&T's CRISP machine worked this way. There are three important points:

- loops get a very high hit rate.

- branching can sometimes be overlapped. The trick is to have branching
  fields in each I-cache word. Branch instructions are written into the
  I-cache entry of the preceding instruction. A branch decision, made far
  enough ahead, can be represented in the cache by zapping the fields
  (converting the branch from Conditional to Absolute).

- this is essentially superscalar processing, with cache-style buffering
  between the decode and execute stages.

That decoupling causes problems, because many decisions have to be
deferred. For example, the decoder can't fetch values from registers,
because the values may not exist yet. So the decoder can't eliminate the
difference between Register and Immediate operands. Similarly, the
decoder can't initiate memory operations (except to immediate addresses).
This means that the "decoding" doesn't reduce the number of instruction
flavors, unless you require some values to be invariant. For example, one
could make the stack pointer changeable only during call/return, in which
case the decoder can turn SP-relative addresses into absolute addresses.
-- 
Don		D.C.Lindsay	Carnegie Mellon Computer Science
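
The quoted throughput figure follows from a simple weighted-average model;
a quick sketch checks it (the function names are mine, not from the post):

```python
# Back-of-the-envelope check of the quoted cache figures:
# 50 ns hit, 300 ns miss, 95% hit rate.

def effective_access_ns(hit_ns, miss_ns, hit_rate):
    """Average access time under a simple hit/miss model."""
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns

def decoded_mips(hit_ns, miss_ns, hit_rate):
    """Millions of decoded instructions per second the cache can feed."""
    return 1000.0 / effective_access_ns(hit_ns, miss_ns, hit_rate)

avg = effective_access_ns(50, 300, 0.95)   # 62.5 ns on average
rate = decoded_mips(50, 300, 0.95)         # roughly 16 million per second
```

So 0.95*50 + 0.05*300 = 62.5 ns per fetch, i.e. about 16 million decoded
instructions per second, matching the article's estimate.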
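
The branch-folding point can be sketched as a toy model: each decoded
cache entry carries next-PC fields, a following branch is folded into the
preceding instruction's entry, and an early branch decision "zaps" the
entry from Conditional to Absolute. All structure and field names here
are illustrative, not the actual CRISP cache format:

```python
# Toy model of CRISP-style branch folding in a decoded I-cache.
# Entry layout and names are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class DecodedEntry:
    op: str                 # the decoded operation itself
    next_taken: int         # next PC if the folded branch is taken
    next_not_taken: int     # next PC if it is not taken
    conditional: bool       # False once the decision has been zapped in

    def next_pc(self, condition):
        if not self.conditional:
            return self.next_taken  # Absolute: only one successor
        return self.next_taken if condition else self.next_not_taken

def fold_branch(entry, target, fallthrough):
    """Write a following branch into the preceding instruction's entry."""
    entry.next_taken = target
    entry.next_not_taken = fallthrough
    entry.conditional = True

def zap(entry, taken, target, fallthrough):
    """A decision made far enough ahead converts Conditional to Absolute."""
    entry.next_taken = target if taken else fallthrough
    entry.conditional = False
```

Once an entry has been zapped, the execute stage follows the branch with
no decision left to make, which is how the branch gets overlapped.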
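
The invariant-SP example amounts to a decode-time rewrite; a minimal
sketch, assuming SP may only change at call/return (the operand encoding
here is made up for illustration):

```python
# If the stack pointer is invariant between call and return, the decoder
# knows its value and can resolve SP-relative operands when it fills the
# cache, leaving one fewer operand flavor for the execute stage.
def decode_operand(kind, value, sp):
    if kind == "sp_relative":
        return ("absolute", sp + value)   # resolved at decode time
    return (kind, value)                  # deferred to execute time
```

Register operands get no such treatment: their values may not exist yet
when the decoder runs, which is exactly the deferral problem above.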