[comp.arch] Decoding in I-cache

cet1@cl.cam.ac.uk (C.E. Thompson) (05/05/90)

In article <26407B03.9044@paris.ics.uci.edu> baxter@ics.uci.edu (Ira Baxter) writes:
>
>If this were all there were to it, why couldn't one decode an instruction
>on fetch from main memory, and simply store the decoding information
>along with the PC value into the I-cache rather than storing the instruction
>there?  A cache with a 50nS hit, 300nS miss, 95% hit rate would deliver
>roughly 16 million *decoded* instructions per second to the CPU.
>
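The quoted figure checks out under two assumptions: the 300 nS is the total time to service a miss (the post does not say whether it is a penalty added on top of the 50 nS hit), and each cache access delivers one instruction:

$$\bar{t} = h\,t_{\mathrm{hit}} + (1-h)\,t_{\mathrm{miss}} = 0.95 \times 50\ \mathrm{ns} + 0.05 \times 300\ \mathrm{ns} = 62.5\ \mathrm{ns}$$

$$\frac{1}{62.5\ \mathrm{ns}} = 1.6 \times 10^{7}\ \text{instructions/s}$$

If the 300 nS is instead an extra penalty on top of the hit time, the average becomes 50 + 0.05 × 300 = 65 ns, or about 15.4 million instructions per second. That is still "roughly 16 million".
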
Of course, if you can't guarantee the instruction alignment (as you can 
on most RISCs and can't on most CISCs) then there is always the possibility 
that you might later read the same bytes back from the I-cache but grouped
into different instructions. It would be possible to do an OOPS---RESET THE
WORLD in this situation, admittedly.
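
Here is a toy Python sketch of the alignment problem. It uses a made-up variable-length encoding in which the low two bits of an instruction's first byte give its length minus one. The names (`insn_length`, `decode_boundaries`, `DecodedICache`) are hypothetical, and the cache holds a single line to keep the example short. The sketch shows the same six bytes being grouped into different instructions depending on the entry point. When a fetch arrives at an offset that is not a stored boundary, the cache throws away its stored decodings and starts over:

```python
from __future__ import annotations

from dataclasses import dataclass, field


def insn_length(first_byte: int) -> int:
    # Hypothetical variable-length encoding (illustration only): the low two
    # bits of an instruction's first byte hold (length - 1), so instructions
    # are 1 to 4 bytes long and may start at any byte offset.
    return (first_byte & 0b11) + 1


def decode_boundaries(code: bytes, start: int) -> list[tuple[int, int]]:
    """Group `code` into (offset, length) instructions, beginning at `start`."""
    out: list[tuple[int, int]] = []
    pc = start
    while pc < len(code):
        n = insn_length(code[pc])
        if pc + n > len(code):
            break  # would run past the end of the line; not modelled here
        out.append((pc, n))
        pc += n
    return out


@dataclass
class DecodedLine:
    raw: bytes
    entries: dict[int, int] = field(default_factory=dict)  # offset -> length


class DecodedICache:
    """Toy single-line decoded-instruction cache (hypothetical design).

    On a miss, instructions are decoded starting at the requested offset,
    and only the decodings are kept. If a later fetch targets an offset that
    is not a stored boundary (e.g. the middle of a stored instruction), the
    line's decodings are discarded and rebuilt: the "reset the world" case.
    """

    def __init__(self) -> None:
        self.line: DecodedLine | None = None
        self.resets = 0

    def fetch(self, code: bytes, offset: int) -> int:
        """Return the length of the instruction starting at `offset`."""
        if self.line is None or self.line.raw != code:
            self._fill(code, offset)  # ordinary miss
        elif offset not in self.line.entries:
            self.resets += 1  # offset is not a stored boundary: start over
            self._fill(code, offset)
        if offset not in self.line.entries:
            raise ValueError("instruction runs past the end of the line")
        return self.line.entries[offset]

    def _fill(self, code: bytes, offset: int) -> None:
        self.line = DecodedLine(code, dict(decode_boundaries(code, offset)))


code = bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x00])
print(decode_boundaries(code, 0))  # [(0, 2), (2, 1), (3, 1), (4, 1), (5, 1)]
print(decode_boundaries(code, 1))  # [(1, 4), (5, 1)]
```

In use, fetching offset 0 and then offset 2 hits on the stored decodings. A later jump to offset 1 lands inside the two-byte instruction at offset 0, which forces one reset.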

There is a throw-away remark in the Grohoski-Kahle-Thatcher-Moore paper on  
the `Branch and Fixed-Point Instruction Execution Units' in the IBM collection
(SA23-2619) on the RS/6000 that `as instructions return from the memory 
system {to the I-cache} they are predecoded into eight general operating    
code classes'. Does anyone know any more about this? It doesn't seem like   
enough extra information to speed up subsequent decoding unless the division
into eight categories is an awfully complicated Boolean function of the     
instruction bits.
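
For concreteness, here is a hypothetical Python sketch of what predecoding into eight classes could look like. As each line is written into the cache, a 3-bit tag is computed from the top six bits (the primary opcode) of each 32-bit word. The opcode-to-class grouping is an assumption loosely modelled on the POWER/PowerPC primary-opcode map. It is not the table the paper describes:

```python
from __future__ import annotations

from enum import IntEnum


class OpClass(IntEnum):
    # Eight classes fit in a 3-bit tag stored beside each 32-bit word.
    # Hypothetical grouping keyed on POWER/PowerPC primary opcodes; this is
    # NOT IBM's actual predecode table.
    FX_OTHER = 0      # fixed-point immediates and anything not listed below
    BRANCH = 1        # 16 (conditional branch), 18 (unconditional branch)
    BRANCH_CR = 2     # 19: branch-to-register and condition-register ops
    FX_XFORM = 3      # 31: register-register ops, indexed loads/stores
    FX_LOADSTORE = 4  # 32-47: fixed-point loads/stores with displacement
    FP_LOADSTORE = 5  # 48-55: floating-point loads/stores
    FP_ARITH = 6      # 59 (PowerPC single precision), 63
    TRAP_SVC = 7      # 3 (trap immediate), 17 (supervisor/system call)


def primary_opcode(word: int) -> int:
    """Primary opcode: the most significant 6 bits of a 32-bit instruction."""
    return (word >> 26) & 0x3F


def classify(word: int) -> OpClass:
    op = primary_opcode(word)
    if op in (16, 18):
        return OpClass.BRANCH
    if op == 19:
        return OpClass.BRANCH_CR
    if op == 31:
        return OpClass.FX_XFORM
    if 32 <= op <= 47:
        return OpClass.FX_LOADSTORE
    if 48 <= op <= 55:
        return OpClass.FP_LOADSTORE
    if op in (59, 63):
        return OpClass.FP_ARITH
    if op in (3, 17):
        return OpClass.TRAP_SVC
    return OpClass.FX_OTHER


def predecode_line(words: list[int]) -> list[tuple[int, OpClass]]:
    """Tag each word of a line as it is written into the I-cache."""
    return [(w, classify(w)) for w in words]


def branch_slots(line: list[tuple[int, OpClass]]) -> list[int]:
    """Slots tagged as branch or condition-register ops, found from tags alone."""
    return [i for i, (_, cls) in enumerate(line)
            if cls in (OpClass.BRANCH, OpClass.BRANCH_CR)]
```

In this sketch the class depends on only six bits, so computing it is a small table lookup rather than an awfully complicated Boolean function. That rather supports the doubt above. Under these assumptions, one plausible payoff is that a dispatcher can find the branches in a line by scanning the 3-bit tags, as `branch_slots` does, without fully decoding every word.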

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

andy@Gang-of-Four.Stanford.EDU (Andy Freeman) (05/08/90)

In article <1872@gannet.cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:
>There is a throw-away remark in the Grohoski-Kahle-Thatcher-Moore paper on  
>the `Branch and Fixed-Point Instruction Execution Units' in the IBM collection
>(SA23-2619) on the RS/6000 that `as instructions return from the memory 
>system {to the I-cache} they are predecoded into eight general operating    
>code classes'. Does anyone know any more about this? It doesn't seem like   
>enough extra information to speed up subsequent decoding unless the division
>into eight categories is an awfully complicated Boolean function of the     
>instruction bits.

There are two reasons to "decode".  The first is to put things into a
more useful form.  The second is to put things into a more useful
place, or maybe even several more useful places.  (VLIW/LIW machines
can have a separate I-cache for each functional-unit cluster, so that
the only thing they have to ship around is the PC.  One can save a bit
of hardware by using a single tag/match unit for all of the caches.)
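
The second reason can be sketched as a hypothetical direct-mapped I-cache in Python. Each functional-unit cluster keeps its own data array, and all of the arrays share one tag array, so a fetch needs only the PC and a single tag compare. The class name, the sizes, and the string placeholders for operations are assumptions for illustration. The PC is taken to count long instructions, one per cache entry:

```python
from __future__ import annotations

from typing import Optional, Sequence


class SharedTagICache:
    """Hypothetical direct-mapped I-cache for an LIW/VLIW machine.

    Each functional-unit cluster has its own data array holding only its
    slice of every long instruction; a single tag array serves them all.
    A fetch broadcasts just the PC: one tag compare decides hit or miss,
    and on a hit every cluster reads its own array at the same index.
    """

    def __init__(self, n_sets: int, n_clusters: int) -> None:
        self.n_sets = n_sets
        self.tags: list[Optional[int]] = [None] * n_sets
        self.data: list[list[Optional[str]]] = [
            [None] * n_sets for _ in range(n_clusters)
        ]

    def _split(self, pc: int) -> tuple[int, int]:
        return pc % self.n_sets, pc // self.n_sets  # (index, tag)

    def lookup(self, pc: int) -> Optional[list[Optional[str]]]:
        index, tag = self._split(pc)
        if self.tags[index] != tag:  # the single shared tag/match
            return None              # miss for every cluster at once
        return [array[index] for array in self.data]

    def fill(self, pc: int, long_insn: Sequence[str]) -> None:
        if len(long_insn) != len(self.data):
            raise ValueError("need exactly one operation per cluster")
        index, tag = self._split(pc)
        self.tags[index] = tag
        for array, op in zip(self.data, long_insn):
            array[index] = op  # slice i goes into cluster i's array
```

In this sketch, hit or miss is decided once for all clusters. After that, each cluster only needs the index derived from the PC to read its own slice.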

-andy
--
UUCP:    {arpa gateways, sun, decwrl, uunet, rutgers}!neon.stanford.edu!andy
ARPA:    andy@neon.stanford.edu
BELLNET: (415) 723-3088