[comp.arch] 16 & 32 bit vs 32 bit only instructions

aglew@ccvaxa.UUCP (02/27/88)

The big thing about three-address instructions is that
they let you take maximal advantage of a multiport register
file. I.e., if you *HAVE* to say A=B+C, then A=B;A+=C makes you
pay a penalty. And there is always hope that compilers will
start using the 3-address instructions to avoid dependencies.
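A rough sketch of that copy penalty (register names and the instruction sequences are illustrative, not any real ISA):

```python
# Computing a = b + c when b and c must both stay live afterward.
# Three-address: one instruction.  Two-address: an extra copy.

def three_address(a, b, c, regs):
    """ADD a, b, c -- one instruction, no operand destroyed."""
    regs[a] = regs[b] + regs[c]
    return 1  # instruction count

def two_address(a, b, c, regs):
    """MOV a, b ; ADD a, c -- the copy is the penalty."""
    regs[a] = regs[b]             # MOV a, b
    regs[a] = regs[a] + regs[c]   # ADD a, c
    return 2  # instruction count

r3 = {"a": 0, "b": 3, "c": 4}
r2 = {"a": 0, "b": 3, "c": 4}
assert three_address("a", "b", "c", r3) == 1
assert two_address("a", "b", "c", r2) == 2
assert r3["a"] == r2["a"] == 7  # same result, different cost
```

Both forms produce the same value; the two-address form simply spends one more instruction to preserve B.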

But, if you don't use them, why pay for them? Why not have a
decoded instruction cache that takes a compact representation
and generates the canonical form? It doesn't have to be as fancy
as CRISP - Patterson's group had a paper on this.

oconnor@sunset.steinmetz (Dennis M. O'Connor) (03/01/88)

An article by aglew@ccvaxa.UUCP says:
] 
] The big thing about three-address instructions is that
] they let you take maximal advantage of a multiport register
] file. I.e., if you *HAVE* to say A=B+C, then A=B;A+=C makes you
] pay a penalty. And there is always hope that compilers will
] start using the 3-address instructions to avoid dependencies.

Many operations in load-store machines are of the load-it,
modify-it, maybe modify-it-again, then maybe store-it variety. These
types of operations will never want three-address formats.
The original (destroyed in two-address) value is never reused.
Our research indicated that this was the most common case.
For these types of data, dependencies can't be avoided.
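The chain described above can be sketched as follows (the particular operations are made up; the point is that each step consumes the previous result, so the steps serialize regardless of address format):

```python
# A load / modify / modify-again / store chain: every result
# overwrites its only input, so a destructive two-address encoding
# loses nothing, and the serial dependency can't be broken.

def run_chain(mem, addr):
    x = mem[addr]      # load-it
    x = x + 4          # modify-it        (two-address: ADD x, #4)
    x = x * 2          # modify-it-again  (two-address: SHL x, #1)
    mem[addr] = x      # store-it
    return mem

mem = run_chain({0: 10}, 0)
assert mem[0] == 28  # (10 + 4) * 2
```

Each instruction here needs the value produced by the one before it, which is the dependency O'Connor says can't be avoided.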

] But, if you don't use them, why pay for them? Why not have a
] decoded instruction cache that takes a compact representation
] and generates the canonical form? It doesn't have to be as fancy
] as CRISP - Patterson's group had a paper on this.

It adds latency - especially to the latency of a branch. Either
you will have more post-branch slots to fill, or you will have a more
expensive cache-miss penalty. Branches seem to be about one-tenth
of all instructions.
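A back-of-envelope version of this cost, using the ~10% branch frequency cited above (the slot counts and fill rates are assumptions for illustration):

```python
# Average CPI added by branch delay slots that the compiler
# fails to fill.  All parameters are illustrative.

def cpi_penalty(branch_freq, extra_slots, fill_rate):
    """Extra cycles per instruction: each branch exposes
    'extra_slots' delay slots, of which 'fill_rate' get filled."""
    return branch_freq * extra_slots * (1.0 - fill_rate)

# One unfillable extra slot per branch costs 0.1 CPI on average.
assert abs(cpi_penalty(0.10, 1, 0.0) - 0.10) < 1e-12
# Two extra slots, half filled: same 0.1 CPI cost.
assert abs(cpi_penalty(0.10, 2, 0.5) - 0.10) < 1e-12
```

So with branches at one instruction in ten, even one extra unfilled slot per branch shows up directly as ~0.1 CPI.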

Plus, of course, a classic RISC argument : couldn't you have found a
BETTER use for all that silicon ? A bigger cache ? More registers ? An
on-chip FPU ? UNIX-on-ROM :-) ?

--
    Dennis O'Connor			      UUNET!steinmetz!sunset!oconnor
		   ARPA: OCONNORDM@ge-crd.arpa
   (-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)

bcase@Apple.COM (Brian Case) (03/02/88)

In article <9728@steinmetz.steinmetz.UUCP> sunset!oconnor@steinmetz.UUCP writes:
>Many operations in load-store machines are of the load-it,
>modify-it, maybe modify-it-again, then maybe store-it variety. These
>types of operations will never want three-address formats.
>The original (destroyed in two-address) value is never reused.
>Our research indicated that this was the most common case.
>For these types of data, dependencies can't be avoided.

In my experience, just the opposite is true; er, that is, the opposite of
"The original value is never reused" is true.  Yes, it is true that many
operations are like "load-it, modify, store-it-back" but reuse is, to me,
one of the *MAIN* benefits of RISC architectures.  Marty Hopkins said it
pretty well in some short papers.  Lots of registers and three-address
operations facilitate reuse.  If having a three address format reduces
the instruction (cycle) count in your inner loops from 10 to 9, you
potentially have 10% better performance.  If the inner loops go from 5
to 4 instructions, it's even better.  Three address instructions don't
have to be terribly frequently used to be very important.
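The arithmetic behind those figures: shrinking an inner loop's instruction (cycle) count gives a proportional gain in loop performance.

```python
# Cycle-count reduction vs. speedup for a shrunken inner loop.

def cycle_reduction(before, after):
    """Fraction of the loop's cycles removed."""
    return (before - after) / before

def speedup(before, after):
    """Classic speedup: old time over new time."""
    return before / after

assert abs(cycle_reduction(10, 9) - 0.10) < 1e-12  # 10% fewer cycles
assert abs(cycle_reduction(5, 4) - 0.20) < 1e-12   # 20% fewer: even better
assert speedup(5, 4) == 1.25                       # i.e. 1.25x on that loop
```

The shorter the loop, the bigger the payoff from removing a single instruction, which is Case's point.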

>] But, if you don't use them, why pay for them? Why not have a
>] decoded instruction cache that takes a compact representation
>] and generates the canonical form? It doesn't have to be as fancy
>] as CRISP - Patterson's group had a paper on this.
>
>It adds latency. Especially it adds to the latency of a branch. Either
>you will have more post-branch slots to fill, or you will have a more
>expensive cache-miss penalty. Branches seem to be about one-tenth
>of all instructions.

This is the one big lose with decoded instruction caches.  A smaller lose
is the size; in the same area, you could have had probably a 2x size
encoded instruction cache.  Depending upon the actual sizes involved, the
2x size difference may not have much effect (rule of thumb:  double the
cache size will halve the miss rate.  Sort of.).  An 8K instruction cache
is probably not much worse than a 16K instruction cache (depending on
lots of things, of course...).
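The rule of thumb can be put in numbers with the standard average-access-time formula; the hit time, miss rates, and miss penalty below are assumed values, only the halving follows the rule cited above.

```python
# Average memory access time for two cache sizes, applying the
# "double the size, halve the miss rate" rule of thumb.

def avg_access_time(hit_time, miss_rate, miss_penalty):
    """Cycles per access: hits plus the weighted cost of misses."""
    return hit_time + miss_rate * miss_penalty

t_8k  = avg_access_time(1.0, 0.04, 20.0)  # assumed 4% miss rate at 8K
t_16k = avg_access_time(1.0, 0.02, 20.0)  # halved at 16K, per the rule
assert abs(t_8k - 1.8) < 1e-12
assert abs(t_16k - 1.4) < 1e-12
```

With misses already rare, doubling the cache buys a modest improvement, which is why the 2x density advantage of an encoded cache "may not have much effect".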