aglew@ccvaxa.UUCP (02/27/88)
The big thing about three-address instructions is that they let you take maximal advantage of a multiport register file. I.e., if you *HAVE* to say A=B+C, then A=B;A+=C makes you pay a penalty. And there is always hope that compilers will start using the 3-address instructions to avoid dependencies.

But if you don't use them, why pay for them? Why not have a decoded instruction cache that takes a compact representation and generates the canonical form? It doesn't have to be as fancy as CRISP - Patterson's group had a paper on this.
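To make the two-address penalty concrete, here is a minimal C sketch; the register names and mnemonics in the comments are made up for illustration, not taken from any particular ISA.

#include <stdio.h>

int main(void)
{
    int b = 3, c = 4;

    /* Three-address form: one instruction, B and C both survive.
         add  rA, rB, rC        ; A := B + C                          */
    int a = b + c;

    /* Two-address form: the destination is also a source, so when B
       must be preserved you pay for an extra register-to-register copy.
         mov  rA, rB            ; A := B   (the penalty)
         add  rA, rC            ; A := A + C                          */
    int a2 = b;
    a2 += c;

    printf("%d %d\n", a, a2);   /* both print 7; b and c are still live */
    return 0;
}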
oconnor@sunset.steinmetz (Dennis M. O'Connor) (03/01/88)
An article by aglew@ccvaxa.UUCP says:
] The big thing about three-address instructions is that they let
] you take maximal advantage of a multiport register file. I.e., if
] you *HAVE* to say A=B+C, then A=B;A+=C makes you pay a penalty.
] And there is always hope that compilers will start using the
] 3-address instructions to avoid dependencies.

Many operations in load-store machines are of the load-it, modify-it,
maybe modify-it-again, then maybe store-it variety. These types of
operations will never want three-address formats: the original value
(destroyed in two-address form) is never reused. Our research indicated
that this was the most common case. For these types of data, dependencies
can't be avoided.

] But, if you don't use them, why pay for them? Why not have a
] decoded instruction cache that takes a compact representation
] and generates the canonical form? It doesn't have to be as fancy
] as CRISP - Patterson's group had a paper on this.

It adds latency. In particular, it adds to the latency of a branch:
either you will have more post-branch slots to fill, or you will have a
more expensive cache-miss penalty. Branches seem to be about one-tenth
of all instructions.

Plus, of course, a classic RISC argument: couldn't you have found a
BETTER use for all that silicon? A bigger cache? More registers? An
on-chip FPU? UNIX-on-ROM :-)?

--
Dennis O'Connor    UUNET!steinmetz!sunset!oconnor
ARPA: OCONNORDM@ge-crd.arpa
(-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)
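A small C rendering of the load-it / modify-it / store-it case described above (an illustration only; the mnemonics in the comments are hypothetical, not from any particular ISA):

#include <stdio.h>

void scale(int *p, int n)
{
    for (int i = 0; i < n; i++) {
        /* Typical load-it / modify-it / modify-it-again / store-it body.
           Two-address operations lose nothing here, because each result
           overwrites a value that is never needed again:
             load  r1, p[i]     ; load it
             add   r1, r1       ; modify it        (r1 := 2*r1)
             add   r1, #1       ; modify it again  (old value is dead)
             store p[i], r1     ; store it                              */
        p[i] = p[i] * 2 + 1;
    }
}

int main(void)
{
    int a[4] = {1, 2, 3, 4};
    scale(a, 4);
    for (int i = 0; i < 4; i++)
        printf("%d ", a[i]);    /* prints: 3 5 7 9 */
    printf("\n");
    return 0;
}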
bcase@Apple.COM (Brian Case) (03/02/88)
In article <9728@steinmetz.steinmetz.UUCP> sunset!oconnor@steinmetz.UUCP writes:
>Many operations in load-store machines are of the load-it,
>modify-it, maybe modify-it-again, then maybe store-it variety.
>These types of operations will never want three-address formats:
>the original value (destroyed in two-address form) is never reused.
>Our research indicated that this was the most common case.
>For these types of data, dependencies can't be avoided.

In my experience, just the opposite is true; er, that is, the opposite of
"the original value is never reused" is true. Yes, it is true that many
operations are like "load-it, modify, store-it-back," but reuse is, to me,
one of the *MAIN* benefits of RISC architectures. Marty Hopkins said it
pretty well in some short papers. Lots of registers and three-address
operations facilitate reuse.

If having a three-address format reduces the instruction (cycle) count in
your inner loops from 10 to 9, you potentially have 10% better performance.
If the inner loops go from 5 to 4 instructions, it's even better.
Three-address instructions don't have to be used terribly frequently to be
very important.

>] But, if you don't use them, why pay for them? Why not have a
>] decoded instruction cache that takes a compact representation
>] and generates the canonical form? It doesn't have to be as fancy
>] as CRISP - Patterson's group had a paper on this.
>
>It adds latency. In particular, it adds to the latency of a branch:
>either you will have more post-branch slots to fill, or you will have a
>more expensive cache-miss penalty. Branches seem to be about one-tenth
>of all instructions.

This is the one big lose with decoded instruction caches. A smaller lose is
the size; in the same area, you could probably have had a 2x-size encoded
instruction cache. Depending upon the actual sizes involved, the 2x size
difference may not have much effect (rule of thumb: doubling the cache size
will halve the miss rate. Sort of.). An 8K instruction cache is probably not
much worse than a 16K instruction cache (depending on lots of things, of
course...).
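To put the reuse argument in concrete terms, here is a small C sketch (an illustration only, not Hopkins's example; the mnemonics and instruction counts in the comments are hypothetical). Each iteration needs a[i] and b[i] twice, so on a two-address machine one of the loaded values has to be copied before it is consumed.

#include <stdio.h>

void sum_diff(int *s, int *d, const int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++) {
        /* Three-address body, 6 instructions per iteration:
             load  ra, a[i]   ; load  rb, b[i]
             add   rs, ra, rb ; sub   rd, ra, rb
             store s[i], rs   ; store d[i], rd
           A two-address body needs an extra copy to keep ra alive for the
           subtract (7 instructions) - the sort of small inner-loop saving
           the post is talking about. */
        s[i] = a[i] + b[i];
        d[i] = a[i] - b[i];
    }
}

int main(void)
{
    int a[3] = {5, 7, 9}, b[3] = {1, 2, 3}, s[3], d[3];
    sum_diff(s, d, a, b, 3);
    for (int i = 0; i < 3; i++)
        printf("%d %d  ", s[i], d[i]);   /* prints: 6 4  9 5  12 6 */
    printf("\n");
    return 0;
}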