ejensen@gorby.Sun.COM (Eric Jensen) (11/30/88)
Summary of what follows:
1) For floating point programs, setting condition codes does not
   impact performance.
2) For very branch intensive integer programs the difference between
   SPARC and MIPS branch architectures might be as much as 3%.

In article <8768@wright.mips.COM> earl@wright.mips.com (Earl Killian) writes:
>In article <78977@sun.uucp>, ejensen@gorby (Eric Jensen) writes:
>>In article <8523@wright.mips.COM> earl@wright.mips.com (Earl Killian) writes:
>>>Condition codes are not only harmful for translation, but for
>>>performance. 15% of instructions are conditional branches. If you
>>>take two instructions instead of one to do this simple operation (set
>>>the condition codes and then branch on them), you've just forced your
>>>computer to execute 15% more instructions.
>>
>>This is nonsense. From "MIPS R2000 RISC Architecture" by Gerry
>>Kane, pp. C-4 and C-5, I quote:
>
>Nonsense? More like a carefully chosen design. The credit goes to
>the original Stanford MIPS crowd. When they designed their original

I am told by informed sources that Hennessy freely gives credit to
Manolis Katevenis (Berkeley) for this idea (thesis sections 6.3.3 -
6.3.5).

>MIPS chip they noticed that some branch conditions are much more
>common than others, and that this could be exploited. A < B, A <= B,
>A > B, A >= B (for B != 0) are not as common as you might first guess

What I termed "nonsense" was the reasoning:

    SINCE 15% of instructions are conditional branches
    THEN  cc machines execute 15% more instructions to set the ccs

I think I interpreted this correctly as you appear to have altered the
subject line in your reply :-)

> [description of how optimizers can transform loop control relationals
> with > or < in them into ones using == and !=]

Thank you for the info. But are you implying that set-less-than and
load-immediate (for branches) are mostly used in loop headers? I don't
believe that. Please show data.
>As for condition codes being set as a by-product of other operations,
>can you cite any SPARC statistics? My guess is that this is
>insignificant for SPARC (but I don't have one to measure it -- sorry).

So some statistics... First, the 15% bicc number is highly variable
and, for the following set of programs, unrepresentative.

[ Some definitions..
  bicc  -> branch on integer condition codes
  fbfcc -> branch on floating-point condition codes
  setcc -> any instruction that sets the integer condition codes ]

   instructions
%bicc   %fbfcc   program
-----   ------   -------
16.6%            GNU CC compiled with Sun cc -O4, compiling gcc.c
                 already processed by cpp, target machine is 68k
11.6%            GNU Chess compiled with Sun cc -O4, playing itself
                 on level 2
 5.4%     .8%    spice2g6 compiled f77 -O2 digsr input
 3.2%    2.2%    doduc compiled f77 -O3
 2.3%    1.7%    simple compiled f77 -O3
           ^
           MIPS & SPARC use condition codes for their floating-point units

Each bicc is preceded by a setcc. A percentage of these setccs, in
addition to setting the condition codes, compute results that are used
later by the program. The following table is a close approximation, as
this info is not directly computed by our current tools.

program      % of setccs (for biccs) that compute results
-------      ----
GNU CC       10%
GNU Chess     3%
spice2g6     50%
doduc        60%
simple       40%

So for spice, doduc and simple, roughly half of the conditional
branches are preceded by setccs that only set the condition codes. How
well does the MIPS compiler avoid preceding branches with either
load-immediate or set-less-than for these programs? Even if it were
perfect (which is doubtful), we are only talking about 1 - 3% more
SPARC instructions for this machine characteristic - NOT 15%. Also,
1 - 3% more setccs would NOT translate into 1 - 3% more cycles, as both
architectures are dominated more by floating-point interlocks and
floating-point compare&branch sequences.

For branch-intensive programs like GNU CC, getting rid of setcc
instructions does not always improve performance.
GNU CC has many instruction sequences that look like
[bicc,setcc,bicc] where the setcc is in the executed delay slot of the
preceding branch (bicc). After accounting for this phenomenon and
setccs that compute a value, GNU CC spends 12% (not 16.6%) of its
instructions just setting the condition codes. Most of GNU Chess's
setccs just set the condition codes. Again, these numbers are NOT 15%.
What percentage of MIPS branches are not preceded by a load-immediate
or set-less-than for similar GNU CC and GNU Chess runs?

Most of the cycles (0 - ?%) that SPARC may lose to additional setcc
instructions it appears to gain back with annulled branches. I believe
it is not uncommon for the MIPS compiler to fail to fill 60% of the
branch delay slots. In SPARC these slots can be filled using annulled
branches (the SPARC compiler tries to use delayed branches before
resorting to annulled branches). For GNU CC, annulled branches account
for 60% of all conditional branches. Of this 60%, 60% are taken. So..

    16.6% * 60% * 60% = 6%

So the "ceiling" is 6% (12% from setccs - 6% from annulled branches =
6%) on any advantage the MIPS branch architecture has over the SPARC
branch architecture on a "worst case" integer program (the "ceiling"
for GNU Chess is about 3.5%).

But the MIPS compiler is going to generate some load-immediate (li) or
set-less-than (slt) instructions before branches. If the MIPS compiler
generated half as many li&slt instructions as the SPARC compiler
generates setcc instructions, the performance would be equal. If the
MIPS compiler generated a quarter as many li&slt instructions as the
SPARC compiler generates setcc instructions, the MIPS architecture
would have a 3% instruction advantage. I'd expect this to be the most
likely case.

Eric H. Jensen
ejensen@sun.com
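
As a back-of-the-envelope check, the percentages quoted in this post can be run through a short Python sketch. The function names and the simple model (SPARC net cost = cc-only setccs minus annulled-branch gain; MIPS cost = some fraction of the SPARC cc-only setcc count spent on li/slt) are my own reading of the argument and are hypothetical, not output from any Sun or MIPS tool. The figures are instruction-count percentages, not cycles.

```python
# Rough model of the instruction-count argument in the post.
# All names here are illustrative assumptions, not part of any real tool.

def fp_extra_setcc_pct(bicc_pct, cc_only_fraction=0.5):
    """Percent of instructions that are setccs doing nothing but set
    the condition codes, given the %bicc figure and the (approximate)
    fraction of setccs that compute no other result."""
    return bicc_pct * cc_only_fraction


def annulled_gain_pct(bicc_pct, annulled_fraction, taken_fraction):
    """Percent of instructions credited back to SPARC via annulled
    branches whose delay slot does useful work (taken case)."""
    return bicc_pct * annulled_fraction * taken_fraction


def mips_advantage_pct(cc_only_setcc_pct, annulled_gain, li_slt_ratio):
    """MIPS instruction advantage over SPARC, in percent.

    li_slt_ratio is the assumed number of MIPS li/slt instructions per
    SPARC cc-only setcc (0.0 would be a 'perfect' MIPS compiler)."""
    sparc_net_cost = cc_only_setcc_pct - annulled_gain
    mips_cost = cc_only_setcc_pct * li_slt_ratio
    return sparc_net_cost - mips_cost


if __name__ == "__main__":
    # Floating-point programs: %bicc figures from the first table.
    for name, bicc in [("spice2g6", 5.4), ("doduc", 3.2), ("simple", 2.3)]:
        print(name, fp_extra_setcc_pct(bicc))

    # GNU CC: 16.6% bicc, 60% annulled, 60% of those taken, 12% cc-only.
    gain = annulled_gain_pct(16.6, 0.60, 0.60)
    for ratio in (0.0, 0.25, 0.5):
        print(ratio, mips_advantage_pct(12.0, gain, ratio))
```

With li_slt_ratio set to 0.0 the function gives the "ceiling"; 0.5 and 0.25 correspond to the "half as many" and "quarter as many" cases discussed above.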