[comp.arch] Some statistics for Re: CC machines do not execute 15% more instructions

ejensen@gorby.Sun.COM (Eric Jensen) (11/30/88)

Summary of what follows:

	1) For floating point programs, setting condition codes does
	not impact performance.

	2) For very branch intensive integer programs the difference
	between SPARC and MIPS branch architectures might be as much
	as 3%.

In article <8768@wright.mips.COM> earl@wright.mips.com (Earl Killian) writes:
>In article <78977@sun.uucp>, ejensen@gorby (Eric Jensen) writes:
>>In article <8523@wright.mips.COM> earl@wright.mips.com (Earl Killian) writes:
>>>Condition codes are not only harmful for translation, but for
>>>performance.  15% of instructions are conditional branches.  If you
>>>take two instructions instead of one to do this simple operation (set
>>>the condition codes and then branch on them), you've just forced your
>>>computer to execute 15% more instructions.
>>
>>This is nonsense.  From  "MIPS R2000 RISC Architecture" by Gerry
>>Kane, pp. C-4 and C-5, I quote:
>
>Nonsense?  More like a carefully chosen design.  The credit goes to
>the original Stanford MIPS crowd.  When they designed their original

I am told by informed sources that Hennessy freely gives credit to
Manolis Katevenis (Berkeley) for this idea (thesis sections 6.3.3 -
6.3.5).

>MIPS chip they noticed that some branch conditions are much more
>common than others, and that this could be exploited.  A < B, A <= B,
>A > B, A >= B (for B != 0) are not as common as you might first guess

What I termed "nonsense" was the reasoning:

	SINCE 15% of instructions are conditional branches
	THEN cc machines execute 15% more instructions to set the ccs

I think I interpreted this correctly, since you appear to have altered
the subject line in your reply :-)

> [description of how optimizers can transform loop control relationals
>  with > or < in them into ones using == and !=]

Thank you for the info.  But are you implying that set-less-than and
load-immediate (for branches) are mostly used in loop headers?  I
don't believe that. Please show data.

>As for condition codes being set as a by-product of other operations,
>can you cite any SPARC statistics?  My guess is that this is
>insignificant for SPARC (but I don't have one to measure it -- sorry).

So some statistics...

First, the 15% bicc number is highly variable and, for the following
set of programs, unrepresentative.

[ Some definitions..
  bicc  -> branch on integer condition codes
  fbfcc -> branch on floating point condition codes
  setcc -> any instruction that sets the integer condition codes ]

 instructions
%bicc	%fbfcc	program
-----	------	-------
16.6%		GNU CC compiled with Sun cc -O4, compiling gcc.c already
		processed by cpp, target machine is 68k
11.6%		GNU Chess compiled with Sun cc -O4, playing itself on level 2
 5.4%	0.8%	spice2g6 compiled with f77 -O2, digsr input
 3.2%	2.2%	doduc compiled with f77 -O3
 2.3%	1.7%	simple compiled with f77 -O3
	^
	MIPS & SPARC use condition codes for their floating-point units

Each bicc is preceded by a setcc.  A percentage of these setccs, in
addition to setting the condition codes, compute results that are used
later by the program.  The figures in the following table are
approximate, as this info is not directly computed by our current tools.

program		% of setccs (for biccs) that compute results
------		----
GNU CC		10%
GNU Chess	 3%
spice2g6	50%
doduc		60%
simple		40%

So for spice, doduc and simple, roughly half of the conditional
branches are preceded by setccs that only set the condition codes.
How well does the MIPS compiler avoid preceding branches with either
load-immediate or set-less-than for these programs?  Even if it were
perfect (which is doubtful), we are only talking about 1 - 3% more
SPARC instructions for this machine characteristic (the bicc
percentage times the fraction of setccs that compute nothing else) -
NOT 15%.  Also, 1 - 3% more setccs would NOT translate into 1 - 3%
more cycles, as on both architectures these programs are dominated by
floating-point interlocks and floating-point compare&branch sequences.
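
The 1 - 3% range can be checked from the two tables above.  A minimal
Python sketch, assuming (as stated above) exactly one setcc per bicc
and treating every setcc that computes no result as pure overhead; the
names are illustrative and the inputs are just the table values:

```python
# Percent of instructions that are biccs (first table in this post).
BICC_PCT = {"spice2g6": 5.4, "doduc": 3.2, "simple": 2.3}

# Approximate percent of setccs (for biccs) that also compute a result
# used later (second table in this post).
USEFUL_SETCC_PCT = {"spice2g6": 50, "doduc": 60, "simple": 40}


def cc_only_setcc_pct(program):
    """Percent of all instructions that are setccs which only set the ccs.

    Assumes one setcc per bicc, so the setcc share equals the bicc share.
    """
    return BICC_PCT[program] * (100 - USEFUL_SETCC_PCT[program]) / 100


for name in BICC_PCT:
    print(f"{name}: {cc_only_setcc_pct(name):.1f}% of instructions")
```

This is an upper bound on the extra SPARC instructions only if the
MIPS compiler never needs a load-immediate or set-less-than before a
branch in these programs.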

For branch intensive programs like GNU CC, getting rid of setcc
instructions does not always improve performance.  GNU CC has many
instruction sequences that look like [bicc,setcc,bicc] where the setcc
is in the executed delay slot of the preceding branch (bicc).  After
accounting for this phenomenon and for setccs that compute a value,
GNU CC spends 12% (not 16.6%) of its instructions just setting the
condition codes.  Most of GNU Chess's setccs just set the condition
codes.  Again, these numbers are NOT 15%.

What percentage of MIPS branches are not preceded by a load-immediate
or set-less-than for similar GNU CC, GNU Chess runs?

Most of the cycles (0 - ?%) that SPARC may lose to additional setcc
instructions it appears to win back with annulled branches.  I believe it
is not uncommon for the MIPS compiler to fail to fill 60% of the
branch delay slots.  In SPARC these slots can be filled using annulled
branches (the SPARC compiler tries to use delayed branches before
resorting to annulled branches). For GNU CC annulled branches account
for 60% of all conditional branches.  Of this 60%, 60% are taken. So..
	
	16.6% * 60% * 60% = 6%

So the "ceiling" is 6% (12% from setccs - 6% from annulled branches =
6%) on any advantage the MIPS branch architecture has over the SPARC
branch architecture on a "worst case" integer program (the "ceiling"
for GNU Chess is about 3.5%).  But the MIPS compiler is going to
generate some load-immediate (li) or set-less-than (slt) instructions
before branches.
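
The GNU CC "ceiling" arithmetic can be written out as a short Python
sketch.  It assumes, as one reading of the reasoning above, that each
taken annulled branch recovers one instruction slot that would
otherwise be wasted; the names are illustrative and the inputs are the
figures quoted in this post:

```python
BICC_PCT = 16.6           # GNU CC: biccs as a percent of instructions
ANNULLED_SHARE = 0.60     # share of conditional branches that are annulled
TAKEN_SHARE = 0.60        # share of those annulled branches that are taken
CC_ONLY_SETCC_PCT = 12.0  # instructions that only set the condition codes


def annulled_gain_pct(bicc_pct, annulled_share, taken_share):
    """Percent of instructions recovered via taken annulled branches."""
    return bicc_pct * annulled_share * taken_share


def ceiling_pct(cc_only_setcc_pct, gain_pct):
    """Upper bound on the MIPS advantage if MIPS needed no li or slt at all."""
    return cc_only_setcc_pct - gain_pct


gain = annulled_gain_pct(BICC_PCT, ANNULLED_SHARE, TAKEN_SHARE)
ceiling = ceiling_pct(CC_ONLY_SETCC_PCT, gain)
print(f"annulled-branch gain: {gain:.1f}%")
print(f"ceiling on MIPS advantage: {ceiling:.1f}%")
```

Both values round to 6%, matching the figures above.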

If the MIPS compiler generated half as many li&slt instructions as the
SPARC compiler generates setcc instructions, the performance would be
equal.

If the MIPS compiler generated a quarter as many li&slt instructions as the
SPARC compiler generates setcc instructions, the MIPS architecture
would have a 3% instruction advantage.  I'd expect this to be the most
likely case.
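
The two scenarios follow from the GNU CC numbers (12% cc-only setccs,
6% recovered by annulled branches).  A small Python sketch, where
li_slt_ratio is a hypothetical parameter: the number of MIPS li&slt
instructions as a fraction of SPARC's cc-only setccs.  It counts
instructions only, not cycles:

```python
def mips_advantage_pct(li_slt_ratio, setcc_pct=12.0, annulled_gain_pct=6.0):
    """Net MIPS instruction advantage over SPARC, in percent of instructions.

    SPARC pays setcc_pct for cc-only setccs and wins back annulled_gain_pct;
    MIPS pays setcc_pct * li_slt_ratio for li and slt before branches.
    """
    sparc_net_cost = setcc_pct - annulled_gain_pct
    mips_cost = setcc_pct * li_slt_ratio
    return sparc_net_cost - mips_cost


for ratio in (0.5, 0.25, 0.0):
    print(f"li&slt ratio {ratio}: MIPS advantage {mips_advantage_pct(ratio):.1f}%")
```

A ratio of 0.5 gives no advantage, 0.25 gives 3%, and 0 reproduces the
6% ceiling.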


Eric H. Jensen
ejensen@sun.com