[comp.arch] condition codes vs. compare/branch

earl@wright.mips.com (Earl Killian) (12/03/88)

In article <79321@sun.uucp>, ejensen@gorby (Eric Jensen) writes:
>What I termed "nonsense" was the reasoning:
>	SINCE 15% of instructions are conditional branches
>	THEN cc machines execute 15% more instructions to set the ccs

I am indeed claiming that condition codes are bad because they require
the execution of extra instructions.  I believe the effect is as bad
as the fraction of instructions that are conditional branches.  In
other words, if you take a machine with condition codes and one with a
compare/branch structure, the former will execute many more
instructions.  Yes, sometimes condition codes are set as a by-product
of other operations, which reduces this somewhat, but I think that is
a small effect (I will show below why I think your data is misleading).

What you did was to jump on my statement about condition codes costing
15% and said the difference between MIPS and SPARC branches was less
than 15%.  Fine, but that wasn't what I said, was it?  For example,
what do annulled branches have to do with condition codes vs.
compare/branch??  That's stretching things a bit far, I think.

Later I'll give some data on how much you give up when you go from
pure compare/branch to MIPS branches so you can calculate how much the
15% is reduced for comparing SPARC and MIPS, which is what you want to
do.  Purer compare/branch architectures do exist as you know (you
worked on one: the LLNL S-2).  I would still claim the difference
between S-2 branches and SPARC branches is fairly close to 15%, for
example.

>>As for condition codes being set as a by-product of other operations,
>>can you cite any SPARC statistics?  My guess is that this is
>>insignificant for SPARC (but I don't have one to measure it -- sorry).
>
>So some statistics...
>
>First the 15% bicc number is highly variable, and for the following
>set of programs, unrepresentative.

Yes, like most machine code statistics, it is highly variable.  This
makes it a dangerous thing, because you can prove anything by
selecting programs that back your point.  For example, it is well
known that floating point applications tend to have much longer basic
blocks, and hence much lower branch frequencies.  So of course 15% is
unrepresentative when you pick 2 integer programs and 3 floating
point programs.  Because fp programs are so different wrt branches, I
usually separate them in my own mind when branches are the issue.
15% is my rough average for integer conditional branch frequency.

>16.6%		GNU CC compiled with Sun cc -O4, compiling gcc.c already
>		processed by cpp, target machine is 68k
>						    ^^^
Using 68k statistics to compare SPARC and MIPS branches is an
interesting twist.

>Each bicc is preceded by a setcc.  A percentage of these setccs, in
>addition to setting the condition codes, compute results that are used
>later by the program.  The following table is closely approximate as
>this info is not directly computed by our current tools.
>
>program	% of setccs (for biccs) that compute results
>------		----
>GNU CC		10%
>GNU Chess	 3%
>spice2g6	50%
>doduc		60%
>simple		40%
Probably the GNU CC 10% result is due to constructs like
	for (p = list; p != NULL; p = p->next)
where the p = p->next sets the condition codes for the p != NULL
test.  That's a useful data point.  Of course this is true on the
68000 that you measured, but not on the SPARC that you want to
compare to MIPS: SPARC loads do not set the condition codes.
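
For concreteness, here is that loop spelled out (my sketch; the
instruction-level notes in the comments are assumptions about typical
generated code, not measured output):

	struct node { struct node *next; };

	/* On the 68000 the move that advances p can set the condition
	   codes as a by-product, so the p != NULL test costs nothing
	   extra; a SPARC would need a cc-setting instruction before
	   its branch, while MIPS tests the register directly
	   (bne p, r0, top) and pays no extra instruction either. */
	void walk(struct node *list)
	{
	  struct node *p;
	  for (p = list; p != NULL; p = p->next)
	    ;					/* <<body>> */
	}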

Please look at which constructs generate the setccs that also
compute results.  The high runners are all Fortran codes.  I suspect
your compiler is compiling
	do 10 i = l,h
	  <<body>>
  10	continue
as
	if (l <= h) {
	  i = l;
	  count = h - l;
	  do {
	    <<body>>
	    i += 1;
	    count -= 1;
	  } while (count >= 0);
	}
so the decrement sets the condition codes for the loopback branch.
Does that mean that condition codes and compare/branch are equal here?
No, because the compare/branch compiler would have generated
	if (l <= h) {
	  i = l;
	  term = h + 1;
	  do {
	    <<body>>
	    i += 1;
	  } while (i != term);
	}
which is still one instruction faster per iteration.  In other words,
the count -= 1 isn't real computation; it's an artifact of the
condition-code architecture.  Low-level machine code statistics can
sometimes be misleading.
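
Here is a runnable side-by-side of the two translations (my sketch;
sum1/sum2 stand in for <<body>>), with the per-iteration loop
overhead noted in the comments:

	#include <stdio.h>

	int main(void)
	{
	  int l = 1, h = 5, i, count, term;
	  long sum1 = 0, sum2 = 0;

	  /* condition-code style: the decrement exists only to set
	     the flags for the loopback branch, so the per-iteration
	     overhead is i += 1, count -= 1, and the branch */
	  if (l <= h) {
	    i = l;
	    count = h - l;
	    do {
	      sum1 += i;			/* <<body>> */
	      i += 1;
	      count -= 1;
	    } while (count >= 0);
	  }

	  /* compare/branch style: the branch itself tests i against
	     term, so the overhead is i += 1 plus the
	     compare-and-branch -- one instruction less per iteration */
	  if (l <= h) {
	    i = l;
	    term = h + 1;
	    do {
	      sum2 += i;			/* <<body>> */
	      i += 1;
	    } while (i != term);
	  }

	  printf("%ld %ld\n", sum1, sum2);	/* both print 15 */
	  return 0;
	}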

>So for spice, doduc and simple, half of the conditional branches are
>preceded by setccs that only set the condition codes.  How well does
>the MIPS compiler avoid preceding branches with either load-immediate
>or set-less-than for these programs?  Even if it were perfect (which
>is doubtful), we are only talking about 1 - 3% more SPARC instructions
>for this machine characteristic - NOT 15%.

I think the previous point suggests your figures are way off.

>What percentage of MIPS branches are not preceded by a load-immediate
>or set-less-than for similar GNU CC, GNU Chess runs?

In my second posting on this subject I explained why the MIPS stuff is
a reasonable compromise compared to the pure compare/branch
architecture that would be ideal, and why it gets much of the savings
possible.  Here is some data to back it up:

instr = total instructions
cbr = conditional branches
slt-br = branches on SLT result
li-br = branches on LI result
			cbr	slt-br	 li-br
			/instr	/cbr	 /cbr
as1			0.112	0.126	0.097
ccom			0.124	0.215	0.211
compress		0.151	0.063	0.077
doducd			0.062	0.080	0.031
espresso		0.168	0.089	0.010
gnuchess		0.107	0.429	0.045
grep			0.185	0.298	0.002
nroff			0.161	0.162	0.187
simple			0.061	0.062	0.049
spice2g6		0.057	0.109	0.085
terse			0.135	0.193	0.060
ugen			0.147	0.144	0.079
wolf			0.118	0.373	0.061

int+fp mean		0.122	0.180	0.076
fp mean			0.060	0.084	0.055
int mean		0.141	0.209	0.083

Using the integer program statistics, conditional branches are 14.1%
of instructions.  20.9% of those branches test A < B, A > B, A <= B,
or A >= B with B != 0, and so need a preceding SLT; 8.3% test A ==
Imm or A != Imm and need a preceding LI.  So MIPS' selection of a
compare/branch subset got 71% of the branches in one cycle, and 29%
took two.
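
To make the arithmetic explicit:

	0.209 + 0.083 = 0.292	fraction of conditional branches
				needing an SLT or LI first
	0.141 * 0.292 = 0.041	extra instructions overall

or about 4%, versus the roughly 14% a machine that needed a setcc
before every conditional branch would pay.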
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086

earl@wright.mips.com (Earl Killian) (12/03/88)

In article <366@stca77.stc.oz>, peter@stca77 (Peter Jeremy) writes:
>In article <8768@wright.mips.COM> earl@wright.mips.com (Earl Killian) writes:
>>  A < B, A <= B,
>>A > B, A >= B (for B != 0) are not as common as you might first guess
>>(look up Stanford's published results).  For example these are used at
>>the tops of loops, but they're not useful at the bottom (if you
>>implement loopback with <=, then your loops will fail when the
>>termination value is 2**31-1).
>
>Only if you are using signed 32-bit arithmetic.  You need a whole new set of
>branch instructions for unsigned arithmetic as well.  Getting a loop to
>execute 2**n times, where n is wordsize, is a messy process, likely to
>result in infinite loops if one is not careful.

Unsigned branches won't fix the problem because they do the wrong
thing for negative-positive comparison and because you get the same
problem at the end of the unsigned range.
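
For example, -1 < 1 is true as a signed comparison, but reinterpreted
as unsigned, -1 becomes 2**32-1 and an unsigned branch takes the
wrong arm:

	#include <stdio.h>

	int main(void)
	{
	  int a = -1, b = 1;
	  printf("%d\n", a < b);			/* prints 1 */
	  printf("%d\n", (unsigned)a < (unsigned)b);	/* prints 0 */
	  return 0;
	}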

On the other hand, the solution is simple, not messy as you suggest.
(Also, it's iterating up to 2**n-1, not iterating 2**n times, that's
the real problem.)
To compile a Pascal "for i := l to h do body" you generate the
equivalent of
	i = l;
	temp = h;
	if (i <= temp) {
	  temp += 1;
	  do {
	    body;
	    i += 1;
	  } while (i != temp);
	}
which is very efficient, and the only conditional required in the loop
is !=.
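
To see why != survives the top of the range where <= does not, here
is a small demonstration (unsigned so the wraparound is well defined
in C; the machine-level signed code behaves the same way on a
two's-complement machine):

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
	  unsigned int i = UINT_MAX - 2;	/* l */
	  unsigned int term = UINT_MAX;		/* h */
	  unsigned int count = 0;

	  term += 1;				/* wraps around to 0 */

	  /* a "while (i <= h)" loop could never exit here, since no
	     unsigned value exceeds UINT_MAX; the != test exits
	     exactly when i wraps past h */
	  do {
	    count += 1;				/* <<body>> */
	    i += 1;
	  } while (i != term);

	  printf("%u iterations\n", count);	/* prints 3 */
	  return 0;
	}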

>Stating that an instruction is not useful, because it does not have the
>effect that the programmer wanted when used at the extremities of the
>number range, is a bit specious.

If you write compilers with an attitude like that, you won't pass
many validation suites.  The Pascal validation suite explicitly tests
for iteration up to maxint, for example, because some compilers are
sloppy.

>>  At loop bottom != is the appropriate
>>test.  (Also a global optimizer can convert A < B tests to A != B
>>tests in some cases.)
>
>I prefer using relational operators rather than equality operators as
>part of my "defensive coding" style.  If A accidently gets munged due
>to a bug, I prefer the loop to exit, rather than wait another 2**32
>(or whatever) iterations until A==B again.  If I found that my compiler
>was "optimising" relational tests to equality tests, I would report it
>as a bug and scream until it was fixed - unless the compiler is going to
>guarantee that my code has no bugs that might corrupt A or B.

That's what "optimize" means; if you can't prove it executes the same
as the original, you don't do it.  For example, it is perfectly
acceptable for the compiler to take
	for (i = 0; i < 10; i += 1) {
	  <<body that doesn't modify i>>
	}
and compile it as if you wrote
	i = 0;
	if (i < 10) {
	  do {
	    <<body>>
	    i += 1;
	  } while (i != 10);
	}
and you won't scream, because you'll never notice: the two execute
identically.
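
A quick check that the transform is exact (a sketch; the sums stand
in for a body that doesn't modify i):

	#include <assert.h>

	int main(void)
	{
	  int a = 0, b = 0, i;

	  for (i = 0; i < 10; i += 1)
	    a += i;				/* original form */

	  i = 0;
	  if (i < 10) {
	    do {
	      b += i;				/* transformed form */
	      i += 1;
	    } while (i != 10);
	  }

	  assert(a == b);			/* both sum 0..9 = 45 */
	  return 0;
	}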
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086