earl@wright.mips.com (Earl Killian) (12/03/88)
In article <79321@sun.uucp>, ejensen@gorby (Eric Jensen) writes:

>What I termed "nonsense" was the reasoning:
>    SINCE 15% of instructions are conditional branches
>    THEN cc machines execute 15% more instructions to set the ccs

I am indeed claiming that condition codes are bad because they require
the execution of extra instructions.  I believe the effect is as bad
as the fraction of instructions that are conditional branches.  In
other words, if you take a machine with condition codes and one with a
compare/branch structure, the former will execute many more
instructions.  Yes, sometimes condition codes are set as a by-product
of other operations, which reduces this somewhat, but I think that is
a small effect (I will show below why I think your data is
misleading).

What you did was to jump on my statement about condition codes costing
15% and say that the difference between MIPS and SPARC branches was
less than 15%.  Fine, but that wasn't what I said, was it?  For
example, what do annulled branches have to do with condition codes vs.
compare/branch?  That's stretching things a bit far, I think.  Later
I'll give some data on how much you give up when you go from pure
compare/branch to MIPS branches, so you can calculate how much the 15%
is reduced for comparing SPARC and MIPS, which is what you want to do.
Purer compare/branch architectures do exist, as you know (you worked
on one: the LLNL S-2).  I would still claim the difference between S-2
branches and SPARC branches is fairly close to 15%, for example.

>>As for condition codes being set as a by-product of other operations,
>>can you cite any SPARC statistics?  My guess is that this is
>>insignificant for SPARC (but I don't have one to measure it -- sorry).
>
>So some statistics...
>
>First the 15% bicc number is highly variable, and for the following
>set of programs, unrepresentative.

Yes, like most machine code statistics, it is highly variable.
This makes it a dangerous thing, because you can prove anything by
selecting programs that back your point.  For example, it is well
known that floating-point applications tend to have much longer basic
blocks, and hence much lower branch frequencies.  So of course 15% is
unrepresentative when you pick 2 integer programs and 3 floating-point
programs.  Because fp programs are so different wrt branches, I
usually separate them in my own mind when branches are the issue.  15%
is my rough average for integer conditional branch frequency.

>16.6% GNU CC compiled with Sun cc -O4, compiling gcc.c already
>      processed by cpp, target machine is 68k
                                           ^^^

Using 68k statistics to compare SPARC and MIPS branches is an
interesting twist.

>Each bicc is preceded by a setcc.  A percentage of these setccs, in
>addition to setting the condition codes, compute results that are used
>later by the program.  The following table is closely approximate as
>this info is not directly computed by our current tools.
>
>program     % of setccs (for biccs) that compute results
>-------     ----
>GNU CC      10%
>GNU Chess    3%
>spice2g6    50%
>doduc       60%
>simple      40%

Probably the GNU CC 10% result is due to constructs like

        for (p = list; p != NULL; p = p->next)

where the p = p->next sets the condition codes for the p != NULL test.
That's a useful data point.  Of course, this is true of the 68000 that
you measured, but not of the SPARC that you want to compare to MIPS.

Please look at which constructs are generating the condition-code
setting that also computes results.  The high runners are all Fortran
codes.  I suspect your compiler is compiling

        do 10 i = l,h
        <<body>>
     10 continue

as

        if (l <= h) {
            i = l;
            count = h - l;
            do {
                <<body>>
                i += 1;
                count -= 1;
            } while (count >= 0);
        }

so the decrement sets the condition codes for the loopback branch.
Does that mean that condition codes and compare/branch are equal here?
No, because the compare/branch compiler would have generated

        if (l <= h) {
            i = l;
            term = h + 1;
            do {
                <<body>>
                i += 1;
            } while (i != term);
        }

which is still one instruction faster per iteration.  In other words,
the count -= 1 isn't real computation; it's an artifact of the
condition-code architecture.  Low-level machine code statistics can
sometimes be misleading.

>So for spice, doduc and simple, half of the conditional branches are
>preceded by setccs that only set the condition codes.  How well does
>the MIPS compiler avoid preceding branches with either load-immediate
>or set-less-than for these programs?  Even if it were perfect (which
>is doubtful), we are only talking about 1 - 3% more SPARC instructions
>for this machine characteristic - NOT 15%.

I think the previous point suggests your figures are way off.

>What percentage of MIPS branches are not preceded by a load-immediate
>or set-less-than for similar GNU CC, GNU Chess runs?

In my second posting on this subject I explained why the MIPS branch
design is a reasonable compromise compared to the pure compare/branch
architecture that would be ideal, and why it gets much of the savings
possible.  Here is some data to back it up:

        instr  = total instructions
        cbr    = conditional branches
        slt-br = branches on SLT result
        li-br  = branches on LI result

                       cbr    slt-br   li-br
                     /instr    /cbr    /cbr
        as1           0.112   0.126   0.097
        ccom          0.124   0.215   0.211
        compress      0.151   0.063   0.077
        doducd        0.062   0.080   0.031
        espresso      0.168   0.089   0.010
        gnuchess      0.107   0.429   0.045
        grep          0.185   0.298   0.002
        nroff         0.161   0.162   0.187
        simple        0.061   0.062   0.049
        spice2g6      0.057   0.109   0.085
        terse         0.135   0.193   0.060
        ugen          0.147   0.144   0.079
        wolf          0.118   0.373   0.061

        int+fp mean   0.122   0.180   0.076
        fp mean       0.060   0.084   0.055
        int mean      0.141   0.209   0.083

Using the integer program statistics, conditional branches are 14.1%
of instructions.  20.9% of those branches test A < B, A > B, A <= B,
or A >= B with B != 0.  8.3% of those branches test A == Imm or
A != Imm.
So MIPS' selection of a compare/branch subset got 71% of the branches
done in one cycle; the other 29% (20.9% + 8.3%) took two.
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086
earl@wright.mips.com (Earl Killian) (12/03/88)
In article <366@stca77.stc.oz>, peter@stca77 (Peter Jeremy) writes:

>In article <8768@wright.mips.COM> earl@wright.mips.com (Earl Killian) writes:
>>                                                  A < B, A <= B,
>>A > B, A >= B (for B != 0) are not as common as you might first guess
>>(look up Stanford's published results).  For example these are used at
>>the tops of loops, but they're not useful at the bottom (if you
>>implement loopback with <=, then your loops will fail when the
>>termination value is 2**31-1).
>
>Only if you are using signed 32-bit arithmetic.  You need a whole new
>set of branch instructions for unsigned arithmetic as well.  Getting a
>loop to execute 2**n times, where n is wordsize, is a messy process,
>likely to result in infinite loops if one is not careful.

Unsigned branches won't fix the problem, because they do the wrong
thing for negative-positive comparisons and because you get the same
problem at the end of the unsigned range.  On the other hand, the
solution is simple, not messy as you suggest.  (Also, it's iterating
up to 2**n-1, not iterating 2**n times, that's the real problem.)  To
compile a Pascal "for i := l to h do body" you generate the
equivalent of

        i = l;
        temp = h;
        if (i <= temp) {
            temp += 1;
            do {
                body;
                i += 1;
            } while (i != temp);
        }

which is very efficient, and the only conditional required in the
loop is !=.

>Stating that an instruction is not useful, because it does not have
>the effect that the programmer wanted when used at the extremities of
>the number range, is a bit specious.

If you write compilers with an attitude like that, you won't pass many
validation suites.  The Pascal validation suite explicitly tests for
iteration up to maxint, for example, because some compilers are
sloppy.

>>At loop bottom != is the appropriate
>>test.  (Also a global optimizer can convert A < B tests to A != B
>>tests in some cases.)
>
>I prefer using relational operators rather than equality operators as
>part of my "defensive coding" style.
>If A accidentally gets munged due
>to a bug, I prefer the loop to exit, rather than wait another 2**32
>(or whatever) iterations until A==B again.  If I found that my
>compiler was "optimising" relational tests to equality tests, I would
>report it as a bug and scream until it was fixed - unless the compiler
>is going to guarantee that my code has no bugs that might corrupt A
>or B.

That's what "optimize" means: if you can't prove it executes the same
as the original, you don't do it.  For example, it is perfectly
acceptable for the compiler to take

        for (i = 0; i < 10; i += 1) {
            <<body that doesn't modify i>>
        }

and compile it as if you wrote

        i = 0;
        if (i < 10) {
            do {
                <<body>>
                i += 1;
            } while (i != 10);
        }

and you won't scream, because you'll never notice: they execute
identically.
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086