ok@quintus.UUCP (Richard A. O'Keefe) (12/25/87)
I thought that taken branches were always more costly than untaken
branches, so that one should try to arrange branches so that the
not-taken case was the commonest.  There were some messages in this
group a while back which seemed to suggest that in some newer RISCs
this is no longer the case.  Would someone who understands such
matters care to explain?

It would also be interesting to see a comparison of
 - branch taken cache (the processor keeps a cache of branch
   instructions and predicts that they will/won't be taken if they
   were/weren't the last time)
 - compiler-assigned fixed branch prediction bits
 - guessing that backward branches will be taken and forward ones won't

Background: we have some code where a lot of time is going into a few
fairly short loops

	L:	<body>
		b<CC> L		; usually taken

and it would be fairly easy to unroll them a couple of times:

	L:	<body>
		b<notCC> X	; usually NOT taken
		<body>
		b<notCC> X	; usually NOT taken
		<body>
		b<CC> L		; usually taken
	X:

but if non-taken branches are more costly than taken ones this would
be a bad idea, so it would be good to have some idea of which machines
it would be a good idea on and which it wouldn't.
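[To make the trade-off concrete, here is a minimal back-of-the-envelope
sketch in Python.  The per-branch costs and the function name are
illustrative, not taken from any particular machine; it only models the
loop-exit branches in the two code shapes above, assuming the trip
count is a multiple of the unroll factor.]

```python
def loop_branch_cost(n, unroll, cost_taken, cost_untaken):
    """Total branch cost for n iterations of a loop unrolled `unroll`
    times, counting only the conditional branches shown above."""
    # In each pass over the unrolled loop, the first (unroll - 1)
    # exit tests fall through (not taken) and the back-branch is taken.
    passes = n // unroll
    return passes * ((unroll - 1) * cost_untaken + cost_taken)

# On a machine where taken branches cost extra, unrolling by 3 wins:
assert loop_branch_cost(12, 3, 4, 1) < loop_branch_cost(12, 1, 4, 1)
# On a machine where NOT-taken branches are the expensive ones, it loses:
assert loop_branch_cost(12, 3, 1, 4) > loop_branch_cost(12, 1, 1, 4)
```

Which branch of the inequality holds is exactly the machine-dependent
question being asked.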
cik@l.cc.purdue.edu (Herman Rubin) (01/01/88)
In article <496@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> I thought that taken branches were always more costly than untaken
> branches, so that one should try to arrange branches so that the
> not-taken case was the commonest.  There were some messages in this
> group a while back which seemed to suggest that in some newer RISCs
> this was no longer the case.

On some machines, taken branches are faster; on others, not-taken ones
are.  It may depend on the amount of displacement involved.  The
programmer (or the algorithm designer) usually knows which is more
likely.  As it frequently matters, I suggest that there be a way to
tell the compiler which, and also that the architecture should use a
bit in the branch instruction to tell the machine which.

> - compiler-assigned fixed branch prediction bits
> - guessing that backward branches will be taken and forward ones won't

How can the compiler know?  In many of the if-then-else situations I
have encountered, the if-then is rare; in others the else is rare.  If
it is just if-then with a drop-through when the condition does not
hold, usually the drop-through is the common case, _in my experience_.
Most compilers will make the drop-through a forward branch, as that
gives the shortest code.

If there is a cost for taken branches, the conditional code should be
shunted off, possibly to a remote location.  This may involve
transfers to locations outside the current block, which is frowned on
by most compilers.  This should not be treated as a conditional
subroutine call, as the setup for a subroutine is likely to exceed the
branch penalty.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet
brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (01/03/88)
In article <496@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> I thought that taken branches were always more costly than untaken
> branches, so that one should try to arrange branches so that the
> not-taken case was the commonest.  There were some messages in this
> group a while back which seemed to suggest that in some newer RISCs
> this was no longer the case.

To cite an example, the Ridge 32, manufactured by Ridge Computers of
Santa Clara, has a branch prediction bit for conditional branches.
The bit predicts whether or not the branch is taken.  If the
prediction is correct the branch takes one clock; if not, it takes 4
clocks while a pipeline in the processor refills.

This branch prediction bit is static.  One could consider dynamic
branch prediction bits which get changed by the processor to suit what
happened the last time through.  One need only change the bit in the
"code cache", not the bit out in main memory.
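[The dynamic scheme sketched above is a one-bit "predict what happened
last time" predictor.  Here is a minimal simulation of it for a single
branch site; the function name and the boolean-trace representation
are mine, and the misprediction penalty is left out.]

```python
def predict_last_time(trace, initial_guess=True):
    """Simulate a 1-bit dynamic predictor for one branch: predict
    whatever the branch did last time, updating the cached bit after
    each execution.  `trace` is a list of booleans (True = taken).
    Returns the number of correct predictions."""
    bit = initial_guess
    correct = 0
    for taken in trace:
        if taken == bit:
            correct += 1
        bit = taken  # rewrite the bit in the "code cache", not memory
    return correct

# A loop branch taken 9 times and then falling through mispredicts
# only once (at the final exit), if the bit starts out as "taken":
assert predict_last_time([True] * 9 + [False]) == 9
```

A static bit set to "taken" would do equally well here; the dynamic
bit earns its keep on branches whose bias changes during the run.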
chuck@amdahl.amdahl.com (Charles Simmons) (01/05/88)
In article <638@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> The programmer (or the algorithm designer) usually knows [whether a
> branch is more likely to be taken or untaken;] I suggest that there
> be a way to tell the compiler which, and also that the architecture
> should use a bit in the branch instruction to tell the machine which.

Here's an idea for how a compiler might be able to automagically make
a reasonable guess about whether a branch is more likely to be taken
or not taken.  Feedback is welcome.

Some analyses of program behavior suggest that most conditional
expressions are comparisons against the constant zero, and that after
this, a large number are comparisons against some other constant.  One
method of allowing a compiler to automagically guess whether a branch
will usually be taken would be for the compiler to estimate the number
of legal values that each side of the expression can take on.  These
two estimates can then be used to estimate how often the expression
will be TRUE.  For example, consider the fragment of code:

	if (p == NULL) {
		...
	}

Here, the compiler might think "'p' can take on 2**32 values, NULL can
take on one value, so the estimated probability that these two
expressions are equal is 1/2**32.  Since this number is less than 1/2,
the condition will usually be false."

Obviously, this algorithm is going to be most effective when the
programmer is comparing for equality (or inequality) against a
constant.  This leads to a simplification which might be effective in
a large percentage of comparisons: if a compiler generates code to
test a variable for equality against a constant, it might assume that
the test will usually be false; similarly, when generating code to
test for inequality against a constant, it would assume the test will
usually be true.

Comments?

-- Chuck
lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (01/06/88)
In article <20339@amdahl.amdahl.com> chuck@amdahl.amdahl.com (Charles Simmons) writes:
> One method of allowing a compiler to automagically guess whether a
> branch will usually be taken or not, would be for the compiler to
> estimate the number of legal values that one side of the expression
> can take on, and the number of legal values that the other side of
> the expression can take on.  These two estimates can then be used
> to estimate how often the expression will be TRUE.

The best heuristics use high-level context.  For example, a loop
construct is liable to go around again.  An IF-part is more likely to
be executed than an ELSE-part.  CASE TRUE is more likely than CASE
FALSE.

Compilers should allow the source program to give hints.  With that,
it's easy to build tools that measure running programs and then feed
the results back into the source.  The CRISP machine is well suited to
this, since its conditional branch instruction contains a hint bit.
-- 
Don		lindsay@k.gp.cs.cmu.edu		CMU Computer Science
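[A compiler applying these high-level heuristics amounts to a small
table keyed on the kind of construct the branch came from.  The
construct names and the default below are mine; this is a sketch of
the idea, not any real compiler's tables.]

```python
# Predictions the high-level heuristics suggest (True = predict taken).
HEURISTIC_HINTS = {
    "loop_back_edge": True,   # a loop construct is liable to go around again
    "if_then":        True,   # the IF-part beats the ELSE-part
    "else_branch":    False,
    "case_true":      True,   # CASE TRUE more likely than CASE FALSE
    "case_false":     False,
}

def hint_bit(construct_kind):
    """Static prediction bit assigned from high-level context alone;
    default to not-taken for constructs the table doesn't cover."""
    return HEURISTIC_HINTS.get(construct_kind, False)

assert hint_bit("loop_back_edge") is True
assert hint_bit("else_branch") is False
```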
hen@bu-cs.BU.EDU (Bill Henneman) (01/06/88)
As I recall, the original FORTRAN (for the IBM 704) had a FREQUENCY
statement which allowed the programmer to give hints about the
probability of branches being taken.  It was used by the compiler for
index register optimization.

I was a great fan of this kind of declaration, and I used to put such
statements in code as another form of comment.  I never could
understand why the statement vanished in later versions.  Maybe it
could be reincorporated in F99.
radford@calgary.UUCP (Radford Neal) (01/09/88)
In article <18251@bu-cs.BU.EDU>, hen@bu-cs.BU.EDU (Bill Henneman) writes:
> As I recall, the original FORTRAN (for the IBM 704) had a FREQUENCY
> statement which allowed the programmer to give hints about the
> probability of branches being taken ...  I never could understand
> why that statement vanished in later versions...

I read somewhere, once, that the statement was removed after it was
discovered that it had been implemented *backwards*, and that no users
had noticed.  I haven't been able to confirm this.  It's an
interesting comment on optimization and/or marketing policy if true.

I think adding hints such as this is a waste of the programmer's time.
The improvements are likely to be small.  Anyway, if you're that
concerned about optimization, you'll get far more and better data by
profiling an actual run of the program and feeding that data into the
compiler.  I've heard of someone doing this...

   Radford Neal
   The University of Calgary
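[The profile-and-feed-back approach boils down to counting branch
outcomes in a training run and turning the counts into prediction
bits.  A minimal sketch, with an invented trace format of
(branch-site, taken) pairs:]

```python
def profile_run(branch_outcomes):
    """Count (taken, not_taken) per branch site from one profiled run.
    `branch_outcomes` is a list of (site, taken) pairs, as a profiler
    might record them."""
    counts = {}
    for site, taken in branch_outcomes:
        t, n = counts.get(site, (0, 0))
        counts[site] = (t + 1, n) if taken else (t, n + 1)
    return counts

def hint_bits(counts):
    """Turn profile counts into static prediction bits: predict taken
    where the branch was taken at least half the time."""
    return {site: t >= n for site, (t, n) in counts.items()}

run = [("L1", True)] * 9 + [("L1", False)] + [("L2", False)] * 3
assert hint_bits(profile_run(run)) == {"L1": True, "L2": False}
```

The obvious caveat, then as now, is that the training run has to
resemble the production runs for the bits to help.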
orr@taux01.UUCP (Orr Michael ) (01/09/88)
In article <496@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> I thought that taken branches were always more costly than untaken
> branches, so that one should try to arrange branches so that the
> not-taken case was the commonest.  There were some messages in this
> group a while back which seemed to suggest that in some newer RISCs
> this was no longer the case.
>
> Would someone who understands such matters care to explain?

(Hope I qualify.  I am part of the compiler-writing team at NS.  I saw
some responses, but not a general answer to the question.)

The relative cost of branches is usually as stated: taken ones require
overhead.  In many RISC architectures the branch is delayed, usually
by 1 cycle.  In this slot some useful instruction can be executed, in
parallel with the branching overhead, and in this way the cost of the
branch is reduced.  This depends on having something useful to do
which can be done at that time.  (Actually this is not limited to
RISC; you can do the same in CISC as well.)

> It would also be interesting to see a comparison of
>  - branch taken cache (processor keeps a cache of instructions
>    and predicts that they will/won't be taken if they were/weren't
>    the last time)

I know of several approaches to this:

* Branch destination cache - the HW stores a small # of instructions
  from the destinations of branches that were taken.  In this way, the
  next time you get to the same branch you have lower overhead.

* HW branch predictors, either static by branch direction, or a sort
  of learning state-machine which predicts branches according to their
  direction, history, location or whatever.

* A double-fetch mechanism: on every branch, fetch some small number
  of instructions from both possible paths, so that when you know
  which way the branch went you have a lower overhead.  This is most
  useful in pipelined processors, where you pre-fetch instructions.

>  - compiler-assigned fixed branch prediction bits

Possible, but difficult to implement.  For every heuristic telling
what is more likely I can bring an opposite example.

>  - guessing that backward branches will be taken and forward ones won't

This is pretty much what the 32532 HW branch predictor does, and it
works very well.  Again, this just lowers the overhead, but does not
guarantee a lower cost.  A taken branch, even if correctly predicted,
may still be more costly than a non-taken one.  The cost depends on
the architecture and its implementation.

> Background: we have some code where a lot of time is going into a few
> fairly short loops
>	L:	<body>
>		b<CC> L		; usually taken
> and it would be fairly easy to unroll them a couple of times:
>	L:	<body>
>		b<notCC> X	; usually NOT taken
>		<body>
>		b<notCC> X	; usually NOT taken
>		<body>
>		b<CC> L		; usually taken
>	X:
> but if non-taken branches are more costly than taken ones this would
> be a bad idea, so it would be good to have some idea of which
> machines it would be a good idea on and which it wouldn't.

As stated before, this is what is assumed by the HW in the NS32532.
Our compiler (the CTP optimizing compiler suite) knows this, and tries
to arrange things accordingly.  (E.g. when changing a DIV x,2**n into
a sequence containing a conditional branch, I generate different code,
with a different branch direction, for the NS32532 than I do for other
NS32XXX chips.)  Both of the above cases directly match the type of
code the '532 HW branch predictor assumes it will see most of the
time.

	Michael Orr.
-- 
Michael Orr		orr@nsc.NSC.COM
IBM motto: "machines should work, people should think"
Orr's comment: Neither do.
Disclaimer: Opinions, come home.  All is forgiven.  Papa.
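[The backward-taken/forward-not-taken rule described for the 32532 is
easy to state precisely: compare the branch target against the branch's
own address.  A sketch, with integer addresses standing in for
instruction locations; the function name is mine.]

```python
def btfn_predict(branch_address, target_address):
    """Backward-taken/forward-not-taken: predict taken when the target
    lies at or below the branch's own address (a loop back-edge),
    and not taken for a forward branch."""
    return target_address <= branch_address

# The back-branch at the bottom of a loop is predicted taken...
assert btfn_predict(branch_address=0x1040, target_address=0x1000)
# ...while a forward branch around a rare case is predicted not taken.
assert not btfn_predict(branch_address=0x1040, target_address=0x1080)
```

Note that this matches both halves of the unrolled-loop example: the
b<CC> L back-branch (predicted taken, usually taken) and the b<notCC> X
forward branches (predicted not taken, usually not taken).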
davidg@eleazar.Dartmouth.EDU (David Gelhar) (01/09/88)
In article <20339@amdahl.amdahl.com> chuck@amdahl.amdahl.com (Charles Simmons) writes:
> [...] to estimate the number of legal values that one side of the
> expression can take on, and the number of legal values that the
> other side of the expression can take on.  These two estimates can
> then be used to estimate how often the expression will be TRUE.
>
> For example, consider the fragment of code:
>	if (p == NULL) {
>		...
>	}
> Here, the compiler might think "'p' can take on 2**32 values, NULL
> can take on one value, so the estimated number of times these two
> expressions are equal is 1/2**32.  Since this number is less than
> 1/2, the condition will usually be false."

This code fragment strikes me as an excellent example of where this
heuristic would NOT produce useful results; I think it's pretty clear
that in practice the values of pointer variables are not evenly
distributed over the 2**32 possibilities.  In fact, my estimate of the
probability of an arbitrary pointer being NULL is more like 1/2 than
1/2**32.  (Hmmmm... I guess that doesn't help much in predicting which
way the branch will go, eh? :-))

(Hi chuck.)
fnf@mcdsun.UUCP (Fred Fish) (01/12/88)
In article <588@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
> In article <20339@amdahl.amdahl.com> chuck@amdahl.amdahl.com (Charles Simmons) writes:
> > One method of allowing a compiler to automagically guess whether
> > a branch will usually be taken or not, would be ...
>
> The best heuristics use high-level context.  For example, a loop
> construct is liable to go around again.  An IF-part is more likely
> to be executed than an ELSE-part.

I'm not so sure about that generalization; it probably has a lot to do
with individual style.  For example, I almost always tend to write
code in the following if(){}else{} style:

	if (<what I expected to happen didn't happen>) {
		<take some sort of short evasive action>
	} else {
		<begin long sequence of instructions>
		...
		<end long sequence of instructions>
	}

Thus the above generalization would be almost always 100% wrong for my
style, unless lots of unexpected things happened to me...

-Fred
-- 
# Fred Fish    hao!noao!mcdsun!fnf    (602) 438-3614
# Motorola Computer Division, 2900 S. Diablo Way, Tempe, Az 85282 USA