[comp.arch] taken -vs- untaken branches

ok@quintus.UUCP (Richard A. O'Keefe) (12/25/87)

I thought that taken branches were always more costly than untaken
branches, so that one should try to arrange branches so that the
not-taken case was the commonest.  There were some messages in this
group a while back which seemed to suggest that in some newer RISCs
this was no longer the case.

Would someone who understands such matters care to explain?

It would also be interesting to see a comparison of
    - branch taken cache (processor keeps a cache of instructions
	and predicts that they will/won't be taken if they were/weren't
	the last time)
    - compiler-assigned fixed branch prediction bits
    - guessing that backward branches will be taken and forward ones won't

Background: we have some code where a lot of time is going into a few
fairly short loops
	L:	<body>
		b<CC>	L	; usually taken
and it would be fairly easy to unroll them a couple of times:
	L:	<body>
		b<notCC> X	; usually NOT taken
		<body>
		b<notCC> X	; usually NOT taken
		<body>
		b<CC>	 L	; usually taken
	X:
but if non-taken branches are more costly than taken ones this would
be a bad idea, so it would be good to have some idea of on which
machines it would be a good idea and on which it wouldn't.

cik@l.cc.purdue.edu (Herman Rubin) (01/01/88)

In article <496@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> I thought that taken branches were always more costly than untaken
> branches, so that one should try to arrange branches so that the
> not-taken case was the commonest.  There were some messages in this
> group a while back which seems to suggest that in some newer RISCs
> this was no longer the case.

On some machines, taken branches are faster; on others, not taken are.
It may depend on the amount of displacement involved.  The programmer
(or the algorithm designer) usually knows which is more likely.  As
it frequently matters, I suggest that there be a way to tell the compiler
which, and also that the architecture should use a bit in the branch
instruction to tell the machine which.

>     - compiler-assigned fixed branch prediction bits
>     - guessing that backward branches will be taken and forward ones won't

How can the compiler know?  In many of the if-then-else situations I have
encountered, the if-part is rare; in others the else is rare.  If it is
just if-then with a drop-through when the condition does not hold, the
drop-through is usually the common case _in my experience_.  Most compilers
will make the drop-through a forward branch, as that gives the shortest
code.  If there is a cost for taken branches, the conditional code should be
shunted off, possibly to a remote location.  This may involve transfers to
locations outside the current block, which is frowned on by most compilers.
This should not be treated as a conditional subroutine call, as the setup
for a subroutine is likely to exceed the branch penalty.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet

brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (01/03/88)

In article <496@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> I thought that taken branches were always more costly than untaken
> branches, so that one should try to arrange branches so that the
> not-taken case was the commonest.  There were some messages in this
> group a while back which seems to suggest that in some newer RISCs
> this was no longer the case.

To cite an example, the Ridge 32 manufactured by Ridge Computers
of Santa Clara has a branch prediction bit for conditional branches.
The bit predicts whether the branch will be taken.  If correct,
the branch takes one clock; if not, it takes 4 clocks
while a pipeline in the processor refills.  This branch prediction bit
is static.  One could consider dynamic branch prediction bits which
get changed by the processor to suit what happened the last time through.
One need only change the bit in the "code cache", not the bit out in main
memory.

chuck@amdahl.amdahl.com (Charles Simmons) (01/05/88)

In article <638@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>                                                       The programmer
>(or the algorithm designer) usually knows [whether a branch is more
>likely to be taken or untaken;]
>                       I suggest that there be a way to tell the compiler
>which, and also that the architecture should use a bit in the branch
>instruction to tell the machine which.

Here's an idea for how a compiler might be able to automagically
make a reasonable guess about whether a branch is more likely
to be taken or not taken.  Feedback is welcome.

Some analyses of program behavior suggest that most conditional
expressions are comparisons against the constant zero.  After
this, there are a large number of comparisons against a constant.
One method of allowing a compiler to automagically guess whether
a branch will usually be taken would be for the compiler
to estimate the number of legal values that one side of the expression
can take on, and the number of legal values that the other side of
the expression can take on.  These two estimates can then be used
to estimate how often the expression will be TRUE.

For example, consider the fragment of code:

    if (p == NULL) {
	...
    }

Here, the compiler might think "'p' can take on 2**32 values,
NULL can take on one value, so the estimated fraction of times
these two expressions are equal is 1/2**32.  Since this number
is less than 1/2, the condition will usually be false."

Obviously, this algorithm is going to be most effective when
the programmer is comparing for equality (or inequality) against a
constant.  This leads to a simplification which might be effective
in a large percentage of comparisons.  If a compiler generates
code to test a variable for equality against a constant, it might
assume that the test would usually be false; similarly, when
generating code to test for inequality against a constant, the
compiler would assume the test would usually be true.

Comments?

-- Chuck

lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) (01/06/88)

In article <20339@amdahl.amdahl.com> chuck@amdahl.amdahl.com (Charles Simmons) writes:
>One method of allowing a compiler to automagically guess whether
>a branch will usually be taken or not, would be for the compiler
>to estimate the number of legal values that one side of the expression
>can take on, and the number of legal values that the other side of
>the expression can take on.  These two estimates can then be used
>to estimate how often the expression will be TRUE.

The best heuristics use high-level context.  For example, a loop construct
is liable to go around again. An IF-part is more likely to be executed than an
ELSE part. CASE TRUE is more likely than CASE FALSE.

Compilers should allow the source program to give hints. With that, it's
easy to build tools that measure running programs, and then feed the
results back into the source. The CRISP machine is well suited to this,
since its conditional branch instruction contains a hint bit.


-- 
	Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science

hen@bu-cs.BU.EDU (Bill Henneman) (01/06/88)

As I recall, the original FORTRAN (for the IBM 704) had a FREQUENCY
statement which allowed the programmer to give hints about the
probability of branches being taken.  It was used by the compiler for
index register optimization.  

I was a great fan of this kind of declaration, and I used to put them
in code as another form of comment.  I never could understand why that
statement vanished in later versions.  Maybe it could be
reincorporated in F99. 

radford@calgary.UUCP (Radford Neal) (01/09/88)

In article <18251@bu-cs.BU.EDU>, hen@bu-cs.BU.EDU (Bill Henneman) writes:
> As I recall, the original FORTRAN (for the IBM 704) had a FREQUENCY
> statement which allowed the programmer to give hints about the
> probability of branches being taken ... I never could understood why that
> statement vanished in later versions...

I read somewhere, once, that the statement was removed after they 
discovered that it was implemented *backwards*, and that no users had
noticed. I haven't been able to confirm this. It's an interesting
comment on optimization and/or marketing policy if true.

I think adding hints such as this is a waste of the programmer's time. The
improvements are likely to be small. Anyway, if you're that concerned about
optimization, you'll get far more and better data by profiling an actual
run of the program and feeding that data into the compiler. I've heard
of someone doing this...

   Radford Neal
   The University of Calgary

orr@taux01.UUCP (Orr Michael ) (01/09/88)

In article <496@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>I thought that taken branches were always more costly than untaken
>branches, so that one should try to arrange branches so that the
>not-taken case was the commonest.  There were some messages in this
>group a while back which seems to suggest that in some newer RISCs
>this was no longer the case.
>
>Would someone who understands such matters care to explain?

  (Hope I qualify. I am part of the compiler writing team of NS.
  I saw some responses, but not a general answer to the question.)

   The relative cost of branches is usually as stated - taken ones
require overhead.  In many RISC architectures, the branch is delayed,
usually by 1 cycle.  In this slot, some useful instruction can be
executed in parallel with the branching overhead, so the cost
of the branch is reduced.  This depends on having something useful to do
which can be done at that time.  (Actually this is not limited to RISC;
you can do the same in CISC as well.)

>It would also be interesting to see a comparison of
>    - branch taken cache (processor keeps a cache of instructions
>	and predicts that they will/won't be taken if they were/weren't
>	the last time)

	I know of several approaches to this:
  * Branch destination cache - the HW stores a small # of instructions
    from the destinations of branches that were taken.  In this way, the
    next time you get to the same branch you have lower overhead.
  * HW branch predictors, either static by branch direction, or a sort
    of a learning state-machine, which predicts branches acc. to their
    direction, history, location or whatever.
  * A double fetch mechanism.  On every branch, fetch some small
    number of instructions from both possible paths, so when you know
    which way the branch went you have a lower overhead.  This is most
    useful in pipelined processors, where you pre-fetch instructions.

>    - compiler-assigned fixed branch prediction bits

   Possible, but difficult to implement.  For every heuristic telling
what is more likely I can bring an opposite example.

>    - guessing that backward branches will be taken and forward ones won't
    This is pretty much what the 32532 HW branch predictor does.  This
works very well.  Again, this just lowers the overhead, but does not
guarantee a lower cost.  A taken branch, even if correctly predicted,
may still be more costly than a non-taken one.  The cost depends on
the architecture and its implementation.

>
>Background: we have some code where a lot of time is going into a few
>fairly short loops
>	L:	<body>
>		b<CC>	L	; usually taken
>and it would be fairly easy to unroll them a couple of times:
>	L:	<body>
>		b<notCC> X	; usually NOT taken
>		<body>
>		b<notCC> X	; usually NOT taken
>		<body>
>		b<CC>	 L	; usually taken
>	X:
>but if non-taken branches are more costly than taken ones this would
>be a bad idea, so it would be good to have some idea of on which
>machines it would be a good idea and on which it wouldn't.

   As stated before, this is what is assumed by the HW in the NS32532.
Our compiler (CTP optimizing compiler suite) knows this, and tries
to arrange things accordingly.  (E.g. when changing a DIV x,2**n into
a sequence containing a conditional branch, I generate different code,
with a different branch direction, for the NS32532 than I do for other
NS32XXX chips.)  Both of the above cases directly match the type of
code the '532 HW branch predictor assumes it will see most of the time.

							Michael Orr.
--
Michael Orr orr@nsc.NSC.COM    
		IBM motto: "machines should work, people should think"
		Orr's comment: Neither do.
Disclaimer : Opinions, come home. All is forgiven. Papa.

davidg@eleazar.Dartmouth.EDU (David Gelhar) (01/09/88)

In article <20339@amdahl.amdahl.com> chuck@amdahl.amdahl.com (Charles Simmons) writes:
>[...]to estimate the number of legal values that one side of the expression
>can take on, and the number of legal values that the other side of
>the expression can take on.  These two estimates can then be used
>to estimate how often the expression will be TRUE.
>For example, consider the fragment of code:
>    if (p == NULL) {
>	...
>    }
>Here, the compiler might think "'p' can take on 2**32 values,
>NULL can take on one value, so the estimated number of times
>these two expressions are equal is 1/2**32.  Since this number
>is less than 1/2, the condition will usually be false."


This code fragment strikes me as an excellent example of where this
heuristic would NOT produce useful results; I think it's pretty clear
that in practice the values of pointer variables are not evenly
distributed over the 2**32 possibilities.  In fact, my estimate of
the probability of an arbitrary pointer being NULL is more like 1/2
than 1/2**32.  (Hmmmm... I guess that doesn't help much in predicting
which way the branch will go, eh? :-)).

(Hi chuck.)

fnf@mcdsun.UUCP (Fred Fish) (01/12/88)

In article <588@PT.CS.CMU.EDU> lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) writes:
>In article <20339@amdahl.amdahl.com> chuck@amdahl.amdahl.com (Charles Simmons) writes:
>>One method of allowing a compiler to automagically guess whether
>>a branch will usually be taken or not, would be ...
>
>The best heuristics use high-level context.  For example, a loop construct
>is liable go around again. An IF-part is more likely to be executed than an
>ELSE part.

I'm not so sure about the above generalization; it probably has a lot to
do with individual style.  For example, I almost always tend to write
code in the following if(){}else{} style:

	if (<what I expected to happen didn't happen>) {
		<take some sort of short evasive action>
	} else {
		<begin long sequence of instructions>
		.
		.
		.
		<end long sequence of instructions>
	}

Thus the above generalization would be almost always 100% wrong for my
style, unless lots of unexpected things happened to me...

-Fred

-- 
# Fred Fish    hao!noao!mcdsun!fnf    (602) 438-3614
# Motorola Computer Division, 2900 S. Diablo Way, Tempe, Az 85282  USA