hansen@mips.UUCP (Craig Hansen) (04/21/87)
I've been reading two publications from National, one the infamous "Series 32000: NS32532 Competitive Comparison" that lists the eight separate patent applications we've already heard about, the other titled "Series 32000: NS32532 Technical Introduction." The feature that most caught my eye was that of branch prediction, for which I quote from the "Technical Introduction": The selection is based on the branch condition and direction; forward branches are predicted as taken, backward branches as non-taken. Well, I don't see how the branch prediction can depend on the branch condition, since waiting for the condition corresponds to no branch prediction at all. (Certainly novel and non-obvious...) But when I was a young sprout, they always told me that backward branches often represented loops and were most frequently taken. Forward branches often represent conditional expressions, and are more ambivalent. As far as I can see from the description, there is no prediction based on past behaviour from previous invocations of the branch instruction. If the description in National's literature is entirely accurate, the branch prediction scheme used is seriously broken and should predict the wrong answer more often than the right answer. Then again, if National hasn't seen silicon yet, and uses the opportunity to fix it now, I may have just saved them a few tens of thousands of dollars in wafer costs and hundreds in otherwise lost revenue.... -- Craig Hansen Manager, Architecture Development MIPS Computer Systems, Inc. ...decwrl!mips!hansen
amos@instable.UUCP (04/22/87)
Looks like the person who wrote the brochure about the 32532 didn't look at the spec; anyway it's certain that: 1. The whole point of branch prediction is not waiting for the condition codes; therefore the prediction cannot, and does not, rely on them. 2. Forward branches are predicted as not taken, backward branches are predicted as taken, no matter who says otherwise. (We have tried several methods of state machines, caches, etc. and found out that this simple-minded method had the best ratio of relative performance increase vs. cost in chip area and overhead.) -- Amos Shapir National Semiconductor (Israel) 6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel Tel. (972)52-522261 amos%nsta@nsc.com {hplabs,pyramid,sun,decwrl} 34.48'E 32.10'N
robison@uiucdcsb.cs.uiuc.edu (04/22/87)
Are there any good papers on branch prediction? I've seen two schemes: 1) Put an extra bit in the instruction indicating which way the branch will probably go. This is a compile-time prediction. 2) Keep a record of past branches, i.e. run-time prediction. Though I've heard of examples of machines with each scheme, I've never seen any studies of their efficiency. Arch D. Robison University of Illinois at Urbana-Champaign CSNET: robison@UIUC.CSNET UUCP: {ihnp4,pur-ee,convex}!uiucdcs!robison ARPA: robison@B.CS.UIUC.EDU (robison@UIUC.ARPA)
roger@nsc.nsc.com (Roger Thompson) (04/23/87)
In article <324@dumbo.UUCP>, hansen@mips.UUCP (Craig Hansen) writes: > > The selection is based on the branch condition and > direction; forward branches are predicted as taken, > backward branches as non-taken. > YES the brochure has misplaced a "non" in the above paragraph. Reverse the usage. > Well, I don't see how the branch prediction can depend on the branch > condition, since waiting for the condition corresponds to no branch > prediction at all. (Certainly novel and non-obvious...) The condition factor that is used is that which is specified by the opcode, not the actual result from the operation being tested. The concept is that the predictor has an algorithm that has multiple inputs. Under certain conditions, the predictor determines one answer or the other. The prediction should not be confussed with the result. Our prediction algorithm is accurate to 70 - 80%. > > But when I was a young sprout, they always told me that backward branches > often represented loops and were most frequently taken. Forward branches > often represent conditional expressions, and are more ambivalent. > Agreed.
shebanow@ji.Berkeley.EDU (Mike Shebanow) (04/23/87)
In article <165100008@uiucdcsb> Arch Robison writes: > >Are there any good papers on branch prediction? I've seen two schemes: > > 1) Put an extra bit in the instruction indicating which way the branch > will probably go. This is a compile-time prediction. > > 2) Keep a record of past branches, i.e. run-time prediction. > >Though I've heard of examples of machines with each scheme, I've never >seen any studies of their efficiency. There have been many good papers on the subject. Here is a partial bibliography: (Note: ISCA = International Symposium on Computer Architecture, IBM TDB = IBM Technical Disclosure Bulletin) [1] S. McFarling and J. Hennesy, "Reducing the Cost of Branches," Proc. 13th ISCA, June 1986, pp. 396-403. [2] J. K. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Designs," IEEE Computer, vol. 17, #1, Jan 1984. [3] J. E. Smith, "A Study of Branch Prediction Strategies," 8th ISCA, May 1981, pp. 135-148. [4] J. Losq, "Generalized History Table for Branch Prediction," IBM TDB, vol 25, #1, June 1982, pp. 99-101. [5] A. Liles Jr. and B. E. Wilner, "Branch Prediction Mechanism." IBM TDB, vol 22, #7, Dec 1979, pp. 3013-3016. [6] G. Rao, "Technique for Minimizing Branch Delay Due to Incorrect Branch History Table Predictions," IBM TDB, vol 25, #1, June 1982, pp. 97-98. Based on the results published, the best prediction algorithms use run-time information. As developed by IBM, a branch target buffer is kept in the machine (BTB). The BTB is an associative table (direct mapped, n-way set associative, fully associative, whatever) which uses the address of the BRANCH as an index. Associated with the branch address are two pieces of information (usually): - target address of the branch (where to go if branch taken -- OR, perhaps the first few instructions at the target address (eg., AM29000)) - some prediction information (see below). The type of prediction information used depends on the algorithm. A simple scheme which works well is to use a 2 bit counter. The counter is incremented (until it hits the value 3) when the branch is taken, and decremented (until it hits the value 0) when the branch is not taken. When a branch is encountered, it is predicted taken if the counter value is 2 or 3, and predicted not taken if the counter value is 0 or 1. Lee and Smith describe a more complex scheme in their paper which provides even better accuracy. Using these runtime prediction algorithms, accuracies better than 90% have been achieved. With the opcode method National is using, accuracies in the 70-80% are usual (as they report). I want to point out that accuracy is a poor indicator of performance. A much better measure (for any particular architecture) is instruction block size (IBS). IBS is defined to be the average number of instructions executed between branch prediction mistakes. If we have a pipeline of depth 'd', then an !APPROXIMATE! measure of the efficiency of this pipe (in the presence of branch stalls, ignoring other types of stalls) is (IBS/(IBS + d)). For small pipe depths, IBS tends not to matter. For deep pipes though, a small value for IBS will kill you. I would be curious if anyone on comp.arch has these numbers. Mike Shebanow
marc@ucla-cs.UUCP (04/23/87)
In article <165100008@uiucdcsb> robison@uiucdcsb.cs.uiuc.edu writes: > >Are there any good papers on branch prediction? I've seen two schemes: > > 1) Put an extra bit in the instruction indicating which way the branch > will probably go. This is a compile-time prediction. > > 2) Keep a record of past branches, i.e. run-time prediction. > >Though I've heard of examples of machines with each scheme, I've never >seen any studies of their efficiency. > >Arch D. Robison >University of Illinois at Urbana-Champaign > The best paper I have read about branch prediction is: "Branch Prediction Strategies and Branch Target Buffer Design" Johnny K. F. Lee, Hewlett-Packard Alan Jay Smith, UCB. Computer (IEEE), January 1984, pp 6-22. Also you might want to look at the architecture issues concerning the Branch Target Cache of the Am29000, they claim a 60% hit ratio. Enjoy, Marc Tremblay UCLA Computer Science Dept.
kck@dana.UUCP (Ken Klingman) (04/24/87)
> > 2. Forward branches are predicted as not taken, backward branches are > predicted as taken, no matter who says otherwise. (We have tried > several methods of state machines, caches, etc. and found out that > this simple-minded method had the best ratio of relative performance > increase vs. cost in chip area and overhead.) > -- > Amos Shapir I kind of like the way branch prediction works on the Ridge 32 (and 3200) architecture. The conditional branch instructions have a branch predic- tion bit. If set, the instruction pre-fetch logic starts fetching from the target of the branch; if not set, the pre-fetch logic continues fetching along the same instruction stream. This allows code generators and assembly language programmers to predict whether or not branches will be/won't be taken. In general the Ridge compilers adopt the same philosophy of predict backward branches and don't predict forward branches, but all the assembly coded library routines like strcpy and friends make extensive use of it. One iteresting possibility when the code generator has control over branch prediction is how to predict if-then-else constructs. Do you predict the then and not predict the else? Or do you predict the else and not the then? Or maybe even more conterintuitively, maybe you predict/don't predict both? I never had time to run an analysis to see which possibility was best. Ken Klingman Dana Computer hplabs!dana!kck
firth@sei.cmu.edu (Robert Firth) (04/24/87)
In article <18537@ucbvax.BERKELEY.EDU> shebanow@ji.Berkeley.EDU.UUCP (Mike Shebanow) writes: ... >Based on the results published, the best prediction algorithms use run-time >information. As developed by IBM, a branch target buffer is kept in the >machine (BTB). The BTB is an associative table (direct mapped, n-way set >associative, fully associative, whatever) which uses the address of the >BRANCH as an index. Associated with the branch address are two pieces >of information (usually): > > - target address of the branch (where to go if branch taken -- OR, > perhaps the first few instructions at the target address (eg., > AM29000)) > - some prediction information (see below). Mike, could you please elaborate on this, because after 2 days I find myself no nearer to understanding it. The purpose of the "prediction" information seems to be to allow the processor to prefetch whichever of the branch continuations (successor or destination) is more likely to be needed. But if you have a branch cache, surely BOTH continuations are available already - one in the normal pipeline and the other in the cache. So who cares which is needed? - and why bother to predict it?
shebanow@ji.Berkeley.EDU (Mike Shebanow) (04/25/87)
In article <1123@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP writes: >Mike, could you please elaborate on this, because after 2 days I find >myself no nearer to understanding it. The purpose of the "prediction" >information seems to be to allow the processor to prefetch whichever >of the branch continuations (successor or destination) is more likely >to be needed. > >But if you have a branch cache, surely BOTH continuations are available >already - one in the normal pipeline and the other in the cache. So >who cares which is needed? - and why bother to predict it? Sure. For example, consider an instruction sequence like this: (#1) op1 a, b {some don't care operation} (#2) op2 b, c {some don't care operation} (#3) op3 a, c {this one sets the condition codes} (#4) bequ loc {use the 'zero' bit in the condition codes} (#5) ..... Now assume that this is to run on a machine implemented using a 4 stage pipe, with instruction #1 at the back of the pipe: +---------------+---------------+---------------+---------------+ --->| prefetch | decode | execute | retire |---> +---------------+---------------+---------------+---------------+ cycle # 1 (#1) 2 (#2) (#1) 3 (#3) (#2) (#1) 4 (#4) (#3) (#2) (#1) The question is what to do now, load in the target of the branch (instruction at loc) or load in the instruction following the branch (#5)?: 5 (loc) (#4) (#3) (#2) OR!!! 5 (#5) (#4) (#3) (#2) You don't know which, as the branch depends on instruction #3, which is in decode. Without prediction, the pipeline would have to stall until #3 executes (at least). With prediction, the pipeline is kept full. The downside of branch prediction is if you predict wrong. In that case, all instructions fetched after the predicted branch must be flushed. sorry for the long winded explanation, Mike Shebanow
jfh@killer.UUCP (04/28/87)
In article <324@dumbo.UUCP>, hansen@mips.UUCP (Craig Hansen) writes: [ Stuff ] > > The feature that most caught my eye was that of branch prediction, > for which I quote from the "Technical Introduction": > > The selection is based on the branch condition and > direction; forward branches are predicted as taken, > backward branches as non-taken. > [ Alot more stuff ] I too had wondered about this branch prediction mess, and I agree with this poster about this being the WRONG way to do things. I was prepared to make the argument that trying to find the branch direction would require fetching the offset (assuming relative branches), which would obligate the processor to a possibly unneeded fetch, along with running this offset through some logic (one or two more micro instructions) to figure out the direction, which sounded like a real loser. However, I suspect that someone at AMD would tell me how wonderfully pipelined their machine was and so on ... [ Now for the revised posting ] Craig stated that, like the rest of us, he was raised to believe that backwards branches make nice branches after while loops and such, and forward branches make nice branches before if-thens. Has anyone bothered to think about adding branch instructions that tell the CPU explicitly whether or not they expect to be taken? For example, MBLEQ - most of the time branch less or equal SBLEQ - seldom branch less or equal Compilers would be able to pick the instructions they wanted. (Speaking of patents, if this idea is patentable and has not yet been so, anyone but Nat-Semi can have it ... :-) :-) so long as I get credit ... :-) :-) - John (jfh@killer.UUCP) Disclaimer: No disclaimer. Whatcha gonna do, sue me?
pop@mtu.UUCP (05/01/87)
In article <5662@shemp.UCLA.EDU> marc@CS.UCLA.EDU (Marc Tremblay) writes: >In article <165100008@uiucdcsb> robison@uiucdcsb.cs.uiuc.edu writes: >> >>Are there any good papers on branch prediction? I've seen two schemes: >> >> 1) Put an extra bit in the instruction indicating which way the branch >> will probably go. This is a compile-time prediction. >> >> 2) Keep a record of past branches, i.e. run-time prediction. >> >>Though I've heard of examples of machines with each scheme, I've never >>seen any studies of their efficiency. >> >>Arch D. Robison >>University of Illinois at Urbana-Champaign >> You might also check out my article about branch prediction which combines the two: %A D. A. Poplawski %T Low cost branch prediction %J Proceedings of the Twenty-Third Allerton Conference on Communication, Control and Computing %P 979-983 %D October, 1985 Basically the idea is to monitor the execution of a program for a (very) few data sets, recording the number of times each unique branch instruction either branched and didn't branch, then predict all future executions of the same branch instruction according to the dominate outcome. It turns out that while there are always a few branch instructions which are unpredictable (i.e, branch half the time and don't branch half the time), the VAST majority are very predictible, and from my observations of programs like the C compiler, which you probably wouldn't think has very predictable branching behavior, the prediction accuracy is very high, even when just one sample of the execution is made and then used on other VERY different data sets. Basically the prediction accuracies are comparable with dynamic (i.e., expensive) schemes while the implementation cost is comparable with static (i.e., cheap) schemes.
lm@cottage.WISC.EDU (Larry McVoy) (05/01/87)
In article <269@mtu.UUCP> pop@mtu.UUCP (Dave Poplawski) writes: >In article <5662@shemp.UCLA.EDU> marc@CS.UCLA.EDU (Marc Tremblay) writes: >>In article <165100008@uiucdcsb> robison@uiucdcsb.cs.uiuc.edu writes: >>> >>>Are there any good papers on branch prediction? I've seen two schemes: I don't know whether anyone mentioned this one or not, but a very complete paper is A Study of Branch Prediction Strategies by J.E. Smith, appeared in IEEE ??? in 1981. 7 different strategies, 6 "real", static [i.e., all branches taken], and dynamic [i.e., cache recently taken branches] are explained, simulated, and the results compared. The bottom line from this paper is that just about ANY branch prediction is a win (lowest is 35 % hit ratio, with average in the 80's) and that some very simple branch predictions are extremely effective. The one I liked the best is a dynamic one that maintains an extra bit in the icache that is set iff the cache entry is a branch instruction that was taken on the last execution. Quite easy to implement, stingy on chip real estate, and on the average was right about 90.3% of the time (highest 98.9, lowest was 76.1). Mail me if you need more. Larry McVoy mcvoy@rsch.wisc.edu or uwvax!mcvoy "What a wonderful world it is that has girls in it!"
adb@homxa.UUCP (A.BERENBAUM) (05/06/87)
In article <809@killer.UUCP>, jfh@killer.UUCP (John Haugh) writes: > [ Stuff ] > Has anyone bothered to think about adding branch instructions that tell > the CPU explicitly whether or not they expect to be taken? For example, > > MBLEQ - most of the time branch less or equal > SBLEQ - seldom branch less or equal > > Compilers would be able to pick the instructions they wanted. (Speaking of > patents, if this idea is patentable and has not yet been so, anyone but > Nat-Semi can have it ... :-) :-) so long as I get credit ... :-) :-) > > - John (jfh@killer.UUCP) The CRISP Microprocessor has a single flag bit (call it True/False), and four conditional branch instructions: if True jump, predicted taken if True jump, predicted not taken if False jump, predicted taken if False jump, predicted not taken I think this covers what you ask for, but I'm sorry, you don't get credit. There's a paper in the 1987 COMPCON about compiling for CRISP that discusses how a compiler can exploit the prediction bits (it can do quite well -- we get correct prediction of nearly 90% as I recall). Alan Berenbaum, AT&T-something-or-other research!adb
pdg@ihdev.ATT.COM (Joe Isuzu) (05/06/87)
In article <809@killer.UUCP> jfh@killer.UUCP (John Haugh) writes: >Craig stated that, like the rest of us, he was raised to believe that >backwards branches make nice branches after while loops and such, and >forward branches make nice branches before if-thens. Has anyone bothered >to think about adding branch instructions that tell the CPU explicitly >whether or not they expect to be taken? For example, > > MBLEQ - most of the time branch less or equal > SBLEQ - seldom branch less or equal The AT&T CRISP processor does this, with instructions like ifFjmpn, ifFjmpy (if false, expect to jump no or yes) ifTjmpn, ifTjmpy (if true, expect to jump no or yes) These take 0 cycles for a correct prediction, and a max of 3 cycles for an incorrect prediction (-1 cycle per instruction between the compare and the branch). For more info, see any of the CRISP articles from IEEE Compcon '87. -- Paul Guthrie ihnp4!ihdev!pdg This Brain left intentionally blank.
csg@pyramid.UUCP (Carl S. Gutekunst) (05/10/87)
>> Has anyone bothered to think about adding branch instructions that tell >> the CPU explicitly whether or not they expect to be taken? For example, >> Compilers would be able to pick the instructions they wanted. > >The CRISP Microprocessor has a single flag bit (call it True/False), and four >conditional branch instructions: The Pyramid CPU uses a similar arrangement, with two instructions: "Branch on False Condition" and "Branch on True Condition." Each has the usual assortment of condition code tests. I never remember which assumes branch taken and which assumes it's not; fortunately the compilers don't get similarly confused. :-) <csg>