chris@mimsy.UUCP (Chris Torek) (06/07/89)
In article <13592@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes: >The VAX (PCC based) compiler's optimizer would convert many loops >that used multiple instructions to use a single complicated >instruction. It still does. >Unfortunately, the VAX 780 (ubiquitous at the time) generally ran >these slower. I am not sure about `generally', but some do run slower (alas!). >One case was the acbl (add, compare and branch longword). The >optimizer would replace the three instructions (something like:) > add $1,r2 > cmp $END,r2 > bne L1 > >with the "acbl" and all its arguments. Both codings take the >same number of bytes (25!). The original code is <code> L1: cmpl END,reg jgtr L2 <loop body> addl2 $CONST,reg # CONST must be positive jbr L1 L2: <more code> and the replacement is <code> decl reg jbr L2 L1: <loop body> L2: acbl END,$CONST,reg,L1 <more code> or, if the loop body is short enough (<= 130 bytes, although c2 uses the improper metric of `eight instructions') and CONST is 1, the acbl is replaced with an aobleq. In neither case is the instruction byte count 25; if we count only instructions inside the loop, the acbl is always shorter, and in any case it is never longer. If we assume all branches are in bounds (which gives the best situation for the cmp/jgtr code), we get: L1: cmp END,reg # 2+k bytes, k=1 for END constant or reg bgtr L2 # 2 bytes <loop body> addl2 $CONST,reg # 2+l bytes, l=1 for CONST in 0..63 brb L1 # 2 bytes L2: <more code> # total = 8+k+l (typically 10) versus: decl reg # 2 bytes, but outside loop brb L2 # 2 bytes, but outside loop L1: <loop body> L2: acbl END,$CONST,reg,L1 # 1+k+l+1+2 = 4+k+l <more code> # total = 4+k+l in loop, 8+k+l overall The most common constant, however, is 1; if the branches are in bounds, we get an aobleq and the count drops to L2: aobleq reg,END,L1 # 1+1+k+1 = 3+k for 3+k bytes in the loop and 7+k bytes total; in this case l=1 and the cmp/jgtr loop takes 9+k bytes total, all in the loop. If the branches are not in range, the picture becomes much murkier, as the `jxxx' opcodes usually `explode', but sometimes `tunnel': jgtr L3 <code> <more code> L3: can become jleq Laround # invert the branch brw L3 # and use a longer form Laround:<code> <more code> L3: but if there happens to be a branch to L3 (either unconditional or bgtr) that is in range of the first branch, we might get instead bgtr Ltunnel # get to someone who gets us there <code> Ltunnel:brw L3 <more code> L3: >The multiple instructions are faster (on the 780). This I do not intend to test (780s being rather uninteresting). Anyway, the real point of all this is that instruction selection, especially for baroque CISCs like a VAX, can be difficult.... -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris