[comp.arch] Branchless conditionals

hamilton@siberia.rtp.dg.com (Eric Hamilton) (06/15/91)

In article <1991Jun14.134338.4673@linus.mitre.org>, bs@frieda.mitre.org (Robert D. Silverman) writes:

|> :	    ibig  = absdx .le. dmax
|> 	    ^^^^^^^^^^^^^^^^^^^^^^^^
|> 
|> I would be very interested in seeing the assembler code that gets
|> emitted for this line of Fortran. How can this statement get executed
|> WITHOUT a branch??
|>  
|> I also have never seen a machine that has conditional assignment
|> implemented as an instruction! (As indicated above). What machine is
|> this? Could you show us the assembler syntax??
|> 
|> I ** believe ** that the above conditional assignment will get compiled
|> into a test and branch. I don't know of any machine that provides
|> instructions to do otherwise.
|> 
Motorola 88000, for one.  The trick is that the compare instruction puts
the condition code bit vector into a general register instead of a dedicated
condition code register, so that the bitfield instructions can be used
on the condition codes.  For example:

	cmp	scratch,absdx,dmax	; Compare; condition codes to scratch
	extu	ibig,scratch,1<le>	; Extract the le bit into ibig,
					; right-adjusted.

The condition code bit vector is also redundant, in the sense that
if any bit is set, there is another bit, in this case greater-than, which
is known to be clear, and vice versa.  In many cases it turns out the
redundancy is useful.  It's certainly near-as-no-never-mind free.

The ibig = absdx .le. dmax example just barely scratches the surface.  For
example:
	if( a condition b )
		i = i + 1;

can be rendered as:

	cmp	rscratch,x,y
	extu	rscratch,rscratch,1<condition>
	addu	i,i,rscratch

and lots more.  I'll leave it to the compiler folks to post the most aggressively
clever example.

-- 
----------------------------------------------------------------------
Eric Hamilton				+1 919 248 6172
Data General Corporation		hamilton@dg-rtp.rtp.dg.com
62 Alexander Drive			...!mcnc!rti!xyzzy!hamilton
Research Triangle Park, NC  27709, USA

glew@pdx007.intel.com (Andy Glew) (06/17/91)

|> :	    ibig  = absdx .le. dmax
|> 	    ^^^^^^^^^^^^^^^^^^^^^^^^
|> 
|> I would be very interested in seeing the assembler code that gets
|> emitted for this line of Fortran. How can this statement get executed
|> WITHOUT a branch??

The Intel386(tm) architecture has SETcc (set on condition)
instructions that put 0/1 into a register depending on the condition
codes. So you would have:
    
    	cmp 	ebx,ecx 	! assuming absdx in ebx, dmax in ecx
    	setle	al  	    	! put 0/1 into al 

Set-conditional is rather common - I believe it is in the MIPS R2000
architecture (and I am sure I will be corrected if wrong), and the AMD
29K architecture.  Minor differences according to whether 0/1 or 0/-1
is stored.
    	


|> I also have never seen a machine that has conditional assignment
|> implemented as an instruction! (As indicated above). What machine is
|> this? Could you show us the assembler syntax??

The Cydra-5 had a conditional assignment instruction (like the phi
function in single assignment code). Semantics: dest := IF icr[N] THEN
val1 ELSE val2.  Cydra also conditionalized nearly all instructions,
as does the Acorn Risk machine, although I do not know if the ARM has
a conditional select.

Note that with a bit of masking conditional select can be acheived on
the 88K, as on most other machines.



|> I ** believe ** that the above conditional assignment will get compiled
|> into a test and branch. I don't know of any machine that provides
|> instructions to do otherwise.

I have seen some of the better x86 compilers compile this without a
branch.
--

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.

mac@gold.kpc.com (Mike McNamara) (06/21/91)

|> :	    ibig  = absdx .le. dmax
|> 	    ^^^^^^^^^^^^^^^^^^^^^^^^
|> 
|> I would be very interested in seeing the assembler code that gets
|> emitted for this line of Fortran. How can this statement get executed
|> WITHOUT a branch??

	Firstly, the MIPS instruction set has SLT, SLTI, SLTU and
SLTIU, which set to 1 the destination register if the first argument
is less than the second. Otherwise the destination is set to 0.  U
requests an unsigned comparison; I indicates the second argument is a
16 bit immediate.  
	Your code snippet does not contain declarations, but one is
probably correct in assuming that you are implicitly allowing absdx
and dmax to be typed as REALS, inwhich case the MIPS instructions
would not work (integer only)

	Most vector machines, (Ardent Titan, Convex, etc) have a
vector compare instruction that sets or clears bits in any specified
register based upon the compare; further, other operations can be
specified to operate "undermask" of that register.  This facilitates
vectorization of loops containing conditionals:

	DO I = 1,N
		IF ( X(I) .GT. Y(I) )
			X(I) = Y(I) * Z(I) + K
		ENDIF
	ENDDO

	becomes
		sw	N,<vlength>
		DVLDA	X,0(addr_X)
		DVLDB	Y,0(addr_Y)
		DVCGT   [M],X,Y
		DVLDA	Z,0(addr_Z)     ! could be under mask as well...
		DVMA,mt X,X,Y,[K]
		DVST,mt	X,0(addr_X)

	on an ardent titan.

	Ah, yes, Andy brings up the beloved cydrome machine.  I
introduced the intrinsic SELECT(EXP,A,B) to it's fortran compiler.
*Amazing* speed ups were achieved especially on loops with internal
conditional codes.  e.g., vector intrinsics.  I was able to get a 17x
improvement in ATAN2.

	I find myself today still "thinking" in the predicated vliw
paradigm.  To be fair, the hardware model the cydra provided
facilitates the software ideas in the control flow graph papers from
IBM.  Those of you at the most recent ASPLOS saw how the cydrome
architecture implements software pipelining in hardware; the
predicated operations greatly facilitate the control graph
optimizations as well.  The one incompleteness of the cydra was
predicated exceptions.  
	I.E., this instruction is PROMPTED: i.e., been moved out
of it's containing block, so if it generates an exception, just set a
bit in the destination register.  When a non promoted instruction uses
a value from a register that has the exception bits set, THEN generate
the fault...

	Ah... musings...

	-mac 
	

--
+-----------+-----------------------------------------------------------------+
|mac@kpc.com| Increasing Software complexity lets us sell Mainframes as       |
|           | personal computers. Carry on, X windows/Postscript/emacs/CASE!! |
+-----------+-----------------------------------------------------------------+

dwells@fits.cx.nrao.edu (Don Wells) (06/22/91)

In article <MAC.91Jun21094148@gold.kpc.com> mac@gold.kpc.com (Mike
McNamara) writes:

... Most vector machines, (Ardent Titan, Convex, etc) have a
   vector compare instruction that sets or clears bits in any specified
   register based upon the compare; further, other operations can be
   specified to operate "undermask" of that register.  This facilitates
   vectorization of loops containing conditionals...

In most such machines one can use the bits to mask the "compress" of a
vector of values 1...n by one means or another. The resulting vector
is a list of indicies of the loop for which the condition was
satisfied (or not satisfied). In such machines the list can then be
used as an index vector for load and store (gather, scatter)
operations. This technique is especially advantageous when the truth
ratio (probability of test being satisfied) is low, or when the amount
of work (especially load/store) being done in the loop is large. This
approach is a part of standard bag of tricks for high performance
programming of conditionals on vector boxes. Some compilers, notably
the Fujitsu, produce such code automatically for loops preceded by a
suitable truth ratio compiler directive, based on a execution cost
analysis. It is even practical to use the approach on concurrent
vector machines like the Alliants (yes, a concurrent vector
compress!). Perhaps in this era of supersonic RISC engines nobody
cares anymore about these old vector programming tactics...

--

Donald C. Wells             Associate Scientist        dwells@nrao.edu
National Radio Astronomy Observatory                   +1-804-296-0277
Edgemont Road                                     Fax= +1-804-296-0278
Charlottesville, Virginia 22903-2475 USA            78:31.1W, 38:02.2N 

edwardm@hpcuhe.cup.hp.com (Edward McClanahan) (06/25/91)

|> :	    ibig  = absdx .le. dmax
|> 	    ^^^^^^^^^^^^^^^^^^^^^^^^
|> 
|> I would be very interested in seeing the assembler code that gets
|> emitted for this line of Fortran. How can this statement get executed
|> WITHOUT a branch??
|>  
|> I also have never seen a machine that has conditional assignment
|> implemented as an instruction! (As indicated above). What machine is
|> this? Could you show us the assembler syntax??
|> 
|> I ** believe ** that the above conditional assignment will get compiled
|> into a test and branch. I don't know of any machine that provides
|> instructions to do otherwise.

Just to add another example...

HP's RISC CPU (and many others) use conditionals to NULLIFY the next
instruction.  Here is how the HP compiler 'could' translate the above
statement:

Let's assume the compiler has made the following register assignments:

  R19 - ibig
  R20 - absdx
  R21 - dmax

The following three instruction sequence will store a ZERO in R19 (ibig)
if absdx > dmax (assuming that 1 means TRUE; I'm not a FORTRAN person)
and a ONE if absdx <= dmax:

  LDI    0,R19       ; Set ibig to 'default' of FALSE

  SUB,>  R20,R21,R0  ; 'Compare' absdx to dmax and NULLIFY next instruction
                     ;   if absdx > dmax
                     ; Result of subtraction is thrown away when sent to R0

  LDI    1,R19       ; If not NULLIFIED by 'SUB,>', set ibig to TRUE

Of course some may argue that this is really a branch...

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

  Edward McClanahan
  Hewlett Packard Company     -or-     edwardm@cup.hp.com
  Mail Stop 42UN
  11000 Wolfe Road                     Phone: (408)447-5651
  Cupertino, CA  95014                 Fax:   (408)447-5039

hrubin@pop.stat.purdue.edu (Herman Rubin) (06/25/91)

This is the old conditional skip, unused in most "modern" architectures.

This works well on sequential machines, but has it been implemented efficiently
on the faster non-sequential ones?
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!hrubin(UUCP)

mjs@hpfcso.FC.HP.COM (Marc Sabatella) (06/25/91)

>HP's RISC CPU (and many others) use conditionals to NULLIFY the next
>instruction.  Here is how the HP compiler 'could' translate the above
>statement:
>...
>Of course some may argue that this is really a branch...

No, I think part of the point is that a nullify is not a branch - in
particular, it never interferes with prefetch, nor does it alter locality.

--------------
Marc Sabatella (marc@hpmonk.fc.hp.com)
Disclaimers:
	2 + 2 = 3, for suitably small values of 2
	Bill and Dave may not always agree with me

jdg@cs.man.ac.uk (Jim Garside) (06/26/91)

Since no one has mentioned it, the ARM - a modern RISC processor - has
conditional execution of *all* its instructions.

Jim Garside