[comp.arch] Multiply as Shift-Add

crowl@rochester.ARPA (Lawrence Crowl) (05/12/87)

Standard architectures provide a multiply instruction so that "i * 57"
compiles to a single, general purpose multiply instruction.

The AMD machine provides a decomposed multiply instruction that must
be executed n times in sequence for an n bit multiply.  In article
<16640@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) notes that
"bits(57) == 6".  Using "#" to represent the decomposed multiply
instruction, "i * 57" is translated to:

    ((((((i # 57) # 57) # 57) # 57) # 57) # 57) # 57

In article <16640@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) also
notes that "57 == 32 + 16 + 8 + 1" so that the multiply can be more 
efficiently translated to:

    (i << 5) + (i << 4) + (i << 3) + i 

In article <381@dumbo.UUCP> hansen@mips.UUCP (Craig Hansen) further
notes that "57 == (8 - 1) * 8 + 1".  Using this the muliply is yet
more efficiently translated to: 

    (((i << 3) - i) << 3) + i

Now a problem arises.  In the first three approaches an overflow at
any step indicates an overflow for the entire multiply.  However, in
the last approach, an overflow may occur at a step in the computation 
while the entire multiply itself does not overflow.  For example, 
assume a sixteen bit two's complement integer.  The multiplication
"i * 63" where i == 520 is not an overflow (520 * 63 == 32760).  The
expression translated to the shift-subtract approach is "(i << 6) - i".
The shift step overflows because "520 << 6" is greater than 32767.

Finally, the question: how do implementations using subtracts in the
shift-add approach to multiplication detect overflow for an entire
multiplication?
-- 
  Lawrence Crowl		716-275-5766	University of Rochester
			crowl@rochester.arpa	Computer Science Department
 ...!{allegra,decvax,seismo}!rochester!crowl	Rochester, New York,  14627

chris@mimsy.UUCP (Chris Torek) (05/13/87)

In article <27685@rochester.ARPA> crowl@rochester.ARPA (Lawrence Crowl) writes:
>[When using additive power-of-two multipy] approaches an overflow at
>any step indicates an overflow for the entire multiply.  However, in
>the last approach [using subtraction], an overflow may occur at a step
>in the computation while the entire multiply itself does not overflow.
>... how do implementations using subtracts in the shift-add approach
>to multiplication detect overflow for an entire multiplication?

Since the multiply is by a constant, you can decide ahead of time
what range will *not* overflow, then generate two branches:

	// if x * [-58 .. 57] is ok:
	cmp	rX,#57
	bgt	overflow
	cmp	rX,#-58
	blt	overflow
	// go ahead, ignoring overflow

I suspect, however, that the usual answer is `they do not bother'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

baum@apple.UUCP (Allen J. Baum) (05/13/87)

--------
[]

The HP Spectrum has instructions that shift an index depending on
whether the load/store is byte/halfword/word/doubleword. This
hardware is also used for a set of general 'shift0/1/2/3 & add'
instructions. Each has a variation that traps on overflow, where
overflow is either an addition overflow or a shift overflow. (Sees if
bits shifted out=sign)

These instructions are sufficient to multiply by constants<3800 in
six instructions, and average performance of two 32-bit variables is
under 20 cycles.

Most of this work was done by Dan Magenheimer at HP, and there will be a
publication describing the work.

 --
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

phil@osiris.UUCP (Philip Kos) (05/13/87)

In article <6651@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <27685@rochester.ARPA> crowl@rochester.ARPA (Lawrence Crowl) writes:
> >... how do implementations using subtracts in the shift-add approach
> >to multiplication detect overflow for an entire multiplication?
> 
> I suspect, however, that the usual answer is `they do not bother'.

Would you believe internal guard bits?  Maybe not everyone does this,
but it does give you the nice property that: given enough internal
guard bits (you really only need a couple), any iterative operation
like a multiple or divide (integer or floating point, it doesn't really
matter) whose result is exactly representible in size of the target
will be represented.  Oh yeah, for the preceding property, it also helps
a lot for the algorithm to be correct.  :-)

...!decvax!decuac!\                                               Phil Kos
  ...!seismo!mimsy!aplcen!osiris!phil           The Johns Hopkins Hospital
...!allegra!/                                                Baltimore, MD