crowl@rochester.ARPA (Lawrence Crowl) (05/12/87)
Standard architectures provide a multiply instruction so that "i * 57" compiles to a single, general purpose multiply instruction. The AMD machine provides a decomposed multiply instruction that must be executed n times in sequence for an n bit multiply. In article <16640@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) notes that "bits(57) == 6". Using "#" to represent the decomposed multiply instruction, "i * 57" is translated to: ((((((i # 57) # 57) # 57) # 57) # 57) # 57) # 57 In article <16640@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) also notes that "57 == 32 + 16 + 8 + 1" so that the multiply can be more efficiently translated to: (i << 5) + (i << 4) + (i << 3) + i In article <381@dumbo.UUCP> hansen@mips.UUCP (Craig Hansen) further notes that "57 == (8 - 1) * 8 + 1". Using this the muliply is yet more efficiently translated to: (((i << 3) - i) << 3) + i Now a problem arises. In the first three approaches an overflow at any step indicates an overflow for the entire multiply. However, in the last approach, an overflow may occur at a step in the computation while the entire multiply itself does not overflow. For example, assume a sixteen bit two's complement integer. The multiplication "i * 63" where i == 520 is not an overflow (520 * 63 == 32760). The expression translated to the shift-subtract approach is "(i << 6) - i". The shift step overflows because "520 << 6" is greater than 32767. Finally, the question: how do implementations using subtracts in the shift-add approach to multiplication detect overflow for an entire multiplication? -- Lawrence Crowl 716-275-5766 University of Rochester crowl@rochester.arpa Computer Science Department ...!{allegra,decvax,seismo}!rochester!crowl Rochester, New York, 14627
chris@mimsy.UUCP (Chris Torek) (05/13/87)
In article <27685@rochester.ARPA> crowl@rochester.ARPA (Lawrence Crowl) writes: >[When using additive power-of-two multipy] approaches an overflow at >any step indicates an overflow for the entire multiply. However, in >the last approach [using subtraction], an overflow may occur at a step >in the computation while the entire multiply itself does not overflow. >... how do implementations using subtracts in the shift-add approach >to multiplication detect overflow for an entire multiplication? Since the multiply is by a constant, you can decide ahead of time what range will *not* overflow, then generate two branches: // if x * [-58 .. 57] is ok: cmp rX,#57 bgt overflow cmp rX,#-58 blt overflow // go ahead, ignoring overflow I suspect, however, that the usual answer is `they do not bother'. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690) Domain: chris@mimsy.umd.edu Path: seismo!mimsy!chris
baum@apple.UUCP (Allen J. Baum) (05/13/87)
-------- [] The HP Spectrum has instructions that shift an index depending on whether the load/store is byte/halfword/word/doubleword. This hardware is also used for a set of general 'shift0/1/2/3 & add' instructions. Each has a variation that traps on overflow, where overflow is either an addition overflow or a shift overflow. (Sees if bits shifted out=sign) These instructions are sufficient to multiply by constants<3800 in six instructions, and average performance of two 32-bit variables is under 20 cycles. Most of this work was done by Dan Magenheimer at HP, and there will be a publication describing the work. -- {decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
phil@osiris.UUCP (Philip Kos) (05/13/87)
In article <6651@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes: > In article <27685@rochester.ARPA> crowl@rochester.ARPA (Lawrence Crowl) writes: > >... how do implementations using subtracts in the shift-add approach > >to multiplication detect overflow for an entire multiplication? > > I suspect, however, that the usual answer is `they do not bother'. Would you believe internal guard bits? Maybe not everyone does this, but it does give you the nice property that: given enough internal guard bits (you really only need a couple), any iterative operation like a multiple or divide (integer or floating point, it doesn't really matter) whose result is exactly representible in size of the target will be represented. Oh yeah, for the preceding property, it also helps a lot for the algorithm to be correct. :-) ...!decvax!decuac!\ Phil Kos ...!seismo!mimsy!aplcen!osiris!phil The Johns Hopkins Hospital ...!allegra!/ Baltimore, MD