[net.micro] Integer Multiplication

pmontgom@sdcrdcf.UUCP (Peter Montgomery) (07/01/85)
Admittedly, some programs do very few integer multiplications except while
calculating subscripts and converting outputs to decimal notation.  I have
"number crunchers" which do very little floating point arithmetic but for
which integer multiplication is a very important instruction.  These
programs implement integer factorization algorithms, and do multiprecision
arithmetic modulo the number being factored (N).  Typically, N has 100
decimal digits (several machine words).  The most important operations are
A+B mod N, A-B mod N, and A*B mod N (the remainders of A+B, A-B, and A*B
when divided by N).  Multiplication is the most expensive of these
operations, despite attempts (see my article "Modular Multiplication Without
Trial Division" in the April, 1985, "Mathematics of Computation") to speed
it up.  On the VAX 11/780, the inner loop

1:
	emul	r2,(r6)+,r1,r0		# (r0,r1) = mul2*ar2(i) + carry
	emul	r3,(r7)+,(r8)+,r4	# (r4,r5) = mul3*ar3(i) + tmparr(i)
	addl2	r4,r0
	adwc	r5,r1			# quad = (r0,r1) + (r4,r5)
	bicl3   r11,r0,-8(r8)           # tmparr(i-1) = lower 30 bits of quad
	ashq    $2,r0,r0                # r1 = carry = quad/2^30
	sobgtr  r9,1b                   # End of loop

uses EMUL (multiply two signed longwords, add a third longword) more than
any other instruction.

        For those of you implementing instruction sets, please don't
require us to use floating point operations for integer multiplication.
Our algorithms require EXACT answers.  Double length results (e.g., VAX
EMUL instruction) are desirable, but if they are unavailable then we need
the LOWER bits of the product (e.g., a pseudorandom number generator using
the multiplicative congruential algorithm).  Floating point operations
return the UPPER bits.  As another example, we may know that the
determinant of the single precision integer matrix

		    A   B
		    C   D

will be a single precision integer, even though the intermediate results
(A*D and B*C) overflow.  If we use single precision floating point
arithmetic (e.g, as required on the CRAY II), then we lose essentially all
precision when subtracting intermediate results, and the computation is a
waste.  If instead the computations are modulo the machine word size (2^64
on the CRAY), then the final result will be exactly what we want even
though the intermediate results overflowed.


-- 
			Peter Montgomery

	{aero,allegra,bmcg,burdvax,hplabs,
	 ihnp4,psivax,randvax,sdcsvax,trwrb}!sdcrdcf!pmontgom

Don't blame me for the crowded freeways - I don't drive.