pmontgom@sdcrdcf.UUCP (Peter Montgomery) (07/01/85)
Admittedly, some programs do very few integer multiplications except while calculating subscripts and converting outputs to decimal notation. I have "number crunchers" which do very little floating point arithmetic but for which integer multiplication is a very important instruction. These programs implement integer factorization algorithms, and do multiprecision arithmetic modulo the number being factored (N). Typically, N has 100 decimal digits (several machine words). The most important operations are A+B mod N, A-B mod N, and A*B mod N (the remainders of A+B, A-B, and A*B when divided by N). Multiplication is the most expensive of these operations, despite attempts (see my article "Modular Multiplication Without Trial Division" in the April, 1985, "Mathematics of Computation") to speed it up. On the VAX 11/780, the inner loop 1: emul r2,(r6)+,r1,r0 # (r0,r1) = mul2*ar2(i) + carry emul r3,(r7)+,(r8)+,r4 # (r4,r5) = mul3*ar3(i) + tmparr(i) addl2 r4,r0 adwc r5,r1 # quad = (r0,r1) + (r4,r5) bicl3 r11,r0,-8(r8) # tmparr(i-1) = lower 30 bits of quad ashq $2,r0,r0 # r1 = carry = quad/2^30 sobgtr r9,1b # End of loop uses EMUL (multiply two signed longwords, add a third longword) more than any other instruction. For those of you implementing instruction sets, please don't require us to use floating point operations for integer multiplication. Our algorithms require EXACT answers. Double length results (e.g., VAX EMUL instruction) are desirable, but if they are unavailable then we need the LOWER bits of the product (e.g., a pseudorandom number generator using the multiplicative congruential algorithm). Floating point operations return the UPPER bits. As another example, we may know that the determinant of the single precision integer matrix A B C D will be a single precision integer, even though the intermediate results (A*D and B*C) overflow. If we use single precision floating point arithmetic (e.g, as required on the CRAY II), then we lose essentially all precision when subtracting intermediate results, and the computation is a waste. If instead the computations are modulo the machine word size (2^64 on the CRAY), then the final result will be exactly what we want even though the intermediate results overflowed. -- Peter Montgomery {aero,allegra,bmcg,burdvax,hplabs, ihnp4,psivax,randvax,sdcsvax,trwrb}!sdcrdcf!pmontgom Don't blame me for the crowded freeways - I don't drive.