pmontgom@sdcrdcf.UUCP (Peter Montgomery) (07/01/85)
References: Admittedly, some programs do very few integer multiplications except while calculating subscripts and converting outputs to decimal notation. My "number crunchers" do very little floating point arithmetic but integer multiplication is an essential instruction for them. These programs implement integer factorization algorithms, and do multiprecision arithmetic modulo the number being factored (N). Typically, N has 100 decimal digits (several machine words). The most important operations are A+B mod N, A-B mod N, and A*B mod N (the remainders of A+B, A-B, and A*B when divided by N). Multiplication remains the most expensive of these operations, despite attempts (see my article "Modular Multiplication Without Trial Division" in the April, 1985, "Mathematics of Computation") to speed it up. On the VAX 11/780, the inner loop 1: emul r2,(r6)+,r1,r0 # (r0,r1) = mul2*ar2(i) + carry emul r3,(r7)+,(r8)+,r4 # (r4,r5) = mul3*ar3(i) + tmparr(i) addl2 r4,r0 adwc r5,r1 # quad = (r0,r1) + (r4,r5) bicl3 r11,r0,-8(r8) # tmparr(i-1) = lower 30 bits of quad ashq $2,r0,r0 # r1 = carry = quad/2^30 sobgtr r9,1b # End of loop uses EMUL (multiply two signed longwords, add a third longword) more than any other instruction. For those of you implementing instruction sets, please don't require us to use floating point operations for integer multiplication. Our algorithms require EXACT answers. Double length results (e.g., VAX EMUL instruction) are desirable, but if they are unavailable then we want the LOWER bits of the product. Floating point operations return the UPPER bits. As an example, we may know that the determinant of the single precision integer matrix A B C D will be a single precision integer, even though the intermediate results (A*D and B*C) overflow. If we use single precision floating point arithmetic (e.g, as required on the CRAY II), then we lose essentially all precision when subtracting intermediate results, and the computation is a waste. If instead the computations are modulo the machine word size (2^64 on the CRAY), then the final result will be what we want despite the overflow. -- Peter Montgomery {aero,allegra,bmcg,burdvax,hplabs, ihnp4,psivax,randvax,sdcsvax,trwrb}!sdcrdcf!pmontgom Don't blame me for the crowded freeways - I don't drive.