gcwillia@daisy.waterloo.edu (Graeme Williams) (06/29/91)
Following several posts regarding division routines, in particular one with a lovely 3 instr. central loop I modified my old division routine into the following (supersonic, lightning fast, ZR rated, :-) ) 32bit routine: ;Essentially what's happening here is we are figuring out how far left ;we can shift the numerator before any actual dividing begins and thus save ;a significant number of useless trips through the (unrolled) 3 instr. loop. ; - Note we only go to the nearest 4 bits, trying to save another 2 bits ; results in 3 extra instrs. for a saving of 6 instrs. half the time ; and 0 instrs. the other half - i.e. no nett benefit. MOV cnt,#28 : MOV mod,num,LSR#4 CMP den,mod,LSR#12 : SUBLE cnt,cnt,#16 : MOVLE mod,mod,LSR#16 CMP den,mod,LSR#4 : SUBLE cnt,cnt,#8 : MOVLE mod,mod,LSR#8 CMP den,mod : SUBLE cnt,cnt,#4 : MOVLE mod,mod,LSR#4 MOV num,num,LSL cnt : RSB den,den,#0 ADDS num,num,num ;Now skip over cnt copies of the 3 instr. loop. ADD cnt,cnt,cnt,LSL#1 : ADD pc,pc,cnt,LSL#2 : MOV R0,R0 {ADCS mod,den,mod,LSL#1 : SUBLO mod,mod,den : ADCS num,num,num} x 32 copies The routine behaves as follows: On Entry: num - is a signed numerator A (but +ve) den - is a signed denominator B (but non-zero and +ve) On Exit: mod - contains A MOD B num - contains A DIV B The routine requires 4 registers and 452 bytes of memory. Execution time: Given numbers A and B (both +ve) chosen at random from a distribution in which the most significant bit of the numbers is uniformly distributed between bits 0 and 31 (i.e. the density of numbers goes as roughly log n) then the average execution time is: For A < B : 32 cycles (no division actually required) For A >= B: 58.25 cycles The best and worst cases are 32 cycles and 116 cycles respectively. Have fun. (Stuff like this ought to be worth good money!) Graeme Williams gcwillia@daisy.waterloo.edu P.S. With any luck there's only a couple of typos :-).