gcwillia@daisy.waterloo.edu (Graeme Williams) (06/29/91)
Following several posts regarding division routines, in particular one
with a lovely 3 instr. central loop I modified my old division routine
into the following (supersonic, lightning fast, ZR rated, :-) )
32bit routine:
;Essentially what's happening here is we are figuring out how far left
;we can shift the numerator before any actual dividing begins and thus save
;a significant number of useless trips through the (unrolled) 3 instr. loop.
; - Note we only go to the nearest 4 bits, trying to save another 2 bits
; results in 3 extra instrs. for a saving of 6 instrs. half the time
; and 0 instrs. the other half - i.e. no nett benefit.
MOV cnt,#28 : MOV mod,num,LSR#4
CMP den,mod,LSR#12 : SUBLE cnt,cnt,#16 : MOVLE mod,mod,LSR#16
CMP den,mod,LSR#4 : SUBLE cnt,cnt,#8 : MOVLE mod,mod,LSR#8
CMP den,mod : SUBLE cnt,cnt,#4 : MOVLE mod,mod,LSR#4
MOV num,num,LSL cnt : RSB den,den,#0
ADDS num,num,num
;Now skip over cnt copies of the 3 instr. loop.
ADD cnt,cnt,cnt,LSL#1 : ADD pc,pc,cnt,LSL#2 : MOV R0,R0
{ADCS mod,den,mod,LSL#1 : SUBLO mod,mod,den : ADCS num,num,num} x 32 copies
The routine behaves as follows:
On Entry: num - is a signed numerator A (but +ve)
den - is a signed denominator B (but non-zero and +ve)
On Exit: mod - contains A MOD B
num - contains A DIV B
The routine requires 4 registers and 452 bytes of memory.
Execution time:
Given numbers A and B (both +ve) chosen at random from a distribution in
which the most significant bit of the numbers is uniformly distributed
between bits 0 and 31 (i.e. the density of numbers goes as roughly log n)
then the average execution time is:
For A < B : 32 cycles (no division actually required)
For A >= B: 58.25 cycles
The best and worst cases are 32 cycles and 116 cycles respectively.
Have fun. (Stuff like this ought to be worth good money!)
Graeme Williams
gcwillia@daisy.waterloo.edu
P.S. With any luck there's only a couple of typos :-).