zrzm0370@rusmv1.rus.uni-stuttgart.de (Joerg Scheurich) (06/10/91)
Hi
So i know, there is no Assembler-command for integer-division in the
ARM. Therefor a fast division-routine is a thing of common interest.
In a earlier Posting i wrote about the bad quick hacked division-routine
for the demo of my formula-tool (math. formula ==> assemblermacros).
Now i wrote a better routine, but i don't know, if there is a better
algorithm.
The shorter and slower 32-Bit/32-Bit-division-routine requires
382 Cycles in 100 Bytes, with usage of 4 32-Bit-registers and 4 Byte memory.
With unrolling the loop,
the faster and longer 32-Bit/32-Bit-division-routine requires
182 Cycles in 692 Bytes, with usage of 4 32-Bit-registers.
This is very faster then the old routine (which requires 2**33 Cycles in
worst case and requires all time of the universe, if there is a division
by zero ... )
Cause the INTERNAL-assembler-command of the INTEL 8086
for a 32-Bit/16-Bit-division requires
165-184 Cycles in 2-4 Bytes, with usage of 3 16-Bit-registers,
i think, the result is ok.
The result is nothing against the INTERNAL-assembler-command of the ARM
for a 32-Bit*32-Bit-multiplication which requires
17 Cycles in 4 Bytes, with usage of 1-3 32-Bit-registers,
but my Professor in digital electronics used to say:
"If a program contain much divisions, the programmer is a idiot" ...
Know someone a better (faster or shorter) algorithm ?
so long
MUFTI
( internet: zrzm0111@helpdesk.rus.uni-stuttgart.de
( from janet: zrzm0111%de.uni-stuttgart.rus.helpdesk@NFS-RELAY )
bitnet: ZRZM AT DS0RUS1I )
This is not a posting of a binary. Cause the readable routines contains very
much linefeeds and spaces, I send them in submit-extract-format (41 lines).
Posting-number: Volume 01, Issue 07
Submitted-by: MUFTI
Archive-name: Division Macros/part01
table
!"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
begin 644 <SubExtWrk$Dir>.Work.File-Z
M'YV07YBDF4,'@,&#"!,J7,BPH<.'$",B!$'1!@P8!BEJQ A (T6.'CW&@#&Cz
M8PT8,FB<K!%#AHR.(&K,N'&C(T>).'/JW,FSIT^)(4/" /&SJ-&C2),JA3BEy
M#!TZ>>"4 <'CB9@R>4@022/'APLR:>RT,!.&( @31M(J:/HTZM2J5[-N[?HUx
M;(LY:-[(H7,VK1$%2P,+'DRXL.'#B!,K7BP8K%BR!!E+;JC1(DB/ES=F#!ECw
M9 T;)E&JA$'CADR8-EK2L#FYM6NC03$3?4V[MNL=(!P/3//�@Y;^K02>.Fv
MC +<,5R &!.&S9@Z;,+0F6HF^AD0:<S\+C,'.M^!((J?D3XUC!LRRWO;*;,7u
MQ!NI<LR3F0."SAL0<-[,23/<SG$0,B@W1QEC](:>"+J1488((-S!'QH@U.'&t
M?F<4AUYW8^C!W7[K@1 $'' AT<:;4C'FV\@_#>#<F:D@4=]:$Q%H6]O:"<'s
M=]ZEJ$!L(+C01!!#2/$$"$DX046"QO$( A-$2 &"%#*P4((,._+(I)-2S"#Er
M##HJV<035CP) PLCP%!E;$,T <63499Y9FQ23"$$$U2PR0*49)JI) A35#%Gq
MG5*,&2B9,;P94IIK9IFGH2'%^>>36BKJYIY-6.$$G6+>:2:C>9YI,Q+,HIp
M14$00<04=N()Y:@>$C$$JE*$&NN=A2J)*$6S9LEJGT(<402NLLHZ ZNE$N$Ko
MKE'B.4(,2BK !1M4*EDLK,E62ZRKL 9+*ZNW@GKGL$KR>JRWN8++8['CJMHFn
MLSPZRX:YL4V;JK728DMNL-RJ">RWN_J9KK:ZUFOLK_,2VNRS-%Q[:L&KUOOJm
MO=O:JB^Y\ 8E+L&Y"JMPNM4:W.ZS-2A,[9WTGFMOQA'SV&VY_?:*,< 5AX0Nl
MQATO>S ;-HC,<+0F/XQRK/FN27'+_]+*K\ <D^QQ;.[>H+.Z)%_K,\"UJCPQk
MR^'Z^[+1 9L\,+)*V_PQ&S@\W7'#/6?+==5H7JUQUB[OB[7722O++M//YF"Vj
MTFC'>S+50<L=LT<7R_TVW32'?7=0[HZTM[I2JYUQX$/#7?3<?G]=L-AXL]'9i
MXR7[/?7:E&-NL=:&'XTXV';?W!+H43LL.;X2"VUZ2(5#W'7F=:_K^N :R0MUh
MWT&5.OKDM0M.]-:W>S0SZ[Z/'4/" H\,N>RZLQW4RH?'EOO/P)-J:N]+,_YLg
M#"%7OW/DV9?>_>EQZQY^JYI#S;GYGN>L_O \BSY[RFVSW?MPASKY;2QQK9.>f
MT_9WMOX5[V^D2U[E>/0]F!T0>N4+2>/*QD"^.5!F$$2>U02H.N\5$'P7W-SBe
M-'@^O77P>FEKGP2;IY$*<FU^SU/AS62@)Z]9+W0//![M1J@\RS%O@,X;'P*Cd
MUSD9: ^$"^,?^W[V1(]PKX3PNQP2@Z=$#-Z/A=#Z8!*CV, I FZ&6Z2(#6DHc
MOOK5;(4><9<,<&BJ'\8NAE1T'Q8)&#\4(FV)&8SCLU(".^*!4(@ W)[;]DBXb
M$UKPCUZ$HT;DF#X?K@][>40C(VOHR!NFT'Z2I(@<]6=)*6+RC$2<H G[^,C5a
MZ7!L,EA@*<MXR@BFDHU\ZB0N<PC*'7)PEAXTHRT#6$0*ZC*-]"/?%P4)+1<"z
M$X;^D^$MD;E&9/+RC3>;00\S9T=#)A&10-/D[K)XQ$VV49FA3-&S9E!%+I(Qy
MF+44(3%52<[4C1.*Z,RF&-W9S7V*#YSMI,@5[]E(5GH2DJ]D6AR3-ZA)\<A1x
M3OB5_8:R*RI@*4HE"*@+BN $(C0A1;8)J4A'2M*2FO2DKW',7?*R%Y06IC(7w
MV8QF8#+3V$QO)*%)R4E*$].*R, &-0'!35Q*U,/L:2A%3:I2$8(;W>RG-[\)v
MSG"*\Y_D+*<YSXG.=$!0G3!<)SO;Z0X;OD,?\9 '!/))CQO6TY[WL$<^]+$/u
M?O3#G[#\)T @&%"!S@,"!(4E#0IBD(/H "$)S:@,%ZI#AC84EJE\*$1O&%&)t
MA@/5%.%F15QU$8QDE(8*N<=&.!IKEV+C(R )B4A&0A*KKF2G*:VV29#:TFACs
MXZE,.928RMHFG.2$J8FR*I>/&E1# PJ"6TE*MT%Q5&\CI:7;TM92O144<@EGr
MT4RU@3AUF$-7SZ 5M76MC,(D'?!M"\W/>L-[QEC/X4I3T62D*"<-.@NNYA0q
MQL$-5](E5.G&Y-R0""&B%#EO>A4Z226Q=E#7=4-VMQNXADY7(Q"5:,TH&J[Jp
CXBFCH]IH1S\*F*5Z^,,@#K&(1TSB$IOXQ"A.L8I7S.(6>QA7o
n
endchughes@maths.tcd.ie (Conrad Hughes) (06/11/91)
In <1991Jun10.112536.15897@rusmv1.rus.uni-stuttgart.de> zrzm0370@rusmv1.rus.uni-stuttgart.de (Joerg Scheurich) writes about division.. >382 Cycles in 100 Bytes, with usage of 4 32-Bit-registers >and 4 Byte memory. >With unrolling the loop, >the faster and longer 32-Bit/32-Bit-division-routine requires >182 Cycles in 692 Bytes, with usage of 4 32-Bit-registers. This divides a 64bit number by a 32bit one, with 32bit result, in 97 instructions/388 bytes/97 clock cycles.. Entry: A=MSW of 64bit no., B=LSW of 64bit no., C=32bit no. C _must_ be negative (if it's positive, RSB C,C,#0), A _must_ be positive (if not, RSBS B,B,#0:RSC A,A,#0) ADDS B,B,B ADCS A,C,A,LSL#1:SUBCC A,A,C:ADCS B,B,B [repeat these three 32 times] Exit: A=Remainder of division, B=result, C unchanged. ..if you don't need the remainder, get rid of the last SUBCC A,A,C. The result will be unsigned, and flags will (obviously) reflect the last addition. Adjust as you will to get 32/32 division (Initialise A to 0, or AB = numerator shifted some way) or any other variety you like.. Rolling up the loop it could get slightly complicated since every instruction requires flag information from the previous one ;-) Something like SUB count,count,#1:TEQ count,#0:BNE loop with count initialised to #32 might do the trick, but it'll slow it down to at least 288 cycles (so what if it only uses 32 bytes?? ;-}), which is pretty unacceptable unless you're on an ARM3.. This is off the top of my head and I'm nowhere near the Arc, so if it doesn't work (it should - I've typed it in enough times now..) flame me.. Conrad -- Will you sell me one of those if I shave my head Get me out of town is what Fireball said Never trust a man in a blue trench coat Never drive a car when you're dead Tom Waits