zrzm0370@rusmv1.rus.uni-stuttgart.de (Joerg Scheurich) (06/10/91)
Hi So i know, there is no Assembler-command for integer-division in the ARM. Therefor a fast division-routine is a thing of common interest. In a earlier Posting i wrote about the bad quick hacked division-routine for the demo of my formula-tool (math. formula ==> assemblermacros). Now i wrote a better routine, but i don't know, if there is a better algorithm. The shorter and slower 32-Bit/32-Bit-division-routine requires 382 Cycles in 100 Bytes, with usage of 4 32-Bit-registers and 4 Byte memory. With unrolling the loop, the faster and longer 32-Bit/32-Bit-division-routine requires 182 Cycles in 692 Bytes, with usage of 4 32-Bit-registers. This is very faster then the old routine (which requires 2**33 Cycles in worst case and requires all time of the universe, if there is a division by zero ... ) Cause the INTERNAL-assembler-command of the INTEL 8086 for a 32-Bit/16-Bit-division requires 165-184 Cycles in 2-4 Bytes, with usage of 3 16-Bit-registers, i think, the result is ok. The result is nothing against the INTERNAL-assembler-command of the ARM for a 32-Bit*32-Bit-multiplication which requires 17 Cycles in 4 Bytes, with usage of 1-3 32-Bit-registers, but my Professor in digital electronics used to say: "If a program contain much divisions, the programmer is a idiot" ... Know someone a better (faster or shorter) algorithm ? so long MUFTI ( internet: zrzm0111@helpdesk.rus.uni-stuttgart.de ( from janet: zrzm0111%de.uni-stuttgart.rus.helpdesk@NFS-RELAY ) bitnet: ZRZM AT DS0RUS1I ) This is not a posting of a binary. Cause the readable routines contains very much linefeeds and spaces, I send them in submit-extract-format (41 lines). Posting-number: Volume 01, Issue 07 Submitted-by: MUFTI Archive-name: Division Macros/part01 table !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ begin 644 <SubExtWrk$Dir>.Work.File-Z M'YV07YBDF4,'@,&#"!,J7,BPH<.'$",B!$'1!@P8!BEJQ A (T6.'CW&@#&Cz M8PT8,FB<K!%#AHR.(&K,N'&C(T>).'/JW,FSIT^)(4/" /&SJ-&C2),JA3BEy M#!TZ>>"4 <'CB9@R>4@022/'APLR:>RT,!.&( @31M(J:/HTZM2J5[-N[?HUx M;(LY:-[(H7,VK1$%2P,+'DRXL.'#B!,K7BP8K%BR!!E+;JC1(DB/ES=F#!ECw M9 T;)E&JA$'CADR8-EK2L#FYM6NC03$3?4V[MNL=(!P/3//�@Y;^K02>.Fv MC +<,5R &!.&S9@Z;,+0F6HF^AD0:<S\+C,'.M^!((J?D3XUC!LRRWO;*;,7u MQ!NI<LR3F0."SAL0<-[,23/<SG$0,B@W1QEC](:>"+J1488((-S!'QH@U.'&t M?F<4AUYW8^C!W7[K@1 $'' AT<:;4C'FV\@_#>#<F:D@4=]:$Q%H6]O:"<'s M=]ZEJ$!L(+C01!!#2/$$"$DX046"QO$( A-$2 &"%#*P4((,._+(I)-2S"#Er M##HJV<035CP) PLCP%!E;$,T <63499Y9FQ23"$$$U2PR0*49)JI) A35#%Gq MG5*,&2B9,;P94IIK9IFGH2'%^>>36BKJYIY-6.$$G6+>:2:C>9YI,Q+,HIp M14$00<04=N()Y:@>$C$$JE*$&NN=A2J)*$6S9LEJGT(<402NLLHZ ZNE$N$Ko MKE'B.4(,2BK !1M4*EDLK,E62ZRKL 9+*ZNW@GKGL$KR>JRWN8++8['CJMHFn MLSPZRX:YL4V;JK728DMNL-RJ">RWN_J9KK:ZUFOLK_,2VNRS-%Q[:L&KUOOJm MO=O:JB^Y\ 8E+L&Y"JMPNM4:W.ZS-2A,[9WTGFMOQA'SV&VY_?:*,< 5AX0Nl MQATO>S ;-HC,<+0F/XQRK/FN27'+_]+*K\ <D^QQ;.[>H+.Z)%_K,\"UJCPQk MR^'Z^[+1 9L\,+)*V_PQ&S@\W7'#/6?+==5H7JUQUB[OB[7722O++M//YF"Vj MTFC'>S+50<L=LT<7R_TVW32'?7=0[HZTM[I2JYUQX$/#7?3<?G]=L-AXL]'9i MXR7[/?7:E&-NL=:&'XTXV';?W!+H43LL.;X2"VUZ2(5#W'7F=:_K^N :R0MUh MWT&5.OKDM0M.]-:W>S0SZ[Z/'4/" H\,N>RZLQW4RH?'EOO/P)-J:N]+,_YLg M#"%7OW/DV9?>_>EQZQY^JYI#S;GYGN>L_O \BSY[RFVSW?MPASKY;2QQK9.>f MT_9WMOX5[V^D2U[E>/0]F!T0>N4+2>/*QD"^.5!F$$2>U02H.N\5$'P7W-SBe M-'@^O77P>FEKGP2;IY$*<FU^SU/AS62@)Z]9+W0//![M1J@\RS%O@,X;'P*Cd MUSD9: ^$"^,?^W[V1(]PKX3PNQP2@Z=$#-Z/A=#Z8!*CV, I FZ&6Z2(#6DHc MOOK5;(4><9<,<&BJ'\8NAE1T'Q8)&#\4(FV)&8SCLU(".^*!4(@ W)[;]DBXb M$UKPCUZ$HT;DF#X?K@][>40C(VOHR!NFT'Z2I(@<]6=)*6+RC$2<H G[^,C5a MZ7!L,EA@*<MXR@BFDHU\ZB0N<PC*'7)PEAXTHRT#6$0*ZC*-]"/?%P4)+1<"z M$X;^D^$MD;E&9/+RC3>;00\S9T=#)A&10-/D[K)XQ$VV49FA3-&S9E!%+I(Qy MF+44(3%52<[4C1.*Z,RF&-W9S7V*#YSMI,@5[]E(5GH2DJ]D6AR3-ZA)\<A1x M3OB5_8:R*RI@*4HE"*@+BN $(C0A1;8)J4A'2M*2FO2DKW',7?*R%Y06IC(7w MV8QF8#+3V$QO)*%)R4E*$].*R, &-0'!35Q*U,/L:2A%3:I2$8(;W>RG-[\)v MSG"*\Y_D+*<YSXG.=$!0G3!<)SO;Z0X;OD,?\9 '!/))CQO6TY[WL$<^]+$/u M?O3#G[#\)T @&%"!S@,"!(4E#0IBD(/H "$)S:@,%ZI#AC84EJE\*$1O&%&)t MA@/5%.%F15QU$8QDE(8*N<=&.!IKEV+C(R )B4A&0A*KKF2G*:VV29#:TFACs MXZE,.928RMHFG.2$J8FR*I>/&E1# PJ"6TE*MT%Q5&\CI:7;TM92O144<@EGr MT4RU@3AUF$-7SZ 5M76MC,(D'?!M"\W/>L-[QEC/X4I3T62D*"<-.@NNYA0q MQL$-5](E5.G&Y-R0""&B%#EO>A4Z226Q=E#7=4-VMQNXADY7(Q"5:,TH&J[Jp CXBFCH]IH1S\*F*5Z^,,@#K&(1TSB$IOXQ"A.L8I7S.(6>QA7o n end
chughes@maths.tcd.ie (Conrad Hughes) (06/11/91)
In <1991Jun10.112536.15897@rusmv1.rus.uni-stuttgart.de> zrzm0370@rusmv1.rus.uni-stuttgart.de (Joerg Scheurich) writes about division.. >382 Cycles in 100 Bytes, with usage of 4 32-Bit-registers >and 4 Byte memory. >With unrolling the loop, >the faster and longer 32-Bit/32-Bit-division-routine requires >182 Cycles in 692 Bytes, with usage of 4 32-Bit-registers. This divides a 64bit number by a 32bit one, with 32bit result, in 97 instructions/388 bytes/97 clock cycles.. Entry: A=MSW of 64bit no., B=LSW of 64bit no., C=32bit no. C _must_ be negative (if it's positive, RSB C,C,#0), A _must_ be positive (if not, RSBS B,B,#0:RSC A,A,#0) ADDS B,B,B ADCS A,C,A,LSL#1:SUBCC A,A,C:ADCS B,B,B [repeat these three 32 times] Exit: A=Remainder of division, B=result, C unchanged. ..if you don't need the remainder, get rid of the last SUBCC A,A,C. The result will be unsigned, and flags will (obviously) reflect the last addition. Adjust as you will to get 32/32 division (Initialise A to 0, or AB = numerator shifted some way) or any other variety you like.. Rolling up the loop it could get slightly complicated since every instruction requires flag information from the previous one ;-) Something like SUB count,count,#1:TEQ count,#0:BNE loop with count initialised to #32 might do the trick, but it'll slow it down to at least 288 cycles (so what if it only uses 32 bytes?? ;-}), which is pretty unacceptable unless you're on an ARM3.. This is off the top of my head and I'm nowhere near the Arc, so if it doesn't work (it should - I've typed it in enough times now..) flame me.. Conrad -- Will you sell me one of those if I shave my head Get me out of town is what Fireball said Never trust a man in a blue trench coat Never drive a car when you're dead Tom Waits