shs@uts.amdahl.com (Steve Schoettler) (06/02/89)
Here's a puzzle: What's the fastest way to divide an 11 bit number by three, on a processor that doesn't have any multiply or divide instructions? There is not enough room in memory for a table lookup. Suppose (A): answer must be exact (to the nearest bit). (B): answer can be approximate. Here are a couple of ideas for (B): 1. Divide by 2 (shift), divide by 4 (shift again), and take the average (add and shift). This produces 3x/8, which has a 4% error, but only takes 4 instructions. 2. Reduced Table Lookup. If there's room for 1/2 a table or 1/4 a table or ..., shift right n bits. Table entries must contain (2^n)x/3 instead of x/3. As n gets larger, table gets smaller, but the error increases (as 2^n). Steve -- Steve Schoettler shs@uts.amdahl.com {sun,decwrl,pyramid,ames,uunet}!amdahl!shs Amdahl Corp., M/S 213, 1250 E. Arques Ave, Sunnyvale, CA 94088
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (06/02/89)
In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes: > >Here's a puzzle: > What's the fastest way to divide an 11 bit number by three, > on a processor that doesn't have any multiply or divide instructions? > I guess I'm cheating but how about using tri-state logic and shifting right one tit? ;-). (Isn't tit the _accepted_ abbreviation for ternary digit?) -- Roelof Vuurboom SSP/V3 Philips TDS Apeldoorn, The Netherlands +31 55 432226 domain: roelof@idca.tds.philips.nl uucp: ...!mcvax!philapd!roelof
filbo@ssyx.ucsc.edu (Bela Lubkin) (06/02/89)
1/3 in binary is .01010101... To divide an 11-bit number by 3, multiply it by .01010101... binary. To do this without using a multiply instruction: Store 0 in Result. Repeat 5 times: Shift Dividend right by 2 and add it to Result. -- this amounts to 16 simple instructions, if no looping constructs are used. However, the results will be more accurate if you: Store Divident in Result. Repeat 5 times: Shift Dividend right by 2 and add it to Result. Increment Result. Shift Result right by 2. Neither routine is exact. To be exact to 11 bits' accuracy with this method, you must first shift the dividend left 11 times to make it a 22-bit value, then shift-right-by-2-and-add 11 times. And yes, the code can be optimized a bit by cascading the shift-and-adds, eliminating some of the intermediate adds. Alternatively, use any approximate method, then correct the result by re- multiplying it by 3 (shift left and add), then repeatedly adding 3 to that value until it exceeds the dividend, while also incrementing the approximate result. i.e. if algorithm X says that 100/3 is 30: 30*3=90 (30+1)*3=93 (30+2)*3=96 (30+3)*3=99 (30+4)*3=102 retreat to 30+3, 33. You could even recursively apply the approximate algorithm to the difference between the dividend and 3*(dividend approximately/3). >Bela<
firth@sei.cmu.edu (Robert Firth) (06/03/89)
In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes: > What's the fastest way to divide an 11 bit number by three, > on a processor that doesn't have any multiply or divide instructions? > > There is not enough room in memory for a table lookup. Sorry if this is a dense answer, but what's wrong with multiplying by 1/3? In binary, that's 0.010101..., so the algorithm looks like temp := number>>2; quot := temp while temp>=4 do temp := temp>>2; quot := quot+temp end For an 11-bit number, worst case is 5 shifts and 4 adds. If I expected a logarithmic distribution of input values, I'd probably unroll the loop and branch out after iterations 2 and 3.
jimh1@.UUCP (jimh1 is ACK alter ego) (06/03/89)
in the paper "A Fast Technique for Constant Divisors" CACM V19 nr 2 all such division operations are considered for this class of problems. let N be the given. M = N * 5; /* 2^2+1 = 5*/ M = M * 17; /* 2^4 +1 = 17 */ M = M * 257; /* 2^8 +1 = 257 */ /* at this point our intermediate quantity has overflowed our original 11 bit register: for larger register sizes we may have to continue with 2^16+1 and 2^32+1 */ ANS = -M /* VOILA*/ Note that if the N is not evenly divisible by 3 the result ANS appears larger than N. HOWEVER, ANS * 3 results in N (within 11 bits)
jonah@db.toronto.edu (Jeffrey Lee) (06/03/89)
> Here's a puzzle: > What's the fastest way to divide an 11 bit number by three, > on a processor that doesn't have any multiply or divide instructions? > > There is not enough room in memory for a table lookup. > > Suppose (A): answer must be exact (to the nearest bit). > (B): answer can be approximate. > [A] You didn't specify whether the numbers were unsigned or signed magnitude. The following produces exact answers for 11-bit positive integers and does not produce any partial results greater than 2047: int d3(n) register int n; { register q; q = n>>2; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; q = (n-q)>>1; } There may be faster ways that use larger partial results. This can be altered to work for negative numbers: negate, compute, and negate the result. --- Jeff Lee jonah@db.toronto.edu
jonah@db.toronto.edu (Jeffrey Lee) (06/03/89)
> > Here's a puzzle: > > What's the fastest way to divide an 11 bit number by three, > > on a processor that doesn't have any multiply or divide instructions? > > Oops. Egg on face. I forgot to put "return q" in the routine in my last posting. [It still works with Sun's cc, but not gcc BTW.] Also, since then I came up with a slightly better approach that still keeps partial results less than 11 bits: int d3(n) register int n; { register q; q = n>>1; q = ((((q>>2)+q>>2)+q>>2)+q>>2)+q>>1; /* possibly too small */ q = (n-q)>>1; /* possibly too large */ q = (n-q)>>1; /* just right */ return q; } I'll shut up now. --- Jeff Lee jonah@db.toronto.edu
kenm@sci.UUCP (Ken McElvain) (06/04/89)
In article <d33P02nM30sj01@amdahl.uts.amdahl.com>, shs@uts.amdahl.com (Steve Schoettler) writes: > > Here's a puzzle: > What's the fastest way to divide an 11 bit number by three, > on a processor that doesn't have any multiply or divide instructions? > > There is not enough room in memory for a table lookup. > > Suppose (A): answer must be exact (to the nearest bit). > (B): answer can be approximate. > > Here are a couple of ideas for (B): > Case A) Exact division by 3 with small memory. let x = 4 5 4 3 2 num = a5*x + a4*x + a3*x + a2*x + a1*x + a0 0 <= ai < 4 0 <= a5 < 2 (because num is 11 bits) Dividing by 3 is equivalent to dividing by (x-1), which can be done by normal polynomial division. 4 3 2 1 result = b4*x + b3*x + b2*x + b1*x + b0 + lookup[bn1]; b4 = a5 b3 = a4 - b4 = a4 - a5 b2 = a3 - b3 = a3 - a4 + a5 b1 = a2 - b2 = a2 - a3 + a4 - a5 b0 = a1 - b1 = a1 - a2 + a3 - a4 + a5 bn1 = a0 - b0 = a0 - a1 + a2 - a3 + a4 - a5 Now -7 <= bn1 <= 9, so the lookup table will be small. Remainder can be calcuated in a much simplier way since 4 mod 3 = 1 which means that you can drop all the x**n terms. num mod 3 = (a5 + a4 + a3 + a2 + a1 + a0) mod 3 Repeat the additions until the result fits in two bits, then map 3 to 0. These operations are of some interest in vector machines since one would like to use all banks of memory regardless of the vector stride. The remainder mod 3 gives the bank number and the result of the division gives the offset in the bank. Then any stride which is not divisable by 3 will evenly hit the banks of memory. A prime number works well because it is unlikely to have a factor in common with the stride. Numbers which differ by 1 from a power of 2 are easier to perform the calcuations for, which leaves you with choices of 3 5 7 17 and 31 for number of banks. I believe this was used in the BSP (Burroughs Scientific Processor) with 17 way interleaving. Approximate methods can be used at the expense of not using some fraction of available memory. Whether or not it is worth it depends on how often strides divisable by a power of 2 actually happen. I don't know. Ken McElvain Silicon Compiler Systems decwrl!sci!kenm
hascall@atanasoff.cs.iastate.edu (John Hascall) (06/06/89)
In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes: >Here's a puzzle: > What's the fastest way to divide an 11 bit number by three, > on a processor that doesn't have any multiply or divide instructions? > There is not enough room in memory for a table lookup. > Suppose (A): answer must be exact (to the nearest bit). > (B): answer can be approximate. >Here are a couple of ideas for (B): > 1. Divide by 2 (shift), divide by 4 (shift again), and take the average > (add and shift). This produces 3x/8, which has a 4% error, > but only takes 4 instructions. 1a. If you can do a 2-bit shift in 1 instruction, then try: Divide by 4 [y = (x >> 2)], divide by 16 [z = (y >> 4)] and add [y += z]. This gives: x * (1/4 + 1/16) or x * 0.3125, which is only 2% error in 3 instructions (and adding another shift-by-2 and add gives x * 0.328125 (0.5%) in 5). John Hascall / ISU Comp Center / Ames IA
gideony@microsoft.UUCP (Gideon Yuvall) (06/08/89)
ASPLOS-II has a paper (by the h-p Spectrum people) on this & similar issues. -- Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax:206-883-8101;TWX:160520) (TEMPORARY home 'phone: -883-8039)