[rec.puzzles] Divide by three?

shs@uts.amdahl.com (Steve Schoettler) (06/02/89)

Here's a puzzle:
	What's the fastest way to divide an 11 bit number by three,
	on a processor that doesn't have any multiply or divide instructions?

	There is not enough room in memory for a table lookup.

	Suppose (A): answer must be exact (to the nearest bit).
	        (B): answer can be approximate.

Here are a couple of ideas for (B):

	1. Divide by 2 (shift), divide by 4 (shift again), and take the average
	    (add and shift).  This produces 3x/8, which has a 4% error,
	   but only takes 4 instructions.

	2. Reduced Table Lookup.
	   If there's room for 1/2 a table or 1/4 a table or ...,
	   shift right n bits.  Table entries must contain (2^n)x/3
	   instead of x/3.  As n gets larger, table gets smaller, but
	   the error increases (as 2^n).
	
Steve
-- 

        Steve Schoettler
        shs@uts.amdahl.com
        {sun,decwrl,pyramid,ames,uunet}!amdahl!shs
        Amdahl Corp., M/S 213, 1250 E. Arques Ave, Sunnyvale, CA 94088

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (06/02/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes:
>
>Here's a puzzle:
>	What's the fastest way to divide an 11 bit number by three,
>	on a processor that doesn't have any multiply or divide instructions?
>
I guess I'm cheating but how about using tri-state logic and shifting
right one tit? ;-). (Isn't tit the _accepted_ abbreviation for ternary digit?)
-- 
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

filbo@ssyx.ucsc.edu (Bela Lubkin) (06/02/89)

1/3 in binary is .01010101...

To divide an 11-bit number by 3, multiply it by .01010101... binary.

To do this without using a multiply instruction:
  Store 0 in Result.
  Repeat 5 times:
    Shift Dividend right by 2 and add it to Result. 
-- this amounts to 16 simple instructions, if no looping constructs are used.

However, the results will be more accurate if you:
  Store Divident in Result.
  Repeat 5 times:
    Shift Dividend right by 2 and add it to Result.
  Increment Result.
  Shift Result right by 2.

Neither routine is exact.  To be exact to 11 bits' accuracy with this method,
you must first shift the dividend left 11 times to make it a 22-bit value,
then shift-right-by-2-and-add 11 times.  And yes, the code can be optimized a
bit by cascading the shift-and-adds, eliminating some of the intermediate adds.

Alternatively, use any approximate method, then correct the result by re-
multiplying it by 3 (shift left and add), then repeatedly adding 3 to that
value until it exceeds the dividend, while also incrementing the approximate
result.  i.e. if algorithm X says that 100/3 is 30:
  30*3=90
  (30+1)*3=93
  (30+2)*3=96
  (30+3)*3=99
  (30+4)*3=102
  retreat to 30+3, 33.

You could even recursively apply the approximate algorithm to the difference
between the dividend and 3*(dividend approximately/3).  >Bela<

firth@sei.cmu.edu (Robert Firth) (06/03/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes:

>	What's the fastest way to divide an 11 bit number by three,
>	on a processor that doesn't have any multiply or divide instructions?
>
>	There is not enough room in memory for a table lookup.

Sorry if this is a dense answer, but what's wrong with multiplying
by 1/3?  In binary, that's 0.010101..., so the algorithm looks like

	temp := number>>2;
	quot := temp

	while temp>=4 do temp := temp>>2; quot := quot+temp end

For an 11-bit number, worst case is 5 shifts and 4 adds.  If I expected
a logarithmic distribution of input values, I'd probably unroll the loop
and branch out after iterations 2 and 3.

jimh1@.UUCP (jimh1 is ACK alter ego) (06/03/89)

in the paper "A Fast Technique for Constant Divisors" CACM V19 nr 2
all such division operations are considered for this class of problems.

let N be the given. 
	M = N * 5; /* 2^2+1 = 5*/
	M = M * 17; /* 2^4 +1 = 17 */
	M = M * 257; /* 2^8 +1 = 257 */
/* at this point our intermediate quantity has overflowed our
   original 11 bit register: for larger register sizes we may have
   to continue with 2^16+1 and 2^32+1 */

	ANS = -M    /* VOILA*/

Note that if the N is not evenly divisible by 3 the result ANS 
appears larger than N. HOWEVER, ANS * 3 results in N (within 11 bits)

jonah@db.toronto.edu (Jeffrey Lee) (06/03/89)

> Here's a puzzle:
> 	What's the fastest way to divide an 11 bit number by three,
> 	on a processor that doesn't have any multiply or divide instructions?
> 
> 	There is not enough room in memory for a table lookup.
> 
> 	Suppose (A): answer must be exact (to the nearest bit).
> 	        (B): answer can be approximate.
> 

[A] You didn't specify whether the numbers were unsigned or signed
magnitude.  The following produces exact answers for 11-bit positive
integers and does not produce any partial results greater than 2047:

    int
    d3(n)
        register int n;
    {
        register q;
        q = n>>2;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
    }

There may be faster ways that use larger partial results.
This can be altered to work for negative numbers:  negate,
compute, and negate the result.

---
Jeff Lee              jonah@db.toronto.edu

jonah@db.toronto.edu (Jeffrey Lee) (06/03/89)

> > Here's a puzzle:
> > 	What's the fastest way to divide an 11 bit number by three,
> > 	on a processor that doesn't have any multiply or divide instructions?
> > 

Oops.  Egg on face.  I forgot to put "return q" in the routine in my
last posting.  [It still works with Sun's cc, but not gcc BTW.]
Also, since then I came up with a slightly better approach that still
keeps partial results less than 11 bits:

    int
    d3(n)
        register int n;
    {
        register q;
        q = n>>1;
        q = ((((q>>2)+q>>2)+q>>2)+q>>2)+q>>1;  /* possibly too small */
        q = (n-q)>>1;                          /* possibly too large */
        q = (n-q)>>1;                          /* just right */
        return q;
    }

I'll shut up now.

---
Jeff Lee              jonah@db.toronto.edu

kenm@sci.UUCP (Ken McElvain) (06/04/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com>, shs@uts.amdahl.com (Steve Schoettler) writes:
> 
> Here's a puzzle:
> 	What's the fastest way to divide an 11 bit number by three,
> 	on a processor that doesn't have any multiply or divide instructions?
> 
> 	There is not enough room in memory for a table lookup.
> 
> 	Suppose (A): answer must be exact (to the nearest bit).
> 	        (B): answer can be approximate.
> 
> Here are a couple of ideas for (B):
> 

Case A) Exact division by 3 with small memory.

let x = 4
           5      4      3       2 
num =  a5*x + a4*x + a3*x  + a2*x + a1*x + a0

0 <= ai < 4
0 <= a5 < 2   (because num is 11 bits)

Dividing by 3 is equivalent to dividing by (x-1), which can be done
by normal polynomial division.

              4      3       2      1	
result =  b4*x + b3*x  + b2*x + b1*x + b0 + lookup[bn1];

	b4  = a5
	b3  = a4 - b4	= a4 - a5
	b2  = a3 - b3	= a3 - a4 + a5
	b1  = a2 - b2	= a2 - a3 + a4 - a5
	b0  = a1 - b1	= a1 - a2 + a3 - a4 + a5
	bn1 = a0 - b0	= a0 - a1 + a2 - a3 + a4 - a5

Now   -7 <= bn1 <= 9, so the lookup table will be small.

Remainder can be calcuated in a much simplier way since 4 mod 3 = 1
which means that you can drop all the x**n terms.

	num mod 3 = (a5 + a4 + a3 + a2 + a1 + a0) mod 3
	Repeat the additions until the result fits in two bits,
	then map 3 to 0.

These operations are of some interest in vector machines since one would
like to use all banks of memory regardless of the vector stride.  The
remainder mod 3 gives the bank number and the result of the division
gives the offset in the bank.  Then any stride which is not divisable
by 3 will evenly hit the banks of memory.  A prime number works well 
because it is unlikely to have a factor in common with the stride.
Numbers which differ by 1 from a power of 2 are easier to perform
the calcuations for, which leaves you with choices of 3 5 7 17 and 31
for number of banks.  I believe this was used in the BSP (Burroughs
Scientific Processor) with 17 way interleaving.  

Approximate methods can be used at the expense of not using some
fraction of available memory.

Whether or not it is worth it depends on how often strides divisable
by a power of 2 actually happen.  I don't know.

Ken McElvain
Silicon Compiler Systems
decwrl!sci!kenm

hascall@atanasoff.cs.iastate.edu (John Hascall) (06/06/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes:
 
>Here's a puzzle:
>	What's the fastest way to divide an 11 bit number by three,
>	on a processor that doesn't have any multiply or divide instructions?
>	There is not enough room in memory for a table lookup.
>	Suppose (A): answer must be exact (to the nearest bit).
>	        (B): answer can be approximate.
 
>Here are a couple of ideas for (B):
 
>	1. Divide by 2 (shift), divide by 4 (shift again), and take the average
>	    (add and shift).  This produces 3x/8, which has a 4% error,
>	   but only takes 4 instructions.
 
        1a. If you can do a 2-bit shift in 1 instruction, then try:
	    Divide by 4 [y = (x >> 2)], divide by 16 [z = (y >> 4)] and
	    add [y += z].  This gives:  x * (1/4 + 1/16) or x * 0.3125,
	    which is only 2% error in 3 instructions (and adding another
	    shift-by-2 and add gives x * 0.328125 (0.5%) in 5).

John Hascall / ISU Comp Center / Ames IA

gideony@microsoft.UUCP (Gideon Yuvall) (06/08/89)

ASPLOS-II has a paper (by the h-p Spectrum people) on this & similar
issues.
-- 
Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax:206-883-8101;TWX:160520)
                                             (TEMPORARY home 'phone: -883-8039)