[comp.arch] Divide by three?

shs@uts.amdahl.com (Steve Schoettler) (06/02/89)

Here's a puzzle:
	What's the fastest way to divide an 11 bit number by three,
	on a processor that doesn't have any multiply or divide instructions?

	There is not enough room in memory for a table lookup.

	Suppose (A): answer must be exact (to the nearest bit).
	        (B): answer can be approximate.

Here are a couple of ideas for (B):

	1. Divide by 2 (shift), divide by 4 (shift again), and take the average
	    (add and shift).  This produces 3x/8, which has a 4% error,
	   but only takes 4 instructions.

	2. Reduced Table Lookup.
	   If there's room for 1/2 a table or 1/4 a table or ...,
	   shift right n bits.  Table entries must contain (2^n)x/3
	   instead of x/3.  As n gets larger, table gets smaller, but
	   the error increases (as 2^n).
	
Steve
-- 

        Steve Schoettler
        shs@uts.amdahl.com
        {sun,decwrl,pyramid,ames,uunet}!amdahl!shs
        Amdahl Corp., M/S 213, 1250 E. Arques Ave, Sunnyvale, CA 94088

iphwk@terra.oscs.montana.edu (Bill Kinnersley) (06/02/89)

[In "Divide by three?" shs@uts.amdahl.com (Steve Schoettler) said:]
:
: Here's a puzzle:
: 	What's the fastest way to divide an 11 bit number by three,
: 	on a processor that doesn't have any multiply or divide instructions?
:

How about using the identity

   1/3 = 1/4 + (1/4)^2 + (1/4)^3 + ...

In other words,

   term = number
   ans = 0
   for i=1 to 6
      term = term/4
      ans = ans + term



-- 
--Bill Kinnersley
  Physics Department   Montana State University    Bozeman, MT 59717
  INTERNET: iphwk@terra.oscs.montana.edu      BITNET: IPHWK@MTSUNIX1
226 Transfer complete.

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (06/02/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes:
>
>Here's a puzzle:
>	What's the fastest way to divide an 11 bit number by three,
>	on a processor that doesn't have any multiply or divide instructions?
>
I guess I'm cheating but how about using tri-state logic and shifting
right one tit? ;-). (Isn't tit the _accepted_ abbreviation for ternary digit?)
-- 
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

rcdsdgx@dutrun.UUCP (Dik Groot) (06/02/89)

In article <8906020201.AA10938@terra.oscs.montana.edu> iphwk@terra.oscs.montana.edu (Bill Kinnersley) writes:
>[In "Divide by three?" shs@uts.amdahl.com (Steve Schoettler) said:]
>: Here's a puzzle:
>: 	What's the fastest way to divide an 11 bit number by three,
>: 	on a processor that doesn't have any multiply or divide instructions?
>How about using the identity
>
>   1/3 = 1/4 + (1/4)^2 + (1/4)^3 + ...
>
which can also be written as 

1/3 = 1/4 * ( 1 + 1/4 ) * ( 1 + (1/4)^2 ) * ( 1 + (1/4)^4 ) * ... ^8 ...

So divide by 4, and add to the original. Divide the result by 16, and
add. Divide by 256, and add. Do this in 16 bits arithmetic or you might
loose some bits. Divide the last result by 4.
                                                       DikGroot!
-- 
{ - - - - - - - - - - - - - - - - - - - disclaimer : I usually lie. }
Dik Groot,   Delft University of Technology,   Rekencentrum,   DUneT.
Postbox 354, 2600 AJ Delft.  ptt: (31)-15-781938  fax: (31)-15-786522
BITNET/EARN: RCDSDGX AT HDETUD1          usenet/eunet: rcdsdgx@dutrun

filbo@ssyx.ucsc.edu (Bela Lubkin) (06/02/89)

1/3 in binary is .01010101...

To divide an 11-bit number by 3, multiply it by .01010101... binary.

To do this without using a multiply instruction:
  Store 0 in Result.
  Repeat 5 times:
    Shift Dividend right by 2 and add it to Result. 
-- this amounts to 16 simple instructions, if no looping constructs are used.

However, the results will be more accurate if you:
  Store Divident in Result.
  Repeat 5 times:
    Shift Dividend right by 2 and add it to Result.
  Increment Result.
  Shift Result right by 2.

Neither routine is exact.  To be exact to 11 bits' accuracy with this method,
you must first shift the dividend left 11 times to make it a 22-bit value,
then shift-right-by-2-and-add 11 times.  And yes, the code can be optimized a
bit by cascading the shift-and-adds, eliminating some of the intermediate adds.

Alternatively, use any approximate method, then correct the result by re-
multiplying it by 3 (shift left and add), then repeatedly adding 3 to that
value until it exceeds the dividend, while also incrementing the approximate
result.  i.e. if algorithm X says that 100/3 is 30:
  30*3=90
  (30+1)*3=93
  (30+2)*3=96
  (30+3)*3=99
  (30+4)*3=102
  retreat to 30+3, 33.

You could even recursively apply the approximate algorithm to the difference
between the dividend and 3*(dividend approximately/3).  >Bela<

news@ism780c.isc.com (News system) (06/03/89)

In article <8906020201.AA10938@terra.oscs.montana.edu> iphwk@terra.oscs.montana.edu (Bill Kinnersley) writes:
>[In "Divide by three?" shs@uts.amdahl.com (Steve Schoettler) said:]
>:
>: Here's a puzzle:
>: 	What's the fastest way to divide an 11 bit number by three,
>: 	on a processor that doesn't have any multiply or divide instructions?
>:
>
>How about using the identity
>
>   1/3 = 1/4 + (1/4)^2 + (1/4)^3 + ...

How about a 2K table of quotients indexed by the 11 bit number?

   e.g in C:

       int i_over_three = [0,0,0,1,1,1,2,2,2,...];
       int i;
       i_over_three[i]  /* is i/3 */


     Marv Rubinstein

firth@sei.cmu.edu (Robert Firth) (06/03/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes:

>	What's the fastest way to divide an 11 bit number by three,
>	on a processor that doesn't have any multiply or divide instructions?
>
>	There is not enough room in memory for a table lookup.

Sorry if this is a dense answer, but what's wrong with multiplying
by 1/3?  In binary, that's 0.010101..., so the algorithm looks like

	temp := number>>2;
	quot := temp

	while temp>=4 do temp := temp>>2; quot := quot+temp end

For an 11-bit number, worst case is 5 shifts and 4 adds.  If I expected
a logarithmic distribution of input values, I'd probably unroll the loop
and branch out after iterations 2 and 3.

jimh1@.UUCP (jimh1 is ACK alter ego) (06/03/89)

in the paper "A Fast Technique for Constant Divisors" CACM V19 nr 2
all such division operations are considered for this class of problems.

let N be the given. 
	M = N * 5; /* 2^2+1 = 5*/
	M = M * 17; /* 2^4 +1 = 17 */
	M = M * 257; /* 2^8 +1 = 257 */
/* at this point our intermediate quantity has overflowed our
   original 11 bit register: for larger register sizes we may have
   to continue with 2^16+1 and 2^32+1 */

	ANS = -M    /* VOILA*/

Note that if the N is not evenly divisible by 3 the result ANS 
appears larger than N. HOWEVER, ANS * 3 results in N (within 11 bits)

jonah@db.toronto.edu (Jeffrey Lee) (06/03/89)

> Here's a puzzle:
> 	What's the fastest way to divide an 11 bit number by three,
> 	on a processor that doesn't have any multiply or divide instructions?
> 
> 	There is not enough room in memory for a table lookup.
> 
> 	Suppose (A): answer must be exact (to the nearest bit).
> 	        (B): answer can be approximate.
> 

[A] You didn't specify whether the numbers were unsigned or signed
magnitude.  The following produces exact answers for 11-bit positive
integers and does not produce any partial results greater than 2047:

    int
    d3(n)
        register int n;
    {
        register q;
        q = n>>2;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
        q = (n-q)>>1;
    }

There may be faster ways that use larger partial results.
This can be altered to work for negative numbers:  negate,
compute, and negate the result.

---
Jeff Lee              jonah@db.toronto.edu

jonah@db.toronto.edu (Jeffrey Lee) (06/03/89)

> > Here's a puzzle:
> > 	What's the fastest way to divide an 11 bit number by three,
> > 	on a processor that doesn't have any multiply or divide instructions?
> > 

Oops.  Egg on face.  I forgot to put "return q" in the routine in my
last posting.  [It still works with Sun's cc, but not gcc BTW.]
Also, since then I came up with a slightly better approach that still
keeps partial results less than 11 bits:

    int
    d3(n)
        register int n;
    {
        register q;
        q = n>>1;
        q = ((((q>>2)+q>>2)+q>>2)+q>>2)+q>>1;  /* possibly too small */
        q = (n-q)>>1;                          /* possibly too large */
        q = (n-q)>>1;                          /* just right */
        return q;
    }

I'll shut up now.

---
Jeff Lee              jonah@db.toronto.edu

kenm@sci.UUCP (Ken McElvain) (06/04/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com>, shs@uts.amdahl.com (Steve Schoettler) writes:
> 
> Here's a puzzle:
> 	What's the fastest way to divide an 11 bit number by three,
> 	on a processor that doesn't have any multiply or divide instructions?
> 
> 	There is not enough room in memory for a table lookup.
> 
> 	Suppose (A): answer must be exact (to the nearest bit).
> 	        (B): answer can be approximate.
> 
> Here are a couple of ideas for (B):
> 

Case A) Exact division by 3 with small memory.

let x = 4
           5      4      3       2 
num =  a5*x + a4*x + a3*x  + a2*x + a1*x + a0

0 <= ai < 4
0 <= a5 < 2   (because num is 11 bits)

Dividing by 3 is equivalent to dividing by (x-1), which can be done
by normal polynomial division.

              4      3       2      1	
result =  b4*x + b3*x  + b2*x + b1*x + b0 + lookup[bn1];

	b4  = a5
	b3  = a4 - b4	= a4 - a5
	b2  = a3 - b3	= a3 - a4 + a5
	b1  = a2 - b2	= a2 - a3 + a4 - a5
	b0  = a1 - b1	= a1 - a2 + a3 - a4 + a5
	bn1 = a0 - b0	= a0 - a1 + a2 - a3 + a4 - a5

Now   -7 <= bn1 <= 9, so the lookup table will be small.

Remainder can be calcuated in a much simplier way since 4 mod 3 = 1
which means that you can drop all the x**n terms.

	num mod 3 = (a5 + a4 + a3 + a2 + a1 + a0) mod 3
	Repeat the additions until the result fits in two bits,
	then map 3 to 0.

These operations are of some interest in vector machines since one would
like to use all banks of memory regardless of the vector stride.  The
remainder mod 3 gives the bank number and the result of the division
gives the offset in the bank.  Then any stride which is not divisable
by 3 will evenly hit the banks of memory.  A prime number works well 
because it is unlikely to have a factor in common with the stride.
Numbers which differ by 1 from a power of 2 are easier to perform
the calcuations for, which leaves you with choices of 3 5 7 17 and 31
for number of banks.  I believe this was used in the BSP (Burroughs
Scientific Processor) with 17 way interleaving.  

Approximate methods can be used at the expense of not using some
fraction of available memory.

Whether or not it is worth it depends on how often strides divisable
by a power of 2 actually happen.  I don't know.

Ken McElvain
Silicon Compiler Systems
decwrl!sci!kenm

hascall@atanasoff.cs.iastate.edu (John Hascall) (06/06/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes:
 
>Here's a puzzle:
>	What's the fastest way to divide an 11 bit number by three,
>	on a processor that doesn't have any multiply or divide instructions?
>	There is not enough room in memory for a table lookup.
>	Suppose (A): answer must be exact (to the nearest bit).
>	        (B): answer can be approximate.
 
>Here are a couple of ideas for (B):
 
>	1. Divide by 2 (shift), divide by 4 (shift again), and take the average
>	    (add and shift).  This produces 3x/8, which has a 4% error,
>	   but only takes 4 instructions.
 
        1a. If you can do a 2-bit shift in 1 instruction, then try:
	    Divide by 4 [y = (x >> 2)], divide by 16 [z = (y >> 4)] and
	    add [y += z].  This gives:  x * (1/4 + 1/16) or x * 0.3125,
	    which is only 2% error in 3 instructions (and adding another
	    shift-by-2 and add gives x * 0.328125 (0.5%) in 5).

John Hascall / ISU Comp Center / Ames IA

gideony@microsoft.UUCP (Gideon Yuvall) (06/08/89)

ASPLOS-II has a paper (by the h-p Spectrum people) on this & similar
issues.
-- 
Gideon Yuval, gideony@microsof.UUCP, 206-882-8080 (fax:206-883-8101;TWX:160520)
                                             (TEMPORARY home 'phone: -883-8039)

bd@zyx.SE (Bjorn Danielsson) (06/13/89)

In article <d33P02nM30sj01@amdahl.uts.amdahl.com> shs@uts.amdahl.com (Steve Schoettler) writes:
>
>Here's a puzzle:
>       What's the fastest way to divide an 11 bit number by three,
>       on a processor that doesn't have any multiply or divide instructions?
>
>       There is not enough room in memory for a table lookup.
>
>       Suppose (A): answer must be exact (to the nearest bit).
>               (B): answer can be approximate.
>
>Here are a couple of ideas for (B):
>
>       1. Divide by 2 (shift), divide by 4 (shift again), and take the average
>           (add and shift).  This produces 3x/8, which has a 4% error,
>          but only takes 4 instructions.
>
>       2. Reduced Table Lookup.
>          If there's room for 1/2 a table or 1/4 a table or ...,
>          shift right n bits.  Table entries must contain (2^n)x/3
>          instead of x/3.  As n gets larger, table gets smaller, but
>          the error increases (as 2^n).
>       
>Steve

The following algorithm uses 128 bytes of table space, and calculates an
exact answer in six instructions (or less):

        1. Split the number into a 5-bit (x) and a 6-bit (y) part,
           so that n = 64x + y.

        2. Calculate (64x)/3, y/3, remainder(64x,3), and remainder(y,3)
           by table lookups. There is one table for 64x (32 entries) and
           another for y (64 entries). In both tables, the quotient is
           shifted two bits to the left, and the righmost two bits encode
           the remainder as:

                remainder = 0   ->   bits = 00
                remainder = 1   ->   bits = 01
                remainder = 2   ->   bits = 11

           The first table must be at least 12 bits wide, and the second one
           at least 7 bits.

        3. Add the results of the table lookups. Remainders that add up to 3
           or more will result in a carry into the "4" bit, because of the
           special encoding.

        4. Shift the sum two bits to the right. The result is the answer.

Here's a C program fragment for it:

    short table1[32] = {
           0,   85,  171,  256,  341,  427,  512,  597,  683,  768,  853,
         939, 1024, 1109, 1195, 1280, 1365, 1451, 1536, 1621, 1707, 1792,
        1877, 1963, 2048, 2133, 2219, 2304, 2389, 2475, 2560, 2645
    };

    char table2[64] = {
         0,  1,  3,  4,  5,  7,  8,  9, 11, 12, 13, 15, 16, 17, 19, 20,
        21, 23, 24, 25, 27, 28, 29, 31, 32, 33, 35, 36, 37, 39, 40, 41,
        43, 44, 45, 47, 48, 49, 51, 52, 53, 55, 56, 57, 59, 60, 61, 63,
        64, 65, 67, 68, 69, 71, 72, 73, 75, 76, 77, 79, 80, 81, 83, 84
    };

    unsigned int
    divide_by_three(n)
        unsigned int n;
    {
        unsigned int x, y;
        x = n >> 6;
        x = table1[x];
        y = n & 0x3f;
        y = table2[y];
        x = x + y;
        x = x >> 2;
        return x;
    }

-- 
Bjorn Danielsson, ZYX Sweden AB, +46 (18) 69 67 63, bd@zyx.SE