[comp.lang.postscript] GCD algorithm

bvs@light.uucp (Bakul Shah) (11/01/90)

In article <90302.140837CXT105@psuvm.psu.edu> CXT105@psuvm.psu.edu (Christopher Tate) writes:
> ...
>I will mention again, though, that this code contains a very well-behaved
>implementation of Euclid's classic algorithm for calculating the Greatest
>Common Divisor of two integers.  Anyone who's looking for such a beast,
>feel free to borrow it!

A couple of comments on the GCD algorithm.

- To justify posting this to a PostScript group, here is a faster
  version of basically the same algorithm:

    /gcd {              % given m, n on the stack
      dup 3 1 roll mod  % leaves n, m mod n on the stack
      dup 0 eq {
	pop             % leaves n on the stack when done
      } {
	gcd             % find gcd of the new pair
      } ifelse
    } bind def

  Not too surprisingly, this recursive version is faster than
  the one below which uses an explicit loop:

    /gcd {
      { dup 3 1 roll mod dup 0 eq { pop exit } if } loop
    } bind def

- I discovered a curious fact when I looked through a bunch of
  math/CS books for an exact description of Euclid's algorithm.
  The three math books I looked at describe the _faster_ algorithm
  (the one used in the spirograph program) as Euclid's alg., while
  the three CS books describe the following _slower_ one as
  Euclid's:

    /* using guarded commands... */
    loop {
      a > b => a -= b;
      b > a => b -= a;
    }

    The loop terminates when neither guard is true, i.e. when
    a == b.  This extends in a nice (but slow) way to the GCD
    of N numbers.  As an example, gcd(a, b, c, ..., n) is

    loop {
      a > b => a -= b;
      b > c => b -= c;
      c > d => c -= d;
      ...
      n > a => n -= a;
    }

  So....  Who is right?  How did two different algorithms come to
  be known as Euclid's?  Does the faster algorithm also have such a
  nice generalization for the gcd of N numbers?  The slower
  algorithm can be modified slightly to simultaneously compute the
  LCM.  Can the same thing be done with the faster one?  Time to
  visit a library!
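  On the LCM question: the fast (mod-based) algorithm doesn't hand
  you the LCM "simultaneously", but for two numbers the standard
  identity gcd(a,b) * lcm(a,b) = a*b recovers it cheaply.  A C
  sketch (function names are mine, not from any of the books):

```c
#include <assert.h>

/* The "fast" mod-based Euclid, iteratively. */
int gcd(int m, int n)
{
    while (n != 0) {
        int r = m % n;
        m = n;
        n = r;
    }
    return m;
}

/* LCM via the identity gcd(a,b) * lcm(a,b) == a * b.
 * Dividing before multiplying keeps the intermediate value small. */
int lcm(int a, int b)
{
    return a / gcd(a, b) * b;
}
```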

-- Bakul Shah
   bvs@BitBlocks.COM
   ...!{ames,decwrl,sun,uunet}!amdcad!light!bvs

CXT105@psuvm.psu.edu (Christopher Tate) (11/01/90)

In article <1990Oct31.182705.3184@light.uucp>, bvs@light.uucp (Bakul Shah) says:

>- To justify posting this to a PostScript group, here is a faster
>  version of basically the same algorithm:
>
>    /gcd {              % given m, n on the stack
>      dup 3 1 roll mod  % leaves n, m mod n on the stack
>      dup 0 eq {
>        pop             % leaves n on the stack when done
>      } {
>        gcd             % find gcd of the new pair
>      } ifelse
>    } bind def
>
>  Not too surprisingly, this recursive version is faster than
>  the one below which uses an explicit loop:
>
>    /gcd {
>      { dup 3 1 roll mod dup 0 eq { pop exit } if } loop
>    } bind def
>
>                     [stuff deleted]
>
>  So....  Who is right?  How did two different algorithms come to
>  be known as Euclid's?  Does the faster algorithm also have such
>  a nice generalization for the gcd of N numbers?  The slower
>  algorithm can be modified slightly to simultaneously compute the
>  LCM.  Can the same thing be done with the faster one?  Time to
>  visit a library!

As it happens, I'm taking a computational number theory course even as
we speak :-), so here's what my text has to say on the subject of Euclid's
GCD algorithm....

Essentially, the whole thing works because of this odd property of the GCD;
namely, that  gcd(a, b) = m*a + n*b  for some integers m and n (not
necessarily unique).  From this, we can do some fiddling with the properties
of division and so on to show that  gcd(a, b) = gcd(b, a - m*b)  for ANY
integer m.  In Euclid's algorithm, we essentially take m as large as
possible while keeping a - m*b non-negative, i.e. a - m*b is as
close to zero as possible.  Since we can write r = a mod b = a - m*b
(for that choice of m), we simply use a mod b
for the next iteration, and try to calculate gcd(b, a mod b).
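That m*a + n*b property (Bezout's identity) can itself be computed by
running the same recursion "in reverse"; a hypothetical
extended-Euclid sketch in C:

```c
#include <assert.h>

/* Extended Euclid: returns g = gcd(a, b) and sets *m, *n so that
 * g == m*a + n*b, the identity the argument above rests on.
 * A sketch for a, b >= 0, not both zero. */
int egcd(int a, int b, int *m, int *n)
{
    if (b == 0) {
        *m = 1;
        *n = 0;
        return a;               /* gcd(a, 0) == 1*a + 0*0 */
    }
    int m1, n1;
    int g = egcd(b, a % b, &m1, &n1);
    *m = n1;                    /* unwind: g == m1*b + n1*(a mod b)   */
    *n = m1 - (a / b) * n1;     /*       == n1*a + (m1 - (a/b)*n1)*b */
    return g;
}
```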

I once ran across a version of this algorithm (in C) which looked like
this:

int gcd(int a, int b)
{
   if (a>b) return gcd(a-b, b);
   else if (a<b) return gcd(a, b-a);
   else return a;
}

Essentially, it takes the smallest possible steps down (using a-b
instead of a mod b), avoiding any division.  It still used an awful
lot of stack space sometimes, though....
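Since the recursion is tail-recursive, the stack growth is easy to
avoid; an equivalent iterative sketch:

```c
#include <assert.h>

/* Same subtraction-only algorithm, but iterative: constant stack
 * space no matter how many steps it takes.  Assumes a, b > 0. */
int gcd_sub(int a, int b)
{
    while (a != b) {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
    return a;
}
```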

I'm very surprised that the recursive version is faster than the
iterative one; I suspect it has to do with the way recursion is
implemented in PostScript.  Is it still faster if you don't bind the
gcd procedures?  How about if you give it two large, almost-equal
primes?  In most programming
languages (e.g. those compiled to native code), given an iterative algorithm
and an equivalent recursive algorithm, the iterative algorithm will be faster
if only because it won't have to go through all the overhead of the recursive
function calls....

There's a version of this algorithm known as the binary GCD algorithm,
which works on the binary representation of the numbers (indirectly) instead
of on the numbers themselves.  This allows it to take advantage of the fact
that just finding a modulus is computationally icky (as it requires division),
while dividing an integer by two is what computers do best.  It's not usually
faster than the above, though, unless implemented directly in machine code.
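The binary GCD (often attributed to Stein) might look like this in
C; a sketch using only shifts, parity tests, and subtraction, as
described:

```c
#include <assert.h>

/* Binary (Stein's) GCD: no division, only shifts and subtraction.
 * It works because gcd(2a, 2b) = 2*gcd(a, b), gcd(2a, b) = gcd(a, b)
 * for odd b, and gcd(a, b) = gcd(a, b - a) for odd a <= b. */
unsigned binary_gcd(unsigned a, unsigned b)
{
    if (a == 0) return b;
    if (b == 0) return a;

    unsigned shift = 0;
    while (((a | b) & 1) == 0) {        /* strip common factors of 2 */
        a >>= 1;
        b >>= 1;
        shift++;
    }
    while ((a & 1) == 0)                /* a's leftover factors of 2 */
        a >>= 1;
    while (b != 0) {
        while ((b & 1) == 0)            /* b's leftover factors of 2 */
            b >>= 1;
        if (a > b) {                    /* keep a <= b */
            unsigned t = a; a = b; b = t;
        }
        b -= a;                         /* both odd, so b - a is even */
    }
    return a << shift;                  /* restore the common 2s */
}
```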

And, for everyone who's been following this, the version of the GCD algorithm
shown above IS much faster than mine -- use it instead!  Mine was a quick-
and-dirty rewrite of a Rexx version of the algorithm, and so was not at all
suited to the real stack-handling capabilities of PostScript.

-------
Christopher Tate                   | "Pardon me, but is this your bar
                                   |    of soap?"
cxt105@psuvm.psu.edu               | "Why, yes, I suppose it is...."
{...}!psuvax1!psuvm.bitnet!cxt105  |
cxt105@psuvm.bitnet                | "So do we."

glenn@heaven.woodside.ca.us (Glenn Reid) (11/02/90)

In article <90305.105824CXT105@psuvm.psu.edu> CXT105@psuvm.psu.edu (Christopher Tate) writes:
>In article <1990Oct31.182705.3184@light.uucp>, bvs@light.uucp (Bakul Shah) says:
>
>>- To justify posting this to a PostScript group, here is a faster
>>  version of basically the same algorithm:
>>
>>    /gcd {              % given m, n on the stack
>>      dup 3 1 roll mod  % leaves n, m mod n on the stack
>>      dup 0 eq {
>>        pop             % leaves n on the stack when done
>>      } {
>>        gcd             % find gcd of the new pair
>>      } ifelse
>>    } bind def
>>

[ stuff deleted ]

>And, for everyone who's been following this, the version of the GCD algorithm
>shown above IS much faster than mine -- use it instead!  Mine was a quick-
>and-dirty rewrite of a Rexx version of the algorithm, and so was not at all
>suited to the real stack-handling capabilities of PostScript.

I put in the newer version of "gcd" and it went a bit faster.  Even more
dramatic was to bind the for loop at the end, like this:

    0 3 limit
    {
       dup frac mul /phi exch def       % calculate phi
       /theta exch def                  % recalculate theta
       theta limit div 1.0 1.0 sethsbcolor    % go through all colors
       theta cos diff mul phi cos a mul add   % this is x
       theta sin diff mul phi sin a mul add   % this is y
       2 copy lineto stroke moveto      % same as x y lineto stroke x y moveto
    } bind for
%     ^^^^  (add "bind" here)

(Glenn) cvn pop

-- 
 Glenn Reid				RightBrain Software
 glenn@heaven.woodside.ca.us		PostScript/NeXT developers
 ..{adobe,next}!heaven!glenn		415-851-1785

bvs@light.uucp (Bakul Shah) (11/04/90)

In an earlier article I gave two versions of a GCD algorithm and
claimed the recursive version was faster.  In response Christopher
Tate writes:

> I'm very surprised that the recursive version is faster than the
> iterative one; I suspect it has to do with the way recursion is
> implemented in PostScript.  Is it still faster if you don't bind the
> gcd procedures?  How about if you give it two large, almost-equal
> primes?  In most programming
> languages (e.g. those compiled to native code), given an iterative algorithm
> and an equivalent recursive algorithm, the iterative algorithm will be faster
> if only because it won't have to go through all the overhead of the recursive
> function calls....

In this case it seems the overhead to set up a loop and take it
down when done is higher than just calling a procedure.  Anyway,
after some more timing tests I find that which version is faster
depends on input numbers, on binding and on the interpreter.  See
test results below.  One surprise was that ghostscript 2.0 on a
sun 3/50 was about twice as fast as the QMS PS410 even though the
3/50 cpu is slower and spends some fraction of time in refreshing
the screen.

-- Bakul Shah
   bvs@BitBlocks.COM
   ..!{ames,att,decwrl,pyramid,sun,uunet}!amdcad!light!bvs

The test procedures:

  % {recursive, iterative}{bind def, def} versions of Euclid's gcd alg.
  /gcd1 { dup 3 1 roll mod dup 0 eq { pop } { gcd1 } ifelse } def
  /gcd2 { dup 3 1 roll mod dup 0 eq { pop } { gcd2 } ifelse } bind def
  /gcd3 { { dup 3 1 roll mod dup 0 eq { pop exit } if } loop } def
  /gcd4 { { dup 3 1 roll mod dup 0 eq { pop exit } if } loop } bind def

  % time to exec a procedure
  /time { /t0 usertime def exec usertime t0 sub } bind def

  /test {
    /ncycles exch def
    /num1 exch def
    /num0 exch def

    % compute overhead time for running the test
    /overhead { ncycles { num0 num1 pop pop } repeat } time def

    % collect test times in an array
    [
      [ /gcd1 load /gcd2 load /gcd3 load /gcd4 load ]
      % run each procedure ncycles times and compute run time.
      { /gcd exch def { ncycles { num0 num1 gcd pop } repeat } time } forall
    ]

    % subtract overhead from test times and print
    { overhead sub == ( ) print } forall
  } bind def

Some sample tests:
   31415  31415 1000 test
  111111  11111 1000 test
   11111 111111 1000 test
  123456  23456 1000 test

Results:  (time in milliseconds; gs, ralpage run on a sun 3/50)

	gcd1    gcd2    gcd3    gcd4

ghostscript:
	 340     440     480     600     31415  31415 1000 test
	 660     860     820    1020    111111  11111 1000 test
	1020    1280    1200    1480     11111 111111 1000 test
	2300    2940    2600    3240    123456  23456 1000 test

QMS PS410:
	 736     860     900    1064     31415  31415 1000 test
	1472    1708    1564    1840    111111  11111 1000 test
	2204    2548    2224    2604     11111 111111 1000 test
	5132    5908    4872    5684    123456  23456 1000 test

RALpage:
	2450    3484    3000    4351     31415  31415 1000 test
	4934    6867    5267    7518    111111  11111 1000 test
	7433   10267    7516   10666     11111 111111 1000 test
       17634   23701   16534   23351    123456  23456 1000 test

BTW, the C version is about 50 times faster than the fastest
PostScript version under ghostscript.