[net.math] multiple-precision arithmetic

bs@faron.UUCP (Robert D. Silverman) (11/16/84)

	I have implemented both in C and in VAX assembler a set of
multiple precision arithmetic routines which are quite fast. Using
them I have also implemented numerous state-of the art factorization
and prime testing routines including the Continued Fraction Algorithm,
Pollard P-1 and P+1, Pollard Rho, and primality testing based upon
Lucasian criteria. They are all quite fast and I believe that they
run about as fast as possible on the VAX. The continued fraction
program, for example, factors typical 40 digit numbers in about an
hour, and 50 digits numbers in about 35 hours. 
	The key problems involved in implementing multi-precision
routines on the VAX is the high cost of subroutine calls and the
relatively slow speed of the EMUL and EDIV instructions. A typical
CALLS instruction on the VAX with 3 arguments takes about 15 usec
to execute. Thus, calling a routine to (say) add 2 multi-precise numbers
is quite expensive. The EMUL instruction takes slightly less than 7 usec
to multiply 2 32 bit numbers giving a 64 bit product and EDIV takes
about 12 usec to divide a 64 bit number by a 32 bit number. One
cannot generate code to utilise EDIV and EMUL from C so the low level
arithmetic primitives must either be done in assembler, or you restrict
yourself to working in radix 2^15 so you can multiply 2 integers without
overflow.
	I would be happy to privately respond to any queries.

P.S.	I am also currently engaged in implementing the new Cohen-Lenstra
	prime testing algorithm and would like to hear from anyone who is
	interested in it.

keith@reed.UUCP (Packard) (11/29/84)

I have a very good use for multi precision arithmetic.  I have already
written my own (not the best but functional) in C for the VAX as a
part of a package that includes a pseudo machine that executes code
generated by a compiler doing multi precision rational arithmetic.

The compiler accepts a large subset of C as input.

The pseudo machine is implemented as a stack machine with several
built in functions: scanf, printf, gcd, ...

As a bench mark, my pseudo machine computes the primes to 100 in
3.4 seconds (sorry it's so slow): 

/*
 *	compute prime numbers using a silly algorithm
 */
main ()
{
	auto	i,j,k, limit, c;

	printf ("limit: ");
	scanf ("%d", limit);
	c = 2;
	printf ("2\t3\t");
	for (i = 5; i < limit; i = i + 2) {
		j = sqrt (i, 4);
		for (k = 3; k <= j; k += 2) {
			if (~ i % k) {
				goto bad;
			}
		}
		printf ("%d", i);
		if ((c += 1) == 8) {
			c = 0;
			printf ("\n");
		} else
			printf ("\t");
bad:	;
	}
	if (c)
		printf ("\n");
}

sqrt (n, m)
{
	auto	i, t, j;

	j = int (n / 2);		/* note built in function int */
	for (i = 0; i < m; i += 1) {
		if ((t = int (n / j)) == j)
			return j;
		j = int ((t + j) / 2);
	}
	return j;
}

but it computes the factorial of 100:

9332621544394415268169923885626670049071596826438162146859296389521759999322991
5608941463976156518286253697920827223758251185210916864000000000000000000000000

in only 0.3 seconds

main ()
{
	/*
	 * note storage class used instead of type, no types in
	 * this language
	 */
	auto	i;

	while (scanf ("%d", i) == 1)
		printf ("%d\n", i!);
	exit (0);		/* actually returns 0 all the way back to the shell! */
}

I have basically finished with this project but if I found better
arithmetic routines (better than those found in Knuth) I would
certainly like to include them.  I am a compiler writer by
nature, this complicated math stuff takes too much work to
verify, out of ~5000 lines of code for this system, the
1500 lines of math routines took at least 50% of the time to
get working.

I am willing to distribute this package to anyone interested.
Unfortunately, the math routines are rather VAX specific  I
think they will work on a 16032 or any other 32 bit machine
with correct byte order but, I have not tested this (even though
I have access to a 16032 machine running 4.2 unix, if you
want to know if it works I could ethernet the sources down there
and try it before sending the whole mess.)

	keith packard
	...tektronix!reed!keith
	or
	...tektronix!tekmdp!keithp		- no sources here but a better connection

steven@mcvax.UUCP (Steven Pemberton) (12/03/84)

We are producing here a new programming language, B, that includes
unbounded-length rational arithmetic. The implementation we have is an
interpreter rather than a compiler, but despite that it can generate the
primes to 100 in about the same time as that mentioned (actually slightly
faster), though factorial 100 takes longer.

In comparing it with bc on Unix, it compares well, often being very much
faster. The whole system, including the arithmetic, is largely machine
independent - we've ported it to dozens of machines - though it is Unix
dependent. We're currently producing a non-Unix version for the IBM PC. For
non-commercial use it only costs the price of the media - if you're
interested mail me.

One lovely example of the use of unbounded arithmetic is this following B
program by Lambert Meertens for producing the decimal digits of pi
(after printing 80 digits of pi, a, b, c, and d are all larger than 10**200).

HOW'TO PI:
    WRITE '3.'
    PUT 3, 0, 40, 4, 24, 0, 1 IN k, a, b, c, d, e, f
    WHILE 1=1:
        PUT k**2, 2*k+1, k+1 IN p, q, k
        PUT b, p*a+q*b, d, p*c+q*d IN a, b, c, d
        PUT f, floor(b/d) IN e, f
        WHILE e=f:
            WRITE e<<1
            PUT 10*(a-e*c), 10*(b-f*d) IN a, b
            PUT floor(a/c), floor(b/d) IN e, f:>

Most of it should be obvious: PUT ... IN ... is the assignment command;
WRITE e<<1 writes e in width 1.

Rather than let the program run to completion :-), here is a little of its
output:

3.14159265358979323846264338327950288419716939937510582097494459230781640628620899

Steven Pemberton, CWI, Amsterdam.

west@sphinx.UChicago.UUCP (Steve Westfall) (12/05/84)

I think that you should rename your new language something other than
"B".  I have never seen it, but a number of books mention an already-
existing language called "B" that was a predecessor of C (see, for
example, S.R. Bourne, The Unix System, page 74).
-- 
Steve Westfall, Staff Analyst
University of Chicago Computation Center
uucp:     ihnp4!gargoyle!sphinx!west
Mailnet:  staff.westfall@UChicago.Mailnet

keithd@cadovax.UUCP (Keith Doyle) (12/06/84)

Speaking of PI and multiple precision arithmetic, after once upon a time
looking at the infinite series for computing PI, it seemed possible to 
write a routine that could run forever, computing digits for PI.  No
need for umpteen digit precision arithmetic, as it can be determined
when a digit will no longer be affected by further computations, thus
allowing the digit to be produced, and the 'remainder' of the PI
computation to be scaled up in such a way as no precision is lost,
and only a reasonable integer precision is necessary for any of the
intermediate results.

I've always wanted to do this, and produce a subroutine that would
return the 'next' digit of PI, but never got around to it.  Has anyone
out there done this?  Thought it would be fun to see how it performs
as a random number generator, etc..


Keith Doyle
{ucbvax,ihnp4,decvax}!trwrb!cadovax!keithd
"You'll PAY to know what you REALLY think!"

P.S.  Who knows, maybe I'll finally get around to working this one out
      myself.  Hardly high priority.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/08/84)

> ... it seemed possible to write a routine that could run forever,
> computing digits for PI.  ...

I once saw a MACRO-11 routine (quite small) for computing the digits of
"e" indefinitely, using only a small amount of storage.  I tried it and
it worked just fine.  Does anyone recall how this is done?

tstorm@vu44.UUCP (Theo van der Storm) (12/09/84)

Here is another (faster) algorithm to calculate pi. The algorithm that
was posted by mcvax!steven seems to be working with unnecessary
big numbers, although it is in some sense "incremental".
This algorithm is not "incremental".

:::::::::::::::::::::::::::
First a B implementation of the algorithm:
:::::::::::::::::::::::::::

HOW'TO PI n:
	\ Print the first n decimals of pi. mcvax!vu44!tstorm
	WRITE 'Approximation of pi in `n` digits' /
	WRITE (arcam(48,18,n))+(arcam(32,57,n))+(arcam(-20,239,n)) /

YIELD arcam(a, m, n):
	\ Calculate (10**n)*a*arctan(1/m), mcvax!vu44!tstorm
	PUT floor(((10**n)*a)/m), -m*m, 0, 1, 0 IN t, mc, s, kk, i
	WHILE t<>0:
		PUT s+floor(t/kk) IN s
		PUT floor(t/mc), kk+2, i+1 IN t, kk, i
	IF m=18:
		WRITE 'Error < `i+i` units of the last decimal' /
	RETURN s+ceiling(i/2)

:::::::::::::::::::::::::::
And now a BC implementation
Old BC versions use the =OP notation the version I'm working with
uses OP= notation. Change line 5 if necessary.
:::::::::::::::::::::::::::

define t(a, m, n){
    /* calculate (10^n)*a*arctan(1/m), mcvax!vu44!tstorm */
    t = ((10^n)*a)/m; c = -m*m; s = 0; k = 1
    while (t != 0){
	s += t/k; t /= c; k += 2
    }
    return(s)
}

define p(n){
    "Approximation of pi. Number of digits: "; n
    scale = 0; return(t(48, 18, n)+t(32, 57, n)+t(-20, 239, n))
}

:::::::::::::::::::::::::::
-- 

Theo van der Storm, 52 20'N / 4 52'E, {seismo|decvax}!mcvax!vu44!tstorm

jfh@browngr.UUCP (John "Spike" Hughes) (12/10/84)

Computing 'e', using finite precision: I had this as an assignment in
a programming course at Princeton in 1975 (although I never did do the
asst.) The idea (at least in part) was to use variable radix expansions,
i.e. to express a number not in 10th's, hundreth's, etc., but instead
as some number of 1/(2!)'s, some number of 1/(3!)'s, etc. In this
form, e = 1.1111111111111... etc. I can't remember what one did from there...

cooper@pbsvax.DEC (Topher Cooper HLO2-3/M08 DTN225-5819) (12/12/84)

lines: 45

> ... it seemed possible to write a routine that could run forever computing
> digits for PI.  No need for umpteen digit precision arithmetic.

Sounds neat, but I'm afraid its not possible; at least not unless PI is
rational.  I seem to remember that the non-rationality of PI has never
been proven, but the smart money is on that side.

Let's start by assuming an iterative (non-recursive) algorithm which
cranks along with a fixed amount of space, emitting digits of PI every
once in a while.

The state at the end of the code being iterated over, and what is emitted
during its execution, is completely determined by the state at the
start of the code's execution.  Since there is a fixed amount of storage
(say N bits) there is a fixed number of states which it could be in.
After at most 2^N iterations a state would be repeated and the a repeating
pattern would start to be emitted.

Of course, very good approximations can be done in principal with very
little storage.  Conceivably, a single 16 bit state variable could result
in 64Kbits (roughly 20,000 digits).

The same argument applies to recursive algorithms, but in this case the
extra state may be hidden in the stack.

An exact algorithm which didn't start with infinite storage would have to
grow (perhaps slowly).  This is usually done with dynamic alocation of
multiple precision integers, floating-points, or rationals; but could
be done in other ways.

> I once saw a MACRO-11 routine (quite small) for computing the digits
> of "e" indefinitely, using only a small amount of storage.

The same argument applies.  Either the routine was an approximation to
some large number of digits, or it used a trick like a slowly increasing
recursive depth to approximate infinite storage.

I think any elegant algorithms along these lines would be of interest.

	Topher

USENET: ...decvax!decwrl!dec-rhea!dec-pbsvax!cooper
ARPA: cooper%pbsvax.DEC@decwrl.ARPA