[comp.lang.lisp] isqrt

d34676@tansei.cc.u-tokyo.ac.jp (Akira Kurihara) (06/05/91)

I am interested in the speed of a Common Lisp function isqrt
for extremely big bignums.

I compared three functions:

fast-isqrt-mid1.....given below
fast-isqrt-rec......given below
isqrt...............a Common Lisp function

These functions should return the same value.
I tried it on the following three environments:

MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
             accelerated with 25 MHz 68020
lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC

I tried to calculate isqrt of:

(* 2 (expt 10 2000)).....to get  1000 digits of square root of 2
(* 2 (expt 10 6000)).....to get  3000 digits of square root of 2
(* 2 (expt 10 20000))....to get 10000 digits of square root of 2

Here is the result.

*** For 1000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL          1.250 sec           0.317 sec        174     sec
lucid         2.72  sec           0.63  sec         11.91  sec
akcl          2.400 sec           0.567 sec          2.483 sec

*** For 3000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL         12.233 sec           2.450 sec       4503     sec
lucid        27.68  sec           5.45  sec        102.12  sec
akcl         24.567 sec           5.100 sec         24.567 sec

*** For 10000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL        142     sec          35     sec         very long
lucid       327.75  sec          80.31  sec       1126.10  sec
akcl        273.917 sec          64.367 sec        275.783 sec

Looking at this table,
(1) 25 MHz 68020 Macintosh can be as twice as faster than SPARCstation IPC
as far as it is concerned with division of bignums by bignums.
(2) akcl is a little bit faster than lucid at least as far as it is
concerned with division of bignums by bignums.
(3) isqrt = fast-isqrt-mid1 on akcl ?

I appreciate very much if you would evaluate the following forms

(isqrt-time-comparison (* 2 (expt 10 2000)) 1 1)

(isqrt-time-comparison (* 2 (expt 10 6000)) 1 1)

(isqrt-time-comparison (* 2 (expt 10 20000)) 1 1)

on a different environment, and e-mail it to me. The function
isqrt-time-comparison is given below.

Also, if you have a faster function, would you inform me?

Thank you.

---

(defun fast-isqrt-simple (n &aux init-value iterated-value)
  "argument n must be a non-negative integer."
  (cond
   ((zerop n)
    0)
   (t
    (setq init-value 1)
    ; do iteration once
    (setq iterated-value (floor (+ init-value (floor n init-value)) 2))
    (setq init-value iterated-value)
    (loop
      ; do iteration, same as above
      (setq iterated-value (floor (+ init-value (floor n init-value)) 2))
      (if (>= iterated-value init-value) (return init-value))
      (setq init-value iterated-value)))))


(defun fast-isqrt-mid1 (n &aux n-len init-value iterated-value)
  "argument n must be a non-negative integer"
  (cond
   ((> n 24)		; theoretically (> n 0) ,i.e., n-len > 0
    (setq n-len (integer-length n))
    (if (evenp n-len)
      (setq init-value (ash 1 (ash n-len -1)))
      (setq init-value (ash 2 (ash n-len -1))))
    (loop
      (setq iterated-value (ash (+ init-value (floor n init-value)) -1))
      (if (not (< iterated-value init-value))
        (return init-value)
        (setq init-value iterated-value))))
   ((> n 15) 4)
   ((> n  8) 3)
   ((> n  3) 2)
   ((> n  0) 1)
   ((> n -1) 0)
   (t nil)))


(defun fast-isqrt-rec (n &aux n-len-quarter n-half n-half-isqrt
                       init-value iterated-value)
  "argument n must be a non-negative integer"
  (cond
   ((> n 24)		; theoretically (> n 7) ,i.e., n-len-quarter > 0
    (setq n-len-quarter (ash (integer-length n) -2))
    (setq n-half (ash n (- (ash n-len-quarter 1))))
    (setq n-half-isqrt (fast-isqrt-rec n-half))
    (setq init-value (ash (1+ n-half-isqrt) n-len-quarter))
    (loop
      (setq iterated-value (ash (+ init-value (floor n init-value)) -1))
      (if (not (< iterated-value init-value))
        (return init-value)
        (setq init-value iterated-value))))
   ((> n 15) 4)
   ((> n  8) 3)
   ((> n  3) 2)
   ((> n  0) 1)
   ((> n -1) 0)
   (t nil)))


(defun isqrt-time-comparison (n1 how-many iteration-times &aux n2)
  (setq n2 (+ n1 how-many))
  (print "do nothing")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         )
       (incf n))))
  (print "fast-isqrt-mid1")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         (fast-isqrt-mid1 n))
       (incf n))))
  (print "fast-isqrt-rec")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         (fast-isqrt-rec n))
       (incf n))))
  (print "isqrt")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         (isqrt n))
       (incf n))))
  nil)

----

Akira Kurihara

d34676@tansei.cc.u-tokyo.ac.jp

School of Mathematics
Japan Women's University
Mejirodai 2-8-1, Bunkyo-ku
Tokyo 112
Japan
03-3943-3131 ext-7243

AI.gadbois@MCC.COM (David Gadbois) (06/06/91)

    Date: 5 Jun 91 13:32:11 GMT
    From: d34676@tansei.cc.u-tokyo.ac.jp (Akira Kurihara)


    I am interested in the speed of a Common Lisp function isqrt
    for extremely big bignums.

I added the results of running the test on a Symbolics machine.  The
results for the builtin ISQRT are surprising.  While I normally run
screaming from numeric code, I had a look at the source, and it uses one
of those clever bit-twiddling tricks that appears to calculate the
result in O( (INTEGER-LENGTH n) ) time.  One of the "numerical recipes"
books probably has an explanation of it.

    I compared three functions:

    fast-isqrt-mid1.....given below
    fast-isqrt-rec......given below
    isqrt...............a Common Lisp function

    These functions should return the same value.
    I tried it on the following three environments:

    MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
		 accelerated with 25 MHz 68020
    lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
    akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC
    XL1200.......Symbolics XL1200 running Genera 8.0.2

    I tried to calculate isqrt of:

    (* 2 (expt 10 2000)).....to get  1000 digits of square root of 2
    (* 2 (expt 10 6000)).....to get  3000 digits of square root of 2
    (* 2 (expt 10 20000))....to get 10000 digits of square root of 2

    Here is the result.

    *** For 1000 digits ***

	       fast-isqrt-mid1     fast-isqrt-rec         isqrt
    MACL          1.250 sec           0.317 sec        174       sec
    lucid         2.72  sec           0.63  sec         11.91    sec
    akcl          2.400 sec           0.567 sec          2.483   sec
    XL1200        0.757 sec           0.178 sec          0.00122 sec

    *** For 3000 digits ***

	       fast-isqrt-mid1     fast-isqrt-rec         isqrt
    MACL         12.233 sec           2.450 sec       4503       sec
    lucid        27.68  sec           5.45  sec        102.12    sec
    akcl         24.567 sec           5.100 sec         24.567   sec
    XL1200        7.257 sec           1.484 sec          0.00450 sec

    *** For 10000 digits ***

	       fast-isqrt-mid1     fast-isqrt-rec         isqrt
    MACL        142     sec          35     sec         very long
    lucid       327.75  sec          80.31  sec       1126.10   sec
    akcl        273.917 sec          64.367 sec        275.783  sec
    XL1200       84.780 sec          20.910 sec          0.0120 sec

    Looking at this table,

    (1) 25 MHz 68020 Macintosh can be as twice as faster than
    SPARCstation IPC as far as it is concerned with division of bignums
    by bignums.

I have found that MCL (nee MACL) is a really nice second-generation
Common Lisp.  In most of the cases I have tested, it does better than
the UNIX-based Lisps.  Plus, you can get a decent development platform
for under $10K.  The ISQRT implementation is evidently a disappointment,
though.

--David Gadbois

adler@slot.enet.dec.com (Mark Adler) (06/06/91)

Here are the results I got on a DECsystem 3100 using Lucid Common Lisp.
By the way, when I compiled the code in production mode,
the isqrt time went to zero.  I assume that the compiler "optimized" the
calculation out of existence, presumably since the value was not used.
(This probably also explains the Syumbolics results reported in close to
zero time.)


;;; Lucid Common Lisp/DECsystem
;;; Development Environment Version 4.0, 12 November 1990
;;; Copyright (C) 1985, 1986, 1987, 1988, 1989, 1990 by Lucid, Inc.
;;; All Rights Reserved
;;;
;;; This software product contains confidential and trade secret information
;;; belonging to Lucid, Inc.  It may not be copied for any reason other than
;;; for archival and backup purposes.
;;;
;;; Lucid Common Lisp is a trademark of Lucid, Inc.
;;; DECsystem is a trademark of Digital Equipment Corp.

Lisp> (isqrt-time-comparison (* 2 (expt 10 2000)) 1 1)

"do nothing" 
Elapsed Real Time = 0.03 seconds
Total Run Time    = 0.00 seconds
User Run Time     = 0.00 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =        840

"fast-isqrt-mid1" 
Elapsed Real Time = 2.46 seconds
Total Run Time    = 2.36 seconds
User Run Time     = 2.34 seconds
System Run Time   = 0.02 seconds
Process Page Faults    =          2
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     21,616

"fast-isqrt-rec" 
Elapsed Real Time = 0.59 seconds
Total Run Time    = 0.55 seconds
User Run Time     = 0.55 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     10,256

"isqrt" 
Elapsed Real Time = 12.18 seconds
Total Run Time    = 11.68 seconds
User Run Time     = 11.54 seconds
System Run Time   = 0.14 seconds
Process Page Faults    =         57
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed = 10,502,096
There were 20 ephemeral GCs
NIL
Lisp> (isqrt-time-comparison (* 2 (expt 10 6000)) 1 1)

"do nothing" 
Elapsed Real Time = 0.00 seconds
Total Run Time    = 0.00 seconds
User Run Time     = 0.00 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =      2,496

"fast-isqrt-mid1" 
Elapsed Real Time = 24.14 seconds
Total Run Time    = 23.91 seconds
User Run Time     = 23.87 seconds
System Run Time   = 0.04 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     74,088

"fast-isqrt-rec" 
Elapsed Real Time = 4.77 seconds
Total Run Time    = 4.72 seconds
User Run Time     = 4.71 seconds
System Run Time   = 0.01 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     30,872

"isqrt" 
Elapsed Real Time = 97.92 seconds (1 minute, 37.92 seconds)
Total Run Time    = 96.18 seconds (1 minute, 36.18 seconds)
User Run Time     = 95.85 seconds (1 minute, 35.85 seconds)
System Run Time   = 0.33 seconds
Process Page Faults    =        105
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed = 93,533,248
There were 179 ephemeral GCs
NIL
Lisp> (isqrt-time-comparison (* 2 (expt 10 20000)) 1 1)

"do nothing" 
Elapsed Real Time = 0.00 seconds
Total Run Time    = 0.00 seconds
User Run Time     = 0.00 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =      8,312

"fast-isqrt-mid1" 
Elapsed Real Time = 291.34 seconds (4 minutes, 51.34 seconds)
Total Run Time    = 285.04 seconds (4 minutes, 45.04 seconds)
User Run Time     = 283.89 seconds (4 minutes, 43.89 seconds)
System Run Time   = 1.16 seconds
Process Page Faults    =         22
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =    262,072
There was 1 ephemeral GC

"fast-isqrt-rec" 
Elapsed Real Time = 70.17 seconds (1 minute, 10.17 seconds)
Total Run Time    = 69.63 seconds (1 minute, 9.63 seconds)
User Run Time     = 69.54 seconds (1 minute, 9.54 seconds)
System Run Time   = 0.09 seconds
Process Page Faults    =          3
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =    113,392

"isqrt" 
Elapsed Real Time = 1068.20 seconds (17 minutes, 48.20 seconds)
Total Run Time    = 1055.11 seconds (17 minutes, 35.11 seconds)
User Run Time     = 1052.86 seconds (17 minutes, 32.86 seconds)
System Run Time   = 2.25 seconds
Process Page Faults    =        381
Dynamic Bytes Consed   =    223,568
Ephemeral Bytes Consed = 1,035,828,048
There were 1989 ephemeral GCs
NIL
Lisp>

adler@slot.enet.dec.com (Mark Adler) (06/06/91)

I've added DS-3100 and DS-5000 to the list.


    MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
		 accelerated with 25 MHz 68020
    lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
    akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC
    XL1200.......Symbolics XL1200 running Genera 8.0.2
    DS-3100......Lucid Common Lisp 4.0 on DEC DS-3100
    DS-5000......Lucid Common Lisp 4.0 on DEC DS-5000


*** For 1000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL          1.250 sec           0.317 sec        174     sec
lucid         2.72  sec           0.63  sec         11.91  sec
akcl          2.400 sec           0.567 sec          2.483 sec
XL1200        0.757 sec           0.178 sec          0.00122 sec
DS-3100	      2.34  sec            .55  sec         11.54  sec
DS-5000       1.34  sec            .32  sec          6.57  sec

*** For 3000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL         12.233 sec           2.450 sec       4503     sec
lucid        27.68  sec           5.45  sec        102.12  sec
akcl         24.567 sec           5.100 sec         24.567 sec
XL1200        7.257 sec           1.484 sec          0.00450 sec
DS-3100	     23.87  sec           4.71  sec         95.85  sec
DS-5000      13.69  sec           2.70  sec         54.10  sec


*** For 10000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL        142     sec          35     sec         very long
lucid       327.75  sec          80.31  sec       1126.10  sec
akcl        273.917 sec          64.367 sec        275.783 sec
XL1200       84.780 sec          20.910 sec          0.0120 sec
DS-3100	    283.89  sec          69.54  sec       1052.86  sec    
DS-5000     161.16  sec          39.61  sec        605.57  sec

miller@FS1.cam.nist.gov (Bruce R. Miller) (06/06/91)

In article <19910606055232.5.GADBOIS@KISIN.ACA.MCC.COM>, David Gadbois writes: 
>     Date: 5 Jun 91 13:32:11 GMT
>     From: d34676@tansei.cc.u-tokyo.ac.jp (Akira Kurihara)
> 
>     I am interested in the speed of a Common Lisp function isqrt
>     for extremely big bignums.
> 
> I added the results of running the test on a Symbolics machine.  The
> results for the builtin ISQRT are surprising.  While I normally run
> screaming from numeric code, I had a look at the source, and it uses one
> of those clever bit-twiddling tricks that appears to calculate the
> result in O( (INTEGER-LENGTH n) ) time.  One of the "numerical recipes"
> books probably has an explanation of it.

Actually, one of the compiler books probably has an explanation of it!

I did the same computations (but using Genera 8.0.1, in case it matters)
and got essentially the same results as you did.
I mailed the results directly to Akira -- with one change:

Apparently the symbolics knows that isqrt has no `real' side-effects and
optimizes it out -- the disassembled code reveals no call to isqrt --
and I doubt that isqrt is stashed in microcode! 
To trick the compiler, I changed each timing body to 
  (setq result n)
  (setq result (fast-isqrt-mid1 n))
  ...
  (setq result (isqrt n))
and redid the tests.  Everything was the same except the isqrt timings
which now are essentially the same as fast-isqrt-mid1 (which _is_ "one of
those clever bit-twiddling tricks" !). 

bruce
miller@cam.nist.gov

miles@cogsci.ed.ac.uk (Miles Bader) (06/07/91)

Hardware: sun4/330 (sparc) running sunos 4.0.3

All compiled with speed=3, safety=0, space=0

CMUCL = CMU Common Lisp (10-May-1991)
AKCL  = Austin KCL Version 1.505

* (isqrt-time-comparison (* 2 (expt 10 2000)) 1 1)
	   fast-isqrt-mid1   fast-isqrt-rec    isqrt
CMUCL	     0.610 sec         0.080 sec       234.340 sec
AKCL	     2.517 sec         0.550 sec         2.483 sec

* (isqrt-time-comparison (* 2 (expt 10 6000)) 1 1)
	   fast-isqrt-mid1   fast-isqrt-rec    isqrt
CMUCL	     5.710 sec         1.170 sec         very long (> 1 hour)
AKCL	    24.050 sec         4.817 sec        24.950 sec

* (isqrt-time-comparison (* 2 (expt 10 20000)) 1 1)
	   fast-isqrt-mid1   fast-isqrt-rec    isqrt
CMUCL	     37.720 sec        5.470 sec         very long
AKCL	    276.617 sec       68.950 sec       272.317 sec

-Miles
--
--
Miles Bader  --  HCRC, University of Edinburgh  --  Miles.Bader@ed.ac.uk

boyland@aspen.Berkeley.EDU (John B. Boyland) (06/07/91)

In article <2885210567@ARTEMIS.cam.nist.gov>, miller@FS1.cam.nist.gov
(Bruce R. Miller) writes:
|> 
|> In article <19910606055232.5.GADBOIS@KISIN.ACA.MCC.COM>, David
Gadbois writes: 
|> >     Date: 5 Jun 91 13:32:11 GMT
|> >     From: d34676@tansei.cc.u-tokyo.ac.jp (Akira Kurihara)
|> > 
|> >     I am interested in the speed of a Common Lisp function isqrt
|> >     for extremely big bignums.
|> > 
|> > I added the results of running the test on a Symbolics machine.  The
|> > results for the builtin ISQRT are surprising. [...]
|> 
|> [...]
|> Apparently the symbolics knows that isqrt has no `real' side-effects and
|> optimizes it out [...]

I'm relieved to hear this: I spent a few hours finding a method that runs
about twice as fast as fast-isqrt-rec and was disappointed that it was
rendered worse than obsolete.

I have a [lovely?, no!] proof, which unfortunately cannot fit in
this margin :-), for this algorithm.  Essentially it works by noting that
this every recursive call of fast-isqrt-rec does two or three iterations
and that the last iteration is never useful (it only verifies the answer).
The three iteration case needs to be checked for but this only requires
a multiply of numbers only half as long.

For allegro on a DS3100, I get the comparison:
(I don't have the figures for the other versions, I sent them to Akira already)

		fast-isqrt-rec	faster-isqrt-rec
1000		    .884              .517
3000		   7.30              4.02
10000            107.               43.8

Here's the function.  throw stones at it and make it break.
(Who knows, maybe it has subtle bugs?)
(It uses a lot of code from the old fast-isqrt-rec)

---------------------------------------------------------------------

(defun faster-isqrt-rec (n &aux n-len-quarter n-half n-half-isqrt
			     init-value q r iterated-value)
  "argument n must be a non-negative integer"
  (cond
   ((> n 24)		; theoretically (> n 7) ,i.e., n-len-quarter > 0
    (setq n-len-quarter (ash (integer-length n) -2))
    (setq n-half (ash n (- (ash n-len-quarter 1))))
    (setq n-half-isqrt (faster-isqrt-rec n-half))
    (setq init-value (ash (1+ n-half-isqrt) n-len-quarter))
    (multiple-value-setq (q r) (floor n init-value))
    (setq iterated-value (ash (+ init-value q) -1))
    (if (eq (logbitp 0 q) (logbitp 0 init-value)) ; same sign
	;; average is exact and we need to test the result
	(let ((m (- iterated-value init-value)))
	  (if (> (* m m) r)
	      (- iterated-value 1)
	      iterated-value))
	;; average was not exact, we take value
	iterated-value))
   ((> n 15) 4)
   ((> n  8) 3)
   ((> n  3) 2)
   ((> n  0) 1)
   ((> n -1) 0)
   (t nil)))
-------------------------------------------------------------------

John Boyland
boyland@sequoia.berkeley.edu

AI.gadbois@MCC.COM (David Gadbois) (06/09/91)

    Date: 6 Jun 91 15:22:47 GMT
    From: miller@FS1.cam.nist.gov (Bruce R. Miller)

    In article <19910606055232.5.GADBOIS@KISIN.ACA.MCC.COM>, I write: 
    >     Date: 5 Jun 91 13:32:11 GMT
    >     From: d34676@tansei.cc.u-tokyo.ac.jp (Akira Kurihara)
    > 
    >     I am interested in the speed of a Common Lisp function isqrt
    >     for extremely big bignums.
    > 
    > I added the results of running the test on a Symbolics machine.
    > The results for the builtin ISQRT are surprising.  While I
    > normally run screaming from numeric code, I had a look at the
    > source, and it uses one of those clever bit-twiddling tricks that
    > appears to calculate the result in O( (INTEGER-LENGTH n) ) time.
    > One of the "numerical recipes" books probably has an explanation
    > of it.

    Actually, one of the compiler books probably has an explanation of
    it!

    I did the same computations (but using Genera 8.0.1, in case it
    matters) and got essentially the same results as you did.  I mailed
    the results directly to Akira -- with one change:

    Apparently the symbolics knows that isqrt has no `real' side-effects
    and optimizes it out -- the disassembled code reveals no call to
    isqrt -- and I doubt that isqrt is stashed in microcode!  To trick
    the compiler, I changed each timing body to

      (setq result n)
      (setq result (fast-isqrt-mid1 n))
      ...
      (setq result (isqrt n))

    and redid the tests.  Everything was the same except the isqrt
    timings which now are essentially the same as fast-isqrt-mid1 (which
    _is_ "one of those clever bit-twiddling tricks" !).

Gack, how embarrassing.  I should have know better.  By way of penance,
here are the updated collected figures:

    MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
		 accelerated with 25 MHz 68020
    lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
    akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC
    XL1200.......Symbolics XL1200 running Genera 8.0.2
    CMUCL........CMU Common Lisp (10-May-1991) sun4/330 running sunos 4.0.3  
                 ^^^^^^^^^^^^^^^
                 Is this Python?  Running under SUNOS!?  Where can I get
                 a copy? 
    AKCL.........Austin KCL Version 1.505 sun4/330 running sunos 4.0.3 
    DS3100A......Allegro (version?) on DECStation 3100
    DS3100L......Lucid Common Lisp 4.0 on DEC DS-3100
    DS5000.......Lucid Common Lisp 4.0 on DEC DS-5000


*** For 1000 digits ***

        fast-isqrt-mid1   fast-isqrt-rec      isqrt     faster-isqrt-rec
MACL          1.250            0.317         174     
lucid         2.72             0.63           11.91  
akcl          2.400            0.567           2.483 
XL1200        0.757            0.178           0.729            0.111
CMUCL         0.610            0.080         234.340 
AKCL          2.517            0.550           2.483 
DS3100A                        0.884                            0.517
DS3100L	      2.34              .55           11.54  
DS5000       1.34              .32            6.57  

*** For 3000 digits ***

        fast-isqrt-mid1   fast-isqrt-rec      isqrt     faster-isqrt-rec
MACL         12.233            2.450        4503     
lucid        27.68             5.45          102.12  
akcl         24.567            5.100          24.567 
XL1200        7.257            1.484           6.612            0.853
CMUCL         5.710            1.170        (> 1 hour) 
AKCL         24.050            4.817          24.950 
DS3100A                        7.30                             4.02
DS3100L	     23.87             4.71           95.85  
DS5000       13.69             2.70           54.10  

*** For 10000 digits ***

        fast-isqrt-mid1   fast-isqrt-rec      isqrt     faster-isqrt-rec
MACL        142               35            very long
lucid       327.75            80.31         1126.10  
akcl        273.917           64.367         275.783 
XL1200       84.780           20.910          83.085             9.205
CMUCL        37.720            5.470        very long
AKCL        276.617           68.950         272.317 
DS3100A                      107.                               43.8
DS3100L	    283.89            69.54         1052.86      
DS5000      161.16            39.61          605.57  

--David Gadbois

d34676@tansei.cc.u-tokyo.ac.jp (Akira Kurihara) (06/09/91)

I received some results on (time ...) of isqrt and similar functions
for very big bignums.
I requested to calculate essentially (isqrt (* 2 (expt 10 (* 2 d))))
for d = 1000, 3000, 10000.
The following is what I received and compiled. I hope I made no mistake
while making this table.
Be sure that this table tells you ONLY about:
(A) whether the system is fast for bignum operation, especially division.
(B) whether the algorithm of isqrt for bignum is fast.

*** Lisps and Machines ***

(1) CMU Common Lisp (10-May-1991) on sun4/330
(2) Genera 8.0.1 on Symbolics XL1200
(3) Genera 8.0.2 on Symbolics XL1200
(4) Allegro Common Lisp on SPARCstation2
(5) Macintosh Allegro Common Lisp 1.3.2 on Mac SE accelerated with 25 MHz 68020
(6) Lucid Common Lisp 4.0 on DEC 5000
(7) Lucid Sun Common Lisp 4.0 on SPARCstation2
(8) Macintosh Allegro Common Lisp 1.3.1 on Mac IIcx
(9) Macintosh Common Lisp 2.0b1p2 on Macintosh IIx
(a) Austin Kyoto Common Lisp 1.505 on sun4/330
(b) Austin Kyoto Common Lisp 1.530 on SPARCstation IPC
(c) Lucid Common Lisp 4.0 on DEC 3100
(d) Lucid Sun Common Lisp 4 on SPARCstation IPC
(e) Allegro Common Lisp on DEC 3100

*** For 1,000 digits *** (time is in second)

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
(1)           0.610               0.080            234.340    
(2)           0.728583            0.179719           0.731269    
(3)           0.757               0.178              0.729
(4)           0.984               0.217              1.917
(5)           1.250               0.317            174        
(6)           1.34                0.32               6.57     
(7)           1.47                0.35               7.34
(8)           1.983               0.467            272.867
(9)           2.150               0.567              0.633
(a)           2.517               0.550              2.483    
(b)           2.400               0.567              2.483    
(c)           2.34                0.55              11.54     
(d)           2.72                0.63              11.91     
(e)           3.534               0.800              5.216

*** For 3,000 digits *** (time is in second)

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
(1)           5.710               1.170           very long (> 1 hour)
(2)           7.114492            1.422794           6.636563
(3)           7.257               1.484              6.612
(4)          10.067               1.966             19.634
(5)          12.233               2.450           4503        
(6)          13.69                2.70              54.10     
(7)          15.06                2.97              64.05
(8)          19.583               3.917           7164.333
(9)          20.117               4.083              4.150
(a)          24.050               4.817             24.950    
(b)          24.567               5.100             24.567    
(c)          23.87                4.71              95.85     
(d)          27.68                5.45             102.12     
(e)          36.717               7.200             54.283

*** For 10,000 digits *** (time is in second)

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
(1)          37.720               5.470           very long
(2)          83.216370           20.502718          83.681786
(3)          84.780              20.910             83.085
(4)         117.650              28.783            233.984
(5)         142                  35               very long
(6)         161.16               39.61             605.57     
(7)         179.34               44.87             719.45
(8)         229.183              56.467           very long
(9)         229.383              56.500             56.533
(a)         276.617              68.950            272.317
(b)         273.917              64.367            275.783    
(c)         283.89               69.54            1052.86         
(d)         327.75               80.31            1126.10     
(e)         435.350             106.617            645.550

Potential Problem:

(i) Under some environment with a wise compiler, the function
isqrt-time-comparison, which I posted earlier, is over-optimized
for our purpose. If so, (time ...) will return almost 0 second.

(ii) alms@cambridge.apple.com suggested me the following

>Also, it is best to pass a simple function call to the TIME macro.
>If you pass a complex form to TIME, then the timing will include
>the time used to process the complex form.  This processing may
>involve invoking the compiler.

So, probably it is safe to do something like:


(let* ((how-many-dig 1000)   ; this is for d = 1000
      (n (* 2 (expt 10 (* 2 how-many-dig)))))
  (terpri)
  (princ "sqrt of 2 -> ")
  (princ how-many-dig)
  (princ " digits")
  (terpri)
  (time (fast-isqrt-mid1 n))
  (time (fast-isqrt-rec n))
  (time (isqrt n))
  nil)


Also, I received a function isqrt-newton (given below) from
boyland@aspen.Berkeley.EDU, which is twice as fast as fast-isqrt-rec.
I followed his proof that this function returns correct answer.
I believe that, as far as one employs Newton's method, no one can
make functions which is twice as fast as isqrt-newton.


(defun isqrt-newton (n &aux n-len-quarter n-half n-half-isqrt
                       init-value q r m iterated-value)
  "argument must be a non-negative integer"
  (cond
   ((> n 24)            ; theoretically (> n 15) ,i.e., n-len-quarter > 0
    (setq n-len-quarter (ash (- (integer-length n) 1) -2))
    (setq n-half (ash n (- (ash n-len-quarter 1))))
    (setq n-half-isqrt (isqrt-newton n-half))
    (setq init-value (ash n-half-isqrt n-len-quarter))
    (multiple-value-setq (q r) (floor n init-value))
    (setq iterated-value (ash (+ init-value q) -1))
    (cond ((oddp q)
           iterated-value)
          (t
           (setq m (- iterated-value init-value))
           (if (> (* m m) r)
             (1- iterated-value)
             iterated-value))))
   ((> n 15) 4)
   ((> n  8) 3)
   ((> n  3) 2)
   ((> n  0) 1)
   ((> n -1) 0)
   (t nil)))

--

Akira Kurihara

d34676@tansei.cc.u-tokyo.ac.jp

School of Mathematics
Japan Women's University
Mejirodai 2-8-1, Bunkyo-ku
Tokyo 112
Japan
03-3943-3131 ext-7243

ram+@cs.cmu.edu (Rob MacLachlan) (06/13/91)

I have done an evaluation this "isqrt" benchmark on CMU CL systems here at CMU.
I discovered that the results being computed were wrong due to bugs in
division.  After these problems had been solved, the times were significantly
longer than those reported Miles, but still much faster than those of
commercial implementations.  (generally 2-5x)

I'd like the remark about the apparent inconsistency of high CMU bignum
primitive performance v.s. poor performance on the built-in ISQRT.
Really there is no inconsistency.  The bignum implementation has consistently
*not* been tuned.  When we first got it running, it was already much faster
than the commercial implementations.

As to why CMU is faster, it's hard to say for sure.  It isn't algorithmic
cleverness; we implemented Knuth fairly directly.  My guess is that the biggest
factor is compiler technology: the CMU bignum code is written in Lisp using a
few extra primitives like 32 x 32 -> 64.  Unlike the memory-to-memory
operations described in the Lucid bignum paper, these are register-to-register
operations.  Python efficiently open-codes 32 bit integer operations (even
using the standard CL arithmetic/logic functions.)

In the case of division (which these benchmarks are intensive with), it is also
possible that we have implemented the 64 / 32 divide more efficiently.  This
must be done as an assembly routine, since neither MIPS nor SPARC have an
extended divide instruction.

So what is the significance of this benchmark (and of bignum performance in
general)?  Well, that obviously depends on how much you use bignums.  These
sorts of large speedups for CMU are not seen for more conventional
"symbol-crunching" Lisp programs (20% is more common...)  But if you are
working on mathematical or scientific software, you will like CMU CL's dramatic
superiority in bignum and floating-point arithmetic.

The fact that CMU bignums are written in Lisp and are public domain also makes
it easy for people doing serious work with extended-precision arithmetic to
efficiently add any new primitives they need.

As you may have inferred from Miles's message, CMU CL is up on SUNOS Sparcs.
However, we are not ready to release it yet.  We will post to comp.lang.lisp.


Raw data:

The Lucid numbers were copied from the net.  The SparcStation 1+ numbers were
done at CMU using allegro 4.0.1.  Both CMU and Allegro numbers are elapsed
(real) time, not any sort of "CPU" time.

1000 digits

system\test	mid1	rec
ds5000 CMU	0.38	0.09
   "   Lucid	1.34	0.32
ds3100 CMU	0.56	0.14
   "   Lucid	2.34	0.55	
ss1+  CMU	0.77	0.27
   "  Allegro	1.65	0.38

3000 digits

system\test	mid1	rec
ds5000 CMU	3.39	0.69
   "   Lucid   13.7	2.70
ds3100 CMU	4.88	1.02
   "   Lucid   23.9	4.71
ss1+ CMU	5.37	1.15
   " Allegro   16.6	3.29

10000 digits

system\test	mid1	rec
ds5000 CMU	38.7	10.5
   "   Lucid   161	39.6
ds3100 CMU	56.9	16.6
   "   Lucid   284	69.5
ss1+  CMU	64.5	17.6
   "  Allegro  198	48.2

  Robert A. MacLachlan (ram@cs.cmu.edu)