[comp.lang.lisp] Floating point in CL?

tim@cstr.ed.ac.uk (Tim Bradshaw) (09/27/90)

For the first time in my life, I need to write some floating-point
code.  Since I hate C, I want to do it in Lisp.  However my initial
experiments indicate that our Lisp is *immeasurably* slower than C.

CLtL states at several places that a good Common Lisp compiler
can generate extremely good floating-point code.  Since we have at
least a reasonable compiler (Franz Allegro, Sun4), I had some hope
that it would generate at least reasonable floating-point code.

I wrote a trivial function (loop from 1 to n, incrementing a float);
and sure enough, Allegro comes quite close to C.  Good.

Alas, when I write more complex code, it becomes appallingly slow.

Here is what I used to test this.  The Lisp is written deliberately to
look as close as I could make it like the C that follows, so it's a
bit nasty:

(defmacro to-unity (var maxvar)
  ;; Convert var to be from -1 to 1
  ;; ?? Typing.  Is coerce the right thing?  Don't want a rational here.
  `(- (/ (coerce ,var 'float)
	 (coerce ,maxvar 'float))
      0.5))

(defun make-mandel-10 ()
  ;; Brute-force Mandelbrot thing.  No complex numbers!
  ;; This version has hard-wired assumptions, and looks like the C.
  (declare (optimize (speed 3)
		     (safety 0)))
 (let ((image (make-array '(10 10)
                     :element-type
                     'bit :initial-element 0))
	      ;; image components set to 0 if in, 1 if not: different
	      ;; than make-mandel!
       (x 0.0)
       (y 0.0))
      (declare (type float x y))
      (dotimes (xindex 10)
	(dotimes (yindex 10)
	  (setf x (the float (to-unity xindex 10))
		y (the float (to-unity yindex 10)))
	  (dotimes (i 100)
	    ;; z = x + iy => z * z = x*x +2ixy - y*y =>
	    ;; Re(z^2) =X^2 -y^2, Im(z^2) = 2xy
	    (setq x (the float (+ x (* x x) (- (* y y))))
		  y (the float (+ y (* 2.0 x y))))
	    (if (or (>= (abs x) 2.0)
		    (>= (abs y) 2.0))
		(progn 
		  (setf (aref image xindex yindex) 1)
		  (return nil))))))
      t))


And the C:

main ()
{
    int image[10][10];
    float x, y;
    int xindex, yindex, i;

    for (xindex = 0; xindex < 10; xindex++)
	for (yindex = 0; yindex < 10; yindex++)
	    image[xindex][yindex] = 0;

    for (xindex = 0; xindex < 10; xindex++) {
	for (yindex = 0; yindex < 10; yindex++) {
	    x = ((float) xindex)/10.0 - 0.5;
	    y = ((float) yindex)/10.0 - 0.5;
	    
	    for (i = 0; i < 100; i++) {
		x = x + x * x - y * y;
		y = y + 2.0 * x * y;

		if (x >= 2.0 || x <= -2.0 || y >= 2.0 || y <= -2.0) {
		    image[xindex][yindex] = 1;
		    break;
		}
	    }
	}
    }
}
		    
On A Sun4/65, with Sun cc, no optimization, and Allegro CL "3.1.13.1
[Sun4] (0/0/0)"  I get:

    kahlo* time ./mandel 
            0.1 real	0.0 user	0.0 sys

and

    <cl> (time (make-mandel-10))
    cpu time (non-gc) 1683 msec user, 0 msec system
    cpu time (gc)     250 msec user, 0 msec system
    cpu time (total)  1933 msec user, 0 msec system
    real time  2610 msec

Well I should use a bigger test so the C timings show up, but this
doesn't look good.

So the question: are there general guidelines on writing
floating-point code which will perform well in Lisp, given a good
compiler?  I'm pretty naive about numerical programming, but I tried
the following, as can be seen in the above code.

	Declaring types
	Compiling with SPEED=3, SAFETY=0
	Making sure that results of intermediate parts of the
	  calculation are going to be FLOATs -- i.e. ensuring that /
	  doesn't get a chance to produce RATIONALs.
	Using THE to allow the compiler to make assumptions about
	  types that it might not otherwise.

I assume it's crucially important that floats should be represented
directly rather than as pointers?  How do you drop unsubtle enough
hints to the compiler that it listens to this?  Am I missing something
else?  Is Allegro just no good at floating point?  If so are there
good floating-pouint Lisp systems for Sun4s?

Tim Bradshaw.  Internet: tim%ed.cstr@nsfnet-relay.ac.uk
UUCP: ...!uunet!mcvax!ukc!cstr!tim  JANET: tim@uk.ac.ed.cstr
"Quis custodiet ipsos custodes?"

jwz@lucid.com (Jamie Zawinski) (09/29/90)

In article <TIM.90Sep27140541@watt.cstr.ed.ac.uk> Tim Bradshaw writes:
>
> I wrote a trivial function (loop from 1 to n, incrementing a float);
> and sure enough, Allegro comes quite close to C.  Good.
> Alas, when I write more complex code, it becomes appallingly slow.
[...]
> (defmacro to-unity (var maxvar)
>   ;; Convert var to be from -1 to 1
>   ;; ?? Typing.  Is coerce the right thing?  Don't want a rational here.
>   `(- (/ (coerce ,var 'float)
> 	   (coerce ,maxvar 'float))
>       0.5))

COERCE is almost never what you want, certainly not in tight code; what you
should say is something like

(defmacro to-unity (var maxvar)
  "Convert var to be from -1 to 1"
  ;; ?? Typing.  Is coerce the right thing?  Don't want a rational here.
  `(- (/ (float ,var)
	 (float ,maxvar))
      0.5))

The compiler will probably be smart enough to realize that the result of FLOAT
is always a float, and will emit code for "/" that does a fp-divide, instead
of code that does a type-dispatch on the arguments to decide what kind of
division should take place.  (I suppose it's possible for the compiler to make
similar assumptions about the case where the second argument to COERCE is a
constant, but that would be effectively the same as the compiler rewriting 
(COERCE FOO 'FLOAT) to be (FLOAT FOO).)

> I assume it's crucially important that floats should be represented
> directly rather than as pointers?  How do you drop unsubtle enough
> hints to the compiler that it listens to this?  

Things of type short-float will be immediate objects; the other float types
will be represented as multi-word objects to which you have a pointer.  Math
on short-floats will probably be faster, but with a loss of precision.  

(defmacro to-unity (var maxvar)
  `(- (/ (float ,var 1.0s0)
	 (float ,maxvar 1.0s0))
      0.5s0))

The compiler should realize that the result of FLOAT will always be a
short-float, and will emit faster code for "/" and "-".  But you may need
to add calls to THE if your compiler isn't clever enough; your mileage may
vary.

> So the question: are there general guidelines on writing
> floating-point code which will perform well in Lisp, [...]
>	Declaring types
>	Compiling with SPEED=3, SAFETY=0
>	Making sure that results of intermediate parts of the
>	  calculation are going to be FLOATs -- i.e. ensuring that /
>	  doesn't get a chance to produce RATIONALs.
>	Using THE to allow the compiler to make assumptions about
>	  types that it might not otherwise.

These are the big ones.  You probably shouldn't have to use THE very much,
unless your compiler is failing to notice something.  Another thing to
consider is avoiding the overhead of a function call.  When writing code that
will turn into relatively few machine instructions (like numerical code often
does) the overhead of building a new stack frame, saving registers, etc can be
excessive.  You can do this with macros, but I think it's cleaner to use 
functions declared or proclaimed INLINE when possible.  This can make
debugging easier, too - just turn off the inline declarations and you've got
more info on the stack.

> (defun make-mandel-10 ()
[...]

Here is the inner loop of a Mandelbrot generator I implemented a while back 
for TI Explorer Lisp Machines.  On a Lispm, type-checking isn't as important,
but it does help.  This function takes a coordinate in mandelbrot space, and
some parameters, and returns the number of iterations before stability.
It does the fp math in the precision of its arguments - if you pass in short
floats, only short float arithmetic will be used; likewise for long-floats,
etc.  I structured it this way because the floating-point precision to be used
is a runtime option of the program.

But since this function it is proclaimed inline, if the compiler realizes that
the arguments that its caller is passing in are all short-floats, it will emit
short-float-only code for the divisions, subtractions, etc.

(proclaim '(inline mandelbrot-one-point))
(defun mandelbrot-one-point (x y w h range depth real-origin imaginary-origin
			     drop-out-distance &optional seed-x seed-y)
  "Computes the Mandelbrot mapping of a pixel XY.
  Returns a fixnum, the number of iterations to stability, or NIL if 
  DROP-OUT-DISTANCE is reached.
     X,Y:	The pixel we are computing (fixnums).
     W, H:	The pixel size of the target area (fixnums).
     RANGE:	The size of the source area in m-space (float).
     DEPTH:	Maximum number of iterations (fixnum).
     REAL, IMAG: The origin of the source area in m-space (floats).
     DROP-OUT:	float.
     SEED-X,Y:	The image seed; if these are NIL, then the seed is the initial
 		point (correct for Mandelbrot; for Julia, these are constant).

  All floating-point computations take place in the precision of the numbers
  passed in.  So, if you want to use only short-float math, then all floating
  point values supplied should be short-floats.

  X,Y are really the complex number (REAL+X) + (IMAG+Y)i."

;  (declare (values . iterations-or-nil))
  (declare (optimize (speed 3) (safety 0))
	   (fixnum x y w h depth drop-out-distance)
	   (float range real-origin imaginary-origin))
  ;;
  ;; z0 = x + yi
  ;; z1 = z0^2 + mu
  ;; For Julia, mu is a constant; for Mandelbrot, mu is z0.
  ;;
  (let* ((s (max w h))
	 ;; real-z:      real-origin + xr/s with same precision as r.
	 ;; imaginary-z: imag-origin + yr/s with same precision as r.
	 (real-z       (+ real-origin      (float (/ (* x range) s) range)))
	 (imaginary-z  (- imaginary-origin (float (/ (* y range) s) range)))
	 (real-mu      (or seed-x real-z))
	 (imaginary-mu (or seed-y imaginary-z))
	 (value nil))
    (declare (float real-mu imaginary-mu real-z imaginary-z)
	     (fixnum s)
	     (type (or null fixnum) value))

    (dotimes (iteration depth)
      (declare (fixnum iteration))

      (if (>= (+ (* real-z real-z) (* imaginary-z imaginary-z))
	      drop-out-distance)
	  (return)
	  (setq value iteration))

      (let ((new-real-z (- (- (* real-z real-z) (* imaginary-z imaginary-z))
			   real-mu)))     ; (real-z^2 - imag-z^2) - real-mu
	(declare (float new-real-z))
	;; imaginary-z = (2 * real-z * imag-z) - imag-mu
	(setq imaginary-z (- (* (* real-z 2) imaginary-z) imaginary-mu))
	(setq real-z new-real-z)
	))
    value))

If you want to look at the rest of the code, you can anonymous FTP it from
spice.cs.cmu.edu (128.32.150.27) in /usr/jwz/public/mandelbrot.lisp.  (There's
lots of other lisp code in that directory too - see _readme.text.)

> "Quis custodiet ipsos custodes?"

Hurm.
	-- Jamie <jwz@lucid.com>

ram@wb1.cs.cmu.edu (Rob MacLachlan) (09/30/90)

Well, one point is that Common Lisp FLOAT is not the same as C float.  If an 
implementation supports more than one floating point format (as I believe 
Allegro does), then a FLOAT declaration is almost useless.  You need to
use SINGLE-FLOAT or DOUBLE-FLOAT, depending on what you have in mind.
Also, Jamie's point about SHORT-FLOAT's being immediate is not true in
all (or even most) Lisp's on conventional hardware.

You should also make sure your non-float arithmetic is getting open-coded,
i.e. all your DOTIMES variables should be declared FIXNUM.  And any arrays
should be declared to be SIMPLE-ARRAY or whatever.

  Rob

jwz@lucid.com (Jamie Zawinski) (09/30/90)

Rob writes:
> You should also make sure your non-float arithmetic is getting open-coded,
> i.e. all your DOTIMES variables should be declared FIXNUM.  

Since the result of changing the value of a dotimes counter is undefined,
isn't it reasonable for the expansion of dotimes to declare it fixnum if
it can tell that it will never reach the bignum range?

		-- Jamie

tgd@tesla.CS.ORST.EDU (Tom Dietterich) (09/30/90)

To write good floating point code, I have found it essential to use
the facilities of the lucid and franz compilers to trace the
compilation process.  In lucid, you can do the following in your
source file:

(eval-when (compile) (compiler-options :show-optimizations t))

(defun ....)

(eval-when (compile) (compiler-options :show-optimizations nil))

When you invoke the production compiler, you will get a very nice
listing showing the type inference of the compiler and any expressions
that it could not fail to optimize.

In franz, the thing to do is include

(defun ....
  (declare (:explain :calls :types))
)

which gives a similar (but not quite as pretty) trace.

--Tom

P.S. In lucid, all floats should be declared single-float, and I find
it essential to call (the ..) quite often.  Short floats are handled
incorrectly (at least in lucid3.0 on sparc architectures).

P.P.S.  Many folks simply define their own versions of +, -, etc that
include calls to the at each point:

For example: (courtesy of Scott Fahlman)

(defmacro *sf (&rest args)
  `(the single-float
	(* ,@(mapcar #'(lambda (x) (list 'the 'single-float x)) args))))


-- 
Thomas G. Dietterich
Department of Computer Science
Dearborn Hall, 303
Oregon State University

ram@wb1.cs.cmu.edu (Rob MacLachlan) (10/01/90)

>From: jwz@lucid.com (Jamie Zawinski)
>Rob writes:
>> You should also make sure your non-float arithmetic is getting open-coded,
>> i.e. all your DOTIMES variables should be declared FIXNUM.  
>
>Since the result of changing the value of a dotimes counter is undefined,
>isn't it reasonable for the expansion of dotimes to declare it fixnum if
>it can tell that it will never reach the bignum range?
>

Yes, if it can tell.  It should really be adequate for the repeat count to
be known to be a fixnum, but in practice not all compilers can hack that.
And unless the repeat count is obviously a fixnum (such as a constant), it
may in any case be necessary to add some sort of declaration.

Although in practice I am sure it is rare (unheard of) for a dotimes count
to not be a fixnum, with today's machine speeds, it is not a totally
nonsensical idea.

  Rob (ram@cs.cmu.edu)

jim@Franz.COM (Jim Veitch) (10/02/90)

I thought I'd reply to your posting to comp.lang.lisp:

> For the first time in my life, I need to write some floating-point
> code.  Since I hate C, I want to do it in Lisp.  However my initial
> experiments indicate that our Lisp is *immeasurably* slower than C.

> CLtL states at several places that a good Common Lisp compiler
> can generate extremely good floating-point code.  Since we have at
> least a reasonable compiler (Franz Allegro, Sun4), I had some hope
> that it would generate at least reasonable floating-point code.

> I wrote a trivial function (loop from 1 to n, incrementing a float);
> and sure enough, Allegro comes quite close to C.  Good.

> Alas, when I write more complex code, it becomes appallingly slow.

1. Allegro CL supports 2 types of floats (like C) -- double-floats and
single-floats.  Unlike C, where 'float' has a precise meaning, Common
Lisp 'float' can be a composite type.  In your code, simply declared
'float', the compiler doesn't know which one you mean, and generates
generic arithmetic instead.  If you declare all one type and if the
compiler trusts the declarations (in Allegro this is only true if
speed > safety), everything is in-lined.

It is true that there is no standard for which float types are
supported and which compiler optimizations are done.  I guess each
compiler uses differing sets and the only real ways to see which are to
read the documentation, use the metering tools (in the add-on Composer
product with Franz), and ask the vendor directly (which I would
encourage any Franz user to do!).

2. Where did all the time go in your example?

(a) mostly in generic floating point arithmetic (+, -, *)

(b) much of the rest in calling 'abs' and 'coerce' (which I removed in the
    example below).  'abs' is an unnecessary function call which entails
    boxing up the float and  handing a pointer to 'abs'.  You might argue
    the compiler should be smart enough to recognize a special 'abs' for
    declared floating point arguments, but Allegro does not do that at
    the current time.

(c) call to 'coerce' (which is a an extremely complex Common Lisp function)
    is dangerous unless the compiler is very clever.  Ours isn't clever
    enough to do the "right" thing.  In general, it's wisest to use the
    simpler thing, which the compiler is more likely to know about.  So
    the call to 'coerce' is replaced by a call to 'float', which the
    Allegro CL compiler does know about.

3. I rewrote your example to look exactly like the C code and ran it with
the following time (it is one clock-tick or less on a Sun4/110): 16 msecs.

(defun make-mandel-10 ()
  ;; Brute-force Mandelbrot thing.  No complex numbers!
  ;; This version has hard-wired assumptions, and looks like the C.
  (let ((image (make-array '(10 10)
			   :element-type
			   'bit :initial-element 0))
	;; image components set to 0 if in, 1 if not: different
	;; than make-mandel!
	(x 0.0)
	(y 0.0))
    (declare (type single-float x y)  		; originally declared 'float'
	     (optimize (speed 3) (safety 1)))
    (dotimes (xindex 10)
      (dotimes (yindex 10)
	(setf x (- (/ (float xindex) 10.0) 0.5)  ; 'float' replaces 'coerce'
	      y (- (/ (float yindex) 10.0) 0.5))
	(dotimes (i 100)
	  ;; z = x + iy => z * z = x*x +2ixy - y*y =>
	  ;; Re(z^2) =X^2 -y^2, Im(z^2) = 2xy
	  (setq x (+ x (* x x) (- (* y y)))
		y (+ y (* 2.0 x y)))
	  (if (or (>= x 2.0)	; 'abs' test expanded into 4 cases
		  (<= x -2.0)   ; just like the C code
		  (>= y 2.0)
		  (<= y 2.0))
	      (progn
		(setf (aref image xindex yindex) 1)
		(return nil))))))
    t))