tim@cstr.ed.ac.uk (Tim Bradshaw) (09/27/90)
For the first time in my life, I need to write some floating-point
code.  Since I hate C, I want to do it in Lisp.  However my initial
experiments indicate that our Lisp is *immeasurably* slower than C.
CLtL states at several places that a good Common Lisp compiler
can generate extremely good floating-point code.  Since we have at
least a reasonable compiler (Franz Allegro, Sun4), I had some hope
that it would generate at least reasonable floating-point code.
I wrote a trivial function (loop from 1 to n, incrementing a float);
and sure enough, Allegro comes quite close to C.  Good.
Alas, when I write more complex code, it becomes appallingly slow.
Here is what I used to test this.  The Lisp is written deliberately to
look as close as I could make it like the C that follows, so it's a
bit nasty:
(defmacro to-unity (var maxvar)
  ;; Convert var to be from -1 to 1
  ;; ?? Typing.  Is coerce the right thing?  Don't want a rational here.
  `(- (/ (coerce ,var 'float)
	 (coerce ,maxvar 'float))
      0.5))
(defun make-mandel-10 ()
  ;; Brute-force Mandelbrot thing.  No complex numbers!
  ;; This version has hard-wired assumptions, and looks like the C.
  (declare (optimize (speed 3)
		     (safety 0)))
 (let ((image (make-array '(10 10)
                     :element-type
                     'bit :initial-element 0))
	      ;; image components set to 0 if in, 1 if not: different
	      ;; than make-mandel!
       (x 0.0)
       (y 0.0))
      (declare (type float x y))
      (dotimes (xindex 10)
	(dotimes (yindex 10)
	  (setf x (the float (to-unity xindex 10))
		y (the float (to-unity yindex 10)))
	  (dotimes (i 100)
	    ;; z = x + iy => z * z = x*x +2ixy - y*y =>
	    ;; Re(z^2) =X^2 -y^2, Im(z^2) = 2xy
	    (setq x (the float (+ x (* x x) (- (* y y))))
		  y (the float (+ y (* 2.0 x y))))
	    (if (or (>= (abs x) 2.0)
		    (>= (abs y) 2.0))
		(progn 
		  (setf (aref image xindex yindex) 1)
		  (return nil))))))
      t))
And the C:
main ()
{
    int image[10][10];
    float x, y;
    int xindex, yindex, i;
    for (xindex = 0; xindex < 10; xindex++)
	for (yindex = 0; yindex < 10; yindex++)
	    image[xindex][yindex] = 0;
    for (xindex = 0; xindex < 10; xindex++) {
	for (yindex = 0; yindex < 10; yindex++) {
	    x = ((float) xindex)/10.0 - 0.5;
	    y = ((float) yindex)/10.0 - 0.5;
	    
	    for (i = 0; i < 100; i++) {
		x = x + x * x - y * y;
		y = y + 2.0 * x * y;
		if (x >= 2.0 || x <= -2.0 || y >= 2.0 || y <= -2.0) {
		    image[xindex][yindex] = 1;
		    break;
		}
	    }
	}
    }
}
		    
On A Sun4/65, with Sun cc, no optimization, and Allegro CL "3.1.13.1
[Sun4] (0/0/0)"  I get:
    kahlo* time ./mandel 
            0.1 real	0.0 user	0.0 sys
and
    <cl> (time (make-mandel-10))
    cpu time (non-gc) 1683 msec user, 0 msec system
    cpu time (gc)     250 msec user, 0 msec system
    cpu time (total)  1933 msec user, 0 msec system
    real time  2610 msec
Well I should use a bigger test so the C timings show up, but this
doesn't look good.
So the question: are there general guidelines on writing
floating-point code which will perform well in Lisp, given a good
compiler?  I'm pretty naive about numerical programming, but I tried
the following, as can be seen in the above code.
	Declaring types
	Compiling with SPEED=3, SAFETY=0
	Making sure that results of intermediate parts of the
	  calculation are going to be FLOATs -- i.e. ensuring that /
	  doesn't get a chance to produce RATIONALs.
	Using THE to allow the compiler to make assumptions about
	  types that it might not otherwise.
I assume it's crucially important that floats should be represented
directly rather than as pointers?  How do you drop unsubtle enough
hints to the compiler that it listens to this?  Am I missing something
else?  Is Allegro just no good at floating point?  If so are there
good floating-pouint Lisp systems for Sun4s?
Tim Bradshaw.  Internet: tim%ed.cstr@nsfnet-relay.ac.uk
UUCP: ...!uunet!mcvax!ukc!cstr!tim  JANET: tim@uk.ac.ed.cstr
"Quis custodiet ipsos custodes?"jwz@lucid.com (Jamie Zawinski) (09/29/90)
In article <TIM.90Sep27140541@watt.cstr.ed.ac.uk> Tim Bradshaw writes: > > I wrote a trivial function (loop from 1 to n, incrementing a float); > and sure enough, Allegro comes quite close to C. Good. > Alas, when I write more complex code, it becomes appallingly slow. [...] > (defmacro to-unity (var maxvar) > ;; Convert var to be from -1 to 1 > ;; ?? Typing. Is coerce the right thing? Don't want a rational here. > `(- (/ (coerce ,var 'float) > (coerce ,maxvar 'float)) > 0.5)) COERCE is almost never what you want, certainly not in tight code; what you should say is something like (defmacro to-unity (var maxvar) "Convert var to be from -1 to 1" ;; ?? Typing. Is coerce the right thing? Don't want a rational here. `(- (/ (float ,var) (float ,maxvar)) 0.5)) The compiler will probably be smart enough to realize that the result of FLOAT is always a float, and will emit code for "/" that does a fp-divide, instead of code that does a type-dispatch on the arguments to decide what kind of division should take place. (I suppose it's possible for the compiler to make similar assumptions about the case where the second argument to COERCE is a constant, but that would be effectively the same as the compiler rewriting (COERCE FOO 'FLOAT) to be (FLOAT FOO).) > I assume it's crucially important that floats should be represented > directly rather than as pointers? How do you drop unsubtle enough > hints to the compiler that it listens to this? Things of type short-float will be immediate objects; the other float types will be represented as multi-word objects to which you have a pointer. Math on short-floats will probably be faster, but with a loss of precision. (defmacro to-unity (var maxvar) `(- (/ (float ,var 1.0s0) (float ,maxvar 1.0s0)) 0.5s0)) The compiler should realize that the result of FLOAT will always be a short-float, and will emit faster code for "/" and "-". But you may need to add calls to THE if your compiler isn't clever enough; your mileage may vary. > So the question: are there general guidelines on writing > floating-point code which will perform well in Lisp, [...] > Declaring types > Compiling with SPEED=3, SAFETY=0 > Making sure that results of intermediate parts of the > calculation are going to be FLOATs -- i.e. ensuring that / > doesn't get a chance to produce RATIONALs. > Using THE to allow the compiler to make assumptions about > types that it might not otherwise. These are the big ones. You probably shouldn't have to use THE very much, unless your compiler is failing to notice something. Another thing to consider is avoiding the overhead of a function call. When writing code that will turn into relatively few machine instructions (like numerical code often does) the overhead of building a new stack frame, saving registers, etc can be excessive. You can do this with macros, but I think it's cleaner to use functions declared or proclaimed INLINE when possible. This can make debugging easier, too - just turn off the inline declarations and you've got more info on the stack. > (defun make-mandel-10 () [...] Here is the inner loop of a Mandelbrot generator I implemented a while back for TI Explorer Lisp Machines. On a Lispm, type-checking isn't as important, but it does help. This function takes a coordinate in mandelbrot space, and some parameters, and returns the number of iterations before stability. It does the fp math in the precision of its arguments - if you pass in short floats, only short float arithmetic will be used; likewise for long-floats, etc. I structured it this way because the floating-point precision to be used is a runtime option of the program. But since this function it is proclaimed inline, if the compiler realizes that the arguments that its caller is passing in are all short-floats, it will emit short-float-only code for the divisions, subtractions, etc. (proclaim '(inline mandelbrot-one-point)) (defun mandelbrot-one-point (x y w h range depth real-origin imaginary-origin drop-out-distance &optional seed-x seed-y) "Computes the Mandelbrot mapping of a pixel XY. Returns a fixnum, the number of iterations to stability, or NIL if DROP-OUT-DISTANCE is reached. X,Y: The pixel we are computing (fixnums). W, H: The pixel size of the target area (fixnums). RANGE: The size of the source area in m-space (float). DEPTH: Maximum number of iterations (fixnum). REAL, IMAG: The origin of the source area in m-space (floats). DROP-OUT: float. SEED-X,Y: The image seed; if these are NIL, then the seed is the initial point (correct for Mandelbrot; for Julia, these are constant). All floating-point computations take place in the precision of the numbers passed in. So, if you want to use only short-float math, then all floating point values supplied should be short-floats. X,Y are really the complex number (REAL+X) + (IMAG+Y)i." ; (declare (values . iterations-or-nil)) (declare (optimize (speed 3) (safety 0)) (fixnum x y w h depth drop-out-distance) (float range real-origin imaginary-origin)) ;; ;; z0 = x + yi ;; z1 = z0^2 + mu ;; For Julia, mu is a constant; for Mandelbrot, mu is z0. ;; (let* ((s (max w h)) ;; real-z: real-origin + xr/s with same precision as r. ;; imaginary-z: imag-origin + yr/s with same precision as r. (real-z (+ real-origin (float (/ (* x range) s) range))) (imaginary-z (- imaginary-origin (float (/ (* y range) s) range))) (real-mu (or seed-x real-z)) (imaginary-mu (or seed-y imaginary-z)) (value nil)) (declare (float real-mu imaginary-mu real-z imaginary-z) (fixnum s) (type (or null fixnum) value)) (dotimes (iteration depth) (declare (fixnum iteration)) (if (>= (+ (* real-z real-z) (* imaginary-z imaginary-z)) drop-out-distance) (return) (setq value iteration)) (let ((new-real-z (- (- (* real-z real-z) (* imaginary-z imaginary-z)) real-mu))) ; (real-z^2 - imag-z^2) - real-mu (declare (float new-real-z)) ;; imaginary-z = (2 * real-z * imag-z) - imag-mu (setq imaginary-z (- (* (* real-z 2) imaginary-z) imaginary-mu)) (setq real-z new-real-z) )) value)) If you want to look at the rest of the code, you can anonymous FTP it from spice.cs.cmu.edu (128.32.150.27) in /usr/jwz/public/mandelbrot.lisp. (There's lots of other lisp code in that directory too - see _readme.text.) > "Quis custodiet ipsos custodes?" Hurm. -- Jamie <jwz@lucid.com>
ram@wb1.cs.cmu.edu (Rob MacLachlan) (09/30/90)
Well, one point is that Common Lisp FLOAT is not the same as C float. If an implementation supports more than one floating point format (as I believe Allegro does), then a FLOAT declaration is almost useless. You need to use SINGLE-FLOAT or DOUBLE-FLOAT, depending on what you have in mind. Also, Jamie's point about SHORT-FLOAT's being immediate is not true in all (or even most) Lisp's on conventional hardware. You should also make sure your non-float arithmetic is getting open-coded, i.e. all your DOTIMES variables should be declared FIXNUM. And any arrays should be declared to be SIMPLE-ARRAY or whatever. Rob
jwz@lucid.com (Jamie Zawinski) (09/30/90)
Rob writes: > You should also make sure your non-float arithmetic is getting open-coded, > i.e. all your DOTIMES variables should be declared FIXNUM. Since the result of changing the value of a dotimes counter is undefined, isn't it reasonable for the expansion of dotimes to declare it fixnum if it can tell that it will never reach the bignum range? -- Jamie
tgd@tesla.CS.ORST.EDU (Tom Dietterich) (09/30/90)
To write good floating point code, I have found it essential to use the facilities of the lucid and franz compilers to trace the compilation process. In lucid, you can do the following in your source file: (eval-when (compile) (compiler-options :show-optimizations t)) (defun ....) (eval-when (compile) (compiler-options :show-optimizations nil)) When you invoke the production compiler, you will get a very nice listing showing the type inference of the compiler and any expressions that it could not fail to optimize. In franz, the thing to do is include (defun .... (declare (:explain :calls :types)) ) which gives a similar (but not quite as pretty) trace. --Tom P.S. In lucid, all floats should be declared single-float, and I find it essential to call (the ..) quite often. Short floats are handled incorrectly (at least in lucid3.0 on sparc architectures). P.P.S. Many folks simply define their own versions of +, -, etc that include calls to the at each point: For example: (courtesy of Scott Fahlman) (defmacro *sf (&rest args) `(the single-float (* ,@(mapcar #'(lambda (x) (list 'the 'single-float x)) args)))) -- Thomas G. Dietterich Department of Computer Science Dearborn Hall, 303 Oregon State University
ram@wb1.cs.cmu.edu (Rob MacLachlan) (10/01/90)
>From: jwz@lucid.com (Jamie Zawinski) >Rob writes: >> You should also make sure your non-float arithmetic is getting open-coded, >> i.e. all your DOTIMES variables should be declared FIXNUM. > >Since the result of changing the value of a dotimes counter is undefined, >isn't it reasonable for the expansion of dotimes to declare it fixnum if >it can tell that it will never reach the bignum range? > Yes, if it can tell. It should really be adequate for the repeat count to be known to be a fixnum, but in practice not all compilers can hack that. And unless the repeat count is obviously a fixnum (such as a constant), it may in any case be necessary to add some sort of declaration. Although in practice I am sure it is rare (unheard of) for a dotimes count to not be a fixnum, with today's machine speeds, it is not a totally nonsensical idea. Rob (ram@cs.cmu.edu)
jim@Franz.COM (Jim Veitch) (10/02/90)
I thought I'd reply to your posting to comp.lang.lisp: > For the first time in my life, I need to write some floating-point > code. Since I hate C, I want to do it in Lisp. However my initial > experiments indicate that our Lisp is *immeasurably* slower than C. > CLtL states at several places that a good Common Lisp compiler > can generate extremely good floating-point code. Since we have at > least a reasonable compiler (Franz Allegro, Sun4), I had some hope > that it would generate at least reasonable floating-point code. > I wrote a trivial function (loop from 1 to n, incrementing a float); > and sure enough, Allegro comes quite close to C. Good. > Alas, when I write more complex code, it becomes appallingly slow. 1. Allegro CL supports 2 types of floats (like C) -- double-floats and single-floats. Unlike C, where 'float' has a precise meaning, Common Lisp 'float' can be a composite type. In your code, simply declared 'float', the compiler doesn't know which one you mean, and generates generic arithmetic instead. If you declare all one type and if the compiler trusts the declarations (in Allegro this is only true if speed > safety), everything is in-lined. It is true that there is no standard for which float types are supported and which compiler optimizations are done. I guess each compiler uses differing sets and the only real ways to see which are to read the documentation, use the metering tools (in the add-on Composer product with Franz), and ask the vendor directly (which I would encourage any Franz user to do!). 2. Where did all the time go in your example? (a) mostly in generic floating point arithmetic (+, -, *) (b) much of the rest in calling 'abs' and 'coerce' (which I removed in the example below). 'abs' is an unnecessary function call which entails boxing up the float and handing a pointer to 'abs'. You might argue the compiler should be smart enough to recognize a special 'abs' for declared floating point arguments, but Allegro does not do that at the current time. (c) call to 'coerce' (which is a an extremely complex Common Lisp function) is dangerous unless the compiler is very clever. Ours isn't clever enough to do the "right" thing. In general, it's wisest to use the simpler thing, which the compiler is more likely to know about. So the call to 'coerce' is replaced by a call to 'float', which the Allegro CL compiler does know about. 3. I rewrote your example to look exactly like the C code and ran it with the following time (it is one clock-tick or less on a Sun4/110): 16 msecs. (defun make-mandel-10 () ;; Brute-force Mandelbrot thing. No complex numbers! ;; This version has hard-wired assumptions, and looks like the C. (let ((image (make-array '(10 10) :element-type 'bit :initial-element 0)) ;; image components set to 0 if in, 1 if not: different ;; than make-mandel! (x 0.0) (y 0.0)) (declare (type single-float x y) ; originally declared 'float' (optimize (speed 3) (safety 1))) (dotimes (xindex 10) (dotimes (yindex 10) (setf x (- (/ (float xindex) 10.0) 0.5) ; 'float' replaces 'coerce' y (- (/ (float yindex) 10.0) 0.5)) (dotimes (i 100) ;; z = x + iy => z * z = x*x +2ixy - y*y => ;; Re(z^2) =X^2 -y^2, Im(z^2) = 2xy (setq x (+ x (* x x) (- (* y y))) y (+ y (* 2.0 x y))) (if (or (>= x 2.0) ; 'abs' test expanded into 4 cases (<= x -2.0) ; just like the C code (>= y 2.0) (<= y 2.0)) (progn (setf (aref image xindex yindex) 1) (return nil)))))) t))