[comp.sys.amiga.tech] Speeding up fractals,

jcs@crash.cts.com (John Schultz) (12/21/90)
In <86849@tut.cis.ohio-state.edu> berenfie@tortoise.cis.ohio-state.edu (gregory neil berenfield) writes:

[stuff deleted]
>plane and use z->z^2+c calculations accordingly. The mandelbrot is drawn
>fairly quickly but not nearly as fast as MandFXP and I want to squeeze 
>maximum speed out of this sucker. Does anyone know of a way of storing
>floating points as ints and doing some sort of bit shifting stuff to do 
>fp arithmatic operations? Any help in optimizing this thing would be
>of great help. Thanks.


  First, MandFXP may be doing other tricks other than using a different
numeric format. These tricks are probably _mathematical_ in nature,
from knowledge known about the mandelbrot set. I doubt they are computing
the thing using brute force: they are probably using clever heuristics...
  Anyway, it's easy to do fixed point arithmetic. Just pick a bit location
for your fixed point, say bit 14. Now after you multiply two 14 bit numbers,
you now have a 28 bit number. To get back to a 14 bit number, you shift
right by 14. Before you divide one 14 bit number by another, you must make
it a 28 bit number. Examples:

#define FBITS 14 /* Fixed point bits, 16384 == 1.0 */
long a,b,c; /* 14 bit numbers */

  c = (a*b) >> FBITS;
  c = (a << FBITS) / b;
  c = (a*b)/c;

See the pattern? The rules are: Multiplication of two fixed point numbers
yields a fixed point number equal to the sum of the two number's fixed points.
Division yields the difference of the fixed points. The fixed point location
is called the RADIX. Using this terminology:

  a*b yields a fixed point number with radix(a) + radix(b);
  a/b "                              " radix(a) - radix(b);

You can mix radix' too:

 radix(a) == 7, radix(b) = 14:

  radix(a*b) = 21, radix((a*b)>>12) = 9, etc. Left shifts increase the radix,
right shifts decrease the radix.

  IMPORTANT: You *must* make sure that the radix is the same for both numbers
when adding or subtracting. You _can_ do things like:

  x = (a*a + b*b + c*c) >> FBITS; /* a,b,c and x have the same radix */

  Also, sqrt *halves* the radix, just as squaring a number doubles the radix.
Now, if you use the 68030, you should use 64 bit numbers (at least for
intermediate values). Use the addx.l, muls.l, divs.l, bfextu, and bfins
to manipulate your numbers. Here's a short example:


; rotate64, created by John Schultz
; Rotates a point in 3D space.
; Created 8-Aug-90

	section math64,code

	OPT	p=68020/68881 ; for Adapt assembler

	xdef	_rotate64

FB	equ	14	; fixed point precision (bits 0-14)
RB	equ	(32-FB) ; remaining bits (32-14 = 18)

; 32 bit rotation matrix offsets

ma	equ	0
mb	equ	4
mc	equ	8
md	equ	12
me	equ	16
mf	equ	20
mg	equ	24
mh	equ	28
mi	equ	32

; 32 bit 3D point offsets

px	equ	0
py	equ	4
pz	equ	8

;void longrotate(matrix * m,longpoint3d * p,ushort fbits) {
;long tcx,tcy,tcz;
;  tcx = p->x;
;  tcy = p->y;
;  tcz = p->z;
;  p->x = (m->a1*tcx + m->d1*tcy + m->g1*tcz) >> fbits;
;  p->y = (m->b1*tcx + m->e1*tcy + m->h1*tcz) >> fbits;
;  p->z = (m->c1*tcx + m->f1*tcy + m->i1*tcz) >> fbits;
;} /* longrotate */

; extern __asm rotate64(register __a0 matrix * m,
;                       register __a1 longpoint3d * p);

; a0 = matrix *, a1 = longpoint3d *

_rotate64:

	movem.l	d2-d6,-(sp)

	movem.l	 (a1),d0-d2	; x,y,z

; Rotate X

	move.l	(a0),d4		; ma
	muls.l	d0,d3:d4	; x*a, 64 bit result in d3:d4

	move.l	md(a0),d6	; md
	muls.l	d1,d5:d6	; y*d, 64 bit result in d5:d6

	add.l	d4,d6		; low 32 bits,  x*a + y*d
	addx.l	d3,d5		; high 32 bits, x*a + y*d (d6:d5 = res)

	move.l	mg(a0),d4	; mg
	muls.l	d2,d3:d4	; z*g, 64 bit result in d3:d4
	
	add.l	d4,d6		; low 32 bits,  x*a + y*d + z*g
	addx.l	d3,d5		; high 32 bits, x*a + y*d + z*g (d5:d6 = res)

; Result is in (high:low) d5:d6

	bfextu	d6{0:RB},d6	; shift right by 32-RB
	bfins	d5,d6{0:FB}	; insert bits from high word

	move.l	d6,(a1)+	; save rotated x

; Rotate Y

	move.l	mb(a0),d4	; ma
	muls.l	d0,d3:d4	; x*b, 64 bit result in d3:d4

	move.l	me(a0),d6	; md
	muls.l	d1,d5:d6	; y*e, 64 bit result in d5:d6

	add.l	d4,d6		; low 32 bits,  x*b + y*e
	addx.l	d3,d5		; high 32 bits, x*b + y*e (d5:d6 = res)

	move.l	mh(a0),d4	; mg
	muls.l	d2,d3:d4	; z*g, 64 bit result in d3:d4

	add.l	d4,d6		; low 32 bits,  x*b + y*e + z*h
	addx.l	d3,d5		; high 32 bits, x*b + y*e + z*h (d5:d6 = res)

; Result is in (high:low) d5:d6

	bfextu	d6{0:RB},d6	; shift right by 32-RB
	bfins	d5,d6{0:FB}	; insert bits from high word

	move.l	d6,(a1)+	; save rotated y

; Rotate Z

	move.l	mc(a0),d4	; mc
	muls.l	d0,d3:d4	; x*c, 64 bit result in d3:d4

	move.l	mf(a0),d6	; md
	muls.l	d1,d5:d6	; y*f, 64 bit result in d5:d6

	add.l	d4,d6		; low 32 bits,  x*c + y*f
	addx.l	d3,d5		; high 32 bits, x*c + y*f (d6:d5 = res)

	move.l	mi(a0),d4	; mg
	muls.l	d2,d3:d4	; z*g, 64 bit result in d3:d4
	
	add.l	d4,d6		; low 32 bits,  x*c + y*f + z*i
	addx.l	d3,d5		; high 32 bits, x*c + y*f + z*i (d5:d6 = res)

; Result is in (high:low) d5:d6

	bfextu	d6{0:RB},d6	; shift right by 32-FB
	bfins	d5,d6{0:FB}	; insert bits from high word

	move.l	d6,(a1)		; save rotated z

	movem.l	(sp)+,d2-d6
	rts

        END

  One last thing, if you are constantly using the fsqrt of the 882,
the fmove.l instruction is 100 cycles, so you may not see a big win
using integer in some cases. Multiplies and divides aren't much slower
on the 882 (than muls.l/divs.l), especially if you use fsglmul/div, but
adds and subtracts are slower than multiplies and divides! This makes
sense as the processor must align the bits (floating) before performing
adds or subs. Hope this helps,


  John