jcs@crash.cts.com (John Schultz) (12/21/90)
In <86849@tut.cis.ohio-state.edu> berenfie@tortoise.cis.ohio-state.edu (gregory neil berenfield) writes: [stuff deleted] >plane and use z->z^2+c calculations accordingly. The mandelbrot is drawn >fairly quickly but not nearly as fast as MandFXP and I want to squeeze >maximum speed out of this sucker. Does anyone know of a way of storing >floating points as ints and doing some sort of bit shifting stuff to do >fp arithmatic operations? Any help in optimizing this thing would be >of great help. Thanks. First, MandFXP may be doing other tricks other than using a different numeric format. These tricks are probably _mathematical_ in nature, from knowledge known about the mandelbrot set. I doubt they are computing the thing using brute force: they are probably using clever heuristics... Anyway, it's easy to do fixed point arithmetic. Just pick a bit location for your fixed point, say bit 14. Now after you multiply two 14 bit numbers, you now have a 28 bit number. To get back to a 14 bit number, you shift right by 14. Before you divide one 14 bit number by another, you must make it a 28 bit number. Examples: #define FBITS 14 /* Fixed point bits, 16384 == 1.0 */ long a,b,c; /* 14 bit numbers */ c = (a*b) >> FBITS; c = (a << FBITS) / b; c = (a*b)/c; See the pattern? The rules are: Multiplication of two fixed point numbers yields a fixed point number equal to the sum of the two number's fixed points. Division yields the difference of the fixed points. The fixed point location is called the RADIX. Using this terminology: a*b yields a fixed point number with radix(a) + radix(b); a/b " " radix(a) - radix(b); You can mix radix' too: radix(a) == 7, radix(b) = 14: radix(a*b) = 21, radix((a*b)>>12) = 9, etc. Left shifts increase the radix, right shifts decrease the radix. IMPORTANT: You *must* make sure that the radix is the same for both numbers when adding or subtracting. You _can_ do things like: x = (a*a + b*b + c*c) >> FBITS; /* a,b,c and x have the same radix */ Also, sqrt *halves* the radix, just as squaring a number doubles the radix. Now, if you use the 68030, you should use 64 bit numbers (at least for intermediate values). Use the addx.l, muls.l, divs.l, bfextu, and bfins to manipulate your numbers. Here's a short example: ; rotate64, created by John Schultz ; Rotates a point in 3D space. ; Created 8-Aug-90 section math64,code OPT p=68020/68881 ; for Adapt assembler xdef _rotate64 FB equ 14 ; fixed point precision (bits 0-14) RB equ (32-FB) ; remaining bits (32-14 = 18) ; 32 bit rotation matrix offsets ma equ 0 mb equ 4 mc equ 8 md equ 12 me equ 16 mf equ 20 mg equ 24 mh equ 28 mi equ 32 ; 32 bit 3D point offsets px equ 0 py equ 4 pz equ 8 ;void longrotate(matrix * m,longpoint3d * p,ushort fbits) { ;long tcx,tcy,tcz; ; tcx = p->x; ; tcy = p->y; ; tcz = p->z; ; p->x = (m->a1*tcx + m->d1*tcy + m->g1*tcz) >> fbits; ; p->y = (m->b1*tcx + m->e1*tcy + m->h1*tcz) >> fbits; ; p->z = (m->c1*tcx + m->f1*tcy + m->i1*tcz) >> fbits; ;} /* longrotate */ ; extern __asm rotate64(register __a0 matrix * m, ; register __a1 longpoint3d * p); ; a0 = matrix *, a1 = longpoint3d * _rotate64: movem.l d2-d6,-(sp) movem.l (a1),d0-d2 ; x,y,z ; Rotate X move.l (a0),d4 ; ma muls.l d0,d3:d4 ; x*a, 64 bit result in d3:d4 move.l md(a0),d6 ; md muls.l d1,d5:d6 ; y*d, 64 bit result in d5:d6 add.l d4,d6 ; low 32 bits, x*a + y*d addx.l d3,d5 ; high 32 bits, x*a + y*d (d6:d5 = res) move.l mg(a0),d4 ; mg muls.l d2,d3:d4 ; z*g, 64 bit result in d3:d4 add.l d4,d6 ; low 32 bits, x*a + y*d + z*g addx.l d3,d5 ; high 32 bits, x*a + y*d + z*g (d5:d6 = res) ; Result is in (high:low) d5:d6 bfextu d6{0:RB},d6 ; shift right by 32-RB bfins d5,d6{0:FB} ; insert bits from high word move.l d6,(a1)+ ; save rotated x ; Rotate Y move.l mb(a0),d4 ; ma muls.l d0,d3:d4 ; x*b, 64 bit result in d3:d4 move.l me(a0),d6 ; md muls.l d1,d5:d6 ; y*e, 64 bit result in d5:d6 add.l d4,d6 ; low 32 bits, x*b + y*e addx.l d3,d5 ; high 32 bits, x*b + y*e (d5:d6 = res) move.l mh(a0),d4 ; mg muls.l d2,d3:d4 ; z*g, 64 bit result in d3:d4 add.l d4,d6 ; low 32 bits, x*b + y*e + z*h addx.l d3,d5 ; high 32 bits, x*b + y*e + z*h (d5:d6 = res) ; Result is in (high:low) d5:d6 bfextu d6{0:RB},d6 ; shift right by 32-RB bfins d5,d6{0:FB} ; insert bits from high word move.l d6,(a1)+ ; save rotated y ; Rotate Z move.l mc(a0),d4 ; mc muls.l d0,d3:d4 ; x*c, 64 bit result in d3:d4 move.l mf(a0),d6 ; md muls.l d1,d5:d6 ; y*f, 64 bit result in d5:d6 add.l d4,d6 ; low 32 bits, x*c + y*f addx.l d3,d5 ; high 32 bits, x*c + y*f (d6:d5 = res) move.l mi(a0),d4 ; mg muls.l d2,d3:d4 ; z*g, 64 bit result in d3:d4 add.l d4,d6 ; low 32 bits, x*c + y*f + z*i addx.l d3,d5 ; high 32 bits, x*c + y*f + z*i (d5:d6 = res) ; Result is in (high:low) d5:d6 bfextu d6{0:RB},d6 ; shift right by 32-FB bfins d5,d6{0:FB} ; insert bits from high word move.l d6,(a1) ; save rotated z movem.l (sp)+,d2-d6 rts END One last thing, if you are constantly using the fsqrt of the 882, the fmove.l instruction is 100 cycles, so you may not see a big win using integer in some cases. Multiplies and divides aren't much slower on the 882 (than muls.l/divs.l), especially if you use fsglmul/div, but adds and subtracts are slower than multiplies and divides! This makes sense as the processor must align the bits (floating) before performing adds or subs. Hope this helps, John