cbuckley@vax1.tcd.ie (Colm Buckley) (02/05/90)
Thanks to everybody who replied to me concerning my earlier query (in
comp.lang.c) about Mandelbrot sets - here is a summary of the suggestions I
received.

-------------------------------------------------------------------------------

The main thrust of the replies I received concerning the acceleration of
mand() was that, since the only interesting bits are contained in a disc of
radius 2 centred on the origin (0+0i), the use of full-blown floating-point
variables is very wasteful of processor time. Of course, on machines with an
FP coprocessor, the extra overhead involved in using "double" variables in the
function is very small, because floating-point arithmetic is performed by the
copro (eg : an 80387) almost as fast as integer arithmetic on the main
processor itself. However, many machines, particularly PCs, do not usually
have FP copros, so the best way to improve the speed of iterative functions
such as mand(), which involve many thousands of arithmetic operations, is to
switch to fixed-point arithmetic.

Acting on this suggestion, I implemented a fixed-point version of mand(),
using 32-bit "long"s to store the variables. The numbers are stored
multiplied by 2**28; so, for example, 1.0 becomes 268435456L, 0.5 becomes
134217728L, and so on. Using this convention, numbers from -8 to 8 can be
represented, to an accuracy of (1/(2**28)), or 3.725E-9. While this is
accurate enough for fairly low-resolution or low-magnification pictures, it
is not sufficient if you want to go really "deep" into the set.
Unfortunately, most versions of C have no integer type with greater precision
than "long", so any further advances require an assembler program (don't rush
me, I'm working on it!)

Addition and subtraction can be performed as usual, using ordinary integer +
and - ( (x * 2**28) + (y * 2**28) = (x+y) * 2**28 ). However, multiplication
poses a problem.
When two numbers stored in this fashion are multiplied together, the result
will be 2**28 times too big ( (x * 2**28) * (y * 2**28) = (x * y * 2**56),
not (x * y * 2**28) ). Therefore, when multiplication is necessary, the
multiplicand and multiplier must both be divided by (2**14). This is easily
and quickly accomplished by right-shifting x and y by 14 bits
( x >> 14 = x / (2**14) ).

The function fmand() given below is an implementation of mand() using this
convention. As mentioned before, it is not as accurate as the full version
using "double"s, but it is between 5 and 10 times faster, depending on the
details of the machine architecture.

#define fixp(x) (long)((x) * (1L << 28))
#define abs(x)  ((x) < 0 ? -(x) : (x))
#define FTWO    536870912L
#define FFOUR   1073741824L

int fmand(long x, long y)
{
    int i = 255;
    long a = 0L, b = 0L, m = 0L, n = 0L;

    if (abs(x) >= FTWO || abs(y) >= FTWO)
        return 0;
    do {
        /* divide a and b so multiplication is possible */
        m = (a >> 14) | ((a & -2147483648L) ? -262144L : 0L);
        n = (b >> 14) | ((b & -2147483648L) ? -262144L : 0L);
        /* b = 2 * a * b + y */
        b = (m * n << 1) + y;
        m *= m;
        n *= n;
        a = m - n + x;
        if (m >= FFOUR - n || m + n < 0L || abs(b) >= FTWO || abs(a) >= FTWO)
            break;
    } while (--i);
    return 255 - i;
}

Note the peculiar-looking construction after the "m = (a >> 14)". This is a
bit-mask which handles negative numbers. Dividing signed integers by a power
of two is not simply a matter of bit-shifting; the number must be
sign-extended as well (the leftmost bit must be copied into the bit-spaces
left empty after the shift). For example, -1.0 is stored as -268435456L,
which is

        11110000000000000000000000000000

in binary. When this is divided by 2**14 we should get -16384L, which is

        11111111111111111100000000000000

in binary. However, (-268435456L >> 14) is

        00000000000000111100000000000000

- the first 14 bits are 0 instead of 1.
Therefore, if the original number is negative (the leftmost bit is 1 - this
is the (a & -2147483648L) test), the number is logically ORed with -262144L,
which is

        11111111111111000000000000000000

- i.e. the first 14 bits are set to 1.

Note that this routine takes "long" parameters, not "double" - therefore the
values must be converted BEFORE the routine is called. The macro fixp(x)
converts the number (x) (of any type) into the fixed-point representation.
An example call to fmand() would be

        double x = 0.4, y = -1.02;
        int i;
        ...
        i = fmand(fixp(x), fixp(y));

This type of optimisation seems to be about as far as one can go as far as
the actual mand() function is concerned, but several people emailed me with
algorithms to reduce the number of times that the function has to be called.
One of the most interesting came from Alastair Sutherland in SCO in
California. He suggests looking for a rectangle with all edge pixels the
same colour, and filling it in, on the assumption that, if the edge is
solid, then the interior is too. See the "diagram" below :

        *********       *********
        *       *       *********
        *       *  ->   *********
        *       *       *********
        *********       *********

As yet, I don't have a source-level algorithm for this, as Alastair's
machine is temporarily down - perhaps I'll have it in a few weeks; if you
mail me then, I might be able to send it along.

Anyway, thanks for the interest in mand() - if you have any better ideas,
let me know!

                C o l m.

+-----------------------------------------------------------------+
| Colm Buckley.  (Computer Science, Trinity College, Dublin)      |
|                                                                 |
| EMAIL : cbuckley@vax1.tcd.ie                                    |
| Phone : Dublin 770101                                           |
+---+---------------------------------------------------------+---+
    | "I'm a lean, mean computing machine!"                   |
    +---------------------------------------------------------+
hascall@cs.iastate.edu (John Hascall) (02/07/90)
In article <5480@vax1.tcd.ie> cbuckley@vax1.tcd.ie (Colm Buckley) writes:

   [using 32-bit fixed point for Mandelbrot computations]

}The numbers are stored multiplied by 2**28, ...

   [addition and subtraction work ok, but multiply...]

}poses a problem. When two numbers stored in this fashion are multiplied
}together, the result will be 2**28 times too big ( (x * 2**28) * (y * 2**28) =
}(x * y * 2**56), not (x * y * 2**28). Therefore, when multiplication is
}necessary, the multiplicand and multiplier must both be divided by (2**14).
}This is easily and quickly accomplished by right-shifting x and y by 14 bits
}(x >> 14 = x / (2**14) ).

Umm, when you right shift aren't you losing a great deal of your precision?
Or am I missing something here?

John Hascall
doug@xdos.UUCP (Doug Merritt) (02/08/90)
In article <5480@vax1.tcd.ie> cbuckley@vax1.tcd.ie (Colm Buckley) writes:
>Unfortunately, there does not exist in most versions of C any integer type
>with greater precision than "long", so any further advances require an
>assembler program (don't rush me, I'm working on it!)

What? Not at all. You can do extended precision in C. You can't do a
32x32 -> 64 bit multiply in a single step, which *some* (but not all)
machines allow in assembler. So, granted, there may be good reason to use
assembler for speeding up such operations. But that's really no different
from any other algorithm. As usual, the smart approach is to get it working
in C first, and then (if desired) recode that inner loop in assembler to get
some extra speed. But don't go off thinking it's *necessary*.

>number of times that the function has to be called. One of the most
>interesting came from Alastair Sutherland in SCO in California. He suggests
>looking for a rectangle with all edge pixels the same colour, and filling it
>in, on the assumption that, if the edge is solid, then the interior is too.

Although this is a useful heuristic, note that it can fail. Some regions of
the set will have a "tendril" that sneaks between two pixels and then
expands to visible size inside the region you've filled in. You should be
able to find this effect by trying this algorithm in one of the areas that
has a (pseudo-)duplicate of the entire set. These little-M's are generally
connected by extremely thin tendrils, so it should be easy to get the
tendril to fall between two pixels.

For this reason, it's helpful to enhance the basic idea by adding a little
double checking. Always probe some number of random points inside the
rectangle; if one of them is a different color than you'd expect from the
heuristic, do the interior "the hard way".
        Doug
--
Doug Merritt                {pyramid,apple}!xdos!doug
Member, Crusaders for a Better Tomorrow
Professional Wildeyed Visionary
rspangle@jarthur.Claremont.EDU (Randy Spangler) (02/08/90)
>In article <5480@vax1.tcd.ie> cbuckley@vax1.tcd.ie (Colm Buckley) writes:
>
> [using 32-bit fixed point for Mandelbrot computations]
>
>}The numbers are stored multiplied by 2**28, ...
>
> [addition and subtraction work ok, but multiply...]
>
>}poses a problem. When two numbers stored in this fashion are multiplied
>}together, the result will be 2**28 times too big ( (x * 2**28) * (y * 2**28)
>}= (x * y * 2**56), not (x * y * 2**28). Therefore, when multiplication is
>}necessary, the multiplicand and multiplier must both be divided by (2**14).
>}This is easily and quickly accomplished by right-shifting x and y by 14 bits
>}(x >> 14 = x / (2**14) ).

Well, those of us with 80386's can multiply two 32-bit numbers and get a
64-bit number in EDX:EAX (first 32 bits in EDX, next 32 in EAX) with the
following code:

        MOV     eax,number1
        IMUL    number2

where number1 and number2 are 32-bit memory operands. I don't know if the
68000 chips have similar instructions, but I would assume so.

The next part is pretty system-specific (IBM 80x87 only). I have a program I
wrote in assembly to plot Mandelbrot sets using the floating point chip. I
optimized it so that all the calculations are done using the chip's
registers - no conversion to regular memory operands. It's slower than
32-bit integer math, but more accurate than any other program I've seen for
the IBM at high resolution, since the numbers never leave the 10-byte format
that the 80x87 stores them in (as opposed to the 8-byte long floating point
type for memory operands).

Since the code only applies to a small segment of the readers, I'll email
copies to interested people, unless I get so many responses that I think
it's worth posting.
--
--------------------------------------------------------------------------
| Randy Spangler                  | The less things change, the          |
| rspangle@jarthur.claremont.edu  | more they remain the same            |
--------------------------------------------------------------------------
mclek@dcatla.UUCP (Larry E. Kollar) (02/08/90)
In article <5480@vax1.tcd.ie> cbuckley@vax1.tcd.ie (Colm Buckley) writes:
> One of the most interesting [ optimizations ]
> suggests looking for a rectangle with all edge pixels the same colour, and
> filling it in, on the assumption that, if the edge is solid, then the
> interior is too.

One of the newest Amiga Mandelbrot generators goes beyond that. It traces
around any "black" region it finds, fills it in, and moves on. The program
doesn't bother tracing other colors, on the assumption that you wouldn't
save enough time doing other regions to make that worthwhile. The normal
unzoomed view takes only a few minutes to draw on a stock 68000-based Amiga
using a 16-color screen. Best of all, the source (JForth) is available.

Amigans can find this on Fish disk 2?? (the one with the JGoodies programs
-- #237 seems to come to mind, although I think that's too early), or the
JGoodies #1 disk from Delta Research. Send email if you want more info. I'd
rather not email 3000 copies of the program, but I'd consider snail-mail if
you send a disk and return postage.
--
Larry Kollar     UUCP ...!gatech!dcatla!mclek     VoiceMail: One 2670
Georgia 400: the Southeast's premier 48-mile 4-lane dragstrip.