cak3g@astsun8.astro.Virginia.EDU (Colin Klipsch) (10/23/89)
It is difficult to produce a television documentary that is both incisive and probing when every twelve minutes one is interrupted by twelve dancing rabbits singing about toilet paper.
                                                  -- Rod Serling
__________________________________________________________________

WARNING: assembly language discussion ahead. . .
(Ah, I can hear even now the sound of a million fingers pressing the "N" key. . .)

I am writing a library of three-dimensional graphics routines similar to the Graf3D package supplied by Apple.  However, I desire more precision than these routines use.  I have chosen to represent my angles as long integers, with the angle expressed in milli-arcseconds (360*3600*1000 milli-arcseconds to a circle).  I am also using 64-bit integers (like the SANE comp format) to represent my coordinates.

In the interest of speed, I desire to write a subroutine with the following properties:

  1) on entry, D0 is an angle in milli-arcseconds (signed long integer)
  2) on exit, D0 is the cosine of the given angle (fract format,
     accurate to the last bit)
  3) a lookup table is allowable, but it should be no larger than 10KB,
     and preferably only a few KB

The past few months of research have yielded these thoughts:

Pretty obviously we can use the fact that the cosine function is self-similar.  Thus the problem reduces to finding the cosine of an angle between 0 and 90 degrees (or between 0 and 324,000,000 marcsec).  The cosines of all other angles are easily derived once this sub-problem has been solved.

Although I expect to use a lookup table, linear interpolation seems unfruitful.  To get thirty bits of accuracy (as stipulated in Property 2) from linear interpolation would require many thousands of table entries, even if the table density varied over the range.  These table entries will likely be six bytes each, in the form:

   b47 b46 . b45 b44 b43 b42 b41 . . . b3 b2 b1 b0
    |   |     |   |
    |   |     |   +---- weight 1/4
    |   |     +-------- weight 1/2
    |   +-------------- weight 1
    +------------------ weight -2 (sign bit)

Why this format?  If I want an accurate result in fract format, I must do my scratch calculations in a higher precision and truncate the final result.  The above format has the advantages that:

  a) it's an even number of bytes, which fits nicely into memory,
  b) to reduce the final result to fract format, all I have to do is
     lop off the lower 16 bits.

Anyway, at six bytes per entry, this limits our table to about two thousand entries maximum.

Extending linear interpolation to the general case, we can use the Gregory-Newton interpolation method, which goes: given a table of values for a function f, with each table entry separated by a constant distance h, approximate f(x) by:

   f(x) = D0 + D1*t + D2*t*(t-1)/2! + D3*t*(t-1)*(t-2)/3! + ...

where x0 is the nearest multiple of h below x, t = (x-x0)/h, and:

   D0 = f(x0)
   D1 = f(x0+h) - f(x0)
   D2 = f(x0+2h) - 2*f(x0+h) + f(x0)
   D3 = f(x0+3h) - 3*f(x0+2h) + 3*f(x0+h) - f(x0)
   (and so on, using Pascal's triangle)

Using any terms higher than the third, however, requires dividing by three, which (you'll note) is not a power of 2.  Thus this series should probably be limited to the first three terms for fast use in machine language.  This method so far seems to be my most hopeful route.

But why use a table at all?  For the sake of speed, remember.  I could probably plug the given angle into some kind of polynomial, but if I'm doing this on my own, then all the multiplications will be multiple precision (because mulu.l returns a 64-bit result), and I'll have to worry about multiplying a signed 64-bit number by a 32-bit number, keeping track of the decimal point, accuracy of the coefficients, etc.

I have already tried doing this, but my routine (even though it wasn't quite finished) was only 2.3 times faster than converting the angle to SANE extended (in radians) and calling FCOSX.
This either says that my routine is surprisingly slow or that SANE is surprisingly fast.  Of course, this was on a Mac II, but still...

So how do the professionals do it?  Good question.  FracCos and FracSin are two routines which quickly calculate those two trig functions, although they take their angles in radians (Fixed format) and their results are accurate to about 24 fraction bits, not 30.  I've tried disassembling these routines, but their method eludes me.  (Trying to understand raw machine code is not a pleasant experience...)  And again, I'm avoiding SANE and floating-point numbers in general, so there's no point in investigating how FCOSX works, although I would be curious anyway.  SANE undoubtedly uses a polynomial of some kind (Chebyshev, perhaps?) and does all of its calculations in floating point.

By the way, FracCos runs about 9 times faster than FCOSX (assuming equivalent angles) on a Mac II, and 20 times faster on a Mac SE.  Just thought you might like to know.  But is it even possible for me to beat FracCos?  I'm hoping so, because it doesn't seem to use any lookup tables, and a table (generally) is a fast way of getting the values of functions.

Sources I have read:

   _The Art of Computer Programming_ by Donald Knuth
   _Microcomputer Animation_ (?) by Bruce Artwick
   _Apple Numerics Manual_ by Apple themselves
   _Numerical Recipes_ by Press et al.
   various obtuse textbooks on numerical analysis

And then of course, there's the rest of this damn project, but God knows when I'll ever have the time. . .
______________________________________________________________________
"May the forces of evil become confused on the way to your house."
                                                  -- George Carlin

Sincerely,                          | DISCLAIMER:
Colin Klipsch                       | Every word in this text is
University of Virginia              | actually a horrendous misspelling
(cak3g@astsun.astro.virginia.edu)   | of the word "fettuccini".
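[The table scheme Colin describes above -- quadrant reduction plus the first three Gregory-Newton terms -- can be sketched in Python.  The table size, the guard entries, and the use of float arithmetic here are illustrative only; the real routine would carry 48-bit fixed-point entries and truncate to fract at the end.

```python
import math

CIRCLE = 360 * 3600 * 1000        # milli-arcseconds in a full circle
QUAD = CIRCLE // 4                # one quadrant: 324,000,000 marcsec
N = 256                           # table entries per quadrant (illustrative)
H = QUAD // N                     # table spacing in milli-arcseconds

# cos over [0, 90 degrees], plus two guard entries so the forward
# differences D1 and D2 can still be formed at the top of the range.
TABLE = [math.cos(i * H * 2 * math.pi / CIRCLE) for i in range(N + 3)]

def cos_marcsec(angle):
    """Cosine of an angle in milli-arcseconds, via quadrant reduction
    plus three terms of Gregory-Newton forward interpolation."""
    quad, a = divmod(angle % CIRCLE, QUAD)
    if quad == 0:                 # cos(x) = cos(x)
        sign = 1
    elif quad == 1:               # cos(90 + x) = -cos(90 - x)
        sign, a = -1, QUAD - a
    elif quad == 2:               # cos(180 + x) = -cos(x)
        sign = -1
    else:                         # cos(270 + x) = cos(90 - x)
        sign, a = 1, QUAD - a
    i, r = divmod(a, H)
    t = r / H                     # fractional position between entries
    d0 = TABLE[i]
    d1 = TABLE[i + 1] - TABLE[i]
    d2 = TABLE[i + 2] - 2 * TABLE[i + 1] + TABLE[i]
    return sign * (d0 + d1 * t + d2 * t * (t - 1) / 2)
```

With 256 entries the quadratic term gets roughly 25 good bits; pushing toward the 2000-entry budget mentioned in the post brings the interpolation error below the 2^-30 fract quantum.]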
earleh@eleazar.dartmouth.edu (Earle R. Horton) (10/23/89)
In article <2157@hudson.acc.virginia.edu> cak3g@astsun8.astro.Virginia.EDU (Colin Klipsch) writes:
...
>I am writing a library of three-dimensional graphics routines
>similar to the Graf3D package supplied by Apple.  However, I
>desire more precision than these routines use.  I have chosen
>to represent my angles as long integers, with the angle being
>expressed in milli-arcseconds (with 360*3600*1000 milli-arcseconds
>to a circle).  I am also using 64 bit integers (like the SANE
>comp format) to represent my coordinates.
>
>In the interest of speed, I desire to write a subroutine with
>the following properties:
...

This may not be what Colin wants, but in the interest of speed, I would think the best thing to do would be to use floating-point coordinates exclusively, and to compile the program to use the floating-point coprocessor directly.  If you can convince your end users that they require something with a 68020/68881 combo or better, then you have your performance problems licked.  I seriously doubt that you can come close to the performance of the hardware math unit using any kind of integer-based package, particularly if you have to do trigonometry.

>So how do the professionals do it?  Good question.

This depends on what kind of professionals we are talking about.  Professional programmers might try to approach the performance level of a hardware math unit using highly optimized integer routines, and they might get within the same order of magnitude at the expense of huge outlays of intellectual effort.  Professional engineers, physicists, chemists, accountants, etc. will say "Gimme that floating point chip!" and spend their intellectual efforts in solving the problem that they are actually interested in.

That is why I wrote my 3D graphics library to use the 68881 directly.  If I ever have to run it on a Plus or SE, well then I plan to byte the bullet and get a Radius or Daystar board with hardware floating point.

Earle R. Horton
chewy@apple.com (Paul Snively) (10/23/89)
In article <2157@hudson.acc.virginia.edu> cak3g@astsun8.astro.Virginia.EDU (Colin Klipsch) writes:
> I have already tried doing this, but my routine (even though it
> wasn't quite finished) was only 2.3 times faster than converting
> the angle to SANE extended (in radians) and calling FCOSX.  This
> either says that my routine is surprisingly slow or that SANE
> is surprisingly fast.  Of course, this was on a Mac II, but still...

The reason for this is that SANE isn't as stupid as people think; since ALL Mac IIs come with an FPU, SANE uses it, so you're pretty close to the hardware when you use SANE on a Mac II.
__________________________________________________________________________
Just because I work for Apple Computer, Inc. doesn't mean that they believe what I believe or vice-versa.
__________________________________________________________________________
C++ -- The language in which only friends can access your private members.
__________________________________________________________________________
earleh@eleazar.dartmouth.edu (Earle R. Horton) (10/24/89)
In article <4831@internal.Apple.COM> chewy@apple.com (Paul Snively) writes:
...
>The reason for this is that SANE isn't as stupid as people think; since
>ALL Mac IIs come with an FPU, SANE uses it, so you're pretty close to the
>hardware when you use SANE on a Mac II.

I measure a 100-times speed increase on a Mac II using the MC68881 over SANE, with a trig-intensive application that uses doubles (64 bits) for the primary data storage type.  SANE does not use the MC6888[12] for transcendental functions such as sine and cosine, but it does offer 1 bit more accuracy than the FPU.  You take your pick: 1 bit more accuracy out of 80, or a 100-fold speed increase.

The switches to turn on the FPU from MPW C are -mc68881, -elems881, and -x149.  Use the whole set.

The Aztec C floating-point library (software) is less accurate than SANE, but 2-3 times faster in my experience.  This is a software solution that does not use an FPU at all.

No one is saying SANE is stupid, but it is slow.

Earle R. Horton
h+@nada.kth.se (10/24/89)
If you want a cosine, try the Maclaurin polynomial.  (This really is a special case of the Taylor polynomial.)  I'm sure you're familiar with it, and cleverly coded (i.e. begin at the term with the highest power and work towards "... + 1", all of it done inline, i.e. no loop) it should be quite fast.

Happy Hacking!

h+@nada.kth.se
--
When you don't know what you're doing, do it gently
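[A minimal Python sketch of the suggestion: the Maclaurin series for cosine, 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!, evaluated highest-power-first with no loop, which is just Horner's rule in x^2.  The five-term cutoff and the range assumption are illustrative.

```python
import math

def cos_maclaurin(x):
    """cos(x) for x reduced to roughly [-pi/4, pi/4], via five terms of
    the Maclaurin series, evaluated inline from the highest power down
    toward "... + 1" (Horner's rule in z = x*x)."""
    z = x * x
    s = 1.0 / 40320.0            # 1/8!
    s = s * z - 1.0 / 720.0      # ... - 1/6!
    s = s * z + 1.0 / 24.0       # ... + 1/4!
    s = s * z - 0.5              # ... - 1/2!
    return s * z + 1.0           # ... + 1

```

Over |x| <= pi/4 the truncation error is about 2.5e-8 (the dropped x^10/10! term), roughly 25 bits; one more term clears the 30-bit target in the original post.]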
wetter@cup.portal.com (Pierce T Wetter) (10/27/89)
You want to use something called a CORDIC algorithm.  If I remember correctly, it replaces general multiplication and division with shifts: the algorithm only ever needs to multiply and divide by powers of two.  CORDIC is the name of the machine the algorithm was originally implemented for.

Pierce
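[For the curious, here is a minimal Python sketch of textbook CORDIC in rotation mode, computing cos and sin with nothing but shifts, adds, and a small table of atan(2^-k) constants.  The 2.30 "fract"-style scaling and the iteration count are illustrative choices, not from the thread.

```python
import math

FRAC = 30                      # simulate 2.30 "fract"-style fixed point
ONE = 1 << FRAC
N = 30                         # iterations ~ bits of angle resolution
ANGLES = [math.atan(2.0 ** -k) for k in range(N)]

# Each un-normalized micro-rotation stretches the vector by
# sqrt(1 + 2**-2k); pre-divide the start vector by the total gain K
# so the result comes out unit length.
K = 1.0
for k in range(N):
    K *= math.sqrt(1.0 + 2.0 ** (-2 * k))

def cordic_cos_sin(theta):
    """(cos, sin) of theta (|theta| <= pi/2) by shift-and-add
    micro-rotations of +/- atan(2**-k)."""
    x, y, z = int(ONE / K), 0, theta
    for k in range(N):
        if z >= 0:             # rotate counter-clockwise
            x, y, z = x - (y >> k), y + (x >> k), z - ANGLES[k]
        else:                  # rotate clockwise
            x, y, z = x + (y >> k), y - (x >> k), z + ANGLES[k]
    return x / ONE, y / ONE
```

Each pass drives the residual angle z toward zero; after 30 passes the angular error is below atan(2^-29), with the fixed-point truncation of the shifts contributing a few low-order bits of noise.]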
viking@silver.bacs.indiana.edu (Jon W. Backstrom) (10/27/89)
In article <23401@cup.portal.com> wetter@cup.portal.com (Pierce Wetter) writes:
> You want to use something called a CORDIC algorithm. If I remember
>correctly it uses shifts to do the divisions and multiplications by powers of
>two, and uses some algoritm that only requires divison and multiplication
>by powers of two. CORDIC is the name of the machine the algorithm was
>implemented for.
>Pierce

Where is Jim Cathey when you really need him?!  :-)  Jim converted a fractal landscape program, originally published in Creative Computing, to use CORDIC rotators, and the speedup is indeed amazing.  The actual work involves multiplication by powers of two, which required only shifting bits in the code.  As a naive programmer, not heavily into math theory, I was fascinated to see how it worked.  Made me want to learn more about math again.  (True)

-------------------------------------------------------------------------------
Jon W. Backstrom             "Yah sure...we gonna have fun, you bet!"
Institute for Digital Arts
P.O. Box 176                 Internet: viking@silver.bacs.indiana.edu
Bloomington, IN 47402-0176   UUCP: {ames,rutgers,att}!iuvax!silver!viking
-------------------------------------------------------------------------------
jimc@iscuva.ISCS.COM (Jim Cathey) (10/31/89)
In article <28548@iuvax.cs.indiana.edu> viking@silver.bacs.indiana.edu (Jon W. Backstrom) writes:
>In article <23401@cup.portal.com> wetter@cup.portal.com (Pierce Wetter) writes:
>> You want to use something called a CORDIC algorithm. If I remember
>>correctly it uses shifts to do the divisions and multiplications by powers
>>of two, and uses some algoritm that only requires divison and multiplication
>>by powers of two. CORDIC is the name of the machine the algorithm was
>>implemented for.
>Where is Jim Cathey when you really need him?! :-) Jim converted a
>fractal landscape program, originally published in Creative Computing,
>to use cordic rotators and the speedup is indeed amazing. The actual
>work involves multiplication by powers of two and this required only
>shifting bits in the code.

I be here!

Though I am a staunch supporter of CORDIC, I don't think it is what the originator of this thread needs to solve his cosine problem.  Raw CORDIC (COordinate Rotation DIgital Computer, or somesuch) is very good at rotating points around an axis, as it never needs to compute trig values at all.  It's capable of rotating an integer point atan(1/2^integer_shiftcount) [I think] per step -- multiple steps at differing shift counts can get you to more exact rotations.  I used CORDIC in the fractals program to rotate an essentially 3D set of integer data points about the Z and X axes prior to plotting them to the screen.  The exact amount of shifting was immaterial to me so long as the result looked good.

For those who care, counter-clockwise CORDIC is:

   y' = y + (x >> shiftcount)
   x' = x - (y' >> shiftcount)

which is 6 instructions (per step) on the 68000.  Note well the use of y' and not y in the second equation -- use of y will result in a rotator that spins out into the walls instead of whirling quietly in place.
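[The two-line rotator above can be tried out in a few lines of Python (names and the shift count of 4, about atan(1/16) ~ 3.6 degrees per step, are illustrative).  Because the second line uses the freshly updated y', the step has determinant exactly 1, so the point orbits in place rather than spiraling outward:

```python
import math

def rotate_step(x, y, k):
    """One counter-clockwise step of the in-place CORDIC rotator."""
    y = y + (x >> k)
    x = x - (y >> k)   # note: uses the NEW y, as stressed above
    return x, y

# Spin a point through many steps; the radius wobbles but stays bounded.
# Normalizing to 32000 rather than 32767 leaves the breathing room
# the post recommends.
x, y = 32000, 0
r0 = math.hypot(x, y)
for _ in range(1000):
    x, y = rotate_step(x, y, 4)
radius_drift = math.hypot(x, y) / r0   # stays close to 1.0
```

Replacing the second line with `x = x - (y_old >> k)` makes the determinant slightly greater than 1, and the same loop visibly grows the radius every revolution.]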
Best results are obtained when you're using a fixed-point data type that is normalized so that the magnitudes are large when considered as a signed integer (so that the delta terms don't degenerate to zero).  CORDIC will wobble a little bit as it spins, so don't normalize to +/- 32767; +/- 32000 will give a little breathing room.  (Same argument for long integers.)

There was an article in BYTE in '77 or '78 about using CORDIC for trig functions, but it looked overly complicated compared to raw CORDIC.  Someone also posted some CORDIC trig functions to the net, but I've never looked at them (squirrel instinct made me grab them).  I believe HP used CORDIC for the trig functions in the HP-35 calculator, but all later models use Chebyshev (sp?) polynomials because they are faster and more accurate.

I think the best results for medium-precision cosine might be a combination of table lookup, fast inline integer math, and (most importantly) good range reduction so that the table isn't too big.  I'm pretty sure you only need a 0-45 degree table; the rest can be deduced from trig reductions.

+----------------+
!   II  CCCCCC   !  Jim Cathey
!   II SSSSCC    !  ISC-Bunker Ramo
!   II CC        !  TAF-C8; Spokane, WA  99220
!   IISSSS CC    !  UUCP: uunet!iscuva!jimc (jimc@iscuva.iscs.com)
!   II  CCCCCC   !  (509) 927-5757
+----------------+
"With excitement like this, who is needing enemas?"