moses@hao.ucar.edu (Julie Moses) (09/02/90)
I have recently installed a math coprocessor in my Mega system and have
now linked my code with libraries supporting a 68881 chip in an ST.
While more complex math functions, such as transcendentals and square
root, really show a speed increase, adding or multiplying two floating
point numbers shows no speed increase over just doing it with the
68000.  As a matter of fact, some of my functions that do repetitive
multiplies and additions with floats (without more complex functions)
are significantly slower when using the math coprocessor.

Q- Can someone explain why?

My best guess is that the time the processor takes to transfer the
floating point numbers, plus the time spent waiting for the math
coprocessor to be ready to receive them, makes the simpler math
operations slower than just doing them inside the 68000.  I am using
the Prospero math libraries, which check once for the math chip, so
that one must link for either a 68000 with or without the math chip,
exclusively.

Julie Moses
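A minimal sketch of the kind of timing loop that exposes the asymmetry
described above, assuming only the standard C clock() routine; the loop
count and constants are arbitrary, and nothing here is specific to the
Prospero libraries.  Build it twice, once against each library, and
compare the tick counts:

#include <stdio.h>
#include <math.h>
#include <time.h>

#define N 20000L

int main(void)
{
    float a = 1.0001f, b = 1.0f;
    long i;
    clock_t t0, t1;

    t0 = clock();
    for (i = 0; i < N; i++)
        b = b * a;                     /* simple multiply           */
    t1 = clock();
    printf("multiply loop: %ld ticks (b=%g)\n", (long)(t1 - t0), b);

    t0 = clock();
    for (i = 0; i < N; i++)
        b = (float)sqrt(b + 2.0);      /* "complex" function        */
    t1 = clock();
    printf("sqrt loop    : %ld ticks (b=%g)\n", (long)(t1 - t0), b);

    return 0;
}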
kbad@atari.UUCP (Ken Badertscher) (09/03/90)
moses@hao.ucar.edu (Julie Moses) writes:
| While more complex math functions, such as transcendentals and
| square root, really show a speed increase, adding or multiplying two
| floating point numbers shows no speed increase over just doing it
| with the 68000.  As a matter of fact, some of my functions that do
| repetitive multiplies and additions with floats (without more
| complex functions) are significantly slower when using the math
| coprocessor.

Sounds to me like the libraries were not properly implemented to take
best advantage of the peripheral 68881.  In a series of tests that I
did while working on a homemade set of bindings for Megamax Laser C
some time ago, I sometimes noticed that the speedups weren't all that
sensational...  The timings I did were /never/ slower when using the
68881, though.  And the Megamax floating point routines are /fast/.
--
   |||   Ken Badertscher  (ames!atari!kbad)
   |||   Atari R&D System Software Engine
  / | \  #include <disclaimer>
muts@fysaj.fys.ruu.nl (Peter Mutsaers /100000) (09/03/90)
moses@hao.ucar.edu (Julie Moses) writes:
>Q- Can someone explain why?
>
> My best guess is that the time the processor takes to transfer the
>floating point numbers, plus the time spent waiting for the math
>coprocessor to be ready to receive them, makes the simpler math
>operations slower than just doing them inside the 68000.  I am using
>the Prospero math libraries, which check once for the math chip, so
>that one must link for either a 68000 with or without the math chip,
>exclusively.
>
>Julie Moses

Maybe the Prospero routines are not very fast, or they are only single
precision.

The 68881 works in 80 bits; in software, double precision generally
takes about 4 times longer than single precision.  In Turbo C, which
has the fastest floating point library available to my knowledge, the
80-bit software routines take 3 times as long as the 68881 does.  So
single precision software routines could well be a bit faster than the
68881.
--
Peter Mutsaers                     email: muts@fysaj.fys.ruu.nl
Rijksuniversiteit Utrecht                 nmutsaer@ruunsa.fys.ruu.nl
Princetonplein 5                   tel:   (+31)-(0)30-533880
3584 CG Utrecht, Netherlands
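The single- versus double-precision ratio of a given library is easy to
check by timing the same loop with both types; a sketch of the two
loops (time each with whatever clock routine is handy, loop counts and
constants arbitrary):

/* On a software-FP build, the first loop exercises the single
   precision multiply routine, the second the double precision one.   */
float single_loop(long n)
{
    float s = 1.0f, a = 1.00001f;
    long i;
    for (i = 0; i < n; i++)
        s *= a;                 /* single precision multiplies        */
    return s;
}

double double_loop(long n)
{
    double s = 1.0, a = 1.00001;
    long i;
    for (i = 0; i < n; i++)
        s *= a;                 /* double precision multiplies        */
    return s;
}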
t68@nikhefh.nikhef.nl (Jos Vermaseren) (09/03/90)
Some years ago I wrote my own set of floating point routines.  Then a
little later I got the chance to test a 68881.  My findings were that
single precision addition was not any faster on the 68881, but all the
other operations were.  And those were very good floating point
routines.

On the whole, however, the speed increase with the floating point chip
wasn't really spectacular.  Over the original Absoft library I
measured, on one of my computational programs, a factor of 4 with the
68881 (and a factor of 2 with the home-made library).  Most of the time
is lost in the transmission and the negotiations with the chip.  This
won't be the case as much on the TT, as it has a more direct channel.
A compiler that can use the floating point registers will also make an
enormous difference, because then nearly all transmission delays are
reduced by a factor of 4 to 5.

If you find that multiplication is faster without the chip, you have
either an incredibly fast FP library or an inefficient link to the
68881.

Jos Vermaseren
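To see where the transmission cost goes at the source level, compare a
per-operation binding style (each call ships operands to the 68881 and
fetches the result back through memory) with an expression that a
register-allocating FPU compiler can keep on the chip.  The binding
names below are invented for illustration; they are not from any
particular library:

/* Per-operation binding style: three calls, each moving two operands
   out to the 68881 and one result back into memory, so t1 and t2 make
   a needless round trip through RAM.                                  */
extern void fp_mul(float *dst, const float *x, const float *y);
extern void fp_add(float *dst, const float *x, const float *y);

float via_bindings(float a, float b, float c, float d)
{
    float t1, t2, r;
    fp_mul(&t1, &a, &b);
    fp_mul(&t2, &c, &d);
    fp_add(&r, &t1, &t2);
    return r;
}

/* A compiler that emits 68881 code directly can hold t1 and t2 in FP
   registers, so only the four inputs go out and one result comes
   back; this is where the reduction in transmission delays comes
   from.                                                               */
float via_inline_fpu(float a, float b, float c, float d)
{
    return a * b + c * d;
}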
moses@hao.ucar.edu (Julie Moses) (09/04/90)
| From (Ken Badertscher)
|
| Sounds to me like the libraries were not properly implemented to
| take best advantage of the peripheral 68881.  In a series of tests
| that I did while working on a homemade set of bindings for Megamax
| Laser C some time ago, I sometimes noticed that the speedups weren't
| all that sensational...  The timings I did were /never/ slower when
| using the 68881, though.  And the Megamax floating point routines
| are /fast/.

| From (Peter Mutsaers)
|
| Maybe the Prospero routines are not very fast, or they are only
| single precision.
|
| The 68881 works in 80 bits; in software, double precision generally
| takes about 4 times longer than single precision.  In Turbo C, which
| has the fastest floating point library available to my knowledge,
| the 80-bit software routines take 3 times as long as the 68881 does.
| So single precision software routines could well be a bit faster
| than the 68881.

The above messages and some others point to a solution to the question:
why are simpler F.P. math functions slower with the 68881 than with the
68000?

Ken and Peter,

Yes, the Prospero libraries are not highly optimized compared to Turbo
C or Megamax C, but they are a solid group of functions supported by a
good working environment.  However, I would wager that the Prospero
68881 libraries are faster than Megamax C's for two reasons: 1)
Prospero's check once, at program startup, for the 68881, while Laser C
looks for it every time it wants to do F.P. math; 2) Prospero comes
with two 68881 libraries, and the second has no error checking, which
eliminates some overhead (though you had better know what the ranges of
the solutions will be).

The solution is that I was comparing single precision (32 bit) F.P.
math done by the 68000 to 80 bit math done by the math coprocessor.
Complex F.P. functions, such as Tangent(x), are always faster when done
by the 68881 math coprocessor, but simple functions such as add,
subtract and multiply are <slower> because of the time taken: 1)
waiting for the 68881 chip to be ready to receive, 2) moving 32 bits to
the math chip, 3) the math chip converting the 32 bit floats to 80 bit
floats, 4) returning the solution back to the 68000.

I am doing my single precision F.P. math in Fortran subroutines and
linking them into my Pro-C.  Double precision F.P. math, such as done
by C, I would agree, is probably always slower than the math done by
the 68881.

Having looked at the Alcyon 68881 assembly source code, there does not
seem to be much one can do to further optimize the F.P. routines.
Prospero's are probably based on Alcyon's.  The TT's coprocessor will
probably run circles around any single precision done by a 68xxx CPU
(I hope).

Julie Moses
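One practical consequence of those four per-operation costs: anything
that keeps intermediate results from being narrowed back to 32 bits and
re-sent for the next operation helps the simple-operation case.  A
sketch in plain C (not a Prospero-specific call; the point is only the
shape of the loop):

/* With a per-operation binding, every add pays the wait/transfer/
   convert/return costs listed above, and the running sum is narrowed
   back to 32 bits each time.  Accumulating in the widest type inside
   one routine lets a 68881-aware compiler keep the sum on the chip
   and convert back only once at the end.                              */
float dot_product(const float *x, const float *y, long n)
{
    double acc = 0.0;               /* running sum kept at full width  */
    long i;
    for (i = 0; i < n; i++)
        acc += (double)x[i] * (double)y[i];
    return (float)acc;              /* one narrowing, at the end       */
}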
marten@tpki.toppoint.de (M.Feldtmann) (09/04/90)
In article <8379@ncar.ucar.edu> moses@hao.ucar.edu (Julie Moses) writes:
> While more complex math functions, such as transcendentals and
>square root, really show a speed increase, adding or multiplying two
>floating point numbers shows no speed increase over just doing it
>with the 68000.

Amazing!  Of course, for '+' or '-' there will not be much of a speed
increase (because of the data transfer via supervisor mode), but for
'*' or '/' you should expect at least a factor of about 2.  Perhaps the
library is not so good?

Marten

Marten Feldtmann, Eckernfoerder Str. 83, 2300 Kiel 1, West Germany
DNET/EUNET/USENET/SUBNET: marten@toppoint.de
Please keep your replies short - I have to pay for them
rehrauer@apollo.HP.COM (Steve Rehrauer) (09/04/90)
In article <989@nikhefh.nikhef.nl> t68@nikhefh.nikhef.nl (Jos Vermaseren) writes:
>If you find that multiplication is faster without the chip, you have
>either an incredibly fast FP library or an inefficient link to the
>68881.

It's also worth noting that the 68882 can overlap execution of f.p.
instructions, assuming there aren't any operand dependencies.  The
68040 (which doesn't implement the full '881/'882 instruction set, but
does do the "core" -- FMUL, FDIV, etc.) also overlaps, and (since it
implements the instructions directly itself) doesn't incur any of the
coprocessor overhead of an '881/'882.

In other words, there are good reasons not to do floating point in
software, even if you can shave a few clocks in a few instances on your
current hardware by doing so.
--
>>"Aaiiyeeee!  Death from above!"<<         | (Steve) rehrauer@apollo.hp.com
"Spontaneous human combustion - what luck!" | Apollo Computer (Hewlett-Packard)
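A small illustration of the operand-dependency point, in C.  Whether
overlap actually occurs depends on the compiler emitting back-to-back
FP instructions for an '882 or '040, so this is only a sketch of the
idea, not a guaranteed win:

/* In the chained form every multiply must wait for the previous
   result.  The split form keeps two independent partial products, so
   consecutive multiplies have no operand dependency and an '882/'040
   is free to overlap them.                                            */
double product_chained(const double *v, long n)
{
    double p = 1.0;
    long i;
    for (i = 0; i < n; i++)
        p *= v[i];                 /* each step depends on the last    */
    return p;
}

double product_split(const double *v, long n)
{
    double p0 = 1.0, p1 = 1.0;     /* two independent accumulators     */
    long i;
    for (i = 0; i + 1 < n; i += 2) {
        p0 *= v[i];                /* these two multiplies do not      */
        p1 *= v[i + 1];            /* depend on each other             */
    }
    if (i < n)
        p0 *= v[i];                /* odd element, if any              */
    return p0 * p1;
}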
leo@ehviea.ine.philips.nl (Leo de Wit) (09/10/90)
In article <8387@ncar.ucar.edu> moses@hao.ucar.edu (Julie Moses) writes:
| Having looked at the Alcyon 68881 assembly source code, there does
|not seem to be much one can do to further optimize the F.P. routines.
|Prospero's are probably based on Alcyon's.  The TT's coprocessor will
|probably run circles around any single precision done by a 68xxx CPU
|(I hope).

If you're looking for fast math, you could perhaps represent real
numbers by integer quotients; this is a common technique when precision
and/or range allow it (which is the case for a lot of real-life
applications).  Each real (or float if you want) is represented by a
pair of integers (choose either longs or shorts) whose quotient is
(approximately) the real.  Especially in cases where shorts can be used
for the representation (low precision), and on a processor that prefers
integers to floats, this can speed up calculations dramatically.

Another way to possibly increase performance is to put often-needed
function values in an array, e.g. for sin(x) calculate sin(i*pi/180)
for i = 0..90.  Intermediate values can then be found - for example -
by linear interpolation.

For an example of quotient calculation, you can take a look at the
sources of the 3D demo program I wrote a while ago.

Cheers,

    Leo.
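A minimal sketch of the table-plus-interpolation idea for sin(),
assuming angles in degrees in the range 0..90 (anything outside that
range needs the usual quadrant reductions, not shown); the table size
and linear interpolation are just what is suggested above:

#include <math.h>

#define PI 3.14159265358979

static float sintab[91];           /* sin() at one-degree steps, 0..90 */

void init_sintab(void)             /* fill the table once at startup   */
{
    int i;
    for (i = 0; i <= 90; i++)
        sintab[i] = (float)sin(i * PI / 180.0);
}

float fast_sin_deg(float deg)      /* deg assumed to lie in [0, 90]    */
{
    int   i = (int)deg;            /* lower table index                */
    float frac = deg - (float)i;   /* fraction of a degree             */

    if (i >= 90)
        return sintab[90];
    return sintab[i] + frac * (sintab[i + 1] - sintab[i]);
}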