djh@dragon.metaphor.com (Dallas J. Hodgson) (05/31/90)
If you're like me, you're tired of all the software that doesn't take
advantage of your FPU. The way things work right now, we've got too many
floating point standards, such as:

   mathffp.library
   ieee.library (double precision)
   ieee.library (new single precision, recognizes 6888x for 2.0)
   compiler-specific math libraries
   in-line FPU instructions

Since the FPU instructions trap out thru the FLINE vector anyway (if there's
no coprocessor present) why don't we EMULATE a 6888x when the traps occur?
Put this support in the RKM, and let all the compilers generate in-line FPU
instructions. Yes, there's a small amount of overhead for non-FPU users, but
it's no different from the PC way of doing things - and very system-friendly!

Let's say GOODBYE to the math libraries; if a user wants to install a Weitek
in the future, the vendor(s) can supply the driver software themselves - and
everything will still work.

Just a thought.

+----------------------------------------------------------------------------+
| Dallas J. Hodgson          | "This here's the wattle,                      |
| Metaphor Computer Systems  |  It's the emblem of our land.                 |
| Mountain View, Ca.         |  You can put it in a bottle,                  |
| USENET : djh@metaphor.com  |  You can hold it in your hand."               |
+============================================================================+
| "The views I express are my own, and not necessarily those of my employer" |
+----------------------------------------------------------------------------+
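For the curious, the mechanics here are simple: any opcode whose top four
bits are 1111 and which no coprocessor claims takes the Line-F exception
through vector 11, at offset 0x2C in the vector table. A minimal sketch, in
C, of what installing such an emulator's hook might look like; it assumes a
plain 68000 (vector table at address 0, no VBR), and MyFlineHandler is a
hypothetical assembly-language routine, not anything from the RKM:

    /* Hypothetical sketch: hooking the 68000 Line-F vector.
     * MyFlineHandler is assumed to be an assembly routine that decodes
     * the FPU instruction, emulates it, and ends with an RTE.  Assumes
     * the exception vector table sits at address 0 (plain 68000). */
    #include <exec/types.h>

    extern void MyFlineHandler(void);           /* hypothetical emulator */

    #define LINE_F_VECTOR ((volatile APTR *)0x2C)   /* vector number 11 */

    static APTR OldFlineVector;

    void InstallEmulator(void)
    {
        OldFlineVector = *LINE_F_VECTOR;        /* save previous handler */
        *LINE_F_VECTOR = (APTR)MyFlineHandler;  /* hook Line-F traps */
    }

    void RemoveEmulator(void)
    {
        *LINE_F_VECTOR = OldFlineVector;        /* restore on exit */
    }

On a 68010 or later the table moves behind the VBR, and a truly
system-friendly version would install the handler through the OS rather
than poking the vector directly.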
gpsteffler@tiger.uwaterloo.ca (Glenn Steffler) (05/31/90)
In article <1181@metaphor.Metaphor.COM> djh@dragon.metaphor.com (Dallas J.
Hodgson) writes:
>
>Since the FPU instructions trap out thru the FLINE vector anyway (if there's
>no coprocessor present) why don't we EMULATE a 6888x when the traps occur?

I am not sure about the 680x0 line of processors, but the 80x86 line
definitely slows down when a lot of coprocessor traps occur.

The best way to do it (IMNSHO) would be to place code in the program which
calls the floating point library function. If a coprocessor is available,
the library runtime link code would replace the library calls with actual
floating point processor instructions (like, fix up the code man).

----
Attention Hazy:

Does the Amiga handle faults and other hardware interrupts faster than
a generic PC? I know the 80386 is very slow when interrupts start
wailing away in protected mode.
----

(Microsoft Windows 3.0 math libraries are available which do this, and in
fact I believe Microsoft Excel uses them)

>Put this support in the RKM, and let all the compilers generate in-line FPU
>instructions. Yes, there's a small amount of overhead for non-FPU users, but

Like I said, WAY TOO MUCH OVERHEAD...

>it's no different from the PC way of doing things - and very
>system-friendly! Let's say GOODBYE to the math libraries; if a user wants to
>install a Weitek in the future, the vendor(s) can supply the driver software
>themselves - and everything will still work.

Not necessarily the way the PC does it...

>
>Just a thought.

Just a reply.

----
Co-Op Scum
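A more conservative cousin of the fixup Glenn describes, needing no
self-modifying code, is a dispatch table bound once at startup. A sketch,
assuming the AttnFlags test that exec provides; FMul68881 and FMulSoft are
hypothetical routines standing in for FPU and software versions of the same
operation:

    #include <exec/types.h>
    #include <exec/execbase.h>

    extern struct ExecBase *SysBase;

    /* Hypothetical implementations: one compiled with in-line 6888x
     * instructions, one a pure software fallback. */
    extern double FMul68881(double a, double b);
    extern double FMulSoft(double a, double b);

    double (*FMul)(double, double);  /* all callers go through this pointer */

    void InitMathDispatch(void)
    {
        /* AFF_68881 is set by exec at boot if an FPU was found */
        if (SysBase->AttnFlags & AFF_68881)
            FMul = FMul68881;
        else
            FMul = FMulSoft;
    }

The price is one indirect call per operation, far below trap overhead, and
the code stays pure enough to be made resident.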
valentin@cbmvax.commodore.com (Valentin Pepelea) (05/31/90)
In article <1181@metaphor.Metaphor.COM> djh@dragon.metaphor.com (Dallas J.
Hodgson) writes:
>
> Since the FPU instructions trap out thru the FLINE vector anyway (if there's
> no coprocessor present) why don't we EMULATE a 6888x when the traps occur?

Too much time would be spent decoding and emulating the coprocessor
instructions. And what if someone plugs in a Weitek math coprocessor? The
math coprocessors execute in 30 cycles what would take 3000 cycles otherwise.
Emulating the instruction in software would take more than that. And what if
the user has merely a 68000? Only the 68010 and higher processors have
instruction suspension-completion mechanisms.

The correct solution is to provide a shared library of math functions, and
that's what we do. Those functions automatically take advantage of whatever
hardware the user has, so the programmer does not have to worry about the
configuration of the platform on which his software is about to run.

The 68040 does not have a floating point coprocessor, nor the coprocessor
interface of the 68020 & 68030. But it implements the most used instructions
in hardware, and lets software emulate the remaining, less used instructions.
The 25MHz 68040 will therefore achieve higher performance than a 33MHz
68030/68882. Now that makes sense.

Valentin
--
The Goddess of democracy? "The tyrants    Name:    Valentin Pepelea
may destroy a statue, but they cannot     Phone:   (215) 431-9327
kill a god."                              UseNet:  cbmvax!valentin@uunet.uu.net
        - Ancient Chinese Proverb         Claimer: I not Commodore spokesman be
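For reference, this is roughly what the shared-library route looks like from
C using the IEEE double-precision basic library; a sketch only, assuming the
standard amiga.lib stubs, with error handling trimmed to the minimum:

    #include <exec/types.h>
    #include <proto/exec.h>

    struct Library *MathIeeeDoubBasBase;  /* base name amiga.lib expects */

    double IEEEDPMul(double, double);     /* stubs resolved from amiga.lib */
    double IEEEDPAdd(double, double);

    int main(void)
    {
        double r;

        MathIeeeDoubBasBase = OpenLibrary("mathieeedoubbas.library", 0L);
        if (MathIeeeDoubBasBase == NULL)
            return 20;

        /* the library routes these to software or a 6888x as available */
        r = IEEEDPAdd(IEEEDPMul(3.0, 4.0), 5.0);    /* 3*4 + 5 = 17 */

        CloseLibrary(MathIeeeDoubBasBase);
        return 0;
    }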
Martin@pnt.CAM.ORG (Martin Taillefer) (05/31/90)
>The 68040 does not have a floating point coprocessor, nor the coprocessor
>interface of the 68020 & 68030. But it implements the most used instructions
>in hardware, and lets software emulate the remaining, less used instructions.
>The 25MHz 68040 will therefore achieve higher performance than a 33MHz
>68030/68882. Now that makes sense.
>
>Valentin

So does 2.0 provide the necessary software to support 68040 transcendental
instructions transparently? Or will 040 board vendors (of which C= will
surely be one) have to provide some patch programs to replace the math lib
entry points for the trans functions, and provide FLINE trap handlers for
those programs using the FPU directly?
--
-------------------------------------------------------------
Martin Taillefer   INTERNET: martin@pnt.CAM.ORG   BIX: vertex
UUCP: uunet!philmtl!altitude!pnt!martin   TEL: 514/640-5734
daveh@cbmvax.commodore.com (Dave Haynie) (05/31/90)
In article <11996@cbmvax.commodore.com> valentin@cbmvax (Valentin Pepelea) writes:
>In article <1181@metaphor.Metaphor.COM> djh@dragon.metaphor.com (Dallas J.
>Hodgson) writes:

>> Since the FPU instructions trap out thru the FLINE vector anyway (if there's
>> no coprocessor present) why don't we EMULATE a 6888x when the traps occur?

>Too much time would be spent decoding and emulating the coprocessor
>instructions. And what if someone plugs in a Weitek math coprocessor?

Kind of a moot point anyway, since Weitek cancelled the 68030 bus version of
their FPU. Of course, you'd get better performance out of the floating point
on various AT&T DSPs, the 96002, or these new really fast FPUs from BIT, if
there were a reasonable way to harness this speed in a general way. Right
now there isn't, but read on...

>The correct solution is to provide a shared library of math functions, and
>that's what we do. Those functions automatically take advantage of whatever
>hardware the user has, so the programmer does not have to worry about the
>configuration of the platform on which his software is about to run.

That's the best general solution, but still not perfect. You get programs
written for the libraries when speed is not a major issue. When it is, most
programs currently come in a separate version with direct FPU code. Even the
library interface is too slow to do your floating point multiplies in the
inner loop of a ray trace or something like that. Same as on the Intel based
systems -- you can get some MS-DOS programs in generic, '387, or Weitek
flavors.

What would really help all of this is a standardized library of useful
higher level math functions. You're not going to suffer a library call for
an instruction that takes only 20 or 30 clocks if you're worried about
speed, but you might be willing to take a library call to get a routine that
takes a couple thousand clocks to run, removes lots of the work you're
trying to do, works close to theoretical limits on 68000 or 68030/68882, and
would get you going even faster with a faster math engine sitting around.

This is what I call retargetable mathematics, directly analogous to the type
of problems folks want to solve with graphics libraries. Folks complain
about the speed of something like WritePixel() just as they do about the
basic IEEE library's multiply routine. But you rarely hear complaints about
the higher level graphics functions -- they do enough work for you that you
use them, and they go fast enough. We really need something to do
mathematics at a high enough level to make faster math coprocessors viable
without a proliferation of math-coprocessor-specific versions of programs
running around.

>Valentin

--
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh    PLINK: hazy    BIX: hazy
      "I have been given the freedom to do as I see fit" -REM
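To make the granularity concrete, a retargetable math interface might read
like the following. Everything here is hypothetical, sketched only to show
the level of the calls being argued for; no such rmath library exists:

    /* Hypothetical interface sketch -- not a real Amiga library.
     * Each call does enough work (n elements, not one) that the
     * library/dispatch overhead disappears into the noise, and the
     * implementation underneath is free to use a 68882, a DSP board,
     * or plain 68000 code. */

    /* dst[i] = a[i] * b[i], for i = 0..n-1 */
    void RMVectorMul(float *dst, const float *a, const float *b, long n);

    /* 4x4 matrix multiply, the 3D-transform staple */
    void RMMatMul44(float dst[4][4], const float a[4][4],
                    const float b[4][4]);

    /* in-place complex FFT, length n a power of two */
    void RMFFT(float *re, float *im, long n);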
daveh@cbmvax.commodore.com (Dave Haynie) (05/31/90)
In article <1990May31.025529.28370@watdragon.waterloo.edu> gpsteffler@tiger.uwaterloo.ca (Glenn Steffler) writes:

>Attention Hazy:
>Does the Amiga handle faults and other hardware interrupts faster than
>a generic PC? I know the 80386 is very slow when interrupts start
>wailing away in protected mode.

As in hardware interrupts, I assume you're getting at exceptions here, such
as F-line and A-line exceptions. In other words, traps for unimplemented
instructions.

Exceptions don't take a great deal of overhead on their own; they basically
store enough context on the stack to get back to where they occurred, then
call a routine at their assigned vector. But they're certainly more
expensive than a JSR/RTS, or even the JSR/JMP/RTS you get with Amiga
libraries. And the expense of the exception is just the beginning. Once
you're in the exception handler, the nature of the emulated instruction must
be determined: instruction type, operands, addressing modes. With all this
computed, and effective addresses calculated and all, the actual routine for
the instruction emulation must be called. That last part is just about all
that's required for a library call.

So, while I have no idea how bad '386 exceptions are, you can rest safe in
the knowledge that in most cases, instruction emulation via traps is not the
best way to go on a 680x0 if you're interested in speed.
--
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh    PLINK: hazy    BIX: hazy
      "I have been given the freedom to do as I see fit" -REM
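A rough sketch of the front end of such a handler shows where the cycles go
before any arithmetic happens. The field layout follows the 68881
general-type instruction format; the stack-frame handling and the
effective-address computation, which are the bulk of the real work, are only
hinted at, and a real handler would be an assembly-language exception entry:

    /* Illustration only: the decode work an F-line emulator must do
     * before dispatching.  Not a complete or cycle-accurate decoder. */
    #include <exec/types.h>

    void EmulateFline(UWORD *pc)          /* pc -> faulting instruction */
    {
        UWORD op  = pc[0];                /* 1111 cpID type <ea> */
        UWORD cmd = pc[1];                /* coprocessor command word */

        UWORD cpid   = (op >> 9) & 0x7;   /* 1 = MC68881/68882 */
        UWORD type   = (op >> 6) & 0x7;   /* 0 = general, 1 = FScc... */
        UWORD mode   = (op >> 3) & 0x7;   /* effective-address mode */
        UWORD reg    =  op       & 0x7;   /* effective-address register */
        UWORD opmode =  cmd      & 0x7F;  /* which FP op (FADD, FMUL...) */

        /* ...now compute the effective address from mode/reg, fetch the
         * operand, then finally dispatch on opmode to the emulation
         * routine -- that last step being just about all a library call
         * has to do. */
    }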
djh@dragon.metaphor.com (Dallas J. Hodgson) (06/01/90)
Hmm. The Sun 3/50s used to trap out thru Unix for their FPU instructions if
no coprocessor was present, and Unix has a much higher overhead than
AmigaDOS. Likewise, PCs often do the same. If you're doing enough FP that
exception vector overhead is an issue, then you sound like a candidate for
buying a math coprocessor. Well? For businesses, this is certainly the case.
For hobbyists, well, you know we're cheap...

The "shared library" support is not effective the way it stands right now,
because only ONE of the libraries RECOGNIZES the FPU. And it's NOT the
popular one! As long as developers keep on using Motorola FFP as their
standard (and they will, unless precision requirements demand otherwise)
we're gonna keep having a lot of useless 68881/2/040 hardware lying around.

The only software I have that makes use of it is Turbo-Silver 3.0 SV.
Impulse solved their problem by supplying two different executables of this
product. What if Dpaint used the chip for Perspective? What if Design-3D or
Modeler-3D did also? Too bad they don't. Now that Design-3D & Modeler-3D are
no longer supported, it looks like they never will.

+----------------------------------------------------------------------------+
| Dallas J. Hodgson          | "This here's the wattle,                      |
| Metaphor Computer Systems  |  It's the emblem of our land.                 |
| Mountain View, Ca.         |  You can put it in a bottle,                  |
| USENET : djh@metaphor.com  |  You can hold it in your hand."               |
+============================================================================+
| "The views I express are my own, and not necessarily those of my employer" |
+----------------------------------------------------------------------------+
riley@batcomputer.tn.cornell.edu (Daniel S. Riley) (06/01/90)
In article <1183@metaphor.Metaphor.COM> djh@dragon.metaphor.com (Dallas J. Hodgson) writes:
>The "shared library" support is not effective the way it stands right now,
>because only ONE of the libraries RECOGNIZES the FPU. And it's NOT the
>popular one! As long as developers keep on using Motorola FFP as their
>standard (and they will, unless precision requirements demand otherwise)
>we're gonna keep having a lot of useless 68881/2/040 hardware lying around.

Doesn't 2.0 include a single precision IEEE library for exactly this reason?

-Dan Riley (riley@tcgould.tn.cornell.edu, cornell!batcomputer!riley)
-Wilson Lab, Cornell University
valentin@cbmvax.commodore.com (Valentin Pepelea) (06/01/90)
In article <1183@metaphor.Metaphor.COM> djh@dragon.metaphor.com (Dallas J.
Hodgson) writes:
>
>Hmm. The Sun 3/50s used to trap out thru Unix for their FPU instructions if
>no coprocessor was present, and Unix has a much higher overhead than
>AmigaDOS.

Unix has a higher overhead than AmigaDOS because it traps on every OS call.
(The occasional implementation might differ.) When it comes to emulating
F-line instructions, the overhead incurred by Unix or AmigaDOS would be the
same.

Valentin
--
The Goddess of democracy? "The tyrants    Name:    Valentin Pepelea
may destroy a statue, but they cannot     Phone:   (215) 431-9327
kill a god."                              UseNet:  cbmvax!valentin@uunet.uu.net
        - Ancient Chinese Proverb         Claimer: I not Commodore spokesman be
aburto@marlin.NOSC.MIL (Alfred A. Aburto) (06/05/90)
In article <10341@batcomputer.tn.cornell.edu> riley@tcgould.tn.cornell.edu (Daniel S. Riley) writes:
>In article <1183@metaphor.Metaphor.COM> djh@dragon.metaphor.com (Dallas J. Hodgson) writes:
>>As long as developers keep on using Motorola FFP as their
>>standard (and they will, unless precision requirements demand otherwise)
>>we're gonna keep having a lot of useless 68881/2/040 hardware lying around.
>
>Doesn't 2.0 include a single precision IEEE library for exactly this reason?
>
>-Dan Riley (riley@tcgould.tn.cornell.edu, cornell!batcomputer!riley)
>-Wilson Lab, Cornell University

Yes, the IEEESP (single precision) software floating-point libraries (so
I've been told) are being designed with the goal of being as quick as or
quicker than MathFFP, yet with accuracy equivalent to that of the
68881/68882 (single precision). Once this goal is achieved (and I think C-A
is getting close), I think MathFFP will be, and should be, phased out of the
system. The number of math libraries is then reduced by two, and we wind up
with IEEESP libraries quicker and more accurate than MathFFP.

Al Aburto
aburto@marlin.nosc.mil
djh@dragon.metaphor.com (Dallas J. Hodgson) (06/06/90)
I should mention something I found recently on a late-model Fish disk; it's
called "mathtrans", and was written by a German programmer as a 68881
replacement for MathTrans.library. Let it be known that even tho' his
library has to translate between IEEE and FFP for EVERY call, it STILL
performs 2-7x faster on 68881/2 equipped machines. So much for the "overhead
is too high for this sort of thing to be done" myth. On my A3000, the
averaged improvement was about 3.4x. Your mileage may vary.

Benchmark people note: Dhrystone 1.1 runs 5150 dhrystones on a 25MHz A3000
using 32-bit ints, registers enabled, caches (but not bursts) on; 5500
dhrystones using 16-bit ints. These numbers courtesy of Aztec C 5.0.

Now: run NoFastMem and try again. 5500 dhrystones -> 1200 dhrystones. No DMA
contention going on other than an interlaced 2-bitplane WB screen and the
Intuition sprite.

+----------------------------------------------------------------------------+
| Dallas J. Hodgson          | "This here's the wattle,                      |
| Metaphor Computer Systems  |  It's the emblem of our land.                 |
| Mountain View, Ca.         |  You can put it in a bottle,                  |
| USENET : djh@metaphor.com  |  You can hold it in your hand."               |
+============================================================================+
| "The views I express are my own, and not necessarily those of my employer" |
+----------------------------------------------------------------------------+
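The translation that library performs on every call is plausibly just
integer shuffling, which is why the 68881 still comes out far ahead. A
sketch of the FFP-to-IEEE-single direction, assuming the documented
mathffp.library format (24-bit mantissa in bits 31-8 with the leading one
explicit, sign in bit 7, excess-64 exponent in bits 6-0, zero all zeros),
ignoring overflow and denormals:

    /* Sketch: convert a Motorola FFP single to an IEEE single bit
     * pattern, as a mathtrans-style shim must do on every call.
     * Assumes the FFP layout described above; no overflow or
     * denormal handling is shown. */
    unsigned long FFPToIEEE(unsigned long ffp)
    {
        unsigned long mant, sign, exp;

        if (ffp == 0)
            return 0;                  /* FFP zero -> IEEE +0.0 */

        mant = ffp >> 8;               /* 24 bits, bit 23 always set */
        sign = (ffp >> 7) & 1;
        exp  = ffp & 0x7F;             /* excess-64 */

        /* 0.1m * 2^(exp-64) == 1.m * 2^(exp-65); with the IEEE bias
         * of 127 the IEEE exponent field becomes exp + 62. */
        return (sign << 31) | ((exp + 62) << 23) | (mant & 0x7FFFFF);
    }

A dozen shifts and masks per call, which is small next to a software FSIN.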
hamish@waikato.ac.nz (06/06/90)
In article <11996@cbmvax.commodore.com>, valentin@cbmvax.commodore.com (Valentin Pepelea) writes:
> In article <1181@metaphor.Metaphor.COM> djh@dragon.metaphor.com (Dallas J.
> Hodgson) writes:
>>
>> Since the FPU instructions trap out thru the FLINE vector anyway (if there's
>> no coprocessor present) why don't we EMULATE a 6888x when the traps occur?
>
> Too much time would be spent decoding and emulating the coprocessor
> instructions. And what if someone plugs in a Weitek math coprocessor? The
> math coprocessors execute in 30 cycles what would take 3000 cycles otherwise.
> Emulating the instruction in software would take more than that. And what if
> the user has merely a 68000? Only the 68010 and higher processors have
> instruction suspension-completion mechanisms.

So what? We weren't talking about instruction re-start, but rather exception
processing. Any 68000 can emulate an FPU by doing F-line processing for you.
It's the "Oh my god, there's a page fault, quick, grab it from virtual
memory half way through an instruction" that the 68000 can't complete.
Different story altogether.

>
> The correct solution is to provide a shared library of math functions, and
> that's what we do. Those functions automatically take advantage of whatever
> hardware the user has, so the programmer does not have to worry about the
> configuration of the platform on which his software is about to run.
>
> The 68040 does not have a floating point coprocessor, nor the coprocessor
> interface of the 68020 & 68030. But it implements the most used instructions
> in hardware, and lets software emulate the remaining, less used instructions.
> The 25MHz 68040 will therefore achieve higher performance than a 33MHz
> 68030/68882. Now that makes sense.
>

The 68040 DOES have a floating point coprocessor. The fact that it's a
subset of the 68882, and shares the same piece of silicon, doesn't matter.
It still has eight 80-bit registers and 3 control registers, and still looks
like an ordinary 68881/2 to the user (except for the trig stuff, which is
emulated via a trap).

Compare this to the 68030. Are you going to say that it doesn't have an MMU,
even though it's in the same boat? I.e. a subset of the 68851, on the same
piece of silicon.

> Valentin
--
==============================================================================
| Hamish Marson                  | Internet hamish@waikato.ac.nz             |
| Computer Support Person        | Phone (071)562889 xt 8181                 |
| Computer Science Department    | Amiga 3000 for ME!                        |
| University of Waikato          |                                           |
==============================================================================
|Disclaimer: Anything said in this message is the personal opinion of the   |
|            finger hitting the keyboard & doesn't represent my employers   |
|            opinion in any way. (ie we probably don't agree)               |
==============================================================================
daveh@cbmvax.commodore.com (Dave Haynie) (06/08/90)
In article <664.266d2e99@waikato.ac.nz> hamish@waikato.ac.nz writes:
>In article <11996@cbmvax.commodore.com>, valentin@cbmvax.commodore.com (Valentin Pepelea) writes:

>> The 68040 does not have a floating point coprocessor, nor the coprocessor
>> interface of the 68020 & 68030.

>The 68040 DOES have a floating point coprocessor.

You're arguing trivia here, for the most part. The 68040 doesn't have a
floating point coprocessor, at least in the traditional sense. It does that
idea one better by having an internal floating point execution unit which is
a first class processing unit with some of its own buses and everything.
That makes it much faster than a coprocessor, which is an external device
that sits next to an integer processor and generally relies on that
processor's integer unit to feed it instructions and data via some special
protocol. All of which makes things slower.

From the programmer's point of view, though, you have the better part of a
68882 in the 68040, assuming the factor of 10 or so speedup on the
hardware-implemented instructions doesn't present a problem. With an F-line
math package to cover the missing 68882 codes, you wouldn't know any
difference other than speed. But in truth, there is no coprocessor.

>Compare this to the 68030. Are you going to say that it doesn't have an MMU,
>even though it's in the same boat? I.e. a subset of the 68851, on the same
>piece of silicon.

Valentin was saying it didn't have a floating point coprocessor, not that it
didn't have a floating point unit. The 68851 connected to the 68020 uses the
coprocessor protocol to manage its registers and all, while the 68030 does
this all internally, without dropping through the coprocessor protocol. So I
would say the 68030 has an MMU, but I wouldn't say the 68030 has an MMU
coprocessor. Like I said, trivia.

--
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh    PLINK: hazy    BIX: hazy
      "I have been given the freedom to do as I see fit" -REM