feg@cbnewsb.cb.att.com (forrest.e.gehrke) (05/14/91)
I have a Gateway 386/33 with Micronics Asic motherboard and no coprocessor. 4MB ram, 32 ram cache.

When I load qemm.sys and run a Whetstone benchmark (or any floating point operations program), I find that there is about a 30% reduction in speed. This occurs even with no features of qemm in use, i.e. merely loading qemm.sys with the line device=c:\qemm.sys with no other parameters in the config.sys file, and no programs loaded into high memory. Removing that line from config.sys will show a 30% improvement in floating point operations speed.

For example, Gateway includes with the Micronics motherboard a program QAPLUS which has the Whetstone benchmark. Without QEMM.SYS loaded this benchmark will report 202.1 whetstones/sec. With QEMM.SYS loaded this will drop to 154 whetstones/sec.

I can not envision any reason for these results unless QEMM inserts useless code to be executed when floating point operations are involved. The same QAPLUS includes the Dhrystone benchmark; there is no difference in its report with or without QEMM.SYS loaded.

Has anyone else noticed this? Is there a solution for it?

Forrest Gehrke feg@floyd.att.com
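A quick arithmetic check of the two Whetstone figures quoted in the post above, as a minimal sketch rather than anything from the original thread. The helper name `percent_change` is made up for illustration. It shows that the "30%" figure depends on which number is taken as the baseline.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0

without_qemm = 202.1  # Whetstone figure reported without QEMM.SYS loaded
with_qemm = 154.0     # Whetstone figure reported with QEMM.SYS loaded

# Baseline = no QEMM: how much speed is lost by loading it.
drop = percent_change(without_qemm, with_qemm)
# Baseline = QEMM loaded: how much speed is regained by removing it.
gain = percent_change(with_qemm, without_qemm)

print(f"loading QEMM:  {drop:+.1f}%")
print(f"removing QEMM: {gain:+.1f}%")
```

Measured against the unloaded figure the loss is roughly 24%, while the gain from removing the driver is roughly 31%, which is the figure that matches the "30% improvement" wording.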
dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (05/14/91)
In article <1991May14.123233.17734@cbfsb.att.com> feg@cbnewsb.cb.att.com (forrest.e.gehrke) writes:
>I have a Gateway 386/33 with Micronics Asic motherboard
>and no coprocessor. 4MB ram, 32 ram cache.
>
>When I load qemm.sys and run a Whetstone benchmark (or any
>floating point operations program), I find that there is
>about a 30% reduction in speed. This occurs even with
>no features of qemm in use, i.e. merely loading qemm.sys
>with the line device=c:\qemm.sys with no other parameters
>in the config.sys file, and no programs loaded into
>high memory.
...
>I can not envision any reason for these results unless
>QEMM inserts useless code to be executed when floating
>point operations are involved. The same QAPLUS includes
>the Dhrystone benchmark; there is no difference in its
>report with or without QEMM.SYS loaded.
>
>Has anyone else noticed this? Is there a solution
>for it?

I haven't noticed it, but haven't done any tests. Here's a guess at why it's happening:

Most compilers use interrupt calls in place of each floating point instruction to jump to the emulator when there's no FPU installed. (Borland's TP does this on the first call whether the emulator is installed or not; MS languages seem to do it on the first call only if there's an emulator there. Both of them patch the code back to a FPU instruction if there's an '87 there, so you only execute the interrupt once.)

QEMM runs the 386 in protected mode, with your session running in V86 mode. I don't have 386 timings handy, but I think the INT instruction is much slower in V86 mode than in real mode.

So it appears the only solution is to buy a 387 - then your program will be slower on the first pass through each instruction, but will go full speed after that.

Duncan Murdoch dmurdoch@watstat.waterloo.edu
reisert@mast.enet.dec.com (Jim Reisert) (05/15/91)
In article <1991May14.142323.1929@maytag.waterloo.edu>, dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes...
>In article <1991May14.123233.17734@cbfsb.att.com> feg@cbnewsb.cb.att.com (forrest.e.gehrke) writes:
>>I have a Gateway 386/33 with Micronics Asic motherboard
>>and no coprocessor. 4MB ram, 32 ram cache.
>>
>>When I load qemm.sys and run a Whetstone benchmark (or any
>>floating point operations program), I find that there is
>>about a 30% reduction in speed.
>
>So it appears the only solution is to buy a 387 - then your program will
>be slower on the first pass through each instruction, but will go full
>speed after that.

This doesn't seem right. I have a Cyrix coprocessor in my 386 box, and I suffer similar speed penalties as Forrest, when using programs that make heavy use of the coprocessor. It must be something else.

- Jim

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
"The opinions expressed here in no way represent the views of Digital Equipment Corporation."

James J. Reisert          Internet:  reisert@mast.enet.dec.com
Digital Equipment Corp.   UUCP:      ...decwrl!mast.enet!reisert
146 Main Street           Voice:     508-493-5747
Maynard, MA 01754         FAX:       508-493-0395
feg@cbnewsb.cb.att.com (forrest.e.gehrke) (05/15/91)
In article <22671@shlump.lkg.dec.com> reisert@mast.enet.dec.com (Jim Reisert) writes:
>
>In article <1991May14.142323.1929@maytag.waterloo.edu>, dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes...
>>In article <1991May14.123233.17734@cbfsb.att.com> feg@cbnewsb.cb.att.com (forrest.e.gehrke) writes:
>>>I have a Gateway 386/33 with Micronics Asic motherboard
>>>and no coprocessor. 4MB ram, 32 ram cache.
>>>
>>>When I load qemm.sys and run a Whetstone benchmark (or any
>>>floating point operations program), I find that there is
>>>about a 30% reduction in speed.
>>
>>So it appears the only solution is to buy a 387 - then your program will
>>be slower on the first pass through each instruction, but will go full
>>speed after that.
>
>This doesn't seem right. I have a Cyrix coprocessor in my 386 box, and I
>suffer similar speed penalties as Forrest, when using programs that make
>heavy use of the coprocessor. It must be something else.
>
>- Jim

Several people have responded directly and on this net to the effect that QEMM is trapping a floating point exception interrupt with each instruction, causing this slowdown.

My results using a whetstone benchmark were 202 whetstones/second without QEMM and 154 with QEMM. However, one of the people here at BTL with a machine identical to mine except with an Intel 387 installed reports 2650 whetstones/second with or without QEMM. Apparently the CPU only generates one interrupt at the beginning and then the 387 goes its merry way without any more interrupts for QEMM to handle.

One person suggested compiling the source with MSC using the parameter /FPa which emulates the 387. He speculates that this will operate in the same way as having a 387 installed. Of course, this is only useful if one has the C source for the program.

Forrest Gehrke feg@dodger.att.com
dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (05/15/91)
In article <22671@shlump.lkg.dec.com> reisert@mast.enet.dec.com (Jim Reisert) writes:
>
>In article <1991May14.142323.1929@maytag.waterloo.edu>, dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes...
>>In article <1991May14.123233.17734@cbfsb.att.com> feg@cbnewsb.cb.att.com (forrest.e.gehrke) writes:
>>>When I load qemm.sys and run a Whetstone benchmark (or any
>>>floating point operations program), I find that there is
>>>about a 30% reduction in speed.
>>
>>So it appears the only solution is to buy a 387 - then your program will
>>be slower on the first pass through each instruction, but will go full
>>speed after that.
>
>This doesn't seem right. I have a Cyrix coprocessor in my 386 box, and I
>suffer similar speed penalties as Forrest, when using programs that make
>heavy use of the coprocessor. It must be something else.
>
>Digital Equipment Corp. UUCP: ...decwrl!mast.enet!reisert

Very mysterious. I wrote a little test program in Turbo Pascal (it's attached below), that just runs loops multiplying 16 bit integers, 32 bit integers, and 64 bit doubles. I calibrated it under Desqview so that each loop took 5 seconds on my 486-25 (which has the coprocessor built in).

Times under various conditions are shown below; it sure looks as though QEMM has no effect when the floating point instructions are used, but causes a big slowdown when they're emulated.
Duncan Murdoch

                   Time in seconds
Type            Desqview   QEMM   Clean   Number of cycles
Integer            5.0      4.0    4.0        4150000
Longint            4.9      4.0    4.0         990000
FPU Doubles        5.0      4.0    4.0        2210000
Emul Doubles       6.4      5.2    3.7          60000

program timecalc;
{ Times integer and floating point arithmetic }
uses
  opdos;  { Object professional unit supplies the timer }
var
  i,j : integer;
  i1,i2 : integer;
  l1,l2 : longint;
  d1,d2 : double;
  start,stop : longint;
begin
  start := timems;
  i1 := 5;
  for i:=1 to 10000 do
    for j:=1 to 415 do
      i2 := i1*i1;
  stop := timems;
  writeln('Integers took ',(stop-start)/1000:8:3,' seconds.');

  start := timems;
  l1 := 5;
  for i:=1 to 10000 do
    for j:=1 to 99 do
      l2 := l1*l1;
  stop := timems;
  writeln('Longints took ',(stop-start)/1000:8:3,' seconds.');

  start := timems;
  d1 := 5;
  for i:=1 to 10000 do
    for j:=1 to 6 do  { use 6 for 87=n, 221 for 87=Y }
      d2 := d1*d1;
  stop := timems;
  writeln('Doubles took ',(stop-start)/1000:8:3,' seconds.');
end.
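The timing table can be read more directly as ratios. The following is a minimal sketch, not part of the original post. It assumes that "Number of cycles" is the total loop iteration count (10000 times the inner loop bound in the program, which matches the 60000 and 2210000 entries). The names `TIMINGS`, `qemm_slowdown` and `clean_us_per_cycle` are made up for illustration.

```python
# Seconds per run as posted: (Desqview, QEMM, clean boot), then the loop count.
TIMINGS = {
    "Integer":      (5.0, 4.0, 4.0, 4_150_000),
    "Longint":      (4.9, 4.0, 4.0, 990_000),
    "FPU Doubles":  (5.0, 4.0, 4.0, 2_210_000),
    "Emul Doubles": (6.4, 5.2, 3.7, 60_000),
}

def qemm_slowdown(row: str) -> float:
    """QEMM time divided by clean-boot time for one table row."""
    _desqview, qemm, clean, _cycles = TIMINGS[row]
    return qemm / clean

def clean_us_per_cycle(row: str) -> float:
    """Microseconds per loop iteration on the clean boot."""
    _desqview, _qemm, clean, cycles = TIMINGS[row]
    return clean / cycles * 1e6

for row in TIMINGS:
    print(f"{row:<13} QEMM/clean = {qemm_slowdown(row):.2f}   "
          f"clean: {clean_us_per_cycle(row):6.2f} us/cycle")
```

On these numbers only the emulated-doubles row differs between QEMM and the clean boot (a ratio of about 1.41), which is the pattern the post describes: no effect with real FPU instructions, a large slowdown when they are emulated.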