[comp.os.msdos.programmer] Performance penalty from EMM386 ?!?!

neuron@tellabs.com (Don Graft) (06/26/91)

Respected net people, I'm having a problem with EMM386.SYS (under DOS 4.01
and now 5.0) that defies logic.  I have a simulation program written in
Borland C++ small model. It just does a 4th-order Runge-Kutta system of
ODEs to simulate a series of masses and springs. It uses the Borland
software floating point support (i.e., no math chip). The Borland BGI
stuff is used to display results to a VGA graphics mode.  There is no use of
expanded memory or anything fancy going on with memory management (that I
know of).

When I run this simulation without EMM386.SYS installed, it blazes.  When I
install EMM386.SYS with the noems switch, it crawls. We're talking 45 seconds
with EMM386 versus 20 seconds without for a typical run; this is no quibbling
about a few percent.

So, what the h is going on? I don't see any slowdowns in other programs, but
admittedly this is the only computationally intensive program I run. I'm
desperate for a solution as this is a feasibility test for a larger simulation
of neural holography. I will need as much memory as I can get and won't want
to wait unnecessary hours for runs to complete. (Coprocessor upgrade time!)

Thank you for your attention. Posted replies would be appreciated as this may
have general relevance. BTW, my system is an Amstrad 386 20MHz with 4M.

Donald Graft

feg@cbnewsb.cb.att.com (forrest.e.gehrke) (06/26/91)

In article <6455@tellab5.tellabs.com> neuron@tellabs.com (Don Graft) writes:
>Respected net people, I'm having a problem with EMM386.SYS (under DOS 4.01
>and now 5.0) that defies logic.  I have a simulation program written in
>Borland C++ small model. It just does a 4th-order Runge-Kutta system of
>ODEs to simulate a series of masses and springs. It uses the Borland
>software floating point support (i.e., no math chip). The Borland BGI
>stuff is used to display results to a VGA graphics mode.  There is no use of
>expanded memory or anything fancy going on with memory management (that I
>know of).
>
>When I run this simulation without EMM386.SYS installed, it blazes.  When I
>install EMM386.SYS with the noems switch, it crawls. We're talking 45 secs
>versus 20 seconds for a typical run; this is no quibbling about a few
>percent.

Having spent several hours trying to get HIMEM.SYS and EMM386.SYS to
give me as much usable low memory as QEMM386 does (both with dos=high),
and, BTW, nearly getting there (600KB vs 632KB), I can confirm the big
slowdown you encountered.  Not only does floating point emulation
take a big hit, but so does display writing: something on the order
of 50%.

QEMM also encounters this problem but not nearly to this extent--about
10 to 15% when using programs compiled with MSC or Borland's C.

If you can compile your program with MSC, try their Alternate Floating
Point emulator; you will encounter no slowdown with it, though at the
cost of some accuracy.  This applies only to floating point emulation.
If you had a coprocessor, you would not see any slowdown in the
arithmetic, but display writing would still be affected.

Tests I have done with floating point programs compiled with the
Zortech v2.1 C compiler show no slowdown with their emulator when
using expanded memory managers, but their emulator is much slower
to begin with.

These results appear to point to the compilers introducing a problem
in programs when working with expanded memory managers, and not to the
expanded memory managers themselves.

-Forrest Gehrke feg@dodger.att.com

randys@cpqhou.uucp (Randy Spurlock) (06/27/91)

in article <1991Jun26.120951.18540@cbfsb.att.com>, feg@cbnewsb.cb.att.com (forrest.e.gehrke) says:
> 
>>     ***** Program Description Deleted *****
>>
>>When I run this simulation without EMM386.SYS installed, it blazes.  When I
>>install EMM386.SYS with the noems switch, it crawls. We're talking 45 secs
>>versus 20 seconds for a typical run; this is no quibbling about a few
>>percent.
> 
> Having spent several hours trying to get HIMEM.SYS and EMM386.SYS to
> give me as much usable low memory as QEMM386 does (both with dos=high)
> and, BTW I nearly got there-600KB vs 632KB, I can confirm the big
> slowdown you encountered.  Not only does floating point emulation
> take a big hit but so does display writing--something of the order
> of 50%.
> 
> QEMM also encounters this problem but not nearly to this extent--about
> 10 to 15% when using programs compiled with MSC or Borland's C.
> 
> If you can compile your program with MSC, try their Alternate Floating
> Point emulator, you will encounter no slowdown with it, but with
> less accuracy.  This applies only to floating point emulation.  
> If you had a coprocessor, you would not see any slowdown, but display 
> writing would still be affected.
> 
> Tests I have done with floating point programs compiled with the
> Zortech v2.1 C compiler show no slowdown with their emulator when
> using expanded memory managers, but their emulator is much slower
> to begin with.
> 
> These results appear to point to the compilers introducing a problem
> in programs when working with expanded memory managers, and not to the
> expanded managers themselves.
> 
> -Forrest Gehrke feg@dodger.att.com

	The problem lies with EMM386 and not the compiler. The 386
	expanded memory managers perform their magic via the 386's virtual
	8086 mode with paging enabled. This mode allows 8086 real mode
	emulation, with paging used to map physical memory into different
	positions in the logical address space. This is all very neat
	stuff...but there is one side effect of running in virtual 8086
	mode. Software interrupts, e.g. BIOS interrupts, floating-point
	emulation packages, etc., all get trapped and vectored to a
	"virtual" mode interrupt handler, which has to direct the interrupt
	to the correct "real mode" handler. All of this "handling" takes
	time, and depending upon how the EMS manager software was written
	this can result in a slowdown of anywhere from 10 to 50%. The video
	can also be affected, because some of the video BIOS routines make
	nested BIOS calls, i.e. more software interrupts that take even
	more time. I don't like the slowdown, but given the choice between
	EMS and no EMS, I'll take the slowdown every time.


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
          - Randy Spurlock -	      |      Compaq Computer Corporation    
---------------------------------------------------------------------------
These opinions are mine...all mine... | He fired his hyper-jets and...  
just ask anyone who's heard them!     | blasted into the 5th dimension!
--------------------------------------| 
UUCP: ...!uunet!cpqhou!randys         |                     Space Man Spiff
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 

ralf+@cs.cmu.edu (Ralf Brown) (06/27/91)

In article <1991Jun26.120951.18540@cbfsb.att.com> feg@cbnewsb.cb.att.com (forrest.e.gehrke) writes:
}Having spent several hours trying to get HIMEM.SYS and EMM386.SYS to
}give me as much usable low memory as QEMM386 does (both with dos=high)
}and, BTW I nearly got there-600KB vs 632KB, I can confirm the big
}slowdown you encountered.  Not only does floating point emulation
}take a big hit but so does display writing--something of the order
}of 50%.
}
}QEMM also encounters this problem but not nearly to this extent--about
}10 to 15% when using programs compiled with MSC or Borland's C.

A slight slowdown is to be expected when running in virtual-86 mode
(as all 386 memory managers do when providing EMS).  In V86 mode, *any*
interrupt kicks the processor into full protected mode.  The memory
manager then has to mess with the V86 stack and simulate a real-mode
interrupt by reading the real-mode interrupt vector table and dropping
back into V86 mode.  On return, the processor gets kicked back into 
protected mode, the memory manager messes with the real-mode stack again,
and finally returns to V86 mode for a second time to complete the
program's interrupt call.  QEMM does all of that much faster than EMM386--
Manifest reports that the timer interrupt latency is on average five
times as high for EMM386 as for QEMM; comparing worst cases for the two
yields a factor of *twelve*.  It can take over TWO MILLISECONDS for
Manifest to get control once the clock interrupt is invoked, when running
under EMM386 on my 386/33.

}Tests I have done with floating point programs compiled with the
}Zortech v2.1 C compiler show no slowdown with their emulator when
}using expanded memory managers, but their emulator is much slower
}to begin with.

No slowdown probably means it uses calls rather than interrupts.


Please excuse any garbage, I hit a really trashy line this time.



-- 
{backbone}!cs.cmu.edu!ralf  ARPA: RALF@CS.CMU.EDU   FIDO: Ralf Brown 1:129/53
BITnet: RALF%CS.CMU.EDU@CARNEGIE   AT&Tnet: (412)268-3053 (school)   FAX: ask
DISCLAIMER?  Did  | It isn't what we don't know that gives us trouble, it's
I claim something?| what we know that ain't so.  --Will Rogers

Ralf.Brown@B.GP.CS.CMU.EDU (06/28/91)

In article <13664@pt.cs.cmu.edu>, ralf+@cs.cmu.edu (Ralf Brown) wrote:
}In article <1991Jun26.120951.18540@cbfsb.att.com> feg@cbnewsb.cb.att.com (forrest.e.gehrke) writes:
}A slight slowdown is to be expected when running in virtual-86 mode
}[going between V86 and protected mode for interrupts]
}and finally returns to V86 mode for a second time to complete the
}program's interrupt call.  QEMM does all of that much faster than EMM386--
}Manifest reports that the timer interrupt latency is on average five
}times as high for EMM386 as for QEMM; comparing worst cases for the two
}yields a factor of *twelve*.  It can take over TWO MILLISECONDS for
}Manifest to get control once the clock interrupt is invoked, when running
}under EMM386 on my 386/33.

I wrote that from memory, and got a couple of the timings confused.
Below are the Manifest reports for QEMM v5.11 and EMM386.SYS, both under
plain DOS 5.0 and DOS 5.0 plus DESQview 2.31.  Note that DV adds some
overhead on the timer interrupt, but EMM386 is *so*bad* that DV actually
*improves* the average and worst-case performance!

                              (times in microseconds)
         EMS Function        Minimum  Maximum  Average
QEMM, DOS 5.0
    Timer Interrupt Latency       6       53       25
EMM386.SYS, DOS 5.0
    Timer Interrupt Latency       8     2036      316
QEMM, DOS5+DV
    Timer Interrupt Latency       5      156       59
EMM386.SYS, DOS5+DV
    Timer Interrupt Latency      46     1030      245

feg@cbnewsb.cb.att.com (forrest.e.gehrke) (06/28/91)

In article <1991Jun27.155203.3092@cpqhou.uucp> randys@cpqhou.uucp (Randy Spurlock) writes:
>
>	The problem lies with the EMM386 and not the compiler. The 386
>	expanded memory managers perform their magic via the 386 virtual
>	paged mode. This mode allows 8086 real mode emulation with paging
>	to map physical memory into different positions in the logical
>	address space. This is all very neat stuff...but there is one
>	side effect from running in the virtual 8086 mode. Software
>	interrupts, i.e. BIOS interrupts, floating-point emulation packages,
>	etc. all get trapped and vectored to a "virtual" mode interrupt
>	handler who has to direct the interrupt to the correct "real mode"
>	handler.


If you are right, then why are there such large differences between
the same program compiled by different compilers?

Under QEMM386, a program compiled with MSC v5.1 shows a 30% reduction
in floating-point speed, while the same program compiled with version
6.00a shows only 10%.  MSC's Alternate Floating Point library exhibits
no slowdown at all, and neither does Zortech's regular math library.

There must be more to it than supplied by your explanation.

In all of my tests, QEMM showed far less slowdown than EMM386.
EMM386 is a disaster.  I am not happy with the slowdown seen
with QEMM, but I will take that any day over EMM386.  Besides,
QEMM is much more versatile in poking stuff above 640KB.

-Forrest Gehrke feg@dodger.att.com