[comp.os.minix] Floating point support in GCC

awb@uk.ac.ed.aipna (Alan W Black) (04/15/91)

In-reply-to: dhinds@elaine1.Stanford.EDU's message of 14 Apr 91 18:41:08 GMT
   In article <AWB.91Apr13144237@mink.uk.ac.ed.aipna> awb@uk.ac.ed.aipna (Alan W Black) writes:
   >
   >Much of the gnu software has been ported to MINIX 386. It is currently on
   >plains.nodak.edu (134.129.111.64 in pub/Minix/uk).  This includes
   >binaries and patches for gcc 1.37.1, gas and the bin utilities, and patches to
   >the MINIX libraries.  There is also a port of gnu emacs 18.55 (which
   >contains a rather nasty bug regarding killing of sub-processes.  I've
   >since fixed this and I can send you a new set of patches if you want).
   >The gcc *does* support floating point but not in a very efficient way.

      What sort of floating point support is this?  I'm new to Minix, but
   I'm told the regular C compiler doesn't do floating point at all.  Does
   the gcc port support coprocessor emulation, or does it need a coprocessor?
   In what way is it inefficient?  Is there a reasonably standard libm?

    -David Hinds
     dhinds@cb-iris.stanford.edu

GCC supports floating point.  The machine description for the 386
is such that GCC on a 386 can either generate 387 code or call software
functions that emulate it.  In our actual implementation we use the
distributed 386 description, which unfortunately still generates some 387
instructions even when -msoft-float (the default) is used.  Basically, it
calls software functions to do the floating point arithmetic but expects to
pass their arguments and receive their results via a register on the 387.
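
To make "software functions to do the floating point" concrete, here is a
minimal sketch in C of two such routines that work on the IEEE 754 bit
pattern with integer operations only.  The names soft_negdf and soft_ltdf are
hypothetical (not the names in gcc's support library), the code assumes
64-bit IEEE doubles and a C99 <stdint.h> that 1991 compilers did not have,
and a real soft-float add or multiply is much longer.  The arguments are
still declared double: how they travel between caller and routine is up to
the compiler's calling convention, which is exactly where the distributed
386 description still goes through the 387.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT 0x8000000000000000ULL

/* Copy a double's IEEE 754 bit pattern into a 64-bit integer and back. */
static uint64_t dbits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double dfrom(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Negation needs no floating point hardware: flip the sign bit. */
double soft_negdf(double a)
{
    return dfrom(dbits(a) ^ SIGN_BIT);
}

/* Map a double to an unsigned key whose integer order follows numeric
   order: negatives are bit-inverted, non-negatives get the sign bit set.
   NaNs are not handled, and -0.0 sorts just below +0.0. */
static uint64_t order_key(double d)
{
    uint64_t u = dbits(d);
    return (u & SIGN_BIT) ? ~u : (u | SIGN_BIT);
}

/* a < b using integer comparison only. */
int soft_ltdf(double a, double b)
{
    return order_key(a) < order_key(b);
}
```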

As we don't have a 387, we got round the problem as follows.  When code
is linked, a floating point exception handler is also linked in.  When
a 387 instruction is executed, the handler is called and the instruction
is emulated.  This works but is not at all fast; however, since only a
very small number of instructions are involved, the handler doesn't get
called very often.
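
The trap-and-emulate idea can be sketched as follows.  This is an
illustrative guess, not the handler actually shipped on plains: it models the
387 register stack in software and emulates only FLD and FSTP of a 64-bit
double (opcode DD with ModRM reg field 0 and 3), on the assumption that these
are the kinds of instructions used to move arguments and results through
ST(0).  A real handler must also find the faulting instruction, compute the
effective address from the ModRM byte and the saved registers, and advance
the saved program counter; here the caller is assumed to have done that and
passes mem.  The struct and function names are hypothetical.

```c
/* Hypothetical software model of the 387 register stack.
   st[top-1] plays the role of ST(0). */
struct soft387 {
    double st[8];
    int top;                     /* number of occupied slots, 0..8 */
};

/* Every 387 instruction starts with an ESC opcode, D8..DF. */
int is_x87_escape(unsigned char op)
{
    return (op & 0xF8) == 0xD8;
}

/* Emulate one trapped instruction.  Handles only FLD m64 (DD /0) and
   FSTP m64 (DD /3); mem is the already-decoded, already-checked operand.
   Returns 0 on success, -1 if unsupported (the handler would then
   deliver a signal instead). */
int emulate_387(struct soft387 *f, const unsigned char *insn, double *mem)
{
    int mod, reg;

    if (insn[0] != 0xDD)
        return -1;
    mod = insn[1] >> 6;          /* ModRM mod == 3 means a register operand */
    reg = (insn[1] >> 3) & 7;    /* ModRM reg field selects the operation */
    if (mod == 3)
        return -1;

    switch (reg) {
    case 0:                      /* FLD qword [mem]: push onto the stack */
        if (f->top == 8)
            return -1;           /* stack overflow */
        f->st[f->top++] = *mem;
        return 0;
    case 3:                      /* FSTP qword [mem]: store, then pop */
        if (f->top == 0)
            return -1;           /* stack underflow */
        *mem = f->st[--f->top];
        return 0;
    default:
        return -1;
    }
}
```

For example, the bytes DD 00 (fld qword [eax]) push *mem, and DD 18
(fstp qword [eax]) store and pop it again.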

The correct solution to the problem is to modify the GCC 386 machine
description so that it never generates any 387 instructions.  This
might be trivial for someone who understands machine descriptions, but
we don't, and we haven't tried to do it.  If anyone knows how to do this,
please let me know.

I am not sure how this would work if you actually have a 387.  I
suspect you could recompile gcc without -msoft-float so that it
generates 387 instructions, leave out the floating point handler when
linking, and it should work.

I should add that we only support doubles, not floats, because we
only wrote the software functions for doubles.  As a result you sometimes
need to #define float double to make some pieces of code work.
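
A minimal sketch of that workaround, assuming the rest of the program is
ordinary C: the macro turns every float into a double, so only the
double-precision routines are ever called.  Scoping it with #undef keeps it
out of headers included later.  Anything that depends on sizeof(float)
changes meaning (binary file layouts, or scanf("%f", ...), which would then
need "%lf"), so check those first.  The function average is a hypothetical
example.

```c
/* Make every float a double so only double routines are needed. */
#define float double

/* Written with float, but compiled as double throughout. */
float average(const float *v, int n)
{
    float sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return n > 0 ? sum / n : 0;
}

/* Pitfall check: the size of "float" has changed. */
enum { FLOAT_IS_DOUBLE = sizeof(float) == sizeof(double) };

#undef float
```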

The rest of the floating point code (libm.a) is from Fred Fish's
portable math library, which is available from atari.archive.umich.edu
(141.211.164.8 in atari/gnustuff/minix).  We find this quite adequate
for all our floating point stuff (Prologs, Schemes and Lisps).  We
include a compiled version of this library in our distribution on plains.

Alan

Alan W Black                          80 South Bridge, Edinburgh, UK
Dept of Artificial Intelligence       tel: (+44) -31 650 2713
University of Edinburgh               email: awb@ed.ac.uk

st12a@menudo.uh.edu (richard henderson~) (04/16/91)

In article <AWB.91Apr15132912@mink.uk.ac.ed.aipna> awb@uk.ac.ed.aipna (Alan W Black) writes:
>The correct solution to the problem is to modify the GCC 386 machine
>description so that it never generates any 387 instructions.  This

Would not a better solution be to add a 387 emulator to the MINIX kernel?

I have recently ordered my copy of MINIX 1.5 as a cost-effective alternative to
Xenix, and I am interested in pursuing this topic.  However, I would like to
hook up with someone who knows the kernel a bit more intimately.

--------
richard~
richard@stat.tamu.edu
st12a@menudo.uh.edu

richard@aiai.ed.ac.uk (Richard Tobin) (04/17/91)

>>The correct solution to the problem is to modify the GCC 386 machine
>>description so that it never generates any 387 instructions.  This

>Would not a better solution be to add a 387 emulator to the MINIX kernel?

Excellent idea.  I look forward to receiving it...  :-)

Seriously, this would be very useful, but it's a fairly difficult task -
you have to decode the instructions, check that the process can legally
access the operands, and deal with switching between processes as well
as actually doing the arithmetic.

It's also likely to result in slower code than just calling procedures.
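
As an illustration of the second step above, here is a sketch of the bounds
check a kernel emulator would need before touching a memory operand on a
process's behalf.  The struct seg type and operand_ok function are
hypothetical simplifications (a base and a length in bytes, not MINIX's real
memory map); the point is to reject addresses outside the process and to
avoid unsigned wraparound when adding the operand size.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a process's data segment. */
struct seg {
    uint32_t base;   /* first valid address */
    uint32_t len;    /* segment length in bytes */
};

/* May the emulator access size bytes at addr for this process?
   Compares offsets instead of computing addr + size, so a huge
   addr cannot wrap around and pass the check. */
int operand_ok(const struct seg *s, uint32_t addr, uint32_t size)
{
    if (addr < s->base)
        return 0;
    if (size > s->len)
        return 0;
    return addr - s->base <= s->len - size;
}
```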

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

awb@uk.ac.ed.aipna (Alan W Black) (04/17/91)

In article <1991Apr16.135942.18650@menudo.uh.edu> st12a@menudo.uh.edu (richard henderson~) writes:
   In article <AWB.91Apr15132912@mink.uk.ac.ed.aipna> awb@uk.ac.ed.aipna (Alan W Black) writes:
   >The correct solution to the problem is to modify the GCC 386 machine
   >description so that it never generates any 387 instructions.  This

   Would not a better solution be to add a 387 emulator to the MINIX
   kernel?  I have recently ordered my copy of MINIX 1.5 as a
   cost-effective alternative to Xenix, and I am interested in pursuing
   this topic.  However, I would like to hook up with someone who
   knows the kernel a bit more intimately.

This is true; we exchanged mail with Richard Stallman on this issue and
his recommendation was that the operating system should provide the
emulation.  This is probably right, but it will always be slow if floating
point instructions have to be interpreted.  Our solution is sort of
halfway, in that it is the compiler support code (added by ld) that
provides the emulation, rather than the kernel.

I think kernel emulation would be right but always slow, whereas changing
the gcc 386 description would be easier and more efficient.  I'm willing
to be proved wrong, though.

Alan

mc2@maccs.dcss.mcmaster.ca (Dan McCrackin) (04/17/91)

Hello, folks!  I've been enjoying comp.os.minix for quite some time now
(I'm running PH 1.5.10 + BDE 386 + virtual consoles + shared text + gcc on
a 25MHz 80386 with 80387), and thought I would throw my $.02 in:


In article <AWB.91Apr15132912@mink.uk.ac.ed.aipna> awb@uk.ac.ed.aipna (Alan W Black) writes:
>
>I am not sure how this would work if you actually have a 387.  I
>suspect you could recompile gcc without -msoft-float and it would 
>then generate 387 instructions, then do not load the floating point
>handler and it should work.
>

	I was all set to get out my proverbial (software) hacksaw and pliers
to get my coprocessor going, when I realized that there is, alas, a major 
problem with using a '387 (or '87 or '287 for that matter) under Minix.  At
present the kernel doesn't save the context of a coprocessor during a
context switch.  This is fine if you are only running one task that uses the
coprocessor, but would produce a fearsome mess with more than one task. :-(
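
A toy model in C makes the mess visible.  Everything is simplified and the
names are hypothetical: the "coprocessor" is a single shared register and
two tasks are interleaved by hand.  Without a save/restore in the context
switch, task A reads back task B's value; with one (solution (2) below, in
miniature) it gets its own value back.

```c
#include <stdbool.h>

/* Toy model: the whole coprocessor is one register shared by all tasks. */
static double fpu_reg;

struct task {
    double saved;   /* this task's copy of the coprocessor register */
};

/* Switch tasks; save_fpu chooses whether the kernel saves and
   restores the coprocessor context (like FSAVE/FRSTOR). */
static void context_switch(struct task *from, struct task *to, bool save_fpu)
{
    if (save_fpu) {
        from->saved = fpu_reg;
        fpu_reg = to->saved;
    }
}

/* Task A loads 1.0, is preempted, task B loads 2.0, then A resumes
   and reads back what it believes is its own value. */
double interleave(bool save_fpu)
{
    struct task a = { 0.0 }, b = { 0.0 };

    fpu_reg = 1.0;                      /* A: load 1.0 */
    context_switch(&a, &b, save_fpu);
    fpu_reg = 2.0;                      /* B: load 2.0 */
    context_switch(&b, &a, save_fpu);
    return fpu_reg;                     /* A: read its register */
}
```

Here interleave(false) hands A the value 2.0 that B loaded, while
interleave(true) returns A's own 1.0.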

There are three possible solutions:
	(1) only run one fp task  (simple, but impractical)
	(2) modify mpx.x (the save and _restart sections) to always
	    save / restore the coprocessor context
	(3) use the MP and TS flags of CR0 (on the 80386) to implement
	    saving / restoring the coprocessor's context only as required.
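
Option (3) can be sketched as a small C simulation.  It assumes the usual
386 mechanism: the kernel sets TS on every context switch, the next
coprocessor instruction then raises the "coprocessor not available"
exception (exception 7), and the handler clears TS (CLTS) and saves the
previous owner's state and restores the current process's, but only if they
differ.  The FPU is again a single register, all names are hypothetical,
and the counters show that a process doing no floating point never causes a
save or restore.

```c
#include <stddef.h>

struct proc {
    double fpu_state;               /* saved coprocessor context */
};

static double fpu_reg;              /* the coprocessor's live register */
static struct proc *fpu_owner;      /* whose context is in the coprocessor */
static struct proc *current;        /* process now running */
static int ts_flag;                 /* models CR0.TS */
static int saves, restores;         /* FSAVE/FRSTOR equivalents performed */

/* Context switch: no FPU work at all, just set TS. */
void switch_to(struct proc *p)
{
    current = p;
    ts_flag = 1;
}

/* Models exception 7, taken when an FP instruction runs with TS set. */
static void fpu_not_available(void)
{
    ts_flag = 0;                          /* CLTS */
    if (fpu_owner == current)
        return;                           /* our context is still loaded */
    if (fpu_owner != NULL) {
        fpu_owner->fpu_state = fpu_reg;   /* save the previous owner */
        saves++;
    }
    fpu_reg = current->fpu_state;         /* restore the current process */
    restores++;
    fpu_owner = current;
}

/* Two FP "instructions" the toy processes can execute. */
void fp_store(double v) { if (ts_flag) fpu_not_available(); fpu_reg = v; }
double fp_load(void)    { if (ts_flag) fpu_not_available(); return fpu_reg; }
```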

	The problem with (3) is that while it is the most efficient approach
(save only when absolutely needed), it would (IMHO) complicate the kernel
a fair bit.
	The problem with (2) is (according to ye olde Intel manual)
an FSAVE takes > 100 clock cycles and dumps 90-some-odd bytes from the
coprocessor.  FRSTOR has similar characteristics. The $64K question is
how much impact would this have on system performance?  At least the 
method of (2) is generally applicable to 8087's through 80386's.


I ask:

(1) Has anybody patched the kernel to support the coprocessor?  (Bruce?)
If so, could you please send me a copy of the patches? 

(2) Would anybody be interested in such a patch? (I'm willing to see if I
can get solution (2) to work.)  

On an unrelated tack:
(3) Is anyone looking at getting paging (a la "virtual memory") to go on
the 386?  


				Thanks,

				  Dan McCrackin


-- 
Daniel C. McCrackin (mc2@maccs.dcss.mcmaster.ca)
Dept. of Electrical and Computer Engineering
McMaster University, Hamilton, Ontario, Canada

wilker@descartes.math.purdue.edu (Clarence Wilkerson) (04/19/91)

Wouldn't setjmp and longjmp have to be changed to support  saving '87
registers?
Clarence Wilkerson

adrie@philica.ica.philips.nl (Adrie Koolen) (04/19/91)

In article <280C657A.14449@maccs.dcss.mcmaster.ca> mc2@maccs.dcss.mcmaster.ca (Dan McCrackin) writes:
>	I was all set to get out my proverbial (software) hacksaw and pliers
>to get my coprocessor going, when I realized that there is, alas, a major 
>problem with using a '387 (or '87 or '287 for that matter) under Minix.  At
>present the kernel doesn't save the context of a coprocessor during a
>context switch.  This is fine if you are only running one task that uses the
>coprocessor, but would produce a fearsome mess with more than one task. :-(
>
>There are three possible solutions
>	(1) only run one fp task  (simple, but impractical)
>	(2) modify mpx.x (the save and _restart sections) to always
>	    save / restore the coprocessor context
>	(3) use the MP and TS flags of CR0 (on the 80386) to implement
>	    saving / restoring the coprocessor's context only as required.
>
>	The problem with (3) is that while it is the most efficient approach
>(save only when absolutely needed), it would (IMHO) complicate the kernel
>a fair bit.

When I ported Minix to the Sun SparcStation, I didn't support the FPU
at first. However, every SparcStation has an FPU which is quite fast
(9 processor cycles to multiply two doubles!), so I added FPU support
to the kernel. Initially, the FPU is disabled for all user processes.
When a user process tries to execute a floating point instruction, it
is trapped. The trap handler marks the process as an FPU user and
enables the FPU for this process. From then on, the (32) FPU registers
are saved and restored at task switches (yes, also if there's only one
process using the FPU). It will take some time, but I can (i.e. have
to) live with that. The overhead isn't that bad.
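
The scheme above might look roughly like this in C.  It is a sketch, not the
Minix-Sparc code: a process starts with the FPU disabled, its first FP
instruction traps, the trap handler marks it as an FPU user and enables the
FPU, and from then on every task switch saves and restores its registers,
while processes that never use the FPU cost nothing.  The struct fields, the
single-register FPU and the fpu_copies counter are hypothetical.

```c
struct proc {
    int uses_fpu;        /* set by the first trapped FP instruction */
    double fpu_state;    /* saved FPU registers (one register here) */
};

static double fpu_reg;   /* the live FPU register */
static int fpu_enabled;  /* FPU enabled for the running process? */
static int fpu_copies;   /* register saves plus restores performed */

/* Trap: a process with the FPU disabled tried an FP instruction. */
static void fpu_disabled_trap(struct proc *p)
{
    p->uses_fpu = 1;     /* from now on it is switched with FPU state */
    fpu_reg = 0.0;       /* give it a clean FPU */
    fpu_enabled = 1;
}

/* Task switch: FPU users are always saved/restored, even if only one
   process uses the FPU; non-users cost nothing. */
void task_switch(struct proc *from, struct proc *to)
{
    if (from->uses_fpu) {
        from->fpu_state = fpu_reg;
        fpu_copies++;
    }
    if (to->uses_fpu) {
        fpu_reg = to->fpu_state;
        fpu_copies++;
    }
    fpu_enabled = to->uses_fpu;
}

/* FP "instructions" executed by the running process p. */
void fp_set(struct proc *p, double v) { if (!fpu_enabled) fpu_disabled_trap(p); fpu_reg = v; }
double fp_get(struct proc *p) { if (!fpu_enabled) fpu_disabled_trap(p); return fpu_reg; }
```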

The changes I made in the kernel were quite small. I guess that the FPU
changes for a 387, which has exception handling comparable to the
Sparc FPU's, will also be quite simple.

>	The problem with (2) is (according to ye olde Intel manual)
>an FSAVE takes > 100 clock cycles and dumps 90-some-odd bytes from the
>coprocessor.  FRSTOR has similar characteristics. The $64K question is
>how much impact would this have on system performance?  At least the 
>method of (2) is generally applicable to 8087's through 80386's.

There will certainly be some (substantial) overhead, but I think that
it won't be much more than 1% (rough estimate: if you `lose' 25us per
task switch saving/restoring FPU registers and you get 100 switches
per second, you lose 0.25% CPU performance). Compare that with the
speed you gain by using your 387 and you'll agree that it's worth the
trouble. As a bonus, you won't lose any performance when no processes
use the FPU!
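
The estimate above is simple arithmetic; as a sketch, the two helpers below
make it explicit (the function names are hypothetical).  Taking the quoted
figure of more than 100 cycles each for FSAVE and FRSTOR on a 25 MHz 386
gives at least 200 cycles, or at least 8 us per switch, so a 25 us allowance
is generous.  All of these figures are rough.

```c
/* Time for the given number of clock cycles at mhz MHz, in microseconds. */
double cycles_to_us(double cycles, double mhz)
{
    return cycles / mhz;
}

/* Fraction of CPU time spent saving/restoring FPU state. */
double fpu_overhead(double us_per_switch, double switches_per_sec)
{
    return us_per_switch * switches_per_sec / 1e6;
}
```

For example, fpu_overhead(25.0, 100.0) is about 0.0025, i.e. the 0.25%
quoted above, and cycles_to_us(200.0, 25.0) is 8.0.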

Adrie Koolen (adrie@ica.philips.nl)
Philips Innovation Centre Aachen

PS. With Minix-Sparc and a brute-force mandelbrot program, I can
generate full screen (1152*900) parts of the mandelbrot set within
ONE minute!

evans@syd.dit.CSIRO.AU (Bruce.Evans) (04/21/91)

In article <10752@mentor.cc.purdue.edu> wilker@descartes.math.purdue.edu (Clarence Wilkerson) writes:
>Wouldn't setjmp and longjmp have to be changed to support  saving '87
>registers?

Not for most implementations, because the '87 registers are usually not
left alive across function calls (because it would be difficult to handle
'87 stack overflow).
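
A small C illustration of the point above, assuming a compiler that keeps no
'87 values live across calls (setjmp is itself a call): a floating point
value that must survive a longjmp lives in memory, so jmp_buf need not hold
any coprocessor state.  The volatile qualifier is what the C standard
requires for a local modified between setjmp and longjmp to have a defined
value afterwards.  The function names are hypothetical.

```c
#include <setjmp.h>

static jmp_buf bail_env;

/* Abandon a nested computation. */
static void bail_out(void)
{
    longjmp(bail_env, 1);
}

/* acc is volatile, so it lives in memory rather than a register and
   its value after longjmp is well defined. */
double compute(void)
{
    volatile double acc = 1.5;

    if (setjmp(bail_env) == 0) {
        acc = acc * 2.0;   /* 3.0 */
        bail_out();
        acc = -1.0;        /* not reached */
    }
    return acc;
}
```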
-- 
Bruce Evans		evans@syd.dit.csiro.au

richard@aiai.ed.ac.uk (Richard Tobin) (04/22/91)

In article <10752@mentor.cc.purdue.edu> wilker@descartes.math.purdue.edu (Clarence Wilkerson) writes:
>Wouldn't setjmp and longjmp have to be changed to support  saving '87
>registers?

Yes, in theory at least.  As far as I recall, gcc doesn't do much with
the 387, so maybe in practice there's nothing that needs saving outside
floating point expressions.

-- Richard
