[comp.unix.i386] Why do you need a 387 to run X11R3?

rtaylor@batcomputer.tn.cornell.edu (Russ Taylor) (03/08/90)

Please respond to cchase@ee.cornell.edu
Mr. Taylor is kind enough to post this for me, but has no interest in this topic

The conventional wisdom seems to be that in order to run X11R3 (either
the ESIX port or the Interactive port) you should have:
a) a fast cpu
b) gobs of memory
c) floating point support

I can see the fast cpu and RAM parts, but why the floating point?  I 
just took a (very quick) look through the X11R3 server source, and there
doesn't seem to be much floating point.  In fact, except for drawing arcs, 
I can't see why any floating point is needed.

Can anyone shed some light on this? 

Craig Chase
cchase@ee.cornell.edu

-- 
  _____________________________________________________________________
 | ARPA: rtaylor@tcgould.tn.cornell.edu, russellt@tesla.ee.cornell.edu |
 |   UUCP: {cmc12,shasta,uw-beaver,rochester}!cornell!tesla!russellt   |
  ---------------------------------------------------------------------

jrh@mustang.dell.com (James R. Howard) (03/09/90)

 
 > The conventional wisdom seems to be that in order to run X11R3 (either
 > the ESIX port or the Interactive port) you should have:
 > a) a fast cpu
 > b) gobs of memory
 > c) floating point support
 > 
 > I can see the fast cpu and RAM parts, but why the floating point?  I 
 > just took a (very quick) look through the X11R3 server source, and there
 > doesn't seem to be much floating point.  In fact, except for drawing arcs, 
 > I can't see why any floating point is needed.
 > 
 > Can anyone shed some light on this? 

Well, the Athena Widget set uses quite a bit, especially in the Box Widget
code, which a LOT of clients use.  This is probably why it is necessary.
If you have ever had two identical systems side by side, one with, one
without the 387, you'd see the difference quite clearly.



--------------------------------------------------------------
James Howard
..cs.utexas.edu!dell!mustang!jrh   or    jrh@mustang.dell.com

"I've got a firm policy on gun control, if there's a gun       
around, I want to be the one controlling it."          
-- Clint Eastwood 
--------------------------------------------------------------

steve@nuchat.UUCP (Steve Nuchia) (03/10/90)

In article <9868@batcomputer.tn.cornell.edu> cchase@ee.cornell.edu (Craig Chase) writes:
>I can see the fast cpu and RAM parts, but why the floating point?  I 

When I cut my teeth, unix systems without FP hardware did the
emulation thing in library routines that were 10 to maybe
100 times slower than hardware, so a program that would spend
10% of its time doing FP would slow down by less than ten times.

For the amount of FP I do, I can live with that.
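
The arithmetic behind that estimate can be sketched roughly as follows.
This is only a back-of-the-envelope model: it assumes the non-FP part of
the program runs at full speed and only the FP fraction is scaled by the
emulation penalty. The function name `overall_slowdown` is made up for
illustration; the 10% share and the 10x and 100x penalties are the figures
quoted above.

```python
def overall_slowdown(fp_fraction, emulation_factor):
    """Whole-program slowdown when only the FP share runs emulation_factor times slower."""
    non_fp = 1.0 - fp_fraction                      # this share is unaffected
    return non_fp + fp_fraction * emulation_factor  # FP share is stretched by the penalty


# Figures from the post: 10% of run time in FP, emulation 10x to 100x slower.
for factor in (10, 100):
    slowdown = overall_slowdown(0.10, factor)
    print(f"emulation {factor:>3}x slower -> program {slowdown:.1f}x slower")
```

Under this model the 10x case comes out at about 1.9 times slower overall
and the 100x case at about 10.9 times, i.e. right around the ten-times mark.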

Unfortunately, the 386 sysV unixes do their FP in the kernel by
taking a trap on the unimplemented instruction.  This is _real_
slow.  Slow enough that simple awk scripts can take hours to
slog through a few hundred K of input.  Get the '87 -- it's
cheaper and easier than wishing you had until you do.
-- 
Steve Nuchia	      South Coast Computing Services      (713) 964-2462
"You have no scars on your face, and you cannot handle pressure." - Billy Joel

peter@ficc.uu.net (Peter da Silva) (03/12/90)

In article <20203@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
> When I cut my teeth, unix systems without FP hardware did the
> emulation thing in library routines that were 10 to maybe
> 100 times slower than hardware, so a program that would spend
> 10% of its time doing FP would slow down by less than ten times.

I'm moderately sure that Version 7 UNIX on the PDP-11 trapped an illegal
instruction and emulated it, just like the 80386 UNIXes do. It might be that
the PDP-11 was just quicker at handling the fault. I'm *sure* that the PDP-11
instructions were easier to parse than the 80386 ones. :->.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

steve@nuchat.UUCP (Steve Nuchia) (03/12/90)

In article <4.523N2ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <20203@nuchat.UUCP> I wrote:
>> When I cut my teeth, unix systems without FP hardware did the
>> emulation thing in library routines that were 10 to maybe

note that I specifically mean user-mode routines, no trapping
allowed.  Of course there are dozens of variations, no need
to enumerate them.

>I'm moderately sure that Version 7 UNIX on the PDP-11 trapped an illegal
>instruction and emulated it, just like the 80386 UNIXes do. It might be that

Not having immediate access to a V7 system I can't argue, but I
think that the mechanism is quite old.  I know that the V7 compiler
I used generated library calls rather than FP instructions, and
I know most of the low-priced sysIII boxes from the early eighties
did the same.  Come to think of it though, I'm not sure how many
of them even had FPA options available.  The v7 compiler had the
option of generating FP instructions, but I can't remember another
system off the top of my head, other than Sun, that uses that scheme.

It is an unpleasant trade off to have to make -- either you make FP
slower than it has to be (386 case, and other "transparent" schemes)
or you complicate the users' lives with N different versions of the
library and of each executable (Sun case).  Sun tried to choose the
third alternative by specing a single, always-present FP device for
the sun4 line, but less than a year after the sun4 shipped there
was a (slightly) different FPA option requiring different compile
options.  Sigh.

Dynamic linking is about the only good solution, but you still have
to allow the heavy FP users to compile for specific hardware to
avoid the two jumps per FLOP overhead.  There was a scheme proposed
in which illegal instructions were compiled into the program and
replaced at run time with the "right" instruction by the kernel.  Don't
know if that ever went out with a commercial system.

Personally I think the choice of doing FP emulation in the kernel
was regrettable, especially given the price-sensitivity common
to most 386 users and the astounding price of 387 chips.  But
I can understand why they went that way, with the huge push
for binary compatibility that is going on.

-- 
Steve Nuchia	      South Coast Computing Services      (713) 964-2462
"You have no scars on your face, and you cannot handle pressure." - Billy Joel

edhall@rand.org (Ed Hall) (03/12/90)

In article <4.523N2ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <20203@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>> When I cut my teeth, unix systems without FP hardware did the
>> emulation thing in library routines that were 10 to maybe
>> 100 times slower than hardware, so a program that would spend
>> 10% of its time doing FP would slow down by less than ten times.
>
>I'm moderately sure that Version 7 UNIX on the PDP-11 trapped an illegal
>instruction and emulated it, just like the 80386 UNIXes do. It might be that
>the PDP-11 was just quicker at handling the fault. I'm *sure* that the PDP-11
>instructions were easier to parse than the 80386 ones. :->.

Yes, V6 & V7 used illegal instruction traps to emulate floating-point
instructions.  It was dog slow, too.  Simply printing a floating point
value could take tens of milliseconds (one FP division and subtraction
per digit).  The signal mechanism was none too fast for Unix V6 & V7,
making a slow task even slower.  But I'll agree with the original poster
that '87 emulators seem even worse.  [Note that '87, '287 & '387s are
pretty similar from an instruction set perspective, so I'll just use
the term '87.]

There is a reason why '87 emulation is slower, however, and it's not
just the arcane instruction format.  Internally, the '87 does all
operations in 80-bit extended floating-point format.  This is true
even if the source and/or destination of a calculation is only 32-bit
(single-precision)!  As you might guess, all these bits can make
emulation REAL SLOW.  Also, operations need to do appropriate things
with infinity, NaN, indefinite, and denormals--things PDP-11 FP units
(or at least their emulators) never dreamed of.
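
To give a feel for the bookkeeping involved, here is a minimal sketch of
just one step an emulator has to perform: widening a 32-bit single into the
80-bit extended layout (1 sign bit, 15-bit exponent with bias 16383, 64-bit
significand with an explicit integer bit). It is an illustration only, not
code from any vendor's emulator; the function name `single_to_extended` is
hypothetical, and the input is assumed to be representable as a single.

```python
import struct


def single_to_extended(x):
    """Widen an IEEE single to x87 extended fields: (sign, exponent, significand)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF

    if exp == 0xFF:
        # Infinity or NaN: maximum exponent, integer bit set,
        # fraction bits moved to the top of the 63-bit fraction field.
        return sign, 0x7FFF, (1 << 63) | (frac << 40)
    if exp == 0:
        if frac == 0:
            return sign, 0, 0             # signed zero
        # Single-precision denormal: extended has range to spare, so normalise.
        shift = 24 - frac.bit_length()    # bring the top set bit up to bit 23
        frac <<= shift
        exp = 1 - shift
    else:
        frac |= 1 << 23                   # restore the hidden leading 1
    # Rebias the exponent (127 -> 16383) and left-justify 24 bits in 64.
    return sign, exp - 127 + 16383, frac << 40


for value in (1.0, -2.0, float("inf"), 2.0 ** -149):
    s, e, m = single_to_extended(value)
    print(f"{value!r:>24}: sign={s} exp={e:#06x} significand={m:#018x}")
```

Every arithmetic instruction then has to operate on that 64-bit significand
in integer code and round the result back to the destination width, which is
where the bulk of the emulation time would go.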

So, if you do any floating point whatever, save up your lunch money
and buy yourself an '87.  If you do a LOT of floating point, get a
Weitek and a good compiler for it--even the 387 is a bit of a wimp,
despite the fact that it is a few hundred times better than an
emulator.

		-Ed Hall
		edhall@rand.org

montnaro@spyder.crd.ge.com (Skip Montanaro) (03/13/90)

In article <20301@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:

   Personally I think the choice of doing FP emulation in the kernel
   was regrettable, especially given the price-sensitivity common
   to most 386 users and the astounding price of 387 chips.  But
   I can understand why they went that way, with the huge push
   for binary compatibility that is going on.

Sun also uses a kernel trap/emulation scheme on their SPARC machines. (They
use (or used to use) compiler flags on their 680x0 machines.) What a
disaster! Fortunately, the only machine that exhibits this problem is the
4/110. All other SPARC machines Sun sells have FPUs.

Skip (montanaro@crdgw1.ge.com)

brando@uiucme2.me.uiuc.edu (Brando W. Brown) (03/13/90)

Who said you needed a 387??? It seems that no matter what the 80x86 product,
every vendor recommends a 387; probably because they are still $400+. I
have Interactive's 386/ix with the X11 distribution, VP/ix, etc, and mine
runs plenty fast with a base 386/25MHz and 8 MB of core RAM. I also still
reserve judgment on whether tons of swap actually helps. I have configured
20 MB of swap compared with 10 MB with no performance difference.

When everyone is up to the 486 level, a math coprocessor is on-board so that
throws the 387 discussion out the window.

Brandon Brown
uunet!uiucuxc!addamax!brown

jeff@samna.UUCP (jeff) (03/14/90)

In article <20203@nuchat.UUCP> (Steve Nuchia) writes:
:In article <9868@batcomputer.tn.cornell.edu> (Craig Chase) writes:
:>I can see the fast cpu and RAM parts, but why the floating point?  I 
:
:Unfortunately, the 386 sysV unixes do their FP in the kernel by
:taking a trap on the unimplemented instruction.  This is _real_
:slow.  Slow enough that simple awk scripts can take hours to
:slog through a few hundred K of input.  Get the '87 -- it's
:cheaper and easier than wishing you had until you do.

This is all very interesting but can anyone answer the question?

I.e. Does having an FP chip speed up X substantially (exclusive of
arc-drawing), and, if so, why?

Jeff

clay@uci.mn.org (Clayton Haapala) (03/14/90)

In article <20301@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>Dynamic linking is about the only good solution, but you still have
>to allow the heavy FP users to compile for specific hardware to
>avoid the two jumps per FLOP overhead.  There was a scheme proposed
>in which illegal instructions were compiled into the program and
>replaced at run time with the "right" instruction by the kernel.  Don't
>know if that ever went out with a commercial system.

I think XENIX uses a variation of that scheme -- an initial fault on the
first time an FP instruction is attempted, then the kernel "hot patches"
the binary with a transfer to the right emulation code.  Next time through
there is no fault, just a call.  Has to be faster than a complete context
switch with a fault, I would think.
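
As a toy model of why that would help (and only that: the opcode names and
the `run` function are invented for illustration, and this says nothing
about how XENIX actually implements it), compare the number of faults taken
with and without patching the faulting site:

```python
TRAP, CALL = "fp_trap", "fp_call"   # hypothetical opcodes for this toy model


def run(program, passes, hot_patch):
    """Execute `program` `passes` times; return how many faults the 'kernel' handled."""
    code = list(program)             # private copy of the text segment
    faults = 0
    for _ in range(passes):
        for pc, op in enumerate(code):
            if op == TRAP:
                faults += 1          # illegal-instruction fault into the kernel
                if hot_patch:
                    code[pc] = CALL  # rewrite the site to call the emulator directly
            # CALL and every other opcode run without faulting
    return faults


loop_body = ["load", TRAP, "store", TRAP]
print("trap every time:", run(loop_body, 1000, hot_patch=False))
print("hot patched:    ", run(loop_body, 1000, hot_patch=True))
```

With two FP sites executed 1000 times each, the trap-every-time scheme takes
2000 faults while the patched version takes 2, one per site. The emulation
work itself is the same in both cases; only the fault overhead goes away.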

I heard this rumor when I was working with XENIX 3 for 286 boxes.  I know
from empirical evidence that XENIX FPP emulation beats the pants off of
SysV 386 emulation.

Say, anybody try the IIT 3C87?  It's a clone of the Intel 80387, supposed to
run 25-30% faster.  They also have a 287 replacement.  I'm thinking of buying
one in a couple of weeks unless I hear terrible things about its performance
under XENIX/UNIX.
-- 
Clayton Haapala                ...!bungia!uci!clay (clay@uci.uci.com)
Unified Communications Inc.    "Every morning I get in the Queue.
3001 Metro Drive - Suite 500    'n get on the Bus that takes me to you."
Bloomington, MN  55425             -- the Who

jrh@mustang.dell.com (James R. Howard) (03/15/90)

In article <207@samna.UUCP>, jeff@samna.UUCP (jeff) writes:
 
> This is all very interesting but can anyone answer the question?
> 
> I.e. Does having an FP chip speed up X substantially (exclusive of
> arc-drawing), and, if so, why?
> 
> Jeff

If you have a math coprocessor installed, certain routines will run faster.
Some apps which demonstrate this include xcalendar and xrn.  Both make 
extensive use of the Athena BOX widget, which was modified recently
to fix a very minor bug.  The problem is, the fix involved the addition
of a great deal of FP.  Installing a 387 on the machine will effectively
double or triple display speed on xrn and xcalendar.  There are other
programs which have the same problem, but those two make the difference
VERY clear.

--------------------------------------------------------------
James Howard
..uunet!dell!mustang!jrh   or    jrh@mustang.dell.com

The opinions stated are my own, and do not necessarily 
reflect the opinions of my employer, or anyone else.
--------------------------------------------------------------