rtaylor@batcomputer.tn.cornell.edu (Russ Taylor) (03/08/90)
Please respond to cchase@ee.cornell.edu. Mr. Taylor is kind enough to post this for me, but has no interest in this topic.

The conventional wisdom seems to be that in order to run X11R3 (either the ESIX port or the Interactive port) you should have:
  a) a fast cpu
  b) gobs of memory
  c) floating point support

I can see the fast cpu and RAM parts, but why the floating point? I just took a (very quick) look through the X11R3 server source, and there doesn't seem to be much floating point. In fact, except for drawing arcs, I can't see why any floating point is needed.

Can anyone shed some light on this?

Craig Chase
cchase@ee.cornell.edu
--
ARPA: rtaylor@tcgould.tn.cornell.edu, russellt@tesla.ee.cornell.edu
UUCP: {cmc12,shasta,uw-beaver,rochester}!cornell!tesla!russellt
jrh@mustang.dell.com (James R. Howard) (03/09/90)
> The conventional wisdom seems to be that in order to run X11R3 (either
> the ESIX port or the Interactive port) you should have:
> a) a fast cpu
> b) gobs of memory
> c) floating point support
>
> I can see the fast cpu and RAM parts, but why the floating point? I
> just took a (very quick) look through the X11R3 server source, and there
> doesn't seem to be much floating point. In fact, except for drawing arcs,
> I can't see why any floating point is needed.
>
> Can anyone shed some light on this?

Well, the Athena Widget set uses quite a bit, especially in the Box Widget code, which a LOT of clients use. This is probably why it is necessary. If you have ever had two identical systems side by side, one with and one without the 387, you'd see the difference quite clearly.
--
James Howard ..cs.utexas.edu!dell!mustang!jrh or jrh@mustang.dell.com
"I've got a firm policy on gun control, if there's a gun around, I want to be the one controlling it." -- Clint Eastwood
steve@nuchat.UUCP (Steve Nuchia) (03/10/90)
In article <9868@batcomputer.tn.cornell.edu> cchase@ee.cornell.edu (Craig Chase) writes:
>I can see the fast cpu and RAM parts, but why the floating point? I

When I cut my teeth, unix systems without FP hardware did the emulation thing in library routines that were 10 to maybe 100 times slower than hardware, so a program that would spend 10% of its time doing FP would slow down by less than ten times. For the amount of FP I do, I can live with that.

Unfortunately, the 386 sysV unixes do their FP in the kernel by taking a trap on the unimplemented instruction. This is _real_ slow. Slow enough that simple awk scripts can take hours to slog through a few hundred K of input. Get the '87 -- it's cheaper and easier than wishing you had until you do.
--
Steve Nuchia South Coast Computing Services (713) 964-2462
"You have no scars on your face, and you cannot handle pressure." - Billy Joel
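The "less than ten times" estimate in the post above can be checked with a little arithmetic. The following is a minimal sketch, assuming a program spends a fixed fraction of its hardware-FP run time in floating point and that emulated FP is some constant factor slower; the function name and the sample factors are illustrative, not measurements from any system in this thread.

```python
def overall_slowdown(fp_fraction, emulation_factor):
    """Run time with emulated FP, relative to run time with hardware FP.

    fp_fraction: share of the hardware-FP run time spent in floating point.
    emulation_factor: how many times slower emulated FP is than hardware FP.
    """
    non_fp = 1.0 - fp_fraction                # unaffected part of the run time
    return non_fp + fp_fraction * emulation_factor


if __name__ == "__main__":
    # 10% FP, as in the post; the emulation factors are hypothetical.
    for factor in (10, 100, 1000):
        print(factor, round(overall_slowdown(0.10, factor), 1))
```

With 10% of the time in FP, a 10x emulator costs about 1.9x overall and a 100x emulator about 10.9x, so the "less than ten times" figure holds for emulation factors below roughly 91. Larger factors quickly dominate the total run time.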
peter@ficc.uu.net (Peter da Silva) (03/12/90)
In article <20203@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
> When I cut my teeth, unix systems without FP hardware did the
> emulation thing in library routines that were 10 to maybe
> 100 times slower than hardware, so a program that would spend
> 10% of its time doing FP would slow down by less than ten times.

I'm moderately sure that Version 7 UNIX on the PDP-11 trapped an illegal instruction and emulated it, just like the 80386 UNIXes do. It might be that the PDP-11 was just quicker at handling the fault. I'm *sure* that the PDP-11 instructions were easier to parse than the 80386 ones. :->.
--
Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
steve@nuchat.UUCP (Steve Nuchia) (03/12/90)
In article <4.523N2ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <20203@nuchat.UUCP> I wrote:
>> When I cut my teeth, unix systems without FP hardware did the
>> emulation thing in library routines that were 10 to maybe

Note that I specifically mean user-mode routines, no trapping allowed. Of course there are dozens of variations, no need to enumerate them.

>I'm moderately sure that Version 7 UNIX on the PDP-11 trapped an illegal
>instruction and emulated it, just like the 80386 UNIXes do. It might be that

Not having immediate access to a V7 system I can't argue, but I think that the mechanism is quite old. I know that the V7 compiler I used generated library calls rather than FP instructions, and I know most of the low-priced sysIII boxes from the early eighties did the same. Come to think of it though, I'm not sure how many of them even had FPA options available. The v7 compiler had the option of generating FP instructions, but I can't remember another system off the top of my head, other than Sun, that uses that scheme.

It is an unpleasant trade-off to have to make -- either you make FP slower than it has to be (386 case, and other "transparent" schemes) or you complicate the users' lives with N different versions of the library and of each executable (Sun case). Sun tried to choose the third alternative by specifying a single, always-present FP device for the sun4 line, but less than a year after the sun4 shipped there was a (slightly) different FPA option requiring different compile options. Sigh.

Dynamic linking is about the only good solution, but you still have to allow the heavy FP users to compile for specific hardware to avoid the two jumps per FLOP overhead. There was a scheme proposed in which illegal instructions were compiled into the program and replaced at run time with the "right" instruction by the kernel. Don't know if that ever went out with a commercial system.

Personally I think the choice of doing FP emulation in the kernel was regrettable, especially given the price-sensitivity common to most 386 users and the astounding price of 387 chips. But I can understand why they went that way, with the huge push for binary compatibility that is going on.
--
Steve Nuchia South Coast Computing Services (713) 964-2462
"You have no scars on your face, and you cannot handle pressure." - Billy Joel
edhall@rand.org (Ed Hall) (03/12/90)
In article <4.523N2ggpc2@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <20203@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>> When I cut my teeth, unix systems without FP hardware did the
>> emulation thing in library routines that were 10 to maybe
>> 100 times slower than hardware, so a program that would spend
>> 10% of its time doing FP would slow down by less than ten times.
>
>I'm moderately sure that Version 7 UNIX on the PDP-11 trapped an illegal
>instruction and emulated it, just like the 80386 UNIXes do. It might be that
>the PDP-11 was just quicker at handling the fault. I'm *sure* that the PDP-11
>instructions were easier to parse than the 80386 ones. :->.

Yes, V6 & V7 used illegal instruction traps to emulate floating-point instructions. It was dog slow, too. Simply printing a floating point value could take tens of milliseconds (one FP division and subtraction per digit). The signal mechanism was none too fast for Unix V6 & V7, making a slow task even slower.

But I'll agree with the original poster that '87 emulators seem even worse. [Note that '87, '287 & '387s are pretty similar from an instruction set perspective, so I'll just use the term '87.]

There is a reason why '87 emulation is slower, however, and it's not just the arcane instruction format. Internally, the '87 does all operations in 80-bit extended floating-point format. This is true even if the source and/or destination of a calculation is only 32-bit (single-precision)! As you might guess, all these bits can make emulation REAL SLOW. Also, operations need to do appropriate things with infinity, NaN, indefinite, and denormals--things PDP-11 FP units (or at least their emulators) never dreamed of.

So, if you do any floating point whatever, save up your lunch money and buy yourself an '87. If you do a LOT of floating point, get a Weitek and a good compiler for it--even the 387 is a bit of a wimp, despite the fact that it is a few hundred times better than an emulator.

-Ed Hall
edhall@rand.org
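To make the 80-bit point in the post above concrete, here is a minimal sketch of the '87 extended-precision layout: 1 sign bit, a 15-bit exponent with bias 16383, and a 64-bit significand with an explicit integer bit. The function name is hypothetical, and the sketch decodes only zero and normal finite values; an emulator also has to handle the infinities, NaNs, and denormals mentioned above, and do so on every operation.

```python
from fractions import Fraction

EXPONENT_BIAS = 16383  # bias of the 15-bit exponent field


def decode_extended(raw):
    """Decode 10 little-endian bytes of 80-bit extended precision to a Fraction.

    Assumes a zero or a normal finite value (integer bit set); the sign of
    zero is not preserved. Everything else is rejected in this sketch.
    """
    if len(raw) != 10:
        raise ValueError("extended-precision values are 10 bytes long")
    significand = int.from_bytes(raw[:8], "little")  # 64 bits, explicit integer bit
    sign_exp = int.from_bytes(raw[8:], "little")     # 1 sign bit + 15 exponent bits
    sign = -1 if sign_exp & 0x8000 else 1
    exponent = sign_exp & 0x7FFF
    if exponent == 0 and significand == 0:
        return Fraction(0)
    if exponent == 0 or exponent == 0x7FFF:
        raise ValueError("denormal, infinity, or NaN: not handled in this sketch")
    # value = sign * (significand / 2**63) * 2**(exponent - bias)
    scale = Fraction(2) ** (exponent - EXPONENT_BIAS)
    return sign * Fraction(significand, 2**63) * scale
```

For example, `decode_extended(bytes.fromhex("0000000000000080ff3f"))` decodes the encoding of 1.0. Even this read-only decoder touches ten bytes per operand, which hints at why doing full arithmetic on the format in software is expensive.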
montnaro@spyder.crd.ge.com (Skip Montanaro) (03/13/90)
In article <20301@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
> Personally I think the choice of doing FP emulation in the kernel
> was regrettable, especially given the price-sensitivity common
> to most 386 users and the astounding price of 387 chips. But
> I can understand why they went that way, with the huge push
> for binary compatibility that is going on.
Sun also uses a kernel trap/emulation scheme on their SPARC machines. (They
use (or used to use) compiler flags on their 680x0 machines.) What a
disaster! Fortunately, the only machine that exhibits this problem is the
4/110. All other SPARC machines Sun sells have FPUs.
--
Skip (montanaro@crdgw1.ge.com)
brando@uiucme2.me.uiuc.edu (Brando W. Brown) (03/13/90)
Who said you needed a 387??? It seems that no matter what the 80x86 product, every vendor recommends a 387, probably because they are still $400+. I have Interactive's 386/ix with the X11 distribution, VP/ix, etc, and mine runs plenty fast with a base 386/25MHz and 8mb of core RAM.

I also still reserve opinions on whether tons of swap actually help. I have configured 20mb of swap compared with 10mb with no performance difference.

When everyone is up to the 486 level, a math coprocessor is on-board, so that throws the 387 discussion out the window.

Brandon Brown
uunet!uiucuxc!addamax!brown
jimf%saber@HARVARD.HARVARD.EDU (03/14/90)
|Who said you needed a 387??? [...] I
|have Interactive's 386/ix with the X11 distribution, VP/ix, etc, and mine
|runs plenty fast with a base 386/25MHz, and 8mb core ram. I still also
|reserve opinions on whether tons of swap actually help. I have configured
|20mb of swap compared with 10mb with no performance difference.

I concur. I used 386/ix for a while without an FPU with no performance problems (except excessive swapping due to low onboard memory, but that's my fault).

Tons of swap won't speed things up (more swap doesn't make swapping faster, just more available) but it will keep the machine running if you have to really overload it at some point. I recommend no less than 16mb if you're running X (especially if you also like emacs like I like emacs).

If you're doing a lot of arcs then you really do want the FPU, but I found that for average use it wasn't too bad. This was, of course, X11R3. I've never seen R4 on one of the machines so I have no comment.

jim frost
saber software
jimf@saber.com
jeff@samna.UUCP (jeff) (03/14/90)
In article <20203@nuchat.UUCP> (Steve Nuchia) writes:
:In article <9868@batcomputer.tn.cornell.edu> (Craig Chase) writes:
:>I can see the fast cpu and RAM parts, but why the floating point? I
:
:Unfortunately, the 386 sysV unixes do their FP in the kernel by
:taking a trap on the unimplemented instruction. This is _real_
:slow. Slow enough that simple awk scripts can take hours to
:slog through a few hundred K of input. Get the '87 -- it's
:cheaper and easier than wishing you had until you do.

This is all very interesting but can anyone answer the question? I.e., does having an FP chip speed up X substantially (exclusive of arc-drawing), and, if so, why?

Jeff
clay@uci.mn.org (Clayton Haapala) (03/14/90)
In article <20301@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>Dynamic linking is about the only good solution, but you still have
>to allow the heavy FP users to compile for specific hardware to
>avoid the two jumps per FLOP overhead. There was a scheme proposed
>in which illegal instructions were compiled into the program and
>replaced at run time with the "right" instruction by the kernel. Don't
>know if that ever went out with a commercial system.

I think XENIX uses a variation of that scheme -- an initial fault on the first time an FP instruction is attempted, then the kernel "hot patches" the binary with a transfer to the right emulation code. Next time through there is no fault, just a call. Has to be faster than a complete context switch with a fault, I would think. I heard this rumor when I was working with XENIX 3 for 286 boxes. I know from empirical evidence that XENIX FPP emulation beats the pants off of SysV 386 emulation.

Say, anybody try the IIT 3C87? It's a clone of the Intel 80387, supposed to run 25-30% faster. They also have a 287 replacement. I'm thinking of buying one in a couple of weeks unless I hear terrible things about its performance under XENIX/UNIX.
--
Clayton Haapala ...!bungia!uci!clay (clay@uci.uci.com)
Unified Communications Inc.
3001 Metro Drive - Suite 500
Bloomington, MN 55425
"Every morning I get in the Queue. 'n get on the Bus that takes me to you." -- the Who
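The "fault once, then patch" idea described in the post above can be illustrated with a user-space analogy. This is only a sketch: the names (`stats`, `emulated_fadd`, `make_trap_handler`, `new_program`) are hypothetical, a Python list of callables stands in for the patched instruction sites, and nothing here reflects actual XENIX kernel code.

```python
stats = {"traps": 0}  # counts how often the slow "fault" path is taken


def emulated_fadd(a, b):
    """Stand-in for a software floating-point add routine."""
    return a + b


def make_trap_handler(call_sites, index):
    """Build the handler initially installed at one FP call site."""
    def trap_handler(a, b):
        stats["traps"] += 1                 # the expensive fault happens here
        call_sites[index] = emulated_fadd   # "hot patch": later calls skip the trap
        return emulated_fadd(a, b)
    return trap_handler


def new_program(n_sites):
    """Create a program whose FP call sites all start out unpatched."""
    call_sites = [None] * n_sites
    for i in range(n_sites):
        call_sites[i] = make_trap_handler(call_sites, i)
    return call_sites


if __name__ == "__main__":
    sites = new_program(2)
    for _ in range(1000):
        sites[0](1.5, 2.0)                  # only the first iteration traps
    print(stats["traps"])
```

In this model each call site pays for one fault and every later pass is a plain call, which matches the "next time through there is no fault, just a call" description; a trap-per-instruction scheme would pay the fault cost on every iteration.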
brando@uiucme2.me.uiuc.edu (Brando W. Brown) (03/15/90)
I haven't tried X11R4 on 386/ix either, partially because of imake files and the like. I was wondering if anyone else had. I have the standard distribution from MIT, but after a couple of nights wrestling with it on my 386, I gave it up.

While on the subject of speed enhancements/degradations, I have Adaptec's SCSI on my 386/ix with sometimes as much as a 50% increase in speed when executing applications on different drives (not partitions). If anyone is considering buying another disk subsystem, I would highly recommend it. Adding a SCSI tape unit was also painless, as Interactive supports the Archive series of SCSI tapes and it doesn't require another slot in your 386.

Another issue altogether is the fact that I have a Viper 150mb SCSI drive which, with the supplied drivers, will not perform a "no rewind after tape close" correctly.

Sorry to clutter an otherwise simple response with other garbage.

Brandon Brown          Internet:   brando@uiucme.me.uiuc.edu
Addamax Corporation    UUCP:       uunet!uiucuxc!addamax!brown
2009 Fox Drive         GEnie:      xmg23356 macbrando
Champaign, IL 61820    CompuServe: 73040,447