[comp.sys.ibm.pc.hardware] Faster 387s and 486 timing

dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (10/01/90)

In article <1969@sixhub.UUCP> davidsen@sixhub.UUCP (bill davidsen) writes:
>
>  I believe that was at 25MHz, but not 33MHz. Intel seems to have
>changed the mask in the 33MHz version, so that it is faster by a good
>bit. I believe several tests have shown this, including the recent one
>in either _PC Week_ or _InfoWorld_. I have been told that putting the
>33MHz version in a 25MHz system will be about 40% faster, due to some
>instructions taking fewer cycles. The people who told me this believed
>it enough to buy the fa$ter part.

Do you know the old rated time for a 387?  I've noticed huge drops in cycle
counts going from an 8087 to an 80486 (e.g. FLD, FADD and FMUL are 80-100
cycles on the 8087, and are 3, 10, and 16 cycles on the 486).  When did these
drops happen?
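
To put rough numbers on that drop, here is a minimal sketch in Python that
divides the quoted 8087 range by each quoted 486 count.  The cycle counts
are the ones given above; the names CYCLES_8087, CYCLES_486 and
cycle_ratio_range are made up for illustration.  These are ratios of cycle
counts only, and say nothing about clock rates.

```python
# Cycle counts as quoted in this post (not re-checked against the manuals).
CYCLES_8087 = (80, 100)                          # range quoted for FLD, FADD, FMUL
CYCLES_486 = {"FLD": 3, "FADD": 10, "FMUL": 16}  # per-instruction 486 counts


def cycle_ratio_range(old_range, new_cycles):
    """Return (low, high) ratio of old cycle counts to the new count."""
    low, high = old_range
    return low / new_cycles, high / new_cycles


# Ratio of 8087 cycles to 486 cycles for each instruction.
ratios = {op: cycle_ratio_range(CYCLES_8087, c) for op, c in CYCLES_486.items()}

for op, (low, high) in ratios.items():
    print(f"{op}: {low:.1f}x to {high:.1f}x fewer cycles")
```

On those figures FADD needs roughly 8 to 10 times fewer cycles and FMUL
roughly 5 to 6 times fewer.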

A somewhat related question:  the 486 manuals I've got (which were the source
for the above) have a column called "Concurrent Execution" in the table
giving instruction timings, but only for the floating point instructions,
and as far as I can tell, it's mentioned nowhere in the text.  I'd guess
it's a bit of sloppy merging of the 386 and 387 manuals that lost the
description.  

Does anyone know what the numbers mean?  (Hint:  not all instructions get
entries:  e.g. FLD, FST, FIST, FXAM and FLD1 don't, while FILD, FCOM, and
FLDPI do.  Those that do generally get a number somewhat less than the main
clock count entry.)

Duncan Murdoch
dmurdoch@watstat.waterloo.edu

johnl@esegue.segue.boston.ma.us (John R. Levine) (10/02/90)

In article <1990Oct1.163030.3394@maytag.waterloo.edu> you write:
>A somewhat related question:  the 486 manuals I've got (which were the source
>for the above) have a column called "Concurrent Execution" in the table
>giving instruction timings, but only for the floating point instructions,
>and as far as I can tell, it's mentioned nowhere in the text.

A careful inspection of the table of contents reveals section 18.2 which is
named CONCURRENT PROCESSING.  It tells us that the internal architecture of
the 486 is similar to a 386/387 combo in that the integer and floating point
units can operate independently.  For example, although the FSIN instruction
takes about 241 cycles, the concurrent time is 2 cycles, so the CPU can
proceed with another integer instruction two cycles later and can execute up
to 239 cycles of fixed point instructions while the FPU thinks.

This is an extreme example since FSIN has to do a lot of work and all of its
sources and results live inside the FPU.  More typical is FMUL which takes
16 cycles with 13 concurrent.  A clever code scheduler (or assembler
programmer) can improve performance quite a lot by interleaving floating and
fixed instructions to keep both units active.
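
The arithmetic behind the FSIN example can be sketched in a few lines of
Python.  This takes the reading given above at face value, namely that the
concurrent figure is the number of cycles before the integer unit is free
again; that reading is an assumption here, not something checked against
the manual.  The function name integer_cycles_available is made up for
illustration.

```python
# (total cycles, concurrent-column entry) as quoted in this post.
TIMINGS = {
    "FSIN": (241, 2),
    "FMUL": (16, 13),
}


def integer_cycles_available(total_cycles, concurrent_cycles):
    """Cycles of integer work that fit behind one FP instruction.

    Assumes the concurrent figure counts the cycles before the integer
    unit is released, which is how the FSIN example reads it.
    """
    if not 0 <= concurrent_cycles <= total_cycles:
        raise ValueError("concurrent cycles must lie within the total")
    return total_cycles - concurrent_cycles


# Integer-unit cycles left over while the FPU finishes each instruction.
available = {
    name: integer_cycles_available(total, concurrent)
    for name, (total, concurrent) in TIMINGS.items()
}

for name, cycles in available.items():
    print(f"{name}: up to {cycles} integer cycles can overlap")
```

Under that reading FSIN leaves 239 cycles for integer work, matching the
figure above, while FMUL leaves only 3.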

Regards,
John Levine, johnl@esegue.segue.boston.ma.us, {spdcc|ima|world}!esegue!johnl

dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (10/02/90)

In article <9010012252.AA06030@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
>
>A careful inspection of the table of contents reveals section 18.2 which is
>named CONCURRENT PROCESSING.  It tells us that the internal architecture of
>the 486 is similar to a 386/387 combo in that the integer and floating point
>units can operate independently.  For example, although the FSIN instruction
>takes about 241 cycles, the concurrent time is 2 cycles, so the CPU can
>proceed with another integer instruction two cycles later and can execute up
>to 239 cycles of fixed point instructions while the FPU thinks.
>
>This is an extreme example since FSIN has to do a lot of work and all of its
>sources and results live inside the FPU.  More typical is FMUL which takes
>16 cycles with 13 concurrent.  A clever code scheduler (or assembler
>programmer) can improve performance quite a lot by interleaving floating and
>fixed instructions to keep both units active.

Thanks for pointing out that section.  I'd read it, but couldn't believe
that it applied here.  I wonder why the divides take so much integer cpu time 
(70 cycles of it)?

Duncan Murdoch

kdq@demott.COM (Kevin D. Quitt) (10/03/90)

In article <1990Oct2.133130.11674@maytag.waterloo.edu> dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
>  I wonder why the divides take so much integer cpu time 
>(70 cycles of it)?
>

    I believe that the CPU and FPU share the shifter.  FP uses the shifter
very frequently (e.g. any time it has two operands).


-- 
 _
Kevin D. Quitt         demott!kdq   kdq@demott.com
DeMott Electronics Co. 14707 Keswick St.   Van Nuys, CA 91405-1266
VOICE (818) 988-4975   FAX (818) 997-1190  MODEM (818) 997-4496 PEP last

                96.37% of all statistics are made up.