dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (10/01/90)
In article <1969@sixhub.UUCP> davidsen@sixhub.UUCP (bill davidsen) writes:
>
>  I believe that was at 25MHz, but not 33MHz. Intel seems to have
>changed the mask in the 33MHz version, so that it is faster by a good
>bit. I believe several tests have shown this, including the recent one
>in either _PC Week_ or _Info World_. I have been told that putting the
>33MHz version in a 25MHz system will be about 40% faster, due to some
>instructions taking fewer cycles. The people who told me this believed
>it enough to buy the fa$ter part.

Do you know the old rated time for a 387? I've noticed huge drops in cycles
going from an 8087 to an 80486 (e.g. FLD, FADD and FMUL are 80-100 cycles on
the 8087, and are 3, 10, and 16 cycles on the 486). When did these happen?

A somewhat related question: the 486 manuals I've got (which were the source
for the above) have a column called "Concurrent Execution" in the table
giving instruction timings, but only for the floating point instructions,
and as far as I can tell, it's mentioned nowhere in the text. I'd guess it's
a bit of sloppy merging of the 386 and 387 manuals that lost the description.
Does anyone know what the numbers mean? (Hint: not all instructions get
entries: e.g. FLD, FST, FIST, FXAM and FLD1 don't, while FILD, FCOM, and
FLDPI do. Those that do generally get a number somewhat less than the main
clock count entry.)

Duncan Murdoch
dmurdoch@watstat.waterloo.edu
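To put the quoted cycle counts side by side, here is a rough sketch that turns them into ratios. It uses only the figures given in the post (80-100 cycles on the 8087; 3, 10 and 16 cycles on the 486); the names `CYCLES_8087`, `CYCLES_486` and `cycle_ratio_range` are made up for illustration. It compares cycle counts only, so it says nothing about wall-clock time, which also depends on clock speed.

```python
# Figures as quoted in the post; not checked against the manuals.
CYCLES_8087 = (80, 100)                       # quoted range for FLD, FADD, FMUL
CYCLES_486 = {"FLD": 3, "FADD": 10, "FMUL": 16}


def cycle_ratio_range(old_range, new_cycles):
    """Return (low, high) ratio of old cycle counts to the new cycle count."""
    low, high = old_range
    return low / new_cycles, high / new_cycles


if __name__ == "__main__":
    for name, cycles in CYCLES_486.items():
        low, high = cycle_ratio_range(CYCLES_8087, cycles)
        print(f"{name}: roughly {low:.1f}x to {high:.1f}x fewer cycles")
```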
johnl@esegue.segue.boston.ma.us (John R. Levine) (10/02/90)
In article <1990Oct1.163030.3394@maytag.waterloo.edu> you write:
>A somewhat related question: the 486 manuals I've got (which were the source
>for the above) have a column called "Concurrent Execution" in the table
>giving instruction timings, but only for the floating point instructions,
>and as far as I can tell, it's mentioned nowhere in the text.

A careful inspection of the table of contents reveals section 18.2 which is
named CONCURRENT PROCESSING. It tells us that the internal architecture of
the 486 is similar to a 386/387 combo in that the integer and floating point
units can operate independently. For example, although the FSIN instruction
takes about 241 cycles, the concurrent time is 2 cycles, so the CPU can
proceed with another integer instruction two cycles later and can execute up
to 239 cycles of fixed point instructions while the FPU thinks.

This is an extreme example since FSIN has to do a lot of work and all of its
sources and results live inside the FPU. More typical is FMUL which takes
16 cycles with 13 concurrent. A clever code scheduler (or assembler
programmer) can improve performance quite a lot by interleaving floating and
fixed instructions to keep both units active.

Regards,
John Levine, johnl@esegue.segue.boston.ma.us, {spdcc|ima|world}!esegue!johnl
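The FSIN arithmetic above can be sketched as a toy model. It assumes the reading given in the post: the integer unit is tied up for the number of cycles in the "Concurrent Execution" entry and is free for the rest of the instruction. The function names are hypothetical, the model ignores stalls, dependencies and memory effects, and only the FSIN figures from the post are used.

```python
def overlap_window(total_cycles, concurrent_entry):
    """Cycles of integer work that fit alongside one FP instruction,
    under the post's reading of the Concurrent Execution column (an assumption)."""
    if not 0 <= concurrent_entry <= total_cycles:
        raise ValueError("entry must lie between 0 and the total cycle count")
    return total_cycles - concurrent_entry


def elapsed_serial(total_cycles, integer_cycles):
    """Elapsed cycles if the integer work simply waits for the FPU to finish."""
    return total_cycles + integer_cycles


def elapsed_interleaved(total_cycles, concurrent_entry, integer_cycles):
    """Elapsed cycles if integer work is scheduled inside the overlap window;
    anything that does not fit in the window runs after the FP instruction."""
    window = overlap_window(total_cycles, concurrent_entry)
    return total_cycles + max(0, integer_cycles - window)


if __name__ == "__main__":
    FSIN_TOTAL, FSIN_ENTRY = 241, 2          # figures quoted in the post
    print(overlap_window(FSIN_TOTAL, FSIN_ENTRY))
    print(elapsed_serial(FSIN_TOTAL, 100))
    print(elapsed_interleaved(FSIN_TOTAL, FSIN_ENTRY, 100))
```

In this model, integer work that fits inside the window costs nothing extra, which is the gain the post attributes to a clever scheduler.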
dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (10/02/90)
In article <9010012252.AA06030@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
>
>A careful inspection of the table of contents reveals section 18.2 which is
>named CONCURRENT PROCESSING. It tells us that the internal architecture of
>the 486 is similar to a 386/387 combo in that the integer and floating point
>units can operate independently. For example, although the FSIN instruction
>takes about 241 cycles, the concurrent time is 2 cycles, so the CPU can
>proceed with another integer instruction two cycles later and can execute up
>to 239 cycles of fixed point instructions while the FPU thinks.
>
>This is an extreme example since FSIN has to do a lot of work and all of its
>sources and results live inside the FPU. More typical is FMUL which takes
>16 cycles with 13 concurrent. A clever code scheduler (or assembler
>programmer) can improve performance quite a lot by interleaving floating and
>fixed instructions to keep both units active.

Thanks for pointing out that section. I'd read it, but couldn't believe that
it applied here. I wonder why the divides take so much integer cpu time
(70 cycles of it)?

Duncan Murdoch
kdq@demott.COM (Kevin D. Quitt) (10/03/90)
In article <1990Oct2.133130.11674@maytag.waterloo.edu> dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
> I wonder why the divides take so much integer cpu time
>(70 cycles of it)?
>

I believe that the CPU and FPU share the shifter. FP uses the shifter
very frequently (e.g. any time it has two operands)

--
 _
Kevin D. Quitt         demott!kdq   kdq@demott.com
DeMott Electronics Co. 14707 Keswick St. Van Nuys, CA 91405-1266
VOICE (818) 988-4975   FAX (818) 997-1190   MODEM (818) 997-4496 PEP last

96.37% of all statistics are made up.