chris@alderan.uucp (Christoph Splittgerber) (03/11/91)
In Intel's "387 DX User's Manual" - Programmer's Reference I found in
Chapter 5.2:

    Because the 386 DX CPU and the 387 DX NPX have separate execution
    units, it is possible for the NPX to execute numeric instructions in
    parallel with instructions executed by the CPU. [...] No special
    programming techniques are required to gain advantages of concurrent
    execution; ... etc.

So I wrote two very small test functions to prove this. Something like:

1)
        .
        .data
constant:       .double 1.3456          / whatever
result:         .double 0
        .text
        .align 4
        fldl    constant
        ; followed by about 300 clocks of 80386 instructions
        fcos
        fstl    result
        .
        .

2)
        .
        .data
constant:       .double 1.3456          / whatever
result:         .double 0
        .text
        .align 4
        fldl    constant
        fcos
        ; followed by about 300 clocks of 80386 instructions
        fstl    result
        .
        .

In the first function the 300 clocks of 80386 instructions come after the
"fld", which requires 25 FPU clocks. That means about 270 386 clocks are
executed while the FPU does nothing; right?

In the second function the 300 386 clocks come after the "fcos", which
takes between 200 and 800 FPU clocks. That means the 300 80386 clocks
should be executed while the cosine is being computed; no?

The thing is: I could *NOT* measure any difference in execution speed.
NOT EVEN THE SLIGHTEST DIFFERENCE.

So, what am I doing wrong? Any ideas?

Chris
--
************************ Brain fault (core dumped) *************************
Replies-To: chris@alderan.uucp
UUCP: uunet!mcsun!unido!alderan!chris
Phone: +49 711 344375   Fax: +49 711 3460684