TAustin.HENR801c@xerox.com (07/24/90)
I've recently been performing some instruction trace analysis on a Sun 4/110 workstation. It seems that there is no way to generate the fsqrt instruction from a high level language. All square root operations are turned into calls to _sqrt in libm.a which does not use the fsqrt instruction. I took the assembly output of a simple program that calculates the square root of 2, replaced the call to _sqrt with the fsqrt instruction and the program ran fine. What gives? Why doesn't anyone use fsqrt? It seems a significant amount of performance is being lost in floating point intensive programs that use sqrt().

Todd Austin
taustin.henr801c@xerox.com
poffen@sj.ate.slb.com (Russ Poffenberger) (10/08/90)
In article <10155@brazos.Rice.edu> TAustin.HENR801c@xerox.com writes:
>X-Sun-Spots-Digest: Volume 9, Issue 277, message 12
>
>I've recently been performing some instruction trace analysis on a Sun
>4/110 workstation. It seems that there is no way to generate the fsqrt
>instruction from a high level language. All square root operations are
>turned into calls to _sqrt in libm.a which does not use the fsqrt
>instruction. I took the assembly output of a simple program that
>calculates the square root of 2, replaced the call to _sqrt with the fsqrt
>instruction and the program ran fine. What gives? Why doesn't anyone use
>fsqrt? It seems a significant amount of performance is being lost in
>floating point intensive programs that use sqrt().

This has been brought up a lot, so I will explain. The following comes from the SPARCstation 1 SunOS 4.0.3 Sun-4c Release Notes.

The inline expansion template file /usr/lib/sqrt.il, included in the c tape, may be used to improve performance of SPARC systems on problems that perform multiple square root operations. This inline expansion template replaces calls to sqrt subroutines with hardware sqrt instructions. Executables created with these templates may run slower on older Sun-4's without a hardware sqrt instruction (for example, 4/110 and 4/260 with Weitek 1164/1165).

To compile, use the following example:

    cc -O4 source.c /usr/lib/sqrt.il /usr/lib/libm.il -lm

See the inline(1) man page or the Floating Point Programmers Guide for SunOS 4.

A new utility program, fpuversion4(8), can search for the FPU2 (TI 8847). When run, it prints a message confirming the existence of FPU2:

    Sun-4 floating-point controller version 2 found.

Russ Poffenberger               DOMAIN: poffen@sj.ate.slb.com
Schlumberger Technologies       UUCP:   {uunet,decwrl,amdahl}!sjsca4!poffen
1601 Technology Drive           CIS:    72401,276
San Jose, Ca. 95110             (408)437-5254