[comp.sys.sun] SPARCs appear to not use fsqrt

TAustin.HENR801c@xerox.com (07/24/90)

I've recently been performing some instruction trace analysis on a Sun
4/110 workstation.  It seems that there is no way to generate the fsqrt
instruction from a high level language.  All square root operations are
turned into calls to _sqrt in libm.a which does not use the fsqrt
instruction.  I took the assembly output of a simple program that
calculates the square root of 2, replaced the call to _sqrt with the fsqrt
instruction and the program ran fine.  What gives?  Why doesn't anyone use
fsqrt?  It seems a significant amount of performance is being lost in
floating point intensive programs that use sqrt().

Todd Austin
taustin.henr801c@xerox.com

poffen@sj.ate.slb.com (Russ Poffenberger) (10/08/90)

In article <10155@brazos.Rice.edu> TAustin.HENR801c@xerox.com writes:
>X-Sun-Spots-Digest: Volume 9, Issue 277, message 12
>
>I've recently been performing some instruction trace analysis on a Sun
>4/110 workstation.  It seems that there is no way to generate the fsqrt
>instruction from a high level language.  All square root operations are
>turned into calls to _sqrt in libm.a which does not use the fsqrt
>instruction.  I took the assembly output of a simple program that
>calculates the square root of 2, replaced the call to _sqrt with the fsqrt
>instruction and the program ran fine.  What gives?  Why doesn't anyone use
>fsqrt?  It seems a significant amount of performance is being lost in
>floating point intensive programs that use sqrt().

This question comes up often, so here is the explanation.

The following is from the SPARCstation 1 SunOS 4.0.3 Sun-4c Release Notes:

The inline expansion template file /usr/lib/sqrt.il, included in the c
tape, may be used to improve performance of SPARC systems on problems that
perform multiple square root operations. This inline expansion template
replaces calls to sqrt subroutines with hardware sqrt instructions.

Executables created with these templates may run slower on older Sun-4s
that lack a hardware sqrt instruction (for example, the 4/110 and 4/260
with the Weitek 1164/1165).

To compile, use a command such as:

cc -O4 source.c /usr/lib/sqrt.il /usr/lib/libm.il -lm

See the inline(1) man page or the Floating Point Programmer's Guide for SunOS 4.

A new utility program, fpuversion4(8), checks for the FPU2 (TI 8847). When
run on a system that has one, it prints a message confirming that the FPU2
is present:

Sun-4 floating-point controller version 2 found.

Russ Poffenberger               DOMAIN: poffen@sj.ate.slb.com
Schlumberger Technologies       UUCP:   {uunet,decwrl,amdahl}!sjsca4!poffen
1601 Technology Drive           CIS:    72401,276
San Jose, Ca. 95110             (408)437-5254