achhabra@uceng.uc.edu (atul k chhabra) (03/21/89)
I chanced upon a segment of code that runs approximately 300 times faster
in FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on
Sun4(OS4.0) (of course, on Sun4 the -f68881 flag was not used.) The
results are similar on both machines. Can anyone enlighten me on this
bizzare result?
Listing of cosc.c:
__________
/*
* Compile using:
* cc -f68881 -O -o cosc cosc.c -lm.
*/
#include <math.h>
main()
{
int i;
float tmp;
for(i=0;i<262144;i++)
tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
}
Listing of cosf.f
__________
c
c Compile using:
c f77 -f68881 -O -o cosf cosf.f
c
program cosf
integer i
real tmp
do 10 i=1,262144
tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
10 continue
end
Timings on Sun3(OS3.5):
% time cosc
55.6u 1.0s 1:49 51% 24+8k 12+1io 0pf+0w
^^^^^
% time cosf
0.2u 0.0s 0:00 75% 16+8k 4+0io 0pf+0w
^^^^
[[ This is a good one! You've gotten tricked by one of the most common
optimization snares around. For f77, "-O" defaults to "-O3", which
includes global optimization. Presumably it also does dead code
elimination. But what constitutes dead code? Well, code that performs
calculations that aren't used. How do you define a value that has been
used? Ultimately, if it has an effect on the outside world, or more
specifically, if it or something it effects is written or printed. Your
cosf.f does neither; "tmp" is never used for anything important, so your
entire program gets optimized away! Add "print *,tmp" after the body of
the loop and cosf.f takes alot longer. Interestingly enough, it still
appears to be quicker than cosc.c. --wnl ]]
Atul Chhabra, Dept. of Electrical & Computer Engineering, ML 030,
University of Cincinnati, Cincinnati, OH 45221-0030.
voice: (513)556-4766 INTERNET: achhabra@ucesp1.ece.uc.edu [129.137.33.114]
OR achhabra@uceng.uc.edu [129.137.33.1]
chuck@trantor.harris-atd.com (Chuck Musciano) (04/19/89)
Our moderator muses: "Interestingly enough, it [computing a function involving four calls to cos()] still appears to be quicker [in FORTRAN] than cosc.c [the C version]". The answer is that in FORTRAN, cos() is an intrinsic function, and the compiler uses the direct floating point opcode. In C, cos() is an external function, and the compiler generates a call to an external named "cos". Compare the code generated for "sin(x) + cos(x)" in FORTRAN and C. In FORTRAN (a5@(-0x7ffc) is x): fcoss a5@(-0x7ffc),fp0 fsins a5@(-0x7ffc),fp1 faddx fp1,fp0 In C (a6@(-12) and a6@(-16) are x): movl a6@(-12),sp@- movl a6@(-16),sp@- jbsr _sin addqw #8,sp movl d0,a6@(-24) movl d1,a6@(-20) movl a6@(-12),sp@- movl a6@(-16),sp@- jbsr _cos addqw #8,sp movl d1,sp@- movl d0,sp@- fmoved sp@+,fp0 faddd a6@(-24),fp0 Does anyone know of an optimizer which will convert calls to the standard math routines to their 68881 instruction counterparts? Chuck Musciano Advanced Technology Department Harris Corporation (407) 727-6131 ARPA: chuck@trantor.harris-atd.com
henry@uunet.uu.net (04/19/89)
> do 10 i=1,262144 > tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5) This one got hashed out at some length in comp.lang.c a while ago. The major reason for the difference in speed is as our moderator mentioned: the Fortran compiler knows that cos() is a builtin function, notices that the value is never used, and optimizes out the body of the loop. C compilers *could not* do this until recently, since it's only the (imminent) ANSI C standard that has declared certain library functions to be potentially built-in. >... Add "print *,tmp" after the body of >the loop and cosf.f takes alot longer. Interestingly enough, it still >appears to be quicker than cosc.c. --wnl ]] It may only be computing cos(2.5) once per loop, since the compiler is entitled to notice the duplication. Also, note that the Fortran version is using single-precision arithmetic where the C one is using double. Henry Spencer at U of Toronto Zoology uunet!attcan!utzoo!henry henry@zoo.toronto.edu
cudcv@uunet.uu.net (Rob McMahon) (04/22/89)
In article <764@uceng.UC.EDU> you write: >I chanced upon a segment of code that runs approximately 300 times faster >in FORTRAN than in C ... > >Timings on Sun3(OS3.5): > >% time cosc >55.6u 1.0s 1:49 51% 24+8k 12+1io 0pf+0w >^^^^^ >% time cosf >0.2u 0.0s 0:00 75% 16+8k 4+0io 0pf+0w >^^^^ > >[[ ... [valid comments about results being optimised away when not used, >because of the higher level of optimisation used by f77] ... --wnl ]] The compiler might also be realising that cos returns the same value for the same argument. Here's a script using gcc, given the following definition of cos, run on a Sun3/160 SunOS 4.0, and using gcc 1.34: inline static const double cos(double x) { double result; asm("fcosx %1,%0" : "=f" (result) : "f" (x)); return (result); } Script started on Wed Mar 29 22:06:38 1989 cudcv (6) >> cat x.c #include <math.h> main() { int i; float tmp; for(i=0;i<262144;i++) tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5); } cudcv (7) >> gcc -O x.c cudcv (8) >> time ./a.out 0.2u 0.0s 0:00 141% 0+8k 0+0io 0pf+0w cudcv (9) >> ed x.c 126 /}/i printf("%f\n", tmp);
stevo@elroy.jpl.nasa.gov (Steve Groom) (05/06/89)
chuck@trantor.harris-atd.com (Chuck Musciano) writes: >Does anyone know of an optimizer which will convert calls to the standard >math routines to their 68881 instruction counterparts? /usr/lib/inline. Try "cc -c foo.c /usr/lib/f68881/libm.il". Better yet, take a look at the Sun Floating Point Programmer's Guide (in the 4.0 manual set). -steve Steve Groom, Jet Propulsion Laboratory, Pasadena, CA 91109 Internet: stevo@elroy.jpl.nasa.gov UUCP: {ames,cit-vax}!elroy!stevo Disclaimer: (thick German accent) "I know noothingg! Noothingg!"
stpeters@uunet.uu.net (05/06/89)
In article <8903221250.AA03853@trantor.harris-atd.com> you write: >Does anyone know of an optimizer which will convert calls to the standard >math routines to their 68881 instruction counterparts? In effect, yes. The mechanism Sun uses is to have in-line code (.il) templates that cause the compiler to generate in-line code in place of the math library calls. If you're running SunOS 3.X: cc -c -f68881 program.c /usr/lib/f68881.il cc -o program -f68881 program.o -lm If you're running 4.0, it's even easier: cc -o program program.c /usr/lib/f68881/libm.il Substitute soft, switch, or fpa for f68881 as appropriate. These can make differences of HUGE factors in how fast floating point programs run. That floating point manual really is worth reading :-) Dick St.Peters GE Corporate R&D, Schenectady, NY stpeters@dawn.crd.ge.com uunet!steinmetz!dawn!stpeters
paul@uunet.uu.net (Paul Hudson) (05/06/89)
Whatever the question, the answer is GNU! Using GNU gcc wonderful extended asm statement + inline functions it's quite possible to have cos just generate the fp opcode. A header file to do it was recently posted to bug-gcc. I think I have it squirreled away somewhere .... Paul Hudson Snail mail: Monotype ADG Email: ...!ukc!acorn!moncam!paul Science Park, paul@moncam.co.uk Milton Road, Cambridge, CB4 4FQ
joerg%berthold.UUCP%TUB.BITNET@mitvma.mit.edu (Joerg Schilling) (05/06/89)
The sun C compiles does convert calls to the *external* c function sin() ... into 68881 routines if you tell him to use the inline expander. Try the folling c program (t.c): extern double sin(); double t() { return sin(123.0); } use the command : joerg > cc -v -O2 -f68881 -S /usr/lib/f68881/libm.il t.c .. Verbose output removed. joerg > cat t.s .. Several lines removed. .text .globl _t _t: |#PROLOGUE# 0 link a6,#-8 |#PROLOGUE# 1 movl L2000000+4,sp@- movl L2000000,sp@- fsind sp@,fp0 fmoved fp0,sp@ movl sp@+,a6@(-8) movl sp@+,a6@(-4) movl a6@(-8),d0 movl a6@(-4),d1 |#PROLOGUE# 2 unlk a6 |#PROLOGUE# 3 rts L2000000: .long 1079951360,0 joerg > As you see the call to the *external* function is converted into the correct mc68881 code. See the manual of inline(1) for more information. J. Schilling H. Berthold AG Teltowkanalstr. 1-4 D 1000 Berlin 46 +49 30 7795 - 400 joerg@berthold.de .. tub!berthold!joerg .. unido!berthold!joerg
khb@sun.com (Keith Bierman - SPD Languages Marketing -- MTS) (05/10/89)
I'm sorry for the very tardy reply, but I was out of town, my machine (and office) were moved, and my dog ate my transmittal :> chuck@trantor.harris-atd.com (Chuck Musciano) writes: >X-Sun-Spots-Digest: Volume 7, Issue 230, message 1 of 16 > >Compare the code generated for "sin(x) + cos(x)" in FORTRAN and C. In >FORTRAN (a5@(-0x7ffc) is x): > > fcoss a5@(-0x7ffc),fp0 > fsins a5@(-0x7ffc),fp1 > faddx fp1,fp0 > One can do better. Here is some source: real x x = 10.0 y = sin(x)+cos(x) print*,y end and we compile (f77v1.2) f77 -O3 -f68881 -S .... stuff removed |#PROLOGUE# 1 fmoves L1D20,fp7 fsincosx fp7,fp1:fp0 faddx fp1,fp0 fmoves fp0,VAR_SEG1+4 pea v.16 jbsr _s_wsle since we can compute sin and cos together with a big savings. Don't forget to use the inline library /usr/lib/f77/f68881/libm.il for your floating point needs. Cheers. -- Keith H. Bierman |*My thoughts are my own. Only my work belongs to Sun* It's Not My Fault | Marketing Technical Specialist -- I Voted for Bill & Opus| Languages and Performance Tools. (* strange as it may seem, I do more engineering now *)
igp@cambridge-consultants.co.uk (Ian Phillipps) (05/11/89)
chuck@trantor.harris-atd.com (Chuck Musciano) writes: >Compare the code generated for "sin(x) + cos(x)" in FORTRAN and C. In >FORTRAN (a5@(-0x7ffc) is x): [ 3 x 68881 instructions ] >In C (a6@(-12) and a6@(-16) are x): [ 13 x 68020 + 1 x 68881 instructions] >Does anyone know of an optimizer which will convert calls to the standard >math routines to their 68881 instruction counterparts? Sun provide one! (/usr/lib/inline) Using cc -S -O -f68881 trig.c /usr/lib/f68881.il We get: movl a6@(12),sp@- ; x movl a6@(8),sp@- fcosd sp@,fp0 fmoved fp0,sp@ movl sp@+,a6@(-8) movl sp@+,a6@(-4) movl a6@(12),sp@- movl a6@(8),sp@- fsind sp@,fp0 fmoved fp0,sp@ movl sp@+,d0 movl sp@,d1 movl d0,sp@- fmoved sp@+,fp0 faddd a6@(-8),fp0 fmoved fp0,a6@(16) ; assign result Not very optimised, but at least it does fcosd! [Note - C gives you DCOS by default] -- UUCP: igp@camcon.co.uk | Cambridge Consultants Ltd | Ian Phillipps or: igp@camcon.uucp | Science Park, Milton Road |----------------- Phone: +44 223 420024 | Cambridge CB4 4DW, England |