achhabra@uceng.UC.EDU (atul k chhabra) (03/08/89)
I chanced upon a segment of code that runs approximately 300 times faster in
FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
(of course, on Sun4 the -f68881 flag was not used.) The results are similar
on both machines. Can anyone enlighten me on this bizarre result?
Listing of cosc.c:
--------------------------------------------------------------------------------
/*
 * Compile using:
 * cc -f68881 -O -o cosc cosc.c -lm
 */
#include <math.h>
main()
{
    int i;
    float tmp;
    for(i=0;i<262144;i++)
        tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
}
--------------------------------------------------------------------------------
Listing of cosf.f
--------------------------------------------------------------------------------
c
c Compile using:
c f77 -f68881 -O -o cosf cosf.f
c
      program cosf
      integer i
      real tmp
      do 10 i=1,262144
        tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
   10 continue
      end
--------------------------------------------------------------------------------
Timings on Sun3(OS3.5):
--------------------------------------------------------------------------------
% time cosc
55.6u 1.0s 1:49 51% 24+8k 12+1io 0pf+0w
^^^^^
% time cosf
0.2u 0.0s 0:00 75% 16+8k 4+0io 0pf+0w
^^^^
--------------------------------------------------------------------------------
===========================================================================
Atul Chhabra, Dept. of Electrical & Computer Engineering, ML 030,
University of Cincinnati, Cincinnati, OH 45221-0030.
voice: (513)556-4766 INTERNET: achhabra@ucesp1.ece.uc.edu
OR achhabra@uceng.uc.edu
===========================================================================
chris@mimsy.UUCP (Chris Torek) (03/08/89)
In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
>I chanced upon a segment of code that runs approximately 300 times faster in
>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
>(of course, on Sun4 the -f68881 flag was not used.) The results are similar
>on both machines. Can anyone enlighten me on this bizarre result?

`COS' is an intrinsic function in Fortran. This means that the compiler
is required to know about it. It is typically provided as an external
function in C, so that the compiler knows nothing of it. Thus:

> for(i=0;i<262144;i++)
> tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);

makes the compiler call `cos' (262144*4) times, each time with the same
argument, and multiply all those values together. The compiler may not
`guess at' what the function does and, because its value is not used the
first 262143 times, eliminate the calls: `cos' might print `hello world'.

On the other hand, given

> do 10 i=1,262144
> tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
>10 continue

the Fortran compiler can be certain that COS(2.5) does nothing but
compute cosines, and can change the code to

        TMP = 4.0 * COS(2.5)
10      CONTINUE

possibly even replacing the COS(2.5) with the constant -.8011436155....
(Actually, since in both fragments tmp is unused, both versions can elide
the assignment to tmp and the C version can elide the four multiplies per
iteration. It cannot, however, replace the four calls with a single call.)

Now, if Sun had a pANS-conformant compiler, they could make <math.h>
do something like

        #define cos(x) __intrinsic_cos(x)

and recognise calls to `__intrinsic_cos'. This sort of optimisation
does have a real effect on real code (as opposed to silly examples
like calling cos four times with the same constant in a loop that
runs 262144 times, then throwing away the result).
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
tim@crackle.amd.com (Tim Olson) (03/09/89)
In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
| I chanced upon a segment of code that runs approximately 300 times faster in
| FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
| (of course, on Sun4 the -f68881 flag was not used.) The results are similar
| on both machines. Can anyone enlighten me on this bizarre result?

Welcome to the world of benchmarking.

You can see what happened if you take a look at the assembly language
generated by the compilers. In the FORTRAN version, there is no call to the
cosine routine; only an empty loop remains. This is because cosine is a
FORTRAN intrinsic which the compiler knows about. Since you didn't use any
of the results of the cosine calls, the compiler was able to eliminate it
entirely as "dead code". The C version had to keep the cosine function
calls, because it isn't an intrinsic function in K&R C, so the compiler
knows nothing of what it does (it may have side-effects).

To get more realistic numbers, you have to "fake out" the compiler, by
using the results of the calls:
________________________________________
/*
 * Compile using:
 * cc -f68881 -O -o cosc cosc.c -lm
 */
#include <math.h>
float bench()
{
    int i;
    float tmp;
    for(tmp=0.0,i=0;i<262144;i++)
        tmp+=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
    return tmp;
}
main()
{
    float tmp;
    tmp = bench();
}
________________________________________
c f77 -f68881 -O -o cosf cosf.f
c
      real function bench()
      integer i
      real tmp
      tmp = 0.0
      do 10 i=1,262144
        tmp = tmp+cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
   10 continue
      bench = tmp
      end
      program cosf
      real tmp1
      tmp1 = bench()
      end
________________________________________
On a Sun 4/110:
crackle49 time cosc
35.3u 0.5s 0:37 95% 0+144k 1+0io 2pf+0w
crackle50 time cosf
19.4u 0.3s 0:20 96% 0+232k 0+0io 0pf+0w

This difference is mainly due to floating-point math being performed in
double-precision in C, vs. single-precision in FORTRAN.
--
Tim Olson
Advanced Micro Devices
(tim@amd.com)
gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/09/89)
In article <16279@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>`COS' is an intrinsic function in Fortran. ...

Three other contributions to the difference in running time are:
(a) C's cos() computes a double-precision value.
(b) The C code required conversion from double to single precision
    for the assignment.
(c) C's semantics required that the multiplications be performed
    in double precision.
henry@utzoo.uucp (Henry Spencer) (03/09/89)
In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
>I chanced upon a segment of code that runs approximately 300 times faster in
>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
>(of course, on Sun4 the -f68881 flag was not used.) The results are similar
>on both machines. Can anyone enlighten me on this bizarre result?

Two things. First, you're asking for single-precision cosine in Fortran
and double-precision in C. Second, Sun's Fortran optimizer is much better
than their C optimizer, and it has noticed that you're not *doing* anything
with those values and deleted the whole computation. You're timing the C
code against an empty Fortran loop.
--
Welcome to Mars! Your       | Henry Spencer at U of Toronto Zoology
passport and visa, comrade? | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
thoth@beach.cis.ufl.edu (Robert Forsman) (03/09/89)
>From: achhabra@uceng.UC.EDU (atul k chhabra)
>I chanced upon a segment of code that runs approximately 300 times
>faster in FORTRAN than in C. I have tried the code on Sun3(OS3.5) and
>on Sun4(OS4.0) (of course, on Sun4 the -f68881 flag was not used.)
>The results are similar on both machines. Can anyone enlighten me on
>this bizarre result?
> for(i=0;i<262144;i++)
> tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
> [equivalent FORTRASH code omitted]

Simple. Fortran compilers usually optimize code to death. From
reading the postings of others on this subject I figure it can do one
of several drastic things.
Most drastic - skip the computation; the result is never used.
#2 - say tmp=cos(2.5)**4, that's all that happens anyway.
There are probably others but I should think that your average
knowledgeable FORTRAN programmer would spit on anything that did less
than number 2. A smart C compiler could come close but you would have
to flip a few switches.

From what I've heard, FORTRAN compilers have been ludicrously
optimizing since the dawn of time (~1950?) and as such FORTRAN is the
language of choice for supercomputers and other number crunchers.
I would much rather use C but I can't remember any huge interest in
optimizing C code to death. Just think what it would do to your timing
loops
        for (i=0; i<6 jillion; i++) {}
optimized into nothing.
---------------------------------------------------------------------
Just say maybe to .signatures
boyne@hplvli.HP.COM (Art Boyne) (03/09/89)
chris@mimsy.UUCP (Chris Torek) writes:
>the Fortran compiler can be certain that COS(2.5) does nothing but
>compute cosines, and can change the code to
>
>       TMP = 4.0 * COS(2.5)
              ^^^^^^^^^^^^^^ make that COS(2.5)**4
>10     CONTINUE

Art Boyne, boyne@hplvla.hp.com
fritz@friday.UUCP (Fritz Whittington) (03/10/89)
In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
>I chanced upon a segment of code that runs approximately 300 times faster in
>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
. . .
> for(i=0;i<262144;i++)
> tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
. . .
>% time cosc
>55.6u 1.0s 1:49 51% 24+8k 12+1io 0pf+0w
>^^^^^
>% time cosf
>0.2u 0.0s 0:00 75% 16+8k 4+0io 0pf+0w
>^^^^

I suspect that the FORTRAN math library has been "memoized" and the C
library hasn't. Memoization consists of having a function keep track of
prior input-output pairs (at least the one from the previous call,
sometimes a small hash table of prior calls); if called again with an
input that matches one in its past history, it doesn't have to re-compute
the output, simply supply it. You are calling with the same value all the
time.... Try replacing the 2.5 with something like (i mod 5000) in both
versions and compare again.
----
Fritz Whittington                          Texas Instruments, Incorporated
I don't even claim these opinions myself!  MS 3105
UUCP: killer!ernest!friday!fritz           8505 Forest Lane
AT&T: (214)480-6302                        Dallas, Texas 75243
john@frog.UUCP (John Woods) (03/10/89)
In article <THOTH.89Mar8212933@beach.cis.ufl.edu>, thoth@beach.cis.ufl.edu (Robert Forsman) writes:
> >From: achhabra@uceng.UC.EDU (atul k chhabra)
> >I chanced upon a segment of code that runs approximately 300 times
> >faster in FORTRAN than in C.
> > for(i=0;i<262144;i++)
> > tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
> > [equivalent FORTRASH code omitted]
> Simple. Fortran compilers usually optimize code to death. From
> reading the postings of others on this subject I figure it can do one
> of several drastic things.
> Most drastic - skip the computation; the result is never used.
> #2 - say tmp=cos(2.5)**4, that's all that happens
> anyway.
>
#3 tmp = 0.4119472 (since COS is hardwired into FORTRAN and the compiler
can evaluate the constant expression itself).

> From what I've heard, FORTRAN compilers have been ludicrously
> optimizing since the dawn of time

An interesting story: when I worked at Lincoln Labs, one group was buying
a VAX and wondered whether to run VMS or UNIX. One person there was
selected to run a FORTRAN program that they were interested in through the
VMS FORTRAN and f77 compilers, and got the rather expected result that VMS
FORTRAN created a faster program (I think by 20% on that particular
program). But the interesting part is this: he also recoded the program in
C, using tricks common to C programmers but not doing any constant
expression precalculation, and came up with a program that ran twice as
fast as the VMS version.

There's a lot to be said for highly optimizing compilers (ask any
supercomputer jock), but sometimes a Neanderthal language can get in the
way of a clear (and efficient) exposition of one's intent.
--
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

"He should be put in stocks in Lafayette Square across from the
 White House and pelted with dead cats." - George F. Will
chris@mimsy.UUCP (Chris Torek) (03/11/89)
In article <16279@mimsy.UUCP> I substituted

> TMP = 4.0 * COS(2.5)

for

>> tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);

Oops. (What, you mean $4x \ne x^4$? :-) )
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
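For the record, the two rewrites differ even in sign. A quick check in C, using the value of cos(2.5) quoted earlier in the thread; `COS_2_5`, `times_four`, `fourth_power` and `check_rewrites` are illustrative names.

```c
#include <assert.h>

/* cos(2.5) to ten places, as quoted earlier in the thread. */
#define COS_2_5 (-0.8011436155)

/* The mistaken rewrite: 4.0 * COS(2.5). */
double times_four(double x)
{
    return 4.0 * x;
}

/* The intended rewrite: COS(2.5)**4, using two multiplies. */
double fourth_power(double x)
{
    double sq = x * x;
    return sq * sq;
}

/* The product of four equal factors is never negative, whereas
 * four times a negative cosine is. */
void check_rewrites(void)
{
    assert(times_four(COS_2_5) < 0.0);
    assert(fourth_power(COS_2_5) > 0.0);
}
```

`times_four(COS_2_5)` is about -3.2046, while `fourth_power(COS_2_5)` is about 0.4119, in line with the constant 0.4119472 given in an earlier reply.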
guy@auspex.UUCP (Guy Harris) (03/11/89)
>Second, Sun's Fortran optimizer is much better than their C optimizer,

Not in SunOS 4.0; Sun's FORTRAN 1.1 product uses the same "iropt"
optimizer that the SunOS 4.0 68K and SPARC C compiler can use (although
the product may carry its own copy of that optimizer with it). However,
you have to *ask* for it in the 68K C compiler, by using "-O2"; "-O"
defaults to "-O1", which only runs the peephole optimizer. (Future
releases may default to "-O2" on 68K-based Suns, as current releases do on
SPARC-based Suns and, presumably, Solbournes. I can't speak for the 386i,
which does not, as far as I know, currently offer the "iropt" optimizer
for C.) FORTRAN 1.1 defaults to "-O3".

>and it has noticed that you're not *doing* anything with those values
>and deleted the whole computation. You're timing the C code against
>an empty Fortran loop.

As noted in other articles, even doing "cc -O4" on a Sun probably wouldn't
cause the loop to be eliminated, since the (current) Sun C compiler
doesn't "know" about "cos" - specifically, doesn't know that it's a "pure"
function - and therefore can't safely eliminate calls to it (or even move
them outside the loop).

(Note: do not extrapolate from the use of "(current)" to a conclusion that
future Sun compilers *will* know about "cos", using e.g. the
"__builtin_cos" mechanism described in earlier postings. "(current)" was
only put there to indicate that future Sun compilers *might* do this.)
guy@auspex.UUCP (Guy Harris) (03/11/89)
>A smart C compiler could come close but you would have to flip a few
>switches.

And somehow convince it that "cos" is a pure function, e.g. with the
"__builtin_cos" mechanism described in other postings.

> From what I've heard, FORTRAN compilers have been ludicrously
>optimizing since the dawn of time (~1950?)

~1954, as I remember, but I don't know that the original (Backus?) FORTRAN
compiler would do the level of optimizing that you describe (especially in
non-trivial cases).

>I would much rather use C but I can't remember any huge interest in
>optimizing C code to death.

Well, there's:

        GCC;

        the MIPS C compiler;

        the SunOS 4.0 C compiler, at least on 68K and SPARC;

and a number of other vendors' and third-party compilers (the ones listed
are the ones I *know* do "aggressive" optimization - I'm sure there are
others; I think VMS C, Apollo C, and HP Precision Architecture C do, and
there are probably more that do as well); so I see a fair bit of interest
in it, at least on the compiler-writers side; presumably, they're not all
doing it just for their health, and there's demand for
aggressively-optimizing C compilers.

>Just think what it would do to your timing loops
> for (i=0; i<6 jillion; i++) {}
>optimized into nothing.

I'd rather think of the good things it can do for the 99.9999999% of code
I deal with that's *not* just timing loops (e.g., doing interprocedural
register allocation - something you can't do in vanilla C without cheating
and "knowing" how the compiler allocates registers; such "knowledge" can
become invalid with the next release of the compiler, and may be invalid
on compilers for other architectures or even on other compilers for the
same architecture - and even if C were modified to allow it, I'm not sure
I'd trust myself not to screw up and forget to change one routine when its
predecessor or successor on the call chain is changed).
henry@utzoo.uucp (Henry Spencer) (03/12/89)
In article <1144@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>>Second, Sun's Fortran optimizer is much better than their C optimizer,
>
>Not in SunOS 4.0...

I don't run SunOS 4.0 -- I have a policy of not running beta-test
versions of operating systems. :-) :-(
--
Welcome to Mars! Your       | Henry Spencer at U of Toronto Zoology
passport and visa, comrade? | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
achhabra@uceng.UC.EDU (atul k chhabra) (03/12/89)
Thanks to all who responded to the query. I have learnt a lot from the
responses.

Atul
cdold@starfish.Convergent.COM (Clarence Dold) (03/14/89)
From article <688@friday.UUCP>, by fritz@friday.UUCP (Fritz Whittington):
>>I chanced upon a segment of code that runs approximately 300 times faster in
>>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
. . .
>> for(i=0;i<262144;i++)
>> tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
. . .

What about a Floating Point Chip? Is Fortran configured to use the FPU by
default, while the C compiler uses software floating point?
--
Clarence A Dold - cdold@starfish.Convergent.COM (408) 434-2083
...pyramid!ctnews!starfish!cdold
P.O.Box 6685, San Jose, CA 95150-6685