[comp.lang.c] Fortran computes cosine 300 times faster than C

achhabra@uceng.UC.EDU (atul k chhabra) (03/08/89)

I chanced upon a segment of code that runs approximately 300 times faster in
FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
(of course, on Sun4 the -f68881 flag was not used.) The results are similar
on both machines. Can anyone enlighten me on this bizarre result?

Listing of cosc.c:
--------------------------------------------------------------------------------
/*
 * Compile using:
 *	cc -f68881 -O -o cosc cosc.c -lm
 */

#include <math.h>

main()
{
    int i;
    float tmp;

    for(i=0;i<262144;i++)
	tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
}
--------------------------------------------------------------------------------

Listing of cosf.f
--------------------------------------------------------------------------------
c
c	Compile using:
c		f77 -f68881 -O -o cosf cosf.f
c
	program cosf
	integer i
	real tmp

	do 10 i=1,262144
		tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
10	continue
	end
--------------------------------------------------------------------------------

Timings on Sun3(OS3.5):
--------------------------------------------------------------------------------
% time cosc
55.6u 1.0s 1:49 51% 24+8k 12+1io 0pf+0w
^^^^^
% time cosf
0.2u 0.0s 0:00 75% 16+8k 4+0io 0pf+0w
^^^^
--------------------------------------------------------------------------------

===========================================================================
Atul Chhabra, Dept. of Electrical & Computer Engineering, ML 030,
University of Cincinnati, Cincinnati, OH 45221-0030.

voice: (513)556-4766  INTERNET: achhabra@ucesp1.ece.uc.edu
                                OR achhabra@uceng.uc.edu
===========================================================================

chris@mimsy.UUCP (Chris Torek) (03/08/89)

In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
>I chanced upon a segment of code that runs approximately 300 times faster in
>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
>(of course, on Sun4 the -f68881 flag was not used.) The results are similar
>on both machines. Can anyone enlighten me on this bizarre result?

`COS' is an intrinsic function in Fortran.  This means that the compiler
is required to know about it.  It is typically provided as an external
function in C, so that the compiler knows nothing of it.  Thus:

>    for(i=0;i<262144;i++)
>	tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);

makes the compiler call `cos' (262144*4) times, each time with the same
argument, and multiply all those values together.  The compiler cannot
`guess at' the function and, on the grounds that its value is not used
the first 262143 times, eliminate the calls, because `cos' might print
`hello world'.

On the other hand, given

>	do 10 i=1,262144
>		tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
>10	continue

the Fortran compiler can be certain that COS(2.5) does nothing but
compute cosines, and can change the code to

	TMP = 4.0 * COS(2.5)
10	CONTINUE

possibly even replacing the COS(2.5) with the constant -.8011436155....
(Actually, since in both fragments tmp is unused, both versions can
elide the assignment to tmp and the C version can elide the four multiplies
per iteration.  It cannot, however, replace the four calls with a single
call.)

Now, if Sun had a pANS-conformant compiler, they could make <math.h>
do something like

	#define cos(x) __intrinsic_cos(x)

and recognise calls to `__intrinsic_cos'.  This sort of optimisation
does have a real effect on real code (as opposed to silly examples like
calling cos four times with the same constant in a loop that runs 262144
times, then throwing away the result).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

tim@crackle.amd.com (Tim Olson) (03/09/89)

In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
| I chanced upon a segment of code that runs approximately 300 times faster in
| FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
| (of course, on Sun4 the -f68881 flag was not used.) The results are similar
| on both machines. Can anyone enlighten me on this bizarre result?

Welcome to the world of benchmarking.

You can see what happened if you take a look at the assembly-language
generated by the compilers.  In the FORTRAN version, there is no call to
the cosine routine; only an empty loop remains.  This is because cosine
is a FORTRAN intrinsic which the compiler knows about.  Since you didn't
use any of the results of the cosine calls, the compiler was able to
eliminate it entirely as "dead code".

The C version had to keep the cosine function calls, because it isn't an
intrinsic function in K&R C, so the compiler knows nothing of what it
does (it may have side-effects).

To get more realistic numbers, you have to "fake out" the compiler, by
using the results of the calls:

________________________________________
/*
 * Compile using:
 *      cc -f68881 -O -o cosc cosc.c -lm
 */

#include <math.h>

float bench()
{
	int i;
	float tmp;
	
	for(tmp=0.0,i=0;i<262144;i++)
        	tmp+=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
	return tmp;
}

main()
{
	float tmp;

	tmp = bench();
}
________________________________________
c               f77 -f68881 -O -o cosf cosf.f
c
	real function bench()
        integer i
        real tmp


	tmp = 0.0
        do 10 i=1,262144
                tmp = tmp+cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5)
10      continue

	bench = tmp
	end



        program cosf
	real tmp1

	tmp1 = bench()
        end
________________________________________

On a Sun 4/110:

crackle49 time cosc
35.3u 0.5s 0:37 95% 0+144k 1+0io 2pf+0w
crackle50 time cosf
19.4u 0.3s 0:20 96% 0+232k 0+0io 0pf+0w

This difference is mainly due to floating-point math being performed in
double-precision in C, vs. single-precision in FORTRAN.


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/09/89)

In article <16279@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>`COS' is an intrinsic function in Fortran.  ...

Three other contributions to the difference in running time are:
	(a) C's cos() computes a double-precision value.
	(b) The C code required conversion from double to
		single precision for the assignment.
	(c) C's semantics required that the multiplications
		be performed in double precision.

henry@utzoo.uucp (Henry Spencer) (03/09/89)

In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
>I chanced upon a segment of code that runs approximately 300 times faster in
>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
>(of course, on Sun4 the -f68881 flag was not used.) The results are similar
>on both machines. Can anyone enlighten me on this bizarre result?

Two things.  First, you're asking for single-precision cosine in Fortran
and double-precision in C.  Second, Sun's Fortran optimizer is much
better than their C optimizer, and it has noticed that you're not *doing*
anything with those values and deleted the whole computation.  You're
timing the C code against an empty Fortran loop.
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

thoth@beach.cis.ufl.edu (Robert Forsman) (03/09/89)

>From: achhabra@uceng.UC.EDU (atul k chhabra)

>I chanced upon a segment of code that runs approximately 300 times
>faster in FORTRAN than in C. I have tried the code on Sun3(OS3.5) and
>on Sun4(OS4.0) (of course, on Sun4 the -f68881 flag was not used.)
>The results are similar on both machines. Can anyone enlighten me on
>this bizarre result?

>    for(i=0;i<262144;i++)
>	tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);

>	[equivalent FORTRASH code omitted]

Simple.  Fortran compilers usually optimize code to death.  From
reading the postings of others on this subject I figure it can do one
of several drastic things.
  Most drastic	- skip the computation;  the result is never used.
  #2		- say	tmp=cos(2.5)**4, that's all that happens
			anyway.

There are probably others but I should think that your average
knowledgeable FORTRAN programmer would spit on anything that did less
than number 2.  A smart C compiler could come close but you would have
to flip a few switches.
	From what I've heard, FORTRAN compilers have been ludicrously
optimizing since the dawn of time (~1950?) and as such are the
language of choice for supercomputers and other number crunchers.  I
would much rather use C but I can't remember any huge interest in
optimizing C code to death.  Just think what it would do to your
timing loops 
  for (i=0; i<6 jillion; i++) {}
optimized into nothing.

---------------------------------------------------------------------
Just say maybe to .signatures

boyne@hplvli.HP.COM (Art Boyne) (03/09/89)

chris@mimsy.UUCP (Chris Torek) writes:

>the Fortran compiler can be certain that COS(2.5) does nothing but
>compute cosines, and can change the code to
>
>	TMP = 4.0 * COS(2.5)
              ^^^^^^^^^^^^^^ make that COS(2.5)**4
>10	CONTINUE

Art Boyne, boyne@hplvla.hp.com

fritz@friday.UUCP (Fritz Whittington) (03/10/89)

In article <765@uceng.UC.EDU> achhabra@uceng.UC.EDU (atul k chhabra) writes:
>I chanced upon a segment of code that runs approximately 300 times faster in
>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
  . . .
>    for(i=0;i<262144;i++)
>	tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
  . . .
>% time cosc
>55.6u 1.0s 1:49 51% 24+8k 12+1io 0pf+0w
>^^^^^
>% time cosf
>0.2u 0.0s 0:00 75% 16+8k 4+0io 0pf+0w
>^^^^
I suspect that the FORTRAN math library has been "memoized" and the C
library hasn't.  Memoization consists of having a function keep track of
prior input-output pairs (at least the one from the previous call,
sometimes a small hash table of prior calls); if called again with an
input that matches one in its past history, it doesn't have to
re-compute the output, simply supply it.  You are calling with the same
value all the time.... Try replacing the 2.5 with something like (i mod
5000) in both versions and compare again.

---- 
Fritz Whittington                               Texas Instruments, Incorporated
I don't even claim these opinions myself!       MS 3105
UUCP: killer!ernest!friday!fritz                8505 Forest Lane
AT&T: (214)480-6302                             Dallas, Texas  75243

john@frog.UUCP (John Woods) (03/10/89)

In article <THOTH.89Mar8212933@beach.cis.ufl.edu>, thoth@beach.cis.ufl.edu (Robert Forsman) writes:
> >From: achhabra@uceng.UC.EDU (atul k chhabra)
> >I chanced upon a segment of code that runs approximately 300 times
> >faster in FORTRAN than in C.
> >    for(i=0;i<262144;i++)
> >	tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
> >	[equivalent FORTRASH code omitted]
> Simple.  Fortran compilers usually optimize code to death.  From
> reading the postings of others on this subject I figure it can do one
> of several drastic things.
>   Most drastic	- skip the computation;  the result is never used.
>   #2		- say	tmp=cos(2.5)**4, that's all that happens
> 			anyway.
> 
    #3			tmp = 0.4119472		(since COS is hardwired into
						FORTRAN and the compiler can
						evaluate the constant
						expression itself.)

> 	From what I've heard, FORTRAN compilers have been ludicrously
> optimizing since the dawn of time

An interesting story:  when I worked at Lincoln Labs, one group was buying
a VAX and wondered whether to run VMS or UNIX.  One person there was selected
to run a FORTRAN program that they were interested in through the VMS FORTRAN
and f77 compilers, and got the rather expected result that VMS FORTRAN created
a faster program (I think by 20% on that particular program).  But the
interesting part is this:  he also recoded the program in C, using tricks common
to C programmers but not doing any constant expression precalculation, and
came up with a program that ran twice as fast as the VMS version.

There's a lot to be said for highly optimizing compilers (ask any supercomputer
jock), but sometimes a Neanderthal language can get in the way of a clear (and
efficient) exposition of one's intent.

-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

"He should be put in stocks in Lafayette Square across from the White House
 and pelted with dead cats."	- George F. Will

chris@mimsy.UUCP (Chris Torek) (03/11/89)

In article <16279@mimsy.UUCP> I substituted
>	TMP = 4.0 * COS(2.5)

for

>>	tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);

Oops.  (What, you mean $4x \ne x^4$? :-) )
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

guy@auspex.UUCP (Guy Harris) (03/11/89)

>Second, Sun's Fortran optimizer is much better than their C optimizer,

Not in SunOS 4.0; Sun's FORTRAN 1.1 product uses the same "iropt"
optimizer that the SunOS 4.0 68K and SPARC C compiler can use (although
the product may carry its own copy of that optimizer with it).  However,
you have to *ask* for it in the 68K C compiler, by using "-O2"; "-O"
defaults to "-O1", which only runs the peephole optimizer.  (Future
releases may default to "-O2" on 68K-based Suns, as current releases do
on SPARC-based Suns and, presumably, Solbournes.  I can't speak for the
386i, which does not, as far as I know, currently offer the "iropt"
optimizer for C.)  FORTRAN 1.1 defaults to "-O3".

>and it has noticed that you're not *doing* anything with those values
>and deleted the whole computation.  You're timing the C code against
>an empty Fortran loop.

As noted in other articles, even doing "cc -O4" on a Sun probably
wouldn't cause the loop to be eliminated, since the (current) Sun C
compiler doesn't "know" about "cos" - specifically, doesn't know that
it's a "pure" function - and therefore can't safely eliminate calls to
it (or even move them outside the loop).

(Note: do not extrapolate from the use of "(current)" to a conclusion
that future Sun compilers *will* know about "cos", using e.g. the
"__builtin_cos" mechanism described in earlier postings.  "(current)"
was only put there to indicate that future Sun compilers *might* do
this.)

guy@auspex.UUCP (Guy Harris) (03/11/89)

>A smart C compiler could come close but you would have to flip a few
>switches.

And somehow convince it that "cos" is a pure function, e.g. with the
"__builtin_cos" mechanism described in other postings.

>	From what I've heard, FORTRAN compilers have been ludicrously
>optimizing since the dawn of time (~1950?)

~1954, as I remember, but I don't know that the original (Backus?)
FORTRAN compiler would do the level of optimizing that you describe
(especially in non-trivial cases).

>I would much rather use C but I can't remember any huge interest in
>optimizing C code to death.

Well, there's:

	GCC;

	the MIPS C compiler;

	the SunOS 4.0 C compiler, at least on 68K and SPARC;

	and a number of other vendors' and third-party compilers (the
	ones listed are the ones I *know* do "aggressive" optimization -
	I'm sure there are others; I think VMS C, Apollo C, and HP
	Precision Architecture C do, and there are probably more that do
	as well);

so I see a fair bit of interest in it, at least on the compiler-writers
side; presumably, they're not all doing it just for their health, and
there's demand for aggressively-optimizing C compilers.

>Just think what it would do to your timing loops 
>  for (i=0; i<6 jillion; i++) {}
>optimized into nothing.

I'd rather think of the good things it can do for the 99.9999999% of
code I deal with that's *not* just timing loops (e.g., doing
interprocedural register allocation - something you can't do in vanilla
C without cheating and "knowing" how the compiler allocates registers;
such "knowledge" can become invalid with the next release of the
compiler, and may be invalid on compilers for other architectures or
even on other compilers for the same architecture - and even if C were
modified to allow it, I'm not sure I'd trust myself not to screw up and
forget to change one routine when its predecessor or successor on the
call chain is changed). 

henry@utzoo.uucp (Henry Spencer) (03/12/89)

In article <1144@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>>Second, Sun's Fortran optimizer is much better than their C optimizer,
>
>Not in SunOS 4.0...

I don't run SunOS 4.0 -- I have a policy of not running beta-test versions
of operating systems.  :-) :-(
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

achhabra@uceng.UC.EDU (atul k chhabra) (03/12/89)

Thanks to all who responded to the query.
I have learnt a lot from the responses.

Atul

cdold@starfish.Convergent.COM (Clarence Dold) (03/14/89)

From article <688@friday.UUCP>, by fritz@friday.UUCP (Fritz Whittington):
>>I chanced upon a segment of code that runs approximately 300 times faster in
>>FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on Sun4(OS4.0)
   . . .
>>    for(i=0;i<262144;i++)
>>	tmp=cos(2.5)*cos(2.5)*cos(2.5)*cos(2.5);
   . . .
What about a Floating Point Chip?
Is Fortran configured to use the FPU by default, while the C compiler
uses software floating point?

-- 
Clarence A Dold - cdold@starfish.Convergent.COM         (408) 434-2083
                ...pyramid!ctnews!starfish!cdold         
                P.O.Box 6685, San Jose, CA 95150-6685