[comp.sys.sun] Fortran 300 times faster than C cosine

prl@eiger.uucp (04/21/89)

achhabra@uceng.uc.edu (atul k chhabra):

> I chanced upon a segment of code that runs approximately 300 times faster
> in FORTRAN than in C. I have tried the code on Sun3(OS3.5) and on
> Sun4(OS4.0) (of course, on Sun4 the -f68881 flag was not used.) The
> results are similar on both machines. Can anyone enlighten me on this
> bizarre result?...
> [[ ...For f77, "-O" defaults to "-O3", which
> includes global optimization.  Presumably it also does dead code
> elimination.... --wnl ]]

> Atul Chhabra, Dept. of Electrical & Computer Engineering, ML 030,
> University of Cincinnati, Cincinnati, OH 45221-0030.

Wnl is correct in his analysis, but his suggestion that adding the statement

	print *,tmp

will defeat global optimisation is wrong (as can be demonstrated with some
non-Sun compilers).

There are a number of optimisations which can be applied to the program.

1.	(cos(2.5)**4) is a constant which can be calculated at compile time.

2.	The assignment tmp = cos(2.5)**4 is independent of the loop variable
		and so can be moved out of the loop.

3.	After the assignment is moved out of the loop, the loop is empty,
		so it can be replaced by the assignment
			i = 262144

4.	Since neither `tmp' nor `i' is used in any expression that affects
		the global state of the program or the outside world, their
		calculation can be discarded entirely.

The statement
	print *,tmp

only disables optimisation 4 for `tmp'. Sun's compiler applies only
optimisations 3 and 4, in the order (4) on tmp, (3) on the now-empty loop,
then (4) on i. Alliant's compiler with global optimisation turned on
applies 1, 2 and 4, but not 3.

In defense of Alliant's compiler writers, optimisation 3 is of little
practical use, since in any reasonable program *all* loops should have
some code which can't be moved outside the loop (otherwise **WHY** is
there a loop there anyway?).

A good optimising compiler should be able to turn Atul's program
(including the print statement) into the equivalent of:

	tmp=0.411947.... (cos(2.5)**4)
	print *,tmp
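For reference, the constant can be checked by hand (the argument 2.5 is in radians, as Fortran's COS expects), rounding to six decimal places:

```latex
\cos(2.5) \approx -0.801144, \qquad
\cos^2(2.5) \approx 0.641831, \qquad
\cos^4(2.5) = \left(\cos^2(2.5)\right)^2 \approx 0.411947
```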

Writing synthetic benchmarks which cannot be defeated in `unreasonable'
ways by good optimising compilers is **very** difficult.

You will often need to look at the assembly code or, better, run a set of
real applications that matter to you, or a benchmark suite that is very
careful about such things....

Peter Lamb
uucp:  uunet!mcvax!ethz!prl     eunet: prl@ethz.uucp    Tel:   +411 256 5241
Integrated Systems Laboratory
ETH-Zentrum, 8092 Zurich