[comp.sys.sgi] FORTRAN Optimization

jz0t+@andrew.cmu.edu (James Zurlo) (01/02/90)

I'm working on some FORTRAN code that takes between 2 and 3 CPU
days to execute on my Personal Iris.  Obviously, I'm interested
in cutting down execution time.  I've run pixie on my code and
found that ~90% of the time is spent in one subroutine.  I've
compiled this subroutine separately at the -O2 optimization level.
From some crude timing tests, that looks like it will cut execution
time in half.  I can't apply any optimization to the main code,
since the optimized version gives me wrong answers.  I think this
is due to a CALL statement that has a function as one of its arguments.
I'm still interested in getting my execution time down.  I've been
reading through the IRIS-4D Series Compiler Guide.  It gives some
recommendations, most of which I don't understand since I know
no C.  I've also been reading "FORTRAN Optimization" by Michael
Metcalf.  It mentions a number of different ways of speeding up
my code.  It's not clear how his recommendations would change for
a RISC architecture.  By looking at the assembly language output,
I noticed that the optimizer does a good job with exponentiation.
However, the optimizer doesn't seem to do strength reduction.
I.e., replacing divisions with multiplications, or subtractions
with additions.  Does DO loop unrolling, to reduce the loop
overhead per operation, buy one anything?  Does anyone have
any specific recommendations, from a FORTRAN perspective, that
I can use to speed up my code?
Thanks in advance.
Jim Zurlo
jz0t+@andrew.cmu.edu
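
As a rough illustration of the two hand transformations asked about above, here is a minimal sketch that replaces a per-element division with a multiplication by a reciprocal computed once, and unrolls the DO loop four times. The program name, array names, the size N, and the divisor D are all made up for the example and are not taken from the code in question; N is assumed to be a multiple of 4.

```fortran
      PROGRAM HANDOP
C     Hypothetical example: scale an array by 1/D in two ways.
      INTEGER N
      PARAMETER (N = 1000)
      REAL A(N), BDIV(N), BMUL(N), D, RD, ERR
      INTEGER I
      D = 3.0
      DO 10 I = 1, N
         A(I) = REAL(I)
   10 CONTINUE
C     Straightforward version: one division per element.
      DO 20 I = 1, N
         BDIV(I) = A(I) / D
   20 CONTINUE
C     Hand-transformed version: the reciprocal is computed once
C     outside the loop, and the loop body is unrolled four times.
C     N is assumed to be a multiple of 4.
      RD = 1.0 / D
      DO 30 I = 1, N, 4
         BMUL(I)   = A(I)   * RD
         BMUL(I+1) = A(I+1) * RD
         BMUL(I+2) = A(I+2) * RD
         BMUL(I+3) = A(I+3) * RD
   30 CONTINUE
C     Compare the two results; they need not agree bit for bit.
      ERR = 0.0
      DO 40 I = 1, N
         ERR = MAX(ERR, ABS(BDIV(I) - BMUL(I)) / ABS(BDIV(I)))
   40 CONTINUE
      PRINT *, 'max relative difference =', ERR
      END
```

Multiplying by a reciprocal is not exactly the same arithmetic as dividing, so the results can differ in the last bits. Whether either change actually saves time depends on the compiler and optimization level, so it is worth timing both versions before committing to the rewrite.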

doelz@urz.unibas.ch (Reinhard Doelz) (01/04/90)

One could try various things to speed up F77. One main drawback
is the fixed memory allocation. In molecular dynamics routines
I found that much time is lost to swapping memory. If
one uses arrays, one should keep them as small as possible. E.g.,
if you store integers in a real array it takes much more space.
But even the dimensions of large arrays can influence the speed
in ways I can't explain. I noticed that a frequently used array
is processed faster if it is declared with larger dimensions
than the (smaller) ones actually needed.
The order of the indices is essential: x(3,4) is stored as X11 X21 X31 X12 X22 X32 etc.
(Sorry if this is trivial.) Accessing it the other way round is possible as well;
the IRIS will do it, but it will kill itself by swapping. You could
also try to use multiple assignments to memory, instead of reassigning
variables, with EQUIVALENCE (dangerous, but speeds things up a lot,
and saves space, too). We fool around a bit with structures, but
that is more for convenience than improving speed. Another possibility
to make things go faster is to use /dev/mem, but I think MIPS F77 
does not support the mem calls of C. Another useful thing is to think 
about trigonometric functions. If they're used without demanding
too much accuracy (e.g., we need horrible amounts of sines and cosines
for the normals in lighting models), it is worth setting up a table
once containing the values in, let's say, 180 steps and skipping the
time-consuming sin(x) by simply indexing into the table.
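
The table idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the 180 steps are taken to cover one full period evenly, the lookup picks the nearest entry with no interpolation, and the argument is assumed to lie in 0 <= X < 2*pi. The names SINTAB, TABLE, and NSTEP are invented for the example.

```fortran
      PROGRAM SINTAB
C     Hypothetical sketch of a sine lookup table with 180 steps.
      INTEGER NSTEP
      PARAMETER (NSTEP = 180)
      REAL TABLE(0:NSTEP-1), TWOPI, X, ERR
      INTEGER I, K
      TWOPI = 8.0 * ATAN(1.0)
C     Fill the table once: entry I holds sin(2*pi*I/NSTEP).
      DO 10 I = 0, NSTEP-1
         TABLE(I) = SIN(TWOPI * REAL(I) / REAL(NSTEP))
   10 CONTINUE
C     Replace SIN(X) by the nearest table entry and record the
C     worst error over 1000 sample points in one period.
C     X is assumed to satisfy 0 <= X < 2*pi.
      ERR = 0.0
      DO 20 I = 0, 999
         X = TWOPI * REAL(I) / 1000.0
         K = MOD(NINT(X * REAL(NSTEP) / TWOPI), NSTEP)
         ERR = MAX(ERR, ABS(TABLE(K) - SIN(X)))
   20 CONTINUE
      PRINT *, 'max abs error of table lookup =', ERR
      END
```

With nearest-entry lookup the error is bounded by half the step size, about pi/180 or roughly 0.017 here, so this only pays off where that accuracy is acceptable. A finer table or linear interpolation between entries would trade some speed for accuracy.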

Maybe this helps a little, 

Reinhard
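
To illustrate the remark about index order in the post above: because FORTRAN stores arrays column by column, nested loops touch memory sequentially when the first index varies fastest. The following is a minimal sketch using the x(3,4) shape from the post; the program name and the fill values are arbitrary.

```fortran
      PROGRAM COLMAJ
C     Hypothetical example: walk a 2-D array in storage order.
      INTEGER NROW, NCOL
      PARAMETER (NROW = 3, NCOL = 4)
      REAL X(NROW, NCOL), TOTAL
      INTEGER I, J
C     X is stored as X(1,1) X(2,1) X(3,1) X(1,2) X(2,2) ...
C     so the first index should vary fastest: J is the outer
C     loop and I is the inner loop.
      DO 20 J = 1, NCOL
         DO 10 I = 1, NROW
            X(I,J) = REAL(10*I + J)
   10    CONTINUE
   20 CONTINUE
      TOTAL = 0.0
      DO 40 J = 1, NCOL
         DO 30 I = 1, NROW
            TOTAL = TOTAL + X(I,J)
   30    CONTINUE
   40 CONTINUE
      PRINT *, 'sum =', TOTAL
      END
```

For an array this small the loop order makes no measurable difference. The effect described in the post shows up only when the array is large compared with the available memory, where swapping the two loops makes each inner iteration jump a whole column length ahead in memory.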