jz0t+@andrew.cmu.edu (James Zurlo) (01/02/90)
I'm working on some FORTRAN code that takes between 2 and 3 CPU days to execute on my Personal Iris. Obviously, I'm interested in cutting down execution time.

I've run pixie on my code and found that ~90% of the time is spent in one subroutine. I've compiled this subroutine separately at the -O2 optimization level. From some crude timing tests, it looks like that will cut execution time in half. I can't apply any optimization to the main code, since it then gives me wrong answers. I think this is due to a CALL statement that has a function as one of its arguments.

I'm still interested in getting my execution time down. I've been reading through the IRIS-4D Series Compiler Guide. It gives some recommendations, most of which I don't understand since I know no C. I've also been reading "FORTRAN Optimization" by Michael Metcalf. It mentions a number of different ways of speeding up my code, but it's not clear how his recommendations would change for a RISC architecture.

By looking at the assembly language output, I noticed that the optimizer does a good job with exponentiation. However, the optimizer doesn't seem to do strength reduction, i.e., replacing divisions with multiplications, or subtractions with additions. Does DO loop unrolling, to reduce the loop overhead per operation, buy one anything?

Does anyone have any specific recommendations, from a FORTRAN perspective, that I can use to speed up my code?

Thanks in advance.

Jim Zurlo
jz0t+@andrew.cmu.edu
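For readers following along, the two hand transformations asked about (trading a repeated division for a multiplication, and unrolling a DO loop) can be sketched roughly as below. This is only an illustration, not code from the original program: the program name UNROLL, the arrays A, B, C, the divisor H, and the unroll factor of 4 are all made up for the example. Note that multiplying by a precomputed reciprocal can differ from a true division in the last bit; here H is a power of two, so the two loops agree exactly.

```fortran
      PROGRAM UNROLL
C     Hypothetical example; names and sizes are illustrative only.
      INTEGER N, I, NMAIN
      PARAMETER (N = 1002)
      REAL A(N), B(N), C(N), H, RH
      H = 4.0
      DO 10 I = 1, N
         A(I) = REAL(I)
   10 CONTINUE
C     Baseline: one division per element.
      DO 20 I = 1, N
         B(I) = A(I) / H
   20 CONTINUE
C     Hand-tuned: compute the reciprocal once, then multiply.
      RH = 1.0 / H
C     Unroll by 4; NMAIN is the largest multiple of 4 not above N.
      NMAIN = N - MOD(N, 4)
      DO 30 I = 1, NMAIN, 4
         C(I)     = A(I)     * RH
         C(I + 1) = A(I + 1) * RH
         C(I + 2) = A(I + 2) * RH
         C(I + 3) = A(I + 3) * RH
   30 CONTINUE
C     Cleanup loop for the leftover 0 to 3 elements.
      DO 40 I = NMAIN + 1, N
         C(I) = A(I) * RH
   40 CONTINUE
      PRINT *, B(N), C(N)
      END
```

Whether either change pays off depends on the compiler and optimization level, so it is worth timing both versions of the hot subroutine rather than assuming a gain.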
doelz@urz.unibas.ch (Reinhard Doelz) (01/04/90)
One could try various things to speed up F77. One main drawback is the fixed memory allocation. Within molecular dynamics routines I found that much time is lost by swapping memory. If one uses arrays, one should keep them as small as possible. E.g., if you store integers in a real array, it takes much more space. But even the dimensions of large arrays can influence the speed in an unexplained way. I noticed that a frequently used array can be processed faster when it is declared with larger dimensions than with the (smaller) dimensions actually needed.

The order of the indices is essential: x(3,4) is stored as X11 X21 X31 X12 X22 X32 etc., so the first index should vary fastest. (Sorry if this is trivial.) It is possible to run through the array the other way round as well. The IRIS will do it, but it will kill itself by swapping.

You could also try to use multiple assignments to the same memory, instead of reassigning variables, with EQUIVALENCE statements (dangerous, but it speeds things up a lot and saves space, too). We fool around a bit with structures, but that is more for convenience than for improving speed. Another possibility to make things go faster is to use /dev/mem, but I think MIPS F77 does not support the mem calls of C.

Another useful thing is to think about trigonometric functions. If they're used without demanding too much accuracy (e.g., we need horrible amounts of sines and cosines for the normals in lighting models), it is worth setting up a table once containing the values in, let's say, 180 steps, and skipping the time-consuming sin(x) by simply indexing into the table.

Maybe this helps a little,
Reinhard
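Two of the suggestions above, looping over arrays in storage order and replacing sin(x) with a precomputed table, might look something like the following. This is a minimal sketch under stated assumptions, not the poster's code: the names TABLES, X, STAB and NSTEP are invented, and the "180 steps" are assumed here to span a full turn (2 degrees per entry) with nearest-entry lookup and no interpolation.

```fortran
      PROGRAM TABLES
C     Hypothetical example; names and sizes are illustrative only.
      INTEGER NR, NC, NSTEP, I, J, K
      PARAMETER (NR = 3, NC = 4, NSTEP = 180)
      REAL X(NR,NC), STAB(0:NSTEP), PI, DEG, APPROX
      PARAMETER (PI = 3.14159265)
C     FORTRAN stores arrays column by column, so the first index
C     belongs in the innermost loop for sequential memory access.
      DO 20 J = 1, NC
         DO 10 I = 1, NR
            X(I,J) = REAL(I) + 10.0 * REAL(J)
   10    CONTINUE
   20 CONTINUE
C     Build the sine table once (assumed: NSTEP steps per turn).
      DO 30 K = 0, NSTEP
         STAB(K) = SIN(2.0 * PI * REAL(K) / REAL(NSTEP))
   30 CONTINUE
C     Nearest-entry lookup; assumes DEG is not negative.
      DEG = 90.0
      K = MOD(NINT(DEG * REAL(NSTEP) / 360.0), NSTEP)
      APPROX = STAB(K)
C     Print the table value next to the intrinsic for comparison.
      PRINT *, X(NR,NC), APPROX, SIN(DEG * PI / 180.0)
      END
```

With 2-degree steps the nearest-entry error can reach roughly the sine of 1 degree, so a finer table or linear interpolation between entries may be needed if the lighting results look coarse.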