[comp.lang.fortran] why does optimization take so long ?

t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) (05/21/91)

I am curious whether any of you know why compiling certain routines with a -O 
flag takes a semi-infinite amount of time on certain computers (Apollo DN1000, 
Sun Fortran 1.3.1) (i.e. I killed compilation after a few hours CPU time for a 
single routine).  The code in question is generated by a computer algebra 
program, and looks like this

      qqgg= n1i*n2i*pbqi*ptqi * (  - 2*pnpe*pbpt**2*qk + 2*pnpe*
     + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
     + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 - 
      ...400 similar lines...	

As a lot of the time is spent here I'd like to be able to optimize it.

Geert Jan van Oldenborgh
gj@hep.physik.uni-muenchen.de

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (05/21/91)

In article <1236@nikhefh.nikhef.nl> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:


	 qqgg= n1i*n2i*pbqi*ptqi * (  - 2*pnpe*pbpt**2*qk + 2*pnpe*
	+ qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
	+ pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 - 
	 ...400 similar lines...	


Presumably all 400 lines are in a single basic block. Most modern
compilers tend to be a bit naive about this: they try to do vast
amounts of analysis on basic blocks (and often perform transformations
to make bigger ones). However, a basic block can be "too big", and a
clever enough compiler would essentially break it into several loops
(each of which would maximally utilize the register set(s)).

You may very well see improvement both in compile speed and execution
speed if you break up the computation into several smaller loops.
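To make that concrete: splitting one giant expression into several smaller assignments changes nothing numerically (up to rounding) while giving the optimizer much smaller units to chew on. A toy sketch in Python — the function names, operands, and grouping here are invented for illustration, not taken from the original Fortran:

```python
# Sketch: splitting one large sum-of-products into partial sums.
# All names and values below are made up for illustration.

def big_expression(a, b, c, d):
    # one huge "basic block": everything in a single statement
    return a*b*c + a*d*c + b*c*d + a*b*d + a*c + b*d

def split_expression(a, b, c, d):
    # the same computation, broken into smaller statements; an
    # optimizer (or a human) can now work on each piece independently
    t1 = a*b*c + a*d*c        # first chunk
    t2 = b*c*d + a*b*d        # second chunk
    t3 = a*c + b*d            # third chunk
    return t1 + t2 + t3

vals = (1.5, 2.0, 3.0, 0.5)
assert abs(big_expression(*vals) - split_expression(*vals)) < 1e-12
```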

You can rightly criticize the compiler's behavior, but you are probably
more concerned with improving performance than with tossing brickbats ;>

Sun's f77 v1.4 does a bit better on problems of this sort, but we
aren't out of the woods yet.

(btw: adding tons of RAM is also helpful).
--
----------------------------------------------------------------
Keith H. Bierman    keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (05/22/91)

In article <KHB.91May20200715@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
>
>In article <1236@nikhefh.nikhef.nl> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
>
>	 qqgg= n1i*n2i*pbqi*ptqi * (  - 2*pnpe*pbpt**2*qk + 2*pnpe*
>	+ qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
>	+ pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 - 
>	 ...400 similar lines...	
>
> ...
>You may very well see improvement both in compile speed and execution
>speed if you break up the computation into several smaller loops.

You can probably tell the symbolic manipulator to do this - Maple
certainly will. It then produces much more intelligible code, both for
humans and compilers.

>You can rightly critize the compilers behavior, but you probably are
>more concerned with improving performance than tossing brickbats ;>

I sent in two examples of this sort of code, and Apollo's response was
"will not be fixed". They didn't agree that a compiler spending over
12 hours of CPU time on a program (and still not producing an object
file) was worthy of correction. I did not ask for the compiler to
compile the code, just to give a message that the code exceeds its
capabilities (and a message telling the user to simplify the program
would be an added bonus :-).

Mike.
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775

ake@dayton.saic.com (Earle Ake) (05/22/91)

In article <1236@nikhefh.nikhef.nl>, t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
> I am curious whether any of you know why compiling certain routines with a -O 
> flag takes a semi-infinite amount of time on certain computers (Apollo DN1000, 
> Sun Fortran 1.3.1) (i.e. I killed compilation after a few hours CPU time for a 
> single routine).  The code in question is generated by a computer algebra 
> program, and looks like this
> 
>       qqgg= n1i*n2i*pbqi*ptqi * (  - 2*pnpe*pbpt**2*qk + 2*pnpe*
>      + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
>      + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 - 
>       ...400 similar lines...	
> 
> As a lot of the time is spent here I'd like to be able to optimize it.

	My guess is you are running out of physical memory during the
optimization and the thing is paging like crazy.  Why not try to optimize it
some yourself and give the compiler a break?  I see a few things to try.


      xxxx = pnpe*qk
      xxx0 = pnpb*pepb
      xxx1 = xxx0*pbpt
      xxx2 = xxx0*qk
      xxx3 = pnpb*pept
      xxx4 = xxx3*pbpt
      xxx5 = xxx3*qk

      qqgg= n1i*n2i*pbqi*ptqi * (  - 2*xxxx*pbpt**2 + 2*xxxx*
     + mb2*mt2 + 2*xxx1*qk - 2*xxx2*mt2 + 2*
     + xxx4*qk + 4*xxx4*pbpt - xxx5*mb2 - 

And so on.  Move those parts of the equation that are computed more than once
outside the LARGE equation.  What you give up is readability, unless you start
giving the intermediate variables names like pnpbpept = pnpb*pept or
pnpbpeptqk = pnpb*pept*qk.
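As a sanity check, a hoisted form like this can be verified numerically against the direct form. A small Python sketch, using one possible set of temporaries and random stand-in values for the kinematic variables (none of these values come from the original program):

```python
import random

random.seed(42)
# Invented stand-in values for the kinematic variables in the posted code.
pnpe, pnpb, pepb, pept, pbpt, qk, mb2, mt2 = (
    random.uniform(0.1, 2.0) for _ in range(8))

# Direct form: the first few terms of the posted expression, without
# the overall n1i*n2i*pbqi*ptqi factor.
direct = (-2*pnpe*pbpt**2*qk + 2*pnpe*qk*mb2*mt2
          + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2
          + 2*pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2
          - pnpb*pept*qk*mb2)

# Hoisted form: repeated sub-products computed once.
xxxx = pnpe*qk
xxx0 = pnpb*pepb
xxx1 = xxx0*pbpt
xxx2 = xxx0*qk
xxx3 = pnpb*pept
xxx4 = xxx3*pbpt
xxx5 = xxx3*qk
hoisted = (-2*xxxx*pbpt**2 + 2*xxxx*mb2*mt2
           + 2*xxx1*qk - 2*xxx2*mt2
           + 2*xxx4*qk + 4*xxx4*pbpt - xxx5*mb2)

assert abs(direct - hoisted) < 1e-9
```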



Earle
_____________________________________________________________________________
             ____ ____    ___
Earle Ake   /___ /___/ / /     Science Applications International Corporation
           ____//   / / /__                 Dayton, Ohio
-----------------------------------------------------------------------------
Internet: ake@dayton.saic.com        uucp: dayvb!ake         SPAN: 28284::ake

rchrd@well.sf.ca.us (Richard Friedman) (05/22/91)

t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:

>I am curious whether any of you know why compiling certain routines with a -O 
>flag takes a semi-infinite amount of time on certain computers
>      qqgg= n1i*n2i*pbqi*ptqi * (  - 2*pnpe*pbpt**2*qk + 2*pnpe*
>     + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
>     + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 - 
>      ...400 similar lines...	

>As a lot of the time is spent here I'd like to be able to optimize it.

For starters, just try yourself to find all the common subexpressions
in all that garbage so you can minimize the number of unnecessary
computations of the same things.  Compilers also have to be smart enough
to detect that   pnpb*pept*pbpt*qk  is the same as pept*qk*pnpb*pnbt.
That generates lots of linked lists to be searched.  
If you could get your symbolic code generator to generate stores into
temporaries for these subexpressions, I'm sure the optimizer would
speed up for you.
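A toy Python sketch of the canonical-ordering idea — a product is represented simply as a list of its factor names; this is an illustration, not how any particular compiler actually stores expressions:

```python
# With a canonical (sorted) ordering on the factors of a commutative
# product, two differently-written products can be recognized as
# identical by a simple comparison.

def canonical(product):
    """Return a product, given as a list of factor names, in canonical form."""
    return tuple(sorted(product))

e1 = ["pnpb", "pept", "pbpt", "qk"]
e2 = ["pept", "qk", "pnpb", "pbpt"]   # same factors, different order

assert canonical(e1) == canonical(e2)
```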
-- 
/\=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=/\
\/Richard Friedman     RCHRD      |     rchrd@well.sf.ca.us  \/
/\Box 9584 Berkeley CA 94709-0584 | or well!rchrd@apple.com  /\
\/=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\/

alan@dmsmelb.mel.dms.CSIRO.AU (Alan Miller) (05/22/91)

In article <1236@nikhefh.nikhef.nl>  t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
: I am curious whether any of you know why compiling certain routines with a -O 
: flag takes a semi-infinite amount of time on certain computers (Apollo DN1000, 
: Sun Fortran 1.3.1) (i.e. I killed compilation after a few hours CPU time for a 
: single routine).  The code in question is generated by a computer algebra 
: program, and looks like this
: 
:       qqgg= n1i*n2i*pbqi*ptqi * (  - 2*pnpe*pbpt**2*qk + 2*pnpe*
:      + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
:      + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 - 
:       ...400 similar lines...	
: 
: As a lot of the time is spent here I'd like to be able to optimize it.
: 
: Geert Jan van Oldenborgh
: gj@hep.physik.uni-muenchen.de


Geert sent me the piece of code, which was in the form of a Fortran function
without a main program.   I compiled it with the Sun f77 compiler (v 1.3.1)
and with version 2.6.3 of the Edinburgh f77 compiler, with and without the
optimization option -O.   The times taken were as follows:

   Compiler + options    Time
   f77 -c               16.1u + 1.8s
   f77 -O -c            Gave up after 37 mins. of CPU time !
   epcf77 -c             5.2u + 0.6s
   epcf77 -O -c        156.0u + 3.1s

The Edinburgh compiler certainly struggled with the 485-line function, while
the Sun compiler looks as if it was firmly stuck.

Alan Miller

jlg@cochiti.lanl.gov (Jim Giles) (05/22/91)

In article <24940@well.sf.ca.us>, rchrd@well.sf.ca.us (Richard Friedman) writes:
|> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
|> [...]                         Compilers also have to be smart enough
|> to detect that   pnpb*pept*pbpt*qk  is the same as pept*qk*pnpb*pnbt.
                              ^^^^                                 ^^^^
I assume you meant that _parts_ of these expressions are the same.  If 
they were completely identical, the compiler could detect the fact with
a simple canonical ordering on the operands (say, alphabetical on the
identifiers).  However, _partial_ commonality is an interesting exercise.
Consider (A*B*C + A*D*C + B*C*D): common expressions are A*C, B*C, A*D,
and C*D.  However, no _two_ of these are simultaneously useful as common
expressions!  Fortunately, stuff like this comes out automatically in 
the DAG construction.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (05/22/91)

In article <24204@lanl.gov>, jlg@cochiti.lanl.gov (Jim Giles) writes:
|> [...]
|> Consider (A*B*C + A*D*C + B*C*D): common expressions are A*C, B*C, A*D,
|> and C*D.  [...]                                                    ^^^
                                                                       |
_______________________________________________________________________|

That of course is an error.  I don't know why I wrote that.  The rest
of the discussion is right.
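For what it's worth, the pairwise common sub-products can be enumerated mechanically. A short Python sketch (factor names stand in for the variables; it just confirms that A*C, B*C and C*D are shared between terms while A*D is not):

```python
from itertools import combinations

# The three terms of (A*B*C + A*D*C + B*C*D), as sets of factor names.
terms = [{"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"}]

# Every product of two factors shared by at least two terms is a
# candidate common subexpression.
common = set()
for s, t in combinations(terms, 2):
    shared = s & t
    for pair in combinations(sorted(shared), 2):
        common.add("*".join(pair))

assert common == {"A*C", "B*C", "C*D"}
```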

J. Giles

rhoward@msd.gatech.edu (Robert L. Howard) (06/05/91)

>In article <1236@nikhefh.nikhef.nl>
>	t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
>: I am curious whether any of you know why compiling certain routines
>: with a -O flag takes a semi-infinite amount of time on certain
>: computers (Apollo DN1000, Sun Fortran 1.3.1)
>: The code in question is generated by a computer algebra program,

Alan Miller responds:

>Geert sent me the piece of code...
>I compiled it with the Sun f77 compiler (v 1.3.1)....
>
>   Compiler + options    Time
>   f77 -c               16.1u + 1.8s
>   f77 -O -c            Gave up after 37 mins. of CPU time !

Here are the times for Sun f77 1.4.  I went through a number of
optimization levels...

f77 test.f	Time=0:22.89 (15.880u, 2.400s)  size=596k/1816k avg/max
f77 -O1 test.f	Time=0:26.06 (20.930u, 2.650s)  size=938k/3492k avg/max
f77 -O2 test.f	Time=7:46.69 (448.790u, 4.880s) size=1595k/3760k avg/max
f77 -O3 test.f	Time=8:52.24 (480.800u, 6.200s) size=1449k/6244k avg/max
     ^^
same as -O

It takes a long time but it does finish.

Robert
--
| Robert L. Howard             |    Georgia Tech Research Institute     |
| rhoward@msd.gatech.edu       |    MATD Laboratory                     |
| (404) 528-7165               |    Atlanta, Georgia  30332             |
-------------------------------------------------------------------------
|     "Reality is showing us that perhaps we should do with nuclear     |
|      power the same thing Kellogg's is advocating for Corn Flakes -   |
|      Discover it again for the first time." -- John De Armond         |