t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) (05/21/91)
I am curious whether any of you know why compiling certain routines with a -O
flag takes a semi-infinite amount of time on certain computers (Apollo DN1000,
Sun Fortran 1.3.1), i.e. I killed compilation after a few hours of CPU time
for a single routine. The code in question is generated by a computer algebra
program, and looks like this:

 qqgg= n1i*n2i*pbqi*ptqi * ( - 2*pnpe*pbpt**2*qk + 2*pnpe*
 + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
 + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 -
 ...400 similar lines...

As a lot of the time is spent here I'd like to be able to optimize it.

Geert Jan van Oldenborgh
gj@hep.physik.uni-muenchen.de
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (05/21/91)
In article <1236@nikhefh.nikhef.nl> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
qqgg= n1i*n2i*pbqi*ptqi * ( - 2*pnpe*pbpt**2*qk + 2*pnpe*
+ qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
+ pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 -
...400 similar lines...
Presumably all 400 lines are in a single basic block. Many modern
compilers are a bit naive here: they try to do vast amounts of analysis
on basic blocks (and often perform transformations to make them
bigger). However, a basic block can be "too big", and a clever enough
compiler would essentially break it into several loops (each of which
would maximally utilize the register set(s)).
You may very well see improvement both in compile speed and execution
speed if you break up the computation into several smaller loops.
You can rightly criticize the compiler's behavior, but you are probably
more concerned with improving performance than with tossing brickbats ;>
Sun's f77 v1.4 does a bit better on problems of this sort, but we
aren't out of the woods yet.
(btw: adding tons of RAM is also helpful).
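The splitting Keith suggests can be sketched as follows (in Python
rather than Fortran, purely to show the shape of the transformation;
the function and variable names are invented for illustration):

```python
# One enormous expression is one giant basic block for the optimizer.
# Accumulating the same value in modest chunks gives the compiler
# several small blocks to analyze instead, without changing the result.

def qqgg_one_block(a, b, c, d):
    # analogue of the single 400-line statement
    return a*b + a*c + a*d + b*c + b*d + c*d

def qqgg_chunked(a, b, c, d):
    # same value, accumulated in smaller pieces
    s = a*b + a*c + a*d   # chunk 1
    s += b*c + b*d        # chunk 2
    s += c*d              # chunk 3
    return s

assert qqgg_one_block(1.0, 2.0, 3.0, 4.0) == qqgg_chunked(1.0, 2.0, 3.0, 4.0)
```

The same idea in Fortran is to assign partial sums to temporaries and
add them up at the end, so no single statement is hundreds of lines long.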
--
----------------------------------------------------------------
Keith H. Bierman keith.bierman@Sun.COM| khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (05/22/91)
In article <KHB.91May20200715@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
>In article <1236@nikhefh.nikhef.nl> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
>
> qqgg= n1i*n2i*pbqi*ptqi * ( - 2*pnpe*pbpt**2*qk + 2*pnpe*
> + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
> + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 -
> ...400 similar lines...
>
> ...
>You may very well see improvement both in compile speed and execution
>speed if you break up the computation into several smaller loops.

You can probably tell the symbolic manipulator to do this - Maple
certainly will. It then produces much more intelligible code, both for
humans and for compilers.

>You can rightly criticize the compiler's behavior, but you probably are
>more concerned with improving performance than tossing brickbats ;>

I sent in 2 examples of this sort of code, and Apollo's response was
"will not be fixed". They didn't agree that the compiler spending over
12 hours of CPU time on a program (and still not producing an object
file) was worthy of correction. I did not ask for the compiler to
compile the code, just to give a message that the code exceeds its
capabilities (and a message telling the user to simplify the program
would be an added bonus :-).

Mike.
--
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094   Fax: (416) 978-8775
ake@dayton.saic.com (Earle Ake) (05/22/91)
In article <1236@nikhefh.nikhef.nl>, t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
> I am curious whether any of you know why compiling certain routines with a -O
> flag takes a semi-infinite amount of time on certain computers (Apollo DN1000,
> Sun Fortran 1.3.1) (i.e. I killed compilation after a few hours CPU time for a
> single routine). The code in question is generated by a computer algebra
> program, and looks like this
>
> qqgg= n1i*n2i*pbqi*ptqi * ( - 2*pnpe*pbpt**2*qk + 2*pnpe*
> + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
> + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 -
> ...400 similar lines...
>
> As a lot of the time is spent here I'd like to be able to optimize it.

My guess is that you are running out of physical memory during the
optimization and the thing is paging like crazy. Why not try to
optimize it some yourself and give the compiler a break? I see a few
things to try (note that ** binds more tightly than *, so
pnpe*pbpt**2*qk means pnpe*(pbpt**2)*qk):

 xxxx = pnpe*pbpt
 xxx0 = pnpb*pepb
 xxx1 = xxx0*pbpt
 xxx2 = xxx0*qk
 xxx3 = pnpb*pept
 xxx4 = xxx3*pbpt
 xxx5 = xxx3*qk

 qqgg= n1i*n2i*pbqi*ptqi * ( - 2*xxxx*pbpt*qk + 2*pnpe*
 + qk*mb2*mt2 + 2*xxx1*qk - 2*xxx2*mt2 + 2*
 + xxx4*qk + 4*xxx4*pbpt - xxx5*mb2 -

And so on. Move those parts of the equation that are computed more
than once outside the LARGE equation. What you give up is readability,
unless you start giving the intermediate variables names like
pnpbpept = pnpb*pept or pnpbpeptqk = pnpb*pept*qk.

Earle
_____________________________________________________________________________
Earle Ake
Science Applications International Corporation, Dayton, Ohio
Internet: ake@dayton.saic.com   uucp: dayvb!ake   SPAN: 28284::ake
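That hoisting common subexpressions into temporaries leaves the value
unchanged can be checked numerically. A small sketch in Python (the
variable names follow the quoted Fortran fragment; the values are
arbitrary test data):

```python
# arbitrary test values for the quantities in the Fortran fragment
pnpe, pbpt, qk, mb2, mt2, pnpb, pepb, pept = (
    1.1, 0.7, 1.3, 0.9, 1.7, 0.6, 1.9, 0.8)

# the first seven terms exactly as in the quoted Fortran
# (note ** binds before *, so pbpt**2*qk is (pbpt**2)*qk)
orig = (-2*pnpe*pbpt**2*qk + 2*pnpe*qk*mb2*mt2
        + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2
        + 2*pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2
        - pnpb*pept*qk*mb2)

# the same terms with common subexpressions hoisted into temporaries
xxxx = pnpe*pbpt
xxx0 = pnpb*pepb
xxx1 = xxx0*pbpt
xxx2 = xxx0*qk
xxx3 = pnpb*pept
xxx4 = xxx3*pbpt
xxx5 = xxx3*qk
hoisted = (-2*xxxx*pbpt*qk + 2*pnpe*qk*mb2*mt2
           + 2*xxx1*qk - 2*xxx2*mt2
           + 2*xxx4*qk + 4*xxx4*pbpt - xxx5*mb2)

assert abs(orig - hoisted) < 1e-9
```

A check like this is cheap insurance when rewriting machine-generated
algebra by hand, since a single transcription slip is easy to make and
hard to spot in 400 lines.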
rchrd@well.sf.ca.us (Richard Friedman) (05/22/91)
t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
>I am curious whether any of you know why compiling certain routines with a -O
>flag takes a semi-infinite amount of time on certain computers
>
> qqgg= n1i*n2i*pbqi*ptqi * ( - 2*pnpe*pbpt**2*qk + 2*pnpe*
> + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
> + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 -
> ...400 similar lines...
>
>As a lot of the time is spent here I'd like to be able to optimize it.

For starters, just try yourself to find all the common subexpressions
in all that garbage, so you can minimize the number of unnecessary
computations of the same things. Compilers also have to be smart enough
to detect that pnpb*pept*pbpt*qk is the same as pept*qk*pnpb*pnbt.
That generates lots of linked lists to be searched. If you could get
your symbolic code generator to generate stores into temporaries for
these subexpressions, I'm sure the optimizer would speed up for you.
--
/\=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=/\
\/Richard Friedman  RCHRD         | rchrd@well.sf.ca.us       \/
/\Box 9584 Berkeley CA 94709-0584 | or well!rchrd@apple.com   /\
\/=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\/
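Detecting that two reordered products are identical is cheap once each
product is reduced to a canonical form, e.g. by sorting its factors. A
sketch (assuming, for illustration, products of plain identifiers
joined by `*`):

```python
def canon(product):
    """Canonical form of a product of identifiers: the sorted factors.

    Multiplication is commutative, so two products are equal exactly
    when their sorted factor lists are equal.
    """
    return tuple(sorted(product.split("*")))

# same product, different factor order: canonical forms match
assert canon("pnpb*pept*pbpt*qk") == canon("pept*qk*pnpb*pbpt")

# a genuinely different product (pnbt instead of pbpt) does not match
assert canon("pnpb*pept*pbpt*qk") != canon("pept*qk*pnpb*pnbt")
```

With a canonical form in hand, the compiler (or the code generator) can
hash each product instead of searching linked lists of operand
permutations.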
alan@dmsmelb.mel.dms.CSIRO.AU (Alan Miller) (05/22/91)
In article <1236@nikhefh.nikhef.nl> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
: I am curious whether any of you know why compiling certain routines with a -O
: flag takes a semi-infinite amount of time on certain computers (Apollo DN1000,
: Sun Fortran 1.3.1) (i.e. I killed compilation after a few hours CPU time for a
: single routine). The code in question is generated by a computer algebra
: program, and looks like this
:
: qqgg= n1i*n2i*pbqi*ptqi * ( - 2*pnpe*pbpt**2*qk + 2*pnpe*
: + qk*mb2*mt2 + 2*pnpb*pepb*pbpt*qk - 2*pnpb*pepb*qk*mt2 + 2*
: + pnpb*pept*pbpt*qk + 4*pnpb*pept*pbpt**2 - pnpb*pept*qk*mb2 -
: ...400 similar lines...
:
: As a lot of the time is spent here I'd like to be able to optimize it.
:
: Geert Jan van Oldenborgh
: gj@hep.physik.uni-muenchen.de
Geert sent me the piece of code, which was in the form of a Fortran function
without a main program. I compiled it with the Sun f77 compiler (v 1.3.1)
and with version 2.6.3 of the Edinburgh f77 compiler, with and without the
optimization option -O. The times taken were as follows:
Compiler + options Time
f77 -c 16.1u + 1.8s
f77 -O -c Gave up after 37 mins. of CPU time !
epcf77 -c 5.2u + 0.6s
epcf77 -O -c 156.0u + 3.1s
The Edinburgh compiler certainly struggled with the 485-line function, while
the Sun compiler looks as if it was firmly stuck.
Alan Miller
jlg@cochiti.lanl.gov (Jim Giles) (05/22/91)
In article <24940@well.sf.ca.us>, rchrd@well.sf.ca.us (Richard Friedman) writes:
|> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
|> [...]                  Compilers also have to be smart enough
|> to detect that pnpb*pept*pbpt*qk is the same as pept*qk*pnpb*pnbt.
                            ^^^^                                ^^^^
I assume you meant that only _part_ of these expressions is the same.
If they were completely identical, the compiler could detect the fact
with a simple canonical ordering of the operands (say, alphabetical on
the identifiers). However, _partial_ commonality is an interesting
exercise. Consider (A*B*C + A*D*C + B*C*D): common expressions are A*C,
B*C, A*D, and C*D. However, no _two_ of these are simultaneously useful
as common expressions! Fortunately, stuff like this comes out
automatically in the DAG construction.

J. Giles
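Jim's example can be checked mechanically: enumerate the two-factor
subproducts of each term and keep those shared by at least two terms.
A small sketch:

```python
from itertools import combinations

# the three terms of (A*B*C + A*D*C + B*C*D), as factor tuples
terms = [("A", "B", "C"), ("A", "D", "C"), ("B", "C", "D")]

# all two-factor subproducts of each term (frozenset: order-free)
pairs_per_term = [set(frozenset(p) for p in combinations(t, 2))
                  for t in terms]

# subproducts that appear in at least two different terms
shared = {p for i, j in combinations(range(len(terms)), 2)
          for p in pairs_per_term[i] & pairs_per_term[j]}

# the shared two-factor subexpressions are A*C, B*C and C*D;
# A*D occurs in only one term, so it is not a common subexpression
assert shared == {frozenset("AC"), frozenset("BC"), frozenset("CD")}
```

The enumeration confirms the (corrected) list of common subexpressions,
and also why no two are simultaneously useful: each shared pair is
common to a different pair of terms, so using one consumes the factors
the others would need.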
jlg@cochiti.lanl.gov (Jim Giles) (05/22/91)
In article <24204@lanl.gov>, jlg@cochiti.lanl.gov (Jim Giles) writes:
|> [...]
|> Consider (A*B*C + A*D*C + B*C*D): common expressions are A*C, B*C, A*D,
                                                                     ^^^
|> and C*D. [...]
That of course is an error (A*D occurs in only one of the three terms).
I don't know why I wrote that. The rest of the discussion is right.

J. Giles
rhoward@msd.gatech.edu (Robert L. Howard) (06/05/91)
>In article <1236@nikhefh.nikhef.nl>
> t19@nikhefh.nikhef.nl (Geert J v Oldenborgh) writes:
>: I am curious whether any of you know why compiling certain routines
>: with a -O flag takes a semi-infinite amount of time on certain
>: computers (Apollo DN1000, Sun Fortran 1.3.1)
>: The code in question is generated by a computer algebra program,

Alan Miller responds:
>Geert sent me the piece of code...
>I compiled it with the Sun f77 compiler (v 1.3.1)....
>
> Compiler + options    Time
> f77 -c                16.1u + 1.8s
> f77 -O -c             Gave up after 37 mins. of CPU time !

Here are the times for Sun f77 1.4. I went through a number of
optimization levels...

f77 test.f
  Time=0:22.89 (15.880u, 2.400s)   size=596k/1816k avg/max
f77 -O1 test.f
  Time=0:26.06 (20.930u, 2.650s)   size=938k/3492k avg/max
f77 -O2 test.f
  Time=7:46.69 (448.790u, 4.880s)  size=1595k/3760k avg/max
f77 -O3 test.f  (same as -O)
  Time=8:52.24 (480.800u, 6.200s)  size=1449k/6244k avg/max

It takes a long time but it does finish.

Robert
--
| Robert L. Howard         | Georgia Tech Research Institute |
| rhoward@msd.gatech.edu   | MATD Laboratory                 |
| (404) 528-7165           | Atlanta, Georgia 30332          |
-------------------------------------------------------------------------
| "Reality is showing us that perhaps we should do with nuclear   |
| power the same thing Kellogg's is advocating for Corn Flakes -  |
| discover it again for the first time."  -- John De Armond       |