[comp.benchmarks] benchmarks and interprocedural optimization

metzger@convex.com (Robert Metzger) (03/22/91)

>	Article 326 of comp.benchmarks:
>	Path: convex!texsun!sundc!seismo!dimacs.rutgers.edu!mips!sdd.hp.com!ucsd!ogicse!borasky
>	Date: 3 Jan 91 15:46:45 GMT
>	From: borasky@ogicse.ogi.edu (M. Edward Borasky)

>	Even discounting blatant cheats like the ones I describe above, there
>	is a wider issue here about how much COMPILE time a user is willing 
>	to expend for "perfect" compiles.  It is possible using techniques of
>	whole-program compiling to optimize out all the unnecessary checks in
>	the LINPACK benchmark -- it's simple enough and well-structured enough
>	so that a compiler can "realize" that DAXPY is always called with inc-
>	rement 1, that the DAXPY can be in-lined, that N is always greater than
>	zero, that one-trip DO loops will work, etc.  But these optimizations
>	take time -- so much time that on real codes larger than LINPACK, one 
>	of two things will happen:

>	1. The user will think the compiler has gone into an endless loop and
>	zap the compile, or

>	There ARE users who claim they will tolerate long optimization times;
>	my experience in Marketing was that such were few and far between.  
>	The main uses for such a compiler are large third-party application
>	codes that are seldom compiled -- once or twice a year for the ones I
>	worked with.  But the size of these codes works AGAINST the whole-
>	program compiler!

CONVEX's experience with its new Application Compiler (which performs
interprocedural optimization on codes written in FORTRAN and/or C)
contradicts these assertions.

First, properly implemented interprocedural optimization does not take
forever.  Our experience is that it takes, on average, 2-3 times as much
CPU time to do interprocedural compilation as procedural compilation.
More importantly, the time usually grows linearly with the size of
the application. 

I will give you one data point.  One of the larger applications we have
compiled had 214,500 lines of FORTRAN source in 971 source files.  It took
a total of 5 hours and 26 minutes wall clock time to compile and link on
a CONVEX C-2. This is about 660 lines per minute.  Of that time, 2 hours
and 43 minutes were spent in the interprocedural phase and 32 minutes in
the linker.  Lots of IP optimizations were performed -- 1576 calls inlined,
1774 constants propagated, etc., etc.

Second, our users seem to be quite willing to wait a little longer to get
optimizations and diagnostics they can get no other way. 
We have a lot of users who measure their job runs in DAYS, and they are quite
capable of seeing the benefit of spending some extra hours once in a while
when they compile in order to save days when they run their jobs every week.

To prevent worries about "endless loops" we provide a "verbose" option that
tells the user what step it is working on as it goes along.  

Our users are also smart enough to use the right tool for the right job --
they don't use the interprocedural compiler to do syntax checking of newly
written code.  They do use it for semantic checking of code that supposedly
works (being ported from another machine) and for optimization, once the
application is fully tested using the existing procedural compilers.

One other note:  interprocedural optimization will render many industry 
standard benchmarks meaningless.  Those which measure call overhead
or depend on procedure calls to keep information from the compiler have 
already been trashed (like Whetstone).
--
Robert Metzger		CONVEX Computer Corp.  		Richardson, Texas
Generic Disclaimer:	I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens from
harm or property loss by violence, stealth, or fraud.  All else is tyranny."

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/22/91)

>>>>> On 21 Mar 91 18:44:28 GMT, metzger@convex.com (Robert Metzger) said:

>>Article 326 of comp.benchmarks:
>>Date: 3 Jan 91 15:46:45 GMT
>>From: borasky@ogicse.ogi.edu (M. Edward Borasky)

>>Even discounting blatant cheats like the ones I describe above, there
>>is a wider issue here about how much COMPILE time a user is willing 
>>to expend for "perfect" compiles.  It is possible using techniques of
>>whole-program compiling to optimize out all the unnecessary checks in
>>the LINPACK benchmark -- [....]

Robert> CONVEX's experience with its new Application Compiler (which performs
Robert> interprocedural optimization on codes written in FORTRAN and/or C)
Robert> contradicts these assertions.

I was told by a source within Convex that the Application Compiler did
not, in fact, manage to eliminate all of the unnecessary checks in the
level-1 BLAS routines used in the LINPACK code.  The reason had
something to do with being able to optimize expressions after the
inlining, but not logical predicates -- though I never really
understood the explanation.

Is there a "definitive answer" from Convex on this?



--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

metzger@convex.com (Robert Metzger) (03/22/91)

In article <MCCALPIN.91Mar21152638@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
>
>I was told by a source within Convex that the Application Compiler did
>not, in fact, manage to eliminate all of the unnecessary checks in the
>level-1 BLAS routines used in the LINPACK code.  The reason had
>something to do with being able to optimize expressions after the
>inlining, but not logical predicates -- though I never really
>understood the explanation.
>
>Is there a "definitive answer" from Convex on this?

There is at least one useless test in LINPACK which requires
predicate propagation (e.g., n>=1 && n<=100) to delete.  The Application
Compiler performs constant propagation (e.g., n==100) in V1.0, but not
generalized predicate propagation.  Sooner or later it will.
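
To make the distinction concrete, here is a minimal sketch of the
pattern (hypothetical code in the spirit of the LINPACK/DAXPY sources,
not the verbatim routines):

      PROGRAM MAIN
      DOUBLE PRECISION X(10), Y(10)
      INTEGER K, N
      N = 10
      DO 10 K = 1, N
         X(K) = DBLE(K)
         Y(K) = 0.0D0
   10 CONTINUE
      DO 20 K = 1, N
C        N-K+1 is always .GE. 1 here, so the early return in DAXPY
C        is dead at every call site -- but proving it requires the
C        predicate K .LE. N, not a single constant value.
         CALL DAXPY(N-K+1, 2.0D0, X(K), 1, Y(K), 1)
   20 CONTINUE
      PRINT *, Y(1)
      END

      SUBROUTINE DAXPY(N, DA, DX, INCX, DY, INCY)
      INTEGER N, INCX, INCY, I
      DOUBLE PRECISION DA, DX(*), DY(*)
C     Deleting this test takes predicate propagation from the
C     caller's DO bounds.
      IF (N .LE. 0) RETURN
C     Deleting this test only takes propagating the constants
C     INCX = INCY = 1 from the call sites.
      IF (INCX .EQ. 1 .AND. INCY .EQ. 1) THEN
         DO 30 I = 1, N
            DY(I) = DY(I) + DA*DX(I)
   30    CONTINUE
      END IF
C     (General-stride path omitted from this sketch.)
      END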

--
Robert Metzger		CONVEX Computer Corp.  		Richardson, Texas
Generic Disclaimer:	I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens from
harm or property loss by violence, stealth, or fraud.  All else is tyranny."

eugene@nas.nasa.gov (Eugene N. Miya) (03/22/91)

In article <1991Mar21.184428.8226@convex.com> metzger@convex.com
(Robert Metzger) writes:
>One other note:  interprocedural optimization will render many industry 
>standard benchmarks meaningless.

Just getting back......
You should be careful about this.
A company, which shall go unnamed, gave us some of their latest software.
We scheduled an expensive stand-alone test of a super (this was a few years
back) expecting to see several CPUs in use.  Only one CPU was used.  The
software had been developed on a uniprocessor system, and the culprit was
a simple default-initialization mistake.

The problem is how companies like yours go about testing fixes and
optimizations.  Standard test suites exist for language syntax conformance
and deviance, but no such standard set of tests exists for language
optimizations.  There needs to be some testing consistency for all this
fancy software; otherwise, how do we have any assurance that said features
are working?

That's one reason why I just regard this as another form of testing,
and nothing special.  Of course, no company wants embarrassing test results.

--e. nobuo utsunomiya, NASA Ames Research Center, eugene@orville.nas.nasa.gov
  {uunet,mailrus,other gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.

metzger@convex.com (Robert Metzger) (03/24/91)

In article <1991Mar21.213144.14636@nas.nasa.gov> eugene@wilbur.nas.nasa.gov (Eugene N. Miya) writes:
>In article <1991Mar21.184428.8226@convex.com> metzger@convex.com
>(Robert Metzger) writes:
>>One other note:  interprocedural optimization will render many industry 
>>standard benchmarks meaningless.
>
>Just getting back......
>You should be careful about this.
>
>The problem is how companies like yours go about testing fixes and
>optimizations.  Standard test suites exist for language syntax conformance,
>deviance, but no such standard set of tests exists for language optimizations.
>There needs to be some testing consistency for all this fancy software,
>otherwise, how do we have any assurance said features are working.
>
>That's one reason why I just regard this as another form of testing,
>and nothing special.  Of course, no company wants embarassing test results.

I appreciate your kind concern, but I'm not sure what you're warning me
about :-)

I'll elaborate on my point:
1) Compilers that perform interprocedural optimization render meaningless
   the results of simple-minded benchmarks.  For example, one of the first
   things a well-meaning technical-sales-support person at CONVEX did when he
   got a crack at the Application Compiler was to run Whetstone through it.
   Sure enough, since Whetstone attempts to measure call overhead and the APC
   attempts to minimize call overhead, the results made it look like CONVEX
   had released a new generation of hardware. 
   
   We didn't release those numbers to the public, but there's nothing that
   prevents an unsophisticated buyer from doing the same thing and being
   misled.  (Of course, if he's buying million-dollar computers based on the
   results of such stupid benchmarks, maybe he deserves what he gets.)

2) Compilers that perform interprocedural optimization cannot be fooled
   by subroutine calls into not performing optimizations.  This is a common
   trick in some widely used benchmarks.   Well, an IPO compiler will
   certainly perform thorough IP side-effect analysis, and it will know
   exactly what is going on inside the procedure.  It may automatically
   perform procedure inlining as well.  Procedure calls no longer prevent
   the compiler from moving assignments and uses, or eliminating them
   altogether, the way they do with procedural compilers.  Procedure calls
   are no longer firewalls.  (See the sketch after this list.)

3) Compilers that perform interprocedural optimization perform a lot more
   compile-time evaluation of your code.   Basically, if you don't read
   all your data from a data file at runtime, the compiler could potentially
   evaluate your expressions at compile time and replace your application
   with a bunch of PRINT statements.  
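
To illustrate points 2) and 3) together, here is a minimal sketch (the
names BENCH and WALL are hypothetical, not taken from any real
benchmark):

      PROGRAM BENCH
      DOUBLE PRECISION S
      INTEGER I
      S = 0.0D0
      DO 10 I = 1, 1000000
         S = S + 1.0D0
C        Intended as a firewall: a procedural compiler must assume
C        WALL may read or modify S, since S is passed by reference.
         CALL WALL(S)
   10 CONTINUE
      PRINT *, S
      END

      SUBROUTINE WALL(X)
      DOUBLE PRECISION X
C     No side effects at all.  Once IP analysis proves WALL neither
C     reads nor writes X, the call is no barrier, the loop folds at
C     compile time, and the whole program reduces to a single PRINT.
      RETURN
      END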

I will be the first to say that testing interprocedural optimizers is very
challenging.  In some senses, a 100,000-line application is ONE test case,
which means you have to run hundreds of applications and millions of lines
of code through the compiler before you trust it.

On the other hand, our experience with an interprocedural compiler is that 
the compiler finds dozens or hundreds of bugs in every application for
every one bug the application finds in the compiler.  Logic errors, abuse
of language features, and numerically unstable algorithms are all exposed
by this new technology.  From my perspective, 99.99% of all scientific/
engineering FORTRAN applications are shot through with numerous errors,
and I'm far more concerned about the reliability of the applications than
the compilers at this point.
--
Robert Metzger		CONVEX Computer Corp.  		Richardson, Texas
Generic Disclaimer:	I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens from
harm or property loss by violence, stealth, or fraud.  All else is tyranny."

blee@digi.lonestar.org (Benjamin W. Lee) (03/27/91)

In article <1991Mar21.184428.8226@convex.com> metzger@convex.com (Robert Metzger) writes:
>
>I will give you one data point.  One of the larger applications we have
>compiled had 214,500 lines of FORTRAN source in 971 source files.  It took
>a total of 5 hours and 26 minutes wall clock time to compile and link on
[...]

I am very much interested in which portion of the optimization is most
time-consuming.  The data-gathering stage, I suppose?  Are any good papers
available on interprocedural optimization?  Thanks!

>Robert Metzger		CONVEX Computer Corp.  		Richardson, Texas

preston@ariel.rice.edu (Preston Briggs) (03/31/91)

metzger@convex.com (Robert Metzger) writes:

>>I will give you one data point.  One of the larger applications we have
>>compiled had 214,500 lines of FORTRAN source in 971 source files.  It took
>>a total of 5 hours and 26 minutes wall clock time to compile and link on

and blee@digi.lonestar.org (Benjamin W. Lee) writes:
>I am very much interested in which portion of the optimization is most
>time-consuming.  The data-gathering stage, I suppose?  Are any good papers
>available on interprocedural optimization?  Thanks!

I don't know details about Convex's compiler, so I can't comment directly.
However, this interprocedural optimization stuff need not be as expensive as
commonly imagined.  The approach developed at Rice (and used at Convex)
goes approximately like this:

   1)	Collect summary information for every procedure
	(at Rice, we do it while editing, but there are other possibilities)

   2)	Build the call graph and
	calculate interprocedural data flow information.
	Determine which procedures must be recompiled (step 3).
	This step examines only the summary information,
	not the source code.

   3)	Compile individual procedures, supplying interprocedural
	information to the optimizer.

This decomposition enables practical program development and maintenance.
If you change a procedure or two, step 1 is run only for the changed
procedures.  Then step 2 is run for all the procedures in the program.
Finally, step 3 runs for the changed procedures, plus any procedures
that depend on changed interprocedural facts.
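
A tiny example of the dependences that step 2 tracks (hypothetical
files main.f and solve.f): if main.f is edited so that the call reads
CALL SOLVE(200), the interprocedural fact "N is the constant 100 on
entry to SOLVE" changes, and step 3 must recompile SOLVE even though
solve.f itself was never touched.

C --- main.f ---
      PROGRAM MAIN
      CALL SOLVE(100)
      END
C --- solve.f ---
      SUBROUTINE SOLVE(N)
      INTEGER N, I
      DOUBLE PRECISION T
      T = 0.0D0
C     With the interprocedural fact N == 100, this loop can be
C     unrolled or folded outright; that decision is invalidated
C     the moment any caller changes the propagated constant.
      DO 10 I = 1, N
         T = T + 1.0D0
   10 CONTINUE
      PRINT *, T
      END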

Step 1 occurs during editing and is very cheap.  The summary information
could be collected in a separate pass; in this case, the expense would
be similar to a very fast one-pass compiler (mostly syntax analysis).

Step 2 is actually similar to a super-Make.  The cost depends heavily
on implementation decisions.  The asymptotic complexity of the analysis,
however, is very low.

Step 3 closely resembles a traditional optimizing compiler.
The bulk of the overall compilation time should be consumed here.

There are many papers by researchers at Rice describing
all this in detail.  A good starting point might be

	A practical environment for scientific programming
	Carle, Cooper, Hood, Kennedy, Torczon, Warren
	IEEE Computer, 20(11), November 1987.

Preston Briggs

metzger@convex.com (Robert Metzger) (04/05/91)

In article <1991Mar26.175137.6706@digi.lonestar.org> blee@digi.lonestar.org (Benjamin W. Lee) writes:
>In article <1991Mar21.184428.8226@convex.com> metzger@convex.com (Robert Metzger) writes:
>>
>>I will give you one data point.  One of the larger applications we have
>>compiled had 214,500 lines of FORTRAN source in 971 source files.  It took
>>a total of 5 hours and 26 minutes wall clock time to compile and link on
>[...]
>
>I am very much interested in which portion of the optimization is most
>time-consuming.  The data-gathering stage, I suppose?  Are any good papers
>available on interprocedural optimization?  Thanks!
>
In the example cited, constant propagation took the longest time, and
array side-effect analysis the next longest.  I can't generalize well
about these, because:

a) relative times between algorithms vary widely based on the coding style.

   - Some FORTRAN programmers pass dozens or hundreds of arguments to
     every subroutine.  The cost of alias analysis is a function of the
     number of calls times the number of arguments.  There is a substantial
     difference in alias analysis time when the average number of arguments
     passed is 10 versus when it is 100.  (See the sketch after this list.)

   - The more compile-time constants you put in your code, the more work
     constant propagation does, but this is a positive reflection on the
     IP algorithm, and sometimes on the software engineering of the
     application as well.

   - The effort required for several algorithms is increased when FORTRAN
     programmers put every variable in the application in COMMON (yes,
     there are FORTRAN applications with NO local variables), as opposed
     to keeping everything local that doesn't need to be global.

b) some IP algorithms are vectorized in our product.  The ratios would
   be quite different on a scalar machine.
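
As an illustration of the per-call, per-argument question that alias
analysis has to answer (hypothetical code, not from any customer
application):

      PROGRAM MAIN
      DOUBLE PRECISION X(100), Y(100)
      INTEGER I
      DO 10 I = 1, 100
         X(I) = DBLE(I)
         Y(I) = 0.0D0
   10 CONTINUE
C     Here the dummies A and B are aliases of the same array
C     (strictly non-conforming FORTRAN, but exactly the kind of
C     code the analysis has to catch in real applications).
      CALL UPDATE(X, X)
C     Here they are independent arrays.
      CALL UPDATE(X, Y)
      PRINT *, X(1), Y(1)
      END

      SUBROUTINE UPDATE(A, B)
      DOUBLE PRECISION A(100), B(100)
      INTEGER I
C     Whether this loop can be vectorized depends on whether any
C     call site makes A and B overlap -- a fact alias analysis
C     must settle for every (call site, argument pair).
      DO 20 I = 1, 99
         B(I) = A(I+1) + 1.0D0
   20 CONTINUE
      END
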
--
Robert Metzger		CONVEX Computer Corp.  		Richardson, Texas
Generic Disclaimer:	I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens from
harm or property loss by violence, stealth, or fraud.  All else is tyranny."