metzger@convex.com (Robert Metzger) (03/22/91)
> Article 326 of comp.benchmarks:
> Path: convex!texsun!sundc!seismo!dimacs.rutgers.edu!mips!sdd.hp.com!ucsd!ogicse!borasky
> Date: 3 Jan 91 15:46:45 GMT
> From: borasky@ogicse.ogi.edu (M. Edward Borasky)
> Even discounting blatant cheats like the ones I describe above, there
> is a wider issue here about how much COMPILE time a user is willing
> to expend for "perfect" compiles.  It is possible using techniques of
> whole-program compiling to optimize out all the unnecessary checks in
> the LINPACK benchmark -- it's simple enough and well-structured enough
> so that a compiler can "realize" that DAXPY is always called with
> increment 1, that the DAXPY can be in-lined, that N is always greater
> than zero, that one-trip DO loops will work, etc.  But these
> optimizations take time -- so much time that on real codes larger than
> LINPACK, one of two things will happen:
> 1. The user will think the compiler has gone into an endless loop and
>    zap the compile, or
> There ARE users who claim they will tolerate long optimization times;
> my experience in Marketing was that such were few and far between.
> The main uses for such a compiler are large third-party application
> codes that are seldom compiled -- once or twice a year for the ones I
> worked with.  But the size of these codes works AGAINST the whole-
> program compiler!

CONVEX's experience with its new Application Compiler (which performs
interprocedural optimization on codes written in FORTRAN and/or C)
contradicts these assertions.

First, properly implemented interprocedural optimization does not take
forever.  Our experience is that it takes, on average, 2-3 times as much
CPU time to do interprocedural compilation as procedural compilation.
More importantly, the time usually grows linearly with the size of the
application.

I will give you one data point.  One of the larger applications we have
compiled had 214,500 lines of FORTRAN source in 971 source files.
It took a total of 5 hours and 26 minutes wall clock time to compile and
link on a CONVEX C-2.  This is about 660 lines per minute.  Of that time,
2 hours and 43 minutes were spent in the interprocedural phase and 32
minutes in the linker.  Lots of IP optimizations were performed -- 1576
calls inlined, 1774 constants propagated, etc., etc.

Second, our users seem to be quite willing to wait a little longer to get
optimizations and diagnostics they can get no other way.  We have a lot
of users who measure their job runs in DAYS, and they are quite capable
of seeing the benefit in spending some extra hours once in a while when
they compile in order to save days when they run their jobs every week.
To prevent worries about "endless loops" we provide a "verbose" option
that tells the user what step it is working on as it goes along.

Our users are also smart enough to use the right tool for the right job --
they don't use the interprocedural compiler to do syntax checking of
newly written code.  They do use it for semantic checking of code that
supposedly works (being ported from another machine) and for
optimization, once the application is fully tested using the existing
procedural compilers.

One other note: interprocedural optimization will render many industry
standard benchmarks meaningless.  Those which measure call overhead or
depend on procedure calls to keep information from the compiler have
already been trashed (like whetstone).
--
Robert Metzger		CONVEX Computer Corp.		Richardson, Texas
Generic Disclaimer: I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens
from harm or property loss by violence, stealth, or fraud.  All else is
tyranny."
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/22/91)
>>>>> On 21 Mar 91 18:44:28 GMT, metzger@convex.com (Robert Metzger) said:

>>Article 326 of comp.benchmarks:
>>Date: 3 Jan 91 15:46:45 GMT
>>From: borasky@ogicse.ogi.edu (M. Edward Borasky)
>>Even discounting blatant cheats like the ones I describe above, there
>>is a wider issue here about how much COMPILE time a user is willing
>>to expend for "perfect" compiles.  It is possible using techniques of
>>whole-program compiling to optimize out all the unnecessary checks in
>>the LINPACK benchmark -- [....]

Robert> CONVEX's experience with its new Application Compiler (performs
Robert> interprocedural optimization on codes written in FORTRAN and/or C)
Robert> contradicts these assertions.

I was told by a source within Convex that the Application Compiler did
not, in fact, manage to eliminate all of the unnecessary checks in the
level-1 BLAS routines used in the LINPACK code.  The reason had
something to do with being able to optimize expressions after the
inlining, but not logical predicates -- though I never really understood
the explanation.

Is there a "definitive answer" from Convex on this?
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET
metzger@convex.com (Robert Metzger) (03/22/91)
In article <MCCALPIN.91Mar21152638@pereland.cms.udel.edu>
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
>
>I was told by a source within Convex that the Application Compiler did
>not, in fact, manage to eliminate all of the unnecessary checks in the
>level-1 BLAS routines used in the LINPACK code.  The reason had
>something to do with being able to optimize expressions after the
>inlining, but not logical predicates -- though I never really
>understood the explanation.
>
>Is there a "definitive answer" from Convex on this?

There is at least one useless test in LINPACK which requires predicate
propagation (e.g., n >= 1 && n <= 100) to delete.  The Application
Compiler performs constant propagation (e.g., n == 100) in V1.0, but not
generalized predicate propagation.  Sooner or later it will.
--
Robert Metzger		CONVEX Computer Corp.		Richardson, Texas
Generic Disclaimer: I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens
from harm or property loss by violence, stealth, or fraud.  All else is
tyranny."
eugene@nas.nasa.gov (Eugene N. Miya) (03/22/91)
In article <1991Mar21.184428.8226@convex.com> metzger@convex.com
(Robert Metzger) writes:
>One other note: interprocedural optimization will render many industry
>standard benchmarks meaningless.

Just getting back......  You should be careful about this.  A company,
who shall go unnamed, gave us some of their latest software.  We
scheduled an expensive stand-alone test of a super (this is a few years
back) expecting to see several CPUs in use.  One CPU was used.  It was a
mistake, being developed on a uniprocessor system, just a simple default
initialization problem.

The problem is how companies like yours go about testing fixes and
optimizations.  Standard test suites exist for language syntax
conformance and deviance, but no such standard set of tests exists for
language optimizations.  There needs to be some testing consistency for
all this fancy software; otherwise, how do we have any assurance said
features are working?

That's one reason why I just regard this as another form of testing, and
nothing special.  Of course, no company wants embarrassing test results.

--e. nobuo utsunomiya, NASA Ames Research Center, eugene@orville.nas.nasa.gov
  {uunet,mailrus,other gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.
metzger@convex.com (Robert Metzger) (03/24/91)
In article <1991Mar21.213144.14636@nas.nasa.gov> eugene@wilbur.nas.nasa.gov
(Eugene N. Miya) writes:
>In article <1991Mar21.184428.8226@convex.com> metzger@convex.com
>(Robert Metzger) writes:
>>One other note: interprocedural optimization will render many industry
>>standard benchmarks meaningless.
>
>Just getting back......
>You should be careful about this.
>
>The problem is how companies like yours go about testing fixes and
>optimizations.  Standard test suites exist for language syntax
>conformance and deviance, but no such standard set of tests exists for
>language optimizations.  There needs to be some testing consistency for
>all this fancy software; otherwise, how do we have any assurance said
>features are working?
>
>That's one reason why I just regard this as another form of testing,
>and nothing special.  Of course, no company wants embarrassing test
>results.

I appreciate your kind concern, but I'm not sure what you're warning me
about :-)  I'll elaborate on my point:

1) Compilers that perform interprocedural optimization render meaningless
   the results of simple-minded benchmarks.  For example, one of the
   first things a well-meaning technical-sales-support person at CONVEX
   did when he got a crack at the Application Compiler was to run
   whetstone through it.  Sure enough, since whetstone attempts to
   measure call overhead and the APC attempts to minimize call overhead,
   the results made it look like CONVEX had released a new generation of
   hardware.  We didn't release those numbers to the public, but there's
   nothing that prevents an unsophisticated buyer from doing the same
   thing and being misled.  (Of course, if he's buying million dollar
   computers based on the results of such stupid benchmarks, maybe he
   deserves what he gets.)

2) Compilers that perform interprocedural optimization cannot be fooled
   by subroutine calls into not performing optimizations.  This is a
   common trick in some widely used benchmarks.  An IPO compiler will
   perform thorough IP side-effect analysis, so it knows exactly what is
   going on inside a called procedure, and it may automatically perform
   procedure inlining as well.  It can move assignments and uses across
   calls, or eliminate them altogether, where procedural compilers could
   not.  Procedure calls are no longer firewalls.

3) Compilers that perform interprocedural optimization perform a lot
   more compile-time evaluation of your code.  Basically, if you don't
   read all your data from a data file at runtime, the compiler could
   potentially evaluate your expressions at compile time and replace
   your application with a bunch of PRINT statements.

I will be the first to say that testing interprocedural optimizers is
very challenging.  In some senses, a 100,000 line application is ONE
test case.  Which means you have to run hundreds of applications and
millions of lines of code through the compiler before you trust it.  On
the other hand, our experience with an interprocedural compiler is that
the compiler finds dozens or hundreds of bugs in every application for
every one bug the application finds in the compiler.  Logic errors,
abuse of language features, and numerically unstable algorithms are all
exposed by this new technology.  From my perspective, 99.99% of all
scientific/engineering FORTRAN applications are shot through with
numerous errors, and I'm far more concerned about the reliability of the
applications than the compilers at this point.
--
Robert Metzger		CONVEX Computer Corp.		Richardson, Texas
Generic Disclaimer: I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens
from harm or property loss by violence, stealth, or fraud.  All else is
tyranny."
blee@digi.lonestar.org (Benjamin W. Lee) (03/27/91)
In article <1991Mar21.184428.8226@convex.com> metzger@convex.com
(Robert Metzger) writes:
>
>I will give you one data point.  One of the larger applications we have
>compiled had 214,500 lines of FORTRAN source in 971 source files.  It took
>a total of 5 hours and 26 minutes wall clock time to compile and link on
[...]

I am very much interested in which portion of the optimization is most
time consuming.  The data-gathering stage, I suppose?  Any good papers
available on interprocedural optimization?  Thanks!

>Robert Metzger		CONVEX Computer Corp.		Richardson, Texas
preston@ariel.rice.edu (Preston Briggs) (03/31/91)
metzger@convex.com (Robert Metzger) writes:
>>I will give you one data point.  One of the larger applications we have
>>compiled had 214,500 lines of FORTRAN source in 971 source files.  It took
>>a total of 5 hours and 26 minutes wall clock time to compile and link on

and blee@digi.lonestar.org (Benjamin W. Lee) writes:
>I am very much interested in which portion of the optimization is most
>time consuming.  The data-gathering stage, I suppose?  Any good papers
>available on interprocedural optimization?  Thanks!

I don't know details about Convex's compiler, so I can't comment
directly.  However, this interprocedural optimization stuff need not be
as expensive as commonly imagined.  The approach developed at Rice (and
used at Convex) goes approximately like this:

1) Collect summary information for every procedure (at Rice, we do it
   while editing, but there are other possibilities).

2) Build the call graph and calculate interprocedural data flow
   information.  Determine which procedures must be recompiled (step 3).
   This step examines only the summary information, not the source code.

3) Compile individual procedures, supplying interprocedural information
   to the optimizer.

This decomposition enables practical program development and
maintenance.  If you change a procedure or two, step 1 is run only for
the changed procedures.  Then step 2 is run for all the procedures in
the program.  Finally, step 3 runs for the changed procedures, plus any
procedures that depend on changed interprocedural facts.

Step 1 occurs during editing and is very cheap.  The summary information
could be collected in a separate pass; in this case, the expense would
be similar to a very fast one-pass compiler (mostly syntax analysis).

Step 2 is actually similar to a super-Make.  The cost depends heavily on
implementation decisions.  The asymptotic complexity of the analysis,
however, is very low.

Step 3 closely resembles a traditional optimizing compiler.  The bulk of
the overall compilation time should be consumed here.

There are many papers by researchers at Rice describing all this in
detail.  A good starting point might be:

	A practical environment for scientific programming
	Carle, Cooper, Hood, Kennedy, Torczon, Warren
	IEEE Computer, 20(11), November 1987.

Preston Briggs
metzger@convex.com (Robert Metzger) (04/05/91)
In article <1991Mar26.175137.6706@digi.lonestar.org> blee@digi.lonestar.org
(Benjamin W. Lee) writes:
>In article <1991Mar21.184428.8226@convex.com> metzger@convex.com
>(Robert Metzger) writes:
>>
>>I will give you one data point.  One of the larger applications we have
>>compiled had 214,500 lines of FORTRAN source in 971 source files.  It took
>>a total of 5 hours and 26 minutes wall clock time to compile and link on
>[...]
>
>I am very much interested in which portion of the optimization is most
>time consuming.  The data-gathering stage, I suppose?  Any good papers
>available on interprocedural optimization?  Thanks!

In the examples cited, constant propagation took the longest time, and
array side-effect analysis the next longest.  I can't generalize about
these well, because:

a) Relative times between algorithms vary widely based on coding style.

   - Some FORTRAN programmers pass dozens or hundreds of arguments to
     every subroutine.  The cost of alias analysis is a function of the
     number of calls times the number of arguments.  There is a
     substantial difference in alias analysis time when the average
     number of arguments passed is 10 versus when it is 100.

   - The more compile-time constants you put in your code, the more
     work constant propagation does, but this is a positive reflection
     on the IP algorithm, and sometimes on the software engineering of
     the application as well.

   - The effort required for several algorithms is increased when
     FORTRAN programmers put every variable in the application in
     COMMON (yes, there are FORTRAN applications with NO local
     variables), as opposed to keeping everything local that doesn't
     need to be global.

b) Some IP algorithms are vectorized in our product.  The ratios would
   be quite different on a scalar machine.
--
Robert Metzger		CONVEX Computer Corp.		Richardson, Texas
Generic Disclaimer: I write software, not CONVEX Corporate Policies.
"The only legitimate function of government is to protect its citizens
from harm or property loss by violence, stealth, or fraud.  All else is
tyranny."