hanche@imf.unit.no (Harald Hanche-Olsen) (12/10/89)
This is a short tale about a long FORTRAN compilation on the DN10000. I was to compile a FORTRAN program with a total of 120K lines split up into 37 files. Since our machine has only 8MB of memory and compiling is known to load it down severely, I set it up to start compiling at 1am. Next morning I got in to the office sort of late and found that the compilation was still running, and students were complaining loudly about the sluggishness of the machine. So I killed it, and restarted it the following night. History repeated itself the next morning, which got me to thinking. By then I had compiled 87K lines, which had taken 29.5 hours of elapsed time and a total of 5 million page faults!

"All right", I thought, "maybe the compiler tries to compile all subroutines in one file simultaneously, thereby filling up a lot of memory and causing page thrashing. In other words, maybe it helps to split the files into single-subroutine pieces?"

To test my hypothesis, I took a representative FORTRAN file of 2850 lines and split it up into little files using `fsplit'. I got 19 files with sizes ranging from 22 lines to 280 lines. Compiling them all and timing with the `time' command, I got the answer

    289.3u 20.4s 14:51 34% 0+0k 1014+0io 30083pf+0w

For those who don't know `time' output format, this means 289 seconds user time, 20 seconds system time (i.e., spent in the kernel), almost 15 minutes elapsed time, and 30000 page faults. Compare this with the result of trying to compile the whole file in one shot. After tying up the machine for over two hours, I decided the most merciful thing to do was to kill it, yielding this result:

    267.2u 271.5s 2:17:10 6% 0+0k 7043+0io 441500pf+0w

That is, 9 times as much elapsed time, 14 times as many page faults! Needless to say, I decided to split up the rest of the input files and compile them all separately. Thus, the remaining 32K lines took a total of 2 hours elapsed time and 300000 page faults. That is 266 lines per (elapsed) minute, compared to 67 lines/minute for the first part of the compilation! Not quite the same performance ratio as in the test case, but still quite a difference.

Technical detail: We are running under SR10.1.p. Our Fortran compiler is version 10.5.p. As I said, we have 8M of memory. (We will upgrade to 32M soon.)

Conclusion: Split your files before compiling!

Rhetorical question: What do those compiler writers at HP/apollo think they're doing? Writing for machines with a minimum of 128M of memory?

- Harald Hanche-Olsen      Division of Mathematical Sciences
  hanche@imf.unit.no       The Norwegian Institute of Technology
  hanche@norunit.bitnet    N-7034 Trondheim-NTH NORWAY
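In concrete terms, the experiment boils down to something like the following. This is only a sketch: `ftn' stands in here for the Fortran compiler driver, and `time' is the csh built-in whose output format is quoted above.

    fsplit bigfile.f        # write each program unit to its own small .f file
    time ftn -c *.f         # compile the small pieces, timing the whole run

With the split sources, the compiler only ever has one routine's worth of intermediate data in memory at a time, which is the point of the exercise.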
pha@CAEN.ENGIN.UMICH.EDU (Paul H. Anderson) (12/12/89)
Compiles that go on endlessly due to page faults can be fixed by splitting files up, as you mentioned, or by reducing the level of optimization, which is a very memory-intensive process. For example, if you have 10000 lines of code compiling at the maximum optimization level, the compiler will be doing an awful lot of global optimization in an attempt to remove redundant code, reuse common code, etc. Adding memory is a good solution, and should take care of most of the problem.

Paul Anderson
CAEN
madler@apollo.HP.COM (Michael Adler) (12/13/89)
You are quite correct that 8MB isn't much for compiling large programs. However, I strongly discourage breaking apart source files for users with reasonable amounts of memory. The current Prism compilers compile depth first: since leaf routines are coded first, their register usage and some additional information about memory and condition-code usage is available to the routines that call them. Breaking routines apart into separate source files causes longer calling sequences, makes inlining (-opt 4) impossible and destroys any chances for interprocedural optimizations.

-Michael
 Languages Group
 Apollo Computer
geiser@apollo.HP.COM (Wayne Geiser) (12/13/89)
In article <CMM.0.88.629305379.hanche@vifsla.imf.unit.no>, hanche@imf.unit.no (Harald Hanche-Olsen) writes:

> This is a short tale about a long FORTRAN compilation on the DN10000.
> I was to compile a FORTRAN program with a total of 120K lines split up
> into 37 files. Since our machine has only 8MB of memory and compiling
> is known to load it down severely, I set it up to start compiling at
> 1am. Next morning I got in to the office sort of late and found that
> the compilation was still running, and students were complaining
> loudly about the sluggishness of the machine.
. . .
> To test my hypothesis, I took a representative FORTRAN file of 2850
> lines and split it up into little files using `fsplit'. I got 19 files
> with sizes ranging from 22 lines to 280 lines. Compiling them all and
> timing with the `time' command, I got the answer
>
>    289.3u 20.4s 14:51 34% 0+0k 1014+0io 30083pf+0w
>
> For those who don't know `time' output format, this means 289 seconds
> user time, 20 seconds system time (i.e., spent in the kernel), almost
> 15 minutes elapsed time, and 30000 page faults. Compare this with the
> result of trying to compile the whole file in one shot. After tying
> up the machine for over two hours, I decided the most merciful thing
> to do was to kill it, yielding this result:
>
>    267.2u 271.5s 2:17:10 6% 0+0k 7043+0io 441500pf+0w
>
> That is, 9 times as much elapsed time, 14 times as many page faults!
. . .
> Conclusion: Split your files before compiling!
> Rhetorical question: What do those compiler writers at HP/apollo think
> they're doing? Writing for machines with a minimum of 128M of memory?

The new (i.e. SR10 and later) compilers do, indeed, compile the entire source at once. This was done to make our FORTRAN behave more like the Unix f77 compiler.

Your conclusion is the best solution if splitting files is acceptable. Obviously, you'll get the fastest compilation speed using this method. Another method is to turn the optimizer down (or off) for all but the final build of your system. I think you will find that it is the optimizer which is taking up the majority of the time.

As an aside, the form of your program may also have something to do with how much time the compiler takes. It is much more difficult (read: time-consuming) to work with control flows that consist of large numbers of GOTOs and such. Calculating register lifetimes in that instance is definitely a non-trivial problem.

I hope this sheds a little light on what is going on in the compilers and "What ... those compiler writers at HP/apollo think they're doing."

Wayne Geiser
Apollo Computer, Inc. - A subsidiary of Hewlett Packard
{mit-erl, yale, uw-beaver, decvax}!apollo!geiser
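The second suggestion above, turning the optimizer down for everyday builds, might look roughly like this in practice. It is only a sketch: `ftn' and the `-opt 0' spelling are assumptions, while `-opt 4' is the full-optimization level mentioned elsewhere in this thread.

    # Day-to-day builds: little or no optimization, so compiles stay fast
    # and memory use stays low.  (Flag spellings assumed; check your docs.)
    for f in *.f; do ftn -c -opt 0 "$f"; done

    # Final build only: full optimization, accepting the longer compile.
    for f in *.f; do ftn -c -opt 4 "$f"; done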
khb@chiba.Sun.COM (chiba) (12/13/89)
In article <CMM.0.88.629305379.hanche@vifsla.imf.unit.no> hanche@imf.unit.no (Harald Hanche-Olsen) writes:

>This is a short tale about a long FORTRAN compilation on the DN10000.
>...
>Rhetorical question: What do those compiler writers at HP/apollo think
>they're doing? Writing for machines with a minimum of 128M of memory?

Most (all I have ever seen, from PDP-11 to Cray) unix compilers act this way. In the unix universe the common idiom is to store things which are different in different files. This permits make to act in a sensible fashion.

In addition, it is seldom a good idea to compile a large application with the optimizer set to just one setting for the whole application on machines with interesting optimizers. The proper approach is to compile at low (or no) levels of optimization, and to enable profiling. Optimizers can, and will, be serious pessimizers in some cases (consider a machine with multiple functional units and long pipes, and an application whose loops run from 1 to 3 ... but this is hidden from the compiler ... so all loops get unrolled to a depth beyond 3, etc.). In addition, the modules that are most expensive to optimize are most often the ones with the least payoff...

Lastly, one should consider one of Amdahl's old "laws": 1 MIP => 1 MB of RAM.

Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   | MTS --Only my work belongs to Sun*
I Voted for Bill &  | Advanced Languages/Floating Point Group
Opus                | "When the going gets Weird .. the Weird turn PRO"
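The compile-low-then-profile approach described above might be scripted roughly as follows. All of the option spellings (`-opt', `-prof') and the `prof' post-processing step are placeholders for whatever your particular compiler and system provide, and the file names are purely illustrative.

    # 1. Build everything cheaply, with profiling enabled.
    for f in *.f; do ftn -c -opt 0 -prof "$f"; done
    ftn -o app -prof *.o

    # 2. Run a representative workload and inspect the profile to find
    #    the few routines where the time actually goes.
    ./app < typical.input
    prof app > app.profile

    # 3. Recompile only those hot files at a high optimization level
    #    and relink; leave everything else alone.
    ftn -c -opt 4 hot_kernel.f
    ftn -o app *.o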
khb@chiba.Sun.COM (chiba) (12/13/89)
In article <47625cb9.20b6d@apollo.HP.COM> madler@apollo.HP.COM (Michael Adler) writes:

>You are quite correct that 8MB isn't much for compiling large programs.
>However, I strongly discourage breaking apart source files for users with
>reasonable amounts of memory. ...

I have not used the PRISM compiler, but have used many others with similar behavior. Really large programs should typically be broken up, profiled, and if necessary reassembled. In one large commercial FE code (276K lines), for a largish computer (names withheld), the moral equivalent of saxpy was called several hundred times. Inlining all such call sites (which would have been the natural thing to do) was a big lose. At about 4 call sites, inlining, loop interchange, hand unrolling to a depth of 3 (all calls were multiples of 3), and some other vile and unnatural transformations resulted in a total application speedup of more than a factor of 2.

>Breaking routines apart into separate source files causes longer calling
>sequences, makes inlining (-opt 4) impossible and destroys any chances for
>interprocedural optimizations.

Compilers can only do so much. In addition, the cost of compilation dominates in many shops ....

Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   | MTS --Only my work belongs to Sun*
I Voted for Bill &  | Advanced Languages/Floating Point Group
Opus                | "When the going gets Weird .. the Weird turn PRO"
achille@cernvax.UUCP (achille petrilli) (12/19/89)
In article <47625cb9.20b6d@apollo.HP.COM> madler@apollo.HP.COM (Michael Adler) writes:

>Breaking routines apart into separate source files causes longer calling
>sequences, makes inlining (-opt 4) impossible and destroys any chances for
>interprocedural optimizations.

Well said, Michael, but do you know that if we compile a bunch of subroutines together and then make a library out of them, a reference to any one of the subroutines will bring in all the others? This is due to the wonderful Unix library scheme. We would all be very happy to avoid file splitting; it just does not work if we want to have real libraries. Give us a decent librarian and we'll follow your advice, no doubt about that!

Achille Petrilli
Cray and PWS Operations
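The granularity problem described here is a property of the Unix archive scheme: the linker extracts whole object-file members, so everything compiled into one .o comes out of the library as one lump. A rough sketch of the difference, with `ftn' again standing in for the Fortran driver:

    # One big object: a reference to any routine drags in all of them.
    ftn -c everything.f              # -> everything.o with every routine in it
    ar rv libbig.a everything.o

    # One object per routine: the linker pulls only the members it needs,
    # but the source has to be split to get there.
    mkdir split && cd split
    fsplit ../everything.f           # one .f (hence one .o) per program unit
    ftn -c *.f
    ar rv ../libsmall.a *.o
    ranlib ../libsmall.a             # rebuild the symbol table if your ar needs it

Which is exactly the tension in this thread: object-per-routine is what you want for libraries, but file-per-routine is what defeats the inliner.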
rehrauer@apollo.HP.COM (Steve Rehrauer) (12/20/89)
In article <CMM.0.88.629305379.hanche@vifsla.imf.unit.no> hanche@imf.unit.no (Harald Hanche-Olsen) writes:

>This is a short tale about a long FORTRAN compilation on the DN10000.
>...
>Rhetorical question: What do those compiler writers at HP/apollo think
>they're doing? Writing for machines with a minimum of 128M of memory?

(I'm responding for Bruce Olsen regarding Prism compiler performance. These are his words, not mine.)

---------------------------------------------

Mr. Hanche-Olsen has evidently encountered a compile-time performance problem that we have not seen. He got these results with the first-release Fortran compiler, compiling on a machine configured with the minimum amount of physical memory.

The design goals for the DN10000 compilers were (in priority order):

  - Code quality,
  - Reliability,
  - Compile-time efficiency.

While we were very pleased with the levels of code quality and reliability that we attained, the initial release did not exhibit the level of compile-time efficiency that we wanted. The second release shows substantial improvement. The typical user, compiling code that is not dramatically too large for his physical memory, will see compile speeds on the order of several thousand lines per minute. Your mileage may vary. We expect to make further substantial compile-time improvements in subsequent releases.

In sum, we believe that you can expect good compile times on the DN10000 provided that you have a reasonable amount of physical memory for the job at hand. When you're estimating your memory needs, keep in mind that code for most RISC machines is 1.5 to 2.0 times as large as the corresponding code for CISC machines. Again, your mileage may vary.

Also, it generally requires a sophisticated compiler to take advantage of the improved run-time performance of RISC architectures. Such compilers tend to be bigger and (yes, we admit it) slower than comparable CISC compilers. The payoff, we believe, is that your program runs faster.

Should you experience this problem (or any other), we would of course welcome any input that can help us reproduce and analyze it.

Bruce Olsen
Apollo Division/HP
--
>>"Aaiiyeeee! Death from above!"<< | Steve Rehrauer, rehrauer@apollo.hp.com
  "Flee, lest we be trod upon!"    | The Apollo System Division of H.P.