[comp.sys.apollo] PRISM Fortran compiler causes thrashing!

hanche@imf.unit.no (Harald Hanche-Olsen) (12/10/89)

This is a short tale about a long FORTRAN compilation on the DN10000.
I was to compile a FORTRAN program with a total of 120K lines split up
into 37 files.  Since our machine has only 8MB of memory and compiling
is known to load it down severely, I set it up to start compiling at
1am.  Next morning I got in to the office sort of late and found that
the compilation was still running, and students were complaining
loudly about the sluggishness of the machine.  So I killed it, and
restarted it the following night.  History repeated itself next
morning, which got me to thinking.  By now, I had compiled 87K lines,
which had taken 29.5 hours of elapsed time and a total of 5 million
page faults!  "All right", I thought, "maybe the compiler tries to
compile all subroutines in one file simultaneously, thereby filling up
a lot of memory and causing page thrashing.  In other words, maybe it
helps to split the files into single subroutine pieces?"

To test my hypothesis, I took a representative FORTRAN file of 2850
lines and split it up into little files using `fsplit'.  I got 19 files
with sizes ranging from 22 lines to 280 lines.  Compiling them all and
timing with the `time' command, I got the answer

    289.3u 20.4s 14:51 34% 0+0k 1014+0io 30083pf+0w

For those who don't know `time' output format, this means 289 seconds
user time, 20 seconds system time (i.e., spent in the kernel), almost
15 minutes elapsed time, and 30000 page faults.  Compare this with the
result of trying to compile the whole file in one shot.  After tying
up the machine for over two hours, I decided the most merciful thing
to do was to kill it, yielding this result:

    267.2u 271.5s 2:17:10 6% 0+0k 7043+0io 441500pf+0w

That is, 9 times as much elapsed time, 14 times as many page faults!
Needless to say, I decided to split up the rest of the input files and
compile them all separately.  Thus, the remaining 32K lines took a
total of 2 hours elapsed time and 300000 page faults.  That is 266
lines per (elapsed) minute, compared to 67 lines/minute for the first
part of the compilation!  Not quite the same performance ratio as in
the test case, but still quite a difference.
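
For the record, here is roughly what the test looked like.  This is only a
sketch: the file name is made up, and I write the compiler as `ftn' with a
-c flag purely as a stand-in, so substitute whatever command and flags your
installation actually uses.

    # Split the 2850-line source into one file per program unit;
    # fsplit writes each routine to its own little .f file.
    fsplit bigfile.f

    # Compile the pieces one at a time and time the whole batch
    # (the csh `time' builtin produced the numbers quoted above).
    time sh -c 'for f in *.f; do ftn -c $f; done'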

Technical detail:  We are running under SR10.1.p.  Our Fortran compiler
is version 10.5.p.  As I said, we have 8M of memory.  (We will upgrade
to 32M soon).
Conclusion:  Split your files before compiling!
Rhetorical question:  What do those compiler writers at HP/apollo think
they're doing?  Writing for machines with a minimum of 128M of memory?

- Harald Hanche-Olsen     Division of Mathematical Sciences
  hanche@imf.unit.no      The Norwegian Institute of Technology
  hanche@norunit.bitnet   N-7034 Trondheim-NTH NORWAY

pha@CAEN.ENGIN.UMICH.EDU (Paul H. Anderson) (12/12/89)

Compiles that run on endlessly because of page faults can be fixed by
splitting files up, as you mentioned, or by reducing the level of
optimization, which is a very memory-intensive process.  For example,
if you have 10,000 lines of code compiling at the maximum optimization
level, the compiler will be doing an awful lot of global optimization
in an attempt to remove redundant code, reuse common code, and so on.
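
In practice that just means something like the following.  This is a
sketch only: -opt 4 is the maximum level mentioned elsewhere in this
thread, but the lower level and the `ftn' command name are assumptions,
so check your own compiler documentation.

    # Everyday builds: keep the optimizer's working set small.
    ftn -opt 1 -c big.f        # low optimization level (assumed flag)

    # Final build only: pay for the global optimization once.
    ftn -opt 4 -c big.f        # maximum optimization, much larger working set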

Adding memory is a good solution, and should take care of most
of the problem.

Paul Anderson
CAEN

madler@apollo.HP.COM (Michael Adler) (12/13/89)

You are quite correct that 8MB isn't much for compiling large programs.
However, I strongly discourage breaking apart source files for users with
reasonable amounts of memory.  The current Prism compilers compile
depth-first.  Since leaf routines are coded first, their register usage and
some additional information about memory and condition-code usage are
available to the routines that call them.

Breaking routines apart into separate source files causes longer calling
sequences, makes inlining (-opt 4) impossible and destroys any chances for
interprocedural optimizations.

-Michael

 Languages Group
 Apollo Computer

geiser@apollo.HP.COM (Wayne Geiser) (12/13/89)

In article <CMM.0.88.629305379.hanche@vifsla.imf.unit.no>,
hanche@imf.unit.no (Harald Hanche-Olsen) writes:
> This is a short tale about a long FORTRAN compilation on the DN10000.
> I was to compile a FORTRAN program with a total of 120K lines split up
> into 37 files.  Since our machine has only 8MB of memory and compiling
> is known to load it down severely, I set it up to start compiling at
> 1am.  Next morning I got in to the office sort of late and found that
> the compilation was still running, and students were complaining
> loudly about the sluggishness of the machine.
.
.
.
> To test my hypothesis, I took a representative FORTRAN file of 2850
> lines and split it up into little files using `fsplit'.  I got 19 files
> with sizes ranging from 22 lines to 280 lines.  Compiling them all and
> timing with the `time' command, I got the answer
> 
>     289.3u 20.4s 14:51 34% 0+0k 1014+0io 30083pf+0w
> 
> For those who don't know `time' output format, this means 289 seconds
> user time, 20 seconds system time (i.e., spent in the kernel), almost
> 15 minutes elapsed time, and 30000 page faults.  Compare this with the
> result of trying to compile the whole file in one shot.  After tying
> up the machine for over two hours, I decided the most merciful thing
> to do was to kill it, yielding this result:
> 
>     267.2u 271.5s 2:17:10 6% 0+0k 7043+0io 441500pf+0w
> 
> That is, 9 times as much elapsed time, 14 times as many page faults!
.
.
.
> Conclusion:  Split your files before compiling!
> Rhetorical question:  What do those compiler writers at HP/apollo think
> they're doing?  Writing for machines with a minimum of 128M of memory?
> 
> - Harald Hanche-Olsen     Division of Mathematical Sciences
>   hanche@imf.unit.no      The Norwegian Institute of Technology
>   hanche@norunit.bitnet   N-7034 Trondheim-NTH NORWAY

The new (i.e. SR10 and later) compilers do, indeed, compile the entire
source at once.  This was done to make our FORTRAN behave more like the
Unix f77 compiler.

Your conclusion is the best solution if splitting files is acceptable. 
Obviously, you'll get the fastest compilation speed using this method. 
Another method is to turn the optimizer down (or off) for all but the
final build of your system.  I think you will find that it is the
optimizer which is taking up the majority of the time.

As an aside, the form of your program may also have something to do
with how much time the compiler takes.  It is much more difficult (read:
time-consuming) to work with control flow that consists of large
numbers of GOTOs and such.  Calculating register lifetimes in that case
is definitely a non-trivial problem.

I hope this sheds a little light on what is going on in the compilers
and "What ... those compiler writers at HP/apollo think they're doing."

Wayne Geiser
Apollo Computer, Inc. - A subsidiary of Hewlett Packard
{mit-erl, yale, uw-beaver, decvax}!apollo!geiser

khb@chiba.Sun.COM (chiba) (12/13/89)

In article <CMM.0.88.629305379.hanche@vifsla.imf.unit.no> hanche@imf.unit.no (Harald Hanche-Olsen) writes:
>This is a short tale about a long FORTRAN compilation on the DN10000.
>...
>Rhetorical question:  What do those compiler writers at HP/apollo think
>they're doing?  Writing for machines with a minimum of 128M of memory?
>

Most (all I have ever seen, from PDP-11 to Cray) unix compilers act
this way.  In the unix universe the common idiom is to keep logically
separate things in separate files.  This permits make to act in a
sensible fashion.

In addition, it is seldom a good idea to compile a large application
with the optimizer set to just one setting for the whole application
on machines with interesting optimizers. The proper approach is to
compile at low (or no) levels of optimization, and to enable
profiling.
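
Concretely, the idiom is something like the makefile below.  It is only a
sketch: the compiler name, the -opt level and the -pg profiling flag are
placeholders for whatever your compiler actually accepts, and the object
file names are invented.

    # One program unit (or one logical module) per source file, so that
    # make recompiles only what changed.  Recipe lines begin with a tab.
    FC     = ftn
    FFLAGS = -opt 1 -pg           # cheap optimization plus profiling
    OBJS   = solver.o assemble.o output.o

    prog: $(OBJS)
            $(FC) $(FFLAGS) -o prog $(OBJS)

    .f.o:
            $(FC) $(FFLAGS) -c $<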

Optimizers can be, and sometimes will be, serious pessimizers
(consider a machine with multiple functional units and long pipes, and
an application whose loops run from 1 to 3 ... but this is hidden from
the compiler ... so every loop gets unrolled to a depth beyond 3, and
so on).  In addition, the modules that are most expensive to optimize
are most often the ones with the least payoff...

Lastly, one should consider one of Amdahl's old "laws" .... 

		1 MIPS => 1 MB of RAM.


Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   |	MTS --Only my work belongs to Sun* 
I Voted for Bill &  | Advanced Languages/Floating Point Group            
Opus                | "When the going gets Weird .. the Weird turn PRO"

khb@chiba.Sun.COM (chiba) (12/13/89)

In article <47625cb9.20b6d@apollo.HP.COM> madler@apollo.HP.COM (Michael Adler) writes:
>You are quite correct that 8MB isn't much for compiling large programs.
>However, I strongly discourage breaking apart source files for users with
>reasonable amounts of memory. ...

I have not used the PRISM compiler, but have used many others with
similar behavior.  Really large programs should typically be broken up,
profiled, and, if necessary, reassembled.

On one large commercial FE code (276K lines), for a largish computer (names
withheld), the moral equivalent of saxpy was called several hundred
times.  Inlining all such call sites (which would have been the natural
thing to do) was a big lose.  At about 4 call sites, inlining, loop interchanges,
hand unrolling to a depth of 3 (all calls were multiples of 3) and
some other vile and unnatural transformations resulted in a total
application speedup of more than a factor of 2.
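
The workflow behind that was the usual one, sketched below; the profiling
flag, the choice of profiler, and the program and input names are all
stand-ins for whatever your system provides.

    # Build cheaply, but with profiling turned on (-pg and gprof here;
    # your system may want -p and prof instead).
    ftn -pg -c *.f
    ftn -pg -o fem *.o

    # Run a representative job, then see where the time actually goes.
    fem < typical.input
    gprof fem gmon.out | more

    # Only then hand-optimize the handful of call sites that the profile
    # says matter; leave the other several hundred alone.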

>Breaking routines apart into separate source files causes longer calling
>sequences, makes inlining (-opt 4) impossible and destroys any chances for
>interprocedural optimizations.

Compilers can only do so much. In addition, the cost of compilation
dominates in many shops .... 





Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   |	MTS --Only my work belongs to Sun* 
I Voted for Bill &  | Advanced Languages/Floating Point Group            
Opus                | "When the going gets Weird .. the Weird turn PRO"

achille@cernvax.UUCP (achille petrilli) (12/19/89)

In article <47625cb9.20b6d@apollo.HP.COM> madler@apollo.HP.COM (Michael Adler) writes:
>Breaking routines apart into separate source files causes longer calling
>sequences, makes inlining (-opt 4) impossible and destroys any chances for
>interprocedural optimizations.

Well said, Michael, but do you know that if we compile a bunch of subroutines
together and then make a library out of them, a reference to any one of those
subroutines will bring in all the others?
This is due to the wonderful Unix library scheme.

We would all be very happy to avoid file splitting; it just does not work
if we want to have real libraries.  Give us a decent librarian and we'll
follow your advice, no doubt about that!
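
For anyone who has not been bitten by this, the mechanics are as follows
(routine and file names are invented, and `ftn' again stands in for the
real compile command):

    # All the routines live in one source file, hence in one object file.
    ftn -c everything.f           # -> everything.o
    ar rv libours.a everything.o
    ranlib libours.a              # where ranlib is required

    # A program that calls only ONE routine from everything.f still links
    # in every routine it contains, because ld extracts whole .o members
    # from the archive.
    ftn -o prog main.f libours.a

    # Today the only cure is to split before archiving, so that each
    # routine becomes its own archive member:
    fsplit everything.f
    ftn -c *.f
    ar rv libours.a *.o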

Achille Petrilli
Cray and PWS Operations

rehrauer@apollo.HP.COM (Steve Rehrauer) (12/20/89)

In article <CMM.0.88.629305379.hanche@vifsla.imf.unit.no> hanche@imf.unit.no (Harald Hanche-Olsen) writes:
>This is a short tale about a long FORTRAN compilation on the DN10000.
>...
>Rhetorical question:  What do those compiler writers at HP/apollo think
>they're doing?  Writing for machines with a minimum of 128M of memory?

(I'm responding for Bruce Olsen regarding Prism compiler performance.  These
are his words, not mine.)

---------------------------------------------
Mr. Hanche-Olsen has evidently encountered a compile-time performance problem
that we have not seen.  He got these results with the first release of the
Fortran compiler, compiling on a machine configured with the minimum amount
of physical memory.

The design goals for the DN10000 compilers were (in priority order):

   - Code Quality,
   - Reliability,
   - Compile-time efficiency. 

While we were very pleased with the levels of code quality and reliability
that we attained, the initial release did not exhibit the level of compile-
time efficiency that we wanted.  The second release shows substantial
improvement.  The typical user, compiling code that is not dramatically too
large for his physical memory, will see compile speeds on the order of
several thousand lines per minute.  Your mileage may vary.  We expect to
make further substantial compile-time improvements in subsequent releases.

In sum, we believe that you can expect good compile times on the DN10000
provided that you have a reasonable amount of physical memory for the job
at hand.  When you're estimating your memory needs, keep in mind that code
for most RISC machines is 1.5 to 2.0 times as large as the corresponding code for
CISC machines.  Again, your mileage may vary.

Also, it generally requires a sophisticated compiler to take advantage of
the improved run-time performance of RISC architectures.  Such compilers
tend to be bigger and (yes, we admit it) slower than comparable CISC
compilers.  The payoff, we believe, is that your program runs faster.

Should you experience this problem (or any other), we would of course
welcome any input that can help us reproduce and analyze it.

Bruce Olsen
Apollo Division/HP

--
>>"Aaiiyeeee!  Death from above!"<< | Steve Rehrauer, rehrauer@apollo.hp.com
   "Flee, lest we be trod upon!"    | The Apollo System Division of H.P.