[comp.lang.ada] Reducing size of Ada's EXE files

kmccook@wrdis01.af.mil (Ken McCook) (04/13/91)

 I've heard people complain about the final size of EXEs produced after 
compiling and linking Ada code.  I saw something in a trade pub about
PKLITE from PKWARE.  PKLITE will compress an EXE or COM file while leaving
it executable.  The file is not expanded back on disk before execution; once
you compress it, it stays compressed.  There are various "-" flags that can
be set as arguments when you kick it off that take care of things like
overlays.  I've used it, and on a 368K EXE file it reduced the size to 129K.
There were no obvious differences; however, the file seems to run much
faster now.  I have no explanation for that effect, although I have a couple
of suspicions.

Just thought I'd pass this along.

Ken McCook		USAF 1926th CCSG Robins AFB GA	      (912) 926-3224

larryc@poe.jpl.nasa.gov (Larry Carroll) (04/13/91)

What most of us complain about is the amount of code that gets linked into an
executable which will never be executed.  On DEC Ada, for instance, whenever
you used a subprogram from a package, you got the whole package linked in.
So if the subprogram was 3 KBytes & the package was 3 MBytes, you got
1000 times as much code in your executable.

Agreed, it takes a smart linker to know just what part of a package you need.
For instance, any variables global to a package which are used by the desired
subprogram will need to be linked in.  Any initialization code you include at 
the end of the package, & all the subprograms & global variables they use,
must be linked.  And so on.  But eventually such smart linkers must become 
available if Ada is to compete successfully with other languages.
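Here is a minimal sketch of the reachability rule described above. It is written in Python rather than Ada purely for brevity. The package Stats, its subprograms, and the flat reference table are hypothetical. The sketch shows that a package's initialization (elaboration) code counts as a root, so whatever it uses is kept even if the main program never calls it. Globals used by a called subprogram come along as well.

```python
# Hypothetical sketch of a "smart linker" keeping only reachable items.
# Roots: the main program plus each package's initialization code,
# since that code runs whether or not anything calls into the package.


def reachable(refs, roots):
    """Return every item reachable from roots via refs (code or data)."""
    seen = set()
    stack = list(roots)
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        stack.extend(refs.get(item, ()))
    return seen


# Package Stats: subprograms, a package-global variable, and init code.
refs = {
    "Main": ["Stats.Mean"],
    "Stats.Mean": ["Stats.Count"],      # global used by the subprogram
    "Stats.Init": ["Stats.Reset"],      # package initialization code
    "Stats.Reset": ["Stats.Count"],
    "Stats.Variance": ["Stats.Mean"],   # never called from Main
    "Stats.Median": ["Stats.Sort"],     # never called from Main
}
kept = reachable(refs, ["Main", "Stats.Init"])
print(sorted(kept))
```

Stats.Variance, Stats.Median and Stats.Sort are dropped. Stats.Reset survives only because the initialization code uses it.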

Anyone know which vendors supply such linkers?
							Larry Carroll
							puente.jpl.nasa.gov

yow@sweetpea.jsc.nasa.gov (04/15/91)

The Meridian compiler for PC DOS systems has a link option (-g) that
will remove unused code.  The smallest program that I have been able to
produce using the Meridian Linker with the -g option is 5K. (The program
filled the screen with A's)  So it would seem that the Meridian compiler
has about 5K of overhead.  On a larger program (630K) the linker removes
about 100K of stuff.

The major drawback of using the -g option is that it is SLOW.  Using it
adds about 2 to 3 minutes to a small link (1 to 2 packages) and about 45
minutes to a large link (over 100 packages).  (The system used was a 386
25 MHz with a 2 meg disk cache and a 15 msec ESDI disk.)

What I would like to see is the SHARING of generic code, not this macro
expansion that takes place in compilers.  I wrote my own dynamic generic
because the generic code was about 20K (object code size) and the
generic was used 50+ times on a PC (640K limit).  This would never work
with the macro expansion because the program would be over 1 meg just
for the generics!!! (:-()
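With full macro expansion, 50 instantiations of a 20K generic come to roughly 50 x 20K = 1000K of object code, well past the 640K limit. Below is a minimal sketch of the sharing alternative, in Python with hypothetical names. Each instantiation becomes a small descriptor holding the actuals for the generic formals, and a single copy of the code works through it. This is one way to share generic code. It is not necessarily how Bill's "dynamic generic" or any particular compiler does it.

```python
# Hypothetical sketch of shared generic code: ONE copy of the algorithm,
# with the generic's formal parameters supplied at run time through a
# descriptor, instead of one expanded copy of the code per instantiation.
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class SortDescriptor:
    # Stands in for the generic formal parameters (here, a "<" function).
    less_than: Callable[[Any, Any], bool]


def shared_sort(items: List[Any], d: SortDescriptor) -> List[Any]:
    """Insertion sort shared by every 'instantiation'; returns a new list."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements right, using the descriptor's comparison.
        while j >= 0 and d.less_than(key, result[j]):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


# Two "instantiations": same code, different descriptors.
int_order = SortDescriptor(less_than=lambda a, b: a < b)
length_order = SortDescriptor(less_than=lambda a, b: len(a) < len(b))

print(shared_sort([3, 1, 2], int_order))              # [1, 2, 3]
print(shared_sort(["ccc", "a", "bb"], length_order))  # ['a', 'bb', 'ccc']
```

The price of sharing is an indirect call through the descriptor for each formal operation, in exchange for paying the code size once.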

					Bill Yow
					yow@sweetpea.jsc.nasa.gov

My opinions are my own.

mfeldman@seas.gwu.edu (Michael Feldman) (04/16/91)

In article <1991Apr12.235101.7245@jpl-devvax.jpl.nasa.gov> larryc@poe.jpl.nasa.gov (Larry Carroll) writes:
>What most of us complain about is the amount of code that gets linked into an
>executable which will never be executed.  On DEC Ada, for instance, whenever
>you used a subprogram from a package, you got the whole package linked in.
>So if the subprogram was 3 KBytes & the package was 3 MBytes, you got
>1000 times as much code in your executable.

Hmmm. After all these years, DEC Ada doesn't have an optimizer that removes
unused subprograms? Are you really sure of this?

>
>Agreed, it takes a smart linker to know just what part of a package you need.
>For instance, any variables global to a package which are used by the desired
>subprogram will need to be linked in.  Any initialization code you include at 
>the end of the package, & all the subprograms & global variables they use,
>must be linked.  And so on.  But eventually such smart linkers must become 
>available if Ada is to compete successfully with other languages.
>
>Anyone know which vendors supply such linkers?

Three I am familiar with are the TeleSoft, Meridian, and Janus/Ada families.
Indeed they throw away unused subprograms. In my experience, invoking this
global optimization, as it's usually called, requires both compiling and
linking with a command-line switch set, because preparing for and doing this
optimization takes time and space, and one doesn't want to do it unnecessarily.

It's hard to imagine that, this far into the lifetimes of the prevalent Ada 
compiler families, their vendors haven't given at least _some_ attention to
this. What's the experience in netland? (Before answering "my compiler won't
do this" you might do well to read the documentation...)

Mike Feldman

jls@rutabaga.Rational.COM (Jim Showalter) (04/16/91)

%What most of us complain about is the amount of code that gets linked into an
%executable which will never be executed.  On DEC Ada, for instance, whenever
%you used a subprogram from a package, you got the whole package linked in.
%So if the subprogram was 3 KBytes & the package was 3 MBytes, you got
%1000 times as much code in your executable.
%
%Agreed, it takes a smart linker to know just what part of a package you need.
%For instance, any variables global to a package which are used by the desired
%subprogram will need to be linked in.  Any initialization code you include at 
%the end of the package, & all the subprograms & global variables they use,
%must be linked.  And so on.  But eventually such smart linkers must become 
%available if Ada is to compete successfully with other languages.

Our cross-compilers do link-time dead code elimination. Only the code
that is actually reachable is ever loaded into the executables. Similarly,
our Ada runtimes are selectively loadable, so, for example, if you don't
use tasking then no tasking stuff is loaded.
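This is a minimal sketch of a selectively loadable runtime. It assumes the runtime is split into components, and each component is linked in only when the program references one of its entry points. The component and symbol names are hypothetical, not Rational's. The sketch also ignores dependencies between runtime components; a real linker would follow those transitively.

```python
# Hypothetical sketch: a runtime split into separately loadable pieces.
# A piece is linked in only if the program references one of its symbols.
RUNTIME_COMPONENTS = {
    "tasking": {"__task_create", "__rendezvous"},
    "text_io": {"__put_line", "__get_line"},
    "exceptions": {"__raise"},
}


def runtime_pieces_needed(program_refs):
    """Return the runtime components whose symbols the program uses."""
    return {
        name
        for name, symbols in RUNTIME_COMPONENTS.items()
        if symbols & program_refs
    }


# A program with no tasks never references any tasking symbol.
no_tasks = {"__put_line", "__raise"}
print(sorted(runtime_pieces_needed(no_tasks)))  # ['exceptions', 'text_io']
```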
--
* The opinions expressed herein are my own, except in the realm of software *
* engineering, in which case I borrowed them from incredibly smart people.  *
*                                                                           *
* Rational: cutting-edge software engineering technology and services.      *

jloup@nocturne.chorus.fr (Jean-Loup Gailly) (04/16/91)

In article <3049@sparko.gwu.edu>, mfeldman@seas.gwu.edu (Michael Feldman)
writes:

| In article <1991Apr12.235101.7245@jpl-devvax.jpl.nasa.gov>
| larryc@poe.jpl.nasa.gov (Larry Carroll) writes:

> Agreed, it takes a smart linker to know just what part of a package you need.
> For instance, any variables global to a package which are used by the desired
> subprogram will need to be linked in. Any initialization code you include at 
> the end of the package, & all the subprograms & global variables they use,
> must be linked.  And so on.  But eventually such smart linkers must become 
> available if Ada is to compete successfully with other languages.
>
> Anyone know which vendors supply such linkers?

| Three I am familiar with are the TeleSoft, Meridian, and Janus/Ada families.
| Indeed they throw away unused subprograms. In my experience, invoking this
| global optimization, as it's usually called, requires both compiling and
| linking with a command-line switch set, because preparing for and doing this
| optimization takes time and space, and one doesn't want to do it unnecessarily.

All Alsys compilers remove unused subprograms by default, but you can
set a bind-time option to keep them. On the 370, we found that it was often
*cheaper* to remove unused subprograms, because the bind time was dominated
by the IO, and the binder had less IO to perform when unused subprograms
were eliminated. Alsys does not require a compile-time option. The call
graph is kept systematically because it is stored in a very compact form.
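Here is a minimal sketch of what a compactly stored call graph could look like. It uses two flat integer arrays in compressed sparse row form. The binder drops unreachable subprograms by default and keeps everything when a "keep" option is set. The array layout, the keep_unused parameter, and the subprogram names are assumptions for illustration. This is not the Alsys format.

```python
# Hypothetical sketch: call graph stored as two flat integer arrays
# (compressed sparse row form) plus a binder that removes unused
# subprograms unless told to keep them.
from array import array

# Subprograms are numbered 0..n-1; names are for display only.
names = ["Main", "Open", "Read", "Close", "Debug_Dump"]
# Callees of subprogram i are targets[offsets[i]:offsets[i + 1]].
offsets = array("I", [0, 2, 3, 3, 3, 4])
targets = array("I", [1, 2, 3, 2])  # Main->Open,Read; Open->Close; Debug_Dump->Read


def bind(main=0, keep_unused=False):
    """Return the indices of subprograms placed in the load module."""
    if keep_unused:
        return set(range(len(names)))
    kept, stack = set(), [main]
    while stack:
        s = stack.pop()
        if s not in kept:
            kept.add(s)
            stack.extend(targets[offsets[s]:offsets[s + 1]])
    return kept


print([names[i] for i in sorted(bind())])  # ['Main', 'Open', 'Read', 'Close']
```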

Jean-loup Gailly
(formerly with Alsys)

Chorus systemes, 6 av G. Eiffel, 78182 St-Quentin-en-Yvelines-Cedex, France
email: jloup@chorus.fr    Tel: +33 (1) 30 64 82 79 Fax: +33 (1) 30 57 00 66

mfeldman@seas.gwu.edu (Michael Feldman) (04/17/91)

In article <1991Apr15.094702@riddler.Berkeley.EDU> yow@sweetpea.jsc.nasa.gov writes:
>
>What I would like to see is the SHARING of generic code, not this macro
>expansion that takes place in compilers.  I wrote my own dynamic generic
>because the generic code was about 20K (object code size) and the
>generic was used 50+ times on a PC (640K limit).  This would never work
>with the macro expansion because the program would be over 1 meg just
>for the generics!!! (:-()
>
Indeed. But code-sharing in generics is a separate issue from throwing
away dead code. Are you saying that _everything_ in the generic was used
50+ times? Wouldn't throwing away the dead code at least get rid of
some of the extra stuff?
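The distinction can be put in numbers. The sizes below are hypothetical: a 20K generic split into four subprograms, instantiated 50 times, with each instance using only two of them. They only show that dead code elimination and code sharing shrink different things.

```python
# Hypothetical numbers only: sizes in K bytes for a 20K generic.
GENERIC = {"Insert": 6, "Delete": 5, "Search": 4, "Dump": 5}
INSTANCES = 50
USED_PER_INSTANCE = {"Insert", "Search"}  # assume every instance uses these

# Macro expansion, no dead code elimination: every instance gets all code.
expanded = INSTANCES * sum(GENERIC.values())
# Macro expansion with dead code elimination: unused subprograms dropped.
expanded_dce = INSTANCES * sum(GENERIC[s] for s in USED_PER_INSTANCE)
# Shared code: one copy (ignoring small per-instance descriptors).
shared = sum(GENERIC.values())

print(expanded, expanded_dce, shared)  # 1000 500 20
```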

I wonder if folks out there in net-land could check their favorite compiler
documentation and ascertain the general kinds of global optimizations
done. A general summary of compiler families in this regard would be
helpful to all concerned. I'll take responsibility for summarizing, though
I think your answers would be better posted than e-mailed to me.

Does your system code-share generics?
Does your system throw away dead code?

Mike

ryer@inmet.inmet.com (04/19/91)

1. Intermetrics compilers do dead code elimination.

2. Some background, for those interested:

At link-time, almost any compiler does some sort of library searching to
resolve external references.  To do this requires that you start with the
object module(s) that were selected by the user, find the things they
reference, load them from the library, find the things they in turn reference, 
load them, and so on.  This is a transitive closure operation on the
reference graph.  Some people say "call graph", but it is important to
look at the data references too, so it is really the graph of ALL the
external references.

This processing is pretty essential, and it has the side effect of identifying
which "units" are and are not referenced directly or indirectly starting from
the main program.  Eliminating the unused units should be trivial.
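Here is a minimal sketch of that closure. It assumes a toy symbol table in which calls and data references are kept in separate maps; all names are hypothetical. Following calls alone would miss the tables, and also the handler reached only through a table of addresses. The unused units then fall out as a set difference.

```python
# Hypothetical sketch: link-time transitive closure over ALL external
# references, calls and data alike, starting from the main program.
from collections import deque

calls = {
    "main": ["parse"],
    "parse": ["lexer"],
    "unused_proc": ["lexer"],
}
data_refs = {
    "main": ["config_table"],
    "parse": ["keyword_table"],
    # A table of procedure addresses is data that refers back to code.
    "config_table": ["default_handler"],
}
all_symbols = {"main", "parse", "lexer", "unused_proc", "config_table",
               "keyword_table", "default_handler"}


def closure(start, *graphs):
    """Breadth-first closure of start over the union of the given graphs."""
    seen = set()
    work = deque([start])
    while work:
        sym = work.popleft()
        if sym in seen:
            continue
        seen.add(sym)
        for graph in graphs:
            work.extend(graph.get(sym, ()))
    return seen


calls_only = closure("main", calls)
everything = closure("main", calls, data_refs)
print(sorted(everything - calls_only))   # missed by a call-graph-only walk
print(sorted(all_symbols - everything))  # ['unused_proc'] can be eliminated
```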

However, the definition of "unit" is the rub.  To make this really effective,
each subprogram in a compilation unit must be a separately relocatable item
in the object module(s).  In package Math_Lib, the Sine, Cosine, and other
routines each need to be independently relocatable.  This means that if
Cosine invokes Sine, the call must be adjustable by the linker -- the compiler
must not just use a PC-relative branch based on an assumption that the
subprograms in a compilation unit are located at known offsets from the
start of that unit.   This does make a bit more work for the compiler, though
compared to everything else an Ada compiler has to do the impact is not
great.  It also vastly increases the number of external symbols and references
that the linker has to deal with.  Linker processing time grows faster than
linearly with the number of symbols and references.  There is no excuse
for a linker going exponential, but some might be O(n**2) or worse
anyway.
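Here is a minimal sketch of why granularity matters, using the Math_Lib example with invented sizes and a hypothetical Arctan routine added. When the whole package is one relocatable item, using Cosine loads everything. When each subprogram is its own item, only Cosine and the Sine it calls are loaded. That works only because the linker, not the compiler, fixes up the Cosine-to-Sine call.

```python
# Hypothetical sketch: what gets loaded depends on the granularity of the
# relocatable items in the object module. Sizes (bytes) are invented.
SIZES = {"Sine": 400, "Cosine": 120, "Tan": 300, "Arctan": 900}
CALLS = {"Cosine": ["Sine"], "Tan": ["Sine", "Cosine"]}


def loaded_size(used, per_subprogram):
    """Bytes loaded from Math_Lib when the program calls the routines in used."""
    if not per_subprogram:
        # One item for all of Math_Lib: touching any routine loads everything.
        return sum(SIZES.values())
    seen, work = set(), list(used)
    while work:
        s = work.pop()
        if s not in seen:
            seen.add(s)
            work.extend(CALLS.get(s, []))
    return sum(SIZES[s] for s in seen)


print(loaded_size(["Cosine"], per_subprogram=False))  # 1720
print(loaded_size(["Cosine"], per_subprogram=True))   # 520
```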

3. Further Information on Intermetrics Compilers 

Our compilers generate separately relocatable portions of the
object module for each subprogram in a package, and for other things.
One example is the tables required at runtime to evaluate 'IMAGE and
'VALUE.  By making these separately relocatable, these tables are also
automatically excluded from the load module by the same transitive
closure mentioned above if there are no references to 'IMAGE or 'VALUE
for that particular type anywhere in the program.
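This is a minimal sketch of the per-type table idea. It assumes one separately relocatable table item per enumeration type, and that item is kept only if some reachable code applies 'IMAGE or 'VALUE to that type. The type names, the table layout, and the image/value helpers are hypothetical. The helpers also show that both attributes can work from the same table.

```python
# Hypothetical sketch: one table item per enumeration type, holding the
# literal names; kept only if some code uses 'IMAGE or 'VALUE on the type.
IMAGE_TABLES = {
    "Color": ("RED", "GREEN", "BLUE"),
    "Day": ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"),
}

# (type, attribute) pairs found in the reachable code of one program.
attribute_uses = {("Color", "IMAGE"), ("Color", "VALUE")}

kept_tables = {type_name for type_name, _ in attribute_uses}
print(sorted(kept_tables))  # ['Color']; Day's table is never loaded


def image(type_name, pos):
    """'IMAGE through the table: position number -> literal name."""
    return IMAGE_TABLES[type_name][pos]


def value(type_name, text):
    """'VALUE through the same table (roughly: blanks and case ignored)."""
    return IMAGE_TABLES[type_name].index(text.strip().upper())
```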

Mike Ryer
Intermetrics