[comp.lang.c] MSC Linking and Libraries

richw@rosevax.UUCP (11/19/87)

I have some questions about the MSC linker and compiler I
hope someone can answer.

1) Does the MSC linker link an entire library to an object file
or does it extract only the functions actually used by the
object code?

2) Why is the executable size of a program so much larger
than the object file size? The De Smet compiler I've used seems to
produce much smaller executables than MSC.

3) If the MSC linker does link in an entire library, are there
programs which will remove unused library functions?

Thanks in advance,
Rich W

farren@gethen.UUCP (Michael J. Farren) (11/21/87)

richw@rosevax.Rosemount.COM (Rich Wagenknecht) writes:
>
>1) Does the MSC linker link an entire library to an object file
>or does it extract only the functions actually used by the
>object code?

It extracts only those functions actually used.  This, however, can
include functions needed by those functions, so it isn't as simple
as a one-to-one match.

>2) Why is the executable size of program so much larger
>than the object file size?

Because the object file does not include any of the code for the
library functions; this is included later, by the linker, and
expands the size significantly.  In the case of small programs
which use a lot of library functions, the code for the library
functions can be massively larger than the code for the program
itself.  Also, the .EXE file produced by the linker generally
has a large amount of initialization data and relocation information
not present in the object file.

>3) If the MSC linker does link in an entire library are there
>programs which will remove unused library functions.

Not needed; see #1.

-- 
----------------
Michael J. Farren      "... if the church put in half the time on covetousness
unisoft!gethen!farren   that it does on lust, this would be a better world ..."
gethen!farren@lll-winken.arpa             Garrison Keillor, "Lake Wobegon Days"

daveb@laidbak.UUCP (Dave Burton) (11/23/87)

In article <3195@rosevax.Rosemount.COM> richw@rosevax.Rosemount.COM (Rich Wagenknecht) writes:
>1) Does the MSC linker link an entire library to an object file
>or does it extract only the functions actually used by the
>object code?
>2) Why is the executable size of program so much larger
>than the object file size? The De Smet compiler I've used seems to
>produce much smaller executables than MSC.
>3) If the MSC linker does link in an entire library are there
>programs which will remove unused library functions.

1) No on both counts: it links neither the entire library nor only
   the functions actually used.
2) See below. DeSmet has a better-engineered library.
3) Not that I know of, and I seriously doubt it.

In general:
Linkers can only extract objects called "modules" from libraries.
A module is simply a source file compiled/assembled into an object
file. If the source file contains more than one function, so will
the object. A library archiver then places the object module into
the library, noting only the existence and location of the module
and any external symbols. The linker simply searches the library for
these symbols, extracting the MODULE the symbol is defined within.
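
To make the module granularity concrete, here is a minimal sketch of a
two-function library module. The file name strutil.c and both function
names are hypothetical and belong to no real library. Under the scheme
described above, a program that calls only str_count() would still
receive the code for str_upper(), because both live in the same object
module.

```c
/* strutil.c: hypothetical two-function library module (illustration only) */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count how many times c occurs in the string s. */
size_t str_count(const char *s, char c)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Upper-case s in place and return it. */
char *str_upper(char *s)
{
    char *p;

    assert(s != NULL);
    for (p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}
```

Splitting the two functions into separate source files would let a
module-granular linker pull in each one independently.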

There are pros and cons to the implementation of libraries with
multi-function modules:
-- Pro --
a) Placing several related functions in the same source file allows
   them to share variables/buffer space, thus reducing the data space
   requirements.
b) This also means that the related functions can communicate through
   "private channels" (static global variables) which any potential
   caller cannot access symbolically.
c) Maintenance of these modules is easier than maintaining several
   source files, especially without the assistance of automated
   maintenance tools such as SCCS and make.
d) Link time will (usually) be improved because the linker may already
   have the external symbol required (due to retrieving an earlier
   required module).
-- Con --
1) Executable files contain potentially many dead areas of code,
   increasing load time, memory usage, and disk space usage.
2) The output of a Static Analysis of these executables will be more
   complex due to the presence of dead code.
3) Automatic Program Verification becomes more difficult when a section
   of code is never used: is it because the test suite is incomplete,
   the program is flawed, or the code is actually dead?
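
Pro points (a) and (b) can be sketched as follows. This is only an
illustration: the module name linebuf.c and every function in it are
hypothetical. The buffer and its length have internal linkage, so the
functions in this file share them while callers cannot name them.

```c
/* linebuf.c: hypothetical multi-function module (illustration only) */
#include <assert.h>
#include <stddef.h>

#define LINEBUF_MAX 80

/* File-scope statics: the "private channel" shared by the functions
 * below. No other module can reference these symbols by name. */
static char buf[LINEBUF_MAX + 1];
static size_t len;

/* Append one character; return 0 on success, -1 if the buffer is full. */
int linebuf_put(char c)
{
    if (len >= LINEBUF_MAX)
        return -1;
    buf[len++] = c;
    assert(len < sizeof buf);   /* room for the terminator remains */
    buf[len] = '\0';
    return 0;
}

/* Return the current contents as a NUL-terminated string. */
const char *linebuf_text(void)
{
    return buf;
}

/* Return the number of characters stored so far. */
size_t linebuf_length(void)
{
    return len;
}

/* Discard the current contents. */
void linebuf_clear(void)
{
    len = 0;
    buf[0] = '\0';
}
```

The cost is the one listed under Con: a program that calls only
linebuf_put() still carries the other three functions.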

Single function modules must be carefully written, however, or
references to other external symbols can have a domino effect and
chain-link the entire library.

(A good test of the granularity and quality of a library is the code
size of a program such as:
	main() { float a=1.2; printf("this is a test\n"); exit(0); }
The printf() will want to bring in several different modules to satisfy
its complex/diverse conversion requirements. Many compilers define
symbols such as "__floatused" to help the linker in determining if
certain modules are needed, so the float assignment should trigger this.
This is definitely *NOT* the definitive test, but an indicator.)

Further, single function modules must now pass information via
global symbols (although usually undocumented). As an example of
the need to pass information in this manner, consider a set of
functions which manage a video screen, where several of the functions
can modify the state of the screen, such as current page, window,
font, attribute, etc. If the library writer is not careful, using
just one of these functions can bring in most, if not all, of the
related functions because of this interaction.
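
One careful arrangement might look like the sketch below. All names are
hypothetical, and the three "files" are shown in a single listing only
so that it compiles as one unit; in practice the struct declaration
would sit in a private header. Each function module references nothing
but the shared state symbol, so linking scr_set_attr() should pull in
the state module but not scr_set_page().

```c
#include <assert.h>

/* --- scr_state.c: the only module that defines the shared state --- */
struct scr_state {
    int page;   /* current display page */
    int attr;   /* current text attribute */
};
/* External linkage so other modules can reach it; hypothetical,
 * "undocumented" symbol name. */
struct scr_state scr_state_ = { 0, 7 };

/* --- scr_page.c: one function, depends only on scr_state_ --- */
extern struct scr_state scr_state_;

void scr_set_page(int page)
{
    assert(page >= 0);
    scr_state_.page = page;
}

/* --- scr_attr.c: one function, depends only on scr_state_ --- */
extern struct scr_state scr_state_;

/* Set the text attribute and return the previous one. */
int scr_set_attr(int attr)
{
    int old = scr_state_.attr;

    scr_state_.attr = attr;
    return old;
}
```

If scr_set_attr() instead called scr_set_page() directly, every user of
the attribute function would drag in the page module as well, which is
the interaction described above.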

Library engineering is the major reason for the reported differences
in code size between MSC and DeSmet. While the size of the executable
is important, code speed and efficiency are more so. If I had to choose
between a 50k executable that would run a given program in 10 seconds
vs. a 30k executable that took 25 seconds, I would take the larger.

In summary:
Although the linker actually does the work and is seen as the
culprit of large executables, the library writer is actually at
'fault'. What you are seeing is the result of the library's
engineering decisions and the quality of their implementation.
-- 
--------------------"Well, it looked good when I wrote it"---------------------
 Verbal: Dave Burton                        Net: ...!ihnp4!laidbak!daveb
 V-MAIL: (312) 505-9100 x325            USSnail: 1901 N. Naper Blvd.
#include <disclaimer.h>                          Naperville, IL  60540

daveb@geac.UUCP (11/27/87)

In article <1259@laidbak.UUCP> daveb@laidbak.UUCP (Dave Burton) writes:
>In general:
>Linkers can only extract objects called "modules" from libraries.
>A module is simply a source file compiled/assembled into an object
>file. If the source file contains more than one function, so will
>the object. A library archiver then places the object module into
>the library, noting only the existence and location of the module
>and any external symbols. The linker simply searches the library for
>these symbols, extracting the MODULE the symbol is defined within.

  Well, that's the usual implementation of C.  Not all
languages/compilers do that, though.  The alternative is to put all
the functions in as separate linkable items, while arranging for the
"top-level statics" to be given a name invisible to the casual user
and arranging for the functions which require the statics to
reference that name.
  An example from CP/M (!) is:

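/* foo.c */
#include <stdio.h>

static int harold;	/* top-level static shared by foo() and bar() */

void foo(void) {
	harold = 2;
}

void bar(void) {
	printf("%d\n", harold);
}

void maude(void) {
	;	/* never touches harold */
}
/* end */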

 Linking either foo or bar will drag in "foo^statics", a block
of data two bytes long, containing "harold".  Linking maude will not
drag in foo^statics.  (I forget what character the linker used for
the separator: it wasn't really ^).

 --dave
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

tim@doug.UUCP (Tim J Ihde) (11/30/87)

In article <3195@rosevax.Rosemount.COM>, richw@rosevax.Rosemount.COM (Rich Wagenknecht) writes:
> 1) Does the MSC linker link an entire library to an object file
> or does it extract only the functions actually used by the
> object code?

If you are using a given function, then the entire .obj file that included
that function will be included in your executable.  The whole library
is not included, but you still might get some functions that you don't
need.  For example, you might use printf and then find that the object
code for scanf has been included in your executable even though you
don't use it.

For the most part, Microsoft and other library vendors try to
lump external functions/variables into one object module only if they
are closely related - under the assumption that you will probably want
the extraneous stuff as well.  If you give the LIB program a filename
to list to, it will produce a giant file telling which modules contain
which functions.

> 2) Why is the executable size of program so much larger
> than the object file size? The De Smet compiler I've used seems to
> produce much smaller executables than MSC.

The executable contains a fair amount of code from the library just to
start running, so it will always be bigger than your object code.  The
degree depends on how much startup work they do.  It's doing things
like allocating stack/heap space and whatnot.  Plus you've got any
library code you've used stuck in there someplace; this could easily
be bigger than your object code all by itself.

I'm not familiar with DeSmet C, but it must do something similar.
Perhaps they have broken their library modules down further, so you
end up with less unused object code in the executable?

	tim