[comp.software-eng] Linkers

sommar@enea.se (Erland Sommarskog) (03/02/88)

Lee Sailer (UH2@PSUVM.BITNET) writes:
>How does this "smart linker" business tie into the "shared libraries"
>in Unix V.3.  As I understand it, (1) when I need a module, the whole
>library is loaded, but (2) when another program needs a module from
>the library, it shares the core image that is already in memory.
>
>So, for example, at any moment, there is only one copy of all the stdio
>(that's standard input-output in Unix-speak) stuff in memory at any given
>moment, and all programs that need it share.  (This also makes the
>executables smaller and saves disk space and load time.)

Well, I know nothing of shared libraries or even System V.3 as such.
But I guess it looks much like shareable images in VMS. 
  If you really want to save space for your binaries under VMS, you put
the library routines in a shareable image. No matter how many of these
procedures you call, none will be included in your executable, merely
references to the shareable image.
  Slowly I am beginning to realize that this concept is not standard
under Unix. Well, that explains why even the simplest of programs
exceeds 100 kbytes when linked (Pascal, f77 and Ada). Library routines,
or even entire libraries, in the language environment are included in
my private executable. Needless to say, all such routines are provided
in shareable images in VMS, unless you explicitly tell the linker not to
use them.

To make it even more fun, VMS permits you to install these images,
just as other heavily used programs such as compilers, editors and the
usual utilities are installed. My exact notion of this "installation" is
uncertain, but if I'm right, the file header is kept constantly
loaded in physical memory. (INSTALL may also involve other things,
such as privileges, but that is beside the subject.) Does Unix have
such a concept?

On the whole: Many Unix fans have reacted to the criticism of the Unix
linker with: "It does what you want, if you just use it the right
way." Remember that this strikes back at you the next time you flame
another OS. Some manoeuvres are the way to go under Unix, but meet
problems under VMS. And vice versa. Often this is because you don't know
the best way under the other operating system. But if you look, you very
often find out that you can easily do what you like, "if you just use it
the right way." But sometimes you fall flat. And depending on where you
stumble, you pick your favourite system, which doesn't have to be Unix by
necessity. It's not mine.

-- 
Erland Sommarskog       
ENEA Data, Stockholm        
sommar@enea.UUCP           "Souvent pour s'amuser les hommes d'equipages
                            and it's like talking to a stranger" -- H&C.

cml@tove.umd.edu (Christopher Lott) (12/05/89)

in  <9185@hoptoad.uucp>  tim@hoptoad.UUCP (Tim Maroney) writes:
| WHAT?  What year is this?  I don't think I've ever used a linker that
| didn't eliminate unused routines.  Any such linker would be seriously
| brain damaged.
| -- 
| Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

Well!  I think this is the transitive-closure problem in a linker,
quite difficult to solve in a straightforward link process.  I'm thinking
of the unix linker, which (I am told) makes exactly 1 pass through the
object files it is told to read, collecting everything that it finds,
and finally resolving unknown externals from libraries.

Deciding whether or not to link in a module requires the linker to know
if it is ever going to be used.  Since the linker hasn't seen all the code
yet, it can't know, so it adds it in.  In an ideal world, the linker should
keep track of ALL references to externals, and run through its data 1 more
time to kick out all externals that were not referenced.

That a one-pass linker keeps whatever it is handed can be verified easily
on any machine.  Sort-of-C code:

-----snip-------
#include <stdio.h>

int main(void)
{
   printf("hello, world\n");
   return 0;
}

/* Never referenced anywhere; it exists only to take up space. */
void never_called_func(int i, int j, int k)
{
   i = 1;
   printf("i = %d\n", i);
   j = 2;
   printf("j = %d\n", j);
   k = 3;
   printf("k = %d\n", k);

   /* repeat the above lines a hundred times or so */

}
-----snip-------

(You really need > 100 filler lines for them to take up noticeable space.)

Compile and link this, and examine the executable size.
Then remove function "never_called_func", recompile, and reexamine.  Then
if you really want to be convinced, put never_called_func in a library
and link the simple program with that library.

I know from personal experience using Microsoft C v5.1 on a PC that
that particular linker includes ALL code in all modules that are ".o"
files and includes ONLY the modules needed from libraries.

I agree with the previous poster's assessment of "brain damaged" but
perhaps a better description would be "efficient in time, not space."

....time out for testing.....ok.  On tove, a vax something running 4.3Tahoe,
the standard linker is dumb.  (uh, "efficient in time, not space" :-)
....another time-out....ok.  Ditto for the GNU linker.

Anyone know if the GNU project has plans to build a better linker??

chris...
--
cml@tove.umd.edu    Computer Science Dept, U. Maryland at College Park
		    4122 A.V.W.  301-454-8711	<standard disclaimers>

davecb@yunexus.UUCP (David Collier-Brown) (12/06/89)

cml@tove.umd.edu (Christopher Lott) writes:
[talking about a common linker...]
| that particular linker includes ALL code in all modules that are ".o"
| files and includes ONLY the modules needed from libraries.
|
| Anyone know if the GNU project has plans to build a better linker??

  I'd suggest writing a binder, which is a .o->.o translator which makes
previously-global function and variable names local static, and only
"exports" a small list of names.  Then it could be used with any linker.
I'd suggest using the Gnu linker **code** as a framework, though...

--dave (I wrote one once upon a time...) c-b
-- 
David Collier-Brown,  | davecb@yunexus, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | Joyce C-B:
CANADA. 416-223-8968  |    He's so smart he's dumb.

sommar@enea.se (Erland Sommarskog) (12/07/89)

Tim Maroney (tim@hoptoad.UUCP) writes:
>WHAT?  What year is this?  I don't think I've ever used a linker that
>didn't eliminate unused routines.  Any such linker would be seriously
>brain damaged.

While this may seem credible at first glance, it is not at second.
I am very happy that the linker we use in our project (VMS LINK)
doesn't remove uncalled routines. If it did, it would notice that
this routine is never called, nor this one, and so forth, and
rapidly it would have removed the 250 top modules. In the next step
it would remove the modules they call etc., and instead of giving us the
10500-block executable we want, it would leave a tiny thing of
100-200 blocks.

The trick is in Cobol where you can say
   Routine PIC x(32).
   ...
   CALL Routine USING....
Routine is not a literal, it is a string variable. All top entries
correspond to menu choices. When the user enters a menu choice,
it is looked up in a database, which gives you the name of the
procedure to call.

Then of course there are other reasons why you may want uncalled
routines to be included in the final image. Debugging is one.

As has been pointed out by other posters, most linkers include all
object files you feed them - and that's probably the behaviour
you want - but from libraries they only include referenced modules,
and those you particularly ask for.
  I don't know about the Unix linker, but the VMS linker takes the
entire object module, even if only one routine in it is called.
This may seem stupid, but if the linker were to pick out pieces,
it would have to analyze the object module to see
exactly which routines the referenced routine calls, both inside
and outside the object module.
  However, the language processor may help the linker. VAX-Cobol
makes a separate module of each procedure. I don't know about VAX-C,
but I would guess that one file gives one object module, although
one could envision the opposite, with a separate object module for
variables on file level. (Or couldn't one? Don't flame me, I
don't speak C.)
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Mail me your votes on comp.lang.cobol.

rang@cs.wisc.edu (Anton Rang) (12/07/89)

In article <21107@mimsy.umd.edu> cml@tove.umd.edu (Christopher Lott) writes:
>in  <9185@hoptoad.uucp>  tim@hoptoad.UUCP (Tim Maroney) writes:
>| WHAT?  What year is this?  I don't think I've ever used a linker that
>| didn't eliminate unused routines.  Any such linker would be seriously
>| brain damaged.
>| -- 
>| Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

  No linker should link in unused modules from an object library.
However, there is a bit of a tradeoff involved with object files.
Many compilers/assemblers resolve references to symbols within the
object file at compile time, so that linking is faster.  If this is
done, the linker can no longer arbitrarily "munge" an object file.

>Well!  I think this is the transitive-closure problem in a linker,
>quite difficult to solve in a straightforward link process.

  It's not that difficult, as long as references are always resolved
at link time (and never at compile time--or at least, if they are
resolved at compile time, relocation information is still kept
around).  The THINK Pascal compiler on the Macintosh works this way:
the compiler resolves references at compile time, but when the final
build is done the linker removes all procedures which are unused.
  The VMS compilers mostly create a single object module per source
file (not keeping around relocation info).  The FORTRAN compiler,
however, generates one module per routine (within one object file)
which allows the linker to remove unused code.

>I know from personal experience using Microsoft C v5.1 on a PC that
>that particular linker includes ALL code in all modules that are ".o"
>files and includes ONLY the modules needed from libraries.

  Any system using the BSD UNIX object file format (or similar ones)
has to do this; the information needed to move code within a module
is not available.

>I agree with the previous poster's assessment of "brain damaged" but
>perhaps a better description would be "efficient in time, not space."

  Well...sort of.  Hopefully, references within a module will be
resolved at compile time instead of link time, which will speed up
links.  It would still be nice if the object file format included full
relocation information, which could be skipped for a fast link, or
used for a "small" link.

  Just my thoughts....

			Anton

+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+

peter@ficc.uu.net (Peter da Silva) (12/08/89)

Smart linkers aren't that much of a win in practice, but they are pretty
safe for high-level languages. For example, the following situation is
not a problem:

In article <530@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>    Routine PIC x(32).
>    ...
>    CALL Routine USING....
> Routine is not a literal, it is a string variable. All top entries
> correspond to menu choices. When the user enters a menu choice,
> it is looked up in a database, which gives you the name of the
> procedure to call.

In which case the routine would be referenced in the *database*, and so
would be linked in.
-- 
`-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
 'U`  Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>.

      "If you want PL/I, you know where to find it." -- Dennis

tim@hoptoad.uucp (Tim Maroney) (12/09/89)

In article <530@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>Tim Maroney (tim@hoptoad.UUCP) writes:
>>WHAT?  What year is this?  I don't think I've ever used a linker that
>>didn't eliminate unused routines.  Any such linker would be seriously
>>brain damaged.
>
>While this may seem credible at first glance, it is not at second.

Try the third....

>I am very happy that the linker we use in our project (VMS LINK)
>doesn't remove uncalled routines. If it did, it would notice that
>this routine is never called, nor this one, and so forth, and
>rapidly it would have removed the 250 top modules.
>
>The trick is in Cobol where you can say
>   Routine PIC x(32).
>   ...
>   CALL Routine USING....
>Routine is not a literal, it is a string variable. All top entries
>correspond to menu choices. When the user enters a menu choice,
>it is looked up in a database, which gives you the name of the
>procedure to call.

And of course, all such routines are referenced; pointers to them are
stored in this database.  The analogue in C is when a routine is never
explicitly called, but a function pointer to it is used in a referenced
routine.  The routine is referenced, just as routines entered into a
late-binding database are referenced.
-- 
Tim Maroney, Mac Software Consultant, sun!hoptoad!tim, tim@toad.com

"This signature is not to be quoted." -- Erland Sommarskog

sommar@enea.se (Erland Sommarskog) (12/10/89)

Peter da Silva (peter@ficc.uu.net) writes, quoting me:
)Smart linkers aren't that much of a win in practice, but they are pretty
)safe for high-level languages. For example, the following situation is
)not a problem:
))    Routine PIC x(32).
))    ...
))    CALL Routine USING....
)) Routine is not a literal, it is a string variable. All top entries
)) correspond to menu choices. When the user enters a menu choice,
)) it is looked up in a database, which gives you the name of the
)) procedure to call.
)
)In which case the routine would be referenced in the *database*, and so
)would be linked in.

Eh? You link relational databases with your executable? With
arbitrary relations and columns? The linker would have to be not only
smart but clairvoyant to see which column in which relation
is the function name.

To make it even worse, one code may map to different routines
at different sites, since two customers want different behaviour.
To make it simple, you link both routines with your executable,
and the contents of the menu databases at the customer site decide
which variant they will run.


-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Mail me your votes on comp.lang.cobol.

peter@ficc.uu.net (Peter da Silva) (12/13/89)

> )In which case the routine would be referenced in the *database*, and so
> )would be linked in.

> Eh? You link relational databases with your executable?

Database != relational database. In this case it just means a table of
function names and locations.

> To make it even worse, one code may map to different routines
> at different sites, since two customers want different behaviour.

So far, so good.

> To make it simple, you link both routines with your executable,
> and the contents in the menu databases at the customer site decide 
> which variant they will run.

Say what? You have an external relational database containing absolute
addresses in your executable? What happens when you want to ship them a
new version of the program? They have to rebuild the database?
-- 
`-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
 'U`  Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>.
"It was just dumb luck that Unix managed to break through the Stupidity Barrier
and become popular in spite of its inherent elegance." -- gavin@krypton.sgi.com

sommar@enea.se (Erland Sommarskog) (12/17/89)

Peter da Silva (peter@ficc.uu.net) writes:
>> )In which case the routine would be referenced in the *database*, and so
>> )would be linked in.
>
>> Eh? You link relational databases with your executable?
>
>Database != relational database. In this case it just means a table of
>function names and locations.

Peter, if you think you are clairvoyant, I've bad news for you.
There is something disturbing your reception. I introduced this
thread, including the database. So maybe you should try to understand
what I'm talking about instead of deciding that on your own.
The database I'm talking of is an RDB database, and last time I
looked R stood for relational. That it is relational is beside the
point; the point is that it does not contain any information on
function locations.

In short: the thing is a generic menu handler. The user enters a
code or a number, which is looked up in the database. What you get is
the *name* of the function to call. You also get some information on
what access rights the user has to this particular function.
The menu handler then calls the function, which is application-specific.
  The menu handler has its own set of menus for maintenance: for
adding or modifying users, but also for adding or modifying function
entries. For instance, if there is a function which only one customer
has paid for, that function is available only in the menu database for
that customer. The others have it in the executable, but cannot access
it. (Unless they know the name of the function, in which case they can
add it. That is less likely with our crude naming standards.)

>> To make it simple, you link both routines with your executable,
>> and the contents in the menu databases at the customer site decide
>> which variant they will run.
>
>Say what? You have an external relational database containing absolute
>addresses in your executable? What happens when you want to ship them a
>new version of the program? They have to rebuild the database?

No. Who said that there were any absolute addresses in the database, Peter?
Certainly not me. You made that up yourself. I even told the trick
in my first article, and I have mentioned it before. But let's take 
it again:

    Routine PIC X(32).
    ...
    CALL Routine USING ...

If you didn't recognize it, this is Cobol. Peter is probably stuck in
C thinking, hence his talk of absolute addresses. Anyway, Routine is
a string variable, into which we load the contents of the database entry.
Then we call the function with the name Routine contains. If there is
no such routine (on VMS at least this has to be another Cobol procedure) 
we get a run-time error. (Which is handled by the menu handler.)

Yes, somewhere there is a coupling function name <-> address, but
that is handled by the Cobol compiler and the Cobol run-time library.

My original comment to the linker discussion was that the menu
handler wouldn't work with a linker that removed unreferenced
modules, since the routines called by the menu handler are neither
compile-time nor link-time references, but run-time references
and beyond the linker's horizon.
  Of course this doesn't mean that a linker shouldn't be allowed
to remove unreferenced routines, but that you need a mechanism
to tell such a linker that it should include a routine no matter
whether it's referenced or not.

(Some people may question the wisdom of making everything one big
executable, as we do. We are heading for a better solution.
We're making every function a shareable image of its own, to
be activated by LIB$Find_image_symbol. This means that the
main executable will only be the menu handler.)
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Mail me your votes on comp.lang.cobol.