[comp.lang.c] Shared library summary

rbbb@rice.EDU (04/08/87)
(Why this belongs in info-c is beyond me, but maybe it will be
educational).

I received replies from 
 BOB%UTAHMED.BITNET@wiscvm.wisc.edu (Bob Wheeler)
 blarson%castor.usc.edu@usc-oberon.ARPA (Bob Larson)
 Beebe@SCIENCE.UTAH.EDU (Nelson Beebe)
telling of shared libraries under VMS 4.* and PRIMOS 19.4-*.
I also discussed details of the VMS implementation with Scott Comer here.

First, under VMS it is NOT necessary to INSTALL a shared library; that is
only done for purposes of speed or protected (i.e., privileged) shareable
images.  Any user can create and/or use a shareable image.  Bob Wheeler
and Nelson Beebe both report that these are Good Things, both because of
the real memory saved and because of the flexibility.

   Shareable libraries are particularly nice in a graphics
   application--the device driver library can be chosen at run-time
   rather than link time, so we avoid a  proliferation of .EXE files
   loaded for a number of display devices (PLOT79 supports over 40
   different ones).
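
The run-time choice of driver can be imitated in plain C with a table of
function pointers.  This is only a sketch: the device names, the command
syntax and every identifier below are made up for illustration, and the
table stands in for driver libraries that a real system would map in on
demand.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical driver interface: each device formats a line-drawing
 * command in its own (made-up) syntax. */
typedef int (*draw_line_fn)(char *out, size_t n,
                            int x0, int y0, int x1, int y1);

struct driver {
    const char  *device;     /* name used to pick the driver at run time */
    draw_line_fn draw_line;
};

int term_line(char *out, size_t n, int x0, int y0, int x1, int y1)
{
    return snprintf(out, n, "LINE %d,%d %d,%d", x0, y0, x1, y1);
}

int plotter_line(char *out, size_t n, int x0, int y0, int x1, int y1)
{
    return snprintf(out, n, "PU%d,%d;PD%d,%d;", x0, y0, x1, y1);
}

/* Stand-in for the set of driver libraries available on the system. */
const struct driver drivers[] = {
    { "term",    term_line    },
    { "plotter", plotter_line },
};

/* Look the device up by name when the program runs; NULL if unknown. */
const struct driver *select_driver(const char *device)
{
    size_t i;

    for (i = 0; i < sizeof drivers / sizeof drivers[0]; i++)
        if (strcmp(drivers[i].device, device) == 0)
            return &drivers[i];
    return NULL;
}
```

A caller passes a device name obtained at run time to select_driver and
draws through the returned pointer, so one executable serves every device.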

PRIMOS has shareable images, and in 19.4 and beyond does them in a "new"
nicer way.

   A link to a shared library, known as a dynt (dynamic entry) is a
   pointer to a character string with the subroutine name with the fault
   bit set.  When a procedure is called indirectly through a pointer with
   the fault bit set, PRIMOS handles the exception and will search the
   libraries specified in the user's entrypoint search rules for a routine
   with that name.  When the routine is found, the library is mapped into
   the user's address space and the faulting pointer is replaced with a
   pointer to the actual routine.  (Such pointers must be in the user's
   data rather than pure procedure space since the program may not be
   mapped in at the same address for other users.)  The pure procedure
   portion of the library is shared among all users using the library and
   paged against the run file (rather than copying to the paging device),
   and the static data is allocated out of process-class or
   procedure-class storage depending on the class of the library.
   (Process class storage will not be reallocated for further use of the
   library across program execution bounds.)  Libraries are referenced by
   their file name, and protected via the normal file system protection.

   Search rules are part of the user's environment, are initialized to the
   system default at login time, and may be manipulated via the
   set_search_rules command and some system subroutines. 

This approach seems to be doable entirely at run-time.  Any unresolved
references at link-time are just converted to these character-string
references.  This approach is (according to Scott) also used in Cedar Mesa
for unresolved references.
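
The name-then-pointer scheme can be modelled in portable C.  This is a
sketch under loud assumptions, not the PRIMOS mechanism itself: a NULL
target stands in for the fault bit, an in-memory table stands in for the
search rules, and all names (struct dynt, resolve, call_dynt, mathlib) are
invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef int (*routine_fn)(int);

/* Hypothetical model of a dynt: the link starts out holding only the
 * routine's name.  A NULL target stands in for the fault bit. */
struct dynt {
    const char *name;
    routine_fn  target;
};

struct entry   { const char *name; routine_fn fn; };
struct library { const char *file; const struct entry *entries; size_t count; };

int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }

const struct entry mathlib_entries[] = {
    { "twice", twice }, { "negate", negate },
};

/* Stand-in for the user's entrypoint search rules. */
const struct library search_rules[] = {
    { "mathlib", mathlib_entries, 2 },
};

int searches = 0;   /* counts how often the slow path runs */

/* Search each library in order for a routine with the given name. */
routine_fn resolve(const char *name)
{
    size_t i, j;

    searches++;
    for (i = 0; i < sizeof search_rules / sizeof search_rules[0]; i++)
        for (j = 0; j < search_rules[i].count; j++)
            if (strcmp(search_rules[i].entries[j].name, name) == 0)
                return search_rules[i].entries[j].fn;
    return NULL;
}

/* Call through a dynt.  The first call "faults" and snaps the link;
 * later calls go straight to the routine.  Returns -1 if unresolved. */
int call_dynt(struct dynt *d, int arg, int *result)
{
    if (d->target == NULL) {
        d->target = resolve(d->name);
        if (d->target == NULL)
            return -1;
    }
    *result = d->target(arg);
    return 0;
}
```

The struct dynt lives in writable per-user data, which matches the
requirement that the snapped pointer not sit in pure procedure space.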

Another approach, less flexible but still imaginable, is to specify the
shared libraries at link time and leave holes in the address space for the
shared routines (you must specify them to the linker so it knows how big a
hole to make).  At run-time when the image page map is created these holes
are mapped to the shared libraries; if the mapped routine has become
bigger than its hole, you lose.
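
The failure mode of the hole scheme is easy to state in code.  The sketch
below assumes hypothetical records for the hole (fixed at link time) and
the library (sized at run time); the names and fields are illustrative
only.

```c
#include <stddef.h>

/* Hypothetical records: the hole is fixed when the program is linked,
 * the library size is whatever is installed when the program runs. */
struct hole       { size_t base; size_t size; };
struct shared_lib { const char *name; size_t size; };

/* Map the library at the hole's base if it fits.
 * Returns 0 and stores the address, or -1 if the library outgrew the hole. */
int map_into_hole(const struct hole *h, const struct shared_lib *lib,
                  size_t *mapped_at)
{
    if (lib->size > h->size)
        return -1;
    *mapped_at = h->base;
    return 0;
}
```

Nothing can be done at run time when the check fails, short of relinking
the program with a bigger hole.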

Both approaches require some additional fooling around in the operating
system to be sure that the same physical pages are actually mapped by all
current users of the image.  Without this, you still have the flexibility,
but every user of the image has his own private set of physical pages (but
everyone pages against the same image).

To add shareable libraries to Unix (or any other operating system), the
following things must be taken care of:

1) Position Independent Code.  Without this, any shareable library
   implementation is bound to be brain-damaged.  It must be possible to
   load shared (or just run-time-bound) code into any place in text
   address space to avoid conflicts or wasted address space.

2) Debugging and shared code.  It is Not Friendly to actually set a
   breakpoint in a shared page; this must either be prohibited or handled
   by copying and remapping the page (ptrace is a system call, so it can
   handle this if it really wants to).

3) Correct paging and swapping of shared code.  There is no reason that a
   piece of shared code should not get paged out, but this should only
   happen after it has been ejected from the working set of every one of
   its users (or else it should cause this to happen).  This creates an
   odd sort of dilemma---on the one hand, there is "no reason" to eject a
   heavily shared page from a working set because it "doesn't cost that
   much" (only one physical page is used).  On the other hand, if it
   isn't ejected from working sets then it will never be released.  A
   possible solution to this is to page shared memory as if it belonged to
   a separate process (in fact, it might be managed by a separate
   process).  Another complication arises because it becomes possible for
   several faults to the same physical page to occur at the same time.

4) Calls from one shared, dynamically loaded library to another shared,
   dynamically loaded library.  To implement this, there must be a few
   unshared (data) pages containing the traps and names of the routines so
   that each user gets the correct calls.
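
The copy-and-remap option from point 2 can be sketched with a toy model of
pages and mappings.  Everything here is an assumption made for the
example: the tiny page size, the trap opcode value, and the struct and
function names; a real kernel would do this in its page tables.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE  16      /* tiny page, for illustration only */
#define BREAKPOINT 0xCC    /* hypothetical trap opcode */

struct page {
    unsigned char bytes[PAGE_SIZE];
    int mappers;           /* number of address spaces mapping this page */
};

struct mapping {
    struct page *page;     /* what this process currently sees */
};

/* Plant a breakpoint for one process without disturbing the others.
 * If the page is shared, the process first gets a private copy.
 * Returns 0 on success, -1 on a bad offset or allocation failure. */
int set_breakpoint(struct mapping *m, size_t offset)
{
    if (offset >= PAGE_SIZE)
        return -1;
    if (m->page->mappers > 1) {
        struct page *priv = malloc(sizeof *priv);

        if (priv == NULL)
            return -1;
        memcpy(priv->bytes, m->page->bytes, PAGE_SIZE);
        priv->mappers = 1;
        m->page->mappers--;    /* this process no longer shares the page */
        m->page = priv;        /* remap to the private copy */
    }
    m->page->bytes[offset] = BREAKPOINT;
    return 0;
}
```

The other users keep executing the unmodified shared page, at the cost of
one extra physical page for the process being debugged.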

I think I've said enough.  It should be clear that shared memory is Not
Enough to get shared libraries; one also needs position independent code
and some idea of what to do with debugging and shared calls to dynamically
loaded code.

David