[comp.os.research] Shared libraries

rab@mimsy.UUCP (Robert A Bruce) (04/15/87)

Here at UM we are trying to implement shared libraries as part
of an operating system for a multiprocesser computer.  The computer
consists of 16 MC68010 bases cpu boards, each with a 68450 MMU.
We are trying to implement forking and load balancing between processors.
There is no shared memory between processor boards, so moving a process
to a different cpu requires copying the entire text space.  If we
use shared libraries the size of the text segment of each process whould 
be substantially reduced, so copying could be done much more efficiently.

I would like to start some discussion about the best way to support shared
libraries and how the operating system should support them.

Some ideas that I have heard are:

1) Locate the library at a know location in memory.  At link time bind
   all the addresses to the addresses of the library subroutines.
   This would involve the least overhead, but would also be the most
   inflexible.  Every time the library was changed, or moved to a new
   location in memory, all programs would need to be relinked.

2)  Same as 1) but bind all the addresses when a program is loaded.
    This would cause more overhead at load time, but minor library
    changes won't break all existing programs.

3)  Use a jump table.  The jump table is located at a known location in
    memory.  The location of each jump instruction is bound to the
    program at link time.  Each library call involves one extra jump
    instruction, but I don't think these jumps would cause a significant
    performance degradation.

4)  Use an `openlibrary' system call that returns a pointer to a table
    of pointers to library routines.  I have heard the amiga uses
    something like this.  This would would require an extra level of
    of indirection but would be even more flexible.

darrell@sdcsvax.UUCP (04/17/87)

A combination of fixed addresses and binding at load time is possible,
by associating a version number with the library.  At link time, the
version number of the library is stored in the load module. At load
time, that version number is compared with the version number of the
library that is being shared. If they match (which we assume they will
usually do), all is fine. On a mismatch, more work is needed. Two
solutions come to mind:

- Load the version of the library needed by the load module. This
assumes that old versions of the library are kept around. It also means
that we waste space for multiple versions of the library, but no more
than that needed for non-shared libraries.

- Patch the references in the load module. The linker must save in the
load module a list of the places where library references have been
inserted. Then the loader can fix these up with the real references.
This assumes that the semantics of the library have not changed. It
also requires a more complex loader, and extra crud in the load
module.

        --Per Bothner

bothner@pescadero.stanford.edu  ...!decwrl!labrea!navajo!bothner
Computer Science Dept., Stanford University, Stanford CA 94305

barry@adelie.Adelie.COM (Barry A. Burke) (04/27/87)

In article <2990@sdcsvax.UCSD.EDU> rab@mimsy.UUCP (Robert A Bruce) writes:
>There is no shared memory between processor boards, so moving a process
>to a different cpu requires copying the entire text space.
What a bummer. Shared memory space would save you TONS! of overhead. Process
exchange overhead is what causes 16 1MIP processors != 16MIPS. 

Beleive it or not, at Prime several years ago we built the Prime 850 out of
two closely coupled 750's. After a whiz engineer (Hi, Dave!) resolved the
issues of instruction pipeline re-initialization, it became possible for a
process to be suspended by the scheduler on CPU A, and then continue on CPU B.
Of course, the Prime 50 series architecture made all this possible through
it's process register management: A processes' entire register set (including
instruction pointer, etc.) could be mapped into any one of the (2, 4, 8,
16-depending on CPU model) CPU register sets in microseconds. Process exchange
consisted of flopping (1,2,3,4) bits - something the 750 could do 40,000
times/second. The 850 used a 1Kb "window" to share these register sets between
the two 750 CPU's.

Bottom line? 2*750 = 2.2*750! Why? becuase all interrupt processies could only
run on one of the CPU's, (leaving only about 90% for user's) The second CPU
was 100% available for user processies. So 1*750=.9, 2*750=1.9, 1.9/.9=2.2
(almost)... 

>use shared libraries the size of the text segment of each process whould 
>be substantially reduced, so copying could be done much more efficiently.
>
>1) Locate the library at a know location in memory.  At link time bind
>   all the addresses to the addresses of the library subroutines.

Bad news. Prime Originally did this- turns out the pain of having to recompile
really makes customer's mad...
>
>2)  Same as 1) but bind all the addresses when a program is loaded.

Expensive, but on the right track (see next item).

>
>3)  Use a jump table.  The jump table is located at a known location in
>    memory. 

Almost...

>4)  Use an `openlibrary' system call that returns a pointer to a table
>    of pointers to library routines.  I have heard the amiga uses

Hooray! Whatthey did at Prime is create a subroutine call structure in the
loaded program that looks like this (in layman's terms):

	jump indirect address
	

address INSTRUCTION FAULT
	"name of routine"

barry@adelie.Adelie.COM (Barry A. Burke) (04/27/87)

Ooops. In my previous article, I said the list of shared routines' names
where kept at the start of each programs DATA area. Wrong- the indirect
"address" pointer were kept there. The list of names are actually kept in a
list that begins at a fixed memory locations (but it links through several
different "libraries", each within it's own segment(s)). The routines
themselves are generally free-floating, but pre-loaded (at startup time) into
fixed memory areas. This works well in a truly dynamic paging environment- the
pages are on the paging device until needed, then they are dynamically located
in memory when paged in. The Prime's have neat hardware to do this sort of
memory management (wish that DEC did!), so there is very little overhead in
this scheme. I'm not sure sure that the 68000 MMU can handle such complex
tasks (I recall discussions at Prime that indicated that such was out of the
realm of most any `generic' MMU).

Sorry for any confusion...

-- 
LIVE:	Barry A. Burke, (617) 499-6370
USPS:	Adelie Corporation, 125 CambridgePark Drive Cambridge, MA  02140
UUCP:	barry@adelie.Adelie.COM / ..!{harvard,ll-xn,necntc,mirror}!adelie!barry
ARPA:	barry@adelie.Adelie.COM (via MX) / barry%adelie@harvard.Harvard.EDU

steve@edm.UUCP (Stephen Samuel) (05/02/87)

The way run-time loading is done on MTS requires 3 loader sections:
1) External Symbol Directories consist of the name of the routine and the number
2) ReLocation Directory consisted of the rotine # (from ESD), displacement into 
the loaded routine where it is used, and type of reference (2-byte/4-byte,
relative/absolute etc.)
3) The actual text contains any displacement from the 'external' location's
public address

 EG: pseudo-C for an absolute reference to _fread+16 from 0x7da into a subroutine,
with fread being the 6th symbol referenced) would consist of:

 ESD: ... 
  struct { char EsdNam[ENSIZE] ; THING * ES_ptr; } esd_sym[]
    = { <5 entries>,  {"_fread  ",NULL} ...}

 RLD: ...
  struct { int EsdNum; int disp ; FLAGS type }
rld_sym[] ={ ..., {5,0x7da,RT_16bit+RT_abs} ... }

 routine+0x7da: (thing *) 16

once all external references were found, the value of each referencing
location (assuming all pointers were the same length) was:

for(i=0;i<RLDcount;i++){
  proc[rld_sym[i].disp] += esd_sym[rld_sym[i].EsdNum].ES_ptr -
  ( (rld_sym[i].type & RT_abs) ? 0 : &(proc[rld_sym[i].disp] ) ) ;
};

We leave it as an exercise to the reader to add thing like: 1/2/4 byte pointers;
Symbols OTHER than simple external references; everything else I've forgotten.

For those of you who don't know: MTS is an operating system for /370 type
systems that's actually reasonable and nice to use (and definitely NOT
from IBM). The loader form I paraphrased above may actually originate with
an old IBM loader but, not having seen one, I can't really say.


--
 Stephen Samuel 			Disclaimer: You betcha! (done@4am)
  {ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve

DN5@PSUVM.PSU.EDU (01/18/90)

Does anybody know any references explaining how shared libraries could
be implemented.  Most C programs under unix contain much of the standard
i/o libraries, and would benefit by sharing this code amongst several
processes.  While the savings woundn't be much for JUST the standard
i/o libraries, with the coming of larger and larger libraries to support
the new user-interface models, these libraries are getting bigger and
bigger.

I'm sure that there are many papers and books on the subject, but I haven't
found anything specific (all the ones I've found describe the process so
generally it really doesn't help).

                       D. Jay Newman
                       dn5@psuvm.psu.edu