[comp.unix.internals] shared libraries can be done right

bill@franklin.com (bill) (05/20/91)

This nonsense that what's-name has been spouting made me decide
to describe how I think shared libraries can be done correctly.
This is basically a dynamically linked shared library scheme with
the linking occurring as each page is read from its file.

Here's how I'd do it. To create a shared library, you first create
executable images that contain shared references only. This is a
two-step process. The first step creates an ordinary object file
with undefined symbols. The second step, linking the object with
its shared libraries, associates each undefined symbol with the
file that defines it, thus converting it to a shared reference. An
object file is executable once the only references it contains
are shared references. Note that this permits mutually
referencing shared libraries if we allow this linking to occur
with the unshared version of a library. In my scheme, "shared
library" is a misnomer; they are really "dynamically linked
executables" or some such buzz; the difference between an
executable and a shared library is only that executables have an
entry point and shared libraries don't.
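
Roughly, in C, the records that second link step would leave behind
might look like this (all names and fields here are invented, purely
to illustrate the idea):

    /* One record per shared reference, kept with the executable so the
     * pager can find and patch the reference when its page is loaded. */
    struct shared_ref {
        unsigned long page_index;   /* which page holds the reference      */
        unsigned long page_offset;  /* where in that page it sits          */
        unsigned int  lib_index;    /* which referenced library defines it */
        unsigned long sym_index;    /* symbol in that library's table      */
    };

    /* One of these per referenced shared library. */
    struct lib_ref {
        char          path[128];    /* file containing the definitions     */
        unsigned long base_addr;    /* address it was linked to occupy     */
    };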

When a file is exec'ed, its referenced shared libraries are
opened if they haven't been already. This is done recursively so
that a process knows at the start that all shared libraries
needed are available and don't conflict in addresses. (This isn't
necessary, actually; one could defer this till later, with the
attendant failures at obscure points in random programs.)

Each time that a page is loaded in from a file (at exec time or
at a page fault), any shared references are satisfied in that
page before the page is made available to any process.

Here are the overheads:

Program startup requires making the shared libraries available.
This should be significant only the first time that a shared
library is referenced; all other references should discover
quickly that the shared library is opened. When a shared library
is opened, a special segment is created for its symbol table.
This increases, slightly, the memory needed for using the shared
library.

When an executable or shared library is opened, a segment has to
be created to hold the shared reference information. This also
costs some memory.

When a page is loaded from its file, its shared references must
be resolved. This implies references to the shared reference
information for its file and the symbol tables of referenced
shared libraries. Depending on the implementation, this could
happen each time the page is faulted in, or it could be done only
the first time the page is read from its file and that page could
be then stored in a swap area. In any case, this amounts to a
very simple and fast loop to do the fixups and it only has to be
done occasionally.
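
To make that concrete, here's what the fixup loop might look like in C,
reusing the hypothetical shared_ref record sketched above (lib_symtab
and lookup_symbol are assumed helpers for the symbol table segments):

    struct lib_symtab;              /* opaque here; the special segment */
    extern unsigned long lookup_symbol(struct lib_symtab *, unsigned long);

    /* Patch every shared reference falling in this page, before the page
     * is made available to any process.  Sketch only. */
    void fixup_page(char *page, unsigned long pageno,
                    struct shared_ref *refs, int nrefs,
                    struct lib_symtab **libs)
    {
        int i;
        for (i = 0; i < nrefs; i++) {
            unsigned long addr;
            if (refs[i].page_index != pageno)
                continue;
            addr = lookup_symbol(libs[refs[i].lib_index], refs[i].sym_index);
            /* overwrite the placeholder in the page with the real address */
            *(unsigned long *)(page + refs[i].page_offset) = addr;
        }
    }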

Here is the drawback:

Each shared library must exist in a specified region of virtual
memory and this must be decided when the shared library is
created. If one wanted to be clever and avoid this problem, the
shared libraries could also contain relocation information. The
way this would be used is this: a shared library would have a
"preferred" location, one where the library gets placed if there
are no conflicts. When located at this preferred location, no
relocation is done. However, if there is a conflict when an
executable is started, a new set of segments is created for a
relocated version of the shared library, at a system selected
address; these new segments could be used to deal with other
conflicts as well. This would incur an additional overhead, but
only for those processes that reference a relocated shared
library. Also, by opening shared libraries in reverse order of
linkage, system shared libraries generally would never be
relocated, resulting in the cost being borne by users (or
vendors) who create shared libraries and don't take care to avoid
the system's library addresses.
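
The preferred-address policy could be as simple as the following sketch
(every structure and helper below is assumed, not any real kernel's
interface):

    struct lib_image {
        unsigned long preferred_base;   /* address chosen at link time */
        unsigned long size;
        /* segment descriptors, relocation info, ... */
    };

    extern struct lib_image *read_lib_header(const char *path);
    extern int  range_is_free(unsigned long base, unsigned long size);
    extern void map_segments(struct lib_image *lib, unsigned long base);
    extern unsigned long pick_free_range(unsigned long size);
    extern void apply_relocations(struct lib_image *lib, long delta);

    struct lib_image *open_shared_lib(const char *path)
    {
        struct lib_image *lib = read_lib_header(path);

        if (range_is_free(lib->preferred_base, lib->size)) {
            /* common case: preferred range is free, no relocation at all */
            map_segments(lib, lib->preferred_base);
        } else {
            /* conflict: build relocated segments at a system-chosen base;
             * only processes that hit the conflict pay this cost */
            unsigned long base = pick_free_range(lib->size);
            map_segments(lib, base);
            apply_relocations(lib, (long)(base - lib->preferred_base));
        }
        return lib;
    }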

---

As near as I can tell, this gives all of the advantages of shared
libraries, at a minimal cost. It does not require special coding
of the libraries or any other nonsense; one merely links them a
bit differently.

Anyone see any material drawbacks?

Note that if what's-name comes back with noise about how it
doesn't solve any problems or the mere statement that the
overhead is unacceptable, I'll ignore him and I hope you all will
too. Those of us interested in constructive activity are aware of
the problems that shared libraries can solve and don't need to
prove their existence. As for the overhead, all the overheads I
can see are microscopic in comparison to the overhead of, e.g.,
additional paging or swapping induced by not sharing one's
libraries, so I see this also as a non-issue. If someone has
evidence that I've overlooked something, I'll be happy to examine
it, but I'm not interested in mere assertion.

beal@paladin.owego.ny.us (Alan Beal) (05/24/91)

I just came into the middle of this discussion on shared libraries, and I
was wondering if someone could explain the typical UNIX implementation
of shared libraries.  UNISYS Large Systems (A-series) have libraries (shared
and private); since I understand how these work, let me explain the
implementation and you might be able to explain how UNIX's implementation
differs.

First of all, a UNISYS library runs as a separate task; it is initiated by the
OS when a program calls one of its procedures.  After beginning to run, the
library performs any initialization code and then 'freezes', which in effect
makes its exported procedures available for other programs to use.  The
library can 'unfreeze' when no one is using its code if it froze
temporarily, or via the operator THAW command if it froze permanently.
Essentially a library acts like a server, but it is initiated only when
needed and may terminate when no longer in use.

Here is some sample code in ALGOL:

library(*SYSTEM/A)                            calling program
-------------------------------        ----------------------------------------
BEGIN                                       BEGIN
  % GLOBAL DATA                                INTEGER AMOUNT, NEW_A;
  INTEGER A;                                
                                               LIBRARY LIB(TITLE='*SYSTEM/A.');
  INTEGER PROCEDURE INCREMENT_A(AMT);          
    INTEGER AMT;                               INTEGER PROCEDURE INCREMENT_A(B);
  BEGIN                                          LIBRARY LIB;
    INCREMENT_A := A + AMT;  
  END;                                         %---- OUTER BLOCK ----
                                               AMOUNT := 2;
  %----- OUTER BLOCK, INITIALIZE A ---         NEW_A := INCREMENT_A(AMOUNT);
  EXPORT INCREMENT_A;                       END.
  A := 0;
  FREEZE(PERMANENT)
END.
     
In my example INCREMENT_A is a shared library procedure that happens to
access a global variable.  When the procedure is called, the OS initiates
the library, executes the procedure, and returns back to the calling
program.  If it is a shared library, then multiple calling programs can
call the same procedure at the same time; of course you have to code for
mutual exclusion of global data in the library.  The name of the library
(*SYSTEM/A) can be changed at run-time if desired.  This is basically
how the database software (*SYSTEM/ACCESSROUTINES) works for these systems.
Advantages: a reduced number of tasks in the system, encouragement of
reusable code, and smaller binaries for the calling programs.  Disadvantages:
library procedure calls are somewhat slower, and shared libraries must code
for mutual exclusion.

I assume UNIX shared libraries are dynamically linked libraries with
shared text and separate data.  If so, then it looks like the major
difference is that UNIX libraries are not separate tasks.  Does anyone
see any advantages of one over the other?  What do you think of
Unisys's implementation?  It has been around for a while.
-- 
Alan Beal
Internet: beal@paladin.Owego.NY.US
USENET:   {uunet,uunet!bywater!scifi}!paladin!beal

rca@nss1.com (Rich C. Ankney) (05/28/91)

I think it is important to point out that A-series library calls result in the
called function running on the caller's stack; i.e. no context switch is ever
required.  The somewhat slower execution time is measured in microseconds and
is due to the need to update the (static) stack frame linkages to reference  
the library's environment (i.e. global variables).  (The A-series architecture
is completely stack-oriented and this kind of inter-stack reference has been
supported by other means since the early 1970s.  As I recall, libraries debuted
in the mid-1980s, and much of the A-series system software was modified to
become libraries instead of using ad hoc mechanisms to do the same thing.)

Still one of my favorite architectures in the "elegance" category, even if
they don't know how to market it...

mjs@hpfcso.FC.HP.COM (Marc Sabatella) (05/29/91)

For those of you who haven't put this discussion into your "kill" files by now:

I just finished reading through all the postings on shared libraries (a
coworker pointed me here; I don't normally read this group), and would like to
contribute some of my thoughts.

Bill, your proposal is in fact quite similar to the Apollo Domain system, and
what I last heard proposed for OSF/1, except that you propose a finer (page)
granularity.  At the heart of all of these is the concept of
"pre-loading", where the first time a dynamically linked page is loaded, its
external references are fixed up.  This assumes, as you explicitly stated, that
the resolutions will be the same for each program.  Unfortunately this cannot
be guaranteed.  The "malloc" example brought up by several people in response
to Alex's claim that shared libraries should be "simple and elegant"
demonstrates this well.  A library may make calls to malloc(), but different
programs may provide their own definitions of malloc(), and the library's
references would have to be resolved differently for each.  Some means must be
provided for this.  Were it not for the desire to allow this sort of
interposition, shared libraries would be a great deal simpler than they are.
This is also why a segmented architecture is no panacea, and why position
independent code needs to have some indirection in it to be useful.
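
To make the malloc() case concrete, here is a toy sketch (mine, not a
description of any particular system) of the kind of indirection
involved: the library calls through a slot in its per-process data,
which the dynamic linker fills in with whichever malloc() wins for
that program.

    /* Slot in the library's per-process data segment; the dynamic linker
     * binds it to the program's own malloc() if one is defined, else to
     * the usual library one. */
    void *(*__lib_malloc_slot)(unsigned long) = 0;

    /* Library code calls through the slot, so its shared text is never
     * patched and different programs can see different malloc()s. */
    char *lib_copy(const char *s)
    {
        unsigned long n = 0, i;
        char *p;

        while (s[n] != '\0')
            n++;
        p = __lib_malloc_slot(n + 1);
        if (p != 0)
            for (i = 0; i <= n; i++)
                p[i] = s[i];
        return p;
    }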

Some numbers from a paper I presented at last year's USENIX conference were
tossed around.  Guy Harris wondered whether my claim that programs spend
little time in library routines holds for window programs.  I
admit I don't know; I've heard it claimed from someone who implemented his own
shared library scheme that his applications spend 90% of their time in the X11
libraries!  Note that library code tends to be "purer" than application code
(as several people have pointed out, most of the C library is probably pure already), so the
penalty for PIC (indirect reference to global data mainly) will be less than
the penalty I measured using Dhrystone and other standard benchmarks.  The
indirect procedure calls (or calls through a jump table) will still hurt, but
I think Masataka-san greatly overestimates the effect this will have on most
programs.  Do you really worry that much about 6 cycles per call?  As for the
startup overhead, after the conference I took some of Donn Seeley's ideas and
tuned my dynamic loader.  I ended up improving its performance by almost a
factor of two, and with tuning of the kernel's memory mapping function, we ended
up getting the startup performance hit down to about 30 milliseconds per shared
library used by the program.  This includes the amortized cost of dynamic
binding and relocation.  The degradation in performance on SPEC is not
measurable.

As for the benefits, they are great indeed where disk space is concerned.  HP-UX
shaved off 16 MB in the core commands alone - i.e., not even including X11.  However,
there is a tradeoff here as well.  Since the mapping operation generally
reserves swap for shared library data segments mapped copy on write, a program
that uses only a little of a library's static data segment may need more swap
space to execute than it would if it were linked with archive libraries.  In
the shared case, swap is reserved for the whole library's data segment, but in
the archive case, only those few modules needed by the program are copied into
the a.out, so the data space for the rest of the library needs no swap at run
time.  We measured up to 100K of "wasted" swap per process for Motif
applications.

As for memory savings, I tend to side with Masataka-san on this - you'll have
to prove it really does make a difference.  So far, I've seen little other than
anecdotal evidence.  There was a discussion earlier as to whether most of real
memory was being used for potential shareable text, or for clearly unshareable
data, and I wish someone would produce some actual numbers.  My gut feel is
that the savings from sharing even the X11 libraries' text won't amount to
much as far as really reducing memory consumption as long as huge amounts of
data are being hoarded.  Sharing of the "xterm" and "[ck]sh" executables
themselves probably gets me most of the text savings I am going to get on most
systems with which I am familiar, but I probably don't live in the "real
world".  In any case, the rumor about Sun implementing shared libraries
primarily to save disk space rings true of HP; any other benefits were gravy.
Note that providing a general purpose dynamic linking capability probably did
not weigh heavily for Sun, as they did not provide such a facility until SunOS
4.1.

Barry Margolin commented a while back that shared libraries should be
"linked on demand" as in Multics.  That is in fact the way they are usually
done in Unix, at least for procedures.  Data references are usually done up
front simply because most systems don't have Multics' whizzy architecture.  But
note in HP-UX, we are at least able to defer data resolution and dynamic
relocation until the first reference to any procedure defined in the module
within the library.  We also have a way to defer evaluation of static
constructors using the same mechanism.

Multics actually "loads" a segment on first reference as well.  Most Unix
implementations "map" libraries at startup time, and "load" pages on demand.
While we could certainly defer mapping as well, it is not clear to me that it
would be worthwhile.  Typical programs use a handful of relatively large
libraries, each of which would tend to be referenced fairly soon, so the
deferral wouldn't buy much.  If Unix switched to the Multics everything-is-a-
separate-little-segment approach, deferred mapping would appear to make more
sense, but we'd have to reduce the overhead of the mapping operation to make
this realistic.

--------------
Marc Sabatella (marc@hpmonk.fc.hp.com)
Disclaimers:
	2 + 2 = 3, for suitably small values of 2
	Bill and Dave may not always agree with me

bill@franklin.com (bill) (05/30/91)

In article <18370001@hpfcso.FC.HP.COM>
	mjs@hpfcso.FC.HP.COM (Marc Sabatella) writes:
: For those of you who haven't put this discussion into your "kill" files by now:
: Bill, your proposal is in fact quite similar to the Apollo Domain system, and
: what I last heard proposed for OSF/1, except that you propose a finer (page)
: granularity.

There's something to be said for either end of the spectrum. With
a small granularity, you don't have to load in the entire
executable (or the pages with shared references, anyway); you can
just load what gets used, which I think is particularly important
when the thing being used is, for example, a huge library like
the X libraries (or so they're supposed to be; I don't use X because
I dislike hogs).  With a large granularity, you get to toss more data
once you've loaded the library, and you have less work in setup
and the like for doing the fixup.

There's obviously a minimum somewhere in between; I think (purely as a
matter of intuition, this should obviously be tested) that it lies toward
the smaller granularity.

:               At the heart of all of these is the concept of
: "pre-loading", where the first time a dynamically linked page is loaded, its
: external references are fixed up.  This assumes, as you explicitly stated, that
: the resolutions will be the same for each program.  Unfortunately this cannot
: be guaranteed.  The "malloc" example brought up by several people in response
: to Alex's claim that shared libraries should be "simple and elegant"
: demonstrates this well.  A library may make calls to malloc(), but different
: programs may provide their own definitions of malloc(), and the library's
: references would have to be resolved differently for each.  Some means must be
: provided for this.

I had this pointed out to me in e-mail; here's what I had to say:

| Suppose you have two shared libraries that define the same
| symbols; perhaps they are different versions of the shared
| library. Some program comes along and runs the first and then runs
| again using the second. The second invocation of the program has
| to consider itself to be not shared with the first invocation; its
| shared text isn't, in this case, shared. Actually, you could
| share those pages of the text which don't make reference to the
| differing shared libraries. Life gets complicated if you do that,
| I think. Still, it might be worthwhile if it can be done
| efficiently, because it would mean that some of the more common
| situations don't cause problems with wasted memory.
|
| This situation, one would hope, doesn't occur often. Either the
| changed version I mentioned above or a literal use of different
| libraries. Another possibility is that a shared library *could*
| refer to variables in the main program. In this case, you lose
| most of the value of shared libraries unless you do the page by
| page thing.
|
| This would get checked for during process startup but is a simple
| enough test, so I don't think it changes anything. You'd have to
| make most of the test anyway, just to open the shared libraries.

Anyway, this seems to solve the problem you mentioned, without
excessive hackery.

:                                                                        However,
: there is a tradeoff here as well.  Since the mapping operation generally
: reserves swap for shared library data segments mapped copy on write, a program
: that uses only a little of a library's static data segment may need more swap
: space to execute than it would if it were linked with archive libraries.  In
: the shared case, swap is reserved for the whole library's data segment, but in
: the archive case, only those few modules needed by the program are copied into
: the a.out, so the data space for the rest of the library needs no swap at run
: time.  We measured up to 100K of "wasted" swap per process for Motif
: applications.

The trade-off, then, is between the allocated space in swap for
each running process, vs. the disk space saved for the
executables? Is there any other way to avoid the swap deadlock I
assume is the reason for allocating for the worst case?

If so, the solution to this would be to use that method and then
allocate swap space to meet expected peak, not worst case; if not,
the question is whether the total of all running processes is
going to be comparable to the total of all commands. I suggest
not. :-)

Another solution to this problem is to use smaller shared
libraries, instead of a monolithic library. At least in my scheme,
this doesn't involve much additional overhead, so it would be the
easy solution.

: As for memory savings, I tend to side with Masataka-san on this - you'll have
: to prove it really does make a difference.  So far, I've seen little other than
: anecdotal evidence.  There was a discussion earlier as to whether most of real
: memory was being used for potential shareable text, or for clearly unshareable
: data, and I wish someone would produce some actual numbers.  My gut feel is
: that the savings from sharing even the X11 libraries' text won't amount to
: much as far as really reducing memory consumption as long as huge amounts of
: data are being horded.

I suppose this depends a lot on your mix of applications. On my
system, the total space is probably 50/50 between data and text,
if you ignore the savings obtained from shared text segments.
Counting shared text, this is probably more like 80/20 in favor
of data. So, in general, it would seem that for me there isn't
that much savings to be had.

However, when my system starts swapping at all, its performance
is significantly worse than when it doesn't swap. Also, there is
a knee in the curve, which shared libraries can help avoid
running into. So, while I wouldn't say that the savings that
shared libraries provide in memory are always significant, there
are definitely circumstances where they are.

mjs@hpfcso.FC.HP.COM (Marc Sabatella) (05/31/91)

>There's something to be said for either end of the spectrum. With
>a small granularity, you don't have to load in the entire
>executable (or the pages with shared references, anyway); you can
>just load what gets used

This happens trivially anyhow with demand loading.  Using the scheme we
developed for HP-UX, you can get at least object module granularity on your
relocations, so "demand" is only for a module at a time.

| Suppose you have two shared libraries that define the same
| symbols; perhaps they are different versions of the shared
| library. Some program comes along and runs the first and then runs
| again using the second. The second invocation of the program has
| to consider itself to be not shared with the first invocation; its
| shared text isn't, in this case, shared. Actually, you could
| share those pages of the text which don't make reference to the
| differing shared libraries. Life gets complicated if you do that,
| I think. Still, it might be worthwhile if it can be done
| efficiently, because it would mean that some of the more common
| situations don't cause problems with wasted memory.

This is why we normally use jump tables.  They really aren't that big a deal.
Share the whole text, no fixup necessary except to a table in the data segment.

| This situation, one would hope, doesn't occur often.

No, but an analogous one does: some systems provide two versions of malloc()
located in different libraries.  Almost every library in the world calls
malloc() at some point, so there is the potential for a reference not being
shareable.  Again, indirect calls and data references (as opposed to merely
"PC-relative code") solve these problems with not a lot of overhead.
Particularly if you only use the indirection on references you "expect" to be
intercepted.
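
A minimal sketch of the jump-table arrangement (invented names, not our
real layout): the table sits in the library's copy-on-write data
segment, the loader binds it per process, and the shared text is never
modified.

    /* Per-process table in the library's data segment.  Each process
     * gets its own copy, so bindings can differ per program while the
     * library's text stays shared. */
    struct linkage_table {
        void *(*malloc_fn)(unsigned long);
        void  (*free_fn)(void *);
        /* ... one slot per external or interposable symbol ... */
    };

    struct linkage_table __lib_linkage;

    /* The dynamic loader fills the table in at startup or on first call. */
    void bind_linkage(void *(*m)(unsigned long), void (*f)(void *))
    {
        __lib_linkage.malloc_fn = m;
        __lib_linkage.free_fn   = f;
    }

    /* Library code only ever calls through the table. */
    void *lib_alloc(unsigned long n)
    {
        return __lib_linkage.malloc_fn(n);
    }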

: In
: the shared case, swap is reserved for the whole library's data segment, but in
: the archive case, only those few modules needed by the program are copied into
: the a.out, so the data space for the rest of the library needs no swap at run
: time.  We measured up to 100K of "wasted" swap per process for Motif
: applications.

>The trade-off, then, is between the allocated space in swap for
>each running process, vs. the disk space saved for the
>executables? Is there any other way to avoid the swap deadlock I
>assume is the reason for allocating for the worst case?

Sure - arbitrarily kill off a process.  But it is so much less painful to not
let a process start up than to kill it once it has started.  Do you really want to
see your X server die because some trivial "ls" is using all the swap at the
moment?

>Another solution to this problem is to use smaller shared
>libraries, instead of a monolithic library. At least in my scheme,
>this doesn't involve much additional overhead, so it would be the
>easy solution.

I agree with this, but note there is more VM overhead associated with having
lots of small libraries than there is with a few monolithic ones.

--------------
Marc Sabatella (marc@hpmonk.fc.hp.com)
Disclaimers:
	2 + 2 = 3, for suitably small values of 2
	Bill and Dave may not always agree with me

ske@pkmab.se (Kristoffer Eriksson) (06/01/91)

In article <18370001@hpfcso.FC.HP.COM> mjs@hpfcso.FC.HP.COM (Marc Sabatella) writes:
> This assumes, as you explicitly stated, that
>the resolutions will be the same for each program.  Unfortunately this cannot
>be guaranteed.  The "malloc" example brought up by several people in response
>to Alex's claim that shared libraries should be "simple and elegant"
>demonstrates this well.  A library may make calls to malloc(), but different
>programs may provide their own definitions of malloc(), and the library's
>references would have to be resolved differently for each.  Some means must be
>provided for this.

That's fairly simple, if you add a level of indirection. "Any problem can
be solved by adding another level of indirection."

I think most shared libraries have the need for a data segment that is
instantiated for each program that links to it, to hold such data that
would otherwise have been static or global in an ordinary library, anyway.
Providing such a segment is no problem; it can be automatically allocated
when the shared library is loaded, by the system or by the application
itself, or can even be statically linked into the program binary. The
simplest way thereafter to inform the shared library of where its data
segment is, is to pass the address of the segment as an additional
parameter in every call to the library. Ideally, the compiler should do
that automatically and transparently for all calls to shared libraries,
and also automatically address all static and global variables in a
library off of that additional parameter when it knows it is compiling
a shared library.  Anyway, whether or not the compiler helps out with
this, the scheme is easy to implement, and there are other possibilities
too: statically linked wrappers for all library routines; a special memory
page at a fixed offset from the library's code pages in the application's
address space to hold such data, addressed off the PC or the library's call
address; or, if the library is given the same address in all programs
linking to it, simply addressing that page with fixed addresses, the same
for all programs.
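
As an entirely invented sketch of the extra-parameter variant in C: the
library's former globals are gathered into one context structure, one
instance per linking program, and every library entry point takes a
pointer to it (attach_libfoo below is an assumed helper that allocates
and initializes the instance).

    /* Per-program instance of the library's "static" data. */
    struct libfoo_data {
        int  call_count;
        char errbuf[128];
    };

    extern struct libfoo_data *attach_libfoo(void);

    /* Every exported routine takes the context pointer as an extra
     * argument, ideally supplied by the compiler transparently. */
    int libfoo_frob(struct libfoo_data *ctx, int x)
    {
        ctx->call_count++;          /* "global" state, but per program */
        return x * 2;
    }

    /* The calling program keeps the pointer in one of its own globals
     * and passes it on each call:
     *
     *     struct libfoo_data *foo = attach_libfoo();
     *     int y = libfoo_frob(foo, 21);
     */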

Passing in pointers to global data in various ways has already been mentioned
previously in this discussion thread, but it has mostly centered around
giving malloc a pointer to malloc data, giving stdio a pointer to stdio
data, and so on, giving rise to a more or less pronounced need to pass
all these various pointers in to *every* library function that might
conceivably somewhere down the chain call these functions. For instance,
most stdio functions might very probably need the malloc pointer in order
to dynamically allocate I/O-buffers, in addition to its own stdio pointer,
and who knows what more? The solution is to just have one pointer and one
data segment for each shared library, which gives you access to the data
needed by *all* the functions and packages that reside in the library. This
one pointer is readily available to all callers of the library's functions,
since the pointer itself can simply be stored in an arbitrary global (to
the application) variable.  This is actually how it is done on the Amiga
(which once again does things the right way).  Additionally, the Amiga reserves
a special register for this pointer.

Having this data segment, I'd say that not only should the library's data
reside there, but also all calls out of the library back to the application
or to other libraries should indirect through this page. Then you can easily
patch in the correct address to whatever malloc() function the library should
use in this particular program during the dynamic linking phase, just as you
resolve all calls *to* the library. You also have to store library data
segment pointers here for the other libraries that this library makes calls
to, since the library, unlike the application, cannot get them from the
application's global variables.

An exception might be calls from the library that are already required by
the library to go to a specific other library, rather than to be resolved
according to the application program's preferences. Those can be resolved
once and for all, if all shared libraries stay at the same addresses in all
programs that reference them. On the other hand, if you really want to
simplify everything, you might consider limiting yourself to *only* this
last kind of call out of shared libraries, and scrapping all dynamic call
resolution.  You might ask yourself if it really is essential to be able to
influence where the library's own calls go. You could view the library as
a fixed package, including everything it itself calls.

> Were it not for the desire to allow this sort of
>interposition, shared libraries would be a great deal simpler than they are.
>This is also why a segmented architecture is no panacea, and why position
>independent code needs to have some indirection in it to be useful.

I don't think just a little bit of indirection makes it so much more
complicated, and it's not just position independence that causes this;
you will need it even with fixed addresses, if you want calls out of a
shared library to refer to different places depending on the program
linking to it ("reference independence"?).

-- 
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60  !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03  !  or ...!{uunet,mcsun}!sunic.sunet.se!kullmar!pkmab!ske

tchrist@convex.COM (Tom Christiansen) (06/03/91)

From the keyboard of ske@pkmab.se (Kristoffer Eriksson):
:That's fairly simple, if you add a level of indirection. "Any problem can
:be solved by adding another level of indirection."

Untrue: you cannot solve the problem of too many levels of indirection
that way.  While that sounds like a pathological case, it may in fact
apply here, since indirection usually costs access time.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time." 

bill@franklin.com (bill) (06/03/91)

In article <18370002@hpfcso.FC.HP.COM>
	mjs@hpfcso.FC.HP.COM (Marc Sabatella) writes:
: >There's something to be said for either end of the spectrum. With
: >a small granularity, you don't have to load in the entire
: >executable (or the pages with shared references, anyway); you can
: >just load what gets used
:
: This happens trivially anyhow with demand loading.  Using the scheme we
: developed for HP-UX, you can get at least object module granularity on your
: relocations, so "demand" is only for a module at a time.

I think we just said the same thing. :-) In the scheme I proposed,
the "small granularity" is the page; in the one you describe, it
is the object module.

I've actually thought that maybe doing shared libraries with each
module and shared set of globals as a separately pageable entity
would be the right way to go. But with the relatively large pages
that seem common, this would (as you point out) increase the
amount of paging.

Then again, maybe not. It really depends on, for example, whether
the modules in the shared library are linked in an order that
improves locality. In the typical case, a module starts at some
random place in a page and may extend past the page end, even if
it is less than a page long. If the following routine isn't used
(a good chance, with libc, for example), some of its code gets
loaded pointlessly. Whereas, if you start each module on a page
boundary, you minimize the paging for that module, while
potentially increasing the total paging for the library. Another
advantage of doing this is that the compiler may be able to do
things like not splitting loops across page boundaries, which
could also decrease the working set.

I'm still ambivalent about this.

: | Suppose you have two shared libraries that define the same
: | symbols; perhaps they are different versions of the shared
: | library. Some program comes along and runs the first and then runs
: | again using the second. The second invocation of the program has
: | to consider itself to be not shared with the first invocation; its
: | shared text isn't, in this case, shared. Actually, you could
: | share those pages of the text which don't make reference to the
: | differing shared libraries. Life gets complicated if you do that,
: | I think. Still, it might be worthwhile if it can be done
: | efficiently, because it would mean that some of the more common
: | situations don't cause problems with wasted memory.
:
: This is why we normally use jump tables.  They really aren't that big a deal.
: Share the whole text, no fixup necessary except to a table in the data segment.

I'll admit to a prejudice that says that reasonable fixed
overheads are better than overheads that increase indefinitely
(e.g., percentage overheads). For this discussion, that means that
I feel that paying at load time is better than paying at run time.
My feeling is that, for "short" programs, the difference makes no
difference, but for "long" programs, the long run cost is less
with my approach.

: | This situation, one would hope, doesn't occur often.
:
: No, but an analogous one does: some systems provide two versions of malloc()
: located in different libraries.

True, in one sense of often. However, by "often" I meant: often
enough that following my suggestion would result in wasting
something on the order of the amount of space that it saved. On my
system, for example, with just two system shared libraries, this
situation would occur only when the user's program overrode
something in the library and that doesn't happen often (just as
often as I link with -lmalloc. :-)

Still, I can envision, for example, a testing environment, where
nearly every program overrides something in the shared libraries.
We'd want any system to at least not be pathological when
confronted with that circumstance.

This also argues for a small granularity with my scheme, in that
with a large one, the amount that gets unshared by this
circumstance would also be large.

: : In
: : the shared case, swap is reserved for the whole library's data segment, but in
: : the archive case, only those few modules needed by the program are copied into
: : the a.out, so the data space for the rest of the library needs no swap at run
: : time.  We measured up to 100K of "wasted" swap per process for Motif
: : applications.
:
: >The trade-off, then, is between the allocated space in swap for
: >each running process, vs. the disk space saved for the
: >executables? Is there any other way to avoid the swap deadlock I
: >assume is the reason for allocating for the worst case?
:
: Sure - arbitrarily kill off a process.

I meant: besides that. :-)

:                                         But it is so much less painful to not
: let a process start up than to kill it once it has started.  Do you really want to
: see your X server die because some trivial "ls" is using all the swap at the
: moment?

No. (I had assumed that no one in his right mind would consider
killing off random processes as acceptable, so I just ignored that
obvious option.)

Anyway, I've demonstrated to my satisfaction that there is no way
to avoid the problem. Suppose you have N tasks, each needing its
full address space to complete, and each of which requires the
other N-1 tasks to have done something after their full allocation
has been done (imagine N sorts connected by pipes). You'll need
space for all those processes, no matter what.

: >Another solution to this problem is to use smaller shared
: >libraries, instead of a monolithic library. At least in my scheme,
: >this doesn't involve much additional overhead, so it would be the
: >easy solution.
:
: I agree with this, but note there is more VM overhead associated with having
: lots of small libraries than there is with a few monolithic ones.

Well, see my comments above. This may or may not be true, for
suitable values of "lots".

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (06/03/91)

In article <5619@pkmab.se> ske@pkmab.se (Kristoffer Eriksson) writes:

>That's fairly simple,

Not at all.

>if you add a level of indirection. "Any problem can
>be solved by adding another level of indirection."

You had better realize that "another level of indirection is another level
of complexity".

							Masataka Ohta

ske@pkmab.se (Kristoffer Eriksson) (06/15/91)

In article <274@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <5619@pkmab.se> ske@pkmab.se (Kristoffer Eriksson) writes:
>
>>That's fairly simple,
>
>Not at all.

How about you describe why you think it is not the least bit simple?

>You had better realize that "another level of indirection is another level
>of complexity".

I can think of a lot of worse cases. In this case it is fairly easy to
get a complete overview of the consequences.
-- 
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60  !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03  !  or ...!{uunet,mcsun}!sunic.sunet.se!kullmar!pkmab!ske