[net.unix-wizards] Shared system routines

stephen@alberta (07/23/83)

    It has bothered me for a while that every time I compile a C
program, all of the routines which I use are physically added to the
object file.  When you consider that many of these routines are
included in almost every object file, perhaps it would be better
to share one copy of the more common routines among all tasks.

    The most obvious advantage of this approach would be the
savings in disk space.  As an example: on our system, with about
1200 object files, the savings from sharing the startup routines
and printf (approximately 7K per file) would come to about 9
megabytes.  Sharing routines would also result in decreased memory
usage and possibly faster loading times (since only the user's own
routines would have to be loaded).

    Those subroutines used by the kernel would have to be
designated as shared and non-paged; the others could simply be
shared.

The big question is how easy it would be to make the change.
Memory management would have to be modified, as would the loader,
and possibly things like 'adb' as well.  And where would the
routines be placed?  If they are put at the bottom of memory,
would this cause problems for routines that didn't expect them
there?  And if they are placed at the top of memory, would that
cause problems for adb and cdb, or get in the way of the stack?  I
don't yet know enough about UN*X to really answer those questions
well.

    Does it sound like a feasible idea, or am I out in left field?

		Stephen Samuel 
		(ubc-visi!alberta!stephen)

msc@qubix.UUCP (07/24/83)

There was a discussion about shared libraries less than 2 months
ago.  The conclusion was to leave things the way they are.
-- 
	Mark
	...{decvax,ucbvax,ihnp4}!decwrl!
		      ...{ittvax,amd70}!qubix!msc
	decwrl!qubix!msc@Berkeley.ARPA

guy@rlgvax.UUCP (Guy Harris) (07/24/83)

There was some discussion of shared libraries in UNIX a while ago.  Many
other operating systems do support them, and they probably do cut down on
the physical memory requirements of the programs that use them.

The main tricky part is that they cannot call routines outside the shared
library except through an indirect pointer of some sort.  If they called them
using the standard subroutine call instruction provided on most machines,
the address of the routine would be hardcoded into the code of the routine
itself.  However, this address may be different in different programs which
include this routine.  Furthermore, if they reference any globals they would
also have to reference them through such a pointer.  Some current routines
might have to be modified.  An alternative is to have two data segments:
one for globals referenced by the library, and one for the others.  The globals
referenced by the library must be DEFINED (not just referenced) by the library
routine; their size must be assigned at the time the library is built, not at
the time the program using the library is built.
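
For concreteness, here is a rough C sketch of the kind of indirection being
described.  The names (struct linkage, lk_malloc, lk_errno) are invented for
illustration, and main() stands in for the loader; no real implementation is
being quoted:

    #include <errno.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* One of these sits in each process's private data; the loader
     * (main() below stands in for it) fills it with that process's
     * own addresses for everything the shared code needs. */
    struct linkage {
            void *(*lk_malloc)(size_t);  /* malloc's address in THIS process */
            int    *lk_errno;            /* errno's address in THIS process  */
    };
    static struct linkage linkage;

    /* A routine that would live in the shared text segment.  It never
     * names malloc or errno directly, so no address is wired into it. */
    static void *
    shared_alloc(size_t n)
    {
            void *p = (*linkage.lk_malloc)(n);
            if (p == NULL)
                    *linkage.lk_errno = ENOMEM;
            return p;
    }

    int
    main(void)
    {
            linkage.lk_malloc = malloc;   /* what the loader would do */
            linkage.lk_errno  = &errno;
            char *buf = shared_alloc(64);
            printf("buffer at %p\n", (void *)buf);
            free(buf);
            return 0;
    }

Because the table lives in private data and the shared text only ever indexes
it, the same copy of the routine works no matter where malloc or errno ended
up in any particular program.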

Also, the shared library routines must either be position-independent code
(which the PDP-11 C compiler does not generate) or must always appear at the
exact same place in the virtual address space in all processes.

As long as the shared library routines always appeared at the same address
in all programs' virtual address spaces, the exact placement wouldn't be a
problem; put it wherever your machine's memory mapping hardware wants you to.
I suspect the various debuggers wouldn't care too much where they appeared,
except that UNIX prefers that the data segments appear before the stack segment
in virtual address space - but this can probably be gotten around if necessary
(I'm sure there's at least ONE machine out there that makes this difficult).

I suspect it's feasible, but it'll take a lot of work.  Cooperative hardware
and compilers would help; you may also want to impose or use certain coding
conventions within shared library routines (use of pointers to variables passed
as arguments rather than global variables, for example).

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

Michael.Young%cmu-cs-g@sri-unix.UUCP (07/25/83)

In order to allow shared modules, you'd also probably need a
global variable dictionary for each shared module.  [Unless
you choose to put the shared modules at *fixed* places in
every user's address space, which is not inconceivable on a Vax, 
for example.]  When a shared module wants to access a global
variable (which cannot be shared), it must look up its address
in this "dictionary" (merely because the address
of the global variable may be different in separate users'
address spaces).  Likewise for shared modules calling external
routines.  Thus, inter-module calls and externals cost more.
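
As a rough illustration (the names and layout here are made up, not taken
from any actual loader), such a dictionary need be no more than a per-process
table mapping symbol names to the addresses they happen to have in that
process, which shared code consults instead of naming the global directly:

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    struct dict_entry {
            char *name;          /* e.g. "errno", "environ"                 */
            void *addr;          /* where that global lives in this process */
    };

    /* The loader would build this per process; here a tiny fake one
     * is filled in by hand in main(). */
    static struct dict_entry module_dict[4];
    static int module_dict_size;

    static void *
    dict_lookup(char *name)
    {
            int i;
            for (i = 0; i < module_dict_size; i++)
                    if (strcmp(module_dict[i].name, name) == 0)
                            return module_dict[i].addr;
            return NULL;         /* unresolved: a loader error */
    }

    /* What a shared module does instead of naming the global directly
     * (in practice it would cache the result of the lookup). */
    static int
    read_counter_via_dict(void)
    {
            int *p = (int *)dict_lookup("counter");
            return p ? *p : -1;
    }

    int
    main(void)
    {
            static int counter = 7;   /* an unshared, per-process global */
            module_dict[0].name = "counter";
            module_dict[0].addr = &counter;
            module_dict_size = 1;
            printf("%d\n", read_counter_via_dict());   /* prints 7 */
            return 0;
    }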

Assuming the Unix non-shared disk structure (that is, a given
disk block is in one file only), you'd have to meddle with the
loader to handle the incorporation of these modules.  [That is,
not only the linker, which would generate references to these
shared code files, but the kernel loader which would interpret them.]
A big problem here is that changing one of these modules' code
files probably breaks everything that requires it.

A much simpler, but also a *lot* less flexible, approach would be
to make system-wide fixed-location library routines.  *All* of these
routines would be at a known location in *every* process's
address space; kernel tables for such stuff could be kept
once (instead of once per process).  Adb/sdb would have to
be taught to look up addresses in the shared area in their own
address spaces rather than in that of the child/core file.
The linker would have to be changed to understand the new addresses
for these things, but that's not too tough.  Again, the kernel's
loader would be the hardest change; I'm not sure how I'd deal
with page tables, but it could be done.  You'd probably have to
build some mechanism for changing the shared modules (like
adding some, or even changing some without rebooting (!)); requiring
that all entries into shared modules go through an indirect
dictionary (even if it's from a non-shared module) would
help in that regard.

A nice idea, and one whose time has come, but probably not for Unix
systems.  Capability systems, as well as better virtual memory
systems, stand a much better chance of pulling this off.

			Michael

james.umcp-cs%udel-relay@sri-unix.UUCP (07/25/83)

From:  James O'Toole <james.umcp-cs@udel-relay>

On PRIME machines running the PRIMOS operating system, sharing
of library routines is accomplished via a strange call-by-name.
The name of the routine is provided to the OS, which looks up the address and
MODIFIES the calling instruction to call this address directly.  I
don't like this, but it works pretty well.

mike.rice@rand-relay@sri-unix.UUCP (07/26/83)

From:  Mike.Caplinger <mike.rice@rand-relay>

Anybody ever heard of shareable images under VMS?  There's no
additional overhead once a process's address space is set up, because
the shared routines are just magically mapped in, virtual-memory-wise.  I
think there were once plans to put such things in 4.2 BSD, but they
seem to have been abandoned.  They are nice in VMS, and can save lots
of disk space.  Also, if you change something you just reinstall the
sharable library - nothing need be relinked (if you're clever and use a
vector table, anyway).  I don't like too many things about VMS, but
this seems to be a major win.

ron%brl-bmd@sri-unix.UUCP (07/26/83)

From:      Ron Natalie <ron@brl-bmd>

Now don't think this is unique to VMS.  If you really want to get weird,
have someone explain common banks in EXEC 8 to you.

-Ron

edhall%rand-unix@sri-unix.UUCP (07/26/83)

The idea precedes VMS; DEC's RSTS/E for the PDP-11 has had `Run-Time
Systems' for some time.  They essentially allow re-entrant code to
be accessed by any number of jobs at once.

Even such a beast as Perkin-Elmer's OS/32 has re-entrant libraries.

Any discussion as to why (or how) such a thing could/couldn't be added
to UNIX?

		-Ed

tbray@mprvaxa (07/26/83)

Shared system routines were a primary design objective of VMS, and several
different tools are provided for building and using them.  However, the
goal was not easily achieved, and even now, 5 years later, some pretty
fundamental changes were made to the linker (read: loader) in VMS version
3 to eliminate some obscure contradictions that had been introduced.

People in the group are correct when they predict horrible problems arising
in making the loader smart enough to correctly handle all the permutations and
combinations this can introduce.  And with real memory getting so cheap, I
wonder if it's worth it.

The way it's done at VMS run-time is that the shared stuff can appear anywhere
in the address space, with the corresponding entries in the page table 
containing a flag indicating a shared reference. The code is then found
via ANOTHER page table (called a global section table). This is referred to
as a 'global valid page fault'.  

             thinking that the new Amdahls support 64M phys memory (!!!!!!!),
					...microsoft!ubc-visi!mprvaxa!tbray
                  

jhh@ihldt.UUCP (07/27/83)

If shared memory had execute permission turned on,
no other kernel changes would need to be made to support
shared libraries on System V; everything else is there.
All that remains is to create special library interface
routines and the shared memory manipulator.
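
For what it's worth, a sketch of that route using the real shmget/shmat
calls follows; the key, the attach address, and the segment contents are
made up.  The missing piece is exactly the one noted above: nothing here
gives the pages execute permission, so you cannot yet jump into the segment.

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <stdio.h>

    #define LIBKEY  ((key_t)0x4c4942)     /* made-up well-known key           */
    #define LIBADDR ((void *)0x60000000)  /* made-up agreed-on attach address */
    #define LIBSIZE (64 * 1024)

    int
    main(void)
    {
            /* Find (or, the first time, create) the segment that some
             * installer has filled with the shared library's text. */
            int shmid = shmget(LIBKEY, LIBSIZE, IPC_CREAT | 0644);
            if (shmid < 0) {
                    perror("shmget");
                    return 1;
            }

            /* Attach it read-only at the agreed-upon address, so every
             * process sees the library at the same place. */
            void *lib = shmat(shmid, LIBADDR, SHM_RDONLY);
            if (lib == (void *)-1) {
                    perror("shmat");
                    return 1;
            }

            /* A "library interface routine" would now jump to a known
             * offset inside lib, which is only possible once the
             * kernel maps these pages with execute permission. */
            printf("library segment attached at %p\n", lib);
            return 0;
    }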

barmar@mit-eddie.UUCP (07/27/83)

The discussion of shared libraries that occurred a while ago was 
mostly about whether library routines that just about everyone
uses should be moved into the kernel, since people didn't want
to have to deal with these issues.  It died, luckily.

BTW, shared libraries were implemented in Multics (a "pre-clone" of
Unix :-)) from day 1 (nearly twenty years ago).  We call it dynamic
linking, and I wouldn't want to live without it.  All it takes
is a bit in indirect pointers which causes a reference to fault;
the OS traps the linkage fault, unfaults the pointer to find the
symbolic name of the reference, finds the library routine, patches
the indirect pointer to reference it, and restarts the instruction.
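
A user-level caricature of that pointer snapping, with invented names, might
look like the following; a real implementation does the patching from the
fault handler and then restarts the faulting instruction, but the logic is
roughly this:

    #include <stdio.h>
    #include <string.h>

    typedef int (*func_t)(int);

    struct link {
            char  *symbol;       /* symbolic name, kept until snapped */
            func_t target;       /* NULL until the link is snapped    */
    };

    /* Stand-in for the library routine being linked to. */
    static int double_it(int x) { return 2 * x; }

    /* Stand-in for "finding the library routine" by its name. */
    static func_t
    resolve(char *symbol)
    {
            if (strcmp(symbol, "double_it") == 0)
                    return double_it;
            return NULL;
    }

    /* Every inter-module reference goes through here; it is cheap
     * once the pointer has been snapped. */
    static int
    call_through(struct link *lk, int arg)
    {
            if (lk->target == NULL)              /* the "linkage fault" */
                    lk->target = resolve(lk->symbol);
            return (*lk->target)(arg);
    }

    int
    main(void)
    {
            struct link lk = { "double_it", NULL };
            printf("%d\n", call_through(&lk, 21));  /* resolves, prints 42 */
            printf("%d\n", call_through(&lk, 10));  /* already snapped: 20 */
            return 0;
    }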
-- 
			Barry Margolin
			ARPA: barmar@MIT-Multics
			UUCP: ..!genrad!mit-eddie!barmar

mat@hou5e.UUCP (M Terribile) (07/27/83)

There is one reason for not putting a shared library system on an OS:
it is very hard to do it in a way that is both general and right without
sacrificing machine cycles, I/O bandwidth, or something else.

I have seen it done efficiently, but in a very specific way, to help
maintain the speed of a UNI*X machine dedicated to a few special applications.
I have seen it done generally, on the HP 3000, with HARDWARE SUPPORT.  As
a result, loading the COBOL compiler can take over 3 seconds of dedicated
disk usage (run-time linking).  It IS possible to get around it; the HP's
OS has a ``sticky-bit'' type of facility; but the problem affects EVERYTHING
that runs on the machine, and the difference between a dog of a machine and
a smooth-running one lies almost entirely in the technical savvy of the
system administrators.  Not desirable!

Perhaps a middle ground could be found in a multi-kernelized system, with
an efficient ``sys call'' facility, but if you are talking about cheap
machines (managers DO buy DG machines, you know) it may take a while
to happen.

						Mark Terribile
						Duke of deNet

thomas@utah-gr.UUCP (Spencer W. Thomas) (07/27/83)

I'm surprised nobody has mentioned IBM yet.  The 360 architecture certainly
supported "shared libraries" quite well, but they called it "dynamic linking".
I wrote quite a few programs which used shared database libraries on IBM
360/370 equipment (IMS and CICS).

=Spencer

jbray@bbn-unix@sri-unix.UUCP (07/27/83)

From:  James Bray <jbray@bbn-unix>

What you are talking about here is a Run-Time Library. This is something
which the Gods would indeed smile upon, had they not in their imponderable
wisdom created Unix without shared segments or things of this sort. As Unix
grows into the more advanced hardware which it now finds itself on, these
should become available. We are told that system V, which I should have and
be upgrading our Unix to any day now, has some sort of shared-memory capability
between processes. I would be most interested if someone who has actually seen
the code could describe it, as shared memory for unix can be done either of
two ways: as a major architectural change involving work all over the place
in the kernel and breaking everything in the process -- the way it should be
done -- or as a bizarre and inelegant hack, sort of like using pipes and ports
as interprocess communication.
  In any case, what you want is something like the way Perkin-Elmer's OS/32
does it, to hark back for the umpteenth time to my last job.  (OS/32 is a big
assembly-language mess with a horrible user interface which makes it look like
a bizarre form of torture compared to unix, plus all this neat real-time type
stuff which makes unix look like a toy compared to it; I'll take unix any day,
but would really like both.)  What it provides is named, shared, read-only
segments in memory which are loaded when the first process having need of
their contents is loaded, and which subsequent users merely link to.  The way
this all works is that one builds this thing just like a library, putting all
the right stuff in it, and then one has one's loader scan this thing for
needed library routines before getting them in the usual way; the loader
builds a link into the task image that points to this segment, and pulls it
in off the disk at run-time if it is not already resident.  You can do all
sorts of great things with shared segments, but as one can imagine they
rather complicate questions concerning core management, especially swapping.
But it is worth it; they save not only disk space, but a lot of core as well.
  What you don't want to do is start putting the stuff in the kernel, unless
you do it via a very restrictive and well-defined system-service interface. It
is much nicer to have shared run-time libraries.

--Jim Bray

guy@rlgvax.UUCP (Guy Harris) (07/27/83)

The way Multics solved the "global references from shared libraries" problem
was to use a giant transfer vector called the "(common) linkage segment".
This was also necessary for the dynamic linking.  Any reference to an external
of any sort went through an indirect pointer in the common linkage segment.
This pointer was initially a special pointer which caused a trap, and it
pointed to a character string which was the name of the external.  When the
fault occurred due to this pointer, the OS would find the segment referenced
by that pointer (using a search rule similar to PATH) and "initiate" it (i.e.,
map it into your address space).  It would copy the prototype of its common
linkage segment section into the system common linkage segment (which was also
the per-process static data segment, so this copy would also initialize static
variables), so any time any routine in that segment referenced an external the
same fault process would occur.  Then it would paste the address of the
given entry point in the given segment into the pointer in the common linkage
segment.  Unfortunately for this scheme under UNIX, existing compilers don't
produce code to reference externals through such a transfer vector (at least
not on the machines I'm familiar with; I've seen references to transfer vectors
on the 3B machines), so the Multics solution can't just be dropped into UNIX.

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

guy@rlgvax.UUCP (Guy Harris) (07/27/83)

P.S. In case anyone wants to pick nits, I should have said "COMBINED linkage
segment" when I said "COMMON linkage segment".  It's been over 8 years since
I've been around Multics.

guy@rlgvax.UUCP (Guy Harris) (07/27/83)

If the HP 3000 does shared library routines with full generality (i.e.,
dynamic linking), that probably accounts for most of the load.  RSX-11 does
not do dynamic linking; you must "link" in the references to the shared library
at program link time.  Of course, this means you can't just stick in a new
copy of the shared library whenever you change a routine and expect everybody's
programs to use the new version automatically (which is one of the side benefits
of system calls; just re-sysgen the OS and everybody making a system call gets
the new code).  So there are some tradeoffs available, depending on how
general or right you want to be.  I've not used RSX-11M style shared libraries
(i.e., you bind at program link time, and they are accessed mostly like
regular libraries), so I don't know how inconvenient the restrictions on such
are.  The fully general approach (i.e., bind at program execute time) does
impose the cost of a linker each time you run the program, but Multics
provided a binder which permitted you to bind references between the
modules given to the binder before program run time.

	Guy Harris

mark.umcp-cs@udel-relay@sri-unix.UUCP (07/29/83)

From:  Mark Weiser <mark.umcp-cs@udel-relay>

             thinking that the new Amdahls support 64M phys memory (!!!!!!!),
					...microsoft!ubc-visi!mprvaxa!tbray

And now even Vaxes support 32M phys...

hal@cornell.UUCP (Hal Perkins) (07/31/83)

Someone made a remark about the Univac 1100 EXEC 8 "common banks".
This was something of a kludge, but it did allow shared libraries
WITHOUT having to make changes to the existing compilers or loaders,
and old programs could take advantage of the shared library by just
relinking them.

It worked something like this.  The shared routines were kept in a
common area of virtual memory (that's the basic idea--the details
are much more grungy and you probably don't want to know them).
[On a VAX, this could be done by placing the shared routines in the
system 1/4th of the virtual memory and using common page tables
for all users.]  The shared routines were preceded by an address
vector, and all calls to shared routines went through this vector.
Thus, shared routines could be modified and moved around in memory
as long as the vector was updated.  This avoided wiring absolute
entry point addresses into user programs, and meant that programs
always used the currently installed version of the routines.

The interesting thing is how this was made to work with old user
programs.  In the system libraries, the existing routines were
replaced by little stubs that had exactly the same calling
sequence as the old (non-shared) library routines.  These stubs
jumped to the appropriate shared routine to do the work.  The
stubs were linked with compiler object files to produce absolute
files just as before.
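
Translated very loosely into C (the slot numbers, names, and table below are
invented; the real thing was an address vector at a fixed spot in the common
bank, and the stubs were a few instructions of assembler), the arrangement
amounts to this:

    #include <stdio.h>

    /* Slot numbers are frozen when the shared library is built; user
     * programs only ever record these, never real addresses. */
    #define SLOT_SQUARE 0
    #define SLOT_CUBE   1

    /* Stand-ins for the routines living in the common bank. */
    static int real_square(int x) { return x * x; }
    static int real_cube(int x)   { return x * x * x; }

    /* In real life this vector sits at the front of the shared area
     * at a fixed virtual address; a static table stands in for it
     * here.  Reinstalling the library rebuilds the vector, so the
     * routines can move without relinking anybody. */
    static int (*shared_vector[])(int) = { real_square, real_cube };

    /* The stub that replaces the old library routine: same name,
     * same calling sequence, one extra indirect jump. */
    int
    square(int x)
    {
            return (*shared_vector[SLOT_SQUARE])(x);
    }

    int
    main(void)
    {
            printf("%d\n", square(12));   /* prints 144 */
            return 0;
    }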

But once this change was made, the size of absolute files
shrank by almost the entire size of the library routines, and
this was achieved without modifying any of the compilers or
loaders.  Eventually, some of the compilers were modified to
call the shared library directly (I believe), which eliminated
the small overhead of calling the shared library through the
stub routines.

I can't see why it wouldn't be possible to implement a similar
setup in Unix, at least on systems with large virtual memories.
The savings in disk space for linked files and the reduction in
individual program working sets might well be worth the small
cost in extra CPU time needed to call shared routines indirectly.

Any volunteers?  I am not a qualified wizard and don't have
any spare time to attempt something like this even if I knew
enough about the system to do it.


Hal Perkins                         UUCP:  {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA:  hal@cornell  BITNET: hal@crnlcs

guy@rlgvax.UUCP (08/01/83)

Just out of curiosity, how did the EXEC 8 system handle global references
from the shared library to data, rather than routines, outside of it?  Did
it use the Multics technique of using an indirect pointer to refer to any
external, whether code or data; did it just forbid such references; or did
it find a third way out?

	Guy Harris
	{seismo,mcnc,we13,brl-bmd,allegra}!rlgvax!guy

andrew@orca.UUCP (Andrew Klossner) (08/02/83)

	"The idea precedes VMS; DEC's RSTS/E for the PDP-11 has had
	`Run-Time Systems' for some time.  They essentially allow
	re-entrant code to be accessed by any number of jobs at once."

And the idea precedes RSTS.  TOPS-10, the PDP-10 operating system upon
whose architecture RSTS was based ("hey, let's write TOPS-10 in Basic
for a minicomputer!") implemented "Object Time Systems", which occupied
the upper half of the logical address space of a program written in
Fortran, Algol, or Cobol.

  -- Andrew Klossner   (decvax!teklabs!tekecs!andrew)  [UUCP]
                       (andrew.tektronix@rand-relay)   [ARPA]