[comp.unix.wizards] Big Programs Hurt Performance

daveb@geac.UUCP (Brown) (01/01/70)

In article <417@devvax.JPL.NASA.GOV> des@jplpro.JPL.NASA.GOV (David Smyth) writes:
| Those extra features of big programs (like especially window managers and
| I suppose any other system libraries) should be shared and therefore be
| LOW COST.
| 	* Why should all the tools be huge, when they are really using
| 	  the same code?
| 
| There still needs to be protection, so re-useable features don't have
| to be re-entrant (separate data spaces).
| 
| These re-useable "objects" seem to need to be "light weight processes"
| (basing this on the fact that the Xerox Viewpoint/XDE/Star systems
| are FAR more responsive than the Sun, even though the Sun 3 has about
| twice the processing power than the Xerox CPU).
| 
| Perhaps these things need HW, perhaps SW, more likely OS support.

  This problem was one dealt with in the design of Mutlicks...
although they didn't have lightweight processes, they did make common
code reside in public-library-like segments.  This required OS
support, and for reasonable performance used hardware hooks to allow
the linkages to be created/broken cheaply.  Not a bad idea, especially
coming from something as old as Eunich's pappy.
 --dave

-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

msf@amelia (Michael S. Fischbein) (01/01/70)

In article <417@devvax.JPL.NASA.GOV> des@jplpro.JPL.NASA.GOV (David Smyth) writes:
>In article <6920@eddie.MIT.EDU> jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>(basing this on the fact that the Xerox Viewpoint/XDE/Star systems
>are FAR more responsive than the Sun, even though the Sun 3 has about
>twice the processing power than the Xerox CPU).

Sorry, but I just CAN'T let this go by.  I use both SUNs and Xerox Stars
at different places, and the Viewpoint is not nearly as responsive as the
SUNs, even though the particular installations I work on provided each
Xerox with a local disk and the SUNs are all diskless 3/50s.

Viewpoint document processing has lots of nice features, though it is missing
some that I think ought to be there; the WYSIWYG display is very good, but
speed is most definitely not a strong point.  In fact, text editing on our
installation is so slow as to be almost unusable.  It is definitely very
frustrating.

I can edit TeX `source,' run it through the formatter, and put the dvi file
up on the onscreen previewer in the same time it takes Viewpoint to get the
file displayed and get out of read-only mode.  Paginating a Viewpoint
document takes longer than running TeX, which presumably has more to do.  

		mike

Michael Fischbein                 msf@prandtl.nas.nasa.gov
                                  ...!seismo!decuac!csmunix!icase!msf
These are my opinions and not necessarily official views of any
organization.

allbery@ncoast.UUCP (01/01/70)

As quoted from <14000@oddjob.UChicago.EDU> by matt@oddjob.UChicago.EDU (Ke Kupua):
+---------------
| In article <382@pcrat.UUCP> rick@pcrat.UUCP (Rick Richardson) writes:
| ) The half serious thoughts around here: port down the VAX's native C
| ) compiler to a <hot micro>.
| 
| This is not the first time I've seen the phrase "VAX's native C
| compiler" used here.  What would you mean by a vax C compiler that
| was not "native"?
+---------------

Easy:  a pcc-based compiler.  Pcc is wonderful for making a single compiler
skeleton work on many systems, but the price is that the code is often
sub-optimum.  A compiler specifically optimized for VAX code would naturally
produce faster code than something like pcc, which uses assembler constructs
which can easily be configured for a large class of host assemblers but may
not generate the most optimum code for a particular processor.  Perhaps a
good example would be (I don't know if the Vax's pcc optimizes this, since
I'm not a Vax hacker, but chances are pretty good that it doesn't):  in
order to assign two structures, the template used by pcc might be to call a
function to do a byte copy, but the Vax has an instruction to do the copy
directly, much faster than a subroutine even if the subroutine just invokes
the machine language instruction due to stack frame management and call/return
overhead.  Similar considerations apply to processors which have machine
language support for high-level control structures -- such as the 80386
"SETxx" instruction, which allows complex conditions to be computed linearly
(Pascal-style conditions, not C `&&' or `||' conditions).  Conditions computed
with SETxx on the 386 would be faster than pcc's default method using more
traditional means, because extra jump instructions in the generated machine
code could be avoided.

This kind of thing is the reason why e.g. Plexus is now compiling their
kernels using the Green Hills native 68020 compiler rather than pcc, as
they used previously.  And believe me, the difference is noticeable; I was
quoted a 20% speed increase in the kernel in the above instance.
-- 
	    Brandon S. Allbery, moderator of comp.sources.misc
  {{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu  Fido: 157/502  MCI: BALLBERY
   <<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
"`You left off the thunderclap and the lightning flash.', I told him.
`Should I try again?'  `Never mind.'"     --Steven Brust, JHEREG

des@jplpro.JPL.NASA.GOV (David Smyth) (09/18/87)

In article <6920@eddie.MIT.EDU> jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>In article <2473@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>)In article <6886@eddie.MIT.EDU> jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>))In article <8579@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>)))[...]graphing the size of the ls(1) command versus
>)))time is an interesting exercise[...]
>))
>))However, a more meaningful exercise would be to graph the cost of the
>))memory used by ls(1) versus time.  [...]
>)
>)[...]
>)We are rapidly headed toward being I/O bound simply due to program load
>)costs.
>)
>)For an example close to home, my Amiga is doing good if it can drag
>)programs off a hard disk at 30K bytes/second over an SCSI interface.
>)[...]
>
> ... (Most Suns use Eagles, with rates of 1.8MB/sec or higher).

But on these demand paged Suns, we see such stupidity as:
-rwxr-xr-x  1 root       122880 Sep 15  1986 /bin/csh*
-rwxr-xr-x  1 david      737280 Aug 14 16:34 bin/shelltool*

where most of the huge increase in size of shelltool is in the window
management stuff.  Isn't this stuff shared across all sorts of things
in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?

For example, try starting mailtool as an icon.  It should just load
what it needs, but no... even though it is:
bin/shelltool:  mc68020 demand paged executable
it takes several seconds to load ... That seems an effective load rate
of about 250Kb/sec (over that wonderful scuzzy disk, like most workstations
are sold with).

---------- CONCLUSION

Those extra features of big programs (like especially window managers and
I suppose any other system libraries) should be shared and therefore be
LOW COST.
	* Why should all the tools be huge, when they are really using
	  the same code?

There still needs to be protection, so re-useable features don't have
to be re-entrant (separate data spaces).

These re-useable "objects" seem to need to be "light weight processes"
(basing this on the fact that the Xerox Viewpoint/XDE/Star systems
are FAR more responsive than the Sun, even though the Sun 3 has about
twice the processing power than the Xerox CPU).

Perhaps these things need HW, perhaps SW, more likely OS support.

kyle@xanth.UUCP (09/20/87)

In <417@devvax.JPL.NASA.GOV>, des@jplpro.JPL.NASA.GOV (David Smyth) writes:
> But on these demand paged Suns, we see such stupidity as:
> -rwxr-xr-x  1 root       122880 Sep 15  1986 /bin/csh*
> -rwxr-xr-x  1 david      737280 Aug 14 16:34 bin/shelltool*
> 
> where most of the huge increase in size of shelltool is in the window
> management stuff.  Isn't this stuff shared across all sorts of things
> in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?

From what I've read about the Macintosh (I don't own one), it appears
they did the right thing by putting the window management tools into
ROM.  I sure wish Sun would do something like that.  Having a 30 line
graphics program that only uses the line drawing facilities of pixrect
compile into a 200K executable is rather disconcerting.

kyle jones  <kyle@odu.edu>  old dominion university, norfolk, va  usa

hedrick@topaz.rutgers.edu.UUCP (09/20/87)

You don't really want the window system in ROM.  Window systems are
probably the least understood part of the system software.  What you
really want is shared libraries.  That way, only one copy of the code
is shared by all programs that use it, but you can change it.  Apollo
has had this for years.  Sun, like a number of other Unix vendors, is
implementing it now.  It will be part of SunOS 4.0.  In the meantime,
what Sun does is to link all of its tools together.  The commonly used
packages that run under Suntools are simply links to the same core
image.  It chooses which code to run based on the name by which it is
invoked.  Since all of these utilities are put together in one core
image, they share one copy of the libraries.  Since the thing is
demand paged, only the pieces that are actually used get paged in.
What a hack...  Shelltool is one of those, so that 737280 that you see
in fact includes 8 different programs.  (It looks like your system is
somehow missing the link.)

kyle@xanth.UUCP (Kyle Jones) (09/20/87)

In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> You don't really want the window system in ROM.

You're right.  I typed "ROM" but I was thinking of protected RAM.
I certainly don't want a buggy window system burned into ROM for
posterity.

> What you really want is shared libraries.  That way, only one copy
> of the code is shared by all programs that use it, but you can
> change it.

This doesn't sound much different from the current scheme.  The
advantage of having the window system in protected RAM is that you
don't have gargantuan executables for small programs; calls to system
tools are simply linked to their known entry points in memory.

Please explain more about shared libraries.

mjr@osiris.UUCP (Marcus J. Ranum) (09/21/87)

In article <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) writes:
> You don't really want the window system in ROM.

	Actually, something that I find fascinating is the way that computer
systems seem to get loaded down by more users and huge windows programs at
just about the same rate that the hardware speeds up. IE - a system that used
to support 12 users with good response time is upgraded so it supports 12
users AND windows AND rwho AND 800 other kluges - with about the same
response time.
	Now, mind you, I'm not arguing that we should all go forward into
the past, but I'm starting to wonder if there'd be an advantage to running
something lean and mean and REALLY getting response time. Like, say,
Version 7 on a Sun 4.

--mjr();
-- 
If they think you're crude, go technical; if they think you're technical,
go crude. I'm a very technical boy. So I get as crude as possible. These
days, though, you have to be pretty technical before you can even aspire
to crudeness...			         -Johnny Mnemonic

shap@sfsup.UUCP (J.S.Shapiro) (09/21/87)

In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> 
> > What you really want is shared libraries.  That way, only one copy
> > of the code is shared by all programs that use it, but you can
> > change it.
> 
> Please explain more about shared libraries.

Okay, here goes. I have stayed out of this, but shared libraries I can talk
about intelligently. Basically a shared library is a piece of code which
is "shared" between two programs. A portion of the address space is
reserved in advance by *everyone* for each shared library (that is, the
shared library has a permanent reserved location in the virtual address
space). Then, whoever needs the functionality in the shared library simply
compiles as usual, linking in the shared version of the library instead of
the normal version. As a result, a marker is put in the binary indicating
which (if any) shared libraries need to be hauled in. If the marker is
there, exec() arranges for the shared library to get mapped into your
address space.

This is preferable to the NVRAM scheme because if you *don't* use the
feature you don't have to pay for RAM for it.

In order to provide simple upgrades, shared libraries often work through
a jump table (even though they don't strictly need to). The jump table size
is fixed, and usually larger than the number of externally visible
functions to allow for backwards compatible expansion. A consequence of the
jump table is that if you need to fix a broken function, you can just fix
it. So long as the jump table doesn't move the library is compatible with
the old one *without* any recompilation. In some implementations (depends
on your hardware), the jump table points to a stub routine which
backpatches the "real" address of your function into your code. This has
the advantage that you only incur the shared library overhead once per
function, but the disadvantage that you can no longer page in those pages
from the executable - they now have to go to the paging area.

Only one copy of the shared library text is kept in core for all users. It
is simply mapped into all of the appropriate virtual address spaces. On
System V, most of the system applications are compiled against it.
Read-only data can be shared too, but read-write data pages need to be
marked copy-on-write, or if your memory manager doesn't support that, need
to be duplicated. This is okay, because the data section is usually small.
The static shared library uses the process heap.

It is worth noting that a better scheme, though much more difficult, is to
generate "position independent code" so that you do not have to reserve the
address of the library. You can then use exactly the same tricks, but
generate your function calls via a jump table in the executable's data
space. When you 'attach' the shared library, you copy the table out of the
library into your own, adjusting the entries so that the function addresses
are correct. This scheme has the advantage that you don't have to reserve
large chunks of your address space. Unfortunately, position independent
code is quite difficult to do, which is why current UNIX compilers (to my
knowledge) don't do it. This scheme is referred to as "dynamic loading."

I am hardly an expert on any of this, and I may have gotten the details
wrong. I do know that shared libraries are provided in System V, though I
don't know which CCS release they started in. AT&T compilers don't generate
position independent code. 

Side observation: If your binary is 500K, shared libraries don't help much.
They just don't represent a significant portion of your code. If your
binary is really that big, you probably have a lot of rethinking to do, and
ultimately this rethought will be reflected in better performance, greater
flexibility, and lower maintenance cost. Depends, of course, on your
application, but try running size(1) on /unix (or /vmunix) and take a look
at the text size.

*** Disclaimer ***

The above is a personal exegesis, and should not be taken as representing
the views of AT&T.


Jonathan Shapiro
AT&T Information Systems

rick@pcrat.UUCP (rick) (09/22/87)

In article <1387@osiris.UUCP>, mjr@osiris.UUCP (Marcus J. Ranum) writes:
> 	Now, mind you, I'm not arguing that we should all go forward into
> the past, but I'm starting to wonder if there'd be an advantage to running
> something lean and mean and REALLY getting response time. Like, say,
> Version 7 on a Sun 4.

The half serious thoughts around here: port down the VAX's native C
compiler to a <hot micro>.  Then cross compile and UPLOAD back to the
VAX.



-- 
	Rick Richardson, President, PC Research, Inc.
(201) 542-3734 (voice, nights)   OR   (201) 834-1378 (voice, days)
		seismo!uunet!pcrat!rick

labo@apollo.uucp (Dale Labossiere) (09/22/87)

In article <2067@sfsup.UUCP> shap@sfsup.UUCP (J.S.Shapiro) writes:
>In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
>> In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
>> 
>> Please explain more about shared libraries.
>
>                           ... A portion of the address space is
>reserved in advance by *everyone* for each shared library (that is, the
>shared library has a permanent reserved location in the virtual address
>space).
>
>   ...
>
>It is worth noting that a better scheme, though much more difficult, is to
>generate "position independent code" so that you do not have to reserve the
>address of the library. You can then use exactly the same tricks, but
>generate your function calls via a jump table in the executable's data
>space. When you 'attach' the shared library, you copy the table out of the
>library into your own, adjusting the entries so that the function addresses
>are correct. This scheme has the advantage that you don't have to reserve
>large chunks of your address space. Unfortunately, position independent
>code is quite difficult to do, which is why current UNIX compilers (to my
>knowledge) don't do it. This scheme is referred to as "dynamic loading."
>
> ...
>Jonathan Shapiro

This "dynamic loading" scheme is in fact what Apollo systems use.  The compilers
generate position independent code, and the libraries can be mapped anywhere
in the process's address space.

Rather than have a monolithic jump table for a shared library, shared libraries 
"register" their global addresses in a "Known Global Table" (KGT). Programs
which invoke shared library functions do so via an indirect address linkage
variable in their data space (the slight difference being that there are only
linkage variables for those shared functions that are invoked, not a complete
table for all of the library's functions).

Object files are tagged and when executed, the system loader (not UNIX ld(1)) 
resolves the undefined global's addresses using the KGT and patches up the 
program's linkage variables.
-- 
Dale LaBossiere              (617) 256-6600 x4292
Apollo Computer
330 Billerica Rd.            UUCP: {mit-erl,yale,uw-beaver,decvax}!apollo!labo
Chelmsford Ma. 01824         ARPA: apollo!labo@eddie.mit.edu

batson@cg-atla.UUCP (Jay Batson X5927) (09/23/87)

In article <2067@sfsup.UUCP> shap@sfsup.UUCP (J.S.Shapiro) writes:
>In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
>> In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
>> 
>> > What you really want is shared libraries....
>> 
>> Please explain more about shared libraries.
>
>Side observation: If your binary is 500K, shared libraries don't help much.
>They just don't represent a significant portion of your code. If your
>binary is really that big, you probably have a lot of rethinking to do....

Well, don't forget about shared libraries used by, say, suntools/windows/....
These kinds of "toolkit" things can quickly grow things over a meg....

-------------------------------------------------------------------------------
Any opinions I have are most certainly those generated by the infinite
improbability generator.  To get better ones, you have to provide me with
a hotter cup of tea.  Just don't tell the nutrimatic drink dispenser....

Jay
decvax!cg-atla!batson

matt@oddjob.UChicago.EDU (Ke Kupua) (09/23/87)

In article <382@pcrat.UUCP> rick@pcrat.UUCP (Rick Richardson) writes:

) The half serious thoughts around here: port down the VAX's native C
) compiler to a <hot micro>.

This is not the first time I've seen the phrase "VAX's native C
compiler" used here.  What would you mean by a vax C compiler that
was not "native"?

I would assume that this person meant "VMS C compiler" and was
suffering from the "VAX == VMS" delusion, but in unix-wizards???
________________________________________________________
Matt	     University		matt@oddjob.uchicago.edu
Crawford     of Chicago     {astrovax,ihnp4}!oddjob!matt

daveb@llama.UUCP (09/24/87)

In article <2498@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
]In <417@devvax.JPL.NASA.GOV>, des@jplpro.JPL.NASA.GOV (David Smyth) writes:
]] But on these demand paged Suns, we see such stupidity as:
]] -rwxr-xr-x  1 root       122880 Sep 15  1986 /bin/csh*
]] -rwxr-xr-x  1 david      737280 Aug 14 16:34 bin/shelltool*
]] 
]] where most of the huge increase in size of shelltool is in the window
]] management stuff.  Isn't this stuff shared across all sorts of things
]] in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?
]
]From what I've read about the Macintosh (I don't own one), it appears
]they did the right thing by putting the window management tools into
]ROM.  I sure wish Sun would do something like that...

For Sun, this will be fixed in the forthcoming SunOS 4.0 release with
their shared library support.  SVr3 also has this feature.  It is a real
win with almost no drawbacks when done with OS support.  Jeez Louise,
even _VMS_ has had them for years.  The only complication is version
control, and it's not _that_ hard to do.

It's best when users can define and install their own libraries too.

-dB
"If it was easy, we'd hire people cheaper than you to do it"
{amdahl, cbosgd, mtxinu, ptsfa, sun}!rtech!daveb daveb@rtech.uucp

tim@ism780c.UUCP (09/25/87)

In article <382@pcrat.UUCP> rick@pcrat.UUCP (Rick Richardson) writes:
< The half serious thoughts around here: port down the VAX's native C
< compiler to a <hot micro>.

I don't know if they still sell them, but Avalon once had boards
that you plugged into your VAX to do this sort of thing.

The board contained a NSC 32whatever.  When you issued a cc command,
the source file was sent to the board, which used the Greenhills C
compiler to cross compile for the VAX. 

I believe that they also had troff available for the boards.
-- 
Tim Smith, Knowledgian		{sdcrdcf,uunet}!ism780c!tim
				tim@ism780c.isc.com
"Oh I wish I were Matthew Wiener, That is who I truly want to be,
 'Cause if I were Matthew Wiener, Tim Maroney would send flames to me"

peter@sugar.UUCP (Peter da Silva) (09/25/87)

In article <2498@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> In <417@devvax.JPL.NASA.GOV>, des@jplpro.JPL.NASA.GOV (David Smyth) writes:
> > management stuff.  Isn't this stuff shared across all sorts of things
> > in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?
> From what I've read about the Macintosh (I don't own one), it appears
> they did the right thing by putting the window management tools into
> ROM.  I sure wish Sun would do something like that.

Shared libraries don't have to be in ROM. On the Amiga they do rather well
being loaded when needed and flushed when memory gets low. The semantics
of shared libraries under UNIX are difficult to see. Does anyone have any
ideas about how you can share stuff like that? Making windows first class
virtual devices and sending them escape sequences sounds like a win. I know
X-windows does that, but it's huge. Extended ANSI codes or maybe even
Tektronix graphics codes would probably cut it. Amiga Intuition provides
such an interface for text applications.
-- 
-- Peter da Silva `-_-' ...!hoptoad!academ!uhnix1!sugar!peter
--                 'U`  Have you hugged your wolf today?
-- Disclaimer: These aren't mere opinions... these are *values*.

guy%gorodish@Sun.COM (Guy Harris) (09/25/87)

> The semantics of shared libraries under UNIX are difficult to see.

Why?  There have been several implementations on various UNIX systems.

> Does anyone have any ideas about how you can share stuff like that?

Yes.  See:

	Chapter 8, "Shared Libraries", in the S5R3 Programmer's Guide;

	(Here it comes again, for those of you who haven't been paying
	attention) "Shared Libraries in SunOS", in the proceedings of the
	Summer 1987 USENIX in Phoenix.

There have been other implementations as well.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

jg@jumbo.dec.com (Jim Gettys) (09/25/87)

In article <818@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>Shared libraries don't have to be in ROM. On the Amiga they do rather well
>being loaded when needed and flushed when memory gets low. The semantics
>of shared libraries under UNIX are difficult to see. Does anyone have any
>ideas about how you can share stuff like that? Making windows first class
>virtual devices and sending them escape sequences sounds like a win. I know
>X-windows does that, but it's huge. Extended ANSI codes or maybe even
>Tektronix graphics codes would probably cut it. Amiga Intuition provides
>such an interface for text applications.

The X window system does NOT make a window a virtual device, or use
escape sequences for communications.

X is a network server process, and clients (applications) open a single
connection (essentially any stream protocol will do) to the X server.  
Over that connection, the X protocol is spoken, which is a binary
special purpose protocol, and not an escape sequence.  Over a single
connection, you can manipulate as many windows as you want.

X applications are quite small, since all of the basic windowing and
display code is encapsulated in the server.  Applications range
from 45k to 210k (full mail user interface).  The V10 server has
been ported to everything from an IBM PC/AT, on up...  These sizes
are much smaller than many other window systems.
				Jim Gettys

allbery@ncoast.UUCP (09/26/87)

As quoted from <2501@xanth.UUCP> by kyle@xanth.UUCP (Kyle Jones):
+---------------
| In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
| > What you really want is shared libraries.  That way, only one copy
| > of the code is shared by all programs that use it, but you can
| > change it.
| 
| This doesn't sound much different from the current scheme.  The
| advantage of having the window system in protected RAM is that you
| don't have gargantuan executables for small programs; calls to system
| tools are simply linked to their known entry points in memory.
+---------------

And what you describe is exactly what a shared library is.  As an example:
the 3B1 has libc, libtam (curses using the window devices), libm, and a few
other libraries in its shared library.

The first program run which uses the shared library allocates a shared memory
segment at process address 0x300000 and copies /lib/shlib into it.  The
program is linked with a special crt0.o which does the shm attach and load
(if necessary), and with a loader "ifile" (instruction file, a System V
feature which gives fine control over an object file) which defines the
addresses of routines in /lib/shlib with origin 0x300000, read-only data
also within the shlib, and read-write data which is allocated at a fixed
address just below the shlib.  The result?  Large programs get to use all of
the routines in the shared library without having to have it compiled in (it
is 126K on my 3B1) and without having to load it all the time (just once,
actually done during the boot sequence).  The shlib is loaded into a memory
segment at some random location in memory, but is attached as a shared
memory segment at a fixed address within each process using it.

The bad thing about this is that a change to the shlib requires that any
programs using it be recompiled, unless the changes are added to the end
of the shlib... but this is true of any other method as well.  And it could
be remedied easily by making the first part of the shlib be a jump table to
the actual code (i.e. instructions "jmp <addr>") for each entry point.  (It
isn't set up that way on mine, alas.)  And even so it's more mutable than
ROM, all it takes is a reboot.  In effect, it *is* the protected RAM scheme
you speak of.
-- 
	    Brandon S. Allbery, moderator of comp.sources.misc
  {{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu  Fido: 157/502  MCI: BALLBERY
   <<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
			"Mummy, what's an opinion?"

stachour@umn-cs.UUCP (09/29/87)

In article <2067@sfsup.UUCP>, shap@sfsup.UUCP (J.S.Shapiro) writes:
> In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> > In <14888@topaz.rutgers.edu>,
> >       hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> > 
> > > What you really want is shared libraries.  That way, only one copy
> > > of the code is shared by all programs that use it, but you can
> > > change it.
> > 
> > Please explain more about shared libraries.
> 
> Okay, here goes. I have stayed out of this, but shared libraries I can talk
> about intelligently. Basically a shared library is a piece of code which
> is "shared" between two programs. A portion of the address space is
> reserved in advance by *everyone* for each shared library (that is, the
> shared library has a permanent reserved location in the virtual address
> space). Then, whoever needs the functionality in the shared library simply
> compiles as usual, linking in the shared version of the library instead of
> the normal version. As a result, a marker is put in the binary indicating
> which (if any) shared libraries need to be hauled in. If the marker is
> there, exec() arranges for the shared library to get mapped into your
> address space.
> 
No, requiring reserved address space is needed only in silly machines
or using silly operating systems. Sharing should be by name, such as
mail_system_$get_mail_message, and not by some pre-bound address set.

> ... (stuff deleted) ...        In some implementations (depends
> on your hardware), the jump table points to a stub routine which
> backpatches the "real" address of your function into your code. This has
> the advantage that you only incur the shared library overhead once per
> function, but the disadvantage that you can no longer page in those pages
> from the executable - they now have to go to the paging area.

No, it should go indirectly through a linkage-area specific to your process.
Your code, and all code, should remain read-only, and shared among
all processes that use it.
> 
> Only one copy of the shared library text is kept in core for all users. It
> is simply mapped into all of the appropriate virtual address spaces. 

Yes, only one copy, even if sometimes the shared library is running
with different privileges.  You should note that most hardware
architectures force addressing schemes that mean that one cannot
write shared code, since it cannot run in multiple modes.  Even when
the hardware is OK, often the operating system (like IBM's now
superseded OS/MVT) memory-management mechanism precludes it.

> ... (more deleted)           Unfortunately, position independent
> code is quite difficult to do, which is why current UNIX compilers (to my
> knowledge) don't do it. This scheme is referred to as "dynamic loading."

No, position-independent code is quite easy to do.  It's been done by
the GE Multics EPL and PL/I compilers for around 20 years. [For historians,
I personally consider 'C' as a cross between untyped 'B' and a subset
of the EPL subset of PL/I.]   By the way, what one really wants/needs
is dynamic-linking, not dynamic-loading.

>  ... (more deleted)
> Side observation: If your binary is 500K, shared libraries don't help much.
> They just don't represent a significant portion of your code. If your
> binary is really that big, you probably have a lot of rethinking to do, and
> ultimately this rethought will be reflected in better performance, greater
> flexibility, and lower maintenance cost.

But your own code should be automatically shared as well, and others
should be able to use the "object-managers" that you have written
without having to put those managers into their own code.

For those wishing to 'really' understand shared code, I recommend
the book by E. I. Organick on the Design of the Multics System.
It tells how sharing (through real dynamic linking)
is done on a system that was designed from the beginning for shared,
reliable (utility-grade, as good as the telephones or power company)
computing, and which was designed to make it easy to build good software
(an explicit goal of hardly any other system).
It's an 'ancient' book, but still more complete
on the subject than any other I know.

Spoiler-Warning:  If you don't know much about hardware instruction-set
architectures, and/or programming language run-time needs,
you may not be able to understand this book.


Paul Stachour
Honeywell SCTC:  Stachour@HI-Multics.ARPA
Univ of Minn:    stachour at umn-cs.edu

guy%gorodish@Sun.COM (Guy Harris) (09/30/87)

> in order to assign two structures, the template used by pcc might be to
> call a function to do a byte copy, but the Vax has an instruction to do the
> copy...

...which code generated by PCC uses.  Not a good example.

> This kind of thing is the reason why e.g. Plexus is now compiling their
> kernels using the Green Hills native 68020 compiler rather than pcc,

Is not the Green Hills compiler an optimizing compiler (i.e., offering more
optimization than the standard VAX UNIX peephole optimizer), and is it not also
available for other machines (e.g., the NS32K series)?  I don't know that
non-retargetable compilers are necessarily that much better than retargetable
ones, these days.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

jcreight@oracle.UUCP (10/02/87)

In article <4754@ncoast.UUCP>, allbery@ncoast.UUCP (Brandon Allbery) writes:
> As quoted from <2501@xanth.UUCP> by kyle@xanth.UUCP (Kyle Jones):
> +---------------
> | In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> | > What you really want is shared libraries.  That way, only one copy
> | > of the code is shared by all programs that use it, but you can
> | > change it.
> | 
> | This doesn't sound much different from the current scheme.  The
> | advantage of having the window system in protected RAM is that you
> | don't have gargantuan executables for small programs; calls to system
> | tools are simply linked to their known entry points in memory.
> +---------------
> 
> The first program run which uses the shared library allocates a shared memory
> segment at process address 0x300000 and copies /lib/shlib into it.  The
> program is linked with a special crt0.o which does the shm attach and load
> (if necessary), and with a loader "ifile" (instruction file, a System V
> feature which gives fine control over an object file) which defines the
> addresses of routines in /lib/shlib with origin 0x300000, read-only data
> also within the shlib, and read-write data which is allocated at a fixed
> address just below the shlib.  

Does this mean that each process that is running using shared libraries
consumes one shared memory segment?  There is a (rather low) kernel limit
on the number of shared memory segments that may be attached systemwide,
as I recall.  What happens when you run a shared-library program and
there aren't any shared memory segments left?
					JC
					hplabs!oracle!jcreight

peter@sugar.UUCP (10/03/87)

Apologies. I had gotten the impression that X was a sort of super-escape-
sequence protocol. Probably because I still haven't been able to get hold
of any sort of description of the system unless I want to fork out a hundred
bucks for the full tape.

I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
standard RAM wouldn't support it. How big's the server, really?
-- 
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These U aren't mere opinions... these are *values*.

scc@cl.cam.ac.uk (Stephen Crawley) (10/04/87)

All this talk about huge programs being bad and shared libraries being
good, and Jim Gettys' mention of X servers and X applications prompted 
me to take a look at our X10 server for XDE (i.e. for a D machine).

The size of the XServerConfig.bcd file (the load image) is 27,648 bytes.  

Admittedly, it is not quite a complete implementation of X10.4 yet.  
On the other hand, it includes the code for a pinger tool to start 
an xterm on a remote machine.

So why is it so small?  Mainly I think it is because it uses the XDE 
runtime library for almost everything.  The only significant exception
is in the code for painting the bitmap, where the implementor found
that X's model of bit painting didn't map efficiently onto the XDE
library routines.

A possibly relevant statistic: the source code for the complete server
is 7708 lines of Mesa code.

-- Steve

"My other machine's a Dorado" ... wishful thinking!

jg@jumbo.dec.com (Jim Gettys) (10/05/87)

In article <856@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>Apologies. I had gotten the impression that X was a sort of super-escape-
>sequence protocol. Probably because I still haven't been able to get hold
>of any sort of description of the system unless I want to fork out a hundred
>bucks for the full tape.
There is a paper in Transactions on Graphics which came out a couple
months ago about X.  You might look there.

>I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
>standard RAM wouldn't support it. How big's the server, really?

Here are the server sizes under Version 11, on a Vax.
text    data    bss     dec     hex
335872  32768   21268   389908  5f314   Xqdss  (server for GPX color display)
297984  20480   17132   335596  51eec   Xqvss  (server for straight bitmap)

The size of the server on my GPX, after running for some days, is 778k.
Note that there are a number of ways to make server dynamic data sizes
much smaller; the server as distributed is optimized toward speed
and simplicity.  At the cost of a bit of work, the size of data used
dynamically can be shrunk a lot.

Neither of these implementations has been optimized yet,
and both are mostly based on a complete set of machine independent graphics
code which comes with the distribution.  Much of this code may be
unneeded on machines which have graphics code in ROM, or in hardware
if the hardware or ROM can perform the correct operations.  I expect
many production servers to be smaller than above.
				Jim Gettys

raveling@vaxa.isi.edu (Paul Raveling) (10/05/87)

In article <856@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>
>I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
>standard RAM wouldn't support it. How big's the server, really?



Here are some snapshots of X10.4 display server memory sizes on
an HP 9000/350.  These are as reported by a monitor program,
except for translating from pages to bytes and reformatting to
make the results more readable.

Note that this is only display server memory use -- doesn't include
clients.



X10.4 Topcat display server, serving typical client set:

	Text	245,760		Data	 114,688	Stack	12,288

	Current resident set size:        118,784
	Maximum resident set size:        364,554


X10.4 Renaissance display server, minimal client load:

	Text	270,336		Data	 225,280	Stack	12,288

	Current resident set size:        233,472
	Maximum resident set size:        503,808


X10.4 Renaissance display server, large pixmaps stored in server:

	Text	270,336		Data	5,177,344	Stack	12,288

	Current resident set size:      5,185,536
	Maximum resident set size:      5,455,872

---------------------
Paul Raveling
Raveling@vaxa.isi.edu

jim@ci-dandelion.UUCP (Jim Fulton) (10/07/87)

In article <856@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>
>I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
>standard RAM wouldn't support it. How big's the server, really?

The first non-Unix port of The X Window System server was, in fact, to an IBM
PC/AT running MS-DOS.  Looking at the development sources I see

        C> dir pcx.exe
        
         Volume in drive C is PCSOURCES
         Directory of  C:\SRC\X\MAIN
        
        PCX      EXE   241696   9-29-87   8:11a


The linker gives me the following sizes for this image:

	code		data		Udata
	207964		20402		19418


PCX loads all of its device drivers (ethernet, high-res smart color graphics
card, mouse/tablet/etc.) when it starts up, so there's no code hidden in DOS.
If you have an expanded memory card it will use that for font bitmaps,
otherwise it uses system memory.  There is also a TFTP server in there so that
you can transfer files to and from the PC while running X. 

The version that we ship with our Mechanical Computer Aided Engineering product
also has bound into it extension packages for doing sophisticated dynamic
feedback, geometric animation, and menu display.  Our User Interface Management
System uses roughly 2 dozen fonts and exercises even our high-end servers.

X.V10 runs just fine on small machines.

                                                           Jim Fulton
                                                           Cognition Inc.
                                                           900 Tech Park Drive
uucp:    ...!{mit-eddie,talcott,necntc}!ci-dandelion!jim   Billerica, MA  01821
domain:    jim@ci-dandelion.ci.com, fulton@eddie.mit.edu   (617) 667-4800

allbery@ncoast.UUCP (Brandon Allbery) (10/08/87)

As quoted from <339@oracle.UUCP> by jcreight@oracle.UUCP (Jonathan Creighton):
+---------------
| > The first program run which uses the shared library allocates a shared memory
| > segment at process address 0x300000 and copies /lib/shlib into it.  The
| > program is linked with a special crt0.o which does the shm attach and load
| > (if necessary), and with a loader "ifile" (instruction file, a System V
| 
| Does this mean that each process that is running using shared libraries
| consumes one shared memory segment?  There is a (rather low) kernel limit
| on the number of shared memory segments that may be attached systemwide,
| as I recall.  What happens when you run a shared-library program and
| there aren't any shared memory segments left?
+---------------

Each shared library takes up a shm segment.  An attach simply increments
the shm_nattch counter and places a pointer to the shm segment in the
attacher's page table.  (...roughly.  It's _always_ more complex...)

Is there a limit on the number of attaches permitted?  I don't recall one.
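
For anyone who wants to watch the attach counter move, here is a small
sketch using the standard System V calls (shmget, shmat, shmdt, shmctl).
It is not the shared-library crt0 code described above.  The helper names
attach_count and demo_attach_counts are made up for this example, and it
attaches the same segment twice from one process as a stand-in for two
separate programs attaching the library segment.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Returns the current attach count of segment `id`, or -1 on error. */
long attach_count(int id) {
    struct shmid_ds ds;
    if (shmctl(id, IPC_STAT, &ds) == -1)
        return -1;
    return (long)ds.shm_nattch;
}

/*
 * Creates one private segment, attaches it twice, detaches, and records
 * shm_nattch at each step in counts[0..3].  Returns 0 on success and -1
 * if the system refused the segment or an attach.
 */
int demo_attach_counts(long counts[4]) {
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    counts[0] = attach_count(id);            /* nothing attached yet */
    void *a = shmat(id, NULL, SHM_RDONLY);   /* first "program" attaches */
    counts[1] = attach_count(id);
    void *b = shmat(id, NULL, SHM_RDONLY);   /* second attach, same segment */
    counts[2] = attach_count(id);

    if (a != (void *)-1) shmdt(a);
    if (b != (void *)-1) shmdt(b);
    counts[3] = attach_count(id);            /* after both detaches */

    shmctl(id, IPC_RMID, NULL);              /* remove the segment */
    return (a == (void *)-1 || b == (void *)-1) ? -1 : 0;
}
```

The thing to notice is that there is one segment no matter how many
attaches happen; only the counter changes.  Whether a given kernel caps
the number of attaches is exactly the kind of thing the -1 return would
reveal.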
-- 
	    Brandon S. Allbery, moderator of comp.sources.misc
  {{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu  Fido: 157/502  MCI: BALLBERY
   <<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
	 "...he calls _that_ a `little adventure'?!"  - Cmdr. Ryker