[comp.arch] Big Programs Hurt Performance

daveb@geac.UUCP (Brown) (01/01/70)

In article <417@devvax.JPL.NASA.GOV> des@jplpro.JPL.NASA.GOV (David Smyth) writes:
| Those extra features of big programs (especially window managers and,
| I suppose, any other system libraries) should be shared and therefore be
| LOW COST.
| 	* Why should all the tools be huge, when they are really using
| 	  the same code?
| 
| There still needs to be protection, so re-useable features don't have
| to be re-entrant (separate data spaces).
| 
| These re-useable "objects" seem to need to be "light weight processes"
| (basing this on the fact that the Xerox Viewpoint/XDE/Star systems
| are FAR more responsive than the Sun, even though the Sun 3 has about
| twice the processing power of the Xerox CPU).
| 
| Perhaps these things need HW, perhaps SW, more likely OS support.

  This problem was one dealt with in the design of Multics...
although they didn't have lightweight processes, they did make common
code reside in public-library-like segments.  This required OS
support, and for reasonable performance used hardware hooks to allow
the linkages to be created/broken cheaply.  Not a bad idea, especially
coming from something as old as Eunuch's pappy.
 --dave

-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

msf@amelia (Michael S. Fischbein) (01/01/70)

In article <417@devvax.JPL.NASA.GOV> des@jplpro.JPL.NASA.GOV (David Smyth) writes:
>In article <6920@eddie.MIT.EDU> jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>(basing this on the fact that the Xerox Viewpoint/XDE/Star systems
>are FAR more responsive than the Sun, even though the Sun 3 has about
>twice the processing power of the Xerox CPU).

Sorry, but I just CAN'T let this go by.  I use both SUNs and Xerox Stars
at different places, and the Viewpoint is not nearly as responsive as the
SUNs, even though the particular installations I work on provided each
Xerox with a local disk and the SUNs are all diskless 3/50s.

Viewpoint document processing has lots of nice features, though it is missing
some that I think ought to be there; The WYSIWYG display is very good, but
speed is most definitely not a strong point.  In fact, text editing on our
installation is so slow as to be almost unusable.  It is definitely very
frustrating.

I can edit TeX `source,' run it through the formatter, and put the dvi file
up on the on-screen previewer in the same time it takes Viewpoint to get the
file displayed and get out of read-only mode.  Paginating a Viewpoint
document takes longer than running TeX, which presumably has more to do.  

		mike

Michael Fischbein                 msf@prandtl.nas.nasa.gov
                                  ...!seismo!decuac!csmunix!icase!msf
These are my opinions and not necessarily official views of any
organization.

des@jplpro.JPL.NASA.GOV (David Smyth) (09/18/87)

In article <6920@eddie.MIT.EDU> jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>In article <2473@xanth.UUCP) kent@xanth.UUCP (Kent Paul Dolan) writes:
>)In article <6886@eddie.MIT.EDU) jbs@eddie.MIT.EDU (Jeff Siegal) writes:
>))In article <8579@utzoo.UUCP) henry@utzoo.UUCP (Henry Spencer) writes:
>)))[...]graphing the size of the ls(1) command versus
>)))time is an interesting exercise[...]
>))
>))However, a more meaningful exercise would be to graph the cost of the
>))memory used by ls(1) versus time.  [...]
>)
>)[...]
>)We are rapidly headed toward being I/O bound simply due to program load
>)costs.
>)
>)For an example close to home, my Amiga is doing good if it can drag
>)programs off a hard disk at 30K bytes/second over an SCSI interface.
>)[...]
>
> ... (Most Suns use Eagles, with rates of 1.8MB/sec or higher).

But on these demand paged Suns, we see such stupidity as:
-rwxr-xr-x  1 root       122880 Sep 15  1986 /bin/csh*
-rwxr-xr-x  1 david      737280 Aug 14 16:34 bin/shelltool*

where most of the huge increase in size of shelltool is in the window
management stuff.  Isn't this stuff shared across all sorts of things
in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?

For example, try starting mailtool as an icon.  It should just load
what it needs, but no... even though it is:
bin/shelltool:  mc68020 demand paged executable
it takes several seconds to load ... That works out to an effective load
rate of about 250KB/sec (over that wonderful scuzzy disk, like most
workstations are sold with).

---------- CONCLUSION

Those extra features of big programs (especially window managers and,
I suppose, any other system libraries) should be shared and therefore be
LOW COST.
	* Why should all the tools be huge, when they are really using
	  the same code?

There still needs to be protection, so re-useable features don't have
to be re-entrant (separate data spaces).

These re-useable "objects" seem to need to be "light weight processes"
(basing this on the fact that the Xerox Viewpoint/XDE/Star systems
are FAR more responsive than the Sun, even though the Sun 3 has about
twice the processing power of the Xerox CPU).

Perhaps these things need HW, perhaps SW, more likely OS support.

kyle@xanth.UUCP (09/20/87)

In <417@devvax.JPL.NASA.GOV>, des@jplpro.JPL.NASA.GOV (David Smyth) writes:
> But on these demand paged Suns, we see such stupidity as:
> -rwxr-xr-x  1 root       122880 Sep 15  1986 /bin/csh*
> -rwxr-xr-x  1 david      737280 Aug 14 16:34 bin/shelltool*
> 
> where most of the huge increase in size of shelltool is in the window
> management stuff.  Isn't this stuff shared across all sorts of things
> in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?

From what I've read about the Macintosh (I don't own one), it appears
they did the right thing by putting the window management tools into
ROM.  I sure wish Sun would do something like that.  Having a 30 line
graphics program that only uses the line drawing facilities of pixrect
compile into a 200K executable is rather disconcerting.

kyle jones  <kyle@odu.edu>  old dominion university, norfolk, va  usa

hedrick@topaz.rutgers.edu.UUCP (09/20/87)

You don't really want the window system in ROM.  Window systems are
probably the least understood part of the system software.  What you
really want is shared libraries.  That way, only one copy of the code
is shared by all programs that use it, but you can change it.  Apollo
has had this for years.  Sun, like a number of other Unix vendors, is
implementing it now.  It will be part of SunOS 4.0.  In the meantime,
what Sun does is to link all of its tools together.  The commonly used
packages that run under Suntools are simply links to the same core
image.  It chooses which code to run based on the name by which it is
invoked.  Since all of these utilities are put together in one core
image, they share one copy of the libraries.  Since the thing is
demand paged, only the pieces that are actually used get paged in.
What a hack...  Shelltool is one of those, so that 737280 that you see
in fact includes 8 different programs.  (It looks like your system is
somehow missing the link.)

kyle@xanth.UUCP (Kyle Jones) (09/20/87)

In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> You don't really want the window system in ROM.

You're right.  I typed "ROM" but I was thinking of protected RAM.
I certainly don't want a buggy window system burned into ROM for
posterity.

> What you really want is shared libraries.  That way, only one copy
> of the code is shared by all programs that use it, but you can
> change it.

This doesn't sound much different from the current scheme.  The
advantage of having the window system in protected RAM is that you
don't have gargantuan executables for small programs; calls to system
tools are simply linked to their known entry points in memory.

Please explain more about shared libraries.

mjr@osiris.UUCP (Marcus J. Ranum) (09/21/87)

In article <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) writes:
> You don't really want the window system in ROM.

	Actually, something that I find fascinating is the way that computer
systems seem to get loaded down by more users and huge windows programs at
just about the same rate that the hardware speeds up.  I.e., a system that used
to support 12 users with good response time is upgraded so it supports 12
users AND windows AND rwho AND 800 other kluges - with about the same
response time.
	Now, mind you, I'm not arguing that we should all go forward into
the past, but I'm starting to wonder if there'd be an advantage to running
something lean and mean and REALLY getting response time. Like, say,
Version 7 on a Sun 4.

--mjr();
-- 
If they think you're crude, go technical; if they think you're technical,
go crude. I'm a very technical boy. So I get as crude as possible. These
days, though, you have to be pretty technical before you can even aspire
to crudeness...			         -Johnny Mnemonic

shap@sfsup.UUCP (J.S.Shapiro) (09/21/87)

In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> 
> > What you really want is shared libraries.  That way, only one copy
> > of the code is shared by all programs that use it, but you can
> > change it.
> 
> Please explain more about shared libraries.

Okay, here goes. I have stayed out of this, but shared libraries I can talk
about intelligently. Basically a shared library is a piece of code that
is "shared" among two or more programs. A portion of the address space is
reserved in advance by *everyone* for each shared library (that is, the
shared library has a permanent reserved location in the virtual address
space). Then, whoever needs the functionality in the shared library simply
compiles as usual, linking in the shared version of the library instead of
the normal version. As a result, a marker is put in the binary indicating
which (if any) shared libraries need to be hauled in. If the marker is
there, exec() arranges for the shared library to get mapped into your
address space.

This is preferable to the NVRAM scheme because if you *don't* use the
feature you don't have to pay for RAM for it.

In order to provide simple upgrades, shared libraries often work through
a jump table (even though they don't strictly need to). The jump table size
is fixed, and usually larger than the number of externally visible
functions to allow for backwards compatible expansion. A consequence of the
jump table is that if you need to fix a broken function, you can just fix
it. So long as the jump table doesn't move, the library is compatible with
the old one *without* any recompilation. In some implementations (depends
on your hardware), the jump table points to a stub routine which
backpatches the "real" address of your function into your code. This has
the advantage that you only incur the shared library overhead once per
function, but the disadvantage that you can no longer page in those pages
from the executable - they now have to go to the paging area.

Only one copy of the shared library text is kept in core for all users. It
is simply mapped into all of the appropriate virtual address spaces. On
System V, most of the system applications are compiled against it.
Read-only data can be shared too, but read-write data pages need to be
marked copy-on-write, or if your memory manager doesn't support that, need
to be duplicated. This is okay, because the data section is usually small.
The static shared library uses the process heap.

It is worth noting that a better scheme, though much more difficult, is to
generate "position independent code" so that you do not have to reserve the
address of the library. You can then use exactly the same tricks, but
generate your function calls via a jump table in the executable's data
space. When you 'attach' the shared library, you copy the table out of the
library into your own, adjusting the entries so that the function addresses
are correct. This scheme has the advantage that you don't have to reserve
large chunks of your address space. Unfortunately, position independent
code is quite difficult to do, which is why current UNIX compilers (to my
knowledge) don't do it. This scheme is referred to as "dynamic loading."

I am hardly an expert on any of this, and I may have gotten the details
wrong. I do know that shared libraries are provided in System V, though I
don't know which CCS release they started in. AT&T compilers don't generate
position independent code. 

Side observation: If your binary is 500K, shared libraries don't help much.
They just don't represent a significant portion of your code. If your
binary is really that big, you probably have a lot of rethinking to do, and
ultimately this rethinking will be reflected in better performance, greater
flexibility, and lower maintenance cost. Depends, of course, on your
application, but try running size(1) on /unix (or /vmunix) and take a look
at the text size.

*** Disclaimer ***

The above is a personal exegesis, and should not be taken as representing
the views of AT&T.


Jonathan Shapiro
AT&T Information Systems

labo@apollo.uucp (Dale Labossiere) (09/22/87)

In article <2067@sfsup.UUCP> shap@sfsup.UUCP (J.S.Shapiro) writes:
>In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
>> In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
>> 
>> Please explain more about shared libraries.
>
>                           ... A portion of the address space is
>reserved in advance by *everyone* for each shared library (that is, the
>shared library has a permanent reserved location in the virtual address
>space).
>
>   ...
>
>It is worth noting that a better scheme, though much more difficult, is to
>generate "position independent code" so that you do not have to reserve the
>address of the library. You can then use exactly the same tricks, but
>generate your function calls via a jump table in the executable's data
>space. When you 'attach' the shared library, you copy the table out of the
>library into your own, adjusting the entries so that the function addresses
>are correct. This scheme has the advantage that you don't have to reserve
>large chunks of your address space. Unfortunately, position independent
>code is quite difficult to do, which is why current UNIX compilers (to my
>knowledge) don't do it. This scheme is referred to as "dynamic loading."
>
> ...
>Jonathan Shapiro

This "dynamic loading" scheme is in fact what Apollo systems use.  The compilers
generate position independent code, and the libraries can be mapped anywhere
in the process's address space.

Rather than have a monolithic jump table for a shared library, shared libraries 
"register" their global addresses in a "Known Global Table" (KGT). Programs
which invoke shared library functions do so via an indirect address linkage
variable in their data space (the slight difference being that there are only
linkage variables for those shared functions that are invoked, not a complete
table for all of the library's functions).

Object files are tagged, and when one is executed the system loader (not UNIX
ld(1)) resolves the undefined globals' addresses using the KGT and patches up
the program's linkage variables.
-- 
Dale LaBossiere              (617) 256-6600 x4292
Apollo Computer
330 Billerica Rd.            UUCP: {mit-erl,yale,uw-beaver,decvax}!apollo!labo
Chelmsford Ma. 01824         ARPA: apollo!labo@eddie.mit.edu

tihor@acf4.UUCP (Stephen Tihor) (09/22/87)

An excellent summary on shared libraries, but I would not agree that Position
Independent Code is difficult to produce. 95+% of the code I produce is PIC,
and with a more clever linker (as one would expect in that environment) it
could get higher.  Only systems whose instruction set was not designed with PIC
as a goal make it hard.

batson@cg-atla.UUCP (Jay Batson X5927) (09/23/87)

In article <2067@sfsup.UUCP> shap@sfsup.UUCP (J.S.Shapiro) writes:
>In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
>> In <14888@topaz.rutgers.edu>, hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
>> 
>> > What you really want is shared libraries....
>> 
>> Please explain more about shared libraries.
>
>Side observation: If your binary is 500K, shared libraries don't help much.
>They just don't represent a significant portion of your code. If your
>binary is really that big, you probably have a lot of rethinking to do....

Well, don't forget about shared libraries used by, say, suntools/windows/....
These kinds of "toolkit" things can quickly grow a program past a megabyte....

-------------------------------------------------------------------------------
Any opinions I have are most certainly those generated by the infinite
improbability generator.  To get better ones, you have to provide me with
a hotter cup of tea.  Just don't tell the nutrimatic drink dispenser....

Jay
decvax!cg-atla!batson

firth@sei.cmu.edu (Robert Firth) (09/24/87)

In article <12780006@acf4.UUCP> tihor@acf4.UUCP (Stephen Tihor) writes:
>An excellent summary on shared libraries, but I would not agree that Position
>Independent Code is difficult to produce. 95+% of the code I produce is PIC,
>and with a more clever linker (as one would expect in that environment) it
>could get higher.  Only systems whose instruction set was not designed with PIC
>as a goal make it hard.

I'd agree with this.  PIC is either easy or impossible, but rarely hard.
For instance, if your machine has only conditional branches with absolute
addresses as the operand, then it's impossible.

Taking the PDP-11 as another example, I added a PIC option to a
code generator with about one week's work, including testing &
documentation of the differences in the generated code.  Under
this option, code was between 2% and 10% bigger, the last an extreme
case where a lot of addresses of static read-only variables were
being generated.  PIC for the CA NM4 was designed in an afternoon
and it took the implementor about 3 days to put it in.

This is such a useful technique that I think every machine ought
to support position-independent code.  Really all you need are
PC-relative control transfers, global base registers, and for
efficiency a load-address instruction.

daveb@llama.UUCP (09/24/87)

In article <2498@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
]In <417@devvax.JPL.NASA.GOV>, des@jplpro.JPL.NASA.GOV (David Smyth) writes:
]] But on these demand paged Suns, we see such stupidity as:
]] -rwxr-xr-x  1 root       122880 Sep 15  1986 /bin/csh*
]] -rwxr-xr-x  1 david      737280 Aug 14 16:34 bin/shelltool*
]] 
]] where most of the huge increase in size of shelltool is in the window
]] management stuff.  Isn't this stuff shared across all sorts of things
]] in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?
]
]From what I've read about the Macintosh (I don't own one), it appears
]they did the right thing by putting the window management tools into
]ROM.  I sure wish Sun would do something like that...

For Sun, this will be fixed in the forthcoming SunOS 4.0 release with
their shared library support.  SVr3 also has this feature.  It is a real
win with almost no drawbacks when done with OS support.  Jeez Louise,
even _VMS_ has had them for years.  The only complication is version
control, and it's not _that_ hard to do.

It's best when users can define and install their own libraries too.

-dB
"If it was easy, we'd hire people cheaper than you to do it"
{amdahl, cbosgd, mtxinu, ptsfa, sun}!rtech!daveb daveb@rtech.uucp

aglew@ccvaxa.UUCP (09/24/87)

...> Position Independent Code

Note that PC relative addressing is not a strict requirement
for Position Independent Code.

Many fast machines cannot generate PCs quickly enough to be used in
addressing without a delay - that is, they cannot generate a PC
accurate to byte, word, or doubleword quickly enough, and the rules
for the inaccuracy are probabilistic.
	However, they may be able to generate, say, the page number
that the PC is in.

With this type of addressing you get Position Independent Code to
a certain granularity - i.e., you can vary addresses by a multiple of
pages, but not within a page. Often this is good enough.

peter@sugar.UUCP (Peter da Silva) (09/25/87)

In article <2498@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> In <417@devvax.JPL.NASA.GOV>, des@jplpro.JPL.NASA.GOV (David Smyth) writes:
> > management stuff.  Isn't this stuff shared across all sorts of things
> > in SunWindows?  If so, why is EVERY "tool" so VERY slow loading?
> From what I've read about the Macintosh (I don't own one), it appears
> they did the right thing by putting the window management tools into
> ROM.  I sure wish Sun would do something like that.

Shared libraries don't have to be in ROM. On the Amiga they do rather well
being loaded when needed and flushed when memory gets low. The semantics
of shared libraries under UNIX are difficult to see. Does anyone have any
ideas about how you can share stuff like that? Making windows first-class
virtual devices and sending them escape sequences sounds like a win. I know
X-windows does that, but it's huge. Extended ANSI codes or maybe even
Tektronix graphics codes would probably cut it. Amiga Intuition provides
such an interface for text applications.
-- 
-- Peter da Silva `-_-' ...!hoptoad!academ!uhnix1!sugar!peter
--                 'U`  Have you hugged your wolf today?
-- Disclaimer: These aren't mere opinions... these are *values*.

guy%gorodish@Sun.COM (Guy Harris) (09/25/87)

> The semantics of shared libraries under UNIX are difficult to see.

Why?  There have been several implementations on various UNIX systems.

> Does anyone have any ideas about how you can share stuff like that?

Yes.  See:

	Chapter 8, "Shared Libraries", in the S5R3 Programmer's Guide;

	(Here it comes again, for those of you who haven't been paying
	attention) "Shared Libraries in SunOS", in the proceedings of the
	Summer 1987 USENIX in Phoenix.

There have been other implementations as well.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

jg@jumbo.dec.com (Jim Gettys) (09/25/87)

In article <818@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>Shared libraries don't have to be in ROM. On the Amiga they do rather well
>being loaded when needed and flushed when memory gets low. The semantics
>of shared libraries under UNIX are difficult to see. Does anyone have any
>ideas about how you can share stuff like that? Making windows first-class
>virtual devices and sending them escape sequences sounds like a win. I know
>X-windows does that, but it's huge. Extended ANSI codes or maybe even
>Tektronix graphics codes would probably cut it. Amiga Intuition provides
>such an interface for text applications.

The X window system does NOT make a window a virtual device, or use
escape sequences for communications.

X is a network server process, and clients (applications) open a single
connection (essentially any stream protocol will do) to the X server.  
Over that connection, the X protocol is spoken, which is a binary
special-purpose protocol, not an escape sequence.  Over a single
connection, you can manipulate as many windows as you want.

X applications are quite small, since all of the basic windowing and
display code is encapsulated in the server.  Applications range
from 45k to 210k (full mail user interface).  The V10 server has
been ported to everything from an IBM PC/AT, on up...  These sizes
are much smaller than many other window systems.
				Jim Gettys

meissner@xyzzy.UUCP (Michael Meissner) (09/28/87)

In article <2614@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (Robert Firth) writes:
| In article <12780006@acf4.UUCP> tihor@acf4.UUCP (Stephen Tihor) writes:
| >An excellent summary on shared libraries, but I would not agree that Position
| >Independent Code is difficult to produce. 95+% of the code I produce is PIC,
| >and with a more clever linker (as one would expect in that environment) it
| >could get higher.  Only systems whose instruction set was not designed with PIC
| >as a goal make it hard.
| 
| I'd agree with this.  PIC is either easy or impossible, but rarely hard.
| For instance, if your machine has only conditional branches with absolute
| addresses as the operand, then it's impossible.

One of the things that is hard about position independent code is dealing
with the initialization of pointers to static/extern data items, which is
typically done at link time.  You have to have some way of getting the
right value into the pointer before it is used.  In this respect, System V
shared libraries are easier because they are not position independent.
-- 
Michael Meissner, Data General.		Uucp: ...!mcnc!rti!xyzzy!meissner
					Arpa/Csnet:  meissner@dg-rtp.DG.COM

stachour@umn-cs.UUCP (09/29/87)

In article <2067@sfsup.UUCP>, shap@sfsup.UUCP (J.S.Shapiro) writes:
> In article <2501@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> > In <14888@topaz.rutgers.edu>,
> >       hedrick@topaz.rutgers.edu (Charles Hedrick) sez:
> > 
> > > What you really want is shared libraries.  That way, only one copy
> > > of the code is shared by all programs that use it, but you can
> > > change it.
> > 
> > Please explain more about shared libraries.
> 
> Okay, here goes. I have stayed out of this, but shared libraries I can talk
> about intelligently. Basically a shared library is a piece of code which
> is "shared" between two programs. A portion of the address space is
> reserved in advance by *everyone* for each shared library (that is, the
> shared library has a permanent reserved location in the virtual address
> space). Then, whoever needs the functionality in the shared library simply
> compiles as usual, linking in the shared version of the library instead of
> the normal version. As a result, a marker is put in the binary indicating
> which (if any) shared libraries need to be hauled in. If the marker is
> there, exec() arranges for the shared library to get mapped into your
> address space.
> 
No, reserved address space is needed only on silly machines or under
silly operating systems. Sharing should be by name, such as
mail_system_$get_mail_message, and not by some pre-bound address set.

> ... (stuff deleted) ...        In some implementations (depends
> on your hardware), the jump table points to a stub routine which
> backpatches the "real" address of your function into your code. This has
> the advantage that you only incur the shared library overhead once per
> function, but the disadvantage that you can no longer page in those pages
> from the executable - they now have to go to the paging area.

No, it should go indirectly through a linkage area specific to your process.
Your code, and all code, should remain read-only and shared among
all processes that use it.
> 
> Only one copy of the shared library text is kept in core for all users. It
> is simply mapped into all of the appropriate virtual address spaces. 

Yes, only one copy, even if the shared library sometimes runs
with different privileges.  Note that most hardware architectures
force addressing schemes under which one cannot write shared code,
since it cannot run in multiple modes.  Even when the hardware is OK,
often the operating system's memory-management mechanism (as in IBM's
now-superseded OS/MVT) precludes it.

> ... (more deleted)           Unfortunately, position independent
> code is quite difficult to do, which is why current UNIX compilers (to my
> knowledge) don't do it. This scheme is referred to as "dynamic loading."

No, position-independent code is quite easy to do.  It's been done by
the GE Multics EPL and PL/I compilers for around 20 years. [For historians,
I personally consider 'C' as a cross between untyped 'B' and a subset
of the EPL subset of PL/I.]   By the way, what one really wants/needs
is dynamic-linking, not dynamic-loading.

>  ... (more deleted)
> Side observation: If your binary is 500K, shared libraries don't help much.
> They just don't represent a significant portion of your code. If your
> binary is really that big, you probably have a lot of rethinking to do, and
> ultimately this rethought will be reflected in better performance, greater
> flexibility, and lower maintainance cost.

But your own code should be automatically shared as well, and others
should be able to use the "object-managers" that you have written
without having to put those managers into their own code.

For those wishing to 'really' understand shared code, I recommend
E. I. Organick's book on the design of the Multics system
("The Multics System: An Examination of Its Structure").
It tells how sharing (through real dynamic linking)
is done on a system that was designed from the beginning for shared,
reliable (utility-grade, as good as the telephone or power company)
computing, and which was designed to make it easy to build good software
(an explicit goal of hardly any other system).
It's an 'ancient' book, but still more complete
on the subject than any other I know.

Spoiler-Warning:  If you don't know much about hardware instruction-set
architectures, and/or programming-language run-time needs,
you may not be able to understand this book.


Paul Stachour
Honeywell SCTC:  Stachour@HI-Multics.ARPA
Univ of Minn:    stachour at umn-cs.edu

daveb@geac.UUCP (Brown) (09/29/87)

In article <275@xyzzy.UUCP> meissner@nightmare.UUCP (Michael Meissner) writes:
>One of the things that is hard about position independent code is dealing
>with the initialization of pointers to static/extern data items, which is
>typically done at link time.  You have to have some way of getting the
>right value into the pointer before it is used.  

  Well, you delay some portion of linking 'til run-time.  If we assume that
externs are in the data segment, you link with a unique segment number
which at run-time turns out to mean "pages 23-29" to your paging
hardware. 
  The hard part is figuring out the data structures to pass to the
program loader (and to generate in the linker) so that gets done
cleanly and in finite time.

  Multics used a very simple scheme: an address had a segment number
(no, not the same as IBM-PC segments) and an offset.  When an address
was evaluated, the segment number was used as an index into an
on-board translate table which yielded a page address, and if it
wasn't there a page fault occurred.
    +-----+--------+
    | seg | offset | -------> address in page/page-sequence
    +-----+--------+
       \
        --------------------> index into page-tables

  If you can define how your addresses get interpreted (i.e., you're a
machine designer), you can set this up and migrate your customer base
to it by providing a new linker, loader, and "magic number" for the new
executables.  They won't run on *old* machines, of course...

 --daveb

ps: this *is* idealized, you understand.
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

peter@sugar.UUCP (10/03/87)

Apologies. I had gotten the impression that X was a sort of super-escape-
sequence protocol. Probably because I still haven't been able to get hold
of any sort of description of the system unless I want to fork out a hundred
bucks for the full tape.

I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
standard RAM wouldn't support it. How big's the server, really?
-- 
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These U aren't mere opinions... these are *values*.

scc@cl.cam.ac.uk (Stephen Crawley) (10/04/87)

All this talk about huge programs being bad and shared libraries being
good, and Jim Gettys' mention of X servers and X applications, prompted 
me to take a look at our X10 server for XDE (i.e. for a D machine).

The size of the XServerConfig.bcd file (the load image) is 27,648 bytes.  

Admittedly, it is not quite a complete implementation of X10.4 yet.  
On the other hand, it includes the code for a pinger tool to start 
an xterm on a remote machine.

So why is it so small?  Mainly I think it is because it uses the XDE 
runtime library for almost everything.  The only significant exception
is in the code for painting the bitmap, where the implementor found
that X's model of bit painting didn't map efficiently onto the XDE
library routines.

A possibly relevant statistic: the source code for the complete server
is 7708 lines of Mesa code.

-- Steve

"My other machine's a Dorado" ... wishful thinking!

jg@jumbo.dec.com (Jim Gettys) (10/05/87)

In article <856@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>Apologies. I had gotten the impression that X was a sort of super-escape-
>sequence protocol. Probably because I still haven't been able to get hold
>of any sort of description of the system unless I want to fork out a hundred
>bucks for the full tape.
There is a paper in Transactions on Graphics which came out a couple
months ago about X.  You might look there.

>I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
>standard RAM wouldn't support it. How big's the server, really?

Here are the server sizes under Version 11, on a Vax.
text    data    bss     dec     hex
335872  32768   21268   389908  5f314   Xqdss  (server for GPX color display)
297984  20480   17132   335596  51eec   Xqvss  (server for straight bitmap)

The server on my GPX, after running for some days, has grown to 778k.
Note that there are a number of ways to make server dynamic data sizes
much smaller; the server as distributed is optimized for speed
and simplicity.  At the cost of a bit of work, the size of data used
dynamically can be shrunk a lot.

Neither of these implementations has been optimized yet; both are
mostly based on a complete set of machine-independent graphics
code which comes with the distribution.  Much of this code may be
unneeded on machines which have graphics code in ROM, or in hardware,
if the hardware or ROM can perform the correct operations.  I expect
many production servers to be smaller than the above.
				Jim Gettys

raveling@vaxa.isi.edu (Paul Raveling) (10/05/87)

In article <856@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>
>I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
>standard RAM wouldn't support it. How big's the server, really?



Here are some snapshots of X10.4 display server memory sizes on
an HP 9000/350.  These are as reported by a monitor program,
except for translating from pages to bytes and reformatting to
make the results more readable.

Note that this is only display server memory use -- it doesn't include
clients.



X10.4 Topcat display server, serving typical client set:

	Text	245,760		Data	 114,688	Stack	12,288

	Current resident set size:        118,784
	Maximum resident set size:        364,554


X10.4 Renaissance display server, minimal client load:

	Text	270,336		Data	 225,280	Stack	12,288

	Current resident set size:        233,472
	Maximum resident set size:        503,808


X10.4 Renaissance display server, large pixmaps stored in server:

	Text	270,336		Data	5,177,344	Stack	12,288

	Current resident set size:      5,185,536
	Maximum resident set size:      5,455,872

---------------------
Paul Raveling
Raveling@vaxa.isi.edu

jim@ci-dandelion.UUCP (Jim Fulton) (10/07/87)

In article <856@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>
>I have also been told that X was huge... as in, an Atari 1040ST with a MEG of
>standard RAM wouldn't support it. How big's the server, really?

The first non-Unix port of the X Window System server was, in fact, to an IBM
PC/AT running MS-DOS.  Looking at the development sources, I see

        C> dir pcx.exe
        
         Volume in drive C is PCSOURCES
         Directory of  C:\SRC\X\MAIN
        
        PCX      EXE   241696   9-29-87   8:11a


The linker gives me the following sizes for this image:

	code		data		Udata
	207964		20402		19418


PCX loads all of its device drivers (ethernet, high-res smart color graphics
card, mouse/tablet/etc.) when it starts up, so there's no code hidden in DOS.
If you have an expanded memory card it will use that for font bitmaps,
otherwise it uses system memory.  There is also a TFTP server in there so that
you can transfer files to and from the PC while running X. 

The version that we ship with our Mechanical Computer Aided Engineering product
also has bound into it extension packages for doing sophisticated dynamic
feedback, geometric animation, and menu display.  Our User Interface Management
System uses roughly 2 dozen fonts and exercises even our high-end servers.

X.V10 runs just fine on small machines.

                                                           Jim Fulton
                                                           Cognition Inc.
                                                           900 Tech Park Drive
uucp:    ...!{mit-eddie,talcott,necntc}!ci-dandelion!jim   Billerica, MA  01821
domain:    jim@ci-dandelion.ci.com, fulton@eddie.mit.edu   (617) 667-4800