[comp.unix.internals] Fundamental defect of the concept of shared libraries

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/16/91)

In article <1991May16.002617.15386@ladc.bull.com>
	fmayhar@hermes.ladc.bull.com writes:

>There may actually not be any "right" implementations extant at the
>moment (this is debatable), but that's not the point.

Without any facts, your claim is nothing.

Let's see what's wrong with shared libraries.

>-> Indirect jumps and accompanied process private data for the jump table.
>
>So what would be a better way to do it?
>
>Really, there's a tradeoff between the utility of shared libraries and
>efficiency.

Efficiency is only one aspect of the problem.

To share libraries, they should be:

	1) coded position independently (PIC)

or

	2) assigned a static virtual address

If we take 1), the hardware architecture must support PC-relative jumps,
of course. Moreover, to access library private data, it must also
address data PC-relative. Aside from efficiency, not all architectures
support this.

Note that library private data is unavoidable if calls between
libraries are to be supported position-independently.

Even worse, with some architectures, it is impossible to map several virtual
addresses to a physical address. Virtually tagged caches and inverted
page tables are notable examples.

If we take 2), even if you have enough address space to map all libraries
(32 bits is obviously not enough; I even think 48 bits is not), it will
be a nightmare to maintain consistency. Different libraries must
have different addresses, of course, which is already non-trivial.

Moreover, compatible libraries must have the same address, and the scheme
to guarantee that will be very complex, if it exists at all.

Even worse, if one program is linked with libraries A.0 and B.0 and
another program is linked with libraries A.0 and B.1 (an upgraded version
of B.0), and a function in A.0 calls a function in B.*, it can't. As
a workaround, we can have two versions of A.0: A.0.B.0 and A.0.B.1.
Thus, as the number of kinds of libraries increases, the number of
libraries and the required storage grow exponentially.

I hope you can now understand how complex the shared library is.

The fundamental solution is, of course, not to have shared libraries.

						Masataka Ohta

krey@ira.uka.de (Andreas Krey) (05/16/91)

In article <197@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
[about shared libraries, not always completely correct]
|> 
|> I hope you can now understand how complex the shared library is.
|> 
|> The fundamental solution is, of course, not to have shared libraries.
|> 
|> 						Masataka Ohta

We now all see how complex computers are.

The fundamental solution is, of course, not to have computers.


And why do you share text pages of statically linked programs? It seems to
be a similar problem, and it unnecessarily complicates operating systems.
Do you ever run more than one instance of any program at once?

-- 
Andy

barmar@think.com (Barry Margolin) (05/17/91)

In article <197@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>To share libraries, they should be:
>	1) coded position independently (PIC)
>or
>	2) assigned a static virtual address
>
>If we take 1), the hardware architecture must support PC relative jump,
>of course. Moreover, to access library private data, it must also
>address data PC relative. Aside from efficiency, not all architectures
>support this.

You don't *have* to have PC-relative jumps and data access, although it is
convenient.  The Multics compiler uses it when it can, but I think
PC-relative instructions have a relatively small limit on the offset.

When PC-relative addressing isn't available or usable, you just need
register+offset addressing, which most computers have.  On Multics, one of
the pointer registers by convention holds the address of the base of the
currently executing segment, and PIC simply offsets from this.  On a Unix
system, it would simply be a pointer to the location where the library is
mmap'ed.  The only tricky part is arranging for the register to be set
whenever an inter-module call or return takes place.

>Even worse, with some architectures, it is impossible to map several virtual
>addresses to a physical address. Virtually tagged caches and inverted
>page tables are notable examples.

Well, this kills any kind of shared text architecture, not just shared
libraries.
-- 
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

mcnally@wsl.dec.com (Mike McNally) (05/17/91)

In article <197@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
|> Efficiency is only one aspect of the problem.
|> 
|> To share libraries, they should be:
|> 
|> 	1) coded position independently (PIC)
|> 
|> or
|> 
|> 	2) assigned a static virtual address
|> 
|> If we take 1), the hardware architecture must support PC relative jump,
|> of course.

No No No No No No No.  All that's needed is indirect jump.  Pull your head 
out before being so dogmatic.

|> Moreover, to access library private data, it must also
|> address data PC relative. Aside from efficiency, not all architectures
|> support this.

Once again, indirection is all that's needed.

|> Note that, library private data is inevitable to support calls between
|> libraries, position independently.

So?

|> Even worse, with some architectures, it is impossible to map several virtual
|> addresses to a physical address. Virtually tagged caches and inverted
|> page tables are notable examples.

OK fine.  Maybe we shouldn't have inter-process memory protection since not
all architectures support it.  Hell, better dump floating-point too, since my
8085 machine at home doesn't have it.

|> 
|> If we take 2), even if you have enough address space to map all libraries
|> (32 bits is obviously not enough, I even think 48 bits is not), it will
|> be a nightmare to maintain consistency. Different libraries must
|> have different addresses, of course, which is already non-trivial.

Gee, Masataka, maybe you should re-state your argument:

	"*I* don't know how to solve the problems of shared libraries
	to *my own* satisfaction based on *my own* dogmatic criteria,
	and so *I* won't implement shared libraries, nor will I touch
	any system which uses them."

Sheesh.

|> 
|> Moreover, compatible libraries must have the same address, whose scheme
|> will be very complex, even though it exists.

What does this mean?

|> 
|> Even worse, if a program is linked with libraries A.0 and B.0 and the
|> other program is linked with libraries A.0 and B.1 (an upgraded version
|> of B.0) and a function in A.0 calls a function in B.*, it can't.

Why not?  Granted, both versions of B will have to be loaded, but "can't"?
I ask you to "prove" that; please try to be a bit more rigorous when giving
proofs, too.  It's not enough to say "I propose P, I proved P, QED."

|> I hope you can now understand how complex the shared library is.

I now understand that you don't know how to implement shared libraries.

-- 
* "In the Spirit as my automatics,  *                              Mike McNally
* Lystra and Zelda were one third   *                                    Coolie
* as large as the infinite Cosmos." *                  DEC Western Software Lab
*              --- D. S. Ashwander  *                       mcnally@wsl.dec.com 

jeremy@sw.oz.au (Jeremy Fitzhardinge) (05/17/91)

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

>In article <1991May16.002617.15386@ladc.bull.com>
>	fmayhar@hermes.ladc.bull.com writes:
>
>>There may actually not be any "right" implementations extant at the
>>moment (this is debatable), but that's not the point.
>
>Without any fact, your claim is nothing.
>
>Let's see what's wrong with shared libraries.
>
>>-> Indirect jumps and accompanied process private data for the jump table.
>>
>>So what would be a better way to do it?
>>
>>Really, there's a tradeoff between the utility of shared libraries and
>>efficiency.
>
>Efficiency is only one aspect of the problem.
>
>To share libraries, they should be:
>
>	1) coded position independently (PIC)
>
>or
>
>	2) assigned a static virtual address
>
>If we take 2), even if you have enough address space to map all libraries
>(32 bits is obviously not enough, I even think 48 bits is not), it will
>be a nightmare to maintain consistency. Different libraries must
>have different addresses, of course, which is already non-trivial.
>
>Moreover, compatible libraries must have the same address, whose scheme
>will be very complex, even though it exists.
>
>Even worse, if a program is linked with libraries A.0 and B.0 and the
>other program is linked with libraries A.0 and B.1 (an upgraded version
>of B.0) and a function in A.0 calls a function in B.*, it can't. As
>a workaround, we can have two versions of A.0: A.0.B.0 and A.0.B.1.
>Thus, with the increase of number of kind of libraries, the number of
>libraries and required storage grows exponentially.

Why not relocate a library to a virtual address when it is loaded, so
that you don't need to assign an address when the library is made.  It
is quite compact and efficient to store relocation information and apply
it as you load.  An executable could use some key to tell the OS what
library it needs, and the OS returns the virtual address of the library,
either just loaded or already loaded for something else.  The program can
then relocate its library calls to that address.  The library could
either use a jump table or have real symbolic information (which is
nicer, I think).

>I hope you can now understand how complex the shared library is.
>
>The fundamental solution is, of course, not to have shared libraries.

Multitasking gets pretty complex too - is it worth the effort?

--
jeremy@softway.sw.oz.au ph:+61 2 698 2322-x122 fax:+61 2 699 9174
"Hi Barbie, I'm your plastique surgeon, Roger.  Are you ready for your
 Semtex augmentation?"... "John Thompson died for you" society meets now.
I opine for the fjords, nothing else.

mikes@ingres.com (Mike Schilling) (05/17/91)

From article <197@titccy.cc.titech.ac.jp>, by mohta@necom830.cc.titech.ac.jp (Masataka Ohta):
> ...
> I hope you can now understand how complex the shared library is.
> 
> The fundamental solution is, of course, not to have shared libraries.
> 
> 						Masataka Ohta
Since VMS has had very functional shared libraries for over 10 years now, 
I have to consider this an overstatement.

Mike
----------------------------------------------------------------------------
mikes@rtech.com = Mike Schilling, ASK Corporation, Alameda, CA
Just machines that make big decisions,
Programmed by fellows with compassion and vision.	-- Donald Fagen, "IGY"

mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (05/18/91)

In article <197@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
   >There may actually not be any "right" implementations extant at the
   >moment (this is debatable), but that's not the point.

   Without any fact, your claim is nothing.

Neither is yours.

   If we take 1), the hardware architecture must support PC relative jump,
   of course. Moreover, to access library private data, it must also
   address data PC relative. Aside from efficiency, not all architectures
   support this.

   Even worse, with some architectures, it is impossible to map several virtual
   addresses to a physical address. Virtually tagged caches and inverted
   page tables are notable examples.

So some architectures can't support shared libraries? Well, don't put
shared libraries on them. Some architectures can't support demand-paged
memory, or virtual address spaces, or preemptive scheduling. Does that
mean we have to live without them on machines that can support them?
No; it doesn't.

   I hope you can now understand how complex the shared library is.

No, I understand that you aren't qualified to do systems design work.
Using your logic, I can show that you can't do any of the things I
mentioned above "correctly".  They are still useful in lots of
places.  The solution is not to "just not do them;" the solution is to
understand them and the various implementations, to know the
tradeoffs involved in using those implementations, and to use them
where appropriate.

	<mike
--
Kiss me with your mouth.				Mike Meyer
Your love is better than wine.				mwm@pa.dec.com
But wine is all I have.					decwrl!mwm
Will your love ever be mine?

guy@auspex.auspex.com (Guy Harris) (05/20/91)

>If we take 1), the hardware architecture must support PC relative jump,
>of course. Moreover, to access library private data, it must also
>address data PC relative. Aside from efficiency, not all architectures
>support this.

Are there any architectures of interest in this discussion that can't
support PC-relative references?

>Even worse, with some architectures, it is impossible to map several virtual
>addresses to a physical address. Virtually tagged caches and inverted
>page tables are notable examples.

If you believe that a system with a virtual-address cache, or a system
with inverted page tables, cannot map several virtual addresses to a
physical address, you're wrong.  Proof by counterexample:

	1) various flavors of Suns with virtual address caches, which
	   all support mapping several virtual addresses to a physical
	   address;

	2) the IBM ROMP and RIOS architectures, which have inverted page
	   tables and support mapping several virtual addresses to a
	   physical address.

They may have to go through some amount of pain to do so, but they *do*
manage to do it.

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/20/91)

In article <1991May16.200702.7476@Think.COM> barmar@think.com writes:

>>	1) coded position independently (PIC)

>You don't *have* to have PC-relative jumps and data access, although it is
>convenient.

No, I don't have to, but it is very inconvenient not to do so.

>When PC-relative addressing isn't available or usable, you just need
>register+offset addressing, which most computers have.

I was wrong here; yes, it is possible if we use indirect addressing to
access global data, but it is slow.

>The only tricky part is arranging for the register to be set
>whenever an inter-module call or return takes place.

The call overhead is six extra cycles with typical RISCs, whenever an
inter-object-file (not inter-library) call-return takes place. It is
not negligible when we are heavily doing something like strcmp().

As I re-read the discussion, someone mentioned the possibility of a speed-up
by a globally optimizing compiler.  But that is unfair. With the same
amount of optimization, statically linked libraries can have better
optimization, such as in-lining. You may remember that the speed of Bnews
was actually improved by in-lining the first part of strcmp(). In-lining
of functions in shared libraries is, of course, impossible.

>>Even worse, with some architectures, it is impossible to map several virtual
>>addresses to a physical address. Virtually tagged cache and inverted
>>page tables are notable examples.

>Well, this kills any kind of shared text architecture, not just shared
>libraries.

You can always share text as the usual UNIX box does, because that only
requires mapping a *single* virtual address of several *different
processes* to a physical address.

							Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/20/91)

In article <MWM.91May17132439@raven.pa.dec.com>
	mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:

>   Even worse, with some architectures, it is impossible to map several virtual
>   addresses to a physical address. Virtually tagged cache and inverted
>   page tables are notable examples.

>So some architectures can't support shared libraries? Well, don't put
>shared libraries on them.

That's what I am saying.

>Some architectures can't support demand
>paged memory, or virtual address spaces, or preemptive scheduling.
>Does that mean we have to live without them on machines that can
>support them?  No; it doesn't.

You don't know enough about hardware. Because address translation is time
consuming, fast cache is always indexed by virtual address. These days,
virtually indexed cache is quite common.

So, if you want shared libraries, you can put them only on slower machines.

>No, I understand that you aren't qualified to do systems design work.

You understand nothing.

As you don't know enough about hardware, you aren't qualified to do
systems design work.

						Masataka Ohta

jfh@rpp386.cactus.org (John F Haugh II) (05/20/91)

In article <209@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>The call overhead is six extra cycles with typical RISCs, whenever an
>inter-object-file (not inter-library) call-return takes place. It is
>not negligible when we are heavily doing something like strcmp().

The CPU overhead to field an unneeded page fault caused by too many
statically bound executables will dominate your little 6 cycle hit
the first time it happens.  Trust me.  I'd rather have a slightly
slowed down, CPU bound process, than a system thrashing about all
day and night because it doesn't support shared libraries.
-- 
John F. Haugh II        | Distribution to  | UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 255-8251 | GEnie PROHIBITED :-) |  Domain: jfh@rpp386.cactus.org
"If liberals interpreted the 2nd Amendment the same way they interpret the
 rest of the Constitution, gun ownership would be mandatory."

mcnally@wsl.dec.com (Mike McNally) (05/20/91)

In article <213@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
 
|> You don't know about hardware enough. Because address translation is time
|> consuming, fast cache is always indexed by virtual address. These days,
|> virtually indexed cache is quite common.
|> 
|> So, if you want shared libraries, you can put it only on slower machines.

How about MIPS R3000/R4000?  Maybe that's not fast enough.



- 
* "In the Spirit as my automatics,  *                              Mike McNally
* Lystra and Zelda were one third   *                                    Coolie
* as large as the infinite Cosmos." *                  DEC Western Software Lab
*              --- D. S. Ashwander  *                       mcnally@wsl.dec.com 

mjr@hussar.dco.dec.com (Marcus J. Ranum) (05/20/91)

	This argument about shared libraries has gotten ridiculous. Let's
be sensible about this. Does anyone have pointers to any hard numbers that
might shed some light on the performance impact/benefits under a reasonable
workload?

	We could argue forever, which is silly.

mjr.

shore@theory.TC.Cornell.EDU (Melinda Shore) (05/21/91)

[]
In the proceedings of the Summer 1990 Usenix Conference (Anaheim) there
are two papers describing different implementations of shared libraries.
Both papers present results.  Both papers conclude that for programs not
dominated by startup costs, the costs of dynamic loading are usually
insignificant (obvious tautology ... ).  Donn Seeley's paper is
particularly relevant, in that he's arguing that it is possible to
have a shared library implementation that is both simple and fast.
You just have to know what you're doing.
-- 
                    Software longa, hardware brevis
Melinda Shore - Cornell Information Technologies - shore@theory.tn.cornell.edu

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/21/91)

In article <7916@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>Are there any architectures of interest in this discussion that can't
>support PC-relative references?

R3000.

>If you believe that a system with a virtual-address cache, or a system
>with inverted page tables, cannot map several virtual addresses to a
>physical address, you're wrong.

I am correct. It can't map them.

>Proof by counterexample:

>They may have to go through some amount of pain to do so, but they *do*
>manage to do it.

As the architecture cannot map them, a possible workaround is to flush
the cache/page-table in software at context switch.

It will slow down context switches. That is, if you have heavily interactive
processes, as is often the case with window systems, the performance
degradation will be large.

						Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/21/91)

In article <1991May20.090857@wsl.dec.com> mcnally@wsl.dec.com writes:

>|> You don't know about hardware enough. Because address translation is time
>|> consuming, fast cache is always indexed by virtual address. These days,
>|> virtually indexed cache is quite common.

>|> So, if you want shared libraries, you can put it only on slower machines.

>How about MIPS R3000/R4000?  Maybe that's not fast enough.

The primary cache of R4000 is virtually indexed and physically tagged.
That is, it can't map different virtual addresses to a physical address.

							Masataka Ohta

mcnally@wsl.dec.com (Mike McNally) (05/21/91)

In article <216@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
|> In article <1991May20.090857@wsl.dec.com> mcnally@wsl.dec.com writes:
|> >|> So, if you want shared libraries, you can put it only on slower machines.
|> 
|> >How about MIPS R3000/R4000?  Maybe that's not fast enough.
|> 
|> The primary cache of R4000 is virtually indexed and physically tagged.
|> That is, it can't map different virtual addresses to a physical address.

Then I'm working on a project that I can't do, according to your "rule".  Or,
you consider the R3000/R4000 to be slow.  I won't claim it's the fastest
CPU in the world, but I don't know of too many reasonable people who'd say
it's slow.

-- 
* "In the Spirit as my automatics,  *                              Mike McNally
* Lystra and Zelda were one third   *                                    Coolie
* as large as the infinite Cosmos." *                  DEC Western Software Lab
*              --- D. S. Ashwander  *                       mcnally@wsl.dec.com 

fmayhar@hermes.ladc.bull.com (Frank Mayhar) (05/21/91)

In article <197@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
-> In article <1991May16.002617.15386@ladc.bull.com> I write:
-> >There may actually not be any "right" implementations extant at the
-> >moment (this is debatable), but that's not the point.
-> Without any fact, your claim is nothing.

Let's see _you_ start including "facts" in your postings, then.  So far all
I've seen is supposition and unsupported assertions.

-> Efficiency is only one aspect of the problem.

Not true.  Regardless of the complexity of the implementation, the only
real tradeoff is between efficiency and utility.  See my previous posts
regarding this.

-> To share libraries, they should be:
-> 	1) coded position independently (PIC)
-> or
-> 	2) assigned a static virtual address

Granted, more or less.

-> If we take 1), the hardware architecture must support PC relative jump,
-> of course. Moreover, to access library private data, it must also
-> address data PC relative.

Not necessarily.  While it may well need PC-relative transfers, data addressing
may use a different mechanism (probably also register-relative, but almost
certainly not PC-relative).

-> Aside from efficiency, not all architectures support this.

Examples?  Certainly the 680x0 and the 80x86 support this, as well as most
mainframe architectures.  I expect that almost any architecture would support
this.  I mean, what's the difference between PC-relative addressing and any
other kind of register-relative addressing?  And, as has already been stated,
PC-relative jumps aren't essential; other forms of indirect transfers work
as well.

-> Note that, library private data is inevitable to support calls between
-> libraries, position independently.

Not inevitable.  It depends on the implementation; I can certainly imagine
an implementation that supports inter-library calls via use of automatic
storage ("stack" space).  While technically this is "library private data,"
it doesn't have the implementation complexities that static storage does.

-> Even worse, with some architectures, it is impossible to map several virtual
-> addresses to a physical address. Virtually tagged caches and inverted
-> page tables are notable examples.

Perhaps these architectures aren't suitable for shared libraries.  And, as
Barry Margolin said, if this is the case, _any_ kind of text sharing is
dead.  IMHO, though, the concept of virtual memory implies the ability to
map a physical address to several virtual addresses.  Show me one that doesn't
allow this, and I'll show you one that is almost useless for modern computing
purposes. 

-> If we take 2), even if you have enough address space to map all libraries
-> (32 bits is obviously not enough, I even think 48 bits is not), it will
-> be a nightmare to maintain consistency. Different libraries must
-> have different addresses, of course, which is already non-trivial.

How is 32 bits "obviously" not enough?  Four gigabytes of address space isn't
enough?  How big _are_ your programs, anyway?  I agree that solving such
problems is nontrivial, but that doesn't mean that they aren't worthwhile.

-> Moreover, compatible libraries must have the same address, whose scheme
-> will be very complex, even though it exists.

I don't understand this.  Your English is somewhat mangled.  Care to explain?

-> Even worse, if a program is linked with libraries A.0 and B.0 and the
-> other program is linked with libraries A.0 and B.1 (an upgraded version
-> of B.0) and a function in A.0 calls a function in B.*, it can't.

Why not?  Seems to me that it would depend on the context of the call:  if
the call is happening in the first program, the call would be to B.0; if
in the second, the call would be to B.1.  Regardless, this is an implementation
issue.  Likely nontrivial, but solvable.

-> As
-> a workaround, we can have two versions of A.0: A.0.B.0 and A.0.B.1.
-> Thus, with the increase of number of kind of libraries, the number of
-> libraries and required storage grows exponentially.

The above scheme would avoid this problem.

-> I hope you can now understand how complex the shared library is.

Oh, I quite understand that shared library _implementations_ are complex.
I hope that you can now understand that shared libraries are often _useful_,
regardless of the complexity of the implementation.

-> The fundamental solution is, of course, not to have shared libraries.

The fundamental solution is, of course, not to engage in pointless religious
arguments with dogmatic individuals who are closed to any ideas not their own.
-- 
Frank Mayhar  fmayhar@hermes.ladc.bull.com (..!{uunet,hacgate}!ladcgw!fmayhar)
              Bull HN Information Systems Inc.  Los Angeles Development Center
              5250 W. Century Blvd., LA, CA  90045    Phone:  (213) 216-6241

tonys@pyra.co.uk (Tony Shaughnessy) (05/21/91)

In article <216@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <1991May20.090857@wsl.dec.com> mcnally@wsl.dec.com writes:
>
>>|> You don't know about hardware enough. Because address translation is time
>>|> consuming, fast cache is always indexed by virtual address. These days,
>>|> virtually indexed cache is quite common.
>
>>|> So, if you want shared libraries, you can put it only on slower machines.
>
>>How about MIPS R3000/R4000?  Maybe that's not fast enough.
>
>The primary cache of R4000 is virtually indexed and physically tagged.
>That is, it can't map different virtual addresses to a physical address.
>
>							Masataka Ohta

I quote from the book "MIPS Risc Architecture" by Gerry Kane, Prentice Hall,
1989, page 4-1.

	"The mapping of these extended, process-unique virtual addresses to
	physical addresses need not be one-to-one; virtual addresses of two
	or more different processes may map to the same physical address."

Tony Shaughnessy
tonys@pyra.co.uk

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/21/91)

In article <674816585.AA7847@flaccid>
	tonys@pyra.co.uk (Tony Shaughnessy) writes:

>>The primary cache of R4000 is virtually indexed and physically tagged.
>>That is, it can't map different virtual addresses to a physical address.

>I quote from the book "MIPS Risc Architecture" by Gerry Kane, Prentice Hall,
>1989, page 4-1.

Read the book. It's for the R2000/R3000. Even on page 4-1, the word "R2000"
appears six times.

But you are better than the others, who post based only on their imagination
and yet still require me, who posts based on facts such as measurement figures
and the source code of 4.3BSD, to post based on facts.

>	"The mapping of these extended, process-unique virtual addresses to
>	physical addresses need not be one-to-one; virtual addresses of two
>	or more different processes may map to the same physical address."

Compared to R4000, R2000/R3000 are slower CPUs.

						Masataka Ohta

mcnally@wsl.dec.com (Mike McNally) (05/21/91)

In article <219@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
|> 
|> But, you are better than others who post based only on their imagination and
|> still require me, who post based on facts such as measurement figures and
|> source code of 4.3BSD, to post based on facts.
|> 

What does 4.3BSD source code have to do with R4000 architecture?

And anyway, all that you need to deal with the problem of a virtually indexed
cache is to force shared objects to map in at address boundaries bigger than
the cache size.  They don't need to be static.

-- 
* "In the Spirit as my automatics,  *                              Mike McNally
* Lystra and Zelda were one third   *                                    Coolie
* as large as the infinite Cosmos." *                  DEC Western Software Lab
*              --- D. S. Ashwander  *                       mcnally@wsl.dec.com 

amolitor@eagle.wesleyan.edu (05/22/91)

In article <215@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> As the architecture cannot map them, a possible workaround is to flush
> cache/page-table by software at context switch.
> 
	Umm. You generally have to flush this at context switch time anyway,
when you switch the memory map around. Is the phrase 'by software' meaningful
here? I haven't looked at address translation hardware in some years.

	Before saying that sharable libraries are only possible on slow
hardware, I suggest taking a look at the VAX architecture. I would hardly refer
to a VAX 9000 as slow, and I point out that it uses sharable libraries.
Further, it is a trivial exercise to sketch a perfectly reasonable
machine/software configuration in which the use of sharable libraries
saves many hundreds of megabytes, or more, of disk.

	Incidentally, my thanks to Mr. Ohta for providing a little levity and
humor in this newsgroup.

	Andrew


mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/22/91)

In article <1991May20.175555.13943@batcomputer.tn.cornell.edu>
	shore@theory.TC.Cornell.EDU (Melinda Shore) writes:

>In the proceedings of the Summer 1990 Usenix Conference (Anaheim) there
>are two papers describing different implementations of shared libraries.
>Both papers present results.  Both papers conclude that for programs not
>dominated by startup costs,

Marc Sabatella's paper gives data: 10% for inefficient coding of the library
and a maximum of 10% start-up overhead with reasonably large programs.

Moreover, the measurement was done with a 68030, which supports various
addressing modes without much performance degradation (because it is already
slow).

>Donn Seeley's paper is
>particularly relevant,

His paper also makes measurements with a 68030, utilizing its addressing modes.

I don't say their results are useless. But they are not applicable to
today's fastest machines.

>in that he's arguing that it is possible to
>have a shared library implementation that is both simple and fast.

See page 30, lines 37-38:

	"The PIC implementation is the heart of this prototype"

A similar thing is written in the "Conclusion" section, also.

As I already said, PIC (Position Independent Code) imposes several
restrictions on hardware, which many architectures can't satisfy.

>You just have to know what you're doing.

You had better read the papers you referred to.

						Masataka Ohta

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/22/91)

In article <1991May22.063425.26144@kithrup.COM>
	sef@kithrup.COM (Sean Eric Fagan) writes:

>>>Are there any architectures of interest in this discussion that can't
>>>support PC-relative references?
>>R3000.
>
>	_foo:
>		<entrycode>
>		jal	foo1$
>	foo1$:
>		addu	$at, $31, 0
>
>Guess what:  we can now do PC-relative references.  And this isn't even 
>the most efficient way to do it.

You poor boy, such an old trick is already known to me. I sometimes use
the trick if it is possible.

The problem here is that "jal" is not PC-relative.

You had better write the (not actually) PC-relative reference as

<		la	$at,foo1$
<	foo1$:

It is as PC-relative as your trick, but is called immediate addressing.

Of course, both of the above are unusable in PIC.

>Score one for our side.

Sigh.

						Masataka Ohta

sef@kithrup.COM (Sean Eric Fagan) (05/22/91)

In article <215@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <7916@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>Are there any architectures of interest in this discussion that can't
>>support PC-relative references?
>R3000.

	_foo:
		<entrycode>
		jal	foo1$
	foo1$:
		addu	$at, $31, 0

Guess what:  we can now do PC-relative references.  And this isn't even 
the most efficient way to do it.

Score one for our side.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

richard@aiai.ed.ac.uk (Richard Tobin) (05/23/91)

In article <197@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>	1) coded position independently (PIC)

>If we take 1), the hardware architecture must support PC relative jump,
>of course. Moreover, to access library private data, it must also
>address data PC relative. Aside from effeciency, not all architechture
>support this.

Surely any form of indirect jump and access will be adequate, though
possibly less efficient?

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

guy@auspex.auspex.com (Guy Harris) (05/23/91)

>>You don't *have* to have PC-relative jumps and data access, although it is
>>convenient.
>
>No, I don't have to, but it is very inconvenient not to do so.

How inconvenient is it?  The main aggravations on, say, a System/3[679]0
for PC-relative jumps within a routine seem to me to be that

	1) you might have to do a BALR N,0 at the beginning of a routine
	   if the calling convention doesn't require that the address of
	   the routine be loaded by the caller (we're not just talking
	   IBM operating systems here; the convention used by some
	   particular UNIX flavor might or might not work that way);

	2) if the routine is larger than 4096 bytes, you might need more
	   than one base register;

but aren't those problems also present even with
*non*-position-independent code?

For PC-relative procedure calls, you're less likely to find the routine
within 4096 bytes - and, if the routine is external, you can't
necessarily know at compile time whether it's within 4096 bytes or not,
so you'd have to generate worst-case code in any case, so again the
problems would also seem to be present with non-position-independent
code.

>>When PC-relative addressing isn't available or usable, you just need
>>register+offset addressing, which most computers have.
>
>I was wrong here, yes, it is possible if we use indirect addressing to
>access global data, but it is slow.

But are references to global data common enough that the performance hit
is unacceptable?  Remember, even if your idea of "unacceptable" is
"greater than 0", not all of us share your idea of "unacceptable"....

>>The only tricky part is arranging for the register to be set
>>whenever an inter-module call or return takes place.
>
>The call overhead is six extra cycles with typical RISCs, whenever an
>inter-object-file (not inter-library) call-return takes place.

Well, a SPARC executes two "sethi" instructions and one "jmp", once the
link has been snapped; according to the cycle counts in the SPARC
Architecture Manual, Version 8, Appendix L, most implementations would
take 4, rather than 6, cycles for that, and the Matsushita MN10501 would
take 3 cycles.

Which *particular* "typical RISC" were you thinking of?

>It is not negligible when we are heavily doing something like strcmp().

It depends on how long the strings are, and how heavily you're doing
"strcmp()".  Yes, there are cases where there's a large penalty, but
then there are also cases where a typical cache loses big, too.

>You may remember that the speed of Bnews was actually improved by
>in-lining the first part of strcmp(). In-lining of functions in
>shared libraries is, of course, impossible.

Well, in the version of Bnews we have here, that in-lining is done with
a "STRCMP()" macro, that checks the first two characters and, only if
they're not equal, calls "strcmp()".

Our Bnews programs are dynamically linked, and they have that in-lining;
"In-lining of functions in shared libraries" is, of course, *NOT*
"impossible", as demonstrated by that.

Perhaps you want to completely delete the Bnews example, as it doesn't
bolster your case, and change the statement following it to "in-lining
of functions in shared libraries cannot, of course, be done by the
compiler or compile-time linker"?

>>>Even worse, with some architectures, it is impossible to map several virtual
>>>addresses to a physical address. Virtually tagged caches and inverted
>>>page tables are notable examples.
>
>>Well, this kills any kind of shared text architecture, not just shared
>>libraries.
>
>You can always share text as usual UNIX box do, because it only requires
>to map a single virtual address of several different processes to a
>         ^^^^^^                            ^^^^^^^^^^^^^^^^^^^
>physical address.

Not necessarily.

In the Sun virtually-addressed cache, the "virtual address" includes a
context number; while the "virtual address" bits of the different
virtual addresses in the different processes are the same, the context
number bits aren't.

And, in the Sun virtually-addressed cache, the cache can handle aliases
that differ not only in the context number, but in the "virtual address"
bits, so the statement that "it is impossible to map several virtual
addresses to a physical address" with a virtually-tagged cache is, of
course, not true of the Sun cache.

It's not true of all inverted page table machines, either, cf. the RT PC
and RS/6000.

guy@auspex.auspex.com (Guy Harris) (05/23/91)

>You don't know enough about hardware. Because address translation is time
>consuming, a fast cache is always indexed by virtual address. These days,
>virtually indexed caches are quite common.
>
>So, if you want shared libraries, you can put them only on slower
>machines.

And you don't know enough about virtual-addressed cache hardware, if you
think that they can't support shared libraries. 

goykhman_a@apollo.HP.COM (Alex Goykhman) (05/23/91)

In article <213@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <MWM.91May17132439@raven.pa.dec.com>
>	mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
>
>>   Even worse, with some architectures, it is impossible to map several virtual
>>   addresses to a physical address. Virtually tagged caches and inverted
>>   page tables are notable examples.
>
>>So some architectures can't support shared libraries? Well, don't put
>>shared libraries on them.
>
>That's what I am saying.

    I confess, I am not familiar enough with such marvels of computer architecture
    as the fifth generation and TRON.  Perhaps that is why I cannot think
    of one that would make it "impossible to map several virtual addresses to a 
    physical address".   Could you name such an architecture?
>
>>Some architechtures can't support demand
>>paged memory, or virtual address spaces, or preemptive scheduling.
>>Does that mean we have to live without them on machines that can
>>support them?  No; it doesn't.
>
>You don't know enough about hardware. Because address translation is time
>consuming, a fast cache is always indexed by virtual address. These days,
>virtually indexed caches are quite common.

    So what?  Are you sure you understand the difference between a cache
    and a TLB?

[deleted]
>
>						Masataka Ohta

sef@kithrup.COM (Sean Eric Fagan) (05/23/91)

In article <225@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>You poor boy, such an old trick is already known to me. I sometimes use
>the trick if it is possible.
>The problem here is that "jal" is not PC-relative.

*sigh*
Fine.  How about:

	mov	1, $at
	bgezal	$at, foo1$
	nop
foo1$:
	mov	$r31, $at

*Now* $at has PC, and you can write your PIC code.

Happy?

The assembler and linker can conspire with everything else to produce PIC
code.

Score two for our side.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

guy@auspex.auspex.com (Guy Harris) (05/24/91)

>>Are there any architectures of interest in this discussion that can't
>>support PC-relative references?
>
>R3000.

Well, the R-series branch instructions are PC-relative.  The "jump"
instructions aren't, but unless you have to branch more than 32767
instructions (about 128KB) in either direction, you can use the branch
instructions.

I'd also assumed that by "PC-relative" you included references relative
to, say, the PC of the beginning of the routine; you obviously don't
need to have *all* references be relative to the PC of the referencing
instruction.  The PC can be loaded into a register by doing a BGEZAL
with the register being tested being r0.  This involves some shuffling
of registers, but I think MIPS's compiler can deal with that....

>>If you believe that a system with a virtual-address cache, or a system
>>with inverted page tables, cannot map several virtual addresses to a
>>physical address, you're wrong.
>
>I am correct. It can't map them.

You're completely incorrect, because those systems can map them.

"Virtual Address Cache in UNIX", in the summer 1987 USENIX proceedings,
discusses how Sun does it with their virtual address cache.  The cache
will do alias checking if the different virtual addresses map to the
same cache line; the OS tries to align the virtual addresses (and
generally succeeds) so that the different virtual addresses will so map.

IBM does it with their inverted page table by, as I remember, giving the
page one virtual address within a large (>32 bit) virtual address space,
and then loading segment registers up in different processes to point to
the same virtual address; they can load different segment registers, so
that different second-level 32-bit virtual addresses refer to the same
first-level virtual address.  The "IBM RISC System/6000 Technology"
collection of papers should get you started on reading how they do it.

Now, go read those papers, and then either explain why those papers
don't tell the truth, or admit that you are NOT correct.

guy@auspex.auspex.com (Guy Harris) (05/24/91)

>Guess what:  we can now do PC-relative references.

Yes, but the "jal" in question isn't necessarily position-independent. 
Of course, you can do a BGEZAL with register 0, instead of a JAL, which
*is* position-independent.

<entrycode> presumably preserves the incoming value of $31, right?

meissner@osf.org (Michael Meissner) (05/25/91)

In article <1991May23.082658.4881@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

| In article <225@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
| >You poor boy, such an old trick is already known to me. I sometimes use
| >the trick if it is possible.
| >The problem here is that "jal" is not PC-relative.
| 
| *sigh*
| Fine.  How about:
| 
| 	mov	1, $at
| 	bgezal	$at, foo1$
| 	nop
| foo1$:
| 	mov	$r31, $at

Well actually, the move of 1 to $at is unnecessary, since you already
have 0 in $0, and the test is >= 0.

	
	.set	noreorder
	.set	noat
	bgezal	$0, foo1$
	nop
foo1$:
	mov	$r31, $at
	.set	at
	.set	reorder

--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

You are in a twisty little passage of standards, all conflicting.

guy@auspex.auspex.com (Guy Harris) (05/26/91)

>>In the proceedings of the Summer 1990 Usenix Conference (Anaheim) there
>>are two papers describing different implementations of shared libraries.
>>Both papers present results.  Both papers conclude that for programs not
>>dominated by startup costs,
>
>Marc Sabatella's paper gives data, 10% for ineffecient coding of library
>and maximum of 10% of start up overhead with reasonably large programs.

Marc Sabatella's paper says that the overhead of PIC is about 10%, and
also notes that since only the libraries are PIC, "this has a negligible
effect on the performance of most programs".  I'm curious whether that
statement is intended to apply to window-system programs or not; I
wouldn't be surprised if they didn't spend more time in library code.

>As I already said, PIC (Position Independent Code) imposes several
>restrictions on hardware, which many architectures can't satisfy.

Which architectures?  SPARC obviously isn't one of them, and HP-PA
isn't, either, as the HP folks also did their shared libraries on Series
800 machines.  So far, MIPS R-series doesn't seem to be one, either; its
branch instructions are position-independent, and it can do an
unconditional "branch to subroutine", so it can get the PC of the
beginning of the routine into a register with position-independent code.
The Motorola 88K isn't one, either; check out the System V Release 4 ABI
for the 88K.

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/27/91)

In article <7974@auspex.auspex.com>
	guy@auspex.auspex.com (Guy Harris) writes:

>>You may remember that the speed of Bnews was actually improved by
>>in-lining the first part of strcmp(). In-lining of functions in
>>shared libraries is, of course, impossible.

>Well, in the version of Bnews we have here, that in-lining is done with
>a "STRCMP()" macro, that checks the first two characters and, only if
>they're not equal, calls "strcmp()".

Yes, of course. Bnews is the real example showing the significance of call
overhead.

>Our Bnews programs are dynamically linked, and they have that in-lining;
>"In-lining of functions in shared libraries" is, of course, *NOT*
>"impossible", as demonstrated by that.

STRCMP() is source-code-level inlining of strcmp(), *NOT* of the strcmp()
in a shared library.

>Perhaps you want to completely delete the Bnews example, as it doesn't
>bolster your case,

Not at all.

						Masataka Ohta

guy@auspex.auspex.com (Guy Harris) (05/28/91)

>STRCMP() is source-code-level inlining of strcmp(), *NOT* of the strcmp()
>in a shared library.

Yes, that's exactly what I said!  It's source-code-level inlining, which
works JUST FINE with "strcmp()" in a shared library.

guy@auspex.auspex.com (Guy Harris) (05/29/91)

>>	"The mapping of these extended, process-unique virtual addresses to
>>	physical addresses need not be one-to-one; virtual addresses of two
>>	or more different processes may map to the same physical address."
>
>Compared to R4000, R2000/R3000 are slower CPUs.

So what?  Are you saying that the virtually indexed, physically tagged
cache on the R4000, *unlike* the virtually indexed, physically tagged
cache on the R3000, is unable to support having different virtual
addresses mapped to the same physical address?  (It's already been
demonstrated, by the quote above, that your previous categorical assertion
that a virtually indexed, physically tagged cache *can't* support
different virtual addresses mapped to the same physical address is
completely and utterly untrue.)

If you're asserting that, you'd better offer some evidence; I doubt
anybody in the audience is going to take your word for it.  Given that
MIPS presumably has the intention of preserving compatibility between
earlier R-series chips and the R4000, including the same ability to
support OS features such as "mmap()" and shareable libraries (present
both in S5R4 and OSF/1), the burden of proof is *entirely* upon *you* to
demonstrate that it *can't* support it - and to demonstrate so by citing
statements from MIPS that it can't, not by waving your hands.

meissner@osf.org (Michael Meissner) (05/29/91)

In article <8029@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris)
writes:

| >As I already said, PIC (Position Independent Code) imposes several
| >restrictions on hardware, which many architectures can't satisfy.
| 
| Which architectures?  SPARC obviously isn't one of them, and HP-PA
| isn't, either, as the HP folks also did their shared libraries on Series
| 800 machines.  So far, MIPS R-series doesn't seem to be one, either; its
| branch instructions are position-independent, and it can do an
| unconditional "branch to subroutine", so it can get the PC of the
| beginning of the routine into a register with position-independent code.
| The Motorola 88K isn't one, either; check out the System V Release 4 ABI
| for the 88K.

The MIPS branch instructions are PC-relative, but are limited to a +/-
128K range.  This obviously might cause problems with some Fortran
applications...
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

You are in a twisty little passage of standards, all conflicting.

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/30/91)

In article <8054@auspex.auspex.com>
	guy@auspex.auspex.com (Guy Harris) writes:

>>STRCMP() is source-code-level inlining of strcmp(), *NOT* of the strcmp()
>>in a shared library.

>Yes, that's exactly what I said!  It's source-code-level inlining, which
>works JUST FINE with "strcmp()" in a shared library.

No. You said:

>"In-lining of functions in shared libraries" is, of course, *NOT*
>"impossible", as demonstrated by that.

while I said:

:In-lining of functions in shared libraries is, of course, impossible.

As you agree now, strcmp() in a shared library is not in-lined.

In article <8057@auspex.auspex.com>
	guy@auspex.auspex.com (Guy Harris) writes:

>I've indicated *several times* how that not only *can* be done, but how
>it *is* done on Suns, in the case of virtual address caches:
>
>	ensure that all the virtual addresses get mapped to the same
>	cache line by aligning the mappings properly

See <244@titccy.cc.titech.ac.jp>:

:The problem is that, to support shared libraries, strict PIC is not required.
:
:Instead, it is required that the same code runs if the relocation is
:a multiple of some constant.

In this case, cache size can be the constant.

>	have the virtual addresses in the inverted page table be
>	addresses in the *global* virtual address space, because a given
>	page has only one virtual address in *that* space;

In this case, segment size of the global virtual address space can be the
constant.

							Masataka Ohta

guy@auspex.auspex.com (Guy Harris) (06/01/91)

>As you agree now, strcmp() in a shared library is not in-lined.

I agree that if the compiler doesn't treat "strcmp()" specially - e.g.,
by having a header define "strcmp(a, b)" as "_builtin_strcmp(a, b)", and
generating, say, code for a call to "_builtin_strcmp(a, b)" that
compares the first two characters of "a" and "b" and, only if they're
not equal, calling the "strcmp()" routine starting at one character into
the strings - the compiler can't automatically in-line code in
"strcmp()".

However, the "STRCMP()" *macro* that appears in B news will work just fine
with a "strcmp()" in a shared library.  You cited B news as an
example of a place where inlining is a win; that particular example
doesn't require unshared libraries to get that win.

>>:The problem is that, to support shared libraries, strict PIC is not required.
>>:
>>:Instead, it is required that the same code runs if the relocation is
>>:a multiple of some constant.
>>
>>In this case, cache size can be the constant.

I take it you're finally agreeing that a given physical page can be
mapped into different virtual addresses in different processes, even if
you have a virtually-addressed cache or inverted page tables?

There are two separate issues here, which you're mixing together:

1) the issue of code that will run regardless of what its virtual
   address is, and that doesn't have to be modified to run at a
   different address;

2) the issue of mapping the same physical page into different virtual
   addresses within different processes.

The first issue is what *I* consider the issue of position-independent
code; it's already been demonstrated that all the major 32-bit
microprocessor architectures can handle that, as can various other
32-bit architectures such as System/3[679]0, VAX, etc..

The second issue is the issue of cache aliasing; in order to effectively
*use* position-independent code on a system with virtual addressing and
a cache that's not purely physically addressed, you have to be able to
deal with cache aliasing.  Making sure that all the virtual addresses
are the same modulo the cache size solves the problem on a lot of caches
(Sun, Cypress 7C60[45], MIPS R4000, among others); the caches with
virtual, rather than physical, tags also have to do some alias checking.

I.e., virtual indexing of caches, and even virtual *tagging* of caches,
isn't a barrier to using position-independent code.

The same is true of systems such as the RS/6000 with inverted page
tables; the RS/6000's scheme handles that.

No, you can't put the shareable object at arbitrary locations in the
address spaces of the processes and leave them cacheable in a
virtually-indexed cache.  Nobody was saying that you could, and I
sincerely *hope* nobody was claiming that the fact that you couldn't was
at *all* a major obstacle to implementing position-independent shareable
code objects!

ske@pkmab.se (Kristoffer Eriksson) (06/01/91)

In article <209@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <1991May16.200702.7476@Think.COM> barmar@think.com writes:
>>When PC-relative addressing isn't available or usable, you just need
>>register+offset addressing, which most computers have.
>
>I was wrong here, yes, it is possible if we use indirect addressing to
>access global data, but it is slow.

Why would register-relative addressing be any slower than PC-relative
addressing?

> In-lining of functions in shared libraries is, of course, impossible.

What nonsense. If you inline such a function, you simply don't reference
the version of the function in the library any more, since inlining it has
already put a copy of it at the place of the reference.
-- 
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60  !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03  !  or ...!{uunet,mcsun}!sunic.sunet.se!kullmar!pkmab!ske

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (06/03/91)

In article <8144@auspex.auspex.com>
	guy@auspex.auspex.com (Guy Harris) writes:

>You cited B news as an
>example of a place where inlining is a win; that particular example
>doesn't require unshared libraries to get that win.

Don't distort what I said.

See <246@titccy.cc.titech.ac.jp>:

:Yes, of course. Bnews is the real example showing the significance of call
:overhead.

I cited B news as the real example showing the significance of call overhead.

>There are two separate issues here, which you're mixing together:
>
>1) the issue of code that will run regardless of what its virtual
>   address is, and that doesn't have to be modified to run at a
>   different address;
>
>2) the issue of mapping the same physical page into different virtual
>   addresses within different processes.

I am not mixing them. Both issues have nothing to do with the current
discussion.

>I
>sincerely *hope* nobody was claiming that the fact that you couldn't was
>at *all* a major obstacle to implementing position-independent shareable
>code objects!

What you don't understand, and I didn't, is that position-independent code
is not necessary for shared libraries. Roughly-position-independent code
is enough.

							Masataka Ohta

guy@auspex.auspex.com (Guy Harris) (06/04/91)

>:Yes, of course. Bnews is the real example showing the significance of call
>:overhead.
>
>I cited B news as the real example showing the significance of call overhead.

Umm, if call overhead is significant, inlining is a win, right?

The trick here is that in B news, the string comparison operation is
partially inlined by using a macro; that form of inlining works just
fine with shared libraries.

>>There are two separate issues here, which you're mixing together:
>>
>>1) the issue of code that will run regardless of what its virtual
>>   address is, and that doesn't have to be modified to run at a
>>   different address;
>>
>>2) the issue of mapping the same physical page into different virtual
>>   addresses within different processes.
>
>I am not mixing them.

Yes, you're continuing to mix them.  See below.

>>I
>>sincerely *hope* nobody was claiming that the fact that you couldn't was
>>at *all* a major obstacle to implementing position-independent shareable
>>code objects!
>
>What you don't understand, and I didn't, is that position-independent code
>is not necessary for shared libraries. Roughly-position-independent code
>is enough.

See, you're still mixing them!

The first issue is, as stated, the one of making code that runs
regardless of what address it's located at.  On most if not all of the
major architectures on which UNIX runs, that can be done, and that code
is *fully* position-independent - you could move it by some minimal
amount (the actual amount depends on the alignment requirements for
various instructions).

In practice, on a system with address mapping, in order to share them
they have to be put on page or segment boundaries; if they're put on
page boundaries, they can only be relocated by an integral number of
pages - but that has nothing to do with the way the code was made
position-independent.

The second issue is the one of making the code be cacheable if you map
it in at different addresses on a machine with a virtually-indexed cache
(whether virtually or physically tagged; both can deal with aliases,
although virtually-tagged caches have to work a little harder at it), or
making it shareable without having to shuffle the page map on a context
switch on a machine with inverted page tables.  That issue means that
the alignment requirements on the code are stricter, e.g. aligning all
the virtual addresses so that the cache tags for a given location are
the same in all address spaces.

If you *don't* do that, the code will still *work* just fine, because
the code is fully position-independent, not "roughly
position-independent"; it'll just run slower because you'll have to mark
it non-cacheable.

There may well be architectures on which the code can't be made
fully-position-independent, i.e. such that it can't be made to run *at
all* unless the position of the code is only adjusted by e.g. a segment
size; however, that's not true of the 68K, the 88K, SPARC, MIPS, the
386andup, the VAX, or the IBM 3[679]0 - I didn't bother buying the WE32K
or i860 S5R4 ABI books, so I didn't see whether they do
fully-position-independent code or not.  Shared libraries could probably
be done on such an architecture, assuming the alignment requirements
aren't *too* strict.  However, given that the high-volume architectures
don't have that problem, and given that I don't work on any low-volume
architectures that have that problem, I didn't spend any energy worrying
about it.

mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (06/04/91)

In article <8167@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>>:Yes, of course. Bnews is the real example showing the significance of call
>>:overhead.

>>I cited B news as the real example showing the significance of call overhead.
>
>Umm, if call overhead is significant, inlining is a win, right?

I don't care. It has nothing to do with the current discussion. You may
post whatever you believe, but, don't distort what I said.

>>>There are two separate issues here, which you're mixing together:

>>I am not mixing them.
>
>Yes, you're continuing to mix them.  See below.

No. And, again, these points have nothing to do with the current discussion
on shared libraries.

							Masataka Ohta