mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/16/91)
In article <1991May16.002617.15386@ladc.bull.com> fmayhar@hermes.ladc.bull.com writes:

>There may actually not be any "right" implementations extant at the
>moment (this is debatable), but that's not the point.

Without any facts, your claim is nothing.

Let's see what's wrong with shared libraries.

>-> Indirect jumps and accompanied process private data for the jump table.
>
>So what would be a better way to do it?
>
>Really, there's a tradeoff between the utility of shared libraries and
>efficiency.

Efficiency is only one aspect of the problem.

To share libraries, they should be:

	1) coded position independently (PIC)

or

	2) assigned a static virtual address

If we take 1), the hardware architecture must support PC-relative jumps,
of course.  Moreover, to access library private data, it must also
address data PC-relative.  Aside from efficiency, not all architectures
support this.

Note that library private data is inevitable to support calls between
libraries, position independently.

Even worse, with some architectures, it is impossible to map several
virtual addresses to a physical address.  Virtually tagged caches and
inverted page tables are notable examples.

If we take 2), even if you have enough address space to map all
libraries (32 bits is obviously not enough; I even think 48 bits is
not), it will be a nightmare to maintain consistency.  Different
libraries must have different addresses, of course, which is already
non-trivial.

Moreover, compatible libraries must have the same address, whose scheme
will be very complex, even though it exists.

Even worse, if a program is linked with libraries A.0 and B.0 and the
other program is linked with libraries A.0 and B.1 (an upgraded version
of B.0) and a function in A.0 calls a function in B.*, it can't.  As a
workaround, we can have two versions of A.0: A.0.B.0 and A.0.B.1.
Thus, as the number of kinds of libraries increases, the number of
libraries and the required storage grow exponentially.
I hope you can now understand how complex shared libraries are.

The fundamental solution is, of course, not to have shared libraries.

						Masataka Ohta
krey@ira.uka.de (Andreas Krey) (05/16/91)
In article <197@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
[about shared libraries, not always completely correct]
|> 
|> I hope you can now understand how complex shared libraries are.
|> 
|> The fundamental solution is, of course, not to have shared libraries.
|> 
|> 						Masataka Ohta

We now all see how complex computers are.  The fundamental solution is,
of course, not to have computers.

And, why do you share text pages of statically linked programs?  Seems
to be a similar problem, and unnecessarily complicating operating
systems.  Do you ever run more than one instance of any program at once?

--
Andy
barmar@think.com (Barry Margolin) (05/17/91)
In article <197@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>To share libraries, they should be:
>	1) coded position independently (PIC)
>or
>	2) assigned a static virtual address
>
>If we take 1), the hardware architecture must support PC-relative jumps,
>of course.  Moreover, to access library private data, it must also
>address data PC-relative.  Aside from efficiency, not all architectures
>support this.

You don't *have* to have PC-relative jumps and data access, although it
is convenient.  The Multics compiler uses it when it can, but I think
PC-relative instructions have a relatively small limit on the offset.

When PC-relative addressing isn't available or usable, you just need
register+offset addressing, which most computers have.  On Multics, one
of the pointer registers by convention holds the address of the base of
the currently executing segment, and PIC simply offsets from this.  On
a Unix system, it would simply be a pointer to the location where the
library is mmap'ed.  The only tricky part is arranging for the register
to be set whenever an inter-module call or return takes place.

>Even worse, with some architectures, it is impossible to map several
>virtual addresses to a physical address.  Virtually tagged caches and
>inverted page tables are notable examples.

Well, this kills any kind of shared text architecture, not just shared
libraries.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
mcnally@wsl.dec.com (Mike McNally) (05/17/91)
In article <197@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
|> Efficiency is only one aspect of the problem.
|> 
|> To share libraries, they should be:
|> 
|> 	1) coded position independently (PIC)
|> 
|> or
|> 
|> 	2) assigned a static virtual address
|> 
|> If we take 1), the hardware architecture must support PC-relative jumps,
|> of course.

No No No No No No No.  All that's needed is indirect jump.  Pull your
head out before being so dogmatic.

|> Moreover, to access library private data, it must also
|> address data PC-relative.  Aside from efficiency, not all architectures
|> support this.

Once again, indirection is all that's needed.

|> Note that library private data is inevitable to support calls between
|> libraries, position independently.

So?

|> Even worse, with some architectures, it is impossible to map several
|> virtual addresses to a physical address.  Virtually tagged caches and
|> inverted page tables are notable examples.

OK fine.  Maybe we shouldn't have inter-process memory protection since
not all architectures support it.  Hell, better dump floating-point
too, since my 8085 machine at home doesn't have it.

|> If we take 2), even if you have enough address space to map all libraries
|> (32 bits is obviously not enough; I even think 48 bits is not), it will
|> be a nightmare to maintain consistency.  Different libraries must
|> have different addresses, of course, which is already non-trivial.

Gee, Masataka, maybe you should re-state your argument: "*I* don't know
how to solve the problems of shared libraries to *my own* satisfaction
based on *my own* dogmatic criteria, and so *I* won't implement shared
libraries, nor will I touch any system which uses them."  Sheesh.

|> Moreover, compatible libraries must have the same address, whose scheme
|> will be very complex, even though it exists.

What does this mean?
|> Even worse, if a program is linked with libraries A.0 and B.0 and the
|> other program is linked with libraries A.0 and B.1 (an upgraded version
|> of B.0) and a function in A.0 calls a function in B.*, it can't.

Why not?  Granted, both versions of B will have to be loaded, but
"can't"?  I ask you to "prove" that; please try to be a bit more
rigorous when giving proofs, too.  It's not enough to say "I propose P,
I proved P, QED."

|> I hope you can now understand how complex the shared library is.

I now understand that you don't know how to implement shared libraries.

--
* "In the Spirit as my automatics,   * Mike McNally
* Lystra and Zelda were one third    * Coolie
* as large as the infinite Cosmos."  * DEC Western Software Lab
*   --- D. S. Ashwander              * mcnally@wsl.dec.com
jeremy@sw.oz.au (Jeremy Fitzhardinge) (05/17/91)
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>In article <1991May16.002617.15386@ladc.bull.com>
>	fmayhar@hermes.ladc.bull.com writes:
>
>>There may actually not be any "right" implementations extant at the
>>moment (this is debatable), but that's not the point.
>
>Without any facts, your claim is nothing.
>
>Let's see what's wrong with shared libraries.
>
>>-> Indirect jumps and accompanied process private data for the jump table.
>>
>>So what would be a better way to do it?
>>
>>Really, there's a tradeoff between the utility of shared libraries and
>>efficiency.
>
>Efficiency is only one aspect of the problem.
>
>To share libraries, they should be:
>
>	1) coded position independently (PIC)
>
>or
>
>	2) assigned a static virtual address
>
>If we take 2), even if you have enough address space to map all libraries
>(32 bits is obviously not enough; I even think 48 bits is not), it will
>be a nightmare to maintain consistency.  Different libraries must
>have different addresses, of course, which is already non-trivial.
>
>Moreover, compatible libraries must have the same address, whose scheme
>will be very complex, even though it exists.
>
>Even worse, if a program is linked with libraries A.0 and B.0 and the
>other program is linked with libraries A.0 and B.1 (an upgraded version
>of B.0) and a function in A.0 calls a function in B.*, it can't.  As
>a workaround, we can have two versions of A.0: A.0.B.0 and A.0.B.1.
>Thus, as the number of kinds of libraries increases, the number of
>libraries and the required storage grow exponentially.

Why not relocate a library to a virtual address when it is loaded, so
that you don't need to assign an address when the library is built?  It
is quite compact and efficient to store relocation information and to
apply it as you load.  An executable could use some key to tell the OS
what library it needs, and the OS returns the virtual address of the
library, either just loaded or already loaded for something else.
The program can then relocate its library calls to that address.  The
library could either use a jump table or have real symbolic information
(which is nicer, I think).

>I hope you can now understand how complex the shared library is.
>
>The fundamental solution is, of course, not to have shared libraries.

Multitasking gets pretty complex too - is it worth the effort?
--
jeremy@softway.sw.oz.au  ph:+61 2 698 2322-x122  fax:+61 2 699 9174
"Hi Barbie, I'm your plastique surgeon, Roger.  Are you ready for your
Semtex augmentation?"... "John Thompson died for you" society meets now.
I opine for the fjords, nothing else.
mikes@ingres.com (Mike Schilling) (05/17/91)
From article <197@titccy.cc.titech.ac.jp>, by mohta@necom830.cc.titech.ac.jp (Masataka Ohta):
> ...
> I hope you can now understand how complex shared libraries are.
>
> The fundamental solution is, of course, not to have shared libraries.

Since VMS has had very functional shared libraries for over 10 years
now, I have to consider this an overstatement.

	Mike
----------------------------------------------------------------------------
mikes@rtech.com = Mike Schilling, ASK Corporation, Alameda, CA
	Just machines that make big decisions,
	Programmed by fellows with compassion and vision.
					-- Donald Fagen, "IGY"
mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (05/18/91)
In article <197@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

   >There may actually not be any "right" implementations extant at the
   >moment (this is debatable), but that's not the point.

   Without any facts, your claim is nothing.

Neither is yours.

   If we take 1), the hardware architecture must support PC-relative
   jumps, of course.  Moreover, to access library private data, it must
   also address data PC-relative.  Aside from efficiency, not all
   architectures support this.

   Even worse, with some architectures, it is impossible to map several
   virtual addresses to a physical address.  Virtually tagged caches
   and inverted page tables are notable examples.

So some architectures can't support shared libraries?  Well, don't put
shared libraries on them.  Some architectures can't support demand-paged
memory, or virtual address spaces, or preemptive scheduling.  Does that
mean we have to live without them on machines that can support them?
No; it doesn't.

   I hope you can now understand how complex the shared library is.

No, I understand that you aren't qualified to do systems design work.
Using your logic, I can show that you can't do any of the things I
mentioned above "correctly".  They are still useful in lots of places.
The solution is not to "just not do them;" the solution is to
understand them and the various implementations, to know the tradeoffs
involved in using those implementations, and to use them where
appropriate.

	<mike
--
Kiss me with your mouth.			Mike Meyer
Your love is better than wine.			mwm@pa.dec.com
But wine is all I have.				decwrl!mwm
Will your love ever be mine?
guy@auspex.auspex.com (Guy Harris) (05/20/91)
>If we take 1), the hardware architecture must support PC-relative jumps,
>of course.  Moreover, to access library private data, it must also
>address data PC-relative.  Aside from efficiency, not all architectures
>support this.

Are there any architectures of interest in this discussion that can't
support PC-relative references?

>Even worse, with some architectures, it is impossible to map several
>virtual addresses to a physical address.  Virtually tagged caches and
>inverted page tables are notable examples.

If you believe that a system with a virtual-address cache, or a system
with inverted page tables, cannot map several virtual addresses to a
physical address, you're wrong.

Proof by counterexample:

	1) various flavors of Suns with virtual-address caches, which
	   all support mapping several virtual addresses to a physical
	   address;

	2) the IBM ROMP and RIOS architectures, which have inverted
	   page tables and support mapping several virtual addresses to
	   a physical address.

They may have to go through some amount of pain to do so, but they *do*
manage to do it.
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/20/91)
In article <1991May16.200702.7476@Think.COM> barmar@think.com writes:

>> 1) coded position independently (PIC)

>You don't *have* to have PC-relative jumps and data access, although it
>is convenient.

No, I don't have to, but it is very inconvenient not to do so.

>When PC-relative addressing isn't available or usable, you just need
>register+offset addressing, which most computers have.

I was wrong here; yes, it is possible if we use indirect addressing to
access global data, but it is slow.

>The only tricky part is arranging for the register to be set
>whenever an inter-module call or return takes place.

The call overhead is six extra cycles with typical RISCs, whenever an
inter-object-file (not inter-library) call-return takes place.  It is
not negligible when we are heavily doing something like strcmp().

As I re-read the discussion, someone mentioned the possibility of a
speed-up from a globally optimizing compiler.  But that is unfair.
With the same amount of optimization, statically linked libraries can
have better optimization, such as in-lining.  You may remember that the
speed of Bnews was actually improved by in-lining the first part of
strcmp().  In-lining of functions in shared libraries is, of course,
impossible.

>>Even worse, with some architectures, it is impossible to map several
>>virtual addresses to a physical address.  Virtually tagged caches and
>>inverted page tables are notable examples.

>Well, this kills any kind of shared text architecture, not just shared
>libraries.

You can always share text as the usual UNIX boxes do, because that only
requires mapping a *single* virtual address of several *different
processes* to a physical address.

						Masataka Ohta
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/20/91)
In article <MWM.91May17132439@raven.pa.dec.com> mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:

> Even worse, with some architectures, it is impossible to map several
> virtual addresses to a physical address.  Virtually tagged caches and
> inverted page tables are notable examples.

>So some architectures can't support shared libraries?  Well, don't put
>shared libraries on them.

That's what I am saying.

>Some architectures can't support demand-paged memory, or virtual
>address spaces, or preemptive scheduling.  Does that mean we have to
>live without them on machines that can support them?  No; it doesn't.

You don't know enough about hardware.  Because address translation is
time-consuming, a fast cache is always indexed by virtual address.
These days, virtually indexed caches are quite common.

So, if you want shared libraries, you can put them only on slower
machines.

>No, I understand that you aren't qualified to do systems design work.

You understand nothing.  As you don't know enough about hardware, you
aren't qualified to do systems design work.

						Masataka Ohta
jfh@rpp386.cactus.org (John F Haugh II) (05/20/91)
In article <209@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>The call overhead is six extra cycles with typical RISCs, whenever an
>inter-object-file (not inter-library) call-return takes place.  It is
>not negligible when we are heavily doing something like strcmp().

The CPU overhead to field an unneeded page fault caused by too many
statically bound executables will dominate your little 6-cycle hit the
first time it happens.  Trust me.

I'd rather have a slightly slowed-down, CPU-bound process than a system
thrashing about all day and night because it doesn't support shared
libraries.
--
John F. Haugh II        | Distribution to       | UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 255-8251 | GEnie PROHIBITED :-)  | Domain: jfh@rpp386.cactus.org
"If liberals interpreted the 2nd Amendment the same way they interpret the
rest of the Constitution, gun ownership would be mandatory."
mcnally@wsl.dec.com (Mike McNally) (05/20/91)
In article <213@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
|> You don't know enough about hardware.  Because address translation is
|> time-consuming, a fast cache is always indexed by virtual address.
|> These days, virtually indexed caches are quite common.
|> 
|> So, if you want shared libraries, you can put them only on slower machines.

How about the MIPS R3000/R4000?  Maybe that's not fast enough.

-
* "In the Spirit as my automatics,   * Mike McNally
* Lystra and Zelda were one third    * Coolie
* as large as the infinite Cosmos."  * DEC Western Software Lab
*   --- D. S. Ashwander              * mcnally@wsl.dec.com
mjr@hussar.dco.dec.com (Marcus J. Ranum) (05/20/91)
This argument about shared libraries has gotten ridiculous. Let's be sensible about this. Does anyone have pointers to any hard numbers that might shed some light on the performance impact/benefits under a reasonable workload? We could argue forever, which is silly. mjr.
shore@theory.TC.Cornell.EDU (Melinda Shore) (05/21/91)
[] In the proceedings of the Summer 1990 Usenix Conference (Anaheim) there are two papers describing different implementations of shared libraries. Both papers present results. Both papers conclude that for programs not dominated by startup costs, the costs of dynamic loading are usually insignificant (obvious tautology ... ). Donn Seeley's paper is particularly relevant, in that he's arguing that it is possible to have a shared library implementation that is both simple and fast. You just have to know what you're doing. -- Software longa, hardware brevis Melinda Shore - Cornell Information Technologies - shore@theory.tn.cornell.edu
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/21/91)
In article <7916@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>Are there any architectures of interest in this discussion that can't
>support PC-relative references?

R3000.

>If you believe that a system with a virtual-address cache, or a system
>with inverted page tables, cannot map several virtual addresses to a
>physical address, you're wrong.

I am correct.  It can't map them.

>Proof by counterexample:

>They may have to go through some amount of pain to do so, but they *do*
>manage to do it.

As the architecture cannot map them, a possible workaround is to flush
the cache/page table in software at context switch.  It will slow down
context switches.  That is, if you have heavily interactive processes,
as is often the case with window systems, the performance degradation
will be large.

						Masataka Ohta
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/21/91)
In article <1991May20.090857@wsl.dec.com> mcnally@wsl.dec.com writes:

>|> You don't know enough about hardware.  Because address translation is
>|> time-consuming, a fast cache is always indexed by virtual address.
>|> These days, virtually indexed caches are quite common.

>|> So, if you want shared libraries, you can put them only on slower machines.

>How about the MIPS R3000/R4000?  Maybe that's not fast enough.

The primary cache of the R4000 is virtually indexed and physically
tagged.  That is, it can't map different virtual addresses to a
physical address.

						Masataka Ohta
mcnally@wsl.dec.com (Mike McNally) (05/21/91)
In article <216@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
|> In article <1991May20.090857@wsl.dec.com> mcnally@wsl.dec.com writes:
|> >|> So, if you want shared libraries, you can put them only on slower machines.
|> 
|> >How about the MIPS R3000/R4000?  Maybe that's not fast enough.
|> 
|> The primary cache of the R4000 is virtually indexed and physically
|> tagged.  That is, it can't map different virtual addresses to a
|> physical address.

Then I'm working on a project that I can't do, according to your
"rule".  Or, you consider the R3000/R4000 to be slow.  I won't claim
it's the fastest CPU in the world, but I don't know of too many
reasonable people who'd say it's slow.

--
* "In the Spirit as my automatics,   * Mike McNally
* Lystra and Zelda were one third    * Coolie
* as large as the infinite Cosmos."  * DEC Western Software Lab
*   --- D. S. Ashwander              * mcnally@wsl.dec.com
fmayhar@hermes.ladc.bull.com (Frank Mayhar) (05/21/91)
In article <197@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
-> In article <1991May16.002617.15386@ladc.bull.com> I write:
-> >There may actually not be any "right" implementations extant at the
-> >moment (this is debatable), but that's not the point.
-> Without any fact, your claim is nothing.
Let's see _you_ start including "facts" in your postings, then. So far all
I've seen is supposition and unsupported assertions.
-> Effeciency is only one aspect of the problem.
Not true. Regardless of the complexity of the implementation, the only
real tradeoff is between efficiency and utility. See my previous posts
regarding this.
-> To share libraries, they should be:
-> 1) coded position independently (PIC)
-> or
-> 2) assigned a static virtual address
Granted, more or less.
-> If we take 1), the hardware architecture must support PC relative jump,
-> of course. Moreover, to access library private data, it must also
-> address data PC relative.
Not necessarily.  While it may well need PC-relative transfers, data
addressing may use a different mechanism (probably also
register-relative, but almost certainly not PC-relative).
-> Aside from efficiency, not all architectures support this.
Examples? Certainly the 680x0 and the 80x86 support this, as well as most
mainframe architectures. I expect that almost any architecture would support
this. I mean, what's the difference between PC-relative addressing and any
other kind of register-relative addressing? And, as has already been stated,
PC-relative jumps aren't essential; other forms of indirect transfers work
as well.
-> Note that, library private data is inevitable to support calls between
-> libraries, position independently.
Not inevitable. It depends on the implementation; I can certainly imagine
an implementation that supports inter-library calls via use of automatic
storage ("stack" space). While technically this is "library private data,"
it doesn't have the implementation complexities that static storage does.
-> Even worse, with some architectures, it is impossible to map several
-> virtual addresses to a physical address.  Virtually tagged caches and
-> inverted page tables are notable examples.
Perhaps these architectures aren't suitable for shared libraries. And, as
Barry Margolin said, if this is the case, _any_ kind of text sharing is
dead. IMHO, though, the concept of virtual memory implies the ability to
map a physical address to several virtual addresses. Show me one that doesn't
allow this, and I'll show you one that is almost useless for modern computing
purposes.
-> If we take 2), even if you have enough address space to map all libraries
-> (32 bits is obviously not enough; I even think 48 bits is not), it will
-> be a nightmare to maintain consistency.  Different libraries must
-> have different addresses, of course, which is already non-trivial.
How is 32 bits "obviously" not enough? Four gigabytes of address space isn't
enough?  How big _are_ your programs, anyway?  I agree that solving such
problems is nontrivial, but that doesn't mean that they aren't worthwhile.
-> Moreover, compatible libraries must have the same address, whose scheme
-> will be very complex, even though it exists.
I don't understand this. Your English is somewhat mangled. Care to explain?
-> Even worse, if a program is linked with libraries A.0 and B.0 and the
-> other program is linked with libraries A.0 and B.1 (an upgraded version
-> of B.0) and a function in A.0 calls a function in B.*, it can't.
Why not? Seems to me that it would depend on the context of the call: if
the call is happening in the first program, the call would be to B.0; if
in the second, the call would be to B.1. Regardless, this is an implementation
issue. Likely nontrivial, but solvable.
-> As
-> a workaround, we can have two versions of A.0: A.0.B.0 and A.0.B.1.
-> Thus, as the number of kinds of libraries increases, the number of
-> libraries and the required storage grow exponentially.
The above scheme would avoid this problem.
-> I hope you can now understand how complex the shared library is.
Oh, I quite understand that shared library _implementations_ are complex.
I hope that you can now understand that shared libraries are often _useful_,
regardless of the complexity of the implementation.
-> The fundamental solution is, of course, not to have shared libraries.
The fundamental solution is, of course, not to engage in pointless religious
arguments with dogmatic individuals who are closed to any ideas not their own.
--
Frank Mayhar fmayhar@hermes.ladc.bull.com (..!{uunet,hacgate}!ladcgw!fmayhar)
Bull HN Information Systems Inc. Los Angeles Development Center
5250 W. Century Blvd., LA, CA 90045 Phone: (213) 216-6241
tonys@pyra.co.uk (Tony Shaughnessy) (05/21/91)
In article <216@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>The primary cache of the R4000 is virtually indexed and physically
>tagged.  That is, it can't map different virtual addresses to a
>physical address.

I quote from the book "MIPS RISC Architecture" by Gerry Kane, Prentice
Hall, 1989, page 4-1:

	"The mapping of these extended, process-unique virtual
	addresses to physical addresses need not be one-to-one; virtual
	addresses of two or more different processes may map to the
	same physical address."

Tony Shaughnessy
tonys@pyra.co.uk
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/21/91)
In article <674816585.AA7847@flaccid> tonys@pyra.co.uk (Tony Shaughnessy) writes:

>>The primary cache of the R4000 is virtually indexed and physically
>>tagged.  That is, it can't map different virtual addresses to a
>>physical address.

>I quote from the book "MIPS RISC Architecture" by Gerry Kane, Prentice
>Hall, 1989, page 4-1.

Read the book.  It's for the R2000/R3000.  Even on page 4-1, the word
"R2000" appears six times.

But you are better than others, who post based only on their
imagination and still require me, who posts based on facts such as
measurement figures and the source code of 4.3BSD, to post based on
facts.

>	"The mapping of these extended, process-unique virtual
>	addresses to physical addresses need not be one-to-one; virtual
>	addresses of two or more different processes may map to the
>	same physical address."

Compared to the R4000, the R2000/R3000 are slower CPUs.

						Masataka Ohta
mcnally@wsl.dec.com (Mike McNally) (05/21/91)
In article <219@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
|> 
|> But you are better than others, who post based only on their
|> imagination and still require me, who posts based on facts such as
|> measurement figures and the source code of 4.3BSD, to post based on
|> facts.
|> 

What does 4.3BSD source code have to do with the R4000 architecture?

And anyway, all that you need to deal with the problem of a virtually
indexed cache is to force shared objects to map in at address
boundaries bigger than the cache size.  They don't need to be static.

--
* "In the Spirit as my automatics,   * Mike McNally
* Lystra and Zelda were one third    * Coolie
* as large as the infinite Cosmos."  * DEC Western Software Lab
*   --- D. S. Ashwander              * mcnally@wsl.dec.com
amolitor@eagle.wesleyan.edu (05/22/91)
In article <215@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
> As the architecture cannot map them, a possible workaround is to flush
> the cache/page table in software at context switch.

Umm.  You generally have to flush this at context-switch time anyway,
when you switch the memory map around.  Is the phrase 'by software'
meaningful here?  I haven't looked at address-translation hardware in
some years.

Before saying that sharable libraries are only possible on slow
hardware, I suggest taking a look at the VAX architecture.  I would
hardly refer to a VAX 9000 as slow, and I point out that it uses
sharable libraries.  Further, it is a trivial exercise to sketch a
perfectly reasonable machine/software configuration in which the use of
sharable libraries saves many hundreds of megabytes, or more, of disk.

Incidentally, my thanks to Mr. Ohta for providing a little levity and
humor in this newsgroup.

	Andrew
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/22/91)
In article <1991May20.175555.13943@batcomputer.tn.cornell.edu> shore@theory.TC.Cornell.EDU (Melinda Shore) writes:
>In the proceedings of the Summer 1990 Usenix Conference (Anaheim) there
>are two papers describing different implementations of shared libraries.
>Both papers present results.  Both papers conclude that for programs not
>dominated by startup costs,

Marc Sabatella's paper gives data: 10% overhead for inefficient coding
of the library, and a maximum of 10% startup overhead with reasonably
large programs.

Moreover, the measurement was done on a 68030, which supports various
addressing modes without much performance degradation (because it is
already slow).

>Donn Seeley's paper is
>particularly relevant,

His paper also makes its measurements on a 68030, utilizing its
addressing modes.

I don't say their results are useless.  But they are not applicable to
today's fastest machines.

>in that he's arguing that it is possible to
>have a shared library implementation that is both simple and fast.

See page 30, lines 37-38: "The PIC implementation is the heart of this
prototype".  A similar thing is written in the "Conclusion" section,
also.

As I already said, PIC (Position Independent Code) imposes several
restrictions on hardware, which many architectures can't meet.

>You just have to know what you're doing.

You had better read the papers you referred to.

						Masataka Ohta
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/22/91)
In article <1991May22.063425.26144@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

>>>Are there any architectures of interest in this discussion that can't
>>>support PC-relative references?

>>R3000.

>	_foo:
>		<entrycode>
>		jal	foo1$
>	foo1$:
>		addu	$at, $31, 0
>
>Guess what: we can now do PC-relative references.  And this isn't even
>the most efficient way to do it.

You poor boy, such an old trick is already known to me.  I sometimes
use the trick when it is possible.

The problem here is that "jal" is not PC-relative.  You had better
write a PC-relative (not actually) reference as:

<		la	$at, foo1$
<	foo1$:

It is as PC-relative as your trick, but is called immediate addressing.
Of course, both of the above are unusable in PIC.

>Score one for our side.

Sigh.

						Masataka Ohta
sef@kithrup.COM (Sean Eric Fagan) (05/22/91)
In article <215@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

>In article <7916@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>Are there any architectures of interest in this discussion that can't
>>support PC-relative references?
>R3000.

	_foo:
		<entrycode>
		jal	foo1$
	foo1$:
		addu	$at, $31, 0

Guess what:  we can now do PC-relative references.  And this isn't even
the most efficient way to do it.

Score one for our side.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
richard@aiai.ed.ac.uk (Richard Tobin) (05/23/91)
In article <197@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

> 1) coded position independently (PIC)

>If we take 1), the hardware architecture must support PC-relative jumps,
>of course.  Moreover, to access library private data, it must also
>address data PC-relative.  Aside from efficiency, not all architectures
>support this.

Surely any form of indirect jump and access will be adequate, though
possibly less efficient?

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed
AI Applications Institute,            ARPA: R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                 UUCP: ...!ukc!ed.ac.uk!R.Tobin
guy@auspex.auspex.com (Guy Harris) (05/23/91)
>>You don't *have* to have PC-relative jumps and data access, although it is
>>convenient.
>
>No, I don't have to, but it is very inconvenient not to do so.

How inconvenient is it?

The main aggravations on, say, a System/3[679]0 for PC-relative jumps
within a routine seem to me to be that:

1) you might have to do a BALR N,0 at the beginning of a routine if the
   calling convention doesn't require that the address of the routine be
   loaded by the caller (we're not just talking IBM operating systems
   here; the convention used by some particular UNIX flavor might or
   might not work that way);

2) if the routine is larger than 4096 bytes, you might need more than
   one base register;

but aren't those problems also present even with
*non*-position-independent code?

For PC-relative procedure calls, you're less likely to find the routine
within 4096 bytes - and, if the routine is external, you can't
necessarily know at compile time whether it's within 4096 bytes or not,
so you'd have to generate worst-case code in any case; so again the
problems would also seem to be present with non-position-independent
code.

>>When PC-relative addressing isn't available or usable, you just need
>>register+offset addressing, which most computers have.
>
>I was wrong here; yes, it is possible if we use indirect addressing to
>access global data, but it is slow.

But are references to global data common enough that the performance hit
is unacceptable?  Remember, even if your idea of "unacceptable" is
"greater than 0", not all of us share your idea of "unacceptable"....

>>The only tricky part is arranging for the register to be set
>>whenever an inter-module call or return takes place.
>
>The call overhead is six extra cycles with typical RISCs, whenever an
>inter-object-file (not inter-library) call-return takes place.

Well, a SPARC executes two "sethi" instructions and one "jmp", once the
link has been snapped; according to the cycle counts in the SPARC
Architecture Manual, Version 8, Appendix L, most implementations would
take 4, rather than 6, cycles for that, and the Matsushita MN10501 would
take 3 cycles.  Which *particular* "typical RISC" were you thinking of?

>It is not negligible when we are heavily doing something like strcmp().

It depends on how long the strings are, and how heavily you're doing
"strcmp()".  Yes, there are cases where there's a large penalty, but
then there are also cases where a typical cache loses big, too.

>You may remember that the speed of Bnews was actually improved by
>in-lining the first part of strcmp().  In-lining of functions in
>shared libraries is, of course, impossible.

Well, in the version of Bnews we have here, that in-lining is done with
a "STRCMP()" macro that checks the first two characters and, only if
they're not equal, calls "strcmp()".  Our Bnews programs are dynamically
linked, and they have that in-lining; "in-lining of functions in shared
libraries" is, of course, *NOT* "impossible", as demonstrated by that.

Perhaps you want to completely delete the Bnews example, as it doesn't
bolster your case, and change the statement following it to "in-lining
of functions in shared libraries cannot, of course, be done by the
compiler or compile-time linker"?

>>>Even worse, with some architectures, it is impossible to map several
>>>virtual addresses to a physical address.  Virtually tagged caches and
>>>inverted page tables are notable examples.
>
>>Well, this kills any kind of shared text architecture, not just shared
>>libraries.
>
>You can always share text as usual UNIX boxes do, because it only requires
>mapping a single virtual address of several different processes to a
>        ^^^^^^                    ^^^^^^^^^^^^^^^^^
>physical address.

Not necessarily.

In the Sun virtually-addressed cache, the "virtual address" includes a
context number; while the "virtual address" bits of the different
virtual addresses in the different processes are the same, the context
number bits aren't.

And, in the Sun virtually-addressed cache, the cache can handle aliases
that differ not only in the context number but in the "virtual address"
bits, so the statement that "it is impossible to map several virtual
addresses to a physical address" with a virtually-tagged cache is, of
course, not true of the Sun cache.  It's not true of all inverted page
table machines, either; cf. the RT PC and RS/6000.
guy@auspex.auspex.com (Guy Harris) (05/23/91)
>You don't know enough about hardware.  Because address translation is
>time consuming, fast caches are always indexed by virtual address.
>These days, virtually indexed caches are quite common.
>
>So, if you want shared libraries, you can put them only on slower machines.

And you don't know enough about virtually-addressed cache hardware, if
you think that it can't support shared libraries.
goykhman_a@apollo.HP.COM (Alex Goykhman) (05/23/91)
In article <213@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

>In article <MWM.91May17132439@raven.pa.dec.com>
> mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
>
>> Even worse, with some architectures, it is impossible to map several
>> virtual addresses to a physical address.  Virtually tagged caches and
>> inverted page tables are notable examples.
>
>>So some architectures can't support shared libraries?  Well, don't put
>>shared libraries on them.
>
>That's what I am saying.

I confess, I am not familiar enough with such marvels of computer
architecture as the fifth generation and TRON.  Perhaps that is why I
can not think of one that would make it "impossible to map several
virtual addresses to a physical address".  Could you name such an
architecture?

>>Some architectures can't support demand
>>paged memory, or virtual address spaces, or preemptive scheduling.
>>Does that mean we have to live without them on machines that can
>>support them?  No; it doesn't.
>
>You don't know enough about hardware.  Because address translation is
>time consuming, fast caches are always indexed by virtual address.
>These days, virtually indexed caches are quite common.

So what?  Are you sure you understand the difference between a cache and
a TLB?

[deleted]

>						Masataka Ohta
sef@kithrup.COM (Sean Eric Fagan) (05/23/91)
In article <225@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

>You poor boy, such an old trick is already known to me. I sometimes use
>the trick if it is possible.
>The problem here is that "jal" is not PC-relative.

*sigh*

Fine.  How about:

		mov	1, $at
		bgezal	$at, foo1$
		nop
	foo1$:
		mov	$r31, $at

*Now* $at has the PC, and you can write your PIC code.  Happy?  The
assembler and linker can conspire with everything else to produce PIC
code.

Score two for our side.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
guy@auspex.auspex.com (Guy Harris) (05/24/91)
>>Are there any architectures of interest in this discussion that can't
>>support PC-relative references?
>
>R3000.

Well, the R-series branch instructions are PC-relative.  The "jump"
instructions aren't, but unless you have to branch more than 32767 bytes
in either direction, you can use the branch instructions.

I'd also assumed that by "PC-relative" you included references relative
to, say, the PC of the beginning of the routine; you obviously don't
need to have *all* references be relative to the PC of the referencing
instruction.  The PC can be loaded into a register by doing a BGEZAL
with the register being tested being r0.  This involves some shuffling
of registers, but I think MIPS's compiler can deal with that....

>>If you believe that a system with a virtual-address cache, or a system
>>with inverted page tables, cannot map several virtual addresses to a
>>physical address, you're wrong.
>
>I am correct. It can't map them.

You're completely incorrect, because those systems can map them.

"Virtual Address Cache in UNIX", in the Summer 1987 USENIX proceedings,
discusses how Sun does it with their virtual address cache.  The cache
will do alias checking if the different virtual addresses map to the
same cache line; the OS tries to align the virtual addresses (and
generally succeeds) so that the different virtual addresses will so map.

IBM does it with their inverted page table by, as I remember, giving the
page one virtual address within a large (>32 bit) virtual address space,
and then loading segment registers in different processes to point to
the same virtual address; they can load different segment registers, so
that different second-level 32-bit virtual addresses refer to the same
first-level virtual address.  The "IBM RISC System/6000 Technology"
collection of papers should get you started on reading how they do it.

Now, go read those papers, and then either explain why those papers
don't tell the truth, or admit that you are NOT correct.
guy@auspex.auspex.com (Guy Harris) (05/24/91)
>Guess what: we can now do PC-relative references.
Yes, but the "jal" in question isn't necessarily position-independent.
Of course, you can do a BGEZAL with register 0, instead of a JAL, which
*is* position-independent.
<entrycode> presumably preserves the incoming value of $31, right?
meissner@osf.org (Michael Meissner) (05/25/91)
In article <1991May23.082658.4881@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

| In article <225@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
| >You poor boy, such an old trick is already known to me. I sometimes use
| >the trick if it is possible.
| >The problem here is that "jal" is not PC-relative.
|
| *sigh*
| Fine.  How about:
|
|		mov	1, $at
|		bgezal	$at, foo1$
|		nop
|	foo1$:
|		mov	$r31, $at

Well, actually, the move of 1 to $at is unnecessary, since you already
have 0 in $0, and the test is >= 0:

	.set	noreorder
	.set	noat
	bgezal	$0, foo1$
	nop
foo1$:
	mov	$r31, $at
	.set	at
	.set	reorder

-- 
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

You are in a twisty little passage of standards, all conflicting.
guy@auspex.auspex.com (Guy Harris) (05/26/91)
>>In the proceedings of the Summer 1990 Usenix Conference (Anaheim) there
>>are two papers describing different implementations of shared libraries.
>>Both papers present results.  Both papers conclude that for programs not
>>dominated by startup costs,
>
>Marc Sabatella's paper gives data: 10% for inefficient coding of the
>library and a maximum of 10% start-up overhead with reasonably large
>programs.

Marc Sabatella's paper says that the overhead of PIC is about 10%, and
also notes that since only the libraries are PIC, "this has a negligible
effect on the performance of most programs".  I'm curious whether that
statement is intended to apply to window-system programs or not; I
wouldn't be surprised if they spent more time in library code.

>As I already said, PIC (Position Independent Code) imposes several
>restrictions on hardware, which many architectures can't obey.

Which architectures?  SPARC obviously isn't one of them, and HP-PA
isn't, either, as the HP folks also did their shared libraries on Series
800 machines.  So far, the MIPS R-series doesn't seem to be one, either;
its branch instructions are position-independent, and it can do an
unconditional "branch to subroutine", so it can get the PC of the
beginning of the routine into a register with position-independent code.
The Motorola 88K isn't one, either; check out the System V Release 4 ABI
for the 88K.
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/27/91)
In article <7974@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>>You may remember that the speed of Bnews was actually improved by
>>in-lining the first part of strcmp(). In-lining of functions in
>>shared libraries is, of course, impossible.

>Well, in the version of Bnews we have here, that in-lining is done with
>a "STRCMP()" macro, that checks the first two characters and, only if
>they're not equal, calls "strcmp()".

Yes, of course.  Bnews is the real example showing the significance of
call overhead.

>Our Bnews programs are dynamically linked, and they have that in-lining;
>"In-lining of functions in shared libraries" is, of course, *NOT*
>"impossible", as demonstrated by that.

STRCMP() is source-code-level in-lining of strcmp(), *NOT* of the
strcmp() in a shared library.

>Perhaps you want to completely delete the Bnews example, as it doesn't
>bolster your case,

Not at all.

						Masataka Ohta
guy@auspex.auspex.com (Guy Harris) (05/28/91)
>STRCMP() is source code level inlining of strcmp(), *NOT* strcmp()
>in a shared library.

Yes, that's exactly what I said!  It's source-code-level inlining, which
works JUST FINE with "strcmp()" in a shared library.
guy@auspex.auspex.com (Guy Harris) (05/29/91)
>> "The mapping of these extended, process-unique virtual addresses to >> physical addresses need not be one-to-one; virtual addresses of two >> or more different processes may map to the same physical address." > >Compared to R4000, R2000/R3000 are slower CPUs. So what? Are you saying that the virtually indexed, physically tagged cache on the R4000, *unlike* the virtually indexed, physically tagged cache on the R3000, is unable to support having different virtual addresses mapped to the same physical address? (It's already been demonstrated, by the quote above, that your previous categorical assertions that a virtually indexed, physically tagged cache *can't* support different virtual addresses mapped to the same physical address is completely and utterly untrue.) If you're asserting that, you'd better offer some evidence; I doubt anybody in the audience is going to take your word for it. Given that MIPS presumably has the intention of preserving compatibility between earlier R-series chips and the R4000, including the same ability to support OS features such as "mmap()" and shareable libraries (present both in S5R4 and OSF/1), the burden of proof is *entirely* upon *you* to demonstrate that it *can't* support it - and to demonstrate so by citing statements from MIPS that it can't, not by waving your hands.
meissner@osf.org (Michael Meissner) (05/29/91)
In article <8029@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

| >As I already said, PIC (Position Independent Code) imposes several
| >restrictions to hardware, which many architectures can't obey.
|
| Which architectures?  SPARC obviously isn't one of them, and HP-PA
| isn't, either, as the HP folks also did their shared libraries on Series
| 800 machines.  So far, MIPS R-series doesn't seem to be one, either; its
| branch instructions are position-independent, and it can do an
| unconditional "branch to subroutine", so it can get the PC of the
| beginning of the routine into a register with position-independent code.
| The Motorola 88K isn't one, either; check out the System V Release 4 ABI
| for the 88K.

The MIPS branch instructions are PC-relative, but are limited to a
+/- 128K range.  This obviously might cause problems with some Fortran
applications.....

-- 
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

You are in a twisty little passage of standards, all conflicting.
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (05/30/91)
In article <8054@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>>STRCMP() is source code level inlining of strcmp(), *NOT* strcmp()
>>in a shared library.

>Yes, that's exactly what I said!  It's source-code-level inlining, which
>works JUST FINE with "strcmp()" in a shared library.

No.  You said:

>"In-lining of functions in shared libraries" is, of course, *NOT*
>"impossible", as demonstrated by that.

while I said:

:In-lining of functions in shared libraries is, of course, impossible.

As you agree now, strcmp() in a shared library is not in-lined.

In article <8057@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>I've indicated *several times* how that not only *can* be done, but how
>it *is* done on Suns, in the case of virtual address caches:
>
>	ensure that all the virtual addresses get mapped to the same
>	cache line by aligning the mappings properly

See <244@titccy.cc.titech.ac.jp>:

:The problem is that, to support shared libraries, strict PIC is not required.
:
:Instead, it is required that the same code runs if the relocation is a
:multiple of some constant.

In this case, the cache size can be the constant.

>	have the virtual addresses in the inverted page table be
>	addresses in the *global* virtual address space, because a given
>	page has only one virtual address in *that* space;

In this case, the segment size of the global virtual address space can
be the constant.

						Masataka Ohta
guy@auspex.auspex.com (Guy Harris) (06/01/91)
>As you agree now, strcmp() in a shared library is not in-lined.

I agree that if the compiler doesn't treat "strcmp()" specially - e.g.,
by having a header define "strcmp(a, b)" as "_builtin_strcmp(a, b)", and
generating, say, code for a call to "_builtin_strcmp(a, b)" that
compares the first two characters of "a" and "b" and, only if they're
not equal, calls the "strcmp()" routine starting at one character into
the strings - the compiler can't automatically in-line code in
"strcmp()".

However, the "STRCMP()" *macro* that appears in B news will work just
fine with a "strcmp()" in a shared library.  You cited B news as an
example of a place where inlining is a win; that particular example
doesn't require unshared libraries to get that win.

>>:The problem is that, to support shared libraries, strict PIC is not required.
>>:
>>:Instead, it is required that the same code runs if the relocation is a
>>:multiple of some constant.
>
>In this case, the cache size can be the constant.

I take it you're finally agreeing that a given physical page can be
mapped at different virtual addresses in different processes, even if
you have a virtually-addressed cache or inverted page tables?

There are two separate issues here, which you're mixing together:

1) the issue of code that will run regardless of what its virtual
   address is, and that doesn't have to be modified to run at a
   different address;

2) the issue of mapping the same physical page into different virtual
   addresses within different processes.

The first issue is what *I* consider the issue of position-independent
code; it's already been demonstrated that all the major 32-bit
microprocessor architectures can handle that, as can various other
32-bit architectures such as System/3[679]0, VAX, etc.

The second issue is the issue of cache aliasing; in order to effectively
*use* position-independent code on a system with virtual addressing and
a cache that's not purely physically addressed, you have to be able to
deal with cache aliasing.

Making sure that all the virtual addresses are the same modulo the cache
size solves the problem on a lot of caches (Sun, Cypress 7C60[45], MIPS
R4000, among others); the caches with virtual, rather than physical,
tags also have to do some alias checking.  I.e., virtual indexing of
caches, and even virtual *tagging* of caches, isn't a barrier to using
position-independent code.  The same is true of systems such as the
RS/6000 with inverted page tables; the RS/6000's scheme handles that.

No, you can't put the shareable object at arbitrary locations in the
address spaces of the processes and leave it cacheable in a
virtually-indexed cache.  Nobody was saying that you could, and I
sincerely *hope* nobody was claiming that the fact that you couldn't was
at *all* a major obstacle to implementing position-independent shareable
code objects!
ske@pkmab.se (Kristoffer Eriksson) (06/01/91)
In article <209@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:

>In article <1991May16.200702.7476@Think.COM> barmar@think.com writes:
>>When PC-relative addressing isn't available or usable, you just need
>>register+offset addressing, which most computers have.
>
>I was wrong here, yes, it is possible if we use indirect addressing to
>access global data, but it is slow.

Why would register-relative addressing be any slower than PC-relative
addressing?

> In-lining of functions in shared libraries is, of course, impossible.

What nonsense.  If you inline such a function, you simply don't
reference the version of the function in the library any more, since
inlining has already put a copy of it at the place of the reference.

-- 
Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden
Phone: +46 19-13 03 60 !  e-mail: ske@pkmab.se
Fax:   +46 19-11 51 03 !  or ...!{uunet,mcsun}!sunic.sunet.se!kullmar!pkmab!ske
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (06/03/91)
In article <8144@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>You cited B news as an
>example of a place where inlining is a win; that particular example
>doesn't require unshared libraries to get that win.

Don't distort what I said.  See <246@titccy.cc.titech.ac.jp>:

:Yes, of course. Bnews is the real example showing significance of call
:overhead.

I cited B news as the real example showing the significance of call
overhead.

>There are two separate issues here, which you're mixing together:
>
>1) the issue of code that will run regardless of what its virtual
>   address is, and that doesn't have to be modified to run at a
>   different address;
>
>2) the issue of mapping the same physical page into different virtual
>   addresses within different processes.

I am not mixing them.  Both issues have nothing to do with the current
discussion.

>I
>sincerely *hope* nobody was claiming that the fact that you couldn't was
>at *all* a major obstacle to implementing position-independent shareable
>code objects!

What you don't understand - and what I hadn't stated explicitly - is
that position-independent code is not necessary for shared libraries.
Roughly-position-independent code is enough.

						Masataka Ohta
guy@auspex.auspex.com (Guy Harris) (06/04/91)
>:Yes, of course. Bnews is the real example showing significance of call
>:overhead.
>
>I cited B news as the real example showing the significance of call
>overhead.

Umm, if call overhead is significant, inlining is a win, right?  The
trick here is that in B news, the string comparison operation is
partially inlined by using a macro; that form of inlining works just
fine with shared libraries.

>>There are two separate issues here, which you're mixing together:
>>
>>1) the issue of code that will run regardless of what its virtual
>>   address is, and that doesn't have to be modified to run at a
>>   different address;
>>
>>2) the issue of mapping the same physical page into different virtual
>>   addresses within different processes.
>
>I am not mixing them.

Yes, you're continuing to mix them.  See below.

>>I
>>sincerely *hope* nobody was claiming that the fact that you couldn't was
>>at *all* a major obstacle to implementing position-independent shareable
>>code objects!
>
>What you don't understand is that position-independent code is not
>necessary for shared libraries.  Roughly-position-independent code is
>enough.

See, you're still mixing them!

The first issue is, as stated, the one of making code that runs
regardless of what address it's located at.  On most if not all of the
major architectures on which UNIX runs, that can be done, and that code
is *fully* position-independent - you could move it by some minimal
amount (the actual amount depends on the alignment requirements for
various instructions).  In practice, on a system with address mapping,
in order to share such objects they have to be put on page or segment
boundaries; if they're put on page boundaries, they can only be
relocated by an integral number of pages - but that has nothing to do
with the way the code was made position-independent.

The second issue is the one of making the code cacheable if you map it
in at different addresses on a machine with a virtually-indexed cache
(whether virtually or physically tagged; both can deal with aliases,
although virtually-tagged caches have to work a little harder at it), or
making it shareable without having to shuffle the page map on a context
switch on a machine with inverted page tables.  That issue means that
the alignment requirements on the code are stricter, e.g. aligning all
the virtual addresses so that the cache tags for a given location are
the same in all address spaces.  If you *don't* do that, the code will
still *work* just fine, because the code is fully position-independent,
not "roughly position-independent"; it'll just run slower, because
you'll have to mark it non-cacheable.

There may well be architectures on which the code can't be made fully
position-independent, i.e. such that it can't be made to run *at all*
unless the position of the code is adjusted only by e.g. a segment size;
however, that's not true of the 68K, the 88K, SPARC, MIPS, the 386 and
up, the VAX, or the IBM 3[679]0 - I didn't bother buying the WE32K or
i860 S5R4 ABI books, so I didn't see whether they do fully
position-independent code or not.  Shared libraries could probably be
done on such an architecture, assuming the alignment requirements aren't
*too* strict.  However, given that the high-volume architectures don't
have that problem, and given that I don't work on any low-volume
architectures that have that problem, I didn't spend any energy worrying
about it.
mohta@necom830.cc.titech.ac.jp (Masataka Ohta) (06/04/91)
In article <8167@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:

>>:Yes, of course. Bnews is the real example showing significance of call
>>:overhead.
>>I cited B news as the real example showing significance of call overhead.
>
>Umm, if call overhead is significant, inlining is a win, right?

I don't care.  It has nothing to do with the current discussion.  You
may post whatever you believe, but don't distort what I said.

>>>There are two separate issues here, which you're mixing together:
>>I am not mixing them.
>
>Yes, you're continuing to mix them.  See below.

No.  And, again, these points have nothing to do with the current
discussion on shared libraries.

						Masataka Ohta