hutch@fps.com (Jim Hutchison) (01/16/90)
When a "thing" gets off the drawing board and into practice, it seems that there end up being one or two new features that people need almost immediately (in days/weeks/months). This is certainly understandable; for the sake of simplicity we could call it growth.

Now, here we have the ABI(s), and people already want them to grow to include FPAs, Application Processors, and such. Sounds good. Unfortunately, it seems that when these neat things are added, the result gets called a "new" ABI. With many of the vendors making "new" ABIs for their new features, it would seem that the purpose of the ABI will be lost. Clearly that result would be bad.

So here is a thought I was toying with. How about allowing library calls to trap and re-write their calls as the appropriate instruction? I can see some problems right away with handling shared pages neatly, and with having space and parameters to re-write the call with. Also, some FPAs need a setup call to get themselves in order when the game starts; this could probably be done in the "trap" handler, presuming that it had a way to get the information it needed.

It's a thought. What does anyone else think about this?

--
/* Jim Hutchison {dcdwest,ucbvax}!ucsd!celerity!hutch */
/* Disclaimer: I am not an official spokesman for FPS computing */
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (01/16/90)
In article <6186@celit.fps.com> hutch@fps.com (Jim Hutchison) writes:
| Now, here we have the ABI(s) and people already want them to grow to include
| FPAs, Application Processors, and such. Sounds good. Unfortunately it seems
| that to add these neat things, and it gets called a "new" ABI.

That is the problem in a nutshell.

| So here is a thought I was toying with. How about allowing for library calls
| to trap and re-write there calls as the appropriate instruction? I can see
| some problems right away with handling shared pages neatly, and having space
| and parameters to re-write the call with.

It could work if you have copy on write. Or maybe not, since the page will still be valid for all processes using it. I think this requires some careful thought about what happens with the MMU and paging. The usual practice is to overwrite a code page rather than swap it out, since it (usually) hasn't been modified. I suppose this wouldn't be a problem, since if a new, unmodified copy came back in it would become a modified copy quickly. You would need to lock the page while writing it in a multi-CPU system, which might require a lot of MMU extensions.

This is an old problem. The VAX gets around some of it by allowing "writable control store" (user-defined microcode for certain instructions). A friend did a master's thesis based on implementing an FFT instruction in microcode. It was almost twice as fast as doing it with discrete instructions.

--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"Stupidity, like virtue, is its own reward" -me
desnoyer@apple.com (Peter Desnoyers) (01/17/90)
In article <2020@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> | So here is a thought I was toying with. How about allowing for library calls
> | to trap and re-write there calls as the appropriate instruction? I can see
> | some problems right away with handling shared pages neatly, and having space
> | and parameters to re-write the call with.
>

This sounds like a variation on run-time loading. I'm not sure of the specifics, but I believe some variation on {link in call to loader, load function on first call, patch stub to jump to loaded function} is used in OS/2, the Macintosh, and TOPS-20. The patching is easier in this case, however, as you are always patching the same run-time load stub instead of arbitrary code.

> It could work if you have copy on write. Or maybe not, since the page
> will still be valid for all processes using it. I think this requires
> some careful though about doing stuff with the MMU and paging. The usual
> practice is to overwrite a code page rather than swap it, since it
> (usually) hasn't been modified. I suppose this wouldn't be a problem,
> since if a new, unmodified, copy came back in it would become a modified
> copy quickly.

If the code page is used infrequently enough that it gets swapped out, then whether or not the optimized version of the instruction gets used is a moot point.

Peter Desnoyers
Apple ATG
(408) 974-4469
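The {link in call to loader, load function on first call, patch stub to jump to loaded function} sequence can be sketched as a toy Python model. This is only an illustration of the general idea, not a description of how OS/2, the Macintosh, or TOPS-20 actually implemented it; `StubTable`, `load_square`, and the slot dictionary are hypothetical names made up for the sketch.

```python
from typing import Callable, Dict


class StubTable:
    """Toy call table whose slots start out as loader stubs (hypothetical model)."""

    def __init__(self, loaders: Dict[str, Callable[[], Callable]]):
        self._loaders = loaders
        # How many times each routine has actually been "loaded".
        self.load_count = {name: 0 for name in loaders}
        # "Link in call to loader": every slot initially points at a stub.
        self.slots: Dict[str, Callable] = {
            name: self._make_stub(name) for name in loaders
        }

    def _make_stub(self, name: str) -> Callable:
        def stub(*args):
            # "Load function on first call".
            self.load_count[name] += 1
            target = self._loaders[name]()
            # "Patch stub to jump to loaded function": later calls skip the stub.
            self.slots[name] = target
            return target(*args)
        return stub

    def call(self, name: str, *args):
        # Callers always go through the slot, never to the routine directly.
        return self.slots[name](*args)


def load_square() -> Callable[[int], int]:
    # Stand-in for reading a routine from a library file (an assumption).
    return lambda x: x * x


table = StubTable({"square": load_square})
first = table.call("square", 3)   # goes through the stub, which loads and patches
second = table.call("square", 4)  # goes straight to the loaded routine
```

Only the slot is ever rewritten, which matches the point that patching one fixed stub is easier than patching arbitrary code.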
henry@utzoo.uucp (Henry Spencer) (01/17/90)
In article <2020@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
> This is an old problem. The VAX gets around some of it by allowing
>"writable control store" (users defined microcode for certain
>instructions). A friend did a master's thesis based on implementing an
>FFT instruction in microcode. It was almost twice as fast as doing it
>with discrete instructions.

Which is actually fairly impressive, since the VAX WCS seems to have been an afterthought and there was substantial overhead involved in getting to it. I guess his FFT instruction did enough work to amortize the overhead pretty well.

--
1972: Saturn V #15 flight-ready| Henry Spencer at U of Toronto Zoology
1990: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mrc@Tomobiki-Cho.CAC.Washington.EDU (Mark Crispin) (01/17/90)
In article <6197@internal.Apple.COM> desnoyer@apple.com (Peter Desnoyers) writes:
>This sounds like a variation on run-time loading. I'm not sure of the
>specifics, but I believe some variation on {link in call to loader, load
>function on first call, patch stub to jump to loaded function} is used in
>[...] TOPS-20.

Not that I know of, and I was a TOPS-20 OS programmer for 10 years. It was possible for user programs to do something like this, and I think the FORTRAN overlay system (used mostly on the older TOPS-10 operating system) did, but it was not at all general practice.

TOPS-20 is a demand-paged virtual memory operating system. Usually your program would have everything lunk in with it, or call a well-defined library segment (usually a TOPS-10 style "high segment") that would be separately loaded. "Loading" an executable program (as opposed to linking relocatable binaries) merely involved setting the swap pointers for the appropriate process page(s) to that particular file on the disk; this made sharing of pure pages easy (and in fact impossible to avoid).

Mark Crispin
6158 Lariat Loop NE
Bainbridge Island, WA USA 98110-2098
+1 (206) 842-2385
mrc@CAC.Washington.EDU
ddb@ns.network.com (David Dyer-Bennet) (01/17/90)
In article <5344@blake.acs.washington.edu> mrc@Tomobiki-Cho.CAC.Washington.EDU (Mark Crispin) writes:
:In article <6197@internal.Apple.COM> desnoyer@apple.com (Peter Desnoyers) writes:
:>This sounds like a variation on run-time loading. I'm not sure of the
:>specifics, but I believe some variation on {link in call to loader, load
:>function on first call, patch stub to jump to loaded function} is used in
:>[...] TOPS-20.
:
:Not that I know of, and I was a TOPS-20 OS programmer for 10 years.

I did implement something like this for TOPS-20 dynamic library support, which was actually used with the TOPS-20 version of Datatrieve (did that ever ship?) but not, so far as I know, with anything else. It didn't require any changes in the OS (it used the PDV facility that appeared in version whatever). Each dynamic library was loaded into its own segment (it started out as a one-weekend hack, ok?).

--
David Dyer-Bennet, ddb@terrabit.fidonet.org
or ddb@network.com
or Fidonet 1:282/341.0, (612) 721-8967 9600hst/2400/1200/300
or terrabit!ddb@Lynx.MN.Org, ...{amdahl,hpda}!bungia!viper!terrabit!ddb
johnl@esegue.segue.boston.ma.us (John R. Levine) (01/17/90)
In article <6186@celit.fps.com> hutch@fps.com (Jim Hutchison) writes:
|So here is a thought I was toying with. How about allowing for library calls
|to trap and re-write there calls as the appropriate instruction?

This sort of thing has been done since time immemorial (which in this business is about 25 years). A few years back I was programming an HP 1000, which is sort of an overgrown 16-bit PDP-8. When they added floating point instructions, they made them take their arguments in the place where they'd be for a call to the floating whatever routine, so the linker could patch the call into the appropriate instruction.

More recently, PC/IX had a wonderful hack, again for floating point. On the 8088, floating point instructions in the absence of a floating point unit do nothing in particular, so if you generate in-line instructions and there's no 8087, you lose. On the other hand, if you do have an 8087, the overhead of library calls is considerable. One of the PC/IX kernel guys noticed that every FP instruction was preceded by a one-byte WAIT instruction, and that the first byte of every floating instruction is in the range D8 through DF. Instead of generating the WAIT, they generated the first byte of an INT instruction, so when the program got to that point it would generate an interrupt on one of the vectors from D8 through DF. If the machine didn't have an 8087, the handler could then pick up the rest of the instruction and emulate it. If there was an 8087, it would patch the INT to a WAIT, back up the instruction pointer, and return. The first time through any piece of code, you get an interrupt on each floating instruction, but if there's an 8087, after the first time through any loop the machine runs at full speed.
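The INT-to-WAIT patching can be sketched as a toy Python simulation over a byte buffer. This is a simplified model under stated assumptions, not the PC/IX code: 9B (WAIT) and CD (INT) are the real 8086 opcodes, but the fixed three-byte instruction length, the NOP filler, and the names `run_once` and `is_esc` are inventions for the illustration.

```python
WAIT = 0x9B  # 8086 WAIT opcode
INT = 0xCD   # 8086 INT imm8 opcode
NOP = 0x90   # filler for the toy program


def is_esc(byte: int) -> bool:
    """True for the first byte of a floating point (ESC) instruction."""
    return 0xD8 <= byte <= 0xDF


def run_once(code: bytearray, has_fpu: bool) -> dict:
    """One pass over the toy program. Assumes every FP instruction is
    prefix + ESC byte + one operand byte (3 bytes total), a simplification."""
    stats = {"traps": 0, "native": 0, "emulated": 0}
    ip = 0
    while ip < len(code):
        op = code[ip]
        fp_follows = ip + 1 < len(code) and is_esc(code[ip + 1])
        if op == INT and fp_follows:
            # INT D8..DF: the interrupt handler gets control.
            stats["traps"] += 1
            if has_fpu:
                code[ip] = WAIT  # patch the INT byte into a WAIT
                continue         # "back up" the IP: re-run from the same address
            stats["emulated"] += 1  # no 8087: emulate, then skip the instruction
            ip += 3
        elif op == WAIT and fp_follows:
            stats["native"] += 1    # executes on the coprocessor, no trap
            ip += 3
        elif op == NOP:
            ip += 1
        else:
            raise ValueError(f"unexpected byte {op:#04x} at offset {ip}")
    return stats


program = [INT, 0xD8, 0xC1, NOP, INT, 0xDC, 0xC1]

with_fpu = bytearray(program)
first_pass = run_once(with_fpu, has_fpu=True)
second_pass = run_once(with_fpu, has_fpu=True)

without_fpu = bytearray(program)
emulated_pass = run_once(without_fpu, has_fpu=False)
```

With a coprocessor present, the first pass traps once per floating instruction and later passes do not trap at all; without one, the buffer is never modified and every pass is emulated.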
I don't propose exactly this hack for future ABIs, but there is a lot of mileage to be gained by designing your extensions so that you can patch call instructions on the fly, getting practically full machine speed while maintaining binary compatibility with older systems.

--
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."