srg@quick.COM (Spencer Garrett) (11/15/87)
I haven't seen one of these beasts, but I understand they have a single large register pool which their compiler allocates as needed to procedures. The idea sounded intriguing at first, but on closer examination I can't see how to avoid acting like a vax (i.e. - save the regs you're going to use at procedure entry and restore them at exit). Here are several specific questions I hope someone can answer, or maybe someone at MIPS could post a short "here's how we do it" article. 1) If you're going to have object libraries, then it would have to be the linker that really allocates the registers. How does it handle running out of registers? (since it's too late to generate code that uses fewer or saves the ones it needs) 2) How does the compiler deal with recursion? 3) How does the register allocator (compiler or linker) handle computed calls, since it can't generate a static call tree at all in this case? (i.e. - calls of the form (*pointer)(args); )
larry@mips.UUCP (Larry Weber) (11/16/87)
In article <166@quick.COM> srg@quick.COM (Spencer Garrett) writes: >1) If you're going to have object libraries, then it would have to be >the linker that really allocates the registers. How does it handle >running out of registers? (since it's too late to generate code that >uses fewer or saves the ones it needs) The register allocator assigns registers in each procedure as it encounters the procedure. All registers are divided into two basic classes: saved across calls and not saved (or temporary). The allocator places any value whose lifetime does not extend across a call into the unsaved registers, this includes global variables which must (in general) be homed to its memory resident location making the register free. Notice that functions which make no calls (leaf routines) may assign ALL variables and expressions to the unsaved registers giving the effect of register windows at the deepest level of the call graph. Also, treating register save and restores like expressions that move outside of regions that require them but not to the edges of the function allow the cost of saving and restoring to be felt only in those regions that require more register saving. (This last improvement is only partially done at the present.) So far, I have described the general strategy which produces excellent results. We then optionally go a step further. We compile each compilation unit into UCODE (our intermediate code). The UCODE file looks very much like an object file, including headers and symbol table; although it has a special magic number (of course, this is UNIX remember). In these files, the ucode is placed in a special segment. Looking as much like object files as they do, the archiver can build libraries as always. You can think of the link edit as first linking the ucode into a single compilation unit (even code from different languages can be linked), then optimizing the linked ucode. When this is done, we first order the functions depth-first. The optimizer has a method to know if all calls to a given function are following its definition. In this last case, the general register strategy is ignored in favor of making all registers act like unsaved registers. The caller will know which registers are used by each called routine and will avoid using them (because it is cheaper to use the other registers). The effect is registers are assigned from the bottom of the call graph toward the root, each function using only the registers it needs. When all the registers are used, some function up the call graph will have to save the registers, but in a place which is less frequently used because it is higher in the graph; also, the saves can at least be moved out of loops. > >2) How does the compiler deal with recursion? A regressive function is one which CAN'T be defined before its used so it must punt and use the general strategy. > >3) How does the register allocator (compiler or linker) handle computed >calls, since it can't generate a static call tree at all in this case? >(i.e. - calls of the form (*pointer)(args); ) Again, its a punt to the general method. There is always a discussion about whether to use callee save conventions or caller saves. The argument goes that the callee KNOWS which registers it needs and should do the save, but some registers that contain useless information may be saved. On the other side, the caller KNOWS which values must be preserved at any call site and should do the save, but registers that may not be used also get saved. Our approach has advantages from both sides, values with limited lifetimes can use FREE registers while there is a known cost with allocating values to the saved registers which is accounted for in the register allocators cost function. Our studies so far show that our general strategy has about 4-5% cost; that is, if we really had unlimited registers and could completely ignore saving and restoring we gain back this much. We have very limited results to date on our fancy call graph based allocator. A register window machine, when executing a program that fits entirely in the set of hardware windows will buy this advantage. HOWEVER, should a window overflow or underflow occur, many registers are saved and restored. This cost is a pure blind save, registers that have nothing in them get saved, without even the benefits of either caller or callee save which tries to place some intelligence behind the which registers should be saved. Also, windows have a larger cost at context switch times. -- -Larry Weber DISCLAIMER: I speak only for myself, and I even deny that. UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!larry, DDD:408-720-1700, x214 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
bcase@apple.UUCP (Brian Case) (11/16/87)
In article <910@gumby.UUCP> larry@gumby.UUCP (Larry Weber) writes: [Lots of good stuff about MIPS strategies.] Larry, thanks so much for posting this info. How would you compare your link-time optimization with that done by Wall at DEC? >HOWEVER, should a window overflow or underflow occur, >many registers are saved and restored. This cost is a pure blind save, >registers that have nothing in them get saved, without even the benefits >of either caller or callee save which tries to place some intelligence >behind the which registers should be saved. Also, windows have a larger >cost at context switch times. Yes, straight, fixed-size register windows (ala RISC I & II and SPARC) have this problem: unless the window is exactly the right size, there will be either too few registers or too many (one or the other, usually too many, is the case most of the time). So, when a save happens, some unneeded memory traffic is generated. This is exactly why the 29K has variable-sized "windows." They aren't perfect, but at least the compiler can allocate exactly what it wants. When a save/restore happens, the memory traffic is always transfering something "useful" and the addresses are known to be sequential (thus enabling the use of high-bandwidth access techniques). I believe the link-time allocation techniques hold great promise. Keep us updated if possible!