[comp.arch] MIPS register handling question

srg@quick.COM (Spencer Garrett) (11/15/87)

I haven't seen one of these beasts, but I understand they have a single
large register pool which their compiler allocates as needed to procedures.
The idea sounded intriguing at first, but on closer examination I can't
see how to avoid acting like a vax (i.e. - save the regs you're going
to use at procedure entry and restore them at exit).  Here are several
specific questions I hope someone can answer, or maybe someone at MIPS
could post a short "here's how we do it" article.

1) If you're going to have object libraries, then it would have to be
the linker that really allocates the registers.  How does it handle
running out of registers? (since it's too late to generate code that
uses fewer or saves the ones it needs)

2) How does the compiler deal with recursion?

3) How does the register allocator (compiler or linker) handle computed
calls, since it can't generate a static call tree at all in this case?
(i.e. - calls of the form    (*pointer)(args);  )

larry@mips.UUCP (Larry Weber) (11/16/87)

In article <166@quick.COM> srg@quick.COM (Spencer Garrett) writes:
>1) If you're going to have object libraries, then it would have to be
>the linker that really allocates the registers.  How does it handle
>running out of registers? (since it's too late to generate code that
>uses fewer or saves the ones it needs)

The register allocator assigns registers in each procedure as it encounters
the procedure.  All registers are divided into two basic classes: saved across
calls and not saved (or temporary).  The allocator places any value whose
lifetime does not extend across a call into the unsaved registers, this 
includes global variables which must (in general) be homed to its memory
resident location making the register free.  Notice that functions
which make no calls (leaf routines) may assign ALL variables and expressions
to the unsaved registers giving the effect of register windows at the deepest
level of the call graph.  Also, treating register save and restores like
expressions that move outside of regions that require them but not to the
edges of the function allow the cost of saving and restoring to be felt
only in those regions that require more register saving. (This last improvement
is only partially done at the present.)

So far, I have described the general strategy which produces excellent results.
We then optionally go a step further.  We compile each compilation unit into 
UCODE (our intermediate code).  The UCODE file looks very much like an 
object file, including headers and symbol table; although it has a special 
magic number (of course, this is UNIX remember).  In these files, the ucode is
placed in a special segment.  Looking as much like object files as they do,
the archiver can build libraries as always. You can think of the link edit as
first linking the ucode into a single compilation unit (even code from different
languages can be linked), then optimizing the linked ucode.  When this is done,
we first order the functions depth-first.  The optimizer has a method to know
if all calls to a given function are following its definition.  In this last
case, the general register strategy is ignored in favor of making all registers
act like unsaved registers.  The caller will know which registers are used
by each called routine and will avoid using them (because it is cheaper to
use the other registers).  The effect is registers are assigned from the bottom
of the call graph toward the root, each function using only the registers it
needs.  When all the registers are used, some function up the call graph will
have to save the registers, but in a place which is less frequently used 
because it is higher in the graph; also, the saves can at least be moved out
of loops.  
>
>2) How does the compiler deal with recursion?
A regressive function is one which CAN'T be defined before its
used so it must punt and use the general strategy.
>
>3) How does the register allocator (compiler or linker) handle computed
>calls, since it can't generate a static call tree at all in this case?
>(i.e. - calls of the form    (*pointer)(args);  )
Again, its a punt to the general method.

There is always a discussion about whether to use callee save conventions
or caller saves.  The argument goes that the callee KNOWS which registers
it needs and should do the save, but some registers that contain useless
information may be saved.  On the other side, the caller KNOWS which values
must be preserved at any call site and should do the save, but registers that
may not be used also get saved.  Our approach has advantages from both sides,
values with limited lifetimes can use FREE registers while there is a known
cost with allocating values to the saved registers which is accounted for
in the register allocators cost function.

Our studies so far show that our general strategy has about 4-5% cost; that
is, if we really had unlimited registers and could completely ignore saving
and restoring we gain back this much.  We have very limited results to date
on our fancy call graph based allocator.  A register window machine, when 
executing a program that fits entirely in the set of hardware windows will
buy this advantage. HOWEVER, should a window overflow or underflow occur,
many registers are saved and restored.  This cost is a pure blind save, 
registers that have nothing in them get saved, without even the benefits
of either caller or callee save which tries to place some intelligence
behind the which registers should be saved.  Also, windows have a larger
cost at context switch times.

-- 
-Larry Weber  DISCLAIMER: I speak only for myself, and I even deny that.
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!larry,   DDD:408-720-1700, x214
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

bcase@apple.UUCP (Brian Case) (11/16/87)

In article <910@gumby.UUCP> larry@gumby.UUCP (Larry Weber) writes:

[Lots of good stuff about MIPS strategies.]  Larry, thanks so much for
posting this info.  How would you compare your link-time optimization
with that done by Wall at DEC?

>HOWEVER, should a window overflow or underflow occur,
>many registers are saved and restored.  This cost is a pure blind save, 
>registers that have nothing in them get saved, without even the benefits
>of either caller or callee save which tries to place some intelligence
>behind the which registers should be saved.  Also, windows have a larger
>cost at context switch times.

Yes, straight, fixed-size register windows (ala RISC I & II and SPARC)
have this problem:  unless the window is exactly the right size, there
will be either too few registers or too many (one or the other, usually
too many, is the case most of the time).  So, when a save happens, some
unneeded memory traffic is generated.  This is exactly why the 29K has
variable-sized "windows."  They aren't perfect, but at least the compiler
can allocate exactly what it wants.  When a save/restore happens, the
memory traffic is always transfering something "useful" and the addresses
are known to be sequential (thus enabling the use of high-bandwidth
access techniques).

I believe the link-time allocation techniques hold great promise.  Keep
us updated if possible!