[comp.arch] Optimizing compiler's view on: no. of regs, aliasing, caller/callee save etc.

urip@orcisi.UUCP (02/16/87)

As one of the designers of the new optimizing compiler for the NS32000,
I enjoyed reading the articles posted on the above subject. I am going to 
describe the way that our implementation deals with some of the issues that 
were mentioned (see title).

It is needless to say (but I'll say it anyway...) that I completely agree 
with all those who condemn using pointers to local variables, and assumming
that incrementing the pointer will point to the next declared variable.
Not only that any of these variables may end up in a register, but
even those that stayed in memory (because there was no free register for them)
are reallocated to fill the "holes" created by the lucky ones (i.e. variables
that won a register). This reduces the size of frames (sometimes down to zero).

Someone from Sun (don't remember the name) has suggested to declare all 
variables that are not affected by pointers or by other functions as 
'register' in order to help the optimizer. 
As far as I understand this is completely unnecessary.
According to the "white book" 'register' storage class is applicable 
only for automatic variables or function parameters, and it is not
possible to take the address of a register variable. This leaves us with
local variables whose address is never taken, which are considered safe 
by the optimizer since they cannot be affected by pointers or by other 
functions (unless we play dangerous games with pointers, which we already 
promised not to!!). 


The conventions for saving registers by the caller vs. the callee were
mentioned too, but no one mentioned the posibility of a compromise between
the two - combined caller and callee save. This convention is used by
good old Portable C Compiler (pcc). By this convention, the registers
are divided into two groups: volatile and non-volatile registers
(on the VAX and NS32000 the volatile registers are R0..R2 and the rest are 
non-volatile). Volatile register means that a procedure call may change
its contents, so it has to be saved by the caller (or not used across
procedure calls). Non-volatile register means that a procedure call is
guaranteed to retain its old value, so it has to be saved by the callee
(if used by the callee).
The pcc uses non-volatile registers for 'register' declarations and 
volatile registers for expression evaluation. This means that every
'register' declaration causes save and restore of that register in
the beginning and end of the procedure that contains the declarations.
Declaring a variable that is used only once or twice as 'register'
may therefore cause perfomance degradation (depending on the exact
instruction timings). 
Since most expressions do not contain function calls, using non-volatile
registers for expression evaluation is very efficient.

Our optimizer uses the same convention (so modules compiled by either
compiler are compatible) but squeezes a little more performance out of it 
by using the following strategy: 
a variable that spans across procedure calls
(in optimizers' jargon: a variable whose live range contains procedure calls)
is allocated a non-volatile register (if still profitable after taking
into account the cost of save/restore). A variable that doesn't live across
procedure calls is allocated a volatile register (if available), which is
cheaper to use because it does not require save/restore (because, by
convention, the caller does not expect it to retain its value...).

I hope this answers the original question: what is the available number
of registers for 'register' declaration.


Uri Postavsky (...utgpu!syntron!orcisi!urip)

Currently with O.R.C. Toronto,
 formerly with National Semiconductor Tel Aviv.

faustus@ucbcad.UUCP (02/22/87)

WRT the discussion about whether the caller or the callee should save
registers -- wouldn't the most efficient strategy be for the caller to
pass the callee a mask of which registers are currently in use (by
itself and previous functions)?  That way, the callee could AND its
mask with the caller's mask and save those registers, and OR its mask
and the caller's and pass that to any functions that it calls.  I think
this should prevent any unnecessary saving of registers.

	Wayne

lodman@ncr-sd.UUCP (02/23/87)

In article <1283@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
>WRT the discussion about whether the caller or the callee should save
>registers -- wouldn't the most efficient strategy be for the caller to
>pass the callee a mask of which registers are currently in use (by
>itself and previous functions)?  That way, the callee could AND its
>mask with the caller's mask and save those registers, and OR its mask
>and the caller's and pass that to any functions that it calls.  I think
>this should prevent any unnecessary saving of registers.

Wouldn't this force the linker to insert code to save and restore 
registers? It doesn't sound like a workable solution.



-- 
Michael Lodman
mike.lodman@SanDiego.NCR.COM 
{sdcsvax,cbatt,dcdwest,nosc.ARPA,ihnp4}!ncr-sd!lodman

greg@utcsri.UUCP (Gregory Smith) (03/03/87)

In article <1283@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
>WRT the discussion about whether the caller or the callee should save
>registers -- wouldn't the most efficient strategy be for the caller to
>pass the callee a mask of which registers are currently in use (by
>itself and previous functions)?  That way, the callee could AND its
>mask with the caller's mask and save those registers, and OR its mask
>and the caller's and pass that to any functions that it calls.  I think
>this should prevent any unnecessary saving of registers.

In article <1379@ncr-sd.SanDiego.NCR.COM> lodman@ncr-sd.UUCP (0000-Mike Lodman) writes:
>Wouldn't this force the linker to insert code to save and restore 
>registers?

No. It would require support in the architecture itself - a 'register-in-use'
word, with a bit for each register. *
At the top of each function would be an instruction with a register mask
field indicating which registers would be needed within that function.
This would be anded with the current mask to produce the set of registers
which must be pushed. The 'register-in-use' mask would be then be 'or-ed'
with this value to produce the new mask. The old mask would also have to
be pushed. ( can you say 'CISC'? ) The same instruction would of course
perform the usual stacrobatics (and make coffee if anybody wanted it).

On function exit there would be an instruction which would undo this. This
instruction would also contain the same mask that appeared in the entry
instruction.

The problem is, you wouldn't gain much. As you go deeper into the stack,
bits in the 'reg-in-use' word would become set, but would never be
cleared by anything but a function return. Once all of them become set,
you have gained nothing over a conventional (vax/68k/NS32K)
'push-specified-set' instruction, and you are pushing and popping an
extra word to boot.

Presumably, you could have a 'release-all-registers' instruction which
would push all the regs-in-use, and  the regs-in-use word, and clear
the regs-in-use word. A complementary instruction would pop them.
If a function A were to make many calls to a function B, and B needed
a lot of regs, it would then be advantageous for A to 'release-all-regs'
before the loop, so that the save/restore need only be done once.
But how to determine this?

* you could do this on an arbitrary machine with run-time code, but I
don't know of a machine with a push-specified-set op that allows a non-constant
set.

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

kary@hplsdla.UUCP (03/09/87)

> The problem is, you wouldn't gain much. As you go deeper into the stack,
> bits in the 'reg-in-use' word would become set, but would never be
> cleared by anything but a function return. 
The 'reg-in-use' word could be cleared at function entry.  Since the contents
of the registers are on the stack, they can be considered no longer in use.
Another thing that could be done is the 'reg-in-use' word could set the 
appropriate bit whenever a register is loaded, (the 'reg-in-use' word should
be a processor register, though not necessarily addressable) then there would 
be no compiler support required for the register mask feature.  The register 
mask would then reflect only the registers actually in use at the point of the 
function call, registers that are first used *after* the call wouldn't have 
to be saved.  One thing missing from this scheme is a way to clear a bit when
a register is no longer in use.  Perhaps clearing that bit could be tacked onto
every register instruction.  Every register instruction now has two modes, one
standard mode and one to clear the 'reg-in-use' bit.  The compiler could then
generate the second version the last time a register is used in a function.
This machine is now a MISC (Machiavellian Instruction Set Computer :-)

Dan Kary
hp-lsd!kary

hrubin@osupyr.UUCP (Herman Rubin) (03/09/87)

It is not that difficult for hardware to avoid register loss in subroutine
calls while minimizing register saving.  The callee saves those registers
which the caller tells it are in use and which the callee needs, and keeps
a bit pattern of those the caller sends which it does not use.  It would
then send this list or-ed with the list of its active registers when it
makes another call.  Thus it is not merely building up a long list by or's.
Having written programs with unavoidable rare conditional subroutine calls,
I can unequivocally state that having the caller save registers can be so
costly as to require kludges.
-- 
Herman Rubin
Until mid May:
Department of Statistics, Cockins Hall, 1958 Neil Avenue
Ohio State University, Columbus OH43210
hrubin@osupyr or cbosgd!osu-eddie!osupyr!hrubin or ts0474@ohstvma.bitnet
"Permanent":
Department of Statistics, Purdue University, West Lafayette IN47907
hrubin@l.cc.purdue.edu or ihnp4!pur-ee!stat-l!hrubin or hrubin@purccvm.bitnet