mcg@mipon2.intel.com (Steven McGeady) (05/23/89)
Here is a note I sent some time ago - I never saw it reflected onto the
net. I am posting three messages - A previous message, which contains
changes I needed to make to the base, this one , which contains suggestions
for improvements in the base, and a third which contains mods for gcc.c to
provide environment variable support for locating passes and files. I have
not posted the machine-specific files for the 960. If anyone
wants them they can write to me.
S. McGeady
Intel Corp.
(503) 696-4393
------------
To: rms@wheaties.ai.mit.edu
Cc: info-gcc@wheaties.ai.mit.edu
Subject: [MSG 3 of 3] Suggestions for improvements/enhancements in GCC
Date: Fri, 21 Apr 89 13:46:18 PDT
From: mcg
1. Ease of Retarget Issues
a. More generic calling sequence support
My port would have been made much easier if FUNCTION_ARG could have
just returned an RTX for each argument in the list, rather than
just for the register arguments. Then I could have returned a
register rtx when appropriate, and
(plus (arg_pointer_rtx) (const_int 64))
or whatever when I needed to put something into the argument block.
I think that both expr.c and stmt.c would be simplified if this
were the case. Perhaps you could let expr.c and stmt.c figure it
out if a NIL was returned. (One would also need a macro to be
called after all parameters were assigned that returned the total
amount of stack space consumed, but this is easy to keep track of
in CUM.)
b. Machine-specific function header setup
The machine descriptions should have control of outputting the
header of a procedure. The reason for this is the optimization
of leaf-procedures: A different kind of definition must
preceed the '.globl _foo\n\n_foo:' lines, e.g.
.leafproc _foo,__foo
rret: return
_foo: lda rret,g14
__foo: /* leaf procedure, entered with branch-link */
...
bx (g14)
FUNCTION_PROLOGUE is too late for this. I had to hijack
FUNCTION_NAME_DECLARE, which is probably not the appropriate
place for such code. Why not just let FUNCTION_PROLOGUE
output the .globl and the initial label? Pass it the same
parameters that you pass FUNCTION_NAME_DECLARE. That seems
the easiest solution.
c. A three-word integer type would be useful, so three-word
BLKmode structures and aggregates could be kept in registers.
2. Optimization Issues
a. Support for leaf-procedure and tail-call optimizations
These optimizations are quite common in RISC architectures. A
leaf-procedure is one that meets certain machine-dependent criteria
regarding how many and which registers are used, perhaps whether a
frame is needed or how big that frame is, etc. Currently, I need
to scan all of the insns in a procedure looking for calls and for
argument block or frame references to know whether the procedure is
leaf or not. This is because regs_ever_live[FRAME_POINTER_REGNUM]
and regs_ever_live[ARG_POINTER_REGNUM] are not correct at
FUNCTION_PROLOGUE time, and because there is no indication of
whether any calls are made from the procedure.
Tail-calls are calls made immediately prior to a return. The call
can often "piggy-back" on the current procedure's stack, and avoid
the overhead of allocating another one. The crucial test is
whether any of the called procedure's actual parameters are
addresses on the calling procedure's frame. There is no easy way
to determine this currently (that I am aware of), so I scan all the
insns looking for a load effective address insn that references the
frame pointer. This is not optimal though, because such references
may have nothing to do with the actual parameters of any particular
call. An indication of whether a call is a potential tail-call
would be useful.
b. Constant costs & Constants in common registers
The CONST_COST macro is severely limited by not knowing the context
in which the constant will be used. Take for example a standard
RISC architecture, which has a limited number of constants that can
appear in an instruction. The constant '256' may not fit in the
instruction, so one would assign it a non-zero cost, and it will be
put in a register. However, if an insn (or a peephole), can
recognize (for example), 'and 256,r0,r0; test r0' and replace it
with 'bittst 7' or whatever, then substantial optimizations could
be realized. It is easy to define an insn which has operands which
match various types of constant operands, but the loading of these
operands into registers often eliminates optimization possibilities.
The solution might be to give CONST_COST an idea of what instruction
context a constant is being used in, or it may be to delay loading
of constants into registers until after instruction selection. I
can give numerous examples where better code could be generated if
I could recognize certain constants: especially bit strings
(0x00ff00) and power-of-2 operands (0x80000000).
There is an another case as well. I would like to define an
instruction template that has a condition:
if (operands[0] == const0_rtx &&
regs_ever_used(REG_G14,...))
Because I have a register used in the calling sequence which
is almost always zero, except in routines which require an
argument block. I can probably handle this by writing special
code which examines the function definition tree, but that
duplicates a lot of code in a place it really shouldn't be.
c. GCC generates extremely bad code for RISC architectures when
performing bit-field extractions on automatic memory-based
operands (typically those passed in a register but addressed
later). When the bitfield is addressed within a module, GCC
assumes that it must write the bitfield out to memory after
each statement, causing a larger number of redundant loads and
stores to be generated. Instead, GCC should load the variable
into memory only immediately before a call or a distinct reference
to memory that might be a reference to the structure is made.
d. On the same subject as (c), the compiler does not do the dataflow
analysis needed to determine whether an addressed automatic
variable (either scaler or aggregate) can actually be used at any
point in a procedure. It would be worth doing this data flow, at
least within basic blocks, if not within the whole procedure. Only
spilling the value at basic block boundaries would be a major win.
e. Constant-value tracking across basic-blocks is poor, e.g:
for (i = 0; i < 100; i++) {
...
}
generates something like:
mov 0,rX
mov 99,rY
br L1:
L2: ...
sub 1,rX,rX
L1: cmp rX,rY
ble L2
The initial branch in this can be eliminated under all
circumstances, since the initial comparison is constant.
f. Some indication should be given for each insn (especially
load/store insns) whether the insn is in a loop or not, and
approximately (if known) how many times that loop might be
executed. A machine-description might want to make a trade-off
between code density and speed in certain circumstances, and
generate a different sequence of instructions in a loop.
One example is use of complex addressing modes. A complex
addressing mode (such as reg+const+(reg*scale)) might be desirable
in straight-line code, especially if it is a dense instruction and
does not take much longer than the corresponding expanded code to
perform the address calculation. However, one would want to expand
and strength-reduce or hoist the address address calculation of the
load/store occured in a loop.
g. An indication of the top and bottom of a loop, and perhaps
identification of the loop control insns. There are certain
optimizations that may or may not be beyond the scope of the
compiler such as loop unrolling and software pipelining. At a
less aggressive level, on some RISC machines it is and optimization
to align branch targets in memory. This is often not a win except
for loop tops. A loop header that noted the minimum and maximum
number of times the loop is executed (if known), whether the loop
is left with a break, and some other information of that nature
would be helpful both to the code generator and potentially to
post-pass optimizers.
h. Support for __builtin_strcpy, etc, generating movstrqi instructions
or equivalent, would be useful. A hook for strcmp would also be
nice. I am told that the guy at Data General who has done the 88k
port has done this.
3. Machine Type Support
a. It would be nice if there were a straightforward way to support
3-word (and 4-word) structures in registers. As noted in an
earlier note, I have added support for 4-word structures, and
they appear to work, but I have not added 3-word structures out
of fear that the base compiler might not work on non-power-of-2
sized data.
4. Environment/Miscellaneous Issues
a. It would be desirable if GCC would get its paths (optionally)
from environment variables, rather than from compiled-in
constants: e.g.:
$GmdBASE - base of GNU tools
$GmdLIBDIR - where to look for libraries
default $GNUBASE/lib
$GmdBINDIR - where to look for executable binaries
default $GNUBASE/bin
$GmdINCDIR - where to look for include files
default $GNUBASE/include
$GPPmdSPEC - default CPP_SPEC
$GASmdSPEC - default ASM_SPEC
$GLDmdSPEC - default loader spec
$GASmdPATH - path to assembler
default $GBINDIR/gas
$GLDmdPATH - path to linker
default $GBINDIR/gld
$GPPmdPATH - path to cpp
default $GBINDIR/cpp
(the 'md' would be a machine-dependent string)
This would allow different users on the same machine to use
different combinations of the system and gnu tools without each
having a private gcc. Also, it would allow cross-compilers to
isolate themselves from the system includes and whatnot.
Currently, someone targeting GCC as a cross-compiler must invade
gcc, cpp, and all the other tools to scrub them of references to
/usr/include, /lib, /usr/lib, etc. If the above variable where
used, a "normal", non-cross user would not set any variables, and
gcc could substitute in the standards. However, a cross user
could just set $GmdBASE, and then get a cross environment of
include files and libraries.
S. McGeady
haahr@phoenix.Princeton.EDU (Paul Gluckauf Haahr) (05/24/89)
In article <4462@omepd.UUCP> mcg@mipon2.intel.com (Steven McGeady) writes: > 2. Optimization Issues > a. Support for leaf-procedure and tail-call optimizations > Tail-calls are calls made immediately prior to a return. The call > can often "piggy-back" on the current procedure's stack, and avoid > the overhead of allocating another one. The crucial test is > whether any of the called procedure's actual parameters are > addresses on the calling procedure's frame. There is no easy way > to determine this currently (that I am aware of), so I scan all the > insns looking for a load effective address insn that references the > frame pointer. This is not optimal though, because such references > may have nothing to do with the actual parameters of any particular > call. An indication of whether a call is a potential tail-call > would be useful. the conditions listed above are not sufficient. any reference to an address in stack allocated memory that might store the address (ie, passing it to a function or storing it in a global) is enough to indicate that tail-call elimination is unsafe. for example, char *global; int f(const char *s, int n) { char buf[1000]; strcpy(buf, s); if (n == 0) global = buf; if (*s == '\0') { printf("length of %s is %d\n", global, n); return n; } return f(s + 1, n + 1); } if the tail-call is eliminated, the global pointer will be pointing to the wrong instance of buf.
mcg@mipon2.intel.com (05/28/89)
In article <8641@phoenix.Princeton.EDU> haahr@princeton.edu (Paul Gluckauf Haahr) writes: >In article <4462@omepd.UUCP> mcg@mipon2.intel.com (Steven McGeady) writes: >> 2. Optimization Issues >> a. Support for leaf-procedure and tail-call optimizations >> Tail-calls are calls made immediately prior to a return. The call >> can often "piggy-back" on the current procedure's stack, and avoid >> the overhead of allocating another one. The crucial test is >> whether any of the called procedure's actual parameters are >> addresses on the calling procedure's frame. > >the conditions listed above are not sufficient. any reference to >an address in stack allocated memory that might store the address >(ie, passing it to a function or storing it in a global) is enough >to indicate that tail-call elimination is unsafe Mr. Haahr is correct in pointing out that it is very difficult to determine whether an actual parameter to a subroutine is an address on the local frame. Perhaps an indication of whether an operand's address is ever taken is the best that can be done. Better would be the combination of knowing an indication of whether any aliases for local frame addresses may have been created (e.g. frame_ever_addressed_p), and an indication that a particular parameter is known to hold a frame address. The latter may require dataflow analysis that may be beyond the capability of gcc. S. McGeady Intel Corp.