[gnu.gcc] Suggestions for improvements/enhancements to GCC

mcg@mipon2.intel.com (Steven McGeady) (05/23/89)

Here is a note I sent some time ago - I never saw it reflected onto the
net.  I am posting three messages - A previous message, which contains
changes I needed to make to the base, this one , which contains suggestions
for improvements in the base, and a third which contains mods for gcc.c to
provide environment variable support for locating passes and files.  I have
not posted the machine-specific files for the 960.  If anyone
wants them they can write to me.

S. McGeady
Intel Corp.
(503) 696-4393

------------

To: rms@wheaties.ai.mit.edu
Cc: info-gcc@wheaties.ai.mit.edu
Subject: [MSG 3 of 3] Suggestions for improvements/enhancements in GCC
Date: Fri, 21 Apr 89 13:46:18 PDT
From: mcg


1. Ease of Retarget Issues

	a. More generic calling sequence support

	   My port would have been made much easier if FUNCTION_ARG could have
	   just returned an RTX for each argument in the list, rather than
	   just for the register arguments.  Then I could have returned a
	   register rtx when appropriate, and
		(plus (arg_pointer_rtx) (const_int 64))
	   or whatever when I needed to put something into the argument block.
	   I think that both expr.c and stmt.c would be simplified if this
	   were the case.  Perhaps you could let expr.c and stmt.c figure it
	   out if a NIL was returned.  (One would also need a macro to be
	   called after all parameters were assigned that returned the total
	   amount of stack space consumed, but this is easy to keep track of
	   in CUM.)

	b. Machine-specific function header setup

	   The machine descriptions should have control of outputting the
	   header of a procedure.  The reason for this is the optimization
	   of leaf-procedures:  A different kind of definition must
	   preceed the '.globl _foo\n\n_foo:' lines, e.g.

		.leafproc _foo,__foo
	    rret:	return
	    _foo:	lda	rret,g14
	    __foo:	/* leaf procedure, entered with branch-link */
			...
			bx	(g14)

	   FUNCTION_PROLOGUE is too late for this.  I had to hijack
	   FUNCTION_NAME_DECLARE, which is probably not the appropriate
	   place for such code.  Why not just let FUNCTION_PROLOGUE
	   output the .globl and the initial label?  Pass it the same
	   parameters that you pass FUNCTION_NAME_DECLARE.  That seems
	   the easiest solution.

	c. A three-word integer type would be useful, so three-word
	   BLKmode structures and aggregates could be kept in registers.

2. Optimization Issues

	a. Support for leaf-procedure and tail-call optimizations

	   These optimizations are quite common in RISC architectures.  A
	   leaf-procedure is one that meets certain machine-dependent criteria
	   regarding how many and which registers are used, perhaps whether a
	   frame is needed or how big that frame is, etc.  Currently, I need
	   to scan all of the insns in a procedure looking for calls and for
	   argument block or frame references to know whether the procedure is
	   leaf or not.  This is because regs_ever_live[FRAME_POINTER_REGNUM]
	   and regs_ever_live[ARG_POINTER_REGNUM] are not correct at
	   FUNCTION_PROLOGUE time, and because there is no indication of
	   whether any calls are made from the procedure.

	   Tail-calls are calls made immediately prior to a return.  The call
	   can often "piggy-back" on the current procedure's stack, and avoid
	   the overhead of allocating another one.  The crucial test is
	   whether any of the called procedure's actual parameters are
	   addresses on the calling procedure's frame.  There is no easy way
	   to determine this currently (that I am aware of), so I scan all the
	   insns looking for a load effective address insn that references the
	   frame pointer.  This is not optimal though, because such references
	   may have nothing to do with the actual parameters of any particular
	   call.  An indication of whether a call is a potential tail-call
	   would be useful.

	b. Constant costs & Constants in common registers

	   The CONST_COST macro is severely limited by not knowing the context
	   in which the constant will be used.  Take for example a standard
	   RISC architecture, which has a limited number of constants that can
	   appear in an instruction.  The constant '256' may not fit in the
	   instruction, so one would assign it a non-zero cost, and it will be
	   put in a register.  However, if an insn (or a peephole), can
	   recognize (for example), 'and 256,r0,r0; test r0' and replace it
	   with 'bittst 7' or whatever, then substantial optimizations could
	   be realized.  It is easy to define an insn which has operands which
	   match various types of constant operands, but the loading of these
	   operands into registers often eliminates optimization possibilities.

	   The solution might be to give CONST_COST an idea of what instruction
	   context a constant is being used in, or it may be to delay loading
	   of constants into registers until after instruction selection.  I
	   can give numerous examples where better code could be generated if
	   I could recognize certain constants: especially bit strings
	   (0x00ff00) and power-of-2 operands (0x80000000).

	   There is an another case as well.  I would like to define an
	   instruction template that has a condition:

		if (operands[0] == const0_rtx &&
		    regs_ever_used(REG_G14,...))

	   Because I have a register used in the calling sequence which
	   is almost always zero, except in routines which require an
	   argument block.  I can probably handle this by writing special
	   code which examines the function definition tree, but that
	   duplicates a lot of code in a place it really shouldn't be.

	c. GCC generates extremely bad code for RISC architectures when
	   performing bit-field extractions on automatic memory-based
	   operands (typically those passed in a register but addressed
	   later).  When the bitfield is addressed within a module, GCC
	   assumes that it must write the bitfield out to memory after
	   each statement, causing a larger number of redundant loads and
	   stores to be generated.  Instead, GCC should load the variable
	   into memory only immediately before a call or a distinct reference
	   to memory that might be a reference to the structure is made.

	d. On the same subject as (c), the compiler does not do the dataflow
	   analysis needed to determine whether an addressed automatic
	   variable (either scaler or aggregate) can actually be used at any
	   point in a procedure.  It would be worth doing this data flow, at
	   least within basic blocks, if not within the whole procedure.  Only
	   spilling the value at basic block boundaries would be a major win.

	e. Constant-value tracking across basic-blocks is poor, e.g:

		for (i = 0; i < 100; i++) {
			...
		}

	   generates something like:

		mov	0,rX
		mov	99,rY
		br	L1:
	L2:	...
		sub	1,rX,rX

	L1:	cmp	rX,rY
		ble	L2

	    The initial branch in this can be eliminated under all
	    circumstances, since the initial comparison is constant.

	f. Some indication should be given for each insn (especially
	   load/store insns) whether the insn is in a loop or not, and
	   approximately (if known) how many times that loop might be
	   executed.  A machine-description might want to make a trade-off
	   between code density and speed in certain circumstances, and
	   generate a different sequence of instructions in a loop.

	   One example is use of complex addressing modes.  A complex
	   addressing mode (such as reg+const+(reg*scale)) might be desirable
	   in straight-line code, especially if it is a dense instruction and
	   does not take much longer than the corresponding expanded code to
	   perform the address calculation.  However, one would want to expand
	   and strength-reduce or hoist the address address calculation of the
	   load/store occured in a loop.

	g. An indication of the top and bottom of a loop, and perhaps
	   identification of the loop control insns.  There are certain
	   optimizations that may or may not be beyond the scope of the
	   compiler such as loop unrolling and software pipelining.  At a
	   less aggressive level, on some RISC machines it is and optimization
	   to align branch targets in memory.  This is often not a win except
	   for loop tops.  A loop header that noted the minimum and maximum
	   number of times the loop is executed (if known), whether the loop
	   is left with a break, and some other information of that nature
	   would be helpful both to the code generator and potentially to
	   post-pass optimizers.

	h. Support for __builtin_strcpy, etc, generating movstrqi instructions
	   or equivalent, would be useful.  A hook for strcmp would also be
	   nice.  I am told that the guy at Data General who has done the 88k
	   port has done this.

3. Machine Type Support

	a. It would be nice if there were a straightforward way to support
	   3-word (and 4-word) structures in registers.  As noted in an
	   earlier note, I have added support for 4-word structures, and
	   they appear to work, but I have not added 3-word structures out
	   of fear that the base compiler might not work on non-power-of-2
	   sized data.

4. Environment/Miscellaneous Issues
	
	a. It would be desirable if GCC would get its paths (optionally)
	   from environment variables, rather than from compiled-in
	   constants: e.g.:

		$GmdBASE	- base of GNU tools
		$GmdLIBDIR	- where to look for libraries
				  default $GNUBASE/lib
		$GmdBINDIR	- where to look for executable binaries
				  default $GNUBASE/bin
		$GmdINCDIR	- where to look for include files
				  default $GNUBASE/include
		$GPPmdSPEC	- default CPP_SPEC
		$GASmdSPEC	- default ASM_SPEC
		$GLDmdSPEC	- default loader spec

		$GASmdPATH	- path to assembler
				  default $GBINDIR/gas
		$GLDmdPATH	- path to linker
				  default $GBINDIR/gld
		$GPPmdPATH	- path to cpp
				  default $GBINDIR/cpp

	    (the 'md' would be a machine-dependent string)

	    This would allow different users on the same machine to use
	    different combinations of the system and gnu tools without each
	    having a private gcc.  Also, it would allow cross-compilers to
	    isolate themselves from the system includes and whatnot.
	    Currently, someone targeting GCC as a cross-compiler must invade
	    gcc, cpp, and all the other tools to scrub them of references to
	    /usr/include, /lib, /usr/lib, etc.  If the above variable where
	    used, a "normal", non-cross user would not set any variables, and
	    gcc could substitute in the standards.  However, a cross user
	    could just set $GmdBASE, and then get a cross environment of
	    include files and libraries.

S. McGeady

haahr@phoenix.Princeton.EDU (Paul Gluckauf Haahr) (05/24/89)

In article <4462@omepd.UUCP> mcg@mipon2.intel.com (Steven McGeady) writes:
> 2. Optimization Issues
> 	a. Support for leaf-procedure and tail-call optimizations
> 	   Tail-calls are calls made immediately prior to a return.  The call
> 	   can often "piggy-back" on the current procedure's stack, and avoid
> 	   the overhead of allocating another one.  The crucial test is
> 	   whether any of the called procedure's actual parameters are
> 	   addresses on the calling procedure's frame.  There is no easy way
> 	   to determine this currently (that I am aware of), so I scan all the
> 	   insns looking for a load effective address insn that references the
> 	   frame pointer.  This is not optimal though, because such references
> 	   may have nothing to do with the actual parameters of any particular
> 	   call.  An indication of whether a call is a potential tail-call
> 	   would be useful.

the conditions listed above are not sufficient.  any reference to
an address in stack allocated memory that might store the address
(ie, passing it to a function or storing it in a global) is enough
to indicate that tail-call elimination is unsafe.  for example,

	char *global;
	int f(const char *s, int n) {
		char buf[1000];
		strcpy(buf, s);
		if (n == 0)
			global = buf;
		if (*s == '\0') {
			printf("length of %s is %d\n", global, n);
			return n;
		}
		return f(s + 1, n + 1);
	}

if the tail-call is eliminated, the global pointer will be
pointing to the wrong instance of buf.

mcg@mipon2.intel.com (05/28/89)

In article <8641@phoenix.Princeton.EDU> haahr@princeton.edu (Paul Gluckauf Haahr) writes:
>In article <4462@omepd.UUCP> mcg@mipon2.intel.com (Steven McGeady) writes:
>> 2. Optimization Issues
>> 	a. Support for leaf-procedure and tail-call optimizations
>> 	   Tail-calls are calls made immediately prior to a return.  The call
>> 	   can often "piggy-back" on the current procedure's stack, and avoid
>> 	   the overhead of allocating another one.  The crucial test is
>> 	   whether any of the called procedure's actual parameters are
>> 	   addresses on the calling procedure's frame.
>
>the conditions listed above are not sufficient.  any reference to
>an address in stack allocated memory that might store the address
>(ie, passing it to a function or storing it in a global) is enough
>to indicate that tail-call elimination is unsafe

Mr. Haahr is correct in pointing out that it is very difficult to determine
whether an actual parameter to a subroutine is an address on the local frame.
Perhaps an indication of whether an operand's address is ever taken is
the best that can be done.   Better would be the combination of knowing
an indication of whether any aliases for local frame addresses may have
been created (e.g. frame_ever_addressed_p), and an indication that a particular
parameter is known to hold a frame address.  The latter may require dataflow
analysis that may be beyond the capability of gcc.

S. McGeady
Intel Corp.