stuart@bms-at.UUCP (Stuart Gathman) (01/12/89)
1) Jump optimization - easy for machines, hard for humans and machine dependent to boot, almost any decent compiler does it. This includes choosing optimal 'switch' strategies. 2) Common subexpression elimination. This includes pointer expressions and should apply to an entire function, not just one statement. All variables used should be divided into two classes, 'alias' and 'noalias', by one of three methods (in increasing order of expense): a) 'register' variables are noalias. b) 'register' variables and 'auto' variables whose address is never taken within the function are noalias. c) 'register' variables, 'auto' variables whose address is never taken within the function, and 'static' variables whose address is never taken within the module are noalias. For structured variables, taking the address of any member at any level cancels noalias status for the entire structure. (One can see why compiler makers wanted the 'noalias' keyword. One can also see why it would be dangerous.) Subexpression trees are not considered identical when a member of the tree is 'alias' and there is a function call or modify indirect operation seperating the references. When expression rearrangement allows, the offending operation should be moved. Of course, modification of any variable member of the tree also renders them non-identical. Some compilers may be able to detect identities (e.g. *p++, *--p), but this is not required. The key thing is that a->b->c->d used twice will not follow the pointer chain each time. CSE elimination should not be confined to values in registers. The compiler should allocate and compute temporaries where required. 3) Constant reduction. This is specified as a requirement in K&R, but many compilers don't do it! Gcc does it in the parse phase. In particular this includes: if (0) { ... } 4) Register optimization: let CSE determine register usage. There is no reason to assign a fixed register to a variable. Registers that are never flushed do not need memory allocated. Register flushes never followed by a reload should be eliminated. Flush variables declared 'register' as a last resort. Flushing strategy can range from a simple LRU over sequential program flow to attempts to guess frequency of usage. The latter should still give 'register' priority because the compiler doesn't know usage context (apart from profiler feedback used by some systems - but it is still unknown the first time it is compiled, and the profile results from every potential application would have to be combined to do library routines properly). LRU over program flow with 'register' priority would suit me just fine. 5) Loop unrolling: this is OK as long as code size is not increased significantly. If I want speed over size, I'd rather unroll it myself. GCC does the important things in several steps: 1) constant reduction is done while parsing. output: tree representation of program with all variables as memory references. Gcc calls this form 'C-tree' 2) translate to program using infinite registers. Do CSE and assign 1 register to each unique sub expression. GCC calls this form 'RTL'. 3) assign virtual to real registers with appropriate flushes and reloads from memory. This is still in RTL form. 4) translate to actual machine instructions with peephole optimizations. This is programmable using a machine description in Gcc, the other steps are portable. 5) 4 can be done before 3 Apparently, CSE is the hardest part since very few compilers do a decent job of it. -- Stuart D. Gathman <stuart@bms-at.uucp> <..!{vrdxhq|daitc}!bms-at!stuart>