carter@cs.wisc.edu (Gregory Carter) (05/01/91)
Hello... I don't know if this is incredibly obvious or what, but in the OLDEN DAYS when I was working on a 6502 machine. Speed was of the upmost...ad nauseum.. What I want to know, if any of you have considered, for high speed applications to get that extra push, what the problems would be in compiler design to: 1) Transcribe all local variables to global ones. 2) Replacing subroutine calls with the actual code. 3) minimizing stack frame usage to almost ZIPPO. I have done this so often by hand, I wish I could tell my compiler to do it! For example: #...ad nauseum sub1() { ... } main() { sub1() } -> My mythical switch -L (Local data and code segments -> global state) results in the above being transcribed to: main() { ... } No stack frames, no parameters, no slow stack model, DUMP THOSE JSR's/RTS's link/unlnk instructions! GIVE ME JMP PERIOD. Since the compiler handles the transcription, Hey! We can still have readable modular ad nauseum type code on the C source level. Of course, this is REDUNDANT, in that code is replicated all over the place. Millions of machine cycles can be saved, though...depending on your computer. I think this method will be more and more attractive as memory prices go through the floor. I don't see why just because us humans have to think about inefficient stack models, why must the computer be made to do it this way when we don't need it too? Problem is, will the transcription of the code generation part of the compiler be able to do this when you want to shift from a stack based paradigm to a non stack based paradigm. --Gregory (Undergrad for life) [GCC and some other C compilers permit you do declare a procedure "inline" so that it is expanded wherever it is called. That gives pretty much the same effect, faster but larger code. -John] -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
byron@archone.tamu.edu (Byron Rakitzis) (05/02/91)
In article <1991May1.035622.25021@daffy.cs.wisc.edu> carter@cs.wisc.edu (Gregory Carter) writes: >I don't know if this is incredibly obvious or what, but in the OLDEN DAYS >when I was working on a 6502 machine. Speed was of the upmost...ad nauseum.. Hmm... are we seeing an "all the world's a 6502" attitude in this article? (although, I admit that my subsequent comments assume "all the world's a machine with slow memory and fast registers (e.g., most RISC architectures)") >What I want to know, if any of you have considered, for high speed >applications to get that extra push, what the problems would be in compiler >design to: >1) Transcribe all local variables to global ones. Ouch. This will kill register allocation on most compilers. You WANT your code in blocks with local declarations of variables; this gives an optimizing compiler a better chance at allocating registers over variable lifetimes. However, even the simplest optimizations can be killed by making your variables global; compilers like gcc usually make pessimal assumptions about how global variables are used: x = 1; /* assume x is global. gcc writes x to memory */ foo(); bar(x); /*gcc will reload x from memory, even if foo() does not touch */ /* x or the register that x was stored in */ /* (i.e., there's no interprocedural optimization on globals) */ >2) Replacing subroutine calls with the actual code. This is already done. >3) minimizing stack frame usage to almost ZIPPO. If you have a good compiler on a RISC archictecture, then you probably touch the stack frame as little as possible ANYWAY. Even return addresses are not stored on the stack in leaf subroutines on the MIPS, for example. (or rather, they shouldn't be!) Sorry to pick nits. Now, to address the ideas in your article: yes, it's clear that it should be possible for you to configure your compiler to trade space for time. Most of the optimizations you suggest are more readily accomplished by good interprocedural optimization, though. -- Byron Rakitzis, Texas A&M., byron@archone.tamu.edu -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
pardo@june.cs.washington.edu (David Keppel) (05/03/91)
carter@cs.wisc.edu (Gregory Carter) writes: >[Optimize: > (1) local vars -> global vars > (2) replace subroutine calls with the subroutine code > (3) minimize stack frame usage] On many machines (RISC machines in particular), accesing global memory is no faster than accessing local variables. The cost of stack pointer update is, of course, larger than no update. For larger procedures the cost is minimal. Smaller procedures can be inlined and/or values just kept in registers. It would be possible to do flow analysis and do one stack pointer update for several nested calls; I know of no such work but it's conceptually easy. In general, the biggest win is from making effective use of registers, and if you can't fit it all in 32 registers, you're already doing enough work tht the cost of a stack pointer update is small. As John Levine points out, some compilers support function inlining. The code often gets larger and on machines with caches, may get slower. A number of compilers attempt to minimize stack frame usage. On CISCs such as the VAX and i386, stack usage is a part of the calling convention defined by most compilers. On RISCs, the compilers can often optimize leaf procedures enough to eliminate all stack references. The SPARC compilers can also optimize away leaf procedure register window saves and restores (register windows implicitly consume stack space). So the answers are (1) It's not usually a big win (2) See GCC and other production-quality compilers (3) See GCC and other production-quality compilers As an aside, several C compilers I've looked at take code of the form foo(){ {int a[SZ]; ...} {int a[SZ]; ...} } and allocate two `SZ'-sized slots on the stack. This doesn't cause any extra stack pointer updates but does reduce the localit of reference. ;-D on ( Code twisting ) Pardo -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
preston@ariel.rice.edu (Preston Briggs) (05/03/91)
pardo@june.cs.washington.edu (David Keppel) writes: >The cost of stack pointer update is, of course, larger than no update. For >larger procedures the cost is minimal. Smaller procedures can be inlined >and/or values just kept in registers. It would be possible to do flow >analysis and do one stack pointer update for several nested calls; I know of >no such work but it's conceptually easy. Tom Murtagh has been working on this for a while. See, for example A Less Dynamic Memory Allocation Scheme for Algol-Like Languages Thomas P. Murtagh POPL 11, 1984 It seems like he's done further papers on the subject (with C. Ruggerie?), but I can't find them right off. Preston Briggs -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
deutsch@lix.polytechnique.fr (Alain Deutsch) (05/14/91)
[re inlining procedures and turning locals into globals]
There is a related paper by Peter Sestoft:
@InProceedings{Sestoft89,
author = "Sestoft,P.",
title = "Replacing function parameters by global variables",
booktitle = "Conference on Functional Programming Languages and Computer
Architecture",
year = "1989",
publisher = "ACM Press",
address = "London",
month = "Sep.",
pages = "39--53"
}
----
Alain Deutsch, LIX, Ecole Polytechnique, 91128 Palaiseau Cedex, France.
deutsch@lix.polytechnique.fr, (33) 1 69/33/45/90.
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.