[comp.compilers] Code folding from JSR/RTS -> {Local}

carter@cs.wisc.edu (Gregory Carter) (05/01/91)

Hello...

I don't know if this is incredibly obvious or what, but in the OLDEN DAYS
when I was working on a 6502 machine, speed was of the utmost...ad nauseam..

What I want to know is whether any of you have considered, for high-speed
applications that need that extra push, what the problems would be in
compiler design to:

1) Transcribe all local variables to global ones.
2) Replace subroutine calls with the actual code.
3) Minimize stack frame usage to almost ZIPPO.

I have done this so often by hand, I wish I could tell my compiler to do it!

For example:

#...ad nauseam

sub1()
{
...
}

main()
{
  sub1();
}

-> My mythical switch -L (Local data and code segments -> global state)
   results in the above being transcribed to:

main()
{
...
}

No stack frames, no parameters, no slow stack model.  DUMP THOSE
JSR/RTS and LINK/UNLK instructions!  GIVE ME JMP, PERIOD.  Since the
compiler handles the transcription, hey!  We can still have readable,
modular, ad-nauseam-type code at the C source level.

Of course, this is REDUNDANT, in that code is replicated all over the
place.  Millions of machine cycles can be saved, though...depending on
your computer.  I think this method will be more and more attractive as
memory prices go through the floor.
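To make the idea concrete, here is a slightly fuller hand transformation in
the same spirit (the -L switch is imaginary, and the names below are made up
purely for illustration):

    int scale;          /* was: sub1's parameter       */
    int total;          /* was: sub1's local/result    */

    int main(void)
    {
        int i;

        scale = 3;                 /* "argument passing" is now a plain store */

        /* --- body of sub1, expanded in place: no JSR/RTS, no frame --- */
        total = 0;
        for (i = 0; i < 10; i++)
            total += i * scale;
        /* --- end of sub1 --- */

        return total;
    }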

I don't see why, just because we humans have to think in terms of an
inefficient stack model, the computer must be made to do it this way when
we don't need it to.

The problem is whether the code generation part of the compiler will be
able to do this transcription when you want to shift from a stack-based
paradigm to a non-stack-based one.

--Gregory (Undergrad for life)
[GCC and some other C compilers permit you to declare a procedure "inline"
so that it is expanded wherever it is called.  That gives pretty much the
same effect: faster but larger code. -John]
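A minimal sketch of that mechanism using the GNU C `inline' extension (the
function names are made up; with optimization on, gcc will normally expand
sq() into its callers instead of emitting a call/return sequence):

    static inline int sq(int x)
    {
        return x * x;
    }

    int sumsq(int a, int b)
    {
        return sq(a) + sq(b);   /* typically compiled with no call at all */
    }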
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

byron@archone.tamu.edu (Byron Rakitzis) (05/02/91)

In article <1991May1.035622.25021@daffy.cs.wisc.edu> carter@cs.wisc.edu (Gregory Carter) writes:
>I don't know if this is incredibly obvious or what, but in the OLDEN DAYS
>when I was working on a 6502 machine, speed was of the utmost...ad nauseam..

Hmm... are we seeing an "all the world's a 6502" attitude in this article?
(Although I admit my own comments below assume "all the world's a machine
with slow memory and fast registers", e.g. most RISC architectures.)

>What I want to know is whether any of you have considered, for high-speed
>applications that need that extra push, what the problems would be in
>compiler design to:

>1) Transcribe all local variables to global ones.

Ouch. This will kill register allocation on most compilers. You WANT your
code in blocks with local declarations of variables; this gives an
optimizing compiler a better chance at allocating registers over variable
lifetimes. However, even the simplest optimizations can be killed by
making your variables global; compilers like gcc usually make pessimal
assumptions about how global variables are used:

    x = 1;    /* assume x is global; gcc writes x to memory              */
    foo();
    bar(x);   /* gcc will reload x from memory, even if foo() does not   */
              /* touch x or the register that x was stored in            */
              /* (i.e., there's no interprocedural optimization on globals) */
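By contrast, if x is a local whose address is never taken, the compiler is
free to keep it in a callee-saved register across the call, so no store or
reload is needed.  A sketch (foo() and bar() are just placeholders):

    void foo(void);
    void bar(int);

    void f(void)
    {
        int x;       /* local, address never taken                   */

        x = 1;       /* x can stay in a callee-saved register...     */
        foo();
        bar(x);      /* ...so there is nothing to reload here, even  */
                     /* without any interprocedural analysis         */
    }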

>2) Replace subroutine calls with the actual code.

This is already done; it's called inlining.

>3) Minimize stack frame usage to almost ZIPPO.

If you have a good compiler on a RISC architecture, then you probably
touch the stack frame as little as possible ANYWAY. Even return addresses
are not stored on the stack in leaf subroutines on the MIPS, for example.
(or rather, they shouldn't be!)
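For instance, a leaf routine like the one below typically gets no stack
frame at all on a RISC; the arguments and the return address stay in
registers (a sketch; the exact code depends on the compiler and options):

    /* A leaf function: it calls nothing, so a decent RISC compiler
     * keeps a, b, and the return address in registers and never
     * touches the stack. */
    int max(int a, int b)
    {
        return a > b ? a : b;
    }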

Sorry to pick nits. Now, to address the ideas in your article: yes, it's
clear that it should be possible for you to configure your compiler to
trade space for time. Most of the optimizations you suggest are more
readily accomplished by good interprocedural optimization, though.

--
Byron Rakitzis, Texas A&M., byron@archone.tamu.edu

pardo@june.cs.washington.edu (David Keppel) (05/03/91)

carter@cs.wisc.edu (Gregory Carter) writes:
>[Optimize:
> (1) local vars -> global vars
> (2) replace subroutine calls with the subroutine code
> (3) minimize stack frame usage]

On many machines (RISC machines in particular), accessing global memory is
no faster than accessing local variables.  The cost of stack pointer
update is, of course, larger than no update.  For larger procedures the
cost is minimal.  Smaller procedures can be inlined and/or values just
kept in registers.  It would be possible to do flow analysis and do one
stack pointer update for several nested calls; I know of no such work but
it's conceptually easy.  In general, the biggest win is from making
effective use of registers, and if you can't fit it all in 32 registers,
you're already doing enough work that the cost of a stack pointer update is
small.
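A rough hand-written sketch of that last idea, in C (this only illustrates
the intended effect; no particular compiler is claimed to work this way):

    /* Suppose f() always calls g().  Instead of letting each routine
     * adjust the stack pointer for its own frame, a compiler could size
     * one combined frame in f() and compile g() to use slots inside it,
     * so only one stack-pointer update happens for the pair. */

    static void g_body(int *gslots)    /* g's locals live in f's frame */
    {
        gslots[0] = 42;                /* ... g's real work ...        */
    }

    void f(void)
    {
        int frame[8 + 16];             /* f's 8 slots plus 16 for g    */

        frame[0] = 1;                  /* ... f's real work ...        */
        g_body(&frame[8]);             /* g runs with no frame of its  */
                                       /* own and no second SP update  */
    }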

As John Levine points out, some compilers support function inlining.  The
code often gets larger and, on machines with caches, may get slower.

A number of compilers attempt to minimize stack frame usage.  On CISCs
such as the VAX and i386, stack usage is a part of the calling convention
defined by most compilers.  On RISCs, the compilers can often optimize
leaf procedures enough to eliminate all stack references.  The SPARC
compilers can also optimize away leaf procedure register window saves and
restores (register windows implicitly consume stack space).

So the answers are

(1) It's not usually a big win
(2) See GCC and other production-quality compilers
(3) See GCC and other production-quality compilers


As an aside, several C compilers I've looked at take code of the form

	foo(){ {int a[SZ]; ...} {int a[SZ]; ...} }

and allocate two `SZ'-sized slots on the stack.  This doesn't cause any
extra stack pointer updates but does reduce the locality of reference.
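A self-contained version of that example, for anyone who wants to see what
their own compiler does with it (whether the two arrays share one slot or
get two separate ones is exactly the point in question):

    #define SZ 256

    int foo(void)
    {
        int total = 0;

        {   /* first block-scoped array */
            int a[SZ], i;
            for (i = 0; i < SZ; i++) a[i] = i;
            for (i = 0; i < SZ; i++) total += a[i];
        }
        {   /* second array: its lifetime does not overlap the first, */
            /* yet some compilers still reserve a separate SZ-sized   */
            /* slot for it, hurting locality of reference             */
            int a[SZ], i;
            for (i = 0; i < SZ; i++) a[i] = 2 * i;
            for (i = 0; i < SZ; i++) total += a[i];
        }
        return total;
    }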

		;-D on  ( Code twisting )  Pardo

preston@ariel.rice.edu (Preston Briggs) (05/03/91)

pardo@june.cs.washington.edu (David Keppel) writes:

>The cost of stack pointer update is, of course, larger than no update.  For
>larger procedures the cost is minimal.  Smaller procedures can be inlined
>and/or values just kept in registers.  It would be possible to do flow
>analysis and do one stack pointer update for several nested calls; I know of
>no such work but it's conceptually easy.

Tom Murtagh has been working on this for a while.
See, for example

	A Less Dynamic Memory Allocation Scheme for Algol-Like Languages
	Thomas P. Murtagh
	POPL 11, 1984

It seems like he's done further papers on the subject (with C. Ruggieri?), but
I can't find them right off.

Preston Briggs

deutsch@lix.polytechnique.fr (Alain Deutsch) (05/14/91)

[re inlining procedures and turning locals into globals]

There is a related paper by Peter Sestoft:

@InProceedings{Sestoft89,
  author    = "Sestoft, P.",
  title     = "Replacing function parameters by global variables",
  booktitle = "Conference on Functional Programming Languages and Computer
               Architecture",
  year      = "1989",
  publisher = "ACM Press",
  address   = "London",
  month     = "Sep.",
  pages     = "39--53"
}

----
Alain Deutsch, LIX, Ecole Polytechnique, 91128 Palaiseau Cedex, France.
deutsch@lix.polytechnique.fr, (33) 1 69/33/45/90.
