[comp.arch] Dynamic frequency analysis

rcd@ico.ISC.COM (Dick Dunn) (05/17/89)

In article <952@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
[much deleted]
> ...This is a very good
> support for my position on "register" (where a competent programmer is
> expected to do this, with great simplification of the compiler, because
> compilers cannot possibly know which variables are the most frequently used
> dynamically).

Where did this idea come from, and what can we do to send it back???
All it takes for the compiler to allocate based on dynamic frequency is
the right information from an execution profile.  (Q to MIPS folk--don't
you have tools for this right now?)  In practice it may be tedious to
develop a compiler system to do this, but the principle is simple:  Compile
the program once with lots of instrumentation to gather frequency-of-use
info for variables.  Run the instrumented program on what you consider
"typical" data, and catch the output of the instrumentation.  Feed this
data and the program back to the compiler for the super-optimizing pass.
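
To make the idea concrete, here's a rough by-hand sketch of what the
instrumented first pass boils down to (the counter names and output are made
up; a real system would emit all of this mechanically):

#include <stdio.h>

static unsigned long use_i, use_sum;	/* one counter per candidate variable */

static long total(long *a, int n)
{
	long sum = 0;
	int i;

	for (i = 0; i < n; i++) {
		use_i += 2;		/* i: loop test plus subscript */
		use_sum += 2;		/* sum: read plus write */
		sum += a[i];
	}
	use_i++;			/* the final, failing loop test */
	return sum;
}

int main(void)
{
	long a[] = { 3, 1, 4, 1, 5, 9 };

	total(a, 6);
	/* this is the "profile" handed back for the super-optimizing pass */
	printf("i %lu\nsum %lu\n", use_i, use_sum);
	return 0;
}

Run that on typical data and the compiler gets exactly the dynamic
frequency-of-use numbers that no amount of staring at the source will give you.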

> Or the "register" keyword, and you are a competent programmer. Possibly
> extended to global variables...

There are several flaws here.  First, people who've played with execution
profiles know that very good programmers are often surprised to find out
where a program spends its time.  It's a very common blind spot.  That's
why people build execution-profile tools.  But then note that if you've got
some execution-profile tools, you're on the road to building what you need
to get the compiler doing the tuning for you.

Second, "register" is so weak it's almost useless.  It's only a single
binary attribute; it only gives you one bit of quantifying the "importance"
of a variable--either it is or it isn't.  You get no chance to make
statements about the relative importance of register variables.  This is
tied closely to the question, "How many variables should I declare as
register?  (What frequency of use merits a register attribute?)"

You can't say that a variable is used heavily in one part of the code but
seldom in another part--i.e., keep it in a register for a while, then toss
it back into memory.
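
Here's a made-up fragment to show what I mean; the numbers are invented, but
the shape is typical:

void smooth(double *a, double *b, int n)
{
	register double t;	/* one bit: "important", full stop */
	int i, j;

	for (i = 0; i < n; i++)		/* t is hot in this nested loop... */
		for (j = 0; j < n; j++) {
			t = a[i*n + j];
			b[i*n + j] = t * t;
		}

	/* ...but suppose t is touched only once or twice in the hundreds
	 * of lines that follow.  There's no way to say "register here,
	 * memory there", and no way to rank t against i and j.
	 */
}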

But even if "register" gave you enough control, it would still remain a
property of the code rather than of the execution of the code.  If you
change the flow of one part of a program, it may very well change frequency
of use of variables in a far-removed part enough to merit changing register
assignments.  Why should you have to re-comprehend the whole program just to
make some localized changes and maintain efficiency?  (Sounds like
something a computer could do. :-)
-- 
Dick Dunn      UUCP: {ncar,nbires}!ico!rcd           (303)449-2870
   ...Relax...don't worry...have a homebrew.

beyer@cbnewsh.ATT.COM (jean-david.beyer) (05/17/89)

In article <15767@vail.ICO.ISC.COM>, rcd@ico.ISC.COM (Dick Dunn) writes:
> In article <952@aber-cs.UUCP>, pcg@aber-cs.UUCP (Piercarlo Grandi) writes:
> [much deleted]

> > support for my position on "register" (where a competent programmer is
> > expected to do this, with great simplification of the compiler, because
> > compilers cannot possibly know which variables are the most frequently used
> > dynamically).
> 
> Where did this idea come from, and what can we do to send it back???
> All it takes for the compiler to allocate based on dynamic frequency is
> the right information from an execution profile.  (Q to MIPS folk--don't
> you have tools for this right now?)  In practice it may be tedious to
> develop a compiler system to do this, but the principle is simple:  Compile
> the program once with lots of instrumentation to gather frequency-of-use

One of our optimizers allocates registers by estimating usage. To handle
loops, it assumes each loop is executed a small number of times. This does
much better register allocation than I do manually. It does not use profile
data.

I once modified the optimizer to use the true count, when it could be
determined (many loops are done a constant number of times that can
be determined at optimization time, especially in benchmarks :-) ).
This did not improve the execution time of any of the benchmarks tested,
beyond what was already obtained by assuming they were executed a small
number of times.
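
For what it's worth, the estimate can be as crude as weighting each static
reference by an assumed trip count per level of loop nesting; a toy version
(the constant and the data structure here are invented, not our optimizer's):

#define ASSUMED_TRIPS 10	/* "a small number of times" per loop */

struct ref {
	int depth;		/* loop nesting depth of one reference */
};

static double est_weight(struct ref *refs, int nrefs)
{
	double w = 0.0;
	int r, d;

	for (r = 0; r < nrefs; r++) {
		double scale = 1.0;

		for (d = 0; d < refs[r].depth; d++)
			scale *= ASSUMED_TRIPS;
		w += scale;	/* each reference counts ASSUMED_TRIPS^depth */
	}
	return w;	/* variables with the largest weights get registers first */
}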

I thought about using execution-time profiles to improve optimization,
but that just opened up the can of worms of determining what "typical data"
to use for the profiling run. At least, MIPS does this.

-- 
Jean-David Beyer
AT&T Bell Laboratories
Holmdel, New Jersey, 07733
attunix!beyer

pcg@aber-cs.UUCP (Piercarlo Grandi) (05/21/89)

In article <664@cbnewsh.ATT.COM> beyer@cbnewsh.ATT.COM (jean-david.beyer) writes:

    One of our optimizers allocates registers by estimating usage. To handle
    loops, it assumes each loop is executed a small number of times. This does
    much better register allocation than I do manually.

If you say so.... :-) :-) (Sorry, but I could not resist -- I am actually
prepared to believe that it could even beat me...).

    It does not use profile data. 
    
    I once modified the optimizer to use the true count, when it could be
    determined (many loops are done a constant number of times that can
    be determined at optimization time, especially in benchmarks :-) ).
    This did not improve the execution time of any of the benchmarks tested,
    beyond what was already obtained by assuming they were executed a small
    number of times.

I am prepared to believe you, and you have made a good contribution to
optimizer technology (even if, as I dimly remember, Gries in the ancient
"Compiler Construction for Digital Computers" noted that the legendary
Fortran H did weight variables appearing in loops more heavily).

My interpretation: the Wall paper essentially says that while coloring may
be good (as per a very interesting letter I received, especially if a
different DAG traversal order is used) at minimizing spills, it is no good
at intuiting which variables matter; indeed, let me add, it tends to keep a
variable in a register based on how many times it appears statically in the text
of a program (spill minimization), which may or may not have any correlation
with how often it is used.

Typically the variables for which static vs. dynamic use is least correlated
are those in loops (which "ironically" are those that matter most). In other
words, what really matters is caching preferentially the variables in "most
frequently executed loops". A weaker strategy, one that does not require
dynamic info, is to cache variables in "any loop"; an even weaker one still
is to cache variables, period.

    I thought about using execution-time profiles to improve optimization,
    but that just opened up the can of worms of determining what "typical
    data" to use for the profiling run. At least, MIPS does this.

What you are telling us is that simply caching variables in "any loop" is a
big win, and (almost?) as good as selecting those in the critical ones; this
means that caching variables in the less critical loops is not as big a loss
as caching based on static analysis. This is not surprising, in hindsight;
there may be at least two reasons:

	[1] If loops are non-interfering in a dataflow sense, we can reuse
	    the same registers for them, and not waste any (see the little
	    example after this list). I suspect this is quite common.

	[2] If a loop is executed even a few times, caching its variables
	    is a win nonetheless.
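
A throwaway example of [1] (the trip counts are invented; the point is only
that j,t and k,u never compete for registers):

long f(long *a, long *b)
{
	long t = 0, u = 0;
	int j, k;

	for (j = 0; j < 10; j++)		/* "cold" loop: j and t live here */
		t += a[j];

	for (k = 0; k < 1000000; k++)		/* "hot" loop: j and t are dead, so */
		u += b[k % 10];			/* k and u can reuse their registers */

	return t + u;
}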

In other words, "used in a loop" is a good estimator of dynamic frequency
of use, and spares the optimizer from having to cache everything in sight
to get a good chance of caching all the variables that matter.

Well, I may be sentimental, but I still prefer "register" and my own
judgment. On the other hand, of course, "performance-by-design" is a win
only if it is done. If not, an optimizer is just a fixup, but may be useful
("read carefully the instructions...").
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

morrison@cg-atla.UUCP (Steve Morrison) (05/23/89)

In article <966@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:


Well, I may be sentimental, but I still prefer "register" and my own
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
judgment.
^^^^^^^^^

One motto was stressed over and over again in a course I took on
Software Engineering back in the seventies: "The goal of Software
Engineering isn't to make good programmers better, it's to make
average programmers as productive as the best."

hankd@pur-ee.UUCP (Hank Dietz) (05/24/89)

In article <966@aber-cs.UUCP> you write:
>In other words, "used in a loop" is a good estimator of dynamic frequency
>of use, and spares the optimizer from having to cache everything in sight
>to get a good chance of caching all the variables that matter.

"Used in a loop" is not the flip side to dynamic collection of stats; static
analysis of expected execution count is.  Where possible, the compiler
should compute the exact number of iterations made by loops, etc.  My PhD
thesis, and other papers by other folk, show how to do this.

It is only when the exact analysis is THEORETICALLY IMPOSSIBLE that an
approximate scheme is used.  For example:

f()
{
	{A}
	if (B) {
		{C}
		while (D) {
			{E}
		}
		if (F) {
			{G}
		}
	} else {
		{H}
	}
	{I}
	while (J) {
		{K}
	}
	{L}
}

In the above code, let's assume that no range information, etc., is available
about the expressions B, D, F, and J.  Then one might say that the expected
execution counts are:

code	E(count)	comments
----	--------	--------
{A}	1		actually 1 times E(function f)
B	1
{C}	0.51		as per Multiflow, 0.51/0.49 then/else
D	0.51*101	as per Multiflow, 100 iterations per while
{E}	0.51*100
F	0.51
{G}	0.51*0.51	simple composition
{H}	0.49
{I}	1
J	1*101
{K}	1*100
{L}	1

This generates the following partial ordering of regions based on expected
execution count:

J		101
{K}		100
D		 51.51
{E}		 51
{A},B,{I},{L}	  1
{C},F		  0.51
{H}		  0.49
{G}		  0.2601

Hence, despite having NO range analysis, it will be recognized that the while
loop J,{K} is more important than the while loop D,{E}.  Now, you might argue
that you can easily create a program where this would be incorrect...  but
the point is that it is very rare both that this analysis would mislead you
AND that a few dynamic statistics would have corrected the error.  In other
words, dynamic statistics are just about as unreliable as static guesses...
it might be depressing, but that's how it is most of the time.

Of course, the above expected counts can be further weighted by the expected
spill costs, etc.
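
The composition itself is mechanical; a quick sketch that reproduces the table
for f() above, using the same assumed constants (0.51/0.49 per branch, 100
iterations per while, plus one extra evaluation of each loop test):

#include <stdio.h>

#define P_THEN	0.51
#define P_ELSE	0.49
#define TRIPS	100.0

int main(void)
{
	double f = 1.0;			/* E(count) of entering f() itself */
	double A = f, B = f;
	double C = B * P_THEN;
	double D = C * (TRIPS + 1.0);	/* the test D runs one extra time */
	double E = C * TRIPS;
	double F = C;
	double G = F * P_THEN;
	double H = B * P_ELSE;
	double I = f;
	double J = I * (TRIPS + 1.0);
	double K = I * TRIPS;
	double L = f;

	printf("J=%g {K}=%g D=%g {E}=%g\n", J, K, D, E);
	printf("{A}=%g B=%g {I}=%g {L}=%g {C}=%g F=%g {H}=%g {G}=%g\n",
	       A, B, I, L, C, F, H, G);
	return 0;
}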

In fact, a compiler doesn't allocate VARIABLES to registers -- it allocates
VALUES to registers -- hence all these issues really are rather artificial.
Interesting register allocation problems only arise when the number of
simultaneously live values which are good candidates for registers exceeds
the number of registers available.  This does happen, but not as often as
you might think....
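
A contrived but legal illustration of the difference:

#include <string.h>

/* One declared variable, two unrelated values: tmp holds the length
 * early on and a character code later.  The allocator tracks the two
 * values (live ranges) separately, so "does the VARIABLE tmp deserve
 * a register" is not really the question being answered.
 */
void upcase(char *s)
{
	int tmp;

	tmp = strlen(s);	/* first value of tmp: the string length */
	if (tmp == 0)
		return;

	while (*s != '\0') {
		tmp = *s;	/* second, unrelated value of tmp: a char */
		if (tmp >= 'a' && tmp <= 'z')
			*s = tmp - 'a' + 'A';
		s++;
	}
}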

>Well, I may be sentimental, but I still prefer "register" and my own
>judgment. On the other hand, of course, "performance-by-design" is a win
>only if it is done. If not, an optimizer is just a fixup, but may be useful
>("read carefully the instructions...").

Use register wherever possible -- it means "no address, no alias, and
non-volatile."  That can help the compiler understand your program better.
As for you coding better than an optimizing compiler, well, congratulations,
John Henry. ;-)

						-hankd@ee.ecn.purdue.edu