[comp.compilers] Followup to naive compilation time question

eugene@eos.arc.nasa.gov (Eugene Miya) (10/01/89)
I have an interest in performance measurement, and a major problem
we have is in dealing with naive [over-simple] perspectives on the topic.
Everyone cites that compilers have an influence on performance, but
few do anything about it.  Now there are several separate issues:
one deals with the running of actual test codes, but a person also
needs to understand how compilers influence the process of actually
trying to get to an executable code, and other issues.

The BIG problem is trying to preserve that naive view.  Asking the question
influences the way people think about the problem.  One way we can accomplish
the understanding of the above is to test several codes on a compiler.
After all a compiler is just another program. Right 8)?  So timing
might be a marginally useful metric.

So it was with this in mind that I deliberately asked the naive question:
if you have a program of 10,000 lines which compiled in time X, how
long would a similarly structured program of 20,000 lines takes to compile.
	<2X	X	>2X
Now the problem, which I decided to ignore as a black box, is that
people familiar with the insides of compilers know different algorithms
are used which bound performance.  It was interesting to see what in their
heads they thought dominated execution: table look up, linking, etc. etc.
And for every one of these, some one thought something else.
There was NO consensus on those who made any distinction with link/load time:
About an equal number said faster, others said slower.

I should also mention I have an interest in seeing how network communication
propagates and I was pleasantly surprised at the result.  I had the largest
survey I had ever done (previous largest was a survey on a gatewayed,
unmoderated group [sci.space, "just ack this messge", and only got 90-some]).
You can see how messages were largely answered within 1 day!

The initial trend of messges which came in was interesting,
mostly linear at first then bimodal to n^2 followed by n ln n.
This summary can't show the message sequence.  It does show distribution.

Totals    DAYS:        0  1  2  3  3+
______    Bounds
11        <O(n)        6  3  1  0  1
39        O(n)         22 3  7  4  3
20        O(n ln n)    9  2  4  3  2
24        O(n^2)       15 2  3  1  3
6         >O(n^2)      3  3  0  0  0
__		       __ __ __ _  __
100		       55 13 15 8  9

94 people trust the reply command on their news or name systems.
4 didn't, or read my signature instructions.  2 used a more generalized
mail path and got mail to me on another machine.  I thought about a cut off
in 2 days at that return rate, but I was really busy and finally decided
to carry it to 100 people. (more being busy, 100 people took 9 days).

Many people thought interprocedural optimization would be the big
cycle burner in compilation.  (In my studies, I am generating big monolithic
code blocks, we also have a couple of users, physicists, who do not use
subroutines, functions or procedures: so I hope you don't rely on these
assumptions in your compilers, hey, so some of our users are a bit
"eccentric" by computer standards.)

Several people brought operating system effects (paging) into play
[remember, a compiler is just another program].  This is a good point.
Fortunately for my reearch I've selected to do some research on a machine
without VM.

Many expected big constant (fixed) initial start up times.  This was a
common justification for the <2X timing.
Dennis Cohen (a former co-worker) did later time on his system and
got 1.8x, so that proof right? 8)  So every perspective has some basis
for their beliefs.  I.E., one has to learn how to ask questions in a better
way.  This we refine.

Compiled code size?  Sure.  This will vary in many case, but I've plans.

The bottom line, no pun intended, is that we have a significant portion
of an educated readership with vastly differing views of how a particular
class of programs work.  Our thinking is dominated by linear models.
This is interesting because if you consider operations on double or
multiple precision arithmetic, people will probably also think the costs
are merely linear, too.  See this is a serious point.  I could have asked
a very similar question: if I took a program and it was written in a
language with a double precision type, would I expect to see the run
time to go up linearly?  [Should I survey this?  Showing of hands:
<2X?  2X? >2X?]  I know the influences of such a response.  Think about it.
The question isn't strictly a S/w question (if you really know hardware
trends).

So what have I learned?  I've verified that I have a big hurdle with
the linear model of performance thinkers.  Conveying non-linear
effects will be important.  I've been given a few by doing this survey
by watching how people responded.  Was this survey scientific?  No,
not intended to me, I wouldn't try to write a journal paper on it.

My thanks to all who responded.  Responses will be anonymous (except
for Dennis 8).

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  {ncar,decwrl,hplabs,uunet}!ames!eugene
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.