eugene@eos.arc.nasa.gov (Eugene Miya) (10/01/89)
I have an interest in performance measurement, and a major problem we have is in dealing with naive [over-simple] perspectives on the topic. Everyone cites that compilers have an influence on performance, but few do anything about it. Now there are several separate issues: one deals with the running of actual test codes, but a person also needs to understand how compilers influence the process of actually trying to get to an executable code, and other issues. The BIG problem is trying to preserve that naive view. Asking the question influences the way people think about the problem. One way we can accomplish the understanding of the above is to test several codes on a compiler. After all a compiler is just another program. Right 8)? So timing might be a marginally useful metric. So it was with this in mind that I deliberately asked the naive question: if you have a program of 10,000 lines which compiled in time X, how long would a similarly structured program of 20,000 lines takes to compile. <2X X >2X Now the problem, which I decided to ignore as a black box, is that people familiar with the insides of compilers know different algorithms are used which bound performance. It was interesting to see what in their heads they thought dominated execution: table look up, linking, etc. etc. And for every one of these, some one thought something else. There was NO consensus on those who made any distinction with link/load time: About an equal number said faster, others said slower. I should also mention I have an interest in seeing how network communication propagates and I was pleasantly surprised at the result. I had the largest survey I had ever done (previous largest was a survey on a gatewayed, unmoderated group [sci.space, "just ack this messge", and only got 90-some]). You can see how messages were largely answered within 1 day! The initial trend of messges which came in was interesting, mostly linear at first then bimodal to n^2 followed by n ln n. This summary can't show the message sequence. It does show distribution. Totals DAYS: 0 1 2 3 3+ ______ Bounds 11 <O(n) 6 3 1 0 1 39 O(n) 22 3 7 4 3 20 O(n ln n) 9 2 4 3 2 24 O(n^2) 15 2 3 1 3 6 >O(n^2) 3 3 0 0 0 __ __ __ __ _ __ 100 55 13 15 8 9 94 people trust the reply command on their news or name systems. 4 didn't, or read my signature instructions. 2 used a more generalized mail path and got mail to me on another machine. I thought about a cut off in 2 days at that return rate, but I was really busy and finally decided to carry it to 100 people. (more being busy, 100 people took 9 days). Many people thought interprocedural optimization would be the big cycle burner in compilation. (In my studies, I am generating big monolithic code blocks, we also have a couple of users, physicists, who do not use subroutines, functions or procedures: so I hope you don't rely on these assumptions in your compilers, hey, so some of our users are a bit "eccentric" by computer standards.) Several people brought operating system effects (paging) into play [remember, a compiler is just another program]. This is a good point. Fortunately for my reearch I've selected to do some research on a machine without VM. Many expected big constant (fixed) initial start up times. This was a common justification for the <2X timing. Dennis Cohen (a former co-worker) did later time on his system and got 1.8x, so that proof right? 8) So every perspective has some basis for their beliefs. I.E., one has to learn how to ask questions in a better way. This we refine. Compiled code size? Sure. This will vary in many case, but I've plans. The bottom line, no pun intended, is that we have a significant portion of an educated readership with vastly differing views of how a particular class of programs work. Our thinking is dominated by linear models. This is interesting because if you consider operations on double or multiple precision arithmetic, people will probably also think the costs are merely linear, too. See this is a serious point. I could have asked a very similar question: if I took a program and it was written in a language with a double precision type, would I expect to see the run time to go up linearly? [Should I survey this? Showing of hands: <2X? 2X? >2X?] I know the influences of such a response. Think about it. The question isn't strictly a S/w question (if you really know hardware trends). So what have I learned? I've verified that I have a big hurdle with the linear model of performance thinkers. Conveying non-linear effects will be important. I've been given a few by doing this survey by watching how people responded. Was this survey scientific? No, not intended to me, I wouldn't try to write a journal paper on it. My thanks to all who responded. Responses will be anonymous (except for Dennis 8). Another gross generalization from --eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov {ncar,decwrl,hplabs,uunet}!ames!eugene -- Send compilers articles to compilers@esegue.segue.boston.ma.us {spdcc | ima | lotus}!esegue. Meta-mail to compilers-request@esegue. Please send responses to the author of the message, not the poster.