darrell@sdcsvax.UUCP (01/27/87)
--
I would like to pose a question to those of you interested in the
performance of computer systems.  What method do you favour for
measuring/analysing the performance of a computer system?  Stochastic
(Markov) analysis?  Simulation?  Direct measurement?  Some other
technique?

I know that it depends on what is being analysed/measured, but I would
like to hear what method you favour and why.

DL
--
Darrell Long
Department of Computer Science and Engineering
University of California, San Diego

ARPA: Darrell@Beowulf.UCSD.EDU
UUCP: darrell%beowulf@sdcsvax.uucp
--
darrell@sdcsvax.UUCP (01/27/87)
--
The method I use for performance analysis depends heavily on the problem
being investigated.  Most of my work is in measurement and tuning of
operating systems, so I usually start by instrumenting the system of
interest and then performing statistical analysis on the results.

[Could you explain how you "instrument" the system? -DL]

This usually provides only a rough sketch of the system's behavior,
because measurement interferes with behavior, and because instrumentation
doesn't uncover all of the behavior.  Once I have this rough sketch, I
propose a series of experiments aimed at highlighting the behavior I'm
interested in.  I do this using the scientific method: propose a
hypothesis explaining some behavior of the system, formulate an
experiment which can disprove the hypothesis, perform the experiment, and
evaluate the results.

The experiments can consist of modifications to the measurements being
made, applications of analytic models (stochastic or event-driven), and
runs of workload benchmarks, as appropriate.  In a recently completed
analysis of the UNICOS scheduler on the Cray-2, I used all but stochastic
models.

My personal preferences for analysis tools are SPSS and SLAM II, not
because they are inherently better than other tools, but because I am
familiar with them and they are (usually) readily available.
--
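[As a purely illustrative aside: the "statistical analysis on the
results" mentioned above was done with SPSS, but the first-pass data
reduction it refers to can be sketched in a few lines of C.  The program
below is a hypothetical example, not anyone's actual tool; it assumes a
trace of one timing sample per line and reports the mean and sample
standard deviation.]

    /* Hypothetical first-pass summary of measurement samples
     * (e.g. per-run elapsed times), one number per line on stdin.
     * Illustrative only; the real analysis in the post used SPSS. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x, sum = 0.0, sumsq = 0.0, mean, var;
        long n = 0;

        while (scanf("%lf", &x) == 1) {
            sum += x;
            sumsq += x * x;
            n++;
        }
        if (n < 2) {
            fprintf(stderr, "need at least two samples\n");
            return 1;
        }
        mean = sum / n;
        var = (sumsq - n * mean * mean) / (n - 1);  /* sample variance */
        if (var < 0.0)                              /* guard rounding error */
            var = 0.0;
        printf("n = %ld  mean = %g  std dev = %g\n", n, mean, sqrt(var));
        return 0;
    }

[Compile with something like "cc summary.c -lm" and feed it the raw
samples before deciding which hypotheses are worth a real experiment.]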
darrell@sdcsvax.UUCP (01/28/87)
--
In article <2614@sdcsvax.UCSD.EDU> fouts@orville%ames.arpa (Marty Fouts) writes:
>
>The method I use for performance analysis depends heavily on the problem
>being investigated.  Most of my work is in measurement and tuning of
>operating systems, so I usually start by instrumenting the system of
>interest and then performing statistical analysis on the results.
>
>[Could you explain how you "instrument" the system? -DL]
>

Sure.  Instrumentation can be done in two ways.  When you are very lucky,
you can use an external hardware monitor to sample the state of the
system (usually the PS and some status registers) and then later run the
samples through software which correlates them to software states.  This
is the 'easy' way.

When you are not lucky, you modify the operating system to increment
counters based on periodic state checks (user versus system state, for
example) or on the occurrence of events (I/O completion).  Sometimes you
check periodic data at event occurrence, like recording the amount of
idle time accumulated by the process which is about to be made runnable
as a result of an I/O completion.

There are three major problems here, along with a number of gotchas I
won't go into.

First is the autocorrelation problem.  If the samples are always taken on
a major clock tick, they may reflect state which is dependent on the tick
having just happened.  This can cause performance data to be skewed in
sometimes subtle ways.

Second is the interaction problem.  Adding code to an operating system
always changes the timing of the system.  Sometimes it doesn't impact the
feature being measured, but you can never tell for certain.  Sometimes,
especially when measuring real-time systems, instrumentation can have an
adverse impact on the system.  Adding 0.1 millisecond of CPU time to a
routine called once a millisecond can have a substantial impact on a
system.

Third is the capture problem.  Determining how to retrieve information
being gathered in real time in a way which creates a consistent view of
the system can be a major problem.  You want to have all of the data
consistent at some point in time and then to be able to capture all of it
in an atomic action, and that usually isn't possible.  Also, you have to
figure out where to put all of the data you are capturing.  Sometimes you
are generating enough data to require that some data reduction be
performed in real time.
--
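[To make the second approach above (counters bumped by the kernel)
concrete, here is a minimal sketch.  All names are hypothetical, not
taken from any particular system, and the snapshot routine hand-waves
past the locking that a real kernel would need.]

    /* Hypothetical event-counter instrumentation, in the style
     * described above.  Not from any real kernel. */
    struct kstat {
        unsigned long io_completions;   /* bumped at each I/O completion   */
        unsigned long ticks_user;       /* clock ticks seen in user mode   */
        unsigned long ticks_system;     /* clock ticks seen in system mode */
        unsigned long idle_usec;        /* idle time of processes woken up */
    };

    static struct kstat stats;

    /* Called from the I/O completion interrupt, before the waiting
     * process is made runnable: an "event" counter plus periodic data
     * checked at event occurrence. */
    void count_io_completion(unsigned long idle_usec_of_proc)
    {
        stats.io_completions++;
        stats.idle_usec += idle_usec_of_proc;
    }

    /* Called from the clock interrupt: the autocorrelation problem
     * lives here, since the system is only ever seen just after a
     * tick. */
    void count_tick(int usermode)
    {
        if (usermode)
            stats.ticks_user++;
        else
            stats.ticks_system++;
    }

    /* The capture problem: copy the counters out in one place so a
     * reader sees a (roughly) consistent snapshot.  A real system
     * would raise interrupt priority or take a lock here. */
    void kstat_snapshot(struct kstat *out)
    {
        *out = stats;
    }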
darrell@sdcsvax.UUCP (02/02/87)
I am a direct measurement type for several reasons:

1) I believe in the Missouri principle (show me).
2) I and many others have been burned by people's claims for hardware.
3) Simulation has many drawbacks: call them limitations or tradeoffs,
   you don't trade off in reality.
4) Direct measurement can show interactions which simulation, etc.,
   can't show (especially when asynchronous things take place on
   sequential simulations).
5) Lastly, consider the following situation posed by a comment once
   given to me: "Gee, you simulate flying by those outer planets so
   well, that we don't need to fly spacecraft past them......"

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
darrell@sdcsvax.UUCP (02/04/87)
In article <2683@sdcsvax.UCSD.EDU> ucbvax!ames-pioneer.arpa!ames!eugene@sdcsvax
(Eugene Miya N.) writes:
>I am a direct measurement type for several reasons:
>1) I believe in the Missouri principle (show me)
>2) I and many others have been burned by people's claims for hardware
>3) Simulation has many drawbacks: call them limitations or tradeoffs,
>you don't trade off in reality.
>4) Direct measurement can show interactions which simulation, etc.
>can't show (especially when asynchronous things take place on sequential
>simulations).....

Additional comments:

a) If I'm buying a computer, I'm a direct measurement type.
b) If I haven't built it yet, and need to know which way to do it, I
   sure like simulation a lot.
c) If I've built it, and I have simulations, it's awfully nice to
   correlate them so I can trust the simulator more, although as Gene
   says, some situations are very hard to simulate.

To illustrate the importance and value of really good simulations, one
can observe the unfortunate problem of announcing performance, and then
having to backtrack horribly when the product comes out.  I'm sure most
people have run into some of these.  Some are marketing ploys, but others
appear to be cases where people simply did not even simulate typical
applications.  On the other hand, good simulation can be incredibly
helpful, and it's hard to believe people are serious about designing new
computers without doing good simulation.

As an example, this last summer we started to be able to run much larger
programs than we'd been able to simulate, and also more multi-tasking
stuff, on M/500s.  Although the performance of the simulated benchmarks
was as expected, the bigger ones were not quite the 5X 11/780 we were
looking for.  There were a bunch of possible ways to fix this.  Our
performance gurus simulated the alternatives, which led us to the
simplest improvement: double the I-cache from 8K to 16K [costs about $50
and was fairly straightforward].  It would have been difficult to quickly
pick the correct choice without such simulation, since a whole bunch of
other proposals were also considered.

Coincidentally, a prospective customer was in benchmarking the week the
new boards were to appear.  They ran a large benchmark on the 8K machine,
and address traces were also captured, which, when run through the
simulator, predicted the results they got.  Our gurus then fed the 16K
size to the simulator, and gave the customer the simulated numbers.  The
next day the new boards arrived, and they re-ran their benchmarks, which
matched the predictions within about 1%!!  Needless to say, they were
surprised.  Of course, we weren't surprised at all :-)  [UNIX timing
granularity isn't that good: if you can get within a few tenths of a
second, you're lucky!]

Bottom line: simulation is necessary, but not sufficient.
-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
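[The trace-driven cache simulation described above can be sketched very
simply, though the real MIPS simulator was of course far more detailed.
The toy program below is an assumption-laden illustration: a
direct-mapped instruction cache with 16-byte lines, fed a file of one
hex instruction address per line; none of these parameters or the trace
format are claimed to match what MIPS actually used.]

    /* Toy trace-driven I-cache simulator: re-run a captured address
     * trace against a cache size given on the command line.
     * Illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define LINE_SIZE 16                    /* bytes per cache line */

    int main(int argc, char **argv)
    {
        unsigned long cache_bytes, nlines, addr, idx, tag;
        unsigned long hits = 0, misses = 0;
        unsigned long *tags;
        char buf[64];

        if (argc != 2) {
            fprintf(stderr, "usage: %s cache-size-in-bytes < trace\n", argv[0]);
            return 1;
        }
        cache_bytes = strtoul(argv[1], NULL, 0);    /* e.g. 8192 or 16384 */
        nlines = cache_bytes / LINE_SIZE;
        if (nlines == 0)
            return 1;
        tags = malloc(nlines * sizeof *tags);
        if (tags == NULL)
            return 1;
        memset(tags, 0xff, nlines * sizeof *tags);  /* mark all lines invalid */

        while (fgets(buf, sizeof buf, stdin) != NULL) {
            addr = strtoul(buf, NULL, 16);          /* one hex address per line */
            idx = (addr / LINE_SIZE) % nlines;      /* direct-mapped index */
            tag = addr / LINE_SIZE / nlines;
            if (tags[idx] == tag)
                hits++;
            else {
                misses++;
                tags[idx] = tag;
            }
        }
        if (hits + misses > 0)
            printf("%lu-byte I-cache: %lu refs, %.2f%% hit rate\n",
                   cache_bytes, hits + misses,
                   100.0 * (double)hits / (double)(hits + misses));
        free(tags);
        return 0;
    }

[Run it twice on the same trace, e.g. "icache 8192 < trace" and
"icache 16384 < trace", to see the kind of comparison the performance
gurus were making between the 8K and 16K designs.]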
darrell@sdcsvax.UUCP (02/10/87)
John Mashey writes:
> To illustrate the importance and value of really good simulations, one
> can observe the unfortunate problem of announcing performance, and then
> having to backtrack horribly when the product comes out...

I am reminded of the story in Fred Brooks's book, in which he mentions
what happened when the performance simulator for OS/360 started running.
It showed a Model 75 with drums compiling Fortran at ten cards per
minute, or something like that.  This was investigated immediately, of
course.  It turned out that many of the OS group had never used disks
before, and even the innermost parts of OS were doing disk overlays with
wild abandon.  It *did* help the coders meet their core limits...
Needless to say, this could have been an utter disaster if the simulation
hadn't caught it early.

I also dimly recall an incident on one of the 370s, the Model 135 I
think, in which consistent speed discrepancies between hardware and
simulation uncovered a hardware bug well after the system had been
thoroughly tested and released for production.  I forget the IBM
buzzwords, but it was a case where there were N copies of a resource for
the sake of parallelism, but the hardware in fact was only ever using
one.  Nothing functionally wrong, so the orthodox diagnostics never
caught it.

Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
-- 
Darrell Long
Department of Computer Science & Engineering, UC San Diego, La Jolla CA 92109

ARPA: Darrell@Beowulf.UCSD.EDU
UUCP: darrell@sdcsvax.uucp

Operating Systems submissions to: mod-os@sdcsvax.uucp