[mod.os] Performance analysis of computer systems

darrell@sdcsvax.UUCP (01/27/87)

--

I would like to pose a question to those of you interested in the performance
of computer systems.  What method do you favour for measuring/analysing the
performance of a computer system?  Stochastic (Markov) analysis?  Simulation?
Direct measurement?  Some other technique?

I know that it depends on what is being analysed/measured, but I would like
to hear what method you favour and why.

DL
--
Darrell Long
Department of Computer Science and Engineering
University of California, San Diego

ARPA: Darrell@Beowulf.UCSD.EDU
UUCP: darrell%beowulf@sdcsvax.uucp

--

darrell@sdcsvax.UUCP (01/27/87)

--

The method I use for performance analysis depends heavily on the problem
being investigated.  Most of my work is in measurement and tuning of
operating systems, so I usually start by instrumenting the system of
interest and then performing statistical analysis on the results.

[Could you explain how you "instrument" the system?  -DL]

This usually provides only a rough sketch of the system's behavior, because
measurement interferes with behavior, and because instrumentation doesn't
uncover all of the behavior.  Once I have this rough sketch, I propose a
series of experiments aimed at highlighting the behavior I'm interested in.
I do this using the scientific method: propose a hypothesis explaining
some behavior of the system, formulate an experiment which can disprove
the hypothesis, perform the experiment, and evaluate the results.

The experiments can consist of modifications to the measurements being made,
applications of analytic models, stochastic or event-driven simulation models,
and runs of workload benchmarks, as appropriate.  In a recently completed
analysis of the UNICOS scheduler on the Cray-2, I used all but stochastic
models.
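[For concreteness, here is a toy event-driven sketch in C of a single-server
queue.  It is purely illustrative, has nothing to do with the UNICOS/Cray-2
analysis above, and the arrival and service rates are invented.]

/* Sketch only: a toy event-style simulation of a single-server queue with
 * exponential interarrival and service times, processing arrival and
 * service-completion events in order. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define ARRIVAL_RATE  0.8      /* jobs per unit time */
#define SERVICE_RATE  1.0      /* jobs per unit time */
#define NJOBS         100000

/* Exponentially distributed random variate with the given rate. */
static double expo(double rate)
{
    double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* uniform in (0,1) */
    return -log(u) / rate;
}

int main(void)
{
    double clock = 0.0;          /* simulated time                  */
    double server_free = 0.0;    /* time the server next falls idle */
    double total_wait = 0.0;
    long i;

    srand(1);
    for (i = 0; i < NJOBS; i++) {
        clock += expo(ARRIVAL_RATE);             /* next arrival event     */
        if (clock < server_free)                 /* server busy: must wait */
            total_wait += server_free - clock;
        else                                     /* server idle: no wait   */
            server_free = clock;
        server_free += expo(SERVICE_RATE);       /* service-completion event */
    }
    printf("mean wait in queue: %.3f time units over %d jobs\n",
           total_wait / NJOBS, NJOBS);
    return 0;
}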

My personal preferences for analysis tools are SPSS and SLAM II, not because
they are inherently better than other tools, but because I am familiar with
them and they are (usually) readily available.

--

darrell@sdcsvax.UUCP (01/28/87)

--

In article <2614@sdcsvax.UCSD.EDU> fouts@orville%ames.arpa (Marty Fouts) writes:
>--
>
>The method I use for performance analysis depends heavily on the problem
>being investigated.  Most of my work is in measurement and tuning of
>operating systems, so I usually start by instrumenting the system of
>interest and then performing statistical analysis on the results.
>
>[Could you explain how you "instrument" the system?  -DL]
>

Sure, instrumentation can be done in two ways.  When you are very lucky,
you can use an external hardware monitor to sample the state of the system
(usually the PS and some status registers) and then later run the samples
through software which correlates them to software states.  This is the
'easy' way.
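[As a rough illustration of that reduction step, here is a minimal C sketch
that classifies sampled processor-status words into user, system, and idle
time.  The sample format and the status bits are hypothetical stand-ins for
whatever the actual monitor and machine provide.]

/* Sketch only: reduce one sampled processor-status word per input line to
 * percentages of time in each software state. */
#include <stdio.h>

#define PS_SUPERVISOR  0x8000   /* hypothetical: set when in system state */
#define PS_WAIT        0x0100   /* hypothetical: set when the CPU is idle */

int main(void)
{
    unsigned long ps;
    unsigned long user = 0, system = 0, idle = 0, total = 0;

    /* One sampled status word, in hex, per line of monitor output. */
    while (scanf("%lx", &ps) == 1) {
        total++;
        if (ps & PS_WAIT)
            idle++;
        else if (ps & PS_SUPERVISOR)
            system++;
        else
            user++;
    }
    if (total > 0)
        printf("user %.1f%%  system %.1f%%  idle %.1f%%  (%lu samples)\n",
               100.0 * user / total, 100.0 * system / total,
               100.0 * idle / total, total);
    return 0;
}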

When you are not lucky, you modify the operating system to increment counters
based on periodic state checks (user versus system state, for example) or on
the occurrence of events (an I/O completion, say).  Sometimes you check
periodic data at event occurrence, like recording the amount of idle time
accumulated by the process which is about to be made runnable as a result of
an I/O completion.
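[A minimal sketch of what such kernel counters might look like, assuming
hypothetical hook points: instr_clock_tick called from the clock interrupt and
instr_io_complete called from the I/O completion path.  A real system's hooks
and fields would differ.]

/* Sketch only: the counter-and-event style of instrumentation described
 * above. */
struct instr_counters {
    unsigned long ticks_user;     /* clock ticks caught in user state    */
    unsigned long ticks_system;   /* clock ticks caught in system state  */
    unsigned long ticks_idle;     /* clock ticks with nothing runnable   */
    unsigned long io_completions; /* event count: I/O completions        */
    unsigned long idle_before_io; /* idle time accumulated by processes
                                     made runnable by an I/O completion  */
};

static struct instr_counters instr;

/* Called from the clock interrupt: periodic state check. */
void instr_clock_tick(int in_user_mode, int idle)
{
    if (idle)
        instr.ticks_idle++;
    else if (in_user_mode)
        instr.ticks_user++;
    else
        instr.ticks_system++;
}

/* Called when an I/O completes and wakes a process: an event count plus a
 * periodic datum (that process's accumulated idle time) sampled at the
 * event. */
void instr_io_complete(unsigned long proc_idle_time)
{
    instr.io_completions++;
    instr.idle_before_io += proc_idle_time;
}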

There are three major problems here, along with a number of gotchas I won't
go into.  First is the autocorrelation problem.  If the samples are always
taken on a major clock tick, they may reflect state which is dependent on
the tick having just happened.  This can skew the performance data in
sometimes subtle ways.
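[One common mitigation, offered here only as an aside, is to randomize the
sampling offset so that samples stop lining up with the clock tick.  A rough
user-level sketch; take_sample() is a stand-in for whatever actually records
the state.]

/* Sketch only: jittered sampling so sample times drift relative to the
 * clock tick instead of tracking it. */
#include <stdlib.h>
#include <unistd.h>

#define BASE_INTERVAL_USECS 10000   /* nominal 10 ms sampling period */
#define JITTER_USECS         2500   /* +/- 2.5 ms of random offset   */

static void take_sample(void)       /* stand-in for the real recording code */
{
    /* record whatever state is of interest here */
}

int main(void)
{
    int i;

    for (i = 0; i < 100; i++) {
        long jitter = (long)(rand() % (2 * JITTER_USECS)) - JITTER_USECS;

        take_sample();
        /* Next sample lands somewhere in [7.5 ms, 12.5 ms) from now. */
        usleep((useconds_t)(BASE_INTERVAL_USECS + jitter));
    }
    return 0;
}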

Second is the interaction problem.  Adding code to an operating system always
changes the timing of the system.  Sometimes it doesn't affect the feature
being measured, but you can never tell for certain.  Sometimes, especially
when measuring real-time systems, instrumentation can have an adverse impact
on the system.  Adding .1 millisecond of CPU time to a routine called once
a millisecond can have a substantial impact: that alone eats 10% of the CPU.

Third is the capture problem.  Determining how to retrieve information being
gathered in real time in a way which creates a consistent view of the system
can be a major problem.  You want all of the data to be consistent at some
point in time, and then to be able to capture all of it in an atomic action,
and that usually isn't possible.  Also, you have to figure out where to put
all of the data you are capturing.  Sometimes you are generating enough data
that some data reduction must be performed in real time.
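[One way to approximate a consistent snapshot is to bracket updates with a
generation counter and have the reader retry until it sees a stable copy.  A
rough sketch, reusing the hypothetical counter structure from the earlier
sketch and ignoring the interrupt-masking and memory-ordering details a real
kernel would need on its particular machine.]

/* Sketch only: generation-counter snapshot of counters updated in real
 * time. */
#include <string.h>

struct instr_counters {
    unsigned long ticks_user, ticks_system, ticks_idle;
    unsigned long io_completions, idle_before_io;
};

static struct instr_counters instr;
static volatile unsigned long generation; /* even: stable, odd: update in progress */

/* Writer side: bracket each batch of updates so a reader can detect that
 * its copy may be torn. */
void instr_begin_update(void) { generation++; }   /* now odd    */
void instr_end_update(void)   { generation++; }   /* even again */

/* Reader side: copy the counters, retrying until a copy completes with no
 * update in progress and none having intervened. */
void instr_snapshot(struct instr_counters *out)
{
    unsigned long g;

    do {
        g = generation;
        memcpy(out, &instr, sizeof *out);
    } while ((g & 1UL) != 0 || g != generation);
}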

--

darrell@sdcsvax.UUCP (02/02/87)

I am a direct measurement type for several reasons:
1) I believe in the Missouri principle (show me)
2) I and many others have been burned by people's claims for hardware
3) Simulation has many drawbacks: call them limitations or tradeoffs,
you don't trade off in reality.
4) Direct measurement can show interactions which simulation, etc.
can't show (especially when asynchronous things take place on sequential
simulations).
5) Lastly, consider the following situation posed by a comment once
given to me: "Gee, you simulate flying by those outer planets so
well, that we don't need to fly spacecraft past them......"

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,nike,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

darrell@sdcsvax.UUCP (02/04/87)

In article <2683@sdcsvax.UCSD.EDU> ucbvax!ames-pioneer.arpa!ames!eugene@sdcsvax (Eugene Miya N.) writes:
>I am a direct measurement type for several reasons:
>1) I believe in the Missouri principle (show me)
>2) I and many others have been burned by people's claims for hardware
>3) Simulation has many drawbacks: call them limitations or tradeoffs,
>you don't trade off in reality.
>4) Direct measurement can show interactions which simulation, etc.
>can't show (especially when asynchronous things take place on sequential
>simulations).....

Additional comments:
a) If I'm buying a computer, I'm a direct measurement type.
b) If I haven't built it yet, and need to know which way to do it, I sure like
simulation a lot.
c) If I've built it, and I have simulations, it's awfully nice to correlate 
them so I can trust the simulator more, although as Gene says, some situations
are very hard to simulate.

To illustrate the importance and value of really good simulations, one
can observe the unfortunate problem of announcing performance, and then
having to backtrack horribly when the product comes out.  I'm sure most people
have run into some of these.  Some are marketing ploys, but others appear to
be cases where people simply did not even simulate typical applications.

On the other hand, good simulation can be incredibly helpful, and it's hard to
believe people are serious about designing new computers without doing good
simulation.  As an example, this last summer, we started to be able to run
much larger programs than we'd been able to simulate, and also more
multi-tasking stuff, on M/500s.  Although the performance of the simulated
benchmarks was as expected, the bigger ones were not quite the 5X 11/780 
we were looking for.  There were a bunch of possible ways to fix this.
Our performance gurus simulated the alternatives, which led us to the
simplest improvement: double the I-cache from 8K to 16K [costs about $50
and was fairly straightforward]. It would have been difficult to quickly
pick the correct choice without such simulation, since a whole bunch of
other proposals were also considered. 

Coincidentally, a prospective customer was in benchmarking the week the new
boards were due to appear.  They ran a large benchmark on the 8K machine, and
address traces were also captured, which, when run through the simulator,
predicted the results they got.  Our gurus then fed the 16K size to the simulator,
and gave the customer the simulated numbers.  The next day, the new boards
arrived, and they re-ran their benchmarks, which matched the predictions
within about 1%!!  Needless to say, they were surprised.
Of course, we weren't surprised at all :-)  [UNIX timing granularity isn't
that good: if you can get within a few tenths of a second, you're lucky!]
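[For readers curious what a trace-driven cache simulation looks like, here is
a minimal, generic sketch in C, emphatically not MIPS's actual tool.  It
replays a file of instruction addresses against a direct-mapped cache and
reports the miss rate; running the same trace with CACHE_BYTES set to 8K and
then 16K gives the kind of before/after comparison described above.  The
trace format and line size are assumptions.]

/* Sketch only: direct-mapped instruction-cache simulation driven by an
 * address trace, one hex address per input line. */
#include <stdio.h>

#define LINE_BYTES   16          /* bytes per cache line (assumed)        */
#define CACHE_BYTES  (8 * 1024)  /* change to 16 * 1024 for the big cache */
#define NLINES       (CACHE_BYTES / LINE_BYTES)

int main(void)
{
    static unsigned long tags[NLINES];
    static int valid[NLINES];
    unsigned long addr, refs = 0, misses = 0;

    while (scanf("%lx", &addr) == 1) {
        unsigned long line  = addr / LINE_BYTES;
        unsigned long index = line % NLINES;
        unsigned long tag   = line / NLINES;

        refs++;
        if (!valid[index] || tags[index] != tag) {
            misses++;                /* miss: fill the line */
            valid[index] = 1;
            tags[index] = tag;
        }
    }
    printf("%lu references, %lu misses, miss rate %.4f\n",
           refs, misses, refs ? (double)misses / refs : 0.0);
    return 0;
}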

Bottom line: simulation is necessary, but not sufficient.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

darrell@sdcsvax.UUCP (02/10/87)

John Mashey writes:

> To illustrate the importance and value of really good simulations, one
> can observe the unfortunate problem of announcing performance, and then
> having to backtrack horribly when the product comes out...

I am reminded of the story in Fred Brooks's book, in which he mentions what
happened when the performance simulator for OS/360 started running.  It
showed a Model 75 with drums compiling Fortran at ten cards per minute, or
something like that.  This was investigated immediately, of course.  It turned
out that many of the OS group had never used disks before, and even the
innermost parts of OS were doing disk overlays with wild abandon.  It *did*
help the coders meet their core limits...  Needless to say, this could have
been an utter disaster if the simulation hadn't caught it early.

I also dimly recall an incident on one of the 370s, the model 135 I think,
in which consistent speed discrepancies between hardware and simulation
uncovered a hardware bug well after the system had been thoroughly tested
and released for production.  I forget the IBM buzzwords, but it was a case
where there were N copies of a resource for the sake of parallelism, but
the hardware in fact was only ever using one.  Nothing functionally wrong,
so the orthodox diagnostics never caught it.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry



-- 
Darrell Long
Department of Computer Science & Engineering, UC San Diego, La Jolla, CA 92109
ARPA: Darrell@Beowulf.UCSD.EDU  UUCP: darrell@sdcsvax.uucp
Operating Systems submissions to: mod-os@sdcsvax.uucp