[comp.unix.ultrix] Meaning of %CPU in top/ps

thomson@zazen.macc.wisc.edu (Don Thomson) (04/19/91)

My manager is trying to analyze performance on our DECstation 3100 running
Ultrix 4.0 and has asked me to clarify the meaning of the percentage CPU time
that is displayed.  We're looking at percentage idle and guessing that 
percentage CPU time used + idle time should add up to 100%, but are rarely 
seeing CPU percentages wander over 1 or 2%.  I have a feeling we're comparing
apples and oranges here, so am looking for clarification of what exactly we're
looking at:

last pid:  8233;  load averages: 0.64, 0.56, 0.00                      12:03:09
 59 processes: 55 sleeping, 1 running, 3 stopped
Cpu states:  5.2% user,  0.0% nice, 21.0% system, 73.8% idle              zazen
Memory:  9536K ( 5360K) real, 27388K (17316K) virtual,  1296K free

  PID USERNAME PRI NICE   SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 8213 thomson   25    0   612K  396K run     0:00  1.95%  1.95% top
 8232 root       1    0   172K   64K sleep   0:00  0.39%  0.39% nntpxmit
21060 thomson   26    0  7828K 2648K stop    4:39  0.00%  0.00% emacs

The top man page says:

	WCPU is the weighted cpu percentage (this is the same value that
	ps(1) displays as CPU), CPU is the raw percentage and is the
	field that is sorted to determine the order of the processes

and the ps man page says:

	%CPU      CPU utilization of the process.  This is a decaying
                  average over a minute or less of previous (real) time.
                  Because the time base over which this is computed
                  varies since processes may be very young, it is
                  possible for the sum of all %CPU fields to exceed 200%.

Can anyone help shed any light on exactly what kind of measurements we're
seeing here, and/or clarify the distinction between weighted and raw CPU
percentages?  Thanks!


--
 
----- Don Thomson ----- MACC, 1210 W. Dayton, Madison, WI  53706 -------------
    (608) 262-0138      thomson@macc.wisc.edu / thomson@wiscmacc.bitnet

barmar@think.com (Barry Margolin) (04/19/91)

I've redirected followups to comp.unix.internals, since there is nothing in
the question that is specific to Ultrix.  In fact, much of my response here
is based on a very non-Unix-like OS (Genera on Symbolics Lisp Machines),
but I strongly believe that my answer is largely correct for Unix as well.

In article <THOMSON.91Apr18122955@zazen.macc.wisc.edu> thomson@zazen.macc.wisc.edu (Don Thomson) writes:
>My manager is trying to analyze performance on our DECstation 3100 running
>Ultrix 4.0 and has asked me to clarify the meaning of the percentage CPU time
>that is displayed.  We're looking at percentage idle and guessing that 
>percentage CPU time used + idle time should add up to 100%, but are rarely 
>seeing CPU percentages wander over 1 or 2%.  I have a feeling we're comparing
>apples and oranges here

>	%CPU      CPU utilization of the process.  This is a decaying
>                  average over a minute or less of previous (real) time.
>                  Because the time base over which this is computed
>                  varies since processes may be very young, it is
>                  possible for the sum of all %CPU fields to exceed 200%.

The problem is that on a single-CPU computer, at any instant at most one
process can be running, so an instantaneous CPU time percentage would show
one process having 100% CPU time and all the rest having 0%.  On the other
hand, if the CPU percentage were simply the amount of CPU time the process
has used during the last real minute divided by 1 minute, the number would
lag behind what the system is actually doing.  Processes that were started
less than a minute ago would have artificially low CPU utilization; for
instance, a process that started monopolizing the system 30 seconds ago
would show only 50% usage, even though it has actually taken over the
system.  Conversely, processes that became idle in the middle of the minute
would have misleadingly high CPU utilization.
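To make the arithmetic concrete, here is a minimal sketch of that naive
one-minute calculation.  The function name and the (begin, end) interval
representation are my own inventions for illustration; nothing here is
taken from the Ultrix kernel.

```python
def windowed_cpu_percent(busy_intervals, now, window=60.0):
    """Naive %CPU: CPU seconds used in the last `window` seconds / window.

    busy_intervals is a list of (begin, end) times, in seconds, during
    which the process was actually running on the CPU.
    """
    start = now - window
    busy = 0.0
    for begin, end in busy_intervals:
        # Count only the part of each interval that falls inside the window.
        overlap = min(end, now) - max(begin, start)
        if overlap > 0:
            busy += overlap
    return 100.0 * busy / window


# A process that has monopolized the CPU for the last 30 seconds...
hog_percent = windowed_cpu_percent([(30.0, 60.0)], now=60.0)
# ...and one that ran flat out for 30 seconds, then went idle 30 seconds ago.
idle_now_percent = windowed_cpu_percent([(0.0, 30.0)], now=60.0)

print(hog_percent, idle_now_percent)  # 50.0 50.0
```

Both processes report 50%, even though one currently owns the machine and
the other is doing nothing, which is the weakness described above.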

The common solution to this is to use a weighted average, with decaying
weights.  Averages are computed more often than just once a minute, and the
more recent averages are multiplied by larger weighting factors so that they
count for more in the final number.  Thus, if only two processes ran during
the last minute, and process A was the only process running during the first
30 seconds and process B was the only process running during the second 30
seconds, process B would have a higher %CPU than A.
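The A/B example can be sketched as follows.  This assumes one sample per
second and a geometric decay factor of 0.95 per sample; both numbers, and
the function name, are hypothetical choices for illustration and not the
values any particular kernel uses.

```python
def weighted_cpu_percent(samples, decay=0.95):
    """Weighted average of per-period CPU fractions, oldest sample first.

    The newest sample gets weight 1, the one before it `decay`, the one
    before that decay**2, and so on, so recent activity counts for more.
    """
    n = len(samples)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(w * s for w, s in zip(weights, samples))
    return 100.0 * total / sum(weights)


# One sample per second over the last minute (1.0 = had the whole CPU).
process_a = [1.0] * 30 + [0.0] * 30   # ran only during the first 30 seconds
process_b = [0.0] * 30 + [1.0] * 30   # ran only during the last 30 seconds

percent_a = weighted_cpu_percent(process_a)
percent_b = weighted_cpu_percent(process_b)
print(percent_a, percent_b)
```

Each process used exactly 30 CPU seconds, so an unweighted average would
give both 50%.  With the decaying weights B comes out well ahead of A,
while the two figures still sum to 100%.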

However, the algorithms for computing this generally don't actually keep
around all the little per-period averages.  Instead, running weighted
averages are computed at process switch time.  Furthermore, the weighting
factors are generally arbitrary, and decay exponentially rather than
linearly, with very recent activity having a very high weight.  It's
basically a heuristic, and it's a white lie to call these things
percentages.  Like car mileage, the numbers should be used for comparison
purposes only.
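
A running average of this kind needs only one stored number per process.
The sketch below is an illustration under assumptions of my own: the class
name, the decay factor of 0.9, and updating on a fixed tick (rather than at
process switch time, as described above) are all simplifications, not a
description of what Ultrix or any other kernel actually does.

```python
class DecayingCpuMeter:
    """Exponentially decayed running average of CPU usage.

    No history is kept: each update folds the newest sample into a single
    stored estimate, and older activity fades by a factor of `decay`.
    """

    def __init__(self, decay=0.9):
        self.decay = decay
        self.estimate = 0.0   # a brand-new process starts at zero

    def tick(self, cpu_fraction):
        """Fold in the fraction of the last tick spent on the CPU (0..1)."""
        self.estimate = (self.decay * self.estimate
                         + (1.0 - self.decay) * cpu_fraction)
        return self.estimate

    def percent(self):
        return 100.0 * self.estimate


meter = DecayingCpuMeter()

for _ in range(30):          # process monopolizes the CPU for 30 ticks
    meter.tick(1.0)
after_busy = meter.percent()

for _ in range(30):          # then sits idle for 30 ticks
    meter.tick(0.0)
after_idle = meter.percent()

print(after_busy, after_idle)
```

The estimate climbs toward 100% while the process is busy and falls back
toward 0% once it goes idle, without ever quite reaching either, which is
one more reason to read the figure as a relative indicator rather than an
exact percentage.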
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar