[comp.unix.wizards] accurate runtime accounting

torek@elf.ee.lbl.gov (Chris Torek) (06/18/91)

In article <14081@dog.ee.lbl.gov> I noted that Unix CPU accounting is
generally fairly poor, and wrote:
>>The solution is simple but requires relatively precise clocks. ...

In article <1991Jun12.130441.20640@fccc.edu> stodola@orion.fccc.edu
(Robert K. Stodola) writes:
>One of my associates and I did a study of this a number of years ago
>(actually it was with a PDP-11/70 running IAS).  We found that there
>was substantial clock synchronized usage on the system.  The solution
>we found didn't require very precise clocks at all.  Simply one whose
>rate was relatively prime to the system clock.

This works well in a number of situations, but I believe it will miss
short-lived processes on modern (fast) machines.  Unix boxes generally
run their scheduling clock in the range 50..500 Hz.  Some of these have
CPUs that run 40 million instructions per second; some things take only
a few thousand instructions, and it seems intuitively obvious% that
they might `slip through the cracks'.  [%This is research-ese for `We
did not try it out but we wrote a paper on it anyway.' :-) ]
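The back-of-the-envelope numbers here can be checked quickly. Below is a small editorial sketch (not from the original post) using the figures hypothesized above; the 2000-instruction job size is an assumption of mine:

```python
# Chance that a single run of a short program is ever seen by the
# sampling clock, for the figures hypothesized above: a 40 MIPS CPU,
# a few thousand instructions, and a 50..500 Hz clock.
instructions = 2000              # assumed size of the short job
mips = 40e6                      # instructions per second
runtime = instructions / mips    # seconds the program actually runs

for hz in (50, 100, 500):
    tick_period = 1.0 / hz
    # An asynchronous tick lands inside the run with probability
    # runtime / tick_period (valid while runtime << tick_period).
    p_seen = runtime / tick_period
    print(f"{hz:3d} Hz clock: run lasts {runtime*1e6:.0f} us, "
          f"P(sampled at all) = {p_seen:.4f}")
```

Even at 500 Hz the run is sampled only a few times in a hundred, which is the sense in which such processes `slip through the cracks'.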

In other words, I think `PDP-11/70' may be an important constraint
above.  A relatively prime profiling clock is likely to work well on
many VAXen as well (the 11/780 is typically slower than an 11/70, and
most microVAXen are only a few times faster).  I would like to see
some measurements done on MIPS RC6280s, HP 700s, and so forth; I
expect things may have changed.  (Of course, we could always speed
up the profiling clock, still keeping it relatively prime.)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

stodola@orion.fccc.edu (Robert K. Stodola) (06/19/91)

In article <14398@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>In article <14081@dog.ee.lbl.gov> I noted that Unix CPU accounting is
>generally fairly poor, and wrote:
>>>The solution is simple but requires relatively precise clocks. ...
>
>In article <1991Jun12.130441.20640@fccc.edu> Stodola says:
>>One of my associates and I did a study of this a number of years ago
>>(actually it was with a PDP-11/70 running IAS).  We found that there
>>was substantial clock synchronized usage on the system.  The solution
>>we found didn't require very precise clocks at all.  Simply one whose
>>rate was relatively prime to the system clock.
>
>This works well in a number of situations, but I believe it will miss
>short-lived processes on modern (fast) machines.  Unix boxes generally
>run their scheduling clock in the range 50..500 Hz.  Some of these have
>CPUs that run 40 million instructions per second; some things take only
>a few thousand instructions, and it seems intuitively obvious% that
>they might `slip through the cracks'.  [%This is research-ese for `We
>did not try it out but we wrote a paper on it anyway.' :-) ]
>
>In other words, I think `PDP-11/70' may be an important constraint
>above.  A relatively prime profiling clock is likely to work well on
>   [More deleted]

I guess I should have explained the context of the project.  The purpose was
to obtain accurate usage information on a per-user basis, and provide good
average load statistics.  If the context switcher itself doesn't keep this
info using a very accurate clock (i.e., a non-interrupting, read-only clock with
megahertz resolution), you can't ever accurately measure this [actually, we
kicked it around at lunch and tossed out some silly ideas for having another
machine on the bus counting instructions, but the conversation quickly
deteriorated from there].  In this context, the speed of the clock is
less important than its lack of synchronization with the system clock.  Those
thousand instructions (taking 1/40000th of a second on the machine you have
postulated) have a one in 500 chance of being interrupted by, say, an 80Hz
clock.  So when you see them, you score them with an 80th of a second.  Since
you miss them the other 499 times, you get it right on average.  That is, for
every 10000 times the code runs, you see it 20 times, and score it with 1/80th
of a second each time (20 * 1/80 = 10000 * 1000/40000000).  Speeding up the
clock merely improves the variance for a given number of samples, but doesn't
affect your ability to see a short sequence in a statistical sense.  Obviously,
if you need to know EXACTLY how many cycles were used in a PARTICULAR clock
tick, or EXACTLY how many cycles a PARTICULAR process used in a PARTICULAR
tick, this method doesn't do it.  The importance of the statistical method
of measurement is that you avoid the rhythms imposed by the system clock
entirely.
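The expectation argument above is easy to check in simulation. This is a modern editorial sketch, not from the thread; the 1000-instruction job, 40 MIPS machine, and 80 Hz clock are the figures given above:

```python
import random

# Monte Carlo check of the statistical-accounting argument: a
# 1000-instruction job on a 40 MIPS machine runs for 25 us; an 80 Hz
# sampling clock with random phase catches it roughly 1 run in 500,
# charging 1/80 s each time.  On average the total charge is exact.
random.seed(1)
runtime = 1000 / 40e6        # 25 microseconds per run
tick = 1.0 / 80              # sampling period, asynchronous phase
runs = 1_000_000

hits = 0
for _ in range(runs):
    # Offset of the next clock tick relative to the start of the run;
    # uniform because the two clocks are unsynchronized.
    phase = random.uniform(0.0, tick)
    if phase < runtime:      # a tick landed during this run
        hits += 1

charged = hits * tick        # time billed by the sampler
actual = runs * runtime      # time actually used
print(f"charged {charged:.3f} s vs actual {actual:.3f} s "
      f"({hits} hits in {runs} runs)")
```

The charged total converges on the true total (about 25 seconds here), illustrating that the estimator is unbiased even though any single run is almost never seen.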

[BTW - we both tried it and wrote the paper :-) ]
-- 

stodola@fccc.edu -- Robert K. Stodola (occasionally) speaks for himself.

jackv@turnkey.tcc.com (Jack F. Vogel) (06/19/91)

In article <14398@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
|In article <14081@dog.ee.lbl.gov> I noted that Unix CPU accounting is
|generally fairly poor, and wrote:
|>>The solution is simple but requires relatively precise clocks. ...
 
|In article <1991Jun12.130441.20640@fccc.edu> stodola@orion.fccc.edu
|(Robert K. Stodola) writes:
|>One of my associates and I did a study of this a number of years ago
|>(actually it was with a PDP-11/70 running IAS).  We found that there
|>was substantial clock synchronized usage on the system.  The solution
|>we found didn't require very precise clocks at all.  Simply one whose
|>rate was relatively prime to the system clock.
 
|This works well in a number of situations, but I believe it will miss
|short-lived processes on modern (fast) machines.
 
Right! What we did to solve this problem in AIX/370 was to go to an
interval accounting scheme. Process time usage is not charged in
hardclock(), rather user time is calculated and charged on entering
the kernel, system time on returning to user mode or in swtch() when
being preempted. Actual memory/time integral calculation is done
either in hardclock() or before a process exits in the exit() system
call. Use is also made of the 370 cputimer for microsecond resolution.
Certainly made things more accurate than the older method.
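The interval scheme described above (charge at the user/kernel boundary crossings rather than in hardclock()) can be sketched abstractly. This is an editorial illustration of the idea, not the AIX/370 code; the event names and the Proc structure are invented for the sketch, and the microsecond timer is represented by the timestamps themselves:

```python
# Sketch of interval accounting: read a high-resolution timer at each
# user/kernel boundary and charge the elapsed interval to the mode just
# left, instead of attributing whole clock ticks in hardclock().
# (Editorial illustration of the scheme described above, not AIX/370 code.)

class Proc:
    def __init__(self):
        self.utime = 0.0   # accumulated user time, seconds
        self.stime = 0.0   # accumulated system time, seconds

def account(proc, events):
    """events: list of (timestamp, kind) pairs, kind in
    {'start', 'enter_kernel', 'return_to_user'}; the process is
    assumed to begin in user mode at the first timestamp."""
    last_t, mode = events[0][0], 'user'
    for t, kind in events[1:]:
        delta = t - last_t
        if mode == 'user':
            proc.utime += delta       # charged on entering the kernel
        else:
            proc.stime += delta       # charged on returning to user mode
        last_t = t
        mode = 'kernel' if kind == 'enter_kernel' else 'user'
    return proc

p = account(Proc(), [(0.0,   'start'),
                     (0.010, 'enter_kernel'),    # 10 ms in user mode
                     (0.012, 'return_to_user'),  # 2 ms in the kernel
                     (0.020, 'enter_kernel'),    # 8 ms more user time
                     (0.021, 'return_to_user')]) # 1 ms more system time
print(p.utime, p.stime)   # utime ~ 0.018, stime ~ 0.003
```

No sampling is involved, so the result is exact to the resolution of the timer, at the cost of two timer reads per kernel crossing.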

Disclaimer: I don't speak for my employer.


-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM

torek@elf.ee.lbl.gov (Chris Torek) (06/20/91)

>In article <14398@dog.ee.lbl.gov> I wrote:
>>... I believe [a relatively prime accounting clock] will miss
>>short-lived processes on modern (fast) machines.

In article <1991Jun18.175921.14843@fccc.edu> stodola@orion.fccc.edu
(Robert K. Stodola) writes:
>I guess I should have explained the context of the project.  The purpose was
>to obtain accurate usage information on a per user basis, and provide good
>average load statistics.

This is somewhat different from situations I have in mind, in which
people run one-time (or few-times) short throwaway programs.  In this
case there simply are not enough instances for the statistics to
average out.  What I do not know is whether increasing the frequency of
the sampling clock (always keeping it asynchronous with respect to the
scheduling clock) would suffice to flatten out the sampling jitter.
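The open question can at least be explored with a toy model. The following editorial sketch (not from the thread) bills many independent one-shot runs of a single short program at several sampling rates; the 100000-instruction job size is an assumption of mine:

```python
import random, statistics

# For a ONE-SHOT short program, how much does the charged time vary
# around the truth as the asynchronous sampling clock speeds up?
# (Toy model of the question raised above; the 40 MIPS machine and
# 100000-instruction job are assumptions.)
random.seed(2)
runtime = 100_000 / 40e6          # 2.5 ms of real CPU time

def charge_once(hz):
    """Bill one run of the program with an asynchronous hz sampler."""
    tick = 1.0 / hz
    phase = random.uniform(0.0, tick)   # offset of first tick into the run
    hits, t = 0, phase
    while t < runtime:                  # count ticks landing in the run
        hits += 1
        t += tick
    return hits * tick                  # seconds charged for this run

results = {}
for hz in (100, 1000, 10000, 100000):
    charges = [charge_once(hz) for _ in range(10_000)]
    results[hz] = (statistics.mean(charges), statistics.stdev(charges))
    print(f"{hz:6d} Hz: mean charge {results[hz][0]*1e3:.3f} ms "
          f"(true {runtime*1e3:.1f} ms), stddev {results[hz][1]*1e3:.3f} ms")
```

In this model the mean stays unbiased at every rate, while the run-to-run scatter shrinks steadily as the sampler speeds up, which suggests a faster asynchronous clock does flatten the jitter for single runs, provided the tick period drops below the job's runtime.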

>... The importance of the statistical method of measurement is that you
>avoid the rhythms imposed by the system clock entirely.

Right.  The clock synchronization problem has been demonstrated
repeatedly, under various Unix systems and under other systems.  We
know it exists; we know an unsynchronized sampling clock fixes it in
some situations.  Whether it fixes it in what is becoming a more common
situation (fast Unix boxes) is, I think, still an open question.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov