LEICHTER-JERRY@YALE.ARPA (10/12/86)
Producing repeatable CPU time measurements on a time-shared system is an extremely difficult problem. Making them "accurate" is not just difficult, it's impossible without first deciding what "accurate" is supposed to mean. A process running on such a system has many complex interactions with its environment, and it's not always clear what should be charged to which process. Interrupt processing time is an obvious example - at the time of an interrupt, it may not be possible to determine which process the "work" involved is being done for. (Consider an interrupt from a terminal, with a typed-ahead SPAWN command waiting to be executed. The process that will eventually read the character just received doesn't even exist yet.) Some- times the work may be for the benefit of more than one process, as when a page of a shared image is faulted in. Then there's stuff that is there only for the system's benefit, not for any one process. Who should be charged for the error logger's time if a process runs into a tape error? Who should pay for the CPU time RMS uses to arbitrate file locks? For many classes of algorithms, the single best performance measure will, if you play with it, turn out to be page fault rate. (There's no deep reason for this, it just often turns out that way.) Theoretically, your process pages "against itself" - the number of page faults is the same independent of other processes in the system. In practice, this is much more complex. When VMS finds it has many free pages, it will let processes "borrow" pages beyond their normal working set limits; if the pool of free pages gets low enough, the "borrowed" pages have to be returned. This can produce major differences in performance depending on system loading. Programs that run for a long time relative to changes in workload may suffer repeated growth and shrinkage of their actual working set at unpredictable times. This alone will make a complete hash of timings. Even if the system is configured to eliminate such "borrowing" - there are SYSGEN parameters that control this; be aware that playing with them will almost certainly have serious performance consequences - there are STILL paging effects: Pages tossed out of a working set stick around for a while before being written to disk. If a process needs one of those pages back before it's written, it will receive it back from this cache of pages. Clearly there is a big difference between such a "soft fault" and a "hard fault" - two disk accesses, for one thing. But the effectiveness of this page cache is heavily dependent on the number of processes running and the rate at which they are losing dirty pages from their working sets! (BTW, beware of one large effect: Subprocesses get only a fraction of the working set limit of their creator. Create a bunch of these, and they'll be running with rather small working sets - and doing a LOT of paging.) Basically, the only way to ensure repeatable measurements - of SOMETHING - is by building special hardware. I believe the KL10 processor actually had such hardware. It wasn't cheap, and most people never noticed anyway. (As I recall the single article I read about this a long time ago, the processor kept track of the actual "time" it spent running in user state; the OS would save and restore the appropriate value when switching processes. I don't remember how system time was dealt with - probably a second CPU register. The hardware was useful - and complex - because it measured some useful quantity; elaspsed time would NOT be a repeatable measure in systems with memory segments of different speeds, for example, which many 10's and 20's had.) If you are interested in timing as a measure of program/algorithm quality, and you can't separate the effects of programs/algorithms from the "noise", then in any reasonable sense there is NO performance difference between your two programs/algorithms, at least under the conditions you are testing them in. If you are dealing with bean-counters, remind them that actual miles per gal- lon will vary over a fairly large range, depending on weather, traffic condi- tions, gas quality, and so on; but reimbursements for mileage simply use an average value, because it's just not practical to do anything else. -- Jerry -------
JOHNSON%nuhub.acs.northeastern.edu@RELAY.CS.NET ("I am only an egg.") (10/14/86)
Chris Johnson comments that he gets embarrassed when computer science faculty ask why they get variable CPU time measures for a given job. I can't figure out if he's embarrassed for DEC or for the faculty members! I look at the situation in the following way. The higher charges incurred due to overhead while running on a loaded system are simply an incentive for users to run their programs when the system is not so loaded. This seems to be accepted as a "fair" practice since every site I've ever seen charges higher rates for peak time computing. Peter ------- I'm more embarrassed for the faculty then DEC right now. The faculty *should* know better. DEC *does* know better but apparently doesn't care much. As far as any site I've ever worked at was concerned, charging more for peak hours was the way to go. The inability of an operating system to accurately distribute cpu time was a happy happenstance. However as I said before, accurate determination of what job owns what cpu time is difficult. Not impossible mind you but very difficult. One of the first things you need to know is how fast your machine is. The last timing chart for an instruction set I saw in DEC (for a big computer) was for a KI 10. I don't know whether or not DEC ever made any more instruction set timing charts. Another thing that could be done is to have a timer in the hardware that could be used to tell how long interrupt time *really* took. etc etc etc ... You got me going again. I could go on all day but I'll stop here and save you reading time. Chris Johnson Northeatstern University.