sokol@starbase (Lisa Sokol) (06/29/90)
We have also been having timing problems using Lisp on a Symmetry. When
multiple processes compete for the same processors, the reported user time
appears to include both user time and system time. The only case in which
the user time appears to be accurate is when one has sole use of the
machine. Has anyone found a way to run multiple processes and get the user
timing function to report only user time?

Thanks,
Lisa Sokol
sokol@mitre.org
jim@cs.strath.ac.uk (Jim Reid) (07/02/90)
In article <112093@linus.mitre.org> sokol@starbase (Lisa Sokol) writes:
We have also been having timing problems using Lisp on a Symmetry.
When multiple processes compete for the same processors, the reported
user time appears to include both user time and system time. The only
case in which the user time appears to be accurate is when one has
sole use of the machine.
Has anyone found a way to run multiple processes and get the user
timing function to just include user time?
Your query raises some interesting questions, though it is somewhat
confusing.
The UNIX time command reports three times for a process.
1) User time
This is the amount of time that the process executes in
user mode: i.e. the time the process takes executing code
other than system calls
2) System time
This is the time that the process takes executing system
calls. It may also include time servicing interrupts and
possibly context switching on behalf of other processes.
3) Elapsed time
The total time taken by the process to execute. This is the
sum of the user and system time plus any time that the process
was idle - swapped out, waiting for I/O or possibly waiting
for access to the CPU.
The only time that is reproducible is the user time: the other two
will depend on external factors such as the system load and interrupt
rate that may vary as the process is repeatedly executed. These can
be minimised or eliminated by dedicating a whole machine to timing a
particular process. Even then, there may still be minor variations.
The kernel maintains the user and system times for each process. The
system clock interrupts many times a second (100 times a second on a
Sequent) and the currently running process is 'charged' with a clock tick.
If the process was in kernel mode, the system time is incremented. If
it was in user mode, the user time gets incremented. When the process
terminates, the time command picks up the user and system time using
the wait3() system call. The elapsed time is worked out by taking the
difference between the time of day immediately before the command was
executed and after it terminates. The time of day is usually taken
from another clock, the time-of-day clock.
Although the system and user times have the precision of the system's
real time clock, they may not be that accurate. In theory, a process
could give up the CPU just before the real time clock interrupts,
causing the process' tick to get charged to another process. [In
practice this behaviour should even itself out over time.] Another
possibility is that the system loses clock ticks because clock
interrupts get disabled, so processes can get 'free' CPU time. Since
the elapsed time is usually taken from the time of day clock, it will
be as (in)accurate as that clock happens to be.
Hope this explains things.
Jim
job00542@uxa.cso.uiuc.edu (James Bordner) (07/04/90)
The following are two partial outputs from the profiler 'gprof' for a
parallel Fortran program on a Sequent Balance. The first was obtained
when the load was relatively small; the second when other CPU-intensive
parallel programs were running. Both are from the same program running
the same data; only the loads were different.

granularity: each sample hit covers 4 byte(s) for 0.02% of 52.95 seconds

 %time  cumsecs  seconds  calls  name
  10.3     5.44     5.44      9  _gradp_
   6.8     9.02     3.58         mcount
   6.1    12.24     3.22    133  _do50%hesp_
   5.6    15.20     2.96    133  _do56%hesp_
   5.5    18.09     2.89    133  _do53%hesp_
   5.4    20.93     2.84    133  _do33%hesp_
   5.3    23.71     2.78    133  _do36%hesp_
     :        :        :      :  :
     :        :        :      :  :

granularity: each sample hit covers 4 byte(s) for 0.02% of 63.36 seconds

 %time  cumsecs  seconds  calls  name
  20.5    13.02    13.02   2326  __m_join
   8.6    18.46     5.44      9  _gradp_
   5.5    21.94     3.48         mcount
   4.9    25.04     3.10    133  _do50%hesp_
   4.8    28.11     3.07    133  _do53%hesp_
   4.7    31.10     2.99    133  _do56%hesp_
   4.6    34.00     2.90    133  _do36%hesp_
     :        :        :      :  :
     :        :        :      :  :

The 'seconds' column differs significantly only in the m_join call.
m_join is (I assume) an internal function in Sequent's Parallel
Programming Library. (Functions in the P.P.L. generally start with
'm_'; I assume it's internal since it's not documented :-) A reasonable
guess is that it's the function where processes wait for other processes
to catch up to them at the end of parallel loops. If the load is heavy,
then running processes may spend a lot of time waiting in m_join for
processes which are not currently running (ready, blocked, suspended:
whatever the OS people call it when the processor is currently working
on someone else's process).

Possible solutions are to:

1) Install the gang scheduler /pub/gang.tar.Z from maddog.llnl.gov,
   which is accessible via anonymous ftp. We have it installed on a
   Symmetry, but I haven't gotten it to work yet. It is designed to
   synchronize the running and ready states of the processes in a
   parallel program, eliminating problems such as the one described
   at the end of the previous paragraph.

2) Do timings at night when the load is small. This is what I'm doing
   until I get the gang scheduler to work.

3) Use the output from gprof to figure out the user time. One problem
   with this is that to get a profile, you have to link in profiling
   functions which themselves slow down the code (without the profiler
   functions linked in, the above program ran in about 45 seconds
   during small loads).

I hope this helps some -- let me know how things go.
--
James Bordner
email: job00542@uxa.cso.uiuc.edu
       bordner@mcs.anl.gov