[comp.sys.sequent] Timing parallel programs on Symmetry

sokol@starbase (Lisa Sokol) (06/29/90)

We have also been having timing problems using Lisp on a Symmetry.
When multiple processes compete for the same processors, the reported
user time appears to include system time as well as user time.  The
only case in which the user time appears to be accurate is when one
has sole use of the machine.

Has anyone found a way to run multiple processes and get the user
timing function to just include user time?

Thanks

Lisa Sokol
sokol@mitre.org

jim@cs.strath.ac.uk (Jim Reid) (07/02/90)

In article <112093@linus.mitre.org> sokol@starbase (Lisa Sokol) writes:

   We have also been having timing problems using Lisp on a Symmetry.
   When multiple processes compete for the same processors, the reported
   user time appears to include system time as well as user time.  The
   only case in which the user time appears to be accurate is when one
   has sole use of the machine.

   Has anyone found a way to run multiple processes and get the user
   timing function to just include user time?

Your query raises some interesting questions, though it is somewhat
confusing. 

The UNIX time command reports three times for a process.

1) User time
	This is the amount of time that the process executes in
	user mode: i.e. the time the process takes executing code
	other than system calls

2) System time
	This is the time that the process takes executing system
	calls. It may also include time servicing interrupts and
	possibly context switching on behalf of other processes.

3) Elapsed time
	The total time taken by the process to execute. This is the
	sum of the user and system time plus any time that the process
	was idle - swapped out, waiting for I/O or possibly waiting
	for access to the CPU.
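
As a concrete illustration, all three numbers can be read from C with
the POSIX times() call (a sketch only -- the tick rate and headers may
differ on an older Dynix system; error handling omitted):

```c
/* Sketch: the three times reported by time(1), read via times().
   hz is the clock-tick rate (100/s on a Sequent, per the kernel
   discussion below). */
#include <stdlib.h>
#include <unistd.h>
#include <sys/times.h>

/* Runs cmd via system(); stores the child's user and system CPU
   seconds through the pointers and returns the elapsed seconds. */
double time_cmd(const char *cmd, double *user_s, double *sys_s)
{
    struct tms t0, t1;
    long hz = sysconf(_SC_CLK_TCK);     /* clock ticks per second */

    clock_t start = times(&t0);
    system(cmd);                        /* the command being timed */
    clock_t end = times(&t1);

    /* tms_cutime/tms_cstime cover waited-for children, i.e. cmd */
    *user_s = (double)(t1.tms_cutime - t0.tms_cutime) / hz;
    *sys_s  = (double)(t1.tms_cstime - t0.tms_cstime) / hz;
    return (double)(end - start) / hz;  /* user + system + idle */
}
```

Note that elapsed time is at least user plus system time; the
difference is the idle time described in point 3.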

The only time that is reproducible is the user time: the other two
will depend on external factors such as the system load and interrupt
rate that may vary as the process is repeatedly executed. These can
be minimised or eliminated by dedicating a whole machine to timing a
particular process. Even then, there may still be minor variations.

The kernel maintains the user and system times for each process. The
system clock interrupts many times a second (100 times a second on a
Sequent) and the currently running process is 'charged' with a clock
tick.  If the process was in kernel mode, its system time is
incremented; if it was in user mode, its user time is incremented.
When the process
terminates, the time command picks up the user and system time using
the wait3() system call. The elapsed time is worked out by taking the
difference between the time of day immediately before the command was
executed and after it terminates. The time of day is usually taken
from another clock, the time-of-day clock.
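
A minimal sketch of that mechanism in C, assuming the BSD-style
wait3() and gettimeofday() calls described above (error handling
omitted):

```c
/* Sketch: how time(1) collects its numbers.  fork/exec the command,
   then wait3() returns the child's resource usage (the kernel's
   accumulated tick counts); elapsed time comes from the
   time-of-day clock. */
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

void time_command(char *const argv[],
                  double *user, double *sys, double *elapsed)
{
    struct timeval before, after;
    struct rusage ru;
    int status;

    gettimeofday(&before, (struct timezone *)0);
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    wait3(&status, 0, &ru);         /* child's CPU time accounting */
    gettimeofday(&after, (struct timezone *)0);

    *user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    *sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    *elapsed = (after.tv_sec - before.tv_sec)
             + (after.tv_usec - before.tv_usec) / 1e6;
}
```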

Although the system and user times have the precision of the system's
real time clock, they may not be that accurate. In theory, a process
could give up the CPU just before the real time clock interrupts,
causing the process' tick to get charged to another process. [In
practice this behaviour should even itself out over time.] Another
possibility is that the system loses clock ticks because clock
interrupts get disabled, so processes can get 'free' CPU time. Since
the elapsed time is usually taken from the time of day clock, it will
be as (in)accurate as that clock happens to be.

Hope this explains things.

		Jim

job00542@uxa.cso.uiuc.edu (James Bordner) (07/04/90)

The following are two partial outputs from the profiler 'gprof' for a parallel
Fortran program on a Sequent Balance.  The first was obtained when the load 
was relatively small; the second when other CPU intensive parallel programs
were running.  Both are from the same program running the same data; only
the loads were different.


granularity: each sample hit covers 4 byte(s) for 0.02% of 52.95 seconds

%time cumsecs seconds   calls name    
 10.3    5.44    5.44       9 _gradp_
  6.8    9.02    3.58         mcount
  6.1   12.24    3.22     133 _do50%hesp_
  5.6   15.20    2.96     133 _do56%hesp_
  5.5   18.09    2.89     133 _do53%hesp_
  5.4   20.93    2.84     133 _do33%hesp_
  5.3   23.71    2.78     133 _do36%hesp_
   :      :       :        :      :
   :      :       :        :      :

granularity: each sample hit covers 4 byte(s) for 0.02% of 63.36 seconds

%time cumsecs seconds   calls name    
 20.5   13.02   13.02    2326 __m_join
  8.6   18.46    5.44       9 _gradp_
  5.5   21.94    3.48         mcount
  4.9   25.04    3.10     133 _do50%hesp_
  4.8   28.11    3.07     133 _do53%hesp_
  4.7   31.10    2.99     133 _do56%hesp_
  4.6   34.00    2.90     133 _do36%hesp_
   :      :       :        :      :
   :      :       :        :      :


The 'seconds' column differs significantly only in the m_join call.  M_join
is (I assume) an internal function in Sequent's Parallel Programming Library.
(Functions in the P.P.L. generally start with 'm_'; I assume it's internal 
since it's not documented :-)  A reasonable guess is that it's the function 
where processes wait for other processes to catch up to them at the end of 
parallel loops.  If the load is heavy, then running processes may spend a lot
of time waiting in m_join for processes which are not currently running 
(ready, blocked, suspended: whatever the OS people call it when the processor
is currently working on someone else's process).
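
If that guess is right, m_join would behave roughly like the toy spin
barrier below (purely hypothetical -- the real routine is
undocumented, and this sketch uses C11 atomics, which the Balance
compilers of course predate).  The point is that an early arrival
busy-waits, burning user CPU time, so waiting shows up in the profile
as 'seconds' charged to the join function:

```c
/* Toy spin barrier illustrating the guessed behaviour of m_join
   (hypothetical; not Sequent's actual implementation). */
#include <stdatomic.h>

typedef struct {
    atomic_int remaining;       /* processes still to arrive */
} barrier_t;

void barrier_init(barrier_t *b, int nprocs)
{
    atomic_store(&b->remaining, nprocs);
}

void barrier_join(barrier_t *b)
{
    atomic_fetch_sub(&b->remaining, 1);
    /* Spin until everyone has arrived.  If a peer process is not
       currently scheduled, this loop spins for its whole absence,
       and the profiler charges all those cycles here. */
    while (atomic_load(&b->remaining) > 0)
        ;                       /* busy wait */
}
```

Under a heavy load, peers spend long stretches descheduled, which
would explain why __m_join dominates the second profile but not the
first.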


Possible solutions are to:

     1) Install the gang scheduler /pub/gang.tar.Z from maddog.llnl.gov,
	which is accessible via anonymous ftp.  We have it installed on
	a Symmetry but I haven't gotten it to work yet.  It is designed
	to synchronize the running and ready states of processes in a 
	parallel program, eliminating problems such as the one described 
	in the last sentence of the previous paragraph.

     2) Do timings at night when the load is small.  This is what I'm 
	doing until I get the gang scheduler to work.

     3) Use the output from gprof to figure out the user time.  
        One problem with this is that to get a profile, you have to
	link in profiling functions which themselves slow down the
	code (without the profiler functions linked in, the above program
	ran in about 45 seconds during small loads).  
	
I hope this helps some-- let me know how things go.

-- James Bordner

email:job00542@uxa.cso.uiuc.edu
       bordner@mcs.anl.gov