[comp.os.research] Unexplained Multimax System Time

raymond@uunet.UU.NET (Raymond Leung) (11/08/88)

I have run the following experiment on an Encore Multimax machine
with 8 processors, running the UMAX 4.2 operating system (BSD 4.2-like,
with some System V features), and I cannot explain the system time
observed.

The code I wrote is a parallel sorting program based on Batcher's
bitonic sorters. The parent forks off a number of processes and
distributes the input numbers among them. After an initial
sort, each process partitions the set of numbers it got and sends
the pieces to certain other processes (according to the algorithm).
Each process then receives two sets of numbers from two other
processes, which it merges, partitions and sends on.
This merge-split step is repeated a number of times (depending
on the number of processes) and the sorting is complete.
(Never mind the exact algorithm.)
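
For concreteness, each merge-split step amounts to the following
(a simplified sketch with names of my own choosing, not the exact code):

```c
#include <stdlib.h>
#include <string.h>

/* merge two sorted arrays a[] and b[], each of length n, into out[] */
static void merge(const int *a, const int *b, int n, int *out)
{
    int i = 0, j = 0, k = 0;

    while (i < n && j < n)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < n)
        out[k++] = a[i++];
    while (j < n)
        out[k++] = b[j++];
}

/* merge-split: merge lo[] and hi[] (both sorted, length n), then
   keep the lower half in lo[] and the upper half in hi[] */
void merge_split(int *lo, int *hi, int n)
{
    int *tmp = malloc(2 * n * sizeof(int));

    merge(lo, hi, n, tmp);
    memcpy(lo, tmp, n * sizeof(int));
    memcpy(hi, tmp + n, n * sizeof(int));
    free(tmp);
}
```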

I used the message system (msgsnd, msgrcv - System V features)
to pass the numbers between processes. In order to observe the
various behaviours of the program and system, I put a loop around
the comparison routine so that the cost of a comparison could be
varied relative to the other costs (e.g. communication cost). The
comparison routine looks like:

	cmp(x, y)
	int	x, y;
	{
	    int		i, result = 0;	/* 0 also covers repeat == 0 */

	    for (i = 0; i < repeat; i++) {
		if (x > y)
		    result = 1;
		else if (y > x)
		    result = -1;
		else
		    result = 0;
	    }
	    return result;
	}
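
For those unfamiliar with the System V message calls, this is roughly
how they are used (a simplified single-process round trip through a
private queue, not my actual code):

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct nummsg {
    long mtype;		/* required by msgsnd/msgrcv; must be > 0 */
    int  num;		/* one number from the set being sorted   */
};

/* send v through a fresh private queue and read it back;
   returns the value received, or -1 on any failure */
int roundtrip(int v)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    struct nummsg m, r;
    int got = -1;

    if (qid < 0)
	return -1;
    m.mtype = 1;
    m.num = v;
    /* size argument counts only the bytes after mtype */
    if (msgsnd(qid, &m, sizeof m.num, 0) == 0 &&
	msgrcv(qid, &r, sizeof r.num, 1, 0) == (ssize_t)sizeof r.num)
	got = r.num;
    msgctl(qid, IPC_RMID, 0);	/* remove the queue */
    return got;
}
```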
    
I ran experiments with 3200 numbers and 16 processes. The only
varying parameter was the cost of the comparison routine (the
variable "repeat"). The following are the real, user and system
times for different values of the repeat factor. The user and
system figures are the times accumulated over all the processes
(parent and children).

        Repeat        Real        User        System
        ============================================
             1        4.50        2.43         11.77   <--
            10        4.68        5.33         10.49   <--
           100        7.30       34.42          8.14
          1000       46.20      325.17          8.69
         10000      435.60     3234.14         23.77
         20000      845.93     6466.99         41.42

They all ran on exactly the same set of data. Each experiment was
repeated several times, and separate runs agreed within 2-3%. The
system was almost idle when the experiments were carried out.
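
For what it is worth, the accumulated parent+children figures can be
obtained with times(), which folds a reaped child's user and system
time into tms_cutime/tms_cstime once the parent has wait()ed for it.
A sketch (assuming the POSIX sysconf call for the tick rate, which is
not necessarily what UMAX provides):

```c
#include <sys/times.h>
#include <unistd.h>
#include <stdio.h>

/* print the accumulated user and system time of the calling process
   plus all children it has wait()ed for; returns the total number of
   clock ticks, or -1 on error */
long report_times(void)
{
    struct tms t;
    long hz = sysconf(_SC_CLK_TCK);	/* clock ticks per second */

    if (times(&t) == (clock_t)-1)
	return -1;
    printf("user   %.2f\n", (double)(t.tms_utime + t.tms_cutime) / hz);
    printf("system %.2f\n", (double)(t.tms_stime + t.tms_cstime) / hz);
    return t.tms_utime + t.tms_cutime + t.tms_stime + t.tms_cstime;
}
```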

If the system time measures the time spent carrying out system calls
(message transfers etc.) on behalf of the processes, I would expect
it to stay roughly constant. I have also learnt that when a process
is interrupted, the time spent handling the interrupt is charged to
the process being interrupted, so system time was expected to rise
with real time, and it did. The point I cannot explain is why the
system time dropped when the repeat factor went from 1 to 100.

I ran the same experiment with 2, 4, 8, and 32 processes; the initial
drop in system time became more and more prominent as the number of
processes increased.

I have also used the getrusage system call to get some statistics
about resource usage for the cases repeat=10 and repeat=100.
The various statistics were similar, except for the number of
voluntary context switches, which was slightly higher for repeat=100
(yes, the case that took less system time). However, this figure
showed a fairly large variance across runs.
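
For reference, the context-switch counts come from the ru_nvcsw and
ru_nivcsw fields; a minimal sketch of the check (not my actual
instrumentation):

```c
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>

/* print and return the voluntary context-switch count of the
   calling process, or -1 if getrusage fails */
long voluntary_switches(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
	return -1;
    printf("voluntary: %ld  involuntary: %ld\n",
	   ru.ru_nvcsw, ru.ru_nivcsw);
    return ru.ru_nvcsw;
}
```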

For your information, the results with 8 and 32 processes are
as follows:

With 8 processes:
        Repeat        Real        User        System
        ============================================
             1        3.27        2.09          4.75
            10        3.54        4.61          4.29
           100        7.13       30.54          4.19
          1000       44.29      290.16          5.65
         10000      424.70     2896.34         22.44
         20000      816.20     5765.93         41.18

With 32 processes:
        Repeat        Real        User        System
        ============================================
             1        9.86        3.23         45.13   <-- (also unexplained)
            10       10.48        6.47         47.59
           100       12.68       40.01         37.33
          1000       55.08      373.11         30.39
         10000      498.31     3703.73         33.76
         20000      965.62     7400.84         51.44


Can someone shed some light? Some information about the architecture
of the Multimax and the kernel of UMAX? Does it handle all system
calls on one particular processor?
Someone suggested that spin locks may be present in the kernel for
handling the message passing. Does anyone know how and why they are used?

Thanks in advance.


Raymond Leung

VLSI and System Technology Laboratory,
School of Electrical Engineering and Computer Science,
The University of New South Wales,
PO Box 1, Kensington,
Sydney, NSW 2033,
AUSTRALIA.

ACSnet: raymond@cad.jmrc.eecs.unsw.oz
ARPA: raymond%cad.jmrc.eecs.unsw.oz@uunet.css.gov
UUCP: uunet!munnari!cad.jmrc.eecs.unsw.oz!raymond