paul@cantuar.UUCP (Paul Ashton) (01/05/89)
Some time ago I posted an article requesting information on the following:
- monitors for distributed systems.
- performance experiments that have been performed, or that you would like to be able to perform, on distributed systems.
- the structure of the SunOS 4.0 kernel.

Many thanks to all those who replied, especially to those who sent me copies of papers.

The following is a summary of references on performance monitors for distributed systems. My main interest is in reasonably general-purpose tools, so I haven't included references on specialised tools (e.g. ones to help optimise RPC times). I would be very interested to hear of other general-purpose monitors for distributed systems.

"Monitoring Distributed Systems: A Relational Approach", Snodgrass, R., PhD Thesis, CMU, 1982.

"A Relational Approach to Monitoring Complex Systems", Snodgrass, R., ACM Transactions on Computer Systems, Vol. 6, no. 2, May 1988, pp. 157-196.

"An Integrated Instrumentation Environment for Multiprocessors", Segall, Z., Singh, A., Snodgrass, R., Jones, A., and Siewiorek, D., IEEE Transactions on Computers, Vol. C-32, no. 1, January 1983, pp. 4-14.

"An Integrated Approach to General Software Monitoring", Duncan, S. E., Softlab Document No. 27, Department of Computer Science, University of North Carolina, Chapel Hill.

Snodgrass has proposed "relational monitoring" as a way of monitoring distributed systems. In relational monitoring, sensors (pieces of event-detection code) are inserted into the system to be monitored. The data a sensor emits when an event occurs is treated as a tuple in the relation defined by that sensor (i.e. each sensor corresponds to a relation). When users want to monitor something they specify a query on the relations defined by the sensors in the system (the query language used is called TQuel). The query is converted to an update network, which is then optimised.
Finally, the sensors corresponding to the relations in the query are enabled; the tuples they produce are fed into the update network, and the answer to the query is produced. Snodgrass' model of distributed systems is an object-oriented one.

The TQuel query language is a version of the Quel query language used in INGRES, modified to become a "temporal query language", i.e. a query language which handles time as an implicit part of the database. Time information is automatically recorded in all relations, and TQuel provides facilities to incorporate time into queries (for example, you could ask for the periods during which processor A and processor B were both idle, or both not idle, or one of them idle, etc.). Database systems which support times stored implicitly in relations, and queries which make use of these times, are called temporal databases; this is an area of current research. Rick Snodgrass has written several papers on temporal databases.

The Softlab document by Duncan (actually a Masters thesis) describes how a relational monitoring approach was used to instrument the 4.2BSD Unix file system. Relational monitoring is an innovative approach to monitoring, and incorporates a lot of very interesting ideas.

"IPS: An Interactive And Automatic Performance Measurement Tool for Parallel and Distributed Programs", Miller, B. P., and Yang, C.-Q., 7th International Conference on Distributed Computing Systems, Berlin, September 1987.

"Critical Path Analysis for the Execution of Parallel and Distributed Programs", Yang, C.-Q., and Miller, B. P., 8th International Conference on Distributed Computing Systems, pp. 366-375, San Jose, Calif., June 1988.

"IPS-2: The Second Generation of a Parallel Program Measurement System", Miller, B. P., Clark, M., Kierstead, S., and Lim, S.-S., Computer Sciences Technical Report #783, University of Wisconsin-Madison, August 1988.
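To make Snodgrass' sensor-relation-query pipeline a bit more concrete, here is a rough sketch in Python. The Sensor class, its attribute names, and the hand-written query below are my own invention for illustration, not Snodgrass' implementation; a real TQuel query would be compiled into an optimised update network rather than written as a loop.

```python
import time

# Each sensor corresponds to a relation; enabling it makes its tuples
# visible to queries, while disabled sensors cost almost nothing.
class Sensor:
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes  # schema of the relation
        self.enabled = False
        self.tuples = []              # the relation's contents

    def record(self, **values):
        """Called at the instrumentation point when the event occurs."""
        if self.enabled:
            values["at"] = time.time()  # time is implicit in every relation
            self.tuples.append(values)

# A sensor inserted at the point where messages are sent
msg_send = Sensor("msg_send", ["src", "dst", "bytes"])
msg_send.enabled = True

# The monitored system runs, and the sensor emits tuples
msg_send.record(src="A", dst="B", bytes=120)
msg_send.record(src="B", dst="A", bytes=16)
msg_send.record(src="A", dst="C", bytes=512)

# A query over the relation (a stand-in for a TQuel query feeding an
# update network): total bytes sent by each source process.
totals = {}
for t in msg_send.tuples:
    totals[t["src"]] = totals.get(t["src"], 0) + t["bytes"]
print(totals)   # {'A': 632, 'B': 16}
```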
IPS has a hierarchical view of a distributed system: PROGRAMS run on MACHINES that run PROCESSES which execute PROCEDURES which contain PRIMITIVE ACTIVITIES. It is an event-driven system with 10 probes in the kernel and language run-time libraries to detect events related to process creation/deletion, process scheduling, and interprocess communication. Various tables and graphs can be produced from the event data, at various levels of abstraction.

An interesting idea introduced in IPS-2 is that of the critical path of a parallel or distributed program. A program is represented as an acyclic graph. Each process in the program is a series of nodes connected by arcs showing the sequential execution of the program. Each node in a process represents the process sending or receiving data, or some other form of process synchronisation. Arcs showing interprocess communication are included in the graph. With each arc labelled with the time taken to get between the two nodes it connects, the critical path of the program can be computed. Knowledge of the critical path is useful when optimising the program: unless things on the critical path are addressed, turnaround time will not decrease.

"A Distributed Programs Monitor for Berkeley UNIX", Miller, B. P., Macrander, C., and Sechrest, S., Software - Practice and Experience, Vol. 16, no. 2, pp. 183-200, February 1986.

"DPM: A Measurement System for Distributed Programs", Miller, B. P., IEEE Transactions on Computers, Vol. C-37, no. 2, pp. 243-248, February 1988.

DPM's model is of a number of communicating processes (communication either inter- or intra-machine). Processes compute and communicate. Probes in the kernel detect process creation/destruction, process scheduling, creation/destruction of communication paths, and message passing. Emphasis is on tracking interprocess communication. A variety of uses are suggested for the data. The SP&E paper contains a detailed description of the implementation of DPM on 4.2BSD.
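The critical-path computation that IPS-2 introduces is essentially a longest-path problem on the time-weighted activity graph described above. Here is a small sketch in Python; the graph, the node names, and the arc weights are invented for illustration (IPS-2 builds the real graph from event traces):

```python
# Toy activity graph: nodes are synchronisation events in two processes,
# arcs carry the elapsed time between them. "p1_send" -> "p2_recv" is a
# message arc between the two processes.
edges = {
    "start":   [("p1_send", 5), ("p2_recv", 2)],
    "p1_send": [("p1_end", 3), ("p2_recv", 1)],
    "p2_recv": [("p2_end", 7)],
    "p1_end":  [("finish", 0)],
    "p2_end":  [("finish", 0)],
    "finish":  [],
}

def critical_path(edges, start, end):
    """Longest (time-weighted) path through the acyclic graph."""
    # topological order via depth-first search
    order, seen = [], set()
    def visit(n):
        if n not in seen:
            seen.add(n)
            for m, _ in edges[n]:
                visit(m)
            order.append(n)
    visit(start)
    # longest-path relaxation in topological order
    dist = {n: float("-inf") for n in edges}
    pred = {}
    dist[start] = 0
    for n in reversed(order):
        for m, w in edges[n]:
            if dist[n] + w > dist[m]:
                dist[m] = dist[n] + w
                pred[m] = n
    # recover the path by walking the predecessor links
    path, n = [end], end
    while n != start:
        n = pred[n]
        path.append(n)
    return list(reversed(path)), dist[end]

path, length = critical_path(edges, "start", "finish")
print(path, length)   # ['start', 'p1_send', 'p2_recv', 'p2_end', 'finish'] 13
```

In this toy graph, process 1's send delays process 2's receive, so the critical path runs through both processes; speeding up anything off that path would not reduce turnaround time.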
"Monitoring Distributed Systems", Joyce, J., Lomow, G., Slind, K., and Unger, B., ACM Transactions on Computer Systems, Vol. 5, no. 2, pp. 121-150, May 1987.

Describes a program monitoring tool for the Jade distributed system. Event probes are inserted in the language run-time library. Events include process creation/destruction, message passing, and failure of operations. Various data analysis tools are available, including a textual trace of events, an IPC "movie", communication analysis, run-time protocol checking, and deadlock detection.

"Monitoring and performance measuring distributed systems during operation", Wybranietz, D., and Haban, D., ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, Santa Fe, New Mexico, May 1988, pp. 197-206.

Describes a hybrid monitor for a distributed system. Each node in the system consists of an MC68000 processor plus a TMP (Test and Measurement Processor) to monitor the node's activity (the TMP itself includes an MC68000). Probes have been included in the kernel to detect process scheduling events, message transfer events, and various kernel events. The probes communicate with the TMP by writing into a particular part of memory. Events contain very little information: just the number of the event and one (in one case two) 32-bit words of additional information. The TMPs send monitoring data to a central station over a dedicated network. The central station controls all of the TMPs, and displays summaries of basic measures.

The references above were the ones I found the most interesting. The following are other references with some relevance to monitoring distributed systems (some are a bit dated).

"Monit: A Performance Monitoring Tool for Parallel and Pseudo-Parallel Programs", Kerola, T., and Schwetman, H., ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1987, pp. 163-174.
"METRIC: a kernel instrumentation system for distributed environments", McDaniel, G., Proceedings of the Sixth ACM Symposium on Operating Systems Principles, pp. 93-99, November 1977.

"XRAY: Instrumentation for Multiple Computers", Blake, R., Proceedings of Performance '80, pp. 11-25, 1980.

The last two refer to monitors that centralise data collection for multiple processors, rather than being true distributed-system monitors.

The following three papers have sections that contain information on monitoring distributed systems.

"Rochester's Intelligent Gateway", Lantz, K. A., Gradischnig, K. D., Feldman, J. A., and Rashid, R. F., Computer, Vol. 15, no. 10, pp. 54-68, October 1982.

"Program debugging and performance evaluation aids for a multi-microprocessor development system", Lambert, J. E., and Halsall, F., Software & Microsystems, Vol. 3, no. 1, pp. 2-10, February 1984.

"Issues and Approaches to Distributed Testbed Instrumentation", Franta, W. R., Berg, H. K., and Wood, W. T., Computer, Vol. 15, no. 10, pp. 71-81, October 1982.

The following were recommended to me, but I haven't had a look at them yet:

Lecture Notes in Computer Science (LNCS) 309, "Experience with Distributed Systems: international workshop, Kaiserslautern, FRG, Sep. 1987 proceedings".

Two articles by Robbert van Renesse on the performance of the Amoeba distributed OS: one to appear in the November 1988 Operating Systems Review, and one to appear in Software - Practice and Experience, probably in, or shortly after, the November 1988 issue. The SP&E paper is more interesting than the OSR paper since it contains more measurements, notably the performance of Amoeba under load.

There weren't many replies on SunOS 4.0. The following tools were mentioned:

etherfind(1) is a SunOS program that allows you to specify packets, in a moderately cumbersome way, on invocation. It then writes logging data on all such packets to the standard output.
nit(4) is a pseudo-device from which your own program can read packet headers directly. More efficient and general versions of both tools are under development by Van Jacobson at Berkeley.

"Characterising the Workload of a Distributed File Server", Tourigny, S. R., Research Report 88-15, Department of Computational Science, University of Saskatchewan.

This report describes the construction of a passive monitoring tool, using nit(4), to monitor NFS traffic on an Ethernet connecting Sun workstations, so that the workload of the file server could be characterised. It contains references to papers describing work on file system workload characteristics for both distributed and centralised systems.

Finally, there was interest in monitoring tools to allow the following:

"I have been considering an analytic model of bistable behavior in virtual memory workstations. My main contribution (so far) has been to address the issue of estimating the mean time to "failure" (thrashing) by recognizing the failure mode to be analogous to a quantum-tunneling process. I would like to find ways to incorporate the insights of my formal model into a meaningful framework for software engineers/systems programmers. One problem I have struck in this attempt is that many people, although familiar with thrashing, do not relate easily to estimates of MTBF. Thrashing in the context of a workstation can be more disastrous than it is on a mainframe. I have been talking to some people at Sun about this issue. There is a group already building tools to try and understand virtual-memory performance, to get a handle on the process dynamics responsible for performance degradation. Clearly this is a non-trivial issue and I would like to see tools like these built in the context of a formal framework like my analytic model."
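The passive workload-characterisation idea in the Saskatchewan report (watch the wire, tally what the file server is asked to do) reduces to a simple aggregation once the packet headers have been decoded. Here is a minimal sketch in Python; the (client, operation, bytes) records are hard-coded stand-ins for what a real tool would decode from nit(4) or a similar capture interface, and the operation names are merely NFS-flavoured examples:

```python
# Hypothetical decoded trace records: (client, operation, bytes on wire)
trace = [
    ("wks1", "read",    8192),
    ("wks2", "lookup",   120),
    ("wks1", "read",    8192),
    ("wks1", "write",   4096),
    ("wks2", "getattr",  100),
]

# Tally call counts and traffic volume per operation type
by_op = {}
for client, op, nbytes in trace:
    count, total = by_op.get(op, (0, 0))
    by_op[op] = (count + 1, total + nbytes)

for op in sorted(by_op):
    count, total = by_op[op]
    print(f"{op:8s} {count:4d} calls {total:8d} bytes")
```

The same loop could just as easily key on client, or on (client, operation), which is essentially how per-workstation workload profiles are built up from a passive trace.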
--
Internet(ish): paul@cantuar.{uucp,nz}    JANET/SPEARNET: p.ashton@nz.ac.canty
UUCP: ...!{watmath,munnari,mcvax,...!uunet!vuwcomp}!cantuar!paul
NZ Telecom: Office: +64 3 667 001 x6350
NZ Post: University of Canterbury, Christchurch, New Zealand