paul@cantuar.UUCP (Paul Ashton) (01/05/89)
Some time ago I posted an article requesting information on the following:
- monitors for distributed systems.
- performance experiments that have been performed, or that you would like to be able to perform, on distributed systems.
- the structure of the SunOS 4.0 kernel.

Many thanks to all those who replied, especially to those who sent me copies of papers.

The following is a summary of references on performance monitors for distributed systems. My main interest is in reasonably general-purpose tools, so I haven't included references on specialised tools (e.g. ones to help optimise RPC times). I would be very interested to hear of other general-purpose monitors for distributed systems.

"Monitoring Distributed Systems: A Relational Approach", Snodgrass, R., PhD Thesis, CMU, 1982.

"A Relational Approach to Monitoring Complex Systems", Snodgrass, R., ACM Transactions on Computer Systems, Vol. 6, no. 2, May 1988, pp. 157-196.

"An Integrated Instrumentation Environment for Multiprocessors", Segall, Z., Singh, A., Snodgrass, R., Jones, A., and Siewiorek, D., IEEE Transactions on Computers, Vol. C-32, no. 1, January 1983, pp. 4-14.

"An Integrated Approach to General Software Monitoring", Duncan, S. E., Softlab Document No. 27, Department of Computer Science, University of North Carolina, Chapel Hill.

Snodgrass has proposed "relational monitoring" as a way of monitoring distributed systems. In relational monitoring, sensors (pieces of event-detection code) are inserted into the system to be monitored. The data a sensor emits when an event occurs is treated as a tuple in the relation defined by that sensor (i.e. each sensor corresponds to a relation). When users want to monitor something they specify a query on the relations defined by the sensors in the system (the query language used is called TQuel). The query is converted to an update network, which is then optimised.
Finally, the sensors corresponding to the relations in the query are enabled; the tuples they produce are fed into the update network, and the answer to the query is produced. Snodgrass' model of distributed systems is an object-oriented one.

The TQuel query language is a version of the Quel query language used in INGRES, modified to become a "temporal query language", i.e. a query language which handles time as an implicit part of the database. Time information is automatically recorded in all relations, and TQuel provides facilities to incorporate time into queries (for example, you could ask for the periods during which processor A and processor B were both idle, or both not idle, or one of them idle, etc.). Database systems which support times stored implicitly in relations, and queries which make use of these times, are called temporal databases; this is an area of current research. Rick Snodgrass has written several papers on temporal databases.

The Softlab document by Duncan (actually a Masters thesis) describes how a relational monitoring approach was used to instrument the 4.2BSD Unix file system. Relational monitoring is an innovative approach to monitoring, and incorporates a lot of very interesting ideas.

"IPS: An Interactive And Automatic Performance Measurement Tool for Parallel and Distributed Programs", Miller, B. P., and Yang, C.-Q., 7th International Conference on Distributed Computing Systems, Berlin, September 1987.

"Critical Path Analysis for the Execution of Parallel and Distributed Programs", Yang, C.-Q., and Miller, B. P., 8th International Conference on Distributed Computing Systems, pp. 366-375, San Jose, Calif., June 1988.

"IPS-2: The Second Generation of a Parallel Program Measurement System", Miller, B. P., Clark, M., Kierstead, S., and Lim, S.-S., Computer Sciences Technical Report #783, University of Wisconsin-Madison, August 1988.
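To make Snodgrass' sensor-relation-query pipeline a bit more concrete, here is a rough sketch in Python. The Sensor class, its attribute names, and the hand-written query below are my own invention for illustration, not Snodgrass' implementation; a real TQuel query would be compiled into an optimised update network rather than written as a loop.

```python
import time

# Each sensor corresponds to a relation; enabling it makes its tuples
# visible to queries, while disabled sensors cost almost nothing.
class Sensor:
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes  # schema of the relation
        self.enabled = False
        self.tuples = []              # the relation's contents

    def record(self, **values):
        """Called at the instrumentation point when the event occurs."""
        if self.enabled:
            values["at"] = time.time()  # time is implicit in every relation
            self.tuples.append(values)

# A sensor inserted at the point where messages are sent
msg_send = Sensor("msg_send", ["src", "dst", "bytes"])
msg_send.enabled = True

# The monitored system runs, and the sensor emits tuples
msg_send.record(src="A", dst="B", bytes=120)
msg_send.record(src="B", dst="A", bytes=16)
msg_send.record(src="A", dst="C", bytes=512)

# A query over the relation (a stand-in for a TQuel query feeding an
# update network): total bytes sent by each source process.
totals = {}
for t in msg_send.tuples:
    totals[t["src"]] = totals.get(t["src"], 0) + t["bytes"]
print(totals)   # {'A': 632, 'B': 16}
```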
IPS has a hierarchical view of a distributed system: PROGRAMS run on MACHINES that run PROCESSES which execute PROCEDURES which contain PRIMITIVE ACTIVITIES. It is an event-driven system with 10 probes in the kernel and language run-time libraries to detect events related to process creation/deletion, process scheduling, and interprocess communication. Various tables and graphs can be produced from the event data, at various levels of abstraction.

An interesting idea introduced in IPS-2 is that of the critical path of a parallel or distributed program. A program is represented as an acyclic graph. Each process in the program is a series of nodes connected by arcs showing the sequential execution of the program. Each node in a process represents the process sending or receiving data, or some other form of process synchronisation. Arcs showing interprocess communication are included in the graph. With each arc labelled with the time taken to get between the two nodes it connects, the critical path of the program can be computed. Knowledge of the critical path is useful when optimising the program: unless things on the critical path are addressed, turnaround time will not decrease.

"A Distributed Programs Monitor for Berkeley UNIX", Miller, B. P., Macrander, C., and Sechrest, S., Software - Practice and Experience, Vol. 16, no. 2, pp. 183-200, February 1986.

"DPM: A Measurement System for Distributed Programs", Miller, B. P., IEEE Transactions on Computers, Vol. C-37, no. 2, pp. 243-248, February 1988.

DPM's model is of a number of communicating processes (communication either inter- or intra-machine). Processes compute and communicate. Probes in the kernel detect process creation/destruction, process scheduling, creation/destruction of communication paths, and message passing. Emphasis is on tracking interprocess communication. A variety of uses are suggested for the data. The SP&E paper contains a detailed description of the implementation of DPM on 4.2BSD.
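The critical-path computation that IPS-2 introduces is essentially a longest-path problem on the time-weighted activity graph described above. Here is a small sketch in Python; the graph, the node names, and the arc weights are invented for illustration (IPS-2 builds the real graph from event traces):

```python
# Toy activity graph: nodes are synchronisation events in two processes,
# arcs carry the elapsed time between them. "p1_send" -> "p2_recv" is a
# message arc between the two processes.
edges = {
    "start":   [("p1_send", 5), ("p2_recv", 2)],
    "p1_send": [("p1_end", 3), ("p2_recv", 1)],
    "p2_recv": [("p2_end", 7)],
    "p1_end":  [("finish", 0)],
    "p2_end":  [("finish", 0)],
    "finish":  [],
}

def critical_path(edges, start, end):
    """Longest (time-weighted) path through the acyclic graph."""
    # topological order via depth-first search
    order, seen = [], set()
    def visit(n):
        if n not in seen:
            seen.add(n)
            for m, _ in edges[n]:
                visit(m)
            order.append(n)
    visit(start)
    # longest-path relaxation in topological order
    dist = {n: float("-inf") for n in edges}
    pred = {}
    dist[start] = 0
    for n in reversed(order):
        for m, w in edges[n]:
            if dist[n] + w > dist[m]:
                dist[m] = dist[n] + w
                pred[m] = n
    # recover the path by walking the predecessor links
    path, n = [end], end
    while n != start:
        n = pred[n]
        path.append(n)
    return list(reversed(path)), dist[end]

path, length = critical_path(edges, "start", "finish")
print(path, length)   # ['start', 'p1_send', 'p2_recv', 'p2_end', 'finish'] 13
```

In this toy graph, process 1's send delays process 2's receive, so the critical path runs through both processes; speeding up anything off that path would not reduce turnaround time.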
"Monitoring Distributed Systems", Joyce, J., Lomow, G., Slind, K., and Unger, B., ACM Transactions on Computer Systems, Vol. 5, no. 2, pp. 121-150, May 1987.

Describes a program monitoring tool for the Jade distributed system. Event probes are inserted in the language run-time library. Events include process creation/destruction, message passing, and failure of operations. Various data analysis tools are available, including a textual trace of events, an IPC "movie", communication analysis, run-time protocol checking, and deadlock detection.

"Monitoring and performance measuring distributed systems during operation", Wybranietz, D., and Haban, D., ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, Santa Fe, New Mexico, May 1988, pp. 197-206.

Describes a hybrid monitor for a distributed system. Each node in the system consists of an MC68000 processor plus a TMP (Test and Measurement Processor) to monitor the node's activity (the TMP itself includes an MC68000). Probes have been included in the kernel to detect process scheduling events, message transfer events, and various kernel events. The probes communicate with the TMP by writing into a particular part of memory. Events contain very little information: just the number of the event and one (in one case two) 32-bit words of additional information. The TMPs send monitoring data to a central station over a dedicated network. The central station controls all of the TMPs, and displays summaries of basic measures.

The references above were the ones I found the most interesting. The following are other references with some relevance to monitoring distributed systems (some are a bit dated).

"Monit: A Performance Monitoring Tool for Parallel and Pseudo-Parallel Programs", Kerola, T., and Schwetman, H., ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1987, pp. 163-174.
"METRIC: a kernel instrumentation system for distributed environments", McDaniel, G., Proceedings of the Sixth ACM Symposium on Operating Systems Principles, pp. 93-99, November 1977.

"XRAY: Instrumentation for Multiple Computers", Blake, R., Proceedings of Performance '80, pp. 11-25, 1980.

The last two refer to monitors that centralise data collection for multiple processors, rather than being true distributed-system monitors.

The following three papers have sections that contain information on monitoring distributed systems.

"Rochester's Intelligent Gateway", Lantz, K. A., Gradischnig, K. D., Feldman, J. A., and Rashid, R. F., Computer, Vol. 15, no. 10, pp. 54-68, October 1982.

"Program debugging and performance evaluation aids for a multi-microprocessor development system", Lambert, J. E., and Halsall, F., Software & Microsystems, Vol. 3, no. 1, pp. 2-10, February 1984.

"Issues and Approaches to Distributed Testbed Instrumentation", Franta, W. R., Berg, H. K., and Wood, W. T., Computer, Vol. 15, no. 10, pp. 71-81, October 1982.

The following were recommended to me, but I haven't had a look at them yet:

Lecture Notes in Computer Science (LNCS) 309, "Experience with Distributed Systems: international workshop, Kaiserslautern, FRG, Sep. 1987 proceedings".

Two articles by Robbert van Renesse on the performance of the Amoeba distributed OS: one to appear in the November 1988 Operating Systems Review, and one to appear in Software - Practice and Experience, probably in, or shortly after, the November 1988 issue. The SP&E paper is more interesting than the OSR paper since it contains more measurements, notably the performance of Amoeba under load.

There weren't many replies on SunOS 4.0. The following tools were mentioned:

etherfind(1) is a SunOS program that allows you to specify packets, in a moderately cumbersome way, on invocation. It then writes logging data on all such packets to the standard output.
nit(4) is a pseudo-device from which your own program can read packet headers directly. More efficient and general versions of both tools are under development by Van Jacobson at Berkeley.

"Characterising the Workload of a Distributed File Server", Tourigny, S. R., Research Report 88-15, Department of Computational Science, University of Saskatchewan.

This report describes the construction of a passive monitoring tool, using nit(4), to monitor NFS traffic on an Ethernet connecting Sun workstations, so that the workload of the file server could be characterised. It contains references to papers describing work on file system workload characteristics for both distributed and centralised systems.

Finally, there was interest in monitoring tools to allow the following:

"I have been considering an analytic model of bistable behavior in virtual memory workstations. My main contribution (so far) has been to address the issue of estimating the mean time to "failure" (thrashing) by recognizing the failure mode to be analogous to a quantum-tunneling process. I would like to find ways to incorporate the insights of my formal model into a meaningful framework for software engineers/systems programmers. One problem I have struck in this attempt is that many people, although familiar with thrashing, do not relate easily to estimates of MTBF. Thrashing in the context of a workstation can be more disastrous than it is on a mainframe. I have been talking to some people at Sun about this issue. There is a group already building tools to try and understand virtual-memory performance, to get a handle on the process dynamics responsible for performance degradation. Clearly this is a non-trivial issue and I would like to see tools like these built in the context of a formal framework like my analytic model."
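The passive workload-characterisation idea in the Saskatchewan report (watch the wire, tally what the file server is asked to do) reduces to a simple aggregation once the packet headers have been decoded. Here is a minimal sketch in Python; the (client, operation, bytes) records are hard-coded stand-ins for what a real tool would decode from nit(4) or a similar capture interface, and the operation names are merely NFS-flavoured examples:

```python
# Hypothetical decoded trace records: (client, operation, bytes on wire)
trace = [
    ("wks1", "read",    8192),
    ("wks2", "lookup",   120),
    ("wks1", "read",    8192),
    ("wks1", "write",   4096),
    ("wks2", "getattr",  100),
]

# Tally call counts and traffic volume per operation type
by_op = {}
for client, op, nbytes in trace:
    count, total = by_op.get(op, (0, 0))
    by_op[op] = (count + 1, total + nbytes)

for op in sorted(by_op):
    count, total = by_op[op]
    print(f"{op:8s} {count:4d} calls {total:8d} bytes")
```

The same loop could just as easily key on client, or on (client, operation), which is essentially how per-workstation workload profiles are built up from a passive trace.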
--
Internet(ish): paul@cantuar.{uucp,nz}    JANET/SPEARNET: p.ashton@nz.ac.canty
UUCP: ...!{watmath,munnari,mcvax,...!uunet!vuwcomp}!cantuar!paul
NZ Telecom: Office: +64 3 667 001 x6350
NZ Post: University of Canterbury, Christchurch, New Zealand