leff@smu.UUCP (Laurence Leff) (03/17/88)
Naomi Schulman Publications COMPUTER SYSTEMS LABORATORY STANFORD UNIVERSITY Stanford, CA 94305-4055 RECENT REPORTS & NOTES LIST #13 MARCH 1988 To order reports, see end of this announcement. ABSTRACTS CSL-TR-87-343 EDITING GRAPHICAL OBJECTS USING PROCEDURAL REPRESENTATIONS Paul J. Asente October 1987 117 pages.....$8.35 Traditionally, people have created computer-generated images by writing programs in a programming language that supports graphics. More recently, interactive graphics editors have become commonplace. Graphics editors are easy to use but frequently lack many of the capabilities found in graphics programming languages. This deficiency is intrinsic to graphics editors; it is not a result of neglect or incompetence by the implementer. Tweedle is a graphics editor that attempts to bridge this gap by using a program as its internal representation for a picture. During an editing session the user can modify either the picture itself or the program representation; the editor modifies the other to keep the two consistent. The language used by the editor contains features that allow the editor to incrementally execute parts of a program in response to a change so that the picture can be regenerated without completely reexecuting the program. The use of a procedural representation allows the user to create easily pictures with repetition, recursion, and calculated point values. It further allows him to define parts of a drawing as variants of other parts; these variants can differ from their original objects in quite arbitrary ways but still respond to changes made to the original. A working prototype of Tweedle has been implemented under the X Window System. ------------------------------------------------------------------------------- CSL-TR-87-344 PERFORMANCE PREDICTION OF CONCURRENT SYSTEMS Victor W. K. Mak December 1987 162 pages.....$10.60 Concurrent systems are computers that use multiple processors to solve a single problem. A means to predict the application performance on these systems is a useful tool in many areas of concurrent system research. Earlier work in this area either dealt with very restricted program structures or used methods with exponential complexity. This dissertation describes a computationally efficient and accurate method to predict performance for a class of parallel computations on concurrent systems. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in concurrent systems are modeled as service centers in queueing network models. Using these two models as inputs, the method outputs predictions of both the time to complete the computation and the concurrent system utilization. The algorithm used is based on the approximate Mean Value Analysis in queueing network modeling with extensions to model concurrency in the computation. The new algorithm has been validated against both detailed simulation and actual execution on a commercial multiprocessor. Average error of the prediction when compared to simulation statistics is 1.7% with a standard deviation of 1.5%. The algorithm is iterative. The time complexity of each iteration of the 2 3 2 algorithm is O(N K + N ), the space complexity is O(N K) where N is the number of tasks in the computation and K is the number of resources in the concurrent system. The prediction method has a very high rate of convergence. 90% of the one hundred test cases evaluated achieved convergence within seven iterations. Worst case convergence observed was twelve iterations. ------------------------------------------------------------------------------- CSL-TR-87-345 TRADEOFFS IN PROCESSOR-ARCHITECTURE AND DATA-BUFFER DESIGN Johannes M. Mulder December 1987 219 pages.....$13.45 Processor-Memory traffic can be a serious performance constraint for computer systems in general and for closely coupled processors specifically. In a memory-bound system the data traffic becomes a critical performance factor when sufficient instruction buffering is used; when highly encoded instructions are used; or when multiple processors share data memory. Microprocessors are particularly vulnerable to performance degradation because of pin-bandwidth limitations. To relax the memory bandwidth requirements and sustain a high processor throughput, buffering of both instruction and data requests is essential for high performance. This dissertation describes the performance and utility of buffer hierarchies under reference patterns typical for procedural languages. Hardware data-buffering schemes (multiple register sets and stack buffers) can profit significantly from compile-time assistance. Even single register sets perform competitively with these hardware schemes when allocated interprocedurally. In multi-level buffering, first-level (register oriented) and second-level (cache) buffers have distinct purposes. The first-level buffer determines the peak performance of the processor, while the second-level determines the actual performance. Hence, a cache does not hide architectural (first-level buffer) flaws, but emphasizes them, because it reduces the influence of the memory system on performance. The usefulness of caches is a function of the size of the cache and of memory speed. A cache, independent of size, improves the system performance, when memory traffic is the system bottleneck. A cache has to be of significant size to justify implementation when processor performance is a primary design objective and memory is relative fast. A cache connected to an efficient 32-word buffer and 2-cycle memory only reduces the buffer cycles by 25% for 1024 cache words, and by 10% for 128 cache words. The same configuration for a 4-cycle memory yields a cycle reduction of 50% and 30% respectively. ------------------------------------------------------------------------------- CSL-TR-87-346 BLAZENET: A PHOTONIC IMPLEMENTABLE WIDE-AREA NETWORK Zygmunt Haas and David R. Cheriton October 1987 26 pages.....$3.80 High-performance wide-area networks are required to interconnect clusters of computers connected by local area and metropolitan area networks. Optical fiber technology provides long distance channels in the multi-gigabit per second range. The challenge is to provide switching nodes that handle these data rates with minimum delay, and at a reasonable cost. In this paper, we describe a packet switching network, christened Blazenet, that provides low delay and has minimal memory requirements. It can be extended to support multicase and priority delivery. Such a network can revolutionize the opportunities for distributed command and control, information and resources sharing, real-time conferencing, and wide-area parallel computation, to mention but a few applications. ------------------------------------------------------------------------------- CSL-TR-88-347 TRACE COMPACTION USING CACHE FILTERING WITH BLOCKING Anant Agarwal December 1987 19 pages.....$3.45 Trace-driven simulation is a popular method of estimating the performance of cache memories, translation lookaside buffers, and paging schemes. Because the cost of trace-driven simulation is directly proportional to trace length, reducing the number of references in the trace significantly impacts simulation time. This paper concentrates on trace-driven simulation for cache analysis. A technique called cache filtering with blocking is presented that compresses traces by exploiting both the temporal and spatial locality in the trace. Experimental results show that this scheme can reduce trace length by nearly two orders of magnitude while introducing less than 15% error in cache miss rate estimates. ------------------------------------------------------------------------------- CSL-TR-88-348 THOR USER'S MANUAL: TUTORIAL AND COMMANDS Robert Alverson, Tom Blank, Kiyoung Choi, Sun Young Hwang, Arturo Salz, Larry Soule, Thomas Rokicki December 1987 66 pages.....$5.80 THOR is a behavioral simulation environment intended for use with digital circuits at either the gate, register transfer, or functional levels. Models are written in the CHDL modeling language (a hardware description language based on the "C" programming language). Network descriptions are written in the CSL language supporting hierarchical network descriptions. Using interactive mode, batch mode or both combined, a variety of commands are available to control execution. Simulation output can be viewed in tabular format or in waveforms. A library of components and a toolbox for building simulation models are also provided. Other tools include CSLIM, used to generate boolean equations directly from THOR models and an interface to other simulators (e.g. RSIM and a physical chip tester) so that two simulations can be run concurrently verifying equivalent operation. This technical report is part one of two parts and is formated similar to UNIX manuals. Part one contains the THOR tutorial and all the commands associated with THOR. Part two contains descriptions of the general purpose functions used in models, the parts library including many TTL components, and the logic analyzer model. ------------------------------------------------------------------------------- CSL-TR-88-349 THOR USER'S MANUAL: LIBRARY FUNCTIONS Robert Alverson, Tom Blank, Kiyoung Choi, Sun Young Hwang, Arturo Salz, Larry Soule, Thomas Rokicki December 1987 98 pages.....$7.40 THOR is a behavioral simulation environment intended for use with digital circuits at either the gate, register transfer, or functional levels. Models are written in the CHDL modeling language (a hardware description language based on the "C" programming language). Network descriptions are written in the CSL language supporting hierarchical network descriptions. Using interactive mode, batch mode or both combined, a variety of commands are available to control execution. Simulation output can be viewed in tabular format or in waveforms. A library of components and a toolbox for building simulation models are also provided. Other tools include CSLIM, used to generate boolean equations directly from THOR models and an interface to other simulators (e.g. RSIM and a physical chip tester) so that two simulations can be run concurrently verifying equivalent operation. This technical report is part two of two parts and is formated similar to UNIX manuals. Part one contains the THOR tutorial and all the commands associated with THOR. Part two contains descriptions of the general purpose functions used in models, the parts library including many TTL components, and the logic analyzer model. ------------------------------------------------------------------------------- CSL-TR-88-350 THE ILSP BEHAVIORAL DESCRIPTION LANGUAGE AND ITS GRAPH REPRESENTATION FOR BEHAVIORAL SYNTHESIS Masayasu Odani, Sun Young Hwang, Tom Blank, Thomas Rokicki March 1988 38 pages.....$4.40 This report describes the ILSP behavioral description language and its internal representation employed in the Hermod behavioral synthesis system. Using combined control and data flow graph (C/DFG) as an intermediate representation, the Hermod system generates hardware modules and their interconnection from behavioral descriptions. The Hermod system is included in an integrated environment for hardware simulation and synthesis under development at Stanford University. The functional models written in the ILSP can be simulated on the THOR logic/functional/behavioral simulator without translation. After proper verification of its behavior, an ILSP model can be input to the synthesizer for compilation into an RT-level description. This report consists of two parts: the specification of the ILSP language and its graph representation. ------------------------------------------------------------------------------- CSL-TR-88-351 EMPIRICAL ANALYSIS OF A LISP SYSTEM Robert A. Shaw February 1988 189 pages.....$11.95 We have analyzed four large Lisp programs running in a Common Lisp system on a conventional processor to provide information useful for implementers of new Lisp systems. The execution of over fifty million Lisp function calls (over one billion host instructions) are represented in the results. We address some possible implementation tradeoffs and provide pertinent empirical information. Among the issues discussed are data tagging techniques, quantity of tags, time and space efficient representations for Lisp data types, methods of function linkage, the nature of variable access, the use of tail-recursion and non-local goto's, and garbage collection. Our findings indicate that for many conventional processors choices should include a small number of tag values encoded in the low-order bits of a data reference, a simple ``car followed by cdr'' organization for CONS cells, and an encoding of symbols which segregates the different fields. Garbage collection frequently represents a sizable portion of overall execution time. The new ``generation'' garbage collectors can provide dramatic performance improvements, but require the maintenance of entry vector and data age information. To maintain the entry vector, previous generation collectors have included runtime tests (generation checks) at each tagged store operation. For our Lisp system the cost of these checks in software is projected to be at least 7% of runtime. We describe a new version of generation garbage collection adapted for Lisp on conventional processors. Our version eliminates generation checks during normal execution: information maintained by the virtual memory system is used to construct the entry vector. A simple memory layout is employed which does not require storage for age information (other than one value per generation) and allows adjustment of the boundaries between generations to avoid fragmentation and maximize memory utilization in implementations with limited memory. An analysis of the new algorithm on the test system showed that, when collections were separated by more than 1.4 seconds (or at least 160 kilobytes of new allocation), using virtual memory information required less time than software generation checks. ------------------------------------------------------------------------------- CSL-TR-88-352 PARTITIONING OF REGULAR COMPUTATION ON MULTIPROCESSOR SYSTEMS Fung Fung Lee February 1987 17 pages.....$3.35 Problem partitioning of regular computation over two-dimensional meshes on multiprocessor systems is examined. The regular computation model considered involves repetitive evaluation of values at each mesh point with local communication. The computational workload and the communication pattern are the same at each mesh point. The regular computation model arises in numerical solutions of partial differential equations and simulations of cellular automata. Given a communication pattern, a systematic way to generate a family of partitions is presented. The influence of various partitioning schemes on performance is compared on the basis of computation to communication ratio. ------------------------------------------------------------------------------- Naomi Schulman COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305-4055 EM:CSL.Schulman@Sierra.Stanford.Edu 415-723-1430 Hours: M-Th, 8-1 ORDER FORM To Order Reports: Print or type your name and address in the space provided. Check or circle the report(s) you wish to purchase, whether hardcopy or microfiche. All orders must be PREPAID. CALIFORNIA RESIDENTS must add sales tax of their local county. Return this order form with your check or money order made payable to Stanford University. FOREIGN ORDERS must be paid with an international money order or a check drawn on a U.S. bank, payable in dollars. Please type or print your name and complete address: ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ Report # Hardcopy Microfiche Report # Hardcopy Microfiche -------- -------- ---------- -------- -------- ---------- TR 343 $8.35 $3.00 TR 348 $5.80 $2.00 TR 344 $10.60 $3.00 TR 349 $7.40 $2.00 TR 345 $13.45 $4.00 TR 350 $4.40 $2.00 TR 346 $3.80 $2.00 TR 351 $11.95 $3.00 TR 347 $3.45 $2.00 TR 352 $3.35 $2.00 Subtotal __________ Your Local CA County tax __________ Total __________ -------