leff@smu.CSNET.UUCP (02/21/87)
Naomi Schulman Publications COMPUTER SYSTEMS LABORATORY STANFORD UNIVERSITY Stanford, CA 94305 RECENT REPORTS & NOTES LIST #8 SEPTEMBER, 1986 To order reports see end of this announcement. ABSTRACTS CSL T.R. 86-292 On the Performance Effects of Station Locations and Access Protocol Parameters in Ethernet Networks Timothy A. Gonsalves and Fouad A. Tobagi January 1986 31 pages.....$4.05 The Ethernet has come into widespread use for interconnection of computers within a local area such as an office or campus. Stations on the network may be distributed evenly or may occur in clusters. We present a simulation study of several distributions of stations on a linear bus Ethernet. Aggregate performance is shown to depend on the distribution of stations. Individual station performance varies with the location of the station. Unbalanced distributions can lead to large performance differences between individual stations with isolated stations achieving relatively poor performance compared to the average. We next examine the effects of access protocol parameters such as the number of buffers per station and the retransmission algorithm. A modification of the standard retransmission algorithm is presented which enables a higher throughput to be achieved at high load. Finally, our results are compared to the predictions of theoretical models and the applicability of the models to finite population Ethernets is examined. ------------------------------------------------------------------------------- CSL T.R. 86-293 A Decentralized Naming Facility David R. Cheritan & Timothy P. Mann February 1986 28 pages.....$3.90 A key component in distributed computer systems is the naming facility: the means by which high-level names are bound to objects and by which objects are located given only their names. We describe the design, implementation, and performance of a decentralized naming facility, in which the global name space and name mapping mechanism are implemented by a set of cooperating peers, with no central authority. Decentralization is shown to lend increased extensibility and reliability to the design. Efficiency in name mapping is achieved through specialized caching techniques. ------------------------------------------------------------------------------- CSL T.R. 86-294 Software-Controlled Caches in the VMP Multiprocessor David R. Cheriton, Gert A. Slavenburg & Patrick D. Boyle February 1986 12 pages.....$3.10 VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory. Each processor has a synchronous, virtually addressed, single master connection to its cache, providing very high memory bandwidth. An unusually large cache page size and fast sequential memory copy hardware make it feasible for cache misses to be handled in software, analogously to the handling of virtual memory page faults. Hardware support for cache consistency is limited to a simple state machine that monitors the bus and interrupts the processor when a cache consistency action is required. In this paper, we show how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost. We also describe simple solutions to the consistency problems associated with virtually addressed caches. Simulation results indicate that the design achieves good performance providing data contention is not excessive. ------------------------------------------------------------------------------- CSL T.R. 86-295 Fast Symbolic Layout Translation for Custom VLSI Integrated Circuits Peter A. Eichenberger April 1986 122 pages.....$8.60 Symbolic layout tools have enormous potential for easing the task of custom integrated circuit layout by allowing the designer to work at a higher level of abstraction, hiding some of the complexity of full custom design. Unfortunately, the practicality of symbolic layout tools has been limited for several reasons. Most important, the CPU resources required to compute a full size integrated circuit from a symbolic description are prohibitively large; this problem has been avoided either by restricting the range of applicability to a narrow class of integrated circuits, or by using a simpler translation algorithm, which reduces the quality of the output. Other problems include: producing poor quality layouts, insufficient user control of the generated output, and inability to cooperate with other layout tools. There problems make symbolic design of complete chips difficult. This thesis presents an approach to the symbolic layout problems that produces high-quality layout for an arbitrary circuit without requiring excessive CPU time. The key to this approach includes the use of hierarchy to improve CPU time, the use of wire-length minimization to improve quality, a good balance between optimization of the layout and optimization of CPU time, and a smooth transition over varying degrees of automation. The result has been a symbolic layout tool that has been successfully used to lay out several chips from a design-rule-independent input. ------------------------------------------------------------------------------- CSL T.R. 86-296 Processor Architecture and Cache Performance by Chad L. Mitchell June 1986 147 pages.....$9.85 Previous researchers have compared and contrasted the effects of various features of processor architecture. Others have studied Instruction cache performance. In this study, a methodology has been developed which allows simu- lation of different processors without the necessity of creating interpreters and compilers for each architecture simulated. The methodology was applied to study the effects of processor architecture on instruction cache performance. New results are provided about the relationship between processor architecture and instruction traffic and cache performance. Among the results is the general observation that relative instruction traffic differences between architectures are about the same with very large caches as with no cache and that intermediate sized caches tend to accentuate such relative differences. ------------------------------------------------------------------------------- CSL T.R. 86-297 Scan Line Access Memories for High Speed Rasterization by Stefan G. Demetrescu June 1986 149 pages.....$9.95 Conventional architectures which produce solid-color computer graphics are slow and/or expensive. To solve this problem, a novel kind of VLSI integrated circuit called a Scan Line Access Memory (SLAM) has been developed which multiplies the time-performance of an inexpensive graphics system by factors of 100 to 1000 without significantly increasing its complexity and hence its cost. Because of the major performance improvement, SLAM systems promise to significantly expand the use of real-time computer graphics. A SLAM consists of a conventional dense semiconductor dynamic memory augmented with highly parallel, but simple, on-chip processors designed specifically for fast computer graphics rasterization. A graphics system consisting of SLAM smart memory chips has been built and tested. An arbitrary horizontal pixel line segment can be filled in one memory access. As a result, the speed with which polygons are rasterized is improved by several orders of magnitude. SLAM systems can also rasterize vectors and bit-mapped characters effectively, unlike many other proposed rasterization architectures. The SLAM system is compared with previously suggested rasterization techniques. Due to its highly parallel architecture, this versatile graphics system is shown to be capable of achieving performance comparable to the "processor per pixel" architectures while avoiding the tremendous circuit density (and hence cost) penalty incurred by such approaches. It thus becomes practical to build a SLAM-based, high performance, real-time graphics system with a complexity and cost comparable to that of a conventional inexpensive image frame buffer. ------------------------------------------------------------------------------- CSL T.R. 86-298 Parallel Program Behavior - Specification & Abstraction Using BDL Jerry C. Yan August 1986 28 pages.....$3.90 This paper describes the process of abstracting program behavior from programs formulated in the Actor paradigm. The transformed program (or 'program model') is described by a behavior description language (or BDL). A simulator for BDL has been constructed to investigate the performance of various programs on different multi-processor architectures. Simulating BDL is much more economical than an instruction level emulation while program behavior is realistically preserved. A programming language enables (but also constrains) the programmer to formulate his/her problem in a particular computation paradigm. The Actor programming paradigm (and the Act languages) stands out among the many n proposed for concurrent computing because of three reasons: 1. Concurrency may be exploited explicitly as well as implicitly; 2. No assumptions were made about the architecture of the hardware; 3. Most parallel computation paradigm previously proposed can be expressed as computations involving Actors. In fact, BDL was invented to facilitate a research project in progress - studying placement strategies for Actors on loosely-coupled multi-processors. There were two experimental obstacles: 1. Simulation at the instruction level is precise but too costly; 2. Program behavior (such as message passing pattern and CPU usage) cannot be preserved fully using existing stochastic models. The invention of BDL has made the "hypothesis-verification cycle" in the research process much faster and more pleasant. ------------------------------------------------------------------------------- CSL T.R. 299 Designing for Ada Reuse: A Case Study by Geoffrey Mendal June l986 19 pages.....$3.45 This paper describes the design of a generic sorting package, showing that Ada reuse can be accomplished during, and even prior to, coding. This paper identifies some key technical issues in reuse. These issues are of general interest to a software engineer, as they focus on program design and implementation. The conclusions are based on a generic program unit which includes array sorting algorithms adapted and generalized to work as a generic program unit. Any array component type excepting limited types may be directly sorted; limited types may be indirectly sorted. Examples of direct and indirect sorting will be presented. This package may also be used in conjunction with a merging package to sort data residing on external memory devices, in effect allowing files of arbitrary length to be sorted by means of an extended operation. In less tractable situations, reusable Ada promotes a general solution that uses an abstraction to isolate classic limitations. For example, a user may specify one's own linear order, so as not to be limited to ascending and descending orders as in conventional sorting routines. ------------------------------------------------------------------------------- CSL T.R. 86-300 An Overview of the MIPS-X-MP Project John L. Hennessy & Mark A. Horowitz April 1986 28 pages.....$3.90 MIPS-X-MP is a research project whose end goal is to build a small (workstation- sized) multiprocessor with a total throughput of 100-200 mips. The archi- tectural approach uses a small number (tens) of high performance RISC-based microprocessors (10-20 mips each). The multiprocessor architecture uses software-controlled cache coherency to allow cooperation among processors without sacrificing performance of the processors. Software technology for automatically decomposing problems to allow the entire machine to be concentrated on a single problem is a key component of the research. This report surveys the four key components of the project: high performance VLSI processor architecture and design, multiprocessor architectural studies, multiprocessor programming systems, and optimizing compiler technology. ------------------------------------------------------------------------------- CSL T.R. 86-301 The Complete Transformation Methodology for Sequential Runtime Checking of an Anna Subset Sriram Sankar & David Rosenblum August 1986 65 pages.....$5.75 We present in this report a complete description of a methodology for transformation of Anna (Annotated Ada) programs to executable self-checking Ada programs. The methodology covers a subset of Anna which allows annotation of scalar types and objects. The allowed annotations include subtype annotations, subprogram annotations, result annotations, object annotations, out annotations and statement annotations. Except for package state expressions and quantified expressions, the full expression language of Anna is allowed in the subset. The transformation of annotations to executable checking functions is thoroughly illustrated through informal textual description, universal checking function templates and several transformation examples. We also describe the transformer and related software tools used to transform Anna programs. In conclusion, we describe validation of the transformer and some methods of making the transformation and runtime checking processes more efficient. ------------------------------------------------------------------------------- Publications COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305 415-723-1430 ORDER FORM To Order Reports: Print or type your name and address in the space provided. Check or circle the report(s) you wish to purchase, whether hardcopy or microfiche. All orders must be PREPAID. CALIFORNIA RESIDENTS must add sales tax of their local county. Return this order form with your check or money order made payable to Stanford University. FOREIGN ORDERS must be paid with an international money order or a check drawn on a U.S. bank, payable in dollars. Please type or print your name and complete address: ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ Report # Hardcopy Microfiche Report # Hardcopy Microfiche -------- -------- ---------- -------- -------- ---------- TR 292 $4.05 $2.00 TR 297 $9.95 $3.00 TR 293 $3.90 $2.00 TR 298 $3.90 $2.00 TR 294 $3.10 $2.00 TR 299 $3.35 $2.00 TR 295 $8.60 $3.00 TR 300 $3.90 $2.00 TR 296 $9.90 $3.00 TR 301 $5.75 $2.00 Subtotal __________ Your Local County CA tax __________ Total __________ -------