leff@smu.UUCP (Laurence Leff) (02/27/89)
Naomi Schulman Publications COMPUTER SYSTEMS LABORATORY STANFORD UNIVERSITY Stanford, CA 94305-4055 (415) 273-1430 (M-Th, 8-1) RECENT REPORTS & NOTES LIST #15 JANUARY 1989 To order reports, see end of this announcement. ABSTRACTS CSL-TR-88-365 Deriving Accurate Fault Models John Acken October 1988 130 pages.....$9.50 Modeling the effects that physical failures have on integrated circuit behavior is important for both design and test. Fault models describe circuit behavior at a level of abstraction that allows tests to be generated without requiring detailed knowledge of physical failures. This thesis presents three criteria for deriving accurate fault models and describes a method that meets these criteria. The method is to design, fabricate, and test a RAM specifically for deriving fault models, as opposed to the common practice of using RAMs to drive technology development. To demonstrate this approach, a static RAM was designed, fabricated, and tested to derive models for a target technology of CMOS custom logic circuits. For CMOS technology, the most accurate logic level model for a bridging fault is a vote among the shorted nodes. This vote is based upon the relative output strengths of the shorted transistors, and the short does not result in an indeterminate logic value. Using the voting model requires neither the electrical details of each specific shorted circuit nor the physical details of the shorting failures. All that is needed is the relative strength of the transistors and the location of the potential bridging fault sites. These sites depend upon the physical layout, but can be easily located using standard layout tools such as circuit extractors. The average number of potential bridging faults per node is small, between 4 and 8. Bridging faults require that differences between logical clustering and physical clustering must be considered for successful testing. In particular, circuit segmentation for pseudo-exhaustive test is modified to utilize physical layout information. ------------------------------------------------------------------------------- CSL-TR-88-366 Performance-Directed Memory Hierarchy Design Steven A. Przybylski September 1988 183 pages.....$12.15 Most Cache studies to date have concentrated on time independent metrics: miss rates and traffic ratios. These metrics can be misleading indicators of overall performance, and minimizing them can lead to distinctly sub-optimal computer implementations. In this thesis, program execution time replaces the miss rate as the basis of evaluation of memory hierarchies. The thesis describes the characteristics of performance-optimal single- and multi-level cache hierarchies. Using performance as the cache design metric introduces a often ignored design 1 variable: the machine's cycle time. The inevitable tradeoffs between the temporal and organizational variables are exposed by equating a change in the cache size, set associativity or block size to a change in cycle time. After the tradeoffs between the temporal and organizational parameters are individually explored, they are unified in a single graph that allows comparison of the performance of a wide variety of cache organizations and implementations. Performance-directed cache analysis reveals two dilemmas facing machine designers: small fast caches are not optimal; single-level caches have a definite upper performance bound. Multi-level cache hierarchies can simultaneously break the single-level performance barrier and decrease the optimal size and cycle time of the cache nearest the CPU. For this reason, the local and global characteristics of multi-level hierarchies are explored. Given a set of implementable caches, dynamic programming can find the performance-optimal multi-level hierarchy. The optimum number of levels in the hierarchy increases as CPU cycle times decrease with respect to main memory access times. Hierarchies with depths of two and three approach the theoretically maximal performance limit. This thesis concludes with a succinct set of guidelines that will aid designers in finding the memory hierarchy that maximizes system performance given particular implementation constraints. ------------------------------------------------------------------------------- CSL-TR-88-367 An Overview of VAL Larry M. Augustin, Benoit A. Gennart, Youm Huh, David C. Luckham, and Alec G. Stanculescu September 1988 39 pages.....$4.95 VAL (VHDL Annotation Language) provides a small number of new language constructs to annotate VHDL hardware descriptions. VAL annotations, added to the VHDL entity declaration in the form of formal comments, express intended behavior common to all architectural bodies of the entity. Annotations are expressed as parallel processes that accept streams of input signals and generate constraints on output streams. VAL views signals as streams of values ordered by time. Generalized timing expressions allow the designer to refer to relative points on a stream. No concept of preemptive delayed assignment or inertial delay are needed when referring to different relative points in time on a stream. The VAL abstract state model permits abstract data types to be used in specifying history dependent device behavior. Annotations placed inside a VHDL architecture define detailed correspondences between the behavior specification and architecture. The result is a simple but expressive language extension of VHDL with possible applications to automatic checking of VHDL simulations, hierarchical design, and automatic verification of hardware designs in VHDL. ------------------------------------------------------------------------------- CSL-TR-88-368 Improved Models for Switch-Level Simulation Chorng-Yeong Chu November 1988 147 pages.....$10.35 Simulation plays an important role in design verification. With increasingly large VLSI designs, the switch-level representation has become the only approach that is both reasonably accurate and computationally feasible. At present, switch-level simulators use relatively unsophisticated techniques to extract information from the switch-level representation, and even these small amounts of information are not always fully utilized. As a result, these simulators often lack accuracy. Most notably, the way some switch-level 2 simulators compute the final value can potentially generate undesirably pessimistic results, and charge-sharing problems are widely ignored. This thesis shows how to extract more information based on the same set of widely adopted switch-level assumptions. Using more sophisticated analyses, this thesis presents better final-value and charge-sharing models. The new final-value model uses a systematic way to look at the relationship between the voltage and the resistance. This approach can also objectively compare the accuracy of different DC-computation schemes. Charge-sharing problems are modeled with two time constants. The two-time-constant approach is based on the observation that most waveforms due to charge sharing are dominated by a pair of time constants. Charge-sharing models are first constructed on resistor networks, then they are extended to transistor networks. These models have been incorporated into nRSIM --- a RSIM-based switch-level simulator. The new simulator has the same running time as the original RSIM, but it can handle a larger class of circuits. ------------------------------------------------------------------------------- CSL-TR-88-369 Composing User Interfaces with InterViews Mark A. Linton, John M. Vlissides, and Paul R. Calder November 1988 25 pages.....$4.25 In this paper we show how to compose user interfaces with InterViews, a user interface toolkit we have developed at Stanford. InterViews provides a library of predefined objects and a set of protocols for composing them. A user interface is created by composing simple primitives in a hierarchical fashion, allowing complex user interfaces to be implemented easily. InterViews supports the composition of interactive objects (such as scroll bars and menus), text objects (such as words and whitespace), and graphics objects (such as circles and polygons). To illustrate how InterViews composition mechanisms facilitate the implementation of user interfaces, we present three simple applications: a dialog box built from interactive objects, a drawing editor using a hierarchy of graphical objects, and a class browser using a hierarchy of text objects. We also describe how InterViews supports consistency across applications as well as end-user customization. ------------------------------------------------------------------------------- CSL-TR-88-370 Copy Elimination in Functional Languages K. Gopinath and John L. Hennessy November 1988 15 pages.....$3.75 Copy elimination is an important optimization for compiling functional languages. Copies arise because these languages lack the concepts of state and variable; hence updating an object involves a copy in a naive implementation. Copies are also possible if proper targeting has not been carried out inside functions and across function calls. Targeting is the proper selection of a storage area for evaluating an expression. By abstracting a collection of functions by a target operator, we compute targets of function bodies that can then be used to define an optimized interpreter to eliminate copies due to updates and copies across function calls. The language we consider is typed lambda calculus with higher-order functions and special constructs for array operations. Our approach can eliminate copies in divide and conquer problems like quicksort and bitonic sort that previous approaches could not handle. We also present some results of implementing a compiler for a single assignment language called SAL on some small but tough programs. Our results indicate that it is possible to approach a performance comparable to imperative languages like Pascal. 3 ------------------------------------------------------------------------------- CSL-TR-88-371 Generalized Access Control Strategies for Token Passing Systems Joseph W. M. Pang, Fouad A. Tobagi and Stephen Boyd November 1988 42 pages.....$5.10 As the concepts of distributed computing, resource sharing, and office and factory automation are being put into practice, the demand for integrated services local area networks has never been greater before. Recently, a timer based integrated access protocol has been adopted by two major token passing local area network standards, the IEEE 802.4 Token Bus Standard [1] and the ANSI drafted Fiber Distributed Data Interface (FDDI) Standard [2]. This protocol is capable of providing bounded access delay for real-time traffic, and multiple priority classes for data traffic [3]--[4]. Nevertheless, under heavy traffic, this protocol is not asymptotically stable. In this report, we present a very broad class of access control strategies, generalized from the timer based approach adopted in [1]--[2]. Similar to the timer based protocol, this class of access control strategies has the same favorable properties of providing bounded token cycles and multiple classes of service. Furthermore, at high load, the token cycles for this new class of strategies converge with a geometric bound, as opposed to exhibiting the oscillatory behaviour observed with the timer based protocol in [3]--[4]. Based on the convergence property, we are able to compute the efficiency of the controlled access token channel and the bandwidth allocation among stations under heavy traffic. Moreover, we derive tight upper bounds on token cycles under arbitrary loading conditions. This study not only enriches the design space of integrated services token systems, but also provides meaningful insights into the control mechanism within the protocol. ------------------------------------------------------------------------------- CSL-TR-88-372 Incremental VLSI Compaction Clyde W. Carpenter December 1988 117 pages....$8.85 VLSI compaction is the translation from a high-level description of a circuit down to the detailed layout needed for fabrication. A compactor tries to make as compact a layout as possible without violating any design rules. An incremental compactor allows one to edit a schematic or change layout constraints and quickly see the effects of the change. An incremental compactor has to incrementally generate and solve the constraints needed to enforce the design rules. This dissertation presents an algorithm that uses adjacency lists to generate and incrementally update a minimal complete set of the spacing constraints needed to keep adjacent tiles in a layout from interfering with each other. The base algorithm creates clockwise threaded lists of non-overlapping, fixed-size tiles. The algorithm is complicated by the need to handle wires, overlapping tiles, and various different spacing rules. In near linear time it generates an average of 1.2 spacing constraints per tile. The adjacency lists allow fast, efficient updates when tiles are moved, deleted, or inserted. This dissertation also presents three algorithms to solve the constraints once they are generated. In addition to minimizing area, these algorithms also minimize the total wire length. One of them calculates the sum of the wire-pull weights on each subtree of a directed spanning tree of active constraints to decide which subtrees need to be moved. This weighted tree provides enough information to make incremental changes in time proportional to the size of the change instead of to the size of the circuit. Wire-length minimization improves the layout but gives compaction a slightly worse than linear expected time. 4 ------------------------------------------------------------------------------- CSL-TR-88-373 Sparse Distributed Memory Prototype: Address Module Hardware Guide M. J. Flynn, R. Zeidman, E. Lochner December 1988 78 pages.....$6.90 This document is a detailed specification of the hardware design of the Address Module for the prototype Sparse Distributed Memory. It contains all of the information needed to build, test, debug, modify and operate the Address Module. ------------------------------------------------------------------------------- CSL-TR-88-374 Post-Game Analysis--- A Heuristic Resource Management Framework for Concurrent Systems Jerry C. Yan December 1988 212 pages.....$13.60 While multiprocessors promise to deliver orders of magnitude speedup, the effective use of such computers still depends critically on the ability of their resource management systems to properly trade off communication loss and concurrency gain, exploit behavioral characteristics of the application programs, and take advantage of specific hardware features of the multiprocessor. Unfortunately, the performance of many proposed resource management strategies can only be evaluated indirectly by simple objective functions (such as execution cost or mapping cardinality). Research has been conducted to determine how distributed computations can be mapped onto multiprocessors to minimize execution time. The approach proposed here, known as Post-game analysis, offers an unconventional alternative to reduce program execution time. Program--machine mapping is improved ``in between'' program executions. Instead of using simple abstract models, post-game analysis utilizes actual timing data gathered during program execution. Program execution time is reduced based on many optimization sub-goals. Because heuristics are applied to improve the current mapping and resolve conflicting sub-goals, post-game analysis can be incrementally refined and tailored to specific applications and architectures. The performance of post-game analysis has been compared with other strategies using various program structures and multiprocessor models. Results obtained from simulations show that it outperforms the speedup obtained based on random placement, load balancing as well as clustering algorithms by 15%. Because intermediate program/machine models are not used, post-game analysis is very promising for immediate application to today's programs with today's machines.} area, these algorithms also minimize the total wire length. One of them calculates the sum of the wire-pull weights on each subtree of a directed spanning tree of active constraints to decide which subtrees need to be moved. This weighted tree provides enough information to make incremental changes in time proportional to the size of the change instead of to the size of the circuit. Wire-length minimization improves the layout but gives compaction a slightly worse than linear expected time. ------------------------------------------------------------------------------- CSL-TR-89-375 THROUGHPUT APPROXIMATION FOR THE GENERALIZED TIMER BASED PROTOCOL FOR TOKEN PASSING SYSTEMS Joseph W. M. Pang and Fouad A. Tobagi January 1989 33 pages.....$4.65 Recently, a timer based integrated access protocol has been adopted by two 5 major local area network (LAN) standards, the IEEE 802.4 Token Bus Standard [1] and the ANSI drafted Fiber Distributed Data Interface Standard [2]. This timer based approach is capable of providing bounded access delay for real-time traffic (i.e., traffic with stringent delay requirements), multiple classes of service, and efficient use of the communication channel [3]-[4]. Based on a careful examination of the dynamical behaviour of the timer based protocol under heavy traffic, we have been able to generalize the underlying idea behind the protocol and proposed a new class of access control schemes in [5]. We have also established in [5] important results pertaining to the bandwidth allocation under heavy traffic and upper bounds on token cycles under general traffic for the generalized control schemes. Although the results on bandwidth allocation obtained in [5] are very useful for first-cut design purposes, they are only applicable under heavy traffic. In this report, we are interested in obtaining the station throughputs for the generalized control schemes under arbitrary input traffic. Based on the assumption that the variances of station service times are small, we develop a simple and yet very good approximation for the station throughputs, that is dependent only on the average data arrival rates at the stations. Using this approximation, we can determine whether a station in the system is stable or not. Furthermore, if a station is not stable, its throughput can be obtained from the approximation. This approximation is also applicable to the timer based protocol used in the LAN standards [1]-[2]. ------------------------------------------------------------------------------- 6 Naomi Schulman COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305-4055 EM: Schulman@Sierra.Stanford.Edu 415-723-1430 Hours: M-Th, 8-1 ORDER FORM To Order Reports: Print or type your name and address in the space provided. Check or circle the report(s) you wish to purchase, whether hardcopy or microfiche. All orders must be PREPAID. CALIFORNIA RESIDENTS must add sales tax of their local county. Return this order form with your check or money order made payable to Stanford University. FOREIGN ORDERS must be paid with an international money order or a check drawn on a U.S. bank, payable in dollars. ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ ______________________________________________ _____________________________________________ Report # Hardcopy Microfiche Report # Hardcopy Microfiche TR 365 $9.50 $3.00 TR 371 $5.10 $2.00 TR 366 $12.15 $3.00 TR 372 $8.85 $3.00 TR 367 $4.95 $2.00 TR 373 $6.90 $2.00 TR 368 $10.35 $3.00 TR 374 $13.60 $4.00 TR 369 $4.25 $2.00 TR 375 $4.65 $2.00 TR 370 $3.75 $2.00 Subtotal __________ Your Local CA County tax __________ Total __________ -------