MFLLL002@VE.BOGECN.EDU (03/22/91)
From: Naomi Schulman schulman@shasta.Stanford.EDU Return-Path: <schulman@shasta.Stanford.EDU> Date: Wed, 13 Mar 1991 13:23:43 PST Naomi Schulman COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305-4055 e-mail: schulman@sierra.stanford.edu (415) 723-1430 Hours: M-Th 8-1 (PST) RECENT PUBLICATIONS LIST #19 SEPTEMBER 1990 To order reports, see Order Form on last page. CSL-TR-90-425 CONCURRENT RUNTIME MONITORING OF FORMALLY SPECIFIED PROGRAMS Manas Mandal and Sriram Sankar April 1990 31 pages.....$5.48 This paper describes an application of formal specifications after an executable program has been constructed. We describe how high level specifications can be utilized to monitor critical aspects of the behavior of a program continuously while it is executing. This methodology provides a capability to distribute the monitoring of specifications on multi-processor hardware platforms to meet practical time constraints. Typically, runtime checking of formal specifications involves a significant time penalty which makes it impractical during normal production opertion of a program. In previous research, runtime checking has been applied during testing and debugging of software, but not on a permanent basis. Crucial to our current mthodology is the use of multi-processor machines - hence runtime monitoring can be performed concurrently on different processors. We describe techniques for distributing checks onto different processors. To control the degree of concurrency, we introduce checkpoints - a point in the program beyond which execution cannot proceed until the specified checks have been completed. Error reporting and recovery in a multiprocessor environment is complicated and there are various techniques of handling this. We describe a few of these techniques in this paper. An implementation of this methodology for the Anna specification language for Ada programs is described. Results of experiments conducted on this implementation using a 12-processor Sequent Symmetry demonstrate that permanent concurrent monitoring of programs based on formal specifications is indeed feasible. ------------------------------------------------------------------------------- CSL-TR-90-426 A VLSI ARCHITECTURE FOR THE FCHC ISOMETRIC LATTICE GAS MODEL Fung F. Lee, Michael J. Flynn, and Martin Morf April 1990 28 pages.....$5.24 Lattice gas models are cellular automata used for the simulation of fluid dynamics. This paper addresses the design issues of a lattice gas collision rule processor for the four-dimensional FCHC isometric lattice gas model. A novel VLSI architecture based on an optimized version of Henon's isometric algorithm is proposed. One of the key concepts behind this architecture is the permutation group representation of the isometry group of the lattice. In contrast to the straightforward table lookup approach which would take 4.5 billion bits to implement this set of collision rules, the size of our processor is only about 5000 gates. With a reasonable number of pipeline stages, the processor can deliver one result per cycle with a cycle time comparable to or less than that of a common commercial DRAM. ------------------------------------------------------------------------------- CSL-TR-90-427 GENERALIZED GRAPHICAL OBJECT EDITING John M. Vlissides June 1990 166 pages.....$16.28 Many editors use the graphics capabilities of personal workstations to provide a visual editing environment. Such editors present graphical representations of familiar objects and allow the user to manipulate the representations directly. This style of interaction is usually more intuitive to the user than typing statements in a command language. However, implementing a graphical object editor has been a difficult undertaking. Though many packages exist that aid in the construction of graphical user interfaces, none are designed specifically for building graphical object editors. This is significant because there is a substantial semantic gap between general user interfaces and the functionality of graphical object editors. For example, user interface packages usually provide buttons, scroll bars, and ways to assemble them, but they do not offer primitives for building drawing editors that produce PostScript or schematic capture systems that produce netlists. Higher-level abstractions are needed to make such applications easier to develop. Unidraw is a framework for creating object-oriented graphical editors in domains such as technical and artistic drawing, music composition, and circuit design. The Unidraw architecture simplifies the construction of these editors by providing programming abstractions that are common across domains. Unidraw defines four basic abstractions: "components" encapsulate the appearance and semantics of objects in a domain, "tools" support direct manipulation of components, "commands" define operations on components and other objects, and "external representations" define the mapping betweencomponents and the file format generated by the editor. Unidraw also supports multiple views, graphical connectivity and confinement, and dataflow between components. This thesis describes the Unidraw design, implementation issues, and three experimental domain-specific editors we have developed with Unidraw: a drawing editor, a user interface builder, and a schematic capture system. Our results indicate a substantial reduction in implementation time and effort compared with existing tools. ------------------------------------------------------------------------------- CSL-TR-90-428 SUB-NANOSECOND ARITHMETIC Michael J. Flynn, Giovanni De Micheli, Robert Dutton, Bruce Wooley, and Fabian Pease May 1990 26 pages.....$5.08 The SNAP (Stanford Nanosecond Arithmetic Processor) project is targeted at realizing an arithmetic processor with performance approximately an order of magnitude faster than currently available technology. The realization of SNAP is predicated on an interdisciplinary approach and effort spanning research in algorithms, data representation, CAD, circuits and devices, and packaging. SNAP is visualized as an arithmetic coprocessor implemented on an active substrate containing several chips, each of which realize a particular arithmetic function. ------------------------------------------------------------------------------- CSL-TR-90-429 MULTIPROCESSOR SMALLTALK: IMPLEMENTATION, PERFORMANCE, AND ANALYSIS Joseph Ira Pallas June 1990 140 pages...$14.20 Multiprocessor Smalltalk demonstrates the value of object-oriented programming on amultiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object-oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy. Idle time in the interpreter (up to 51% overhead, excluding a pathological case) could be reduced with a parallel garbage collector to 10%. Idle time in user code (up to 35% overhead) remains the programmer's responsibility. We can identify the key characteristics that make Multiprocessor Smalltalk successful. The Smalltalk language allows us to build concurrent control structures using lambda expressions without extending the language. Inexpensive processes and efficient garbage collection are also crucial. Hardware/operating-system support for shared memory, per-process variables, and inexpensive synchronization are essential to the implementation. Given these, object-oriented languages and multiprocessors are a good match. ------------------------------------------------------------------------------- CSL-TR-90-430 (also numbered STAN-CS-90-1318) TECHNIQUES FOR IMPROVING THE PERFORMANCE OF SPARSE MATRIX FACTORIZATION ON MULTIPROCESSOR WORKSTATIONS Edward Rothberg and Anoop Gupta June 1990 16 pages.....$4.28 In this paper we look at the problem of factoring large sparse systems of equations on high-performance multiprocessor workstations. While these multiprocessor workstations are capable of very high peak floating point computation rates, most existing sparse factorization codes achieve only a small fraction of this potential. A major limiting factor is the cost of memory accesses performed during the factorization. In this paper, we describe a parallel factorization code which utilizes the supernodal structure of the matrix to reduce the number of memory references. We also propose enhancements that significantly reduce the overall cache miss rate. The result is greatly increased factorization performance. We present experimental results from executions of our codes on the Silicon Graphics 4D/380 multiprocessor. Using eight processors, we find that the supernodal parallel code achieves a computation rate of approximately 40 MFLOPS when factoring a range of benchmark matrices. This is more than twice as fast as the parallel nodal code developed at the Oak Ridge National Laboratory running on the SGI 4D/380. ------------------------------------------------------------------------------- CSL-TR-90-431 LATENCY AND THROUGHPUT TRADEOFFS IN SELF-TIMED SPEED-INDEPENDENT PIPELINES AND RINGS Ted Williams June 1990 29 pages....$5.32 Asynchronous pipelines control the flow of tokens through a sequence of logical stages based on the status of local completion detectors. As in a synchronously clocked circuit, the design of self-timed pipelines can trade off between achieving low latencyn and high throughput. However, there are more degress of freedom because of the variances in specific latch and function block styles, and the possibility of varying both the number of latches between function blocks and their connections to the completion detectors. This report demonstrates the utility of a graph-based methodology for analyzing the timing dependencies and uses it to make comparisons of different configurations. It is shown that the extremes for high throughput and low latency differ significantly, the placement of the completion detectors influences timing as much as adding an additional latch, and the choice as to whether precharged or static logic is best is dependent on the cost in complexity of the completion detectors. ------------------------------------------------------------------------------- CSL-TR-90-432 THE OLYMPUS SYNTHESIS SYSTEM FOR DIGITAL DESIGN Giovanni De Micheli, David Ku, Fredric Mailhot, Thomas Truong July 1990 29 pages.....$5.32 The Olympus Synthesis System, developed at Stanford University, is a vertically integrated set of tools for the synthesis of digital circuit designs. The system is specifically designed to support synthesis of Application-Specific Integrated Circuits from behavioral-level descriptions, written in a hardware description language called HardwareC. The Olympus system supports synthesis with timing constraints at the behavioral, structural and logic levels. Two internal models, Sequencing Intermediate Form, (SIF) and Structural/Logic Intermediate Form (SLIF), are used to represent the hardware at different levels of abstraction and to provide a way to pass design information between the different tools. This paper describes Olympus as a system consisting of a set of programs for behavioral, structural and logic synthesis, technology mapping and simulation. The system has been used to design three ASIC chips at Stanford University and it has been tested against benchmark circuits for high-level and logic synthesis. ------------------------------------------------------------------------------- CSL-TR-90-433 LIMITS ON MULTIPLE INSTRUCTION ISSUE Michael D. Smith, Mike Johnson, and Mark A. Horowitz July 1990 18 pages.....$4.44 This paper investigates the limitations on designing a processor which cansustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications. We have used trace-driven simulations to determine that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle. In a straightforward implementation, cost considerations argue strongly against decoding more than two instructions in one cycle. Given this constraint, the efficiency in instruction fetching rather than the complexity of the execution hardware limits the concurrency attainable at the instruction level. ------------------------------------------------------------------------------- CSL-TR-90-434 BOOSTING BEYOND STATIC SCHEDULING IN A SUPERSCALAR PROCESSOR Michael D. Smith, Monica S. Lam, and Mark A. Horowitz July 1990 19 pages.....$4.52 This paper describes a superscalar processor that combines the best qualities of static and dynamic instruction scheduling to increase the performance of non-numerical applications. The architecture performs all instruction scheduling statically to take advantage of the compiler's ability to efficiently schedule operations across many basic blocks. Since the conditional branches in non-numerical code are highly data dependent, the architecture introduces the concept of boosted instructions, instructions that are committed conditionally upon the result of later branch instructions. Boosting effectively removes the dependences caused by branches and makes the scheduling of side-effect instructions as simple as those that are side-effect free. For efficiency, boosting is supported in the hardware by shadow structures that temporarily hold the side effects of boosted instructions until the conditional branches that the boosted instructions depend upon are executed. When the branch condition is determined, the buffered side effects are either committed or squashed. The limited static scheduler in our evaluation system shows that a 1.6-times speedup over scalar code is achievable by boosting instructions above only a single conditional branch. This performance is similar to the performance of a pure dynamic scheduler. ------------------------------------------------------------------------------- CSL-TR-90-435 COLLABORATION TRANSPARENCY IN DESKTOP TELECONFERENCING ENVIRONMENTS J. Chris Lauwers July 1990 133 pages.....$13.64 The ultimate goal of desktop teleconferencing environments is to integrate contemporary (non-computer-supported) audio/video teleconferencing technologies with workstation-based network computing environments. This thesis studies the application sharing facilities that allow multiple conference participants to interact simultaneously with computer-based applications in a desktop teleconference. It focuses on collaboration-transparent facilities that permit existing single-user applications to be invoked from within a computer conference. In the context of contemporary workstation-based environments, the pursuit of collaboration transparency leads to the development of shared window systems. This thesis introduces the basic architectures for shared window systems and analyzes them based on experience with several prototypes. Contemporary window system standards present many obstacles to shared window designers. This thesis discusses these obstacles and suggests how they can be overcome. In addition, new architectures for window systems are proposed that can accommodate window sharing more effectively. The thesis then introduces application replication as a technique for improving overall shared window system performance. Application replication introduces many synchronization problems, arising mainly from the need to keep replicated copies of shared applications consistent. This thesis shows that the most frequent synchronization problems can be solved without changing existing software. The remaining problems can be solved by making applications or system servers collaboration-aware. Finally, this thesis studies the ramifications, for the software designer, of supporting spontaneous interactions, shared workspace management, floor control, user customization, and annotations and telepointing. While the recommendations that result are motivated by the desire to enable continued use of collaboration-transparent applications, addressing them involves the development of systems software that is distinctly collaboration-aware. ------------------------------------------------------------------------------- CSL-TR-90-436 APPLICATION OF FORMAL SPECIFICATION TO SOFTWARE MAINTENANCE Neel Madhav and Sriram Sankar August 1990 15 pages.....$4.20 This paper describes the use of formal specifications and associated tools in addressing various aspects of software maintenance ---corrective, perfective, and adaptive. It also addresses the refinement of the software development process to build programs that are easily maintainable. The task of software maintenance in our case includes the task of maintaining the specification as well as maintaining the program. We focus on the use of Anna, a specification language for formally specifying Ada programs, to aid us in maintaining Ada programs. These techniques are applicable to most other specification language and programming language environments. The tools of interest are: (1) the Anna Specification Analyzer which allows us to analyze the specification for correctness with respect to our informal understanding of program behavior; and (2) the Anna Consistency Checking System which monitors the Ada program at runtime based on the Anna specification. ------------------------------------------------------------------------------- CSL-TR-90-437 AN ADA---PROLOG SYSTEM Neel Madhav August 1990 15 pages.....$4.20 This paper presents a software development tool --- the Ada-Prolog system which combines the strengths of both descriptive and procedural programming styles. Concrete reasons and examples are provided to demonstrate that such a tool would be useful. This tool provides various operations available in Prolog for clausebuilding, database building and querying to Ada programs.In addition to allowing dynamic access to both Ada and Prolog, the Ada-Prolog system adds to the functionality provided by Prolog by partitioning the Prolog database into lists of clauses. These lists can be created, updated and destroyed dynamically. Concurrent access to the list of clauses is also possible. Queries can be directed to groups of these lists. The system is meant for use in expert systems, compilers, database applications, rapid prototyping systems, advanced environments, and other software tools which use deduction. ------------------------------------------------------------------------------- CSL-TR-90-438 A METHODOLOGY FOR FORMAL SPECIFICATION AND IMPLEMENTATION OF ADA PACKAGES USING ANNA Neel Madhav and Walter Mann August, 1990 24 pages.....$4.92 This paper presents a methodology for formal specification and prototype implementation of Ada packages using the Anna specification language. Specifications play an important role in the software development cycle. The methodology allows specifiers of Ada packages to follow a sequence of simple steps to formally specify packages. Given the formal specification of a package resulting from the methodology for package specifications, the methodology allows implementors of packages to follow a few simple steps to implement the package. The implementation is meant to be a prototype. This methodology for specification and implementation is applicable tomost Ada packages. Limitations of this approach are pointed out at various points in the paper. We present software tools which help the process of specification and implementation. ------------------------------------------------------------------------------- CSL-TR-90-439 TANGO: A MULTIPROCESSOR SIMULATION AND TRACING SYSTEM Helen Davis and Stephen R. Goldschmidt July 1990 25 pages.....$5.00 Tango is a software simulation and tracing system used to obtain data for evaluating parallel programs and multiprocessor systems. The system provides a simulated multiprocessor environment by multiplexing application processes onto a single processor. Tango achieves high efficiency by running compiled user code, and by focusing on the information of greatest interest to multiprocessing studies. The system is being applied to a wide range of investigations, including algorithm studies and a variety of hardware evaluations. ------------------------------------------------------------------------------- CSL-TR-90-440 BIBLIOGRAPHY OF THE COMPUTER SYSTEMS LABORATORY - 1968-1990 (Third Edition) Naomi Schulman August 1990 71 pages.....$10.00 The bibliography lists all technical reports and notes published by the Computer Systems Laboratory of Stanford University, from 1968 to date. This edition is in a binder where future announcements of new reports can be kept to update the bibliography temporarily. When new reports are in sufficient number, a whole new page will be created in the bibliography format. New pages will be listed on future announcement Order Forms and will be sent at no charge along with any order placed. Prices of reports and notes are listed separately, starting on page ix. Please note that report prices. are printed on yellow paper, and note prices are printed on blue paper. These prices will supersede any previously given prices, and may themselves be superseded in the future if printing and/or postage costs rise. The symbol (*), which, in earlier editions signified that a report was out of print, is now used only to indicates that no "original" is available for copying. All other reports will be made available upon request. An earlier edition of the Bibliography (TR-272 or TR-336), plus copies of announcements 15,16, 17, 18 and 19, will complete the list of publications. However, new price lists will be needed to complete the updating of previous editions. The TR Price List, 1990-91 and TN Price List, 1990-91 will be sent upon request with any order placed. ------------------------------------------------------------------------------- CSL-TR-90-441 COMPUTING TYPES DURING PROGRAM SPECIALIZATION Daniel Weise and Erik Ruf August 1990 26 pages.....$5.08 We have developed techniques for obtaining and using type information during program specialization (partial evaluation). Computed along with every residual expression and every specialized program is type information that bounds the possible values that the specialized program will compute at run time. The three keystones of this research are symbolic values that represent both a value and the code for creating the value, generalization of symbolic values, and the use of online fixed-point iterations for computing the type of values returned by specialized recursive functions. The specializer exploits type information to increase the efficiency of specialized functions. This research has two benefits, one anticipated and one unanticipated. The anticipated benefit is that programs that are to be specialized can now be written in a more natural style without losing accuracy during specialization. The unanticipated benefit is the creation of what we term concrete abstract interpretation. This is a method of performing abstract interpretation with concrete values where possible. The specializer abstracts values as needed, instead of requiring that all values be abstracted prior to abstract interpretation. ------------------------------------------------------------------------------- CSR-TR-90-442 AN IMPROVED HIGH-SPEED FLOATING-POINT ADDITION ALGORITHM Nhon T. Quach and Michael J. Flynn August 1990 21 pages.....$4.68 This paper describes an improved, IEEE conforming floating-point addition algorithm. This algorithm has only one addition step involving the significand in the worst-case path, hence offering a considerable speed advantage over the existing algorithms, which typically require two to three addition steps. ------------------------------------------------------------------------------- CSL-TR-90-443 A QUEUING ANALYSIS FOR DISK ARRAY SYSTEMS Mikito Ogata and Michael J. Flynn August 1990 51 pages.....$7.08 Using a queuing model of disk arrays, we study the performance and tradeoffs in disk array sub-systems and develop guidelines for designing these sub-systems in various CPU environments. Finally, we compare our model with some earlier simulation results. ------------------------------------------------------------------------------- CSL-TR-90-444 EVALUATION OF A FOUR FOLD SPARSE DISTRIBUTED MEMORY PROTOTYPE Brian Flachs August 1990 58 pages.....$7.64 Sparse Distributed Memory is a generalized random access memory for long binary words. This document describes changes made to the Single Fold Prototype to develop Stanford's Four Fold Prototype. A users manual for the libraries and utilities developed for the prototype and a performance evaluation of the prototype are also included. Appendices detail hardware settings and provide a complete, up-to-date, set of Address Module Schematics. ------------------------------------------------------------------------------- CSL-TR-90-445 (also numbered STAN-CS-90-1334) SOFT CONFIGURABLE WAFER SCALE INTEGRATION: DESIGN, IMPLEMENTATION AND YIELD ANALYSIS Miriam Greta Blatt June 1990 123.....$12.84 Soft-Configurable Wafer Scale Integration uses software controlled switches to connect up the fault-free parts of a wafer. Compared to hard configuration, th e soft configurable approach has the advantages of providing low-cost connections and runtime fault tolerance. The dissertation describes how to achieve soft co nfiguration with high performance, presenting a pipelined memory system implemented using this approach. The yield of the prototype is evaluated in two phases. Fault simulation applies measured defect statistics to the layout to predict the yield of each circuit unit. These unit yields are combined to produce wafer yields using redundancy models appropriate to wafer scale integration. The redundancy models constrain wafer yield by system requirements such as the minimum n umber of working circuit units, and whether these working units are distributed evenly around the wafer. Choice of redundancy model significantly affects the r esulting wafer yield. ------------------------------------------------------------------------------- CSL-TR-90-446 ANALYZING THE FAULT TOLERANCE OF A CLASS OF DOUBLE-LOOP NETWORKS Jon M. Peha and Fouad A. Tobagi September 1990 30 pages.....$5.40 This paper analyzes the fault tolerance of a class of double-loop networks referred to as forward-loop backward-hop (FLBH), in which each node is connected via unidirectional links to the node one hop in front of it and to the node S hops in back of it for some S. A new measure of fault tolerance is described, along with techniques based on Markov chains to calculate upper and lower bounds on the fault tolerance of this network topology quickly and efficiently. The result s of these calculations provide a more precise description of network fault tolerance than has been achieved with previously published techniques. ------------------------------------------------------------------------------- CSL-TR-90-447 EVALUATION OF SCHEDULING ALGORITHMS FOR INTEGRATED-SERVICES PACKET-SWITCHED NETWORKS Jon Peha and Fouad A. Tobagi September 1990 31 pages.....$5.48 In this paper, we consider integrated-services packet-switched networks with two types of traffic: traffic with deadlines, for which the most important perform ance objective is based on loss rate, and packets without deadlines, for which the most important performance objective is based on mean delay. We present an o ptimal scheduling algorithm to minimize weighted loss rate and weighted mean delay in the queues that form at the switches and at the network access points of a packet-switched network, where weights reflect the relative importance of packets. Although not practical for implementation, the algorithm is intended as a standard for comparison with which the performance of other scheduling algorithms can be evaluated. Our algorithm is more general and of lower computational co mplexity than previously published algorithms, enabling us to evaluate performance in some important scenarios that could not previously have been considered. U sing optimal performance results from our algorithm, we evaluate the performance of the first-come-first-served, static priority, and earliest deadline first al gorithms, finding some limitations of each. Our results suggest that network efficiency could be improved by using a more sophisticated heuristic scheduling alg orithm rather than one of the aforementioned algorithms. ------------------------------------------------------------------------------- CSL-TR-90-448 A COST-BASED SCHEDULING ALGORITHM TO SUPPORT INTEGRATED SERVICES Jon Peha and Fouad A. Tobagi September 1990 44 pages.....$6.52 It is becoming increasingly important to support applications with diverse performance objectives on a single packet-switched network. The efficiency of a netw ork with such diverse traffic can be greatly improved through the use of sophisticated scheduling and dropping algorithms within the queues that form at the net work access points and in switches throughout the network. This paper presents heuristic algorithms for that purpose. In our approach, arbitrary performance o bjectives are defined in the form of cost functions, which map the queueing delay experienced by each packet to a cost incurred. Our algorithms, cost-based scheduling (CBS) and cost-based dropping (CBD), then attempt to optimize network performance as perceived by the applications by mini mizing the total cost incurred by all packets. Cost functions are presented that are appropriate for most common applications. Scheduling and dropping algorit hms are defined based on these cost functions. Through simulation, it is demonstrated that network performance is better when these heuristic algorithms are use d as opposed to the common alternatives. ------------------------------------------------------------------------------- CSL-TR-90-449 (also numbered STAN-CS-90-1330) PARALLEL ICCG ON A HIERARCHICAL MEMORY MULTIPROCESSOR -- ADDRESSING THE TRIANGULAR SOLVE BOTTLENECK Edward Rothberg and Anoop Gupta September 1990 22 pages.....$4.76 The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly used iterative method for solving large sparse systems of equations. In this paper, w e study the parallel solution of sparse triangular systems of equations, the most difficult aspect of implementing the ICCG method on a multiprocessor. We focu s on shared-memory multiprocessor architectures with deep memory hierarchies. On such architectures we find that previously proposed parallelization approaches result in little or no speedup. The reason is that these approaches cause significant increases in the amount of memory system traffic as compared to a sequen tial approach. Increases of as much as a factor of 10 on four processors were observed. In this paper we propose new techniques for limiting these increases, including data remappings to increase spatial locality, new processor synchronization techniques to decrease the use of auxiliary data structures, and data part itioning techniques to reduce the amount of interprocessor communication. With these techniques, memory system traffic is reduced to as little as one sixth of its previous volume. The resulting speedups are greatly improved as well, although they are still much less than linear. We discuss the factors that limit fur ther speedups. We present both simulation results and results of experiments on an SGI 4D/340 multiprocessor. ------------------------------------------------------------------------------- CSL-TR-90-450 (also numbered STAN-CS-90-1333) COMPARING STRUCTURALLY DIFFERENT VIEWS OF A VLSI DESIGN Michael Joseph Spreitzer June 1990 162 pages.....$15.96 VLSI designers often use structural hierarchies and alternate views of a design. Allowing alternate views to have different hierarchies improves the clarity of the views, but complicates comparison of those views. Most existing comparison techniques either require essentially identical hierarchies, or can only handle differences by flattening. A new technique, Informed Comparison, first reconciles the hierarchy differenc es by making purely structural transformations according to a description of the intended relationship between the hierarchies, and then finishes with an existi ng hierarchical technique. Informed Comparison thus can compare views with different hierarchies without the penalties associated with flattening. ------------------------------------------------------------------------------- Naomi Schulman COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305-4055 E-mail: schulman@sierra.stanford.edu 415-723-1430 Hours: M-Th, 8-1 (PST) ORDER FORM AND INVOICE ALL ORDERS MUST BE PREPAID Please print or type your name and address in the space provided. Check the number(s) of the report(s) you wish to purchase and circle the price of hardcopy or microfiche. California Residents must add the sales tax of their own local county. Return this order form with a check or money order made payable to Stanford University. Foreign Orders must include $1.00 extra for each $15.00's worth of reports to cover postage. Checks must be drawn on a U.S. bank, payable in U.S. dollars. If an invoice is required for payment, please note that this form is both an order form and an invoice. We cannot invoice separately. (Name) ________________________________ (Address)_______________________________ ________________________________ ________________________________ ________________________________ E-mail address:________________________________ REPORT # HARDCOPY FICHE REPORT # HARDCOPY FICHE TR-425 $ 5.48 $2.00 TR-438 $ 4.92 $2.00 TR-426 $ 5.24 $2.00 TR-439 $ 5.00 $2.00 TR-427 $16.28 $3.00 TR-440 $10.00 $2.00 TR-428 $ 5.08 $2.00 TR-441 $ 5.08 $2.00 TR-429 $ 4.20 $2.00 TR-442 $ 4.68 $2.00 TR-430 $ 4.28 $2.00 TR-443 $ 7.08 $2.00 TR-431 $ 5.32 $2.00 TR-444 $ 7.64 $2.00 TR-432 $ 5.32 $2.00 TR-445 $12.84 $3.00 TR-433 $ 4.44 $2.00 TR-446 $ 5.40 $2.00 TR-434 $ 4.52 $2.00 TR-447 $12.84 $3.00 TR-435 $13.64 $3.00 TR-448 $ 5.40 $2.00 TR-436 $ 4.20 $2.00 TR-449 $ 5.48 $2.00 TR-437 $ 4.20 $2.00 TR-450 $15.96 $3.00 Subtotal __________ CA tax or Foreign Surcharge __________ Total __________ Circle which ones you want: Announcement 15, 16, 17, 18 Check here ___ to receive 1990-91 Price Lists (with report order, only) ** END OF MESSAGE ** Date: 1991-03-13 15:22:49 Msg-ID: RFC-822 CMM.0.88.668899423.schulman@shasta.Stanford.EDU OU= VE-GW ON=BITNET To: mflll002 CC: schulman@shasta.Stanford.EDU OU=VE-GW ON=BITNET From: Naomi Schulman schulman@shasta.Stanford.EDU OU=VE-GW ON=BITNET Subj: Announcement #19 ------------ Letter Body Part 1 - Text ------------ RFC-822-HEADERS: Return-Path: <schulman@shasta.Stanford.EDU> Date: Wed, 13 Mar 1991 13:23:43 PST ------------ Letter Body Part 2 - Text ------------ Naomi Schulman COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305-4055 e-mail: schulman@sierra.stanford.edu (415) 723-1430 Hours: M-Th 8-1 (PST) RECENT PUBLICATIONS LIST #19 SEPTEMBER 1990 To order reports, see Order Form on last page. CSL-TR-90-425 CONCURRENT RUNTIME MONITORING OF FORMALLY SPECIFIED PROGRAMS Manas Mandal and Sriram Sankar April 1990 31 pages.....$5.48 This paper describes an application of formal specifications after an executable program has been constructed. We describe how high level specifications can be utilized to monitor critical aspects of the behavior of a program continuously while it is executing. This methodology provides a capability to distribute the monitoring of specifications on multi-processor hardware platforms to meet practical time constraints. Typically, runtime checking of formal specifications involves a significant time penalty which makes it impractical during normal production opertion of a program. In previous research, runtime checking has been applied during testing and debugging of software, but not on a permanent basis. Crucial to our current mthodology is the use of multi-processor machines - hence runtime monitoring can be performed concurrently on different processors. We describe techniques for distributing checks onto different processors. To control the degree of concurrency, we introduce checkpoints - a point in the program beyond which execution cannot proceed until the specified checks have been completed. Error reporting and recovery in a multiprocessor environment is complicated and there are various techniques of handling this. We describe a few of these techniques in this paper. An implementation of this methodology for the Anna specification language for Ada programs is described. Results of experiments conducted on this implementation using a 12-processor Sequent Symmetry demonstrate that permanent concurrent monitoring of programs based on formal specifications is indeed feasible. ------------------------------------------------------------------------------- CSL-TR-90-426 A VLSI ARCHITECTURE FOR THE FCHC ISOMETRIC LATTICE GAS MODEL Fung F. Lee, Michael J. Flynn, and Martin Morf April 1990 28 pages.....$5.24 Lattice gas models are cellular automata used for the simulation of fluid dynamics. This paper addresses the design issues of a lattice gas collision rule processor for the four-dimensional FCHC isometric lattice gas model. A novel VLSI architecture based on an optimized version of Henon's isometric algorithm is proposed. One of the key concepts behind this architecture is the permutation group representation of the isometry group of the lattice. In contrast to the straightforward table lookup approach which would take 4.5 billion bits to implement this set of collision rules, the size of our processor is only about 5000 gates. With a reasonable number of pipeline stages, the processor can deliver one result per cycle with a cycle time comparable to or less than that of a common commercial DRAM. ------------------------------------------------------------------------------- CSL-TR-90-427 GENERALIZED GRAPHICAL OBJECT EDITING John M. Vlissides June 1990 166 pages.....$16.28 Many editors use the graphics capabilities of personal workstations to provide a visual editing environment. Such editors present graphical representations of familiar objects and allow the user to manipulate the representations directly. This style of interaction is usually more intuitive to the user than typing statements in a command language. However, implementing a graphical object editor has been a difficult undertaking. Though many packages exist that aid in the construction of graphical user interfaces, none are designed specifically for building graphical object editors. This is significant because there is a substantial semantic gap between general user interfaces and the functionality of graphical object editors. For example, user interface packages usually provide buttons, scroll bars, and ways to assemble them, but they do not offer primitives for building drawing editors that produce PostScript or schematic capture systems that produce netlists. Higher-level abstractions are needed to make such applications easier to develop. Unidraw is a framework for creating object-oriented graphical editors in domains such as technical and artistic drawing, music composition, and circuit design. The Unidraw architecture simplifies the construction of these editors by providing programming abstractions that are common across domains. Unidraw defines four basic abstractions: "components" encapsulate the appearance and semantics of objects in a domain, "tools" support direct manipulation of components, "commands" define operations on components and other objects, and "external representations" define the mapping betweencomponents and the file format generated by the editor. Unidraw also supports multiple views, graphical connectivity and confinement, and dataflow between components. This thesis describes the Unidraw design, implementation issues, and three experimental domain-specific editors we have developed with Unidraw: a drawing editor, a user interface builder, and a schematic capture system. Our results indicate a substantial reduction in implementation time and effort compared with existing tools. ------------------------------------------------------------------------------- CSL-TR-90-428 SUB-NANOSECOND ARITHMETIC Michael J. Flynn, Giovanni De Micheli, Robert Dutton, Bruce Wooley, and Fabian Pease May 1990 26 pages.....$5.08 The SNAP (Stanford Nanosecond Arithmetic Processor) project is targeted at realizing an arithmetic processor with performance approximately an order of magnitude faster than currently available technology. The realization of SNAP is predicated on an interdisciplinary approach and effort spanning research in algorithms, data representation, CAD, circuits and devices, and packaging. SNAP is visualized as an arithmetic coprocessor implemented on an active substrate containing several chips, each of which realize a particular arithmetic function. ------------------------------------------------------------------------------- CSL-TR-90-429 MULTIPROCESSOR SMALLTALK: IMPLEMENTATION, PERFORMANCE, AND ANALYSIS Joseph Ira Pallas June 1990 140 pages...$14.20 Multiprocessor Smalltalk demonstrates the value of object-oriented programming on amultiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object-oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy. Idle time in the interpreter (up to 51% overhead, excluding a pathological case) could be reduced with a parallel garbage collector to 10%. Idle time in user code (up to 35% overhead) remains the programmer's responsibility. We can identify the key characteristics that make Multiprocessor Smalltalk successful. The Smalltalk language allows us to build concurrent control structures using lambda expressions without extending the language. Inexpensive processes and efficient garbage collection are also crucial. Hardware/operating-system support for shared memory, per-process variables, and inexpensive synchronization are essential to the implementation. Given these, object-oriented languages and multiprocessors are a good match. ------------------------------------------------------------------------------- CSL-TR-90-430 (also numbered STAN-CS-90-1318) TECHNIQUES FOR IMPROVING THE PERFORMANCE OF SPARSE MATRIX FACTORIZATION ON MULTIPROCESSOR WORKSTATIONS Edward Rothberg and Anoop Gupta June 1990 16 pages.....$4.28 In this paper we look at the problem of factoring large sparse systems of equations on high-performance multiprocessor workstations. While these multiprocessor workstations are capable of very high peak floating point computation rates, most existing sparse factorization codes achieve only a small fraction of this potential. A major limiting factor is the cost of memory accesses performed during the factorization. In this paper, we describe a parallel factorization code which utilizes the supernodal structure of the matrix to reduce the number of memory references. We also propose enhancements that significantly reduce the overall cache miss rate. The result is greatly increased factorization performance. We present experimental results from executions of our codes on the Silicon Graphics 4D/380 multiprocessor. Using eight processors, we find that the supernodal parallel code achieves a computation rate of approximately 40 MFLOPS when factoring a range of benchmark matrices. This is more than twice as fast as the parallel nodal code developed at the Oak Ridge National Laboratory running on the SGI 4D/380. ------------------------------------------------------------------------------- CSL-TR-90-431 LATENCY AND THROUGHPUT TRADEOFFS IN SELF-TIMED SPEED-INDEPENDENT PIPELINES AND RINGS Ted Williams June 1990 29 pages....$5.32 Asynchronous pipelines control the flow of tokens through a sequence of logical stages based on the status of local completion detectors. As in a synchronously clocked circuit, the design of self-timed pipelines can trade off between achieving low latencyn and high throughput. However, there are more degress of freedom because of the variances in specific latch and function block styles, and the possibility of varying both the number of latches between function blocks and their connections to the completion detectors. This report demonstrates the utility of a graph-based methodology for analyzing the timing dependencies and uses it to make comparisons of different configurations. It is shown that the extremes for high throughput and low latency differ significantly, the placement of the completion detectors influences timing as much as adding an additional latch, and the choice as to whether precharged or static logic is best is dependent on the cost in complexity of the completion detectors. ------------------------------------------------------------------------------- CSL-TR-90-432 THE OLYMPUS SYNTHESIS SYSTEM FOR DIGITAL DESIGN Giovanni De Micheli, David Ku, Fredric Mailhot, Thomas Truong July 1990 29 pages.....$5.32 The Olympus Synthesis System, developed at Stanford University, is a vertically integrated set of tools for the synthesis of digital circuit designs. The system is specifically designed to support synthesis of Application-Specific Integrated Circuits from behavioral-level descriptions, written in a hardware description language called HardwareC. The Olympus system supports synthesis with timing constraints at the behavioral, structural and logic levels. Two internal models, Sequencing Intermediate Form, (SIF) and Structural/Logic Intermediate Form (SLIF), are used to represent the hardware at different levels of abstraction and to provide a way to pass design information between the different tools. This paper describes Olympus as a system consisting of a set of programs for behavioral, structural and logic synthesis, technology mapping and simulation. The system has been used to design three ASIC chips at Stanford University and it has been tested against benchmark circuits for high-level and logic synthesis. ------------------------------------------------------------------------------- CSL-TR-90-433 LIMITS ON MULTIPLE INSTRUCTION ISSUE Michael D. Smith, Mike Johnson, and Mark A. Horowitz July 1990 18 pages.....$4.44 This paper investigates the limitations on designing a processor which cansustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications. We have used trace-driven simulations to determine that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle. In a straightforward implementation, cost considerations argue strongly against decoding more than two instructions in one cycle. Given this constraint, the efficiency in instruction fetching rather than the complexity of the execution hardware limits the concurrency attainable at the instruction level. ------------------------------------------------------------------------------- CSL-TR-90-434 BOOSTING BEYOND STATIC SCHEDULING IN A SUPERSCALAR PROCESSOR Michael D. Smith, Monica S. Lam, and Mark A. Horowitz July 1990 19 pages.....$4.52 This paper describes a superscalar processor that combines the best qualities of static and dynamic instruction scheduling to increase the performance of non-numerical applications. The architecture performs all instruction scheduling statically to take advantage of the compiler's ability to efficiently schedule operations across many basic blocks. Since the conditional branches in non-numerical code are highly data dependent, the architecture introduces the concept of boosted instructions, instructions that are committed conditionally upon the result of later branch instructions. Boosting effectively removes the dependences caused by branches and makes the scheduling of side-effect instructions as simple as those that are side-effect free. For efficiency, boosting is supported in the hardware by shadow structures that temporarily hold the side effects of boosted instructions until the conditional branches that the boosted instructions depend upon are executed. When the branch condition is determined, the buffered side effects are either committed or squashed. The limited static scheduler in our evaluation system shows that a 1.6-times speedup over scalar code is achievable by boosting instructions above only a single conditional branch. This performance is similar to the performance of a pure dynamic scheduler. ------------------------------------------------------------------------------- CSL-TR-90-435 COLLABORATION TRANSPARENCY IN DESKTOP TELECONFERENCING ENVIRONMENTS J. Chris Lauwers July 1990 133 pages.....$13.64 The ultimate goal of desktop teleconferencing environments is to integrate contemporary (non-computer-supported) audio/video teleconferencing technologies with workstation-based network computing environments. This thesis studies the application sharing facilities that allow multiple conference participants to interact simultaneously with computer-based applications in a desktop teleconference. It focuses on collaboration-transparent facilities that permit existing single-user applications to be invoked from within a computer conference. In the context of contemporary workstation-based environments, the pursuit of collaboration transparency leads to the development of shared window systems. This thesis introduces the basic architectures for shared window systems and analyzes them based on experience with several prototypes. Contemporary window system standards present many obstacles to shared window designers. This thesis discusses these obstacles and suggests how they can be overcome. In addition, new architectures for window systems are proposed that can accommodate window sharing more effectively. The thesis then introduces application replication as a technique for improving overall shared window system performance. Application replication introduces many synchronization problems, arising mainly from the need to keep replicated copies of shared applications consistent. This thesis shows that the most frequent synchronization problems can be solved without changing existing software. The remaining problems can be solved by making applications or system servers collaboration-aware. Finally, this thesis studies the ramifications, for the software designer, of supporting spontaneous interactions, shared workspace management, floor control, user customization, and annotations and telepointing. While the recommendations that result are motivated by the desire to enable continued use of collaboration-transparent applications, addressing them involves the development of systems software that is distinctly collaboration-aware. ------------------------------------------------------------------------------- CSL-TR-90-436 APPLICATION OF FORMAL SPECIFICATION TO SOFTWARE MAINTENANCE Neel Madhav and Sriram Sankar August 1990 15 pages.....$4.20 This paper describes the use of formal specifications and associated tools in addressing various aspects of software maintenance ---corrective, perfective, and adaptive. It also addresses the refinement of the software development process to build programs that are easily maintainable. The task of software maintenance in our case includes the task of maintaining the specification as well as maintaining the program. We focus on the use of Anna, a specification language for formally specifying Ada programs, to aid us in maintaining Ada programs. These techniques are applicable to most other specification language and programming language environments. The tools of interest are: (1) the Anna Specification Analyzer which allows us to analyze the specification for correctness with respect to our informal understanding of program behavior; and (2) the Anna Consistency Checking System which monitors the Ada program at runtime based on the Anna specification. ------------------------------------------------------------------------------- CSL-TR-90-437 AN ADA---PROLOG SYSTEM Neel Madhav August 1990 15 pages.....$4.20 This paper presents a software development tool --- the Ada-Prolog system which combines the strengths of both descriptive and procedural programming styles. Concrete reasons and examples are provided to demonstrate that such a tool would be useful. This tool provides various operations available in Prolog for clausebuilding, database building and querying to Ada programs.In addition to allowing dynamic access to both Ada and Prolog, the Ada-Prolog system adds to the functionality provided by Prolog by partitioning the Prolog database into lists of clauses. These lists can be created, updated and destroyed dynamically. Concurrent access to the list of clauses is also possible. Queries can be directed to groups of these lists. The system is meant for use in expert systems, compilers, database applications, rapid prototyping systems, advanced environments, and other software tools which use deduction. ------------------------------------------------------------------------------- CSL-TR-90-438 A METHODOLOGY FOR FORMAL SPECIFICATION AND IMPLEMENTATION OF ADA PACKAGES USING ANNA Neel Madhav and Walter Mann August, 1990 24 pages.....$4.92 This paper presents a methodology for formal specification and prototype implementation of Ada packages using the Anna specification language. Specifications play an important role in the software development cycle. The methodology allows specifiers of Ada packages to follow a sequence of simple steps to formally specify packages. Given the formal specification of a package resulting from the methodology for package specifications, the methodology allows implementors of packages to follow a few simple steps to implement the package. The implementation is meant to be a prototype. This methodology for specification and implementation is applicable tomost Ada packages. Limitations of this approach are pointed out at various points in the paper. We present software tools which help the process of specification and implementation. ------------------------------------------------------------------------------- CSL-TR-90-439 TANGO: A MULTIPROCESSOR SIMULATION AND TRACING SYSTEM Helen Davis and Stephen R. Goldschmidt July 1990 25 pages.....$5.00 Tango is a software simulation and tracing system used to obtain data for evaluating parallel programs and multiprocessor systems. The system provides a simulated multiprocessor environment by multiplexing application processes onto a single processor. Tango achieves high efficiency by running compiled user code, and by focusing on the information of greatest interest to multiprocessing studies. The system is being applied to a wide range of investigations, including algorithm studies and a variety of hardware evaluations. ------------------------------------------------------------------------------- CSL-TR-90-440 BIBLIOGRAPHY OF THE COMPUTER SYSTEMS LABORATORY - 1968-1990 (Third Edition) Naomi Schulman August 1990 71 pages.....$10.00 The bibliography lists all technical reports and notes published by the Computer Systems Laboratory of Stanford University, from 1968 to date. This edition is in a binder where future announcements of new reports can be kept to update the bibliography temporarily. When new reports are in sufficient number, a whole new page will be created in the bibliography format. New pages will be listed on future announcement Order Forms and will be sent at no charge along with any order placed. Prices of reports and notes are listed separately, starting on page ix. Please note that report prices. are printed on yellow paper, and note prices are printed on blue paper. These prices will supersede any previously given prices, and may themselves be superseded in the future if printing and/or postage costs rise. The symbol (*), which, in earlier editions signified that a report was out of print, is now used only to indicates that no "original" is available for copying. All other reports will be made available upon request. An earlier edition of the Bibliography (TR-272 or TR-336), plus copies of announcements 15,16, 17, 18 and 19, will complete the list of publications. However, new price lists will be needed to complete the updating of previous editions. The TR Price List, 1990-91 and TN Price List, 1990-91 will be sent upon request with any order placed. ------------------------------------------------------------------------------- CSL-TR-90-441 COMPUTING TYPES DURING PROGRAM SPECIALIZATION Daniel Weise and Erik Ruf August 1990 26 pages.....$5.08 We have developed techniques for obtaining and using type information during program specialization (partial evaluation). Computed along with every residual expression and every specialized program is type information that bounds the possible values that the specialized program will compute at run time. The three keystones of this research are symbolic values that represent both a value and the code for creating the value, generalization of symbolic values, and the use of online fixed-point iterations for computing the type of values returned by specialized recursive functions. The specializer exploits type information to increase the efficiency of specialized functions. This research has two benefits, one anticipated and one unanticipated. The anticipated benefit is that programs that are to be specialized can now be written in a more natural style without losing accuracy during specialization. The unanticipated benefit is the creation of what we term concrete abstract interpretation. This is a method of performing abstract interpretation with concrete values where possible. The specializer abstracts values as needed, instead of requiring that all values be abstracted prior to abstract interpretation. ------------------------------------------------------------------------------- CSR-TR-90-442 AN IMPROVED HIGH-SPEED FLOATING-POINT ADDITION ALGORITHM Nhon T. Quach and Michael J. Flynn August 1990 21 pages.....$4.68 This paper describes an improved, IEEE conforming floating-point addition algorithm. This algorithm has only one addition step involving the significand in the worst-case path, hence offering a considerable speed advantage over the existing algorithms, which typically require two to three addition steps. ------------------------------------------------------------------------------- CSL-TR-90-443 A QUEUING ANALYSIS FOR DISK ARRAY SYSTEMS Mikito Ogata and Michael J. Flynn August 1990 51 pages.....$7.08 Using a queuing model of disk arrays, we study the performance and tradeoffs in disk array sub-systems and develop guidelines for designing these sub-systems in various CPU environments. Finally, we compare our model with some earlier simulation results. ------------------------------------------------------------------------------- CSL-TR-90-444 EVALUATION OF A FOUR FOLD SPARSE DISTRIBUTED MEMORY PROTOTYPE Brian Flachs August 1990 58 pages.....$7.64 Sparse Distributed Memory is a generalized random access memory for long binary words. This document describes changes made to the Single Fold Prototype to develop Stanford's Four Fold Prototype. A users manual for the libraries and utilities developed for the prototype and a performance evaluation of the prototype are also included. Appendices detail hardware settings and provide a complete, up-to-date, set of Address Module Schematics. ------------------------------------------------------------------------------- CSL-TR-90-445 (also numbered STAN-CS-90-1334) SOFT CONFIGURABLE WAFER SCALE INTEGRATION: DESIGN, IMPLEMENTATION AND YIELD ANALYSIS Miriam Greta Blatt June 1990 123.....$12.84 Soft-Configurable Wafer Scale Integration uses software controlled switches to connect up the fault-free parts of a wafer. Compared to hard configuration, th e soft configurable approach has the advantages of providing low-cost connections and runtime fault tolerance. The dissertation describes how to achieve soft co nfiguration with high performance, presenting a pipelined memory system implemented using this approach. The yield of the prototype is evaluated in two phases. Fault simulation applies measured defect statistics to the layout to predict the yield of each circuit unit. These unit yields are combined to produce wafer yields using redundancy models appropriate to wafer scale integration. The redundancy models constrain wafer yield by system requirements such as the minimum n umber of working circuit units, and whether these working units are distributed evenly around the wafer. Choice of redundancy model significantly affects the r esulting wafer yield. ------------------------------------------------------------------------------- CSL-TR-90-446 ANALYZING THE FAULT TOLERANCE OF A CLASS OF DOUBLE-LOOP NETWORKS Jon M. Peha and Fouad A. Tobagi September 1990 30 pages.....$5.40 This paper analyzes the fault tolerance of a class of double-loop networks referred to as forward-loop backward-hop (FLBH), in which each node is connected via unidirectional links to the node one hop in front of it and to the node S hops in back of it for some S. A new measure of fault tolerance is described, along with techniques based on Markov chains to calculate upper and lower bounds on the fault tolerance of this network topology quickly and efficiently. The result s of these calculations provide a more precise description of network fault tolerance than has been achieved with previously published techniques. ------------------------------------------------------------------------------- CSL-TR-90-447 EVALUATION OF SCHEDULING ALGORITHMS FOR INTEGRATED-SERVICES PACKET-SWITCHED NETWORKS Jon Peha and Fouad A. Tobagi September 1990 31 pages.....$5.48 In this paper, we consider integrated-services packet-switched networks with two types of traffic: traffic with deadlines, for which the most important perform ance objective is based on loss rate, and packets without deadlines, for which the most important performance objective is based on mean delay. We present an o ptimal scheduling algorithm to minimize weighted loss rate and weighted mean delay in the queues that form at the switches and at the network access points of a packet-switched network, where weights reflect the relative importance of packets. Although not practical for implementation, the algorithm is intended as a standard for comparison with which the performance of other scheduling algorithms can be evaluated. Our algorithm is more general and of lower computational co mplexity than previously published algorithms, enabling us to evaluate performance in some important scenarios that could not previously have been considered. U sing optimal performance results from our algorithm, we evaluate the performance of the first-come-first-served, static priority, and earliest deadline first al gorithms, finding some limitations of each. Our results suggest that network efficiency could be improved by using a more sophisticated heuristic scheduling alg orithm rather than one of the aforementioned algorithms. ------------------------------------------------------------------------------- CSL-TR-90-448 A COST-BASED SCHEDULING ALGORITHM TO SUPPORT INTEGRATED SERVICES Jon Peha and Fouad A. Tobagi September 1990 44 pages.....$6.52 It is becoming increasingly important to support applications with diverse performance objectives on a single packet-switched network. The efficiency of a netw ork with such diverse traffic can be greatly improved through the use of sophisticated scheduling and dropping algorithms within the queues that form at the net work access points and in switches throughout the network. This paper presents heuristic algorithms for that purpose. In our approach, arbitrary performance o bjectives are defined in the form of cost functions, which map the queueing delay experienced by each packet to a cost incurred. Our algorithms, cost-based scheduling (CBS) and cost-based dropping (CBD), then attempt to optimize network performance as perceived by the applications by mini mizing the total cost incurred by all packets. Cost functions are presented that are appropriate for most common applications. Scheduling and dropping algorit hms are defined based on these cost functions. Through simulation, it is demonstrated that network performance is better when these heuristic algorithms are use d as opposed to the common alternatives. ------------------------------------------------------------------------------- CSL-TR-90-449 (also numbered STAN-CS-90-1330) PARALLEL ICCG ON A HIERARCHICAL MEMORY MULTIPROCESSOR -- ADDRESSING THE TRIANGULAR SOLVE BOTTLENECK Edward Rothberg and Anoop Gupta September 1990 22 pages.....$4.76 The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly used iterative method for solving large sparse systems of equations. In this paper, w e study the parallel solution of sparse triangular systems of equations, the most difficult aspect of implementing the ICCG method on a multiprocessor. We focu s on shared-memory multiprocessor architectures with deep memory hierarchies. On such architectures we find that previously proposed parallelization approaches result in little or no speedup. The reason is that these approaches cause significant increases in the amount of memory system traffic as compared to a sequen tial approach. Increases of as much as a factor of 10 on four processors were observed. In this paper we propose new techniques for limiting these increases, including data remappings to increase spatial locality, new processor synchronization techniques to decrease the use of auxiliary data structures, and data part itioning techniques to reduce the amount of interprocessor communication. With these techniques, memory system traffic is reduced to as little as one sixth of its previous volume. The resulting speedups are greatly improved as well, although they are still much less than linear. We discuss the factors that limit fur ther speedups. We present both simulation results and results of experiments on an SGI 4D/340 multiprocessor. ------------------------------------------------------------------------------- CSL-TR-90-450 (also numbered STAN-CS-90-1333) COMPARING STRUCTURALLY DIFFERENT VIEWS OF A VLSI DESIGN Michael Joseph Spreitzer June 1990 162 pages.....$15.96 VLSI designers often use structural hierarchies and alternate views of a design. Allowing alternate views to have different hierarchies improves the clarity of the views, but complicates comparison of those views. Most existing comparison techniques either require essentially identical hierarchies, or can only handle differences by flattening. A new technique, Informed Comparison, first reconciles the hierarchy differenc es by making purely structural transformations according to a description of the intended relationship between the hierarchies, and then finishes with an existi ng hierarchical technique. Informed Comparison thus can compare views with different hierarchies without the penalties associated with flattening. ------------------------------------------------------------------------------- Naomi Schulman COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305-4055 E-mail: schulman@sierra.stanford.edu 415-723-1430 Hours: M-Th, 8-1 (PST) ORDER FORM AND INVOICE ALL ORDERS MUST BE PREPAID Please print or type your name and address in the space provided. Check the number(s) of the report(s) you wish to purchase and circle the price of hardcopy or microfiche. California Residents must add the sales tax of their own local county. Return this order form with a check or money order made payable to Stanford University. Foreign Orders must include $1.00 extra for each $15.00's worth of reports to cover postage. Checks must be drawn on a U.S. bank, payable in U.S. dollars. If an invoice is required for payment, please note that this form is both an order form and an invoice. We cannot invoice separately. (Name) ________________________________ (Address)_______________________________ ________________________________ ________________________________ ________________________________ E-mail address:________________________________ REPORT # HARDCOPY FICHE REPORT # HARDCOPY FICHE TR-425 $ 5.48 $2.00 TR-438 $ 4.92 $2.00 TR-426 $ 5.24 $2.00 TR-439 $ 5.00 $2.00 TR-427 $16.28 $3.00 TR-440 $10.00 $2.00 TR-428 $ 5.08 $2.00 TR-441 $ 5.08 $2.00 TR-429 $ 4.20 $2.00 TR-442 $ 4.68 $2.00 TR-430 $ 4.28 $2.00 TR-443 $ 7.08 $2.00 TR-431 $ 5.32 $2.00 TR-444 $ 7.64 $2.00 TR-432 $ 5.32 $2.00 TR-445 $12.84 $3.00 TR-433 $ 4.44 $2.00 TR-446 $ 5.40 $2.00 TR-434 $ 4.52 $2.00 TR-447 $12.84 $3.00 TR-435 $13.64 $3.00 TR-448 $ 5.40 $2.00 TR-436 $ 4.20 $2.00 TR-449 $ 5.48 $2.00 TR-437 $ 4.20 $2.00 TR-450 $15.96 $3.00 Subtotal __________ CA tax or Foreign Surcharge __________ Total __________ Circle which ones you want: Announcement 15, 16, 17, 18 Check here ___ to receive 1990-91 Price Lists (with report order, only) ** END OF MESSAGE **