leff@smu.UUCP (Laurence Leff) (10/17/89)
Naomi Schulman COMPUTER SYSTEMS LABORATORY STANFORD UNIVERSITY Stanford, CA 94305-4055 (415) 273-1430 (M-Th, 8-1) RECENT REPORTS & NOTES LIST #16 SEPTEMBER 1989 To order reports, see end of this announcement. CSL-TR-89-376 INTEGRATING A STRUCTURING MECHANISM WITH A PROGRAM EDITOR Lia Adams APRIL 1989 104 pages.....$9.95 Module interconnections are an important aspect of a large program. The grid is a powerful, language-independent structuring mechanism that specifies what interconnections are valid and documents what interconnections actually exist. This thesis describes an editor for grids, named Ogre, that interactively checks the validity of new interconnections and finds inconsistencies in existing interconnections when grid specifications are changed. Additional information is stored with a grid to support this incremental processing. Ogre maintains the language-independence of the grid by communicating by messages with a language-oriented editor. This loosely-coupled integration of module and inter-module editing allows the programmer to access global structural information in the context of an individual module while supporting multiple source languages and module editors. We have implemented a prototype of Ogre using a simple editing environment for Modula-2 and a grid-manipulator that stores grids in text files. Our implementation demonstrates that the power of a grid can be brought to a programmer by interactive access and incremental update. The incremental- update algorithms described in this thesis make interactive grid access feasible. Loosely-coupled integration adds grid access to program editing without requiring a monolithic environment. ------------------------------------------------------------------------------- CSL-TR-89-377 ACCESS CONTROL STRATEGIES FOR TOKEN PASSING INTEGRATED SERVICES NETWORKS Joseph Wai-Ming Pang March 1989 157 pages.....$13.40 Traditionally, the major application of local area networks (LANs) has been for connecting computers, terminals and peripheral devices. Due to their flexibility and cost-effectiveness, LANs are vnow being considered as the communication means for a wide spectrum of new and exciting applications, for example, office and factory automation. However, these applications generate new traffic types that have diverse statistical characteristics and stringent delay constraints. Designed primarily for bursty computer traffic, conventional LAN multiple access protocols are not capable of supporting such new traffic types. Many proposals have been made, with various degrees of success, for supporting diverse applications on a single LAN. A timer based protocol, adopted by the IEEE 802.4 Token Bus Standard and the ANSI drafted Fiber Distributed Data Interface (FDDI) Standard, appears to be a very attractive solution. This protocol is based on a simple and fully distributed design, with the intention of supporting real-time traffic and providing multiple classes of service. 1 Nevertheless, due to a lack of understanding of the performance characteristics of this protocol, there do not exist any guidelines for selecting the parameters embedded in the protocol to meet given network constraints. In this research, we examine the timer based protocol under heavy traffic. A bound on the token cycles is established and a procedure for computing the bandwidth allocation among different stations is given. The analytic results derived allow us to convert the network design problem into a tractable optimization problem. Furthermore, this careful examination has led us to consider the protocol as a feedback control mechanism for limiting the token cycles. By adopting this novel view, we are able to generalize the idea behind the timer based protocol to a very large and flexible class of access control strategies. The salient features pertaining to the design of the new access control strategies are explored and a design example is presented. As an additional tool for performance evaluation and tuning, we have established a very good approximation for the station throughputs of the new access control strategies under general load. ------------------------------------------------------------------------------- CSL-TR-89-378 ANALYSIS OF PARALLELISM AND DEADLOCKS IN DISTRIBUTED-TIME LOGIC SIMULATION Larry Soule and Anoop Gupta March 1989 39 pages.....$5.75 This paper explores the suitability of the Chandy-Misra algorithm for digital logic simulation. We use four realistic circuits as benchmarks for our analysis, with one of them being the vector-unit controller for the Titan supercomputer from Ardent. Our results show that the average number of logic elements available for concurrent execution ranges from 10 to 111 for the four circuits, with an overall average of 68. Although this is twice as much parallelism as that obtained by traditional event-driven algorithms for these circuits, we feel it is still too low. One major factor limiting concurrency is the large number of global synchronization points --- ``deadlocks'' in the Chandy-Misra terminology --- that occur during execution. Towards the goal of reducing the number of deadlocks, the paper presents a classification of the types of deadlocks that occur during digital logic simulation. Four different types are identified and described intuitively in terms of circuit structure. Using domain specific knowledge, the paper proposes methods for reducing these deadlock occurrences. For one of the benchmark circuits, the use of the proposed techniques eliminated {\em all} deadlocks and increased the average parallelism from 40 to 160. We believe that the use of such domain knowledge will make the Chandy-Misra algorithm significantly more effective than it would be in its generic form. ------------------------------------------------------------------------------- CSL-TR-89-379 TWO DIMENSIONAL PINPOINTING: AN APPLICATION OF FORMAL SPECIFICATION TO DEBUGGING PACKAGES David Luckham 34 pages.....$5.40 April 1989 New methods of testing and debugging software utilizing high-level formal specifications are presented. These methods require a new generation of support tools. Such tools must be capable of automatically comparing the runtime behavior of hierarchically structured software with high-level specifications; they must provide information about inconsistencies in terms of abstractions used in specifications. This use of specifications has several advantages over present-day debugging methods: 2 1. the debugging problem itself is precisely defined by specifications; 2. violations of specifications are detected automatically, thus eliminating the need to search output traces and recognize errors manually; 3. complex tests, such as tests for side-effects on global data, can be made easily; 4. the new methods are independent of any compiler and runtime environment for a programming language; 5. they apply generally to hierarchically structured software --- e.g., packages containing nested units, 6. they also apply to other life-cycle processes such as analysis of prototypes, and the use of prototypes to build formal specifications. In this paper a particular process for locating errors in software packages, called it two dimensional pinpointing, is described. Tests consist of sequences of package operations (first dimension). Specifications at the highest (most abstract) level are checked first. If violations occur then new specifications are added if possible, otherwise checking of specifications at the next lower level (second dimension) is activated. Violation of a new specification provides more information about the error which reduces the region of program text under suspicion. All interaction between programmer and toolset is phrased in terms of the concepts used to specify the program. Two dimensional pinpointing is presented using the Anna specification language for Ada programs. Anna and a toolset for comparing behavior of Ada programs with Anna specifications is described. Pinpointing techniques are then illustrated by examples. The examples involve debugging of Ada packages, for which Anna provides a rich set of specification constructs. The Anna toolset supports use of the methodology on the full Ada/Anna languages, and is being engineered to commercial standards. ------------------------------------------------------------------------------- CSL-TR-89-380 UNIDRAW: A FRAMEWORK FOR BUILDING DOMAIN-SPECIFIC GRAPHICAL EDITORS John M. Vlissides and Mark A. Linton July 1989 16 pages.....$4.25 Unidraw is a framework for creating object-oriented graphical editors in domains such as technical and artistic drawing, music composition, and CAD. The Unidraw architecture simplifies the construction of these editors by providing programming abstractions that are common across domains. Unidraw defines four basic abstractions: components encapsulate the appearance and semantics of objects in a domain, tools support direct manipulation of components, commands define operations on components and other objects, and external representations define the mapping between components and the file format generated by the editor. Unidraw also supports multiple views, graphical connectivity and confinement, and dataflow between components. This paper describes the Unidraw design, implementation issues, and three prototype domain-specific editors we have developed with Unidraw: a drawing editor, a user interface builder, and a schematic capture system. Experience indicates a substantial reduction in implementation time compared with existing tools. ------------------------------------------------------------------------------- CSL-TR-89-381 3 THE DESIGN AND IMPLEMENTATION OF AN INCREMENTAL LINKER Russell W. Quong July 1989 85 pages.....$8.75 As computer systems grow in complexity and power, controlling the cost of software development and maintenance is increasingly important. One way to improve programmer productivity is to reduce program turnaround time and hence programmer wait time. An effective technique is the use of incremental algorithms to speed up compilation, linking and debugging. For incremental algorithms to be effective, typical changes to programs must be small and localized. To test this hypothesis we have obtained a quantitative profile of how programs are changed across compiles and links. We have found that most changes to modules between compiles are small and most changes to programs between links are localized to a few modules. To test the practicality of incremental algorithms we have designed and implemented an incremental linker, called \inclink. Our design trades space for time by saving the state of the linker in memory between links. Our linking algorithm processes only those modules that have changed between links. The resulting changes are rewritten into the current executable. Link time is proportional to the size of the change, not the size of the program. To minimize changes to the executable file, we place changed modules over their previous version if they fit. To maximize this probability, we add extra space (slop) to each module's allocation to absorb size increases. If a module overflows its allocation, then all parts of the program after the overflow must be rewritten, causing link time to be proportional to the size of the program. We have simulated various slop management schemes to find those with the best tradeoff between unused-slop-space and number-of-overflows. Inclink is a straight-forward implementation of our incremental linking algorithm. Its link time is under a couple of seconds, regardless of program size. Inclink is an order of magnitude faster than a batch linker and is fully compatible with existing compilers. Inclink helps to increase programmer productivity by reducing program turnaround time. ------------------------------------------------------------------------------- CSL-TR-89-382 MULTI-LEVEL SHARED CACHING TECHNIQUES FOR SCALABILITY IN VMP-MC David R. Cheriton, Hendrik A. Goosen and Patrick D. Boyle June 1989 19 pages.....$4.45 The problem of building a scalable shared memory multiprocessor can be reduced to that of building a scalable memory hierarchy, assuming interprocessor communication is handled by the memory system. In this paper, we describe the VMP-MC design, a distributed parallel multi-computer based on the VMP multiprocessor design, that is intended to provide a set of building blocks for configuring machines from one to several thousand processors. VMP-MC uses a memory hierarchy based on shared caches, ranging from on-chip caches to board-level caches connected by busses to, at the bottom, a high-speed fiber optic ring. In addition to describing the building block components of this architecture, we identify the key performance issues associated with the design and provide performance evaluation of these issues using tracedrive simulation and measurements from the VMP. ------------------------------------------------------------------------------- CSL-TR-89-383 SUPER-SCALAR PROCESSOR DESIGN William M. Johnson June 1989 147 pages.....$12.75 A super-scalar processor is one that is capable of sustaining an 4 instruction-execution rate of more than one instruction per clock cycle. Maintaining this execution rate is primarily a problem of scheduling processor resources (such as functional units) for high utilization. A number of scheduling algorithms have been published, with wide-ranging claims of performance over the single-instruction issue of a scalar processor. However, a number of these claims are based on idealizations or on special-purpose applications. This study uses trace-driven simulation to evaluate many different super-scalar hardware organizations. Super-scalar performance is limited primarily by instruction-fetch inefficiencies caused by both branch delays and instruction misalignment. Because of this instruction-fetch limitation, it is not worthwhile to explore highly-concurrent execution hardware. Rather, it is more appropriate to explore economical execution hardware that more closely matches the instruction throughput provided by the instruction fetcher. This study examines techniques for reducing the instruction-fetch inefficiencies and explores the resulting hardware organizations. This study concludes that a super-scalar processor can have nearly twice the performance of a scalar processor, but that this requires that four major hardware features: out-of-order execution, register renaming, branch prediction, and a four-instruction decoder. These features are interdependent, and removing any single feature reduces average performance by 18% or more. However, there are many hardware simplifications that cause only a small performance reduction. ------------------------------------------------------------------------------- CSL-TR-89-384 COPY ELIMINATION IN SINGLE ASSIGNMENT LANGUAGES K.Gopinath July 1989 133 pages.....$11.85 Copy elimination is an important problem that has to be solved for an efficient implementation of functional languages. Copies arise because these languages lack the concepts of state and variable; hence updating of an object involves a copy. Copies are also possible if proper targeting has not been carried out due to its complexity. To eliminate copies, we introduce the notion of address expressions and use abstract interpretation with its associated denotational model to compute them. Our work is in the context of a single assignment language called SAL and we also consider optimizations such as eliminating copies in dependent iterations that are important in single assignment languages. An operational model employing reduction rules is developed to handle the constructs present in the language. Using this model, it is possible to eliminate some difficult copies in divide and conquer algorithms such as quicksort and bitonic sort. This enables the implementation of these algorithms to be as efficient as imperative languages. An outline of the implementation is given along with the results of optimizing some difficult programs with large number of updates. The overall optimization system validates the approach taken: namely high-level optimization to find a suitable mapping of names in the program into store followed by generation of intermediate code and finally the generation of machine code by use of native code generators. ------------------------------------------------------------------------------- CSL-TR-89-385 A METHODOLOGY FOR MODELING INTERPROCESSOR TRAFFIC IN SHARED MEMORY MULTIPROCESSORS Josep Torrellas, Thierry Weil and John Hennessy July 1989 29 pages.....$5.10 5 A methodology for modeling interprocessor traffic in a shared memory multiprocessor environment is presented. The multiprocessor is modeled as an open queueing network. The inputs are (1) architectural machine parameters like interconnection network and memory hierarchy characteristics, cache coherence protocol, etc, and (2) benchmark parameters like locality of processor referencing behavior, amount of sharing, etc. The outputs of the model are average values for processor efficiency metrics and congestion measures in each machine resource. The modeling methodology is discussed through an example, based on the the DASH machine, a hierarchical shared memory multiprocessor being developed at Stanford. The benchmark characterization has been determined through parallel application traces. The discussion of the results gives insight on where the performance bottlenecks for that particular architecture-benchmark set are. 6 Naomi Schulman COMPUTER SYSTEMS LABORATORY Stanford University Stanford, CA 94305-4055 EM: Schulman@Sierra.Stanford.Edu 415-723-1430 Hours: M-Th, 8-1 ORDER FORM AND INVOICE ALL ORDERS MUST BE PREPAID To Order Reports: Print or type your name and address in the space provided. Check or circle the report number(s) you wish to purchase, and indicate hardcopy or microfiche. CALIFORNIA RESIDENTS must add sales tax of their local county. Return this order form with your check or money order made payable to Stanford University. FOREIGN ORDERS must be paid with an international money order or a check drawn on a U.S. bank, payable in dollars. Foreign order payment should include $1.00 extra for each $15.00's worth or reports requested If your employer requires an invoice for payment, please note that this form is both an order form and an invoice. We cannot invoice separately. (Name) _________________________________ (Address) _________________________________ _________________________________ _________________________________ _________________________________ (Email Address)_________________________________ Report # Hardcopy Microfiche Report # Hardcopy Microfiche TR 376 $ 9.95 $3.00 TR 381 $ 8.75 $2.00 TR 377 $13.40 $3.00 TR 382 $ 4.45 $2.00 TR 378 $ 5.75 $2.00 TR 383 $12.75 $3.00 TR 379 $ 5.40 $2.00 TR 384 $11.85 $3.00 TR 380 $ 4.25 $2.00 TR 385 $ 5.10 $2.00 Subtotal ________ Your Local CA County tax ________ Total ________