MFLLL002@VE.BOGECN.EDU (03/22/91)
From: Naomi Schulman
schulman@shasta.Stanford.EDU
Return-Path: <schulman@shasta.Stanford.EDU>
Date: Wed, 13 Mar 1991 13:23:43 PST
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305-4055
e-mail: schulman@sierra.stanford.edu
(415) 723-1430
Hours: M-Th 8-1 (PST)
RECENT PUBLICATIONS
LIST #19
SEPTEMBER 1990
To order reports, see Order Form on last page.
CSL-TR-90-425
CONCURRENT RUNTIME MONITORING OF FORMALLY SPECIFIED PROGRAMS
Manas Mandal and Sriram Sankar
April 1990 31 pages.....$5.48
This paper describes an application of formal specifications after an
executable program has been constructed. We describe how high level
specifications can be utilized to monitor critical aspects of the behavior of a
program continuously while it is executing. This methodology provides a
capability to distribute the monitoring of specifications on multi-processor
hardware platforms to meet practical time constraints.
Typically, runtime checking of formal specifications involves a significant
time penalty which makes it impractical during normal production opertion of a
program. In previous research, runtime checking has been applied during
testing and debugging of software, but not on a permanent basis.
Crucial to our current mthodology is the use of multi-processor machines -
hence runtime monitoring can be performed concurrently on different processors.
We describe techniques for distributing checks onto different processors.
To control the degree of concurrency, we introduce checkpoints - a point in the
program beyond which execution cannot proceed until the specified checks have
been completed. Error reporting and recovery in a multiprocessor environment
is complicated and there are various techniques of handling this. We describe
a few of these techniques in this paper.
An implementation of this methodology for the Anna specification language for
Ada programs is described. Results of experiments conducted on this
implementation using a 12-processor Sequent Symmetry demonstrate that permanent
concurrent monitoring of programs based on formal specifications is indeed
feasible.
-------------------------------------------------------------------------------
CSL-TR-90-426
A VLSI ARCHITECTURE FOR THE FCHC ISOMETRIC LATTICE GAS MODEL
Fung F. Lee, Michael J. Flynn, and Martin Morf
April 1990 28 pages.....$5.24
Lattice gas models are cellular automata used for the simulation of fluid
dynamics. This paper addresses the design issues of a lattice gas collision
rule processor for the four-dimensional FCHC isometric lattice gas model. A
novel VLSI architecture based on an optimized version of Henon's isometric
algorithm is proposed. One of the key concepts behind this architecture is the
permutation group representation of the isometry group of the lattice.
In contrast to the straightforward table lookup approach which would take 4.5
billion bits to implement this set of collision rules, the size of our
processor is only about 5000 gates. With a reasonable number of pipeline
stages, the processor can deliver one result per cycle with a cycle time
comparable to or less than that of a common commercial DRAM.
-------------------------------------------------------------------------------
CSL-TR-90-427
GENERALIZED GRAPHICAL OBJECT EDITING
John M. Vlissides
June 1990 166 pages.....$16.28
Many editors use the graphics capabilities of personal workstations to provide
a visual editing environment. Such editors present graphical representations
of familiar objects and allow the user to manipulate the representations
directly. This style of interaction is usually more intuitive to the user than
typing statements in a command language. However, implementing a graphical
object editor has been a difficult undertaking. Though many packages exist
that aid in the construction of graphical user interfaces, none are designed
specifically for building graphical object editors. This is significant
because there is a substantial semantic gap between general user interfaces and
the functionality of graphical object editors. For example, user interface
packages usually provide buttons, scroll bars, and ways to assemble them, but
they do not offer primitives for building drawing editors that produce
PostScript or schematic capture systems that produce netlists. Higher-level
abstractions are needed to make such applications easier to develop.
Unidraw is a framework for creating object-oriented graphical editors in
domains such as technical and artistic drawing, music composition, and circuit
design. The Unidraw architecture simplifies the construction of these editors
by providing programming abstractions that are common across domains. Unidraw
defines four basic abstractions: "components" encapsulate the appearance and
semantics of objects in a domain, "tools" support direct manipulation of
components, "commands" define operations on components and other objects, and
"external representations" define the mapping betweencomponents and the file
format generated by the editor. Unidraw also supports multiple views,
graphical connectivity and confinement, and dataflow between components. This
thesis describes the Unidraw design, implementation issues, and three
experimental domain-specific editors we have developed with Unidraw: a drawing
editor, a user interface builder, and a schematic capture system. Our results
indicate a substantial reduction in implementation time and effort compared
with existing tools.
-------------------------------------------------------------------------------
CSL-TR-90-428
SUB-NANOSECOND ARITHMETIC
Michael J. Flynn, Giovanni De Micheli, Robert Dutton, Bruce Wooley,
and Fabian Pease
May 1990 26 pages.....$5.08
The SNAP (Stanford Nanosecond Arithmetic Processor) project is targeted at
realizing an arithmetic processor with performance approximately an order of
magnitude faster than currently available technology. The realization of SNAP
is predicated on an interdisciplinary approach and effort spanning research in
algorithms, data representation, CAD, circuits and devices, and packaging.
SNAP is visualized as an arithmetic coprocessor implemented on an active
substrate containing several chips, each of which realize a particular
arithmetic function.
-------------------------------------------------------------------------------
CSL-TR-90-429
MULTIPROCESSOR SMALLTALK: IMPLEMENTATION, PERFORMANCE, AND ANALYSIS
Joseph Ira Pallas
June 1990 140 pages...$14.20
Multiprocessor Smalltalk demonstrates the value of object-oriented programming
on amultiprocessor. Its implementation and analysis shed light on three areas:
concurrent programming in an object-oriented language without special
extensions, implementation techniques for adapting to multiprocessors, and
performance factors in the resulting system.
Multiprocessor Smalltalk's performance shows that the combination of
multiprocessing and object-oriented programming can be effective: speedups
(relative to the original serial version) exceed 2.0 for five processors on all
the benchmarks; the median efficiency is 48%. Analysis shows both where
performance is lost and how to improve and generalize the experimental results.
Changes in the interpreter to support concurrency add at most 12% overhead;
better access to per-process variables could eliminate much of that. Changes
in the user code to express concurrency add as much as 70% overhead; this
overhead could be reduced to 54% if blocks (lambda expressions) were reentrant.
Performance is also lost when the program cannot keep all five processors busy.
Idle time in the interpreter (up to 51% overhead, excluding a pathological
case) could be reduced with a parallel garbage collector to 10%. Idle time in
user code (up to 35% overhead) remains the programmer's responsibility.
We can identify the key characteristics that make Multiprocessor Smalltalk
successful. The Smalltalk language allows us to build concurrent control
structures using lambda expressions without extending the language. Inexpensive
processes and efficient garbage collection are also crucial.
Hardware/operating-system support for shared memory, per-process variables, and
inexpensive synchronization are essential to the implementation. Given these,
object-oriented languages and multiprocessors are a good match.
-------------------------------------------------------------------------------
CSL-TR-90-430 (also numbered STAN-CS-90-1318)
TECHNIQUES FOR IMPROVING THE PERFORMANCE OF SPARSE MATRIX
FACTORIZATION ON MULTIPROCESSOR WORKSTATIONS
Edward Rothberg and Anoop Gupta
June 1990 16 pages.....$4.28
In this paper we look at the problem of factoring large sparse systems of
equations on high-performance multiprocessor workstations. While these
multiprocessor workstations are capable of very high peak floating point
computation rates, most existing sparse factorization codes achieve only a
small fraction of this potential. A major limiting factor is the cost of
memory accesses performed during the factorization. In this paper, we describe
a parallel factorization code which utilizes the supernodal structure of the
matrix to reduce the number of memory references. We also propose enhancements
that significantly reduce the overall cache miss rate. The result is greatly
increased factorization performance. We present experimental results from
executions of our codes on the Silicon Graphics 4D/380 multiprocessor. Using
eight processors, we find that the supernodal parallel code achieves a
computation rate of approximately 40 MFLOPS when factoring a range of benchmark
matrices. This is more than twice as fast as the parallel nodal code developed
at the Oak Ridge National Laboratory running on the SGI 4D/380.
-------------------------------------------------------------------------------
CSL-TR-90-431
LATENCY AND THROUGHPUT TRADEOFFS IN SELF-TIMED SPEED-INDEPENDENT
PIPELINES AND RINGS
Ted Williams
June 1990 29 pages....$5.32
Asynchronous pipelines control the flow of tokens through a sequence of logical
stages based on the status of local completion detectors. As in a
synchronously clocked circuit, the design of self-timed pipelines can trade off
between achieving low latencyn and high throughput. However, there are more
degress of freedom because of the variances in specific latch and function
block styles, and the possibility of varying both the number of latches between
function blocks and their connections to the completion detectors. This report
demonstrates the utility of a graph-based methodology for analyzing the timing
dependencies and uses it to make comparisons of different configurations. It is
shown that the extremes for high throughput and low latency differ
significantly, the placement of the completion detectors influences timing as
much as adding an additional latch, and the choice as to whether precharged or
static logic is best is dependent on the cost in complexity of the completion
detectors.
-------------------------------------------------------------------------------
CSL-TR-90-432
THE OLYMPUS SYNTHESIS SYSTEM FOR DIGITAL DESIGN
Giovanni De Micheli, David Ku, Fredric Mailhot, Thomas Truong
July 1990 29 pages.....$5.32
The Olympus Synthesis System, developed at Stanford University, is a vertically
integrated set of tools for the synthesis of digital circuit designs. The
system is specifically designed to support synthesis of Application-Specific
Integrated Circuits from behavioral-level descriptions, written in a hardware
description language called HardwareC. The Olympus system supports synthesis
with timing constraints at the behavioral, structural and logic levels. Two
internal models, Sequencing Intermediate Form, (SIF) and Structural/Logic
Intermediate Form (SLIF), are used to represent the hardware at different
levels of abstraction and to provide a way to pass design information between
the different tools. This paper describes Olympus as a system consisting of a
set of programs for behavioral, structural and logic synthesis, technology
mapping and simulation. The system has been used to design three ASIC chips at
Stanford University and it has been tested against benchmark circuits for
high-level and logic synthesis.
-------------------------------------------------------------------------------
CSL-TR-90-433
LIMITS ON MULTIPLE INSTRUCTION ISSUE
Michael D. Smith, Mike Johnson, and Mark A. Horowitz
July 1990 18 pages.....$4.44
This paper investigates the limitations on designing a processor which
cansustain an execution rate of greater than one instruction per cycle on
highly-optimized, non-scientific applications. We have used trace-driven
simulations to determine that these applications contain enough instruction
independence to sustain an instruction rate of about two instructions per
cycle. In a straightforward implementation, cost considerations argue strongly
against decoding more than two instructions in one cycle. Given this
constraint, the efficiency in instruction fetching rather than the complexity
of the execution hardware limits the concurrency attainable at the instruction
level.
-------------------------------------------------------------------------------
CSL-TR-90-434
BOOSTING BEYOND STATIC SCHEDULING IN A SUPERSCALAR PROCESSOR
Michael D. Smith, Monica S. Lam, and Mark A. Horowitz
July 1990 19 pages.....$4.52
This paper describes a superscalar processor that combines the best qualities
of static and dynamic instruction scheduling to increase the performance of
non-numerical applications. The architecture performs all instruction
scheduling statically to take advantage of the compiler's ability to
efficiently schedule operations across many basic blocks. Since the conditional
branches in non-numerical code are highly data dependent, the architecture
introduces the concept of boosted instructions, instructions that are committed
conditionally upon the result of later branch instructions. Boosting
effectively removes the dependences caused by branches and makes the scheduling
of side-effect instructions as simple as those that are side-effect free. For
efficiency, boosting is supported in the hardware by shadow structures that
temporarily hold the side effects of boosted instructions until the conditional
branches that the boosted instructions depend upon are executed. When the
branch condition is determined, the buffered side effects are either committed
or squashed. The limited static scheduler in our evaluation system shows that a
1.6-times speedup over scalar code is achievable by boosting instructions above
only a single conditional branch. This performance is similar to the
performance of a pure dynamic scheduler.
-------------------------------------------------------------------------------
CSL-TR-90-435
COLLABORATION TRANSPARENCY IN DESKTOP TELECONFERENCING ENVIRONMENTS
J. Chris Lauwers
July 1990 133 pages.....$13.64
The ultimate goal of desktop teleconferencing environments is to integrate
contemporary (non-computer-supported) audio/video teleconferencing technologies
with workstation-based network computing environments. This thesis studies the
application sharing facilities that allow multiple conference participants to
interact simultaneously with computer-based applications in a desktop
teleconference. It focuses on collaboration-transparent facilities that permit
existing single-user applications to be invoked from within a computer
conference. In the context of contemporary workstation-based environments, the
pursuit of collaboration transparency leads to the development of shared window
systems. This thesis introduces the basic architectures for shared window
systems and analyzes them based on experience with several prototypes.
Contemporary window system standards present many obstacles to shared window
designers. This thesis discusses these obstacles and suggests how they can be
overcome. In addition, new architectures for window systems are proposed that
can accommodate window sharing more effectively. The thesis then introduces
application replication as a technique for improving overall shared window
system performance. Application replication introduces many synchronization
problems, arising mainly from the need to keep replicated copies of shared
applications consistent. This thesis shows that the most frequent
synchronization problems can be solved without changing existing software. The
remaining problems can be solved by making applications or system servers
collaboration-aware. Finally, this thesis studies the ramifications, for the
software designer, of supporting spontaneous interactions, shared workspace
management, floor control, user customization, and annotations and
telepointing. While the recommendations that result are motivated by the
desire to enable continued use of collaboration-transparent applications,
addressing them involves the development of systems software that is distinctly
collaboration-aware.
-------------------------------------------------------------------------------
CSL-TR-90-436
APPLICATION OF FORMAL SPECIFICATION TO SOFTWARE MAINTENANCE
Neel Madhav and Sriram Sankar
August 1990 15 pages.....$4.20
This paper describes the use of formal specifications and associated tools in
addressing various aspects of software maintenance ---corrective, perfective,
and adaptive. It also addresses the refinement of the software development
process to build programs that are easily maintainable. The task of software
maintenance in our case includes the task of maintaining the specification as
well as maintaining the program.
We focus on the use of Anna, a specification language for formally specifying
Ada programs, to aid us in maintaining Ada programs. These techniques are
applicable to most other specification language and programming language
environments. The tools of interest are: (1) the Anna Specification Analyzer
which allows us to analyze the specification for correctness with respect to
our informal understanding of program behavior; and (2) the Anna Consistency
Checking System which monitors the Ada program at runtime based on the Anna
specification.
-------------------------------------------------------------------------------
CSL-TR-90-437
AN ADA---PROLOG SYSTEM
Neel Madhav
August 1990 15 pages.....$4.20
This paper presents a software development tool --- the Ada-Prolog system which
combines the strengths of both descriptive and procedural programming styles.
Concrete reasons and examples are provided to demonstrate that such a tool
would be useful.
This tool provides various operations available in Prolog for clausebuilding,
database building and querying to Ada programs.In addition to allowing dynamic
access to both Ada and Prolog, the Ada-Prolog system adds to the functionality
provided by Prolog by partitioning the Prolog database into lists of clauses.
These lists can be created, updated and destroyed dynamically. Concurrent
access to the list of clauses is also possible. Queries can be directed to
groups of these lists.
The system is meant for use in expert systems, compilers, database
applications, rapid prototyping systems, advanced environments, and other
software tools which use deduction.
-------------------------------------------------------------------------------
CSL-TR-90-438
A METHODOLOGY FOR FORMAL SPECIFICATION AND IMPLEMENTATION OF ADA
PACKAGES USING ANNA
Neel Madhav and Walter Mann
August, 1990 24 pages.....$4.92
This paper presents a methodology for formal specification and prototype
implementation of Ada packages using the Anna specification language.
Specifications play an important role in the software development cycle. The
methodology allows specifiers of Ada packages to follow a sequence of simple
steps to formally specify packages.
Given the formal specification of a package resulting from the methodology for
package specifications, the methodology allows implementors of packages to
follow a few simple steps to implement the package. The implementation is meant
to be a prototype.
This methodology for specification and implementation is applicable tomost Ada
packages. Limitations of this approach are pointed out at various points in the
paper.
We present software tools which help the process of specification and
implementation.
-------------------------------------------------------------------------------
CSL-TR-90-439
TANGO: A MULTIPROCESSOR SIMULATION AND TRACING SYSTEM
Helen Davis and Stephen R. Goldschmidt
July 1990 25 pages.....$5.00
Tango is a software simulation and tracing system used to obtain data for
evaluating parallel programs and multiprocessor systems. The system provides a
simulated multiprocessor environment by multiplexing application processes onto
a single processor. Tango achieves high efficiency by running compiled user
code, and by focusing on the information of greatest interest to
multiprocessing studies. The system is being applied to a wide range of
investigations, including algorithm studies and a variety of hardware
evaluations.
-------------------------------------------------------------------------------
CSL-TR-90-440
BIBLIOGRAPHY OF THE COMPUTER SYSTEMS LABORATORY - 1968-1990 (Third
Edition)
Naomi Schulman
August 1990 71 pages.....$10.00
The bibliography lists all technical reports and notes published by the
Computer Systems Laboratory of Stanford University, from 1968 to date. This
edition is in a binder where future announcements of new reports can be kept to
update the bibliography temporarily. When new reports are in sufficient
number, a whole new page will be created in the bibliography format. New pages
will be listed on future announcement Order Forms and will be sent at no charge
along with any order placed.
Prices of reports and notes are listed separately, starting on page ix. Please
note that report prices. are printed on yellow paper, and note prices are
printed on blue paper. These prices will supersede any previously given
prices, and may themselves be superseded in the future if printing and/or
postage costs rise.
The symbol (*), which, in earlier editions signified that a report was out of
print, is now used only to indicates that no "original" is available for
copying. All other reports will be made available upon request.
An earlier edition of the Bibliography (TR-272 or TR-336), plus copies of
announcements 15,16, 17, 18 and 19, will complete the list of publications.
However, new price lists will be needed to complete the updating of previous
editions. The TR Price List, 1990-91 and TN Price List, 1990-91 will be sent
upon request with any order placed.
-------------------------------------------------------------------------------
CSL-TR-90-441
COMPUTING TYPES DURING PROGRAM SPECIALIZATION
Daniel Weise and Erik Ruf
August 1990 26 pages.....$5.08
We have developed techniques for obtaining and using type information during
program specialization (partial evaluation). Computed along with every
residual expression and every specialized program is type information that
bounds the possible values that the specialized program will compute at run
time. The three keystones of this research are symbolic values that represent
both a value and the code for creating the value, generalization of symbolic
values, and the use of online fixed-point iterations for computing the type of
values returned by specialized recursive functions. The specializer exploits
type information to increase the efficiency of specialized functions. This
research has two benefits, one anticipated and one unanticipated. The
anticipated benefit is that programs that are to be specialized can now be
written in a more natural style without losing accuracy during specialization.
The unanticipated benefit is the creation of what we term concrete abstract
interpretation. This is a method of performing abstract interpretation with
concrete values where possible. The specializer abstracts values as needed,
instead of requiring that all values be abstracted prior to abstract
interpretation.
-------------------------------------------------------------------------------
CSR-TR-90-442
AN IMPROVED HIGH-SPEED FLOATING-POINT ADDITION ALGORITHM
Nhon T. Quach and Michael J. Flynn
August 1990 21 pages.....$4.68
This paper describes an improved, IEEE conforming floating-point addition
algorithm. This algorithm has only one addition step involving the significand
in the worst-case path, hence offering a considerable speed advantage over the
existing algorithms, which typically require two to three addition steps.
-------------------------------------------------------------------------------
CSL-TR-90-443
A QUEUING ANALYSIS FOR DISK ARRAY SYSTEMS
Mikito Ogata and Michael J. Flynn
August 1990 51 pages.....$7.08
Using a queuing model of disk arrays, we study the performance and tradeoffs in
disk array sub-systems and develop guidelines for designing these sub-systems
in various CPU environments. Finally, we compare our model with some earlier
simulation results.
-------------------------------------------------------------------------------
CSL-TR-90-444
EVALUATION OF A FOUR FOLD SPARSE DISTRIBUTED MEMORY PROTOTYPE
Brian Flachs
August 1990 58 pages.....$7.64
Sparse Distributed Memory is a generalized random access memory for long binary
words. This document describes changes made to the Single Fold Prototype to
develop Stanford's Four Fold Prototype. A users manual for the libraries and
utilities developed for the prototype and a performance evaluation of the
prototype are also included. Appendices detail hardware settings and provide a
complete, up-to-date, set of Address Module Schematics.
-------------------------------------------------------------------------------
CSL-TR-90-445 (also numbered STAN-CS-90-1334)
SOFT CONFIGURABLE WAFER SCALE INTEGRATION: DESIGN, IMPLEMENTATION AND
YIELD ANALYSIS
Miriam Greta Blatt
June 1990 123.....$12.84
Soft-Configurable Wafer Scale Integration uses software controlled switches to
connect up the fault-free parts of a wafer. Compared to hard configuration, th
e soft configurable approach has the advantages of providing low-cost
connections and runtime fault tolerance. The dissertation describes how to
achieve soft co nfiguration with high performance, presenting a pipelined
memory system implemented using this approach. The yield of the prototype is
evaluated in two phases. Fault simulation applies measured defect statistics
to the layout to predict the yield of each circuit unit. These unit yields are
combined to produce wafer yields using redundancy models appropriate to wafer
scale integration. The redundancy models constrain wafer yield by system
requirements such as the minimum n umber of working circuit units, and whether
these working units are distributed evenly around the wafer. Choice of
redundancy model significantly affects the r esulting wafer yield.
-------------------------------------------------------------------------------
CSL-TR-90-446
ANALYZING THE FAULT TOLERANCE OF A CLASS OF DOUBLE-LOOP NETWORKS
Jon M. Peha and Fouad A. Tobagi
September 1990 30 pages.....$5.40
This paper analyzes the fault tolerance of a class of double-loop networks
referred to as forward-loop backward-hop (FLBH), in which each node is
connected via unidirectional links to the node one hop in front of it and to
the node S hops in back of it for some S. A new measure of fault tolerance is
described, along with techniques based on Markov chains to calculate upper and
lower bounds on the fault tolerance of this network topology quickly and
efficiently. The result s of these calculations provide a more precise
description of network fault tolerance than has been achieved with previously
published techniques.
-------------------------------------------------------------------------------
CSL-TR-90-447
EVALUATION OF SCHEDULING ALGORITHMS FOR INTEGRATED-SERVICES
PACKET-SWITCHED NETWORKS
Jon Peha and Fouad A. Tobagi
September 1990 31 pages.....$5.48
In this paper, we consider integrated-services packet-switched networks with
two types of traffic: traffic with deadlines, for which the most important
perform ance objective is based on loss rate, and packets without deadlines,
for which the most important performance objective is based on mean delay. We
present an o ptimal scheduling algorithm to minimize weighted loss rate and
weighted mean delay in the queues that form at the switches and at the network
access points of a packet-switched network, where weights reflect the relative
importance of packets. Although not practical for implementation, the
algorithm is intended as a standard for comparison with which the performance
of other scheduling algorithms can be evaluated. Our algorithm is more general
and of lower computational co mplexity than previously published algorithms,
enabling us to evaluate performance in some important scenarios that could not
previously have been considered. U sing optimal performance results from our
algorithm, we evaluate the performance of the first-come-first-served, static
priority, and earliest deadline first al gorithms, finding some limitations of
each. Our results suggest that network efficiency could be improved by using a
more sophisticated heuristic scheduling alg orithm rather than one of the
aforementioned algorithms.
-------------------------------------------------------------------------------
CSL-TR-90-448
A COST-BASED SCHEDULING ALGORITHM TO SUPPORT INTEGRATED SERVICES
Jon Peha and Fouad A. Tobagi
September 1990 44 pages.....$6.52
It is becoming increasingly important to support applications with diverse
performance objectives on a single packet-switched network. The efficiency of
a netw ork with such diverse traffic can be greatly improved through the use of
sophisticated scheduling and dropping algorithms within the queues that form at
the net work access points and in switches throughout the network. This paper
presents heuristic algorithms for that purpose. In our approach, arbitrary
performance o bjectives are defined in the form of cost functions, which map
the queueing delay experienced by each packet to a cost incurred.
Our algorithms, cost-based scheduling (CBS) and cost-based dropping (CBD), then
attempt to optimize network performance as perceived by the applications by
mini mizing the total cost incurred by all packets. Cost functions are
presented that are appropriate for most common applications. Scheduling and
dropping algorit hms are defined based on these cost functions. Through
simulation, it is demonstrated that network performance is better when these
heuristic algorithms are use d as opposed to the common alternatives.
-------------------------------------------------------------------------------
CSL-TR-90-449 (also numbered STAN-CS-90-1330)
PARALLEL ICCG ON A HIERARCHICAL MEMORY MULTIPROCESSOR -- ADDRESSING
THE TRIANGULAR SOLVE BOTTLENECK
Edward Rothberg and Anoop Gupta
September 1990 22 pages.....$4.76
The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly used
iterative method for solving large sparse systems of equations. In this paper,
w e study the parallel solution of sparse triangular systems of equations, the
most difficult aspect of implementing the ICCG method on a multiprocessor. We
focu s on shared-memory multiprocessor architectures with deep memory
hierarchies. On such architectures we find that previously proposed
parallelization approaches result in little or no speedup. The reason is that
these approaches cause significant increases in the amount of memory system
traffic as compared to a sequen tial approach. Increases of as much as a
factor of 10 on four processors were observed. In this paper we propose new
techniques for limiting these increases, including data remappings to increase
spatial locality, new processor synchronization techniques to decrease the use
of auxiliary data structures, and data part itioning techniques to reduce the
amount of interprocessor communication. With these techniques, memory system
traffic is reduced to as little as one sixth of its previous volume. The
resulting speedups are greatly improved as well, although they are still much
less than linear. We discuss the factors that limit fur ther speedups. We
present both simulation results and results of experiments on an SGI 4D/340
multiprocessor.
-------------------------------------------------------------------------------
CSL-TR-90-450 (also numbered STAN-CS-90-1333)
COMPARING STRUCTURALLY DIFFERENT VIEWS OF A VLSI DESIGN
Michael Joseph Spreitzer
June 1990 162 pages.....$15.96
VLSI designers often use structural hierarchies and alternate views of a
design. Allowing alternate views to have different hierarchies improves the
clarity of the views, but complicates comparison of those views. Most existing
comparison techniques either require essentially identical hierarchies, or can
only handle differences by flattening. A new technique, Informed Comparison,
first reconciles the hierarchy differenc es by making purely structural
transformations according to a description of the intended relationship between
the hierarchies, and then finishes with an existi ng hierarchical technique.
Informed Comparison thus can compare views with different hierarchies without
the penalties associated with flattening.
-------------------------------------------------------------------------------
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305-4055
E-mail: schulman@sierra.stanford.edu
415-723-1430
Hours: M-Th, 8-1 (PST)
ORDER FORM AND INVOICE
ALL ORDERS MUST BE PREPAID
Please print or type your name and address in the space provided. Check
the number(s) of the report(s) you wish to purchase and circle the price of
hardcopy or microfiche. California Residents must add the sales tax of their
own local county. Return this order form with a check or money order made
payable to Stanford University.
Foreign Orders must include $1.00 extra for each $15.00's worth of reports to
cover postage. Checks must be drawn on a U.S. bank, payable in U.S.
dollars. If an invoice is required for payment, please note that this form
is both an order form and an invoice. We cannot invoice separately.
(Name) ________________________________
(Address)_______________________________
________________________________
________________________________
________________________________
E-mail address:________________________________
REPORT # HARDCOPY FICHE REPORT # HARDCOPY FICHE
TR-425 $ 5.48 $2.00 TR-438 $ 4.92 $2.00
TR-426 $ 5.24 $2.00 TR-439 $ 5.00 $2.00
TR-427 $16.28 $3.00 TR-440 $10.00 $2.00
TR-428 $ 5.08 $2.00 TR-441 $ 5.08 $2.00
TR-429 $ 4.20 $2.00 TR-442 $ 4.68 $2.00
TR-430 $ 4.28 $2.00 TR-443 $ 7.08 $2.00
TR-431 $ 5.32 $2.00 TR-444 $ 7.64 $2.00
TR-432 $ 5.32 $2.00 TR-445 $12.84 $3.00
TR-433 $ 4.44 $2.00 TR-446 $ 5.40 $2.00
TR-434 $ 4.52 $2.00 TR-447 $12.84 $3.00
TR-435 $13.64 $3.00 TR-448 $ 5.40 $2.00
TR-436 $ 4.20 $2.00 TR-449 $ 5.48 $2.00
TR-437 $ 4.20 $2.00 TR-450 $15.96 $3.00
Subtotal __________
CA tax or Foreign Surcharge __________
Total __________
Circle which ones you want: Announcement 15, 16, 17, 18
Check here ___ to receive 1990-91 Price Lists (with report order, only)
** END OF MESSAGE **
Date: 1991-03-13 15:22:49
Msg-ID: RFC-822
CMM.0.88.668899423.schulman@shasta.Stanford.EDU OU=
VE-GW ON=BITNET
To: mflll002
CC: schulman@shasta.Stanford.EDU OU=VE-GW ON=BITNET
From: Naomi Schulman
schulman@shasta.Stanford.EDU OU=VE-GW ON=BITNET
Subj: Announcement #19
------------ Letter Body Part 1 - Text ------------
RFC-822-HEADERS:
Return-Path: <schulman@shasta.Stanford.EDU>
Date: Wed, 13 Mar 1991 13:23:43 PST
------------ Letter Body Part 2 - Text ------------
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305-4055
e-mail: schulman@sierra.stanford.edu
(415) 723-1430
Hours: M-Th 8-1 (PST)
RECENT PUBLICATIONS
LIST #19
SEPTEMBER 1990
To order reports, see Order Form on last page.
CSL-TR-90-425
CONCURRENT RUNTIME MONITORING OF FORMALLY SPECIFIED PROGRAMS
Manas Mandal and Sriram Sankar
April 1990 31 pages.....$5.48
This paper describes an application of formal specifications after an
executable program has been constructed. We describe how high level
specifications can be utilized to monitor critical aspects of the behavior of a
program continuously while it is executing. This methodology provides a
capability to distribute the monitoring of specifications on multi-processor
hardware platforms to meet practical time constraints.
Typically, runtime checking of formal specifications involves a significant
time penalty which makes it impractical during normal production opertion of a
program. In previous research, runtime checking has been applied during
testing and debugging of software, but not on a permanent basis.
Crucial to our current mthodology is the use of multi-processor machines -
hence runtime monitoring can be performed concurrently on different processors.
We describe techniques for distributing checks onto different processors.
To control the degree of concurrency, we introduce checkpoints - a point in the
program beyond which execution cannot proceed until the specified checks have
been completed. Error reporting and recovery in a multiprocessor environment
is complicated and there are various techniques of handling this. We describe
a few of these techniques in this paper.
An implementation of this methodology for the Anna specification language for
Ada programs is described. Results of experiments conducted on this
implementation using a 12-processor Sequent Symmetry demonstrate that permanent
concurrent monitoring of programs based on formal specifications is indeed
feasible.
-------------------------------------------------------------------------------
CSL-TR-90-426
A VLSI ARCHITECTURE FOR THE FCHC ISOMETRIC LATTICE GAS MODEL
Fung F. Lee, Michael J. Flynn, and Martin Morf
April 1990 28 pages.....$5.24
Lattice gas models are cellular automata used for the simulation of fluid
dynamics. This paper addresses the design issues of a lattice gas collision
rule processor for the four-dimensional FCHC isometric lattice gas model. A
novel VLSI architecture based on an optimized version of Henon's isometric
algorithm is proposed. One of the key concepts behind this architecture is the
permutation group representation of the isometry group of the lattice.
In contrast to the straightforward table lookup approach which would take 4.5
billion bits to implement this set of collision rules, the size of our
processor is only about 5000 gates. With a reasonable number of pipeline
stages, the processor can deliver one result per cycle with a cycle time
comparable to or less than that of a common commercial DRAM.
-------------------------------------------------------------------------------
CSL-TR-90-427
GENERALIZED GRAPHICAL OBJECT EDITING
John M. Vlissides
June 1990 166 pages.....$16.28
Many editors use the graphics capabilities of personal workstations to provide
a visual editing environment. Such editors present graphical representations
of familiar objects and allow the user to manipulate the representations
directly. This style of interaction is usually more intuitive to the user than
typing statements in a command language. However, implementing a graphical
object editor has been a difficult undertaking. Though many packages exist
that aid in the construction of graphical user interfaces, none are designed
specifically for building graphical object editors. This is significant
because there is a substantial semantic gap between general user interfaces and
the functionality of graphical object editors. For example, user interface
packages usually provide buttons, scroll bars, and ways to assemble them, but
they do not offer primitives for building drawing editors that produce
PostScript or schematic capture systems that produce netlists. Higher-level
abstractions are needed to make such applications easier to develop.
Unidraw is a framework for creating object-oriented graphical editors in
domains such as technical and artistic drawing, music composition, and circuit
design. The Unidraw architecture simplifies the construction of these editors
by providing programming abstractions that are common across domains. Unidraw
defines four basic abstractions: "components" encapsulate the appearance and
semantics of objects in a domain, "tools" support direct manipulation of
components, "commands" define operations on components and other objects, and
"external representations" define the mapping betweencomponents and the file
format generated by the editor. Unidraw also supports multiple views,
graphical connectivity and confinement, and dataflow between components. This
thesis describes the Unidraw design, implementation issues, and three
experimental domain-specific editors we have developed with Unidraw: a drawing
editor, a user interface builder, and a schematic capture system. Our results
indicate a substantial reduction in implementation time and effort compared
with existing tools.
-------------------------------------------------------------------------------
CSL-TR-90-428
SUB-NANOSECOND ARITHMETIC
Michael J. Flynn, Giovanni De Micheli, Robert Dutton, Bruce Wooley,
and Fabian Pease
May 1990 26 pages.....$5.08
The SNAP (Stanford Nanosecond Arithmetic Processor) project is targeted at
realizing an arithmetic processor with performance approximately an order of
magnitude faster than currently available technology. The realization of SNAP
is predicated on an interdisciplinary approach and effort spanning research in
algorithms, data representation, CAD, circuits and devices, and packaging.
SNAP is visualized as an arithmetic coprocessor implemented on an active
substrate containing several chips, each of which realize a particular
arithmetic function.
-------------------------------------------------------------------------------
CSL-TR-90-429
MULTIPROCESSOR SMALLTALK: IMPLEMENTATION, PERFORMANCE, AND ANALYSIS
Joseph Ira Pallas
June 1990 140 pages...$14.20
Multiprocessor Smalltalk demonstrates the value of object-oriented programming
on amultiprocessor. Its implementation and analysis shed light on three areas:
concurrent programming in an object-oriented language without special
extensions, implementation techniques for adapting to multiprocessors, and
performance factors in the resulting system.
Multiprocessor Smalltalk's performance shows that the combination of
multiprocessing and object-oriented programming can be effective: speedups
(relative to the original serial version) exceed 2.0 for five processors on all
the benchmarks; the median efficiency is 48%. Analysis shows both where
performance is lost and how to improve and generalize the experimental results.
Changes in the interpreter to support concurrency add at most 12% overhead;
better access to per-process variables could eliminate much of that. Changes
in the user code to express concurrency add as much as 70% overhead; this
overhead could be reduced to 54% if blocks (lambda expressions) were reentrant.
Performance is also lost when the program cannot keep all five processors busy.
Idle time in the interpreter (up to 51% overhead, excluding a pathological
case) could be reduced with a parallel garbage collector to 10%. Idle time in
user code (up to 35% overhead) remains the programmer's responsibility.
We can identify the key characteristics that make Multiprocessor Smalltalk
successful. The Smalltalk language allows us to build concurrent control
structures using lambda expressions without extending the language. Inexpensive
processes and efficient garbage collection are also crucial.
Hardware/operating-system support for shared memory, per-process variables, and
inexpensive synchronization are essential to the implementation. Given these,
object-oriented languages and multiprocessors are a good match.
-------------------------------------------------------------------------------
CSL-TR-90-430 (also numbered STAN-CS-90-1318)
TECHNIQUES FOR IMPROVING THE PERFORMANCE OF SPARSE MATRIX
FACTORIZATION ON MULTIPROCESSOR WORKSTATIONS
Edward Rothberg and Anoop Gupta
June 1990 16 pages.....$4.28
In this paper we look at the problem of factoring large sparse systems of
equations on high-performance multiprocessor workstations. While these
multiprocessor workstations are capable of very high peak floating point
computation rates, most existing sparse factorization codes achieve only a
small fraction of this potential. A major limiting factor is the cost of
memory accesses performed during the factorization. In this paper, we describe
a parallel factorization code which utilizes the supernodal structure of the
matrix to reduce the number of memory references. We also propose enhancements
that significantly reduce the overall cache miss rate. The result is greatly
increased factorization performance. We present experimental results from
executions of our codes on the Silicon Graphics 4D/380 multiprocessor. Using
eight processors, we find that the supernodal parallel code achieves a
computation rate of approximately 40 MFLOPS when factoring a range of benchmark
matrices. This is more than twice as fast as the parallel nodal code developed
at the Oak Ridge National Laboratory running on the SGI 4D/380.
-------------------------------------------------------------------------------
CSL-TR-90-431
LATENCY AND THROUGHPUT TRADEOFFS IN SELF-TIMED SPEED-INDEPENDENT
PIPELINES AND RINGS
Ted Williams
June 1990 29 pages....$5.32
Asynchronous pipelines control the flow of tokens through a sequence of logical
stages based on the status of local completion detectors. As in a
synchronously clocked circuit, the design of self-timed pipelines can trade off
between achieving low latencyn and high throughput. However, there are more
degress of freedom because of the variances in specific latch and function
block styles, and the possibility of varying both the number of latches between
function blocks and their connections to the completion detectors. This report
demonstrates the utility of a graph-based methodology for analyzing the timing
dependencies and uses it to make comparisons of different configurations. It is
shown that the extremes for high throughput and low latency differ
significantly, the placement of the completion detectors influences timing as
much as adding an additional latch, and the choice as to whether precharged or
static logic is best is dependent on the cost in complexity of the completion
detectors.
-------------------------------------------------------------------------------
CSL-TR-90-432
THE OLYMPUS SYNTHESIS SYSTEM FOR DIGITAL DESIGN
Giovanni De Micheli, David Ku, Fredric Mailhot, Thomas Truong
July 1990 29 pages.....$5.32
The Olympus Synthesis System, developed at Stanford University, is a vertically
integrated set of tools for the synthesis of digital circuit designs. The
system is specifically designed to support synthesis of Application-Specific
Integrated Circuits from behavioral-level descriptions, written in a hardware
description language called HardwareC. The Olympus system supports synthesis
with timing constraints at the behavioral, structural and logic levels. Two
internal models, Sequencing Intermediate Form, (SIF) and Structural/Logic
Intermediate Form (SLIF), are used to represent the hardware at different
levels of abstraction and to provide a way to pass design information between
the different tools. This paper describes Olympus as a system consisting of a
set of programs for behavioral, structural and logic synthesis, technology
mapping and simulation. The system has been used to design three ASIC chips at
Stanford University and it has been tested against benchmark circuits for
high-level and logic synthesis.
-------------------------------------------------------------------------------
CSL-TR-90-433
LIMITS ON MULTIPLE INSTRUCTION ISSUE
Michael D. Smith, Mike Johnson, and Mark A. Horowitz
July 1990 18 pages.....$4.44
This paper investigates the limitations on designing a processor which
cansustain an execution rate of greater than one instruction per cycle on
highly-optimized, non-scientific applications. We have used trace-driven
simulations to determine that these applications contain enough instruction
independence to sustain an instruction rate of about two instructions per
cycle. In a straightforward implementation, cost considerations argue strongly
against decoding more than two instructions in one cycle. Given this
constraint, the efficiency in instruction fetching rather than the complexity
of the execution hardware limits the concurrency attainable at the instruction
level.
-------------------------------------------------------------------------------
CSL-TR-90-434
BOOSTING BEYOND STATIC SCHEDULING IN A SUPERSCALAR PROCESSOR
Michael D. Smith, Monica S. Lam, and Mark A. Horowitz
July 1990 19 pages.....$4.52
This paper describes a superscalar processor that combines the best qualities
of static and dynamic instruction scheduling to increase the performance of
non-numerical applications. The architecture performs all instruction
scheduling statically to take advantage of the compiler's ability to
efficiently schedule operations across many basic blocks. Since the conditional
branches in non-numerical code are highly data dependent, the architecture
introduces the concept of boosted instructions, instructions that are committed
conditionally upon the result of later branch instructions. Boosting
effectively removes the dependences caused by branches and makes the scheduling
of side-effect instructions as simple as those that are side-effect free. For
efficiency, boosting is supported in the hardware by shadow structures that
temporarily hold the side effects of boosted instructions until the conditional
branches that the boosted instructions depend upon are executed. When the
branch condition is determined, the buffered side effects are either committed
or squashed. The limited static scheduler in our evaluation system shows that a
1.6-times speedup over scalar code is achievable by boosting instructions above
only a single conditional branch. This performance is similar to the
performance of a pure dynamic scheduler.
-------------------------------------------------------------------------------
CSL-TR-90-435
COLLABORATION TRANSPARENCY IN DESKTOP TELECONFERENCING ENVIRONMENTS
J. Chris Lauwers
July 1990 133 pages.....$13.64
The ultimate goal of desktop teleconferencing environments is to integrate
contemporary (non-computer-supported) audio/video teleconferencing technologies
with workstation-based network computing environments. This thesis studies the
application sharing facilities that allow multiple conference participants to
interact simultaneously with computer-based applications in a desktop
teleconference. It focuses on collaboration-transparent facilities that permit
existing single-user applications to be invoked from within a computer
conference. In the context of contemporary workstation-based environments, the
pursuit of collaboration transparency leads to the development of shared window
systems. This thesis introduces the basic architectures for shared window
systems and analyzes them based on experience with several prototypes.
Contemporary window system standards present many obstacles to shared window
designers. This thesis discusses these obstacles and suggests how they can be
overcome. In addition, new architectures for window systems are proposed that
can accommodate window sharing more effectively. The thesis then introduces
application replication as a technique for improving overall shared window
system performance. Application replication introduces many synchronization
problems, arising mainly from the need to keep replicated copies of shared
applications consistent. This thesis shows that the most frequent
synchronization problems can be solved without changing existing software. The
remaining problems can be solved by making applications or system servers
collaboration-aware. Finally, this thesis studies the ramifications, for the
software designer, of supporting spontaneous interactions, shared workspace
management, floor control, user customization, and annotations and
telepointing. While the recommendations that result are motivated by the
desire to enable continued use of collaboration-transparent applications,
addressing them involves the development of systems software that is distinctly
collaboration-aware.
-------------------------------------------------------------------------------
CSL-TR-90-436
APPLICATION OF FORMAL SPECIFICATION TO SOFTWARE MAINTENANCE
Neel Madhav and Sriram Sankar
August 1990 15 pages.....$4.20
This paper describes the use of formal specifications and associated tools in
addressing various aspects of software maintenance ---corrective, perfective,
and adaptive. It also addresses the refinement of the software development
process to build programs that are easily maintainable. The task of software
maintenance in our case includes the task of maintaining the specification as
well as maintaining the program.
We focus on the use of Anna, a specification language for formally specifying
Ada programs, to aid us in maintaining Ada programs. These techniques are
applicable to most other specification language and programming language
environments. The tools of interest are: (1) the Anna Specification Analyzer
which allows us to analyze the specification for correctness with respect to
our informal understanding of program behavior; and (2) the Anna Consistency
Checking System which monitors the Ada program at runtime based on the Anna
specification.
-------------------------------------------------------------------------------
CSL-TR-90-437
AN ADA---PROLOG SYSTEM
Neel Madhav
August 1990 15 pages.....$4.20
This paper presents a software development tool --- the Ada-Prolog system which
combines the strengths of both descriptive and procedural programming styles.
Concrete reasons and examples are provided to demonstrate that such a tool
would be useful.
This tool provides various operations available in Prolog for clausebuilding,
database building and querying to Ada programs.In addition to allowing dynamic
access to both Ada and Prolog, the Ada-Prolog system adds to the functionality
provided by Prolog by partitioning the Prolog database into lists of clauses.
These lists can be created, updated and destroyed dynamically. Concurrent
access to the list of clauses is also possible. Queries can be directed to
groups of these lists.
The system is meant for use in expert systems, compilers, database
applications, rapid prototyping systems, advanced environments, and other
software tools which use deduction.
-------------------------------------------------------------------------------
CSL-TR-90-438
A METHODOLOGY FOR FORMAL SPECIFICATION AND IMPLEMENTATION OF ADA
PACKAGES USING ANNA
Neel Madhav and Walter Mann
August, 1990 24 pages.....$4.92
This paper presents a methodology for formal specification and prototype
implementation of Ada packages using the Anna specification language.
Specifications play an important role in the software development cycle. The
methodology allows specifiers of Ada packages to follow a sequence of simple
steps to formally specify packages.
Given the formal specification of a package resulting from the methodology for
package specifications, the methodology allows implementors of packages to
follow a few simple steps to implement the package. The implementation is meant
to be a prototype.
This methodology for specification and implementation is applicable tomost Ada
packages. Limitations of this approach are pointed out at various points in the
paper.
We present software tools which help the process of specification and
implementation.
-------------------------------------------------------------------------------
CSL-TR-90-439
TANGO: A MULTIPROCESSOR SIMULATION AND TRACING SYSTEM
Helen Davis and Stephen R. Goldschmidt
July 1990 25 pages.....$5.00
Tango is a software simulation and tracing system used to obtain data for
evaluating parallel programs and multiprocessor systems. The system provides a
simulated multiprocessor environment by multiplexing application processes onto
a single processor. Tango achieves high efficiency by running compiled user
code, and by focusing on the information of greatest interest to
multiprocessing studies. The system is being applied to a wide range of
investigations, including algorithm studies and a variety of hardware
evaluations.
-------------------------------------------------------------------------------
CSL-TR-90-440
BIBLIOGRAPHY OF THE COMPUTER SYSTEMS LABORATORY - 1968-1990 (Third
Edition)
Naomi Schulman
August 1990 71 pages.....$10.00
The bibliography lists all technical reports and notes published by the
Computer Systems Laboratory of Stanford University, from 1968 to date. This
edition is in a binder where future announcements of new reports can be kept to
update the bibliography temporarily. When new reports are in sufficient
number, a whole new page will be created in the bibliography format. New pages
will be listed on future announcement Order Forms and will be sent at no charge
along with any order placed.
Prices of reports and notes are listed separately, starting on page ix. Please
note that report prices. are printed on yellow paper, and note prices are
printed on blue paper. These prices will supersede any previously given
prices, and may themselves be superseded in the future if printing and/or
postage costs rise.
The symbol (*), which, in earlier editions signified that a report was out of
print, is now used only to indicates that no "original" is available for
copying. All other reports will be made available upon request.
An earlier edition of the Bibliography (TR-272 or TR-336), plus copies of
announcements 15,16, 17, 18 and 19, will complete the list of publications.
However, new price lists will be needed to complete the updating of previous
editions. The TR Price List, 1990-91 and TN Price List, 1990-91 will be sent
upon request with any order placed.
-------------------------------------------------------------------------------
CSL-TR-90-441
COMPUTING TYPES DURING PROGRAM SPECIALIZATION
Daniel Weise and Erik Ruf
August 1990 26 pages.....$5.08
We have developed techniques for obtaining and using type information during
program specialization (partial evaluation). Computed along with every
residual expression and every specialized program is type information that
bounds the possible values that the specialized program will compute at run
time. The three keystones of this research are symbolic values that represent
both a value and the code for creating the value, generalization of symbolic
values, and the use of online fixed-point iterations for computing the type of
values returned by specialized recursive functions. The specializer exploits
type information to increase the efficiency of specialized functions. This
research has two benefits, one anticipated and one unanticipated. The
anticipated benefit is that programs that are to be specialized can now be
written in a more natural style without losing accuracy during specialization.
The unanticipated benefit is the creation of what we term concrete abstract
interpretation. This is a method of performing abstract interpretation with
concrete values where possible. The specializer abstracts values as needed,
instead of requiring that all values be abstracted prior to abstract
interpretation.
-------------------------------------------------------------------------------
CSR-TR-90-442
AN IMPROVED HIGH-SPEED FLOATING-POINT ADDITION ALGORITHM
Nhon T. Quach and Michael J. Flynn
August 1990 21 pages.....$4.68
This paper describes an improved, IEEE conforming floating-point addition
algorithm. This algorithm has only one addition step involving the significand
in the worst-case path, hence offering a considerable speed advantage over the
existing algorithms, which typically require two to three addition steps.
-------------------------------------------------------------------------------
CSL-TR-90-443
A QUEUING ANALYSIS FOR DISK ARRAY SYSTEMS
Mikito Ogata and Michael J. Flynn
August 1990 51 pages.....$7.08
Using a queuing model of disk arrays, we study the performance and tradeoffs in
disk array sub-systems and develop guidelines for designing these sub-systems
in various CPU environments. Finally, we compare our model with some earlier
simulation results.
-------------------------------------------------------------------------------
CSL-TR-90-444
EVALUATION OF A FOUR FOLD SPARSE DISTRIBUTED MEMORY PROTOTYPE
Brian Flachs
August 1990 58 pages.....$7.64
Sparse Distributed Memory is a generalized random access memory for long binary
words. This document describes changes made to the Single Fold Prototype to
develop Stanford's Four Fold Prototype. A users manual for the libraries and
utilities developed for the prototype and a performance evaluation of the
prototype are also included. Appendices detail hardware settings and provide a
complete, up-to-date, set of Address Module Schematics.
-------------------------------------------------------------------------------
CSL-TR-90-445 (also numbered STAN-CS-90-1334)
SOFT CONFIGURABLE WAFER SCALE INTEGRATION: DESIGN, IMPLEMENTATION AND
YIELD ANALYSIS
Miriam Greta Blatt
June 1990 123.....$12.84
Soft-Configurable Wafer Scale Integration uses software controlled switches to
connect up the fault-free parts of a wafer. Compared to hard configuration, th
e soft configurable approach has the advantages of providing low-cost
connections and runtime fault tolerance. The dissertation describes how to
achieve soft co nfiguration with high performance, presenting a pipelined
memory system implemented using this approach. The yield of the prototype is
evaluated in two phases. Fault simulation applies measured defect statistics
to the layout to predict the yield of each circuit unit. These unit yields are
combined to produce wafer yields using redundancy models appropriate to wafer
scale integration. The redundancy models constrain wafer yield by system
requirements such as the minimum n umber of working circuit units, and whether
these working units are distributed evenly around the wafer. Choice of
redundancy model significantly affects the r esulting wafer yield.
-------------------------------------------------------------------------------
CSL-TR-90-446
ANALYZING THE FAULT TOLERANCE OF A CLASS OF DOUBLE-LOOP NETWORKS
Jon M. Peha and Fouad A. Tobagi
September 1990 30 pages.....$5.40
This paper analyzes the fault tolerance of a class of double-loop networks
referred to as forward-loop backward-hop (FLBH), in which each node is
connected via unidirectional links to the node one hop in front of it and to
the node S hops in back of it for some S. A new measure of fault tolerance is
described, along with techniques based on Markov chains to calculate upper and
lower bounds on the fault tolerance of this network topology quickly and
efficiently. The result s of these calculations provide a more precise
description of network fault tolerance than has been achieved with previously
published techniques.
-------------------------------------------------------------------------------
CSL-TR-90-447
EVALUATION OF SCHEDULING ALGORITHMS FOR INTEGRATED-SERVICES
PACKET-SWITCHED NETWORKS
Jon Peha and Fouad A. Tobagi
September 1990 31 pages.....$5.48
In this paper, we consider integrated-services packet-switched networks with
two types of traffic: traffic with deadlines, for which the most important
perform ance objective is based on loss rate, and packets without deadlines,
for which the most important performance objective is based on mean delay. We
present an o ptimal scheduling algorithm to minimize weighted loss rate and
weighted mean delay in the queues that form at the switches and at the network
access points of a packet-switched network, where weights reflect the relative
importance of packets. Although not practical for implementation, the
algorithm is intended as a standard for comparison with which the performance
of other scheduling algorithms can be evaluated. Our algorithm is more general
and of lower computational co mplexity than previously published algorithms,
enabling us to evaluate performance in some important scenarios that could not
previously have been considered. U sing optimal performance results from our
algorithm, we evaluate the performance of the first-come-first-served, static
priority, and earliest deadline first al gorithms, finding some limitations of
each. Our results suggest that network efficiency could be improved by using a
more sophisticated heuristic scheduling alg orithm rather than one of the
aforementioned algorithms.
-------------------------------------------------------------------------------
CSL-TR-90-448
A COST-BASED SCHEDULING ALGORITHM TO SUPPORT INTEGRATED SERVICES
Jon Peha and Fouad A. Tobagi
September 1990 44 pages.....$6.52
It is becoming increasingly important to support applications with diverse
performance objectives on a single packet-switched network. The efficiency of
a netw ork with such diverse traffic can be greatly improved through the use of
sophisticated scheduling and dropping algorithms within the queues that form at
the net work access points and in switches throughout the network. This paper
presents heuristic algorithms for that purpose. In our approach, arbitrary
performance o bjectives are defined in the form of cost functions, which map
the queueing delay experienced by each packet to a cost incurred.
Our algorithms, cost-based scheduling (CBS) and cost-based dropping (CBD), then
attempt to optimize network performance as perceived by the applications by
mini mizing the total cost incurred by all packets. Cost functions are
presented that are appropriate for most common applications. Scheduling and
dropping algorit hms are defined based on these cost functions. Through
simulation, it is demonstrated that network performance is better when these
heuristic algorithms are use d as opposed to the common alternatives.
-------------------------------------------------------------------------------
CSL-TR-90-449 (also numbered STAN-CS-90-1330)
PARALLEL ICCG ON A HIERARCHICAL MEMORY MULTIPROCESSOR -- ADDRESSING
THE TRIANGULAR SOLVE BOTTLENECK
Edward Rothberg and Anoop Gupta
September 1990 22 pages.....$4.76
The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly used
iterative method for solving large sparse systems of equations. In this paper,
w e study the parallel solution of sparse triangular systems of equations, the
most difficult aspect of implementing the ICCG method on a multiprocessor. We
focu s on shared-memory multiprocessor architectures with deep memory
hierarchies. On such architectures we find that previously proposed
parallelization approaches result in little or no speedup. The reason is that
these approaches cause significant increases in the amount of memory system
traffic as compared to a sequen tial approach. Increases of as much as a
factor of 10 on four processors were observed. In this paper we propose new
techniques for limiting these increases, including data remappings to increase
spatial locality, new processor synchronization techniques to decrease the use
of auxiliary data structures, and data part itioning techniques to reduce the
amount of interprocessor communication. With these techniques, memory system
traffic is reduced to as little as one sixth of its previous volume. The
resulting speedups are greatly improved as well, although they are still much
less than linear. We discuss the factors that limit fur ther speedups. We
present both simulation results and results of experiments on an SGI 4D/340
multiprocessor.
-------------------------------------------------------------------------------
CSL-TR-90-450 (also numbered STAN-CS-90-1333)
COMPARING STRUCTURALLY DIFFERENT VIEWS OF A VLSI DESIGN
Michael Joseph Spreitzer
June 1990 162 pages.....$15.96
VLSI designers often use structural hierarchies and alternate views of a
design. Allowing alternate views to have different hierarchies improves the
clarity of the views, but complicates comparison of those views. Most existing
comparison techniques either require essentially identical hierarchies, or can
only handle differences by flattening. A new technique, Informed Comparison,
first reconciles the hierarchy differenc es by making purely structural
transformations according to a description of the intended relationship between
the hierarchies, and then finishes with an existi ng hierarchical technique.
Informed Comparison thus can compare views with different hierarchies without
the penalties associated with flattening.
-------------------------------------------------------------------------------
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305-4055
E-mail: schulman@sierra.stanford.edu
415-723-1430
Hours: M-Th, 8-1 (PST)
ORDER FORM AND INVOICE
ALL ORDERS MUST BE PREPAID
Please print or type your name and address in the space provided. Check
the number(s) of the report(s) you wish to purchase and circle the price of
hardcopy or microfiche. California Residents must add the sales tax of their
own local county. Return this order form with a check or money order made
payable to Stanford University.
Foreign Orders must include $1.00 extra for each $15.00's worth of reports to
cover postage. Checks must be drawn on a U.S. bank, payable in U.S.
dollars. If an invoice is required for payment, please note that this form
is both an order form and an invoice. We cannot invoice separately.
(Name) ________________________________
(Address)_______________________________
________________________________
________________________________
________________________________
E-mail address:________________________________
REPORT # HARDCOPY FICHE REPORT # HARDCOPY FICHE
TR-425 $ 5.48 $2.00 TR-438 $ 4.92 $2.00
TR-426 $ 5.24 $2.00 TR-439 $ 5.00 $2.00
TR-427 $16.28 $3.00 TR-440 $10.00 $2.00
TR-428 $ 5.08 $2.00 TR-441 $ 5.08 $2.00
TR-429 $ 4.20 $2.00 TR-442 $ 4.68 $2.00
TR-430 $ 4.28 $2.00 TR-443 $ 7.08 $2.00
TR-431 $ 5.32 $2.00 TR-444 $ 7.64 $2.00
TR-432 $ 5.32 $2.00 TR-445 $12.84 $3.00
TR-433 $ 4.44 $2.00 TR-446 $ 5.40 $2.00
TR-434 $ 4.52 $2.00 TR-447 $12.84 $3.00
TR-435 $13.64 $3.00 TR-448 $ 5.40 $2.00
TR-436 $ 4.20 $2.00 TR-449 $ 5.48 $2.00
TR-437 $ 4.20 $2.00 TR-450 $15.96 $3.00
Subtotal __________
CA tax or Foreign Surcharge __________
Total __________
Circle which ones you want: Announcement 15, 16, 17, 18
Check here ___ to receive 1990-91 Price Lists (with report order, only)
** END OF MESSAGE **