leff@smu.UUCP (Laurence Leff) (02/27/89)
Naomi Schulman
Publications
COMPUTER SYSTEMS LABORATORY
STANFORD UNIVERSITY
Stanford, CA 94305-4055
(415) 273-1430 (M-Th, 8-1)
RECENT REPORTS & NOTES
LIST #15
JANUARY 1989
To order reports, see end of this announcement.
ABSTRACTS
CSL-TR-88-365
Deriving Accurate Fault Models
John Acken
October 1988 130 pages.....$9.50
Modeling the effects that physical failures have on integrated circuit behavior
is important for both design and test. Fault models describe circuit behavior
at a level of abstraction that allows tests to be generated without requiring
detailed knowledge of physical failures. This thesis presents three criteria
for deriving accurate fault models and describes a method that meets these
criteria. The method is to design, fabricate, and test a RAM specifically for
deriving fault models, as opposed to the common practice of using RAMs to drive
technology development. To demonstrate this approach, a static RAM was
designed, fabricated, and tested to derive models for a target technology of
CMOS custom logic circuits.
For CMOS technology, the most accurate logic level model for a bridging fault
is a vote among the shorted nodes. This vote is based upon the relative output
strengths of the shorted transistors, and the short does not result in an
indeterminate logic value. Using the voting model requires neither the
electrical details of each specific shorted circuit nor the physical details of
the shorting failures. All that is needed is the relative strength of the
transistors and the location of the potential bridging fault sites. These
sites depend upon the physical layout, but can be easily located using standard
layout tools such as circuit extractors. The average number of potential
bridging faults per node is small, between 4 and 8.
Bridging faults require that differences between logical clustering and
physical clustering must be considered for successful testing. In particular,
circuit segmentation for pseudo-exhaustive test is modified to utilize physical
layout information.
-------------------------------------------------------------------------------
CSL-TR-88-366
Performance-Directed Memory Hierarchy Design
Steven A. Przybylski
September 1988 183 pages.....$12.15
Most Cache studies to date have concentrated on time independent metrics: miss
rates and traffic ratios. These metrics can be misleading indicators of
overall performance, and minimizing them can lead to distinctly sub-optimal
computer implementations. In this thesis, program execution time replaces the
miss rate as the basis of evaluation of memory hierarchies. The thesis
describes the characteristics of performance-optimal single- and multi-level
cache hierarchies.
Using performance as the cache design metric introduces a often ignored design
1
variable: the machine's cycle time. The inevitable tradeoffs between the
temporal and organizational variables are exposed by equating a change in the
cache size, set associativity or block size to a change in cycle time. After
the tradeoffs between the temporal and organizational parameters are
individually explored, they are unified in a single graph that allows
comparison of the performance of a wide variety of cache organizations and
implementations.
Performance-directed cache analysis reveals two dilemmas facing machine
designers: small fast caches are not optimal; single-level caches have a
definite upper performance bound. Multi-level cache hierarchies can
simultaneously break the single-level performance barrier and decrease the
optimal size and cycle time of the cache nearest the CPU. For this reason, the
local and global characteristics of multi-level hierarchies are explored.
Given a set of implementable caches, dynamic programming can find the
performance-optimal multi-level hierarchy. The optimum number of levels in the
hierarchy increases as CPU cycle times decrease with respect to main memory
access times. Hierarchies with depths of two and three approach the
theoretically maximal performance limit.
This thesis concludes with a succinct set of guidelines that will aid designers
in finding the memory hierarchy that maximizes system performance given
particular implementation constraints.
-------------------------------------------------------------------------------
CSL-TR-88-367
An Overview of VAL
Larry M. Augustin, Benoit A. Gennart, Youm Huh, David C. Luckham, and
Alec G. Stanculescu
September 1988 39 pages.....$4.95
VAL (VHDL Annotation Language) provides a small number of new language
constructs to annotate VHDL hardware descriptions. VAL annotations, added to
the VHDL entity declaration in the form of formal comments, express intended
behavior common to all architectural bodies of the entity. Annotations are
expressed as parallel processes that accept streams of input signals and
generate constraints on output streams. VAL views signals as streams of values
ordered by time. Generalized timing expressions allow the designer to refer to
relative points on a stream. No concept of preemptive delayed assignment or
inertial delay are needed when referring to different relative points in time
on a stream. The VAL abstract state model permits abstract data types to be
used in specifying history dependent device behavior. Annotations placed
inside a VHDL architecture define detailed correspondences between the behavior
specification and architecture. The result is a simple but expressive language
extension of VHDL with possible applications to automatic checking of VHDL
simulations, hierarchical design, and automatic verification of hardware
designs in VHDL.
-------------------------------------------------------------------------------
CSL-TR-88-368
Improved Models for Switch-Level Simulation
Chorng-Yeong Chu
November 1988 147 pages.....$10.35
Simulation plays an important role in design verification. With increasingly
large VLSI designs, the switch-level representation has become the only
approach that is both reasonably accurate and computationally feasible.
At present, switch-level simulators use relatively unsophisticated techniques
to extract information from the switch-level representation, and even these
small amounts of information are not always fully utilized. As a result, these
simulators often lack accuracy. Most notably, the way some switch-level
2
simulators compute the final value can potentially generate undesirably
pessimistic results, and charge-sharing problems are widely ignored.
This thesis shows how to extract more information based on the same set of
widely adopted switch-level assumptions. Using more sophisticated analyses,
this thesis presents better final-value and charge-sharing models. The new
final-value model uses a systematic way to look at the relationship between the
voltage and the resistance. This approach can also objectively compare the
accuracy of different DC-computation schemes. Charge-sharing problems are
modeled with two time constants. The two-time-constant approach is based on the
observation that most waveforms due to charge sharing are dominated by a pair
of time constants. Charge-sharing models are first constructed on resistor
networks, then they are extended to transistor networks.
These models have been incorporated into nRSIM --- a RSIM-based switch-level
simulator. The new simulator has the same running time as the original RSIM,
but it can handle a larger class of circuits.
-------------------------------------------------------------------------------
CSL-TR-88-369
Composing User Interfaces with InterViews
Mark A. Linton, John M. Vlissides, and Paul R. Calder
November 1988 25 pages.....$4.25
In this paper we show how to compose user interfaces with InterViews, a user
interface toolkit we have developed at Stanford. InterViews provides a library
of predefined objects and a set of protocols for composing them. A user
interface is created by composing simple primitives in a hierarchical fashion,
allowing complex user interfaces to be implemented easily. InterViews supports
the composition of interactive objects (such as scroll bars and menus), text
objects (such as words and whitespace), and graphics objects (such as circles
and polygons). To illustrate how InterViews composition mechanisms facilitate
the implementation of user interfaces, we present three simple applications: a
dialog box built from interactive objects, a drawing editor using a hierarchy
of graphical objects, and a class browser using a hierarchy of text objects.
We also describe how InterViews supports consistency across applications as
well as end-user customization.
-------------------------------------------------------------------------------
CSL-TR-88-370
Copy Elimination in Functional Languages
K. Gopinath and John L. Hennessy
November 1988 15 pages.....$3.75
Copy elimination is an important optimization for compiling functional
languages. Copies arise because these languages lack the concepts of state and
variable; hence updating an object involves a copy in a naive implementation.
Copies are also possible if proper targeting has not been carried out inside
functions and across function calls. Targeting is the proper selection of a
storage area for evaluating an expression. By abstracting a collection of
functions by a target operator, we compute targets of function bodies that can
then be used to define an optimized interpreter to eliminate copies due to
updates and copies across function calls. The language we consider is typed
lambda calculus with higher-order functions and special constructs for array
operations. Our approach can eliminate copies in divide and conquer problems
like quicksort and bitonic sort that previous approaches could not handle.
We also present some results of implementing a compiler for a single assignment
language called SAL on some small but tough programs. Our results indicate that
it is possible to approach a performance comparable to imperative languages
like Pascal.
3
-------------------------------------------------------------------------------
CSL-TR-88-371
Generalized Access Control Strategies for Token Passing Systems
Joseph W. M. Pang, Fouad A. Tobagi and Stephen Boyd
November 1988 42 pages.....$5.10
As the concepts of distributed computing, resource sharing, and office and
factory automation are being put into practice, the demand for integrated
services local area networks has never been greater before. Recently, a timer
based integrated access protocol has been adopted by two major token passing
local area network standards, the IEEE 802.4 Token Bus Standard [1] and the
ANSI drafted Fiber Distributed Data Interface (FDDI) Standard [2]. This
protocol is capable of providing bounded access delay for real-time traffic,
and multiple priority classes for data traffic [3]--[4]. Nevertheless, under
heavy traffic, this protocol is not asymptotically stable. In this report, we
present a very broad class of access control strategies, generalized from the
timer based approach adopted in [1]--[2]. Similar to the timer based protocol,
this class of access control strategies has the same favorable properties of
providing bounded token cycles and multiple classes of service. Furthermore,
at high load, the token cycles for this new class of strategies converge with a
geometric bound, as opposed to exhibiting the oscillatory behaviour observed
with the timer based protocol in [3]--[4]. Based on the convergence property,
we are able to compute the efficiency of the controlled access token channel
and the bandwidth allocation among stations under heavy traffic. Moreover, we
derive tight upper bounds on token cycles under arbitrary loading conditions.
This study not only enriches the design space of integrated services token
systems, but also provides meaningful insights into the control mechanism
within the protocol.
-------------------------------------------------------------------------------
CSL-TR-88-372
Incremental VLSI Compaction
Clyde W. Carpenter
December 1988 117 pages....$8.85
VLSI compaction is the translation from a high-level description of a circuit
down to the detailed layout needed for fabrication. A compactor tries to make
as compact a layout as possible without violating any design rules. An
incremental compactor allows one to edit a schematic or change layout
constraints and quickly see the effects of the change.
An incremental compactor has to incrementally generate and solve the
constraints needed to enforce the design rules. This dissertation presents an
algorithm that uses adjacency lists to generate and incrementally update a
minimal complete set of the spacing constraints needed to keep adjacent tiles
in a layout from interfering with each other. The base algorithm creates
clockwise threaded lists of non-overlapping, fixed-size tiles. The algorithm
is complicated by the need to handle wires, overlapping tiles, and various
different spacing rules. In near linear time it generates an average of 1.2
spacing constraints per tile. The adjacency lists allow fast, efficient
updates when tiles are moved, deleted, or inserted.
This dissertation also presents three algorithms to solve the constraints once
they are generated. In addition to minimizing area, these algorithms also
minimize the total wire length. One of them calculates the sum of the
wire-pull weights on each subtree of a directed spanning tree of active
constraints to decide which subtrees need to be moved. This weighted tree
provides enough information to make incremental changes in time proportional to
the size of the change instead of to the size of the circuit. Wire-length
minimization improves the layout but gives compaction a slightly worse than
linear expected time.
4
-------------------------------------------------------------------------------
CSL-TR-88-373
Sparse Distributed Memory Prototype: Address Module Hardware Guide
M. J. Flynn, R. Zeidman, E. Lochner
December 1988 78 pages.....$6.90
This document is a detailed specification of the hardware design of the Address
Module for the prototype Sparse Distributed Memory. It contains all of the
information needed to build, test, debug, modify and operate the Address
Module.
-------------------------------------------------------------------------------
CSL-TR-88-374
Post-Game Analysis--- A Heuristic Resource Management Framework for
Concurrent Systems
Jerry C. Yan
December 1988 212 pages.....$13.60
While multiprocessors promise to deliver orders of magnitude speedup, the
effective use of such computers still depends critically on the ability of
their resource management systems to properly trade off communication loss and
concurrency gain, exploit behavioral characteristics of the application
programs, and take advantage of specific hardware features of the
multiprocessor. Unfortunately, the performance of many proposed resource
management strategies can only be evaluated indirectly by simple objective
functions (such as execution cost or mapping cardinality).
Research has been conducted to determine how distributed computations can be
mapped onto multiprocessors to minimize execution time. The approach proposed
here, known as Post-game analysis, offers an unconventional alternative to
reduce program execution time. Program--machine mapping is improved ``in
between'' program executions. Instead of using simple abstract models,
post-game analysis utilizes actual timing data gathered during program
execution. Program execution time is reduced based on many optimization
sub-goals. Because heuristics are applied to improve the current mapping and
resolve conflicting sub-goals, post-game analysis can be incrementally refined
and tailored to specific applications and architectures. The performance of
post-game analysis has been compared with other strategies using various
program structures and multiprocessor models. Results obtained from simulations
show that it outperforms the speedup obtained based on random placement, load
balancing as well as clustering algorithms by 15%. Because intermediate
program/machine models are not used, post-game analysis is very promising for
immediate application to today's programs with today's machines.} area, these
algorithms also minimize the total wire length. One of them calculates the sum
of the wire-pull weights on each subtree of a directed spanning tree of active
constraints to decide which subtrees need to be moved. This weighted tree
provides enough information to make incremental changes in time proportional to
the size of the change instead of to the size of the circuit. Wire-length
minimization improves the layout but gives compaction a slightly worse than
linear expected time.
-------------------------------------------------------------------------------
CSL-TR-89-375
THROUGHPUT APPROXIMATION FOR THE GENERALIZED TIMER BASED PROTOCOL
FOR TOKEN PASSING SYSTEMS
Joseph W. M. Pang and Fouad A. Tobagi
January 1989 33 pages.....$4.65
Recently, a timer based integrated access protocol has been adopted by two
5
major local area network (LAN) standards, the IEEE 802.4 Token Bus Standard [1]
and the ANSI drafted Fiber Distributed Data Interface Standard [2]. This timer
based approach is capable of providing bounded access delay for real-time
traffic (i.e., traffic with stringent delay requirements), multiple classes of
service, and efficient use of the communication channel [3]-[4]. Based on a
careful examination of the dynamical behaviour of the timer based protocol
under heavy traffic, we have been able to generalize the underlying idea behind
the protocol and proposed a new class of access control schemes in [5]. We
have also established in [5] important results pertaining to the bandwidth
allocation under heavy traffic and upper bounds on token cycles under general
traffic for the generalized control schemes. Although the results on bandwidth
allocation obtained in [5] are very useful for first-cut design purposes, they
are only applicable under heavy traffic. In this report, we are interested in
obtaining the station throughputs for the generalized control schemes under
arbitrary input traffic. Based on the assumption that the variances of station
service times are small, we develop a simple and yet very good approximation
for the station throughputs, that is dependent only on the average data arrival
rates at the stations. Using this approximation, we can determine whether a
station in the system is stable or not. Furthermore, if a station is not
stable, its throughput can be obtained from the approximation. This
approximation is also applicable to the timer based protocol used in the LAN
standards [1]-[2].
-------------------------------------------------------------------------------
6
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305-4055
EM: Schulman@Sierra.Stanford.Edu
415-723-1430
Hours: M-Th, 8-1
ORDER FORM
To Order Reports: Print or type your name and address in the space provided.
Check or circle the report(s) you wish to purchase, whether hardcopy or
microfiche. All orders must be PREPAID. CALIFORNIA RESIDENTS must add sales
tax of their local county. Return this order form with your check or money
order made payable to Stanford University. FOREIGN ORDERS must be paid with an
international money order or a check drawn on a U.S. bank, payable in dollars.
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
_____________________________________________
Report # Hardcopy Microfiche Report # Hardcopy Microfiche
TR 365 $9.50 $3.00 TR 371 $5.10 $2.00
TR 366 $12.15 $3.00 TR 372 $8.85 $3.00
TR 367 $4.95 $2.00 TR 373 $6.90 $2.00
TR 368 $10.35 $3.00 TR 374 $13.60 $4.00
TR 369 $4.25 $2.00 TR 375 $4.65 $2.00
TR 370 $3.75 $2.00
Subtotal __________
Your Local CA County tax __________
Total __________
-------