leff@smu.UUCP (Laurence Leff) (10/17/89)
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
STANFORD UNIVERSITY
Stanford, CA 94305-4055
(415) 273-1430 (M-Th, 8-1)
RECENT REPORTS & NOTES
LIST #16
SEPTEMBER 1989
To order reports, see end of this announcement.
CSL-TR-89-376
INTEGRATING A STRUCTURING MECHANISM WITH A PROGRAM EDITOR
Lia Adams
APRIL 1989 104 pages.....$9.95
Module interconnections are an important aspect of a large program. The grid
is a powerful, language-independent structuring mechanism that specifies what
interconnections are valid and documents what interconnections actually exist.
This thesis describes an editor for grids, named Ogre, that interactively
checks the validity of new interconnections and finds inconsistencies in
existing interconnections when grid specifications are changed. Additional
information is stored with a grid to support this incremental processing.
Ogre maintains the language-independence of the grid by communicating by
messages with a language-oriented editor. This loosely-coupled integration of
module and inter-module editing allows the programmer to access global
structural information in the context of an individual module while supporting
multiple source languages and module editors. We have implemented a prototype
of Ogre using a simple editing environment for Modula-2 and a grid-manipulator
that stores grids in text files.
Our implementation demonstrates that the power of a grid can be brought to a
programmer by interactive access and incremental update. The incremental-
update algorithms described in this thesis make interactive grid access
feasible. Loosely-coupled integration adds grid access to program editing
without requiring a monolithic environment.
-------------------------------------------------------------------------------
CSL-TR-89-377
ACCESS CONTROL STRATEGIES FOR TOKEN PASSING
INTEGRATED SERVICES NETWORKS
Joseph Wai-Ming Pang
March 1989 157 pages.....$13.40
Traditionally, the major application of local area networks (LANs) has been for
connecting computers, terminals and peripheral devices. Due to their
flexibility and cost-effectiveness, LANs are vnow being considered as the
communication means for a wide spectrum of new and exciting applications, for
example, office and factory automation. However, these applications generate
new traffic types that have diverse statistical characteristics and stringent
delay constraints. Designed primarily for bursty computer traffic,
conventional LAN multiple access protocols are not capable of supporting such
new traffic types.
Many proposals have been made, with various degrees of success, for supporting
diverse applications on a single LAN. A timer based protocol, adopted by the
IEEE 802.4 Token Bus Standard and the ANSI drafted Fiber Distributed Data
Interface (FDDI) Standard, appears to be a very attractive solution. This
protocol is based on a simple and fully distributed design, with the intention
of supporting real-time traffic and providing multiple classes of service.
1
Nevertheless, due to a lack of understanding of the performance characteristics
of this protocol, there do not exist any guidelines for selecting the
parameters embedded in the protocol to meet given network constraints.
In this research, we examine the timer based protocol under heavy traffic. A
bound on the token cycles is established and a procedure for computing the
bandwidth allocation among different stations is given. The analytic results
derived allow us to convert the network design problem into a tractable
optimization problem. Furthermore, this careful examination has led us to
consider the protocol as a feedback control mechanism for limiting the token
cycles. By adopting this novel view, we are able to generalize the idea behind
the timer based protocol to a very large and flexible class of access control
strategies. The salient features pertaining to the design of the new access
control strategies are explored and a design example is presented. As an
additional tool for performance evaluation and tuning, we have established a
very good approximation for the station throughputs of the new access control
strategies under general load.
-------------------------------------------------------------------------------
CSL-TR-89-378
ANALYSIS OF PARALLELISM AND DEADLOCKS
IN DISTRIBUTED-TIME LOGIC SIMULATION
Larry Soule and Anoop Gupta
March 1989 39 pages.....$5.75
This paper explores the suitability of the Chandy-Misra algorithm for digital
logic simulation. We use four realistic circuits as benchmarks for our
analysis, with one of them being the vector-unit controller for the Titan
supercomputer from Ardent. Our results show that the average number of logic
elements available for concurrent execution ranges from 10 to 111 for the four
circuits, with an overall average of 68. Although this is twice as much
parallelism as that obtained by traditional event-driven algorithms for these
circuits, we feel it is still too low. One major factor limiting concurrency
is the large number of global synchronization points --- ``deadlocks'' in the
Chandy-Misra terminology --- that occur during execution. Towards the goal of
reducing the number of deadlocks, the paper presents a classification of the
types of deadlocks that occur during digital logic simulation. Four different
types are identified and described intuitively in terms of circuit structure.
Using domain specific knowledge, the paper proposes methods for reducing these
deadlock occurrences. For one of the benchmark circuits, the use of the
proposed techniques eliminated {\em all} deadlocks and increased the average
parallelism from 40 to 160. We believe that the use of such domain knowledge
will make the Chandy-Misra algorithm significantly more effective than it would
be in its generic form.
-------------------------------------------------------------------------------
CSL-TR-89-379
TWO DIMENSIONAL PINPOINTING:
AN APPLICATION OF FORMAL SPECIFICATION TO DEBUGGING PACKAGES
David Luckham 34 pages.....$5.40
April 1989
New methods of testing and debugging software utilizing high-level formal
specifications are presented. These methods require a new generation of support
tools. Such tools must be capable of automatically comparing the runtime
behavior of hierarchically structured software with high-level specifications;
they must provide information about inconsistencies in terms of abstractions
used in specifications.
This use of specifications has several advantages over present-day debugging
methods:
2
1. the debugging problem itself is precisely defined by specifications;
2. violations of specifications are detected automatically, thus
eliminating the need to search output traces and recognize errors
manually;
3. complex tests, such as tests for side-effects on global data, can be
made easily;
4. the new methods are independent of any compiler and runtime
environment for a programming language;
5. they apply generally to hierarchically structured software --- e.g.,
packages containing nested units,
6. they also apply to other life-cycle processes such as analysis of
prototypes, and the use of prototypes to build formal
specifications.
In this paper a particular process for locating errors in software packages,
called it two dimensional pinpointing, is described. Tests consist of
sequences of package operations (first dimension). Specifications at the
highest (most abstract) level are checked first. If violations occur then new
specifications are added if possible, otherwise checking of specifications at
the next lower level (second dimension) is activated. Violation of a new
specification provides more information about the error which reduces the
region of program text under suspicion. All interaction between programmer and
toolset is phrased in terms of the concepts used to specify the program.
Two dimensional pinpointing is presented using the Anna specification language
for Ada programs. Anna and a toolset for comparing behavior of Ada programs
with Anna specifications is described. Pinpointing techniques are then
illustrated by examples. The examples involve debugging of Ada packages, for
which Anna provides a rich set of specification constructs. The Anna toolset
supports use of the methodology on the full Ada/Anna languages, and is being
engineered to commercial standards.
-------------------------------------------------------------------------------
CSL-TR-89-380
UNIDRAW: A FRAMEWORK FOR BUILDING DOMAIN-SPECIFIC GRAPHICAL EDITORS
John M. Vlissides and Mark A. Linton
July 1989 16 pages.....$4.25
Unidraw is a framework for creating object-oriented graphical editors in
domains such as technical and artistic drawing, music composition, and CAD.
The Unidraw architecture simplifies the construction of these editors by
providing programming abstractions that are common across domains. Unidraw
defines four basic abstractions: components encapsulate the appearance and
semantics of objects in a domain, tools support direct manipulation of
components, commands define operations on components and other objects, and
external representations define the mapping between components and the file
format generated by the editor. Unidraw also supports multiple views,
graphical connectivity and confinement, and dataflow between components. This
paper describes the Unidraw design, implementation issues, and three prototype
domain-specific editors we have developed with Unidraw: a drawing editor, a
user interface builder, and a schematic capture system. Experience indicates a
substantial reduction in implementation time compared with existing tools.
-------------------------------------------------------------------------------
CSL-TR-89-381
3
THE DESIGN AND IMPLEMENTATION OF AN INCREMENTAL LINKER
Russell W. Quong
July 1989 85 pages.....$8.75
As computer systems grow in complexity and power, controlling the cost of
software development and maintenance is increasingly important. One way to
improve programmer productivity is to reduce program turnaround time and hence
programmer wait time. An effective technique is the use of incremental
algorithms to speed up compilation, linking and debugging.
For incremental algorithms to be effective, typical changes to programs must be
small and localized. To test this hypothesis we have obtained a quantitative
profile of how programs are changed across compiles and links. We have found
that most changes to modules between compiles are small and most changes to
programs between links are localized to a few modules.
To test the practicality of incremental algorithms we have designed and
implemented an incremental linker, called \inclink. Our design trades space
for time by saving the state of the linker in memory between links. Our
linking algorithm processes only those modules that have changed between links.
The resulting changes are rewritten into the current executable. Link time is
proportional to the size of the change, not the size of the program.
To minimize changes to the executable file, we place changed modules over their
previous version if they fit. To maximize this probability, we add extra space
(slop) to each module's allocation to absorb size increases. If a module
overflows its allocation, then all parts of the program after the overflow must
be rewritten, causing link time to be proportional to the size of the program.
We have simulated various slop management schemes to find those with the best
tradeoff between unused-slop-space and number-of-overflows.
Inclink is a straight-forward implementation of our incremental linking
algorithm. Its link time is under a couple of seconds, regardless of program
size. Inclink is an order of magnitude faster than a batch linker and is fully
compatible with existing compilers. Inclink helps to increase programmer
productivity by reducing program turnaround time.
-------------------------------------------------------------------------------
CSL-TR-89-382
MULTI-LEVEL SHARED CACHING TECHNIQUES FOR SCALABILITY IN VMP-MC
David R. Cheriton, Hendrik A. Goosen and Patrick D. Boyle
June 1989 19 pages.....$4.45
The problem of building a scalable shared memory multiprocessor can be reduced
to that of building a scalable memory hierarchy, assuming interprocessor
communication is handled by the memory system. In this paper, we describe the
VMP-MC design, a distributed parallel multi-computer based on the VMP
multiprocessor design, that is intended to provide a set of building blocks for
configuring machines from one to several thousand processors. VMP-MC uses a
memory hierarchy based on shared caches, ranging from on-chip caches to
board-level caches connected by busses to, at the bottom, a high-speed fiber
optic ring. In addition to describing the building block components of this
architecture, we identify the key performance issues associated with the design
and provide performance evaluation of these issues using tracedrive simulation
and measurements from the VMP.
-------------------------------------------------------------------------------
CSL-TR-89-383
SUPER-SCALAR PROCESSOR DESIGN
William M. Johnson
June 1989 147 pages.....$12.75
A super-scalar processor is one that is capable of sustaining an
4
instruction-execution rate of more than one instruction per clock cycle.
Maintaining this execution rate is primarily a problem of scheduling processor
resources (such as functional units) for high utilization. A number of
scheduling algorithms have been published, with wide-ranging claims of
performance over the single-instruction issue of a scalar processor. However,
a number of these claims are based on idealizations or on special-purpose
applications.
This study uses trace-driven simulation to evaluate many different super-scalar
hardware organizations. Super-scalar performance is limited primarily by
instruction-fetch inefficiencies caused by both branch delays and instruction
misalignment. Because of this instruction-fetch limitation, it is not
worthwhile to explore highly-concurrent execution hardware. Rather, it is more
appropriate to explore economical execution hardware that more closely matches
the instruction throughput provided by the instruction fetcher. This study
examines techniques for reducing the instruction-fetch inefficiencies and
explores the resulting hardware organizations.
This study concludes that a super-scalar processor can have nearly twice the
performance of a scalar processor, but that this requires that four major
hardware features: out-of-order execution, register renaming, branch
prediction, and a four-instruction decoder. These features are interdependent,
and removing any single feature reduces average performance by 18% or more.
However, there are many hardware simplifications that cause only a small
performance reduction.
-------------------------------------------------------------------------------
CSL-TR-89-384
COPY ELIMINATION IN SINGLE ASSIGNMENT LANGUAGES
K.Gopinath
July 1989 133 pages.....$11.85
Copy elimination is an important problem that has to be solved for an efficient
implementation of functional languages. Copies arise because these languages
lack the concepts of state and variable; hence updating of an object involves a
copy. Copies are also possible if proper targeting has not been carried out due
to its complexity. To eliminate copies, we introduce the notion of address
expressions and use abstract interpretation with its associated denotational
model to compute them.
Our work is in the context of a single assignment language called SAL and we
also consider optimizations such as eliminating copies in dependent iterations
that are important in single assignment languages. An operational model
employing reduction rules is developed to handle the constructs present in the
language. Using this model, it is possible to eliminate some difficult copies
in divide and conquer algorithms such as quicksort and bitonic sort. This
enables the implementation of these algorithms to be as efficient as imperative
languages.
An outline of the implementation is given along with the results of optimizing
some difficult programs with large number of updates. The overall optimization
system validates the approach taken: namely high-level optimization to find a
suitable mapping of names in the program into store followed by generation of
intermediate code and finally the generation of machine code by use of native
code generators.
-------------------------------------------------------------------------------
CSL-TR-89-385
A METHODOLOGY FOR MODELING INTERPROCESSOR TRAFFIC IN SHARED MEMORY
MULTIPROCESSORS
Josep Torrellas, Thierry Weil and John Hennessy
July 1989 29 pages.....$5.10
5
A methodology for modeling interprocessor traffic in a shared memory
multiprocessor environment is presented. The multiprocessor is modeled as an
open queueing network. The inputs are (1) architectural machine parameters
like interconnection network and memory hierarchy characteristics, cache
coherence protocol, etc, and (2) benchmark parameters like locality of
processor referencing behavior, amount of sharing, etc. The outputs of the
model are average values for processor efficiency metrics and congestion
measures in each machine resource. The modeling methodology is discussed
through an example, based on the the DASH machine, a hierarchical shared memory
multiprocessor being developed at Stanford. The benchmark characterization has
been determined through parallel application traces. The discussion of the
results gives insight on where the performance bottlenecks for that particular
architecture-benchmark set are.
6
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305-4055
EM: Schulman@Sierra.Stanford.Edu
415-723-1430
Hours: M-Th, 8-1
ORDER FORM AND INVOICE
ALL ORDERS MUST BE PREPAID
To Order Reports: Print or type your name and address in the space provided.
Check or circle the report number(s) you wish to purchase, and indicate
hardcopy or microfiche.
CALIFORNIA RESIDENTS must add sales tax of their local county.
Return this order form with your check or money order made payable to Stanford
University.
FOREIGN ORDERS must be paid with an international money order or a check drawn
on a U.S. bank, payable in dollars. Foreign order payment should include $1.00
extra for each $15.00's worth or reports requested
If your employer requires an invoice for payment, please note that this form is
both an order form and an invoice. We cannot invoice separately.
(Name) _________________________________
(Address) _________________________________
_________________________________
_________________________________
_________________________________
(Email Address)_________________________________
Report # Hardcopy Microfiche Report # Hardcopy Microfiche
TR 376 $ 9.95 $3.00 TR 381 $ 8.75 $2.00
TR 377 $13.40 $3.00 TR 382 $ 4.45 $2.00
TR 378 $ 5.75 $2.00 TR 383 $12.75 $3.00
TR 379 $ 5.40 $2.00 TR 384 $11.85 $3.00
TR 380 $ 4.25 $2.00 TR 385 $ 5.10 $2.00
Subtotal ________
Your Local CA County tax ________
Total ________