leff@smu.CSNET.UUCP (02/21/87)
Naomi Schulman
Publications
COMPUTER SYSTEMS LABORATORY
STANFORD UNIVERSITY
Stanford, CA 94305
RECENT REPORTS & NOTES
LIST #8
SEPTEMBER, 1986
To order reports see end of this announcement.
ABSTRACTS
CSL T.R. 86-292
On the Performance Effects of Station Locations and Access Protocol
Parameters in Ethernet Networks
Timothy A. Gonsalves and Fouad A. Tobagi
January 1986 31 pages.....$4.05
The Ethernet has come into widespread use for interconnection of computers
within a local area such as an office or campus. Stations on the network may
be distributed evenly or may occur in clusters. We present a simulation study
of several distributions of stations on a linear bus Ethernet. Aggregate
performance is shown to depend on the distribution of stations. Individual
station performance varies with the location of the station. Unbalanced
distributions can lead to large performance differences between individual
stations with isolated stations achieving relatively poor performance compared
to the average. We next examine the effects of access protocol parameters such
as the number of buffers per station and the retransmission algorithm. A
modification of the standard retransmission algorithm is presented which
enables a higher throughput to be achieved at high load. Finally, our results
are compared to the predictions of theoretical models and the applicability of
the models to finite population Ethernets is examined.
-------------------------------------------------------------------------------
CSL T.R. 86-293
A Decentralized Naming Facility
David R. Cheritan & Timothy P. Mann
February 1986 28 pages.....$3.90
A key component in distributed computer systems is the naming facility: the
means by which high-level names are bound to objects and by which objects are
located given only their names. We describe the design, implementation, and
performance of a decentralized naming facility, in which the global name space
and name mapping mechanism are implemented by a set of cooperating peers, with
no central authority. Decentralization is shown to lend increased
extensibility and reliability to the design. Efficiency in name mapping is
achieved through specialized caching techniques.
-------------------------------------------------------------------------------
CSL T.R. 86-294
Software-Controlled Caches in the VMP Multiprocessor
David R. Cheriton, Gert A. Slavenburg & Patrick D. Boyle
February 1986 12 pages.....$3.10
VMP is an experimental multiprocessor that follows the familiar basic design of
multiple processors, each with a cache, connected by a shared bus to global
memory. Each processor has a synchronous, virtually addressed, single master
connection to its cache, providing very high memory bandwidth. An unusually
large cache page size and fast sequential memory copy hardware make it feasible
for cache misses to be handled in software, analogously to the handling of
virtual memory page faults. Hardware support for cache consistency is limited
to a simple state machine that monitors the bus and interrupts the processor
when a cache consistency action is required.
In this paper, we show how the VMP design provides the high memory bandwidth
required by modern high-performance processors with a minimum of hardware
complexity and cost. We also describe simple solutions to the consistency
problems associated with virtually addressed caches. Simulation results
indicate that the design achieves good performance providing data contention is
not excessive.
-------------------------------------------------------------------------------
CSL T.R. 86-295
Fast Symbolic Layout Translation for Custom VLSI Integrated Circuits
Peter A. Eichenberger
April 1986 122 pages.....$8.60
Symbolic layout tools have enormous potential for easing the task of custom
integrated circuit layout by allowing the designer to work at a higher level of
abstraction, hiding some of the complexity of full custom design.
Unfortunately, the practicality of symbolic layout tools has been limited for
several reasons. Most important, the CPU resources required to compute a full
size integrated circuit from a symbolic description are prohibitively large;
this problem has been avoided either by restricting the range of applicability
to a narrow class of integrated circuits, or by using a simpler translation
algorithm, which reduces the quality of the output. Other problems include:
producing poor quality layouts, insufficient user control of the generated
output, and inability to cooperate with other layout tools. There problems
make symbolic design of complete chips difficult.
This thesis presents an approach to the symbolic layout problems that produces
high-quality layout for an arbitrary circuit without requiring excessive CPU
time. The key to this approach includes the use of hierarchy to improve CPU
time, the use of wire-length minimization to improve quality, a good balance
between optimization of the layout and optimization of CPU time, and a smooth
transition over varying degrees of automation. The result has been a symbolic
layout tool that has been successfully used to lay out several chips from a
design-rule-independent input.
-------------------------------------------------------------------------------
CSL T.R. 86-296
Processor Architecture and Cache Performance
by Chad L. Mitchell
June 1986 147 pages.....$9.85
Previous researchers have compared and contrasted the effects of various
features of processor architecture. Others have studied Instruction cache
performance. In this study, a methodology has been developed which allows
simu- lation of different processors without the necessity of creating
interpreters and compilers for each architecture simulated. The methodology
was applied to study the effects of processor architecture on instruction cache
performance. New results are provided about the relationship between processor
architecture and instruction traffic and cache performance. Among the results
is the general observation that relative instruction traffic differences
between architectures are about the same with very large caches as with no
cache and that intermediate sized caches tend to accentuate such relative
differences.
-------------------------------------------------------------------------------
CSL T.R. 86-297
Scan Line Access Memories for High Speed Rasterization
by Stefan G. Demetrescu
June 1986 149 pages.....$9.95
Conventional architectures which produce solid-color computer graphics are slow
and/or expensive. To solve this problem, a novel kind of VLSI integrated
circuit called a Scan Line Access Memory (SLAM) has been developed which
multiplies the time-performance of an inexpensive graphics system by factors of
100 to 1000 without significantly increasing its complexity and hence its cost.
Because of the major performance improvement, SLAM systems promise to
significantly expand the use of real-time computer graphics.
A SLAM consists of a conventional dense semiconductor dynamic memory augmented
with highly parallel, but simple, on-chip processors designed specifically for
fast computer graphics rasterization. A graphics system consisting of SLAM
smart memory chips has been built and tested. An arbitrary horizontal pixel
line segment can be filled in one memory access. As a result, the speed with
which polygons are rasterized is improved by several orders of magnitude. SLAM
systems can also rasterize vectors and bit-mapped characters effectively,
unlike many other proposed rasterization architectures.
The SLAM system is compared with previously suggested rasterization techniques.
Due to its highly parallel architecture, this versatile graphics system is
shown to be capable of achieving performance comparable to the "processor per
pixel" architectures while avoiding the tremendous circuit density (and hence
cost) penalty incurred by such approaches. It thus becomes practical to build
a SLAM-based, high performance, real-time graphics system with a complexity and
cost comparable to that of a conventional inexpensive image frame buffer.
-------------------------------------------------------------------------------
CSL T.R. 86-298
Parallel Program Behavior - Specification & Abstraction Using BDL
Jerry C. Yan
August 1986 28 pages.....$3.90
This paper describes the process of abstracting program behavior from programs
formulated in the Actor paradigm. The transformed program (or 'program model')
is described by a behavior description language (or BDL). A simulator for BDL
has been constructed to investigate the performance of various programs on
different multi-processor architectures. Simulating BDL is much more
economical than an instruction level emulation while program behavior is
realistically preserved.
A programming language enables (but also constrains) the programmer to
formulate his/her problem in a particular computation paradigm. The Actor
programming paradigm (and the Act languages) stands out among the many
n
proposed for concurrent computing because of three reasons:
1. Concurrency may be exploited explicitly as well as implicitly;
2. No assumptions were made about the architecture of the hardware;
3. Most parallel computation paradigm previously proposed can be
expressed as computations involving Actors.
In fact, BDL was invented to facilitate a research project in progress
- studying placement strategies for Actors on loosely-coupled multi-processors.
There were two experimental obstacles:
1. Simulation at the instruction level is precise but too costly;
2. Program behavior (such as message passing pattern and CPU usage)
cannot be preserved fully using existing stochastic models.
The invention of BDL has made the "hypothesis-verification cycle" in the
research process much faster and more pleasant.
-------------------------------------------------------------------------------
CSL T.R. 299
Designing for Ada Reuse: A Case Study
by Geoffrey Mendal
June l986 19 pages.....$3.45
This paper describes the design of a generic sorting package, showing that Ada
reuse can be accomplished during, and even prior to, coding. This paper
identifies some key technical issues in reuse. These issues are of general
interest to a software engineer, as they focus on program design and
implementation.
The conclusions are based on a generic program unit which includes array
sorting algorithms adapted and generalized to work as a generic program unit.
Any array component type excepting limited types may be directly sorted;
limited types may be indirectly sorted. Examples of direct and indirect
sorting will be presented. This package may also be used in conjunction with a
merging package to sort data residing on external memory devices, in effect
allowing files of arbitrary length to be sorted by means of an extended
operation.
In less tractable situations, reusable Ada promotes a general solution that
uses an abstraction to isolate classic limitations. For example, a user may
specify one's own linear order, so as not to be limited to ascending and
descending orders as in conventional sorting routines.
-------------------------------------------------------------------------------
CSL T.R. 86-300
An Overview of the MIPS-X-MP Project
John L. Hennessy & Mark A. Horowitz
April 1986 28 pages.....$3.90
MIPS-X-MP is a research project whose end goal is to build a small
(workstation- sized) multiprocessor with a total throughput of 100-200 mips.
The archi- tectural approach uses a small number (tens) of high performance
RISC-based microprocessors (10-20 mips each). The multiprocessor architecture
uses software-controlled cache coherency to allow cooperation among processors
without sacrificing performance of the processors. Software technology for
automatically decomposing problems to allow the entire machine to be
concentrated on a single problem is a key component of the research. This
report surveys the four key components of the project: high performance VLSI
processor architecture and design, multiprocessor architectural studies,
multiprocessor programming systems, and optimizing compiler technology.
-------------------------------------------------------------------------------
CSL T.R. 86-301
The Complete Transformation Methodology for Sequential Runtime Checking
of an Anna Subset
Sriram Sankar & David Rosenblum
August 1986 65 pages.....$5.75
We present in this report a complete description of a methodology for
transformation of Anna (Annotated Ada) programs to executable self-checking Ada
programs. The methodology covers a subset of Anna which allows annotation of
scalar types and objects. The allowed annotations include subtype annotations,
subprogram annotations, result annotations, object annotations, out annotations
and statement annotations. Except for package state expressions and quantified
expressions, the full expression language of Anna is allowed in the subset.
The transformation of annotations to executable checking functions is
thoroughly illustrated through informal textual description, universal checking
function templates and several transformation examples. We also describe the
transformer and related software tools used to transform Anna programs. In
conclusion, we describe validation of the transformer and some methods of
making the transformation and runtime checking processes more efficient.
-------------------------------------------------------------------------------
Publications
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305
415-723-1430
ORDER FORM
To Order Reports: Print or type your name and address in the space provided.
Check or circle the report(s) you wish to purchase, whether hardcopy or
microfiche. All orders must be PREPAID. CALIFORNIA RESIDENTS must add sales
tax of their local county. Return this order form with your check or money
order made payable to Stanford University. FOREIGN ORDERS must be paid with an
international money order or a check drawn on a U.S. bank, payable in dollars.
Please type or print your name and complete address:
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
Report # Hardcopy Microfiche Report # Hardcopy Microfiche
-------- -------- ---------- -------- -------- ----------
TR 292 $4.05 $2.00 TR 297 $9.95 $3.00
TR 293 $3.90 $2.00 TR 298 $3.90 $2.00
TR 294 $3.10 $2.00 TR 299 $3.35 $2.00
TR 295 $8.60 $3.00 TR 300 $3.90 $2.00
TR 296 $9.90 $3.00 TR 301 $5.75 $2.00
Subtotal __________
Your Local County CA tax __________
Total __________
-------