[mod.techreports] tr-input/stanford5

leff@smu.CSNET.UUCP (02/21/87)
                                Naomi Schulman
                                 Publications
                          COMPUTER SYSTEMS LABORATORY
                              STANFORD UNIVERSITY
                              Stanford, CA 94305

                            RECENT REPORTS & NOTES
                                    LIST #8
                                SEPTEMBER, 1986


                To order reports see end of this announcement.


                                   ABSTRACTS

CSL T.R. 86-292
On the Performance Effects of Station Locations and Access Protocol
Parameters in Ethernet Networks
Timothy A. Gonsalves and Fouad A. Tobagi
January 1986                                                 31 pages.....$4.05
The  Ethernet  has  come  into  widespread use for interconnection of computers
within a local area such as an office or campus.  Stations on the  network  may
be  distributed evenly or may occur in clusters.  We present a simulation study
of several distributions of stations on  a  linear  bus  Ethernet.    Aggregate
performance  is  shown  to  depend on the distribution of stations.  Individual
station performance varies with  the  location  of  the  station.    Unbalanced
distributions  can  lead  to  large  performance differences between individual
stations with isolated stations achieving relatively poor performance  compared
to the average.  We next examine the effects of access protocol parameters such
as  the  number  of  buffers  per  station and the retransmission algorithm.  A
modification of  the  standard  retransmission  algorithm  is  presented  which
enables  a higher throughput to be achieved at high load.  Finally, our results
are compared to the predictions of theoretical models and the applicability  of
the models to finite population Ethernets is examined.  
-------------------------------------------------------------------------------
CSL T.R. 86-293
A Decentralized Naming Facility
David R. Cheritan & Timothy P. Mann
February 1986                                                28 pages.....$3.90
A  key  component  in distributed computer systems is the naming facility:  the
means by which high-level names are bound to objects and by which  objects  are
located  given  only  their names.  We describe the design, implementation, and
performance of a decentralized naming facility, in which the global name  space
and  name mapping mechanism are implemented by a set of cooperating peers, with
no  central  authority.    Decentralization  is   shown   to   lend   increased
extensibility  and  reliability  to  the design.  Efficiency in name mapping is
achieved through specialized caching techniques.  
-------------------------------------------------------------------------------
CSL T.R. 86-294
Software-Controlled Caches in the VMP Multiprocessor
David R. Cheriton, Gert A. Slavenburg & Patrick D. Boyle
February 1986                                                12 pages.....$3.10
VMP is an experimental multiprocessor that follows the familiar basic design of
multiple processors, each with a cache, connected by a  shared  bus  to  global
memory.    Each processor has a synchronous, virtually addressed, single master
connection to its cache, providing very high memory bandwidth.    An  unusually
large cache page size and fast sequential memory copy hardware make it feasible
for  cache  misses  to  be  handled in software, analogously to the handling of
virtual memory page faults.  Hardware support for cache consistency is  limited
to  a  simple  state machine that monitors the bus and interrupts the processor
when a cache consistency action is required.
In this paper, we show how the VMP design provides the  high  memory  bandwidth
required  by  modern  high-performance  processors  with  a minimum of hardware
complexity and cost.  We also describe  simple  solutions  to  the  consistency
problems  associated  with  virtually  addressed  caches.    Simulation results
indicate that the design achieves good performance providing data contention is
not excessive.  
-------------------------------------------------------------------------------
CSL T.R. 86-295
Fast Symbolic Layout Translation for Custom VLSI Integrated Circuits
Peter A. Eichenberger
April 1986                                                  122 pages.....$8.60
Symbolic layout tools have enormous potential for easing  the  task  of  custom
integrated circuit layout by allowing the designer to work at a higher level of
abstraction,   hiding   some   of   the   complexity  of  full  custom  design.
Unfortunately, the practicality of symbolic layout tools has been  limited  for
several  reasons.  Most important, the CPU resources required to compute a full
size integrated circuit from a symbolic description  are  prohibitively  large;
this  problem has been avoided either by restricting the range of applicability
to a narrow class of integrated circuits, or by  using  a  simpler  translation
algorithm,  which  reduces  the quality of the output.  Other problems include:
producing poor quality layouts, insufficient  user  control  of  the  generated
output,  and  inability  to  cooperate with other layout tools.  There problems
make symbolic design of complete chips difficult.
This thesis presents an approach to the symbolic layout problems that  produces
high-quality  layout  for  an arbitrary circuit without requiring excessive CPU
time.  The key to this approach includes the use of hierarchy  to  improve  CPU
time,  the  use  of wire-length minimization to improve quality, a good balance
between optimization of the layout and optimization of CPU time, and  a  smooth
transition  over varying degrees of automation.  The result has been a symbolic
layout tool that has been successfully used to lay out  several  chips  from  a
design-rule-independent input.  
-------------------------------------------------------------------------------
CSL T.R. 86-296
Processor Architecture and Cache Performance
by Chad L. Mitchell
June 1986                                                   147 pages.....$9.85
Previous  researchers  have  compared  and  contrasted  the  effects of various
features of processor architecture.   Others  have  studied  Instruction  cache
performance.  In  this  study,  a  methodology  has been developed which allows
simu-  lation  of  different  processors  without  the  necessity  of  creating
interpreters  and  compilers  for each architecture simulated.  The methodology
was applied to study the effects of processor architecture on instruction cache
performance.  New results are provided about the relationship between processor
architecture and instruction traffic and cache performance.  Among the  results
is  the  general  observation  that  relative  instruction  traffic differences
between architectures are about the same with very  large  caches  as  with  no
cache  and  that  intermediate  sized  caches  tend to accentuate such relative
differences.  
-------------------------------------------------------------------------------

CSL T.R. 86-297
Scan Line Access Memories for High Speed Rasterization
by Stefan G. Demetrescu
June 1986                                                   149 pages.....$9.95
Conventional architectures which produce solid-color computer graphics are slow
and/or expensive.  To solve this problem,  a  novel  kind  of  VLSI  integrated
circuit  called  a  Scan  Line  Access  Memory  (SLAM) has been developed which
multiplies the time-performance of an inexpensive graphics system by factors of
100 to 1000 without significantly increasing its complexity and hence its cost.
Because  of  the  major  performance  improvement,  SLAM  systems  promise   to
significantly expand the use of real-time computer graphics.
A  SLAM consists of a conventional dense semiconductor dynamic memory augmented
with highly parallel, but simple, on-chip processors designed specifically  for
fast  computer  graphics  rasterization.   A graphics system consisting of SLAM
smart memory chips has been built and tested.  An  arbitrary  horizontal  pixel
line  segment  can be filled in one memory access.  As a result, the speed with
which polygons are rasterized is improved by several orders of magnitude.  SLAM
systems can also  rasterize  vectors  and  bit-mapped  characters  effectively,
unlike many other proposed rasterization architectures.
The SLAM system is compared with previously suggested rasterization techniques.
Due  to  its  highly  parallel  architecture, this versatile graphics system is
shown to be capable of achieving performance comparable to the  "processor  per
pixel"  architectures  while avoiding the tremendous circuit density (and hence
cost) penalty incurred by such approaches.  It thus becomes practical to  build
a SLAM-based, high performance, real-time graphics system with a complexity and
cost comparable to that of a conventional inexpensive image frame buffer.  
-------------------------------------------------------------------------------

CSL T.R. 86-298
Parallel Program Behavior - Specification & Abstraction Using BDL
Jerry C. Yan
August 1986                                                  28 pages.....$3.90
This  paper describes the process of abstracting program behavior from programs
formulated in the Actor paradigm.  The transformed program (or 'program model')
is described by a behavior description language (or BDL).  A simulator for  BDL
has  been  constructed  to  investigate  the performance of various programs on
different  multi-processor  architectures.    Simulating  BDL  is   much   more
economical  than  an  instruction  level  emulation  while  program behavior is
realistically preserved.
A  programming  language  enables  (but  also  constrains)  the  programmer  to
formulate  his/her  problem  in  a  particular computation paradigm.  The Actor
programming paradigm (and  the  Act   languages)  stands  out  among  the  many
                                   n
proposed for concurrent computing because of three reasons:

   1. Concurrency may be exploited explicitly as well as implicitly;
   2. No assumptions were made about the architecture of the hardware;
   3. Most  parallel  computation  paradigm  previously  proposed  can  be
      expressed as computations involving Actors.

In fact, BDL  was  invented  to  facilitate  a  research  project  in  progress
- studying placement strategies for Actors on loosely-coupled multi-processors.
There were two experimental obstacles:

   1. Simulation at the instruction level is precise but too costly;
   2. Program  behavior  (such  as  message passing pattern and CPU usage)
      cannot be preserved fully using existing stochastic models.

The invention of BDL  has  made  the  "hypothesis-verification  cycle"  in  the
research process much faster and more pleasant.  
-------------------------------------------------------------------------------

CSL T.R. 299
Designing for Ada Reuse: A Case Study
by Geoffrey Mendal
June l986                                                    19 pages.....$3.45
This  paper describes the design of a generic sorting package, showing that Ada
reuse can be accomplished during, and  even  prior  to,  coding.    This  paper
identifies  some  key  technical  issues in reuse.  These issues are of general
interest  to  a  software  engineer,  as  they  focus  on  program  design  and
implementation.
The  conclusions  are  based  on  a  generic  program unit which includes array
sorting algorithms adapted and generalized to work as a generic  program  unit.
Any  array  component  type  excepting  limited  types  may be directly sorted;
limited types may be indirectly  sorted.    Examples  of  direct  and  indirect
sorting will be presented.  This package may also be used in conjunction with a
merging  package  to  sort  data residing on external memory devices, in effect
allowing files of arbitrary length  to  be  sorted  by  means  of  an  extended
operation.
In  less  tractable  situations,  reusable Ada promotes a general solution that
uses an abstraction to isolate classic limitations.  For example,  a  user  may
specify  one's  own  linear  order,  so  as  not to be limited to ascending and
descending orders as in conventional sorting routines.  
-------------------------------------------------------------------------------

CSL T.R. 86-300
An Overview of the MIPS-X-MP Project
John L. Hennessy & Mark A. Horowitz
April 1986                                                   28 pages.....$3.90
MIPS-X-MP   is  a  research  project  whose  end  goal  is  to  build  a  small
(workstation- sized) multiprocessor with a total throughput  of  100-200  mips.
The  archi-  tectural  approach  uses a small number (tens) of high performance
RISC-based microprocessors (10-20 mips each).  The multiprocessor  architecture
uses  software-controlled cache coherency to allow cooperation among processors
without sacrificing performance of the processors.    Software  technology  for
automatically   decomposing   problems  to  allow  the  entire  machine  to  be
concentrated on a single problem is a key component  of  the  research.    This
report  surveys  the  four key components of the project: high performance VLSI
processor  architecture  and  design,  multiprocessor  architectural   studies,
multiprocessor programming systems, and optimizing compiler technology.  
-------------------------------------------------------------------------------

CSL T.R. 86-301
The Complete Transformation Methodology for Sequential Runtime Checking
of an Anna Subset
Sriram Sankar & David Rosenblum
August 1986                                                  65 pages.....$5.75
We  present  in  this  report  a  complete  description  of  a  methodology for
transformation of Anna (Annotated Ada) programs to executable self-checking Ada
programs.  The methodology covers a subset of Anna which allows  annotation  of
scalar types and objects.  The allowed annotations include subtype annotations,
subprogram annotations, result annotations, object annotations, out annotations
and statement annotations.  Except for package state expressions and quantified
expressions,  the  full  expression  language of Anna is allowed in the subset.
The  transformation  of  annotations  to  executable  checking   functions   is
thoroughly illustrated through informal textual description, universal checking
function  templates  and several transformation examples.  We also describe the
transformer and related software tools used to transform  Anna  programs.    In
conclusion,  we  describe  validation  of  the  transformer and some methods of
making the transformation and runtime checking processes more efficient.  
-------------------------------------------------------------------------------

                                 Publications

                          COMPUTER SYSTEMS LABORATORY

                              Stanford University

                              Stanford, CA 94305

                                 415-723-1430

                                  ORDER FORM

To Order Reports: Print or type your name and address in the space provided.
Check or circle the  report(s)  you  wish  to  purchase,  whether  hardcopy  or
microfiche.    All orders must be PREPAID.  CALIFORNIA RESIDENTS must add sales
tax of their local county.  Return this order form with  your  check  or  money
order  made payable to Stanford University. FOREIGN ORDERS must be paid with an
international money order or a check drawn on a U.S. bank, payable in dollars.
Please type or print your name and complete address:
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
Report #  Hardcopy  Microfiche     Report #  Hardcopy  Microfiche
--------  --------  ----------     --------  --------  ----------
TR 292      $4.05     $2.00        TR 297      $9.95     $3.00
TR 293      $3.90     $2.00        TR 298      $3.90     $2.00
TR 294      $3.10     $2.00        TR 299      $3.35     $2.00
TR 295      $8.60     $3.00        TR 300      $3.90     $2.00
TR 296      $9.90     $3.00        TR 301      $5.75     $2.00


                          Subtotal __________
          Your Local County CA tax __________
                             Total __________
-------