[comp.doc.techreports] tr-input/stanford.16

leff@smu.UUCP (Laurence Leff) (10/17/89)
                                Naomi Schulman
                          COMPUTER SYSTEMS LABORATORY
                              STANFORD UNIVERSITY
                            Stanford, CA 94305-4055

                          (415) 273-1430 (M-Th, 8-1)

                            RECENT REPORTS & NOTES
                                   LIST #16
                                SEPTEMBER 1989

To order reports, see end of this announcement.

CSL-TR-89-376
INTEGRATING A STRUCTURING MECHANISM WITH A PROGRAM EDITOR
Lia Adams
APRIL 1989                                                  104 pages.....$9.95

Module  interconnections  are an important aspect of a large program.  The grid
is a powerful, language-independent structuring mechanism that  specifies  what
interconnections  are valid and documents what interconnections actually exist.
This thesis describes an editor  for  grids,  named  Ogre,  that  interactively
checks  the  validity  of  new  interconnections  and  finds inconsistencies in
existing interconnections when grid specifications  are  changed.    Additional
information is stored with a grid to support this incremental processing.
Ogre  maintains  the  language-independence  of  the  grid  by communicating by
messages with a language-oriented editor.  This loosely-coupled integration  of
module  and  inter-module  editing  allows  the  programmer  to  access  global
structural information in the context of an individual module while  supporting
multiple  source languages and module editors.  We have implemented a prototype
of Ogre using a simple editing environment for Modula-2 and a  grid-manipulator
that stores grids in text files.
Our  implementation  demonstrates  that the power of a grid can be brought to a
programmer by interactive access and  incremental  update.    The  incremental-
update  algorithms  described  in  this  thesis  make  interactive  grid access
feasible.  Loosely-coupled integration adds  grid  access  to  program  editing
without requiring a monolithic environment.

-------------------------------------------------------------------------------

CSL-TR-89-377
ACCESS CONTROL STRATEGIES FOR TOKEN PASSING
INTEGRATED SERVICES NETWORKS
Joseph Wai-Ming Pang
March 1989                                                 157 pages.....$13.40

Traditionally, the major application of local area networks (LANs) has been for
connecting  computers,  terminals  and  peripheral  devices.    Due  to   their
flexibility  and  cost-effectiveness,  LANs  are  vnow  being considered as the
communication means for a wide spectrum of new and exciting  applications,  for
example,  office  and factory automation.  However, these applications generate
new traffic types that have diverse statistical characteristics  and  stringent
delay   constraints.      Designed   primarily  for  bursty  computer  traffic,
conventional LAN multiple access protocols are not capable of  supporting  such
new traffic types.
Many  proposals have been made, with various degrees of success, for supporting
diverse applications on a single LAN.  A timer based protocol, adopted  by  the
IEEE  802.4  Token  Bus  Standard  and  the ANSI drafted Fiber Distributed Data
Interface (FDDI) Standard, appears to be a  very  attractive  solution.    This
protocol  is based on a simple and fully distributed design, with the intention
of supporting real-time traffic and  providing  multiple  classes  of  service.
                                       1


Nevertheless, due to a lack of understanding of the performance characteristics
of this  protocol,  there  do  not  exist  any  guidelines  for  selecting  the
parameters embedded in the protocol to meet given network constraints.
In  this  research, we examine the timer based protocol under heavy traffic.  A
bound on the token cycles is established and  a  procedure  for  computing  the
bandwidth  allocation  among different stations is given.  The analytic results
derived allow us to  convert  the  network  design  problem  into  a  tractable
optimization  problem.    Furthermore,  this  careful examination has led us to
consider the protocol as a feedback control mechanism for  limiting  the  token
cycles.  By adopting this novel view, we are able to generalize the idea behind
the timer based protocol to a very large and flexible class of  access  control
strategies.    The  salient features pertaining to the design of the new access
control strategies are explored and a design  example  is  presented.    As  an
additional  tool  for  performance evaluation and tuning, we have established a
very good approximation for the station throughputs of the new  access  control
strategies under general load.

-------------------------------------------------------------------------------


CSL-TR-89-378
ANALYSIS OF PARALLELISM AND DEADLOCKS
IN DISTRIBUTED-TIME LOGIC SIMULATION
Larry Soule and Anoop Gupta
March 1989                                                   39 pages.....$5.75

This  paper  explores the suitability of the Chandy-Misra algorithm for digital
logic simulation.  We  use  four  realistic  circuits  as  benchmarks  for  our
analysis,  with  one  of  them  being  the vector-unit controller for the Titan
supercomputer from Ardent.  Our results show that the average number  of  logic
elements  available for concurrent execution ranges from 10 to 111 for the four
circuits, with an overall average of 68.    Although  this  is  twice  as  much
parallelism  as  that obtained by traditional event-driven algorithms for these
circuits, we feel it is still too low.  One major factor  limiting  concurrency
is  the  large number of global synchronization points --- ``deadlocks'' in the
Chandy-Misra terminology --- that occur during execution.  Towards the goal  of
reducing  the  number  of deadlocks, the paper presents a classification of the
types of deadlocks that occur during digital logic simulation.  Four  different
types  are  identified and described intuitively in terms of circuit structure.
Using domain specific knowledge, the paper proposes methods for reducing  these
deadlock  occurrences.    For  one  of  the  benchmark circuits, the use of the
proposed techniques eliminated {\em all} deadlocks and  increased  the  average
parallelism  from  40 to 160.  We believe that the use of such domain knowledge
will make the Chandy-Misra algorithm significantly more effective than it would
be in its generic form.

-------------------------------------------------------------------------------

CSL-TR-89-379
TWO DIMENSIONAL PINPOINTING:
AN APPLICATION OF FORMAL SPECIFICATION TO DEBUGGING PACKAGES
David Luckham                                                34 pages.....$5.40
April 1989

New  methods  of  testing  and  debugging  software utilizing high-level formal
specifications are presented. These methods require a new generation of support
tools.  Such  tools  must  be  capable  of  automatically comparing the runtime
behavior of hierarchically structured software with high-level  specifications;
they  must  provide  information about inconsistencies in terms of abstractions
used in specifications.
This use of specifications has several advantages  over  present-day  debugging
methods:
                                       2


   1. the debugging problem itself is precisely defined by specifications;

   2. violations   of  specifications  are  detected  automatically,  thus
      eliminating the need to search output traces  and  recognize  errors
      manually;

   3. complex tests, such as tests for side-effects on global data, can be
      made easily;

   4. the  new  methods  are  independent  of  any  compiler  and  runtime
      environment for a programming language;

   5. they apply generally to hierarchically structured software --- e.g.,
      packages containing nested units,

   6. they also apply to other life-cycle processes such  as  analysis  of
      prototypes,   and   the   use   of   prototypes   to   build  formal
      specifications.

In this paper a particular process for locating errors  in  software  packages,
called  it  two  dimensional  pinpointing,  is  described.    Tests  consist of
sequences of package operations  (first  dimension).    Specifications  at  the
highest  (most  abstract) level are checked first. If violations occur then new
specifications are added if possible, otherwise checking of  specifications  at
the  next  lower  level  (second  dimension)  is activated.  Violation of a new
specification provides more information  about  the  error  which  reduces  the
region of program text under suspicion.  All interaction between programmer and
toolset is phrased in terms of the concepts used to specify the program.
Two dimensional pinpointing is presented using the Anna specification  language
for  Ada  programs.  Anna  and a toolset for comparing behavior of Ada programs
with Anna  specifications  is  described.    Pinpointing  techniques  are  then
illustrated  by  examples.  The examples involve debugging of Ada packages, for
which Anna provides a rich set of specification constructs.  The  Anna  toolset
supports  use  of  the methodology on the full Ada/Anna languages, and is being
engineered to commercial standards.

-------------------------------------------------------------------------------

CSL-TR-89-380
UNIDRAW: A FRAMEWORK FOR BUILDING DOMAIN-SPECIFIC GRAPHICAL EDITORS
John M. Vlissides and Mark A. Linton
July 1989                                                    16 pages.....$4.25

Unidraw is a  framework  for  creating  object-oriented  graphical  editors  in
domains  such  as  technical  and artistic drawing, music composition, and CAD.
The Unidraw architecture  simplifies  the  construction  of  these  editors  by
providing  programming  abstractions  that  are common across domains.  Unidraw
defines four basic abstractions:  components  encapsulate  the  appearance  and
semantics  of  objects  in  a  domain,  tools  support  direct  manipulation of
components, commands define operations on components  and  other  objects,  and
external  representations  define  the  mapping between components and the file
format generated  by  the  editor.    Unidraw  also  supports  multiple  views,
graphical  connectivity and confinement, and dataflow between components.  This
paper describes the Unidraw design, implementation issues, and three  prototype
domain-specific  editors  we  have  developed with Unidraw: a drawing editor, a
user interface builder, and a schematic capture system.  Experience indicates a
substantial reduction in implementation time compared with existing tools.

-------------------------------------------------------------------------------

CSL-TR-89-381
                                       3


THE DESIGN AND IMPLEMENTATION OF AN INCREMENTAL LINKER
Russell W. Quong
July 1989                                                    85 pages.....$8.75

As  computer  systems  grow  in  complexity  and power, controlling the cost of
software development and maintenance is increasingly important.    One  way  to
improve  programmer productivity is to reduce program turnaround time and hence
programmer wait time.   An  effective  technique  is  the  use  of  incremental
algorithms to speed up compilation, linking and debugging.
For incremental algorithms to be effective, typical changes to programs must be
small and localized.  To test this hypothesis we have obtained  a  quantitative
profile  of  how programs are changed across compiles and links.  We have found
that most changes to modules between compiles are small  and  most  changes  to
programs between links are localized to a few modules.
To  test  the  practicality  of  incremental  algorithms  we  have designed and
implemented an incremental linker, called \inclink.  Our  design  trades  space
for  time  by  saving  the  state  of  the linker in memory between links.  Our
linking algorithm processes only those modules that have changed between links.
The  resulting changes are rewritten into the current executable.  Link time is
proportional to the size of the change, not the size of the program.
To minimize changes to the executable file, we place changed modules over their
previous version if they fit.  To maximize this probability, we add extra space
(slop) to each module's allocation to absorb  size  increases.    If  a  module
overflows its allocation, then all parts of the program after the overflow must
be rewritten, causing link time to be proportional to the size of the  program.
We  have  simulated various slop management schemes to find those with the best
tradeoff between unused-slop-space and number-of-overflows.
Inclink  is  a  straight-forward  implementation  of  our  incremental  linking
algorithm.    Its link time is under a couple of seconds, regardless of program
size.  Inclink is an order of magnitude faster than a batch linker and is fully
compatible  with  existing  compilers.    Inclink  helps to increase programmer
productivity by reducing program turnaround time.

-------------------------------------------------------------------------------

CSL-TR-89-382
MULTI-LEVEL SHARED CACHING TECHNIQUES FOR SCALABILITY IN VMP-MC
David R. Cheriton, Hendrik A. Goosen and Patrick D. Boyle
June 1989                                                    19 pages.....$4.45

The problem of building a scalable shared memory multiprocessor can be  reduced
to  that  of  building  a  scalable  memory  hierarchy, assuming interprocessor
communication is handled by the memory system.  In this paper, we describe  the
VMP-MC   design,  a  distributed  parallel  multi-computer  based  on  the  VMP
multiprocessor design, that is intended to provide a set of building blocks for
configuring  machines  from  one to several thousand processors.  VMP-MC uses a
memory hierarchy based  on  shared  caches,  ranging  from  on-chip  caches  to
board-level  caches  connected  by busses to, at the bottom, a high-speed fiber
optic ring.  In addition to describing the building block  components  of  this
architecture, we identify the key performance issues associated with the design
and provide performance evaluation of these issues using tracedrive  simulation
and measurements from the VMP.

-------------------------------------------------------------------------------

CSL-TR-89-383
SUPER-SCALAR PROCESSOR DESIGN
William M. Johnson
June 1989                                                  147 pages.....$12.75

A   super-scalar   processor   is   one   that  is  capable  of  sustaining  an
                                       4


instruction-execution rate of  more  than  one  instruction  per  clock  cycle.
Maintaining  this execution rate is primarily a problem of scheduling processor
resources (such as functional  units)  for  high  utilization.    A  number  of
scheduling   algorithms  have  been  published,  with  wide-ranging  claims  of
performance over the single-instruction issue of a scalar processor.   However,
a  number  of  these  claims  are  based on idealizations or on special-purpose
applications.
This study uses trace-driven simulation to evaluate many different super-scalar
hardware  organizations.    Super-scalar  performance  is  limited primarily by
instruction-fetch inefficiencies caused by both branch delays  and  instruction
misalignment.    Because  of  this  instruction-fetch  limitation,  it  is  not
worthwhile to explore highly-concurrent execution hardware.  Rather, it is more
appropriate  to explore economical execution hardware that more closely matches
the instruction throughput provided by the instruction  fetcher.    This  study
examines  techniques  for  reducing  the  instruction-fetch  inefficiencies and
explores the resulting hardware organizations.
This study concludes that a super-scalar processor can have  nearly  twice  the
performance  of  a  scalar  processor,  but  that this requires that four major
hardware  features:   out-of-order   execution,   register   renaming,   branch
prediction, and a four-instruction decoder.  These features are interdependent,
and removing any single feature reduces average performance  by  18%  or  more.
However,  there  are  many  hardware  simplifications  that  cause only a small
performance reduction.

-------------------------------------------------------------------------------


CSL-TR-89-384
COPY ELIMINATION IN SINGLE ASSIGNMENT LANGUAGES
K.Gopinath
July 1989                                                  133 pages.....$11.85

Copy elimination is an important problem that has to be solved for an efficient
implementation  of  functional  languages. Copies arise because these languages
lack the concepts of state and variable; hence updating of an object involves a
copy. Copies are also possible if proper targeting has not been carried out due
to its complexity. To eliminate copies, we  introduce  the  notion  of  address
expressions  and  use  abstract interpretation with its associated denotational
model to compute them.
Our work is in the context of a single assignment language called  SAL  and  we
also  consider optimizations such as eliminating copies in dependent iterations
that are important in  single  assignment  languages.    An  operational  model
employing  reduction rules is developed to handle the constructs present in the
language.  Using this model, it is possible to eliminate some difficult  copies
in  divide  and  conquer  algorithms  such  as quicksort and bitonic sort. This
enables the implementation of these algorithms to be as efficient as imperative
languages.
An  outline of the implementation is given along with the results of optimizing
some difficult programs with large number of updates.  The overall optimization
system  validates  the approach taken: namely high-level optimization to find a
suitable mapping of names in the program into store followed by  generation  of
intermediate  code  and finally the generation of machine code by use of native
code generators.

-------------------------------------------------------------------------------

CSL-TR-89-385
A METHODOLOGY FOR MODELING INTERPROCESSOR TRAFFIC IN SHARED MEMORY
MULTIPROCESSORS
Josep Torrellas, Thierry Weil and John Hennessy
July 1989                                                    29 pages.....$5.10
                                       5


A  methodology  for  modeling  interprocessor  traffic  in  a   shared   memory
multiprocessor  environment  is presented.  The multiprocessor is modeled as an
open queueing network.  The inputs are  (1)  architectural  machine  parameters
like  interconnection  network  and  memory  hierarchy  characteristics,  cache
coherence  protocol,  etc,  and  (2)  benchmark  parameters  like  locality  of
processor  referencing  behavior,  amount  of sharing, etc.  The outputs of the
model are average  values  for  processor  efficiency  metrics  and  congestion
measures  in  each  machine  resource.    The modeling methodology is discussed
through an example, based on the the DASH machine, a hierarchical shared memory
multiprocessor  being developed at Stanford. The benchmark characterization has
been determined through parallel application traces.   The  discussion  of  the
results  gives insight on where the performance bottlenecks for that particular
architecture-benchmark set are.
                                       6


                                Naomi Schulman
                          COMPUTER SYSTEMS LABORATORY
                              Stanford University
                            Stanford, CA 94305-4055
                       EM: Schulman@Sierra.Stanford.Edu
                                 415-723-1430
                               Hours: M-Th, 8-1

                            ORDER FORM AND INVOICE

ALL ORDERS MUST BE PREPAID
To Order Reports: Print or type your name and address in the space provided.
Check or circle the  report  number(s)  you  wish  to  purchase,  and  indicate
hardcopy or microfiche.
CALIFORNIA RESIDENTS must add sales tax of their local county.
Return  this order form with your check or money order made payable to Stanford
University.
FOREIGN ORDERS must be paid with an international money order or a check  drawn
on  a U.S. bank, payable in dollars. Foreign order payment should include $1.00
extra for each $15.00's worth or reports requested
If your employer requires an invoice for payment, please note that this form is
both an order form and an invoice.  We cannot invoice separately.

          (Name)         _________________________________

          (Address)      _________________________________

                         _________________________________

                         _________________________________

                         _________________________________

          (Email Address)_________________________________                                                      





Report #  Hardcopy  Microfiche     Report #  Hardcopy  Microfiche
TR 376    $ 9.95       $3.00       TR 381     $ 8.75      $2.00          
TR 377    $13.40       $3.00       TR 382     $ 4.45      $2.00          
TR 378    $ 5.75       $2.00       TR 383     $12.75      $3.00          
TR 379    $ 5.40       $2.00       TR 384     $11.85      $3.00          
TR 380    $ 4.25       $2.00       TR 385     $ 5.10      $2.00          

               Subtotal                 ________
               Your Local CA County tax ________
               Total                    ________