[comp.doc.techreports] Stanford University TR list, 9/90

MFLLL002@VE.BOGECN.EDU (03/22/91)
From: Naomi Schulman
schulman@shasta.Stanford.EDU

Return-Path: <schulman@shasta.Stanford.EDU>
Date: Wed, 13 Mar 1991 13:23:43 PST





                                Naomi Schulman
                          COMPUTER SYSTEMS LABORATORY
                              Stanford University
                            Stanford, CA 94305-4055
                     e-mail: schulman@sierra.stanford.edu
                                (415) 723-1430
                             Hours: M-Th 8-1 (PST)

                              RECENT PUBLICATIONS
                                   LIST #19
                                SEPTEMBER 1990

                To order reports, see Order Form on last page.

CSL-TR-90-425
CONCURRENT RUNTIME MONITORING OF FORMALLY SPECIFIED PROGRAMS
Manas Mandal and Sriram Sankar
April 1990                                                   31 pages.....$5.48

This   paper  describes  an  application  of  formal  specifications  after  an
executable  program  has  been  constructed.    We  describe  how  high   level
specifications can be utilized to monitor critical aspects of the behavior of a
program continuously while it  is  executing.    This  methodology  provides  a
capability  to  distribute  the monitoring of specifications on multi-processor
hardware platforms to meet practical time constraints.
Typically, runtime checking of formal  specifications  involves  a  significant
time  penalty which makes it impractical during normal production opertion of a
program.  In previous  research,  runtime  checking  has  been  applied  during
testing and debugging of software, but not on a permanent basis.
Crucial  to  our  current  mthodology  is the use of multi-processor machines -
hence runtime monitoring can be performed concurrently on different processors.
We describe techniques for distributing checks onto different processors.
To control the degree of concurrency, we introduce checkpoints - a point in the
program beyond which execution cannot proceed until the specified  checks  have
been  completed.   Error reporting and recovery in a multiprocessor environment
is complicated and there are various techniques of handling this.  We  describe
a few of these techniques in this paper.
An  implementation  of this methodology for the Anna specification language for
Ada  programs  is  described.    Results  of  experiments  conducted  on   this
implementation using a 12-processor Sequent Symmetry demonstrate that permanent
concurrent monitoring of programs based  on  formal  specifications  is  indeed
feasible.

-------------------------------------------------------------------------------

CSL-TR-90-426
A VLSI ARCHITECTURE FOR THE FCHC ISOMETRIC LATTICE GAS MODEL
Fung F. Lee, Michael J. Flynn, and Martin Morf
April 1990                                                   28 pages.....$5.24

Lattice  gas  models  are  cellular  automata  used for the simulation of fluid
dynamics. This paper addresses the design issues of  a  lattice  gas  collision
rule  processor  for  the four-dimensional FCHC isometric lattice gas model.  A
novel VLSI architecture based on an  optimized  version  of  Henon's  isometric
algorithm is proposed.  One of the key concepts behind this architecture is the
permutation group representation of the isometry group of the lattice.
In contrast to the straightforward table lookup approach which would  take  4.5
billion  bits  to  implement  this  set  of  collision  rules,  the size of our
processor is only about 5000 gates.   With  a  reasonable  number  of  pipeline
stages,  the  processor  can  deliver  one  result  per cycle with a cycle time
comparable to or less than that of a common commercial DRAM.



-------------------------------------------------------------------------------

CSL-TR-90-427
GENERALIZED GRAPHICAL OBJECT EDITING
John M. Vlissides
June 1990                                                  166 pages.....$16.28

Many editors use the graphics capabilities of personal workstations to  provide
a  visual  editing environment.  Such editors present graphical representations
of familiar objects and  allow  the  user  to  manipulate  the  representations
directly.  This style of interaction is usually more intuitive to the user than
typing statements in a command language.   However,  implementing  a  graphical
object  editor  has  been  a difficult undertaking.  Though many packages exist
that aid in the construction of graphical user interfaces,  none  are  designed
specifically  for  building  graphical  object  editors.    This is significant
because there is a substantial semantic gap between general user interfaces and
the  functionality  of  graphical  object editors.  For example, user interface
packages usually provide buttons, scroll bars, and ways to assemble  them,  but
they  do  not  offer  primitives  for  building  drawing  editors  that produce
PostScript or schematic capture systems that produce  netlists.    Higher-level
abstractions are needed to make such applications easier to develop.
Unidraw  is  a  framework  for  creating  object-oriented  graphical editors in
domains such as technical and artistic drawing, music composition, and  circuit
design.  The  Unidraw architecture simplifies the construction of these editors
by providing programming abstractions that are common across domains.   Unidraw
defines  four  basic  abstractions: "components" encapsulate the appearance and
semantics of objects in  a  domain,  "tools"  support  direct  manipulation  of
components,  "commands"  define operations on components and other objects, and
"external representations" define the mapping betweencomponents  and  the  file
format  generated  by  the  editor.    Unidraw  also  supports  multiple views,
graphical connectivity and confinement, and dataflow between components.   This
thesis   describes   the  Unidraw  design,  implementation  issues,  and  three
experimental domain-specific editors we have developed with Unidraw:  a drawing
editor,  a user interface builder, and a schematic capture system.  Our results
indicate a substantial reduction in implementation  time  and  effort  compared
with existing tools.

-------------------------------------------------------------------------------

CSL-TR-90-428
SUB-NANOSECOND ARITHMETIC
Michael J. Flynn, Giovanni De Micheli, Robert Dutton, Bruce Wooley,
and Fabian Pease
May 1990                                                     26 pages.....$5.08

The  SNAP  (Stanford  Nanosecond  Arithmetic  Processor) project is targeted at
realizing an arithmetic processor with performance approximately  an  order  of
magnitude  faster  than currently available technology. The realization of SNAP
is predicated on an interdisciplinary approach and effort spanning research  in
algorithms,  data  representation,  CAD,  circuits  and devices, and packaging.
SNAP is visualized as  an  arithmetic  coprocessor  implemented  on  an  active
substrate  containing  several  chips,  each  of  which  realize  a  particular
arithmetic function.

-------------------------------------------------------------------------------

CSL-TR-90-429
MULTIPROCESSOR SMALLTALK: IMPLEMENTATION, PERFORMANCE, AND ANALYSIS
Joseph Ira Pallas
June 1990                                                   140 pages...$14.20



Multiprocessor Smalltalk demonstrates the value of object-oriented  programming
on amultiprocessor.  Its implementation and analysis shed light on three areas:
concurrent  programming  in  an  object-oriented   language   without   special
extensions,  implementation  techniques  for  adapting  to multiprocessors, and
performance factors in the resulting system.
Multiprocessor  Smalltalk's  performance  shows   that   the   combination   of
multiprocessing  and  object-oriented  programming  can be effective:  speedups
(relative to the original serial version) exceed 2.0 for five processors on all
the  benchmarks;  the  median  efficiency  is  48%.   Analysis shows both where
performance is lost and how to improve and generalize the experimental results.
Changes  in  the  interpreter  to support concurrency add at most 12% overhead;
better access to per-process variables could eliminate much of that.    Changes
in  the  user  code  to  express  concurrency add as much as 70% overhead; this
overhead could be reduced to 54% if blocks (lambda expressions) were reentrant.
Performance is also lost when the program cannot keep all five processors busy.
Idle time in the interpreter (up to  51%  overhead,  excluding  a  pathological
case)  could be reduced with a parallel garbage collector to 10%.  Idle time in
user code (up to 35% overhead) remains the programmer's responsibility.
We can identify the key  characteristics  that  make  Multiprocessor  Smalltalk
successful.    The  Smalltalk  language  allows  us to build concurrent control
structures using lambda expressions without extending the language. Inexpensive
processes    and    efficient    garbage    collection    are   also   crucial.
Hardware/operating-system support for shared memory, per-process variables, and
inexpensive  synchronization are essential to the implementation.  Given these,
object-oriented languages and multiprocessors are a good match.

-------------------------------------------------------------------------------

CSL-TR-90-430 (also numbered STAN-CS-90-1318)
TECHNIQUES FOR IMPROVING THE PERFORMANCE OF SPARSE MATRIX
FACTORIZATION ON MULTIPROCESSOR WORKSTATIONS
Edward Rothberg and Anoop Gupta
June 1990                                                    16 pages.....$4.28

In this paper we look at the problem  of  factoring  large  sparse  systems  of
equations   on  high-performance  multiprocessor  workstations.    While  these
multiprocessor workstations are  capable  of  very  high  peak  floating  point
computation  rates,  most  existing  sparse  factorization codes achieve only a
small fraction of this potential.  A major  limiting  factor  is  the  cost  of
memory accesses performed during the factorization.  In this paper, we describe
a parallel factorization code which utilizes the supernodal  structure  of  the
matrix to reduce the number of memory references.  We also propose enhancements
that significantly reduce the overall cache miss rate.  The result  is  greatly
increased  factorization  performance.    We  present experimental results from
executions of our codes on the Silicon Graphics 4D/380 multiprocessor.    Using
eight  processors,  we  find  that  the  supernodal  parallel  code  achieves a
computation rate of approximately 40 MFLOPS when factoring a range of benchmark
matrices.  This is more than twice as fast as the parallel nodal code developed
at the Oak Ridge National Laboratory running on the SGI 4D/380.

-------------------------------------------------------------------------------

CSL-TR-90-431
LATENCY AND THROUGHPUT TRADEOFFS IN SELF-TIMED SPEED-INDEPENDENT
PIPELINES AND RINGS
Ted Williams
June 1990                                                     29 pages....$5.32

Asynchronous pipelines control the flow of tokens through a sequence of logical
stages   based  on  the  status  of  local  completion  detectors.    As  in  a
synchronously clocked circuit, the design of self-timed pipelines can trade off



between  achieving  low  latencyn  and high throughput. However, there are more
degress of freedom because of the variances  in  specific  latch  and  function
block styles, and the possibility of varying both the number of latches between
function blocks and their connections to the completion detectors. This  report
demonstrates  the utility of a graph-based methodology for analyzing the timing
dependencies and uses it to make comparisons of different configurations. It is
shown   that   the   extremes  for  high  throughput  and  low  latency  differ
significantly, the placement of the completion detectors influences  timing  as
much  as adding an additional latch, and the choice as to whether precharged or
static logic is best is dependent on the cost in complexity of  the  completion
detectors.

-------------------------------------------------------------------------------

CSL-TR-90-432
THE OLYMPUS SYNTHESIS SYSTEM FOR DIGITAL DESIGN
Giovanni De Micheli, David Ku, Fredric Mailhot, Thomas Truong
July 1990                                                    29 pages.....$5.32

The Olympus Synthesis System, developed at Stanford University, is a vertically
integrated set of tools for the synthesis of  digital  circuit  designs.    The
system  is  specifically  designed to support synthesis of Application-Specific
Integrated Circuits from behavioral-level descriptions, written in  a  hardware
description  language  called HardwareC.  The Olympus system supports synthesis
with timing constraints at the behavioral, structural and logic  levels.    Two
internal  models,  Sequencing  Intermediate  Form,  (SIF)  and Structural/Logic
Intermediate Form (SLIF), are used  to  represent  the  hardware  at  different
levels  of  abstraction and to provide a way to pass design information between
the different tools.  This paper describes Olympus as a system consisting of  a
set  of  programs  for  behavioral,  structural and logic synthesis, technology
mapping and simulation.  The system has been used to design three ASIC chips at
Stanford  University  and  it  has  been  tested against benchmark circuits for
high-level and logic synthesis.

-------------------------------------------------------------------------------

CSL-TR-90-433
LIMITS ON MULTIPLE INSTRUCTION ISSUE
Michael D. Smith, Mike Johnson, and Mark A. Horowitz
July 1990                                                    18 pages.....$4.44

This  paper  investigates  the  limitations  on  designing  a  processor  which
cansustain  an  execution  rate  of  greater  than one instruction per cycle on
highly-optimized, non-scientific  applications.    We  have  used  trace-driven
simulations  to  determine  that  these applications contain enough instruction
independence to sustain an instruction  rate  of  about  two  instructions  per
cycle.  In a straightforward implementation, cost considerations argue strongly
against decoding  more  than  two  instructions  in  one  cycle.    Given  this
constraint,  the  efficiency in instruction fetching rather than the complexity
of the execution hardware limits the concurrency attainable at the  instruction
level.

-------------------------------------------------------------------------------

CSL-TR-90-434
BOOSTING BEYOND STATIC SCHEDULING IN A SUPERSCALAR PROCESSOR
Michael D. Smith, Monica S. Lam, and Mark A. Horowitz
July 1990                                                    19 pages.....$4.52

This  paper  describes a superscalar processor that combines the best qualities
of static and dynamic instruction scheduling to  increase  the  performance  of



non-numerical   applications.   The   architecture   performs  all  instruction
scheduling  statically  to  take  advantage  of  the  compiler's   ability   to
efficiently schedule operations across many basic blocks. Since the conditional
branches in non-numerical code are  highly  data  dependent,  the  architecture
introduces the concept of boosted instructions, instructions that are committed
conditionally  upon  the  result  of  later   branch   instructions.   Boosting
effectively removes the dependences caused by branches and makes the scheduling
of side-effect instructions as simple as those that are side-effect free.   For
efficiency,  boosting  is  supported  in the hardware by shadow structures that
temporarily hold the side effects of boosted instructions until the conditional
branches  that  the  boosted  instructions  depend  upon are executed. When the
branch condition is determined, the buffered side effects are either  committed
or squashed. The limited static scheduler in our evaluation system shows that a
1.6-times speedup over scalar code is achievable by boosting instructions above
only   a  single  conditional  branch.  This  performance  is  similar  to  the
performance of a pure dynamic scheduler.

-------------------------------------------------------------------------------

CSL-TR-90-435
COLLABORATION TRANSPARENCY IN DESKTOP TELECONFERENCING ENVIRONMENTS
J. Chris Lauwers
July 1990                                                  133 pages.....$13.64

The ultimate goal of desktop  teleconferencing  environments  is  to  integrate
contemporary (non-computer-supported) audio/video teleconferencing technologies
with workstation-based network computing environments.  This thesis studies the
application  sharing  facilities that allow multiple conference participants to
interact  simultaneously  with  computer-based  applications   in   a   desktop
teleconference.  It focuses on collaboration-transparent facilities that permit
existing  single-user  applications  to  be  invoked  from  within  a  computer
conference.  In the context of contemporary workstation-based environments, the
pursuit of collaboration transparency leads to the development of shared window
systems.  This  thesis  introduces  the  basic  architectures for shared window
systems and analyzes them based on experience with several prototypes.
Contemporary window system standards present many obstacles  to  shared  window
designers.  This  thesis discusses these obstacles and suggests how they can be
overcome. In addition, new architectures for window systems are  proposed  that
can  accommodate  window  sharing more effectively.  The thesis then introduces
application replication as  a  technique  for improving overall  shared  window
system  performance.    Application replication introduces many synchronization
problems, arising mainly from the need to  keep  replicated  copies  of  shared
applications   consistent.   This   thesis   shows   that   the  most  frequent
synchronization problems can be solved without changing existing software.  The
remaining  problems  can  be  solved  by  making applications or system servers
collaboration-aware.  Finally, this thesis studies the ramifications,  for  the
software  designer,  of  supporting  spontaneous interactions, shared workspace
management,  floor   control,   user   customization,   and   annotations   and
telepointing.    While  the  recommendations  that  result are motivated by the
desire to  enable  continued  use  of  collaboration-transparent  applications,
addressing them involves the development of systems software that is distinctly
collaboration-aware.

-------------------------------------------------------------------------------

CSL-TR-90-436
APPLICATION OF FORMAL SPECIFICATION TO SOFTWARE MAINTENANCE
Neel Madhav and Sriram Sankar
August 1990                                                  15 pages.....$4.20

This paper describes the use of formal specifications and associated  tools  in



addressing  various  aspects of software maintenance ---corrective, perfective,
and adaptive.  It also addresses the refinement  of  the  software  development
process  to  build programs that are easily maintainable.  The task of software
maintenance in our case includes the task of maintaining the  specification  as
well as maintaining the program.
We  focus  on the use of Anna, a specification language for formally specifying
Ada programs, to aid us in maintaining Ada  programs.    These  techniques  are
applicable  to  most  other  specification  language  and  programming language
environments. The tools of interest are: (1) the  Anna  Specification  Analyzer
which  allows  us  to analyze the specification for correctness with respect to
our informal understanding of program behavior; and (2)  the  Anna  Consistency
Checking  System  which  monitors  the Ada program at runtime based on the Anna
specification.

-------------------------------------------------------------------------------

CSL-TR-90-437
AN ADA---PROLOG SYSTEM
Neel Madhav
August 1990                                                  15 pages.....$4.20

This paper presents a software development tool --- the Ada-Prolog system which
combines  the  strengths of both descriptive and procedural programming styles.
Concrete reasons and examples are provided to  demonstrate  that  such  a  tool
would be useful.
This  tool  provides various operations available in Prolog for clausebuilding,
database building and querying to Ada programs.In addition to allowing  dynamic
access  to both Ada and Prolog, the Ada-Prolog system adds to the functionality
provided by Prolog by partitioning the Prolog database into lists  of  clauses.
These  lists  can  be  created,  updated and destroyed dynamically.  Concurrent
access to the list of clauses is also possible.  Queries  can  be  directed  to
groups of these lists.
The   system   is   meant  for  use  in  expert  systems,  compilers,  database
applications, rapid  prototyping  systems,  advanced  environments,  and  other
software tools which use deduction.

-------------------------------------------------------------------------------

CSL-TR-90-438
A METHODOLOGY FOR FORMAL SPECIFICATION AND IMPLEMENTATION OF ADA
PACKAGES USING ANNA
Neel Madhav and Walter Mann
August, 1990                                                 24 pages.....$4.92

This  paper  presents  a  methodology  for  formal  specification and prototype
implementation of Ada packages using the Anna specification language.
Specifications play an important role in the software development cycle.    The
methodology  allows  specifiers  of Ada packages to follow a sequence of simple
steps to formally specify packages.
Given the formal specification of a package resulting from the methodology  for
package  specifications,  the  methodology  allows  implementors of packages to
follow a few simple steps to implement the package. The implementation is meant
to be a prototype.
This  methodology for specification and implementation is applicable tomost Ada
packages. Limitations of this approach are pointed out at various points in the
paper.
We  present  software  tools  which  help  the  process  of  specification  and
implementation.

-------------------------------------------------------------------------------



CSL-TR-90-439
TANGO:  A MULTIPROCESSOR SIMULATION AND TRACING SYSTEM
Helen Davis and Stephen R. Goldschmidt
July 1990                                                    25 pages.....$5.00

Tango is a software simulation and tracing  system  used  to  obtain  data  for
evaluating parallel programs and multiprocessor systems.  The system provides a
simulated multiprocessor environment by multiplexing application processes onto
a  single  processor.  Tango  achieves high efficiency by running compiled user
code,  and  by  focusing  on  the   information   of   greatest   interest   to
multiprocessing  studies.  The  system  is  being  applied  to  a wide range of
investigations,  including  algorithm  studies  and  a  variety   of   hardware
evaluations.

-------------------------------------------------------------------------------

CSL-TR-90-440
BIBLIOGRAPHY OF THE COMPUTER SYSTEMS LABORATORY - 1968-1990 (Third
Edition)
Naomi Schulman
August 1990                                                 71 pages.....$10.00

The  bibliography  lists  all  technical  reports  and  notes  published by the
Computer Systems Laboratory of Stanford University, from 1968 to  date.    This
edition is in a binder where future announcements of new reports can be kept to
update the bibliography temporarily.    When  new  reports  are  in  sufficient
number, a whole new page will be created in the bibliography format.  New pages
will be listed on future announcement Order Forms and will be sent at no charge
along with any order placed.
Prices of reports and notes are listed separately, starting on page ix.  Please
note that report prices. are printed on  yellow  paper,  and  note  prices  are
printed  on  blue  paper.    These prices will supersede any previously given
prices, and may themselves be superseded  in  the  future  if  printing  and/or
postage costs rise.
The  symbol  (*), which, in earlier editions signified that a report was out of
print, is now used only to  indicates  that  no  "original"  is  available  for
copying.  All other reports will be made available upon request.
An  earlier  edition  of  the  Bibliography  (TR-272 or TR-336), plus copies of
announcements 15,16, 17, 18 and 19, will complete  the  list  of  publications.
However,  new  price  lists will be needed to complete the updating of previous
editions.  The TR Price List, 1990-91 and TN Price List, 1990-91 will  be  sent
upon request with any order placed.

-------------------------------------------------------------------------------

CSL-TR-90-441
COMPUTING TYPES DURING PROGRAM SPECIALIZATION
Daniel Weise and Erik Ruf
August 1990                                                  26 pages.....$5.08

We  have  developed  techniques for obtaining and using type information during
program  specialization  (partial  evaluation).    Computed  along  with  every
residual  expression  and  every  specialized  program is type information that
bounds the possible values that the specialized program  will  compute  at  run
time.   The three keystones of this research are symbolic values that represent
both a value and the code for creating the value,  generalization  of  symbolic
values,  and the use of online fixed-point iterations for computing the type of
values returned by specialized recursive functions.  The  specializer  exploits
type  information  to  increase  the efficiency of specialized functions.  This
research has  two  benefits,  one  anticipated  and  one  unanticipated.    The
anticipated  benefit  is  that  programs  that are to be specialized can now be



written in a more natural style without losing accuracy during  specialization.
The  unanticipated  benefit  is  the creation of what we term concrete abstract
interpretation. This is a method of  performing  abstract  interpretation  with
concrete  values  where  possible.  The specializer abstracts values as needed,
instead  of  requiring  that  all  values  be  abstracted  prior  to   abstract
interpretation.

-------------------------------------------------------------------------------

CSR-TR-90-442
AN IMPROVED HIGH-SPEED FLOATING-POINT ADDITION ALGORITHM
Nhon T. Quach and Michael J. Flynn
August 1990                                                  21 pages.....$4.68

This  paper  describes  an  improved,  IEEE  conforming floating-point addition
algorithm.  This algorithm has only one addition step involving the significand
in  the worst-case path, hence offering a considerable speed advantage over the
existing algorithms, which typically require two to three addition steps.

-------------------------------------------------------------------------------

CSL-TR-90-443
A QUEUING ANALYSIS FOR DISK ARRAY SYSTEMS
Mikito Ogata and Michael J. Flynn
August 1990                                                  51 pages.....$7.08

Using a queuing model of disk arrays, we study the performance and tradeoffs in
disk  array  sub-systems and develop guidelines for designing these sub-systems
in various CPU environments.  Finally, we compare our model with  some  earlier
simulation results.

-------------------------------------------------------------------------------

CSL-TR-90-444
EVALUATION OF A FOUR FOLD SPARSE DISTRIBUTED MEMORY PROTOTYPE
Brian Flachs
August 1990                                                  58 pages.....$7.64

Sparse Distributed Memory is a generalized random access memory for long binary
words.  This document describes changes made to the Single  Fold  Prototype  to
develop  Stanford's  Four Fold Prototype.  A users manual for the libraries and
utilities developed for the prototype  and  a  performance  evaluation  of  the
prototype are also included.  Appendices detail hardware settings and provide a
complete, up-to-date, set of Address Module Schematics.

-------------------------------------------------------------------------------

CSL-TR-90-445 (also numbered STAN-CS-90-1334)
SOFT CONFIGURABLE WAFER SCALE INTEGRATION: DESIGN, IMPLEMENTATION AND
YIELD ANALYSIS
Miriam Greta Blatt
June 1990                                                        123.....$12.84

Soft-Configurable Wafer Scale Integration uses software controlled switches  to
connect up the fault-free parts of a wafer.  Compared to hard configuration, th
e  soft  configurable  approach  has  the  advantages  of  providing   low-cost
connections  and  runtime  fault  tolerance.  The dissertation describes how to
achieve soft co nfiguration  with  high  performance,  presenting  a  pipelined
memory  system  implemented using this approach.  The yield of the prototype is
evaluated in two phases.  Fault simulation applies measured  defect  statistics
to the layout to predict the yield of each circuit unit.  These unit yields are



combined to produce wafer yields using redundancy models appropriate  to  wafer
scale  integration.    The  redundancy  models  constrain wafer yield by system
requirements such as the minimum n umber of working circuit units, and  whether
these  working  units  are  distributed  evenly  around  the  wafer.  Choice of
redundancy model significantly affects the r esulting wafer yield.

-------------------------------------------------------------------------------

CSL-TR-90-446
ANALYZING THE FAULT TOLERANCE OF A CLASS OF DOUBLE-LOOP NETWORKS
Jon M. Peha and Fouad A. Tobagi
September 1990                                               30 pages.....$5.40

This paper analyzes the fault tolerance of  a  class  of  double-loop  networks
referred  to  as  forward-loop  backward-hop  (FLBH),  in  which  each  node is
connected via unidirectional links to the node one hop in front of  it  and  to
the  node  S hops in back of it for some S. A new measure of fault tolerance is
described, along with techniques based on Markov chains to calculate upper  and
lower  bounds  on  the  fault  tolerance  of  this network topology quickly and
efficiently.  The result  s  of  these  calculations  provide  a  more  precise
description  of  network fault tolerance than has been achieved with previously
published techniques.

-------------------------------------------------------------------------------

CSL-TR-90-447
EVALUATION OF SCHEDULING ALGORITHMS FOR INTEGRATED-SERVICES
PACKET-SWITCHED NETWORKS
Jon Peha and Fouad A. Tobagi
September 1990                                               31 pages.....$5.48

In this paper, we consider integrated-services  packet-switched  networks  with
two  types  of  traffic:  traffic  with deadlines, for which the most important
perform ance objective is based on loss rate, and  packets  without  deadlines,
for  which the most important performance objective is based on mean delay.  We
present an o ptimal scheduling algorithm to minimize  weighted  loss  rate  and
weighted  mean delay in the queues that form at the switches and at the network
access points of a packet-switched network, where weights reflect the  relative
importance  of  packets.    Although  not  practical  for  implementation,  the
algorithm is intended as a standard for comparison with which  the  performance
of other scheduling algorithms can be evaluated.  Our algorithm is more general
and of lower computational co mplexity than  previously  published  algorithms,
enabling  us to evaluate performance in some important scenarios that could not
previously have been considered. U sing optimal performance  results  from  our
algorithm,  we  evaluate the performance of the first-come-first-served, static
priority, and earliest deadline first al gorithms, finding some limitations  of
each.  Our results suggest that network efficiency could be improved by using a
more sophisticated heuristic scheduling alg  orithm  rather  than  one  of  the
aforementioned algorithms.

-------------------------------------------------------------------------------

CSL-TR-90-448
A COST-BASED SCHEDULING ALGORITHM TO SUPPORT INTEGRATED SERVICES
Jon Peha and Fouad A. Tobagi
September 1990                                               44 pages.....$6.52

It  is  becoming  increasingly  important  to support applications with diverse
performance objectives on a single packet-switched network.  The efficiency  of
a netw ork with such diverse traffic can be greatly improved through the use of
sophisticated scheduling and dropping algorithms within the queues that form at



the  net work access points and in switches throughout the network.  This paper
presents heuristic algorithms for that purpose.   In  our  approach,  arbitrary
performance  o  bjectives  are defined in the form of cost functions, which map
the queueing delay experienced by each packet to a cost incurred.
Our algorithms, cost-based scheduling (CBS) and cost-based dropping (CBD), then
attempt  to  optimize  network  performance as perceived by the applications by
mini mizing the total cost  incurred  by  all  packets.    Cost  functions  are
presented  that  are  appropriate for most common applications.  Scheduling and
dropping algorit hms  are  defined  based  on  these  cost  functions.  Through
simulation,  it  is  demonstrated that network performance is better when these
heuristic algorithms are use d as opposed to the common alternatives.

-------------------------------------------------------------------------------

CSL-TR-90-449 (also numbered STAN-CS-90-1330)
PARALLEL ICCG ON A HIERARCHICAL MEMORY MULTIPROCESSOR -- ADDRESSING
THE TRIANGULAR SOLVE BOTTLENECK
Edward Rothberg and Anoop Gupta
September 1990                                               22 pages.....$4.76

The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly  used
iterative method for solving large sparse systems of equations.  In this paper,
w e study the parallel solution of sparse triangular systems of equations,  the
most  difficult aspect of implementing the ICCG method on a multiprocessor.  We
focu  s  on  shared-memory  multiprocessor  architectures  with   deep   memory
hierarchies.      On  such  architectures  we  find  that  previously  proposed
parallelization approaches result in little or no speedup.  The reason is  that
these  approaches  cause  significant  increases in the amount of memory system
traffic as compared to a sequen tial approach.   Increases  of  as  much  as  a
factor  of  10  on four processors were observed.  In this paper we propose new
techniques for limiting these increases, including data remappings to  increase
spatial  locality, new processor synchronization techniques to decrease the use
of auxiliary data structures, and data part itioning techniques to  reduce  the
amount  of  interprocessor communication.  With these techniques, memory system
traffic is reduced to as little as one sixth  of  its  previous  volume.    The
resulting  speedups  are greatly improved as well, although they are still much
less than linear.  We discuss the factors that limit fur  ther  speedups.    We
present  both  simulation  results  and results of experiments on an SGI 4D/340
multiprocessor.

-------------------------------------------------------------------------------

CSL-TR-90-450 (also numbered STAN-CS-90-1333)
COMPARING STRUCTURALLY DIFFERENT VIEWS OF A VLSI DESIGN
Michael Joseph Spreitzer
June 1990                                                  162 pages.....$15.96

VLSI designers often use  structural  hierarchies  and  alternate  views  of  a
design.    Allowing  alternate views to have different hierarchies improves the
clarity of the views, but complicates comparison of those views.  Most existing
comparison  techniques either require essentially identical hierarchies, or can
only handle differences by flattening.  A new technique,  Informed  Comparison,
first  reconciles  the  hierarchy  differenc  es  by  making  purely structural
transformations according to a description of the intended relationship between
the  hierarchies,  and  then finishes with an existi ng hierarchical technique.
Informed Comparison thus can compare views with different  hierarchies  without
the penalties associated with flattening.

-------------------------------------------------------------------------------




     Naomi Schulman
                           COMPUTER SYSTEMS LABORATORY
                               Stanford University
                             Stanford, CA 94305-4055
                       E-mail: schulman@sierra.stanford.edu
                                   415-723-1430
                              Hours: M-Th, 8-1 (PST)


                              ORDER FORM AND INVOICE

ALL ORDERS MUST BE PREPAID

Please  print  or  type  your name and address in the space provided.  Check
the number(s) of the report(s) you wish to purchase and circle the price of
hardcopy or microfiche.  California Residents must add the sales tax of their
own local county.  Return this order form with a check or money order made
payable to Stanford University.

Foreign Orders must include $1.00 extra for each $15.00's worth of reports to
cover  postage.  Checks  must  be drawn  on  a  U.S.  bank, payable in U.S.
dollars.  If an invoice is required for payment, please note that this form
is both an order form and an invoice.  We cannot invoice separately.

                          (Name)  ________________________________

                          (Address)_______________________________

                                  ________________________________

                                  ________________________________

                                  ________________________________

                   E-mail address:________________________________


               REPORT #  HARDCOPY  FICHE        REPORT #  HARDCOPY  FICHE
               TR-425    $ 5.48    $2.00        TR-438    $ 4.92    $2.00
               TR-426    $ 5.24    $2.00        TR-439    $ 5.00    $2.00
               TR-427    $16.28    $3.00        TR-440    $10.00    $2.00
               TR-428    $ 5.08    $2.00        TR-441    $ 5.08    $2.00
               TR-429    $ 4.20    $2.00        TR-442    $ 4.68    $2.00
               TR-430    $ 4.28    $2.00        TR-443    $ 7.08    $2.00
               TR-431    $ 5.32    $2.00        TR-444    $ 7.64    $2.00
               TR-432    $ 5.32    $2.00        TR-445    $12.84    $3.00
               TR-433    $ 4.44    $2.00        TR-446    $ 5.40    $2.00
               TR-434    $ 4.52    $2.00        TR-447    $12.84    $3.00
               TR-435    $13.64    $3.00        TR-448    $ 5.40    $2.00
               TR-436    $ 4.20    $2.00        TR-449    $ 5.48    $2.00
               TR-437    $ 4.20    $2.00        TR-450    $15.96    $3.00

                                  Subtotal    __________

                  CA tax or Foreign Surcharge __________

                                  Total       __________

Circle which ones you want: Announcement 15, 16, 17, 18
Check here ___ to receive 1990-91 Price Lists (with report order, only)

** END OF MESSAGE **

Date: 1991-03-13 15:22:49
                             Msg-ID: RFC-822
                             CMM.0.88.668899423.schulman@shasta.Stanford.EDU  OU=
                             VE-GW ON=BITNET

To:   mflll002
CC:   schulman@shasta.Stanford.EDU   OU=VE-GW ON=BITNET
From: Naomi Schulman
schulman@shasta.Stanford.EDU OU=VE-GW ON=BITNET

Subj: Announcement #19

------------ Letter Body Part 1 - Text  ------------

RFC-822-HEADERS:
Return-Path: <schulman@shasta.Stanford.EDU>
Date: Wed, 13 Mar 1991 13:23:43 PST


------------ Letter Body Part 2 - Text  ------------





                                Naomi Schulman
                          COMPUTER SYSTEMS LABORATORY
                              Stanford University
                            Stanford, CA 94305-4055
                     e-mail: schulman@sierra.stanford.edu
                                (415) 723-1430
                             Hours: M-Th 8-1 (PST)

                              RECENT PUBLICATIONS
                                   LIST #19
                                SEPTEMBER 1990

                To order reports, see Order Form on last page.

CSL-TR-90-425
CONCURRENT RUNTIME MONITORING OF FORMALLY SPECIFIED PROGRAMS
Manas Mandal and Sriram Sankar
April 1990                                                   31 pages.....$5.48

This   paper  describes  an  application  of  formal  specifications  after  an
executable  program  has  been  constructed.    We  describe  how  high   level
specifications can be utilized to monitor critical aspects of the behavior of a
program continuously while it  is  executing.    This  methodology  provides  a
capability  to  distribute  the monitoring of specifications on multi-processor
hardware platforms to meet practical time constraints.
Typically, runtime checking of formal  specifications  involves  a  significant
time  penalty which makes it impractical during normal production opertion of a
program.  In previous  research,  runtime  checking  has  been  applied  during
testing and debugging of software, but not on a permanent basis.
Crucial  to  our  current  mthodology  is the use of multi-processor machines -
hence runtime monitoring can be performed concurrently on different processors.
We describe techniques for distributing checks onto different processors.
To control the degree of concurrency, we introduce checkpoints - a point in the
program beyond which execution cannot proceed until the specified  checks  have
been  completed.   Error reporting and recovery in a multiprocessor environment
is complicated and there are various techniques of handling this.  We  describe
a few of these techniques in this paper.
An  implementation  of this methodology for the Anna specification language for
Ada  programs  is  described.    Results  of  experiments  conducted  on   this
implementation using a 12-processor Sequent Symmetry demonstrate that permanent
concurrent monitoring of programs based  on  formal  specifications  is  indeed
feasible.

-------------------------------------------------------------------------------

CSL-TR-90-426
A VLSI ARCHITECTURE FOR THE FCHC ISOMETRIC LATTICE GAS MODEL
Fung F. Lee, Michael J. Flynn, and Martin Morf
April 1990                                                   28 pages.....$5.24

Lattice  gas  models  are  cellular  automata  used for the simulation of fluid
dynamics. This paper addresses the design issues of  a  lattice  gas  collision
rule  processor  for  the four-dimensional FCHC isometric lattice gas model.  A
novel VLSI architecture based on an  optimized  version  of  Henon's  isometric
algorithm is proposed.  One of the key concepts behind this architecture is the
permutation group representation of the isometry group of the lattice.
In contrast to the straightforward table lookup approach which would  take  4.5
billion  bits  to  implement  this  set  of  collision  rules,  the size of our
processor is only about 5000 gates.   With  a  reasonable  number  of  pipeline
stages,  the  processor  can  deliver  one  result  per cycle with a cycle time
comparable to or less than that of a common commercial DRAM.



-------------------------------------------------------------------------------

CSL-TR-90-427
GENERALIZED GRAPHICAL OBJECT EDITING
John M. Vlissides
June 1990                                                  166 pages.....$16.28

Many editors use the graphics capabilities of personal workstations to  provide
a  visual  editing environment.  Such editors present graphical representations
of familiar objects and  allow  the  user  to  manipulate  the  representations
directly.  This style of interaction is usually more intuitive to the user than
typing statements in a command language.   However,  implementing  a  graphical
object  editor  has  been  a difficult undertaking.  Though many packages exist
that aid in the construction of graphical user interfaces,  none  are  designed
specifically  for  building  graphical  object  editors.    This is significant
because there is a substantial semantic gap between general user interfaces and
the  functionality  of  graphical  object editors.  For example, user interface
packages usually provide buttons, scroll bars, and ways to assemble  them,  but
they  do  not  offer  primitives  for  building  drawing  editors  that produce
PostScript or schematic capture systems that produce  netlists.    Higher-level
abstractions are needed to make such applications easier to develop.
Unidraw  is  a  framework  for  creating  object-oriented  graphical editors in
domains such as technical and artistic drawing, music composition, and  circuit
design.  The  Unidraw architecture simplifies the construction of these editors
by providing programming abstractions that are common across domains.   Unidraw
defines  four  basic  abstractions: "components" encapsulate the appearance and
semantics of objects in  a  domain,  "tools"  support  direct  manipulation  of
components,  "commands"  define operations on components and other objects, and
"external representations" define the mapping betweencomponents  and  the  file
format  generated  by  the  editor.    Unidraw  also  supports  multiple views,
graphical connectivity and confinement, and dataflow between components.   This
thesis   describes   the  Unidraw  design,  implementation  issues,  and  three
experimental domain-specific editors we have developed with Unidraw:  a drawing
editor,  a user interface builder, and a schematic capture system.  Our results
indicate a substantial reduction in implementation  time  and  effort  compared
with existing tools.

-------------------------------------------------------------------------------

CSL-TR-90-428
SUB-NANOSECOND ARITHMETIC
Michael J. Flynn, Giovanni De Micheli, Robert Dutton, Bruce Wooley,
and Fabian Pease
May 1990                                                     26 pages.....$5.08

The  SNAP  (Stanford  Nanosecond  Arithmetic  Processor) project is targeted at
realizing an arithmetic processor with performance approximately  an  order  of
magnitude  faster  than currently available technology. The realization of SNAP
is predicated on an interdisciplinary approach and effort spanning research  in
algorithms,  data  representation,  CAD,  circuits  and devices, and packaging.
SNAP is visualized as  an  arithmetic  coprocessor  implemented  on  an  active
substrate  containing  several  chips,  each  of  which  realize  a  particular
arithmetic function.

-------------------------------------------------------------------------------

CSL-TR-90-429
MULTIPROCESSOR SMALLTALK: IMPLEMENTATION, PERFORMANCE, AND ANALYSIS
Joseph Ira Pallas
June 1990                                                   140 pages...$14.20



Multiprocessor Smalltalk demonstrates the value of object-oriented  programming
on amultiprocessor.  Its implementation and analysis shed light on three areas:
concurrent  programming  in  an  object-oriented   language   without   special
extensions,  implementation  techniques  for  adapting  to multiprocessors, and
performance factors in the resulting system.
Multiprocessor  Smalltalk's  performance  shows   that   the   combination   of
multiprocessing  and  object-oriented  programming  can be effective:  speedups
(relative to the original serial version) exceed 2.0 for five processors on all
the  benchmarks;  the  median  efficiency  is  48%.   Analysis shows both where
performance is lost and how to improve and generalize the experimental results.
Changes  in  the  interpreter  to support concurrency add at most 12% overhead;
better access to per-process variables could eliminate much of that.    Changes
in  the  user  code  to  express  concurrency add as much as 70% overhead; this
overhead could be reduced to 54% if blocks (lambda expressions) were reentrant.
Performance is also lost when the program cannot keep all five processors busy.
Idle time in the interpreter (up to  51%  overhead,  excluding  a  pathological
case)  could be reduced with a parallel garbage collector to 10%.  Idle time in
user code (up to 35% overhead) remains the programmer's responsibility.
We can identify the key  characteristics  that  make  Multiprocessor  Smalltalk
successful.    The  Smalltalk  language  allows  us to build concurrent control
structures using lambda expressions without extending the language. Inexpensive
processes    and    efficient    garbage    collection    are   also   crucial.
Hardware/operating-system support for shared memory, per-process variables, and
inexpensive  synchronization are essential to the implementation.  Given these,
object-oriented languages and multiprocessors are a good match.

-------------------------------------------------------------------------------

CSL-TR-90-430 (also numbered STAN-CS-90-1318)
TECHNIQUES FOR IMPROVING THE PERFORMANCE OF SPARSE MATRIX
FACTORIZATION ON MULTIPROCESSOR WORKSTATIONS
Edward Rothberg and Anoop Gupta
June 1990                                                    16 pages.....$4.28

In this paper we look at the problem  of  factoring  large  sparse  systems  of
equations   on  high-performance  multiprocessor  workstations.    While  these
multiprocessor workstations are  capable  of  very  high  peak  floating  point
computation  rates,  most  existing  sparse  factorization codes achieve only a
small fraction of this potential.  A major  limiting  factor  is  the  cost  of
memory accesses performed during the factorization.  In this paper, we describe
a parallel factorization code which utilizes the supernodal  structure  of  the
matrix to reduce the number of memory references.  We also propose enhancements
that significantly reduce the overall cache miss rate.  The result  is  greatly
increased  factorization  performance.    We  present experimental results from
executions of our codes on the Silicon Graphics 4D/380 multiprocessor.    Using
eight  processors,  we  find  that  the  supernodal  parallel  code  achieves a
computation rate of approximately 40 MFLOPS when factoring a range of benchmark
matrices.  This is more than twice as fast as the parallel nodal code developed
at the Oak Ridge National Laboratory running on the SGI 4D/380.

-------------------------------------------------------------------------------

CSL-TR-90-431
LATENCY AND THROUGHPUT TRADEOFFS IN SELF-TIMED SPEED-INDEPENDENT
PIPELINES AND RINGS
Ted Williams
June 1990                                                     29 pages....$5.32

Asynchronous pipelines control the flow of tokens through a sequence of logical
stages   based  on  the  status  of  local  completion  detectors.    As  in  a
synchronously clocked circuit, the design of self-timed pipelines can trade off



between  achieving  low  latencyn  and high throughput. However, there are more
degress of freedom because of the variances  in  specific  latch  and  function
block styles, and the possibility of varying both the number of latches between
function blocks and their connections to the completion detectors. This  report
demonstrates  the utility of a graph-based methodology for analyzing the timing
dependencies and uses it to make comparisons of different configurations. It is
shown   that   the   extremes  for  high  throughput  and  low  latency  differ
significantly, the placement of the completion detectors influences  timing  as
much  as adding an additional latch, and the choice as to whether precharged or
static logic is best is dependent on the cost in complexity of  the  completion
detectors.

-------------------------------------------------------------------------------

CSL-TR-90-432
THE OLYMPUS SYNTHESIS SYSTEM FOR DIGITAL DESIGN
Giovanni De Micheli, David Ku, Fredric Mailhot, Thomas Truong
July 1990                                                    29 pages.....$5.32

The Olympus Synthesis System, developed at Stanford University, is a vertically
integrated set of tools for the synthesis of  digital  circuit  designs.    The
system  is  specifically  designed to support synthesis of Application-Specific
Integrated Circuits from behavioral-level descriptions, written in  a  hardware
description  language  called HardwareC.  The Olympus system supports synthesis
with timing constraints at the behavioral, structural and logic  levels.    Two
internal  models,  Sequencing  Intermediate  Form,  (SIF)  and Structural/Logic
Intermediate Form (SLIF), are used  to  represent  the  hardware  at  different
levels  of  abstraction and to provide a way to pass design information between
the different tools.  This paper describes Olympus as a system consisting of  a
set  of  programs  for  behavioral,  structural and logic synthesis, technology
mapping and simulation.  The system has been used to design three ASIC chips at
Stanford  University  and  it  has  been  tested against benchmark circuits for
high-level and logic synthesis.

-------------------------------------------------------------------------------

CSL-TR-90-433
LIMITS ON MULTIPLE INSTRUCTION ISSUE
Michael D. Smith, Mike Johnson, and Mark A. Horowitz
July 1990                                                    18 pages.....$4.44

This  paper  investigates  the  limitations  on  designing  a  processor  which
cansustain  an  execution  rate  of  greater  than one instruction per cycle on
highly-optimized, non-scientific  applications.    We  have  used  trace-driven
simulations  to  determine  that  these applications contain enough instruction
independence to sustain an instruction  rate  of  about  two  instructions  per
cycle.  In a straightforward implementation, cost considerations argue strongly
against decoding  more  than  two  instructions  in  one  cycle.    Given  this
constraint,  the  efficiency in instruction fetching rather than the complexity
of the execution hardware limits the concurrency attainable at the  instruction
level.

-------------------------------------------------------------------------------

CSL-TR-90-434
BOOSTING BEYOND STATIC SCHEDULING IN A SUPERSCALAR PROCESSOR
Michael D. Smith, Monica S. Lam, and Mark A. Horowitz
July 1990                                                    19 pages.....$4.52

This  paper  describes a superscalar processor that combines the best qualities
of static and dynamic instruction scheduling to  increase  the  performance  of



non-numerical   applications.   The   architecture   performs  all  instruction
scheduling  statically  to  take  advantage  of  the  compiler's   ability   to
efficiently schedule operations across many basic blocks. Since the conditional
branches in non-numerical code are  highly  data  dependent,  the  architecture
introduces the concept of boosted instructions, instructions that are committed
conditionally  upon  the  result  of  later   branch   instructions.   Boosting
effectively removes the dependences caused by branches and makes the scheduling
of side-effect instructions as simple as those that are side-effect free.   For
efficiency,  boosting  is  supported  in the hardware by shadow structures that
temporarily hold the side effects of boosted instructions until the conditional
branches  that  the  boosted  instructions  depend  upon are executed. When the
branch condition is determined, the buffered side effects are either  committed
or squashed. The limited static scheduler in our evaluation system shows that a
1.6-times speedup over scalar code is achievable by boosting instructions above
only   a  single  conditional  branch.  This  performance  is  similar  to  the
performance of a pure dynamic scheduler.

-------------------------------------------------------------------------------

CSL-TR-90-435
COLLABORATION TRANSPARENCY IN DESKTOP TELECONFERENCING ENVIRONMENTS
J. Chris Lauwers
July 1990                                                  133 pages.....$13.64

The ultimate goal of desktop  teleconferencing  environments  is  to  integrate
contemporary (non-computer-supported) audio/video teleconferencing technologies
with workstation-based network computing environments.  This thesis studies the
application  sharing  facilities that allow multiple conference participants to
interact  simultaneously  with  computer-based  applications   in   a   desktop
teleconference.  It focuses on collaboration-transparent facilities that permit
existing  single-user  applications  to  be  invoked  from  within  a  computer
conference.  In the context of contemporary workstation-based environments, the
pursuit of collaboration transparency leads to the development of shared window
systems.  This  thesis  introduces  the  basic  architectures for shared window
systems and analyzes them based on experience with several prototypes.
Contemporary window system standards present many obstacles  to  shared  window
designers.  This  thesis discusses these obstacles and suggests how they can be
overcome. In addition, new architectures for window systems are  proposed  that
can  accommodate  window  sharing more effectively.  The thesis then introduces
application replication as  a  technique  for improving overall  shared  window
system  performance.    Application replication introduces many synchronization
problems, arising mainly from the need to  keep  replicated  copies  of  shared
applications   consistent.   This   thesis   shows   that   the  most  frequent
synchronization problems can be solved without changing existing software.  The
remaining  problems  can  be  solved  by  making applications or system servers
collaboration-aware.  Finally, this thesis studies the ramifications,  for  the
software  designer,  of  supporting  spontaneous interactions, shared workspace
management,  floor   control,   user   customization,   and   annotations   and
telepointing.    While  the  recommendations  that  result are motivated by the
desire to  enable  continued  use  of  collaboration-transparent  applications,
addressing them involves the development of systems software that is distinctly
collaboration-aware.

-------------------------------------------------------------------------------

CSL-TR-90-436
APPLICATION OF FORMAL SPECIFICATION TO SOFTWARE MAINTENANCE
Neel Madhav and Sriram Sankar
August 1990                                                  15 pages.....$4.20

This paper describes the use of formal specifications and associated  tools  in



addressing  various  aspects of software maintenance ---corrective, perfective,
and adaptive.  It also addresses the refinement  of  the  software  development
process  to  build programs that are easily maintainable.  The task of software
maintenance in our case includes the task of maintaining the  specification  as
well as maintaining the program.
We  focus  on the use of Anna, a specification language for formally specifying
Ada programs, to aid us in maintaining Ada  programs.    These  techniques  are
applicable  to  most  other  specification  language  and  programming language
environments. The tools of interest are: (1) the  Anna  Specification  Analyzer
which  allows  us  to analyze the specification for correctness with respect to
our informal understanding of program behavior; and (2)  the  Anna  Consistency
Checking  System  which  monitors  the Ada program at runtime based on the Anna
specification.

-------------------------------------------------------------------------------

CSL-TR-90-437
AN ADA---PROLOG SYSTEM
Neel Madhav
August 1990                                                  15 pages.....$4.20

This paper presents a software development tool --- the Ada-Prolog system which
combines  the  strengths of both descriptive and procedural programming styles.
Concrete reasons and examples are provided to  demonstrate  that  such  a  tool
would be useful.
This  tool  provides various operations available in Prolog for clausebuilding,
database building and querying to Ada programs.In addition to allowing  dynamic
access  to both Ada and Prolog, the Ada-Prolog system adds to the functionality
provided by Prolog by partitioning the Prolog database into lists  of  clauses.
These  lists  can  be  created,  updated and destroyed dynamically.  Concurrent
access to the list of clauses is also possible.  Queries  can  be  directed  to
groups of these lists.
The   system   is   meant  for  use  in  expert  systems,  compilers,  database
applications, rapid  prototyping  systems,  advanced  environments,  and  other
software tools which use deduction.

-------------------------------------------------------------------------------

CSL-TR-90-438
A METHODOLOGY FOR FORMAL SPECIFICATION AND IMPLEMENTATION OF ADA
PACKAGES USING ANNA
Neel Madhav and Walter Mann
August, 1990                                                 24 pages.....$4.92

This  paper  presents  a  methodology  for  formal  specification and prototype
implementation of Ada packages using the Anna specification language.
Specifications play an important role in the software development cycle.    The
methodology  allows  specifiers  of Ada packages to follow a sequence of simple
steps to formally specify packages.
Given the formal specification of a package resulting from the methodology  for
package  specifications,  the  methodology  allows  implementors of packages to
follow a few simple steps to implement the package. The implementation is meant
to be a prototype.
This  methodology for specification and implementation is applicable tomost Ada
packages. Limitations of this approach are pointed out at various points in the
paper.
We  present  software  tools  which  help  the  process  of  specification  and
implementation.

-------------------------------------------------------------------------------



CSL-TR-90-439
TANGO:  A MULTIPROCESSOR SIMULATION AND TRACING SYSTEM
Helen Davis and Stephen R. Goldschmidt
July 1990                                                    25 pages.....$5.00

Tango is a software simulation and tracing  system  used  to  obtain  data  for
evaluating parallel programs and multiprocessor systems.  The system provides a
simulated multiprocessor environment by multiplexing application processes onto
a  single  processor.  Tango  achieves high efficiency by running compiled user
code,  and  by  focusing  on  the   information   of   greatest   interest   to
multiprocessing  studies.  The  system  is  being  applied  to  a wide range of
investigations,  including  algorithm  studies  and  a  variety   of   hardware
evaluations.

-------------------------------------------------------------------------------

CSL-TR-90-440
BIBLIOGRAPHY OF THE COMPUTER SYSTEMS LABORATORY - 1968-1990 (Third
Edition)
Naomi Schulman
August 1990                                                 71 pages.....$10.00

The  bibliography  lists  all  technical  reports  and  notes  published by the
Computer Systems Laboratory of Stanford University, from 1968 to  date.    This
edition is in a binder where future announcements of new reports can be kept to
update the bibliography temporarily.    When  new  reports  are  in  sufficient
number, a whole new page will be created in the bibliography format.  New pages
will be listed on future announcement Order Forms and will be sent at no charge
along with any order placed.
Prices of reports and notes are listed separately, starting on page ix.  Please
note that report prices. are printed on  yellow  paper,  and  note  prices  are
printed  on  blue  paper.    These prices will supersede any previously given
prices, and may themselves be superseded  in  the  future  if  printing  and/or
postage costs rise.
The  symbol  (*), which, in earlier editions signified that a report was out of
print, is now used only to  indicates  that  no  "original"  is  available  for
copying.  All other reports will be made available upon request.
An  earlier  edition  of  the  Bibliography  (TR-272 or TR-336), plus copies of
announcements 15,16, 17, 18 and 19, will complete  the  list  of  publications.
However,  new  price  lists will be needed to complete the updating of previous
editions.  The TR Price List, 1990-91 and TN Price List, 1990-91 will  be  sent
upon request with any order placed.

-------------------------------------------------------------------------------

CSL-TR-90-441
COMPUTING TYPES DURING PROGRAM SPECIALIZATION
Daniel Weise and Erik Ruf
August 1990                                                  26 pages.....$5.08

We  have  developed  techniques for obtaining and using type information during
program  specialization  (partial  evaluation).    Computed  along  with  every
residual  expression  and  every  specialized  program is type information that
bounds the possible values that the specialized program  will  compute  at  run
time.   The three keystones of this research are symbolic values that represent
both a value and the code for creating the value,  generalization  of  symbolic
values,  and the use of online fixed-point iterations for computing the type of
values returned by specialized recursive functions.  The  specializer  exploits
type  information  to  increase  the efficiency of specialized functions.  This
research has  two  benefits,  one  anticipated  and  one  unanticipated.    The
anticipated  benefit  is  that  programs  that are to be specialized can now be



written in a more natural style without losing accuracy during  specialization.
The  unanticipated  benefit  is  the creation of what we term concrete abstract
interpretation. This is a method of  performing  abstract  interpretation  with
concrete  values  where  possible.  The specializer abstracts values as needed,
instead  of  requiring  that  all  values  be  abstracted  prior  to   abstract
interpretation.

-------------------------------------------------------------------------------

CSR-TR-90-442
AN IMPROVED HIGH-SPEED FLOATING-POINT ADDITION ALGORITHM
Nhon T. Quach and Michael J. Flynn
August 1990                                                  21 pages.....$4.68

This  paper  describes  an  improved,  IEEE  conforming floating-point addition
algorithm.  This algorithm has only one addition step involving the significand
in  the worst-case path, hence offering a considerable speed advantage over the
existing algorithms, which typically require two to three addition steps.

-------------------------------------------------------------------------------

CSL-TR-90-443
A QUEUING ANALYSIS FOR DISK ARRAY SYSTEMS
Mikito Ogata and Michael J. Flynn
August 1990                                                  51 pages.....$7.08

Using a queuing model of disk arrays, we study the performance and tradeoffs in
disk  array  sub-systems and develop guidelines for designing these sub-systems
in various CPU environments.  Finally, we compare our model with  some  earlier
simulation results.

-------------------------------------------------------------------------------

CSL-TR-90-444
EVALUATION OF A FOUR FOLD SPARSE DISTRIBUTED MEMORY PROTOTYPE
Brian Flachs
August 1990                                                  58 pages.....$7.64

Sparse Distributed Memory is a generalized random access memory for long binary
words.  This document describes changes made to the Single  Fold  Prototype  to
develop  Stanford's  Four Fold Prototype.  A users manual for the libraries and
utilities developed for the prototype  and  a  performance  evaluation  of  the
prototype are also included.  Appendices detail hardware settings and provide a
complete, up-to-date, set of Address Module Schematics.

-------------------------------------------------------------------------------

CSL-TR-90-445 (also numbered STAN-CS-90-1334)
SOFT CONFIGURABLE WAFER SCALE INTEGRATION: DESIGN, IMPLEMENTATION AND
YIELD ANALYSIS
Miriam Greta Blatt
June 1990                                                        123.....$12.84

Soft-Configurable Wafer Scale Integration uses software controlled switches  to
connect up the fault-free parts of a wafer.  Compared to hard configuration, th
e  soft  configurable  approach  has  the  advantages  of  providing   low-cost
connections  and  runtime  fault  tolerance.  The dissertation describes how to
achieve soft co nfiguration  with  high  performance,  presenting  a  pipelined
memory  system  implemented using this approach.  The yield of the prototype is
evaluated in two phases.  Fault simulation applies measured  defect  statistics
to the layout to predict the yield of each circuit unit.  These unit yields are



combined to produce wafer yields using redundancy models appropriate  to  wafer
scale  integration.    The  redundancy  models  constrain wafer yield by system
requirements such as the minimum n umber of working circuit units, and  whether
these  working  units  are  distributed  evenly  around  the  wafer.  Choice of
redundancy model significantly affects the r esulting wafer yield.

-------------------------------------------------------------------------------

CSL-TR-90-446
ANALYZING THE FAULT TOLERANCE OF A CLASS OF DOUBLE-LOOP NETWORKS
Jon M. Peha and Fouad A. Tobagi
September 1990                                               30 pages.....$5.40

This paper analyzes the fault tolerance of  a  class  of  double-loop  networks
referred  to  as  forward-loop  backward-hop  (FLBH),  in  which  each  node is
connected via unidirectional links to the node one hop in front of  it  and  to
the  node  S hops in back of it for some S. A new measure of fault tolerance is
described, along with techniques based on Markov chains to calculate upper  and
lower  bounds  on  the  fault  tolerance  of  this network topology quickly and
efficiently.  The result  s  of  these  calculations  provide  a  more  precise
description  of  network fault tolerance than has been achieved with previously
published techniques.

-------------------------------------------------------------------------------

CSL-TR-90-447
EVALUATION OF SCHEDULING ALGORITHMS FOR INTEGRATED-SERVICES
PACKET-SWITCHED NETWORKS
Jon Peha and Fouad A. Tobagi
September 1990                                               31 pages.....$5.48

In this paper, we consider integrated-services  packet-switched  networks  with
two  types  of  traffic:  traffic  with deadlines, for which the most important
perform ance objective is based on loss rate, and  packets  without  deadlines,
for  which the most important performance objective is based on mean delay.  We
present an o ptimal scheduling algorithm to minimize  weighted  loss  rate  and
weighted  mean delay in the queues that form at the switches and at the network
access points of a packet-switched network, where weights reflect the  relative
importance  of  packets.    Although  not  practical  for  implementation,  the
algorithm is intended as a standard for comparison with which  the  performance
of other scheduling algorithms can be evaluated.  Our algorithm is more general
and of lower computational co mplexity than  previously  published  algorithms,
enabling  us to evaluate performance in some important scenarios that could not
previously have been considered. U sing optimal performance  results  from  our
algorithm,  we  evaluate the performance of the first-come-first-served, static
priority, and earliest deadline first al gorithms, finding some limitations  of
each.  Our results suggest that network efficiency could be improved by using a
more sophisticated heuristic scheduling alg  orithm  rather  than  one  of  the
aforementioned algorithms.

-------------------------------------------------------------------------------

CSL-TR-90-448
A COST-BASED SCHEDULING ALGORITHM TO SUPPORT INTEGRATED SERVICES
Jon Peha and Fouad A. Tobagi
September 1990                                               44 pages.....$6.52

It  is  becoming  increasingly  important  to support applications with diverse
performance objectives on a single packet-switched network.  The efficiency  of
a netw ork with such diverse traffic can be greatly improved through the use of
sophisticated scheduling and dropping algorithms within the queues that form at



the  net work access points and in switches throughout the network.  This paper
presents heuristic algorithms for that purpose.   In  our  approach,  arbitrary
performance  o  bjectives  are defined in the form of cost functions, which map
the queueing delay experienced by each packet to a cost incurred.
Our algorithms, cost-based scheduling (CBS) and cost-based dropping (CBD), then
attempt  to  optimize  network  performance as perceived by the applications by
mini mizing the total cost  incurred  by  all  packets.    Cost  functions  are
presented  that  are  appropriate for most common applications.  Scheduling and
dropping algorit hms  are  defined  based  on  these  cost  functions.  Through
simulation,  it  is  demonstrated that network performance is better when these
heuristic algorithms are use d as opposed to the common alternatives.

-------------------------------------------------------------------------------

CSL-TR-90-449 (also numbered STAN-CS-90-1330)
PARALLEL ICCG ON A HIERARCHICAL MEMORY MULTIPROCESSOR -- ADDRESSING
THE TRIANGULAR SOLVE BOTTLENECK
Edward Rothberg and Anoop Gupta
September 1990                                               22 pages.....$4.76

The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly  used
iterative method for solving large sparse systems of equations.  In this paper,
w e study the parallel solution of sparse triangular systems of equations,  the
most  difficult aspect of implementing the ICCG method on a multiprocessor.  We
focu  s  on  shared-memory  multiprocessor  architectures  with   deep   memory
hierarchies.      On  such  architectures  we  find  that  previously  proposed
parallelization approaches result in little or no speedup.  The reason is  that
these  approaches  cause  significant  increases in the amount of memory system
traffic as compared to a sequen tial approach.   Increases  of  as  much  as  a
factor  of  10  on four processors were observed.  In this paper we propose new
techniques for limiting these increases, including data remappings to  increase
spatial  locality, new processor synchronization techniques to decrease the use
of auxiliary data structures, and data part itioning techniques to  reduce  the
amount  of  interprocessor communication.  With these techniques, memory system
traffic is reduced to as little as one sixth  of  its  previous  volume.    The
resulting  speedups  are greatly improved as well, although they are still much
less than linear.  We discuss the factors that limit fur  ther  speedups.    We
present  both  simulation  results  and results of experiments on an SGI 4D/340
multiprocessor.

-------------------------------------------------------------------------------

CSL-TR-90-450 (also numbered STAN-CS-90-1333)
COMPARING STRUCTURALLY DIFFERENT VIEWS OF A VLSI DESIGN
Michael Joseph Spreitzer
June 1990                                                  162 pages.....$15.96

VLSI designers often use  structural  hierarchies  and  alternate  views  of  a
design.    Allowing  alternate views to have different hierarchies improves the
clarity of the views, but complicates comparison of those views.  Most existing
comparison  techniques either require essentially identical hierarchies, or can
only handle differences by flattening.  A new technique,  Informed  Comparison,
first  reconciles  the  hierarchy  differenc  es  by  making  purely structural
transformations according to a description of the intended relationship between
the  hierarchies,  and  then finishes with an existi ng hierarchical technique.
Informed Comparison thus can compare views with different  hierarchies  without
the penalties associated with flattening.

-------------------------------------------------------------------------------




     Naomi Schulman
                           COMPUTER SYSTEMS LABORATORY
                               Stanford University
                             Stanford, CA 94305-4055
                       E-mail: schulman@sierra.stanford.edu
                                   415-723-1430
                              Hours: M-Th, 8-1 (PST)


                              ORDER FORM AND INVOICE

ALL ORDERS MUST BE PREPAID

Please  print  or  type  your name and address in the space provided.  Check
the number(s) of the report(s) you wish to purchase and circle the price of
hardcopy or microfiche.  California Residents must add the sales tax of their
own local county.  Return this order form with a check or money order made
payable to Stanford University.

Foreign Orders must include $1.00 extra for each $15.00's worth of reports to
cover  postage.  Checks  must  be drawn  on  a  U.S.  bank, payable in U.S.
dollars.  If an invoice is required for payment, please note that this form
is both an order form and an invoice.  We cannot invoice separately.

                          (Name)  ________________________________

                          (Address)_______________________________

                                  ________________________________

                                  ________________________________

                                  ________________________________

                   E-mail address:________________________________


               REPORT #  HARDCOPY  FICHE        REPORT #  HARDCOPY  FICHE
               TR-425    $ 5.48    $2.00        TR-438    $ 4.92    $2.00
               TR-426    $ 5.24    $2.00        TR-439    $ 5.00    $2.00
               TR-427    $16.28    $3.00        TR-440    $10.00    $2.00
               TR-428    $ 5.08    $2.00        TR-441    $ 5.08    $2.00
               TR-429    $ 4.20    $2.00        TR-442    $ 4.68    $2.00
               TR-430    $ 4.28    $2.00        TR-443    $ 7.08    $2.00
               TR-431    $ 5.32    $2.00        TR-444    $ 7.64    $2.00
               TR-432    $ 5.32    $2.00        TR-445    $12.84    $3.00
               TR-433    $ 4.44    $2.00        TR-446    $ 5.40    $2.00
               TR-434    $ 4.52    $2.00        TR-447    $12.84    $3.00
               TR-435    $13.64    $3.00        TR-448    $ 5.40    $2.00
               TR-436    $ 4.20    $2.00        TR-449    $ 5.48    $2.00
               TR-437    $ 4.20    $2.00        TR-450    $15.96    $3.00

                                  Subtotal    __________

                  CA tax or Foreign Surcharge __________

                                  Total       __________

Circle which ones you want: Announcement 15, 16, 17, 18
Check here ___ to receive 1990-91 Price Lists (with report order, only)

** END OF MESSAGE **