[comp.doc.techreports] tr-input/stanford.15

leff@smu.UUCP (Laurence Leff) (02/27/89)

                                Naomi Schulman
                                 Publications
                          COMPUTER SYSTEMS LABORATORY
                              STANFORD UNIVERSITY
                            Stanford, CA 94305-4055

                          (415) 273-1430 (M-Th, 8-1)

                            RECENT REPORTS & NOTES
                                   LIST #15
                                 JANUARY 1989

To order reports, see end of this announcement.

                                   ABSTRACTS

CSL-TR-88-365
Deriving Accurate Fault Models
John Acken
October 1988                                                130 pages.....$9.50

Modeling the effects that physical failures have on integrated circuit behavior
is important for both design and test.  Fault models describe circuit  behavior
at  a  level of abstraction that allows tests to be generated without requiring
detailed knowledge of physical failures.  This thesis presents  three  criteria
for  deriving  accurate  fault  models  and describes a method that meets these
criteria.  The method is to design, fabricate, and test a RAM specifically  for
deriving fault models, as opposed to the common practice of using RAMs to drive
technology development.   To  demonstrate  this  approach,  a  static  RAM  was
designed,  fabricated,  and  tested to derive models for a target technology of
CMOS custom logic circuits.
For CMOS technology, the most accurate logic level model for a  bridging  fault
is a vote among the shorted nodes.  This vote is based upon the relative output
strengths of the shorted transistors, and the  short  does  not  result  in  an
indeterminate  logic  value.    Using  the  voting  model  requires neither the
electrical details of each specific shorted circuit nor the physical details of
the  shorting  failures.    All  that is needed is the relative strength of the
transistors and the location of the potential  bridging  fault  sites.    These
sites depend upon the physical layout, but can be easily located using standard
layout tools such as circuit extractors.    The  average  number  of  potential
bridging faults per node is small, between 4 and 8.
Bridging  faults  require  that  differences  between  logical  clustering  and
physical clustering must be considered for successful testing.  In  particular,
circuit segmentation for pseudo-exhaustive test is modified to utilize physical
layout information.

-------------------------------------------------------------------------------

CSL-TR-88-366
Performance-Directed Memory Hierarchy Design
Steven A. Przybylski
September 1988                                             183 pages.....$12.15

Most Cache studies to date have concentrated on time independent metrics:  miss
rates  and  traffic  ratios.    These  metrics  can be misleading indicators of
overall performance, and minimizing them can  lead  to  distinctly  sub-optimal
computer  implementations.  In this thesis, program execution time replaces the
miss rate as the basis  of  evaluation  of  memory  hierarchies.    The  thesis
describes  the  characteristics  of performance-optimal single- and multi-level
cache hierarchies.
Using performance as the cache design metric introduces a often ignored  design
                                       1


variable:  the  machine's  cycle  time.    The inevitable tradeoffs between the
temporal and organizational variables are exposed by equating a change  in  the
cache  size,  set associativity or block size to a change in cycle time.  After
the  tradeoffs  between  the  temporal  and   organizational   parameters   are
individually  explored,  they  are  unified  in  a  single  graph  that  allows
comparison of the performance of a wide  variety  of  cache  organizations  and
implementations.
Performance-directed   cache  analysis  reveals  two  dilemmas  facing  machine
designers: small fast caches  are  not  optimal;  single-level  caches  have  a
definite   upper   performance   bound.    Multi-level  cache  hierarchies  can
simultaneously break the single-level  performance  barrier  and  decrease  the
optimal size and cycle time of the cache nearest the CPU.  For this reason, the
local and global  characteristics  of  multi-level  hierarchies  are  explored.
Given  a  set  of  implementable  caches,  dynamic  programming  can  find  the
performance-optimal multi-level hierarchy.  The optimum number of levels in the
hierarchy  increases  as  CPU  cycle times decrease with respect to main memory
access  times.    Hierarchies  with  depths  of  two  and  three  approach  the
theoretically maximal performance limit.
This thesis concludes with a succinct set of guidelines that will aid designers
in finding  the  memory  hierarchy  that  maximizes  system  performance  given
particular implementation constraints.

-------------------------------------------------------------------------------

CSL-TR-88-367
An Overview of VAL
Larry M. Augustin, Benoit A. Gennart, Youm Huh, David C. Luckham, and
Alec G. Stanculescu
September 1988                                               39 pages.....$4.95

VAL  (VHDL  Annotation  Language)  provides  a  small  number  of  new language
constructs to annotate VHDL hardware descriptions.  VAL annotations,  added  to
the  VHDL  entity  declaration in the form of formal comments, express intended
behavior common to all architectural bodies of the  entity.    Annotations  are
expressed  as  parallel  processes  that  accept  streams  of input signals and
generate constraints on output streams.  VAL views signals as streams of values
ordered by time.  Generalized timing expressions allow the designer to refer to
relative points on a stream.  No concept of preemptive  delayed  assignment  or
inertial  delay  are needed when referring to different relative points in time
on a stream.  The VAL abstract state model permits abstract data  types  to  be
used  in  specifying  history  dependent  device  behavior.  Annotations placed
inside a VHDL architecture define detailed correspondences between the behavior
specification and architecture.  The result is a simple but expressive language
extension of VHDL with possible applications  to  automatic  checking  of  VHDL
simulations,  hierarchical  design,  and  automatic  verification  of  hardware
designs in VHDL.

-------------------------------------------------------------------------------

CSL-TR-88-368
Improved Models for Switch-Level Simulation
Chorng-Yeong Chu
November 1988                                              147 pages.....$10.35

Simulation plays an important role in design  verification.  With  increasingly
large  VLSI  designs,  the  switch-level  representation  has  become  the only
approach that is both reasonably accurate and computationally feasible.
At present, switch-level simulators use relatively  unsophisticated  techniques
to  extract  information  from  the switch-level representation, and even these
small amounts of information are not always fully utilized. As a result,  these
simulators  often  lack  accuracy.  Most  notably,  the  way  some switch-level
                                       2


simulators  compute  the  final  value  can  potentially  generate  undesirably
pessimistic results, and charge-sharing problems are widely ignored.
This  thesis  shows  how  to  extract more information based on the same set of
widely adopted switch-level assumptions.  Using  more  sophisticated  analyses,
this  thesis  presents  better  final-value  and charge-sharing models. The new
final-value model uses a systematic way to look at the relationship between the
voltage  and  the  resistance.  This  approach can also objectively compare the
accuracy of  different  DC-computation  schemes.  Charge-sharing  problems  are
modeled with two time constants. The two-time-constant approach is based on the
observation that most waveforms due to charge sharing are dominated by  a  pair
of  time  constants.  Charge-sharing  models  are first constructed on resistor
networks, then they are extended to transistor networks.
These models have been incorporated into nRSIM ---  a  RSIM-based  switch-level
simulator.  The  new  simulator has the same running time as the original RSIM,
but it can handle a larger class of circuits.

-------------------------------------------------------------------------------

CSL-TR-88-369
Composing User Interfaces with InterViews
Mark A. Linton, John M. Vlissides, and Paul R. Calder
November 1988                                                25 pages.....$4.25

In this paper we show how to compose user interfaces with  InterViews,  a  user
interface toolkit we have developed at Stanford.  InterViews provides a library
of predefined objects and a set of  protocols  for  composing  them.    A  user
interface  is created by composing simple primitives in a hierarchical fashion,
allowing complex user interfaces to be implemented easily.  InterViews supports
the  composition  of  interactive objects (such as scroll bars and menus), text
objects (such as words and whitespace), and graphics objects (such  as  circles
and  polygons).  To illustrate how InterViews composition mechanisms facilitate
the implementation of user interfaces, we present three simple applications:  a
dialog  box  built from interactive objects, a drawing editor using a hierarchy
of graphical objects, and a class browser using a hierarchy  of  text  objects.
We  also  describe  how  InterViews supports consistency across applications as
well as end-user customization.

-------------------------------------------------------------------------------

CSL-TR-88-370
Copy Elimination in Functional Languages
K. Gopinath and John L. Hennessy
November 1988                                                15 pages.....$3.75

Copy  elimination  is  an  important  optimization  for  compiling   functional
languages.  Copies arise because these languages lack the concepts of state and
variable; hence updating an object involves a copy in a  naive  implementation.
Copies  are  also  possible if proper targeting has not been carried out inside
functions and across function calls. Targeting is the  proper  selection  of  a
storage  area  for  evaluating  an  expression.  By abstracting a collection of
functions by a target operator, we compute targets of function bodies that  can
then  be  used  to  define  an optimized interpreter to eliminate copies due to
updates and copies across function calls.  The language we  consider  is  typed
lambda  calculus  with  higher-order functions and special constructs for array
operations. Our approach can eliminate copies in divide  and  conquer  problems
like quicksort and bitonic sort that previous approaches could not handle.


We also present some results of implementing a compiler for a single assignment
language called SAL on some small but tough programs. Our results indicate that
it  is  possible  to  approach a performance comparable to imperative languages
like Pascal.
                                       3


-------------------------------------------------------------------------------

CSL-TR-88-371
Generalized Access Control Strategies for Token Passing Systems
Joseph W. M. Pang, Fouad A. Tobagi and Stephen Boyd
November 1988                                                42 pages.....$5.10

As the concepts of distributed computing,  resource  sharing,  and  office  and
factory  automation  are  being  put  into  practice, the demand for integrated
services local area networks has never been greater before.  Recently, a  timer
based  integrated  access  protocol has been adopted by two major token passing
local area network standards, the IEEE 802.4 Token Bus  Standard  [1]  and  the
ANSI  drafted  Fiber  Distributed  Data  Interface  (FDDI)  Standard [2].  This
protocol is capable of providing bounded access delay  for  real-time  traffic,
and  multiple  priority classes for data traffic [3]--[4].  Nevertheless, under
heavy traffic, this protocol is not asymptotically stable.  In this report,  we
present  a  very broad class of access control strategies, generalized from the
timer based approach adopted in [1]--[2].  Similar to the timer based protocol,
this  class  of  access control strategies has the same favorable properties of
providing bounded token cycles and multiple classes of service.    Furthermore,
at high load, the token cycles for this new class of strategies converge with a
geometric bound, as opposed to exhibiting the  oscillatory  behaviour  observed
with  the timer based protocol in [3]--[4].  Based on the convergence property,
we are able to compute the efficiency of the controlled  access  token  channel
and  the bandwidth allocation among stations under heavy traffic.  Moreover, we
derive tight upper bounds on token cycles under arbitrary  loading  conditions.
This  study  not  only  enriches  the design space of integrated services token
systems, but also provides  meaningful  insights  into  the  control  mechanism
within the protocol.

-------------------------------------------------------------------------------

CSL-TR-88-372
Incremental VLSI Compaction
Clyde W. Carpenter
December 1988                                                117 pages....$8.85

VLSI  compaction  is the translation from a high-level description of a circuit
down to the detailed layout needed for fabrication.  A compactor tries to  make
as  compact  a  layout  as  possible  without  violating  any design rules.  An
incremental  compactor  allows  one  to  edit  a  schematic  or  change  layout
constraints and quickly see the effects of the change.
An   incremental   compactor  has  to  incrementally  generate  and  solve  the
constraints needed to enforce the design rules.  This dissertation presents  an
algorithm  that  uses  adjacency  lists  to generate and incrementally update a
minimal complete set of the spacing constraints needed to keep  adjacent  tiles
in  a  layout  from  interfering  with  each other.  The base algorithm creates
clockwise threaded lists of non-overlapping, fixed-size tiles.   The  algorithm
is  complicated  by  the  need  to handle wires, overlapping tiles, and various
different spacing rules.  In near linear time it generates an  average  of  1.2
spacing  constraints  per  tile.    The  adjacency  lists allow fast, efficient
updates when tiles are moved, deleted, or inserted.
This dissertation also presents three algorithms to solve the constraints  once
they  are  generated.    In  addition to minimizing area, these algorithms also
minimize the total wire length.    One  of  them  calculates  the  sum  of  the
wire-pull  weights  on  each  subtree  of  a  directed  spanning tree of active
constraints to decide which subtrees need to be  moved.    This  weighted  tree
provides enough information to make incremental changes in time proportional to
the size of the change instead of to the size  of  the  circuit.    Wire-length
minimization  improves  the  layout  but gives compaction a slightly worse than
linear expected time.
                                       4


-------------------------------------------------------------------------------

CSL-TR-88-373
Sparse Distributed Memory Prototype: Address Module Hardware Guide
M. J. Flynn, R. Zeidman, E. Lochner
December 1988                                                78 pages.....$6.90

This document is a detailed specification of the hardware design of the Address
Module  for  the  prototype  Sparse  Distributed Memory. It contains all of the
information needed to build,  test,  debug,  modify  and  operate  the  Address
Module.

-------------------------------------------------------------------------------

CSL-TR-88-374
Post-Game Analysis--- A Heuristic Resource Management Framework for
Concurrent Systems
Jerry C. Yan
December 1988                                              212 pages.....$13.60

While  multiprocessors  promise  to  deliver  orders  of magnitude speedup, the
effective use of such computers still depends  critically  on  the  ability  of
their  resource management systems to properly trade off communication loss and
concurrency  gain,  exploit  behavioral  characteristics  of  the   application
programs,   and   take   advantage   of   specific  hardware  features  of  the
multiprocessor.  Unfortunately,  the  performance  of  many  proposed  resource
management  strategies  can  only  be  evaluated indirectly by simple objective
functions (such as execution cost or mapping cardinality).
Research has been conducted to determine how distributed  computations  can  be
mapped  onto  multiprocessors to minimize execution time. The approach proposed
here, known as Post-game analysis,  offers  an  unconventional  alternative  to
reduce  program  execution  time.  Program--machine  mapping  is  improved ``in
between''  program  executions.  Instead  of  using  simple  abstract   models,
post-game   analysis  utilizes  actual  timing  data  gathered  during  program
execution. Program  execution  time  is  reduced  based  on  many  optimization
sub-goals.  Because  heuristics  are applied to improve the current mapping and
resolve conflicting sub-goals, post-game analysis can be incrementally  refined
and  tailored  to  specific applications and architectures.  The performance of
post-game analysis has  been  compared  with  other  strategies  using  various
program structures and multiprocessor models. Results obtained from simulations
show that it outperforms the speedup obtained based on random  placement,  load
balancing  as  well  as  clustering  algorithms  by  15%.  Because intermediate
program/machine models are not used, post-game analysis is very  promising  for
immediate  application  to today's programs with today's machines.} area, these
algorithms also minimize the total wire length.  One of them calculates the sum
of  the wire-pull weights on each subtree of a directed spanning tree of active
constraints to decide which subtrees need to be  moved.    This  weighted  tree
provides enough information to make incremental changes in time proportional to
the size of the change instead of to the size  of  the  circuit.    Wire-length
minimization  improves  the  layout  but gives compaction a slightly worse than
linear expected time.

-------------------------------------------------------------------------------

CSL-TR-89-375
THROUGHPUT APPROXIMATION FOR THE GENERALIZED TIMER BASED PROTOCOL
FOR TOKEN PASSING SYSTEMS
Joseph W. M. Pang and Fouad A. Tobagi
January 1989                                                 33 pages.....$4.65

Recently, a timer based integrated access protocol  has  been  adopted  by  two
                                       5


major local area network (LAN) standards, the IEEE 802.4 Token Bus Standard [1]
and the ANSI drafted Fiber Distributed Data Interface Standard [2].  This timer
based  approach  is  capable  of  providing  bounded access delay for real-time
traffic (i.e., traffic with stringent delay requirements), multiple classes  of
service,  and  efficient  use of the communication channel [3]-[4].  Based on a
careful examination of the dynamical behaviour  of  the  timer  based  protocol
under heavy traffic, we have been able to generalize the underlying idea behind
the protocol and proposed a new class of access control schemes  in  [5].    We
have  also  established  in  [5]  important results pertaining to the bandwidth
allocation under heavy traffic and upper bounds on token cycles  under  general
traffic for the generalized control schemes.  Although the results on bandwidth
allocation obtained in [5] are very useful for first-cut design purposes,  they
are  only applicable under heavy traffic.  In this report, we are interested in
obtaining the station throughputs for the  generalized  control  schemes  under
arbitrary input traffic.  Based on the assumption that the variances of station
service times are small, we develop a simple and yet  very  good  approximation
for the station throughputs, that is dependent only on the average data arrival
rates at the stations.  Using this approximation, we can  determine  whether  a
station  in  the  system  is  stable  or not.  Furthermore, if a station is not
stable,  its  throughput  can  be  obtained  from  the  approximation.     This
approximation  is  also  applicable to the timer based protocol used in the LAN
standards [1]-[2].

-------------------------------------------------------------------------------
                                       6


                                Naomi Schulman
                          COMPUTER SYSTEMS LABORATORY
                              Stanford University
                            Stanford, CA 94305-4055
                       EM: Schulman@Sierra.Stanford.Edu
                                 415-723-1430
                               Hours: M-Th, 8-1

                                  ORDER FORM

To Order Reports: Print or type your name and address in the space provided.
Check or circle the  report(s)  you  wish  to  purchase,  whether  hardcopy  or
microfiche.    All orders must be PREPAID.  CALIFORNIA RESIDENTS must add sales
tax of their local county.  Return this order form with  your  check  or  money
order  made payable to Stanford University. FOREIGN ORDERS must be paid with an
international money order or a check drawn on a U.S. bank, payable in dollars.
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
_____________________________________________

Report #  Hardcopy  Microfiche     Report #  Hardcopy  Microfiche

TR 365      $9.50      $3.00        TR 371     $5.10      $2.00
TR 366     $12.15      $3.00        TR 372     $8.85      $3.00
TR 367      $4.95      $2.00        TR 373     $6.90      $2.00
TR 368     $10.35      $3.00        TR 374    $13.60      $4.00
TR 369      $4.25      $2.00        TR 375     $4.65      $2.00
TR 370      $3.75      $2.00

                          Subtotal __________
          Your Local CA County tax __________
                             Total __________


-------