[comp.doc.techreports] stanford6

leff@smu.CSNET.UUCP (07/10/87)
                                Naomi Schulman
                                 Publications
                          COMPUTER SYSTEMS LABORATORY
                              STANFORD UNIVERSITY
                              Stanford, CA 94305

                            RECENT REPORTS & NOTES
                                    LIST #9
                                 OCTOBER, 1986


                To order reports see end of this announcement.


                                   ABSTRACTS


CSL-TR-86-302
Preemptable Remote Execution Facilities for Loosely-Coupled Distributed
Systems
Marvin M. Theimer
June 1986                                                   139 pages.....$9.45
A  loosely-coupled  distributed  system consisting of a cluster of workstations
and server machines represents a large amount of computational power,  much  of
which  is  frequently  idle.  Users  would  like to take advantage of this idle
processing power by running one or  more  jobs  in  parallel  on  underutilized
workstations.    In this thesis, we study the key design and performance issues
that affect preemptable  remote  execution  in  a  loosely-coupled  distributed
system.    Five  major  topics  are  addressed  in  our  work: (1) provision of
network-transparent  execution  environments  for  programs,  (2)   structuring
migration  facilities such that they interfere with the normal operation of the
system in a minimal manner, (3) elimination of residual dependencies that occur
when a program migrates but has  state  information  left  in  machine-relative
servers  on the original machine, (4) provision of global scheduling facilities
for finding idle/lightly loaded machines for remote execution and migration  of
programs,  and  (5)  provision  of  fair  access  to global resources among the
programs and users of a system.  In the process of addressing these  topics  we
delineate   when   remote  execution  facilities,  with  or  without  migration
facilities, are useful and under what conditions they are easy  (or  difficult)
to provide.  
-------------------------------------------------------------------------------

T.R. 86-303
The Semantics of Timing Constructs in Hardware Description Languages
David C. Luckham, Youm Huh & Alec G. Stanculescu
August 1986                                                  31 pages.....$4.05
Three different approaches to the representation of time in high level hardware
design  languages are described and compared. The first is the timed assignment
statement of ADLIB/SABLE which anticipates future events.  The  second  is  the
timed assignment of VHDL which predicts future events and allows predictions to
be  preempted  by  other  predictions.  The  third  is a new proposed method of
expressing time dependency by qualifying expressions so that their  values  are
required  to  be  constant  over  a specified time interval. Examples comparing
these three approaches are given. It is shown  how  time-qualified  expressions
could  be  introduced  into a hardware description language. The possibility of
proving correctness of hardware models in this language is illustrated.  
-------------------------------------------------------------------------------

CSL T.R. 86-304
An Analytical Cache Model
Anant Agarwal, Mark Horowitz & John Hennessy
September 1986                                               47 pages.....$4.85
Trace driven simulation and hardware measurement are the techniques most  often
used  to obtain accurate performance figures for caches.  The former requires a
large amount of simulation time to evaluate each cache configuration while  the
latter  is  restricted to measurements of existing caches.  An analytical cache
model that uses parameters  extracted  from  address  traces  of  programs  can
provide  estimates  of  cache performance and show the effects of varying cache
parameters.  By representing the factors  that  affect  cache  performance,  we
develop  an  analytical  model  that  gives  miss  rates for a given trace as a
function of cache-size, degree of associativity,  block-size,  multiprogramming
level,  task  switch  interval, and observation interval.  The predicted values
closely approximate the results of trace drive simulations while requiring only
a small fraction of the computation cost.  
-------------------------------------------------------------------------------

CSL-TR-86-305
An Environment for Ada Software Development Based on Formal Specification
David C. Luckham, Randall Neff and David Rosenblum
August 1986                                                  46 pages.....$4.80
This report gives an overview of the current status and plans  to  construct  a
prototype  environment  of advanced tools for software and hardware development
based on the use of  wide-spectrum  languages.    The  wide-spectrum  languages
include Anna (ANNotated Ada), TSL (Task Sequencing Language), and HDL (Hardware
Design  Language).    The  tools  described here provide interactive aid at all
stages in the system development  process,  including  both  the  software  and
hardware  components  of  systems.    Special emphasis is placed on distributed
computing, both in providing tools for  handling  parallelism  in  the  subject
system,  and  in  designing  tools  that utilize parallelism in the programming
environment.  Applications of these tools include requirements analysis, formal
specification, rapid prototyping, testing, formal verfication and  construction
of self-testing Ada software for multi-processor systems.  
-------------------------------------------------------------------------------

CSL-TR-86-306
Queueing Network Models for Parallel Processing of Task Systems:
An Operational Approach
by Victor W.K. Mak
September 1986                                               27 pages.....$3.85
Computer  performance  modeling  of  possibly  complex  computations running on
highly concurrent systems is considered.  Earlier works  in  this  area  either
dealt  with  a  very  simple  program  structure  or  resulted  in methods with
exponential complexity.    A  computationally  efficient  approximate  solution
method  is  developed  to compute the performance measures for series-parallel-
reducible task systems using queueing network models.  Numerical results for  a
number of test cases are presented and compared to those of simulations.  
-------------------------------------------------------------------------------

CSL-TR-86-307
A Survey of Concurrent Architectures
Victor W.K. Mak
September 1986                                               34 pages.....$4.20
A  survey of 18 different concurrent architectures is presented in this report.
Although this is by no means complete, it does cover a wide  spectrum  of  both
commercial  and  research  architectures.    A  scheme  is proposed to describe
concurrent architectures using different  dimensions:  models  of  computation,
interconnection  network,  processing  element,  memory system, and application
areas.  
-------------------------------------------------------------------------------

CSL-TR-86-308
Performance Optimization of Digital VLSI Circuits
David P. Marple
September 1986                                              141 pages.....$9.55
Designers of digital VLSI circuits have virtually no computer  tools  available
for  the  optimization  of  circuit  performance.    Instead, a designer relies
extensively on circuit analysis tools, such as circuit simulation SPICE, and/or
critical delay path analysis.  A circuit analysis approach to digital design is
very labor intensive and seldom produces a circuit with optimum  area/delay  or
power/delay tradeoff.
The  goal  of this research is to provide a synthesis approach to the design of
digital circuits by finding the sizes  of  transistors  that  optimize  circuit
performance  (delay,  area,  power).  Solutions are found which are optimum for
all possible delay paths of a given circuit and not for  just  a  single  path.
The  approach  of  this  research  is to formulate the problem of area/delay or
power/delay optimization as a nonlinear program.  Conditions for optimality are
then established using graph theory and Kuhn-Tucker conditions.    Finally  the
use  of  augmented  Lagrangian and projected Lagrangian algorithms are reviewed
for the solution of the nonlinear programs.  Two computer programs,  PLATO  and
COP, have been developed by the author to optimize CMOS PLA's PLATO and general
CMOS  circuits  COP.  These tools provably find the globally optimum transistor
sizes for a given circuit.  Results are presented for PLA's and small to medium
sized cells.  
-------------------------------------------------------------------------------

CSL-TR-86-309
Design of Testbed and Emulation Tools
by Stephen F. Lundstrom and Michael J. Flynn
September 1986                                               85 pages.....$6.75
The research summarized in this report was concerned with the design of testbed
and emulation tools suitable to assist in projecting, with reasonable accuracy,
the expected performance of  highly  concurrent  computing  systems  on  large,
complete  applications.   Such testbed and emulation tools are intended for the
eventual use  of  those  exploring  new  concurrent  system  architectures  and
organizations,  either as users or as designers of such systems.  While a range
of alternatives was considered, a software-based set of hierarchical tools  was
chosen  to  provide  maximum flexibility, to ease in moving to new computers as
technology improves and to take  advantage  of  the  inherent  reliability  and
availability of commercially available computing systems.  
-------------------------------------------------------------------------------

CSL-TR-86-311
Distributed, Replicated Computer Bulletin Board Service
by July L. Edighoffer
June 1986                                                  160 pages.....$10.50
Computer  systems  offer  a variety of services to assist communication between
people.  This dissertation examines computer bulletin boards, one such facility
that allows recipients to arrange for the delivery of  messages  on  topics  of
personal  interest.  The thesis focuses on the problems of replication and cost
scaling.
It is no longer necessarily true that users are closely tied to a single  host,
yet  current  methods for replicating bulletin boards do not provice a good way
to represent what a person has seen that  is  independent  of  the  copy  read.
Existing  replication  algorithms  either  don't  support copy-independent read
records or offer too little concurrency for  this  application.    An  original
replication  algorithm  provides  a  copy-independent  ordering for submissions
using just a single copy of a bulletin board during the execution of  the  user
operations.    The algorithm works well even on a network frequently in a state
of partition.
A more significant problem from the viewpoint of computer system administrators
is the cost of a distributed bulletin board service.  In existing mail  systems
and  bulletin  board  systems,  such  as distribution on the Arpanet and USENET
running under UNIX, the cost  per  participating  computer  tends  to  grow  in
proportion  to  the  network  size.    The causes for this poor scaling will be
examined, then it will be explained how a structured name space  together  with
suitable operations on it leads to improved scaling by encouraging the creation
of highly specialized bulletin boards.  
-------------------------------------------------------------------------------

CSL-TR-86-312
Concurrent Runtime Checking of Annotated Ada Programs
by David S. Rosenblum, Sriram Sankar and David C. Luckham
November 1986                                                40 pages.....$4.50
Anna is a language for writing machine-processable annotations of Ada programs.
One  of the main applications of Anna is the runtime checking of an Ada program
for  consistency  with  its  formal  specifications  written  in  Anna.      On
single-processor  systems,  Anna  runtime  checks  are  used during testing and
debugging of software.
This paper describes strategies for distributing Anna runtime  checks  so  that
they  are executed in parallel with the Ada program.  Concurrent checking of an
annotated  program  can  offer  a  substantial  computational  speedup  over  a
sequentially  checked version of the same program.  Concurrent checking of Anna
is therefore a crucial step in producing a self-checking  program  by  allowing
runtime  checks for annotations to reside permanently in production versions of
the program.  Parallel checking will not  always  be  useful  in  self-checking
code,  but  certain kinds of annotations require parallel checking in real-time
and interactive programs.
This paper defines an efficient parallel checking model in  which  checking  is
performed  by  Ada  tasks  running  in parallel with the underlying Ada program
being checked.  The difficulties in reporting Anna consistency violations in  a
parallel  environment are also described.  Finally, the paper discusses some of
the practical aspects of mixing checking strategies whereby sequential checking
may be applied to some kinds of annotations and distributed checking  to  other
kinds.  
-------------------------------------------------------------------------------

                                Technical Notes


CSL T.N. 86-289
Architecture and Cache Simulation Results for Individual Benchmarks
by Chad L. Mitchell
June 1986                                                  163 pages.....$10.65
This  technical  note  contains  the  detailed  simulation results for the work
discussed in CSL T.R. 86-296.  Over fifty architectures were simulated for five
benchmarks.  This note gives the results for each individual  architecture  and
benchmark combination with 26 different memory configurations.  
-------------------------------------------------------------------------------

CSL T.N. 86-302
A Performance Comparison Between PLM and an MC68020 Prolog Processor
Hans Mulder and Evan Tick
September 1986                                               75 pages.....$6.25
This  report  compares  the  performance  of  the  UC Berkeley Programmed Logic
Machine (PLM) and a generic MC68020 processor running  large  Prolog  programs.
The  PLM  compiler  translates  Prolog source programs into the Warren Abstract
Machine (WAM) instruction set.  The PLM is  a  microcoded  host  for  the  WAM.
Important  built-in  functions,  e.g.,  general unification, are implemented in
microcode.  The MC68020 model used assumes a compiler  which  first  translates
Prolog  source programs into WAM intermediate code and then into native MC68020
code.  Important built-in functions are  implemented  as  MC68020  subroutines.
Timing  measurements  of  benchmark  programs,  in terms of cycles executed and
logical  inferences  per  second  (LIPS),  are  given  for  variants  of  these
architecture models.  Results indicate that the MC68020 needs approximately 2.5
to  3.5 times the number of cycles needed by the PLM, primarily due to poor tag
handling.  
-------------------------------------------------------------------------------

                                 Publications

                          COMPUTER SYSTEMS LABORATORY

                              Stanford University

                              Stanford, CA 94305

                                 415-723-1430

                              (Hours: M-Th, 8-1)

                                  ORDER FORM

To Order Reports: Print or type your name and address in the space provided.
Check or circle the  report(s)  you  wish  to  purchase,  whether  hardcopy  or
microfiche.    All orders must be PREPAID.  CALIFORNIA RESIDENTS must add sales
tax of their local county.  Return this order form with  your  check  or  money
order  made payable to Stanford University. FOREIGN ORDERS must be paid with an
international money order or a check drawn on a U.S. bank, payable in dollars.
Please type or print your name and complete address:
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
Report #  Hardcopy  Microfiche     Report #  Hardcopy  Microfiche
--------  --------  ----------     --------  --------  ----------
TR 302      $9.45     $3.00        TR 307      $4.20     $2.00
TR 303      $4.05     $2.00        TR 308      $9.55     $3.00
TR 304      $4.85     $2.00        TR 309      $6.75     $2.00
TR 305      $4.80     $2.00        TR 311      $10.50   $3.00
TR 306      $3.85     $2.00        TR 312      $4.50     $2.00

TN 289      $10.65
TN 302      $6.25


                          Subtotal __________
          Your Local County CA tax __________
                             Total __________
-------