[comp.doc.techreports] stanford.13

leff@smu.UUCP (Laurence Leff) (03/17/88)
                                Naomi Schulman
                                 Publications
                          COMPUTER SYSTEMS LABORATORY
                              STANFORD UNIVERSITY
                            Stanford, CA 94305-4055

                            RECENT REPORTS & NOTES
                                   LIST #13
                                  MARCH 1988

To order reports, see end of this announcement.

                                   ABSTRACTS

CSL-TR-87-343
EDITING GRAPHICAL OBJECTS USING PROCEDURAL REPRESENTATIONS
Paul J. Asente
October 1987                                                117 pages.....$8.35
Traditionally,   people  have  created  computer-generated  images  by  writing
programs in a programming language that  supports  graphics.    More  recently,
interactive  graphics  editors  have  become commonplace.  Graphics editors are
easy to use but frequently lack many of  the  capabilities  found  in  graphics
programming languages.  This deficiency is intrinsic to graphics editors; it is
not a result of neglect or incompetence by the implementer.
Tweedle is a graphics editor that attempts  to  bridge  this  gap  by  using  a
program  as  its  internal  representation  for  a  picture.  During an editing
session  the  user  can  modify  either  the  picture  itself  or  the  program
representation;  the editor modifies the other to keep the two consistent.  The
language used by  the  editor  contains  features  that  allow  the  editor  to
incrementally  execute  parts  of a program in response to a change so that the
picture can be regenerated without completely reexecuting the program.
The use of a  procedural  representation  allows  the  user  to  create  easily
pictures  with  repetition, recursion, and calculated point values.  It further
allows him to define parts of a drawing  as  variants  of  other  parts;  these
variants  can  differ  from  their original objects in quite arbitrary ways but
still respond to changes made to the original.
A working prototype of Tweedle has been implemented under the X Window  System.
-------------------------------------------------------------------------------

CSL-TR-87-344
PERFORMANCE PREDICTION OF CONCURRENT SYSTEMS
Victor W. K. Mak
December 1987                                              162 pages.....$10.60
Concurrent systems are computers that use multiple processors to solve a single
problem.  A means to predict the application performance on these systems is  a
useful  tool in many areas of concurrent system research.  Earlier work in this
area either dealt with very restricted program structures or used methods  with
exponential   complexity.     This  dissertation  describes  a  computationally
efficient and accurate method to predict performance for a  class  of  parallel
computations on concurrent systems.
A   parallel   computation   is  modeled  as  a  task  system  with  precedence
relationships expressed as a series-parallel directed acyclic graph.  Resources
in  concurrent  systems  are  modeled  as  service  centers in queueing network
models.  Using these two models as inputs, the method  outputs  predictions  of
both   the   time  to  complete  the  computation  and  the  concurrent  system
utilization.
The algorithm used is based on the approximate Mean Value Analysis in  queueing
network  modeling with extensions to model concurrency in the computation.  The
new algorithm has been validated against both detailed  simulation  and  actual
execution on a commercial multiprocessor.  Average error of the prediction when
compared to simulation statistics is 1.7% with a standard  deviation  of  1.5%.
The  algorithm  is  iterative.    The  time complexity of each iteration of the
                2     3                              2
algorithm is O(N K + N ), the space complexity is O(N K) where N is the  number
of  tasks in the computation and K is the number of resources in the concurrent
system.  The prediction method has a very high rate of convergence.  90% of the
one  hundred test cases evaluated achieved convergence within seven iterations.
Worst case convergence observed was twelve iterations.  
-------------------------------------------------------------------------------
CSL-TR-87-345
TRADEOFFS IN PROCESSOR-ARCHITECTURE AND DATA-BUFFER DESIGN
Johannes M. Mulder
December 1987                                              219 pages.....$13.45
Processor-Memory traffic can be a serious performance constraint  for  computer
systems  in  general  and  for  closely  coupled  processors specifically. In a
memory-bound system the data traffic becomes a critical performance factor when
sufficient  instruction buffering is used; when highly encoded instructions are
used; or when multiple processors  share  data  memory.    Microprocessors  are
particularly  vulnerable  to  performance  degradation because of pin-bandwidth
limitations.
To relax the  memory  bandwidth  requirements  and  sustain  a  high  processor
throughput,  buffering  of  both instruction and data requests is essential for
high performance.  This dissertation describes the performance and  utility  of
buffer hierarchies under reference patterns typical for procedural languages.
Hardware  data-buffering schemes (multiple register sets and stack buffers) can
profit significantly from compile-time assistance. Even  single  register  sets
perform    competitively   with   these   hardware   schemes   when   allocated
interprocedurally.
In multi-level buffering,  first-level  (register  oriented)  and  second-level
(cache)  buffers have distinct purposes.  The first-level buffer determines the
peak performance of the processor, while the second-level determines the actual
performance.    Hence, a cache does not hide architectural (first-level buffer)
flaws, but emphasizes them, because it reduces  the  influence  of  the  memory
system on performance.
The  usefulness  of caches is a function of the size of the cache and of memory
speed. A cache, independent of size,  improves  the  system  performance,  when
memory traffic is the system bottleneck.  A cache has to be of significant size
to justify implementation  when  processor  performance  is  a  primary  design
objective  and  memory  is  relative  fast.  A  cache connected to an efficient
32-word buffer and 2-cycle memory only reduces the buffer  cycles  by  25%  for
1024  cache words, and by 10% for 128 cache words. The same configuration for a
4-cycle memory yields a cycle reduction of 50% and 30% respectively.  
-------------------------------------------------------------------------------

CSL-TR-87-346
BLAZENET: A PHOTONIC IMPLEMENTABLE WIDE-AREA NETWORK
Zygmunt Haas and David R. Cheriton
October 1987                                                 26 pages.....$3.80
High-performance wide-area networks are required to  interconnect  clusters  of
computers connected by local area and metropolitan area networks. Optical fiber
technology provides long distance channels  in  the  multi-gigabit  per  second
range.    The  challenge  is  to provide switching nodes that handle these data
rates with minimum delay, and at a reasonable cost.
In this paper, we describe a packet  switching  network,  christened  Blazenet,
that  provides  low  delay  and  has  minimal  memory  requirements.  It can be
extended to support multicase and  priority  delivery.    Such  a  network  can
revolutionize   the   opportunities   for   distributed  command  and  control,
information  and  resources  sharing,  real-time  conferencing,  and  wide-area
parallel computation, to mention but a few applications.  
-------------------------------------------------------------------------------

CSL-TR-88-347
TRACE COMPACTION USING CACHE FILTERING WITH BLOCKING
Anant Agarwal
December 1987                                                19 pages.....$3.45
Trace-driven  simulation  is  a popular method of estimating the performance of
cache memories, translation lookaside buffers, and paging schemes.  Because the
cost  of  trace-driven  simulation  is  directly  proportional to trace length,
reducing the number of references in the trace significantly impacts simulation
time.    This paper concentrates on trace-driven simulation for cache analysis.
A technique called cache filtering with blocking is presented  that  compresses
traces  by  exploiting  both  the  temporal  and spatial locality in the trace.
Experimental results show that this scheme can reduce trace  length  by  nearly
two  orders  of  magnitude  while introducing less than 15% error in cache miss
rate estimates.  
-------------------------------------------------------------------------------
CSL-TR-88-348
THOR USER'S MANUAL: TUTORIAL AND COMMANDS
Robert Alverson, Tom Blank, Kiyoung Choi, Sun Young Hwang, Arturo Salz,
Larry Soule, Thomas Rokicki
December 1987                                                66 pages.....$5.80
THOR is a behavioral simulation  environment  intended  for  use  with  digital
circuits  at  either the gate, register transfer, or functional levels.  Models
are written in the CHDL modeling  language  (a  hardware  description  language
based  on  the  "C" programming language).  Network descriptions are written in
the  CSL  language  supporting  hierarchical  network  descriptions.      Using
interactive  mode,  batch  mode  or  both  combined,  a variety of commands are
available to control execution.  Simulation output can  be  viewed  in  tabular
format  or  in  waveforms.   A library of components and a toolbox for building
simulation models are also provided.    Other  tools  include  CSLIM,  used  to
generate  boolean equations directly from THOR models and an interface to other
simulators (e.g. RSIM and a physical chip tester) so that two  simulations  can
be run concurrently verifying equivalent operation.
This  technical report is part one of two parts and is formated similar to UNIX
manuals.  Part one contains the THOR tutorial and all the  commands  associated
with  THOR.    Part  two contains descriptions of the general purpose functions
used in models, the parts library including many TTL components, and the  logic
analyzer model.  
-------------------------------------------------------------------------------
CSL-TR-88-349
THOR USER'S MANUAL: LIBRARY FUNCTIONS
Robert Alverson, Tom Blank, Kiyoung Choi, Sun Young Hwang, Arturo Salz,
Larry Soule, Thomas Rokicki
December 1987                                                98 pages.....$7.40
THOR  is  a  behavioral  simulation  environment  intended for use with digital
circuits at either the gate, register transfer, or functional levels.    Models
are  written  in  the  CHDL  modeling language (a hardware description language
based on the "C" programming language).  Network descriptions  are  written  in
the   CSL   language  supporting  hierarchical  network  descriptions.    Using
interactive mode, batch mode or  both  combined,  a  variety  of  commands  are
available  to  control  execution.   Simulation output can be viewed in tabular
format or in waveforms.  A library of components and  a  toolbox  for  building
simulation  models  are  also  provided.    Other  tools include CSLIM, used to
generate boolean equations directly from THOR models and an interface to  other
simulators  (e.g.  RSIM and a physical chip tester) so that two simulations can
be run concurrently verifying equivalent operation.
This technical report is part two of two parts and is formated similar to  UNIX
manuals.    Part one contains the THOR tutorial and all the commands associated
with THOR.  Part two contains descriptions of  the  general  purpose  functions
used  in models, the parts library including many TTL components, and the logic
analyzer model.  
-------------------------------------------------------------------------------

CSL-TR-88-350
THE ILSP BEHAVIORAL DESCRIPTION LANGUAGE AND ITS GRAPH REPRESENTATION FOR
BEHAVIORAL SYNTHESIS
Masayasu Odani, Sun Young Hwang, Tom Blank, Thomas Rokicki
March 1988                                                   38 pages.....$4.40
This report describes the ILSP behavioral description language and its internal
representation  employed  in  the  Hermod  behavioral  synthesis system.  Using
combined control and data flow graph (C/DFG) as an intermediate representation,
the  Hermod  system  generates  hardware modules and their interconnection from
behavioral descriptions.  The  Hermod  system  is  included  in  an  integrated
environment for hardware simulation and synthesis under development at Stanford
University.  The functional models written in the ILSP can be simulated on  the
THOR  logic/functional/behavioral  simulator without translation.  After proper
verification of its behavior, an ILSP model can be input to the synthesizer for
compilation into an RT-level description.
This report consists of two parts: the specification of the ILSP language and
its graph representation.
-------------------------------------------------------------------------------
CSL-TR-88-351
EMPIRICAL ANALYSIS OF A LISP SYSTEM
Robert A. Shaw
February 1988                                              189 pages.....$11.95
We have analyzed four large Lisp programs running in a Common Lisp system on  a
conventional  processor  to  provide information useful for implementers of new
Lisp systems.  The execution of over fifty million Lisp  function  calls  (over
one billion host instructions) are represented in the results.  We address some
possible implementation tradeoffs and provide pertinent empirical  information.
Among  the issues discussed are data tagging techniques, quantity of tags, time
and space efficient representations for Lisp data types,  methods  of  function
linkage, the nature of variable access, the use of tail-recursion and non-local
goto's,  and  garbage  collection.    Our  findings  indicate  that  for   many
conventional  processors  choices  should  include a small number of tag values
encoded in the low-order bits of a data reference, a simple ``car  followed  by
cdr''  organization for CONS cells, and an encoding of symbols which segregates
the different fields.
Garbage collection frequently represents a sizable portion of overall execution
time.     The  new  ``generation''  garbage  collectors  can  provide  dramatic
performance improvements, but require the maintenance of entry vector and  data
age  information.  To maintain the entry vector, previous generation collectors
have included runtime tests (generation checks) at each tagged store operation.
For  our Lisp system the cost of these checks in software is projected to be at
least 7% of runtime.
We describe a new version of generation garbage collection adapted for Lisp  on
conventional  processors.    Our  version  eliminates  generation checks during
normal execution: information maintained by the virtual memory system  is  used
to  construct  the entry vector.  A simple memory layout is employed which does
not require storage for age information (other than one value  per  generation)
and   allows   adjustment  of  the  boundaries  between  generations  to  avoid
fragmentation and maximize memory utilization in implementations  with  limited
memory.   An analysis of the new algorithm on the test system showed that, when
collections were separated by more than 1.4 seconds (or at least 160  kilobytes
of  new  allocation),  using virtual memory information required less time than
software generation checks.  
-------------------------------------------------------------------------------

CSL-TR-88-352
PARTITIONING OF REGULAR COMPUTATION ON MULTIPROCESSOR SYSTEMS
Fung Fung Lee
February 1987                                                17 pages.....$3.35
Problem partitioning of regular  computation  over  two-dimensional  meshes  on
multiprocessor  systems  is examined.  The regular computation model considered
involves repetitive  evaluation  of  values  at  each  mesh  point  with  local
communication.    The  computational workload and the communication pattern are
the same at each mesh point.  The regular computation model arises in numerical
solutions  of  partial  differential  equations  and  simulations  of  cellular
automata.  Given a communication pattern, a systematic way to generate a family
of  partitions  is  presented. The influence of various partitioning schemes on
performance is compared on the basis of computation to communication ratio.  
-------------------------------------------------------------------------------

                                Naomi Schulman
                          COMPUTER SYSTEMS LABORATORY
                              Stanford University
                            Stanford, CA 94305-4055
                      EM:CSL.Schulman@Sierra.Stanford.Edu
                                 415-723-1430
                               Hours: M-Th, 8-1

                                  ORDER FORM

To Order Reports: Print or type your name and address in the space provided.
Check or circle the  report(s)  you  wish  to  purchase,  whether  hardcopy  or
microfiche.    All orders must be PREPAID.  CALIFORNIA RESIDENTS must add sales
tax of their local county.  Return this order form with  your  check  or  money
order  made payable to Stanford University. FOREIGN ORDERS must be paid with an
international money order or a check drawn on a U.S. bank, payable in dollars.
Please type or print your name and complete address:
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
Report #  Hardcopy  Microfiche     Report #  Hardcopy  Microfiche
--------  --------  ----------     --------  --------  ----------
TR 343      $8.35     $3.00        TR 348      $5.80     $2.00
TR 344     $10.60     $3.00        TR 349      $7.40     $2.00
TR 345     $13.45     $4.00        TR 350      $4.40     $2.00
TR 346      $3.80     $2.00        TR 351     $11.95     $3.00
TR 347      $3.45     $2.00        TR 352      $3.35     $2.00


                          Subtotal __________
          Your Local CA County tax __________
                             Total __________
-------