leff@smu.UUCP (Laurence Leff) (03/17/88)
Naomi Schulman
Publications
COMPUTER SYSTEMS LABORATORY
STANFORD UNIVERSITY
Stanford, CA 94305-4055
RECENT REPORTS & NOTES
LIST #13
MARCH 1988
To order reports, see end of this announcement.
ABSTRACTS
CSL-TR-87-343
EDITING GRAPHICAL OBJECTS USING PROCEDURAL REPRESENTATIONS
Paul J. Asente
October 1987 117 pages.....$8.35
Traditionally, people have created computer-generated images by writing
programs in a programming language that supports graphics. More recently,
interactive graphics editors have become commonplace. Graphics editors are
easy to use but frequently lack many of the capabilities found in graphics
programming languages. This deficiency is intrinsic to graphics editors; it is
not a result of neglect or incompetence by the implementer.
Tweedle is a graphics editor that attempts to bridge this gap by using a
program as its internal representation for a picture. During an editing
session the user can modify either the picture itself or the program
representation; the editor modifies the other to keep the two consistent. The
language used by the editor contains features that allow the editor to
incrementally execute parts of a program in response to a change so that the
picture can be regenerated without completely reexecuting the program.
The use of a procedural representation allows the user to create easily
pictures with repetition, recursion, and calculated point values. It further
allows him to define parts of a drawing as variants of other parts; these
variants can differ from their original objects in quite arbitrary ways but
still respond to changes made to the original.
A working prototype of Tweedle has been implemented under the X Window System.
-------------------------------------------------------------------------------
CSL-TR-87-344
PERFORMANCE PREDICTION OF CONCURRENT SYSTEMS
Victor W. K. Mak
December 1987 162 pages.....$10.60
Concurrent systems are computers that use multiple processors to solve a single
problem. A means to predict the application performance on these systems is a
useful tool in many areas of concurrent system research. Earlier work in this
area either dealt with very restricted program structures or used methods with
exponential complexity. This dissertation describes a computationally
efficient and accurate method to predict performance for a class of parallel
computations on concurrent systems.
A parallel computation is modeled as a task system with precedence
relationships expressed as a series-parallel directed acyclic graph. Resources
in concurrent systems are modeled as service centers in queueing network
models. Using these two models as inputs, the method outputs predictions of
both the time to complete the computation and the concurrent system
utilization.
The algorithm used is based on the approximate Mean Value Analysis in queueing
network modeling with extensions to model concurrency in the computation. The
new algorithm has been validated against both detailed simulation and actual
execution on a commercial multiprocessor. Average error of the prediction when
compared to simulation statistics is 1.7% with a standard deviation of 1.5%.
The algorithm is iterative. The time complexity of each iteration of the
2 3 2
algorithm is O(N K + N ), the space complexity is O(N K) where N is the number
of tasks in the computation and K is the number of resources in the concurrent
system. The prediction method has a very high rate of convergence. 90% of the
one hundred test cases evaluated achieved convergence within seven iterations.
Worst case convergence observed was twelve iterations.
-------------------------------------------------------------------------------
CSL-TR-87-345
TRADEOFFS IN PROCESSOR-ARCHITECTURE AND DATA-BUFFER DESIGN
Johannes M. Mulder
December 1987 219 pages.....$13.45
Processor-Memory traffic can be a serious performance constraint for computer
systems in general and for closely coupled processors specifically. In a
memory-bound system the data traffic becomes a critical performance factor when
sufficient instruction buffering is used; when highly encoded instructions are
used; or when multiple processors share data memory. Microprocessors are
particularly vulnerable to performance degradation because of pin-bandwidth
limitations.
To relax the memory bandwidth requirements and sustain a high processor
throughput, buffering of both instruction and data requests is essential for
high performance. This dissertation describes the performance and utility of
buffer hierarchies under reference patterns typical for procedural languages.
Hardware data-buffering schemes (multiple register sets and stack buffers) can
profit significantly from compile-time assistance. Even single register sets
perform competitively with these hardware schemes when allocated
interprocedurally.
In multi-level buffering, first-level (register oriented) and second-level
(cache) buffers have distinct purposes. The first-level buffer determines the
peak performance of the processor, while the second-level determines the actual
performance. Hence, a cache does not hide architectural (first-level buffer)
flaws, but emphasizes them, because it reduces the influence of the memory
system on performance.
The usefulness of caches is a function of the size of the cache and of memory
speed. A cache, independent of size, improves the system performance, when
memory traffic is the system bottleneck. A cache has to be of significant size
to justify implementation when processor performance is a primary design
objective and memory is relative fast. A cache connected to an efficient
32-word buffer and 2-cycle memory only reduces the buffer cycles by 25% for
1024 cache words, and by 10% for 128 cache words. The same configuration for a
4-cycle memory yields a cycle reduction of 50% and 30% respectively.
-------------------------------------------------------------------------------
CSL-TR-87-346
BLAZENET: A PHOTONIC IMPLEMENTABLE WIDE-AREA NETWORK
Zygmunt Haas and David R. Cheriton
October 1987 26 pages.....$3.80
High-performance wide-area networks are required to interconnect clusters of
computers connected by local area and metropolitan area networks. Optical fiber
technology provides long distance channels in the multi-gigabit per second
range. The challenge is to provide switching nodes that handle these data
rates with minimum delay, and at a reasonable cost.
In this paper, we describe a packet switching network, christened Blazenet,
that provides low delay and has minimal memory requirements. It can be
extended to support multicase and priority delivery. Such a network can
revolutionize the opportunities for distributed command and control,
information and resources sharing, real-time conferencing, and wide-area
parallel computation, to mention but a few applications.
-------------------------------------------------------------------------------
CSL-TR-88-347
TRACE COMPACTION USING CACHE FILTERING WITH BLOCKING
Anant Agarwal
December 1987 19 pages.....$3.45
Trace-driven simulation is a popular method of estimating the performance of
cache memories, translation lookaside buffers, and paging schemes. Because the
cost of trace-driven simulation is directly proportional to trace length,
reducing the number of references in the trace significantly impacts simulation
time. This paper concentrates on trace-driven simulation for cache analysis.
A technique called cache filtering with blocking is presented that compresses
traces by exploiting both the temporal and spatial locality in the trace.
Experimental results show that this scheme can reduce trace length by nearly
two orders of magnitude while introducing less than 15% error in cache miss
rate estimates.
-------------------------------------------------------------------------------
CSL-TR-88-348
THOR USER'S MANUAL: TUTORIAL AND COMMANDS
Robert Alverson, Tom Blank, Kiyoung Choi, Sun Young Hwang, Arturo Salz,
Larry Soule, Thomas Rokicki
December 1987 66 pages.....$5.80
THOR is a behavioral simulation environment intended for use with digital
circuits at either the gate, register transfer, or functional levels. Models
are written in the CHDL modeling language (a hardware description language
based on the "C" programming language). Network descriptions are written in
the CSL language supporting hierarchical network descriptions. Using
interactive mode, batch mode or both combined, a variety of commands are
available to control execution. Simulation output can be viewed in tabular
format or in waveforms. A library of components and a toolbox for building
simulation models are also provided. Other tools include CSLIM, used to
generate boolean equations directly from THOR models and an interface to other
simulators (e.g. RSIM and a physical chip tester) so that two simulations can
be run concurrently verifying equivalent operation.
This technical report is part one of two parts and is formated similar to UNIX
manuals. Part one contains the THOR tutorial and all the commands associated
with THOR. Part two contains descriptions of the general purpose functions
used in models, the parts library including many TTL components, and the logic
analyzer model.
-------------------------------------------------------------------------------
CSL-TR-88-349
THOR USER'S MANUAL: LIBRARY FUNCTIONS
Robert Alverson, Tom Blank, Kiyoung Choi, Sun Young Hwang, Arturo Salz,
Larry Soule, Thomas Rokicki
December 1987 98 pages.....$7.40
THOR is a behavioral simulation environment intended for use with digital
circuits at either the gate, register transfer, or functional levels. Models
are written in the CHDL modeling language (a hardware description language
based on the "C" programming language). Network descriptions are written in
the CSL language supporting hierarchical network descriptions. Using
interactive mode, batch mode or both combined, a variety of commands are
available to control execution. Simulation output can be viewed in tabular
format or in waveforms. A library of components and a toolbox for building
simulation models are also provided. Other tools include CSLIM, used to
generate boolean equations directly from THOR models and an interface to other
simulators (e.g. RSIM and a physical chip tester) so that two simulations can
be run concurrently verifying equivalent operation.
This technical report is part two of two parts and is formated similar to UNIX
manuals. Part one contains the THOR tutorial and all the commands associated
with THOR. Part two contains descriptions of the general purpose functions
used in models, the parts library including many TTL components, and the logic
analyzer model.
-------------------------------------------------------------------------------
CSL-TR-88-350
THE ILSP BEHAVIORAL DESCRIPTION LANGUAGE AND ITS GRAPH REPRESENTATION FOR
BEHAVIORAL SYNTHESIS
Masayasu Odani, Sun Young Hwang, Tom Blank, Thomas Rokicki
March 1988 38 pages.....$4.40
This report describes the ILSP behavioral description language and its internal
representation employed in the Hermod behavioral synthesis system. Using
combined control and data flow graph (C/DFG) as an intermediate representation,
the Hermod system generates hardware modules and their interconnection from
behavioral descriptions. The Hermod system is included in an integrated
environment for hardware simulation and synthesis under development at Stanford
University. The functional models written in the ILSP can be simulated on the
THOR logic/functional/behavioral simulator without translation. After proper
verification of its behavior, an ILSP model can be input to the synthesizer for
compilation into an RT-level description.
This report consists of two parts: the specification of the ILSP language and
its graph representation.
-------------------------------------------------------------------------------
CSL-TR-88-351
EMPIRICAL ANALYSIS OF A LISP SYSTEM
Robert A. Shaw
February 1988 189 pages.....$11.95
We have analyzed four large Lisp programs running in a Common Lisp system on a
conventional processor to provide information useful for implementers of new
Lisp systems. The execution of over fifty million Lisp function calls (over
one billion host instructions) are represented in the results. We address some
possible implementation tradeoffs and provide pertinent empirical information.
Among the issues discussed are data tagging techniques, quantity of tags, time
and space efficient representations for Lisp data types, methods of function
linkage, the nature of variable access, the use of tail-recursion and non-local
goto's, and garbage collection. Our findings indicate that for many
conventional processors choices should include a small number of tag values
encoded in the low-order bits of a data reference, a simple ``car followed by
cdr'' organization for CONS cells, and an encoding of symbols which segregates
the different fields.
Garbage collection frequently represents a sizable portion of overall execution
time. The new ``generation'' garbage collectors can provide dramatic
performance improvements, but require the maintenance of entry vector and data
age information. To maintain the entry vector, previous generation collectors
have included runtime tests (generation checks) at each tagged store operation.
For our Lisp system the cost of these checks in software is projected to be at
least 7% of runtime.
We describe a new version of generation garbage collection adapted for Lisp on
conventional processors. Our version eliminates generation checks during
normal execution: information maintained by the virtual memory system is used
to construct the entry vector. A simple memory layout is employed which does
not require storage for age information (other than one value per generation)
and allows adjustment of the boundaries between generations to avoid
fragmentation and maximize memory utilization in implementations with limited
memory. An analysis of the new algorithm on the test system showed that, when
collections were separated by more than 1.4 seconds (or at least 160 kilobytes
of new allocation), using virtual memory information required less time than
software generation checks.
-------------------------------------------------------------------------------
CSL-TR-88-352
PARTITIONING OF REGULAR COMPUTATION ON MULTIPROCESSOR SYSTEMS
Fung Fung Lee
February 1987 17 pages.....$3.35
Problem partitioning of regular computation over two-dimensional meshes on
multiprocessor systems is examined. The regular computation model considered
involves repetitive evaluation of values at each mesh point with local
communication. The computational workload and the communication pattern are
the same at each mesh point. The regular computation model arises in numerical
solutions of partial differential equations and simulations of cellular
automata. Given a communication pattern, a systematic way to generate a family
of partitions is presented. The influence of various partitioning schemes on
performance is compared on the basis of computation to communication ratio.
-------------------------------------------------------------------------------
Naomi Schulman
COMPUTER SYSTEMS LABORATORY
Stanford University
Stanford, CA 94305-4055
EM:CSL.Schulman@Sierra.Stanford.Edu
415-723-1430
Hours: M-Th, 8-1
ORDER FORM
To Order Reports: Print or type your name and address in the space provided.
Check or circle the report(s) you wish to purchase, whether hardcopy or
microfiche. All orders must be PREPAID. CALIFORNIA RESIDENTS must add sales
tax of their local county. Return this order form with your check or money
order made payable to Stanford University. FOREIGN ORDERS must be paid with an
international money order or a check drawn on a U.S. bank, payable in dollars.
Please type or print your name and complete address:
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
______________________________________________
Report # Hardcopy Microfiche Report # Hardcopy Microfiche
-------- -------- ---------- -------- -------- ----------
TR 343 $8.35 $3.00 TR 348 $5.80 $2.00
TR 344 $10.60 $3.00 TR 349 $7.40 $2.00
TR 345 $13.45 $4.00 TR 350 $4.40 $2.00
TR 346 $3.80 $2.00 TR 351 $11.95 $3.00
TR 347 $3.45 $2.00 TR 352 $3.35 $2.00
Subtotal __________
Your Local CA County tax __________
Total __________
-------