[comp.doc.techreports] tr-input/wisconsin.madison

leff@smu.UUCP (Laurence Leff) (05/12/88)

Bibliography of Technical Reports
Computer Science Department
University of Wisconsin
December 1987 - April 1988

For copies, send requests to:

Technical Report Librarian
Computer Science Department
University of Wisconsin
1210 W. Dayton Street
Madison, WI 53706

%A Michael J. Carey 
%A David J. DeWitt 
%A Scott L. Vandenberg 
%T A Data Model and Query Language for EXODUS
%D December 1987
%R TR 734
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper, we present the design of the EXTRA data model and the 
EXCESS query language for the EXODUS extensible database system.  The EXTRA 
data model includes support for complex objects with shared subobjects, a novel
mix of object- and value-oriented semantics for data, support for persistent 
objects of any type in the EXTRA type lattice, and user-defined abstract data 
types (ADTs).  The EXCESS query language provides facilities for querying and
updating complex object structures, and it can be extended through the addition
of ADT functions and operators, procedures and functions for manipulating
EXTRA schema types, and generic set functions.  EXTRA and EXCESS are intended to
serve as a test vehicle for tools developed under the EXODUS extensible 
database system project.

%A Harry Plantinga
%A Charles R. Dyer
%T Construction and Display Algorithms for the Asp
%D December 1987
%R TR 735
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X  The \fIaspect representation\fR or \fIasp\fR is a continuous, 
viewer-centered representation for polyhedra introduced by Plantinga and Dyer 
(1987b).  In this paper we present in detail algorithms for working with the 
asp under orthographic projection.  We derive equations for aspect surfaces,
show how to represent asps, and present a detailed algorithm for their
construction.  We then show how to display an image of the represented object
from any viewpoint with hidden surfaces removed by finding a cross-section of
the asp for a given viewpoint.  We present separate algorithms for the convex
and non-convex cases.

%A Harry Plantinga
%A Charles R. Dyer
%T Visibility, Occlusion and the Aspect Graph
%D December 1987
%R TR 736
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we study the ways in which the topology of the image
of a polyhedron changes with changing viewpoint.  We catalog the ways that
the "topological appearance" or \fIaspect\fR can change.  This enables us to
find maximal regions of viewpoints of the same aspect.  We use these 
techniques to construct the \fIviewpoint space partition (VSP),\fR a 
partition of viewpoint space into maximal regions of constant aspect, and its
dual, the \fIaspect graph\fR.  In this paper we present tight bounds on the
maximum size of the VSP and the aspect graph and give algorithms for their
construction, first in the convex case and then in the general case.  In
particular, we give tight bounds on the maximum size of \(*H(@n sup 2@) and 
\(*H(@n sup 6@) under orthographic projection with a 2-D viewpoint space and of
\(*H(@n sup 3@) and \(*H(@n sup 9@) under perspective projection with a 3-D 
viewpoint space.  We also present properties of the VSP and aspect graph and
give algorithms for their construction.

%A Miron Livny
%A Udi Manber
%T \(*m - A System for Simulating and Implementing Distributed and Parallel
Algorithms
%D December 1987
%R TR 737
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A system for simulation of distributed algorithms for a collection
of workstations connected by a local area network is presented.  The system,
called \(*m, is intended for programmers who want to parallelize a program
(or parts of it) in order to tap the CPU and memory space of idle workstations.
Writing such parallel programs is difficult.  \(*m simplifies this task 
significantly in many ways.  First, \(*m uses only a small set of
communication primitives on top of a regular programming language and makes the
details of their implementation transparent to the user.  \(*m translates the
program into a discrete event system and interfaces with a simulated 
processing environment.  The user need not be familiar at all with the 
simulation mechanism.  Second, the environment is under complete control of the
programmer.  Tradeoffs such as communication speed vs. CPU speed can be easily
tested, and decisions depending on them can be made more easily.  Third, 
debugging is much simpler.  The asynchronous and nondeterministic nature of
the computation is preserved even though the simulation is executed on one
machine.  Fourth, \(*m is also the basis of a package for distributed 
programming.  With it, \(*m programs are stripped (automatically) of their
simulation parts and can be executed on several workstations with, in most
cases, no modifications by the user.  (This package is not completely
implemented yet; it is expected to be ready very soon.)  Several examples of
application programs are presented.

%A Deborah A. Joseph
%A W. Harry Plantinga
%T Efficient Algorithms for Polyhedron Collision Detection
%D December 1987
%R TR 738
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we present efficient algorithms for determining whether
polyhedra undergoing linear translations will collide.  We present algorithms 
for finding collisions between a pair of moving convex polyhedra and among
several moving non-convex polyhedra.  With the non-convex algorithm we also
show how to determine \fIwhen\fR the objects will collide.  The algorithms 
work by considering time to be a fourth dimension and solving the related 4-D
polytope separation problem.  In the convex case we present an algorithm that
runs in O(n) time where n is the sum of the sizes of the objects.  The
algorithm is a generalization of an O(n)-time polyhedron separation algorithm
of Dobkin and Kirkpatrick to 4-D convex prisms.  In the non-convex case we
present an algorithm that runs in O(@n sup 2@)-time.  The algorithm is a 
generalization of the naive polyhedron separation algorithm.  We generalize
the naive algorithm to finding the separation of \fId\fR-polytopes in 
@E sup d@ for any \fId,\fR although we use only the 4-D case for collision
detection.  Both algorithms use O(n) space.  In the process of finding the
separation in the non-convex case we must determine whether a point is in a
polytope in @E sup d@, and we present an algorithm for doing that.
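A minimal sketch of the "time as a fourth dimension" idea from the abstract, for the simplest convex case: a point moving at constant velocity against a static convex polyhedron given as an intersection of halfspaces.  This is Cyrus-Beck-style parametric clipping, not the paper's prism-separation algorithm; the halfspace representation and function name are illustrative assumptions.

```python
# Hedged sketch (not the paper's algorithm): a point p moving at constant
# velocity v collides with a convex polyhedron {x : a.x <= b for all (a, b)}
# exactly when the time-parameterized constraints admit a common t in [0, T].

def collision_interval(halfspaces, p, v, t_max):
    """Return (t_enter, t_exit) within [0, t_max] during which the moving
    point p + t*v lies inside every halfspace (a, b), or None if never."""
    t_lo, t_hi = 0.0, t_max
    for a, b in halfspaces:
        # constraint a.(p + t*v) <= b  <=>  t * (a.v) <= b - a.p
        num = b - sum(ai * pi for ai, pi in zip(a, p))
        den = sum(ai * vi for ai, vi in zip(a, v))
        if abs(den) < 1e-12:
            if num < 0:          # stationary relative to this face, and outside
                return None
            continue
        t_star = num / den
        if den > 0:              # heading out through this face: must exit by t_star
            t_hi = min(t_hi, t_star)
        else:                    # heading in through this face: enters at t_star
            t_lo = max(t_lo, t_star)
        if t_lo > t_hi:
            return None
    return (t_lo, t_hi)
```

With the unit cube as six halfspaces, a point starting at (-1, 0.5, 0.5) moving along +x collides during the interval [1, 2].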

%A R.H. Clark
%A R.R. Meyer
%T Multiprocessor Algorithms for Generalized Network Flows
%D December 1987
%R TR 739
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents parallel algorithms for the solution of 
generalized network optimization problems on a shared-memory multiprocessor.
These algorithms exploit the quasi-tree forest basis structure of generalized
networks by attempting to perform multiple simplex pivot operations in
parallel on disconnected subtrees.  We consider algorithms for both
single-period generalized networks and multi-period generalized networks.
In the latter
case, the multi-period structure is utilized in the initial stage of the
algorithms in order to initially partition the problem among processors.  
Computational experience on the Sequent Balance 21000 multiprocessor is
presented that demonstrates linear and sometimes superlinear speedup for a
large class of test problems.

%A M. Muralikrishna
%A David J. DeWitt
%T Optimization of Multiple-Relation Multiple-Disjunct Queries
%D January 1988
%R TR 740
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we discuss the optimization of multiple-relation 
multiple-disjunct queries in a relational database system.  Since 
optimization techniques for conjunctive (single disjunct) queries in 
relational databases are well known (Smith75, Wong76, Selinger79, Yao79,
Youssefi79), the natural way to evaluate a multi-disjunct query was to 
execute each disjunct independently (Bernstein81, Kerschberg82).  However,
evaluating each disjunct independently may be very inefficient.  In this
paper, we develop methods that merge two or more disjuncts to form a term.
The advantage of merging disjuncts to form terms lies in the fact that each
term can be evaluated with a single scan of each relation that is present in
the term.  In addition, the number of times a join is performed will also be
reduced when two or more disjuncts are merged.  The criteria for merging a set
of disjuncts will be presented.  As we will see, the number of times each
relation in the query is scanned will be equal to the number of terms.  Thus,
minimizing the number of terms will minimize the number of scans for each
relation.  We will formulate the problem of minimizing the number of scans
as one of covering a merge graph by a minimum number of complete merge
graphs which are a restricted class of cartesian product graphs.  In general,
the problem of minimizing the number of scans is NP-complete.  We present
polynomial time algorithms for special classes of merge graphs.  We also
present a heuristic for general merge graphs.

Throughout this paper, we will assume that no relations have any
indices on them and that we are only concerned with reducing the number of
scans for each relation present in the query.  What about relations that
have indices on them?  It turns out that our performance metric of reducing
the number of scans is beneficial even in the case that there are indices.
In (Muralikrishna88) we demonstrate that when optimizing single-relation 
multiple-disjunct queries, the cost (measured in terms of disk accesses) may
be reduced if all the disjuncts are optimized together rather than
individually.  Thus, our algorithm for minimizing the number of terms is also
very beneficial in cases where indices exist.
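A tiny sketch of the observation the abstract turns on: disjuncts merged into one term can be checked in a single scan of a relation, whereas independent evaluation costs one scan per disjunct.  The `Relation` wrapper and scan counter are hypothetical helpers for the illustration, not anything from the paper.

```python
# Hedged sketch: merged disjuncts need one scan; independent evaluation
# needs one scan per disjunct.  Results are identical; only I/O differs.

class Relation:
    """Hypothetical helper: a list of tuples that counts full scans."""
    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)

def eval_per_disjunct(rel, disjuncts):
    out = set()
    for d in disjuncts:                       # one scan per disjunct
        out.update(i for i, t in enumerate(rel.scan()) if d(t))
    return out

def eval_merged(rel, disjuncts):              # a single scan for the whole term
    return {i for i, t in enumerate(rel.scan())
            if any(d(t) for d in disjuncts)}
```

Both evaluations select the same tuples, but the merged form scans the relation once regardless of how many disjuncts the term contains.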

%A Judy Goldsmith 
%A Deborah Joseph
%A Paul Young
%T A Note on Bi-Immunity and P-Closeness of P-Cheatable Sets in P/Poly
%D January 1988
%R TR 741
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we study the interplay between three measures of
polynomial time behavior in sets: \fIp\fR-cheatability, \fIp\fR-closeness, and
membership in \fIP/poly\fR.  First we construct \fI(@2 sup k@ - 1 for
k)-p\fR-cheatable sets that are bi-immune with respect to all recursively
enumerable sets.  We show that the constructed sets are in \fIP/poly,\fR but
can be constructed so that they are \fInot p\fR-close.  In fact, they can be
constructed so that they are not even \fIrecursively\fR-close.  Next, we
construct \fI(n for 2)-p\fR-cheatable sets that are bi-immune with respect to
arbitrarily difficult deterministic time classes.  These sets are also in
\fIP/poly,\fR and they also can be constructed so that they are \fInot
p\fR-close.  Finally, we construct a set that is \fI(n for 1)-p\fR-cheatable
but is \fInot p\fR-close, although it too is in \fIP/poly\fR.  These results
show that, although \fIp\fR-cheatable, \fIP/poly,\fR and \fIp\fR-close sets
all exhibit some form of polynomial time behavior, the notions of
\fIp\fR-cheatability and \fIp\fR-closeness are often orthogonal.  In addition,
the results on \fIp\fR-closeness answer conjectures made in (A&G-87).

%A David J. DeWitt
%A Shahram Ghandeharizadeh
%A Donovan Schneider
%T A Performance Analysis of the Gamma Database Machine
%D January 1988
%R TR 742
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents the results of an initial performance evaluation of the
Gamma database machine based on an expanded version of the single-user 
Wisconsin benchmark.  In our experiments we measured the effect of relation
size and indices on response time for selection, join, and aggregation 
queries, and single-tuple updates.  A Teradata DBC/1012 database machine of
similar size is used as a basis for interpreting the results obtained.  We 
also analyze the performance of Gamma relative to the number of processors
employed and study the impact of varying the memory size and disk page size
on the execution time of a variety of selection and join queries.  We analyze
and interpret the results of these experiments based on our understanding of
the system hardware and software, and conclude with an assessment of the
strengths and weaknesses of the two machines.

%A Mary K. Vernon
%A Udi Manber
%T Distributed Round-Robin and First-Come First-Serve Protocols
%D February 1988
%R TR 745
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Two new distributed protocols for fair and efficient bus arbitration
are presented.  The protocols implement round-robin (RR) and first-come 
first-serve (FCFS) scheduling, respectively.  Both protocols use relatively
few control lines on the bus, and their logic is simple.  The round-robin 
protocol, which uses \fIstatically assigned\fR arbitration numbers to resolve
conflict during an arbitration, is more robust and simpler to implement than
previous distributed RR protocols that are based on rotating agent priorities.
The proposed FCFS protocol uses partly static arbitration numbers, and is the
first practical proposal for a FCFS arbiter known to the authors.  The 
proposed protocols thus have a better combination of efficiency, cost, and 
fairness characteristics than existing multiprocessor bus arbitration 
algorithms.

Three implementations of our RR protocol, and two implementations of our
FCFS protocol, are discussed.  Simulation results are presented that address:
1) the practical potential for unfairness in the simpler implementation of the
FCFS protocol, 2) the practical implications of the higher waiting time
variance in the RR protocol, and 3) the allocation of bus bandwidth among
agents with unequal request rates in each protocol.  The simulation results
indicate that there is very little practical difference in the performance
of the two protocols.

%A M.K. Vernon
%A E.D. Lazowska
%A J. Zahorjan
%T An Accurate and Efficient Performance Analysis Technique for 
Multiprocessor Snooping Cache-Consistency Protocols
%D February 1988
%R TR 746
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A number of dynamic cache consistency protocols have been developed 
for multiprocessors having a shared bus interconnect between processors and
shared memory.  The relative performance of these protocols has been studied
extensively using simulation and detailed analytical models based on Markov
chain
techniques.  Both of these approaches use relatively detailed models, which 
capture cache and bus interference rather precisely, but which are highly
expensive to evaluate.  In this paper, we investigate the use of a more
abstract and significantly more efficient analytical model for evaluating the 
relative
performance of cache consistency protocols.  The model includes bus
interference, cache interference, and main memory interference, but represents
the interactions between the caches by steady-state mean collision rates which
are computed by iterative solution of the model equations.

We show that the speedup estimates obtained from the mean-value model are 
highly accurate.  The results agree with the speedup estimates of the detailed
analytical models to within 3%, over all modifications studied and over a wide
range of parameter values.  This result is surprising, given that the
distinctions among the protocols are quite subtle.  The validation experiments
include sets of reasonable values of the workload parameters, as well as sets
of unrealistic values for which one might expect the mean-value equations to
break down.  The conclusion we can draw is that this modeling technique shows
promise for evaluating architectural tradeoffs at a much more detailed level
than was previously thought possible.  We also discuss the relationship
between results of the analytical models and the results of independent
evaluations of the protocols using simulation.

%A Rajeev Jog
%A Gurindar S. Sohi
%A Mary K. Vernon
%T The TREEBus Architecture and Its Analysis 
%D February 1988
%R TR 747
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper investigates a cache coherence protocol for a 
hierarchical, general-purpose multiprocessor that we refer to as the TREEBus 
architecture.  
We apply the concept of non-identical dual tag directories to the specification
of the protocol.  We then develop a sophisticated mean-value queueing model 
that considers bus latency, shared memory latency and bus interference as the
primary sources of performance degradation in the system.  A unique feature of
the model is its high-level input parameters, which can be related to
application program characteristics.  Another nice property of the model is 
that it can be solved for very large systems in a matter of seconds.

Using the model we reach several important conclusions.  First, the system
topology has an important effect on overall performance.  We find optimal
two-level, three-level and four-level topologies that distribute the load
evenly across all bus levels, resulting in much higher performance.  For a
particular system size, the optimal topology is the topology with the minimum
number of levels that has sufficient total bandwidth in the bus subsystem to
support the memory request rates of the processors.  We also provide
processing power estimates for these topologies, under a given set of
workload assumptions.  For example, the optimal three-level TREEBus topology,
supporting 512 processors each with a peak processing rate of 4 MIPS,
assuming 22% of the data references are to globally shared data and that the
shared data is read on the average by a significant fraction of processors
between write operations, provides an effective 1400 (1700) MIPS in
processing power, if the buses operate at 20 (40) MHz.  Results of our study
also indicate that for reasonably low cache miss rates (3% at level 0), and
20 MHz buses, the bus subnetwork saturates with processor speeds of 6-8 MIPS,
at least for topologies of five or fewer levels.  Finally, we present
experimental results that indicate how performance is affected by the
locality and pattern of sharing, and by the miss ratios that are due to
cache size limitations.

%A Scott T. Leutenegger
%A Mary K. Vernon
%T A Mean-Value Performance Analysis of a New Multiprocessor Architecture
%D February 1988
%R TR 748
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents a preliminary performance analysis of a new
large-scale multiprocessor: the \fIWisconsin Multicube\fR.  A key 
characteristic of the machine is that it is based on shared buses and a
snooping cache coherence protocol.  The organization of the shared buses and
shared memory is unique and non-hierarchical.  The two-dimensional version of
the architecture is envisioned as scaling to 1024 processors.

We develop an approximate mean-value analysis of bus interference for the
proposed cache coherence protocol.  The model includes FCFS scheduling at the
bus queues with deterministic bus access times, and asynchronous memory 
write-backs and invalidation requests.

We use our model to investigate the feasibility of the multiprocessor, and to
study some initial system design issues.  Our results indicate that a 
1024-processor system can operate at 75 - 95% of its peak processing power, if
the mean time between cache misses is larger than 1000 bus cycles (i.e. 50
microseconds for 20 MHz buses; 25 microseconds for 40 MHz buses).  This miss 
rate is not unreasonable for the cache sizes specified in the design, which 
are comparable to main memory sizes in existing multiprocessors.  We also
present results which address the issues of optimal cache block size, optimal
size of the two-dimensional Multicube, the effect of broadcast invalidations 
on system performance, and the viability of several hardware techniques for
reducing the latency for remote memory requests.
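The flavor of an approximate mean-value analysis can be shown with a textbook single-bus version: the bus is a FCFS queue with service time S, and each processor "thinks" for Z bus cycles (the mean time between cache misses) before its next request.  The paper's model is considerably richer (asynchronous write-backs, invalidations, a 2-D bus grid); the parameter values below are illustrative assumptions only.

```python
# Hedged sketch: exact MVA for one FCFS bus plus a per-processor think time.
# Returns the fraction of peak processing power the processors achieve.

def mva_bus(n_procs, think_z, bus_s):
    q = 0.0                                # mean bus queue length
    for n in range(1, n_procs + 1):
        r = bus_s * (1.0 + q)              # mean bus response time
        x = n / (think_z + r)              # throughput (requests per cycle)
        q = x * r                          # Little's law at the bus
    return x * think_z / n_procs           # fraction of time spent computing
```

With Z = 1000 cycles between misses and a few cycles of bus service per miss, even a large processor count retains high efficiency until the bus saturates, which is the shape of result the abstract reports.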

%A Judy Goldsmith
%A Deborah Joseph
%A Paul Young
%T Using Self-Reducibilities to Characterize Polynomial Time
%D February 1988
%R TR 749
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we study the effect that the \fIself-reducibility\fR 
properties of a set have on its time complexity.  In particular, we show 
that the extent to which a set is self-reducible can determine whether 
or not the set is polynomially decidable.  Our results concern 
three variations on the notion of Turing self-reducibility: 
\fInear-testability,  word decreasing query 
self-reducibility,\fR and \fIp-cheatability\fR.  Our first pair of results
provide characterizations of polynomial time by coupling the notion, first of
Turing self-reducibility, then of near-testability, with the notion of
p-cheatability.  In contrast, our final theorems show that if specific 
nondeterministic and deterministic time classes do not collapse, then 1) we
cannot similarly characterize polynomial time by coupling word decreasing
query reducibility with a variant of p-cheatability, and 2) we cannot
characterize polynomial time by coupling \fIparity-P\fR and p-cheatability.

%A M. Muralikrishna
%T Optimization of Multiple-Disjunct Queries in a Relational Database System
%D February 1988
%R TR 750
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this thesis, we describe the optimization of arbitrarily complex
queries expressed in relational calculus.  The qualification list is allowed to
be any complex boolean expression involving both ANDs and ORs.  In other words,
the qualification list may have an arbitrary number of disjuncts.  The query
graph of each disjunct may also have any number of components.  Optimizing the
various disjuncts independently of each other can be very inefficient.  
Considerable savings in cost can be achieved by optimizing the various 
disjuncts together.

In a multiple-relation multiple-disjunct query, it may be possible to combine
two or more disjuncts into one term.  This will cut down the number of scans 
on each relation and also the number of times each join is performed.  The 
objective will be to merge the disjuncts into the minimum number of terms.
Minimizing the number of terms can be formulated as the problem of covering a
merge graph with the minimum number of complete merge graphs, which are a
restricted class of cartesian product graphs.  The problem of minimizing the
number of terms is NP-complete.  We present polynomial time algorithms for 
special classes of merge graphs.  We provide a heuristic for general merge 
graphs.

For single-relation multiple-disjunct queries involving more than one 
attribute, an optimal access path might consist of more than one index.  The
cost in our optimization model, for single relation queries, is measured in 
terms of the number of pages fetched from disk.  We will formulate the problem
of finding a set of optimal access paths for a single-relation multiple-disjunct
query as one of finding a minimum weighted vertex cover in a hypergraph.  
Finding the cheapest vertex cover in a hypergraph is NP-complete.  We present
a new approximation algorithm that gives near optimal vertex covers for 
random hypergraphs over a wide range of edge probabilities.

We also demonstrate the usefulness of equi-depth multi-dimensional histograms
in optimizing queries using multi-dimensional indices.
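The one-dimensional version of an equi-depth histogram is easy to sketch: bucket boundaries are chosen so that each bucket holds (roughly) the same number of tuples, rather than an equal slice of the value range.  The function below is an illustrative assumption, not the thesis's multi-dimensional construction.

```python
# Hedged sketch: 1-D equi-depth histogram boundaries.  Each of the
# n_buckets buckets covers about len(values) / n_buckets tuples.

def equi_depth_boundaries(values, n_buckets):
    """Return the upper-bound value of each bucket."""
    vals = sorted(values)
    depth = len(vals) / n_buckets
    return [vals[min(len(vals) - 1, int(round(depth * (i + 1))) - 1)]
            for i in range(n_buckets)]
```

Because every bucket holds the same tuple count, the selectivity estimate for a range predicate is bounded by the bucket depth, regardless of how skewed the value distribution is.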

%A Kifung C. Cheung
%A Gurindar S. Sohi
%A Kewal K. Saluja
%A Dhiraj K. Pradhan
%T Design and Analysis of a Gracefully-Degrading Interleaved Memory System
%D February 1988
%R TR 751
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A hardware mechanism has been proposed to reconfigure an interleaved
memory system.  The reconfiguration scheme is such that, at any instant all
fault-free memory banks in the memory system can be utilized in an interleaved
manner.  The design of the hardware that enables the reconfiguration is 
discussed.  The reconfiguration scheme proposed in this paper is analyzed for
a number of distinct benchmark programs.  It is shown that memory system
performance degrades gracefully as the number of faulty banks increases if the
memory system uses the proposed reconfiguration scheme.
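The reconfiguration idea can be sketched in software terms: with some banks marked faulty, addresses are re-interleaved across only the fault-free banks, so all surviving banks remain usable at reduced total bandwidth.  This sketch is an assumption about the address mapping, not the paper's hardware mechanism.

```python
# Hedged sketch: interleave addresses over the fault-free banks only.
# With no faults this reduces to ordinary modulo interleaving.

def remap(address, faulty, n_banks):
    good = [b for b in range(n_banks) if b not in faulty]
    bank = good[address % len(good)]       # cycle through surviving banks
    offset = address // len(good)          # word offset within the bank
    return bank, offset
```

Successive addresses still land on distinct banks, which is what preserves interleaved access (and hence graceful rather than cliff-edge degradation) as banks fail.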

%A A.R. Pleszkun
%A G.S. Sohi
%T The Performance Potential of Multiple Functional Unit Processors
%D February 1988
%R TR 752
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper, we look at the interaction of pipelining and multiple
functional units in single processor machines.  As a basis for our studies we
use a CRAY-like processor model and the issue rate (instructions per clock
cycle) as the performance measure.  We initially find that in non-vector
machines, pipelining multiple function units does not provide significant
performance improvements.  Dataflow limits are then derived for our 
benchmark programs to determine the performance potential of each benchmark.
In addition to a traditional dataflow limit computation, several other limits
are computed which apply more realistic constraints on a computation.  Based
on these more realistic limits, we determine it is worthwhile to investigate
the performance improvements that can be achieved from issuing multiple
instructions each clock cycle.  Several approaches are proposed and evaluated
for issuing multiple instructions each clock cycle.

%A G.S. Sohi
%T Logical Data Skewing Schemes for Interleaved Memories in Vector Processors
%D February 1988
%R TR 753
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Sustained memory bandwidth for a range of strides is a key to
high-performance vector processing.  To reduce the number of memory bank
conflicts and thereby increase memory bandwidth, one often resorts to data
skewing schemes.  In this paper, we present a family of data skewing schemes
called logical skewing schemes.  In logical skewing schemes, the location of a
data element is determined solely by logical manipulation of the address bits.
Our experiments show that logical skewing schemes allow for better vector 
memory system performance than other known data skewing schemes without the
significant increase in memory latency that is traditionally associated with
data skewing schemes.
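One simple member of the family described here can be sketched directly: the bank number is obtained purely by logical manipulation of the address bits, an XOR fold in this sketch, rather than by division or a modulus by a non-power-of-two.  This particular XOR scheme is illustrative; the paper's family is more general.

```python
# Hedged sketch of a logical skewing scheme: bank = low address bits XOR
# the next group of address bits.  No arithmetic beyond shifts and XOR.

def xor_bank(address, bank_bits):
    n_banks = 1 << bank_bits
    low = address & (n_banks - 1)
    high = (address >> bank_bits) & (n_banks - 1)
    return low ^ high                       # bank index

def banks_touched(stride, n_accesses, bank_bits):
    """How many distinct banks a strided vector stream hits."""
    return len({xor_bank(i * stride, bank_bits) for i in range(n_accesses)})
```

With plain modulo interleaving, a stride equal to the number of banks drives every access to one bank; the XOR scheme spreads the same stream across all banks.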

%A Barton P. Miller
%A Jong-Deok Choi
%T A Mechanism for Efficient Debugging of Parallel Programs
%D February 1988
%R TR 754
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper addresses the design and implementation of an integrated
debugging system for parallel programs running on shared memory 
multi-processors (SMMP).  We describe the use of \fIflowback analysis\fR to
provide information on causal relationships between events in a program's
execution without re-executing the program for debugging.  We introduce a 
mechanism called \fIincremental tracing\fR that, by using semantic analyses of
the debugged program, makes the flowback analysis practical with only a small
amount of trace generated during execution.  We extend flowback analysis to 
apply to parallel programs and describe a method to detect race conditions in
the interactions of the co-operating processes.

%A Susan Horwitz
%A Thomas Reps
%A David Binkley
%T Interprocedural Slicing Using Dependence Graphs
%D March 1988
%R TR 756
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A slice of a program with respect to a program point \fIp\fR and 
variable \fIx\fR consists of all statements of the program that might affect the
value of \fIx\fR  at point \fIp\fR.  This paper concerns the problem of 
interprocedural slicing -- generating a slice of an entire program, where the
slice crosses the boundaries of procedure calls.  To solve this problem, we
introduce a new kind of graph to represent programs, called a \fIsystem 
dependence graph\fR, which extends previous dependence representations to
incorporate collections of procedures (with procedure calls) rather than just
monolithic programs.  Our main result is an algorithm for interprocedural
slicing that uses the new representation.

The chief difficulty in interprocedural slicing is correctly accounting for 
the calling context of a called procedure.  To handle this problem, system 
dependence graphs include some data-dependence edges that represent 
\fItransitive\fR dependencies due to the effects of procedure calls, in 
addition to the conventional direct-dependence edges.  These edges are
constructed with the aid of an auxiliary structure that represents calling 
and parameter-linkage relationships.  This structure takes the form of an 
attribute grammar.  The step of computing the required transitive-dependence
edges is reduced to the construction of the subordinate characteristic graphs
for the grammar's nonterminals.

It should be noted that our work concerns a somewhat restricted kind of slice:
Rather than permitting a program to be sliced with respect to program point
\fIp\fR and an \fIarbitrary\fR variable, a slice must be taken with respect to
a variable that is \fIdefined\fR at or \fIused\fR at \fIp\fR.
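Once a dependence graph is built, slicing reduces to backward reachability from the criterion.  The sketch below shows only that core step on a tiny single-procedure graph; the paper's contribution, the system dependence graph with transitive interprocedural edges, is not reproduced here.

```python
# Hedged sketch: a backward slice is the set of nodes that reach the
# criterion through data- and control-dependence edges.

from collections import deque

def backward_slice(deps, criterion):
    """deps maps each statement to the statements it depends on.
    Returns every statement that might affect the criterion."""
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for pred in deps.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen
```

For the program s1: x = 1; s2: y = 2; s3: z = x + 1; s4: print(z), the slice at s4 omits s2, since y cannot affect z.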

%A Eric Bach
%A Victor Shoup
%T Factoring Polynomials Using Fewer Random Bits
%D March 1988
%R TR 757
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Let \fIF\fR be a field of \fIq = @p sup n@\fR elements, where 
\fIp\fR is prime.  We present two new probabilistic algorithms for factoring
polynomials in \fIF(X)\fR that make particularly efficient use of random bits.
They are easy to implement, and require no randomness beyond an initial seed
whose length is proportional to the input size.  The first algorithm is based
on a procedure of Berlekamp; on input \fIf\fR in \fIF(X)\fR of degree \fId\fR, 
it uses \fId\fR@log sub 2@\fIp\fR random bits and produces in polynomial time a
complete factorization of \fIf\fR with a failure probability of no more than
1/@\fIp sup (1-\(mo)1/2d@\fR.  (Here \(mo denotes a fixed parameter 
between 0 and 1 that can be chosen by the implementor.)  The second algorithm
is based on a method of Cantor and Zassenhaus; it uses \fId@log sub 2@ \fIq\fR 
random bits and fails to find a complete factorization with probability no
more than 1/@\fIq sup (1-\(mo)1/4d@\fR.  For both of these 
algorithms, the failure probability is exponentially small in the number of
random bits used.
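For context, the standard Cantor-Zassenhaus random splitting step that the second algorithm builds on can be sketched for the simplest case: odd prime p and a squarefree polynomial with distinct roots in \fIF\fR.  This sketch spends fresh random bits on every trial; the paper's point is precisely to economize on that randomness, which this code does not attempt.

```python
# Hedged sketch of one Cantor-Zassenhaus split: gcd(f, (x+a)^((p-1)/2) - 1)
# for random a is a nontrivial factor of f with probability about 1/2.
# Polynomials are coefficient lists, lowest degree first.

import random

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def pmul(a, b, p):
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                r[i + j] = (r[i + j] + ai * bj) % p
    return trim(r)

def pmod(a, m, p):                       # remainder of a divided by m, mod p
    a = a[:]
    inv = pow(m[-1], -1, p)
    while len(a) >= len(m):
        c = a[-1] * inv % p
        s = len(a) - len(m)
        for i, mi in enumerate(m):
            a[s + i] = (a[s + i] - c * mi) % p
        trim(a)
    return a

def pgcd(a, b, p):                       # monic gcd by Euclid's algorithm
    a, b = a[:], b[:]
    while b:
        a, b = b, pmod(a, b, p)
    inv = pow(a[-1], -1, p)
    return [c * inv % p for c in a]

def ppowmod(base, e, m, p):              # base^e mod m, by repeated squaring
    r = [1]
    base = pmod(base[:], m, p)
    while e:
        if e & 1:
            r = pmod(pmul(r, base, p), m, p)
        base = pmod(pmul(base, base, p), m, p)
        e >>= 1
    return r

def cz_split(f, p, rng=random):
    """Find a nontrivial monic factor of squarefree f over F_p (p odd),
    assuming f splits into distinct linear factors."""
    n = len(f) - 1                       # degree of f
    while True:
        a = rng.randrange(p)
        g = ppowmod([a, 1], (p - 1) // 2, f, p)   # (x+a)^((p-1)/2) mod f
        g = trim([(g[0] - 1) % p] + g[1:]) if g else []
        if not g:
            continue                     # unlucky a: the power was 1; retry
        d = pgcd(f, g, p)
        if 0 < len(d) - 1 < n:
            return d
```

Each trial succeeds with probability roughly 1/2, so the expected number of random draws is constant; the paper's algorithms reduce the total random bits to be proportional to a single seed.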

%A Michael J. Carey
%A Miron Livny
%T Distributed Concurrency Control Performance: A Study of Algorithms,
Distribution, and Replication
%D March 1988
%R TR 758
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Many concurrency control algorithms have been proposed for use in
distributed database systems.  Despite the large number of available
algorithms, and the fact that distributed database systems are becoming a
commercial reality, distributed concurrency control performance trade-offs
are still not well understood.  In this paper we attempt to shed light on
some of the important issues by studying the performance of four 
representative algorithms -- distributed 2PL, wound-wait, basic timestamp
ordering, and a distributed optimistic algorithm -- using a detailed 
simulation model of a distributed DBMS.  We examine the performance of these
algorithms for various levels of contention, "distributedness" of the 
workload, and data replication.  The results should prove useful to designers
of future distributed database systems.

%A Barton P. Miller
%T The Frequency of Dynamic Pointer References in "C" Programs
%D March 1988
%R TR 759
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A collection of "C" programs was measured for the number of dynamic
references to pointers.  The number of dynamic references to pointers is
represented with respect to the total number of instructions a program
executes, giving the percentage of pointer references executed in a "C" 
program.  The measurements were done on a VAX 11/780 running the Berkeley
UNIX operating system.  The measured programs were selected by examining
the most commonly run programs on the Computer Sciences Department UNIX
machines.  The measurement process was performed in two steps: (1) the
dynamic counting of pointer references, and  (2) the counting of the total
number of instructions executed by the program.
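The two-step measurement described above reduces to a simple ratio. The sketch below illustrates it in Python with an invented opcode trace; the opcode names are ours for illustration, whereas the report instrumented real VAX instruction streams.

```python
# Sketch of the two-step measurement: (1) count dynamic pointer
# references in a trace, (2) count all executed instructions, then
# report the percentage.  The opcode names and the trace are invented;
# they are not the VAX instruction set.
POINTER_OPS = {"load_indirect", "store_indirect"}

def pointer_reference_rate(trace):
    pointer_refs = sum(1 for op in trace if op in POINTER_OPS)  # step 1
    total = len(trace)                                          # step 2
    return 100.0 * pointer_refs / total

trace = ["add", "load_indirect", "cmp", "store_indirect", "branch"]
print(pointer_reference_rate(trace))  # 40.0
```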

%A Charles V. Stewart
%A Charles R. Dyer
%T Simulation of a Connectionist Stereo Algorithm on a Shared-Memory
Multiprocessor
%D March 1988
%R TR 760
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents results on the simulation of a connectionist
network for stereo matching on a shared-memory multiprocessor.  The nodes in 
the network represent possible matches between points in each image.  Because
we only consider matches between intensity edges, only a few of these nodes
actually represent candidate matches for a given pair of images; consequently,
only the active part of the network is ever constructed by the simulation.
This includes the nodes representing candidate matches and the connections
between these candidate matches as defined by a variety of constraints.  Thus, 
the simulation involves constructing a list of matches and a list of
connections, and simulating the iterations of the network.  Each of these
phases can be partitioned among a number of processors with very little
interference between the processors due to synchronization or mutual 
exclusion.  The resulting parallel implementation produced nearly linear
speed-ups up to the nine processors in our system.
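The key property exploited above is that in a synchronous iteration each node's new value depends only on the previous state, so the node list can be split across processors without locking. The sketch below illustrates this partitioning; the averaging update rule and all names are ours, not the paper's.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_iteration(state, support, workers=9):
    """One synchronous network iteration: every new activation is
    computed from the old state, so the nodes can be partitioned
    across workers with no mutual exclusion.  The update rule here
    (averaging a node with its supporters) is purely illustrative."""
    old = dict(state)                     # snapshot of the previous state

    def update(batch):
        return {u: 0.5 * old[u] + 0.5 * sum(old[v] for v in support[u])
                for u in batch}

    keys = list(state)
    blocks = [keys[i::workers] for i in range(workers)]
    new = {}
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for partial in ex.map(update, blocks):
            new.update(partial)
    return new
```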

%A Yannis E. Ioannidis
%A Miron Livny
%T Data Modeling in DE @size 14 L@AB
%D March 1988
%R TR 761
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X DE @size 14 L@AB is a simulation laboratory currently under
construction that aims to provide programmers and system analysts with
support for the construction of complex simulators and the management of
long term simulation studies.  The size of the data generated by such studies
makes a DBMS an important module of the laboratory.  Several unique data
modeling issues, which can only be partially addressed by current models, are
raised by the special nature of simulation studies.  In this paper, we
describe the salient features of MOOSE (MOdeling Objects in a Simulation
Environment), which is the model we have developed to capture the semantics
of such databases.  MOOSE supports the notion of object in the strong sense,
supports collections of objects in various flavors (sets,
multisets, and arrays), and supports sharing among objects.  Objects belong
to various classes, some of which may be defined by specialization rules.  
Structural constraints between classes can be specified to capture the
exact semantics of the relationships between classes, especially with
respect to sharing.  Finally, every MOOSE schema has a straightforward graph
representation, thus facilitating a graphics interface to the database.
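Two of the MOOSE ideas above, collections in several flavors and sharing among objects, can be sketched with ordinary object references. The class names below are ours for illustration; MOOSE defines its own schema constructs.

```python
# Minimal sketch of sharing among objects: one subobject referenced by
# two owning collections.  Class names are hypothetical, not MOOSE syntax.
class Part:
    def __init__(self, name):
        self.name = name

class Machine:
    def __init__(self, parts):
        self.parts = parts            # an array-flavored collection

cpu = Part("cpu")                     # one subobject ...
m1 = Machine([cpu, Part("disk")])
m2 = Machine([cpu])                   # ... shared by two owners
assert m1.parts[0] is m2.parts[0]     # sharing, not copying
```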

%A Jorg Peters
%T A Parallel Algorithm for Minimal Cost Network Flow Problems
%D April 1988
%R TR 762
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we explore how simplex-based algorithms can take
advantage of multi-processor capability in solving minimal cost network flow
problems.

We present an implementation of such an algorithm that stresses the importance
of parallelized pricing.  This implementation solves all NETGEN test problems
of size (5000 x 25000) in less than 50 seconds wall clock time on the 
Sequent Symmetry S-81 using 6 processors.

Additionally, we contrast this algorithm, based on specialized processes, with
algorithms that execute a uniform code on each processor.  We discuss why 
specialized processes perform better on the set of test problems.
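In a simplex-based network flow code, the pricing step scans arcs for a negative reduced cost, and that scan partitions naturally across processors. The sketch below illustrates the idea only; the arc format, the reduced-cost convention, and the thread-based partitioning are our assumptions, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def price_block(arcs, pi):
    """Most negative reduced cost in one block of arcs (or None).
    Reduced cost of arc (i, j) taken here as cost - pi[i] + pi[j]."""
    best = None
    for (i, j, cost) in arcs:
        r = cost - pi[i] + pi[j]
        if r < 0 and (best is None or r < best[1]):
            best = ((i, j, cost), r)
    return best

def parallel_pricing(arcs, pi, workers=6):
    """Split the arc list across workers, sketching parallelized
    pricing; names and data layout are illustrative."""
    blocks = [arcs[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        found = [b for b in ex.map(lambda blk: price_block(blk, pi), blocks)
                 if b is not None]
    return min(found, key=lambda c: c[1])[0] if found else None
```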

%A Victor Shoup
%T New Algorithms for Finding Irreducible Polynomials over Finite Fields
%D April 1988
%R TR 763
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Let \fIp\fR be a prime number, \fIF\fR be the finite field with 
\fIp\fR elements, and \fIn\fR be a positive integer.  We present new algorithms
for finding irreducible polynomials in \fIF\fR[\fIX\fR] of degree \fIn\fR.  We show
that in time polynomial in \fIn\fR and log \fIp\fR we can reduce the problem of
finding an irreducible polynomial over \fIF\fR of degree \fIn\fR to the problem
of factoring polynomials over \fIF\fR.  Combining this with Berlekamp's 
deterministic factoring algorithm, we obtain a deterministic algorithm for 
finding irreducible polynomials that runs in time polynomial in \fIn\fR and
\fIp\fR.  This is useful when \fIp\fR is small.  Unlike earlier results in
this area, ours does not rely on any unproven hypotheses, such as the Extended
Riemann Hypothesis.  We also present a new randomized algorithm for finding
irreducible polynomials that runs in time polynomial in \fIn\fR and log \fIp\fR
and makes particularly efficient use of randomness.  It uses \fIn\fR log \fIp\fR 
random bits, and fails with probability less than 1/@\fIp sup \(*a n@\fR where
\(*a is a constant between 0 and 1/4.  This result is interesting in a setting
where random bits are viewed as a scarce resource.
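The randomized approach above, drawing a monic polynomial at random and testing it for irreducibility, can be sketched at toy sizes. The brute-force divisor test below is only an illustration and is exponential in the degree; the paper's reduction to factoring is far more efficient. Coefficient lists run from low degree to high, over F\_p.

```python
import itertools
import random

def poly_rem(a, b, p):
    """Remainder of a modulo monic b over F_p (coefficients low-to-high)."""
    a = a[:]
    n = len(b) - 1
    while len(a) > n:
        c = a.pop()                       # leading coefficient
        if c:
            for k in range(n):
                a[len(a) - n + k] = (a[len(a) - n + k] - c * b[k]) % p
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def is_irreducible(f, p):
    """Brute force: f is irreducible iff it has no monic divisor of
    degree 1 .. deg(f)//2.  Illustration only -- exponential cost."""
    n = len(f) - 1
    for d in range(1, n // 2 + 1):
        for tail in itertools.product(range(p), repeat=d):
            g = list(tail) + [1]          # monic candidate of degree d
            if poly_rem(f, g, p) == [0]:
                return False
    return True

def random_irreducible(n, p, rng=random):
    """Draw random monic degree-n polynomials until one passes the test."""
    while True:
        f = [rng.randrange(p) for _ in range(n)] + [1]
        if is_irreducible(f, p):
            return f
```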

%A Leonard Uhr
%T Process-Structured Architectures to Transform Information Flowing
Through
%D April 1988
%R TR 764
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X It is rapidly becoming possible to build networks of many millions
of processors.  These massively parallel networks should be capable of
perceiving, remembering, and reasoning in real time, even approaching the
brain's speed and power -- IF we knew how to build them.  This paper proposes
and explores a promising but largely ignored information-flow approach.

For maximum speed and efficiency, massively parallel networks should be
structured so that processes can, brain-like, transform information flowing
through them with minimal delay.  And all should be embedded in appropriate
structures of processors.  Therefore hardware, software, and information
structures should all be designed to work together.  This paper examines the
set of issues involved in developing such systems, and describes several
designs for structures that process information flowing through.

These information-flow structures are brain-like in that processes continually
execute: integrate, differentiate, compare, decide, resolve, and compute a
general-purpose repertoire of useful functions.  The structures of processors
that carry out these processes are all built from neuron-like micro-modular
primitive structures.  They must work without any global or local controllers, 
and without any operating system (of today's traditional sort) that
synchronizes, handles communication, and avoids contention.  These issues must
be handled instead by the system's structure and the types of processes
executed.

%A Yannis E. Ioannidis
%A Raghu Ramakrishnan
%T Efficient Transitive Closure Algorithms
%D April 1988
%R TR 765
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X We present some efficient algorithms for computing the transitive
closure of a directed graph.  The algorithms are adapted to compute a
number of related queries such as the set of nodes reachable from a given
node, or queries posed over the set of paths in the transitive closure
such as the shortest path between each pair of nodes, or the longest path
from a given node.  We indicate how these algorithms could be adapted to a
significantly broader class of queries based on one-sided recursions.  We
also analyze these algorithms and compare them to algorithms in the
literature.  The results indicate that these algorithms, in addition to
their ability to deal with queries that are generalizations of transitive
closure, also perform very efficiently, particularly in the context of a
disk-based database environment.
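The basic query above, the set of nodes reachable from each node, has a simple baseline formulation by breadth-first search from every source. The sketch below shows that baseline only; the report's algorithms are considerably more refined.

```python
from collections import deque

def transitive_closure(graph):
    """Set of nodes reachable from each node, by BFS from every source.
    graph maps a node to its list of successors.  A naive baseline for
    comparison, not the algorithms of the report."""
    closure = {}
    for start in graph:
        seen, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            for v in graph.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        closure[start] = seen
    return closure
```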

%A James R. Goodman
%A Philip J. Woest
%T The Wisconsin Multicube: A New Large-Scale Cache-Coherent Multiprocessor
%D April 1988
%R TR 766
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X The \fIWisconsin Multicube\fR is a large-scale, shared-memory
multiprocessor architecture that employs a snooping cache protocol over a grid
of buses.  Each processor has a conventional (SRAM) cache optimized to
minimize memory latency and a large (DRAM) snooping cache optimized to reduce
bus traffic and to maintain consistency.  The large snooping cache should
guarantee that nearly all the traffic on the buses will be generated by I/O
and accesses to shared data.

The programmer's view of the system is that of a multi -- a set of processors 
having access to a common shared memory with no notion of geographical 
locality.  Thus writing software, including the operating system, should be a
straightforward extension of those techniques being developed for multis.

The interconnection topology allows for a cache-coherent protocol for which
most bus requests can be satisfied with no more than twice the number of bus
operations required of a single-bus multi.  The total symmetry guarantees that
there are no topology-induced bottlenecks.  The total bus bandwidth grows in
proportion to the product of the number of processors and the average path
length. 

The proposed architecture is an example of a new class of interconnection 
topologies -- the \fIMulticube\fR -- which consists of \fIN = @n sup k@\fR
processors, where each processor is connected to \fIk\fR buses and each bus is
connected to \fIn\fR processors.  The hypercube is a special case where 
\fIn=\fR2.  The Wisconsin Multicube is a two-dimensional Multicube 
\fI(k=\fR2), where \fIn\fR scales to about 32, resulting in a proposed system
of over 1,000 processors.
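The processor and bus counts in the last paragraph follow directly from the definition of the topology, as the small sketch below checks.

```python
def multicube(n, k):
    """Processor and bus counts for an n^k Multicube: each processor
    sits on k buses and each bus carries n processors."""
    processors = n ** k
    buses = k * n ** (k - 1)   # n^(k-1) buses per dimension, k dimensions
    return processors, buses

assert multicube(2, 3) == (8, 12)       # n = 2: the hypercube case
assert multicube(32, 2) == (1024, 64)   # the proposed Wisconsin Multicube
```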