leff@smu.UUCP (Laurence Leff) (05/12/88)
Bibliography of Technical Reports
Computer Science Department, University of Wisconsin
December 1987 - April 1988

For copies, send requests to:

Technical Report Librarian
Computer Science Department
University of Wisconsin
1210 W. Dayton Street
Madison, WI 53706

%A Michael J. Carey
%A David J. DeWitt
%A Scott L. Vandenberg
%T A Data Model and Query Language for EXODUS
%D December 1987
%R TR 734
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper, we present the design of the EXTRA data model and the EXCESS query language for the EXODUS extensible database system. The EXTRA data model includes support for complex objects with shared subobjects, a novel mix of object- and value-oriented semantics for data, support for persistent objects of any type in the EXTRA type lattice, and user-defined abstract data types (ADTs). The EXCESS query language provides facilities for querying and updating complex object structures, and it can be extended through the addition of ADT functions and operators, procedures and functions for manipulating EXTRA schema types, and generic set functions. EXTRA and EXCESS are intended to serve as a test vehicle for tools developed under the EXODUS extensible database system project.

%A Harry Plantinga
%A Charles R. Dyer
%T Construction and Display Algorithms for the Asp
%D December 1987
%R TR 735
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X The \fIaspect representation\fR or \fIasp\fR is a continuous, viewer-centered representation for polyhedra introduced by Plantinga and Dyer (1987b). In this paper we present in detail algorithms for working with the asp under orthographic projection. We derive equations for aspect surfaces, show how to represent asps, and present a detailed algorithm for their construction. We then show how to display an image of the represented object from any viewpoint, with hidden surfaces removed, by finding a cross-section of the asp for the given viewpoint.
We present separate algorithms for the convex and non-convex cases.

%A Harry Plantinga
%A Charles R. Dyer
%T Visibility, Occlusion and the Aspect Graph
%D December 1987
%R TR 736
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we study the ways in which the topology of the image of a polyhedron changes with changing viewpoint. We catalog the ways that the "topological appearance" or \fIaspect\fR can change. This enables us to find maximal regions of viewpoints of the same aspect. We use these techniques to construct the \fIviewpoint space partition (VSP)\fR, a partition of viewpoint space into maximal regions of constant aspect, and its dual, the \fIaspect graph\fR. We present tight bounds on the maximum size of the VSP and the aspect graph and give algorithms for their construction, first in the convex case and then in the general case. In particular, we give tight bounds of \(*H(@n sup 2@) and \(*H(@n sup 6@) under orthographic projection with a 2-D viewpoint space, and of \(*H(@n sup 3@) and \(*H(@n sup 9@) under perspective projection with a 3-D viewpoint space. We also present properties of the VSP and the aspect graph.

%A Miron Livny
%A Udi Manber
%T \(*m - A System for Simulating and Implementing Distributed and Parallel Algorithms
%D December 1987
%R TR 737
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A system for the simulation of distributed algorithms on a collection of workstations connected by a local area network is presented. The system, called \(*m, is intended for programmers who want to parallelize a program (or parts of it) in order to tap the CPU and memory space of idle workstations. Writing such parallel programs is difficult. \(*m simplifies this task significantly in many ways.
First, \(*m uses only a small set of communication primitives on top of a regular programming language and makes the details of their implementation transparent to the user. \(*m translates the program into a discrete event system and interfaces it with a simulated processing environment. The user need not be familiar with the simulation mechanism at all. Second, the environment is under the complete control of the programmer. Tradeoffs such as communication speed vs. CPU speed can be easily tested, and decisions depending on them can be made more easily. Third, debugging is much simpler. The asynchronous and nondeterministic nature of the computation is preserved even though the simulation is executed on one machine. Fourth, \(*m is also the basis of a package for distributed programming. With it, \(*m programs are automatically stripped of their simulation parts and can be executed on several workstations, in most cases with no modifications by the user. (This package is not completely implemented yet; it is expected to be ready very soon.) Several examples of application programs are presented.

%A Deborah A. Joseph
%A W. Harry Plantinga
%T Efficient Algorithms for Polyhedron Collision Detection
%D December 1987
%R TR 738
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we present efficient algorithms for determining whether polyhedra undergoing linear translations will collide. We present algorithms for finding collisions between a pair of moving convex polyhedra and among several moving non-convex polyhedra. With the non-convex algorithm we also show how to determine \fIwhen\fR the objects will collide. The algorithms work by considering time to be a fourth dimension and solving the related 4-D polytope separation problem. In the convex case we present an algorithm that runs in O(n) time, where n is the sum of the sizes of the objects.
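The space-time idea used in TR 738, treating time as a fourth dimension so that a linearly translating object sweeps out a prism, can be seen in its simplest instance: two translating spheres collide exactly when a quadratic in t has a root in the motion interval. The sketch below is an illustrative reduction of that idea only, not the report's polytope-separation algorithm.

```python
import math

def spheres_collide(p1, v1, r1, p2, v2, r2, t_max):
    """Earliest t in [0, t_max] at which two linearly translating
    spheres touch, or None. Solves |(p1-p2) + (v1-v2)t| = r1+r2,
    a quadratic in t (the 1-D slice of the space-time problem)."""
    dp = [a - b for a, b in zip(p1, p2)]   # relative position
    dv = [a - b for a, b in zip(v1, v2)]   # relative velocity
    r = r1 + r2
    a = sum(x * x for x in dv)
    b = 2 * sum(p * v for p, v in zip(dp, dv))
    c = sum(x * x for x in dp) - r * r
    if c <= 0:
        return 0.0                          # already overlapping
    if a == 0:
        return None                         # no relative motion
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                         # paths never meet
    t = (-b - math.sqrt(disc)) / (2 * a)    # earlier root
    return t if 0 <= t <= t_max else None
```

For example, unit spheres starting 10 apart and approaching at combined speed 2 touch at t = 4.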
The algorithm is a generalization of an O(n)-time polyhedron separation algorithm of Dobkin and Kirkpatrick to 4-D convex prisms. In the non-convex case we present an algorithm that runs in O(@n sup 2@) time; it is a generalization of the naive polyhedron separation algorithm. We generalize the naive algorithm to finding the separation of \fId\fR-polytopes in @E sup d@ for any \fId\fR, although we use only the 4-D case for collision detection. Both algorithms use O(n) space. In the process of finding the separation in the non-convex case we must determine whether a point lies in a polytope in @E sup d@, and we present an algorithm for doing that as well.

%A R.H. Clark
%A R.R. Meyer
%T Multiprocessor Algorithms for Generalized Network Flows
%D December 1987
%R TR 739
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents parallel algorithms for the solution of generalized network optimization problems on a shared-memory multiprocessor. These algorithms exploit the quasi-tree forest basis structure of generalized networks by attempting to perform multiple simplex pivot operations in parallel on disconnected subtrees. We consider algorithms for both single-period and multi-period generalized networks. In the latter case, the multi-period structure is used in the initial stage of the algorithms to partition the problem among processors. Computational experience on the Sequent Balance 21000 multiprocessor is presented that demonstrates linear, and sometimes superlinear, speedup for a large class of test problems.

%A M. Muralikrishna
%A David J. DeWitt
%T Optimization of Multiple-Relation Multiple-Disjunct Queries
%D January 1988
%R TR 740
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we discuss the optimization of multiple-relation multiple-disjunct queries in a relational database system.
Since optimization techniques for conjunctive (single-disjunct) queries in relational databases are well known (Smith75, Wong76, Selinger79, Yao79, Youssefi79), the natural way to evaluate a multiple-disjunct query has been to execute each disjunct independently (Bernstein81, Kerschberg82). However, evaluating each disjunct independently may be very inefficient. In this paper, we develop methods that merge two or more disjuncts to form a term. The advantage of merging disjuncts into terms lies in the fact that each term can be evaluated with a single scan of each relation present in the term. In addition, the number of times a join is performed is also reduced when two or more disjuncts are merged. The criteria for merging a set of disjuncts are presented. As we will see, the number of times each relation in the query is scanned equals the number of terms; thus, minimizing the number of terms minimizes the number of scans of each relation. We formulate the problem of minimizing the number of scans as one of covering a merge graph by a minimum number of complete merge graphs, which are a restricted class of cartesian product graphs. In general, the problem of minimizing the number of scans is NP-complete. We present polynomial-time algorithms for special classes of merge graphs and a heuristic for general merge graphs. Throughout this paper, we assume that no relation has an index on it and that we are concerned only with reducing the number of scans of each relation present in the query. What about relations that do have indices? It turns out that our performance metric of reducing the number of scans is beneficial even when indices exist. In (Muralikrishna88) we demonstrate that when optimizing single-relation multiple-disjunct queries, the cost (measured in disk accesses) may be reduced if all the disjuncts are optimized together rather than individually.
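The payoff from merging disjuncts described in TR 740 is easy to see in miniature: a merged term can be checked with one pass over a relation, whereas evaluating each disjunct independently rescans it. A toy sketch (the relation and predicate names are invented for illustration; this is not the report's algorithm):

```python
def scan(relation, predicates):
    """One scan of `relation` evaluating a merged term: a tuple
    qualifies if it satisfies ANY of the disjunct predicates."""
    return [t for t in relation if any(p(t) for p in predicates)]

# Two disjuncts over the same relation:
emp = [{"name": "a", "dept": 1, "sal": 90},
       {"name": "b", "dept": 2, "sal": 40},
       {"name": "c", "dept": 1, "sal": 30}]
in_dept_1 = lambda t: t["dept"] == 1     # disjunct 1
well_paid = lambda t: t["sal"] > 50      # disjunct 2

# Merged into one term: a single scan instead of two.
result = scan(emp, [in_dept_1, well_paid])
```

Evaluating the two disjuncts separately would read `emp` twice and then need duplicate elimination; the merged term reads it once.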
Thus, our algorithm for minimizing the number of terms is also very beneficial when indices exist.

%A Judy Goldsmith
%A Deborah Joseph
%A Paul Young
%T A Note on Bi-Immunity and P-Closeness of P-Cheatable Sets in P/Poly
%D January 1988
%R TR 741
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we study the interplay between three measures of polynomial time behavior in sets: \fIp\fR-cheatability, \fIp\fR-closeness, and membership in \fIP/poly\fR. First we construct (@2 sup k@-1 for \fIk\fR) \fIp\fR-cheatable sets that are bi-immune with respect to all recursively enumerable sets. We show that the constructed sets are in \fIP/poly\fR but can be constructed so that they are \fInot p\fR-close; in fact, they can be constructed so that they are not even \fIrecursively\fR close. Next, we construct (\fIn\fR for 2) \fIp\fR-cheatable sets that are bi-immune with respect to arbitrarily difficult deterministic time classes. These sets are also in \fIP/poly\fR, and they too can be constructed so that they are \fInot p\fR-close. Finally, we construct a set that is (\fIn\fR for 1) \fIp\fR-cheatable but \fInot p\fR-close, although it too is in \fIP/poly\fR. These results show that, although \fIp\fR-cheatable, \fIP/poly\fR, and \fIp\fR-close sets all exhibit some form of polynomial time behavior, the notions of \fIp\fR-cheatability and \fIp\fR-closeness are often orthogonal. In addition, the results on \fIp\fR-closeness answer conjectures made in (A&G-87).

%A David J. DeWitt
%A Shahram Ghandeharizadeh
%A Donovan Schneider
%T A Performance Analysis of the Gamma Database Machine
%D January 1988
%R TR 742
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents the results of an initial performance evaluation of the Gamma database machine based on an expanded version of the single-user Wisconsin benchmark.
In our experiments we measured the effect of relation size and of indices on the response time of selection, join, and aggregation queries, and of single-tuple updates. A Teradata DBC/1012 database machine of similar size was used as a basis for interpreting the results obtained. We also analyze the performance of Gamma relative to the number of processors employed, and study the impact of varying the memory size and disk page size on the execution time of a variety of selection and join queries. We analyze and interpret the results of these experiments based on our understanding of the system hardware and software, and conclude with an assessment of the strengths and weaknesses of the two machines.

%A Mary K. Vernon
%A Udi Manber
%T Distributed Round-Robin and First-Come First-Serve Protocols
%D February 1988
%R TR 745
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Two new distributed protocols for fair and efficient bus arbitration are presented. The protocols implement round-robin (RR) and first-come first-serve (FCFS) scheduling, respectively. Both protocols use relatively few control lines on the bus, and their logic is simple. The round-robin protocol, which uses \fIstatically assigned\fR arbitration numbers to resolve conflicts during an arbitration, is more robust and simpler to implement than previous distributed RR protocols based on rotating agent priorities. The proposed FCFS protocol uses partly static arbitration numbers and is the first practical proposal for a FCFS arbiter known to the authors. The proposed protocols thus offer a better combination of efficiency, cost, and fairness than existing multiprocessor bus arbitration algorithms. Three implementations of our RR protocol and two implementations of our FCFS protocol are discussed.
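One textbook way to get round-robin behavior from statically assigned arbitration numbers, roughly the flavor of the TR 745 RR protocol though not its bus-level design, is to grant the requester whose fixed number follows the previous winner in cyclic order:

```python
def rr_grant(requesters, n_agents, last_winner):
    """Round-robin arbitration: among requesting agents (each with
    a statically assigned number 0..n_agents-1), grant the agent
    closest after `last_winner` in cyclic order."""
    if not requesters:
        return None
    # Cyclic distance from the slot just after the last winner.
    return min(requesters,
               key=lambda a: (a - last_winner - 1) % n_agents)
```

Because the cyclic order rotates with each grant, no requesting agent waits more than n_agents - 1 grants, which is the fairness property RR arbiters aim for.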
Simulation results are presented that address: (1) the practical potential for unfairness in the simpler implementation of the FCFS protocol, (2) the practical implications of the higher waiting-time variance in the RR protocol, and (3) the allocation of bus bandwidth among agents with unequal request rates in each protocol. The simulation results indicate that there is very little practical difference in the performance of the two protocols.

%A M.K. Vernon
%A E.D. Lazowska
%A J. Zahorjan
%T An Accurate and Efficient Performance Analysis Technique for Multiprocessor Snooping Cache-Consistency Protocols
%D February 1988
%R TR 746
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A number of dynamic cache consistency protocols have been developed for multiprocessors having a shared bus interconnect between processors and shared memory. The relative performance of these protocols has been studied extensively using simulation and detailed analytical models based on Markov chain techniques. Both of these approaches use relatively detailed models, which capture cache and bus interference rather precisely but which are highly expensive to evaluate. In this paper, we investigate the use of a more abstract and significantly more efficient analytical model for evaluating the relative performance of cache consistency protocols. The model includes bus interference, cache interference, and main memory interference, but represents the interactions between the caches by steady-state mean collision rates that are computed by iterative solution of the model equations. We show that the speedup estimates obtained from the mean-value model are highly accurate: they agree with the speedup estimates of the detailed analytical models to within 3%, over all modifications studied and over a wide range of parameter values. This result is surprising, given that the distinctions among the protocols are quite subtle.
The validation experiments include sets of reasonable values of the workload parameters, as well as sets of unrealistic values for which one might expect the mean-value equations to break down. The conclusion we draw is that this modeling technique shows promise for evaluating architectural tradeoffs at a much more detailed level than was previously thought possible. We also discuss the relationship between the results of the analytical models and the results of independent evaluations of the protocols using simulation.

%A Rajeev Jog
%A Gurindar S. Sohi
%A Mary K. Vernon
%T The TREEBus Architecture and Its Analysis
%D February 1988
%R TR 747
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper investigates a cache coherence protocol for a hierarchical, general-purpose multiprocessor that we refer to as the TREEBus architecture. We apply the concept of non-identical dual tag directories to the specification of the protocol. We then develop a sophisticated mean-value queueing model that considers bus latency, shared memory latency, and bus interference as the primary sources of performance degradation in the system. A unique feature of the model is its high-level input parameters, which can be related to application program characteristics. Another attractive property of the model is that it can be solved for very large systems in a matter of seconds. Using the model we reach several important conclusions. First, the system topology has an important effect on overall performance. We find optimal two-level, three-level and four-level topologies that distribute the load evenly across all bus levels, resulting in much higher performance. For a given system size, the optimal topology is the one with the minimum number of levels that has sufficient total bandwidth in the bus subsystem to support the memory request rates of the processors.
We also provide processing power estimates for these topologies under a given set of workload assumptions. For example, assuming that 22% of data references are to globally shared data and that shared data is read, on average, by a significant fraction of the processors between write operations, the optimal three-level TREEBus topology supporting 512 processors, each with a peak processing rate of 4 MIPS, provides an effective 1400 (1700) MIPS of processing power if the buses operate at 20 (40) MHz. Results of our study also indicate that for reasonably low cache miss rates (3% at level 0) and 20 MHz buses, the bus subnetwork saturates at processor speeds of 6-8 MIPS, at least for topologies of five or fewer levels. Finally, we present experimental results that indicate how performance is affected by the locality and pattern of sharing, and by the miss ratios due to cache size limitations.

%A Scott T. Leutenegger
%A Mary K. Vernon
%T A Mean-Value Performance Analysis of a New Multiprocessor Architecture
%D February 1988
%R TR 748
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents a preliminary performance analysis of a new large-scale multiprocessor: the \fIWisconsin Multicube\fR. A key characteristic of the machine is that it is based on shared buses and a snooping cache coherence protocol. The organization of the shared buses and shared memory is unique and non-hierarchical. The two-dimensional version of the architecture is envisioned as scaling to 1024 processors. We develop an approximate mean-value analysis of bus interference for the proposed cache coherence protocol. The model includes FCFS scheduling at the bus queues with deterministic bus access times, as well as asynchronous memory write-backs and invalidation requests. We use our model to investigate the feasibility of the multiprocessor and to study some initial system design issues.
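In their simplest exact form, mean-value bus models of the kind used in TR 746 and TR 748 reduce to the classical MVA recursion for N processors sharing one bus; the reports' approximate models layer protocol detail on top of this skeleton. A textbook sketch under illustrative parameter values (not the authors' model):

```python
def bus_mva(n_procs, service, think):
    """Exact mean-value analysis of N processors sharing one bus.
    Each processor computes for `think` cycles, then holds the bus
    for `service` cycles. Returns (throughput, bus utilization).
    MVA recursion: R_n = S*(1 + Q_{n-1}); X_n = n / (R_n + Z)."""
    q = 0.0                      # mean queue length at the bus
    x = 0.0
    for n in range(1, n_procs + 1):
        r = service * (1.0 + q)  # bus response time seen by an arrival
        x = n / (r + think)      # system throughput (requests/cycle)
        q = x * r                # Little's law applied at the bus
    return x, x * service
```

With one processor there is no contention, so utilization is simply service/(service + think); as n grows, throughput saturates at 1/service and utilization approaches 1 from below.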
Our results indicate that a 1024-processor system can operate at 75-95% of its peak processing power if the mean time between cache misses is larger than 1000 bus cycles (i.e., 50 microseconds for 20 MHz buses; 25 microseconds for 40 MHz buses). This miss rate is not unreasonable for the cache sizes specified in the design, which are comparable to main memory sizes in existing multiprocessors. We also present results that address the issues of optimal cache block size, the optimal size of the two-dimensional Multicube, the effect of broadcast invalidations on system performance, and the viability of several hardware techniques for reducing the latency of remote memory requests.

%A Judy Goldsmith
%A Deborah Joseph
%A Paul Young
%T Using Self-Reducibilities to Characterize Polynomial Time
%D February 1988
%R TR 749
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper we study the effect that the \fIself-reducibility\fR properties of a set have on its time complexity. In particular, we show that the extent to which a set is self-reducible can determine whether or not the set is polynomially decidable. Our results concern three variations on the notion of Turing self-reducibility: \fInear-testability\fR, \fIword-decreasing-query self-reducibility\fR, and \fIp-cheatability\fR. Our first pair of results provides characterizations of polynomial time by coupling the notion, first of Turing self-reducibility, then of near-testability, with the notion of p-cheatability. In contrast, our final theorems show that if specific nondeterministic and deterministic time classes do not collapse, then (1) we cannot similarly characterize polynomial time by coupling word-decreasing-query self-reducibility with a variant of p-cheatability, and (2) we cannot characterize polynomial time by coupling \fIparity-P\fR and p-cheatability.

%A M. Muralikrishna
%T Optimization of Multiple-Disjunct Queries in a Relational Database System
%D February 1988
%R TR 750
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this thesis, we describe the optimization of arbitrarily complex queries expressed in relational calculus. The qualification list is allowed to be any complex boolean expression involving both ANDs and ORs; in other words, it may have an arbitrary number of disjuncts, and the query graph of each disjunct may have any number of components. Optimizing the various disjuncts independently of each other can be very inefficient; considerable savings in cost can be achieved by optimizing them together. In a multiple-relation multiple-disjunct query, it may be possible to combine two or more disjuncts into one term. This cuts down the number of scans of each relation as well as the number of times each join is performed. The objective is to merge the disjuncts into the minimum number of terms. Minimizing the number of terms can be formulated as the problem of covering a merge graph with the minimum number of complete merge graphs, which are a restricted class of cartesian product graphs; this problem is NP-complete. We present polynomial-time algorithms for special classes of merge graphs and a heuristic for general merge graphs. For single-relation multiple-disjunct queries involving more than one attribute, an optimal access path might consist of more than one index. The cost in our optimization model for single-relation queries is measured in terms of the number of pages fetched from disk. We formulate the problem of finding a set of optimal access paths for a single-relation multiple-disjunct query as one of finding a minimum-weight vertex cover in a hypergraph. Finding the cheapest vertex cover in a hypergraph is NP-complete.
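For the weighted hypergraph vertex cover problem just mentioned, a standard greedy baseline, repeatedly taking the vertex with the best weight-per-uncovered-edge ratio, gives a feel for the problem. This is a generic heuristic for illustration, not the thesis's new approximation algorithm:

```python
def greedy_cover(edges, weight):
    """Greedy weighted vertex cover of a hypergraph.
    `edges` is a list of vertex sets; `weight` maps vertex -> cost.
    Repeatedly pick the vertex with the lowest cost per remaining
    edge covered, until every hyperedge contains a chosen vertex."""
    cover, remaining = set(), [set(e) for e in edges]
    while remaining:
        verts = set().union(*remaining)
        best = min(verts, key=lambda v: weight[v] /
                   sum(1 for e in remaining if v in e))
        cover.add(best)
        remaining = [e for e in remaining if best not in e]
    return cover

# Three hyperedges over four unit-weight vertices; two suffice.
edges = [{1, 2}, {2, 3}, {3, 4}]
cover = greedy_cover(edges, {1: 1, 2: 1, 3: 1, 4: 1})
```

In the access-path setting of the thesis, vertices would correspond to indices, edge weights to page-fetch costs, and hyperedges to the disjuncts that must be covered.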
We present a new approximation algorithm that gives near-optimal vertex covers for random hypergraphs over a wide range of edge probabilities. We also demonstrate the usefulness of equi-depth multi-dimensional histograms in optimizing queries using multi-dimensional indices.

%A Kifung C. Cheung
%A Gurindar S. Sohi
%A Kewal K. Saluja
%A Dhiraj K. Pradhan
%T Design and Analysis of a Gracefully-Degrading Interleaved Memory System
%D February 1988
%R TR 751
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A hardware mechanism is proposed to reconfigure an interleaved memory system. The reconfiguration scheme is such that, at any instant, all fault-free memory banks in the memory system can be utilized in an interleaved manner. The design of the hardware that enables the reconfiguration is discussed. The reconfiguration scheme proposed in this paper is analyzed for a number of distinct benchmark programs. It is shown that memory system performance degrades gracefully as the number of faulty banks increases if the memory system uses the proposed reconfiguration scheme.

%A A.R. Pleszkun
%A G.S. Sohi
%T The Performance Potential of Multiple Functional Unit Processors
%D February 1988
%R TR 752
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this paper, we look at the interaction of pipelining and multiple functional units in single-processor machines. As a basis for our studies we use a CRAY-like processor model, with the issue rate (instructions per clock cycle) as the performance measure. We initially find that in non-vector machines, pipelining multiple functional units does not provide significant performance improvements. Dataflow limits are then derived for our benchmark programs to determine the performance potential of each benchmark. In addition to a traditional dataflow limit computation, several other limits are computed that apply more realistic constraints to a computation.
Based on these more realistic limits, we determine that it is worthwhile to investigate the performance improvements that can be achieved by issuing multiple instructions each clock cycle. Several approaches for issuing multiple instructions each clock cycle are proposed and evaluated.

%A G.S. Sohi
%T Logical Data Skewing Schemes for Interleaved Memories in Vector Processors
%D February 1988
%R TR 753
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Sustained memory bandwidth across a range of strides is a key to high-performance vector processing. To reduce the number of memory bank conflicts and thereby increase memory bandwidth, one often resorts to data skewing schemes. In this paper, we present a family of data skewing schemes called logical skewing schemes, in which the location of a data element is determined solely by logical manipulation of the address bits. Our experiments show that logical skewing schemes allow better vector memory system performance than other known data skewing schemes, without the significant increase in memory latency traditionally associated with data skewing.

%A Barton P. Miller
%A Jong-Deok Choi
%T A Mechanism for Efficient Debugging of Parallel Programs
%D February 1988
%R TR 754
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared-memory multiprocessors (SMMPs). We describe the use of \fIflowback analysis\fR to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. We introduce a mechanism called \fIincremental tracing\fR that, by using semantic analyses of the debugged program, makes flowback analysis practical with only a small amount of trace generated during execution.
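A minimal instance of a logical skewing scheme in the spirit of TR 753 above computes the bank number purely by XORing address bit fields; the particular field choice below is an assumption for illustration, not the report's scheme.

```python
def bank_simple(addr, log_banks=3):
    """Conventional low-order interleaving: bank = low address bits."""
    return addr & ((1 << log_banks) - 1)

def bank_skewed(addr, log_banks=3):
    """Logical skewing: XOR the bank-index field with the next-higher
    address field, so power-of-two strides spread across banks."""
    mask = (1 << log_banks) - 1
    return (addr ^ (addr >> log_banks)) & mask

# Stride-8 accesses with 8 banks: simple interleaving hits a single
# bank on every access; the skewed mapping touches all eight.
addrs = [8 * i for i in range(8)]
```

Because the mapping is a few XOR gates on address bits rather than an integer division by a skewing modulus, it adds essentially no latency to the access path, which is the point of the "logical" family.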
We extend flowback analysis to apply to parallel programs and describe a method for detecting race conditions in the interactions of cooperating processes.

%A Susan Horwitz
%A Thomas Reps
%A David Binkley
%T Interprocedural Slicing Using Dependence Graphs
%D March 1988
%R TR 756
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A slice of a program with respect to a program point \fIp\fR and variable \fIx\fR consists of all statements of the program that might affect the value of \fIx\fR at point \fIp\fR. This paper concerns the problem of interprocedural slicing -- generating a slice of an entire program, where the slice crosses the boundaries of procedure calls. To solve this problem, we introduce a new kind of graph to represent programs, called a \fIsystem dependence graph\fR, which extends previous dependence representations to incorporate collections of procedures (with procedure calls) rather than just monolithic programs. Our main result is an algorithm for interprocedural slicing that uses the new representation. The chief difficulty in interprocedural slicing is correctly accounting for the calling context of a called procedure. To handle this problem, system dependence graphs include some data-dependence edges that represent \fItransitive\fR dependencies due to the effects of procedure calls, in addition to the conventional direct-dependence edges. These edges are constructed with the aid of an auxiliary structure that represents calling and parameter-linkage relationships; this structure takes the form of an attribute grammar. The step of computing the required transitive-dependence edges is then reduced to the construction of the subordinate characteristic graphs for the grammar's nonterminals.
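At its intraprocedural core, slicing over a dependence graph as in TR 756 is reverse reachability from the slice criterion; the interprocedural machinery of the report exists to make that traversal respect calling context. A toy intraprocedural sketch (the example program and its dependence edges are invented):

```python
from collections import deque

# Dependence graph of a tiny program; deps[v] lists the statements
# that statement v depends on (data or control):
#   1: x = 0          2: y = 5         3: while x < 3:
#   4:     x = x + 1  5: z = y * 2     6: print(x)
deps = {3: [1, 4], 4: [1, 3], 5: [2], 6: [1, 3, 4]}

def slice_of(criterion):
    """Backward slice: every statement reachable from the criterion
    by following dependence edges in reverse."""
    seen, work = {criterion}, deque([criterion])
    while work:
        for u in deps.get(work.popleft(), []):
            if u not in seen:
                seen.add(u)
                work.append(u)
    return sorted(seen)
```

Slicing on the print at statement 6 pulls in the loop and both definitions of x but excludes the unrelated computations of y and z.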
It should be noted that our work concerns a somewhat restricted kind of slice: rather than permitting a program to be sliced with respect to program point \fIp\fR and an \fIarbitrary\fR variable, a slice must be taken with respect to a variable that is \fIdefined\fR at or \fIused\fR at \fIp\fR.

%A Eric Bach
%A Victor Shoup
%T Factoring Polynomials Using Fewer Random Bits
%D March 1988
%R TR 757
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Let \fIF\fR be a field of \fIq\fR = @p sup n@ elements, where \fIp\fR is prime. We present two new probabilistic algorithms for factoring polynomials in \fIF\fR[\fIX\fR] that make particularly efficient use of random bits. They are easy to implement, and they require no randomness beyond an initial seed whose length is proportional to the input size. The first algorithm is based on a procedure of Berlekamp; on input \fIf\fR in \fIF\fR[\fIX\fR] of degree \fId\fR, it uses \fId\fR @log sub 2@ \fIp\fR random bits and produces in polynomial time a complete factorization of \fIf\fR with failure probability at most 1/@p sup {(1-\(*e)d/2}@. (Here \(*e denotes a fixed parameter between 0 and 1 that can be chosen by the implementor.) The second algorithm is based on a method of Cantor and Zassenhaus; it uses \fId\fR @log sub 2@ \fIq\fR random bits and fails to find a complete factorization with probability at most 1/@q sup {(1-\(*e)d/4}@. For both algorithms, the failure probability is exponentially small in the number of random bits used.

%A Michael J. Carey
%A Miron Livny
%T Distributed Concurrency Control Performance: A Study of Algorithms, Distribution, and Replication
%D March 1988
%R TR 758
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Many concurrency control algorithms have been proposed for use in distributed database systems.
Despite the large number of available algorithms, and the fact that distributed database systems are becoming a commercial reality, distributed concurrency control performance trade-offs are still not well understood. In this paper we attempt to shed light on some of the important issues by studying the performance of four representative algorithms -- distributed 2PL, wound-wait, basic timestamp ordering, and a distributed optimistic algorithm -- using a detailed simulation model of a distributed DBMS. We examine the performance of these algorithms for various levels of contention, "distributedness" of the workload, and data replication. The results should prove useful to designers of future distributed database systems.

%A Barton P. Miller
%T The Frequency of Dynamic Pointer References in "C" Programs
%D March 1988
%R TR 759
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X A collection of "C" programs was measured for the number of dynamic references to pointers. The number of dynamic pointer references is reported relative to the total number of instructions a program executes, giving the percentage of executed instructions that reference pointers. The measurements were done on a VAX 11/780 running the Berkeley UNIX operating system. The measured programs were selected by examining the most commonly run programs on the Computer Sciences Department UNIX machines. The measurement process was performed in two steps: (1) the dynamic counting of pointer references, and (2) the counting of the total number of instructions executed by the program.

%A Charles V. Stewart
%A Charles R. Dyer
%T Simulation of a Connectionist Stereo Algorithm on a Shared-Memory Multiprocessor
%D March 1988
%R TR 760
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents results on the simulation of a connectionist network for stereo matching on a shared-memory multiprocessor.
The nodes in the network represent possible matches between points in each image. Because we only consider matches between intensity edges, only a few of these nodes actually represent candidate matches for a given pair of images; consequently, only the active part of the network is ever constructed by the simulation. This includes the nodes representing candidate matches and the connections between these candidate matches as defined by a variety of constraints. Thus, the simulation involves constructing a list of matches and a list of connections, and simulating the iterations of the network. Each of these phases can be partitioned among a number of processors with very little interference between the processors due to synchronization or mutual exclusion. The resulting parallel implementation produced nearly linear speed-ups for up to the nine processors in our system. %A Yannis E. Ioannidis %A Miron Livny %T Data Modeling in DE @size 14 L@AB %D March 1988 %R TR 761 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X DE @size 14 L@AB is a simulation laboratory, currently under construction, that aims to provide programmers and system analysts with support for the construction of complex simulators and the management of long-term simulation studies. The size of the data generated by such studies makes a DBMS an important module of the laboratory. Several unique data modeling issues, which can only be partially addressed by current models, are raised by the special nature of simulation studies. In this paper, we describe the salient features of MOOSE (MOdeling Objects in a Simulation Environment), the model we have developed to capture the semantics of such databases. MOOSE supports the notion of object in the strong sense, supports collections of objects in various flavors (sets, multisets, and arrays), and supports sharing among objects. Objects belong to various classes, some of which may be defined by specialization rules.
Structural constraints between classes can be specified to capture the exact semantics of the relationships between classes, especially with respect to sharing. Finally, every MOOSE schema has a straightforward graph representation, thus facilitating a graphics interface to the database. %A Jorg Peters %T A Parallel Algorithm for Minimal Cost Network Flow Problems %D April 1988 %R TR 762 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X In this paper we explore how simplex-based algorithms can take advantage of multi-processor capability in solving minimal cost network flow problems. We present an implementation of such an algorithm that stresses the importance of parallelized pricing. This implementation solves all NETGEN test problems of size (5000 x 25000) in less than 50 seconds wall clock time on the Sequent Symmetry S-81 using 6 processors. Additionally, we contrast this algorithm, based on specialized processes, with algorithms that execute a uniform code on each processor. We discuss why specialized processes perform better on the set of test problems. %A Victor Shoup %T New Algorithms for Finding Irreducible Polynomials over Finite Fields %D April 1988 %R TR 763 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X Let \fIp\fR be a prime number, \fIF\fR be the finite field with \fIp\fR elements, and \fIn\fR be a positive integer. We present new algorithms for finding irreducible polynomials in \fIF(X)\fR of degree \fIn\fR. We show that in time polynomial in \fIn\fR and log \fIp\fR we can reduce the problem of finding an irreducible polynomial over \fIF\fR of degree \fIn\fR to the problem of factoring polynomials over \fIF\fR. Combining this with Berlekamp's deterministic factoring algorithm, we obtain a deterministic algorithm for finding irreducible polynomials that runs in time polynomial in \fIn\fR and \fIp\fR. This is useful when \fIp\fR is small. 
Unlike earlier results in this area, ours does not rely on any unproven hypotheses, such as the Extended Riemann Hypothesis. We also present a new randomized algorithm for finding irreducible polynomials that runs in time polynomial in \fIn\fR and log \fIp\fR and makes particularly efficient use of randomness. It uses \fIn\fR log \fIp\fR random bits, and fails with probability less than 1/@\fIp sup \(*a n@\fR where \(*a is a constant between 0 and 1/4. This result is interesting in a setting where random bits are viewed as a scarce resource. %A Leonard Uhr %T Process-Structured Architectures to Transform Information Flowing Through %D April 1988 %R TR 764 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X It is rapidly becoming possible to build networks of many millions of processors. These massively parallel networks should be capable of perceiving, remembering, and reasoning in real time, even approaching the brain's speed and power -- IF we knew how to build them. This paper proposes and explores a promising but largely ignored information-flow approach. For maximum speed and efficiency, massively parallel networks should be structured so that processes can, brain-like, transform information flowing through them with minimal delay. And all should be embedded in appropriate structures of processors. Therefore hardware, software, and information structures should all be designed to work together. This paper examines the set of issues involved in developing such systems, and describes several designs for structures that process information flowing through. These information-flow structures are brain-like in that processes continually execute: integrate, differentiate, compare, decide, resolve, and compute a general-purpose repertoire of useful functions. The structures of processors that carry out these processes are all built from neuron-like micro-modular primitive structures.
They must work without any global or local controllers, and without any operating system (of today's traditional sort) that synchronizes, handles communication, and avoids contention. These issues must instead be handled by the system's structure and the types of processes executed. %A Yannis E. Ioannidis %A Raghu Ramakrishnan %T Efficient Transitive Closure Algorithms %D April 1988 %R TR 765 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X We present some efficient algorithms for computing the transitive closure of a directed graph. The algorithms are adapted to compute a number of related queries, such as the set of nodes reachable from a given node, or queries posed over the set of paths in the transitive closure, such as the shortest path between each pair of nodes or the longest path from a given node. We indicate how these algorithms could be adapted to a significantly broader class of queries based on one-sided recursions. We also analyze these algorithms and compare them to algorithms in the literature. The results indicate that these algorithms, in addition to their ability to handle queries that are generalizations of transitive closure, also perform very efficiently, particularly in the context of a disk-based database environment. %A James R. Goodman %A Philip J. Woest %T The Wisconsin Multicube: A New Large-Scale Cache-Coherent Multiprocessor %D April 1988 %R TR 766 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X The \fIWisconsin Multicube\fR is a large-scale, shared-memory multiprocessor architecture that employs a snooping cache protocol over a grid of buses. Each processor has a conventional (SRAM) cache optimized to minimize memory latency and a large (DRAM) snooping cache optimized to reduce bus traffic and to maintain consistency. The large snooping cache should guarantee that nearly all the traffic on the buses will be generated by I/O and accesses to shared data.
The programmer's view of the system is like a multi -- a set of processors having access to a common shared memory with no notion of geographical locality. Thus writing software, including the operating system, should be a straightforward extension of those techniques being developed for multis. The interconnection topology allows for a cache-coherent protocol for which most bus requests can be satisfied with no more than twice the number of bus operations required of a single-bus multi. The total symmetry guarantees that there are no topology-induced bottlenecks. The total bus bandwidth grows in proportion to the product of the number of processors and the average path length. The proposed architecture is an example of a new class of interconnection topologies -- the \fIMulticube\fR -- which consists of \fIN = @n sup k@\fR processors, where each processor is connected to \fIk\fR buses and each bus is connected to \fIn\fR processors. The hypercube is a special case where \fIn=\fR2. The Wisconsin Multicube is a two-dimensional Multicube \fI(k=\fR2), where \fIn\fR scales to about 32, resulting in a proposed system of over 1,000 processors.
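[Editorial note: the Multicube connectivity described in TR 766 -- \fIN = @n sup k@\fR processors, each processor on \fIk\fR buses, each bus serving \fIn\fR processors -- can be sketched directly. The sketch below is illustrative only; the function name and processor labeling are not from the report, which does not specify an implementation.]

```python
from collections import Counter
from itertools import product

def multicube_buses(n, k):
    """Enumerate the buses of an (n, k) Multicube.

    Processors are labeled by k-digit base-n tuples (N = n**k of them).
    For each dimension d there is one bus per choice of the other k-1
    digits; that bus connects the n processors differing only in digit d.
    Returns a dict mapping bus id -> list of processor labels.
    """
    buses = {}
    for d in range(k):
        for rest in product(range(n), repeat=k - 1):
            members = []
            for v in range(n):
                # Insert digit v at position d to recover the full label.
                members.append(rest[:d] + (v,) + rest[d:])
            buses[(d, rest)] = members
    return buses

# The Wisconsin Multicube is the k = 2 case, with n scaling to about 32.
buses = multicube_buses(n=4, k=2)
assert all(len(m) == 4 for m in buses.values())        # n processors per bus
on_buses = Counter(p for m in buses.values() for p in m)
assert all(c == 2 for c in on_buses.values())          # k buses per processor
assert len(on_buses) == 4 ** 2                         # N = n**k processors
```

Setting n = 2 recovers the hypercube special case mentioned in the abstract: each bus then joins exactly two processors, i.e., it degenerates into a hypercube edge.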