[comp.doc.techreports] wisconsin.v

leff@smu.UUCP (Laurence Leff) (03/17/88)

Bibliography of Technical Reports
Computer Science Department
University of Wisconsin
September 1987 - December 1987

For copies, send requests to:

Technical Report Librarian
Computer Science Department
University of Wisconsin
1210 W. Dayton Street
Madison, WI 53706

%A Vasant Honavar
%A Leonard Uhr
%T Recognition Cones: A Neuronal Architecture for Perception and Learning
%D September 1987
%R TR 717
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X There is currently a great deal of interest and activity in 
developing connectionist, neuronal, brain-like models, in both Artificial 
Intelligence and Cognitive Science.  This paper specifies the main features
of such systems, and examines "recognition cone" models of perception from
this perspective, as examples of structures of neuron-like units combined 
into successively larger brain-like modules.  Issues addressed include 
architecture, information flow, and the parallel-distributed nature of 
processing and control in recognition cones; and their use in perception
and learning.

%A Deepankar Medhi
%T Decomposition of Structured Large-scale Optimization Problems and 
Parallel Optimization 
%D September 1987
%R TR 718
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this dissertation, we present serial and parallel algorithms
to solve efficiently the problem
inf { sum_{i=1}^N f_i(x_i) | sum_{i=1}^N A_i x_i = b },
where the f_i are convex functions (not necessarily differentiable) taking
values in the extended real line (-infinity, +infinity].  For example,
block-angular linear programming
problems and linear multi-commodity network optimization problems can be cast
into the above form.  In our approach, we take the Rockafellar dual of the
problem to arrive at an unconstrained nonsmooth maximization problem.  The
difficulty arises from the nonsmoothness of the dual objective.  Traditional
subgradient methods are inadequate here: they lack an implementable
stopping criterion, are reported to converge slowly, and may fail to
yield a primal optimal solution at the end.  Instead, we apply a modified
bundle algorithm, which has an implementable stopping criterion, and more
importantly, one can recover an approximate primal optimal solution.  We also
obtain some theoretical \fIa posteriori\fR error information on the approximate
solution.  We have implemented this algorithm on randomly generated block-
angular linear programming problems of size up to 4,000 equality constraints
and 10,000 variables.  Our implementation ran up to seventy times faster 
than MINOS version 5.0, and did substantially better than an advanced 
implementation of the Dantzig-Wolfe decomposition method.  Thus we think 
that for this type of problem, our algorithm is very promising.
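
A sketch of the decomposition idea, in LaTeX notation: dualizing the coupling
constraint of the block-structured problem above (the symbols A_i and b are
the standard block-angular notation, not necessarily the report's own) gives

    \min_{x}\ \sum_{i=1}^{N} f_i(x_i)
        \quad \text{subject to} \quad \sum_{i=1}^{N} A_i x_i = b,
    \qquad\qquad
    g(y) \;=\; b^{\top} y \;+\; \sum_{i=1}^{N}\,
        \inf_{x_i}\, \bigl( f_i(x_i) - y^{\top} A_i x_i \bigr),
    \qquad
    \max_{y}\ g(y).

This is written as the ordinary Lagrangian dual, which for this convex,
equality-coupled problem carries the same structure as the Rockafellar dual
used in the report: the dual is unconstrained, the concave function g is
generally nonsmooth (hence the bundle method), and each infimum is one of the
N independent subproblems that the parallel algorithms solve separately.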

A nice feature of the dual problem is that it breaks up the original problem 
into smaller \fIindependent\fR subproblems.  Exploiting this fact, we present
parallel algorithms implemented on the CRYSTAL multicomputer.  We considered
two groups of test problems for these algorithms, one in which the subproblems
required approximately equal amounts of time to solve, and another in which
the solution times varied.  In the first group, we obtained 70%-80% efficiency
with up to eleven processors.  In the second group, we obtained 60% or more 
efficiency with relatively small problems and with up to five processors.

%A Kenneth Kunen
%T Signed Data Dependencies in Logic Programs 
%D October 1987
%R TR 719
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Logic programming with negation has been given a declarative semantics
by Clark's Completed Database, CDB, and one can consider the consequences of
the CDB in either 2-valued or 3-valued logic.  Logic programming also has a
proof theory given by SLDNF derivations.  Assuming the data dependency condition
of \fIstrictness\fR, we prove that the 2-valued and 3-valued semantics are
equivalent.  Assuming \fIallowedness\fR (a condition on occurrences of 
variables), we prove that SLDNF is complete for the 3-valued semantics.  
Putting these two results together, we have completeness of SLDNF deductions
for strict and allowed databases and queries under the standard 2-valued 
semantics.  This improves a theorem of Cavedon and Lloyd, who obtained the
same result under the additional assumption of \fIstratifiability\fR.
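
A standard textbook example (not taken from the report) shows why the 2-valued
and 3-valued readings of the completion can differ.  For the one-clause program
and its Clark completion

    P:\quad p \leftarrow \neg p
    \qquad\qquad
    \mathrm{comp}(P):\quad p \leftrightarrow \neg p

comp(P) is inconsistent in 2-valued logic, so every query trivially "follows,"
whereas in 3-valued logic it has a model in which p is undefined.  The query p
against this program reaches p through both an even and an odd number of
negations, so it fails the strictness condition that the completeness results
above assume.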

%A Carl de Boor
%A Klaus Hollig
%A Sherman Riemenschneider
%T Fundamental Solutions for Multivariate Difference Equations and 
Applications to Cardinal Interpolation
%D October 1987
%R TR 720
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Let b : Z^d -> R be a mesh-function with compact support.
It is shown that the difference equation sum_{j in Z^d} b(k-j) a(j) = f(k),
k in Z^d, has a bounded solution \fIa\fR if |f(j)| = O((1+|j|)^(-n))
for some exponent \fIn\fR which depends on \fIb\fR.  This
result is the discrete analogue of the existence of tempered fundamental 
solutions for partial differential operators with constant coefficients.  
It is applied to prove optimal convergence rates for interpolation with
box-splines.

%A Seymour V. Parter
%T Remarks on the Solution of Toeplitz Systems of Equations 
%D October 1987
%R TR 721
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X In this survey we recall some of the theory of Toeplitz matrices
that is relevant to the questions that arise in the inversion of Toeplitz
systems of equations.  In the course of our presentation we discuss some
examples and raise some particular questions.

%A Wei-Chung Hsu
%T Register Allocation and Code Scheduling for Load/Store Architectures 
%D October 1987
%R TR 722
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X To achieve high performance, the structure of on-chip memory in
a single-chip computer must be appropriate, and it must be allocated 
effectively to minimize off-chip communication.  Since the off-chip memory
bandwidth of single-chip computers is severely limited, data caches that
exploit spatial locality to achieve high hit rates are not appropriate.  
A register file, which can be managed by compilers, might be more effective
than a data cache as an on-chip memory structure.  With a load/store 
architecture, compilers can separate operand fetches from their use by 
scheduling code, thus achieving high hit rates without increasing memory 
traffic.  Register allocation also exploits temporal locality better than 
a data cache does.

This thesis investigates how effective register allocation could be and studies
the interdependency problem between register allocation and code scheduling.
A model of perfect register allocation is explored.  An algorithm for optimal
local register allocation is then developed.  Since the optimal algorithm 
needs exponential time in the worst case, a heuristic algorithm which has 
near-optimal performance is proposed and compared with other existing heuristic
algorithms.  Through trace simulation, the perfect register allocation model
is shown to be much more effective in reducing off-chip memory traffic than
cache memory of the same size.
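
The thesis's optimal and near-optimal allocators are not reproduced in this
abstract; the Python sketch below (invented names, illustrative trace) only
shows the classical furthest-next-use eviction rule that underlies local,
basic-block register allocation when store costs are ignored.

    def local_alloc(trace, k):
        """Local register allocation over one basic block's reference trace:
        when all k registers are occupied, evict the value whose next use lies
        furthest in the future.  Returns the number of loads issued."""
        resident = set()
        loads = 0
        for i, var in enumerate(trace):
            if var in resident:
                continue                      # operand already in a register
            loads += 1                        # must load it from memory
            if len(resident) < k:
                resident.add(var)
                continue
            def next_use(v):                  # distance to v's next reference
                for j in range(i + 1, len(trace)):
                    if trace[j] == v:
                        return j
                return float("inf")           # never referenced again
            victim = max(resident, key=next_use)
            resident.remove(victim)
            resident.add(var)
        return loads

    # Example: three registers over a short reference trace (4 loads issued).
    print(local_alloc(["a", "b", "c", "a", "d", "b", "a"], 3))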

Code scheduling interferes with register allocation, especially for large 
basic blocks.  Two methods are proposed to solve this interference: (1) an
integrated code scheduling technique; and (2) a DAG-driven register allocator.
The integrated code scheduling method combines two scheduling techniques--one
to reduce pipeline delays and the other to minimize register usage--into a
single phase.  By keeping track of the number of available registers, the 
scheduler can choose the appropriate scheduling technique to schedule a better
code sequence.  The DAG-driven register allocator uses the Dependency DAG to
assist in assigning registers; it introduces far fewer extra dependencies than
an ordinary register allocator does.  Both approaches are shown to generate
more efficient code sequences than conventional techniques in the 
simulations.

%A Yannis E. Ioannidis
%A Joanna Chen
%A Mark A. Friedman
%A Manolis M. Tsangaris
%T Bermuda - An Architectural Perspective on Interfacing Prolog to a Database
Machine
%D October 1987
%R TR 723
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X We describe the design and implementation of BERMUDA, which is a 
system interfacing Prolog to the Britton-Lee Intelligent Database Machine
(IDM).  We discuss several architectural issues faced by such systems, and we
present the solutions adopted in BERMUDA.  In BERMUDA, rules are stored in
Prolog, and facts are primarily stored in a database.  BERMUDA has been 
designed and implemented so that multiple concurrent Prolog processes, possibly
running on different machines, can share a database.  Moreover, the semantics of
Prolog programs remain unchanged and the use of a database system is
transparent to the user.  Finally, BERMUDA has achieved a certain level of
portability by using the given Prolog interpreter and database system (almost)
unchanged.
BERMUDA also employs several novel techniques to make the interface of Prolog
to the database efficient.

%A Goetz Graefe
%T Rule-Based Query Optimization in Extensible Database Systems 
%D November 1987
%R TR 724
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This thesis presents the problems of query optimization in extensible
database systems and proposes a solution.  It describes the design and an 
initial evaluation of the query optimizer generator developed for the EXODUS
extensible database system.  The goal of the EXODUS system is to provide 
software tools and libraries to structure and to ease the task of implementing
or extending a database system for a new data model.  Our basic model of 
optimization is to map a query tree, which consists of operators at the nodes
and stored data at the leaves, to an access plan, which is a tree with 
implementation methods at the nodes and scans at the leaves.  The optimizer
generator translates algebraic equivalence rules into an executable optimizer.
The equivalence rules are specific to the data model.  The generated optimizer
reorders query trees and selects implementation methods according to cost
functions associated with the methods.  The search strategy of the optimizer
avoids exhaustive search by learning from past experience.
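
The rule language, cost model, and learning search strategy of the generated
optimizers are not shown in the abstract.  As a hedged illustration of the
general idea only, the Python sketch below applies two algebraic equivalence
rules (join commutativity and associativity) to a query tree and keeps the
cheapest alternative under a made-up cross-product cost; the real generator
compiles model-specific rules and avoids this kind of naive search.

    def commute(t):
        # JOIN(A, B)  ->  JOIN(B, A)
        if isinstance(t, tuple) and t[0] == "JOIN":
            yield ("JOIN", t[2], t[1])

    def associate(t):
        # JOIN(JOIN(A, B), C)  ->  JOIN(A, JOIN(B, C))
        if (isinstance(t, tuple) and t[0] == "JOIN"
                and isinstance(t[1], tuple) and t[1][0] == "JOIN"):
            yield ("JOIN", t[1][1], ("JOIN", t[1][2], t[2]))

    RULES = [commute, associate]

    def rewrites(t):
        """Trees reachable by one rule application at the root or in a subtree."""
        if not isinstance(t, tuple):
            return
        for rule in RULES:
            yield from rule(t)
        for i in (1, 2):
            for sub in rewrites(t[i]):
                yield t[:i] + (sub,) + t[i + 1:]

    def cost(t, card):
        """Toy cost: sum of intermediate-result sizes, estimated as cross products."""
        if not isinstance(t, tuple):
            return 0, card[t]
        c1, n1 = cost(t[1], card)
        c2, n2 = cost(t[2], card)
        return c1 + c2 + n1 * n2, n1 * n2

    def optimize(t, card, steps=100):
        """Greedy descent over the equivalence rules."""
        for _ in range(steps):
            better = min(rewrites(t), key=lambda s: cost(s, card)[0], default=t)
            if cost(better, card)[0] >= cost(t, card)[0]:
                return t
            t = better
        return t

    card = {"R": 1000, "S": 10, "T": 10}
    print(optimize(("JOIN", ("JOIN", "R", "S"), "T"), card))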

We report on two operational optimizers.  Experiments with a restricted
relational system show that the generated optimizer produces access plans of
almost the same anticipated execution cost as those produced by exhaustive 
search, with the search time cut to a small fraction.  Another set of 
experiments shows that a generated optimizer is able to handle large queries.

An optimizer currently under development for a new query evaluation method
shows the power and flexibility of the approach.  Other researchers have 
decided to use the optimizer generator for their database implementation work.

Independently from the EXODUS project, the optimizer generator proved to be
a valuable tool for exploring the trade-offs between left-deep execution 
trees and general execution trees in relational database systems.  Our
experiments show that for bushy trees, the higher optimization cost and the
cost for creating and reading temporary files can be more than compensated
by the reduction in processing cost.

%A Victor Shoup
%T Finding Witnesses Using Fewer Random Bits
%D November 1987
%R TR 725
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Let G be a proper subgroup of Z*_n, the multiplicative
group of units modulo n.  Many number theoretic algorithms assume that an
element in Z*_n - G can easily be found.  In this context, an
element in Z*_n - G is often called a "witness."  Ankeny's theorem
states that, assuming the ERH, the smallest witness is O(log^2 n).
The purpose of this paper is to examine a "randomized" Ankeny conjecture.
Consider the following experiment.  Choose a in Z_n at random.
Examine the elements a, a+1, ..., a+k-1, where k = O(log^c n)
for some constant c.  We would like the probability that none of these
are witnesses to be small.  The randomized Ankeny conjecture is that this
probability is O(1/n^alpha) for some constant 0 < alpha < 1.  We show
that if the randomized Ankeny conjecture is true, then a deterministic Ankeny
conjecture, which allows us to efficiently find witnesses deterministically,
is already true.  We also prove some partial randomized Ankeny results, which
state that we can bound the probability of not finding a witness by
O(1/p^(1/2 - epsilon)), where p is a prime divisor of n that is
"nontrivial" on G.

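As a concrete, hedged instance of the witness-search experiment (much simpler
than the general subgroups G treated in the report), take G to be the
quadratic residues modulo a prime p, so that Euler's criterion decides
membership; the Python sketch below then estimates how often the run
a, a+1, ..., a+k-1 contains no witness.  The prime and parameters are
illustrative only.

    import random

    def is_witness(x, p):
        """Quadratic non-residue test: x^((p-1)/2) = p-1 (mod p), Euler's criterion."""
        return pow(x, (p - 1) // 2, p) == p - 1

    def no_witness_probability(p, k, trials=10000):
        """Estimate the probability that a, a+1, ..., a+k-1 contains no witness
        when the starting point a is chosen uniformly at random."""
        failures = 0
        for _ in range(trials):
            a = random.randrange(1, p)
            if not any(is_witness((a + i) % p, p) for i in range(k)):
                failures += 1
        return failures / trials

    # Half of the nonzero residues mod a prime are witnesses, so if consecutive
    # integers behaved independently the answer would be roughly 2**-k.
    print(no_witness_probability(p=10007, k=5))
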
%A Charles V. Stewart
%A Charles R. Dyer
%T Local Constraint Integration in a Connectionist Model of Stereo Vision
%D November 1987
%R TR 726
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X The stereo matching algorithms proposed in the research literature
have been fairly successful, but none has completely solved the problem.  We
present a number of steps toward improving matching by: (1) analyzing and 
reformulating a number of important matching constraints, including uniqueness,
coarse-to-fine multiresolution, fine-to-coarse multiresolution, detailed 
match, figural continuity and the disparity gradient.  (2) Building an algorithm
that integrates the influence of these constraints \fIcooperatively\fR and
\fIin parallel\fR using the General Support Principle.  This principle states
that, except for uniqueness, \fIonly positive\fR constraint influences are
employed.
(3) Implementing this General Support Algorithm using a connectionist network.
Such a network allows the constraints to be integrated naturally in a parallel,
relaxation computation.  The resulting connectionist network has been 
simulated and tested.  It generally produces a high percentage (95-99%) of
correct matching decisions.  This testing has enabled us to understand the
types of problems that \fIlocally\fR defined constraints cannot accommodate, and
leads to ideas for non-local mechanisms to address these problems.

%A Gurindar Sohi
%T The Effects of Faults in Cache Memories
%D November 1987
%R TR 727
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X As processor speeds increase, on-chip caches are becoming increasingly
important for providing adequate memory bandwidth.  Such caches are prone
to faults both during manufacturing and during normal processor operation
because of the large density of active components.  Since the CPU's 
interactions with the memory dictate the performance of the processor, it
is important to evaluate the effect of faults in the cache memory system.

Faults in components such as registers, busses, control logic, etc., are
critical faults because the processor will cease to operate correctly unless
some action is taken to tolerate such faults.  Cache memory, on the other
hand, is not a critical component of the processor - it is present mainly
for performance reasons.  The processor will be able to operate in a correct 
but degraded fashion if parts of the cache memory are faulty and adequate 
means are provided to recover correct data through bypassing faulty cache 
components.  Traditional techniques for tolerating faults in memory systems
such as Single Error Correction and Double Error Detection (SECDED) codes may
not be appropriate for a cache since they increase the latency of the cache.
If the cache memory system does not have the ability to correct errors, two
questions arise: (i) how does one make sure that correct data can be
recovered at all times and (ii) how do faults in the cache affect performance.
In this paper, we discuss the performance of cache memories under fault
conditions.  Through the use of simulation, we study how the performance of
a cache is degraded under fault conditions.
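
The abstract does not detail the simulation methodology; the Python sketch
below (all parameters invented) only illustrates the bypass idea it alludes
to: a direct-mapped cache in which faulty frames are disabled, so references
that map to them always go off-chip.

    def miss_ratio(addresses, num_lines, line_size, faulty_lines):
        """Direct-mapped cache with faulty frames bypassed: a reference that
        maps to a faulty frame is always counted as a miss."""
        tags = [None] * num_lines
        misses = 0
        for addr in addresses:
            block = addr // line_size
            index = block % num_lines
            if index in faulty_lines:
                misses += 1               # frame disabled: fetch off-chip
                continue
            if tags[index] != block:
                misses += 1
                tags[index] = block       # fill the working frame
        return misses / len(addresses)

    # A toy trace (two passes over a sequential region), 64 lines of 16 bytes.
    trace = [i * 4 for i in range(2000)] * 2
    print("no faulty lines:", miss_ratio(trace, 64, 16, faulty_lines=set()))
    print("8 faulty lines: ", miss_ratio(trace, 64, 16, faulty_lines=set(range(8))))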

%A Hung-Yang Chang
%T Dynamic Scheduling Algorithms for Distributed Soft Real-time Systems
%D November 1987
%R TR 728
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This dissertation is a study of dynamic scheduling algorithms for
distributed soft real-time systems with non-periodic workloads.  Soft real-time
systems permit deadline misses with the performance goal of minimizing the
\fIdeadline miss ratio\fR, the percentage of jobs missing their deadlines,
and meeting the \fIglobal priority\fR requirement.  In a distributed system
with non-periodic workloads, dynamic assignment of tasks to processors can 
significantly improve system performance.  However, job transfers and 
negotiations between processors exact a cost from the system, limiting
performance improvement.  Therefore, these algorithms must meet the real-time
goals while minimizing the scheduling overhead.

In order to minimize the global deadline miss ratio, an algorithm must employ
a negotiation protocol to coordinate schedulers.  We devise a \fItriage\fR 
protocol that attempts to transfer a job when the job cannot meet its deadline
locally but can meet it at a remote processor.  Based on this protocol, three
initiation approaches are compared.  The receiver-initiated approach is found
to be most robust, whereas the sender-initiated approach may cause thrashing
when the system load is high.
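
One hedged reading of the triage rule, with invented names and estimates (the
actual protocol, its negotiation messages, and its cost model are in the
report): a job is kept if it can finish locally by its deadline, transferred
only if it cannot finish locally but could finish remotely, and otherwise
served locally as a likely deadline miss.

    def triage(job, local_queue_delay, remote_queue_delay, transfer_cost, now):
        """Placement decision for a newly arrived job under a triage-style rule."""
        local_finish = now + local_queue_delay + job["service_time"]
        remote_finish = now + transfer_cost + remote_queue_delay + job["service_time"]
        if local_finish <= job["deadline"]:
            return "run locally"
        if remote_finish <= job["deadline"]:
            return "transfer to remote processor"
        return "run locally (deadline expected to be missed)"

    job = {"service_time": 5, "deadline": 20}
    print(triage(job, local_queue_delay=18, remote_queue_delay=2,
                 transfer_cost=3, now=0))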

Global priority scheduling attempts to ensure that the N jobs being served at
any moment in the N-processor system are those having the highest priorities.  
If a job that is waiting has higher priority than the one that is running at
a remote processor, the waiting job should be transferred to receive immediate
service.  The tradeoffs between reducing overhead and relaxing the global 
priority requirement are explored by comparing four algorithms, each exhibiting
a different degree of compromise.

For jobs consisting of a group of cooperating tasks, a global priority 
scheduling algorithm should take job structure into account.  We devise
algorithms that give compatible priority service to tasks belonging to a
single job, so that cooperating tasks may progress at the same rate, avoiding
excessive blocking due to synchronization.  Under the assumption that there 
are two priority levels, our algorithms ensure that tasks are assigned to
processors with the lowest priority loads.  In total, we present six 
placement algorithms, evaluated under a workload consisting of jobs having a
pipeline structure.

%A David L. Cohrs
%A Barton P. Miller
%A Lisa A. Call
%T Distributed Upcalls: A Mechanism for Layering Asynchronous Abstractions
%D November 1987
%R TR 729
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X It is common to use servers to provide access to facilities in a
distributed system and to use remote procedure call semantics to access 
these servers.  Procedure calls provide a synchronous interface to call 
downward through successive layers of abstraction, and remote procedure calls
allow the layers to reside in different address spaces.  But servers often
need the ability to initiate asynchronous and independent actions.  Examples
of this asynchrony are when a network server needs to signal to an upper 
layer in a protocol, or when a window manager server needs to respond to user
input.

Upcalls are a facility that allows a lower level of abstraction to pass
information to a higher level of abstraction in a clean way. We describe a
facility for distributed upcalls, which allows upcalls to cross address space
boundaries.  Remote procedure calls (for handling synchronous server requests)
and distributed upcalls (for handling asynchronous server activities)
complement each other to form a powerful structuring tool.  These facilities,
together with the ability to dynamically load modules into a server, allow
the user to arbitrarily place abstractions in the server or in the client.
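
CLAM itself is built with C++ and its upcalls cross address-space boundaries;
the single-process Python sketch below (names invented) shows only the basic
structuring idea: a higher layer registers a handler with a lower layer, and
the lower layer later calls upward when an asynchronous event occurs.

    class TransportLayer:
        """Lower level of abstraction: delivers packets asynchronously."""
        def __init__(self):
            self._upcall = None

        def register_upcall(self, handler):
            self._upcall = handler            # handler passed downward once

        def packet_arrived(self, packet):     # driven by the asynchronous side
            if self._upcall:
                self._upcall(packet)          # the upcall: lower layer calls up

    class Protocol:
        """Higher level of abstraction: reacts to events signalled from below."""
        def __init__(self, transport):
            transport.register_upcall(self.handle_packet)

        def handle_packet(self, packet):
            print("protocol layer received:", packet)

    transport = TransportLayer()
    proto = Protocol(transport)
    transport.packet_arrived(b"hello")        # simulate an asynchronous arrival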

Distributed Upcalls have been built into a server structuring system called
CLAM, which is currently being used to support an extensible window manager
system.  The CLAM system, including distributed upcalls, remote procedure call
extensions to C++, dynamic loading, and basic window classes, is currently
running under 4.3BSD UNIX on Microvax workstations.

%A Michael J. Litzkow
%A Miron Livny
%A Matt W. Mutka
%T Condor - A Hunter of Idle Workstations
%D December 1987
%R TR 730
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This paper presents the design, implementation, and performance 
of the Condor scheduling system.  Condor operates in a workstation environment.
The system aims to maximize the utilization of workstations with as little 
interference as possible between the jobs it schedules and the activities of
the people who own workstations.  It identifies idle workstations and 
schedules background jobs on them.  When the owner of a workstation resumes
activity at a station, Condor checkpoints the remote job running on the 
station and transfers it to another workstation.  The system guarantees that the
job will eventually complete, and that very little, if any, work will be 
performed more than once.  The system has been operational for more than 
three months.  In this paper we present a performance profile of the system
based on data that was accumulated from 23 stations during one month.  During
the one-month period, nearly 1000 jobs were scheduled by Condor.  The system
was used by both heavy and light users, who together consumed approximately
200 CPU days.  An analysis of the response times observed by the different
users is a clear display of the capacity of the system.  Since a user of
Condor has to devote some local capacity to support the remote execution of
his/her jobs, the effectiveness of the remote scheduling system depends on the
amount of this capacity.  We show that this overhead is very small.  On
average, a user has to sacrifice less than one minute of local CPU capacity to
acquire a day of remote CPU capacity.  Condor has proven to be an extremely
effective means of improving the productivity of our computing environment.

%A Rong-Jaye Chen
%T Parallel Algorithms For A Class of Convex Optimization Problems
%D December 1987
%R TR 731
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X This thesis is principally concerned with a piecewise-linear trust
region method for solving a class of structured convex optimization problems,
which includes traffic assignment problems.  Piecewise-linear approximation
of nonlinear convex objective functions in linearly constrained optimization
produces subproblems that may be solved as linear programs.  This approach to
approximation may be used for nonseparable as well as separable functions, and
for the former class (the focus of this thesis), it lies between linear and
quadratic approximation with regard to its accuracy.  In order to have
additional control of the accuracy of the piecewise-linear approximation, we
consider two devices: rectangular trust regions and dynamic scaling.  The use
of rectangular trust regions in conjunction with the type of piecewise-linear
approximation considered here actually serves to simplify rather than complicate
the approximating problems.  This is a result of the equivalence of the trust
region and the use of a limited number of segments of comparable size in the
approximation.  The approach to dynamic scaling considered here may be applied
to problems in which each objective function term is a convex function of a 
linear function of the variables.  This scaling device allows the algorithm to
adjust the approximation between an underestimating function (corresponding
to a linear approximation) and an overestimating function (the non-separable
analog of the overestimate associated with separable approximation of convex
functions).  The scaling factor is adjusted in accordance with the acceptance
criteria associated with the trust region method.
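
The dynamic-scaling idea of moving between an underestimating and an
overestimating approximation can be pictured in one dimension with elementary
convexity inequalities (standard facts, not formulas from the thesis): for a
convex, differentiable function \varphi and breakpoints x_0 < x_1,

    \varphi(x_0) + \varphi'(x_0)\,(x - x_0)
        \;\le\; \varphi(x)
        \;\le\; \frac{x_1 - x}{x_1 - x_0}\,\varphi(x_0)
              + \frac{x - x_0}{x_1 - x_0}\,\varphi(x_1),
    \qquad x \in [x_0, x_1].

The left-hand side is the linear (underestimating) approximation and the
right-hand side is the piecewise-linear secant, which overestimates a convex
function between its breakpoints.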

Another emphasis of this thesis is the development of parallel algorithms
suited to distributed computing and the comparison of the relative efficiencies
of these algorithms on different architectures.  Computational experience is
cited for some large-scale problems arising from traffic assignment 
applications.  The algorithms considered here also have the property that they
allow such problems to be decomposed into a set of smaller optimization 
problems at each major iteration.  These smaller problems correspond to linear
single-commodity networks, and may be solved in parallel.  Results are given
for the distributed solution of these problems on the CRYSTAL multicomputer.

%A R. J. Chen
%A R. R. Meyer
%T Parallel Optimization For Traffic Assignment
%D December 1987
%R TR 732
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Most large-scale optimization problems exhibit structures that make
them amenable to attack by algorithms with a high degree of parallelism.
The emphasis of this paper is the development of parallel
optimization algorithms for a class of convex, block-structured problems.
Computational experience is cited for some large-scale problems arising from 
traffic assignment applications.  The algorithms considered here have the 
property that they allow such problems to be decomposed into a set of smaller
optimization problems at each major iteration.  These smaller problems
correspond to linear single-commodity networks in the traffic assignment case,
and they may be solved in parallel.  Results are given for the distributed
solution of such problems on the CRYSTAL multicomputer.

%A M. Muralikrishna
%A David J. DeWitt
%T Equi-Depth Multi-Dimensional Histograms
%D December 1987
%R TR 733
%I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN
%C MADISON, WI
%X Multi-dimensional queries commonly occur in geographical, image, and
VLSI databases.  A typical two-dimensional query in
a geographical database might involve finding all cities within certain 
latitudinal and longitudinal bounds.  Several multi-dimensional index
structures have been proposed in the literature.  KDB trees [Robinson81],
R-trees [Guttman84], and Grid files [Nievergelt84] are among the more popular
ones.
However, there has been no work in designing multi-dimensional histograms to
aid in the optimization process using these multi-dimensional index structures.
In order for an optimizer to select an appropriate access path for a 
multi-dimensional query, fairly accurate selectivity estimates must be available
to it.  Selectivity estimates are also useful in determining appropriate join
methods that follow the selections.

In this paper we present an algorithm for generating equi-depth, multi-
dimensional histograms.  One might expect that the cost of building a 
d-dimensional histogram would be at least d times the cost of sorting the
relation on a single attribute.  We show, in our algorithm, that the sorting
cost of building a d-dimensional histogram is significantly less than the cost
of sorting the relation d times.  We present a main memory data structure for
storing the histograms and discuss two schemes for estimating the number of
tuples that will be retrieved by a given query.  Experimental results are
presented that show the efficacy of our histograms.  The usefulness of a
sampling technique in generating histograms at a very low cost is also explored.
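
The report's construction and its sorting-cost analysis are not reproduced
here.  The Python sketch below shows only the basic equi-depth idea in two
dimensions, under the simplifying (assumed) strategy of splitting first on one
attribute and then on the other within each slice; names and parameters are
illustrative.

    import random

    def equi_depth_2d(points, b1, b2):
        """Build a b1 x b2 equi-depth histogram of (x, y) points: cut the x
        values into b1 slices of roughly equal count, then cut each slice's y
        values into b2 buckets.  Returns ((x_lo, x_hi), (y_lo, y_hi), count)."""
        def bounds(values, parts):
            values = sorted(values)
            n = len(values)
            cuts = [values[min(round(i * n / parts), n - 1)] for i in range(1, parts)]
            return [values[0]] + cuts + [values[-1]]

        x_bounds = bounds([p[0] for p in points], b1)
        buckets = []
        for i in range(b1):
            x_lo, x_hi = x_bounds[i], x_bounds[i + 1]
            last_x = (i == b1 - 1)            # close the final slice on the right
            slice_pts = [p for p in points
                         if x_lo <= p[0] < x_hi or (last_x and p[0] == x_hi)]
            if not slice_pts:
                continue
            y_bounds = bounds([p[1] for p in slice_pts], b2)
            for j in range(b2):
                y_lo, y_hi = y_bounds[j], y_bounds[j + 1]
                last_y = (j == b2 - 1)
                count = sum(1 for p in slice_pts
                            if y_lo <= p[1] < y_hi or (last_y and p[1] == y_hi))
                buckets.append(((x_lo, x_hi), (y_lo, y_hi), count))
        return buckets

    pts = [(random.random(), random.random()) for _ in range(1000)]
    for bucket in equi_depth_2d(pts, 4, 4):
        print(bucket)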