[comp.parallel] Parallel Dissertations Available from Brown University

rph@cs.brown.edu (Richard Hughey) (06/04/91)

Two PhD dissertations on parallel computation are now available
from the Department of Computer Science at Brown University.
They are:

"Programmable Systolic Arrays"
Richard Paul Hughey (rph@cs.brown.edu)
CS-91-34

and 

"Parallel VLSI Synthesis"
Markus G. Wloka (mgw@cs.brown.edu)
CS-91-35

To order (or to obtain a complete list of technical reports), send
electronic mail to maa@cs.brown.edu.   

---------------------------------------------------------
Abstract of "Programmable Systolic Arrays"
Richard Paul Hughey
CS-91-34

Systolic arrays can solve computationally intensive problems many
times faster than traditional computers or supercomputers.  Because
VLSI systems require many months to design, simulate, fabricate, and
test, research has turned to programmable systolic arrays.  This
thesis introduces a general, programmable systolic array architecture:
the Systolic Shared Register (SSR) architecture.  The Systolic Shared
Register architecture preserves the simple communication of
single-purpose systolic arrays while providing a fully programmable
systolic co-processor.

To test the SSR architecture, I designed, simulated, and had
fabricated the Brown Systolic Array (B-SYS), an 8-bit SSR machine:
each 16-element register bank in the linear array is shared between
two functional units, providing simple and efficient systolic
communication.  The 85,000-transistor chip worked on first
fabrication and a 10-chip, 470-element prototype array performs
sequence comparison over 80 times faster than its Intel 80386 host.  A
custom-designed board could magnify this to over 600 times faster for
a 10-chip co-processor.  A 32-chip B-SYS system could process over
three billion 8-bit operations every second.

Although the SSR paradigm stands by itself, it is instructive to
simultaneously consider programming methods and applications.  This
thesis describes the principle of software fault detection for the
automatic generation of fault-tolerant programs, an efficient
alternative to rigid hardware methods. Additionally, this thesis
introduces the New Systolic Language; NSL is a programming language
for systolic co-processors based on data stream computation.  With the
aid of NSL, several systolic applications are examined in detail, in
particular sequence comparison problems from the Human Genome Project.
A comparison of B-SYS to existing parallel machines reveals its
unequivocal superiority in its targeted area.
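
For readers unfamiliar with the application, the core of sequence
comparison is the standard edit-distance recurrence.  The Python
sketch below is an illustration, not code from the thesis: it shows
the serial computation whose anti-diagonal cells are independent and
can therefore be pipelined across a systolic array.

```python
def edit_distance(a, b):
    """Dynamic-programming edit distance between strings a and b.
    Cells on the same anti-diagonal of the table depend only on
    earlier anti-diagonals, which is the parallelism a systolic
    array exploits."""
    m, n = len(a), len(b)
    # d[i][j] = cost of transforming a[:i] into b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match/substitute
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # -> 3
```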

--------------------------------------------------------
Abstract of "Parallel VLSI Synthesis"
Markus G. Wloka
CS-91-35

We investigate the parallel complexity of VLSI (very large scale integration)
CAD (computer aided design) synthesis problems.  Parallel algorithms are very
appropriate in VLSI CAD because computationally intensive optimization methods
are needed to derive ``good'' chip layouts.

We find that for many problems with polynomial-time serial complexity, it is
possible to find an efficient parallel algorithm that runs in polylogarithmic
time.  We illustrate this by parallelizing the ``left-edge'' channel routing
algorithm and the one-dimensional constraint-graph compaction algorithm.

Curiously enough, we find P-hard (inherently difficult to parallelize)
algorithms when certain key heuristics are used to get approximate solutions
to NP-complete problems.  In particular, we show that many local search
heuristics for problems related to VLSI placement are P-hard or P-complete.
These include the Kernighan-Lin heuristic and the simulated annealing
heuristic for graph partitioning.  We also show that local search heuristics
for grid embeddings and hypercube embeddings based on vertex swaps are P-hard,
as are any local search heuristics that minimize the number of column
conflicts in a channel routing by accepting cost-improving swaps of tracks or
subtracks.  The P-hardness results put into perspective experimental results
reported in the literature: attempts to construct the exact parallel
equivalent of serial simulated-annealing-based heuristics for graph embedding
have yielded disappointing parallel speedups.
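
As a concrete illustration of the class of heuristics these results
cover (a toy sketch of ours, not code from the thesis): a local search
for graph bisection that accepts any cost-improving vertex swap until
it reaches a local minimum.

```python
import random

def cut_cost(edges, side):
    # Number of edges whose endpoints lie on different sides.
    return sum(1 for u, v in edges if side[u] != side[v])

def swap_local_search(n, edges, seed=0):
    """Toy local-search bisection: start from a random balanced
    partition, then greedily accept any vertex swap that lowers the
    cut, stopping at a local minimum."""
    rng = random.Random(seed)
    verts = list(range(n))
    rng.shuffle(verts)
    side = [0] * n
    for v in verts[: n // 2]:
        side[v] = 1                   # random balanced start
    best = cut_cost(edges, side)
    improved = True
    while improved:
        improved = False
        for u in range(n):
            for v in range(u + 1, n):
                if side[u] == side[v]:
                    continue
                side[u], side[v] = side[v], side[u]       # trial swap
                c = cut_cost(edges, side)
                if c < best:
                    best = c
                    improved = True
                else:
                    side[u], side[v] = side[v], side[u]   # revert
    return side, best

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
side, cost = swap_local_search(6, edges)
print(cost)  # -> 1: only the bridging edge is cut
```

The sequential nature of the accept/revert decisions is exactly what
the P-hardness results say cannot, in general, be parallelized into
polylogarithmic time.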

We introduce the massively parallel {\em Mob} heuristic and report on
experiments on the 32K-processor CM-2 Connection Machine.  We applied our
heuristic to the graph-partitioning, grid and hypercube-embedding problems.
The {\em Mob} heuristic achieves impressive reductions in edge costs and runs
in less than 30 minutes on random graphs of one million edges.  Due to
excessive run times, heuristics reported previously in the literature have
been able to construct graph partitions and grid and hypercube embeddings
only for graphs that were 100 to 1000 times smaller than those used in our
experiments.  On small graphs, where simulated annealing and other
heuristics have been extensively tested, our heuristic found solutions
at least as good as theirs.





--------------------------------------- Richard Hughey              
INTERNET:  rph@cs.brown.edu	        Brown University            
BITNET:    rph@browncs		        Box 1910                    
(decvax, allegra, ...)!brunix!rph       Providence, RI 02912