rph@cs.brown.edu (Richard Hughey) (06/04/91)
Two PhD dissertations on parallel computation have recently become available from the Department of Computer Science at Brown University:

   "Programmable Systolic Arrays"
   Richard Paul Hughey (rph@cs.brown.edu)
   CS-91-34

   "Parallel VLSI Synthesis"
   Markus G. Wloka (mgw@cs.brown.edu)
   CS-91-35

To order (or to obtain a complete list of technical reports), send electronic mail to maa@cs.brown.edu.

---------------------------------------------------------

Abstract of "Programmable Systolic Arrays"
Richard Paul Hughey
CS-91-34

Systolic arrays can solve computationally intensive problems many times faster than traditional computers or supercomputers.  Because VLSI systems require many months to design, simulate, fabricate, and test, research has turned to programmable systolic arrays.  This thesis introduces a general, programmable systolic array architecture: the Systolic Shared Register (SSR) architecture.  The SSR architecture preserves the simple communication of single-purpose systolic arrays while providing a fully programmable systolic co-processor.

To test the SSR architecture, I designed, simulated, and had fabricated the Brown Systolic Array (B-SYS), an 8-bit SSR machine: each 16-element register bank in the linear array is shared between two functional units, providing simple and efficient systolic communication.  The 85,000-transistor chip worked on first fabrication, and a 10-chip, 470-element prototype array performs sequence comparison over 80 times faster than its Intel 80386 host.  A custom-designed board could magnify this to over 600 times faster for a 10-chip co-processor, and a 32-chip B-SYS system could perform over three billion 8-bit operations per second.

Although the SSR paradigm stands by itself, it is instructive to consider programming methods and applications at the same time.
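(An aside for readers outside the field: the sequence-comparison kernel referred to above is, in essence, the standard edit-distance dynamic program, whose anti-diagonals map naturally onto a linear systolic array.  The following is only an illustrative serial sketch, not B-SYS or NSL code, and the function name is my own.)

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming recurrence for sequence comparison.
    # A linear systolic array evaluates the cells of one anti-diagonal
    # of this table in parallel, one cell per processing element.
    m, n = len(a), len(b)
    prev = list(range(n + 1))              # row i-1 of the DP table
    for i in range(1, m + 1):
        curr = [i] + [0] * n               # row i
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[n]
```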
This thesis describes the principle of software fault detection for the automatic generation of fault-tolerant programs, an efficient alternative to rigid hardware methods.  It also introduces the New Systolic Language (NSL), a programming language for systolic co-processors based on data-stream computation.  With the aid of NSL, several systolic applications are examined in detail, in particular sequence comparison problems from the Human Genome Project.  A comparison of B-SYS to existing parallel machines reveals its unequivocal superiority in its targeted area.

--------------------------------------------------------

Abstract of "Parallel VLSI Synthesis"
Markus G. Wloka
CS-91-35

We investigate the parallel complexity of VLSI (very large scale integration) CAD (computer-aided design) synthesis problems.  Parallel algorithms are well suited to VLSI CAD because computationally intensive optimization methods are needed to derive ``good'' chip layouts.  We find that many problems with polynomial-time serial complexity admit efficient parallel algorithms that run in polylogarithmic time; we illustrate this by parallelizing the ``left-edge'' channel routing algorithm and the one-dimensional constraint-graph compaction algorithm.

Curiously, we find P-hard (inherently difficult to parallelize) problems when certain key heuristics are used to obtain approximate solutions to NP-complete problems.  In particular, we show that many local search heuristics for problems related to VLSI placement are P-hard or P-complete, including the Kernighan-Lin heuristic and the simulated annealing heuristic for graph partitioning.  We also show that local search heuristics for grid embeddings and hypercube embeddings based on vertex swaps are P-hard, as is any local search heuristic that minimizes the number of column conflicts in a channel routing by accepting cost-improving swaps of tracks or subtracks.
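(For concreteness, the style of heuristic shown P-hard above is swap-based local search: repeatedly accept any vertex swap that lowers the cost.  The sketch below is a generic serial illustration for balanced graph partitioning, not an algorithm from the thesis.)

```python
import random

def partition_cost(edges, side):
    # Cut size: number of edges crossing the partition.
    return sum(1 for u, v in edges if side[u] != side[v])

def swap_local_search(edges, n, seed=0):
    # Swap-based local search for balanced graph partitioning:
    # start from a random balanced partition, then repeatedly accept
    # cross-partition vertex swaps that strictly reduce the cut size,
    # stopping at a local optimum.
    rng = random.Random(seed)
    verts = list(range(n))
    rng.shuffle(verts)
    side = {v: (i < n // 2) for i, v in enumerate(verts)}
    cost = partition_cost(edges, side)
    improved = True
    while improved:
        improved = False
        for u in range(n):
            for v in range(u + 1, n):
                if side[u] == side[v]:
                    continue                 # swap must cross the cut
                side[u], side[v] = side[v], side[u]
                new_cost = partition_cost(edges, side)
                if new_cost < cost:
                    cost, improved = new_cost, True
                else:
                    side[u], side[v] = side[v], side[u]  # undo
    return cost, side
```

For example, on two triangles joined by a single bridge edge, this search finds the minimum balanced cut of one edge.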
The P-hardness results put into perspective experimental results reported in the literature: attempts to construct exact parallel equivalents of serial simulated-annealing-based heuristics for graph embedding have yielded disappointing parallel speedups.

We introduce the massively parallel {\em Mob} heuristic and report on experiments on a 32K-processor CM-2 Connection Machine, applying the heuristic to the graph-partitioning, grid-embedding, and hypercube-embedding problems.  The {\em Mob} heuristic achieves impressive reductions in edge costs and runs in under 30 minutes on random graphs of one million edges.  Due to excessive run times, heuristics previously reported in the literature have constructed graph partitions and grid and hypercube embeddings only for graphs 100 to 1000 times smaller than those used in our experiments.  On small graphs, where simulated annealing and other heuristics have been extensively tested, our heuristic found solutions of quality at least as good as those heuristics.

---------------------------------------

Richard Hughey          INTERNET: rph@cs.brown.edu
Brown University        BITNET:   rph@browncs
Box 1910                UUCP:     (decvax, allegra, ...)!brunix!rph
Providence, RI 02912