[comp.parallel] Top Ten Reading List

eugene@wilbur.nas.nasa.gov (Eugene N. Miya) (04/12/91)

[This is a retransmission. It appears that the version sent by email
 to those w/o usenet actually got there; usenet did not send.

 steve
]


%Z Date: Tue, 9 Apr 91 14:40:40 -0700
%A George S. Almasi
%A Allan Gottlieb
%T Highly Parallel Computing
%I Benjamin/Cummings division of Addison Wesley Inc.
%D 1989
%K ISBN # 0-8053-0177-1, book, text, Ultracomputer, grequired91,
%K enm, cb@uk, ag, jlh, dp, gl,
%$ $36.95
%X This is a kinda neat book.  There are special net
anecdotes which make this interesting.
%X Oh, there are a few significant typos: LINPAK is really LINPACK. Etc.
%X (JLH & DP) The authors discuss the basic foundations, applications,
programming models, language and operating system issues and a wide
variety of architectural approaches.  The discussions of parallel
architectures include a section that describes the key concepts within
a particular approach.

%A C. L. Seitz
%T The Cosmic Cube
%J Communications of the ACM
%V 28
%N 1
%D January 1985
%P 22-33
%r Hm83
%d June 1984
%K enm, dmp, jlh, dp, j\-lb,
%K CR Categories and Subject Descriptors: C.1.2 [Processor Architectures]:
Multiple Data Stream Architectures (Multiprocessors);
C.5.4 [Computer System Implementation]: VLSI Systems;
D.1.2 [Programming Techniques]: Concurrent Programming;
D.4.1 [Operating Systems]: Process Management
General terms: Algorithms, Design, Experimentation
Additional Key Words and Phrases: highly concurrent computing,
message-passing architectures, message-based operating systems,
process programming, object-oriented programming, VLSI systems,
homogeneous machine, hypercube, C^3P,
grequired, Rcccp, Rhighnam,
%X Excellent survey of this project.
Reproduced in "Parallel Computing: Theory and Comparisons,"
by G. Jack Lipovski and Miroslaw Malek,
Wiley-Interscience, New York, 1987, pp. 295-311, appendix E.
%X * Brief survey of the cosmic cube, and its hardware
%X (JLH & DP) This is a good discussion of the Caltech approach, which
embodies the ideas behind several of these machines (often called hypercubes).
The work at Caltech is the basis for the machines at JPL and the Intel iPSC,
as well as closely related to the NCUBE design.  Another paper by Seitz
on this same topic appears in the Dec. 1984 issue of IEEE Trans.
on Computers.
%X Literature search yielded:
1450906 C85023854
The Cosmic Cube (Concurrent Computing)
Seitz, C.L.
Author Affil: Dept. of Comput. Sci., California Inst. of Technol.,
Pasadena, CA, USA
Source: Commun. ACM (USA) Vol.28, No.1, Pp.: 22-33
Publication Year: Jan. 1985
CODEN: CACMA2   ISSN: 0001-0782
U. S. Copyright Clearance Center Code: 0001-0782/85/0100-002275c
Treatment: Practical;
Document Type: Journal Paper
Languages: English
(14 Refs)
Abstract: Sixty-four small computers are connected by a network of
point-to-point communication channels in the plan of a binary 6-cube. This
Cosmic Cube computer is a hardware simulation of a future VLSI
implementation that will consist of single-chip nodes. The machine offers
high degrees of concurrency in applications and suggests that future
machines with thousands of nodes are both feasible and attractive. It uses
message switching instead of shared variables for communicating between
concurrent processes.
Descriptors: multiprocessing systems; message switching
Identifiers: message passing architectures; process programming; VLSI
systems; point-to-point communication channels; binary 6-cube; Cosmic Cube;
hardware simulation; VLSI implementation; single-chip nodes; concurrency
Class codes: C5440; C5620
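%X An editorial sketch (not from the paper) of the node addressing implied
by a binary 6-cube: each of the 64 nodes has a 6-bit address, and its six
channel neighbors differ from it in exactly one bit.
    #include <stdio.h>
    #define DIM 6                  /* binary 6-cube: 2^6 = 64 nodes */
    int main(void)
    {
        int node = 21;             /* any node id in 0..63 */
        int k;
        /* the neighbor along dimension k is reached by flipping bit k */
        for (k = 0; k < DIM; k++)
            printf("dim %d neighbor of node %d is node %d\n",
                   k, node, node ^ (1 << k));
        return 0;
    }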

%A M. Ben-Ari
%T Principles of Concurrent Programming
%C Englewood Cliffs, NJ
%I Prentice-Hall
%D 1986
%$ 29
%O ISBN 0-13-711821-X.
%K book, text,
%K grequired91,
%K sc, +3 votes posted from c.e. discussion.
%X The text covers all the significant paradigms and basic concurrency
concepts with examples in a pseudo language similar to C.
Syllabus for our course includes:
1. concurrent I/O and interrupt processing
2. concurrent programming abstractions
3. intro. to LSC and transputers
4. mutual exclusion; problems and principles
5. semaphores and monitors
6. synchronization
7. Linda-style message posting
8. performance monitoring and load balancing
%X This is the second edition of his
earlier book and covers more material, e.g. distributed systems,
time analysis for real-time systems, ...
It may, however, be too introductory for someone who has had a good O/S
course (of course, in O/S I tend to teach concurrent programming, so ...).
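%X For readers new to items 4 and 5 of the syllabus above, a minimal
mutual-exclusion sketch in C with POSIX semaphores (an editorial
illustration; the book itself uses a C-like pseudo-language):
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>
    sem_t mutex;                   /* binary semaphore guarding the counter */
    long counter = 0;
    void *worker(void *arg)
    {
        int i;
        for (i = 0; i < 100000; i++) {
            sem_wait(&mutex);      /* P: enter critical section */
            counter++;
            sem_post(&mutex);      /* V: leave critical section */
        }
        return NULL;
    }
    int main(void)                 /* compile with -lpthread */
    {
        pthread_t t1, t2;
        sem_init(&mutex, 0, 1);    /* initial value 1 makes it a mutex */
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* 200000, never less */
        return 0;
    }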

%A George H. Barnes
%A Richard M. Brown
%A Maso Kato
%A David J. Kuck
%A Daniel L. Slotnick
%A Richard A. Stokes
%T The ILLIAC IV Computer
%J IEEE Transactions on Computers
%V C-17
%N 8
%D August 1968
%P 746-757
%K grequired91,
array, computer structures, look-ahead, machine language, parallel processing,
speed, thin-film memory, multiprocessors,
Rmaeder biblio: parallel hardware and devices,
%K architecture, ILLIAC-IV, SIMD, Rhighnam,
%K rwa,
%K ag, jlh, dp, j\-lb,
%X This was the original paper on the ILLIAC IV when it was proposed as
a 256-processing-element machine, a follow-on to SOLOMON.  It was a
very ambitious design.
%X Contains ILLIAC IV assembler (among other things).
%X (JLH & DP) This is the original paper on the ILLIAC IV hardware;
some aspects of the machine (especially the memory system) changed
subsequently.  A later paper, cited as Bouknight, et al. (1972),
provides a more accurate description of the real hardware.
%X (J-LB) paper of historical significance.

%A Edward Gehringer
%A Daniel P. Siewiorek
%A Zary Segall
%Z CMU
%T Parallel Processing: The Cm* Experience
%I Digital Press
%C Boston, MA
%D 1987
%K book, text, multiprocessor,
%K enm, ag, jlh, dp,
%K grequired91,
%$ 42
%X Looks okay!
%X [Extract from inside front cover]
... a comprehensive report of the important parallel-processing
research carried out on Cm* at Carnegie-Mellon University. Cm* is a
multiprocessing system consisting of 50 tightly coupled processors and
has been in operation since the mid-1970s. Two operating
systems, StarOS and Medusa, are part of its development, along with a
vast number of applications.
%X (JLH & DP) This book  reviews the Cm* experience.  The book
discusses hardware issues, operating system strategies,
programming systems, and includes an extensive discussion of the
experience with over 20 applications on Cm*.

%A J. R. Gurd
%A C. C. Kirkham
%A I. Watson
%T The Manchester Prototype Dataflow Computer
%J Communications of the ACM
%V 28
%N 1
%D January 1985
%P 34-52
%K CR Categories and Subject Descriptors:
C.1.3 [Processor Architectures]: Other Architecture Styles;
C.4 [ Performance of Systems]; D.3.2 [Programming Languages]: Language
Classifications
General Terms: Design, Languages, Performance
Additional Key Words and Phrases: tagged-token dataflow,
single assignment programming, SISAL, parallel computation,
grequired91,
%K enm, dmp, jlh, dp,
%X A special issue on Computer Architecture.  Mentions SISAL, but not LLNL.
Using tagged-token dataflow, the Manchester processor is running
reasonably large user programs at maximum rates of between 1 and 2 MIPS.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 111-129.
%X (JLH & DP) This paper discusses the machine, its software, and
evaluates performance.

%A Geoffrey C. Fox
%A Mark A. Johnson
%A Gregory Lyzenga
%A Steve W. Otto
%A John Salmon
%A David Walker
%Z Caltech
%T Solving Problems on Concurrent Processors
%V 1, General Techniques and Regular Problems
%I Prentice-Hall
%C Englewood Cliffs, NJ
%D 1988
%K book, text, hypercubes, CCCP, MIMD, parallel programming,
communication, applications, physics,
%K bb, jlh, dp,
%K grequired91,
%K suggested supplemental ref by jh and dp
%O ISBN 13-823022-6 (HB), 13-823469-8 (PB)
%X Interesting book.  Given out for free at Supercomputing'89.
%X My Bible of Distributed Parallel Computing; even if you are not using
Express it is a wonderful book to have !

%A Michael Wolfe
%T Optimizing Supercompilers for Supercomputers
%S Pitman Research Monographs in Parallel and Distributed Computing
%I MIT Press
%C Cambridge, MA
%D 1989
%K book, text,
%K grequired91,
%K cb@uk, dmp, lls,
%X Good technical intro to dependence analysis, based on Wolfe's PhD Thesis.

%A Robert H. Kuhn
%A David A. Padua, eds.
%T Tutorial on Parallel Processing
%I IEEE
%D August 1981
%K bmiya, book, text, survey,
%K grequired91,
%K enm, ag, fpst,
%X This is a collection of noted papers on the subject, collected for
the tutorial given at the 10th conference (1981) on Parallel Processing.
It eases the search problem for many of the obscure papers.
Some of these papers might not be considered academic, others are
applications oriented.  Data flow is given short coverage.  Still, a
quick source for someone getting into the field.  Wherever possible,
papers in this bibliography are noted as being in this text.
%X Check on literature search:
Tutorial on parallel processing; initially presented at the Tenth
International Conference on Parallel Processing, August 25-28, 1981,
Bellaire, Michigan / [edited by] Robert H. Kuhn, David A. Padua
Kuhn, Robert H; Padua, David A
CONFERENCE: International Conference on Parallel Processing (10th : 1981
: Bellaire, Mich.)
[Los Angeles? CA]: IEEE Computer Society Press : Order from IEEE
Computer Society, v, 498 p. : ill. ; 28 cm.
PUBLICATION DATE(S): 1981
PLACE OF PUBLICATION: California
LC CALL NO.: QA76.6 .I548 1981  DEWEY CALL NO.: 001.64
RECORD STATUS: New record
BIBLIOGRAPHIC LEVEL: Monograph
LANGUAGE: English
ILLUSTRATIONS: Illustrations
NOTES:
Bibliography: p. 497-498.
DESCRIPTORS:
Parallel processing (Electronic computers) -- Congresses
72  (COMPUTERS & DATA PROCESSING)

===

%A Gul A. Agha
%Z U. of Mich.
%T Actors: A Model of Concurrent Computation in Distributed Systems
%I MIT Press
%C Cambridge, MA
%D 1986
%K book, text, communication, evaluation, abstraction,
distributed computing, agents, grecommended91,
%K hcc, fpst,
%X See also his PhD thesis of the same title.
%X Now considered a classical text.

%A Gene M. Amdahl
%T Validity of the single processor approach to achieving large scale computing
capabilities
%J AFIPS Proc. of the SJCC
%V 31
%D 1967
%P 483-485
%K grecommended91,
%K bmiya,
%K ak,
%X should be reread every week
%X The well-known (infamous?) Amdahl's law: if a fraction x of an
algorithm is not parallelizable, then the maximum speedup is 1/x.
Limits of vectorization.
Arthur Goldberg @cs.ucla.edu
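%X In symbols (an editorial note, not from the paper): with serial
fraction x and N processors,
    S(N) = 1 / (x + (1 - x)/N)  -->  1/x  as N -> infinity,
so x = 0.05 caps the speedup at 20 no matter how many processors are applied.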

%A Gregory R. Andrews
%A Fred B. Schneider
%T Concepts and Notations for Concurrent Programming
%J Computing Surveys
%V 15
%N 1
%P 3-43
%O 133 REFS. Treatment BIBLIOGRAPHIC SURVEY, PRACTICAL
%D March 1983
%i University of Arizona, Tucson
%r CS Dept. TR 82-12
%d Sept. 1982.
%K grecommended91,
parallel processing programming
OS parallel processing concurrent programming language notations
processes communication synchronization primitives
%K bmiya,
%K fpst,
%X This is not a book, but probably the best place to start in understanding
concurrency.  Well written, though a bit dated.
%X Literature search yields:
01391537   E.I. Monthly No: EI8309072043   E.I. Yearly No: EI83016908
Title: Concepts And Notations For Concurrent Programming.
Author: Andrews, Gregory R.; Schneider, Fred B.
Corporate Source: Univ of Arizona, Dep of Computer Science, Tucson, Ariz, USA
Source: Computing Surveys v 15 n 1 Mar 1983 p 3-43
Publication Year: 1983
CODEN: CMSVAN   ISSN: 0010-4892
Language: ENGLISH
Journal Announcement: 8309
Abstract: This paper identifies the major concepts of concurrent
programming and describes some of the more important language notations for
writing concurrent programs. The roles of processes, communication, and
synchronization are discussed. Language notations for expressing concurrent
execution and for specifying process interaction are surveyed.
Synchronization primitives based on shared variables and on message passing
are described. Finally, three general classes of concurrent programming
languages are identified and compared. 133 refs.
Descriptors: *COMPUTER PROGRAMMING
Classification Codes: 723  (Computer Software)
72  (COMPUTERS & DATA PROCESSING)

%A James Archibald
%A Jean-Loup Baer
%T Cache Coherence Protocols:
Evaluation Using a Multiprocessor Simulation Model
%J ACM Transactions on Computer Systems
%V 4
%N 4
%D November 1986
%P 273-298
%K j\-lb,
%K grecommended91,

%A Arvind
%A David E. Culler
%T Dataflow Architectures
%Z MIT
%J Annual Reviews in Computer Science
%V 1
%P 225-53
%D 1986
%r TM-294
%d February 1986
%K grecommended91,
%K jlh, dp, j\-lb,
%X Not detailed, but a reasonably current survey paper on data flow.
Includes status information on American, British, French, and Japanese
dataflow projects such as the SIGMA-1 (Japan), Manchester (UK), and so
forth.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 79-101.
%X (JLH & DP) This paper discusses the basic ideas behind dataflow machines.
The basic concepts of dataflow were espoused by Dennis: Dennis, J. [1980].
"Data Flow Supercomputers," Computer, vol. 13, no. 11 (November), pages 48-56.

%A Tom Axford
%T Concurrent Programming: Fundamental Techniques for
Real-Time and Parallel Software Design
%I John Wiley
%S Series in Parallel Computing
%D 1989
%K book, text,
%K grecommended91,
%O ISBN 0 471 92303 6
%K js, (2 c.e. votes),
%X more about software techniques for concurrency than about parallel
programming, but still useful.
%X ...quite happy with it. ... primary language used was Modula-2 ...
... concepts, architectures, and so forth. ... used transputers as
an inexpensive platform for parallel computing. We used C for
the transputer programming.

%A J. Backus
%T Can Programming be Liberated from the von Neumann Style?
A Functional Style and its Algebra of Programs
%J Communications of the ACM
%V 21
%N 8
%D August 1978
%P 613-641
%K grecommended91, Turing award lecture,
Key words and phrases: functional programming, algebra of programs,
combining forms, programming languages, von Neumann computers,
von Neumann languages, models of computing systems,
applicative computing systems, program transformation, program correctness,
program termination, metacomposition,
CR categories: 4.20, 4.29, 5.20, 5.24, 5.26,
%K Rhighnam, theory
%K ak,
%X Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 215-243.

%A J. L. Baer
%T A Survey of Some Theoretical Aspects of Multiprocessing
%J Computing Surveys
%V 5
%N 1
%D March 1973
%P 31-80
%K multiprocessing, mutual exclusion, semaphores, automatic detection of
parallelism, graph models, Petri nets, flow graph schemata, scheduling,
array processors, pipe-line computers
CR categories: 6.0, 8.1, 4.32, 5.24
maeder biblio: general, concepts, parallel programming, parallel architecture,
%K btartar
%K grecommended91,
%K ak,

%A Howard Barringer
%T A Survey of Verification Techniques for Parallel Programs
%I Springer-Verlag
%S Lecture Notes in Computer Science
%V 191
%C Berlin
%D 1985
%$ 11.25
%K Book, text,
%K grecommended91,
%K fpst,
%X For the theoretical at heart. Gives insights into what is so hard about
distributed and parallel processing. Compares many different approaches.

%A K. E. Batcher
%T STARAN Parallel Processor System Hardware
%J Proceedings AFIPS National Computer Conference
%D 1974
%P 405-410
%K grecommended91
%K btartar
%K Rhighnam, architecture, associative,
%K ag,
%X This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
%X Literature search provides:
00446338   E.I. Monthly No: EI7504022393   E.I. Yearly No: EI75013769
Title: Staran Parallel Processor System Hardware.
Author: Batcher, Kenneth E.
Corporate Source: Goodyear Aerosp Corp, Akron, Ohio
Source:  AFIPS Conference Proceedings v 43, 1974, for Meet, Chicago, Ill,
May 6-10 1974, p 405-410
Publication Year: 1974
CODEN: AFPGBT   ISSN: 0095-6880
Language: ENGLISH
Journal Announcement: 7504
Abstract: The parallel processing capability of STARAN resides in n array
modules (n LESS THAN EQUIVALENT TO 32). Each array module contains 256
small processing elements (PE's). They communicate with a multi-dimensional
access (MDA) memory through a "flip" network, which can permute a set of
operands to allow inter-PE communication. This paper deals with the MDA
memories, the STARAN array modules, the other elements of STARAN, and the
results of certain application studies. 4 refs.
Descriptors: *Computer Systems, Digital--*Parallel Processing
Identifiers: Staran Processors
Classification Codes: 722  (Computer Hardware)
72  (Computers & Data Processing)

%A John Beetem
%A Monty Denneau
%A Don Weingarten
%Z IBM TJW, Yorktown Heights
%T The GF11 Supercomputer
%J Proceedings of the 12th International Symposium on Computer Architecture
%I IEEE
%D June 1985
%C Boston, MA
%P 108-115
%K Quantum chromodynamics,
%K Special Purpose Parallel Processors, IBM
%r RC 10852
%K grecommended91,
%K jlh, dp,
%K suggested supplemental ref by jh and dp
%X 576 processors, modified SIMD, 2 MB memory per processor at 20 MFLOPS.
Memphis Switch, 50 ns. 1,152 Macho Bytes, 11,520 Macho FLOPS.

%A John Beetem
%A Monty Denneau
%A Don Weingarten
%Z IBM TJW, Yorktown Heights
%T The GF11 Supercomputer
%B Experimental Parallel Computing Architectures
%E J. J. Dongarra
%S Special Topics in Supercomputing
%V 1
%I Elsevier Science Publishers B.V. (North-Holland)
%C Amsterdam
%D 1987
%P 255-298
%K Quantum chromodynamics,
%K grecommended91,
%K jlh, dp,
%K suggested supplemental ref by jh and dp
%X 576 processors, modified SIMD, 2 MB memory per processor at 20 MFLOPS.
Memphis Switch, 50 ns. 1,152 Macho Bytes, 11,520 Macho FLOPS.
See also the Proceedings of the 12th International Symposium on
Computer Architecture, IEEE, June 1985, Boston, MA, pages 108-115
or see the IBM TR RC 10852 from TJW.

%A Dimitri P. Bertsekas
%A John N. Tsitsiklis
%T Parallel and Distributed Computation: Numerical Methods
%I Prentice Hall
%C Englewood Cliffs NJ
%D 1989
%K book, text,
%K grecommended91,
%K ab,
%O ISBN 0-13-648700-9
%X Received one verbal view that this book isn't great.
It was that person's opinion that the authors didn't implement
important details of algorithms (like boundary conditions).
The view is unconfirmed, and requires further investigation.

%A W. J. Bouknight
%A Stewart A. Denenberg
%A David E. McIntyre
%A J. M. Randall
%A Ahmed H. Sameh
%A Daniel L. Slotnick
%T The ILLIAC IV System
%J Proceedings of the IEEE
%V 60
%N 4
%D April 1972
%P 369-388
%K grecommended91, multiprocessors, parallel processing, SIMD,
%K btartar
%K Rhighnam, architecture, language,
%K ag,
%X This is the "what we did" paper in contrast to the design paper
Barnes et al in 1968.
A subsetted version of this paper appears in
"Computer Structures: Principles and Examples" by
Daniel P. Siewiorek, C. Gordon Bell, and Allen Newell,
McGraw-Hill, 1982, pp. 306-316.

%A Nicholas Carriero
%A David Gelernter
%T How to Write Parallel Programs: A Guide to the Perplexed
%J ACM Computing Surveys
%V 21
%N 3
%D September 1989
%P 323-357
%d April 1988
%r YALEU/DCS/RR-628
%K special issue on programming language paradigms,
%K Categories and Subject Descriptors:
D.1.3 [Programming Techniques]: Concurrent Programming;
D.3.2 [Programming Languages]: Language classifications -
parallel languages; D.3.3 [Programming Languages]:
Language constructs - concurrent programming structures;
E.1.m [Data Structures]: Miscellaneous -
distributed data structures; live data structures;
General Terms: Algorithms, Program Design, Languages,
Additional Key Words and Phrases: Linda,
parallel programming methodology, parallelism,
%K grecommended91,
%K hcc, ag,
%X From page 326:
It is nonetheless a subtle but essential point that these approaches
represent three clearly separate ways of thinking about the problem:
Result parallelism focuses on the shape of the finished product;
specialist parallelism focuses on the makeup of the work crew; and
agenda parallelism focuses on the list of tasks to be performed.
Also the terms: message-passing, distributed data structures or
live data structures.  Notes that it does not deal with data parallelism
(ala CM) nor speculative parallelism (OR-parallelism).  Tries to be
practical, but it does admit distributed programs are harder and more complex.
%X The authors distinguish between three classes of parallelism,
result, agenda, and specialist,
and between three roughly corresponding methods for implementation,
live data structures, distributed (shared) data structures, and
message passing systems.  The Linda model is then introduced and related to
each class and method; it serves as a kind of universal model for
describing the essential parallelism, as opposed to sequential processes.
An example is treated in some detail.
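%X The agenda-parallel "bag of tasks" structure, sketched in the
C-Linda notation the paper uses (an editorial example; it needs a
Linda preprocessor and runtime to run, and compute_task() is a
hypothetical application routine):
    /* master: drop one task tuple per work item into tuple space */
    for (i = 0; i < ntasks; i++)
        out("task", i);
    /* each worker: repeatedly withdraw a task, deposit a result */
    for (;;) {
        int id, r;
        in("task", ?id);       /* blocks until some task tuple exists */
        r = compute_task(id);
        out("result", id, r);
    }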

%A Nicholas Carriero
%A David Gelernter
%T How to Write Parallel Programs: A First Course
%I MIT Press
%C Cambridge, MA
%D 1990
%K book, text,
%K grecommended91,
%K hcc,

%A K. Mani Chandy
%A Jayadev Misra
%Z University of Texas--Austin
%T Parallel Program Design: A Foundation
%I Addison-Wesley
%D 1988
%K book, text, UNITY,
%K grecommended91,
%K hcc, fpst,
%O ISBN 0-201-05866-9
%X Requires a more in-depth review.
%X Theoretically useful.

%A Robert P. Colwell
%A Robert P. Nix
%A John J. O'Donnell
%A David B. Papworth
%A Paul K. Rodman
%Z Multiflow
%T A VLIW Architecture for a Trace Scheduling Compiler
%J Proceedings Second International Conference on Architectural
Support for Programming Languages and Operating Systems
(ASPLOS II)
%J Computer Architecture News
%V 15
%N 5
%J Operating Systems Review
%V 21
%N 4
%J SIGPLAN Notices
%V 22
%N 10
%I ACM
%C Palo Alto, CA
%D October 1987
%P 180-192
%K Trace scheduling, VLIW, very long instruction word,
%K grecommended91,
%K j\-lb,

%A William J. Dally
%A Charles L. Seitz
%Z Caltech
%T Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
%J IEEE Transactions on Computers
%V C-36
%N 5
%D May 1987
%P 547-553
%r TR 5231:TR:86
%d June 1986
%K Caltech Cosmic Cube, hypercube, C^3P, wormhole,
%K grecommended91, Rcccp,
%K dmp, j-lb,

%A F. Darema-Rogers
%A G. F. Pfister
%A K. So
%T Memory Access Patterns of Parallel Scientific Programs
%J Proc. 1987 ACM/SIGMETRICS Conf. in Meas. and Mod. of Comp. Syst.
%V 15
%N 1
%D May 1987
%C Banff, Alberta, Canada
%P 46-57
%K parallel systems, RP3,
%K grecommended91,
%K jlh, dp,
%X (JLH & DP) Reports results of measurements of some applications for
the RP3.

%A Jarek Deminet
%T Experience with Multiprocessor Algorithms
%J IEEE Transactions on Computers
%V C-31
%N 4
%P 278-288
%D April 1982
%K bmiya, Cm*, parallel algorithms, applications,
%K grecommended91,
%K jlh, dp,
%X This paper reports experience using Cm* for several applications.
There are other references available on experience with Cm*; these
are published as CMU technical reports or theses.  Also, see the book
by Gehringer, et al.

%A K. P. Eswaran
%A J. N. Gray
%A R. A. Lorie
%A I. L. Traiger
%T The notions of consistency and predicate locks in a database system
%J Communications of the ACM
%V 19
%N 11
%D Nov. 1976
%P 624-633
%K Rdpsdis.bib Rsingh
%K grecommended91,
%K dmp,

%A Jerome A. Feldman
%A Patrick J. Hayes
%A David E. Rumelhart, eds.
%T Parallel Distributed Processing, Explorations in the Microstructure
of Cognition
%V 1, Foundations
%I MIT Press
%D 1986
%K AT15 AI04 AI03 AI08
%K The PDP Perspective, book, text,
%K grecommended91,
%K jb,
%X Two volume set for $35.95.
%X AI meets parallelism and neural nets.
(A bible of sorts)

%A Jerome A. Feldman
%A Patrick J. Hayes
%A David E. Rumelhart, eds.
%T Parallel Distributed Processing, Explorations in the Microstructure
of Cognition
%I MIT Press
%V 2, Psychological and Biological Models
%D 1986
%K AT15 AI04 AI03 AI08
%K The PDP Perspective
%K book, text,
%K grecommended91,
%K jb,
%X Two volume set for $35.95.
%X AI meets parallelism and neural nets.
(A bible of sorts)

%A M. J. Flynn
%T Very High-Speed Computing Systems
%J Proceedings of the IEEE
%V 54
%D December 1966
%P 1901-1909
%K maeder biblio: parallel architectures,
%K grecommended91,
%K j\-lb,
%X The original paper which classified computer systems into instruction
and data stream dimensions (SISD, SIMD, etc.).
%K btartar

%A Geoffrey C. Fox
%T Concurrent Processing for Scientific Calculations
%J Digest of Papers COMPCON, Spring 84
%I IEEE
%D Feb. 1984
%P 70-73
%r Hm62
%K super scientific computers, hypercube, Rcccp,
%K grecommended91,
%K jlh, dp,
%K suggested supplemental ref by jh and dp
%X An introduction to the current 64-PE Caltech hypercube.  Based
on the dissertation by Lang (Caltech 1982) on the `Homogeneous machine.'

%A Samuel H. Fuller
%A John K. Ousterhout
%A Levy Raskin
%A Paul I. Rubinfeld
%A Pradeep J. Sindhu
%A Richard J. Swan
%T Multi-Microprocessors: An Overview and Working Example
%J Proceedings of the IEEE
%V 66
%N 2
%D February 1978
%P 216-228
%K multiprocessor architectures and operating systems
%K bsatya
%K grecommended91,
%K jlh, dp,
%X 
Reprinted in "Tutorial: Distributed Processing," IEEE,
compiled by Burt H. Liebowitz and John H. Carson, 1981, 3rd edition.
%X (JLH & DP) Cm* was the first multiprocessor based on microprocessor
technology.  Despite the limited success of the machine, its impact and ideas
are present in many machines being built today including the
Encore Ultramax (Multimax) and Stanford's DASH.

%A D. D. Gajski
%A D. A. Padua
%A D. J. Kuck
%A R. H. Kuhn
%T A Second Opinion on Data Flow Machines and Languages
%J Computer
%V 15
%N 2
%D Feb. 1982
%P 15-25
%K grecommended91, multiprocessing,
%K bmiya,
%K meb, jlh, dp,
%X (SKS) or why I'm afraid people won't use FORTRAN.
This paper should only be read (by beginners) in conjunction with a
pro-dataflow paper for balance: maybe McGraw's "Physics Today" May 1984.
Also reprinted in the text compiled by Kai Hwang:
"Supercomputers: Design and Application," IEEE, 1984.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 165-176.
%X * Due to their simplicity and strong appeal to intuition, data flow
techniques attract a great deal of attention.  Other alternatives,
however, offer more hope for the future.
%X (JLH & DP) The most complete critique of the dataflow approach.

%A Daniel Gajski
%A Jih-Kwon Peir
%T Essential Issues in Multiprocessor Systems
%J Computer
%I IEEE
%V 18
%N 6
%D June 1985
%P 9-27
%K parallel processor vector shared memory message passing tightly
loosely coupled dataflow partitioning cedar csp occam hep
synchronization grecommended91
%K ak,
%X The performance of a multiprocessor system depends on how it
handles the key problems of control, partitioning, scheduling,
synchronization, and memory access.
On a second look, this paper has some nice ideas.  (rec added to %K).
%X Examines actual and proposed machines from the viewpoint of the
authors' key multiprocessing problems: control, partitioning,
scheduling, synchronization, and memory access.
%X Detailed classification scheme based upon control model of computation,
partitioning, scheduling, synchronization, memory access.
Classification is illustrated with many examples, including a summary table
for the Cray-1, Arvind's dataflow machine, HEP, NYU Ultracomputer, and Cedar.
Reproduced in "Computer Architecture," D. D. Gajski,
V. M. Milutinovic, H. J. Siegel, and B. P. Furht, eds., IEEE, 1987,
pp. 115-133.

%A Narain Gehani
%A Andrew D. McGettrick, eds.
%T Concurrent Programming
%I Addison-Wesley
%D 1988
%O ISBN 0-201-17435-9
%K book, text, language,
%K grecommended91,
%K ps,

%A W. Morven Gentleman
%T Message Passing between Sequential Processes: the Reply Primitive
and the Administrator Concept
%J Software Practice and Experience
%V 11
%N 5
%D 1981
%P 435-466
%K grecommended91,
%K cc,
%X The quintessential paper on how to program with message passing.
It is very readable, and provides an excellent
introduction on how to use a message-passing environment effectively.

%A M. J. Gonzalez, Jr.
%T Deterministic processor scheduling
%J Computing Surveys
%V 9
%N 3
%D September 1977
%P 173-204
%K scheduling
%K bsatya
%K grecommended91,
%K dmp, ak,
%X
References are classified into various types (single, dual, multiple,
and flow shop) at end of paper.

%A Allan Gottlieb
%A Ralph Grishman
%A Clyde P. Kruskal
%A Kevin P. McAuliffe
%A Larry Rudolph
%A Marc Snir
%T The NYU Ultracomputer -- Designing an MIMD Shared Memory Parallel Computer
%J IEEE Transactions on Computers
%V C-32
%N 2
%D February 1983
%P 175-190
%K multiprocessors, parallel processing, Ultracomputer,
computer architecture, fetch-and-add, MIMD, Omega-network,
parallel computer, shared memory, systolic queues, VLSI,
%K grecommended91,
%K jlh, dp, j\-lb,
%X describes the design of the NYU Ultracomputer, its synchronization,
the omega network. Analyzes the projected performance of the Ultracomputer.
Reproduced in "Computer Architecture," D. D. Gajski,
V. M. Milutinovic, H. J. Siegel, and B. P. Furht, eds., IEEE, 1987,
pp. 471-485.
%X This paper represents an architectural approach that uses shared memory
and caches without cache-coherence.  Several machines have explored
this approach including IBM's RP3, the U. of Illinois CEDAR, and a number
of early multiprocessors (e.g., C.mmp).
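%X The flavor of fetch-and-add (an editorial sketch in C; GCC's
__atomic builtin stands in for the Ultracomputer's combining network,
which performs the same operation without serializing at the memory
module):
    #include <stdio.h>
    int tail = 0;                  /* shared queue index */
    int queue[1024];
    void enqueue(int item)         /* concurrent callers get distinct slots */
    {
        int slot = __atomic_fetch_add(&tail, 1, __ATOMIC_SEQ_CST);
        queue[slot] = item;
    }
    int main(void)
    {
        enqueue(42);
        enqueue(7);
        printf("%d %d (tail = %d)\n", queue[0], queue[1], tail);
        return 0;
    }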

%A R. L. Graham
%T Bounds on multiprocessing anomalies and related packing algorithms
%J Proc. AFIPS SJCC
%V 40
%D 1972
%K theoretical results
%K bsatya
%K grecommended91,
%K ak,

%A John L. Gustafson
%A Gary R. Montry
%A Robert E. Benner
%Z Sandia National Labs.
%T Development of Parallel Methods for a 1024-Processor Hypercube
%J SIAM Journal on Scientific and Statistical Computing
%V 9
%N 4
%D July 1988
%K fluid dynamics, hypercubes, MIMD machines, multiprocessor performance,
parallel computing, structural analysis, supercomputing, wave mechanics,
grecommended91,
%K jlh, dp,
%X Introduces concept of operation efficiency, scaled speed-up.
Also covers communication cost, beam strain analysis, and a bit on
benchmarking.  Winner of 1988 Bell and Karp Prizes.
%X (JLH & DP) This paper reports interesting results in using a
large-scale NCUBE.  The authors won the Gordon Bell prize with their work.
They also suggest the idea of problem scaling to overcome the limitations of
sequential portions of an application.
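%X Scaled speedup in symbols (an editorial note): if s is the serial
fraction measured on the parallel run itself, then on N processors
    scaled speedup = s + (1 - s) * N = N - (N - 1) * s,
so s = 0.01 on N = 1024 still gives about 1014, where the fixed-size
Amdahl form 1/(s + (1 - s)/N) predicts only about 91.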

%A Robert H. Halstead, Jr.
%T Parallel Symbolic Computing
%J Computer
%V 19
%N 8
%P 35-43
%D August 1986
%K Special issue: domesticating parallelism, grecommended91, LISP,
%K hcc,

%A Matthew Hennessy
%T Algebraic Theory of Processes
%I MIT Press
%D 1988
%K grecommended91,
%K fpst,
%X An attempt to look at problem of concurrency from process/algebra view.

%A W. Daniel Hillis
%T The Connection Machine
%I MIT Press
%C Cambridge, MA
%D 1985
%K book,
grecommended91, PhD thesis,
%K j\-lb,
%X Has a chapter on why computer science is no good.
%X Patent 4,709,327, Connection Machine, 24 Nov 87 (individuals)
"Parallel Processor / Memory Circuit", W. Daniel Hillis et al.
This looks like the meat of the Connection Machine design.
It probably has lots of stuff that was considered proprietary
until the patent.

%A C. A. R. Hoare
%T Communicating Sequential Processes
%J Communications of the ACM
%V 21
%N 8
%P 666-677
%D August 1978
%K bhibbard
%K grecommended91,
%K hcc, ak,
%K programming, programming languages, programming primitives,
program structures, parallel programming, concurrency, input, output,
guarded commands, nondeterminacy, coroutines, procedures, multiple entries,
multiple exits, classes, data representations, recursion,
conditional critical regions, monitors, iterative arrays, CSP,
CR categories: 4.20, 4.22, 4.32
maeder biblio: synchronisation and concurrency in processes,
parallel programming,
%X This paper is now expanded into an excellent book by Hoare,
published by Prentice-Hall.
This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
Reproduced in "Distributed Computing: Concepts and Implementations" edited by
McEntire, O'Reilly and Larson, IEEE, 1984.
%X Somewhat dated.

%A C. A. R. Hoare
%T Communicating Sequential Processes
%I Prentice-Hall
%C Englewood Cliffs, NJ
%D 1985
%O ISBN 0-13-153271-5 & 0-13-153289-8
%K CSP,
%K grecommended91,
%K hcc, fpst, jb,
%X A better book than the original CSP papers.  Hoare comes down to
earth and tries to give concrete examples of CSP notation.  Still
has some problems.
%X Somewhat esoteric.
%X Must reading for those interested in distributed processing. High
level discussions of various operators one might think about in the
message passing realm. Discusses failures.
%X This defines CSP, upon which occam is based.  REAL parallelism here!
Very theoretical.  Must-read for serious parallelism students!
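%X For flavor, an editorial sketch in CSP-style notation (not quoted
from the book): a one-place buffer copying characters from channel
west to channel east is
    COPY = west?c -> east!c -> COPY
where ? is input, ! is output, and -> is event prefixing; chaining two
such processes yields a two-place buffer.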

%A R. W. Hockney
%A C. R. Jesshope
%T Parallel Computers 2: Architecture, Programming, and Algorithms,
2nd ed.
%C Pennsylvania
%I IOP Publishing Ltd.
%D 1988
%O ISBN 0-85274-811-6
%K book, text,
%K grecommended91,
%K meb,
%X World's most expensive paperback; matched only by Hockney's book on
particle simulations; both worth the price.
%X This book tends to the more mainframe oriented machines and bigger
supercomputers.  Killer micros get a little coverage (CM).  Shows
its UK bias (DAP).

%A R. Michael Hord
%T The ILLIAC IV: The First Supercomputer
%I Computer Science Press
%D 1982
%K grecommended91, book, text,
%K bmiya, Rhighnam
%K jlh, dp,
%K suggested supplemental reference by jh and dp
%K analysis, algorithm, architecture, ILLIAC-IV, SIMD, software, survey
%O 510.7809
%X A collection of papers dealing with the ILLIAC IV.  These papers include
reminiscences and applications on the ILLIAC.
It is slightly apologetic in tone.
%X Describes in detail the background of the Illiac IV,
programming and software tools, and applications. The chapters are a
little disjointed, and the instruction set is not well explained or motivated.

%A Paul Hudak
%T Para-Functional Programming
%J Computer
%V 19
%N 8
%P 60-69
%D August 1986
%K Special issue: domesticating parallelism
%K grecommended91,
%K hcc,

%A Paul Hudak
%Z Yale
%T Conception, Evolution, and Application of Functional
Programming Languages
%J ACM Computing Surveys
%V 21
%N 3
%D September 1989
%P 359-411
%K special issue on programming language paradigms,
%K Categories and Subject Descriptors:
D.1.1 [Programming Techniques]:
Applicative (Functional) Programming;
D.3.2 [Programming Languages]: Language classifications -
applicative languages; data-flow languages;
non-procedural languages; very-high-level languages;
F.4.1 [Mathematical Logic and Formal Languages]:
Mathematical Logic - lambda calculus and related systems;
K.2 [History of Computing]: software
General Terms: Languages,
Additional Key Words and Phrases: Data abstraction,
higher-order functions, lazy evaluation, referential transparency,
types, Lambda Calculus, Lisp, Iswim, APL, FP, FL, ML, SASL, KRC, Miranda,
Haskell, Hope, denotative [declarative] language,
%K grecommended91,
%K ag,
%X This is the second paper in the special issue which has a section on
non-determinism [along with Bal, et al] which begins with a statement
which would sound bizarre to non-programmers or those not familiar
with the issues of determinacy.

%A Kai Hwang
%A Faye A. Briggs
%T Computer Architecture and Parallel Processing
%I McGraw-Hill
%C New York, NY
%D 1984
%O ISBN 0-07-031556-6
%K grecommended91, book, text,
%K Rhighnam, analysis, architecture, survey
%K meb, jlh, dp,
%X This text is quite weighty.  It covers much about the interconnection
problem.  It's a bit weak on software and algorithms.  Dated.
%X Extensive survey with large bibliography.  Lots of details.
%X (JLH & DP) Covers a wide variety of subjects including sections
on interconnection networks, special-purpose machines, dataflow, and
programming and applications issues for parallel processing.
%X A good book on the theory of high-level design. Hwang is at USC and is
interested in supercomputing, and the book reflects that, though cost
issues occasionally come in. They do take a sort of narrow view of
SIMD machines, seeing them mainly as vector processors. It seems to be
strong on pipelining, which is an important topic in microprocessors
these days. It does occasionally reach down to the gate level for
such matters as HW implementation of cache replacement policies. It
doesn't cover such issues as instruction set design, which I'm
interested in, but other than that most of its flaws are that it's
already five years old. Work on a second edition has reportedly begun,
but it's likely to be a while before it's out.

%A Robert G. Babb, II, ed.
%T Programming Parallel Processors
%I Addison-Wesley
%C Reading, MA
%D 1988
%K book, text, software, Alliant FX/8, BBN Butterfly, Cray X-MP,
FPS T-Series, IBM 3090, Loral LDF-100, Intel iPSC, hypercube, shared memory,
message passing, Sequent Balance, grecommended91,
%K dwf, enm,
%X Reviewed, IEEE Computer, by T. DeMarco, April 1988, pp. 141.
Good quote in review:
	If juggling three balls is of difficulty 3 on a scale of 1 to 10,
	then juggling four balls is a difficulty of 20. -- from a
	juggler's handbook.
This book is a compilation of experiences by grad students (and undergrads)
on a diversity of machines.  I suspect the monograph format will become
popular as a publishing vehicle on this topic for the future since
comparisons will be difficult.  Over half of the book consists of
appendices of source code or commentary.  I like the way certain
significant things are encircled by boxes: . . . WANTED
for Hazardous Journey. Small wages, bitter cold, long month

%A M. Kallstrom
%A S. S. Thakkar
%T Programming Three Parallel Computers
%K Comparison for Traveling Salesman Problem, C/Intel iPSC,
Occam/transputer, C/Balance 8000
%D January 1988
%J IEEE Software
%V 5
%N 1
%P 11-22
%K grecommended91,
%K jb,
%X See also earlier compcon paper.
%X Nice low-level introduction to comparing parallel machines and
portability.  Discusses three approaches to parallel programming.
Great for students learning parallel programming.  Includes transputers,
Sequent Balance, and iPSC.

%A Alan H. Karp
%Z IBM Palo Alto Scientific Center
%T Programming for Parallelism
%J Computer
%V 20
%N 5
%D May 1987
%P 43-57
%r G320-3490
%d June 1986
%K grecommended91,
%K meb,
%X Describes simple fork-join type constructs to be added to FORTRAN.
Taxonomy, shared memory systems.  An okay survey of the problems.
Leaves out certain key issues: deadlock, atomicity, exception handling,
debugging, etc.
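%X The fork-join shape Karp describes, sketched with POSIX threads
rather than his FORTRAN constructs (an editorial example):
    #include <pthread.h>
    #include <stdio.h>
    #define N 4
    void *body(void *arg)          /* one strip of the parallel loop */
    {
        printf("strip %ld\n", (long) arg);
        return NULL;
    }
    int main(void)                 /* compile with -lpthread */
    {
        pthread_t t[N];
        long i;
        for (i = 0; i < N; i++)    /* fork N workers */
            pthread_create(&t[i], NULL, body, (void *) i);
        for (i = 0; i < N; i++)    /* join: the implicit barrier */
            pthread_join(t[i], NULL);
        puts("all strips done");
        return 0;
    }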

%A William A. Kornfeld
%A Carl E. Hewitt
%T The Scientific Community Metaphor
%J IEEE Transactions on Systems, Man, and Cybernetics
%V SMC-11
%N 1
%D January 1981
%P 24-33
%K ai, distributed problem solving
%K grecommended91,
%K hcc,

%A D. J. Kuck
%A R. H. Kuhn
%A D. A. Padua
%A B. Leasure
%A M. Wolfe
%T Dependence Graphs and Compiler Optimization
%J Proceedings of the Eighth Symposium on the Principles of
Programming Languages (POPL)
%D January 1981
%P 207-218
%K Rdf, parallelization,
%K grecommended91,
%K j\-lb,

%A David J. Kuck
%T ILLIAC IV Software and Application Programming
%J IEEE Transactions on Computers
%V C-17
%N 8
%P 758-770
%D August 1968
%K bhibbard
%K applications of array computer, array computer, array language, compiler,
operating system,
%K Rhighnam, language, TRANQUIL,
%K grecommended91,
%K jlh, dp,
%X The early proposals for system software for the 256 PE ILLIAC IV.
Examples are given.
%X Contains a rationale for the ``TRANQUIL'' language.
%X (JLH & DP) Kuck's paper discusses the software and programming strategies
for ILLIAC.  Hord's book has more material about software, as well as some
discussion of experience in using the machine.

%A David J. Kuck
%A Edward S. Davidson
%A Duncan H. Lawrie
%A Ahmed H. Sameh
%T Parallel Supercomputing Today and the Cedar Approach
%J Science
%V 231
%N 4741
%D February 28, 1986
%P 967-974
%K paracompiler,
%K grecommended91,
%K ab,
%X Cedar uses a "paracompiler" to parallelize do-loops.
A more recent paper appears in Dongarra's Experimental Parallel
Computing Architectures book.  Photos.

%A H. T. Kung
%T Why Systolic Architectures?
%J IEEE Computer
%V 15
%N 1
%D January 1982
%P 37-46
%K Rhighnam, analysis, architecture,
%K j\-lb,
%K grecommended91,
multiprocessors, parallel processing, systolic arrays, VLSI,
%K bmiya,
%X * Systolic architectures, which permit multiple computations for
each memory access, can speed execution of compute-bound problems
without increasing I/O requirements.  Such arrays can also be
reconfigured to suit new computational structures; however, this
capability places new demands on efficient architecture use.
Note: Kung also has a machine readable bibliography in Scribe
format which is also distributed with the MP biblio on tape, best
to request from Kung on the CMU `sam' machine.
Reproduced in Dharma P. Agrawal's (ed.) "Advanced Computer Architecture,"
IEEE, 1986, pp. 300-309.
%X In order to achieve the simplicity and density needed for effective VLSI
design, Kung's strategy is to optimize processor number, interconnection
topology and I/O structures for particular points in his space of parallel
algorithms.  He defines a family of systolic designs for computing various
forms of the convolution computation.  This is a family of computations
each member of which generates a sequence of values formed by taking a sum
of products of values of corresponding elements in two other sequences,
according to some indexing scheme.  In this paper Kung also gives examples
of some of the ways the movement of data could be organized: (1) Should
vector elements be pre-loaded into Processing Elements (PEs)?  (2) Which
way should data move?  Note that this is a two dimensional pipelining
strategy, so that the choice of data flow direction has much more freedom
than with simple linear pipelines.  Some of the organizations that Kung
uses are: square, hexagonal, and triangular arrays.  Among these schemes,
the relative direction of data flow of the two input vectors is another
design parameter.  (3) Should information be broadcasted or shifted through
the network?  (4) Which vectors should shift through the PEs?  Which should
remain stationary in the PEs?  Should vector entries come in a temporally
interleaved fashion, and if so, at what relative rates?
%X Each member of this family of architectures has a particular
interprocessor communication structure that matches the flow of data
required by the underlying algorithms.  It is a wise choice to match this
flow with particular algorithms in mind; previous attempts at
multiprocessor parallelism have met with the problem of interprocessor
communication being a bottleneck.
%X There are many considerations in choosing the design parameters of a
systolic architecture.  Probably the major factor is that it is highly
desirable to match the speed of processing to the available I/O bandwidth.
One way to accomplish this goal is to make multiple use of each input data
item.  This is done by either using broadcasting with unlimited fan-in or
by re-using each value of a vector at each stage of a pipeline.  Since it
is usually not possible to accurately estimate available I/O bandwidth in a
complex system, the hope is to make the system modular to allow for
adjustments to this ratio.
%X A surprising number of applications have been found where systolic
algorithms and architectures lead to effective, highly parallel computing
systems.  Among these are applications in signal and image processing,
matrix arithmetic, and non-numeric applications.
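%X One of the simplest members of that convolution family can be
simulated in a few lines of C (an editorial sketch: weights stay put,
the current input is broadcast to every cell, and partial sums shift
one cell per beat, i.e. one answer to design questions (2)-(4) above):
    #include <stdio.h>
    #define K 3                    /* cells, one stationary weight each */
    #define N 8                    /* number of input samples */
    int main(void)
    {
        double w[K] = {1, 2, 3};
        double x[N] = {1, 1, 1, 1, 2, 2, 2, 2};
        double acc[K] = {0};       /* partial sum resident in each cell */
        int t, c;
        for (t = 0; t < N; t++) {  /* each pass = one systolic beat */
            /* shift partial sums right; each cell adds w[c] * x[t] */
            for (c = K - 1; c > 0; c--)
                acc[c] = acc[c - 1] + w[c] * x[t];
            acc[0] = w[0] * x[t];
            if (t >= K - 1)        /* a finished sum exits the last cell */
                printf("y[%d] = %g\n", t - K + 1, acc[K - 1]);
        }
        return 0;
    }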

%A Leslie Lamport
%T Time, Clocks, and the Ordering of Events in a Distributed System
%J Communications of the ACM
%V 21
%N 7
%D July 1978
%P 558-565
%K distributed systems, computer networks, clock synchronization,
multiprocess systems, grecommended91
CR categories: 4.32, 5.29
distributed processing computer networks multiprocessing programs
ordering of events distributed system synchronising total ordering
clocks computer networks multiprocessing
%K bsatya
%K enm, dmp, jw,
%O 4 Refs.
treatment: theoretical
%X classic paper on logical clocks.
%X A classic paper on synchronization.
Reproduced in "Distributed Computing: Concepts and Implementations" edited by
McEntire, O'Reilly and Larson, IEEE, 1984.
%X The concept of one event happening before another in a distributed system
is examined, and is shown to define a partial ordering of the events. A
distributed algorithm is given for synchronising a system of logical clocks
which can be used to totally order the events. The use of the total
ordering is illustrated with a method for solving synchronisation problems.
The algorithm is then specialised for synchronising physical clocks, and a
bound is derived on how far out of synchrony the clocks can become.
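%X The clock rules themselves are tiny (an editorial sketch in C; ties
in the total ordering are broken by process id):
    #include <stdio.h>
    /* Rule 1: tick before every local or send event.
       Rule 2: on receive, jump strictly past the message timestamp. */
    static long tick(long *c)             { return ++*c; }
    static long on_recv(long *c, long ts) { *c = (ts > *c ? ts : *c) + 1;
                                            return *c; }
    int main(void)
    {
        long p = 0, q = 0;         /* logical clocks of processes P, Q */
        long ts = tick(&p);        /* P sends a message stamped ts = 1 */
        tick(&q); tick(&q);        /* Q has two local events, clock = 2 */
        printf("Q receives ts=%ld, clock -> %ld\n", ts, on_recv(&q, ts));
        return 0;
    }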

%A Duncan H. Lawrie
%T Access and Alignment of Data in an Array Processor
%J IEEE Trans. on Computers
%V C-24
%N 12
%D Dec. 1975
%P 1145-1155
%K Alignment network, array processor, array storage, conflict-free access,
data alignment, indexing network, omega network, parallel processing,
permutation network, shuffle-exchange network, storage mapping,
switching network
grecommended91, U Ill, N log N nets,
Ginsberg biblio:
%K bmiya,
%K j\-lb,
%X This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
Reproduced in the 1984 tutorial: "Interconnection Networks for parallel
and distributed processing" by Wu and Feng.

%A F. C. H. Lin
%A R. M. Keller
%T The Gradient Model Load Balancing Method
%J IEEE Transactions on Software Engineering
%V SE-13
%N 1
%D January 1987
%P 32-38
%K Special Issue on Distributed Systems, Applicative systems,
computer architecture, data flow, distributed systems, load
balancing, multiprocessor systems, reduction architecture
%K grecommended91,
%K dmp, (+1),

%A Mamoru Maekawa
%A Arthur E. Oldehoeft
%A Rodney R. Oldehoeft
%T Operating Systems: Advanced Concepts
%C Menlo Park, CA
%I The Benjamin/Cummings Publishing Company, Inc.
%D 1986
%K book, text,
%K grecommended91,
%K fpst,
%X Excellent reference for a variety of subjects. Discusses parallel
and distributed systems from several viewpoints. Includes some good
discussions on distributed databases. Readable for computer scientists;
others might have to review some operating-system terminology.

%A Frank H. McMahon
%T The Livermore Kernels: A Computer Test of the Numerical Performance Range
%R UCRL-53745
%I LLNL
%C Livermore, CA
%D December 1986
%K Livermore loops,
%K grecommended91,
%K meb,
%X See also J. Martin's book on Supercomputer Performance Evaluation.
This report has more detail and raw data.

%A Robin Milner
%T A Calculus of Communicating Systems
%S Lecture Notes in Computer Science
%V 92
%I Springer-Verlag
%C Berlin
%D 1980
%K grecommended91,
%K fpst,
%K CCS parallel process communication theory equivalence congruence
%O (LA has)
%X Also see S-V's LNCS Vol. 158 for paper of same title and author, 1983.
%X Classical text, accessible with little computer science background.

%A Philip Arne Nelson
%T Parallel Programming Paradigms
%I Computer Science Department, University of Washington
%C Seattle, WA
%R Technical Report 87-07-02
%D July 1987
%K Ph.D. Dissertation
%K grecommended91,
%K dmp,

%A James M. Ortega
%Z U VA.
%T Introduction to Parallel and Vector Solution of Linear Systems
%S Frontiers of Computer Science
%I Plenum Press
%C New York
%D 1988
%K book, text,
%K grecommended91,
%K dwf,

%A David A. Padua
%A Michael J. Wolfe
%Z CSRD, U. Ill.
%T Advanced Compiler Optimization for Supercomputers
%J Communications of the ACM
%V 29
%N 12
%D December 1986
%P 1184-1201
%K Special issue on parallel processing,
CR Categories and Subject Descriptors: C.1.2 [Processor Architectures]:
Multiple Data Stream Architectures (Multiprocessors) -
array and vector processors;
multiple-instruction-stream, multiple-data-stream processors (MIMD);
parallel processors; pipeline processors;
single-instruction-stream, multiple-data-stream processors (SIMD);
D.2.7 [Software Engineering]: Distribution and Maintenance -
restructuring; D.3.3 [Programming Languages] Language Constructs -
concurrent programming structures; D.3.4 [Programming Languages] Processors -
code generation; compilers; optimization; preprocessors;
General Terms: Languages, performance
%K RCedar,
%K grecommended91,
%K ak,

%A R. H. Perrott
%T Parallel Programming
%S International Computer Science Series
%I Addison-Wesley
%C Menlo Park
%D 1987
%O ISBN 0201 14231 7
%$ 27
%K Book, text, Pascal Plus, communication, Modula-2, Ada, Occam,
mutual exclusion, synchronization, CFT, CFD, Cyber Fortran, FTN,
distributed array processor, DAP, mesh, ACTUS, Cray, dataflow,
message passing,
%K grecommended91,
%K ps,
%X An earlier edition of this book was published in 1985.
Book does not cover hypercubes.

%A Constantine D. Polychronopoulos
%T Parallel Programming and Compilers
%I Kluwer Academic Publishers
%C Boston
%D 1988
%K book, text,
%K grecommended91,
%K lls,

%A T. W. Pratt
%T Programming Languages: Design and Implementation
%C Englewood Cliffs, NJ
%I Prentice-Hall
%D 1984
%K book, text,
%K grecommended91,
%K fpst,
%X A good book for those without a background in programming languages.
Primarily discusses uniprocessor languages; gives examples of many different
languages. Help non programming language researchers understand the issues.

%A Michael J. Quinn
%T Designing Efficient Algorithms for Parallel Computers
%I McGraw Hill
%D 1987
%K Book, text
%K grecommended91,
%K dgg, fpst,
%X Have used in classes. Readable.

%A Richard M. Russell
%T The Cray-1 Computer System
%J Communications of the ACM
%V 21
%N 1
%P 63-72
%D January 1978
%K grecommended91,
existing classic architecture,
maeder biblio: parallel hardware and devices, implementation,
ginsberg biblio:
%K bhibbard
%K enm, j\-lb,
%X The original paper describing the Cray-1.
This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
Also reproduced in "Computer Structures: Principles and Examples" by
Daniel P. Siewiorek, C. Gordon Bell, and Allen Newell, McGraw-Hill,
1982, pp. 743-752.
Reproduced in Dharma P. Agrawal's (ed.) "Advanced Computer Architecture,"
IEEE, 1986, pp.15-24.
%X Literature search yields:
00712248   E.I. Monthly No: EI7804023850   E.I. Yearly No: EI78014612
Title: Cray-1 Computer System.
Author: Russell, Richard M.
Corporate Source: Cray Res Inc, Minneapolis, Minn
Source: Communications of the ACM v 21 n 1 Jan 1978 p 63-72
Publication Year: 1978
CODEN: CACMA2   ISSN: 0001-0782
Language: ENGLISH
Journal Announcement: 7804
Abstract: The CRAY-1 is described, the evolution of its architecture is
discussed, and an account is given of some of the problems that were
overcome during its manufacture. The CRAY-1 is the only computer to have
been built to date that satisfies ERDA's Class VI requirement (a computer
capable of processing from 20 to 60 million floating point operations per
second). The CRAY-1's Fortran compiler (CFT) is designed to give the
scientific user immediate access to the benefits of the CRAY-1's vector
processing architecture. An optimizing compiler, CFT, " vectorizes "
innermost DO loops. Compatible with the ANSI 1966 Fortran Standard and with
many commonly supported Fortran extensions, CFT does not require any source
program modifications or the use of additional nonstandard Fortran
statements to achieve vectorization. 6 refs.
Descriptors: *COMPUTER ARCHITECTURE; COMPUTER SYSTEMS, DIGITAL
Classification Codes: 722  (Computer Hardware); 723  (Computer Software)
72  (COMPUTERS & DATA PROCESSING)

%A Howard J. Siegel
%T Interconnection Networks and Masking for Single Instruction Stream
Multiple Data Stream Machines
%R PhD Dissertation
%I Princeton University
%D May 1977
%O 171 pages
%K grecommended91,
%K ag,
%X I don't reference it much since it is not as easy to get.
But it is a Good Thing to read.

%A Lawrence Snyder
%A Leah H. Jamieson
%A Dennis B. Gannon
%A Howard Jay Siegel
%T Algorithmically Specialized Computers
%I Academic Press
%C Orlando, FL
%D 1985
%K Rhighnam, analysis, architecture, survey, VLSI, book, text,
%K grecommended91,
%K meb,
%O 510.7862 A394
%X Proceedings from the Purdue NSF Workshop on Algorithmically Specialized
Computers.

%A Harold S. Stone
%T Introduction to Computer Architecture
%I Addison-Wesley
%D 1987
%K bhibbard, book, text,
%K grecommended91,
%K ak,
%X The first edition was dated 1975; this is the third edition.
SRA was the old publisher.
%X Has a couple of very good chapters on multiprocessors/multiprocessing.

%A Quentin F. Stout
%A Russ Miller
%T Parallel Algorithms for Regular Architectures
%I MIT Press
%D 1988
%K grecommended91,
%K fpst,
%X No direct experience, but I've heard good things.

%A Richard J. Swan
%A S. H. Fuller
%A Daniel P. Siewiorek
%T Cm* -- A Modular, Multi-Microprocessor
%J Proceedings AFIPS National Computer Conference
%I AFIPS Press
%V 46
%D 1977
%P 637-644
%K CMU, grecommended91
%K btartar
%K jlh, dp,
%X This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
%X (JLH & DP) Cm* was the first multiprocessor based on microprocessor
technology.  Despite the limited success of the machine, its impact and ideas
are present in many machines being built today including the
Encore Ultramax (Multimax) and Stanford's DASH.
%X Literature search yields:
00719649   E.I. Monthly No: EI7806040291   E.I. Yearly No: EI78016355
Title: Cm* -- A Modular, Multi-Microprocessor.
Author: Swan, R. J.; Fuller, S. H.; Siewiorek, D. P.
Corporate Source: Carnegie-Mellon Univ, Pittsburgh, Pa
Source:  AFIPS  Conference  Proceedings v 46 1977, Dallas, Tex, Jun 13-16
1977. Publ by AFIPS Press, Montvale, NJ, 1977 p 637-644
Publication Year: 1977
CODEN: AFPGBT   ISSN: 0095-6880
Language: ENGLISH
Journal Announcement: 7806
Abstract: The paper describes the architecture of a new large
multiprocessor computer system being built at Carnegie-Mellon University.
The system allows close cooperation between large numbers of inexpensive
processors. All processors share access to a single virtual memory address
space. There are no arbitrary limits on the number of processors, amount of
memory or communication bandwidth in the system. Considerable support is
provided for low-level operating system primitives and inter-process
communication. 18 refs.
Descriptors: *COMPUTERS, MICROPROCESSOR
Classification Codes: 721  (Computer Circuits & Logic Elements); 722
(Computer Hardware)
72  (COMPUTERS & DATA PROCESSING)

%A Shreekant S. Thakkar
%A Mark Sweiger
%T Performance of an OLTP Application on the Symmetry Multiprocessor System
%J Proc. 17th Annual Symposium on Computer Architecture,
Computer Architecture News
%I ACM
%V 18
%N 2
%D June 1990
%P 228-238
%K Applications, Sequent,
%K grecommended91,
%K jlh, dp,
%X (JLH & DP) One of the few papers evaluating a
nonscientific application running on a multiprocessor.

%Q Thinking Machines Corporation
%T Connection Machine Model CM-2 technical summary
%R Technical Report HA87-4
%C Boston, MA
%D April 1987
%K grecommended91, hardware architecture,
%K Rhighnam, Connection Machine
%K jlh, dp,
%X (JLH & DP) Another architecture reference for the CM-2 is Tucker, L.
and G. Robertson [1988].  "Architecture and Applications of the
Connection Machine," Computer, vol. 21, no. 8 (August), pages 26-38.

%A Philip C. Treleaven
%A David R. Brownbridge
%A Richard P. Hopkins
%T Data-Driven and Demand-Driven Computer Architecture
%J Computing Surveys
%V 14
%N 1
%D March 1982
%P 93-143
%K grecommended91,
CR Categories and Subject Descriptors:
C.0 [Computer System Organization]:
General - hardware/software interfaces; system architectures;
C.1.2 [Processor Architecture]:
Multiple Data Stream Architectures (Multiprocessors);
C.1.3 [Processor Architecture]: Other Architecture Styles
- data flow architectures; high level language architectures;
D.3.2 [Programming Languages]: Language Classifications - data-flow
languages; macro and assembly languages; very high-level languages
General Terms: Design
Additional Key Words and Phrases: demand-driven architecture,
data-driven architecture
%K Rdf, enm, dmp, ak,
%X * The aim of this paper is to identify the concepts and relationships
that exist both within and between the two areas of research of
data-driven and demand-driven architectures.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 4-54.

%A Robert Voigt
%T Where are the Parallel Algorithms
%I NASA Langley Research Center
%R ICASE Report No. 85-2
%D 1985
%K grecommended91,
%K ab,

%A Chuan-Lin Wu
%A Tse-Yun Feng
%T On a Class of Multistage Interconnection Networks
%J IEEE Transactions on Computers
%V C-29
%N 8
%D August 1980
%P 694-702
%K Array processing, computer architecture, conflict resolution,
interconnection networks, MIMD machine, multiple-processor systems,
network configurations, parallel processing, routing techniques,
SIMD machine
%K Rhighnam, analysis,
%K grecommended91,
%K ag,
%X
Reproduced in the 1984 tutorial: "Interconnection Networks for parallel
and distributed processing" by Wu and Feng.
Also reprinted in the text compiled by Kai Hwang:
"Supercomputers: Design and Application," IEEE, 1984.

%A William A. Wulf
%A Roy Levin
%A Samuel P. Harbison
%T HYDRA/C.mmp: An Experimental Computer System
%I McGraw-Hill
%D 1981
%K grecommended91, CMU, C.mmp, HYDRA OS,
multiprocessor architecture and operating systems
%K bsatya, book, text,
%K enm, ag,
%X * Describes the architecture of C.mmp, and details the goals, design, and
performance of HYDRA, its capability based OS.

%A Hans P. Zima
%A Barbara Chapman
%T Supercompilers for Parallel and Vector Computers
%I Addison-Wesley (ACM)
%D 1990
%O ISBN 0-201-17560-6
%K book, text,
%K grecommended91,
%K cb@uk,
%$ 39.95
%X Ken Kennedy, in the intro, describes this as the first satisfactory
textbook on dependence analysis and its application to vectorization and
parallelization.  Good bibliography.
%X Contents (chapter headings):-
1. Supercomputers and Supercompilers - general stuff,
architecture, applications, early work...
2. Supercomputer Architectures - vector and parallel systems
3. Scalar Analysis - control and data flow, monotone data flow
systems, use-definition chains, dominance relation, simple
optimizations,...
4. Data Dependence - basic concepts and definition, dependence
testing, algorithms for DD testing.
5. Standard Transformations - DO-loop normalization, subscript
normalization, scalar renaming.
6. Vectorization - fundamental transformations, concurrency in
loops, vector code-generation, loop transformations, control dependence
and vectorization, procedure calls.
7. Parallelization - parallel code generation for DOALL,
DOACROSS loops, shared memory parallelization, distributed memory
parallelization.
8. Supercompilers and their Environments - case studies of
'supercompilers', and issues in their environments.
A. Tarjan's algorithm
B. The Banerjee Test
C. Mathematical Notation
%X Each chapter has a series of bibliographic notes and there is a large
bibliography which is very helpful.  The book is quite mathematical in
style, making familiarity with set notation helpful.  I am most
interested in tools to aid in automatic parallelization, particularly
for distributed memory machines, and have therefore read the chapters
most relevant to this area in more detail - in particular I haven't read
chapters 5 or 6 at all.
Chapters 1 and 2 are fairly standard summaries.
Chapter 3 (53 pages) provides a fairly weighty intro to scalar analysis.
Chapter 4 (62 pages): a thorough introduction, including direction
vectors, gcd, Banerjee and separability tests, etc.  Personally I found
the book form of Wolfe's thesis (Optimizing Supercompilers for
Supercomputers) clearer as a newcomer to the field, but everything is
covered here.
Chapter 7 (56 pages) is biased towards shared memory systems, as that is
where most of the existing work has been done.  The style is clear, and
less mathematical in formulation than earlier chapters.  Definitions of
shared, local, reduction, shared ordered, etc.  variables are clear.
For the distributed memory case an SPMD model is assumed, and all
discussion is in terms of this model.
Chapter 8 (20 pages) gives case studies on Parafrase (Illinois), PFC
(Rice) and SUPERB (Suprenum) as well as looking at the place of
parallelization tools within an integrated development environment for
high-performance computing.
Overall - good summary of the field, somewhat mathematical, and as Ken
Kennedy writes in the Foreword:
"...no satisfactory textbook on dependence analysis and its
applications to vectorization and parallelization has yet emerged.
Given the importance of the subject, this is quite surprising.
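%X The gcd test mentioned above, in one line (an editorial note): a
dependence between references a[c1*i + d1] and a[c2*j + d2] requires an
integer solution of c1*i - c2*j = d2 - d1, which can exist only if
gcd(c1, c2) divides d2 - d1; e.g. a[2*i] and a[2*i + 1] can never
overlap, since gcd(2, 2) = 2 does not divide 1.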


-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell