eugene@wilbur.nas.nasa.gov (Eugene N. Miya) (04/12/91)
[This is a retransmission. It appears that the version sent by email to those w/o usenet actually got there; usenet did not send. steve ] %Z Date: Tue, 9 Apr 91 14:40:40 -0700 %A George S. Almasi %A Allan Gottlieb %T Highly Parallel Computing %I Benjamin/Cummings division of Addison Wesley Inc. %D 1989 %K ISBN # 0-8053-0177-1, book, text, Ultracomputer, grequired91, %K enm, cb@uk, ag, jlh, dp, gl, %$ 36.95 %X This is a kinda neat book. There are special net anecdotes which make this interesting. %X Oh, there are a few significant typos: LINPAK is really LINPACK. Etc. %X (JLH & DP) The authors discuss the basic foundations, applications, programming models, language and operating system issues and a wide variety of architectural approaches. The discussions of parallel architectures include a section that describes the key concepts within a particular approach. %A C. L. Seitz %T The Cosmic Cube %J Communications of the ACM %V 28 %N 1 %D January 1985 %P 22-33 %r Hm83 %d June 1984 %K enm, dmp, jlh, dp, j\-lb, %K CR Categories and Subject Descriptors: C.1.2 [Processor Architectures]: Multiple Data Stream Architectures (Multiprocessors); C.5.4 [Computer System Implementation]: VLSI Systems; D.1.2 [Programming Techniques]: Concurrent Programming; D.4.1 [Operating Systems]: Process Management General terms: Algorithms, Design, Experimentation Additional Key Words and Phrases: highly concurrent computing, message-passing architectures, message-based operating systems, process programming, object-oriented programming, VLSI systems, homogeneous machine, hypercube, C^3P, grequired, Rcccp, Rhighnam, %X Excellent survey of this project. Reproduced in "Parallel Computing: Theory and Comparisons," by G. Jack Lipovski and Miroslaw Malek, Wiley-Interscience, New York, 1987, pp. 295-311, appendix E. %X * Brief survey of the cosmic cube, and its hardware. %X (JLH & DP) This is a good discussion of the Caltech approach, which embodies the ideas behind several of these machines (often called hypercubes). The work at Caltech is the basis for the machines at JPL and the Intel iPSC, and is closely related to the NCUBE design. Another paper by Seitz on this same topic appears in the Dec. 1984 issue of IEEE Trans. on Computers. %X Literature search yielded: 1450906 C85023854 The Cosmic Cube (Concurrent Computing) Seitz, C.L. Author Affil: Dept. of Comput. Sci., California Inst. of Technol., Pasadena, CA, USA Source: Commun. ACM (USA) Vol.28, No.1, Pp.: 22-33 Publication Year: Jan. 1985 Coden: Cacma2 Issn: 0001-0782 U. S. Copyright Clearance Center Code: 0001-0782/85/0100-002275c Treatment: Practical; Document Type: Journal Paper Languages: English (14 Refs) Abstract: Sixty-four small computers are connected by a network of point-to-point communication channels in the plan of a binary 6-cube. This cosmic cube computer is a hardware simulation of a future VLSI implementation that will consist of single-chip nodes. The machine offers high degrees of concurrency in applications and suggests that future machines with thousands of nodes are both feasible and attractive. It uses message switching instead of shared variables for communicating between concurrent processes. Descriptors: multiprocessing systems; message switching Identifiers: message passing architectures; process programming; VLSI systems; point-to-point communication channels; binary 6-cube; cosmic cube; hardware simulation; VLSI implementation; single-chip nodes; concurrency Class codes: C5440; C5620
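%X The node labeling explains the name "binary 6-cube": with 64 = 2^6 nodes, node i is wired to the six nodes whose labels differ from i in exactly one bit, and the routing distance between any two nodes is the Hamming distance of their labels. A minimal sketch (illustrative C, not Seitz's code):
    #include <stdio.h>
    int main(void)
    {
        int n = 6, i = 13, k;              /* 6-cube; pick any node i */
        printf("neighbors of node %d:", i);
        for (k = 0; k < n; k++)
            printf(" %d", i ^ (1 << k));   /* flip one address bit */
        printf("\n");
        return 0;
    }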
%A M. Ben-Ari %T Principles of Concurrent Programming %C Englewood Cliffs, NJ %I Prentice-Hall %D 1986 %$ 29 %O ISBN 0-13-711821-X. %K book, text, %K grequired91, %K sc, +3 votes posted from c.e. discussion. %X The text covers all the significant paradigms and basic concurrency concepts with examples in a pseudo-language similar to C. Syllabus for our course includes: 1. concurrent I/O and interrupt processing 2. concurrent programming abstractions 3. intro. to LSC and transputers 4. mutual exclusion; problems and principles 5. semaphores and monitors 6. synchronization 7. Linda-style message posting 8. performance monitoring and load balancing %X This is the second edition of his earlier book and covers more material, e.g. distributed systems, time analysis for real-time systems, ... It may, however, be too introductory for someone who has already had a good O/S course (of course in O/S I tend to teach concurrent programming so ...) %A George H. Barnes %A Richard M. Brown %A Maso Kato %A David J. Kuck %A Daniel L. Slotnick %A Richard A. Stokes %T The ILLIAC IV Computer %J IEEE Transactions on Computers %V C-17 %N 8 %D August 1968 %P 746-757 %K grequired91, array, computer structures, look-ahead, machine language, parallel processing, speed, thin-film memory, multiprocessors, Rmaeder biblio: parallel hardware and devices, %K architecture, ILLIAC-IV, SIMD, Rhighnam, %K rwa, %K ag, jlh, dp, j\-lb, %X This was the original paper on the ILLIAC IV when it was proposed as a 256 processing element machine, a follow-on to the SOLOMON. It was a very ambitious design. %X Contains ILLIAC IV assembler (among other things). %X (JLH & DP) This is the original paper on the ILLIAC IV hardware; some aspects of the machine (especially the memory system) changed subsequently. A later paper, Bouknight et al. (1972), provides a more accurate description of the real hardware. %X (J-LB) paper of historical significance. %A Edward Gehringer %A Daniel P. Siewiorek %A Zary Segall %Z CMU %T Parallel Processing: The Cm* Experience %I Digital Press %C Boston, MA %D 1987 %K book, text, multiprocessor, %K enm, ag, jlh, dp, %K grequired91, %$ 42 %X Looks okay! %X [Extract from inside front cover] ... a comprehensive report of the important parallel-processing research carried out on Cm* at Carnegie-Mellon University. Cm* is a multiprocessing system consisting of 50 tightly coupled processors and has been in operation since the mid-1970s. Two operating systems, StarOS and Medusa, are part of its development, along with a vast number of applications. %X (JLH & DP) This book reviews the Cm* experience. The book discusses hardware issues, operating system strategies, programming systems, and includes an extensive discussion of the experience with over 20 applications on Cm*. %A J. R. Gurd %A C. C. Kirkham %A I. Watson %T The Manchester Prototype Dataflow Computer %J Communications of the ACM %V 28 %N 1 %D January 1985 %P 34-52 %K CR Categories and Subject Descriptors: C.1.3 [Processor Architectures]: Other Architecture Styles; C.4 [ Performance of Systems]; D.3.2 [Programming Languages]: Language Classifications General Terms: Design, Languages, Performance Additional Key Words and Phrases: tagged-token dataflow, single assignment programming, SISAL, parallel computation, grequired91, %K enm, dmp, jlh, dp, %X A special issue on Computer Architecture. Mentions SISAL, but not LLNL. Using tagged-token dataflow, the Manchester processor is running reasonably large user programs at maximum rates of between 1 and 2 MIPS.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures" ed. S. S. Thakkar, IEEE, 1987, pp. 111-129. %X (JLH & DP) This paper discusses the machine, its software, and evaluates performance. %A Geoffrey C. Fox %A Mark A. Johnson %A Gregory Lyzenga %A Steve W. Otto %A John Salmon %A David Walker %Z Caltech %T Solving Problems on Concurrent Processors %V 1, General Techniques and Regular Problems %I Prentice-Hall %C Englewood Cliffs, NJ %D 1988 %K book, text, hypercubes, CCCP, MIMD, parallel programming, communication, applications, physics, %K bb, jlh, dp, %K grequired91, %K suggested supplemental ref by jh and dp %O ISBN 13-823022-6 (HB), 13-823469-8 (PB) %X Interesting book. Given out for free at Supercomputing'89. %X My Bible of Distributed Parallel Computing; even if you are not using Express it is a wonderful book to have! %A Michael Wolfe %T Optimizing Supercompilers for Supercomputers %S Pitman Research Monographs in Parallel and Distributed Computing %I MIT Press %C Cambridge, MA %D 1989 %K book, text, %K grequired91, %K cb@uk, dmp, lls, %X Good technical intro to dependence analysis, based on Wolfe's PhD Thesis. %A Robert H. Kuhn %A David A. Padua, eds. %T Tutorial on Parallel Processing %I IEEE %D August 1981 %K bmiya, book, text, survey, %K grequired91, %K enm, ag, fpst, %X This is a collection of noted papers on the subject, collected for the tutorial given at the 10th conference (1981) on Parallel Processing. It eases the search problem for many of the obscure papers. Some of these papers might not be considered academic; others are applications oriented. Data flow is given short coverage. Still, a quick source for someone getting into the field. Wherever possible, papers in this bibliography are noted as being in this text. %X Check on literature search: Tutorial on parallel processing; initially presented at the Tenth International Conference on Parallel Processing, August 25-28, 1981, Bellaire, Michigan / [edited by] Robert H. Kuhn, David A. Padua Kuhn, Robert H; Padua, David A CONFERENCE: International Conference on Parallel Processing (10th : 1981 : Bellaire, Mich.) [Los Angeles? CA]: IEEE Computer Society Press : Order from IEEE Computer Society, v, 498 p. : ill. ; 28 cm. PUBLICATION DATE(S): 1981 PLACE OF PUBLICATION: California LC CALL NO.: QA76.6 .I548 1981 DEWEY CALL NO.: 001.64 RECORD STATUS: New record BIBLIOGRAPHIC LEVEL: Monograph LANGUAGE: English ILLUSTRATIONS: Illustrations NOTES: Bibliography: p. 497-498. DESCRIPTORS: Parallel processing (Electronic computers) -- Congresses 72 (COMPUTERS & DATA PROCESSING) %A Gul A. Agha %Z U. of Mich. %T Actors: A Model of Concurrent Computation in Distributed Systems %I MIT Press %C Cambridge, MA %D 1986 %K book, text, communication, evaluation, abstraction, distributed computing, agents, grecommended91, %K hcc, fpst, %X See also his PhD thesis of the same title. %X Now considered a classical text. %A Gene M. Amdahl %T Validity of the single processor approach to achieving large scale computing capabilities %J AFIPS Proc. of the SJCC %V 31 %D 1967 %P 483-485 %K grecommended91, %K bmiya, %K ak, %X should be reread every week %X The well-known (infamous?) Amdahl's law, which suggests that if a fraction x of an algorithm is not parallelizable, then the maximum speedup is 1/x. Limits of vectorization. Arthur Goldberg @cs.ucla.edu
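%X In the usual formulation: with serial fraction s, n processors give speedup S(n) = 1 / (s + (1 - s)/n), which tends to 1/s as n grows. A tiny illustrative calculation (the function name is ours, not Amdahl's):
    #include <stdio.h>
    /* Amdahl speedup for serial fraction s on n processors */
    double amdahl(double s, int n)
    {
        return 1.0 / (s + (1.0 - s) / n);
    }
    int main(void)
    {
        /* 5% serial code caps the speedup at 20, however many
           processors are thrown at the other 95% */
        printf("s=0.05, n=1024: speedup %.2f (limit %.0f)\n",
               amdahl(0.05, 1024), 1.0 / 0.05);
        return 0;
    }
The Gustafson, Montry, and Benner entry below takes the complementary view: scale the problem with the machine and the serial fraction shrinks.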
%A Gregory R. Andrews %A Fred B. Schneider %T Concepts and Notations for Concurrent Programming %J Computing Surveys %V 15 %N 1 %P 3-43 %O 133 refs. Treatment: bibliographic survey, practical. %D March 1983 %i University of Arizona, Tucson %r CS Dept. TR 82-12 %d Sept. 1982. %K grecommended91, parallel processing, programming, OS, concurrent programming, language notations, processes, communication, synchronization primitives %K bmiya, %K fpst, %X This is not a book, but probably the best place to start in understanding concurrency. Well written, though dated a bit. %X Literature search yields: 01391537 E.I. Monthly No: EI8309072043 E.I. Yearly No: EI83016908 Title: Concepts And Notations For Concurrent Programming. Author: Andrews, Gregory R.; Schneider, Fred B. Corporate Source: Univ of Arizona, Dep of Computer Science, Tucson, Ariz, USA Source: Computing Surveys v 15 n 1 Mar 1983 p 3-43 Publication Year: 1983 CODEN: CMSVAN ISSN: 0010-4892 Language: ENGLISH Journal Announcement: 8309 Abstract: This paper identifies the major concepts of concurrent programming and describes some of the more important language notations for writing concurrent programs. The roles of processes, communication, and synchronization are discussed. Language notations for expressing concurrent execution and for specifying process interaction are surveyed. Synchronization primitives based on shared variables and on message passing are described. Finally, three general classes of concurrent programming languages are identified and compared. 133 refs. Descriptors: *COMPUTER PROGRAMMING Classification Codes: 723 (Computer Software) 72 (COMPUTERS & DATA PROCESSING) %A James Archibald %A Jean-Loup Baer %T Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model %J ACM Transactions on Computer Systems %V 4 %N 4 %D November 1986 %P 273-298 %K j\-lb, %K grecommended91, %A Arvind %A David E. Culler %T Dataflow Architectures %Z MIT %J Annual Reviews in Computer Science %V 1 %P 225-53 %D 1986 %r TM-294 %d February 1986 %K grecommended91, %K jlh, dp, j\-lb, %X Not detailed, but reasonably current survey paper on data flow. Includes status information on American, English, French, and Japanese dataflow projects like the SIGMA-1 (Japan), Manchester (England), and so forth. Reproduced in "Selected Reprints on Dataflow and Reduction Architectures" ed. S. S. Thakkar, IEEE, 1987, pp. 79-101. %X (JLH & DP) This paper discusses the basic ideas behind dataflow machines. The basic concepts of dataflow were espoused by Dennis: Dennis, J. [1980]. "Data Flow Supercomputers," Computer vol. 13, no. 11 (November), pages 48-56. %A Tom Axford %T Concurrent Programming: Fundamental Techniques for Real-Time and Parallel Software Design %I John Wiley %S Series in Parallel Computing %D 1989 %K book, text, %K grecommended91, %O ISBN 0 471 92303 6 %K js, (2 c.e. votes), %X more about software techniques for concurrency than about parallel programming, but still useful. %X ...quite happy with it. ... primary language used was Modula-2 ... ... concepts, architectures, and so forth. ... used transputers as an inexpensive platform for parallel computing. We used C for the transputer programming. %A J. Backus %T Can Programming be Liberated from the von Neumann Style?
A Functional Style and its Algebra of Programs %J Communications of the ACM %V 21 %N 8 %D August 1978 %P 613-641 %K grecommended91, Turing award lecture, Key words and phrases: functional programming, algebra of programs, combining forms, programming languages, von Neumann computers, von Neumann languages, models of computing systems, applicative computing systems, program transformation, program correctness, program termination, metacomposition, CR categories: 4.20, 4.29, 5.20, 5.24, 5.26, %K Rhighnam, theory %K ak, %X Reproduced in "Selected Reprints on Dataflow and Reduction Architectures" ed. S. S. Thakkar, IEEE, 1987, pp. 215-243. %A J. L. Baer %T A Survey of Some Theoretical Aspects of Multiprocessing %J Computing Surveys %V 5 %N 1 %D March 1973 %P 31-80 %K multiprocessing, mutual exclusion, semaphores, automatic detection of parallelism, graph models, Petri nets, flow graph schemata, scheduling, array processors, pipe-line computers CR categories: 6.0, 8.1, 4.32, 5.24 maeder biblio: general, concepts, parallel programming, parallel architecture, %K btartar %K grecommended91, %K ak, %A Howard Barringer %T A Survey of Verification Techniques for Parallel Programs %I Springer-Verlag %S Lecture Notes in Computer Science %V 191 %C Berlin %D 1985 %$ 11.25 %K Book, text, %K grecommended91, %K fpst, %X For the theoretical at heart. Gives insights into what is so hard about distributed and parallel processing. Compares many different approaches. %A K. E. Batcher %T STARAN Parallel Processor System Hardware %J Proceedings AFIPS National Computer Conference %D 1974 %P 405-410 %K grecommended91 %K btartar %K Rhighnam, architecture, associative, %K ag, %X This paper is reproduced in Kuhn and Padua's (1981, IEEE) survey "Tutorial on Parallel Processing." %X Literature search provides: 00446338 E.I. Monthly No: EI7504022393 E.I. Yearly No: EI75013769 Title: Staran Parallel Processor System Hardware. Author: Batcher, Kenneth E. Corporate Source: Goodyear Aerosp Corp, Akron, Ohio Source: AFIPS Conference Proceedings v 43, 1974, for Meet, Chicago, Ill, May 6-10 1974, p 405-410 Publication Year: 1974 CODEN: AFPGBT ISSN: 0095-6880 Language: ENGLISH Journal Announcement: 7504 Abstract: The parallel processing capability of STARAN resides in n array modules (n <= 32). Each array module contains 256 small processing elements (PE's). They communicate with a multi-dimensional access (MDA) memory through a "flip" network, which can permute a set of operands to allow inter-PE communication. This paper deals with the MDA memories, the STARAN array modules, the other elements of STARAN, and the results of certain application studies. 4 refs. Descriptors: *Computer Systems, Digital--*Parallel Processing Identifiers: Staran Processors Classification Codes: 722 (Computer Hardware) 72 (Computers & Data Processing) %A John Beetem %A Monty Denneau %A Don Weingarten %Z IBM TJW, Yorktown Heights %T The GF11 Supercomputer %J Proceedings of the 12th International Symposium on Computer Architecture %I IEEE %D June 1985 %C Boston, MA %P 108-115 %K Quantum chromodynamics, %K Special Purpose Parallel Processors, IBM %r RC 10852 %K grecommended91, %K jlh, dp, %K suggested supplemental ref by jh and dp %X 576 processors, modified SIMD, 2 MB memory per processor at 20 MFLOPS. Memphis Switch, 50 ns. 1,125 Macho Bytes, 11,520 Macho FLOPS. %A John Beetem %A Monty Denneau %A Don Weingarten %Z IBM TJW, Yorktown Heights %T The GF11 Supercomputer %B Experimental Parallel Computing Architectures %E J. J.
Dongarra %S Special Topics in Supercomputing %V 1 %I Elsevier Science Publishers B.V. (North-Holland) %C Amsterdam %D 1987 %P 255-298 %K Quantum chromodynamics, %K grecommended91, %K jlh, dp, %K suggested supplemental ref by jh and dp %X 576 processors, modified SIMD, 2 MB memory per processor at 20 MFLOPS. Memphis Switch, 50 ns. 1,125 Macho Bytes, 11,520 Macho FLOPS. See also the Proceedings of the 12th International Symposium on Computer Architecture, IEEE, June 1985, Boston, MA, pages 108-115 or see the IBM TR RC 10852 from TJW. %A Dimitri P. Bertsekas %A John N. Tsitsiklis %T Parallel and Distributed Computation: Numerical Methods %I Prentice Hall %C Englewood Cliffs NJ %D 1989 %K book, text, %K grecommended91, %K ab, %O ISBN 0-13-648700-9 %X Received one verbal view that this book isn't great. It was that person's opinion that the authors didn't implement important details of algorithms (like boundary conditions). The view is unconfirmed, and requires further investigation. %A W. J. Bouknight %A Stewart A. Denenberg %A David E. McIntyre %A J. M. Randall %A Ahmed H. Sameh %A Daniel L. Slotnick %T The ILLIAC IV System %J Proceedings of the IEEE %V 60 %N 4 %D April 1972 %P 369-388 %K grecommended91, multiprocessors, parallel processing, SIMD, %K btartar %K Rhighnam, architecture, language, %K ag, %X This is the "what we did" paper in contrast to the design paper by Barnes et al. in 1968. A subsetted version of this paper appears in "Computer Structures: Principles and Examples" by Daniel P. Siewiorek, C. Gordon Bell, and Allen Newell, McGraw-Hill, 1982, pp. 306-316. %A Nicholas Carriero %A David Gelernter %T How to Write Parallel Programs: A Guide to the Perplexed %J ACM Computing Surveys %V 21 %N 3 %D September 1989 %P 323-357 %d April 1988 %r YALEU/DCS/RR-628 %K special issue on programming language paradigms, %K Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming; D.3.2 [Programming Languages]: Language classifications - parallel languages; D.3.3 [Programming Languages]: Language constructs - concurrent programming structures; E.1.m [Data Structures]: Miscellaneous - distributed data structures; live data structures; General Terms: Algorithms, Program Design, Languages, Additional Key Words and Phrases: Linda, parallel programming methodology, parallelism, %K grecommended91, %K hcc, ag, %X From page 326: It is nonetheless a subtle but essential point that these approaches represent three clearly separate ways of thinking about the problem: Result parallelism focuses on the shape of the finished product; specialist parallelism focuses on the makeup of the work crew; and agenda parallelism focuses on the list of tasks to be performed. Also the terms: message-passing, distributed data structures or live data structures. Notes that it does not deal with data parallelism (a la CM) nor speculative parallelism (OR-parallelism). Tries to be practical, but it does admit distributed programs are harder and more complex. %X The authors distinguish between three classes of parallelism, result, agenda, and specialist, and between three roughly corresponding methods for implementation, live data structures, distributed (shared) data structures, and message passing systems. The Linda model is then introduced and related to each class and method; it serves as a kind of universal model for describing the essential parallelism, as opposed to sequential processes. An example is treated in some detail.
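%X The three classes map naturally onto Linda's small operation set. A sketch of agenda parallelism as a bag of tasks, in C-Linda-style pseudocode (tuple names invented for illustration): out() drops a tuple into tuple space, in() withdraws a matching tuple, blocking if none is present, and ?x marks a formal to be bound:
    master()
    {
        int i, v;
        for (i = 0; i < NTASKS; i++)
            out("task", i);                 /* post the agenda */
        for (i = 0; i < NTASKS; i++)
            in("result", ?i, ?v);           /* collect answers */
    }
    worker()
    {
        int i;
        for (;;) {
            in("task", ?i);                 /* grab any pending task */
            out("result", i, compute(i));   /* put back the answer */
        }
    }
Load balancing falls out for free: faster workers simply withdraw more "task" tuples.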
%A Nicholas Carriero %A David Gelernter %T How to Write Parallel Programs: A First Course %I MIT Press %C Cambridge, MA %D 1990 %K book, text, %K grecommended91, %K hcc, %A K. Mani Chandy %A Jayadev Misra %Z University of Texas--Austin %T Parallel Program Design: A Foundation %I Addison-Wesley %D 1988 %K book, text, UNITY, %K grecommended91, %K hcc, fpst, %O ISBN 0-201-05866-9 %X requires a more in-depth review. %X Theoretically useful. %A Robert P. Colwell %A Robert P. Nix %A John J. O'Donnell %A David B. Papworth %A Paul K. Rodman %Z Multiflow %T A VLIW Architecture for a Trace Scheduling Compiler %J Proceedings Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II) %J Computer Architecture News %V 15 %N 5 %J Operating Systems Review %V 21 %N 4 %J SIGPLAN Notices %V 22 %N 10 %I ACM %C Palo Alto, CA %D October 1987 %P 180-192 %K Trace scheduling, VLIW, very long instruction word, %K grecommended91, %K j\-lb, %A William J. Dally %A Charles L. Seitz %Z Caltech %T Deadlock-Free Message Routing in Multiprocessor Interconnection Networks %J IEEE Transactions on Computers %V C-36 %N 5 %D May 1987 %P 547-553 %r TR 5231:TR:86 %d June 1986 %K Caltech Cosmic Cube, hypercube, C^3P, wormhole, %K grecommended91, Rcccp, %K dmp, j-lb, %A F. Darema-Rogers %A G. F. Pfister %A K. So %T Memory Access Patterns of Parallel Scientific Programs %J Proc. 1987 ACM/SIGMETRICS Conf. in Meas. and Mod. of Comp. Syst. %V 15 %N 1 %D May 1987 %C Banff, Alberta, Canada %P 46-57 %K parallel systems, RP3, %K grecommended91, %K jlh, dp, %X (JLH & DP) Reports results of measurements of some applications for the RP3. %A Jarek Deminet %T Experience with Multiprocessor Algorithms %J IEEE Transactions on Computers %V C-31 %N 4 %P 278-288 %D April 1982 %K bmiya, Cm*, parallel algorithms, applications, %K grecommended91, %K jlh, dp, %X This paper reports experience using Cm* for several applications. There are other references available on experience with Cm*; these are published as CMU technical reports or theses. Also, see the book by Gehringer, et al. %A K. P. Eswaran %A J. N. Gray %A R. A. Lorie %A I. L. Traiger %T The notions of consistency and predicate locks in a database system %J Communications of the ACM %V 19 %N 11 %D Nov. 1976 %P 624-633 %K Rdpsdis.bib Rsingh %K grecommended91, %K dmp, %A Jerome A. Feldman %A Patrick J. Hayes %A David E. Rumelhart, eds. %T Parallel Distributed Processing, Explorations in the Microstructure of Cognition %V 1, Foundations %I MIT Press %D 1986 %K AT15 AI04 AI03 AI08 %K The PDP Perspective, book, text, %K grecommended91, %K jb, %X Two volume set for $35.95. %X AI meets parallelism and neural nets. (A bible of sorts) %A Jerome A. Feldman %A Patrick J. Hayes %A David E. Rumelhart, eds. %T Parallel Distributed Processing, Explorations in the Microstructure of Cognition %I MIT Press %V 2, Psychological and Biological Models %D 1986 %K AT15 AI04 AI03 AI08 %K The PDP Perspective %K book, text, %K grecommended91, %K jb, %X Two volume set for $35.95. %X AI meets parallelism and neural nets. (A bible of sorts) %A M. J. Flynn %T Very High-Speed Computing Systems %J Proceedings of the IEEE %V 54 %D December 1966 %P 1901-1909 %K maeder biblio: parallel architectures, %K grecommended91, %K j\-lb, %X The original paper which classified computer systems into instruction and data stream dimensions (SISD, SIMD, etc.). %K btartar %A Geoffrey C.
Fox %T Concurrent Processing for Scientific Calculations %J Digest of Papers COMPCON, Spring 84 %I IEEE %D Feb. 1984 %P 70-73 %r Hm62 %K super scientific computers, hypercube, Rcccp, %K grecommended91, %K jlh, dp, %K suggested supplemental ref by jh and dp %X An introduction to the current 64 PE Caltech hypercube. Based on the dissertation by Lang (Caltech 1982) on the `Homogeneous machine.' %A Samuel H. Fuller %A John K. Ousterhout %A Levy Raskin %A Paul I. Rubinfeld %A Pradeep J. Sindhu %A Richard J. Swan %T Multi-Microprocessors: An Overview and Working Example %J Proceedings of the IEEE %V 66 %N 2 %D February 1978 %P 216-228 %K multiprocessor architectures and operating systems %K bsatya %K grecommended91, %K jlh, dp, %X Reprinted in "Tutorial: Distributed Processing," IEEE, compiled by Burt H. Liebowitz and John H. Carson, 1981, 3rd edition. %X (JLH & DP) Cm* was the first multiprocessor based on microprocessor technology. Despite the limited success of the machine, its impact and ideas are present in many machines being built today including the Encore Ultramax (Multimax) and Stanford's DASH. %A D. D. Gajski %A D. A. Padua %A D. J. Kuck %A R. H. Kuhn %T A Second Opinion on Data Flow Machines and Languages %J Computer %V 15 %N 2 %D Feb. 1982 %P 15-25 %K grecommended91, multiprocessing, %K bmiya, %K meb, jlh, dp, %X (SKS) or why I'm afraid people won't use FORTRAN. This paper should only be read (by beginners) in conjunction with a pro-dataflow paper for balance: maybe McGraw's "Physics Today" May 1984. Also reprinted in the text compiled by Kai Hwang: "Supercomputers: Design and Application," IEEE, 1984. Reproduced in "Selected Reprints on Dataflow and Reduction Architectures" ed. S. S. Thakkar, IEEE, 1987, pp. 165-176. %X * Due to their simplicity and strong appeal to intuition, data flow techniques attract a great deal of attention. Other alternatives, however, offer more hope for the future. %X (JLH & DP) The most complete critique of the dataflow approach. %A Daniel Gajski %A Jih-Kwon Peir %T Essential Issues in Multiprocessor Systems %J Computer %I IEEE %V 18 %N 6 %D June 1985 %P 9-27 %K parallel processor vector shared memory message passing tightly loosely coupled dataflow partitioning cedar csp occam hep synchronization grecommended91 %K ak, %X The performance of a multiprocessor system depends on how it handles the key problems of control, partitioning, scheduling, synchronization, and memory access. On a second look, this paper has some nice ideas. (rec added to %K). %X Examines actual and proposed machines from the viewpoint of the authors' key multiprocessing problems: control, partitioning, scheduling, synchronization, and memory access. %X Detailed classification scheme based upon control model of computation, partitioning, scheduling, synchronization, memory access. Classification is illustrated with many examples, including a summary table for the Cray-1, Arvind's Data flow, HEP, NYU Ultracomputer, and Cedar. Reproduced in "Computer Architecture," D. D. Gajski, V. M. Milutinovic, H. J. Siegel, and B. P. Furht, eds., IEEE, 1987, pp. 115-133. %A Narain Gehani %A Andrew D. McGettrick, eds. %T Concurrent Programming %I Addison-Wesley %D 1988 %O ISBN 0-201-17435-9 %K book, text, language, %K grecommended91, %K ps,
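%X The recurring kernel in readings like these is mutual exclusion. A minimal sketch of the problem, using a GCC-style atomic test-and-set builtin for concreteness (any atomic read-modify-write primitive from the hardware or the language will do; the function names are ours):
    static volatile int lock = 0;       /* 0 = free, 1 = held */
    void enter_region(void)
    {
        /* atomically store 1 and get the old value back; we own
           the lock only when the old value was 0 */
        while (__sync_lock_test_and_set(&lock, 1))
            ;                           /* spin */
    }
    void leave_region(void)
    {
        __sync_lock_release(&lock);     /* store 0, releasing it */
    }
Semaphores, monitors, and message passing are, in large part, ways of packaging this primitive so that programs need not busy-wait.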
%A W. Morven Gentleman %T Message Passing between Sequential Processes: the Reply Primitive and the Administrator Concept %J Software Practice and Experience %V 11 %N 5 %D 1981 %P 435-466 %K grecommended91, %K cc, %X The quintessential paper on how to program with message passing. It is very readable, and provides an excellent introduction on how to use a message passing environment effectively. %A M. J. Gonzalez, Jr. %T Deterministic processor scheduling %J Computing Surveys %V 9 %N 3 %D September 1977 %P 173-204 %K scheduling %K bsatya %K grecommended91, %K dmp, ak, %X References are classified into various types (single, dual, multiple, and flow shop) at the end of the paper. %A Allen Gottlieb %A Ralph Grishman %A Clyde P. Kruskal %A Kevin P. McAuliffe %A Larry Rudolph %A Marc Snir %T The NYU Ultracomputer -- Designing an MIMD Shared Memory Parallel Computer %J IEEE Transactions on Computers %V C-32 %N 2 %D February 1983 %P 175-190 %K multiprocessors, parallel processing, Ultracomputer, computer architecture, fetch-and-add, MIMD, Omega-network, parallel computer, shared memory, systolic queues, VLSI, %K grecommended91, %K jlh, dp, j\-lb, %X Describes the design of the NYU Ultracomputer, its synchronization, and the omega network. Analyzes the projected performance of the Ultracomputer. Reproduced in "Computer Architecture," D. D. Gajski, V. M. Milutinovic, H. J. Siegel, and B. P. Furht, eds., IEEE, 1987, pp. 471-485. %X This paper represents an architectural approach that uses shared memory and caches without cache-coherence. Several machines have explored this approach including IBM's RP3, the U. of Illinois CEDAR, and a number of early multiprocessors (e.g., C.mmp). %A R. L. Graham %T Bounds on multiprocessing anomalies and related packing algorithms %J Proc. AFIPS SJCC %V 40 %D 1972 %K theoretical results %K bsatya %K grecommended91, %K ak, %A John L. Gustafson %A Gary R. Montry %A Robert E. Benner %Z Sandia National Labs. %T Development of Parallel Methods for a 1024-Processor Hypercube %J SIAM Journal on Scientific and Statistical Computing %V 9 %N 4 %D July 1988 %K fluid dynamics, hypercubes, MIMD machines, multiprocessor performance, parallel computing, structural analysis, supercomputing, wave mechanics, grecommended91, %K jlh, dp, %X Introduces concept of operation efficiency, scaled speed-up. Also covers communication cost, beam strain analysis, and a bit on benchmarking. Winner of 1988 Bell and Karp Prizes. %X (JLH & DP) This paper reports interesting results in using a large-scale NCUBE. The authors won the Gordon Bell prize with their work. They also suggest the idea of problem scaling to overcome the limitations of sequential portions of an application. %A Robert H. Halstead, Jr. %T Parallel Symbolic Computing %J Computer %V 19 %N 8 %P 35-43 %D August 1986 %K Special issue: domesticating parallelism, grecommended91, LISP, %K hcc, %A Matthew Hennessy %T Algebraic Theory of Processes %I MIT Press %D 1988 %K grecommended91, %K fpst, %X An attempt to look at the problem of concurrency from a process-algebra view. %A W. Daniel Hillis %T The Connection Machine %I MIT Press %C Cambridge, MA %D 1985 %K book, grecommended91, PhD thesis, %K j\-lb, %X Has a chapter on why computer science is no good. %X Patent 4,709,327, Connection Machine, 24 Nov 87 (individuals) "Parallel Processor / Memory Circuit", W. Daniel Hillis et al. This looks like the meat of the connection machine design. It probably has lots of stuff that up until the patent was considered proprietary. %A C. A. R.
Hoare %T Communicating Sequential Processes %J Communications of the ACM %V 21 %N 8 %P 666-677 %D August 1978 %K bhibbard %K grecommended91, %K hcc, ak, %K programming, programming languages, programming primitives, program structures, parallel programming, concurrency, input, output, guarded commands, nondeterminacy, coroutines, procedures, multiple entries, multiple exits, classes, data representations, recursion, conditional critical regions, monitors, iterative arrays, CSP, CR categories: 4.20, 4.22, 4.32 maeder biblio: synchronisation and concurrency in processes, parallel programming, %X This paper has now been expanded into an excellent book by Hoare, published by Prentice-Hall. This paper is reproduced in Kuhn and Padua's (1981, IEEE) survey "Tutorial on Parallel Processing." Reproduced in "Distributed Computing: Concepts and Implementations" edited by McEntire, O'Reilly and Larson, IEEE, 1984. %X Somewhat dated. %A C. A. R. Hoare %T Communicating Sequential Processes %I Prentice-Hall %C Englewood Cliffs, NJ %D 1985 %O ISBN 0-13-153271-5 & 0-13-153289-8 %K CSP, %K grecommended91, %K hcc, fpst, jb, %X A better book than the original CSP papers. Hoare comes down to earth and tries to give concrete examples of CSP notation. Still has some problems. %X Somewhat esoteric. %X Must reading for those interested in distributed processing. High level discussions of various operators one might think about in the message passing realm. Discusses failures. %X This defines CSP, upon which occam is based. REAL parallelism here! Very theoretical. Must read for serious parallelism students! %A R. W. Hockney %A C. R. Jesshope %T Parallel Computers 2: Architecture, Programming, and Algorithms, 2nd ed. %C Pennsylvania %I IOP Publishing Ltd. %D 1988 %O ISBN 0-85274-811-6 %K book, text, %K grecommended91, %K meb, %X World's most expensive paperback; matched only by Hockney's book on particle simulations; both worth the price. %X This book tends toward the more mainframe-oriented machines and bigger supercomputers. Killer micros get a little coverage (CM). Shows its UK bias (DAP). %A R. Michael Hord %T The ILLIAC IV: The First Supercomputer %I Computer Science Press %D 1982 %K grecommended91, book, text, %K bmiya, Rhighnam %K jlh, dp, %K suggested supplemental reference by jh and dp %K analysis, algorithm, architecture, ILLIAC-IV, SIMD, software, survey %O 510.7809 %X A collection of papers dealing with the ILLIAC IV. These papers include reminiscences and applications on the ILLIAC. It is slightly apologetic in tone. %X Describes in detail the background of the Illiac IV, programming and software tools, and applications. The chapters are a little disjointed, and the instruction set is not well explained or motivated.
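%X The execution model both ILLIAC books assume is easy to state: one control unit broadcasts each instruction, and every enabled PE applies it in lockstep to its own memory. A sketch of one broadcast step, simulated sequentially (field names are illustrative; the real ILLIAC IV had 64 PEs with mode bits for masking):
    #define NPE 64
    struct pe { double a, b; int enabled; };    /* per-PE local state */
    void broadcast_add(struct pe cube[NPE])
    {
        int i;
        for (i = 0; i < NPE; i++)        /* conceptually simultaneous */
            if (cube[i].enabled)         /* mode bit masks this PE */
                cube[i].a += cube[i].b;  /* same op, local operands */
    }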
%A Paul Hudak %T Para-Functional Programming %J Computer %V 19 %N 8 %P 60-69 %D August 1986 %K Special issue: domesticating parallelism %K grecommended91, %K hcc, %A Paul Hudak %Z Yale %T Conception, Evolution, and Application of Functional Programming Languages %J ACM Computing Surveys %V 21 %N 3 %D September 1989 %P 359-411 %K special issue on programming language paradigms, %K Categories and Subject Descriptors: D.1.1 [Programming Techniques]: Applicative (Functional) Programming; D.3.2 [Programming Languages]: Language classifications - applicative languages; data-flow languages; non-procedural languages; very-high-level languages; F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic - lambda calculus and related systems; K.2 [History of Computing]: software General Terms: Languages, Additional Key Words and Phrases: Data abstraction, higher-order functions, lazy evaluation, referential transparency, types, Lambda Calculus, Lisp, Iswim, APL, FP, FL, ML, SASL, KRC, Miranda, Haskell, Hope, denotative [declarative] language, %K grecommended91, %K ag, %X This is the second paper in the special issue which has a section on non-determinism [along with Bal, et al] which begins with a statement which would sound bizarre to non-programmers or those not familiar with the issues of determinacy. %A Kai Hwang %A Faye A. Briggs %T Computer Architecture and Parallel Processing %I McGraw-Hill %C New York, NY %D 1984 %O ISBN 0-07-031556-6 %K grecommended91, book, text, %K Rhighnam, analysis, architecture, survey %K meb, jlh, dp, %X This text is quite weighty. It covers much about the interconnection problem. It's a bit weak on software and algorithms. Dated. %X Extensive survey with large bibliography. Lots of details. %X (JLH & DP) Covers a wide variety of subjects including sections on interconnection networks, special-purpose machines, dataflow, and programming and applications issues for parallel processing. %X A good book on the theory of high-level design. Hwang is at USC and is interested in supercomputing, and the book reflects that, though cost issues occasionally come in. They do take a sort of narrow view of SIMD machines, seeing them mainly as vector processors. It seems to be strong on pipelining, which is an important topic in microprocessors these days. It does occasionally reach down to the gate level for such matters as HW implementation of cache replacement policies. It doesn't cover such issues as instruction set design, which I'm interested in, but other than that its main flaw is that it's already five years old. Work on a second edition has reportedly begun, but it's likely to be a while before it's out. %A Robert G. Babb, II, ed. %T Programming Parallel Processors %I Addison-Wesley %C Reading, MA %D 1988 %K book, text, software, Alliant FX/8, BBN Butterfly, Cray X-MP, FPS T-Series, IBM 3090, Loral LDF-100, Intel iPSC, hypercube, shared memory, message passing, Sequent Balance, grecommended91, %K dwf, enm, %X Reviewed, IEEE Computer, by T. DeMarco, April 1988, p. 141. Good quote in review: If juggling three balls is of difficulty 3 on a scale of 1 to 10, then juggling four balls is a difficulty of 20. -- from a juggler's handbook. This book is a compilation of experiences by grad students (and undergrads) on a diversity of machines. I suspect the monograph format will become popular as a publishing vehicle on this topic for the future since comparisons will be difficult. Over half of the book consists of appendices of source code or commentary.
I like the way certain significant things are encircled by boxes: . . . WANTED for Hazardous Journey. Small wages, bitter cold, long months . . . %A M. Kallstrom %A S. S. Thakkar %T Programming Three Parallel Computers %K Comparison for Traveling Salesman Problem, C/Intel iPSC, Occam/transputer, C/Balance 8000 %D January 1988 %J IEEE Software %V 5 %N 1 %P 11-22 %K grecommended91, %K jb, %X See also earlier compcon paper. %X Nice low-level introduction to comparing parallel machines and portability. Discusses three approaches to parallel programming. Great for students learning parallel programming. Includes transputers, Sequent Balance, and iPSC. %A Alan H. Karp %Z IBM Palo Alto Scientific Center %T Programming for Parallelism %J Computer %V 20 %N 5 %D May 1987 %P 43-57 %r G320-3490 %d June 1986 %K grecommended91, %K meb, %X Describes simple fork-join type constructs to be added to FORTRAN. Taxonomy, shared memory systems. An okay survey of the problems. Leaves out certain key issues: deadlock, atomicity, exception handling, debugging, etc. %A William A. Kornfeld %A Carl E. Hewitt %T The Scientific Community Metaphor %J IEEE Transactions on Systems, Man, and Cybernetics %V SMC-11 %N 1 %D January 1981 %P 24-33 %K ai, distributed problem solving %K grecommended91, %K hcc, %A D. J. Kuck %A R. H. Kuhn %A D. A. Padua %A B. Leasure %A M. Wolfe %T Dependence Graphs and Compiler Optimization %J Proceedings of the Eighth Symposium on the Principles of Programming Languages (POPL) %D January 1981 %P 207-218 %K Rdf, parallelization, %K grecommended91, %K j\-lb, %A David J. Kuck %T ILLIAC IV Software and Application Programming %J IEEE Transactions on Computers %V C-17 %N 8 %P 758-770 %D August 1968 %K bhibbard %K applications of array computer, array computer, array language, compiler, operating system, %K Rhighnam, language, TRANQUIL, %K grecommended91, %K jlh, dp, %X The early proposals for system software for the 256 PE ILLIAC IV. Examples are given. %X Contains a rationale for the ``TRANQUIL'' language. %X (JLH & DP) Kuck's paper discusses the software and programming strategies for ILLIAC. Hord's book has more material about software, as well as some discussion of experience in using the machine. %A David J. Kuck %A Edward S. Davidson %A Duncan H. Lawrie %A Ahmed H. Sameh %T Parallel Supercomputing Today and the Cedar Approach %J Science %V 231 %N 4741 %D February 28, 1986 %P 967-974 %K paracompiler, %K grecommended91, %K ab, %X Cedar uses a "paracompiler" to parallelize do-loops. A more recent paper appears in Dongarra's Experimental Parallel Computing Architectures book. Photos. %A H. T. Kung %T Why Systolic Architectures? %J IEEE Computer %V 15 %N 1 %D January 1982 %P 37-46 %K Rhighnam, analysis, architecture, %K j\-lb, %K grecommended91, multiprocessors, parallel processing, systolic arrays, VLSI, %K bmiya, %X * Systolic architectures, which permit multiple computations for each memory access, can speed execution of compute-bound problems without increasing I/O requirements. They can be reconfigured to suit new computational structures; however, this capability places new demands on efficient architecture use. Note: Kung also has a machine readable bibliography in Scribe format which is also distributed with the MP biblio on tape, best to request from Kung on the CMU `sam' machine. Reproduced in Dharma P. Agrawal's (ed.) "Advanced Computer Architecture," IEEE, 1986, pp. 300-309.
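%X One concrete design point from the paper's space, emulated in software: keep the K weights resident in the cells, broadcast each input x[t] to every cell at beat t, and let partial sums march one cell rightward per beat, so a finished convolution y[i] = w[0]x[i] + ... + w[K-1]x[i+K-1] emerges from the last cell on each beat (illustrative C, not Kung's notation; his paper also weighs shifting x instead of broadcasting it):
    #define K 3
    void systolic_conv(double w[K], double x[], int n,
                       double y[])       /* yields n-K+1 results */
    {
        double cell[K] = {0};            /* partial sum in each cell */
        int t, j;
        for (t = 0; t < n; t++) {        /* one iteration = one beat */
            for (j = K - 1; j > 0; j--)  /* shift right, add product */
                cell[j] = cell[j-1] + w[j] * x[t];
            cell[0] = w[0] * x[t];
            if (t >= K - 1)
                y[t - K + 1] = cell[K-1];
        }
    }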
%X In order to achieve the simplicity and density needed for effective VLSI design, Kung's strategy is to optimize processor number, interconnection topology and I/O structures for particular points in his space of parallel algorithms. He defines a family of systolic designs for computing various forms of the convolution computation. This is a family of computations each member of which generates a sequence of values formed by taking a sum of products of values of corresponding elements in two other sequences, according to some indexing scheme. In this paper Kung also gives examples of some of the ways the movement of data could be organized: (1) Should vector elements be pre-loaded into Processing Elements (PEs)? (2) Which way should data move? Note that this is a two dimensional pipelining strategy, so that the choice of data flow direction has much more freedom than with simple linear pipelines. Some of the organizations that Kung uses are: square, hexagonal, and triangular arrays. Among these schemes, the relative direction of data flow of the two input vectors is another design parameter. (3) Should information be broadcast or shifted through the network? (4) Which vectors should shift through the PEs? Which should remain stationary in the PEs? Should vector entries come in a temporally interleaved fashion, and if so, at what relative rates? %X Each member of this family of architectures has a particular interprocessor communication structure that matches the flow of data required by the underlying algorithms. It is a wise choice to match this flow with particular algorithms in mind; previous attempts at multiprocessor parallelism have met with the problem of interprocessor communication being a bottleneck. %X There are many considerations in choosing the design parameters of a systolic architecture. Probably the major factor is that it is highly desirable to match the speed of processing to the available I/O bandwidth. One way to accomplish this goal is to make multiple use of each input data item. This is done by either using broadcasting with unlimited fan-in or by re-using each value of a vector at each stage of a pipeline. Since it is usually not possible to accurately estimate available I/O bandwidth in a complex system, the hope is to make the system modular to allow for adjustments to this ratio. %X A surprising number of applications have been found where systolic algorithms and architectures lead to effective, highly parallel computing systems. Among these are applications in signal and image processing, matrix arithmetic, and non-numeric applications. %A Leslie Lamport %T Time, Clocks, and the Ordering of Events in a Distributed System %J Communications of the ACM %V 21 %N 7 %D July 1978 %P 558-565 %K distributed systems, computer networks, clock synchronization, multiprocess systems, ordering of events, total ordering, grecommended91, CR categories: 4.32, 5.29 %K bsatya %K enm, dmp, jw, %O 4 refs. Treatment: theoretical %X Classic paper on logical clocks. %X A classic paper on synchronization. Reproduced in "Distributed Computing: Concepts and Implementations" edited by McEntire, O'Reilly and Larson, IEEE, 1984. %X The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronising a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronisation problems. The algorithm is then specialised for synchronising physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
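%X The clock rules themselves fit in a few lines. A sketch with invented names: each process keeps a counter, ticks it for every local event and send (stamping the message), and on receipt jumps to max(local, stamp) + 1; ties in the resulting order are broken by process id to obtain the total order:
    typedef struct { long time; int pid; } lclock;
    long lc_event(lclock *c)                /* local event or send;  */
    {                                       /* returns message stamp */
        return ++c->time;
    }
    long lc_receive(lclock *c, long stamp)  /* on message arrival */
    {
        if (stamp > c->time)
            c->time = stamp;
        return ++c->time;
    }
    /* total order: event (t1,p1) precedes (t2,p2) iff
       t1 < t2, or t1 == t2 and p1 < p2 */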
%A Duncan H. Lawrie %T Access and Alignment of Data in an Array Processor %J IEEE Trans. on Computers %V C-24 %N 12 %D Dec. 1975 %P 1145-1155 %K Alignment network, array processor, array storage, conflict-free access, data alignment, indexing network, omega network, parallel processing, permutation network, shuffle-exchange network, storage mapping, switching network grecommended91, U Ill, N log N nets, Ginsberg biblio: %K bmiya, %K j\-lb, %X This paper is reproduced in Kuhn and Padua's (1981, IEEE) survey "Tutorial on Parallel Processing." Reproduced in the 1984 tutorial: "Interconnection Networks for parallel and distributed processing" by Wu and Feng. %A F. C. H. Lin %A R. M. Keller %T The Gradient Model Load Balancing Method %J IEEE Transactions on Software Engineering %V SE-13 %N 1 %D January 1987 %P 32-38 %K Special Issue on Distributed Systems, Applicative systems, computer architecture, data flow, distributed systems, load balancing, multiprocessor systems, reduction architecture %K grecommended91, %K dmp, (+1), %A Mamoru Maekawa %A Arthur E. Oldehoeft %A Rodney R. Oldehoeft %T Operating Systems: Advanced Concepts %C Menlo Park, CA %I The Benjamin/Cummings Publishing Company, Inc. %D 1986 %K book, text, %K grecommended91, %K fpst, %X Excellent reference for a variety of subjects. Discusses parallel and distributed systems from several viewpoints. Includes some good discussions on distributed databases. Readable for computer scientists; others might have to review some operating system terminology. %A Frank H. McMahon %T The Livermore Kernels: A Computer Test of the Numerical Performance Range %R UCRL-53745 %I LLNL %C Livermore, CA %D December 1986 %K Livermore loops, %K grecommended91, %K meb, %X See also J. Martin's book on Supercomputer Performance Evaluation. This report has more detail and raw data. %A Robin Milner %T A Calculus of Communicating Systems %S Lecture Notes in Computer Science %V 92 %I Springer-Verlag %C Berlin %D 1980 %K grecommended91, %K fpst, %K CCS parallel process communication theory equivalence congruence %O (LA has) %X Also see S-V's LNCS Vol. 158 for paper of same title and author, 1983. %X Classical text, accessible with little computer science background. %A Philip Arne Nelson %T Parallel Programming Paradigms %I Computer Science Department, University of Washington %C Seattle, WA %R Technical Report 87-07-02 %D July 1987 %K Ph.D. Dissertation %K grecommended91, %K dmp, %A James M. Ortega %Z U VA. %T Introduction to Parallel and Vector Solution of Linear Systems %S Frontiers of Computer Science %I Plenum Press %C New York %D 1988 %K book, text, %K grecommended91, %K dwf,
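%X A taste of why the iterative solvers in books like this map so well onto vector and parallel hardware: a Jacobi sweep for Ax = b updates every component from the previous iterate only, so the i-loop below is fully independent and can be sliced across processors or pipelined through vector units (illustrative code, not from the book):
    #define N 64
    void jacobi_sweep(double a[N][N], double b[N],
                      double x[N], double xnew[N])
    {
        int i, j;
        for (i = 0; i < N; i++) {        /* independent across i */
            double s = b[i];
            for (j = 0; j < N; j++)
                if (j != i)
                    s -= a[i][j] * x[j];
            xnew[i] = s / a[i][i];       /* assumes a[i][i] != 0 */
        }
    }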
%A David A. Padua %A Michael J. Wolfe %Z CSRD, U. Ill. %T Advanced Compiler Optimization for Supercomputers %J Communications of the ACM %V 29 %N 12 %D December 1986 %P 1184-1201 %K Special issue on parallel processing, CR Categories and Subject Descriptors: C.1.2 [Processor Architectures]: Multiple Data Stream Architectures (Multiprocessors) - array and vector processors; multiple-instruction-stream, multiple-data-stream processors (MIMD); parallel processors; pipeline processors; single-instruction-stream, multiple-data-stream processors (SIMD); D.2.7 [Software Engineering]: Distribution and Maintenance - restructuring; D.3.3 [Programming Languages] Language Constructs - concurrent programming structures; D.3.4 [Programming Languages] Processors - code generation; compilers; optimization; preprocessors; General Terms: Languages, performance %K RCedar, %K grecommended91, %K ak, %A R. H. Perrott %T Parallel Programming %S International Computer Science Series %I Addison-Wesley %C Menlo Park %D 1987 %O ISBN 0201 14231 7 %$ 27 %K Book, text, Pascal Plus, communication, Modula-2, Ada, Occam, mutual exclusion, synchronization, CFT, CFD, Cyber Fortran, FTN, distributed array processor, DAP, mesh, ACTUS, Cray, dataflow, message passing, %K grecommended91, %K ps, %X An earlier edition of this book was published in 1985. The book does not cover hypercubes. %A Constantine D. Polychronopoulos %T Parallel Programming and Compilers %I Kluwer Academic Publishers %C Boston %D 1988 %K book, text, %K grecommended91, %K lls, %A T. W. Pratt %T Programming Languages: Design and Implementation %C Englewood Cliffs, NJ %I Prentice-Hall %D 1984 %K book, text, %K grecommended91, %K fpst, %X A good book for those without a background in programming languages. Primarily discusses uniprocessor languages; gives examples of many different languages. Helps researchers who are not programming-language specialists understand the issues. %A Michael J. Quinn %T Designing Efficient Algorithms for Parallel Computers %I McGraw Hill %D 1987 %K Book, text %K grecommended91, %K dgg, fpst, %X Have used in classes. Readable. %A Richard M. Russell %T The Cray-1 Computer System %J Communications of the ACM %V 21 %N 1 %P 63-72 %D January 1978 %K grecommended91, existing classic architecture, maeder biblio: parallel hardware and devices, implementation, ginsberg biblio: %K bhibbard %K enm, j\-lb, %X The original paper describing the Cray-1. This paper is reproduced in Kuhn and Padua's (1981, IEEE) survey "Tutorial on Parallel Processing." Also reproduced in "Computer Structures: Principles and Examples" by Daniel P. Siewiorek, C. Gordon Bell, and Allen Newell, McGraw-Hill, 1982, pp. 743-752. Reproduced in Dharma P. Agrawal's (ed.) "Advanced Computer Architecture," IEEE, 1986, pp. 15-24. %X Literature search yields: 00712248 E.I. Monthly No: EI7804023850 E.I. Yearly No: EI78014612 Title: Cray-1 Computer System. Author: Russell, Richard M. Corporate Source: Cray Res Inc, Minneapolis, Minn Source: Communications of the ACM v 21 n 1 Jan 1978 p 63-72 Publication Year: 1978 CODEN: CACMA2 ISSN: 0001-0782 Language: ENGLISH Journal Announcement: 7804 Abstract: The CRAY-1 is described, the evolution of its architecture is discussed, and an account is given of some of the problems that were overcome during its manufacture. The CRAY-1 is the only computer to have been built to date that satisfies ERDA's Class VI requirement (a computer capable of processing from 20 to 60 million floating point operations per second).
The CRAY-1's Fortran compiler (CFT) is designed to give the scientific user immediate access to the benefits of the CRAY-1's vector processing architecture. An optimizing compiler, CFT, "vectorizes" innermost DO loops. Compatible with the ANSI 1966 Fortran Standard and with many commonly supported Fortran extensions, CFT does not require any source program modifications or the use of additional nonstandard Fortran statements to achieve vectorization. 6 refs. Descriptors: *COMPUTER ARCHITECTURE; COMPUTER SYSTEMS, DIGITAL Classification Codes: 722 (Computer Hardware); 723 (Computer Software) 72 (COMPUTERS & DATA PROCESSING) %A Howard J. Siegel %T Interconnection Networks and Masking for Single Instruction Stream Multiple Data Stream Machines %R PhD Dissertation %I Princeton University %D May 1977 %O 171 pages %K grecommended91, %K ag, %X I don't reference it much since it is not as easy to get. But it is a Good Thing to read. %A Lawrence Snyder %A Leah H. Jamieson %A Dennis B. Gannon %A Howard Jay Siegel %T Algorithmically Specialized Computers %I Academic Press %C Orlando, FL %D 1985 %K Rhighnam, analysis, architecture, survey, VLSI, book, text, %K grecommended91, %K meb, %O 510.7862 A394 %X Proceedings from the Purdue NSF Workshop on Algorithmically Specialized Computers. %A Harold S. Stone %T Introduction to Computer Architecture %I Addison-Wesley %D 1987 %K bhibbard, book, text, %K grecommended91, %K ak, %X The first edition was dated 1975. This is the third edition; SRA was the old publisher. %X Has a couple of very good chapters on multiprocessors/multiprocessing. %A Quentin F. Stout %A Russ Miller %T Parallel Algorithms for Regular Architectures %I MIT Press %D 1988 %K grecommended91, %K fpst, %X No direct experience, but I've heard good things. %A Richard J. Swan %A S. H. Fuller %A Daniel P. Siewiorek %T Cm* -- A Modular, Multi-Microprocessor %J Proceedings AFIPS National Computer Conference %I AFIPS Press %V 46 %D 1977 %P 637-644 %K CMU, grecommended91 %K btartar %K jlh, dp, %X This paper is reproduced in Kuhn and Padua's (1981, IEEE) survey "Tutorial on Parallel Processing." %X (JLH & DP) Cm* was the first multiprocessor based on microprocessor technology. Despite the limited success of the machine, its impact and ideas are present in many machines being built today including the Encore Ultramax (Multimax) and Stanford's DASH. %X Literature search yields: 00719649 E.I. Monthly No: EI7806040291 E.I. Yearly No: EI78016355 Title: Cm* -- A Modular, Multi-Microprocessor. Author: Swan, R. J.; Fuller, S. H.; Siewiorek, D. P. Corporate Source: Carnegie-Mellon Univ, Pittsburgh, Pa Source: AFIPS Conference Proceedings v 46 1977, Dallas, Tex, Jun 13-16 1977. Publ by AFIPS Press, Montvale, NJ, 1977 p 637-644 Publication Year: 1977 CODEN: AFPGBT ISSN: 0095-6880 Language: ENGLISH Journal Announcement: 7806 Abstract: The paper describes the architecture of a new large multiprocessor computer system being built at Carnegie-Mellon University. The system allows close cooperation between large numbers of inexpensive processors. All processors share access to a single virtual memory address space. There are no arbitrary limits on the number of processors, amount of memory or communication bandwidth in the system. Considerable support is provided for low-level operating system primitives and inter-process communication. 18 refs. Descriptors: *COMPUTERS, MICROPROCESSOR Classification Codes: 721 (Computer Circuits & Logic Elements); 722 (Computer Hardware) 72 (COMPUTERS & DATA PROCESSING) %A Shreekant S.
Thakkar %A Mark Sweiger %T Performance of an OLTP Application on the Symmetry Multiprocessor System %J Proc. 17th Annual Symposium on Computer Architecture, Computer Architecture News %I ACM %V 18 %N 2 %D June 1990 %P 228-238 %K Applications, Sequent, %K grecommended91, %K jlh, dp, %X (JLH & DP) One of the few papers evaluating a nonscientific application running on a multiprocessor. %Q Thinking Machines Corporation %T Connection Machine Model CM-2 technical summary %R Technical Report HA87-4 %C Boston, MA %D April 1987 %K grecommended91, hardware architecture, %K Rhighnam, Connection Machine %K jlh, dp, %X (JLH & DP) Another architecture reference for the CM-2 is Tucker, L. and G. Robertson [1988]. "Architecture and Applications of the Connection Machine," Computer, vol. 21, no. 8 (August), pages 26-38. %A Philip C. Treleaven %A David R. Brownbridge %A Richard P. Hopkins %T Data-Driven and Demand-Driven Computer Architecture %J Computing Surveys %V 14 %N 1 %D March 1982 %P 93-143 %K grecommended91, CR Categories and Subject Descriptors: C.0 [Computer System Organization]: General - hardware/software interfaces; system architectures; C.1.2 [Processor Architecture]: Multiple Data Stream Architectures (Multiprocessors); C.1.3 [Processor Architecture]: Other Architecture Styles - data flow architectures; high level language architectures; D.3.2 [Programming Languages]: Language Classifications - data-flow languages; macro and assembly languages; very high-level languages General Terms: Design Additional Key Words and Phrases: Demand-driven architecture, data-driven architecture %K Rdf, enm, dmp, ak, %X * The aim of this paper is to identify the concepts and relationships that exist both within and between the two areas of research of data-driven and demand-driven architectures. Reproduced in "Selected Reprints on Dataflow and Reduction Architectures" ed. S. S. Thakkar, IEEE, 1987, pp. 4-54. %A Robert Voigt %T Where are the Parallel Algorithms? %I NASA Langley Research Center %R ICASE Report No. 85-2 %D 1985 %K grecommended91, %K ab, %A Chuan-Lin Wu %A Tse-Yun Feng %T On a Class of Multistage Interconnection Networks %J IEEE Transactions on Computers %V C-29 %N 8 %D August 1980 %P 694-702 %K Array processing, computer architecture, conflict resolution, interconnection networks, MIMD machine, multiple-processor systems, network configurations, parallel processing, routing techniques, SIMD machine %K Rhighnam, analysis, %K grecommended91, %K ag, %X Reproduced in the 1984 tutorial: "Interconnection Networks for parallel and distributed processing" by Wu and Feng. Also reprinted in the text compiled by Kai Hwang: "Supercomputers: Design and Application," IEEE, 1984. %A William A. Wulf %A Roy Levin %A Samuel P. Harbison %T HYDRA/C.mmp: An Experimental Computer System %I McGraw-Hill %D 1981 %K grecommended91, CMU, C.mmp, HYDRA OS, multiprocessor architecture and operating systems %K bsatya, book, text, %K enm, ag, %X * Describes the architecture of C.mmp, and details the goals, design, and performance of HYDRA, its capability based OS. %A Hans P. Zima %A Barbara Chapman %T Supercompilers for Parallel and Vector Computers %I Addison-Wesley (ACM) %D 1990 %O ISBN 0-201-17560-6 %K book, text, %K grecommended91, %K cb@uk, %$ 39.95 %X Ken Kennedy, in the intro, describes this as the first satisfactory textbook on dependence analysis and its application to vectorization and parallelization. Good bibliography.
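%X The flavor of the book's dependence testing (Chapter 4, detailed below), in a dozen lines: accesses a[c1*i + k1] and a[c2*i' + k2] in a loop can touch the same element only if c1*i - c2*i' = k2 - k1 has an integer solution, which requires gcd(c1, c2) to divide k2 - k1; if it does not, the compiler may run the loop in parallel (illustrative code, not from the book):
    int gcd(int a, int b)
    {
        while (b != 0) { int t = a % b; a = b; b = t; }
        return a;
    }
    /* 1 = dependence possible, 0 = provably independent */
    int gcd_test(int c1, int k1, int c2, int k2)
    {
        int g = gcd(c1 < 0 ? -c1 : c1, c2 < 0 ? -c2 : c2);
        int d = k2 - k1;
        return g == 0 ? d == 0 : d % g == 0;
    }
For example, a[2*i] and a[2*i + 1] test as independent (gcd 2 does not divide 1); the Banerjee test then sharpens such answers using the loop bounds.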
%X Contents (chapter headings): 1. Supercomputers and Supercompilers - general stuff, architecture, applications, early work... 2. Supercomputer Architectures - vector and parallel systems 3. Scalar Analysis - control and data flow, monotone data flow systems, use-definition chains, dominance relation, simple optimizations,... 4. Data Dependence - basic concepts and definition, dependence testing, algorithms for DD testing. 5. Standard Transformations - DO-loop normalization, subscript normalization, scalar renaming. 6. Vectorization - fundamental transformations, concurrency in loops, vector code-generation, loop transformations, control dependence and vectorization, procedure calls. 7. Parallelization - parallel code generation for DOALL, DOACROSS loops, shared memory parallelization, distributed memory parallelization. 8. Supercompilers and their Environments - case studies of 'supercompilers', and issues in their environments. A. Tarjan's algorithm B. The Banerjee Test C. Mathematical Notation %X Each chapter has a series of bibliographic notes and there is a large bibliography which is very helpful. The book is quite mathematical in style, making familiarity with set notation helpful. I am most interested in tools to aid in automatic parallelization, particularly for distributed memory machines, and have therefore read more in the chapters more relevant to this area - in particular I haven't read chapters 5 or 6 at all. Chapters 1 and 2 are fairly standard summaries. Chapter 3 (53 pages) provides a fairly weighty intro to scalar analysis. Chapter 4 (62 pages): a thorough introduction, including direction vectors, gcd, Banerjee and separability tests, etc. Personally I found the book form of Wolfe's thesis (Optimizing Supercompilers for Supercomputers) clearer as a newcomer to the field, but everything is covered here. Chapter 7 (56 pages) is biased towards shared memory systems, as that is where most of the existing work has been done. The style is clear, and less mathematical in formulation than earlier chapters. Definitions of shared, local, reduction, shared ordered, etc. variables are clear. For the distributed memory case an SPMD model is assumed, and all discussion is in terms of this model. Chapter 8 (20 pages) gives case studies on Parafrase (Illinois), PFC (Rice) and SUPERB (Suprenum) as well as looking at the place of parallelization tools within an integrated development environment for high-performance computing. Overall - good summary of the field, somewhat mathematical, and as Ken Kennedy writes in the Foreword: "...no satisfactory textbook on dependence analysis and its applications to vectorization and parallelization has yet emerged. Given the importance of the subject, this is quite surprising."
--
=========================== MODERATOR ==============================
Steve Stevenson                        {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906     (803)656-5880.mabell