larry@bbn.com (Larry Denenberg) (11/01/90)
Data Structures and Their Algorithms by Harry R. Lewis and Larry Denenberg HarperCollins Publishers, Glenview, IL, 1991. (The book should be available in late December, 1990.) This book contains an introductory but rigorous treatment of the data structures that are at the foundation of software engineering. It is not exclusively either a programming book or a book on mathematical theory; it aims to show students at the sophomore level how these subjects fit together. Chapters 1 and 2 contain general motivation and illustrative examples, as well as a development of the necessary mathematical preliminaries. Chapters 3, 4, and 5 discuss the basic data structures: lists, stacks, queues, trees and their traversals, arrays, and strings. Chapters 6 and 7 discuss the use of lists and trees for storage and retrieval of data, including binary and interpolation search, skip lists, construction of static and dynamic binary search trees, AVL trees, 2-3 trees, B-trees, and self-adjusting trees (also known as "splay trees"). Chapter 8 is on tries and their relatives as search structures and hash tables of various kinds. Chapter 9 is on priority queue structures, including heaps and leftist trees, and solutions to the union-find problem and range searching problems. Chapter 10 is on memory management problems, including garbage collection and boundary-tag and buddy systems. Chapter 11 is a systematic presentation of the most important sorting algorithms for internal and external memory. And Chapter 12 is on graphs and graph algorithms, especially network flow. Chapter 13 contains a unique set of open-ended, exploratory exercises. Each is a complete software design project amenable to many possible solutions, each of which involves a combination of techniques and concepts from different sections of the text. A few simple principles have governed our choice of topics. First, we have chosen only practically useful techniques. We omit treatment of some theoretically excellent algorithms that are not practical for data sets of reasonable size. Second, we have included both classical and recently discovered methods, relying on inherent simplicity, wide applicability, and potential usefulness as the criteria for inclusion rather than any preconceived exhaustive catalogue. For example, Chapter 6 includes both the classical algorithm for construction of optimal binary search trees on static data, and the newer skip list structures for dynamic data. In other chapters there are sections on splay trees, extendible hashing, grid files, and other elegant recently developed methods. Third, we have included an analysis of almost every method we describe. One of our major objectives has been to present analyses that are relatively brief and nontechnical but illuminate the important performance characteristics of the algorithms. We omit unnecessary syntactic detail from the presentations. Our subject matter is algorithms, not the expression of algorithms in the syntax of particular programming languages, so we have adopted a pseudocode notation that is readily understandable to programmers but has a simple syntax. It is assumed that the reader will have had a first course in computer programming in a language like Pascal or C, and will therefore be able to translate our pseudocode into such a language without difficulty. In the same way, we give detailed analyses of the algorithms, but avoid mathematical techniques that are likely to be inaccessible to college sophomores. Logarithms, exponentials, and sums of geometric series play a central role in many analyses, so we give some elementary examples of these topics in Chapter 1. Naive probabilistic reasoning is also essential, and the book has a self-contained introduction. On the other hand the differential calculus is used in only a few spots (the integral calculus not at all), and precalculus readers can simply skip to the conclusion of those arguments. Each chapter ends with problems and references. The problems are split up into sections that correspond to the main sections of the text of that chapter. Within those sections the problems range from straightforward simulations of the algorithms on small data sets, to requests for completion of arguments whose details were omitted in the text, to the design and analysis of new or extended data structures and algorithms. The problems are extensive and number more than 450 in all. The references cite publications that are of historical significance or present good summaries of a particular set of topics. In general terms the approach is to treat data structures as defined by the set of abstract operations that they provide. When considering a data structure the book typically gives a brief intuitive discussion, then gives concrete implementations (in pseudocode) and analyzes them in terms of time and space efficiency. Worst-case, expected-case, and amortized analyses are all used. The book covers the material of course CS7 of the ACM curriculum, as well as part of the prerequisite course CS5. Further information is available from Don Childress at HarperCollins, 1900 East Lake Avenue, Glenview IL 60025, (708) 729-3000 extension 2170, or from the authors, lewis@harvard.edu or larry@harvard.edu. TABLE OF CONTENTS Preface 1. Introduction Programming as an Engineering Activity Computer Science Background Memory and Data in Von Neumann Computers Notation for Programs Locatives Abstract Data Types Mathematical Background Finite and Infinite Series Logarithms, Powers, and Exponentials Order Notation Recurrence Relations Naive Probability Theory 2. Algorithm Analysis Properties of an Algorithm Effectiveness Correctness Termination Efficiency Program Complexity Exact vs. Growth-Rate Analysis Principles of Mathematical Analysis Expected-Case and Amortized Analysis Algorithm Paradigms Brute-Force and Exhaustive Search Greedy Algorithms Dynamic Programming NP-Completeness 3. Lists List Operations Basic List Representations Stack Representation in Contiguous Memory Queue Representation in Contiguous Memory Stack Representation in Linked Memory Queue Representation in Linked Memory Stacks and Recursion List Representations for Traversals Doubly Linked Lists 4. Trees Basic Definitions Special Kinds of Trees Tree Operations and Traversals Tree Implementations Representation of Binary Trees Representation of Ordered Trees Representation of Complete Binary Trees Implementing Tree Traversals and Scans Stack-Based Traversals Link-Inversion Traversal Scanning a Tree in Constant Space Threaded Trees Implementing Level-Order Traversal Summary 5. Arrays and Strings Arrays as Abstract Data Types Multidimensional Arrays Contiguous Representation of Arrays Constant-Time Initialization Sparse Arrays List Representations Hierarchical Tables Arrays with Special Shapes Representations of Strings Huffman Encoding Lempel-Ziv Encoding String Searching The Knuth-Morris-Pratt Algorithm The Boyer-Moore Algorithm Fingerprinting and the Karp-Rabin Algorithm 6. List and Tree Implementations of Sets Sets and Dictionaries as Abstract Data Types Unordered Lists Ordered Lists Binary Search Interpolation Search Skip Lists Binary Search Trees Insertion Deletion Static Binary Search Trees Optimal Trees Probability-Balanced Trees Median Split Trees 7. Tree Structures for Dynamic Dictionaries AVL Trees Insertion Deletion 2-3 Trees and B-Trees 2-3 Trees Red-Black Trees (a,b)-Trees and B-Trees Self-Adjusting Binary Search Trees 8. Sets of Digital Data Bit Vectors Tries and Digital Search Trees Hashing Techniques Chaining Strategies Open Addressing Strategies Deletions Extendible Hashing Hashing Functions Hashing by Division Hashing by Multiplication Perfect Hashing of Static Data Universal Classes of Hash Functions 9. Sets with Special Operations Priority Queues Balanced Tree Implementations Heaps Leftist Trees Disjoint Sets with Union Up-Trees Path Compression Range Searching k-d-Trees for Multidimensional Searching Quad Trees Grid Files 10. Memory Management The Problem of Memory Management Records of a Single Size Reference Counts Mark and Sweep Garbage Collection Collecting by Copying Final Cautions on Garbage Collection Compaction of Records of Various Sizes Managing a Pool of Blocks of Various Sizes Allocation Strategies Data Structures for Freeing Buddy Systems 11. Sorting Kinds of Sorting Algorithms Insertion and Shell Sort Selection and Heap Sort Quick Sort The Information-Theoretic Lower Bound Digital Sorting Bucket Sort Radix Sort Radix Exchange Sort External Sorting Merge Sorts Polyphase Merge Sort Generating the Initial Runs Finding the Median 12. Graphs Graphs and Their Representations Trees Graph Searching Breadth-First Search Depth-First Search Greedy Algorithms on Graphs Minimum Spanning Trees Single-Source Least-Cost Paths All Pairs Least-Cost Paths Network Flow Finding Maximum Flows Implementing the Max Flow Algorithm Applications of Max Flow 13. Engineering with Data Structures A. Locatives