larry@bbn.com (Larry Denenberg) (11/01/90)
Data Structures and Their Algorithms
by Harry R. Lewis and Larry Denenberg
HarperCollins Publishers, Glenview, IL, 1991. (The book should be available in
late December, 1990.)
This book contains an introductory but rigorous treatment of the data
structures that are at the foundation of software engineering. It is not
exclusively either a programming book or a book on mathematical theory; it
aims to show students at the sophomore level how these subjects fit together.
Chapters 1 and 2 contain general motivation and illustrative examples, as
well as a development of the necessary mathematical preliminaries. Chapters
3, 4, and 5 discuss the basic data structures: lists, stacks, queues, trees
and their traversals, arrays, and strings. Chapters 6 and 7 discuss the use
of lists and trees for storage and retrieval of data, including binary and
interpolation search, skip lists, construction of static and dynamic binary
search trees, AVL trees, 2-3 trees, B-trees, and self-adjusting trees (also
known as "splay trees"). Chapter 8 is on tries and their relatives as search
structures and hash tables of various kinds. Chapter 9 is on priority queue
structures, including heaps and leftist trees, and solutions to the
union-find problem and range searching problems. Chapter 10 is on memory
management problems, including garbage collection and boundary-tag and buddy
systems. Chapter 11 is a systematic presentation of the most important
sorting algorithms for internal and external memory. And Chapter 12 is on
graphs and graph algorithms, especially network flow. Chapter 13 contains a
unique set of open-ended, exploratory exercises. Each is a complete software
design project amenable to many possible solutions, each of which involves a
combination of techniques and concepts from different sections of the text.
A few simple principles have governed our choice of topics. First, we have
chosen only practically useful techniques. We omit treatment of some
theoretically excellent algorithms that are not practical for data sets of
reasonable size. Second, we have included both classical and recently
discovered methods, relying on inherent simplicity, wide applicability, and
potential usefulness as the criteria for inclusion rather than any
preconceived exhaustive catalogue. For example, Chapter 6 includes both the
classical algorithm for construction of optimal binary search trees on static
data, and the newer skip list structures for dynamic data. In other chapters
there are sections on splay trees, extendible hashing, grid files, and other
elegant recently developed methods. Third, we have included an analysis of
almost every method we describe. One of our major objectives has been to
present analyses that are relatively brief and nontechnical but illuminate
the important performance characteristics of the algorithms.
We omit unnecessary syntactic detail from the presentations. Our subject
matter is algorithms, not the expression of algorithms in the syntax of
particular programming languages, so we have adopted a pseudocode notation
that is readily understandable to programmers but has a simple syntax. It is
assumed that the reader will have had a first course in computer programming
in a language like Pascal or C, and will therefore be able to translate our
pseudocode into such a language without difficulty.
In the same way, we give detailed analyses of the algorithms, but avoid
mathematical techniques that are likely to be inaccessible to college
sophomores. Logarithms, exponentials, and sums of geometric series play a
central role in many analyses, so we give some elementary examples of these
topics in Chapter 1. Naive probabilistic reasoning is also essential, and the
book has a self-contained introduction. On the other hand the differential
calculus is used in only a few spots (the integral calculus not at all), and
precalculus readers can simply skip to the conclusion of those arguments.
Each chapter ends with problems and references. The problems are split up
into sections that correspond to the main sections of the text of that
chapter. Within those sections the problems range from straightforward
simulations of the algorithms on small data sets, to requests for completion
of arguments whose details were omitted in the text, to the design and
analysis of new or extended data structures and algorithms. The problems are
extensive and number more than 450 in all. The references cite publications
that are of historical significance or present good summaries of a particular
set of topics.
In general terms the approach is to treat data structures as defined by the
set of abstract operations that they provide. When considering a data
structure the book typically gives a brief intuitive discussion, then gives
concrete implementations (in pseudocode) and analyzes them in terms of time
and space efficiency. Worst-case, expected-case, and amortized analyses are
all used.
The book covers the material of course CS7 of the ACM curriculum, as well as
part of the prerequisite course CS5. Further information is available from
Don Childress at HarperCollins, 1900 East Lake Avenue, Glenview IL 60025,
(708) 729-3000 extension 2170, or from the authors, lewis@harvard.edu or
larry@harvard.edu.
TABLE OF CONTENTS
Preface
1. Introduction
Programming as an Engineering Activity
Computer Science Background
Memory and Data in Von Neumann Computers
Notation for Programs
Locatives
Abstract Data Types
Mathematical Background
Finite and Infinite Series
Logarithms, Powers, and Exponentials
Order Notation
Recurrence Relations
Naive Probability Theory
2. Algorithm Analysis
Properties of an Algorithm
Effectiveness
Correctness
Termination
Efficiency
Program Complexity
Exact vs. Growth-Rate Analysis
Principles of Mathematical Analysis
Expected-Case and Amortized Analysis
Algorithm Paradigms
Brute-Force and Exhaustive Search
Greedy Algorithms
Dynamic Programming
NP-Completeness
3. Lists
List Operations
Basic List Representations
Stack Representation in Contiguous Memory
Queue Representation in Contiguous Memory
Stack Representation in Linked Memory
Queue Representation in Linked Memory
Stacks and Recursion
List Representations for Traversals
Doubly Linked Lists
4. Trees
Basic Definitions
Special Kinds of Trees
Tree Operations and Traversals
Tree Implementations
Representation of Binary Trees
Representation of Ordered Trees
Representation of Complete Binary Trees
Implementing Tree Traversals and Scans
Stack-Based Traversals
Link-Inversion Traversal
Scanning a Tree in Constant Space
Threaded Trees
Implementing Level-Order Traversal
Summary
5. Arrays and Strings
Arrays as Abstract Data Types
Multidimensional Arrays
Contiguous Representation of Arrays
Constant-Time Initialization
Sparse Arrays
List Representations
Hierarchical Tables
Arrays with Special Shapes
Representations of Strings
Huffman Encoding
Lempel-Ziv Encoding
String Searching
The Knuth-Morris-Pratt Algorithm
The Boyer-Moore Algorithm
Fingerprinting and the Karp-Rabin Algorithm
6. List and Tree Implementations of Sets
Sets and Dictionaries as Abstract Data Types
Unordered Lists
Ordered Lists
Binary Search
Interpolation Search
Skip Lists
Binary Search Trees
Insertion
Deletion
Static Binary Search Trees
Optimal Trees
Probability-Balanced Trees
Median Split Trees
7. Tree Structures for Dynamic Dictionaries
AVL Trees
Insertion
Deletion
2-3 Trees and B-Trees
2-3 Trees
Red-Black Trees
(a,b)-Trees and B-Trees
Self-Adjusting Binary Search Trees
8. Sets of Digital Data
Bit Vectors
Tries and Digital Search Trees
Hashing Techniques
Chaining Strategies
Open Addressing Strategies
Deletions
Extendible Hashing
Hashing Functions
Hashing by Division
Hashing by Multiplication
Perfect Hashing of Static Data
Universal Classes of Hash Functions
9. Sets with Special Operations
Priority Queues
Balanced Tree Implementations
Heaps
Leftist Trees
Disjoint Sets with Union
Up-Trees
Path Compression
Range Searching
k-d-Trees for Multidimensional Searching
Quad Trees
Grid Files
10. Memory Management
The Problem of Memory Management
Records of a Single Size
Reference Counts
Mark and Sweep Garbage Collection
Collecting by Copying
Final Cautions on Garbage Collection
Compaction of Records of Various Sizes
Managing a Pool of Blocks of Various Sizes
Allocation Strategies
Data Structures for Freeing
Buddy Systems
11. Sorting
Kinds of Sorting Algorithms
Insertion and Shell Sort
Selection and Heap Sort
Quick Sort
The Information-Theoretic Lower Bound
Digital Sorting
Bucket Sort
Radix Sort
Radix Exchange Sort
External Sorting
Merge Sorts
Polyphase Merge Sort
Generating the Initial Runs
Finding the Median
12. Graphs
Graphs and Their Representations
Trees
Graph Searching
Breadth-First Search
Depth-First Search
Greedy Algorithms on Graphs
Minimum Spanning Trees
Single-Source Least-Cost Paths
All Pairs Least-Cost Paths
Network Flow
Finding Maximum Flows
Implementing the Max Flow Algorithm
Applications of Max Flow
13. Engineering with Data Structures
A. Locatives