paul@athertn.Atherton.COM (Paul Sander) (07/25/90)
There has been some discussion in comp.software-eng lately regarding reusable software components. This discussion focused mainly on why they are not being used more; the more prominent opinions on this seemed be: - There aren't many - It's easier or more enjoyable or more rewarding to write from scratch than to reuse someone else's - When they exist, they're hard to find Some of the fundamental problems which exhibit themselves as these three points are these: - There aren't many reusable software components because what's out there that claims to be reusable isn't - There is no good way to catalog the components - There is no good way to index the catalog Though all of these topics are interesting and worth discussion, I would like to spur some discussion on just what a reusable software component is. Following is (at the risk of doing some undergraduate's first-quarter Data Structures homework) an implementation of an in-memory B-tree. What I'd like to discuss are: - Is this a reusable software component? - What are its characteristics that make it one? Or don't make it one? - Is it a good one? If so, why? If not, why? - Would anyone use it? Why or why not? - How can it be made better? - Anything else relevant to producing a good reusable component that is reused Other interesting topics that are probably best deferred to email are: - Choice of language - Programming style - Optimizations - Which newsgroups distribute source code I've attempted to direct followups to comp.sw.components. Please double- check your followup headers. Here is the code. I have not enveloped it in any form of archive so that those not on Unix systems (and do not have the benefit of "shar") can read it. By the way, every test I could think of has not revealed any bugs in this code: Moderate-sized trees are really B-trees, there are no coredumps, there are no memory leaks or corruptions. Paul Sander paul@Atherton.COM (408) 734-9822 ********************************************************************* **************************** btree.doc ****************************** ********************************************************************* BTREE() MISC. REFERENCE MANUAL PAGES BTREE() NAME bt_dump, bt_search, bt_new, bt_destroy, bt_insert, bt_traverse, bt_delete - In-memory B-tree manipulation SYNOPSIS #include <btree.h> BTREE bt_new(order,comp) int order; int (*comp)(); void bt_destroy(tree,free_key) BTREE tree; void (*free_key)(); int bt_insert(tree,key) BTREE tree; char *key; char *bt_delete(tree,key) BTREE tree; char *key; char *bt_search(tree,target) BTREE tree; char *target; void bt_traverse(tree,fn,parms) BTREE tree; void (*fn)(); char *parms; void bt_dump(tree,key_dump) BTREE tree; void (*key_dump)(); DESCRIPTION These functions implement an in-memory B-tree data struc- ture. The tree itself stores only pointers to the client's data, and relies on client- provided functions for any com- parisons and storage deallocation. bt_new creates a new B-tree. The client specifies the desired order (the maximum number of children allowed each node) of the tree (for performance trade-offs based on the efficiency of comparisons of client-provided data), and a pointer to a function that compares two client-provided data structures. The comparison function, comp, has an interface identical to strcmp(3), except that its arguments point to client-provided data structures. Client-provided data structures that compare greater by this function will appear later in the lexical order of the data stored in the tree. Sun Release 4.0 Last change: 1 BTREE() MISC. REFERENCE MANUAL PAGES BTREE() bt_new returns a pointer to a handle for the B-tree, or NULL if an error occurs. bt_destroy deallocates all storage allocated to a B-tree. The client provides a pointer to a visitation function that is called once for each item stored in the tree. The items are visited in arbitrary order. If NULL is passed instead of a pointer to a function, no action is taken for the client-provided data, but the tree structure itself is freed. bt_insert inserts a new item into the tree. 1 is returned if the insertion was successful, -1 is returned if the new item matches another item that has already been inserted into the tree, and 0 is returned in the event of an error. bt_delete deletes an item from a tree. The value returned is the same as that passed to bt_insert when the item was inserted, or NULL if the item is not found. bt_search searches for an item in a tree. The value returned is the same as that passed to bt_insert when the item was inserted, or NULL if the item is not found. bt_traverse traverses the tree, calling a client-provided visitation function fn once for each item stored there. fn has the following interface: void fn(item,parms) char *item; char *parms; item is the pointer stored when the item was inserted into the tree. parms is an arbitrary pointer that the client wishes to pass to the visitation function, but is otherwise unused by the B-tree implementation. Items are visited in their lexical order. bt_dump displays the contents of the tree to stdout, along with some diagnostic and statistical information. The key_dump function is called once for each item in the tree, in arbitrary order. It may be NULL if no action is desired. Its interface follows: void key_dump(item) char *item; BUGS This implementation is written in K&R C, and tested only on a Sun 3 running SunOS 4.0.3. This implementation assumes "char*" is the generic pointer. This goes against practice used by most "modern" compilers, but aids portability to older compilers. Note also that this will probably generate lots of complaints from Lint. Sun Release 4.0 Last change: 2 BTREE() MISC. REFERENCE MANUAL PAGES BTREE() bt_dump assumes that pointers are the same size as integers, and that they can be displayed in total in eight hexidecimal digits. Sun Release 4.0 Last change: 3 **************************************************************** ******************** btree.h *********************************** **************************************************************** /* Header file for in-memory B-tree implementation */ typedef char *BTREE; extern void bt_dump(); extern char *bt_search(); extern BTREE bt_new(); extern void bt_destroy(); extern int bt_insert(); extern void bt_traverse(); extern char *bt_delete(); **************************************************************** ******************** btree.h *********************************** **************************************************************** /* #define DEBUG */ /********************************************************************** * * btree.c -- This file implements an in-memory B-tree abstract data * type. When the tree is created, the caller specifies * the number of children each node may have (i.e., the * tree's "order") and a pointer to a function that compares * two keys. The function has the same calling protocol as * strcmp(). * **********************************************************************/ #include <stdio.h> /* The following structure describes a B-tree node. */ struct btnode { int nkeys; /* Number of keys stored here */ char **keys; /* array[order-1] of keys */ struct btnode **children; /* array[order] of children */ }; /* The following structure describes a B-tree. */ struct btree { int order; /* Max children permitted */ int (*comp)(); /* Comparison function for keys */ struct btnode *root; /* Root of tree */ int capacity; /* Max keys that will fit */ int contains; /* Key currently stored */ }; typedef struct btree *BTREE; typedef struct btnode BTNODE; /* The following structure is used for debugging heap corruptions */ #ifdef DEBUG struct heapblk { char *blk; int freeNext; }; struct heap_struct { int allocSeq; int freeLast; struct heapblk blks[10000]; }; struct heap_struct btheap = {0,-1}; #endif DEBUG /* * The following code is used for allocating space on the heap. When not * debugging heap corruptions, a macro aliases bt_malloc with malloc. */ #ifdef DEBUG char *bt_malloc(size) unsigned size; { char *temp; temp = (char*) malloc(size); btheap.blks[btheap.allocSeq].blk = temp; btheap.blks[btheap.allocSeq].freeNext = -2; btheap.allocSeq++; return temp; } #else #define bt_malloc(size) malloc(size) #endif /* * The following code is used for freeing memory on the heap. If not * debugging heap corruptions, bt_free is aliased with free. */ #ifdef DEBUG bt_free(ptr) char *ptr; { int i; printf("Freeing heap storage at %x\n",ptr); for (i = 0; i < btheap.allocSeq; i++) { if (ptr == btheap.blks[i].blk) { if (btheap.blks[i].freeNext == -2) { btheap.blks[i].freeNext = btheap.freeLast; btheap.freeLast = i; free(ptr); return; } else { printf( "ERROR: freeing %x which was already freed\n", ptr); printf("Block\n"); i = btheap.freeLast; while (i > 0) { printf("%x\n",btheap.blks[i].blk); i = btheap.blks[i].freeNext; } fflush(stdout); kill(getpid(),5); } } } printf("ERROR: Freeing %x which has not been allocated\n",ptr); fflush(stdout); kill(getpid(),5); } #else #define bt_free(ptr) free(ptr) #endif /* The following verifies the integrity of the heap. */ int bt_malloc_verify() { int retval; int once; int i; retval = 1; once = 0; #ifdef DEBUG for (i = 0; i < btheap.allocSeq; i++) { if (btheap.blks[i].freeNext == -2) { if (!once) { once = 1; printf("The following blocks are allocated:\n"); } printf("%x\n",btheap.blks[i].blk); } } fflush(stdout); #endif #ifdef sun retval = malloc_verify(); #endif return retval; } /********************************************************************* * * dump_node -- This function displays the contents of a node and * then recursively displays its children. Note * that this function assumes that a pointer to a node * is the same size as an integer. * *********************************************************************/ void dump_node(node,key_dump) BTNODE *node; void (*key_dump)(); { int i; if (node == NULL) { printf("ERROR: null node pointer\n"); return; } printf("%08x: %d keys(%08x, %08x)\n",(int) node,node->nkeys, node->keys,node->children); printf("--------\n"); for (i = 0; i < node->nkeys; i++) { if (node->children != NULL) { printf(" %08x\n",node->children[i]); } else { printf(" %08x\n",0); } if (key_dump != NULL) { printf(" "); key_dump(node->keys[i]); } } if (node->children != NULL) { printf(" %08x\n",node->children[i]); for (i = 0; i <= node->nkeys; i++) { dump_node(node->children[i],key_dump); } } else { printf(" %08x\n",0); } fflush(stdout); return; } /********************************************************************* * * bt_dump -- This function displays the contents of a B-tree. The * caller passes in a function that displays the contents * of one of the keys stored in the tree. The calling * protocol for this function is: * * key_dump(key) * char *key; * * Also, if key_dump is NULL, no action is taken. This is * useful if the structure of the tree is desired without * including its contents. * *********************************************************************/ void bt_dump(ptree,key_dump) char *ptree; void (*key_dump)(); { BTREE tree; tree = (BTREE) ptree; printf("B-tree dump:\n\n"); printf("order = %d\n",tree->order); printf("capacity = %d\n",tree->capacity); printf("contains %d keys\n",tree->contains); printf("efficiency is %d per cent\n", (tree->contains*100)/tree->capacity); printf("handle at %08x\n",tree); printf("root = %08x\n\n",(int) tree->root); dump_node(tree->root,key_dump); return; } /******************************************************************** * * lin_search -- This function searches a node for a key. If the key * is found, its index in the node's key array is * returned, along with a flag indicating that the key * was found. If the key is not found, the index of the * next higher key in the key array is returned, along * with a "not found" indication. * ********************************************************************/ static int lin_search(node,target,comp,eq) BTNODE *node; /* Node to be searched */ char *target; /* Key to search for */ int (*comp)(); /* strcmp()-like comparison function */ int *eq; /* 0 if not found, non-zero otherwise */ { int i; int res; int hi; int lo; res = -1; /* * Perform linear search of node contains 7 keys or less, * otherwise perform binary search */ if (node->nkeys <= 7) { for (i = 0; i < node->nkeys; i++) { res = (*comp)(node->keys[i],target); if (res >= 0) break; } } else { lo = 0; hi = node->nkeys - 1; while ((lo <= hi) && (res != 0)) { i = (lo + hi) / 2; res = (*comp)(node->keys[i],target); if (res < 0) lo = i + 1; else if (res > 0) hi = i - 1; } if (res < 0) i++; } *eq = (res == 0); /* Indicate successful search */ return i; } /*********************************************************************** * * node_search -- This function performs a recursive-descent search of * a sub-tree for a key. The key is returned if found, * or NULL otherwise. * ***********************************************************************/ static char *node_search(node,target,comp) BTNODE *node; char *target; int (*comp)(); { int res; int i; int eq; if (node == NULL) return NULL; i = lin_search(node,target,comp,&eq); if (eq) return node->keys[i]; else if (node->children == NULL) return NULL; else return node_search(node->children[i],target,comp); } /*********************************************************************** * * bt_search -- This function searches a tree for a specified key. The * key is returned if found, or NULL if not. * ***********************************************************************/ char *bt_search(tree,target) BTREE tree; char *target; { if (tree == NULL) return NULL; return node_search(tree->root,target,tree->comp); } /*********************************************************************** * * bt_new -- Given the number of children each node is allowed to have, * and a pointer to a strcmp-like function that compares two * keys, this function creates an empty b-tree structure. * ***********************************************************************/ char *bt_new(order,comp) int order; int (*comp)(); { BTREE tree; int i; /* Validate parameters */ if ((comp == NULL) || (order < 3)) return (char *) NULL; /* Allocate the handle */ tree = (BTREE) bt_malloc(sizeof(struct btree)); if (tree != NULL) { /* Allocate the root */ tree->root = (BTNODE*) bt_malloc(sizeof(struct btnode)); if (tree->root != NULL) { /* Initialize root */ tree->order = order; tree->comp = comp; tree->root->nkeys = 0; tree->root->children = NULL; /* Allocate key array */ tree->root->keys = (char**)bt_malloc((order - 1) * sizeof(char*)); if (tree->root->keys != NULL) { /* Initialize keys */ for (i = 0; i < order - 1; i++) { tree->root->keys[i] = NULL; } tree->root->nkeys = 0; tree->contains = 0; tree->capacity = order - 1; return (char *) tree; } /* Failed to allocate keys, free tree */ bt_free(tree->root); } /* Failed to allocate root, free handle */ bt_free(tree); } /* Failed to allocate handle, return null (error) */ return (char *) NULL; } /******************************************************************** * * free_node -- This function frees a node in a b-tree. It is * provided with a function that frees each key in the * node also. The free_key function has the following * protocol: * * free_key(key) * char *key; * * If NULL is passed as the free_key function, the node * is freed, but no action is taken for the keys. * ********************************************************************/ static void free_node(node,free_key) BTNODE *node; void (*free_key)(); { int i; if (node->children != NULL) { for (i = 0; i <= node->nkeys; i++) { free_node(node->children[i],free_key); } bt_free(node->children); } if (free_key != NULL) { for (i = 0; i < node->nkeys; i++) { (*free_key)(node->keys[i]); } } bt_free(node->keys); bt_free(node); return; } /******************************************************************** * * bt_destroy -- This function frees a b-tree structure. It is also * passed a function that frees the keys contained by the * tree. The free_key function has the same calling * protocol as the free_key function passed to free_node * above. * ********************************************************************/ void bt_destroy(ptree,free_key) char *ptree; void (*free_key)(); { BTREE tree; tree = (BTREE) ptree; free_node(tree->root,free_key); bt_free(tree); return; } /****************************************************************** * * rot_right -- Given a node and an index into its key array, this * function rotates the node right at the specified * key. This is used during insertion and deletion to * keep the tree balanced. The return value is 0 if * the function fails, i.e. the node on the left * cannot lose keys, the node on the right cannot * gain keys, or the specified node is a leaf; 1 is * returned if successful. * ******************************************************************/ static int rot_right(node,n,order) BTNODE *node; int n; int order; { int i; BTNODE *right; BTNODE *left; /* Test parameters */ if ((n >= node->nkeys) || (n < 0)) { return 0; } /* Point to children */ left = node->children[n]; right = node->children[n+1]; /* Return FALSE if rotation is not possible */ if ((left == NULL) || (left->nkeys <= order/2) || (right->nkeys >= order - 1)) { return 0; } /* Shift the right child's keys */ for (i = right->nkeys; i > 0; i--) { right->keys[i] = right->keys[i - 1]; } /* Rotate keys */ right->keys[0] = node->keys[n]; node->keys[n] = left->keys[left->nkeys - 1]; left->keys[left->nkeys - 1] = NULL; if (right->children != NULL) { /* Shift the right child's children */ for (i = right->nkeys; i >= 0; i--) { right->children[i + 1] = right->children[i]; } /* Rotate children */ right->children[0] = left->children[left->nkeys]; left->children[left->nkeys] = NULL; } /* Update key count and return TRUE */ right->nkeys++; left->nkeys--; return 1; } /****************************************************************** * * rot_left -- Given a node and an index into its key array, this * function rotates the node left at the specified * key. This is used during insertion and deletion to * keep the tree balanced. The return value is 0 if * the function fails, i.e. the node on the right * cannot lose keys, the node on the left cannot * gain keys, or the specified node is a leaf; 1 is * returned when successful. * ******************************************************************/ static int rot_left(node,n,order) BTNODE *node; int n; int order; { int i; BTNODE *right; BTNODE *left; /* Test parameters */ if ((n < 0) || (n >= node->nkeys)) { return 0; } /* Point to children */ right = node->children[n+1]; left = node->children[n]; /* Return FALSE if rotation is not possible */ if ((left == NULL) || (right->nkeys <= order/2) || (left->nkeys >= order - 1)) { return 0; } /* Rotate keys */ left->keys[left->nkeys] = node->keys[n]; node->keys[n] = right->keys[0]; /* Update key counts */ left->nkeys ++; right->nkeys --; /* Shift the right child's keys */ for (i = 0; i < right->nkeys; i++) { right->keys[i] = right->keys[i+1]; } right->keys[i] = NULL; if (left->children != NULL) { /* Rotate children */ left->children[left->nkeys] = right->children[0]; /* Shift the right child's children */ for (i = 0; i <= right->nkeys; i++) { right->children[i] = right->children[i+1]; } right->children[i+1] = NULL; } /* Return TRUE */ return 1; } /********************************************************************* * * ins_key -- This function inserts a key into a specified node at * a specified location in its key array. 0 is returned * if the node is already full, 1 otherwise. * *********************************************************************/ static int ins_key(node,key,n,order) BTNODE *node; char *key; int n; int order; { int i; /* Return FALSE if insertion is not possible */ if (node->nkeys >= order - 1) { return 0; } /* Shift keys to make room, then insert new key */ for (i = node->nkeys - 1; i >= n; i--) { node->keys[i+1] = node->keys[i]; } node->keys[n] = key; node->nkeys++; return 1; } /********************************************************************* * * burst -- Given a node and an index into its key and child arrays, * this function splits the specified child node in half, * elevating the key at the child's split point into this * specified node. 0 is returned if the child array index * is out of range, or the node is full, or malloc fails; * 1 is returned otherwise. * *********************************************************************/ static int burst(tree,node,n) BTREE tree; BTNODE *node; int n; { BTNODE *new; int i; int m; BTNODE *left; int lkeys; /* Test parameters */ if (((node->nkeys > 0) && (node->nkeys >= tree->order - 1)) || (n < 0)) { return 0; } /* Create a new node */ new = (BTNODE*) bt_malloc(sizeof(BTNODE)); if (new != NULL) { /* Allocate new key array */ new->keys = (char**) bt_malloc(tree->order*sizeof(char*)); if (new->keys != NULL) { /* Calculate needed partial results */ m = tree->order / 2; left = node->children[n]; lkeys = left->nkeys; /* Allocate new child array */ if (left->children != NULL) { new->children = (BTNODE**) bt_malloc((tree->order + 1) * sizeof(BTNODE*)); if (new->children != NULL) { for (i = 0; i <= lkeys - m; i++) { new->children[i] = left->children[m + i]; } } else { bt_free(new->keys); bt_free(new); return 0; } } else { new->children = NULL; } /* Split keys */ for (i = 0; i < lkeys - m; i++) { new->keys[i] = left->keys[m + i]; } new->nkeys = lkeys - m; left->nkeys = m - 1; /* Elevate the key where the split was made */ for (i = node->nkeys; i > n; i--) { node->keys[i] = node->keys[i-1]; node->children[i+1] = node->children[i]; } node->children[n+1] = new; node->keys[n] = left->keys[m - 1]; node->nkeys++; tree->capacity += tree->order - 1; return 1; } /* Failed to allocate new child array, free node */ bt_free(new); } /* Failed to allocate new node, return FALSE */ return 0; } /********************************************************************* * * rebalance -- This function tries to rebalance the two adjacent * nodes. This is needed when an insertion is attempted * into a node that is already full but its neighbors are * not, or when a deletion is made and the node is less * than half-full. * *********************************************************************/ static rebalance(node,n,order) BTNODE *node; int n; int order; { int res; if (node->children[n]->nkeys < (order-1)/2) { res = rot_left(node,n,order); } else if ((n < node->nkeys) && (node->children[n+1]->nkeys < (order-1)/2)) { res = rot_right(node,n,order); } return; } /********************************************************************* * * ins_node -- This function performs a recursive-descent of a b-tree, * inserting a new key in a leaf node. If the leaf is * full, the tree is reorganized to make room. -1 is * returned if the insert fails due to a duplicate key. * 0 is returned if the insert fails for some other * reason. 1 is returned if successful. * *********************************************************************/ static int ins_node(tree,node,key) BTREE tree; BTNODE *node; char *key; { int i; int res; /* Locate proper place for new key in node */ i = lin_search(node,key,tree->comp,&res); if (res) return -1; /* If no children, insert key in this node */ if (node->children == NULL) { return ins_key(node,key,i,tree->order); } /* Try to insert the new key in a child node */ res = ins_node(tree,node->children[i],key); /* Child is full, reorganize */ if (res == 0) { /* Try rotating right */ if ((res == 0) && (i < node->nkeys) && (node->children[i+1]->nkeys < tree->order - 2)) { res = rot_right(node,i,tree->order); } /* Try rotating left */ if ((res == 0) && (i > 0) && (node->children[i-1]->nkeys < tree->order - 2)) { res = rot_left(node,i - 1,tree->order); } /* Can't rotate, try bursting */ if (res == 0) res = burst(tree,node,i); if (res > 0) { res = ins_node(tree,node,key); if (res > 0) { rebalance(node,i,tree->order); } } } return res; } /********************************************************************* * * bt_insert -- This function inserts a new key into a b-tree. -1 is * returned if the insert fails due to a duplicate key. * 0 is returned if the insert fails for some other * reason. 1 is returned if successful. * *********************************************************************/ int bt_insert(ptree,key) char *ptree; char *key; { int res; BTNODE *new; BTNODE *node; BTREE tree; tree = (BTREE) ptree; /* Begin at root */ node = tree->root; /* If root is empty, insert first key */ if (node->nkeys == 0) { node->keys[0] = key; node->nkeys = 1; tree->contains = 1; return 1; } /* Try inserting new key */ res = ins_node(tree,node,key); /* If 0 return, burst the root, rebalance, and try again */ if (res == 0) { new = (BTNODE*) bt_malloc(sizeof(BTNODE)); if (new != NULL) { new->children = (BTNODE**) bt_malloc((tree->order + 1) * sizeof(BTNODE*)); if (new->children != NULL) { new->keys = (char**) bt_malloc((tree->order) * sizeof(char*)); if (new->keys != NULL) { new->nkeys = 0; new->children[0] = node; tree->root = new; tree->capacity += tree->order - 1; res = burst(tree,new,0); if (res > 0) { res = ins_node(tree,new,key); } if (res > 0) { tree->contains++; rebalance(new,0,tree->order); } return res; } bt_free(new->children); } bt_free(new); } } if (res > 0) tree->contains++; return res; } /**************************************************************** * * visit_node -- This function is used while traversing a b-tree. * It is called once for each node. It calls some * specified function once for each key in its * key array, and calls itself once for each child * in its child array. * ****************************************************************/ static void visit_node(node,fn,parms) BTNODE *node; void (*fn)(); char *parms; { int i; if (node == NULL) return; for (i = 0; i < node->nkeys; i++) { if (node->children != NULL) { visit_node(node->children[i],fn,parms); } (*fn)(node->keys[i],parms); } if (node->children != NULL) { visit_node(node->children[i],fn,parms); } return; } /**************************************************************** * * bt_traverse -- This function traverses a b-tree, calling some * specified function once for each key stored in * it. The specified function has the following * protocol: * * fn(key,parms) * char *key; * char *parms; * * where key is a pointer to a key stored in the * btree, and parms is the parameter structure * passed by the caller. * * The nodes are visited in descending order, if * the comp function passed to bt_new is * strcmp-like. * ****************************************************************/ void bt_traverse(ptree,fn,parms) char *ptree; void (*fn)(); char *parms; { BTREE tree; tree = (BTREE) ptree; visit_node(tree->root,fn,parms); return; } /******************************************************************** * * del_key -- This function deletes the specified key from the * specified node. * ********************************************************************/ static del_key(node,n) BTNODE *node; int n; { int i; for (i = n; i < node->nkeys - 1; i++) { node->keys[i] = node->keys[i+1]; } if (node->children != NULL) { for (i = n; i < node->nkeys; i++) { node->children[i] = node->children[i+1]; } } node->nkeys--; return; } /******************************************************************** * * weld -- This function is the opposite of burst, above. It joins * two neighboring children of the given node, and lowers a * key into the result. * ********************************************************************/ static weld(tree,node,n) BTREE tree; BTNODE *node; int n; { int i; int j; BTNODE *left; BTNODE *right; if (node->children == NULL) return; #ifdef DEBUG printf("Welding node at %x, key %d\n",(int)node,n); #endif left = node->children[n]; right = node->children[n+1]; i = left->nkeys; left->keys[i] = node->keys[n]; for (i++, j = 0; j < right->nkeys; i++, j++) { left->keys[i] = right->keys[j]; } for (i = n + 1; i <= node->nkeys; i++) { node->keys[i-1] = node->keys[i]; node->children[i] = node->children[i+1]; } node->nkeys--; if (left->children != NULL) { for (i = left->nkeys + 1, j = 0; j <= right->nkeys; i++, j++) { left->children[i] = right->children[j]; } bt_free(right->children); } left->nkeys += right->nkeys + 1; bt_free(right->keys); bt_free(right); tree->capacity -= tree->order - 1; return; } /******************************************************************** * * del_pred -- This function searches a sub-tree, and deletes the * highest key it finds, and returns it to its caller. * If any node along the way loses its b-tree'ness, the * tree is reorganized after the deletion. * ********************************************************************/ static char *del_pred(tree,node) BTREE tree; BTNODE *node; { BTNODE *pnode; int res; char *retval; /* * If at a leaf node, delete the highest key, otherwise * call self recursively */ if (node->children == NULL) { retval = node->keys[node->nkeys - 1]; node->nkeys--; } else { pnode = node->children[node->nkeys]; retval = del_pred(tree,pnode); /* Reorganize tree if node gets too empty */ if (pnode->nkeys < (tree->order - 1) / 2) { res = rot_right(node,node->nkeys - 1,tree->order); if (res == 0) weld(tree,node,node->nkeys - 1); } } return retval; } /******************************************************************** * * del_node -- This function searches a sub-tree for a key, and * deletes the key. If the key is found in a leaf node, * it is simple removed and returned to the caller; * otherwise, it is swapped with its predecessor, and * the returned to the caller. If any node loses its * b-tree'ness, the tree is reorganized. If the key is * not found, NULL is returned. * ********************************************************************/ static char *del_node(tree,node,key) BTREE tree; BTNODE *node; char *key; { int i; int res; char *retval; /* Key not found */ if (node == NULL) return NULL; /* Search for key, descend tree until found or find leaf */ i = lin_search(node,key,tree->comp,&res); if (res == 0) { if (node->children == NULL) return NULL; retval = del_node(tree,node->children[i],key); } else { /* Return the deleted node to caller */ retval = node->keys[i]; if (node->children == NULL) { /* Delete from leaf */ del_key(node,i); } else { /* Swap with predecessor */ node->keys[i] = del_pred(tree,node->children[i]); } } /* If a child node was reorganized, rebalancing may be necessary */ if ((node->children != NULL) && (node->children[i]->nkeys < (tree->order - 1) / 2)) { res = rot_right(node,i - 1,tree->order); if (res == 0) res = rot_left(node,i,tree->order); if (res == 0) { if (i >= node->nkeys) i--; weld(tree,node,i); } } return retval; } /******************************************************************** * * bt_delete -- This function deletes the specified key from a * b-tree and reorganizes it as necessary. The deleted * key is returned to the caller if it is found, NULL * is returned otherwise. * ********************************************************************/ char *bt_delete(ptree,key) char *ptree; char *key; { BTNODE *root; char *retval; BTREE tree; tree = (BTREE) ptree; /* Do no deletion if tree is empty */ root = tree->root; if (root->nkeys == 0) return NULL; /* Delete the key */ retval = del_node(tree,root,key); /* * Delete the root if it is empty, yet its children are not. * This happens after a weld on the last key in the root. */ if ((root->nkeys == 0) && (root->children != NULL)) { tree->root = root->children[0]; bt_free(root->keys); bt_free(root->children); bt_free(root); tree->capacity -= tree->order - 1; } if (retval != NULL) tree->contains--; return retval; } ***************************************************************** ****************** test.c *************************************** ***************************************************************** #include <stdio.h> #include "btree.h" /* #define DEBUGGING */ #define LOOPS 500 #define ORDER 12 typedef struct key { int key; } KEY; KEY savedKeys[LOOPS]; dump(key) KEY *key; { printf("%x: %d\n",key,key->key); } visit(key,parms) KEY *key; char *parms; { dump(key); } comp(k1,k2) KEY *k1,*k2; { return (k1->key - k2->key); } main() { BTREE tree; KEY *key; int i; int res; #ifdef DEBUGGING #ifdef sun malloc_debug(2); #endif #endif tree = (BTREE)bt_new(ORDER,comp); if (tree == NULL) { printf("Failed to create tree\n"); exit(1); } (void) bt_malloc_verify(); for (i = 0; i < LOOPS; i++) { printf("Iteration %d\n",i+1); key = &savedKeys[i]; key->key = rand(); printf("Inserting %d\n",key->key); fflush(stdout); res = bt_insert(tree,key); if (res > 0) { printf("Inserted %d\n",key->key); } else if (res < 0) { printf("Duplicate key %d\n",key->key); } else { printf("Error inserting key %d\n"); exit(3); } printf("Trying duplicate\n"); res = bt_insert(tree,key); if (res < 0) { printf("Failed to insert duplicate\n"); } else if (res == 0) { printf("Error inserting duplicate\n"); exit(4); } else { printf("Succeeded in inserting duplicate\n"); exit(5); } #if 0 #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); #endif #endif 0 } printf("Finished creating the tree.\n"); printf("Contents of tree follow:\n"); bt_traverse(tree,visit,NULL); for (i = 0; i < LOOPS; i++) { printf("Iteration %d\n",i); printf("Seeking %d at %x\n",savedKeys[i].key,&savedKeys[i]); key = (KEY*)bt_search(tree,&savedKeys[i]); if (key != &savedKeys[i]) { printf("Can't find %d, returned %x\n",savedKeys[i].key, key); } printf("Freeing %d at %x\n",savedKeys[i].key,&savedKeys[i]); fflush(stdout); key = (KEY*)bt_delete(tree,&savedKeys[i]); if (key != &savedKeys[i]) { printf("Problem freeing %d at %x, returned %x\n", savedKeys[i].key,&savedKeys[i].key,key); } else { printf("Freed %d, returned %x\n",savedKeys[i].key,key); } #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); fflush(stdout); #endif key = (KEY*)bt_search(tree,&savedKeys[i]); if (key == &savedKeys[i]) { printf("Found %d after deletion\n",savedKeys[i].key); } key = (KEY*)bt_delete(tree,&savedKeys[i]); if (key != NULL) { printf("%d remained in tree after deletion at %x\n", savedKeys[i].key,key); } #if 0 #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); #endif #endif 0 } (void) bt_malloc_verify(); printf("Freeing tree\n"); bt_destroy(tree,NULL); printf("Finished freeing tree\n"); printf("Malloc verify: "); if (bt_malloc_verify()) { printf("success\n"); fflush(stdout); } else { printf("failure\n"); fflush(stdout); exit(6); } printf("Creating second tree\n"); tree = (BTREE)bt_new(ORDER,comp); if (tree == NULL) { printf("Failed to create tree\n"); exit(7); } (void) bt_malloc_verify(); printf("Contents of tree:\n"); bt_dump(tree,dump); for (i = 0; i < LOOPS; i++) { printf("Iteration %d\n",i+1); key = &savedKeys[i]; key->key = rand(); printf("Inserting %d\n",key->key); fflush(stdout); res = bt_insert(tree,key); if (res > 0) { printf("Inserted %d\n",key->key); } else if (res < 0) { printf("Duplicate key %d\n",key->key); } else { printf("Error inserting key %d\n"); exit(8); } printf("Trying duplicate\n"); res = bt_insert(tree,key); if (res < 0) { printf("Failed to insert duplicate\n"); } else if (res == 0) { printf("Error inserting duplicate\n"); exit(9); } else { printf("Succeeded in inserting duplicate\n"); exit(10); } #if 0 #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); #endif #endif 0 } printf("Finished creating the tree.\n"); printf("Contents of tree follow:\n"); bt_traverse(tree,visit,NULL); printf("Freeing tree\n"); bt_destroy(tree,NULL); printf("Finished freeing tree\n"); printf("Malloc verify: "); if (bt_malloc_verify()) printf("success\n"); else printf("failure\n"); fflush(stdout); exit(0); } -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"
ogden@seal.cis.ohio-state.edu (William F Ogden) (07/26/90)
Paul Sanders writes: >There has been some discussion in comp.software-eng lately regarding >reusable software components. ... > ... I would like >to spur some discussion on just what a reusable software component is. >Following is (at the risk of doing some undergraduate's first-quarter Data >Structures homework) an implementation of an in-memory B-tree. What I'd >like to discuss are: > >- Is this a reusable software component? >- What are its characteristics that make it one? Or don't make it one? >- Is it a good one? If so, why? If not, why? >- Would anyone use it? Why or why not? >- How can it be made better? >- Anything else relevant to producing a good reusable component that is reused At the risk of getting back into the discussion of why software development isn't like the construction of other engineering artifacts, let me suggest that, from a reusability perspective, a B-tree facility is rather like a circuit board in a computer. You certainly can't argue that it's not an electronic component, but it's not a very likely candidate for reuse in the next computer you design. >- What are its characteristics that ... don't make it one? First, the conceptual interface that it presents to its clients is not simple. Clients don't really want to fool around restructuring trees or reclaiming storage; they just want a fast storage and retrieval facility. Here they are forced to deal with details that shouldn't concern them [poor information hiding]. Second, the client interface is inadequately specified. Descriptions of the various B-tree operations are supplied only in English, which is inherently either verbose or imprecise. Global constraints about keeping the tree height uniform, the branchyness from dropping too low, the indices in left-to-right order, etc. are not spelled out. Performance characteristics are not indicated. The type of the items stored in the B-tree is kept commendably general, but no mention is made of the fact that correct operation depends on the comparison procedure testing a <= relation which must be a TOTAL ordering of the items. Etc. >- How can it be made better? >- Anything else relevant to producing a good reusable component that is reused Just as with a circuit board, you may try to improve it, but it shouldn't be done with an eye to creating a reusable component per se. The circuit board contains reusable components, and it is part of a larger reusable component. That is also the situation with the B-tree facility. The reusable component upon which the B-tree facility should have been based is the exploration-tree facility. It is a general purpose facility which provides clients with trees and operations for navigating around in them. In contrast to the B-tree, it can be used as a component in a wide variety of situations. The reusable component which the B-tree facility supports is the almost-constant-function facility. Conceptually, it provides a function F from an arbitrary totally ordered domain set to an arbitrary range set. For all but a few values, F(x) = C, hence the name. A typical application would be sparse matrices, where the domain would be pairs of integers (lexicogphically ordered), the range would be floating point numbers, and C = 0.0. Another might be symbol tables where the domain would be character strings, the range would be say type names, and C = Undefined_Type. Data bases, polynomials, etc. provide additional applications. The <x,y> pairs where F(x) = y # C can be stored in a B-tree. Note, however, that a wide variety of other implementations are posible (e.g. AVL tree, linked list, hashing, etc.). Analogously, a computer architecture can be realized using a variety of chips and board layouts. The exploration-tree and the almost-constant-function present clients with simple interfaces which have clean specifications. That's surely a necessary feature of a reusable component. Another important property is that, once you discover them, you find that they could be used in a wide variety of current applications. It's not so clear that you can design reusable components which are only useful in future software. /Bill
ath@prosys.se (Anders Thulin) (07/27/90)
In article <27705@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes: > [ ... deleted ... ] >- How can it be made better? By providing: * external storage capability These should be some way to store the B-tree on secondary memory, so that the data it contains could be reused in another program. * browser That is, providing a `bt_user_search(tree)' which presents the data in the tree, and lets the user select an item for further computation. This is something I think most components that implement data structures should have. At least those implementing `complicated' structures. -- Anders Thulin ath@prosys.se {uunet,mcsun}!sunic!prosys!ath Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden
paul@athertn.Atherton.COM (Paul Sander) (07/28/90)
In article <82454@tut.cis.ohio-state.edu> William F Ogden <ogden@cis.ohio-state.edu> writes: >Paul Sander [that's me] writes: [Introductory comments and questions omitted] >At the risk of getting back into the discussion of why software development >isn't like the construction of other engineering artifacts, let me suggest >that, from a reusability perspective, a B-tree facility is rather like >a circuit board in a computer. You certainly can't argue that it's not an >electronic component, but it's not a very likely candidate for reuse in >the next computer you design. I prefer to think of code of this sort as being something like integrated circuits, rather than circuit boards. They are more abstract in function than the individual lines of code (as a gate or VLSI circuit is more abstract in function than the active and passive components that are etched into it), but less abstract in function than a subsystem or product (as an IC is less abstract in function than a circuit board or product). Simple data structures can be built into more complicated ones in the same way that standard cells are built into ASICs (Application Specific ICs) with the addition of a certain amount of "glue" logic. [Just one man's opinion here] In any case, when designed well, chips, boards, and (hopefully) software can be reused in new designs with little or no modification. The flexibility of the design and quality of implementation will govern its suitability for new applications. It is very common for ICs to be reused in new designs; it is less common, but certainly not unusual to find circuit boards used in new product designs. A data structure that sorts things will likely be used the next time someone needs to sort things, if they don't have make substantial changes. [Regarding my B-tree implementation:] >>- What are its characteristics that ... don't make it one? > >First, the conceptual interface that it presents to its clients is not >simple. Clients don't really want to fool around restructuring trees or >reclaiming storage; they just want a fast storage and retrieval facility. >Here they are forced to deal with details that shouldn't concern them >[poor information hiding]. But the client is spared the details of restructuring trees; the tree collects its own garbage. The client is forced to manage its own storage, because the tree can't do an effective job of it anyway; the tree has no clue as to what kind of stuff the client is storing there. If it's a bunch of linked structures, the only way the tree can manage it is to include some sort of compiler with which a client describes its data, and how to manage it. This seems too complicated and error prone. If the client needs to store such complex data structures in a tree, then another software component that maintains those data structures can be used, and objects it creates can then be stored in the tree. If both components do a good job managing their storage, and the client is good about avoiding aliasing problems, then the storage manages itself in the eyes of the client. As for the information hiding issue, this was the best I could do with C, since C doesn't have the data abstraction tools that, say, Modula-2 and Ada have. How can it be improved within C's constraints? >Second, the client interface is inadequately specified. >Descriptions of the various B-tree operations are supplied only in English, >which is inherently either verbose or imprecise. Okay, there is a documentation problem. How can it be improved in a way that a large number of potential clients can understand? I know of no good way to describe a data structure mathematically, and even if I could do it, few would understand it. >Global constraints about keeping the tree height uniform, the branchyness from >dropping too low, the indices in left-to-right order, etc. are not spelled out. Why does the client need to know these? All it sees are the interfaces documented in the man page. It doesn't even need to know that this data structure that provides an indexed retrieval system that sorts data is really a B-tree. That information is provided to the client as a motivation for providing a particular parameter when the data structure is created, and provides a hook where the client might make performance trade-offs. >Performance characteristics are not indicated. The performance characteristics are listed below, and have been added to the man page. Here, "m" is the order of the tree, "n" is the number of items stored in it, and "M" is either O(log m) if n > 7 or O(n) if n <= 7 (this is the point where a binary search begins to outperform a linear search). Retrieval: O(M * log n) Creation: O(1) Destruction: O(log m) + O(n) Insertion: O(m * log n) Deletion: O(m * log n) Traversal: O(n) I couldn't measure the performance in any meaningful way. Performing some sort of benchmark on this implementation is of only limited value, because it specifies only how the tree performed on a particular type of machine with a particular load with a particular set of data having particular characteristics. >The type of the items stored in the B-tree is kept commendably general, >but no mention is made of the fact that correct operation depends on the >comparison procedure testing a <= relation which must be a TOTAL ordering >of the items. All that the comparison procedure need guarantee is that, given a domain of input, the output be the same every time the same two items are compared. I had thought that the semantics of "strcmp" were defined that way when I made reference to it in the B-tree man page. In any case, this is another simple documentation problem. >Etc. What else? >>- How can it be made better? >>- Anything else relevant to producing a good reusable component that is reused > >Just as with a circuit board, you may try to improve it, but it shouldn't >be done with an eye to creating a reusable component per se. The circuit >board contains reusable components, and it is part of a larger reusable >component. That is also the situation with the B-tree facility. I don't understand this. The intent was to build a piece of software that could be used in a variety of ways to build bigger pieces of software. It was in fact created with an eye to creating a reusable component. Is there anything to prevent one from building a nested scope symbol table, using this as the index? Is there anything to prevent one from using this to build a temporary index for an external file? I guess there are several interpretations of what it means to say "improve it." It could mean "how can it be made to have more of the qualities that make it a reusable software component?" or "how can it be made more bug free?" or "how can it be made faster, without introducing more bugs?" or "how can its restrictions be reduced?". My question about "how can it be improved" in my original posting was really intended to be the first of these interpretations. I expect the second and third are ignored in practice because "it works, why fix it?". The fourth is usually addressed in a rewrite that is in turn reusable and may have a different set of applications that it addresses better than the first. That doesn't mean that the earlier implementation is more or less useful (or reusable) than the second, just that the second is more appropriate for some things. >The reusable component upon which the B-tree facility should have been based >is the exploration-tree facility. It is a general purpose facility which >provides clients with trees and operations for navigating around in them. >In contrast to the B-tree, it can be used as a component in a wide variety >of situations. This is the first I've heard about exploration-tree facilities. Without knowing more about it, I can't ask specific questions. Could you email to me some additional information and references, please? >The reusable component which the B-tree facility supports is the >almost-constant-function facility. Conceptually, it provides a >function F from an arbitrary totally ordered domain set to an arbitrary >range set. For all but a few values, F(x) = C, hence the name. A typical >application would be sparse matrices, where the domain would be pairs >of integers (lexicogphically ordered), the range would be floating point >numbers, and C = 0.0. Another might be symbol tables where the domain >would be character strings, the range would be say type names, and >C = Undefined_Type. Data bases, polynomials, etc. provide additional >applications. The <x,y> pairs where F(x) = y # C can be stored in a >B-tree. Note, however, that a wide variety of other implementations >are posible (e.g. AVL tree, linked list, hashing, etc.). Analogously, >a computer architecture can be realized using a variety of chips and >board layouts. > >The exploration-tree and the almost-constant-function present clients >with simple interfaces which have clean specifications. That's >surely a necessary feature of a reusable component. >Another important property is that, once you discover them, you find >that they could be used in a wide variety of current applications. >It's not so clear that you can design reusable components which are >only useful in future software. This coincides with my vision of what a reusable component is. The B-tree implementation presented is an attempt to provide a data structure with a simple interface with clean specifications. Granted, the specification may not be as precise as some would like, it does present enough information to make a decision as to whether or not it is adequate for a specific task, and how to use the interfaces. Besides the documentation issues mentioned above, how can this be made to exhibit more of the qualities one associates with reusable software components? -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"
paul@athertn.Atherton.COM (Paul Sander) (07/28/90)
In article <533@helios.prosys.se> ath@prosys.se (Anders Thulin) writes: >In article <27705@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes: > >> [ ... deleted ... ] >>- How can it be made better? > >By providing: > >* external storage capability > > These should be some way to store the B-tree on secondary memory, so > that the data it contains could be reused in another program. There are a number of features missing from this implementation that could be useful. These include Anders' secondary storage capability (which is what B-trees are used most for), perhaps with multi user locking. An implementation that uses shared memory, again with multi-user locking might be useful. Still another that is implemented as a server that could allow data to be shared across machines would be useful. I guess these issues begin to illustrate why reusable software components are so difficult to develop. My implementation addresses none of these issues, so its utility is limited to those applications that don't need them. On the other hand, applications that don't need the additional features need not pay the overhead demanded by these other implementations if my B-tree satisfies their needs. This implies then, that a reusable software component includes in its documentation a list of particular features and limitations, probably in an index. My implementation would include the following: B-tree: in-memory, single user, single machine, scope limited to single running applications >* browser > > That is, providing a `bt_user_search(tree)' which presents the data > in the tree, and lets the user select an item for further computation. > >This is something I think most components that implement data structures >should have. At least those implementing `complicated' structures. It is debatable whether or not a component should provide a browsing capability or not. It has no way of knowing what the client has stored there. It could, however, provide bt_getFirst, bt_getLast, bt_getNext, and bt_getPrev functions that work in conjunction with the bt_search function to provide the functionality needed for the client to locate information and provide a more effective browser. It is also debatable as to whether or not a data structure such as a B-tree should provide the user interface code needed to support such a browser. On the other hand, a reusable Browser software component could be designed that could work with a large number of data structures without regard to their implementation. It is easy for me to visualize a simple user interface that provides commands for start, stop, move left, right, up, down, in, out, and show summary or detail. One such implementation could register callbacks for each command and provide a way for the client to format its data. -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"
ogden@seal.cis.ohio-state.edu (William F Ogden) (07/31/90)
In response to comments from Anders, Paul writes: ... >>* browser >> >> That is, providing a `bt_user_search(tree)' which presents the data >> in the tree, and lets the user select an item for further computation. >> >>This is something I think most components that implement data structures >>should have. At least those implementing `complicated' structures. > >It is debatable whether or not a component should provide a browsing >capability or not. It has no way of knowing what the client has stored >there. It could, however, provide bt_getFirst, bt_getLast, bt_getNext, >and bt_getPrev functions that work in conjunction with the bt_search >function to provide the functionality needed for the client to locate >information and provide a more effective browser. Now here we are getting into a real reusablity issue. In designing a facility for reuse, it is important to try to distinguish between primary and secondary operations on a structure. The Get_First, Get_Next, etc. are primary operations on the reusable structure of which the B-tree is one realization. Traverse or print-out operations are secondary, since they can be implemented easily and efficiently using the primary ones -- provided Get_Next, etc. are included as primaries. There are generally a vast number of potentially useful secondary operations on a structure, but any particular application will only require a small subset. Moreover the next application may need other strange secondary operations which nobody ever considered. Under these circumstances, achieving facility reuse without modification seems paradoxical until you see that with proper design, secondary operations can be added without changing the underlying implementation. Determining which operations are primary and which secondary is somewhat like finding a basis for a vector space of an independent set of axioms for a mathematical theory. You list all the operations on a structure that you can imagine anyone wanting (of course you'll turn out to lack imagination) then you figure out which ones can be implemented using which others. A small (and generally useful) subset, in terms of which all the others can be implemented, form the nucleus of the reusable facility. >It is also debatable as to whether or not a data structure such as a >B-tree should provide the user interface code needed to support such a >browser. Clearly it shouldn't. It involves either a set of secondary operations or more likely a separate facility. /Bill
cweir@cpg.trs.reuter.com (Charles Weir) (08/01/90)
In article <27705@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes: > - Is this a reusable software component? Yes - a classic example of the kind of reusability that functional languages make easy. > - What are its characteristics that make it one? Simple interface. Use of reference parameters (structure pointers) as "magic cookies" where the meaning is not known by the caller to hide the meaning of the data. Use of function pointers to allow "call back" so that the module can interrogate the caller. [I should mention that all these are what is now thought of as Object-Orientation....] > - Anything else relevant to producing a good reusable component that is > reused ... Now lets get to the meat!! Here are a couple of issues that the example did not come to grips with... 1) Error and Exception handling. In Paul's example, exception handling was simple. He returns an error code. But what happens to that error code in the module that calls his code? Is is passed back - and where, finally, does the buck stop? Most modules end up needing some kind of error handling mechanism - and this can provide a real headache for reusable components:- A function supplied by the caller? perror(), followed by exit() - [I'm a C programmer]? Pass back a return code - and if so what? Each option is appropriate in some applications and contexts, and not in others. Yet the reusable module writer must choose one, since they are mutually exclusive. Hence (s)he looses potential customers for the reuse whichever option is chosen. 2) Configuration This is another embarrassment in re-usable code. Configuration parameters come in several forms:- Size parameters: If one application using a reusable module must run in a small space, it will want to make all structures (buffers, strings etc.) as small as reliably possible. A second application, though, has speed as a priority, and wants them as large as possible. Text strings: Some target users like messages like "DBLX fault at 0xfc0040": others must be treated more gently, or speak different languages. Magic items: Patterns on a screen, locations of files, etc. I can think of several ways to implement such configuration decisions:- Compilation constants (#defines) mean that the code must actually be changed for reuse. Parameters passed through a functional interface make the interface complex, and may mean that the calling application must "know" rather too much about the module. Configuration files provide a source on confusion and errors. Any option may be appropriate, but each cuts out potential for re-use. ---------------------------------------------------- I don't have any solutions, or even preferred options. I'm sure others have - any comments? - Charles -- Charles Weir, Rich Inc, 1400 Kensington, Oak Brook, IL 60521 Email: cweir@richsun.uucp uunet!richsun!cweir Phone: +1 708 574 7424 X2766
rwallace@vax1.tcd.ie (08/12/90)
In article <27914@athertn.Atherton.COM>, paul@athertn.Atherton.COM (Paul Sander) writes: >> These should be some way to store the B-tree on secondary memory, so >> that the data it contains could be reused in another program. > > There are a number of features missing from this implementation that could > be useful. These include Anders' secondary storage capability (which is > what B-trees are used most for), perhaps with multi user locking. An > implementation that uses shared memory, again with multi-user > locking might be useful. Still another that is implemented as a server > that could allow data to be shared across machines would be useful. As a matter of fact I think the module already has too much functionality. The main reason I don't reuse software components (and I program in C++ which combines the advantages of being the best OOP/software engineering language around with the vast range of available C code) is that people who write generic modules put in everything expect the kitchen sink. At the moment I'm working on a project which requires screen handling, file handling via indexes, and file and in-memory sort functions. There are commercially available packages available to do all these things. I ignored them and wrote my own for two reasons: 1. The commercial packages contained orders of magnitude more code than I need for this project (a fourth-generation language for writing accounting software). Their use would have expanded the size of applications produced with the 4GL from around 100K to an estimated 400K (!!!). 2. To wade through all the junk and figure out how to make the commercial packages actually work would have taken nearly as long as writing my own code. The first problem can be solved by what I hope will be the next generation of super-optimizing compilers that can examine all modules of a project simultaneously and eliminate code that will never be used e.g. main () { printf ("Hello, World!\n"); } would end up as 100 bytes because the compiler would eliminate all the code from printf to handle numbers etc. The second problem requires an attitude change. "To summarize the summary of the summary: people are a problem" Russell Wallace, Trinity College, Dublin rwallace@vax1.tcd.ie
paul@athertn.Atherton.COM (Paul Sander) (08/15/90)
In article <6729.26c481cb@vax1.tcd.ie> rwallace@vax1.tcd.ie writes: >In article <27914@athertn.Atherton.COM>, paul@athertn.Atherton.COM (Paul Sander) writes: >>> These should be some way to store the B-tree on secondary memory, so >>> that the data it contains could be reused in another program. >> >> There are a number of features missing from this implementation that could >> be useful. These include Anders' secondary storage capability (which is >> what B-trees are used most for), perhaps with multi user locking. An >> implementation that uses shared memory, again with multi-user >> locking might be useful. Still another that is implemented as a server >> that could allow data to be shared across machines would be useful. > >As a matter of fact I think the module already has too much functionality. These additional features need not be added to this particular implementation. Most of the new features in fact cannot be implemented with same the interfaces as the example. Here, the programmer would have to choose the implementation that best suits their needs. > The >main reason I don't reuse software components (and I program in C++ which >combines the advantages of being the best OOP/software engineering language >around with the vast range of available C code) is that people who write >generic modules put in everything expect the kitchen sink. It is true that many such components include many features, more than most clients need. There is a utility/size trade-off made. However, if the component is carefully designed, and the development environment is decent, the size impact can be minimized; the linker and librarian would omit the unused code from the executable image. > At the moment I'm >working on a project which requires screen handling, file handling via indexes, >and file and in-memory sort functions. There are commercially available >packages available to do all these things. I ignored them and wrote my own for >two reasons: > >1. The commercial packages contained orders of magnitude more code than I need >for this project (a fourth-generation language for writing accounting >software). Their use would have expanded the size of applications produced with >the 4GL from around 100K to an estimated 400K (!!!). > >2. To wade through all the junk and figure out how to make the commercial >packages actually work would have taken nearly as long as writing my own code. > >The first problem can be solved by what I hope will be the next generation of >super-optimizing compilers that can examine all modules of a project >simultaneously and eliminate code that will never be used e.g. > >main () { printf ("Hello, World!\n"); } > >would end up as 100 bytes because the compiler would eliminate all the code >from printf to handle numbers etc. These super-optimizing compilers are actually performing global analyses to use the CPU resources more efficiently; allocating registers in a manner that is efficient for the program as a whole (as opposed to a manner that is efficient for a loop or function) is one such benefit, but not the only one. If the library is designed well, a new compiler isn't needed to dump the unused code. On the other hand, the unused code is already there, waiting to be used should the need arise. Printf is a special case because of its polymorphism. Here, the compiler would have to recognize the printf call and parse the format string (assuming it is a constant pointer to a constant) to optimize the code. Such optimizations wouldn't be done in many practical applications; national language support enabled applications come to mind here. >The second problem requires an attitude change. And good documentation, with a good index. How many projects like your current one have you done? How much code have you (or others) rewritten? Have you ever turned up bugs in related applications where you had to make the same bug fix in many places? There are three big wins you get from using reusable packages: Your software will (theoretically) be developed faster, because less needs to be written from scratch; Your software will (theoretically) be more bug free because it is built from (theoretically) bug free components; Your program will (theoretically) be easier to maintain because bug fixes need only be made in one place. Some might even argue that your program would be "better" (faster, smaller) because (theoretically) the implementor was expert at implementing the thing at hand, and was able to make various optimizations you wouldn't happen to think of. Finally, come might argue that your program is more portable, because the component has been ported to many platforms (malloc and printf are handy examples of this). What do you think should be taken out? -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"
paul@athertn.Atherton.COM (Paul Sander) (08/18/90)
Okay, I've taken people's suggestions into account, and now have an updated implementation of my in-memory B-tree. The documentation and some miscellaneous stuff follow in "shar" format, and the actual implementation will appear in a subsequent posting. The changes I've made are the following: - Added first/next/last/prev browsing capability - Added "big-O" performance figures to the documentation - Split the implementation into several files that a linker can mix and match as needed to save space - Ported to VMS - Included troff source for documentation My questions now are: - Is this still a reusable component? - Is it an improvement over the previous attempt? - How can it be improved still further? - If you were building a system in C, would _you_, personally, use it? Why, or why not? - If this were installed in a much larger component library, would you use it? Why, or why not? Paul Sander paul@Atherton.COM (408) 734-9822 -------------------------------- #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create: # README # btree.doc # Makefile # Makefile.vms # btree.man # This archive created: Fri Aug 17 20:48:29 1990 export PATH; PATH=/bin:/usr/bin:$PATH if test -f 'README' then echo shar: "will not over-write existing file 'README'" else cat << \SHAR_EOF > 'README' This directory contains source code to an in-memory B-tree implementation. The tree can hold arbitrary keys, but duplicate keys cannot be inserted. Also, this was written in K&R C, so please handle accordingly. This implementation has been placed in the public domain by its author, Paul Sander, to be used as you wish. However, if you add neat new features, I'd appreciate having a copy sent to me at paul.Atherton.COM, or sun!athertn!paul . Following are the files included, and create a library called libbtree.a on Unix, and LIBBTREE.OLB on VMS. Makefile Rules for building library and test case (SunOS) Makefile.vms Rules for building library and test case (VMS MMS) README This file bt_last.c Implementation of bt_last function bt_next.c Implementation of bt_next function bt_prev.c Implementation of bt_prev function btdelete.c Implementation of bt_delete function btdestroy.c Implementation of bt_destroy function btdump.c Implementation of bt_dump function (non-portable) btfirst.c Implementation of bt_first function btinsert.c Implementation of bt_insert function btmalloc.c Heap utilities used for debugging btnew.c Implementation of bt_new function btpriv.h Private header file used by B-tree source files btree.doc Formatted man page, this is ASCII text btree.h Public include file, to be used by B-tree clients btree.man Man page source -- format with "[tn]roff -man" btsearch.c Implementation of bt_search function bttraverse.c Implementation of bt_traverse function btutil.c Utility functions test.c Test case, links with library to make executable SHAR_EOF fi if test -f 'btree.doc' then echo shar: "will not over-write existing file 'btree.doc'" else cat << \SHAR_EOF > 'btree.doc' BTREE() MISC. REFERENCE MANUAL PAGES BTREE() NAME bt_dump, bt_search, bt_new, bt_destroy, bt_insert, bt_traverse, bt_delete - In-memory B-tree manipulation SYNOPSIS #include <btree.h> BTREE bt_new(order,comp) int order; int (*comp)(); void bt_destroy(tree,free_key) BTREE tree; void (*free_key)(); int bt_insert(tree,key) BTREE tree; char *key; char *bt_delete(tree,key) BTREE tree; char *key; char *bt_search(tree,target) BTREE tree; char *target; void bt_traverse(tree,fn,parms) BTREE tree; void (*fn)(); char *parms; void bt_dump(tree,key_dump) BTREE tree; void (*key_dump)(); char *bt_first(tree) BTREE tree; char *bt_last(tree) BTREE tree; char *bt_next(tree) BTREE tree; char *bt_prev(tree) BTREE tree; DESCRIPTION These functions implement an in-memory B-tree data struc- ture. The tree itself stores only pointers to the client's data, and relies on client-provided functions for any Sun Release 4.0 Last change: 1 BTREE() MISC. REFERENCE MANUAL PAGES BTREE() comparisons and storage deallocation. bt_new creates a new B-tree. The client specifies the desired order (the maximum number of children allowed each node) of the tree (for performance trade-offs based on the efficiency of comparisons of client-provided data), and a pointer to a function that compares two client-provided data structures. The comparison function, comp, has the follow- ing interface: int comp(k1,k2) char *k1,*k2; The result of this function is -1 if the object pointed to by k1 is "less than" the object pointed to by k2, 0 if they are "equal", and 1 otherwise. Client-provided data struc- tures that compare greater by this function will appear later in the lexical order of the data stored in the tree. bt_new returns a pointer to a handle for the B-tree, or NULL if an error occurs. bt_destroy deallocates all storage allocated to a B-tree. The client provides a pointer to a visitation function that is called once for each item stored in the tree. The items are visited in arbitrary order. If NULL is passed instead of a pointer to a function, no action is taken for the client-provided data, but the tree structure itself is freed. bt_insert inserts a new item into the tree. 1 is returned if the insertion was successful, -1 is returned if the new item matches another item that has already been inserted into the tree, and 0 is returned in the event of an error. bt_delete deletes an item from a tree. The value returned is the same as that passed to bt_insert when the item was inserted, or NULL if the item is not found. bt_search searches for an item in a tree. The value returned is the same as that passed to bt_insert when the item was inserted, or NULL if the item is not found. bt_traverse traverses the tree, calling a client-provided visitation function fn once for each item stored there. fn has the following interface: void fn(item,parms) char *item; char *parms; item is the pointer stored when the item was inserted into the tree. parms is an arbitrary pointer that the client wishes to pass to the visitation function, but is otherwise unused by the B-tree implementation. Items are visited in their lexical order. Sun Release 4.0 Last change: 2 BTREE() MISC. REFERENCE MANUAL PAGES BTREE() bt_dump displays the contents of the tree to stdout, along with some diagnostic and statistical information. The key_dump function is called once for each item in the tree, in arbitrary order. It may be NULL if no action is desired. Its interface follows: void key_dump(item) char *item; bt_first returns the item that falls earliest in the lexical order of the items stored in the tree, or NULL if the tree is empty. bt_last returns the item that falls latest in the lexical order of the items stored in the tree, or NULL if the tree is empty. bt_next returns the next item higher in the lexical order after the last key returned by bt_first, bt_next, or bt_search. If bt_search failed to find an item, bt_next returns the next item higher in the lexical order that was stored in the tree. NULL is returned if the last call to bt_next returned the item falling highest in the lexical order of the items stored in the tree, or if the tree was modified since the last call to bt_next or bt_prev. bt_prev returns the next item lower in the lexical order after the last key returned by bt_last, bt_next, or bt_search. If bt_search failed to find an item, bt_prev returns the next item lower in the lexical order that was stored in the tree. NULL is returned if the last call to bt_prev returned the item falling lowest in the lexical order of the items stored in the tree, or if the tree was modified since the last call to bt_next or bt_prev. Worst case performance characteristics are listed below. Here, "m" is number passed as the "order" parameter to bt_new, and "n" is the number of items stored in the tree. bt_search: O(m * log n) bt_new: O(1) bt_destroy: O(log n) + O(n) when free_key is not NULL O(log n) otherwise bt_insert: O(m * log n) bt_delete: O(m * log n) bt_traverse: O(n) bt_next: O(log n) bt_prev: O(log n) bt_first: O(log n) bt_last: O(log n) BUGS This implementation is written in K&R C, and tested only on Sun Release 4.0 Last change: 3 BTREE() MISC. REFERENCE MANUAL PAGES BTREE() a Sun 3 running SunOS 4.0.3 and on VMS V5.3. This implementation assumes "char*" is the generic pointer. This goes against practice used by most "modern" compilers, but aids portability to older compilers. Note also that this will probably generate lots of complaints from Lint. bt_dump assumes that pointers are the same size as integers, and that they can be displayed in total in eight hexidecimal digits. Sun Release 4.0 Last change: 4 SHAR_EOF fi if test -f 'Makefile' then echo shar: "will not over-write existing file 'Makefile'" else cat << \SHAR_EOF > 'Makefile' TARGET = libbtree.a OBJECTS = btmalloc.o btutil.o bt_last.o bt_next.o bt_prev.o \ btdelete.o btdestroy.o btdump.o btfirst.o btinsert.o btnew.o \ btsearch.o bttraverse.o $(TARGET): $(OBJECTS) ar r $@ $? ranlib $@ $(OBJECTS): btpriv.h test: test.o $(TARGET) cc -o test test.o $(TARGET) /usr/lib/debug/malloc.o btree.doc: btree.man nroff -man btree.man > btree.doc clean: rm *.o *.a *.doc test SHAR_EOF fi if test -f 'Makefile.vms' then echo shar: "will not over-write existing file 'Makefile.vms'" else cat << \SHAR_EOF > 'Makefile.vms' TARGET = libbtree.olb OBJECTS = btmalloc.obj btutil.obj bt_last.obj bt_next.obj \ bt_prev.obj btdelete.obj btdestroy.obj btdump.obj btfirst.obj \ btinsert.obj btnew.obj btsearch.obj bttraverse.obj $(TARGET) : $(OBJECTS) if f$search("$(TARGET)") .eqs. "" then $library/create/object $@ library/replace $@ $? $(OBJECTS) : btpriv.h test : test.obj $(TARGET) link/exe=$@.exe test.obj,$(TARGET)/lib,sys$library:vaxcrtl.olb/lib clean : delete *.obj;*,*.lib;*,test.exe SHAR_EOF fi if test -f 'btree.man' then echo shar: "will not over-write existing file 'btree.man'" else cat << \SHAR_EOF > 'btree.man' .TH BTREE .SH NAME bt_dump, bt_search, bt_new, bt_destroy, bt_insert, bt_traverse, bt_delete - In-memory B-tree manipulation .SH SYNOPSIS #include <btree.h> .sp BTREE bt_new(order,comp) .br int order; .br int (*comp)(); .sp void bt_destroy(tree,free_key) .br BTREE tree; .br void (*free_key)(); .sp int bt_insert(tree,key) .br BTREE tree; .br char *key; .sp char *bt_delete(tree,key) .br BTREE tree; .br char *key; .sp char *bt_search(tree,target) .br BTREE tree; .br char *target; .sp void bt_traverse(tree,fn,parms) .br BTREE tree; .br void (*fn)(); .br char *parms; .sp void bt_dump(tree,key_dump) .br BTREE tree; .br void (*key_dump)(); .sp char *bt_first(tree) .br BTREE tree; .sp char *bt_last(tree) .br BTREE tree; .sp char *bt_next(tree) .br BTREE tree; .sp char *bt_prev(tree) .br BTREE tree; .SH DESCRIPTION These functions implement an in-memory B-tree data structure. The tree itself stores only pointers to the client's data, and relies on client-provided functions for any comparisons and storage deallocation. .sp .B bt_new creates a new B-tree. The client specifies the desired order (the maximum number of children allowed each node) of the tree (for performance trade-offs based on the efficiency of comparisons of client-provided data), and a pointer to a function that compares two client-provided data structures. The comparison function, .BR comp , has the following interface: .RS .B int comp(k1,k2) .br .B char *k1,*k2; .RE The result of this function is -1 if the object pointed to by k1 is "less than" the object pointed to by k2, 0 if they are "equal", and 1 otherwise. Client-provided data structures that compare greater by this function will appear later in the lexical order of the data stored in the tree. .B bt_new returns a pointer to a handle for the B-tree, or NULL if an error occurs. .sp .B bt_destroy deallocates all storage allocated to a B-tree. The client provides a pointer to a visitation function that is called once for each item stored in the tree. The items are visited in arbitrary order. If NULL is passed instead of a pointer to a function, no action is taken for the client-provided data, but the tree structure itself is freed. .sp .B bt_insert inserts a new item into the tree. 1 is returned if the insertion was successful, -1 is returned if the new item matches another item that has already been inserted into the tree, and 0 is returned in the event of an error. .sp .B bt_delete deletes an item from a tree. The value returned is the same as that passed to .B bt_insert when the item was inserted, or NULL if the item is not found. .sp .B bt_search searches for an item in a tree. The value returned is the same as that passed to .B bt_insert when the item was inserted, or NULL if the item is not found. .sp .B bt_traverse traverses the tree, calling a client-provided visitation function .B fn once for each item stored there. .B fn has the following interface: .RS .B void fn(item,parms) .br .B char *item; .B .br char *parms; .RE .B item is the pointer stored when the item was inserted into the tree. .B parms is an arbitrary pointer that the client wishes to pass to the visitation function, but is otherwise unused by the B-tree implementation. Items are visited in their lexical order. .sp .B bt_dump displays the contents of the tree to stdout, along with some diagnostic and statistical information. The .B key_dump function is called once for each item in the tree, in arbitrary order. It may be NULL if no action is desired. Its interface follows: .RS .B void key_dump(item) .br .B char *item; .RE .sp .B bt_first returns the item that falls earliest in the lexical order of the items stored in the tree, or NULL if the tree is empty. .sp .B bt_last returns the item that falls latest in the lexical order of the items stored in the tree, or NULL if the tree is empty. .sp .B bt_next returns the next item higher in the lexical order after the last key returned by .BR bt_first , .BR bt_next , .b bt_prev , or .BR bt_search . If .B bt_search failed to find an item, .B bt_next returns the next item higher in the lexical order that was stored in the tree. NULL is returned if the last call to .B bt_next returned the item falling highest in the lexical order of the items stored in the tree, or if the tree was modified since the last call to .B bt_next or .BR bt_prev . .sp .B bt_prev returns the next item lower in the lexical order after the last key returned by .BR bt_last , .BR bt_next , .b bt_prev , or .BR bt_search . If .B bt_search failed to find an item, .B bt_prev returns the next item lower in the lexical order that was stored in the tree. NULL is returned if the last call to .B bt_prev returned the item falling lowest in the lexical order of the items stored in the tree, or if the tree was modified since the last call to .B bt_next or .BR bt_prev . .sp Worst case performance characteristics are listed below. Here, "m" is number passed as the "order" parameter to bt_new, and "n" is the number of items stored in the tree. .RS bt_search: O(m * log n) .br bt_new: O(1) .br bt_destroy: O(log n) + O(n) when free_key is not NULL .br O(log n) otherwise .br bt_insert: O(m * log n) .br bt_delete: O(m * log n) .br bt_traverse: O(n) .br bt_next: O(log n) .br bt_prev: O(log n) .br bt_first: O(log n) .br bt_last: O(log n) .RE .SH BUGS This implementation is written in K&R C, and tested only on a Sun 3 running SunOS 4.0.3 and on VMS V5.3. .sp This implementation assumes "char*" is the generic pointer. This goes against practice used by most "modern" compilers, but aids portability to older compilers. Note also that this will probably generate lots of complaints from Lint. .sp .B bt_dump assumes that pointers are the same size as integers, and that they can be displayed in total in eight hexidecimal digits. SHAR_EOF fi exit 0 # End of shell archive -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"
paul@athertn.Atherton.COM (Paul Sander) (08/18/90)
Here is the updated implementation of the in-memory B-tree that has been used as sample code for this thread. Please see the related posting identified as <28982@athertn.Atherton.COM>. ---------- #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create: # btpriv.h # btree.h # bt_last.c # bt_next.c # bt_prev.c # btdelete.c # btdestroy.c # btdump.c # btfirst.c # btinsert.c # btmalloc.c # btnew.c # btsearch.c # bttraverse.c # btutil.c # test.c # This archive created: Fri Aug 17 20:48:43 1990 export PATH; PATH=/bin:/usr/bin:$PATH if test -f 'btpriv.h' then echo shar: "will not over-write existing file 'btpriv.h'" else cat << \SHAR_EOF > 'btpriv.h' /* The following structure describes a B-tree node. */ struct btnode { int nkeys; /* Number of keys stored here */ int currKey; /* Last key found */ struct btnode *parent; /* Parent of this node */ char **keys; /* array[order-1] of keys */ struct btnode **children; /* array[order] of children */ }; /* The following structure describes a B-tree. */ struct btree { int order; /* Max children permitted */ int (*comp)(); /* Comparison function for keys */ struct btnode *root; /* Root of tree */ int capacity; /* Max keys that will fit */ int contains; /* Key currently stored */ struct btnode *currNode; /* Node accessed */ int nextOk; /* Indicates last search successful */ }; typedef struct btree *BTREE; typedef struct btnode BTNODE; #ifndef DEBUG #define bt_malloc malloc #define bt_free free #else extern char *bt_malloc(); extern char *bt_free(); #endif extern bt_malloc_verify(); #ifdef __BTUTIL_C__ extern int bt_searchNode(); extern int bt_rotateRight(); extern int bt_rotateLeft(); #endif SHAR_EOF fi if test -f 'btree.h' then echo shar: "will not over-write existing file 'btree.h'" else cat << \SHAR_EOF > 'btree.h' /* Header file for in-memory B-tree implementation */ typedef char *BTREE; extern void bt_dump(); extern char *bt_search(); extern BTREE bt_new(); extern void bt_destroy(); extern int bt_insert(); extern void bt_traverse(); extern char *bt_delete(); extern char *bt_first(); extern char *bt_next(); extern char *bt_last(); extern char *bt_prev(); SHAR_EOF fi if test -f 'bt_last.c' then echo shar: "will not over-write existing file 'bt_last.c'" else cat << \SHAR_EOF > 'bt_last.c' /******************************************************************** * * bt_last.c -- This file contains functions needed to find the * key that is highest in the lexical order of keys * stored in an in-memory B-tree. ********************************************************************/ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_last -- This function returns the key that appears last in * the lexical order of the items stored in the tree. * NULL is returned if the tree is empty. * *******************************************************************/ char *bt_last(tree) BTREE tree; { BTNODE *node; if (tree == NULL) return NULL; node = tree->root; if (node->nkeys == 0) return NULL; while (node->children != NULL) { node->currKey = node->nkeys; node = node->children[node->nkeys]; } node->currKey = node->nkeys - 1; tree->currNode = node; tree->nextOk = 1; return node->keys[node->nkeys - 1]; } SHAR_EOF fi if test -f 'bt_next.c' then echo shar: "will not over-write existing file 'bt_next.c'" else cat << \SHAR_EOF > 'bt_next.c' /******************************************************************** * * bt_next.c -- This file contains functions needed to scan an in-memory * B-tree in ascending order. * ********************************************************************/ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_next -- This function returns the key that appears next in * the lexical order of the items stored in the tree, * after the last bt_search, bt_next, bt_prev, bt_last, * or bt_first. NULL is returned if the tree is empty, an * insertion or deletion invalidated the state of the * tree, or the last item in the tree was already visited. * *******************************************************************/ char *bt_next(tree) BTREE tree; { BTNODE *node; int key; /* Return if error, or insertion/deletion invalidated state */ if ((tree == NULL) || (tree->currNode == NULL)) return NULL; /* Set up temporaries */ node = tree->currNode; key = node->currKey; /* Return if we've overrun the tree */ if (key >= node->nkeys) return NULL; /* * Bump current key in current node, compensating for unsuccessful * search */ if (tree->nextOk) key++; tree->nextOk = 1; /* If node has children, return leftmost key of next child */ if (node->children != NULL) { node->currKey = key; node = node->children[key]; while (node->children != NULL) { node->currKey = 0; node = node->children[0]; } node->currKey = 0; tree->currNode = node; return node->keys[0]; } /* Leaf node, return next key */ if (key < node->nkeys) { node->currKey = key; return node->keys[key]; } /* Already visited rightmost key of leaf, go back to parent */ while ((node->parent != NULL) && (key >= node->nkeys)) { node = node->parent; key = node->currKey; } /* Return NULL after last key in tree */ if (key >= node->nkeys) { tree->currNode = node; node->currKey = key; return NULL; } /* Return next key in parent */ tree->currNode = node; return node->keys[key]; } SHAR_EOF fi if test -f 'bt_prev.c' then echo shar: "will not over-write existing file 'bt_prev.c'" else cat << \SHAR_EOF > 'bt_prev.c' /******************************************************************** * * btprev.c -- This file contains functions needed to scan an in-memory * B-tree in descending order. * ********************************************************************/ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_prev -- This function returns the next key that appears * earlier in the lexical order of the items stored in * the tree, after the last bt_search, bt_next, bt_prev, or * bt_last. NULL is returned if the tree is empty, an * insertion or deletion invalidated the state of the * tree, or the next item in the tree was already visited. * *******************************************************************/ char *bt_prev(tree) BTREE tree; { BTNODE *node; int key; /* Return if error, or insertion/deletion invalidated state */ if ((tree == NULL) || (tree->currNode == NULL)) return NULL; tree->nextOk = 1; /* Set up temporaries */ node = tree->currNode; key = node->currKey; /* Return if we've overrun the tree */ if (key < 0) return NULL; /* * If node has children, return rightmost key of previous * child */ if (node->children != NULL) { node = node->children[key]; while (node->children != NULL) { node->currKey = node->nkeys; node = node->children[node->nkeys]; } node->currKey = node->nkeys - 1; tree->currNode = node; return node->keys[node->currKey]; } /* Leaf node, return previous key */ key--; if (key >= 0) { node->currKey = key; return node->keys[key]; } /* Already visited leftmost key of node, go back to parent */ while ((node->parent != NULL) && (key < 0)) { node = node->parent; key = node->currKey - 1; } /* Return NULL after last key in tree */ if (key < 0) { node->currKey = -1; tree->currNode = node; return NULL; } /* Return next key in parent */ node->currKey = key; tree->currNode = node; return node->keys[key]; } SHAR_EOF fi if test -f 'btdelete.c' then echo shar: "will not over-write existing file 'btdelete.c'" else cat << \SHAR_EOF > 'btdelete.c' /******************************************************************** * * btdelete.c -- This file contains functions needed to delete a key * from an in-memory B-tree. * ********************************************************************/ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_delKey -- This function deletes the specified key from the * specified node. * ********************************************************************/ static bt_delKey(node,n) BTNODE *node; int n; { int i; for (i = n; i < node->nkeys - 1; i++) { node->keys[i] = node->keys[i+1]; } if (node->children != NULL) { for (i = n; i < node->nkeys; i++) { node->children[i] = node->children[i+1]; } } node->nkeys--; return; } /******************************************************************** * * bt_weld -- This function is the opposite of burst, above. It joins * two neighboring children of the given node, and lowers a * key into the result. * ********************************************************************/ static bt_weld(tree,node,n) BTREE tree; BTNODE *node; int n; { int i; int j; BTNODE *left; BTNODE *right; if (node->children == NULL) return; left = node->children[n]; right = node->children[n+1]; i = left->nkeys; left->keys[i] = node->keys[n]; for (i++, j = 0; j < right->nkeys; i++, j++) { left->keys[i] = right->keys[j]; } for (i = n + 1; i <= node->nkeys; i++) { node->keys[i-1] = node->keys[i]; node->children[i] = node->children[i+1]; } node->nkeys--; if (left->children != NULL) { for (i = left->nkeys + 1, j = 0; j <= right->nkeys; i++, j++) { left->children[i] = right->children[j]; left->children[i]->parent = left; } bt_free(right->children); } left->nkeys += right->nkeys + 1; bt_free(right->keys); bt_free(right); tree->capacity -= tree->order - 1; return; } /******************************************************************** * * bt_delPred -- This function searches a sub-tree, and deletes the * highest key it finds, and returns it to its caller. * If any node along the way loses its b-tree'ness, the * tree is reorganized after the deletion. * ********************************************************************/ static char *bt_delPred(tree,node) BTREE tree; BTNODE *node; { BTNODE *pnode; int res; char *retval; /* * If at a leaf node, delete the highest key, otherwise * call self recursively */ if (node->children == NULL) { retval = node->keys[node->nkeys - 1]; node->nkeys--; } else { pnode = node->children[node->nkeys]; retval = bt_delPred(tree,pnode); /* Reorganize tree if node gets too empty */ if (pnode->nkeys < (tree->order - 1) / 2) { res = bt_rotateRight(node,node->nkeys - 1,tree->order); if (res == 0) bt_weld(tree,node,node->nkeys - 1); } } return retval; } /******************************************************************** * * bt_delNode -- This function searches a sub-tree for a key, and * deletes the key. If the key is found in a leaf node, * it is simple removed and returned to the caller; * otherwise, it is swapped with its predecessor, and * the returned to the caller. If any node loses its * b-tree'ness, the tree is reorganized. If the key is * not found, NULL is returned. * ********************************************************************/ static char *bt_delNode(tree,node,key) BTREE tree; BTNODE *node; char *key; { int i; int res; char *retval; /* Key not found */ if (node == NULL) return NULL; /* Search for key, descend tree until found or find leaf */ i = bt_searchNode(node,key,tree->comp,&res); if (res == 0) { if (node->children == NULL) return NULL; retval = bt_delNode(tree,node->children[i],key); } else { /* Return the deleted node to caller */ retval = node->keys[i]; if (node->children == NULL) { /* Delete from leaf */ bt_delKey(node,i); } else { /* Swap with predecessor */ node->keys[i] = bt_delPred(tree,node->children[i]); } } /* If a child node was reorganized, rebalancing may be necessary */ if ((node->children != NULL) && (node->children[i]->nkeys < (tree->order - 1) / 2)) { res = bt_rotateRight(node,i - 1,tree->order); if (res == 0) res = bt_rotateLeft(node,i,tree->order); if (res == 0) { if (i >= node->nkeys) i--; bt_weld(tree,node,i); } } return retval; } /******************************************************************** * * bt_delete -- This function deletes the specified key from a * b-tree and reorganizes it as necessary. The deleted * key is returned to the caller if it is found, NULL * is returned otherwise. * ********************************************************************/ char *bt_delete(ptree,key) char *ptree; char *key; { BTNODE *root; char *retval; BTREE tree; tree = (BTREE) ptree; /* Do no deletion if tree is empty */ root = tree->root; if (root->nkeys == 0) return NULL; /* Delete the key */ retval = bt_delNode(tree,root,key); /* * Delete the root if it is empty, yet its children are not. * This happens after a weld on the last key in the root. */ if ((root->nkeys == 0) && (root->children != NULL)) { tree->root = root->children[0]; bt_free(root->keys); bt_free(root->children); bt_free(root); tree->capacity -= tree->order - 1; tree->root->parent = NULL; } if (retval != NULL) { tree->contains--; tree->currNode = NULL; } return retval; } SHAR_EOF fi if test -f 'btdestroy.c' then echo shar: "will not over-write existing file 'btdestroy.c'" else cat << \SHAR_EOF > 'btdestroy.c' /******************************************************************** * * btdestroy.c -- This function contains functions needed to deallocate * an in-memory B-tree in its entirety. * ********************************************************************/ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_freeNode -- This function frees a node in a b-tree. It is * provided with a function that frees each key in the * node also. The free_key function has the following * protocol: * * free_key(key) * char *key; * * If NULL is passed as the free_key function, the node * is freed, but no action is taken for the keys. * ********************************************************************/ static void bt_freeNode(node,free_key) BTNODE *node; void (*free_key)(); { int i; if (node->children != NULL) { for (i = 0; i <= node->nkeys; i++) { bt_freeNode(node->children[i],free_key); } bt_free(node->children); } if (free_key != NULL) { for (i = 0; i < node->nkeys; i++) { (*free_key)(node->keys[i]); } } bt_free(node->keys); bt_free(node); return; } /******************************************************************** * * bt_destroy -- This function frees a b-tree structure. It is also * passed a function that frees the keys contained by the * tree. The free_key function has the same calling * protocol as the free_key function passed to bt_freeNode * above. * ********************************************************************/ void bt_destroy(ptree,free_key) char *ptree; void (*free_key)(); { BTREE tree; tree = (BTREE) ptree; bt_freeNode(tree->root,free_key); bt_free(tree); return; } SHAR_EOF fi if test -f 'btdump.c' then echo shar: "will not over-write existing file 'btdump.c'" else cat << \SHAR_EOF > 'btdump.c' /********************************************************************* * * btdump.c -- This function contains functions needed to display * the contents of an in-memory B-tree, passing each key * to a callback function. This file contains non-portable * code that is intended to help debug the B-tree * implementation. * *********************************************************************/ #include <stdio.h> #include "btpriv.h" /********************************************************************* * * bt_dumpNode -- This function displays the contents of a node and * then recursively displays its children. Note * that this function assumes that a pointer to a node * is the same size as an integer. * *********************************************************************/ void bt_dumpNode(node,key_dump) BTNODE *node; void (*key_dump)(); { int i; if (node == NULL) { printf("ERROR: null node pointer\n"); return; } printf("%08x: %d keys (keys %08x, children %08x), currKey = %d,\n", (int) node,node->nkeys,node->keys,node->children,node->currKey); printf("parent = %08x\n", (int) node->parent); printf("--------\n"); for (i = 0; i < node->nkeys; i++) { if (node->children != NULL) { printf(" %08x\n",node->children[i]); } else { printf(" %08x\n",0); } if (key_dump != NULL) { printf(" "); (*key_dump)(node->keys[i]); } } if (node->children != NULL) { printf(" %08x\n",node->children[i]); for (i = 0; i <= node->nkeys; i++) { bt_dumpNode(node->children[i],key_dump); } } else { printf(" %08x\n",0); } fflush(stdout); return; } /********************************************************************* * * bt_dump -- This function displays the contents of a B-tree. The * caller passes in a function that displays the contents * of one of the keys stored in the tree. The calling * protocol for this function is: * * key_dump(key) * char *key; * * Also, if key_dump is NULL, no action is taken. This is * useful if the structure of the tree is desired without * including its contents. * *********************************************************************/ void bt_dump(ptree,key_dump) char *ptree; void (*key_dump)(); { BTREE tree; tree = (BTREE) ptree; printf("B-tree dump:\n\n"); printf("order = %d\n",tree->order); printf("capacity = %d\n",tree->capacity); printf("contains %d keys\n",tree->contains); printf("efficiency is %d per cent\n", (tree->contains*100)/tree->capacity); printf("handle at %08x\n",tree); printf("root = %08x\n\n",(int) tree->root); printf("currNode = %08x\n\n",(int) tree->currNode); bt_dumpNode(tree->root,key_dump); return; } SHAR_EOF fi if test -f 'btfirst.c' then echo shar: "will not over-write existing file 'btfirst.c'" else cat << \SHAR_EOF > 'btfirst.c' /******************************************************************** * * btfirst.c -- This file contains functions to locate the key * falling first in the lexical order of items stored * in an in-memory B-tree. * ********************************************************************/ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_first -- This function returns the key that appears first in * the lexical order of the items stored in the tree. * NULL is returned if the tree is empty. * *******************************************************************/ char *bt_first(tree) BTREE tree; { BTNODE *node; if (tree == NULL) return NULL; node = tree->root; if (node->nkeys == 0) return NULL; while (node->children != NULL) { node->currKey = 0; node = node->children[0]; } node->currKey = 0; tree->currNode = node; tree->nextOk = 1; return node->keys[0]; } SHAR_EOF fi if test -f 'btinsert.c' then echo shar: "will not over-write existing file 'btinsert.c'" else cat << \SHAR_EOF > 'btinsert.c' /********************************************************************* * * btinsert.c -- This file contains functions needed to insert new * items into an in-memory B-tree. * *********************************************************************/ #include <stdio.h> #include "btpriv.h" /********************************************************************* * * bt_insertKey -- This function inserts a key into a specified node at * a specified location in its key array. 0 is returned * if the node is already full, 1 otherwise. * *********************************************************************/ static int bt_insertKey(node,key,n,order) BTNODE *node; char *key; int n; int order; { int i; /* Return FALSE if insertion is not possible */ if (node->nkeys >= order - 1) { return 0; } /* Shift keys to make room, then insert new key */ for (i = node->nkeys - 1; i >= n; i--) { node->keys[i+1] = node->keys[i]; } node->keys[n] = key; node->nkeys++; return 1; } /********************************************************************* * * bt_burst -- Given a node and an index into its key and child arrays, * this function splits the specified child node in half, * elevating the key at the child's split point into this * specified node. 0 is returned if the child array index * is out of range, or the node is full, or malloc fails; * 1 is returned otherwise. * *********************************************************************/ static int bt_burst(tree,node,n) BTREE tree; BTNODE *node; int n; { BTNODE *new; int i; int m; BTNODE *left; int lkeys; /* Test parameters */ if (((node->nkeys > 0) && (node->nkeys >= tree->order - 1)) || (n < 0)) { return 0; } /* Create a new node */ new = (BTNODE*) bt_malloc(sizeof(BTNODE)); if (new != NULL) { new->parent = node; /* Allocate new key array */ new->keys = (char**) bt_malloc(tree->order*sizeof(char*)); if (new->keys != NULL) { /* Calculate needed partial results */ m = tree->order / 2; left = node->children[n]; lkeys = left->nkeys; /* Allocate new child array */ if (left->children != NULL) { new->children = (BTNODE**) bt_malloc((tree->order + 1) * sizeof(BTNODE*)); if (new->children != NULL) { for (i = 0; i <= lkeys - m; i++) { new->children[i] = left->children[m + i]; new->children[i]->parent = new; } } else { bt_free(new->keys); bt_free(new); return 0; } } else { new->children = NULL; } /* Split keys */ for (i = 0; i < lkeys - m; i++) { new->keys[i] = left->keys[m + i]; } new->nkeys = lkeys - m; left->nkeys = m - 1; /* Elevate the key where the split was made */ for (i = node->nkeys; i > n; i--) { node->keys[i] = node->keys[i-1]; node->children[i+1] = node->children[i]; } node->children[n+1] = new; node->keys[n] = left->keys[m - 1]; node->nkeys++; tree->capacity += tree->order - 1; return 1; } /* Failed to allocate new child array, free node */ bt_free(new); } /* Failed to allocate new node, return FALSE */ return 0; } /********************************************************************* * * bt_rebalance -- This function tries to rebalance the two adjacent * nodes. This is needed when an insertion is attempted * into a node that is already full but its neighbors are * not, or when a deletion is made and the node is less * than half-full. * *********************************************************************/ static bt_rebalance(node,n,order) BTNODE *node; int n; int order; { int res; if (node->children[n]->nkeys < (order-1)/2) { res = bt_rotateLeft(node,n,order); } else if ((n < node->nkeys) && (node->children[n+1]->nkeys < (order-1)/2)) { res = bt_rotateRight(node,n,order); } return; } /********************************************************************* * * bt_insertNode -- This function performs a recursive-descent of a b-tree, * inserting a new key in a leaf node. If the leaf is * full, the tree is reorganized to make room. -1 is * returned if the insert fails due to a duplicate key. * 0 is returned if the insert fails for some other * reason. 1 is returned if successful. * *********************************************************************/ static int bt_insertNode(tree,node,key) BTREE tree; BTNODE *node; char *key; { int i; int res; /* Locate proper place for new key in node */ i = bt_searchNode(node,key,tree->comp,&res); if (res) return -1; /* If no children, insert key in this node */ if (node->children == NULL) { return bt_insertKey(node,key,i,tree->order); } /* Try to insert the new key in a child node */ res = bt_insertNode(tree,node->children[i],key); /* Child is full, reorganize */ if (res == 0) { /* Try rotating right */ if ((res == 0) && (i < node->nkeys) && (node->children[i+1]->nkeys < tree->order - 2)) { res = bt_rotateRight(node,i,tree->order); } /* Try rotating left */ if ((res == 0) && (i > 0) && (node->children[i-1]->nkeys < tree->order - 2)) { res = bt_rotateLeft(node,i - 1,tree->order); } /* Can't rotate, try bursting */ if (res == 0) res = bt_burst(tree,node,i); if (res > 0) { res = bt_insertNode(tree,node,key); if (res > 0) { bt_rebalance(node,i,tree->order); } } } return res; } /********************************************************************* * * bt_insert -- This function inserts a new key into a b-tree. -1 is * returned if the insert fails due to a duplicate key. * 0 is returned if the insert fails for some other * reason. 1 is returned if successful. * *********************************************************************/ int bt_insert(ptree,key) char *ptree; char *key; { int res; BTNODE *new; BTNODE *node; BTREE tree; tree = (BTREE) ptree; /* Begin at root */ node = tree->root; /* If root is empty, insert first key */ if (node->nkeys == 0) { node->keys[0] = key; node->nkeys = 1; tree->contains = 1; tree->currNode = NULL; return 1; } /* Try inserting new key */ res = bt_insertNode(tree,node,key); /* If 0 return, burst the root, rebalance, and try again */ if (res == 0) { new = (BTNODE*) bt_malloc(sizeof(BTNODE)); if (new != NULL) { new->children = (BTNODE**) bt_malloc((tree->order + 1) * sizeof(BTNODE*)); if (new->children != NULL) { new->keys = (char**) bt_malloc((tree->order) * sizeof(char*)); if (new->keys != NULL) { new->nkeys = 0; new->children[0] = node; node->parent = new; tree->root = new; tree->capacity += tree->order - 1; res = bt_burst(tree,new,0); if (res > 0) { res=bt_insertNode(tree,new,key); } if (res > 0) { tree->contains++; bt_rebalance(new,0,tree->order); tree->currNode = NULL; } return res; } bt_free(new->children); } bt_free(new); } } if (res > 0) { tree->contains++; tree->currNode = NULL; } return res; } SHAR_EOF fi if test -f 'btmalloc.c' then echo shar: "will not over-write existing file 'btmalloc.c'" else cat << \SHAR_EOF > 'btmalloc.c' /************************************************************** * * btmalloc.c -- This file contains functions that augment the * C runtime library memory allocator in order * to aid debugging. * **************************************************************/ #include <stdio.h> /* The following structure is used for debugging heap corruptions */ struct heapblk { char *blk; int freeNext; }; struct heap_struct { int allocSeq; int freeLast; struct heapblk blks[10000]; }; struct heap_struct btheap = {0,-1}; /* * The following code is used for allocating space on the heap. When not * debugging heap corruptions, a macro aliases bt_malloc with malloc. */ char *bt_malloc(size) unsigned size; { char *temp; temp = (char*) malloc(size); btheap.blks[btheap.allocSeq].blk = temp; btheap.blks[btheap.allocSeq].freeNext = -2; btheap.allocSeq++; return temp; } /* * The following code is used for freeing memory on the heap. If not * debugging heap corruptions, bt_free is aliased with free. */ bt_free(ptr) char *ptr; { int i; printf("Freeing heap storage at %x\n",ptr); for (i = 0; i < btheap.allocSeq; i++) { if (ptr == btheap.blks[i].blk) { if (btheap.blks[i].freeNext == -2) { btheap.blks[i].freeNext = btheap.freeLast; btheap.freeLast = i; free(ptr); return; } else { printf( "ERROR: freeing %x which was already freed\n", ptr); printf("Block\n"); i = btheap.freeLast; while (i > 0) { printf("%x\n",btheap.blks[i].blk); i = btheap.blks[i].freeNext; } fflush(stdout); kill(getpid(),5); } } } printf("ERROR: Freeing %x which has not been allocated\n",ptr); fflush(stdout); kill(getpid(),5); } /* The following verifies the integrity of the heap. */ int bt_malloc_verify() { int retval; int once; int i; retval = 1; once = 0; #ifdef DEBUG for (i = 0; i < btheap.allocSeq; i++) { if (btheap.blks[i].freeNext == -2) { if (!once) { once = 1; printf("The following blocks are allocated:\n"); } printf("%x\n",btheap.blks[i].blk); } } fflush(stdout); #endif #ifdef sun retval = malloc_verify(); #endif return retval; } SHAR_EOF fi if test -f 'btnew.c' then echo shar: "will not over-write existing file 'btnew.c'" else cat << \SHAR_EOF > 'btnew.c' /*********************************************************************** * * btnew.c -- This file contains functions to create a new in-memory * B-tree. * ***********************************************************************/ #include <stdio.h> #include "btpriv.h" /*********************************************************************** * * bt_new -- Given the number of children each node is allowed to have, * and a pointer to a strcmp-like function that compares two * keys, this function creates an empty b-tree structure. * ***********************************************************************/ char *bt_new(order,comp) int order; int (*comp)(); { BTREE tree; int i; /* Validate parameters */ if ((comp == NULL) || (order < 3)) return (char *) NULL; /* Allocate the handle */ tree = (BTREE) bt_malloc(sizeof(struct btree)); if (tree != NULL) { /* Allocate the root */ tree->root = (BTNODE*) bt_malloc(sizeof(struct btnode)); if (tree->root != NULL) { /* Initialize root */ tree->order = order; tree->comp = comp; tree->root->nkeys = 0; tree->root->children = NULL; /* Allocate key array */ tree->root->keys = (char**)bt_malloc((order - 1) * sizeof(char*)); if (tree->root->keys != NULL) { /* Initialize keys */ for (i = 0; i < order - 1; i++) { tree->root->keys[i] = NULL; } tree->root->nkeys = 0; tree->root->parent = NULL; tree->contains = 0; tree->capacity = order - 1; tree->currNode = NULL; return (char *) tree; } /* Failed to allocate keys, free tree */ bt_free(tree->root); } /* Failed to allocate root, free handle */ bt_free(tree); } /* Failed to allocate handle, return null (error) */ return (char *) NULL; } SHAR_EOF fi if test -f 'btsearch.c' then echo shar: "will not over-write existing file 'btsearch.c'" else cat << \SHAR_EOF > 'btsearch.c' /******************************************************************** * btsearch -- This file contains functions needed to locate an * item in an in-memory B-tree. * ********************************************************************/ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_searchNode -- This function searches a node for a key. If the key * is found, its index in the node's key array is * returned, along with a flag indicating that the key * was found. If the key is not found, the index of the * next higher key in the key array is returned, along * with a "not found" indication. * ********************************************************************/ static int bt_searchNode(node,target,comp,eq) BTNODE *node; /* Node to be searched */ char *target; /* Key to search for */ int (*comp)(); /* strcmp()-like comparison function */ int *eq; /* 0 if not found, non-zero otherwise */ { int i; int res; int hi; int lo; res = -1; /* * Perform linear search of node contains 7 keys or less, * otherwise perform binary search */ if (node->nkeys <= 7) { for (i = 0; i < node->nkeys; i++) { res = (*comp)(node->keys[i],target); if (res >= 0) break; } } else { lo = 0; hi = node->nkeys - 1; while ((lo <= hi) && (res != 0)) { i = (lo + hi) / 2; res = (*comp)(node->keys[i],target); if (res < 0) lo = i + 1; else if (res > 0) hi = i - 1; } if (res < 0) i++; } *eq = (res == 0); /* Indicate successful search */ node->currKey = i + *eq - 1; return i; } /*********************************************************************** * * bt_locate -- This function performs a recursive-descent search of * a sub-tree for a key. The key is returned if found, * or NULL otherwise. * ***********************************************************************/ static char *bt_locate(tree,node,target,comp) BTREE tree; BTNODE *node; char *target; int (*comp)(); { int res; int i; int eq; i = bt_searchNode(node,target,comp,&eq); node->currKey = i; if (eq) { tree->currNode = node; tree->nextOk = 1; return node->keys[i]; } else if (node->children == NULL) { tree->currNode = node; tree->nextOk = 0; return NULL; } else { return bt_locate(tree,node->children[i],target,comp); } } /*********************************************************************** * * bt_search -- This function searches a tree for a specified key. The * key is returned if found, or NULL if not. * ***********************************************************************/ char *bt_search(tree,target) BTREE tree; char *target; { if (tree == NULL) return NULL; return bt_locate(tree,tree->root,target,tree->comp); } SHAR_EOF fi if test -f 'bttraverse.c' then echo shar: "will not over-write existing file 'bttraverse.c'" else cat << \SHAR_EOF > 'bttraverse.c' /**************************************************************** * * bttraverse.c -- This file contains functions needed to * traverse an in-memory B-tree, passing each * item stored there to a callback function. * ****************************************************************/ #include <stdio.h> #include "btpriv.h" /**************************************************************** * * bt_visit -- This function is used while traversing a b-tree. * It is called once for each node. It calls some * specified function once for each key in its * key array, and calls itself once for each child * in its child array. * ****************************************************************/ static void bt_visit(node,fn,parms) BTNODE *node; void (*fn)(); char *parms; { int i; if (node == NULL) return; for (i = 0; i < node->nkeys; i++) { if (node->children != NULL) { bt_visit(node->children[i],fn,parms); } (*fn)(node->keys[i],parms); } if (node->children != NULL) { bt_visit(node->children[i],fn,parms); } return; } /**************************************************************** * * bt_traverse -- This function traverses a b-tree, calling some * specified function once for each key stored in * it. The specified function has the following * protocol: * * fn(key,parms) * char *key; * char *parms; * * where key is a pointer to a key stored in the * btree, and parms is the parameter structure * passed by the caller. * * The nodes are visited in descending order, if * the comp function passed to bt_new is * strcmp-like. * ****************************************************************/ void bt_traverse(ptree,fn,parms) char *ptree; void (*fn)(); char *parms; { BTREE tree; tree = (BTREE) ptree; bt_visit(tree->root,fn,parms); return; } SHAR_EOF fi if test -f 'btutil.c' then echo shar: "will not over-write existing file 'btutil.c'" else cat << \SHAR_EOF > 'btutil.c' /******************************************************************** * * btutil.c -- This function contains utility functions that are * used for many features of an in-memory B-tree * implementation. * ********************************************************************/ #define __BTUTIL_C__ #include <stdio.h> #include "btpriv.h" /******************************************************************** * * bt_searchNode -- This function searches a node for a key. If the key * is found, its index in the node's key array is * returned, along with a flag indicating that the key * was found. If the key is not found, the index of the * next higher key in the key array is returned, along * with a "not found" indication. * ********************************************************************/ int bt_searchNode(node,target,comp,eq) BTNODE *node; /* Node to be searched */ char *target; /* Key to search for */ int (*comp)(); /* strcmp()-like comparison function */ int *eq; /* 0 if not found, non-zero otherwise */ { int i; int res; int hi; int lo; res = -1; /* * Perform linear search of node contains 7 keys or less, * otherwise perform binary search */ if (node->nkeys <= 7) { for (i = 0; i < node->nkeys; i++) { res = (*comp)(node->keys[i],target); if (res >= 0) break; } } else { lo = 0; hi = node->nkeys - 1; while ((lo <= hi) && (res != 0)) { i = (lo + hi) / 2; res = (*comp)(node->keys[i],target); if (res < 0) lo = i + 1; else if (res > 0) hi = i - 1; } if (res < 0) i++; } *eq = (res == 0); /* Indicate successful search */ node->currKey = i + *eq - 1; return i; } /****************************************************************** * * bt_rotateRight -- Given a node and an index into its key array, this * function rotates the node right at the specified * key. This is used during insertion and deletion to * keep the tree balanced. The return value is 0 if * the function fails, i.e. the node on the left * cannot lose keys, the node on the right cannot * gain keys, or the specified node is a leaf; 1 is * returned if successful. * ******************************************************************/ int bt_rotateRight(node,n,order) BTNODE *node; int n; int order; { int i; BTNODE *right; BTNODE *left; /* Test parameters */ if ((n >= node->nkeys) || (n < 0)) { return 0; } /* Point to children */ left = node->children[n]; right = node->children[n+1]; /* Return FALSE if rotation is not possible */ if ((left == NULL) || (left->nkeys <= order/2) || (right->nkeys >= order - 1)) { return 0; } /* Shift the right child's keys */ for (i = right->nkeys; i > 0; i--) { right->keys[i] = right->keys[i - 1]; } /* Rotate keys */ right->keys[0] = node->keys[n]; node->keys[n] = left->keys[left->nkeys - 1]; left->keys[left->nkeys - 1] = NULL; if (right->children != NULL) { /* Shift the right child's children */ for (i = right->nkeys; i >= 0; i--) { right->children[i + 1] = right->children[i]; } /* Rotate children */ right->children[0] = left->children[left->nkeys]; right->children[0]->parent = right; left->children[left->nkeys] = NULL; } /* Update key count and return TRUE */ right->nkeys++; left->nkeys--; return 1; } /****************************************************************** * * bt_rotateLeft -- Given a node and an index into its key array, this * function rotates the node left at the specified * key. This is used during insertion and deletion to * keep the tree balanced. The return value is 0 if * the function fails, i.e. the node on the right * cannot lose keys, the node on the left cannot * gain keys, or the specified node is a leaf; 1 is * returned when successful. * ******************************************************************/ int bt_rotateLeft(node,n,order) BTNODE *node; int n; int order; { int i; BTNODE *right; BTNODE *left; /* Test parameters */ if ((n < 0) || (n >= node->nkeys)) { return 0; } /* Point to children */ right = node->children[n+1]; left = node->children[n]; /* Return FALSE if rotation is not possible */ if ((left == NULL) || (right->nkeys <= order/2) || (left->nkeys >= order - 1)) { return 0; } /* Rotate keys */ left->keys[left->nkeys] = node->keys[n]; node->keys[n] = right->keys[0]; /* Update key counts */ left->nkeys ++; right->nkeys --; /* Shift the right child's keys */ for (i = 0; i < right->nkeys; i++) { right->keys[i] = right->keys[i+1]; } right->keys[i] = NULL; if (left->children != NULL) { /* Rotate children */ left->children[left->nkeys] = right->children[0]; left->children[left->nkeys]->parent = left; /* Shift the right child's children */ for (i = 0; i <= right->nkeys; i++) { right->children[i] = right->children[i+1]; } right->children[i+1] = NULL; } /* Return TRUE */ return 1; } SHAR_EOF fi if test -f 'test.c' then echo shar: "will not over-write existing file 'test.c'" else cat << \SHAR_EOF > 'test.c' #include <stdio.h> #include "btree.h" /* #define DEBUGGING */ #define LOOPS 500 #define ORDER 12 typedef struct key { int key; } KEY; KEY savedKeys[LOOPS]; dump(key) KEY *key; { if (key == NULL) printf("********NULL**********\n"); else printf("%x: %d\n",key,key->key); } visit(key,parms) KEY *key; char *parms; { dump(key); } comp(k1,k2) KEY *k1,*k2; { return (k1->key - k2->key); } main() { BTREE tree; KEY *key; int i; int res; KEY *key1; KEY *key2; KEY *key3; KEY *key4; KEY tkey; #ifdef DEBUGGING #ifdef sun malloc_debug(2); #endif #endif tree = (BTREE)bt_new(ORDER,comp); if (tree == NULL) { printf("Failed to create tree\n"); exit(1); } (void) bt_malloc_verify(); for (i = 0; i < LOOPS; i++) { printf("Iteration %d\n",i+1); key = &savedKeys[i]; key->key = rand(); printf("Inserting %d\n",key->key); fflush(stdout); res = bt_insert(tree,key); if (res > 0) { printf("Inserted %d\n",key->key); } else if (res < 0) { printf("Duplicate key %d\n",key->key); } else { printf("Error inserting key %d\n"); exit(3); } printf("Trying duplicate\n"); res = bt_insert(tree,key); if (res < 0) { printf("Failed to insert duplicate\n"); } else if (res == 0) { printf("Error inserting duplicate\n"); exit(4); } else { printf("Succeeded in inserting duplicate\n"); exit(5); } #if 0 #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); #endif #endif 0 } printf("Finished creating the tree.\n"); printf("Contents of tree follow:\n"); bt_traverse(tree,visit,NULL); for (i = 0; i < LOOPS; i++) { printf("Iteration %d\n",i); printf("Seeking %d at %x\n",savedKeys[i].key,&savedKeys[i]); key = (KEY*)bt_search(tree,&savedKeys[i]); if (key != &savedKeys[i]) { printf("Can't find %d, returned %x\n",savedKeys[i].key, key); } printf("Freeing %d at %x\n",savedKeys[i].key,&savedKeys[i]); fflush(stdout); key = (KEY*)bt_delete(tree,&savedKeys[i]); if (key != &savedKeys[i]) { printf("Problem freeing %d at %x, returned %x\n", savedKeys[i].key,&savedKeys[i].key,key); } else { printf("Freed %d, returned %x\n",savedKeys[i].key,key); } #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); fflush(stdout); #endif key = (KEY*)bt_search(tree,&savedKeys[i]); if (key == &savedKeys[i]) { printf("Found %d after deletion\n",savedKeys[i].key); } key = (KEY*)bt_delete(tree,&savedKeys[i]); if (key != NULL) { printf("%d remained in tree after deletion at %x\n", savedKeys[i].key,key); } #if 0 #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); #endif #endif 0 } (void) bt_malloc_verify(); printf("Freeing tree\n"); bt_destroy(tree,NULL); printf("Finished freeing tree\n"); printf("Malloc verify: "); if (bt_malloc_verify()) { printf("success\n"); fflush(stdout); } else { printf("failure\n"); fflush(stdout); exit(6); } printf("Creating second tree\n"); tree = (BTREE)bt_new(ORDER,comp); if (tree == NULL) { printf("Failed to create tree\n"); exit(7); } (void) bt_malloc_verify(); printf("Contents of tree:\n"); bt_dump(tree,dump); for (i = 0; i < LOOPS; i++) { printf("Iteration %d\n",i+1); key = &savedKeys[i]; key->key = rand(); printf("Inserting %d\n",key->key); fflush(stdout); res = bt_insert(tree,key); if (res > 0) { printf("Inserted %d\n",key->key); } else if (res < 0) { printf("Duplicate key %d\n",key->key); } else { printf("Error inserting key %d\n"); exit(8); } #if 0 #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); #endif #endif 0 } printf("Finished creating the tree.\n"); printf("Contents of tree follow:\n"); bt_traverse(tree,visit,NULL); printf("Contents of tree:\n"); bt_dump(tree,dump); printf("Testing first and next\n"); key = (KEY *) bt_first(tree); while (key != NULL) { dump(key); key = (KEY *) bt_next(tree); } printf("Backing up\n"); key = (KEY *) bt_prev(tree); dump(key); printf("Testing last and prev\n"); key = (KEY *) bt_last(tree); while (key != NULL) { dump(key); key = (KEY *) bt_prev(tree); } printf("Stepping forward\n"); key = (KEY *) bt_next(tree); dump(key); printf("Testing search\n"); key1 = (KEY *) bt_first(tree); key2 = (KEY *) bt_next(tree); key4 = (KEY *) bt_last(tree); key3 = (KEY *) bt_prev(tree); dump(key1); dump(key2); dump(key3); dump(key4); printf("Testing search at beginning\n"); key = (KEY *) bt_search(tree,key1); if (key == key1) { printf("Successful search okay\n"); key = (KEY *) bt_next(tree); if (key == key2) printf("next okay\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,key1); key = (KEY *) bt_prev(tree); if (key == NULL) printf("prev Ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Successful search failed\n"); } printf("Testing search at end\n"); key = (KEY *) bt_search(tree,key4); if (key == key4) { printf("Successful search okay\n"); key = (KEY *) bt_prev(tree); if (key == key3) printf("prev okay\n"); else { printf("prev bad\n"); dump(key); } key = (KEY *) bt_search(tree,key4); key = (KEY *) bt_next(tree); if (key == NULL) printf("next Ok\n"); else { printf("next bad\n"); dump(key); } } else { printf("Successful search failed\n"); } printf("Testing first unsuccessful key\n"); tkey.key = key1->key + 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == key2) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == key1) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Testing second unsuccessful key\n"); tkey.key = key4->key - 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == key4) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == key3) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Testing third unsuccessful key\n"); tkey.key = key1->key - 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == key1) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == NULL) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Testing fourth unsuccessful key\n"); tkey.key = key4->key + 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == NULL) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == key4) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Freeing 3/4 of the keys\n"); for (i = 0; i < 3 * LOOPS / 4; i++) { printf("Iteration %d\n",i+1); key = &savedKeys[i]; key1 = (KEY*) bt_delete(tree,key); if (key1 == key) printf("Deletion okay\n"); else printf("Deletion error, wanted %x, got %x\n",key,key1); fflush(stdout); #if 0 #ifdef DEBUGGING printf("Contents of tree:\n"); bt_dump(tree,dump); #endif #endif 0 } printf("Finished deleting keys.\n"); printf("Contents of tree follow:\n"); bt_traverse(tree,visit,NULL); printf("Contents of tree:\n"); bt_dump(tree,dump); printf("Testing first and next\n"); key = (KEY *) bt_first(tree); while (key != NULL) { dump(key); key = (KEY *) bt_next(tree); } printf("Backing up\n"); key = (KEY *) bt_prev(tree); dump(key); printf("Testing last and prev\n"); key = (KEY *) bt_last(tree); while (key != NULL) { dump(key); key = (KEY *) bt_prev(tree); } printf("Stepping forward\n"); key = (KEY *) bt_next(tree); dump(key); printf("Testing search\n"); key1 = (KEY *) bt_first(tree); key2 = (KEY *) bt_next(tree); key4 = (KEY *) bt_last(tree); key3 = (KEY *) bt_prev(tree); dump(key1); dump(key2); dump(key3); dump(key4); printf("Testing search at beginning\n"); key = (KEY *) bt_search(tree,key1); if (key == key1) { printf("Successful search okay\n"); key = (KEY *) bt_next(tree); if (key == key2) printf("next okay\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,key1); key = (KEY *) bt_prev(tree); if (key == NULL) printf("prev Ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Successful search failed\n"); } printf("Testing search at end\n"); key = (KEY *) bt_search(tree,key4); if (key == key4) { printf("Successful search okay\n"); key = (KEY *) bt_prev(tree); if (key == key3) printf("prev okay\n"); else { printf("prev bad\n"); dump(key); } key = (KEY *) bt_search(tree,key4); key = (KEY *) bt_next(tree); if (key == NULL) printf("next Ok\n"); else { printf("next bad\n"); dump(key); } } else { printf("Successful search failed\n"); } printf("Testing first unsuccessful key\n"); tkey.key = key1->key + 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == key2) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == key1) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Testing second unsuccessful key\n"); tkey.key = key4->key - 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == key4) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == key3) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Testing third unsuccessful key\n"); tkey.key = key1->key - 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == key1) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == NULL) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Testing fourth unsuccessful key\n"); tkey.key = key4->key + 1; key = (KEY *) bt_search(tree,&tkey); if (key == NULL) { printf("Unsuccessful search okay\n"); key = (KEY *) bt_next(tree); if (key == NULL) printf("next ok\n"); else { printf("next bad\n"); dump(key); } key = (KEY *) bt_search(tree,&tkey); key = (KEY *) bt_prev(tree); if (key == key4) printf("prev ok\n"); else { printf("prev bad\n"); dump(key); } } else { printf("Unsuccessful search failed\n"); } printf("Freeing tree\n"); bt_destroy(tree,NULL); printf("Finished freeing tree\n"); printf("Malloc verify: "); if (bt_malloc_verify()) printf("success\n"); else printf("failure\n"); fflush(stdout); exit(0); } SHAR_EOF fi exit 0 # End of shell archive -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"
rwallace@vax1.tcd.ie (08/19/90)
In article <28772@athertn.Atherton.COM>, paul@athertn.Atherton.COM (Paul Sander) writes: >>> There are a number of features missing from this implementation that could >>> be useful. These include Anders' secondary storage capability (which is >>> what B-trees are used most for), perhaps with multi user locking. An >>> implementation that uses shared memory, again with multi-user >>> locking might be useful. Still another that is implemented as a server >>> that could allow data to be shared across machines would be useful. >> >>As a matter of fact I think the module already has too much functionality. > > These additional features need not be added to this particular implementation. > Most of the new features in fact cannot be implemented with same the interfaces > as the example. Here, the programmer would have to choose the implementation > that best suits their needs. > As far as I can see, what you're saying is that packages shouldn't' be made to do everything, they should be made expandable so that people can add the features they need. I agree with you there. >> The >>main reason I don't reuse software components (and I program in C++ which >>combines the advantages of being the best OOP/software engineering language >>around with the vast range of available C code) is that people who write >>generic modules put in everything expect the kitchen sink. > > It is true that many such components include many features, more than most > clients need. There is a utility/size trade-off made. However, if the > component is carefully designed, and the development environment is decent, > the size impact can be minimized; the linker and librarian would omit the > unused code from the executable image. Some of it but by no means all. Look at the source code for any reusable component written by someone else. Think about a typical use, e.g. a screen package for accounting software: it will have facilities for handling movable resizable windows in graphics or text mode, with handling of carriage return / linefeed characters, beep characters, probably ANSI codes to boot and most of this will be in the core code. The linker can do a very limited amount to eliminate unused code. > >> At the moment I'm >>working on a project which requires screen handling, file handling via indexes, >>and file and in-memory sort functions. There are commercially available >>packages available to do all these things. I ignored them and wrote my own for >>two reasons: >> >>1. The commercial packages contained orders of magnitude more code than I need >>for this project (a fourth-generation language for writing accounting >>software). Their use would have expanded the size of applications produced with >>the 4GL from around 100K to an estimated 400K (!!!). >> >>2. To wade through all the junk and figure out how to make the commercial >>packages actually work would have taken nearly as long as writing my own code. >> >>The first problem can be solved by what I hope will be the next generation of >>super-optimizing compilers that can examine all modules of a project >>simultaneously and eliminate code that will never be used e.g. >> >>main () { printf ("Hello, World!\n"); } >> >>would end up as 100 bytes because the compiler would eliminate all the code >>from printf to handle numbers etc. > > These super-optimizing compilers are actually performing global analyses to > use the CPU resources more efficiently; allocating registers in a manner that > is efficient for the program as a whole (as opposed to a manner that is > efficient for a loop or function) is one such benefit, but not the only one. > If the library is designed well, a new compiler isn't needed to dump the > unused code. On the other hand, the unused code is already there, waiting > to be used should the need arise. > In the library yes. However for the reasons given above, with current compilers it'll be in the executable as well, wasting space. > Printf is a special case because of its polymorphism. Here, the compiler > would have to recognize the printf call and parse the format string (assuming > it is a constant pointer to a constant) to optimize the code. Such > optimizations wouldn't be done in many practical applications; national > language support enabled applications come to mind here. > I don't agree. I think reusable software components could be useful in most applications, but to produce acceptable efficiency such compilers need to come into general use. >>The second problem requires an attitude change. > > And good documentation, with a good index. > > How many projects like your current one have you done? How much code have > you (or others) rewritten? Have you ever turned up bugs in related > applications where you had to make the same bug fix in many places? There > are three big wins you get from using reusable packages: Your software will > (theoretically) be developed faster, because less needs to be written from > scratch; Your software will (theoretically) be more bug free because it is > built from (theoretically) bug free components; Your program will > (theoretically) be easier to maintain because bug fixes need only be made in > one place. Some might even argue that your program would be "better" > (faster, smaller) because (theoretically) the implementor was expert at > implementing the thing at hand, and was able to make various optimizations > you wouldn't happen to think of. Finally, come might argue that your > program is more portable, because the component has been ported to many > platforms (malloc and printf are handy examples of this). > > What do you thi By and large the documentation for modern development software is excellent, but no matter how good the documentation you're going to have to do some experimentation and the more complicated the product the longer it'll take to figure out how to use it. Admittedly this is by far the biggest project I've ever worked on but my answers to your points are as follows: 1. _The_ advantage of reusable software components is that it reduces software development time, at least if the tendency to overcomplicate things is controlled. As I explained, this advantage is becoming less and less relevant as developers put more and more unnecessary junk into their products. 2. I can make my routines as bug free as a reusable component developer, and consider: If there's a bug in one of my routines, I can fix it on the spot. If there's a bug in a routine supplied by someone else to which I don't have the source, I have to wait for an upgrade. 3. Two examples: First, screen display routines. My routines are much more efficent than anyone elses because mine just transfer data direct to video RAM while other people's routines have to cope with things like linefeed and beep characters. Second, sort routines. Admittedly the third party routine under consideration was faster than mine for merge sorts too big to fit into memory, because they'd spent a lot of time optimizing the algorithm. However for sorts that fit into memory (almost all in the application in question) my routine is faster because there's less junk in it. To answer your last point: malloc() and printf() are part of the language standard. Third party packages may only be available on some systems. This project is being developed on MS-DOS with a view to porting to Unix. What if I used a third-party routine not available on Unix? My own routines can be ported to Unix in a few hours (at least in theory :-)). From the binary tree package which was originally under discussion: I can't remember most of what was in it but I think the facility for N-way trees should be removed because it won't be used in most cases (none that I can think of) and it reduces efficiency. (After all the whole point of binary trees is their efficiency compared to linked lists!). Sorry if this posting is a bit garbled, I'm using terminal emulator softwar which doesn't work very well with this editor. "To summarize the summary of the summary: people are a problem" Russell Wallace, Trinity College, Dublin rwallace@vax1.tcd.ie
edwards@IDA.ORG (Steve Edwards) (08/21/90)
Well, I've been listening to this discussion from the sidelines for a while, but I finally decided to interject a few thoughts. In article <6775.26cdc403@vax1.tcd.ie>, rwallace@vax1.tcd.ie writes: > In the library yes. However for the reasons given above, with current > compilers it'll [extra functionality] be in the executable as well, > wasting space. True. Of course, one man's wasted space is another's investment in future expandability. Isn't writing a custom version of the component to address this problem ``wasted effort''? It all depends on your perspective. It is very important to know that there are many ``costs'' that come along with reusable software. The extra code it may insert into your image, the extra execution time it may impose over a more optimized version of the same module written for a specific application, the extra complexity added to the interface by code and functionality you don't need at the moment, and so on. Tools (like the smart linkers discussed in the above message) can help to reduce some of these costs, but they certainly don't make the central difficulty go away. Deciding whether to reuse a specific component or write a custom version is a tradeoff--do the costs of reusing outweigh the benefits you get from reusing (measured in savings, of course)? I'll discuss the benefits a little bit more below. > I don't agree. I think reusable software components could be useful in most > applications, but to produce acceptable efficiency such compilers need > to come into general use. Again, what is ``acceptable''? The important point is that it's _relative_ to the tradeoff mentioned above. The efficiency penalty is a cost. What are the real efficiency requirements for the thing your building? Those requirements may drive strict boundaries on how much inefficiency (in time, space, or whatever resource you choose to name) you can ``accept.'' Then, its a matter of whether that efficiency penalty buys you a ``useful'' amount of benefit from reusing. Again, ``useful'' is relative as well. > 1. _The_ advantage of reusable software components is that it reduces > software development time, at least if the tendency to overcomplicate > things is controlled. As I explained, this advantage is becoming less > and less relevant as developers put more and more unnecessary junk > into their products. This is the central statement with which I disagree. I think _way_ too much emphasis has been placed on the development savings that reuse offers. In fact, the _real_ cost of developing software on a large scale isn't development at all (particularly not the cost of writing the code). It's _maintenance_. Sure, reuse allows you to ``share'' the cost of developing a component with other people (particularly the guy who wrote it originally), and you come out on the winning end of this deal. But for large projects, the maintenance cost really dwarfs the development effort (typical estimates for large projects place maintenance at 70% of total life cycle costs, sometimes more). But reuse also lets you ``share'' the cost of finding and repairing bugs (if all the right reuse library mechanisms are in place, of course), and of performing maintenance. This sharing can be spread over all past, current, and future users. Thus, over time reusable components become much more robust and bug free because of the accumulating investment in their maintenance. And new users get to share in the benefits of this investment. Although many developers in smaller projects fail to see these ramifications, I believe they are the most important benefits reuse has to offer. And more importantly, they have to be figured into the ``to reuse or not to reuse'' tradeoff before you decide whether or not to reuse. > 2. I can make my routines as bug free as a reusable component > developer, and consider: If there's a bug in one of my routines, I > can fix it on the spot. If there's a bug in a routine supplied by > someone else to which I don't have the source, I have to wait for > an upgrade. _Given enough $$$$_, this is true. The point is that reuse provides the opportunity for all the reusers of a component (past and present) to, in effect, pool their maintenance resources to get this level of quality. While I'm sure you're capable of debugging a component as well as the next guy, how much money can your project pour into debugging (during development), and then how much more during maintenance? Why is this duplication of effort (if an acceptable component is available) necessary? It's always possible that for any given module, the tradeoff will swing towards writing a custom version. In fact, in the MSDOS world, where efficiency and size are premium concerns and few good components are available, this is likely to be the most common case today. But you have to actually think about everything involved and make an _educated_ tradeoff. It's not good engineering to make that decision ``just because'' ... Your second point, about availability of source code, is a different issue. It's specific to a given component and a given vendor, and has to be taken into account when deciding whether to reuse _that component_. > 3. Two examples: First, screen display routines. My routines are much more > efficent than anyone elses because mine just transfer data direct to video > RAM while other people's routines have to cope with things like linefeed > and beep characters. Again, is this efficiency _required_? [I'm not asking this about your particular application specifically, just trying to make a point] If it is, then that's an important factor in deciding whether to reuse. If _not_, then it _should not_ drive the decision alone! Again, remember that in the MSDOS world, things can be very different than in a different application environment. Just because certain requirements exist in one doesn't mean those requirements (for image size, for example) are universal. Well, I hope this didn't sound too much like a flame--I didn't intend it to be one. I was just trying to provide a few more ideas for discussion. Since I haven't posted here recently, I'll also include just a little bit about myself so that you know where my ideas are coming from. I'm an Ada programmer, although I programmed in C years before that. I've participated in many small/medium-size Ada projects (i.e., each about 50KSLOC), all of which included at least some reusable components. I'm also currently working on a paper about software reuse in Ada, including many of these ideas, as part of a government-sponsored research project. Needless to say, these opinions are mine, and not my company's (or our sponsor's). -- Steve ------------------------------- Stephen Edwards Institute for Defense Analyses 5111 Leesburg Pike Suite 300 Falls Church, VA 22041 (703)845-3536 edwards@ida.org
rwallace@vax1.tcd.ie (08/26/90)
In article <1990Aug21.144152.3513@IDA.ORG>, edwards@IDA.ORG (Steve Edwards) writes: > In article <6775.26cdc403@vax1.tcd.ie>, rwallace@vax1.tcd.ie writes: > >> In the library yes. However for the reasons given above, with current >> compilers it'll [extra functionality] be in the executable as well, >> wasting space. > > True. Of course, one man's wasted space is another's investment in > future expandability. Isn't writing a custom version of the component > to address this problem ``wasted effort''? It all depends on your > perspective. > > It is very important to know that there are many ``costs'' that come > along with reusable software. The extra code it may insert into your > image, the extra execution time it may impose over a more optimized > version of the same module written for a specific application, the > extra complexity added to the interface by code and functionality > you don't need at the moment, and so on. Tools (like the smart > linkers discussed in the above message) can help to reduce some of these > costs, but they certainly don't make the central difficulty go away. > Deciding whether to reuse a specific component or write a custom > version is a tradeoff--do the costs of reusing outweigh the benefits > you get from reusing (measured in savings, of course)? I'll > discuss the benefits a little bit more below. True, and usually development time is more important than efficiency. However in this case we're talking about a very small penalty in development time (since it didn't take much longer to write my own routines than it would have taken to figure out how to use third party ones) and a ridiculously huge penalty in executable file size (rouch estimate is an increase from 100K to 400K, with corresponding losses in speed because the routines are slower and you've got less memory to play with for disk caching etc.) >> 1. _The_ advantage of reusable software components is that it reduces >> software development time, at least if the tendency to overcomplicate >> things is controlled. As I explained, this advantage is becoming less >> and less relevant as developers put more and more unnecessary junk >> into their products. > > This is the central statement with which I disagree. I think _way_ too > much emphasis has been placed on the development savings that reuse > offers. In fact, the _real_ cost of developing software on a large > scale isn't development at all (particularly not the cost of writing > the code). It's _maintenance_. > > Sure, reuse allows you to ``share'' the cost of developing a component > with other people (particularly the guy who wrote it originally), and > you come out on the winning end of this deal. But for large projects, > the maintenance cost really dwarfs the development effort (typical > estimates for large projects place maintenance at 70% of total life > cycle costs, sometimes more). But reuse also lets you ``share'' the > cost of finding and repairing bugs (if all the right reuse library > mechanisms are in place, of course), and of performing maintenance. This > sharing can be spread over all past, current, and future users. Thus, > over time reusable components become much more robust and bug free because > of the accumulating investment in their maintenance. And new users get > to share in the benefits of this investment. > Writing my own routines substantially improves maintenance time because I know exactly what the routines do, they're custom written for this application and the application code doesn't have to be forced into awkward contortions to work with them, and most important they're much more reliable because they contain much less code. > >> 2. I can make my routines as bug free as a reusable component >> developer, and consider: If there's a bug in one of my routines, I >> can fix it on the spot. If there's a bug in a routine supplied by >> someone else to which I don't have the source, I have to wait for >> an upgrade. > > _Given enough $$$$_, this is true. The point is that reuse provides the > opportunity for all the reusers of a component (past and present) to, > in effect, pool their maintenance resources to get this level of quality. > While I'm sure you're capable of debugging a component as well as the > next guy, how much money can your project pour into debugging (during > development), and then how much more during maintenance? Why is this > duplication of effort (if an acceptable component is available) > necessary? > Maintenance and debugging effort increase as something like the square or cube of the number of lines of source code involved. My routines contain _orders of magnitude_ less code than their third party equivalents (e.g. all my screen handling routines for overlapping windows, data entry and verification, scrolling etc. are only a few hundred lines of code; a normal third party package would be at least several thousand). The effort of debugging my routines is completely trivial because they're so simple. This improvement in reliability is another reason why writing my own routines reduces maintenance costs. "To summarize the summary of the summary: people are a problem" Russell Wallace, Trinity College, Dublin rwallace@vax1.tcd.ie