[comp.sw.components] What is a reusable software component?

paul@athertn.Atherton.COM (Paul Sander) (07/25/90)

There has been some discussion in comp.software-eng lately regarding
reusable software components.  This discussion focused mainly on why
they are not being used more; the more prominent opinions on this
seemed be:

- There aren't many
- It's easier or more enjoyable or more rewarding to write from scratch than
  to reuse someone else's
- When they exist, they're hard to find

Some of the fundamental problems which exhibit themselves as these three
points are these:

-  There aren't many reusable software components because what's out there
   that claims to be reusable isn't
-  There is no good way to catalog the components
-  There is no good way to index the catalog

Though all of these topics are interesting and worth discussion, I would like
to spur some discussion on just what a reusable software component is.
Following is (at the risk of doing some undergraduate's first-quarter Data
Structures homework) an implementation of an in-memory B-tree.  What I'd
like to discuss are:

- Is this a reusable software component?
- What are its characteristics that make it one?  Or don't make it one?
- Is it a good one?  If so, why?  If not, why?
- Would anyone use it?  Why or why not?
- How can it be made better?
- Anything else relevant to producing a good reusable component that is reused

Other interesting topics that are probably best deferred to email are:
- Choice of language
- Programming style
- Optimizations
- Which newsgroups distribute source code

I've attempted to direct followups to comp.sw.components.  Please double-
check your followup headers.

Here is the code.  I have not enveloped it in any form of archive so that
those not on Unix systems (and do not have the benefit of "shar") can read
it.  By the way, every test I could think of has not revealed any bugs in
this code:  Moderate-sized trees are really B-trees, there are no coredumps,
there are no memory leaks or corruptions.

Paul Sander
paul@Atherton.COM
(408) 734-9822

*********************************************************************
**************************** btree.doc ******************************
*********************************************************************


BTREE()           MISC. REFERENCE MANUAL PAGES            BTREE()



NAME
     bt_dump,   bt_search,   bt_new,    bt_destroy,    bt_insert,
     bt_traverse, bt_delete - In-memory B-tree manipulation

SYNOPSIS
     #include <btree.h>

     BTREE     bt_new(order,comp)
     int  order;
     int  (*comp)();

     void bt_destroy(tree,free_key)
     BTREE     tree;
     void (*free_key)();

     int bt_insert(tree,key)
     BTREE     tree;
     char *key;

     char *bt_delete(tree,key)
     BTREE     tree;
     char *key;

     char *bt_search(tree,target)
     BTREE     tree;
     char *target;

     void bt_traverse(tree,fn,parms)
     BTREE     tree;
     void (*fn)();
     char *parms;

     void bt_dump(tree,key_dump)
     BTREE     tree;
     void (*key_dump)();

DESCRIPTION
     These functions implement an in-memory  B-tree  data  struc-
     ture.   The tree itself stores only pointers to the client's
     data, and relies on client- provided functions for any  com-
     parisons and storage deallocation.

     bt_new creates a  new  B-tree.   The  client  specifies  the
     desired  order  (the maximum number of children allowed each
     node) of the tree (for performance trade-offs based  on  the
     efficiency  of  comparisons  of client-provided data), and a
     pointer to a function that compares two client-provided data
     structures.  The comparison function, comp, has an interface
     identical to strcmp(3), except that its arguments  point  to
     client-provided   data   structures.   Client-provided  data
     structures that compare greater by this function will appear
     later  in  the lexical order of the data stored in the tree.



Sun Release 4.0           Last change:                          1






BTREE()           MISC. REFERENCE MANUAL PAGES            BTREE()



     bt_new returns a pointer to a handle for the B-tree, or NULL
     if an error occurs.

     bt_destroy deallocates all storage allocated  to  a  B-tree.
     The  client provides a pointer to a visitation function that
     is called once for each item stored in the tree.  The  items
     are  visited  in arbitrary order.  If NULL is passed instead
     of a pointer to a function,  no  action  is  taken  for  the
     client-provided  data,  but  the  tree  structure  itself is
     freed.

     bt_insert inserts a new item into the tree.  1  is  returned
     if  the  insertion was successful, -1 is returned if the new
     item matches another item that  has  already  been  inserted
     into the tree, and 0 is returned in the event of an error.

     bt_delete deletes an item from a tree.  The  value  returned
     is  the  same  as that passed to bt_insert when the item was
     inserted, or NULL if the item is not found.

     bt_search searches  for  an  item  in  a  tree.   The  value
     returned  is  the  same as that passed to bt_insert when the
     item was inserted, or NULL if the item is not found.

     bt_traverse traverses the tree,  calling  a  client-provided
     visitation  function fn once for each item stored there.  fn
     has the following interface:
          void fn(item,parms)
          char *item;
          char *parms;
     item is the pointer stored when the item was  inserted  into
     the  tree.   parms  is  an arbitrary pointer that the client
     wishes to pass to the visitation function, but is  otherwise
     unused  by  the B-tree implementation.  Items are visited in
     their lexical order.

     bt_dump displays the contents of the tree to  stdout,  along
     with  some  diagnostic  and  statistical  information.   The
     key_dump function is called once for each item in the  tree,
     in arbitrary order.  It may be NULL if no action is desired.
     Its interface follows:
          void key_dump(item)
          char *item;

BUGS
     This implementation is written in K&R C, and tested only  on
     a Sun 3 running SunOS 4.0.3.

     This implementation assumes "char*" is the generic  pointer.
     This  goes against practice used by most "modern" compilers,
     but aids portability to older  compilers.   Note  also  that
     this will probably generate lots of complaints from Lint.



Sun Release 4.0           Last change:                          2






BTREE()           MISC. REFERENCE MANUAL PAGES            BTREE()



     bt_dump assumes that pointers are the same size as integers,
     and that they can be displayed in total in eight hexidecimal
     digits.




















































Sun Release 4.0           Last change:                          3


****************************************************************
******************** btree.h ***********************************
****************************************************************

/* Header file for in-memory B-tree implementation */

typedef char *BTREE;

extern	void	bt_dump();
extern	char	*bt_search();
extern	BTREE	bt_new();
extern	void	bt_destroy();
extern	int	bt_insert();
extern	void	bt_traverse();
extern	char	*bt_delete();

****************************************************************
******************** btree.h ***********************************
****************************************************************

/* #define DEBUG */
/**********************************************************************
 *
 * btree.c -- This file implements an in-memory B-tree abstract data
 *            type.  When the tree is created, the caller specifies
 *            the number of children each node may have (i.e., the
 *            tree's "order") and a pointer to a function that compares
 *            two keys.  The function has the same calling protocol as
 *            strcmp().
 *
 **********************************************************************/

#include <stdio.h>

/* The following structure describes a B-tree node. */

struct btnode {
	int		nkeys;		/* Number of keys stored here */
	char		**keys;		/* array[order-1] of keys */
	struct btnode	**children;	/* array[order] of children */
};

/* The following structure describes a B-tree. */

struct btree {
	int		order;		/* Max children permitted */
	int		(*comp)();	/* Comparison function for keys */
	struct btnode	*root;		/* Root of tree */
	int		capacity;	/* Max keys that will fit */
	int		contains;	/* Key currently stored */
};

typedef struct btree *BTREE;
typedef struct btnode BTNODE;


/* The following structure is used for debugging heap corruptions */

#ifdef DEBUG

struct heapblk {
	char	*blk;
	int	freeNext;
};

struct heap_struct {
	int		allocSeq;
	int		freeLast;
	struct heapblk	blks[10000];
};

struct heap_struct btheap = {0,-1}; 

#endif DEBUG

/*
 * The following code is used for allocating space on the heap.  When not
 * debugging heap corruptions, a macro aliases bt_malloc with malloc.
 */

#ifdef DEBUG

char *bt_malloc(size)
unsigned	size;
{
	char	*temp;

	temp = (char*) malloc(size);
	btheap.blks[btheap.allocSeq].blk = temp;
	btheap.blks[btheap.allocSeq].freeNext = -2;
	btheap.allocSeq++;
	return temp;
}

#else

#define bt_malloc(size) malloc(size)

#endif

/*
 * The following code is used for freeing memory on the heap.  If not
 * debugging heap corruptions, bt_free is aliased with free.
 */

#ifdef DEBUG

bt_free(ptr)
char	*ptr;
{
	int	i;

	printf("Freeing heap storage at %x\n",ptr);
	for (i = 0; i < btheap.allocSeq; i++)
	{
		if (ptr == btheap.blks[i].blk)
		{
			if (btheap.blks[i].freeNext == -2)
			{
				btheap.blks[i].freeNext = btheap.freeLast;
				btheap.freeLast = i;
				free(ptr);
				return;
			}
			else
			{
				printf(
				  "ERROR: freeing %x which was already freed\n",
				  ptr);
				printf("Block\n");
				i = btheap.freeLast;
				while (i > 0)
				{
					printf("%x\n",btheap.blks[i].blk);
					i = btheap.blks[i].freeNext;
				}
				fflush(stdout);
				kill(getpid(),5);
			}
		}
	}
	printf("ERROR:  Freeing %x which has not been allocated\n",ptr);
	fflush(stdout);
	kill(getpid(),5);
}

#else

#define bt_free(ptr) free(ptr)

#endif

/* The following verifies the integrity of the heap. */

int bt_malloc_verify()
{
	int	retval;
	int	once;
	int	i;

	retval = 1;
	once = 0;

#ifdef DEBUG
	for (i = 0; i < btheap.allocSeq; i++)
	{
		if (btheap.blks[i].freeNext == -2)
		{
			if (!once)
			{
				once = 1;
				printf("The following blocks are allocated:\n");
			}
			printf("%x\n",btheap.blks[i].blk);
		}
	}
	fflush(stdout);
#endif

#ifdef sun
	retval = malloc_verify();
#endif

	return retval;
}

/*********************************************************************
 *
 * dump_node -- This function displays the contents of a node and
 *              then recursively displays its children.  Note
 *              that this function assumes that a pointer to a node
 *              is the same size as an integer.
 *
 *********************************************************************/

void dump_node(node,key_dump)
BTNODE	*node;
void	(*key_dump)();
{
	int	i;

	if (node == NULL)
	{
		printf("ERROR:  null node pointer\n");
		return;
	}
	printf("%08x:  %d keys(%08x, %08x)\n",(int) node,node->nkeys,
	       node->keys,node->children);
	printf("--------\n");
	for (i = 0; i < node->nkeys; i++)
	{
		if (node->children != NULL)
		{
			printf("    %08x\n",node->children[i]);
		}
		else
		{
			printf("    %08x\n",0);
		}
		if (key_dump != NULL)
		{
			printf("        ");
			key_dump(node->keys[i]);
		}
	}
	if (node->children != NULL)
	{
		printf("    %08x\n",node->children[i]);
		for (i = 0; i <= node->nkeys; i++)
		{
			dump_node(node->children[i],key_dump);
		}
	}
	else
	{
		printf("    %08x\n",0);
	}

	fflush(stdout);
	return;
}

/*********************************************************************
 *
 * bt_dump -- This function displays the contents of a B-tree.  The
 *            caller passes in a function that displays the contents
 *            of one of the keys stored in the tree.  The calling
 *            protocol for this function is:
 *
 *                       key_dump(key)
 *                       char *key;
 *
 *            Also, if key_dump is NULL, no action is taken.  This is
 *            useful if the structure of the tree is desired without
 *            including its contents.
 *
 *********************************************************************/

void bt_dump(ptree,key_dump)
char	*ptree;
void	(*key_dump)();
{
	BTREE	tree;

	tree = (BTREE) ptree;
	printf("B-tree dump:\n\n");
	printf("order = %d\n",tree->order);
	printf("capacity = %d\n",tree->capacity);
	printf("contains %d keys\n",tree->contains);
	printf("efficiency is %d per cent\n",
	       (tree->contains*100)/tree->capacity);
	printf("handle at %08x\n",tree);
	printf("root  = %08x\n\n",(int) tree->root);
	dump_node(tree->root,key_dump);
	return;
}

/********************************************************************
 *
 * lin_search -- This function searches a node for a key.  If the key
 *               is found, its index in the node's key array is
 *               returned, along with a flag indicating that the key
 *               was found.  If the key is not found, the index of the
 *               next higher key in the key array is returned, along
 *               with a "not found" indication.
 *
 ********************************************************************/

static int lin_search(node,target,comp,eq)
BTNODE	*node;		/* Node to be searched */
char	*target;	/* Key to search for */
int	(*comp)();	/* strcmp()-like comparison function */
int	*eq;		/* 0 if not found, non-zero otherwise */
{
	int	i;
	int	res;
	int	hi;
	int	lo;

	res = -1;

	/*
	 * Perform linear search of node contains 7 keys or less,
	 * otherwise perform binary search
	 */
	if (node->nkeys <= 7)
	{
		for (i = 0; i < node->nkeys; i++)
		{
			res = (*comp)(node->keys[i],target);
			if (res >= 0) break;
		}
	}
	else
	{
		lo = 0;
		hi = node->nkeys - 1;
		while ((lo <= hi) && (res != 0))
		{
			i = (lo + hi) / 2;
			res = (*comp)(node->keys[i],target);
			if (res < 0) lo = i + 1;
			else if (res > 0) hi = i - 1;
		}
		if (res < 0) i++;
	}
	*eq = (res == 0);	/* Indicate successful search */
	return i;
}

/***********************************************************************
 *
 * node_search -- This function performs a recursive-descent search of
 *                a sub-tree for a key.  The key is returned if found,
 *                or NULL otherwise.
 *
 ***********************************************************************/

static char *node_search(node,target,comp)
BTNODE	*node;
char	*target;
int	(*comp)();
{
	int	res;
	int	i;
	int	eq;

	if (node == NULL) return NULL;
	i = lin_search(node,target,comp,&eq);
	if (eq) return node->keys[i];
	else if (node->children == NULL) return NULL;
	else return node_search(node->children[i],target,comp);
}

/***********************************************************************
 *
 * bt_search -- This function searches a tree for a specified key.  The
 *              key is returned if found, or NULL if not.
 *
 ***********************************************************************/

char *bt_search(tree,target)
BTREE	tree;
char	*target;
{
	if (tree == NULL) return NULL;
	return node_search(tree->root,target,tree->comp);
}

/***********************************************************************
 *
 * bt_new -- Given the number of children each node is allowed to have,
 *           and a pointer to a strcmp-like function that compares two
 *           keys, this function creates an empty b-tree structure.
 *
 ***********************************************************************/

char	*bt_new(order,comp)
int	order;
int	(*comp)();
{
	BTREE	tree;
	int	i;

	/* Validate parameters */
	if ((comp == NULL) || (order < 3)) return (char *) NULL;

	/* Allocate the handle */
	tree = (BTREE) bt_malloc(sizeof(struct btree));
	if (tree != NULL)
	{
		/* Allocate the root */
		tree->root = (BTNODE*) bt_malloc(sizeof(struct btnode));
		if (tree->root != NULL)
		{
			/* Initialize root */
			tree->order = order;
			tree->comp = comp;
			tree->root->nkeys = 0;
			tree->root->children = NULL;

			/* Allocate key array */
			tree->root->keys =
				(char**)bt_malloc((order - 1) * sizeof(char*));
			if (tree->root->keys != NULL)
			{
				/* Initialize keys */
				for (i = 0; i < order - 1; i++)
				{
					tree->root->keys[i] = NULL;
				}
				tree->root->nkeys = 0;
				tree->contains = 0;
				tree->capacity = order - 1;
				return (char *) tree;
			}

			/* Failed to allocate keys, free tree */
			bt_free(tree->root);
		}
		/* Failed to allocate root, free handle */
		bt_free(tree);
	}
	/* Failed to allocate handle, return null (error) */
	return (char *) NULL;
}

/********************************************************************
 *
 * free_node -- This function frees a node in a b-tree.  It is
 *              provided with a function that frees each key in the
 *              node also.  The free_key function has the following
 *              protocol:
 *
 *			free_key(key)
 *			char *key;
 *
 *              If NULL is passed as the free_key function, the node
 *              is freed, but no action is taken for the keys.
 *
 ********************************************************************/

static void free_node(node,free_key)
BTNODE	*node;
void	(*free_key)();
{
	int	i;

	if (node->children != NULL)
	{
		for (i = 0; i <= node->nkeys; i++)
		{
			free_node(node->children[i],free_key);
		}
		bt_free(node->children);
	}
	if (free_key != NULL)
	{
		for (i = 0; i < node->nkeys; i++)
		{
			(*free_key)(node->keys[i]);
		}
	}
	bt_free(node->keys);
	bt_free(node);
	return;
}

/********************************************************************
 *
 * bt_destroy -- This function frees a b-tree structure.  It is also
 *            passed a function that frees the keys contained by the
 *            tree.  The free_key function has the same calling
 *            protocol as the free_key function passed to free_node
 *            above.
 *
 ********************************************************************/

void bt_destroy(ptree,free_key)
char	*ptree;
void	(*free_key)();
{
	BTREE	tree;

	tree = (BTREE) ptree;
	free_node(tree->root,free_key);
	bt_free(tree);
	return;
}

/******************************************************************
 *
 * rot_right -- Given a node and an index into its key array, this
 *              function rotates the node right at the specified
 *              key.  This is used during insertion and deletion to
 *              keep the tree balanced.  The return value is 0 if
 *              the function fails, i.e. the node on the left
 *              cannot lose keys, the node on the right cannot
 *              gain keys, or the specified node is a leaf; 1 is
 *              returned if successful.
 *
 ******************************************************************/

static int rot_right(node,n,order)
BTNODE	*node;
int	n;
int	order;
{
	int	i;
	BTNODE	*right;
	BTNODE	*left;

	/* Test parameters */
	if ((n >= node->nkeys) || (n < 0))
	{
		return 0;
	}

	/* Point to children */
	left = node->children[n];
	right = node->children[n+1];

	/* Return FALSE if rotation is not possible */
	if ((left == NULL) ||
	    (left->nkeys <= order/2) ||
	    (right->nkeys >= order - 1))
	{
		return 0;
	}

	/* Shift the right child's keys */
	for (i = right->nkeys; i > 0; i--)
	{
		right->keys[i] = right->keys[i - 1];
	}

	/* Rotate keys */
	right->keys[0] = node->keys[n];
	node->keys[n] = left->keys[left->nkeys - 1];
	left->keys[left->nkeys - 1] = NULL;

	if (right->children != NULL)
	{
		/* Shift the right child's children */
		for (i = right->nkeys; i >= 0; i--)
		{
			right->children[i + 1] = right->children[i];
		}

		/* Rotate children */
		right->children[0] = left->children[left->nkeys];
		left->children[left->nkeys] = NULL;
	}

	/* Update key count and return TRUE */
	right->nkeys++;
	left->nkeys--;
	return 1;
}

/******************************************************************
 *
 * rot_left -- Given a node and an index into its key array, this
 *             function rotates the node left at the specified
 *             key.  This is used during insertion and deletion to
 *             keep the tree balanced.  The return value is 0 if
 *             the function fails, i.e. the node on the right
 *             cannot lose keys, the node on the left cannot
 *             gain keys, or the specified node is a leaf; 1 is
 *             returned when successful.
 *
 ******************************************************************/

static int rot_left(node,n,order)
BTNODE	*node;
int	n;
int	order;
{
	int	i;
	BTNODE	*right;
	BTNODE	*left;

	/* Test parameters */
	if ((n < 0) || (n >= node->nkeys))
	{
		return 0;
	}

	/* Point to children */
	right = node->children[n+1];
	left = node->children[n];

	/* Return FALSE if rotation is not possible */
	if ((left == NULL) ||
	    (right->nkeys <= order/2) ||
	    (left->nkeys >= order - 1))
	{
		return 0;
	}

	/* Rotate keys */
	left->keys[left->nkeys] = node->keys[n];
	node->keys[n] = right->keys[0];

	/* Update key counts */
	left->nkeys ++;
	right->nkeys --;

	/* Shift the right child's keys */
	for (i = 0; i < right->nkeys; i++)
	{
		right->keys[i] = right->keys[i+1];
	}
	right->keys[i] = NULL;

	if (left->children != NULL)
	{
		/* Rotate children */
		left->children[left->nkeys] = right->children[0];

		/* Shift the right child's children */
		for (i = 0; i <= right->nkeys; i++)
		{
			right->children[i] = right->children[i+1];
		}
		right->children[i+1] = NULL;
	}

	/* Return TRUE */
	return 1;
}

/*********************************************************************
 *
 * ins_key -- This function inserts a key into a specified node at
 *            a specified location in its key array.  0 is returned
 *            if the node is already full, 1 otherwise.
 *
 *********************************************************************/

static int ins_key(node,key,n,order)
BTNODE	*node;
char	*key;
int	n;
int	order;
{
	int	i;

	/* Return FALSE if insertion is not possible */
	if (node->nkeys >= order - 1)
	{
		return 0;
	}

	/* Shift keys to make room, then insert new key */
	for (i = node->nkeys - 1; i >= n; i--)
	{
		node->keys[i+1] = node->keys[i];
	}
	node->keys[n] = key;
	node->nkeys++;
	return 1;
}

/*********************************************************************
 *
 * burst -- Given a node and an index into its key and child arrays,
 *          this function splits the specified child node in half,
 *          elevating the key at the child's split point into this
 *          specified node.  0 is returned if the child array index
 *          is out of range, or the node is full, or malloc fails;
 *          1 is returned otherwise.
 *
 *********************************************************************/

static int burst(tree,node,n)
BTREE	tree;
BTNODE	*node;
int	n;
{
	BTNODE	*new;
	int	i;
	int	m;
	BTNODE	*left;
	int	lkeys;

	/* Test parameters */
	if (((node->nkeys > 0) && (node->nkeys >= tree->order - 1)) ||
	    (n < 0))
	{
		return 0;
	}
	/* Create a new node */
	new = (BTNODE*) bt_malloc(sizeof(BTNODE));
	if (new != NULL)
	{
		/* Allocate new key array */
		new->keys = (char**) bt_malloc(tree->order*sizeof(char*));
		if (new->keys != NULL)
		{
			/* Calculate needed partial results */
			m = tree->order / 2;
			left = node->children[n];
			lkeys = left->nkeys;

			/* Allocate new child array */
			if (left->children != NULL)
			{
				new->children =
				       (BTNODE**) bt_malloc((tree->order + 1) * 
				                            sizeof(BTNODE*));
				if (new->children != NULL)
				{
					for (i = 0; i <= lkeys - m; i++)
					{
						new->children[i] =
						         left->children[m + i];
					}
				}
				else
				{
					bt_free(new->keys);
					bt_free(new);
					return 0;
				}
			}
			else
			{
				new->children = NULL;
			}

			/* Split keys */
			for (i = 0; i < lkeys - m; i++)
			{
				new->keys[i] = left->keys[m + i];
			}
			new->nkeys = lkeys - m;
			left->nkeys = m - 1;

			/* Elevate the key where the split was made */
			for (i = node->nkeys; i > n; i--)
			{
				node->keys[i] = node->keys[i-1];
				node->children[i+1] = node->children[i];
			}
			node->children[n+1] = new;
			node->keys[n] = left->keys[m - 1];
			node->nkeys++;
			tree->capacity += tree->order - 1;
			return 1;
		}
		/* Failed to allocate new child array, free node */
		bt_free(new);
	}
	/* Failed to allocate new node, return FALSE */
	return 0;
}

/*********************************************************************
 *
 * rebalance -- This function tries to rebalance the two adjacent
 *              nodes.  This is needed when an insertion is attempted
 *              into a node that is already full but its neighbors are
 *              not, or when a deletion is made and the node is less
 *              than half-full.
 *
 *********************************************************************/

static rebalance(node,n,order)
BTNODE	*node;
int	n;
int	order;
{
	int	res;

	if (node->children[n]->nkeys < (order-1)/2)
	{
		res = rot_left(node,n,order);
	}
	else
	if ((n < node->nkeys) && (node->children[n+1]->nkeys < (order-1)/2))
	{
		res = rot_right(node,n,order);
	}
	return;
}

/*********************************************************************
 *
 * ins_node -- This function performs a recursive-descent of a b-tree,
 *             inserting a new key in a leaf node.  If the leaf is
 *             full, the tree is reorganized to make room.  -1 is
 *             returned if the insert fails due to a duplicate key.
 *             0 is returned if the insert fails for some other
 *             reason.  1 is returned if successful.
 *
 *********************************************************************/

static int ins_node(tree,node,key)
BTREE	tree;
BTNODE	*node;
char	*key;
{
	int	i;
	int	res;

	/* Locate proper place for new key in node */
	i = lin_search(node,key,tree->comp,&res);
	if (res) return -1;

	/* If no children, insert key in this node */
	if (node->children == NULL)
	{
		return ins_key(node,key,i,tree->order);
	}

	/* Try to insert the new key in a child node */
	res = ins_node(tree,node->children[i],key);

	/* Child is full, reorganize */
	if (res == 0)
	{
		/* Try rotating right */
		if ((res == 0) && (i < node->nkeys) &&
		    (node->children[i+1]->nkeys < tree->order - 2))
		{
			res = rot_right(node,i,tree->order);
		}

		/* Try rotating left */
		if ((res == 0) && (i > 0) &&
		    (node->children[i-1]->nkeys < tree->order - 2))
		{
			res = rot_left(node,i - 1,tree->order);
		}

		/* Can't rotate, try bursting */
		if (res == 0) res = burst(tree,node,i);
		if (res > 0)
		{
			res = ins_node(tree,node,key);
			if (res > 0)
			{
				rebalance(node,i,tree->order);
			}
		}
	}
	return res;
}

/*********************************************************************
 *
 * bt_insert -- This function inserts a new key into a b-tree.  -1 is
 *             returned if the insert fails due to a duplicate key.
 *             0 is returned if the insert fails for some other
 *             reason.  1 is returned if successful.
 *
 *********************************************************************/

int bt_insert(ptree,key)
char	*ptree;
char	*key;
{
	int	res;
	BTNODE	*new;
	BTNODE	*node;
	BTREE	tree;

	tree = (BTREE) ptree;

	/* Begin at root */
	node = tree->root;

	/* If root is empty, insert first key */
	if (node->nkeys == 0)
	{
		node->keys[0] = key;
		node->nkeys = 1;
		tree->contains = 1;
		return 1;
	}

	/* Try inserting new key */
	res = ins_node(tree,node,key);

	/* If 0 return, burst the root, rebalance, and try again */
	if (res == 0)
	{
		new = (BTNODE*) bt_malloc(sizeof(BTNODE));
		if (new != NULL)
		{
			new->children = (BTNODE**) bt_malloc((tree->order + 1) *
			                                  sizeof(BTNODE*));
			if (new->children != NULL)
			{
				new->keys = (char**) bt_malloc((tree->order) *
				                                sizeof(char*));
				if (new->keys != NULL)
				{
					new->nkeys = 0;
					new->children[0] = node;
					tree->root = new;
					tree->capacity += tree->order - 1;
					res = burst(tree,new,0);
					if (res > 0)
					{
						res = ins_node(tree,new,key);
					}
					if (res > 0)
					{
						tree->contains++;
						rebalance(new,0,tree->order);
					}
					return res;
				}
				bt_free(new->children);
			}
			bt_free(new);
		}
	}
	if (res > 0) tree->contains++;
	return res;
}

/****************************************************************
 *
 * visit_node -- This function is used while traversing a b-tree.
 *               It is called once for each node.  It calls some
 *               specified function once for each key in its
 *               key array, and calls itself once for each child
 *               in its child array.
 *
 ****************************************************************/

static void visit_node(node,fn,parms)
BTNODE	*node;
void	(*fn)();
char	*parms;
{
	int	i;

	if (node == NULL) return;
	for (i = 0; i < node->nkeys; i++)
	{
		if (node->children != NULL)
		{
			visit_node(node->children[i],fn,parms);
		}
		(*fn)(node->keys[i],parms);
	}
	if (node->children != NULL)
	{
		visit_node(node->children[i],fn,parms);
	}
	return;
}

/****************************************************************
 *
 * bt_traverse -- This function traverses a b-tree, calling some
 *                specified function once for each key stored in
 *                it.  The specified function has the following
 *                protocol:
 *
 *			fn(key,parms)
 *			char *key;
 *			char *parms;
 *
 *                where key is a pointer to a key stored in the
 *                btree, and parms is the parameter structure
 *                passed by the caller.
 *
 *                The nodes are visited in descending order, if
 *                the comp function passed to bt_new is
 *                strcmp-like.
 *
 ****************************************************************/

void bt_traverse(ptree,fn,parms)
char	*ptree;
void	(*fn)();
char	*parms;
{
	BTREE	tree;

	tree = (BTREE) ptree;
	visit_node(tree->root,fn,parms);
	return;
}

/********************************************************************
 *
 * del_key -- This function deletes the specified key from the
 *            specified node.
 *
 ********************************************************************/

static del_key(node,n)
BTNODE	*node;
int	n;
{
	int	i;

	for (i = n; i < node->nkeys - 1; i++)
	{
		node->keys[i] = node->keys[i+1];
	}
	if (node->children != NULL)
	{
		for (i = n; i < node->nkeys; i++)
		{
			node->children[i] = node->children[i+1];
		}
	}
	node->nkeys--;
	return;
}

/********************************************************************
 *
 * weld -- This function is the opposite of burst, above.  It joins
 *         two neighboring children of the given node, and lowers a
 *         key into the result.
 *
 ********************************************************************/

static weld(tree,node,n)
BTREE	tree;
BTNODE	*node;
int	n;
{
	int	i;
	int	j;
	BTNODE	*left;
	BTNODE	*right;

	if (node->children == NULL) return;
#ifdef DEBUG
printf("Welding node at %x, key %d\n",(int)node,n);
#endif
	left = node->children[n];
	right = node->children[n+1];
	i = left->nkeys;
	left->keys[i] = node->keys[n];
	for (i++, j = 0; j < right->nkeys; i++, j++)
	{
		left->keys[i] = right->keys[j];
	}
	for (i = n + 1; i <= node->nkeys; i++)
	{
		node->keys[i-1] = node->keys[i];
		node->children[i] = node->children[i+1];
	}
	node->nkeys--;
	if (left->children != NULL)
	{
		for (i = left->nkeys + 1, j = 0; j <= right->nkeys; i++, j++)
		{
			left->children[i] = right->children[j];
		}
		bt_free(right->children);
	}
	left->nkeys += right->nkeys + 1;
	bt_free(right->keys);
	bt_free(right);
	tree->capacity -= tree->order - 1;
	return;
}

/********************************************************************
 *
 * del_pred -- This function searches a sub-tree, and deletes the
 *             highest key it finds, and returns it to its caller.
 *             If any node along the way  loses its b-tree'ness, the
 *             tree is reorganized after the deletion.
 *
 ********************************************************************/

static char *del_pred(tree,node)
BTREE	tree;
BTNODE	*node;
{
	BTNODE	*pnode;
	int	res;
	char	*retval;

	/*
	 * If at a leaf node, delete the highest key, otherwise
	 * call self recursively
	 */
	if (node->children == NULL)
	{
		retval = node->keys[node->nkeys - 1];
		node->nkeys--;
	}
	else
	{
		pnode = node->children[node->nkeys];
		retval = del_pred(tree,pnode);

		/* Reorganize tree if node gets too empty */
		if (pnode->nkeys < (tree->order - 1) / 2)
		{
			res = rot_right(node,node->nkeys - 1,tree->order);
			if (res == 0) weld(tree,node,node->nkeys - 1);
		}
	}
	return retval;
}

/********************************************************************
 *
 * del_node -- This function searches a sub-tree for a key, and
 *             deletes the key.  If the key is found in a leaf node,
 *             it is simple removed and returned to the caller;
 *             otherwise, it is swapped with its predecessor, and
 *             the returned to the caller.  If any node loses its
 *             b-tree'ness, the tree is reorganized.  If the key is
 *             not found, NULL is returned.
 *
 ********************************************************************/

static char *del_node(tree,node,key)
BTREE	tree;
BTNODE	*node;
char	*key;
{
	int	i;
	int	res;
	char	*retval;

	/* Key not found */
	if (node == NULL) return NULL;

	/* Search for key, descend tree until found or find leaf */
	i = lin_search(node,key,tree->comp,&res);
	if (res == 0)
	{
		if (node->children == NULL) return NULL;
		retval = del_node(tree,node->children[i],key);
	}
	else
	{
		/* Return the deleted node to caller */
		retval = node->keys[i];

		if (node->children == NULL)
		{
			/* Delete from leaf */
			del_key(node,i);
		}
		else
		{
			/* Swap with predecessor */
			node->keys[i] = del_pred(tree,node->children[i]);
		}
	}

	/* If a child node was reorganized, rebalancing may be necessary */
	if ((node->children != NULL) &&
	    (node->children[i]->nkeys < (tree->order - 1) / 2))
	{
		res = rot_right(node,i - 1,tree->order);
		if (res == 0) res = rot_left(node,i,tree->order);
		if (res == 0)
		{
			if (i >= node->nkeys) i--;
			weld(tree,node,i);
		}
	}
	return retval;
}

/********************************************************************
 *
 * bt_delete -- This function deletes the specified key from a
 *              b-tree and reorganizes it as necessary.  The deleted
 *              key is returned to the caller if it is found, NULL
 *              is returned otherwise.
 *
 ********************************************************************/

char	*bt_delete(ptree,key)
char	*ptree;
char	*key;
{
	BTNODE	*root;
	char	*retval;
	BTREE	tree;

	tree = (BTREE) ptree;

	/* Do no deletion if tree is empty */
	root = tree->root;
	if (root->nkeys == 0) return NULL;

	/* Delete the key */
	retval = del_node(tree,root,key);

	/*
	 * Delete the root if it is empty, yet its children are not.
	 * This happens after a weld on the last key in the root.
	 */
	if ((root->nkeys == 0) && (root->children != NULL))
	{
		tree->root = root->children[0];
		bt_free(root->keys);
		bt_free(root->children);
		bt_free(root);
		tree->capacity -= tree->order - 1;
	}
	if (retval != NULL) tree->contains--;
	return retval;
}

*****************************************************************
****************** test.c ***************************************
*****************************************************************

#include <stdio.h>
#include "btree.h"

/* #define DEBUGGING */

#define	LOOPS	500
#define	ORDER	12

typedef struct key {
	int	key;
} KEY;

KEY	savedKeys[LOOPS];

dump(key)
KEY	*key;
{
	printf("%x: %d\n",key,key->key);
}

visit(key,parms)
KEY	*key;
char	*parms;
{
	dump(key);
}

comp(k1,k2)
KEY	*k1,*k2;
{
	return (k1->key - k2->key);
}

main()
{
	BTREE	tree;
	KEY	*key;
	int	i;
	int	res;

#ifdef DEBUGGING
#ifdef sun
	malloc_debug(2);
#endif
#endif
	tree = (BTREE)bt_new(ORDER,comp);
	if (tree == NULL)
	{
		printf("Failed to create tree\n");
		exit(1);
	}
	(void) bt_malloc_verify();

	for (i = 0; i < LOOPS; i++)
	{
		printf("Iteration %d\n",i+1);
		key = &savedKeys[i];
		key->key = rand();
		printf("Inserting %d\n",key->key);
		fflush(stdout);
		res = bt_insert(tree,key);
		if (res > 0)
		{
			printf("Inserted %d\n",key->key);
		}
		else if (res < 0)
		{
			printf("Duplicate key %d\n",key->key);
		}
		else
		{
			printf("Error inserting key %d\n");
			exit(3);
		}
		printf("Trying duplicate\n");
		res = bt_insert(tree,key);
		if (res < 0)
		{
			printf("Failed to insert duplicate\n");
		}
		else if (res == 0)
		{
			printf("Error inserting duplicate\n");
			exit(4);
		}
		else
		{
			printf("Succeeded in inserting duplicate\n");
			exit(5);
		}
#if 0
#ifdef DEBUGGING
		printf("Contents of tree:\n");
		bt_dump(tree,dump);
#endif
#endif 0
	}
	printf("Finished creating the tree.\n");
	printf("Contents of tree follow:\n");
	bt_traverse(tree,visit,NULL);
	for (i = 0; i < LOOPS; i++)
	{
		printf("Iteration %d\n",i);
		printf("Seeking %d at %x\n",savedKeys[i].key,&savedKeys[i]);
		key = (KEY*)bt_search(tree,&savedKeys[i]);
		if (key != &savedKeys[i])
		{
			printf("Can't find %d, returned %x\n",savedKeys[i].key,
			       key);
		}
		printf("Freeing %d at %x\n",savedKeys[i].key,&savedKeys[i]);
		fflush(stdout);
		key = (KEY*)bt_delete(tree,&savedKeys[i]);
		if (key != &savedKeys[i])
		{
			printf("Problem freeing %d at %x, returned %x\n",
			       savedKeys[i].key,&savedKeys[i].key,key);
		}
		else
		{
			printf("Freed %d, returned %x\n",savedKeys[i].key,key);
		}
#ifdef DEBUGGING
	printf("Contents of tree:\n");
	bt_dump(tree,dump);
	fflush(stdout);
#endif 
		key = (KEY*)bt_search(tree,&savedKeys[i]);
		if (key == &savedKeys[i])
		{
			printf("Found %d after deletion\n",savedKeys[i].key);
		}
		key = (KEY*)bt_delete(tree,&savedKeys[i]);
		if (key != NULL)
		{
			printf("%d remained in tree after deletion at %x\n",
			       savedKeys[i].key,key);
		}
#if 0
#ifdef DEBUGGING
		printf("Contents of tree:\n");
		bt_dump(tree,dump);
#endif
#endif 0
	}
	(void) bt_malloc_verify();
	printf("Freeing tree\n");
	bt_destroy(tree,NULL);
	printf("Finished freeing tree\n");
	printf("Malloc verify:  ");
	if (bt_malloc_verify())
	{
		printf("success\n");
		fflush(stdout);
	}
	else
	{
		printf("failure\n");
		fflush(stdout);
		exit(6);
	}

	printf("Creating second tree\n");
	tree = (BTREE)bt_new(ORDER,comp);
	if (tree == NULL)
	{
		printf("Failed to create tree\n");
		exit(7);
	}
	(void) bt_malloc_verify();
	printf("Contents of tree:\n");
	bt_dump(tree,dump);

	for (i = 0; i < LOOPS; i++)
	{
		printf("Iteration %d\n",i+1);
		key = &savedKeys[i];
		key->key = rand();
		printf("Inserting %d\n",key->key);
		fflush(stdout);
		res = bt_insert(tree,key);
		if (res > 0)
		{
			printf("Inserted %d\n",key->key);
		}
		else if (res < 0)
		{
			printf("Duplicate key %d\n",key->key);
		}
		else
		{
			printf("Error inserting key %d\n");
			exit(8);
		}
		printf("Trying duplicate\n");
		res = bt_insert(tree,key);
		if (res < 0)
		{
			printf("Failed to insert duplicate\n");
		}
		else if (res == 0)
		{
			printf("Error inserting duplicate\n");
			exit(9);
		}
		else
		{
			printf("Succeeded in inserting duplicate\n");
			exit(10);
		}
#if 0
#ifdef DEBUGGING
		printf("Contents of tree:\n");
		bt_dump(tree,dump);
#endif
#endif 0
	}
	printf("Finished creating the tree.\n");
	printf("Contents of tree follow:\n");
	bt_traverse(tree,visit,NULL);
	printf("Freeing tree\n");
	bt_destroy(tree,NULL);
	printf("Finished freeing tree\n");
	printf("Malloc verify:  ");
	if (bt_malloc_verify()) printf("success\n");
	else printf("failure\n");
	fflush(stdout);

	exit(0);
}
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"

ogden@seal.cis.ohio-state.edu (William F Ogden) (07/26/90)

Paul Sanders writes:
>There has been some discussion in comp.software-eng lately regarding
>reusable software components.   ...

>                                                  ...    I would like
>to spur some discussion on just what a reusable software component is.
>Following is (at the risk of doing some undergraduate's first-quarter Data
>Structures homework) an implementation of an in-memory B-tree.  What I'd
>like to discuss are:
>
>- Is this a reusable software component?
>- What are its characteristics that make it one?  Or don't make it one?
>- Is it a good one?  If so, why?  If not, why?
>- Would anyone use it?  Why or why not?
>- How can it be made better?
>- Anything else relevant to producing a good reusable component that is reused

At the risk of getting back into the discussion of why software development
isn't like the construction of other engineering artifacts, let me suggest
that, from a reusability perspective, a B-tree facility is rather like
a circuit board in a computer. You certainly can't argue that it's not an
electronic component, but it's not a very likely candidate for reuse in
the next computer you design. 

>- What are its characteristics that ...               don't make it one?

First, the conceptual interface that it presents to its clients is not
simple. Clients don't really want to fool around restructuring trees or
reclaiming storage; they just want a fast storage and retrieval facility.
Here they are forced to deal with details that shouldn't concern them
[poor information hiding].

Second, the client interface is inadequately specified. 
Descriptions of the various B-tree operations are supplied only in English, 
which is inherently either verbose or imprecise. 
Global constraints about keeping the tree height uniform, the branchyness from
dropping too low, the indices in left-to-right order, etc. are not spelled out.
Performance characteristics are not indicated. 
The type of the items stored in the B-tree is kept commendably general,
but no mention is made of the fact that correct operation depends on the
comparison procedure testing a <= relation which must be a TOTAL ordering
of the items. 
Etc.

>- How can it be made better?
>- Anything else relevant to producing a good reusable component that is reused

Just as with a circuit board, you may try to improve it, but it shouldn't
be done with an eye to creating a reusable component per se. The circuit
board contains reusable components, and it is part of a larger reusable
component. That is also the situation with the B-tree facility.

The reusable component upon which the B-tree facility should have been based
is the exploration-tree facility. It is a general purpose facility which
provides clients with trees and operations for navigating around in them.
In contrast to the B-tree, it can be used as a component in a wide variety
of situations.

The reusable component which the B-tree facility supports is the
almost-constant-function facility. Conceptually, it provides a
function F from an arbitrary totally ordered domain set to an arbitrary
range set. For all but a few values, F(x) = C, hence the name. A typical
application would be sparse matrices, where the domain would be pairs
of integers (lexicogphically ordered), the range would be floating point
numbers, and C = 0.0. Another might be symbol tables where the domain
would be character strings, the range would be say type names, and
C = Undefined_Type. Data bases, polynomials, etc. provide additional
applications. The <x,y> pairs where F(x) = y # C can be stored in a
B-tree. Note, however, that a wide variety of other implementations
are posible (e.g. AVL tree, linked list, hashing, etc.). Analogously,
a computer architecture can be realized using a variety of chips and
board layouts.

The exploration-tree and the almost-constant-function present clients
with simple interfaces which have clean specifications. That's
surely a necessary feature of a reusable component. 
Another important property is that, once you discover them, you find
that they could be used in a wide variety of current applications.
It's not so clear that you can design reusable components which are
only useful in future software.
/Bill

ath@prosys.se (Anders Thulin) (07/27/90)

In article <27705@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes:

> [ ... deleted ... ]
>- How can it be made better?

By providing:

* external storage capability

  These should be some way to store the B-tree on secondary memory, so
  that the data it contains could be reused in another program.

* browser

  That is, providing a `bt_user_search(tree)' which presents the data
  in the tree, and lets the user select an item for further computation.

This is something I think most components that implement data structures
should have. At least those implementing `complicated' structures.

-- 
Anders Thulin       ath@prosys.se   {uunet,mcsun}!sunic!prosys!ath
Telesoft Europe AB, Teknikringen 2B, S-583 30 Linkoping, Sweden

paul@athertn.Atherton.COM (Paul Sander) (07/28/90)

In article <82454@tut.cis.ohio-state.edu> William F Ogden <ogden@cis.ohio-state.edu> writes:
>Paul Sander [that's me] writes:
[Introductory comments and questions omitted]
>At the risk of getting back into the discussion of why software development
>isn't like the construction of other engineering artifacts, let me suggest
>that, from a reusability perspective, a B-tree facility is rather like
>a circuit board in a computer. You certainly can't argue that it's not an
>electronic component, but it's not a very likely candidate for reuse in
>the next computer you design. 

I prefer to think of code of this sort as being something like integrated
circuits, rather than circuit boards.  They are more abstract in function
than the individual lines of code (as a gate or VLSI circuit is more abstract
in function than the active and passive components that are etched into it),
but less abstract in function than a subsystem or product (as an IC is less
abstract in function than a circuit board or product).  Simple data structures
can be built into more complicated ones in the same way that standard cells
are built into ASICs (Application Specific ICs) with the addition of a certain
amount of "glue" logic.  [Just one man's opinion here]

In any case, when designed well, chips, boards, and (hopefully) software
can be reused in new designs with little or no modification.  The flexibility
of the design and quality of implementation will govern its suitability for
new applications.  It is very common for ICs to be reused in new designs;
it is less common, but certainly not unusual to find circuit boards used in
new product designs.  A data structure that sorts things will likely be used
the next time someone needs to sort things, if they don't have make
substantial changes.

[Regarding my B-tree implementation:]
>>- What are its characteristics that ...               don't make it one?
>
>First, the conceptual interface that it presents to its clients is not
>simple. Clients don't really want to fool around restructuring trees or
>reclaiming storage; they just want a fast storage and retrieval facility.
>Here they are forced to deal with details that shouldn't concern them
>[poor information hiding].

But the client is spared the details of restructuring trees; the tree
collects its own garbage.  The client is forced to manage its own storage,
because the tree can't do an effective job of it anyway; the tree has no
clue as to what kind of stuff the client is storing there.  If it's a bunch
of linked structures, the only way the tree can manage it is to include some
sort of compiler with which a client describes its data, and how to manage
it.  This seems too complicated and error prone.  If the client needs to
store such complex data structures in a tree, then another software component
that maintains those data structures can be used, and objects it creates can
then be stored in the tree.  If both components do a good job managing their
storage, and the client is good about avoiding aliasing problems, then
the storage manages itself in the eyes of the client.

As for the information hiding issue, this was the best I could do with C,
since C doesn't have the data abstraction tools that, say, Modula-2 and Ada
have.  How can it be improved within C's constraints?

>Second, the client interface is inadequately specified. 
>Descriptions of the various B-tree operations are supplied only in English, 
>which is inherently either verbose or imprecise. 

Okay, there is a documentation problem.  How can it be improved in a way
that a large number of potential clients can understand?  I know of no good
way to describe a data structure mathematically, and even if I could do it,
few would understand it.

>Global constraints about keeping the tree height uniform, the branchyness from
>dropping too low, the indices in left-to-right order, etc. are not spelled out.

Why does the client need to know these?  All it sees are the interfaces
documented in the man page.  It doesn't even need to know that this data
structure that provides an indexed retrieval system that sorts data is
really a B-tree.  That information is provided to the client as a motivation
for providing a particular parameter when the data structure is created,
and provides a hook where the client might make performance trade-offs.

>Performance characteristics are not indicated. 

The performance characteristics are listed below, and have been added to
the man page.  Here, "m" is the order of the tree, "n" is the number of
items stored in it, and "M" is either O(log m) if n > 7 or O(n) if n <= 7
(this is the point where a binary search begins to outperform a linear
search).

Retrieval:    O(M * log n)
Creation:     O(1)
Destruction:  O(log m) + O(n)
Insertion:    O(m * log n)
Deletion:     O(m * log n)
Traversal:    O(n)

I couldn't measure the performance in any meaningful way.  Performing some
sort of benchmark on this implementation is of only limited value, because
it specifies only how the tree performed on a particular type of machine
with a particular load with a particular set of data having particular
characteristics.

>The type of the items stored in the B-tree is kept commendably general,
>but no mention is made of the fact that correct operation depends on the
>comparison procedure testing a <= relation which must be a TOTAL ordering
>of the items. 

All that the comparison procedure need guarantee is that, given a domain
of input, the output be the same every time the same two items are
compared.  I had thought that the semantics of "strcmp" were defined that
way when I made reference to it in the B-tree man page.  In any case, this
is another simple documentation problem.

>Etc.

What else?

>>- How can it be made better?
>>- Anything else relevant to producing a good reusable component that is reused
>
>Just as with a circuit board, you may try to improve it, but it shouldn't
>be done with an eye to creating a reusable component per se. The circuit
>board contains reusable components, and it is part of a larger reusable
>component. That is also the situation with the B-tree facility.

I don't understand this.  The intent was to build a piece of software that
could be used in a variety of ways to build bigger pieces of software.  It
was in fact created with an eye to creating a reusable component.  Is there
anything to prevent one from building a nested scope symbol table, using
this as the index?  Is there anything to prevent one from using this to
build a temporary index for an external file?

I guess there are several interpretations of what it means to say "improve
it."  It could mean "how can it be made to have more of the qualities that
make it a reusable software component?" or "how can it be made more bug
free?" or "how can it be made faster, without introducing more bugs?" or
"how can its restrictions be reduced?".  My question about "how can it
be improved" in my original posting was really intended to be the first
of these interpretations.  I expect the second and third are ignored in
practice because "it works, why fix it?".  The fourth is usually addressed
in a rewrite that is in turn reusable and may have a different set of
applications that it addresses better than the first.  That doesn't
mean that the earlier implementation is more or less useful (or reusable)
than the second, just that the second is more appropriate for some things.

>The reusable component upon which the B-tree facility should have been based
>is the exploration-tree facility. It is a general purpose facility which
>provides clients with trees and operations for navigating around in them.
>In contrast to the B-tree, it can be used as a component in a wide variety
>of situations.

This is the first I've heard about exploration-tree facilities.  Without
knowing more about it, I can't ask specific questions.  Could you email to
me some additional information and references, please?

>The reusable component which the B-tree facility supports is the
>almost-constant-function facility. Conceptually, it provides a
>function F from an arbitrary totally ordered domain set to an arbitrary
>range set. For all but a few values, F(x) = C, hence the name. A typical
>application would be sparse matrices, where the domain would be pairs
>of integers (lexicogphically ordered), the range would be floating point
>numbers, and C = 0.0. Another might be symbol tables where the domain
>would be character strings, the range would be say type names, and
>C = Undefined_Type. Data bases, polynomials, etc. provide additional
>applications. The <x,y> pairs where F(x) = y # C can be stored in a
>B-tree. Note, however, that a wide variety of other implementations
>are posible (e.g. AVL tree, linked list, hashing, etc.). Analogously,
>a computer architecture can be realized using a variety of chips and
>board layouts.
>
>The exploration-tree and the almost-constant-function present clients
>with simple interfaces which have clean specifications. That's
>surely a necessary feature of a reusable component. 
>Another important property is that, once you discover them, you find
>that they could be used in a wide variety of current applications.
>It's not so clear that you can design reusable components which are
>only useful in future software.

This coincides with my vision of what a reusable component is.  The
B-tree implementation presented is an attempt to provide a data structure
with a simple interface with clean specifications.  Granted, the specification
may not be as precise as some would like, it does present enough information
to make a decision as to whether or not it is adequate for a specific
task, and how to use the interfaces.

Besides the documentation issues mentioned above, how can this be made to
exhibit more of the qualities one associates with reusable software
components?
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"

paul@athertn.Atherton.COM (Paul Sander) (07/28/90)

In article <533@helios.prosys.se> ath@prosys.se (Anders Thulin) writes:
>In article <27705@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes:
>
>> [ ... deleted ... ]
>>- How can it be made better?
>
>By providing:
>
>* external storage capability
>
>  These should be some way to store the B-tree on secondary memory, so
>  that the data it contains could be reused in another program.

There are a number of features missing from this implementation that could
be useful.  These include Anders' secondary storage capability (which is
what B-trees are used most for), perhaps with multi user locking.  An
implementation that uses shared memory, again with multi-user
locking might be useful.  Still another that is implemented as a server
that could allow data to be shared across machines would be useful.

I guess these issues begin to illustrate why reusable software components
are so difficult to develop.  My implementation addresses none of these
issues, so its utility is limited to those applications that don't need
them.  On the other hand, applications that don't need the additional
features need not pay the overhead demanded by these other implementations
if my B-tree satisfies their needs.

This implies then, that a reusable software component includes in its
documentation a list of particular features and limitations, probably in
an index.  My implementation would include the following:
	B-tree:  in-memory, single user, single machine,
	         scope limited to single running applications

>* browser
>
>  That is, providing a `bt_user_search(tree)' which presents the data
>  in the tree, and lets the user select an item for further computation.
>
>This is something I think most components that implement data structures
>should have. At least those implementing `complicated' structures.

It is debatable whether or not a component should provide a browsing
capability or not.  It has no way of knowing what the client has stored
there.  It could, however, provide bt_getFirst, bt_getLast, bt_getNext,
and bt_getPrev functions that work in conjunction with the bt_search
function to provide the functionality needed for the client to locate
information and provide a more effective browser.

It is also debatable as to whether or not a data structure such as a
B-tree should provide the user interface code needed to support such a
browser.  On the other hand, a reusable Browser software component could
be designed that could work with a large number of data structures without
regard to their implementation.  It is easy for me to visualize a simple
user interface that provides commands for start, stop, move left, right,
up, down, in, out, and show summary or detail.  One such implementation
could register callbacks for each command and provide a way for the
client to format its data.
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"

ogden@seal.cis.ohio-state.edu (William F Ogden) (07/31/90)

In response to comments from Anders, Paul writes:
    ...
>>* browser
>>
>>  That is, providing a `bt_user_search(tree)' which presents the data
>>  in the tree, and lets the user select an item for further computation.
>>
>>This is something I think most components that implement data structures
>>should have. At least those implementing `complicated' structures.
>
>It is debatable whether or not a component should provide a browsing
>capability or not.  It has no way of knowing what the client has stored
>there.  It could, however, provide bt_getFirst, bt_getLast, bt_getNext,
>and bt_getPrev functions that work in conjunction with the bt_search
>function to provide the functionality needed for the client to locate
>information and provide a more effective browser.

Now here we are getting into a real reusablity issue. In designing a
facility for reuse, it is important to try to distinguish between
primary and secondary operations on a structure. The Get_First, 
Get_Next, etc. are primary operations on the reusable structure of
which the B-tree is one realization. Traverse or print-out operations
are secondary, since they can be implemented easily and efficiently using
the primary ones -- provided Get_Next, etc. are included as primaries.

There are generally a vast number of potentially useful secondary 
operations on a structure, but any particular application will
only require a small subset. Moreover the next application may
need other strange secondary operations which nobody ever considered.
Under these circumstances, achieving facility reuse without modification
seems paradoxical until you see that with proper design, secondary
operations can be added without changing the underlying implementation.

Determining which operations are primary and which secondary is
somewhat like finding a basis for a vector space of an independent
set of axioms for a mathematical theory. You list all the operations
on a structure that you can imagine anyone wanting (of course you'll
turn out to lack imagination) then you figure out which ones can
be implemented using which others. A small (and generally useful)
subset, in terms of which all the others can be implemented, form
the nucleus of the reusable facility.

>It is also debatable as to whether or not a data structure such as a
>B-tree should provide the user interface code needed to support such a
>browser. 

Clearly it shouldn't. It involves either a set of secondary operations or
more likely a separate facility.

/Bill

cweir@cpg.trs.reuter.com (Charles Weir) (08/01/90)

In article <27705@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes:

> - Is this a reusable software component?
Yes - a classic example of the kind of reusability that functional languages
make easy. 

> - What are its characteristics that make it one? 

Simple interface.   

Use of reference parameters (structure pointers) as "magic
cookies" where the meaning is not known by the caller to hide the meaning of
the data.   

Use of function pointers to allow "call back" so that the module can
interrogate the caller.

[I should mention that all these are what is now thought of as
Object-Orientation....]  

> - Anything else relevant to producing a good reusable component that is
> reused ...

Now lets get to the meat!!

Here are a couple of issues that the example did not come to grips with...

1) Error and Exception handling.

In Paul's example, exception handling was simple.   He returns an error code.
But what happens to that error code in the module that calls his code?   Is is
passed back - and where, finally, does the buck stop?

Most modules end up needing some kind of error handling mechanism - and this
can provide a real headache for reusable components:-

	A function supplied by the caller?   

	perror(), followed by exit() - [I'm a C programmer]?   

	Pass back a return code - and if so what?

Each option is appropriate in some applications and contexts, and not in
others.   Yet the reusable module writer must choose one, since they are
mutually exclusive.   Hence (s)he looses potential customers for the reuse
whichever option is chosen.

2) Configuration

This is another embarrassment in re-usable code.   Configuration parameters
come in several forms:-

Size parameters: 
If one application using a reusable module must run in a
small space, it will want to make all structures (buffers, strings etc.) as
small as reliably possible.  A second application, though, has speed as a
priority, and wants them as large as possible.

Text strings:
Some target users like messages like "DBLX fault at 0xfc0040":  others must be
treated more gently, or speak different languages.

Magic items:
Patterns on a screen, locations of files, etc.

I can think of several ways to implement such configuration decisions:-

	Compilation constants (#defines) mean that the code must actually be
	changed for reuse.

	Parameters passed through a functional interface make the interface
	complex, and may mean that the calling application must "know" rather
	too much about the module.

	Configuration files provide a source on confusion and errors.

Any option may be appropriate, but each cuts out potential for re-use.

----------------------------------------------------

I don't have any solutions, or even preferred options.

I'm sure others have - any comments?

- Charles
--
Charles Weir, Rich Inc, 1400 Kensington, Oak Brook, IL 60521
Email: cweir@richsun.uucp  uunet!richsun!cweir 
Phone: +1 708 574 7424 X2766

rwallace@vax1.tcd.ie (08/12/90)

In article <27914@athertn.Atherton.COM>, paul@athertn.Atherton.COM (Paul Sander) writes:
>>  These should be some way to store the B-tree on secondary memory, so
>>  that the data it contains could be reused in another program.
> 
> There are a number of features missing from this implementation that could
> be useful.  These include Anders' secondary storage capability (which is
> what B-trees are used most for), perhaps with multi user locking.  An
> implementation that uses shared memory, again with multi-user
> locking might be useful.  Still another that is implemented as a server
> that could allow data to be shared across machines would be useful.

As a matter of fact I think the module already has too much functionality. The
main reason I don't reuse software components (and I program in C++ which
combines the advantages of being the best OOP/software engineering language
around with the vast range of available C code) is that people who write
generic modules put in everything expect the kitchen sink. At the moment I'm
working on a project which requires screen handling, file handling via indexes,
and file and in-memory sort functions. There are commercially available
packages available to do all these things. I ignored them and wrote my own for
two reasons:

1. The commercial packages contained orders of magnitude more code than I need
for this project (a fourth-generation language for writing accounting
software). Their use would have expanded the size of applications produced with
the 4GL from around 100K to an estimated 400K (!!!).

2. To wade through all the junk and figure out how to make the commercial
packages actually work would have taken nearly as long as writing my own code.

The first problem can be solved by what I hope will be the next generation of
super-optimizing compilers that can examine all modules of a project
simultaneously and eliminate code that will never be used e.g.

main () { printf ("Hello, World!\n"); }

would end up as 100 bytes because the compiler would eliminate all the code
from printf to handle numbers etc.

The second problem requires an attitude change.

"To summarize the summary of the summary: people are a problem"
Russell Wallace, Trinity College, Dublin
rwallace@vax1.tcd.ie

paul@athertn.Atherton.COM (Paul Sander) (08/15/90)

In article <6729.26c481cb@vax1.tcd.ie> rwallace@vax1.tcd.ie writes:
>In article <27914@athertn.Atherton.COM>, paul@athertn.Atherton.COM (Paul Sander) writes:
>>>  These should be some way to store the B-tree on secondary memory, so
>>>  that the data it contains could be reused in another program.
>> 
>> There are a number of features missing from this implementation that could
>> be useful.  These include Anders' secondary storage capability (which is
>> what B-trees are used most for), perhaps with multi user locking.  An
>> implementation that uses shared memory, again with multi-user
>> locking might be useful.  Still another that is implemented as a server
>> that could allow data to be shared across machines would be useful.
>
>As a matter of fact I think the module already has too much functionality.

These additional features need not be added to this particular implementation.
Most of the new features in fact cannot be implemented with same the interfaces
as the example.  Here, the programmer would have to choose the implementation
that best suits their needs.

>                                                                           The
>main reason I don't reuse software components (and I program in C++ which
>combines the advantages of being the best OOP/software engineering language
>around with the vast range of available C code) is that people who write
>generic modules put in everything expect the kitchen sink.

It is true that many such components include many features, more than most
clients need.  There is a utility/size trade-off made.  However, if the
component is carefully designed, and the development environment is decent,
the size impact can be minimized; the linker and librarian would omit the
unused code from the executable image.

>                                                           At the moment I'm
>working on a project which requires screen handling, file handling via indexes,
>and file and in-memory sort functions. There are commercially available
>packages available to do all these things. I ignored them and wrote my own for
>two reasons:
>
>1. The commercial packages contained orders of magnitude more code than I need
>for this project (a fourth-generation language for writing accounting
>software). Their use would have expanded the size of applications produced with
>the 4GL from around 100K to an estimated 400K (!!!).
>
>2. To wade through all the junk and figure out how to make the commercial
>packages actually work would have taken nearly as long as writing my own code.
>
>The first problem can be solved by what I hope will be the next generation of
>super-optimizing compilers that can examine all modules of a project
>simultaneously and eliminate code that will never be used e.g.
>
>main () { printf ("Hello, World!\n"); }
>
>would end up as 100 bytes because the compiler would eliminate all the code
>from printf to handle numbers etc.

These super-optimizing compilers are actually performing global analyses to
use the CPU resources more efficiently;  allocating registers in a manner that
is efficient for the program as a whole (as opposed to a manner that is
efficient for a loop or function) is one such benefit, but not the only one.
If the library is designed well, a new compiler isn't needed to dump the
unused code.  On the other hand, the unused code is already there, waiting
to be used should the need arise.

Printf is a special case because of its polymorphism.  Here, the compiler
would have to recognize the printf call and parse the format string (assuming
it is a constant pointer to a constant) to optimize the code.  Such
optimizations wouldn't be done in many practical applications; national
language support enabled applications come to mind here.

>The second problem requires an attitude change.

And good documentation, with a good index.

How many projects like your current one have you done?  How much code have
you (or others) rewritten?  Have you ever turned up bugs in related
applications where you had to make the same bug fix in many places?  There
are three big wins you get from using reusable packages:  Your software will
(theoretically) be developed faster, because less needs to be written from
scratch; Your software will (theoretically) be more bug free because it is
built from (theoretically) bug free components; Your program will
(theoretically) be easier to maintain because bug fixes need only be made in
one place.  Some might even argue that your program would be "better"
(faster, smaller) because (theoretically) the implementor was expert at
implementing the thing at hand, and was able to make various optimizations
you wouldn't happen to think of.  Finally, come might argue that your
program is more portable, because the component has been ported to many
platforms (malloc and printf are handy examples of this).

What do you think should be taken out?
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"

paul@athertn.Atherton.COM (Paul Sander) (08/18/90)

Okay, I've taken people's suggestions into account, and now have an updated
implementation of my in-memory B-tree.  The documentation and some
miscellaneous stuff follow in "shar" format, and the actual implementation
will appear in a subsequent posting.

The changes I've made are the following:
- Added first/next/last/prev browsing capability
- Added "big-O" performance figures to the documentation
- Split the implementation into several files that a linker can mix and
  match as needed to save space
- Ported to VMS
- Included troff source for documentation

My questions now are:

- Is this still a reusable component?
- Is it an improvement over the previous attempt?
- How can it be improved still further?
- If you were building a system in C, would _you_, personally, use it?  Why,
  or why not?
- If this were installed in a much larger component library, would you use it?
  Why, or why not?

Paul Sander
paul@Atherton.COM
(408) 734-9822
--------------------------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	README
#	btree.doc
#	Makefile
#	Makefile.vms
#	btree.man
# This archive created: Fri Aug 17 20:48:29 1990
export PATH; PATH=/bin:/usr/bin:$PATH
if test -f 'README'
then
	echo shar: "will not over-write existing file 'README'"
else
cat << \SHAR_EOF > 'README'
This directory contains source code to an in-memory B-tree implementation.
The tree can hold arbitrary keys, but duplicate keys cannot be inserted.
Also, this was written in K&R C, so please handle accordingly.

This implementation has been placed in the public domain by its author,
Paul Sander, to be used as you wish.  However, if you add neat new features,
I'd appreciate having a copy sent to me at paul.Atherton.COM, or
sun!athertn!paul .

Following are the files included, and create a library called libbtree.a on
Unix, and LIBBTREE.OLB on VMS.

Makefile	Rules for building library and test case (SunOS)
Makefile.vms	Rules for building library and test case (VMS MMS)
README		This file
bt_last.c	Implementation of bt_last function
bt_next.c	Implementation of bt_next function
bt_prev.c	Implementation of bt_prev function
btdelete.c	Implementation of bt_delete function
btdestroy.c	Implementation of bt_destroy function
btdump.c	Implementation of bt_dump function (non-portable)
btfirst.c	Implementation of bt_first function
btinsert.c	Implementation of bt_insert function
btmalloc.c	Heap utilities used for debugging
btnew.c		Implementation of bt_new function
btpriv.h	Private header file used by B-tree source files
btree.doc	Formatted man page, this is ASCII text
btree.h		Public include file, to be used by B-tree clients
btree.man	Man page source -- format with "[tn]roff -man"
btsearch.c	Implementation of bt_search function
bttraverse.c	Implementation of bt_traverse function
btutil.c	Utility functions
test.c		Test case, links with library to make executable
SHAR_EOF
fi
if test -f 'btree.doc'
then
	echo shar: "will not over-write existing file 'btree.doc'"
else
cat << \SHAR_EOF > 'btree.doc'



BTREE()           MISC. REFERENCE MANUAL PAGES            BTREE()



NAME
     bt_dump,   bt_search,   bt_new,    bt_destroy,    bt_insert,
     bt_traverse, bt_delete - In-memory B-tree manipulation

SYNOPSIS
     #include <btree.h>

     BTREE     bt_new(order,comp)
     int  order;
     int  (*comp)();

     void bt_destroy(tree,free_key)
     BTREE     tree;
     void (*free_key)();

     int bt_insert(tree,key)
     BTREE     tree;
     char *key;

     char *bt_delete(tree,key)
     BTREE     tree;
     char *key;

     char *bt_search(tree,target)
     BTREE     tree;
     char *target;

     void bt_traverse(tree,fn,parms)
     BTREE     tree;
     void (*fn)();
     char *parms;

     void bt_dump(tree,key_dump)
     BTREE     tree;
     void (*key_dump)();

     char *bt_first(tree)
     BTREE     tree;

     char *bt_last(tree)
     BTREE     tree;

     char *bt_next(tree)
     BTREE     tree;

     char *bt_prev(tree)
     BTREE     tree;

DESCRIPTION
     These functions implement an in-memory  B-tree  data  struc-
     ture.   The tree itself stores only pointers to the client's
     data,  and  relies  on  client-provided  functions  for  any



Sun Release 4.0           Last change:                          1






BTREE()           MISC. REFERENCE MANUAL PAGES            BTREE()



     comparisons and storage deallocation.

     bt_new creates a  new  B-tree.   The  client  specifies  the
     desired  order  (the maximum number of children allowed each
     node) of the tree (for performance trade-offs based  on  the
     efficiency  of  comparisons  of client-provided data), and a
     pointer to a function that compares two client-provided data
     structures.   The comparison function, comp, has the follow-
     ing interface:
          int comp(k1,k2)
          char *k1,*k2;
     The result of this function is -1 if the object  pointed  to
     by  k1 is "less than" the object pointed to by k2, 0 if they
     are "equal", and 1 otherwise.  Client-provided  data  struc-
     tures  that  compare  greater  by  this function will appear
     later in the lexical order of the data stored in  the  tree.
     bt_new returns a pointer to a handle for the B-tree, or NULL
     if an error occurs.

     bt_destroy deallocates all storage allocated  to  a  B-tree.
     The  client provides a pointer to a visitation function that
     is called once for each item stored in the tree.  The  items
     are  visited  in arbitrary order.  If NULL is passed instead
     of a pointer to a function,  no  action  is  taken  for  the
     client-provided  data,  but  the  tree  structure  itself is
     freed.

     bt_insert inserts a new item into the tree.  1  is  returned
     if  the  insertion was successful, -1 is returned if the new
     item matches another item that  has  already  been  inserted
     into the tree, and 0 is returned in the event of an error.

     bt_delete deletes an item from a tree.  The  value  returned
     is  the  same  as that passed to bt_insert when the item was
     inserted, or NULL if the item is not found.

     bt_search searches  for  an  item  in  a  tree.   The  value
     returned  is  the  same as that passed to bt_insert when the
     item was inserted, or NULL if the item is not found.

     bt_traverse traverses the tree,  calling  a  client-provided
     visitation  function fn once for each item stored there.  fn
     has the following interface:
          void fn(item,parms)
          char *item;
          char *parms;
     item is the pointer stored when the item was  inserted  into
     the  tree.   parms  is  an arbitrary pointer that the client
     wishes to pass to the visitation function, but is  otherwise
     unused  by  the B-tree implementation.  Items are visited in
     their lexical order.




Sun Release 4.0           Last change:                          2






BTREE()           MISC. REFERENCE MANUAL PAGES            BTREE()



     bt_dump displays the contents of the tree to  stdout,  along
     with  some  diagnostic  and  statistical  information.   The
     key_dump function is called once for each item in the  tree,
     in arbitrary order.  It may be NULL if no action is desired.
     Its interface follows:
          void key_dump(item)
          char *item;

     bt_first returns the item that falls earliest in the lexical
     order  of  the items stored in the tree, or NULL if the tree
     is empty.

     bt_last returns the item that falls latest  in  the  lexical
     order  of  the items stored in the tree, or NULL if the tree
     is empty.

     bt_next returns the next item higher in  the  lexical  order
     after  the  last  key  returned  by  bt_first,  bt_next,  or
     bt_search.  If bt_search failed to  find  an  item,  bt_next
     returns  the  next item higher in the lexical order that was
     stored in the tree.  NULL is returned if the  last  call  to
     bt_next  returned  the  item  falling highest in the lexical
     order of the items stored in the tree, or if  the  tree  was
     modified since the last call to bt_next or bt_prev.

     bt_prev returns the next item lower  in  the  lexical  order
     after   the  last  key  returned  by  bt_last,  bt_next,  or
     bt_search.  If bt_search failed to  find  an  item,  bt_prev
     returns  the  next  item lower in the lexical order that was
     stored in the tree.  NULL is returned if the  last  call  to
     bt_prev  returned  the  item  falling  lowest in the lexical
     order of the items stored in the tree, or if  the  tree  was
     modified since the last call to bt_next or bt_prev.

     Worst case performance  characteristics  are  listed  below.
     Here,  "m"  is  number  passed  as  the "order" parameter to
     bt_new, and "n" is the number of items stored in the tree.
          bt_search:     O(m * log n)
          bt_new:        O(1)
          bt_destroy:    O(log n) + O(n)  when  free_key  is  not
                         NULL
                         O(log n) otherwise
          bt_insert:     O(m * log n)
          bt_delete:     O(m * log n)
          bt_traverse:   O(n)
          bt_next:       O(log n)
          bt_prev:       O(log n)
          bt_first:      O(log n)
          bt_last:       O(log n)

BUGS
     This implementation is written in K&R C, and tested only  on



Sun Release 4.0           Last change:                          3






BTREE()           MISC. REFERENCE MANUAL PAGES            BTREE()



     a Sun 3 running SunOS 4.0.3 and on VMS V5.3.

     This implementation assumes "char*" is the generic  pointer.
     This  goes against practice used by most "modern" compilers,
     but aids portability to older  compilers.   Note  also  that
     this will probably generate lots of complaints from Lint.

     bt_dump assumes that pointers are the same size as integers,
     and that they can be displayed in total in eight hexidecimal
     digits.













































Sun Release 4.0           Last change:                          4



SHAR_EOF
fi
if test -f 'Makefile'
then
	echo shar: "will not over-write existing file 'Makefile'"
else
cat << \SHAR_EOF > 'Makefile'
TARGET = libbtree.a

OBJECTS = btmalloc.o btutil.o bt_last.o bt_next.o bt_prev.o \
	btdelete.o btdestroy.o btdump.o btfirst.o btinsert.o btnew.o \
	btsearch.o bttraverse.o

$(TARGET): $(OBJECTS)
	ar r $@ $?
	ranlib $@

$(OBJECTS): btpriv.h

test:	test.o $(TARGET)
	cc -o test test.o $(TARGET) /usr/lib/debug/malloc.o

btree.doc: btree.man
	nroff -man btree.man > btree.doc

clean:
	rm *.o *.a *.doc test
SHAR_EOF
fi
if test -f 'Makefile.vms'
then
	echo shar: "will not over-write existing file 'Makefile.vms'"
else
cat << \SHAR_EOF > 'Makefile.vms'
TARGET = libbtree.olb

OBJECTS = btmalloc.obj btutil.obj bt_last.obj bt_next.obj \
	bt_prev.obj btdelete.obj btdestroy.obj btdump.obj btfirst.obj \
	btinsert.obj btnew.obj btsearch.obj bttraverse.obj

$(TARGET) : $(OBJECTS)
	if f$search("$(TARGET)") .eqs. "" then $library/create/object $@
	library/replace $@ $?

$(OBJECTS) : btpriv.h

test :	test.obj $(TARGET)
	link/exe=$@.exe test.obj,$(TARGET)/lib,sys$library:vaxcrtl.olb/lib

clean :
	delete *.obj;*,*.lib;*,test.exe
SHAR_EOF
fi
if test -f 'btree.man'
then
	echo shar: "will not over-write existing file 'btree.man'"
else
cat << \SHAR_EOF > 'btree.man'
.TH BTREE
.SH NAME
bt_dump, bt_search, bt_new, bt_destroy, bt_insert, bt_traverse, bt_delete -
In-memory B-tree manipulation
.SH SYNOPSIS
#include <btree.h>
.sp
BTREE	bt_new(order,comp)
.br
int	order;
.br
int	(*comp)();
.sp
void bt_destroy(tree,free_key)
.br
BTREE	tree;
.br
void	(*free_key)();
.sp
int bt_insert(tree,key)
.br
BTREE	tree;
.br
char	*key;
.sp
char	*bt_delete(tree,key)
.br
BTREE	tree;
.br
char	*key;
.sp
char *bt_search(tree,target)
.br
BTREE	tree;
.br
char	*target;
.sp
void bt_traverse(tree,fn,parms)
.br
BTREE	tree;
.br
void	(*fn)();
.br
char	*parms;
.sp
void bt_dump(tree,key_dump)
.br
BTREE	tree;
.br
void	(*key_dump)();
.sp
char *bt_first(tree)
.br
BTREE	tree;
.sp
char *bt_last(tree)
.br
BTREE	tree;
.sp
char *bt_next(tree)
.br
BTREE	tree;
.sp
char *bt_prev(tree)
.br
BTREE	tree;
.SH DESCRIPTION
These functions implement an in-memory B-tree data structure.  The tree
itself stores only pointers to the client's data, and relies on
client-provided functions for any comparisons and storage deallocation.
.sp
.B bt_new
creates a new B-tree.  The client specifies the desired order (the maximum
number of children allowed each node) of the tree
(for performance trade-offs based on the efficiency of comparisons of
client-provided data), and a pointer to a function that compares two
client-provided data structures.  The comparison function, 
.BR comp ,
has the following interface:
.RS
.B
int comp(k1,k2)
.br
.B
char *k1,*k2;
.RE
The result of this function is -1 if the object pointed to by k1 is "less than"
the object pointed to by k2, 0 if they are "equal", and 1 otherwise.
Client-provided data structures that
compare greater by this function will appear later in the lexical order
of the data stored in the tree.
.B bt_new
returns a pointer to a handle for the B-tree, or NULL if an error occurs.
.sp
.B bt_destroy
deallocates all storage allocated to a B-tree.  The client provides a pointer
to a visitation function that is called once for each item stored in the
tree.  The items are visited in arbitrary order.  If
NULL is passed instead of a pointer to a function, no action is taken for
the client-provided data, but the tree structure itself is freed.
.sp
.B bt_insert
inserts a new item into the tree.  1 is returned if the insertion was
successful, -1 is returned if the new item matches another item that has
already been inserted into the tree, and 0 is returned in the event of an
error.
.sp
.B bt_delete
deletes an item from a tree.  The value returned is the same as that passed
to
.B bt_insert
when the item was inserted, or NULL if the item is not found.
.sp
.B bt_search
searches for an item in a tree.  The value returned is the same as that passed
to
.B bt_insert
when the item was inserted, or NULL if the item is not found.
.sp
.B bt_traverse
traverses the tree, calling a client-provided visitation function
.B fn
once for each item stored there.
.B fn
has the following interface:
.RS
.B
void fn(item,parms)
.br
.B
char *item;
.B
.br
char *parms;
.RE
.B item
is the pointer stored when the item was inserted into the tree.
.B parms
is an arbitrary pointer that the client wishes to pass to the visitation
function, but is otherwise unused by the B-tree implementation.  Items
are visited in their lexical order.
.sp
.B bt_dump
displays the contents of the tree to stdout, along with some diagnostic and
statistical information.  The
.B key_dump
function is called once for each item in the tree, in arbitrary order.  It
may be NULL if no action is desired.  Its interface follows:
.RS
.B
void key_dump(item)
.br
.B
char *item;
.RE
.sp
.B bt_first
returns the item that falls earliest in the lexical order of the items
stored in the tree, or NULL if the tree is empty.
.sp
.B bt_last
returns the item that falls latest in the lexical order of the items
stored in the tree, or NULL if the tree is empty.
.sp
.B bt_next
returns the next item higher in the lexical order after the last key
returned by
.BR bt_first ,
.BR bt_next ,
.b bt_prev ,
or
.BR bt_search .
If
.B bt_search
failed to find an item,
.B bt_next
returns the next item higher in the
lexical order that was stored in the tree.  NULL is returned if
the last call to
.B bt_next
returned the item falling highest in the
lexical order of the items stored in the tree, or if the tree was
modified since the last call to
.B bt_next
or
.BR bt_prev .
.sp
.B bt_prev
returns the next item lower in the lexical order after the last key
returned by
.BR bt_last ,
.BR bt_next ,
.b bt_prev ,
or
.BR bt_search .
If
.B bt_search
failed to find an item,
.B bt_prev
returns the next item lower in the
lexical order that was stored in the tree.  NULL is returned if
the last call to
.B bt_prev
returned the item falling lowest in the
lexical order of the items stored in the tree, or if the tree was
modified since the last call to
.B bt_next
or
.BR bt_prev .
.sp
Worst case performance characteristics are listed below.  Here, "m" is number
passed as the "order" parameter to bt_new, and "n" is the number of items
stored in the tree.
.RS
bt_search:	O(m * log n)
.br
bt_new:		O(1)
.br
bt_destroy:	O(log n) + O(n) when free_key is not NULL
.br
		O(log n) otherwise
.br
bt_insert:	O(m * log n)
.br
bt_delete:	O(m * log n)
.br
bt_traverse:	O(n)
.br
bt_next:		O(log n)
.br
bt_prev:		O(log n)
.br
bt_first:		O(log n)
.br
bt_last:		O(log n)
.RE
.SH BUGS
This implementation is written in K&R C, and tested only on a Sun 3 running
SunOS 4.0.3 and on VMS V5.3.
.sp
This implementation assumes "char*" is the generic pointer.  This goes
against practice used by most "modern" compilers, but aids portability to
older compilers.  Note also that this will probably generate lots of
complaints from Lint.
.sp
.B bt_dump
assumes that pointers are the same size as integers, and that they can be
displayed in total in eight hexidecimal digits.
SHAR_EOF
fi
exit 0
#	End of shell archive
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"

paul@athertn.Atherton.COM (Paul Sander) (08/18/90)

Here is the updated implementation of the in-memory B-tree that has been
used as sample code for this thread.  Please see the related posting
identified as <28982@athertn.Atherton.COM>.
----------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	btpriv.h
#	btree.h
#	bt_last.c
#	bt_next.c
#	bt_prev.c
#	btdelete.c
#	btdestroy.c
#	btdump.c
#	btfirst.c
#	btinsert.c
#	btmalloc.c
#	btnew.c
#	btsearch.c
#	bttraverse.c
#	btutil.c
#	test.c
# This archive created: Fri Aug 17 20:48:43 1990
export PATH; PATH=/bin:/usr/bin:$PATH
if test -f 'btpriv.h'
then
	echo shar: "will not over-write existing file 'btpriv.h'"
else
cat << \SHAR_EOF > 'btpriv.h'
/* The following structure describes a B-tree node. */

struct btnode {
	int		nkeys;		/* Number of keys stored here */
	int		currKey;	/* Last key found */
	struct btnode	*parent;	/* Parent of this node */
	char		**keys;		/* array[order-1] of keys */
	struct btnode	**children;	/* array[order] of children */
};

/* The following structure describes a B-tree. */

struct btree {
	int		order;		/* Max children permitted */
	int		(*comp)();	/* Comparison function for keys */
	struct btnode	*root;		/* Root of tree */
	int		capacity;	/* Max keys that will fit */
	int		contains;	/* Key currently stored */
	struct btnode	*currNode;	/* Node accessed */
	int		nextOk;		/* Indicates last search successful */
};

typedef struct btree *BTREE;
typedef struct btnode BTNODE;

#ifndef DEBUG
#define bt_malloc malloc
#define bt_free free
#else
extern char *bt_malloc();
extern char *bt_free();
#endif

extern bt_malloc_verify();

#ifdef __BTUTIL_C__
extern int bt_searchNode();
extern int bt_rotateRight();
extern int bt_rotateLeft();
#endif

SHAR_EOF
fi
if test -f 'btree.h'
then
	echo shar: "will not over-write existing file 'btree.h'"
else
cat << \SHAR_EOF > 'btree.h'
/* Header file for in-memory B-tree implementation */

typedef char *BTREE;

extern	void	bt_dump();
extern	char	*bt_search();
extern	BTREE	bt_new();
extern	void	bt_destroy();
extern	int	bt_insert();
extern	void	bt_traverse();
extern	char	*bt_delete();
extern	char	*bt_first();
extern	char	*bt_next();
extern	char	*bt_last();
extern	char	*bt_prev();
SHAR_EOF
fi
if test -f 'bt_last.c'
then
	echo shar: "will not over-write existing file 'bt_last.c'"
else
cat << \SHAR_EOF > 'bt_last.c'
/********************************************************************
 *
 * bt_last.c -- This file contains functions needed to find the
 *              key that is highest in the lexical order of keys
 *              stored in an in-memory B-tree.
 ********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_last -- This function returns the key that appears last in
 *            the lexical order of the items stored in the tree.
 *            NULL is returned if the tree is empty.
 *
 *******************************************************************/

char *bt_last(tree)
BTREE	tree;
{
	BTNODE	*node;

	if (tree == NULL) return NULL;
	node = tree->root;
	if (node->nkeys == 0) return NULL;
	while (node->children != NULL)
	{
		node->currKey = node->nkeys;
		node = node->children[node->nkeys];
	}
	node->currKey = node->nkeys - 1;
	tree->currNode = node;
	tree->nextOk = 1;
	return node->keys[node->nkeys - 1];
}
SHAR_EOF
fi
if test -f 'bt_next.c'
then
	echo shar: "will not over-write existing file 'bt_next.c'"
else
cat << \SHAR_EOF > 'bt_next.c'
/********************************************************************
 *
 * bt_next.c -- This file contains functions needed to scan an in-memory
 *              B-tree in ascending order.
 *
 ********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_next -- This function returns the key that appears next in
 *            the lexical order of the items stored in the tree,
 *            after the last bt_search, bt_next, bt_prev, bt_last,
 *            or bt_first.  NULL is returned if the tree is empty, an
 *            insertion or deletion invalidated the state of the
 *            tree, or the last item in the tree was already visited.
 *
 *******************************************************************/

char *bt_next(tree)
BTREE	tree;
{
	BTNODE	*node;
	int	key;

	/* Return if error, or insertion/deletion invalidated state */
	if ((tree == NULL) || (tree->currNode == NULL)) return NULL;

	/* Set up temporaries */
	node = tree->currNode;
	key = node->currKey;

	/* Return if we've overrun the tree */
	if (key >= node->nkeys) return NULL;

	/*
	 * Bump current key in current node, compensating for unsuccessful
	 * search
	 */
	if (tree->nextOk) key++;
	tree->nextOk = 1;

	/* If node has children, return leftmost key of next child */
	if (node->children != NULL)
	{
		node->currKey = key;
		node = node->children[key];
		while (node->children != NULL)
		{
			node->currKey = 0;
			node = node->children[0];
		}
		node->currKey = 0;
		tree->currNode = node;
		return node->keys[0];
	}

	/* Leaf node, return next key */
	if (key < node->nkeys)
	{
		node->currKey = key;
		return node->keys[key];
	}

	/* Already visited rightmost key of leaf, go back to parent */
	while ((node->parent != NULL) && (key >= node->nkeys))
	{
		node = node->parent;
		key = node->currKey;
	}

	/* Return NULL after last key in tree */
	if (key >= node->nkeys)
	{
		tree->currNode = node;
		node->currKey = key;
		return NULL;
	}

	/* Return next key in parent */
	tree->currNode = node;
	return node->keys[key];
}
SHAR_EOF
fi
if test -f 'bt_prev.c'
then
	echo shar: "will not over-write existing file 'bt_prev.c'"
else
cat << \SHAR_EOF > 'bt_prev.c'
/********************************************************************
 *
 * btprev.c -- This file contains functions needed to scan an in-memory
 *             B-tree in descending order.
 *
 ********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_prev -- This function returns the next key that appears
 *            earlier in the lexical order of the items stored in
 *            the tree, after the last bt_search, bt_next, bt_prev, or
 *            bt_last.  NULL is returned if the tree is empty, an
 *            insertion or deletion invalidated the state of the
 *            tree, or the next item in the tree was already visited.
 *
 *******************************************************************/

char *bt_prev(tree)
BTREE	tree;
{
	BTNODE	*node;
	int	key;

	/* Return if error, or insertion/deletion invalidated state */
	if ((tree == NULL) || (tree->currNode == NULL)) return NULL;
	tree->nextOk = 1;

	/* Set up temporaries */
	node = tree->currNode;
	key = node->currKey;

	/* Return if we've overrun the tree */
	if (key < 0) return NULL;

	/*
	 * If node has children, return rightmost key of previous
	 * child
	 */
	if (node->children != NULL)
	{
		node = node->children[key];
		while (node->children != NULL)
		{
			node->currKey = node->nkeys;
			node = node->children[node->nkeys];
		}
		node->currKey = node->nkeys - 1;
		tree->currNode = node;
		return node->keys[node->currKey];
	}

	/* Leaf node, return previous key */
	key--;
	if (key >= 0)
	{
		node->currKey = key;
		return node->keys[key];
	}

	/* Already visited leftmost key of node, go back to parent */
	while ((node->parent != NULL) && (key < 0))
	{
		node = node->parent;
		key = node->currKey - 1;
	}

	/* Return NULL after last key in tree */
	if (key < 0)
	{
		node->currKey = -1;
		tree->currNode = node;
		return NULL;
	}

	/* Return next key in parent */
	node->currKey = key;
	tree->currNode = node;
	return node->keys[key];
}
SHAR_EOF
fi
if test -f 'btdelete.c'
then
	echo shar: "will not over-write existing file 'btdelete.c'"
else
cat << \SHAR_EOF > 'btdelete.c'
/********************************************************************
 *
 * btdelete.c -- This file contains functions needed to delete a key
 *               from an in-memory B-tree.
 *
 ********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_delKey -- This function deletes the specified key from the
 *              specified node.
 *
 ********************************************************************/

static bt_delKey(node,n)
BTNODE	*node;
int	n;
{
	int	i;

	for (i = n; i < node->nkeys - 1; i++)
	{
		node->keys[i] = node->keys[i+1];
	}
	if (node->children != NULL)
	{
		for (i = n; i < node->nkeys; i++)
		{
			node->children[i] = node->children[i+1];
		}
	}
	node->nkeys--;
	return;
}

/********************************************************************
 *
 * bt_weld -- This function is the opposite of burst, above.  It joins
 *            two neighboring children of the given node, and lowers a
 *            key into the result.
 *
 ********************************************************************/

static bt_weld(tree,node,n)
BTREE	tree;
BTNODE	*node;
int	n;
{
	int	i;
	int	j;
	BTNODE	*left;
	BTNODE	*right;

	if (node->children == NULL) return;
	left = node->children[n];
	right = node->children[n+1];
	i = left->nkeys;
	left->keys[i] = node->keys[n];
	for (i++, j = 0; j < right->nkeys; i++, j++)
	{
		left->keys[i] = right->keys[j];
	}
	for (i = n + 1; i <= node->nkeys; i++)
	{
		node->keys[i-1] = node->keys[i];
		node->children[i] = node->children[i+1];
	}
	node->nkeys--;
	if (left->children != NULL)
	{
		for (i = left->nkeys + 1, j = 0; j <= right->nkeys; i++, j++)
		{
			left->children[i] = right->children[j];
			left->children[i]->parent = left;
		}
		bt_free(right->children);
	}
	left->nkeys += right->nkeys + 1;
	bt_free(right->keys);
	bt_free(right);
	tree->capacity -= tree->order - 1;
	return;
}

/********************************************************************
 *
 * bt_delPred -- This function searches a sub-tree, and deletes the
 *               highest key it finds, and returns it to its caller.
 *               If any node along the way  loses its b-tree'ness, the
 *               tree is reorganized after the deletion.
 *
 ********************************************************************/

static char *bt_delPred(tree,node)
BTREE	tree;
BTNODE	*node;
{
	BTNODE	*pnode;
	int	res;
	char	*retval;

	/*
	 * If at a leaf node, delete the highest key, otherwise
	 * call self recursively
	 */
	if (node->children == NULL)
	{
		retval = node->keys[node->nkeys - 1];
		node->nkeys--;
	}
	else
	{
		pnode = node->children[node->nkeys];
		retval = bt_delPred(tree,pnode);

		/* Reorganize tree if node gets too empty */
		if (pnode->nkeys < (tree->order - 1) / 2)
		{
			res = bt_rotateRight(node,node->nkeys - 1,tree->order);
			if (res == 0) bt_weld(tree,node,node->nkeys - 1);
		}
	}
	return retval;
}

/********************************************************************
 *
 * bt_delNode -- This function searches a sub-tree for a key, and
 *               deletes the key.  If the key is found in a leaf node,
 *               it is simple removed and returned to the caller;
 *               otherwise, it is swapped with its predecessor, and
 *               the returned to the caller.  If any node loses its
 *               b-tree'ness, the tree is reorganized.  If the key is
 *               not found, NULL is returned.
 *
 ********************************************************************/

static char *bt_delNode(tree,node,key)
BTREE	tree;
BTNODE	*node;
char	*key;
{
	int	i;
	int	res;
	char	*retval;

	/* Key not found */
	if (node == NULL) return NULL;

	/* Search for key, descend tree until found or find leaf */
	i = bt_searchNode(node,key,tree->comp,&res);
	if (res == 0)
	{
		if (node->children == NULL) return NULL;
		retval = bt_delNode(tree,node->children[i],key);
	}
	else
	{
		/* Return the deleted node to caller */
		retval = node->keys[i];

		if (node->children == NULL)
		{
			/* Delete from leaf */
			bt_delKey(node,i);
		}
		else
		{
			/* Swap with predecessor */
			node->keys[i] = bt_delPred(tree,node->children[i]);
		}
	}

	/* If a child node was reorganized, rebalancing may be necessary */
	if ((node->children != NULL) &&
	    (node->children[i]->nkeys < (tree->order - 1) / 2))
	{
		res = bt_rotateRight(node,i - 1,tree->order);
		if (res == 0) res = bt_rotateLeft(node,i,tree->order);
		if (res == 0)
		{
			if (i >= node->nkeys) i--;
			bt_weld(tree,node,i);
		}
	}
	return retval;
}

/********************************************************************
 *
 * bt_delete -- This function deletes the specified key from a
 *              b-tree and reorganizes it as necessary.  The deleted
 *              key is returned to the caller if it is found, NULL
 *              is returned otherwise.
 *
 ********************************************************************/

char	*bt_delete(ptree,key)
char	*ptree;
char	*key;
{
	BTNODE	*root;
	char	*retval;
	BTREE	tree;

	tree = (BTREE) ptree;

	/* Do no deletion if tree is empty */
	root = tree->root;
	if (root->nkeys == 0) return NULL;

	/* Delete the key */
	retval = bt_delNode(tree,root,key);

	/*
	 * Delete the root if it is empty, yet its children are not.
	 * This happens after a weld on the last key in the root.
	 */
	if ((root->nkeys == 0) && (root->children != NULL))
	{
		tree->root = root->children[0];
		bt_free(root->keys);
		bt_free(root->children);
		bt_free(root);
		tree->capacity -= tree->order - 1;
		tree->root->parent = NULL;
	}
	if (retval != NULL)
	{
		tree->contains--;
		tree->currNode = NULL;
	}
	return retval;
}
SHAR_EOF
fi
if test -f 'btdestroy.c'
then
	echo shar: "will not over-write existing file 'btdestroy.c'"
else
cat << \SHAR_EOF > 'btdestroy.c'
/********************************************************************
 *
 * btdestroy.c -- This function contains functions needed to deallocate
 *                an in-memory B-tree in its entirety.
 *
 ********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_freeNode -- This function frees a node in a b-tree.  It is
 *              provided with a function that frees each key in the
 *              node also.  The free_key function has the following
 *              protocol:
 *
 *			free_key(key)
 *			char *key;
 *
 *              If NULL is passed as the free_key function, the node
 *              is freed, but no action is taken for the keys.
 *
 ********************************************************************/

static void bt_freeNode(node,free_key)
BTNODE	*node;
void	(*free_key)();
{
	int	i;

	if (node->children != NULL)
	{
		for (i = 0; i <= node->nkeys; i++)
		{
			bt_freeNode(node->children[i],free_key);
		}
		bt_free(node->children);
	}
	if (free_key != NULL)
	{
		for (i = 0; i < node->nkeys; i++)
		{
			(*free_key)(node->keys[i]);
		}
	}
	bt_free(node->keys);
	bt_free(node);
	return;
}

/********************************************************************
 *
 * bt_destroy -- This function frees a b-tree structure.  It is also
 *               passed a function that frees the keys contained by the
 *               tree.  The free_key function has the same calling
 *               protocol as the free_key function passed to bt_freeNode
 *               above.
 *
 ********************************************************************/

void bt_destroy(ptree,free_key)
char	*ptree;
void	(*free_key)();
{
	BTREE	tree;

	tree = (BTREE) ptree;
	bt_freeNode(tree->root,free_key);
	bt_free(tree);
	return;
}
SHAR_EOF
fi
if test -f 'btdump.c'
then
	echo shar: "will not over-write existing file 'btdump.c'"
else
cat << \SHAR_EOF > 'btdump.c'
/*********************************************************************
 *
 * btdump.c -- This function contains functions needed to display
 *             the contents of an in-memory B-tree, passing each key
 *             to a callback function.  This file contains non-portable
 *             code that is intended to help debug the B-tree
 *             implementation.
 *
 *********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/*********************************************************************
 *
 * bt_dumpNode -- This function displays the contents of a node and
 *                then recursively displays its children.  Note
 *                that this function assumes that a pointer to a node
 *                is the same size as an integer.
 *
 *********************************************************************/

void bt_dumpNode(node,key_dump)
BTNODE	*node;
void	(*key_dump)();
{
	int	i;

	if (node == NULL)
	{
		printf("ERROR:  null node pointer\n");
		return;
	}
	printf("%08x:  %d keys (keys %08x, children %08x), currKey = %d,\n",
	       (int) node,node->nkeys,node->keys,node->children,node->currKey);
	printf("parent = %08x\n", (int) node->parent);
	printf("--------\n");
	for (i = 0; i < node->nkeys; i++)
	{
		if (node->children != NULL)
		{
			printf("    %08x\n",node->children[i]);
		}
		else
		{
			printf("    %08x\n",0);
		}
		if (key_dump != NULL)
		{
			printf("        ");
			(*key_dump)(node->keys[i]);
		}
	}
	if (node->children != NULL)
	{
		printf("    %08x\n",node->children[i]);
		for (i = 0; i <= node->nkeys; i++)
		{
			bt_dumpNode(node->children[i],key_dump);
		}
	}
	else
	{
		printf("    %08x\n",0);
	}

	fflush(stdout);
	return;
}

/*********************************************************************
 *
 * bt_dump -- This function displays the contents of a B-tree.  The
 *            caller passes in a function that displays the contents
 *            of one of the keys stored in the tree.  The calling
 *            protocol for this function is:
 *
 *                       key_dump(key)
 *                       char *key;
 *
 *            Also, if key_dump is NULL, no action is taken.  This is
 *            useful if the structure of the tree is desired without
 *            including its contents.
 *
 *********************************************************************/

void bt_dump(ptree,key_dump)
char	*ptree;
void	(*key_dump)();
{
	BTREE	tree;

	tree = (BTREE) ptree;
	printf("B-tree dump:\n\n");
	printf("order = %d\n",tree->order);
	printf("capacity = %d\n",tree->capacity);
	printf("contains %d keys\n",tree->contains);
	printf("efficiency is %d per cent\n",
	       (tree->contains*100)/tree->capacity);
	printf("handle at %08x\n",tree);
	printf("root  = %08x\n\n",(int) tree->root);
	printf("currNode  = %08x\n\n",(int) tree->currNode);
	bt_dumpNode(tree->root,key_dump);
	return;
}
SHAR_EOF
fi
if test -f 'btfirst.c'
then
	echo shar: "will not over-write existing file 'btfirst.c'"
else
cat << \SHAR_EOF > 'btfirst.c'
/********************************************************************
 *
 * btfirst.c -- This file contains functions to locate the key
 *              falling first in the lexical order of items stored
 *              in an in-memory B-tree.
 *
 ********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_first -- This function returns the key that appears first in
 *             the lexical order of the items stored in the tree.
 *             NULL is returned if the tree is empty.
 *
 *******************************************************************/

char *bt_first(tree)
BTREE	tree;
{
	BTNODE	*node;

	if (tree == NULL) return NULL;
	node = tree->root;
	if (node->nkeys == 0) return NULL;
	while (node->children != NULL)
	{
		node->currKey = 0;
		node = node->children[0];
	}
	node->currKey = 0;
	tree->currNode = node;
	tree->nextOk = 1;
	return node->keys[0];
}
SHAR_EOF
fi
if test -f 'btinsert.c'
then
	echo shar: "will not over-write existing file 'btinsert.c'"
else
cat << \SHAR_EOF > 'btinsert.c'
/*********************************************************************
 * 
 * btinsert.c -- This file contains functions needed to insert new
 *               items into an in-memory B-tree.
 *
 *********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/*********************************************************************
 *
 * bt_insertKey -- This function inserts a key into a specified node at
 *                 a specified location in its key array.  0 is returned
 *                 if the node is already full, 1 otherwise.
 *
 *********************************************************************/

static int bt_insertKey(node,key,n,order)
BTNODE	*node;
char	*key;
int	n;
int	order;
{
	int	i;

	/* Return FALSE if insertion is not possible */
	if (node->nkeys >= order - 1)
	{
		return 0;
	}

	/* Shift keys to make room, then insert new key */
	for (i = node->nkeys - 1; i >= n; i--)
	{
		node->keys[i+1] = node->keys[i];
	}
	node->keys[n] = key;
	node->nkeys++;
	return 1;
}

/*********************************************************************
 *
 * bt_burst -- Given a node and an index into its key and child arrays,
 *             this function splits the specified child node in half,
 *             elevating the key at the child's split point into this
 *             specified node.  0 is returned if the child array index
 *             is out of range, or the node is full, or malloc fails;
 *             1 is returned otherwise.
 *
 *********************************************************************/

static int bt_burst(tree,node,n)
BTREE	tree;
BTNODE	*node;
int	n;
{
	BTNODE	*new;
	int	i;
	int	m;
	BTNODE	*left;
	int	lkeys;

	/* Test parameters */
	if (((node->nkeys > 0) && (node->nkeys >= tree->order - 1)) ||
	    (n < 0))
	{
		return 0;
	}
	/* Create a new node */
	new = (BTNODE*) bt_malloc(sizeof(BTNODE));
	if (new != NULL)
	{
		new->parent = node;

		/* Allocate new key array */
		new->keys = (char**) bt_malloc(tree->order*sizeof(char*));
		if (new->keys != NULL)
		{
			/* Calculate needed partial results */
			m = tree->order / 2;
			left = node->children[n];
			lkeys = left->nkeys;

			/* Allocate new child array */
			if (left->children != NULL)
			{
				new->children =
				       (BTNODE**) bt_malloc((tree->order + 1) * 
				                            sizeof(BTNODE*));
				if (new->children != NULL)
				{
					for (i = 0; i <= lkeys - m; i++)
					{
						new->children[i] =
						         left->children[m + i];
						new->children[i]->parent = new;
					}
				}
				else
				{
					bt_free(new->keys);
					bt_free(new);
					return 0;
				}
			}
			else
			{
				new->children = NULL;
			}

			/* Split keys */
			for (i = 0; i < lkeys - m; i++)
			{
				new->keys[i] = left->keys[m + i];
			}
			new->nkeys = lkeys - m;
			left->nkeys = m - 1;

			/* Elevate the key where the split was made */
			for (i = node->nkeys; i > n; i--)
			{
				node->keys[i] = node->keys[i-1];
				node->children[i+1] = node->children[i];
			}
			node->children[n+1] = new;
			node->keys[n] = left->keys[m - 1];
			node->nkeys++;
			tree->capacity += tree->order - 1;
			return 1;
		}
		/* Failed to allocate new child array, free node */
		bt_free(new);
	}
	/* Failed to allocate new node, return FALSE */
	return 0;
}

/*********************************************************************
 *
 * bt_rebalance -- This function tries to rebalance the two adjacent
 *                 nodes.  This is needed when an insertion is attempted
 *                 into a node that is already full but its neighbors are
 *                 not, or when a deletion is made and the node is less
 *                 than half-full.
 *
 *********************************************************************/

static bt_rebalance(node,n,order)
BTNODE	*node;
int	n;
int	order;
{
	int	res;

	if (node->children[n]->nkeys < (order-1)/2)
	{
		res = bt_rotateLeft(node,n,order);
	}
	else
	if ((n < node->nkeys) && (node->children[n+1]->nkeys < (order-1)/2))
	{
		res = bt_rotateRight(node,n,order);
	}
	return;
}

/*********************************************************************
 *
 * bt_insertNode -- This function performs a recursive-descent of a b-tree,
 *                  inserting a new key in a leaf node.  If the leaf is
 *                  full, the tree is reorganized to make room.  -1 is
 *                  returned if the insert fails due to a duplicate key.
 *                  0 is returned if the insert fails for some other
 *                  reason.  1 is returned if successful.
 *
 *********************************************************************/

static int bt_insertNode(tree,node,key)
BTREE	tree;
BTNODE	*node;
char	*key;
{
	int	i;
	int	res;

	/* Locate proper place for new key in node */
	i = bt_searchNode(node,key,tree->comp,&res);
	if (res) return -1;

	/* If no children, insert key in this node */
	if (node->children == NULL)
	{
		return bt_insertKey(node,key,i,tree->order);
	}

	/* Try to insert the new key in a child node */
	res = bt_insertNode(tree,node->children[i],key);

	/* Child is full, reorganize */
	if (res == 0)
	{
		/* Try rotating right */
		if ((res == 0) && (i < node->nkeys) &&
		    (node->children[i+1]->nkeys < tree->order - 2))
		{
			res = bt_rotateRight(node,i,tree->order);
		}

		/* Try rotating left */
		if ((res == 0) && (i > 0) &&
		    (node->children[i-1]->nkeys < tree->order - 2))
		{
			res = bt_rotateLeft(node,i - 1,tree->order);
		}

		/* Can't rotate, try bursting */
		if (res == 0) res = bt_burst(tree,node,i);
		if (res > 0)
		{
			res = bt_insertNode(tree,node,key);
			if (res > 0)
			{
				bt_rebalance(node,i,tree->order);
			}
		}
	}
	return res;
}

/*********************************************************************
 *
 * bt_insert -- This function inserts a new key into a b-tree.  -1 is
 *              returned if the insert fails due to a duplicate key.
 *              0 is returned if the insert fails for some other
 *              reason.  1 is returned if successful.
 *
 *********************************************************************/

int bt_insert(ptree,key)
char	*ptree;
char	*key;
{
	int	res;
	BTNODE	*new;
	BTNODE	*node;
	BTREE	tree;

	tree = (BTREE) ptree;

	/* Begin at root */
	node = tree->root;

	/* If root is empty, insert first key */
	if (node->nkeys == 0)
	{
		node->keys[0] = key;
		node->nkeys = 1;
		tree->contains = 1;
		tree->currNode = NULL;
		return 1;
	}

	/* Try inserting new key */
	res = bt_insertNode(tree,node,key);

	/* If 0 return, burst the root, rebalance, and try again */
	if (res == 0)
	{
		new = (BTNODE*) bt_malloc(sizeof(BTNODE));
		if (new != NULL)
		{
			new->children = (BTNODE**) bt_malloc((tree->order + 1) *
			                                  sizeof(BTNODE*));
			if (new->children != NULL)
			{
				new->keys = (char**) bt_malloc((tree->order) *
				                                sizeof(char*));
				if (new->keys != NULL)
				{
					new->nkeys = 0;
					new->children[0] = node;
					node->parent = new;
					tree->root = new;
					tree->capacity += tree->order - 1;
					res = bt_burst(tree,new,0);
					if (res > 0)
					{
						res=bt_insertNode(tree,new,key);
					}
					if (res > 0)
					{
						tree->contains++;
						bt_rebalance(new,0,tree->order);
						tree->currNode = NULL;
					}
					return res;
				}
				bt_free(new->children);
			}
			bt_free(new);
		}
	}
	if (res > 0)
	{
		tree->contains++;
		tree->currNode = NULL;
	}
	return res;
}
SHAR_EOF
fi
if test -f 'btmalloc.c'
then
	echo shar: "will not over-write existing file 'btmalloc.c'"
else
cat << \SHAR_EOF > 'btmalloc.c'
/**************************************************************
 *
 * btmalloc.c -- This file contains functions that augment the
 *               C runtime library memory allocator in order
 *               to aid debugging.
 *
 **************************************************************/

#include <stdio.h>

/* The following structure is used for debugging heap corruptions */

struct heapblk {
	char	*blk;
	int	freeNext;
};

struct heap_struct {
	int		allocSeq;
	int		freeLast;
	struct heapblk	blks[10000];
};

struct heap_struct btheap = {0,-1}; 


/*
 * The following code is used for allocating space on the heap.  When not
 * debugging heap corruptions, a macro aliases bt_malloc with malloc.
 */

char *bt_malloc(size)
unsigned	size;
{
	char	*temp;

	temp = (char*) malloc(size);
	btheap.blks[btheap.allocSeq].blk = temp;
	btheap.blks[btheap.allocSeq].freeNext = -2;
	btheap.allocSeq++;
	return temp;
}

/*
 * The following code is used for freeing memory on the heap.  If not
 * debugging heap corruptions, bt_free is aliased with free.
 */


bt_free(ptr)
char	*ptr;
{
	int	i;

	printf("Freeing heap storage at %x\n",ptr);
	for (i = 0; i < btheap.allocSeq; i++)
	{
		if (ptr == btheap.blks[i].blk)
		{
			if (btheap.blks[i].freeNext == -2)
			{
				btheap.blks[i].freeNext = btheap.freeLast;
				btheap.freeLast = i;
				free(ptr);
				return;
			}
			else
			{
				printf(
				  "ERROR: freeing %x which was already freed\n",
				  ptr);
				printf("Block\n");
				i = btheap.freeLast;
				while (i > 0)
				{
					printf("%x\n",btheap.blks[i].blk);
					i = btheap.blks[i].freeNext;
				}
				fflush(stdout);
				kill(getpid(),5);
			}
		}
	}
	printf("ERROR:  Freeing %x which has not been allocated\n",ptr);
	fflush(stdout);
	kill(getpid(),5);
}

/* The following verifies the integrity of the heap. */

int bt_malloc_verify()
{
	int	retval;
	int	once;
	int	i;

	retval = 1;
	once = 0;

#ifdef DEBUG
	for (i = 0; i < btheap.allocSeq; i++)
	{
		if (btheap.blks[i].freeNext == -2)
		{
			if (!once)
			{
				once = 1;
				printf("The following blocks are allocated:\n");
			}
			printf("%x\n",btheap.blks[i].blk);
		}
	}
	fflush(stdout);
#endif

#ifdef sun
	retval = malloc_verify();
#endif

	return retval;
}
SHAR_EOF
fi
if test -f 'btnew.c'
then
	echo shar: "will not over-write existing file 'btnew.c'"
else
cat << \SHAR_EOF > 'btnew.c'
/***********************************************************************
 *
 * btnew.c -- This file contains functions to create a new in-memory
 *            B-tree.
 *
 ***********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/***********************************************************************
 *
 * bt_new -- Given the number of children each node is allowed to have,
 *           and a pointer to a strcmp-like function that compares two
 *           keys, this function creates an empty b-tree structure.
 *
 ***********************************************************************/

char	*bt_new(order,comp)
int	order;
int	(*comp)();
{
	BTREE	tree;
	int	i;

	/* Validate parameters */
	if ((comp == NULL) || (order < 3)) return (char *) NULL;

	/* Allocate the handle */
	tree = (BTREE) bt_malloc(sizeof(struct btree));
	if (tree != NULL)
	{
		/* Allocate the root */
		tree->root = (BTNODE*) bt_malloc(sizeof(struct btnode));
		if (tree->root != NULL)
		{
			/* Initialize root */
			tree->order = order;
			tree->comp = comp;
			tree->root->nkeys = 0;
			tree->root->children = NULL;

			/* Allocate key array */
			tree->root->keys =
				(char**)bt_malloc((order - 1) * sizeof(char*));
			if (tree->root->keys != NULL)
			{
				/* Initialize keys */
				for (i = 0; i < order - 1; i++)
				{
					tree->root->keys[i] = NULL;
				}
				tree->root->nkeys = 0;
				tree->root->parent = NULL;
				tree->contains = 0;
				tree->capacity = order - 1;
				tree->currNode = NULL;
				return (char *) tree;
			}

			/* Failed to allocate keys, free tree */
			bt_free(tree->root);
		}
		/* Failed to allocate root, free handle */
		bt_free(tree);
	}
	/* Failed to allocate handle, return null (error) */
	return (char *) NULL;
}
SHAR_EOF
fi
if test -f 'btsearch.c'
then
	echo shar: "will not over-write existing file 'btsearch.c'"
else
cat << \SHAR_EOF > 'btsearch.c'
/********************************************************************
 * btsearch -- This file contains functions needed to locate an
 *             item in an in-memory B-tree.
 *
 ********************************************************************/

#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_searchNode -- This function searches a node for a key.  If the key
 *                  is found, its index in the node's key array is
 *                  returned, along with a flag indicating that the key
 *                  was found.  If the key is not found, the index of the
 *                  next higher key in the key array is returned, along
 *                  with a "not found" indication.
 *
 ********************************************************************/

static int bt_searchNode(node,target,comp,eq)
BTNODE	*node;		/* Node to be searched */
char	*target;	/* Key to search for */
int	(*comp)();	/* strcmp()-like comparison function */
int	*eq;		/* 0 if not found, non-zero otherwise */
{
	int	i;
	int	res;
	int	hi;
	int	lo;

	res = -1;

	/*
	 * Perform linear search of node contains 7 keys or less,
	 * otherwise perform binary search
	 */
	if (node->nkeys <= 7)
	{
		for (i = 0; i < node->nkeys; i++)
		{
			res = (*comp)(node->keys[i],target);
			if (res >= 0) break;
		}
	}
	else
	{
		lo = 0;
		hi = node->nkeys - 1;
		while ((lo <= hi) && (res != 0))
		{
			i = (lo + hi) / 2;
			res = (*comp)(node->keys[i],target);
			if (res < 0) lo = i + 1;
			else if (res > 0) hi = i - 1;
		}
		if (res < 0) i++;
	}
	*eq = (res == 0);	/* Indicate successful search */
	node->currKey = i + *eq - 1;
	return i;
}

/***********************************************************************
 *
 * bt_locate -- This function performs a recursive-descent search of
 *              a sub-tree for a key.  The key is returned if found,
 *              or NULL otherwise.
 *
 ***********************************************************************/

static char *bt_locate(tree,node,target,comp)
BTREE	tree;
BTNODE	*node;
char	*target;
int	(*comp)();
{
	int	res;
	int	i;
	int	eq;

	i = bt_searchNode(node,target,comp,&eq);
	node->currKey = i;
	if (eq)
	{
		tree->currNode = node;
		tree->nextOk = 1;
		return node->keys[i];
	}
	else if (node->children == NULL)
	{
		tree->currNode = node;
		tree->nextOk = 0;
		return NULL;
	}
	else
	{
		return bt_locate(tree,node->children[i],target,comp);
	}
}

/***********************************************************************
 *
 * bt_search -- This function searches a tree for a specified key.  The
 *              key is returned if found, or NULL if not.
 *
 ***********************************************************************/

char *bt_search(tree,target)
BTREE	tree;
char	*target;
{
	if (tree == NULL) return NULL;
	return bt_locate(tree,tree->root,target,tree->comp);
}
SHAR_EOF
fi
if test -f 'bttraverse.c'
then
	echo shar: "will not over-write existing file 'bttraverse.c'"
else
cat << \SHAR_EOF > 'bttraverse.c'
/****************************************************************
 *
 * bttraverse.c -- This file contains functions needed to
 *                 traverse an in-memory B-tree, passing each
 *                 item stored there to a callback function.
 *
 ****************************************************************/

#include <stdio.h>
#include "btpriv.h"

/****************************************************************
 *
 * bt_visit -- This function is used while traversing a b-tree.
 *             It is called once for each node.  It calls some
 *             specified function once for each key in its
 *             key array, and calls itself once for each child
 *             in its child array.
 *
 ****************************************************************/

static void bt_visit(node,fn,parms)
BTNODE	*node;
void	(*fn)();
char	*parms;
{
	int	i;

	if (node == NULL) return;
	for (i = 0; i < node->nkeys; i++)
	{
		if (node->children != NULL)
		{
			bt_visit(node->children[i],fn,parms);
		}
		(*fn)(node->keys[i],parms);
	}
	if (node->children != NULL)
	{
		bt_visit(node->children[i],fn,parms);
	}
	return;
}

/****************************************************************
 *
 * bt_traverse -- This function traverses a b-tree, calling some
 *                specified function once for each key stored in
 *                it.  The specified function has the following
 *                protocol:
 *
 *			fn(key,parms)
 *			char *key;
 *			char *parms;
 *
 *                where key is a pointer to a key stored in the
 *                btree, and parms is the parameter structure
 *                passed by the caller.
 *
 *                The nodes are visited in descending order, if
 *                the comp function passed to bt_new is
 *                strcmp-like.
 *
 ****************************************************************/

void bt_traverse(ptree,fn,parms)
char	*ptree;
void	(*fn)();
char	*parms;
{
	BTREE	tree;

	tree = (BTREE) ptree;
	bt_visit(tree->root,fn,parms);
	return;
}
SHAR_EOF
fi
if test -f 'btutil.c'
then
	echo shar: "will not over-write existing file 'btutil.c'"
else
cat << \SHAR_EOF > 'btutil.c'
/********************************************************************
 *
 * btutil.c -- This function contains utility functions that are
 *             used for many features of an in-memory B-tree
 *             implementation.
 *
 ********************************************************************/

#define __BTUTIL_C__
#include <stdio.h>
#include "btpriv.h"

/********************************************************************
 *
 * bt_searchNode -- This function searches a node for a key.  If the key
 *                  is found, its index in the node's key array is
 *                  returned, along with a flag indicating that the key
 *                  was found.  If the key is not found, the index of the
 *                  next higher key in the key array is returned, along
 *                  with a "not found" indication.
 *
 ********************************************************************/

int bt_searchNode(node,target,comp,eq)
BTNODE	*node;		/* Node to be searched */
char	*target;	/* Key to search for */
int	(*comp)();	/* strcmp()-like comparison function */
int	*eq;		/* 0 if not found, non-zero otherwise */
{
	int	i;
	int	res;
	int	hi;
	int	lo;

	res = -1;

	/*
	 * Perform linear search of node contains 7 keys or less,
	 * otherwise perform binary search
	 */
	if (node->nkeys <= 7)
	{
		for (i = 0; i < node->nkeys; i++)
		{
			res = (*comp)(node->keys[i],target);
			if (res >= 0) break;
		}
	}
	else
	{
		lo = 0;
		hi = node->nkeys - 1;
		while ((lo <= hi) && (res != 0))
		{
			i = (lo + hi) / 2;
			res = (*comp)(node->keys[i],target);
			if (res < 0) lo = i + 1;
			else if (res > 0) hi = i - 1;
		}
		if (res < 0) i++;
	}
	*eq = (res == 0);	/* Indicate successful search */
	node->currKey = i + *eq - 1;
	return i;
}

/******************************************************************
 *
 * bt_rotateRight -- Given a node and an index into its key array, this
 *                   function rotates the node right at the specified
 *                   key.  This is used during insertion and deletion to
 *                   keep the tree balanced.  The return value is 0 if
 *                   the function fails, i.e. the node on the left
 *                   cannot lose keys, the node on the right cannot
 *                   gain keys, or the specified node is a leaf; 1 is
 *                   returned if successful.
 *
 ******************************************************************/

int bt_rotateRight(node,n,order)
BTNODE	*node;
int	n;
int	order;
{
	int	i;
	BTNODE	*right;
	BTNODE	*left;

	/* Test parameters */
	if ((n >= node->nkeys) || (n < 0))
	{
		return 0;
	}

	/* Point to children */
	left = node->children[n];
	right = node->children[n+1];

	/* Return FALSE if rotation is not possible */
	if ((left == NULL) ||
	    (left->nkeys <= order/2) ||
	    (right->nkeys >= order - 1))
	{
		return 0;
	}

	/* Shift the right child's keys */
	for (i = right->nkeys; i > 0; i--)
	{
		right->keys[i] = right->keys[i - 1];
	}

	/* Rotate keys */
	right->keys[0] = node->keys[n];
	node->keys[n] = left->keys[left->nkeys - 1];
	left->keys[left->nkeys - 1] = NULL;

	if (right->children != NULL)
	{
		/* Shift the right child's children */
		for (i = right->nkeys; i >= 0; i--)
		{
			right->children[i + 1] = right->children[i];
		}

		/* Rotate children */
		right->children[0] = left->children[left->nkeys];
		right->children[0]->parent = right;
		left->children[left->nkeys] = NULL;
	}

	/* Update key count and return TRUE */
	right->nkeys++;
	left->nkeys--;
	return 1;
}

/******************************************************************
 *
 * bt_rotateLeft -- Given a node and an index into its key array, this
 *                  function rotates the node left at the specified
 *                  key.  This is used during insertion and deletion to
 *                  keep the tree balanced.  The return value is 0 if
 *                  the function fails, i.e. the node on the right
 *                  cannot lose keys, the node on the left cannot
 *                  gain keys, or the specified node is a leaf; 1 is
 *                  returned when successful.
 *
 ******************************************************************/

int bt_rotateLeft(node,n,order)
BTNODE	*node;
int	n;
int	order;
{
	int	i;
	BTNODE	*right;
	BTNODE	*left;

	/* Test parameters */
	if ((n < 0) || (n >= node->nkeys))
	{
		return 0;
	}

	/* Point to children */
	right = node->children[n+1];
	left = node->children[n];

	/* Return FALSE if rotation is not possible */
	if ((left == NULL) ||
	    (right->nkeys <= order/2) ||
	    (left->nkeys >= order - 1))
	{
		return 0;
	}

	/* Rotate keys */
	left->keys[left->nkeys] = node->keys[n];
	node->keys[n] = right->keys[0];

	/* Update key counts */
	left->nkeys ++;
	right->nkeys --;

	/* Shift the right child's keys */
	for (i = 0; i < right->nkeys; i++)
	{
		right->keys[i] = right->keys[i+1];
	}
	right->keys[i] = NULL;

	if (left->children != NULL)
	{
		/* Rotate children */
		left->children[left->nkeys] = right->children[0];
		left->children[left->nkeys]->parent = left;

		/* Shift the right child's children */
		for (i = 0; i <= right->nkeys; i++)
		{
			right->children[i] = right->children[i+1];
		}
		right->children[i+1] = NULL;
	}

	/* Return TRUE */
	return 1;
}
SHAR_EOF
fi
if test -f 'test.c'
then
	echo shar: "will not over-write existing file 'test.c'"
else
cat << \SHAR_EOF > 'test.c'
#include <stdio.h>
#include "btree.h"

/* #define DEBUGGING */

#define	LOOPS	500
#define	ORDER	12

typedef struct key {
	int	key;
} KEY;

KEY	savedKeys[LOOPS];

dump(key)
KEY	*key;
{
	if (key == NULL) printf("********NULL**********\n");
	else printf("%x: %d\n",key,key->key);
}

visit(key,parms)
KEY	*key;
char	*parms;
{
	dump(key);
}

comp(k1,k2)
KEY	*k1,*k2;
{
	return (k1->key - k2->key);
}

main()
{
	BTREE	tree;
	KEY	*key;
	int	i;
	int	res;
	KEY	*key1;
	KEY	*key2;
	KEY	*key3;
	KEY	*key4;
	KEY	tkey;

#ifdef DEBUGGING
#ifdef sun
	malloc_debug(2);
#endif
#endif
	tree = (BTREE)bt_new(ORDER,comp);
	if (tree == NULL)
	{
		printf("Failed to create tree\n");
		exit(1);
	}
	(void) bt_malloc_verify();

	for (i = 0; i < LOOPS; i++)
	{
		printf("Iteration %d\n",i+1);
		key = &savedKeys[i];
		key->key = rand();
		printf("Inserting %d\n",key->key);
		fflush(stdout);
		res = bt_insert(tree,key);
		if (res > 0)
		{
			printf("Inserted %d\n",key->key);
		}
		else if (res < 0)
		{
			printf("Duplicate key %d\n",key->key);
		}
		else
		{
			printf("Error inserting key %d\n");
			exit(3);
		}
		printf("Trying duplicate\n");
		res = bt_insert(tree,key);
		if (res < 0)
		{
			printf("Failed to insert duplicate\n");
		}
		else if (res == 0)
		{
			printf("Error inserting duplicate\n");
			exit(4);
		}
		else
		{
			printf("Succeeded in inserting duplicate\n");
			exit(5);
		}
#if 0
#ifdef DEBUGGING
		printf("Contents of tree:\n");
		bt_dump(tree,dump);
#endif
#endif 0
	}
	printf("Finished creating the tree.\n");
	printf("Contents of tree follow:\n");
	bt_traverse(tree,visit,NULL);
	for (i = 0; i < LOOPS; i++)
	{
		printf("Iteration %d\n",i);
		printf("Seeking %d at %x\n",savedKeys[i].key,&savedKeys[i]);
		key = (KEY*)bt_search(tree,&savedKeys[i]);
		if (key != &savedKeys[i])
		{
			printf("Can't find %d, returned %x\n",savedKeys[i].key,
			       key);
		}
		printf("Freeing %d at %x\n",savedKeys[i].key,&savedKeys[i]);
		fflush(stdout);
		key = (KEY*)bt_delete(tree,&savedKeys[i]);
		if (key != &savedKeys[i])
		{
			printf("Problem freeing %d at %x, returned %x\n",
			       savedKeys[i].key,&savedKeys[i].key,key);
		}
		else
		{
			printf("Freed %d, returned %x\n",savedKeys[i].key,key);
		}
#ifdef DEBUGGING
	printf("Contents of tree:\n");
	bt_dump(tree,dump);
	fflush(stdout);
#endif 
		key = (KEY*)bt_search(tree,&savedKeys[i]);
		if (key == &savedKeys[i])
		{
			printf("Found %d after deletion\n",savedKeys[i].key);
		}
		key = (KEY*)bt_delete(tree,&savedKeys[i]);
		if (key != NULL)
		{
			printf("%d remained in tree after deletion at %x\n",
			       savedKeys[i].key,key);
		}
#if 0
#ifdef DEBUGGING
		printf("Contents of tree:\n");
		bt_dump(tree,dump);
#endif
#endif 0
	}
	(void) bt_malloc_verify();
	printf("Freeing tree\n");
	bt_destroy(tree,NULL);
	printf("Finished freeing tree\n");
	printf("Malloc verify:  ");
	if (bt_malloc_verify())
	{
		printf("success\n");
		fflush(stdout);
	}
	else
	{
		printf("failure\n");
		fflush(stdout);
		exit(6);
	}

	printf("Creating second tree\n");
	tree = (BTREE)bt_new(ORDER,comp);
	if (tree == NULL)
	{
		printf("Failed to create tree\n");
		exit(7);
	}
	(void) bt_malloc_verify();
	printf("Contents of tree:\n");
	bt_dump(tree,dump);

	for (i = 0; i < LOOPS; i++)
	{
		printf("Iteration %d\n",i+1);
		key = &savedKeys[i];
		key->key = rand();
		printf("Inserting %d\n",key->key);
		fflush(stdout);
		res = bt_insert(tree,key);
		if (res > 0)
		{
			printf("Inserted %d\n",key->key);
		}
		else if (res < 0)
		{
			printf("Duplicate key %d\n",key->key);
		}
		else
		{
			printf("Error inserting key %d\n");
			exit(8);
		}
#if 0
#ifdef DEBUGGING
		printf("Contents of tree:\n");
		bt_dump(tree,dump);
#endif
#endif 0
	}
	printf("Finished creating the tree.\n");
	printf("Contents of tree follow:\n");
	bt_traverse(tree,visit,NULL);
	printf("Contents of tree:\n");
	bt_dump(tree,dump);

	printf("Testing first and next\n");
	key = (KEY *) bt_first(tree);
	while (key != NULL)
	{
		dump(key);
		key = (KEY *) bt_next(tree);
	}
	printf("Backing up\n");
	key = (KEY *) bt_prev(tree);
	dump(key);

	printf("Testing last and prev\n");
	key = (KEY *) bt_last(tree);
	while (key != NULL)
	{
		dump(key);
		key = (KEY *) bt_prev(tree);
	}
	printf("Stepping forward\n");
	key = (KEY *) bt_next(tree);
	dump(key);

	printf("Testing search\n");
	key1 = (KEY *) bt_first(tree);
	key2 = (KEY *) bt_next(tree);
	key4 = (KEY *) bt_last(tree);
	key3 = (KEY *) bt_prev(tree);
	dump(key1);
	dump(key2);
	dump(key3);
	dump(key4);

	printf("Testing search at beginning\n");
	key = (KEY *) bt_search(tree,key1);
	if (key == key1)
	{
		printf("Successful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key2) printf("next okay\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,key1);
		key = (KEY *) bt_prev(tree);
		if (key == NULL) printf("prev Ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Successful search failed\n");
	}

	printf("Testing search at end\n");
	key = (KEY *) bt_search(tree,key4);
	if (key == key4)
	{
		printf("Successful search okay\n");
		key = (KEY *) bt_prev(tree);
		if (key == key3) printf("prev okay\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,key4);
		key = (KEY *) bt_next(tree);
		if (key == NULL) printf("next Ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Successful search failed\n");
	}

	printf("Testing first unsuccessful key\n");
	tkey.key = key1->key + 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key2) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == key1) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Testing second unsuccessful key\n");
	tkey.key = key4->key - 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key4) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == key3) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Testing third unsuccessful key\n");
	tkey.key = key1->key - 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key1) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == NULL) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Testing fourth unsuccessful key\n");
	tkey.key = key4->key + 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == NULL) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == key4) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Freeing 3/4 of the keys\n");
	for (i = 0; i < 3 * LOOPS / 4; i++)
	{
		printf("Iteration %d\n",i+1);
		key = &savedKeys[i];
		key1 = (KEY*) bt_delete(tree,key);
		if (key1 == key) printf("Deletion okay\n");
		else printf("Deletion error, wanted %x, got %x\n",key,key1);
		fflush(stdout);
#if 0
#ifdef DEBUGGING
		printf("Contents of tree:\n");
		bt_dump(tree,dump);
#endif
#endif 0
	}
	printf("Finished deleting keys.\n");
	printf("Contents of tree follow:\n");
	bt_traverse(tree,visit,NULL);
	printf("Contents of tree:\n");
	bt_dump(tree,dump);

	printf("Testing first and next\n");
	key = (KEY *) bt_first(tree);
	while (key != NULL)
	{
		dump(key);
		key = (KEY *) bt_next(tree);
	}
	printf("Backing up\n");
	key = (KEY *) bt_prev(tree);
	dump(key);

	printf("Testing last and prev\n");
	key = (KEY *) bt_last(tree);
	while (key != NULL)
	{
		dump(key);
		key = (KEY *) bt_prev(tree);
	}
	printf("Stepping forward\n");
	key = (KEY *) bt_next(tree);
	dump(key);

	printf("Testing search\n");
	key1 = (KEY *) bt_first(tree);
	key2 = (KEY *) bt_next(tree);
	key4 = (KEY *) bt_last(tree);
	key3 = (KEY *) bt_prev(tree);
	dump(key1);
	dump(key2);
	dump(key3);
	dump(key4);

	printf("Testing search at beginning\n");
	key = (KEY *) bt_search(tree,key1);
	if (key == key1)
	{
		printf("Successful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key2) printf("next okay\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,key1);
		key = (KEY *) bt_prev(tree);
		if (key == NULL) printf("prev Ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Successful search failed\n");
	}

	printf("Testing search at end\n");
	key = (KEY *) bt_search(tree,key4);
	if (key == key4)
	{
		printf("Successful search okay\n");
		key = (KEY *) bt_prev(tree);
		if (key == key3) printf("prev okay\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,key4);
		key = (KEY *) bt_next(tree);
		if (key == NULL) printf("next Ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Successful search failed\n");
	}

	printf("Testing first unsuccessful key\n");
	tkey.key = key1->key + 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key2) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == key1) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Testing second unsuccessful key\n");
	tkey.key = key4->key - 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key4) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == key3) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Testing third unsuccessful key\n");
	tkey.key = key1->key - 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == key1) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == NULL) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Testing fourth unsuccessful key\n");
	tkey.key = key4->key + 1;
	key = (KEY *) bt_search(tree,&tkey);
	if (key == NULL)
	{
		printf("Unsuccessful search okay\n");
		key = (KEY *) bt_next(tree);
		if (key == NULL) printf("next ok\n");
		else
		{
			printf("next bad\n");
			dump(key);
		}
		key = (KEY *) bt_search(tree,&tkey);
		key = (KEY *) bt_prev(tree);
		if (key == key4) printf("prev ok\n");
		else
		{
			printf("prev bad\n");
			dump(key);
		}
	}
	else
	{
		printf("Unsuccessful search failed\n");
	}

	printf("Freeing tree\n");
	bt_destroy(tree,NULL);
	printf("Finished freeing tree\n");
	printf("Malloc verify:  ");
	if (bt_malloc_verify()) printf("success\n");
	else printf("failure\n");
	fflush(stdout);

	exit(0);
}
SHAR_EOF
fi
exit 0
#	End of shell archive
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"

rwallace@vax1.tcd.ie (08/19/90)

In article <28772@athertn.Atherton.COM>, paul@athertn.Atherton.COM (Paul Sander) writes:
>>> There are a number of features missing from this implementation that could
>>> be useful.  These include Anders' secondary storage capability (which is
>>> what B-trees are used most for), perhaps with multi user locking.  An
>>> implementation that uses shared memory, again with multi-user
>>> locking might be useful.  Still another that is implemented as a server
>>> that could allow data to be shared across machines would be useful.
>>
>>As a matter of fact I think the module already has too much functionality.
> 
> These additional features need not be added to this particular implementation.
> Most of the new features in fact cannot be implemented with same the interfaces
> as the example.  Here, the programmer would have to choose the implementation
> that best suits their needs.
>
As far as I can see, what you're saying is that packages shouldn't' be made to
do
everything, they should be made expandable so that people can add the features
they need. I agree with you there. 
>>                                                                           The
>>main reason I don't reuse software components (and I program in C++ which
>>combines the advantages of being the best OOP/software engineering language
>>around with the vast range of available C code) is that people who write
>>generic modules put in everything expect the kitchen sink.
> 
> It is true that many such components include many features, more than most
> clients need.  There is a utility/size trade-off made.  However, if the
> component is carefully designed, and the development environment is decent,
> the size impact can be minimized; the linker and librarian would omit the
>
 unused code from the executable image.
Some of it but by no means all. Look at the source code for any reusable
component written by someone else. Think about a typical use, e.g. a screen
package for accounting software: it will have facilities for handling movable
resizable windows in graphics or text mode, with handling of carriage return /
linefeed characters, beep characters, probably ANSI codes to boot and most of
this will be in the core code. The linker can do a very limited amount to
eliminate unused code.
> 
>>                                                           At the moment I'm
>>working on a project which requires screen handling, file handling via indexes,
>>and file and in-memory sort functions. There are commercially available
>>packages available to do all these things. I ignored them and wrote my own for
>>two reasons:
>>
>>1. The commercial packages contained orders of magnitude more code than I need
>>for this project (a fourth-generation language for writing accounting
>>software). Their use would have expanded the size of applications produced with
>>the 4GL from around 100K to an estimated 400K (!!!).
>>
>>2. To wade through all the junk and figure out how to make the commercial
>>packages actually work would have taken nearly as long as writing my own code.
>>
>>The first problem can be solved by what I hope will be the next generation of
>>super-optimizing compilers that can examine all modules of a project
>>simultaneously and eliminate code that will never be used e.g.
>>
>>main () { printf ("Hello, World!\n"); }
>>
>>would end up as 100 bytes because the compiler would eliminate all the code
>>from printf to handle numbers etc.
> 
> These super-optimizing compilers are actually performing global analyses to
> use the CPU resources more efficiently;  allocating registers in a manner that
> is efficient for the program as a whole (as opposed to a manner that is
> efficient for a loop or function) is one such benefit, but not the only one.
> If the library is designed well, a new compiler isn't needed to dump the
> unused code.  On the other hand, the unused code is already there, waiting
> to be used should the need arise.
>
In the library yes. However for the reasons given above, with current compilers
it'll be in the executable as well, wasting space. 
> Printf is a special case because of its polymorphism.  Here, the compiler
> would have to recognize the printf call and parse the format string (assuming
> it is a constant pointer to a constant) to optimize the code.  Such
> optimizations wouldn't be done in many practical applications; national
> language support enabled applications come to mind here.
>

I don't agree. I think reusable software components could be useful in most
applications, but to produce acceptable efficiency such compilers need to come
into general use.

>>The second problem requires an attitude change.
> 
> And good documentation, with a good index.
> 
> How many projects like your current one have you done?  How much code have
> you (or others) rewritten?  Have you ever turned up bugs in related
> applications where you had to make the same bug fix in many places?  There
> are three big wins you get from using reusable packages:  Your software will
> (theoretically) be developed faster, because less needs to be written from
> scratch; Your software will (theoretically) be more bug free because it is
> built from (theoretically) bug free components; Your program will
> (theoretically) be easier to maintain because bug fixes need only be made in
> one place.  Some might even argue that your program would be "better"
> (faster, smaller) because (theoretically) the implementor was expert at
> implementing the thing at hand, and was able to make various optimizations
> you wouldn't happen to think of.  Finally, come might argue that your
> program is more portable, because the component has been ported to many
> platforms (malloc and printf are handy examples of this).
> 
> What do you thi

By and large the documentation for modern development software is excellent,
but no matter how good the documentation you're going to have to do some
experimentation and the more complicated the product the longer it'll take to
figure out how to use it.

Admittedly this is by far the biggest project I've ever worked on but my
answers to your points are as follows:

1. _The_ advantage of reusable software components is that it reduces software
development time, at least if the tendency to overcomplicate things is
controlled. As I explained, this advantage is becoming less and less relevant
as developers put more and more unnecessary junk into their products.

2. I can make my routines as bug free as a reusable component developer, and
consider: If there's a bug in one of my routines, I can fix it on the spot. If
there's a bug in a routine supplied by someone else to which I don't have the
source, I have to wait for an upgrade.

3. Two examples: First, screen display routines. My routines are much more
efficent than anyone elses because mine just transfer data direct to video RAM
while other people's routines have to cope with things like linefeed and beep
characters.

Second, sort routines. Admittedly the third party routine under consideration
was faster than mine for merge sorts too big to fit into memory, because they'd
spent a lot of time optimizing the algorithm. However for sorts that fit into
memory (almost all in the application in question) my routine is faster because
there's less junk in it.

To answer your last point: malloc() and printf() are part of the language
standard. Third party packages may only be available on some systems. This
project is being developed on MS-DOS with a view to porting to Unix. What if I
used a third-party routine not available on Unix? My own routines can be ported
to Unix in a few hours (at least in theory :-)).

From the binary tree package which was originally under discussion: I can't
remember most of what was in it but I think the facility for N-way trees
should be removed because it won't be used in most cases (none that I can think 
of) and it reduces efficiency. (After all the whole point of binary trees is
their efficiency compared to linked lists!).

Sorry if this posting is a bit garbled, I'm using terminal emulator softwar 
which doesn't work very well with this editor.

"To summarize the summary of the summary: people are a problem"
Russell Wallace, Trinity College, Dublin
rwallace@vax1.tcd.ie

edwards@IDA.ORG (Steve Edwards) (08/21/90)

Well, I've been listening to this discussion from the sidelines for a
while, but I finally decided to interject a few thoughts.

In article <6775.26cdc403@vax1.tcd.ie>, rwallace@vax1.tcd.ie writes:

> In the library yes. However for the reasons given above, with current
> compilers it'll [extra functionality] be in the executable as well,
> wasting space.

True.  Of course, one man's wasted space is another's investment in
future expandability.  Isn't writing a custom version of the component
to address this problem ``wasted effort''?  It all depends on your
perspective.

It is very important to know that there are many ``costs'' that come
along with reusable software.  The extra code it may insert into your
image, the extra execution time it may impose over a more optimized
version of the same module written for a specific application, the
extra complexity added to the interface by code and functionality
you don't need at the moment, and so on.  Tools (like the smart
linkers discussed in the above message) can help to reduce some of these
costs, but they certainly don't make the central difficulty go away.
Deciding whether to reuse a specific component or write a custom
version is a tradeoff--do the costs of reusing outweigh the benefits
you get from reusing (measured in savings, of course)?  I'll
discuss the benefits a little bit more below.

> I don't agree. I think reusable software components could be useful in most
> applications, but to produce acceptable efficiency such compilers need
> to come into general use.

Again, what is ``acceptable''?  The important point is that it's
_relative_ to the tradeoff mentioned above.  The efficiency penalty
is a cost.  What are the real efficiency requirements for the thing
your building?  Those requirements may drive strict boundaries on how
much inefficiency (in time, space, or whatever resource you choose to
name) you can ``accept.''  Then, its a matter of whether that efficiency
penalty buys you a ``useful'' amount of benefit from reusing.  Again,
``useful'' is relative as well.

> 1. _The_ advantage of reusable software components is that it reduces
> software development time, at least if the tendency to overcomplicate
> things is controlled. As I explained, this advantage is becoming less
> and less relevant as developers put more and more unnecessary junk
> into their products.

This is the central statement with which I disagree.  I think _way_ too
much emphasis has been placed on the development savings that reuse
offers.  In fact, the _real_ cost of developing software on a large
scale isn't development at all (particularly not the cost of writing
the code).  It's _maintenance_.

Sure, reuse allows you to ``share'' the cost of developing a component
with other people (particularly the guy who wrote it originally), and
you come out on the winning end of this deal.  But for large projects,
the maintenance cost really dwarfs the development effort (typical
estimates for large projects place maintenance at 70% of total life
cycle costs, sometimes more).  But reuse also lets you ``share'' the
cost of finding and repairing bugs (if all the right reuse library
mechanisms are in place, of course), and of performing maintenance.  This
sharing can be spread over all past, current, and future users.  Thus,
over time reusable components become much more robust and bug free because
of the accumulating investment in their maintenance.  And new users get
to share in the benefits of this investment.

Although many developers in smaller projects fail to see these
ramifications, I believe they are the most important benefits reuse
has to offer.  And more importantly, they have to be figured into the
``to reuse or not to reuse'' tradeoff before you decide whether or not
to reuse.

> 2. I can make my routines as bug free as a reusable component
> developer, and consider: If there's a bug in one of my routines, I
> can fix it on the spot. If there's a bug in a routine supplied by
> someone else to which I don't have the source, I have to wait for
> an upgrade.

_Given enough $$$$_, this is true.  The point is that reuse provides the
opportunity for all the reusers of a component (past and present) to,
in effect, pool their maintenance resources to get this level of quality.
While I'm sure you're capable of debugging a component as well as the
next guy, how much money can your project pour into debugging (during
development), and then how much more during maintenance?  Why is this
duplication of effort (if an acceptable component is available)
necessary?

It's always possible that for any given module, the tradeoff will swing
towards writing a custom version.  In fact, in the MSDOS world, where
efficiency and size are premium concerns and few good components are
available, this is likely to be the most common case today.  But you
have to actually think about everything involved and make an _educated_
tradeoff.  It's not good engineering to make that decision
``just because'' ...

Your second point, about availability of source code, is a different
issue.  It's specific to a given component and a given vendor, and
has to be taken into account when deciding whether to reuse
_that component_.

> 3. Two examples: First, screen display routines. My routines are much more
> efficent than anyone elses because mine just transfer data direct to video
> RAM while other people's routines have to cope with things like linefeed
> and beep characters.

Again, is this efficiency _required_? [I'm not asking this about your
particular application specifically, just trying to make a point]  If it
is, then that's an important factor in deciding whether to reuse.  If
_not_, then it _should not_ drive the decision alone!  Again, remember
that in the MSDOS world, things can be very different than in a different
application environment.  Just because certain requirements exist in
one doesn't mean those requirements (for image size, for example) are
universal.

Well, I hope this didn't sound too much like a flame--I didn't intend
it to be one.  I was just trying to provide a few more ideas for
discussion.

Since I haven't posted here recently, I'll also include just a little
bit about myself so that you know where my ideas are coming from.  I'm an
Ada programmer, although I programmed in C years before that.  I've
participated in many small/medium-size Ada projects (i.e., each about
50KSLOC), all of which included at least some reusable components. I'm
also currently working on a paper about software reuse in Ada, including
many of these ideas, as part of a government-sponsored research project.
Needless to say, these opinions are mine, and not my company's
(or our sponsor's).

		    -- Steve

-------------------------------
Stephen Edwards
Institute for Defense Analyses
5111 Leesburg Pike
Suite 300
Falls Church, VA 22041
(703)845-3536
edwards@ida.org

rwallace@vax1.tcd.ie (08/26/90)

In article <1990Aug21.144152.3513@IDA.ORG>, edwards@IDA.ORG (Steve Edwards) writes:
> In article <6775.26cdc403@vax1.tcd.ie>, rwallace@vax1.tcd.ie writes:
> 
>> In the library yes. However for the reasons given above, with current
>> compilers it'll [extra functionality] be in the executable as well,
>> wasting space.
> 
> True.  Of course, one man's wasted space is another's investment in
> future expandability.  Isn't writing a custom version of the component
> to address this problem ``wasted effort''?  It all depends on your
> perspective.
> 
> It is very important to know that there are many ``costs'' that come
> along with reusable software.  The extra code it may insert into your
> image, the extra execution time it may impose over a more optimized
> version of the same module written for a specific application, the
> extra complexity added to the interface by code and functionality
> you don't need at the moment, and so on.  Tools (like the smart
> linkers discussed in the above message) can help to reduce some of these
> costs, but they certainly don't make the central difficulty go away.
> Deciding whether to reuse a specific component or write a custom
> version is a tradeoff--do the costs of reusing outweigh the benefits
> you get from reusing (measured in savings, of course)?  I'll
> discuss the benefits a little bit more below.

True, and usually development time is more important than efficiency. However
in this case we're talking about a very small penalty in development time
(since it didn't take much longer to write my own routines than it would have
taken to figure out how to use third party ones) and a ridiculously huge
penalty in executable file size (rouch estimate is an increase from 100K to
400K, with corresponding losses in speed because the routines are slower and
you've got less memory to play with for disk caching etc.)

 

>> 1. _The_ advantage of reusable software components is that it reduces
>> software development time, at least if the tendency to overcomplicate
>> things is controlled. As I explained, this advantage is becoming less
>> and less relevant as developers put more and more unnecessary junk
>> into their products.
> 
> This is the central statement with which I disagree.  I think _way_ too
> much emphasis has been placed on the development savings that reuse
> offers.  In fact, the _real_ cost of developing software on a large
> scale isn't development at all (particularly not the cost of writing
> the code).  It's _maintenance_.

> 

> Sure, reuse allows you to ``share'' the cost of developing a component
> with other people (particularly the guy who wrote it originally), and
> you come out on the winning end of this deal.  But for large projects,
> the maintenance cost really dwarfs the development effort (typical
> estimates for large projects place maintenance at 70% of total life
> cycle costs, sometimes more).  But reuse also lets you ``share'' the
> cost of finding and repairing bugs (if all the right reuse library
> mechanisms are in place, of course), and of performing maintenance.  This
> sharing can be spread over all past, current, and future users.  Thus,
> over time reusable components become much more robust and bug free because
> of the accumulating investment in their maintenance.  And new users get
> to share in the benefits of this investment.
> 

Writing my own routines substantially improves maintenance time because I know
exactly what the routines do, they're custom written for this application and
the application code doesn't have to be forced into awkward contortions to
work with them, and most important they're much more reliable because they
contain much less code.
> 
>> 2. I can make my routines as bug free as a reusable component
>> developer, and consider: If there's a bug in one of my routines, I
>> can fix it on the spot. If there's a bug in a routine supplied by
>> someone else to which I don't have the source, I have to wait for
>> an upgrade.
> 
> _Given enough $$$$_, this is true.  The point is that reuse provides the
> opportunity for all the reusers of a component (past and present) to,
> in effect, pool their maintenance resources to get this level of quality.
> While I'm sure you're capable of debugging a component as well as the
> next guy, how much money can your project pour into debugging (during
> development), and then how much more during maintenance?  Why is this
> duplication of effort (if an acceptable component is available)
> necessary?
> 

Maintenance and debugging effort increase as something like the square or cube
of the number of lines of source code involved. My routines contain _orders of
magnitude_ less code than their third party equivalents (e.g. all my screen
handling routines for overlapping windows, data entry and verification,
scrolling etc. are only a few hundred lines of code; a normal third party
package would be at least several thousand). The effort of debugging my
routines is completely trivial because they're so simple. This improvement in
reliability is another reason why writing my own routines reduces maintenance
costs.

"To summarize the summary of the summary: people are a problem"
Russell Wallace, Trinity College, Dublin
rwallace@vax1.tcd.ie