[comp.sw.components] How to design a reusable component?

paul@athertn.Atherton.COM (Paul Sander) (08/30/90)

Now that the "What is a reusable software component?" thread has died down,
how about starting a new topic about the issues involved in designing a good
one?

I can think of several that developers should consider; some of the more
obvious ones are: If it's a data structure, it should be able to store anything
a client can throw at it (ideally); its interface should be as simple as
possible; it should contain a minimal but complete set of operations, from
which other more complex operations should be built; it should be portable;
is the "reuse" done a compile time, link time, or run time? Other issues that
might not be as obvious are:  Consideration of how the client might glue
several components together; how components might be catalogged and providing
an automatic way of installing the documentation into the appropriate browsing
and cross-referencing tools; how to flatten the learning curve for a set of
related components (i.e. designing interfaces so that knowledge of one
component can be applied to another); provide a way of automatically trimming
capabilities that aren't needed by particular clients.

Taking the B-tree example used in the previous thread, there are some
deficiencies in the interface that show up when it is used in a larger
data structure.  Let's say, for example, that I wish to build a symbol
table, using the tree as an index for some identifiers whose symbols I wish
to store.  It's easy to build the symbol table on top of the tree, but
suppose I find that the vast majority of calls to the symbol table are
random access, retrieving symbols, and the symbols never need to be sorted.
To improve performance, I'd like to replace the B-tree with a hash table,
but I don't want to rewrite the symbol table code.  How can the symbol
table be designed to make it easy make the swap?

There are a number of ways I can think of to glue the stuff together:
- Write the symbol table so that it calls generic functions to access the
  index structure.  This makes the symbol table library totally separate
  from the index, but the client must link an additional "wrapper" between
  the symbol table and the index.  Also, only one type of index may be used per
  program; if I need two symbol tables, both must the same index.  It also
  weakens the client's option to simplify the linking process by creating
  its own libraries.

- Write all suitable index structures with a uniform interface.  This makes
  it possible to swap index structures at link time, and eliminates the
  additional wrapper that the client must provide.  But it also means that
  every developer of any kind of indexing structure must provide an interface
  that can be used by the symbol table.  It also means that implementation
  details that may be exploited for performance are hidden from the client.

- Produce a different symbol table component for each index structure
  available.  Again, the client may easily swap out the indices; in fact, it
  doesn't even need to know what kind of index is built in.  But then the
  symbol table code is needlessly replicated if two or more symbol tables
  use different indices; there is also a higher likelihood of namespace
  collisions here.

- Register callback functions with the symbol table that are called when the
  index is accessed.  Again, the client must provide appropriate wrappers,
  but no strange things need to be done at link time.  It also complicates
  the symbol table interface, and additional components built on top of the
  symbol table must pass the callbacks through to their interfaces.
  Performance trade-offs are made in the wrappers, but they may need additional
  global information to do the job (e.g.  this symbol table uses a B-tree
  of order m, that symbol table uses a hash table and uses hash function H1,
  this other symbol table uses another hash table and uses hash function H2,
  and so on).  The global state may be in the form of nearly replicated
  callback functions, or perhaps configuration variables that are set up
  before the callbacks are invoked.

- Pass a configuration structure to all reusable components.  This structure
  may be different for each type of component, but any such structure must
  be able to contain (or refer to) any other.  Such structures may contain
  B-tree node sizes, hash functions, pointers to functions that insert keys
  into indices, etc.  The calling interfaces to each component are simple
  (only one additional parameter is added), and the linking process is
  straightforward, but setting up the configuration structures may be
  complicated.  Several such structures may be needed for various calls to
  a component (e.g. creation may require a different structure than traversal;
  creation is done once, so the information stored in the structure can be
  copied somehow, but the traversal configuration is much more volatile).

Which of these methods have people used in the past?  Which have worked best,
and which have worked worst?  Are there any others people have tried that
aren't listed here?
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"

welch-l@bowling.cis.ohio-state.edu (Lonnie R Welch) (09/04/90)

In article <29597@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes:
>
>Now that the "What is a reusable software component?" thread has died down,
>how about starting a new topic about the issues involved in designing a good
>one?
>
>I can think of several that developers should consider; some of the more
>obvious ones are: If it's a data structure, it should be able to store anything
>a client can throw at it (ideally); its interface should be as simple as
>possible; it should contain a minimal but complete set of operations, from
>which other more complex operations should be built; it should be portable;
>is the "reuse" done a compile time, link time, or run time? Other issues that
>might not be as obvious are:  Consideration of how the client might glue
>several components together; how components might be catalogged and providing
>an automatic way of installing the documentation into the appropriate browsing
>and cross-referencing tools; how to flatten the learning curve for a set of
>related components (i.e. designing interfaces so that knowledge of one
>component can be applied to another); provide a way of automatically trimming
>capabilities that aren't needed by particular clients.
>
>Taking the B-tree example used in the previous thread, there are some
>deficiencies in the interface that show up when it is used in a larger
>data structure.  Let's say, for example, that I wish to build a symbol
>table, using the tree as an index for some identifiers whose symbols I wish
>to store.  It's easy to build the symbol table on top of the tree, but
>suppose I find that the vast majority of calls to the symbol table are
>random access, retrieving symbols, and the symbols never need to be sorted.
>To improve performance, I'd like to replace the B-tree with a hash table,
>but I don't want to rewrite the symbol table code.  How can the symbol
>table be designed to make it easy make the swap?
>
>There are a number of ways I can think of to glue the stuff together:
>
>Which of these methods have people used in the past?  Which have worked best,
>and which have worked worst?  Are there any others people have tried that
>aren't listed here?


The suggestions that you made seem reasonable. Here is another way to approach
the problem:

For each reusable component create two parts: 

	(1) a formal specification (conceptualization) of its signature 
	    and of its behavior

	(2) one or more implementations (realizations)


A component instance could be created by choosing a conceptualization
and then specifying a realization. Both the conceptualization and the
realization can be parameterized, so instantiation of a component
would include fixing the parameters.

E.g., a symbol table component could be specified by modelling the type it
provides as a function from domain_type to range_type. The behavior of the
operations on a variable of type symbol table could be specified in terms of
the function. 

The conceptualization of the symbol table would be parameterized by
the domain_type and the range_type.

There would be (at least) two realizations of the component---one that
uses a b-tree and the other that uses a hash table.  One realization
would be parameterized with an instance of the b-tree module, and the
other would be parameterized with an instance of the module providing
the hash function.

To change representations of the symbol table, you would change the
realization chosen where the component providing the type symbol table
is instantiated. Multiple symbol tables could be used in a program,
with each symbol table containing different types of items and with
each implemented differently; this could be achieved by instantiating
the symbol table component with different parameters and different
realizations.

I hope this is helpful. I don't know if there is currently a language
that provides these capabilities, but if one did exist, I think that
reuse would become easier.

--Lonnie Welch