paul@athertn.Atherton.COM (Paul Sander) (08/30/90)
Now that the "What is a reusable software component?" thread has died down,
how about starting a new topic about the issues involved in designing a good
one?
I can think of several that developers should consider; some of the more
obvious ones are: If it's a data structure, it should be able to store anything
a client can throw at it (ideally); its interface should be as simple as
possible; it should contain a minimal but complete set of operations, from
which other more complex operations should be built; it should be portable;
is the "reuse" done a compile time, link time, or run time? Other issues that
might not be as obvious are: Consideration of how the client might glue
several components together; how components might be catalogged and providing
an automatic way of installing the documentation into the appropriate browsing
and cross-referencing tools; how to flatten the learning curve for a set of
related components (i.e. designing interfaces so that knowledge of one
component can be applied to another); provide a way of automatically trimming
capabilities that aren't needed by particular clients.
Taking the B-tree example used in the previous thread, there are some
deficiencies in the interface that show up when it is used in a larger
data structure. Let's say, for example, that I wish to build a symbol
table, using the tree as an index for some identifiers whose symbols I wish
to store. It's easy to build the symbol table on top of the tree, but
suppose I find that the vast majority of calls to the symbol table are
random access, retrieving symbols, and the symbols never need to be sorted.
To improve performance, I'd like to replace the B-tree with a hash table,
but I don't want to rewrite the symbol table code. How can the symbol
table be designed to make it easy make the swap?
There are a number of ways I can think of to glue the stuff together:
- Write the symbol table so that it calls generic functions to access the
index structure. This makes the symbol table library totally separate
from the index, but the client must link an additional "wrapper" between
the symbol table and the index. Also, only one type of index may be used per
program; if I need two symbol tables, both must the same index. It also
weakens the client's option to simplify the linking process by creating
its own libraries.
- Write all suitable index structures with a uniform interface. This makes
it possible to swap index structures at link time, and eliminates the
additional wrapper that the client must provide. But it also means that
every developer of any kind of indexing structure must provide an interface
that can be used by the symbol table. It also means that implementation
details that may be exploited for performance are hidden from the client.
- Produce a different symbol table component for each index structure
available. Again, the client may easily swap out the indices; in fact, it
doesn't even need to know what kind of index is built in. But then the
symbol table code is needlessly replicated if two or more symbol tables
use different indices; there is also a higher likelihood of namespace
collisions here.
- Register callback functions with the symbol table that are called when the
index is accessed. Again, the client must provide appropriate wrappers,
but no strange things need to be done at link time. It also complicates
the symbol table interface, and additional components built on top of the
symbol table must pass the callbacks through to their interfaces.
Performance trade-offs are made in the wrappers, but they may need additional
global information to do the job (e.g. this symbol table uses a B-tree
of order m, that symbol table uses a hash table and uses hash function H1,
this other symbol table uses another hash table and uses hash function H2,
and so on). The global state may be in the form of nearly replicated
callback functions, or perhaps configuration variables that are set up
before the callbacks are invoked.
- Pass a configuration structure to all reusable components. This structure
may be different for each type of component, but any such structure must
be able to contain (or refer to) any other. Such structures may contain
B-tree node sizes, hash functions, pointers to functions that insert keys
into indices, etc. The calling interfaces to each component are simple
(only one additional parameter is added), and the linking process is
straightforward, but setting up the configuration structures may be
complicated. Several such structures may be needed for various calls to
a component (e.g. creation may require a different structure than traversal;
creation is done once, so the information stored in the structure can be
copied somehow, but the traversal configuration is much more volatile).
Which of these methods have people used in the past? Which have worked best,
and which have worked worst? Are there any others people have tried that
aren't listed here?
--
Paul Sander (408) 734-9822 | "Passwords are like underwear," she said,
paul@Atherton.COM | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"welch-l@bowling.cis.ohio-state.edu (Lonnie R Welch) (09/04/90)
In article <29597@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes: > >Now that the "What is a reusable software component?" thread has died down, >how about starting a new topic about the issues involved in designing a good >one? > >I can think of several that developers should consider; some of the more >obvious ones are: If it's a data structure, it should be able to store anything >a client can throw at it (ideally); its interface should be as simple as >possible; it should contain a minimal but complete set of operations, from >which other more complex operations should be built; it should be portable; >is the "reuse" done a compile time, link time, or run time? Other issues that >might not be as obvious are: Consideration of how the client might glue >several components together; how components might be catalogged and providing >an automatic way of installing the documentation into the appropriate browsing >and cross-referencing tools; how to flatten the learning curve for a set of >related components (i.e. designing interfaces so that knowledge of one >component can be applied to another); provide a way of automatically trimming >capabilities that aren't needed by particular clients. > >Taking the B-tree example used in the previous thread, there are some >deficiencies in the interface that show up when it is used in a larger >data structure. Let's say, for example, that I wish to build a symbol >table, using the tree as an index for some identifiers whose symbols I wish >to store. It's easy to build the symbol table on top of the tree, but >suppose I find that the vast majority of calls to the symbol table are >random access, retrieving symbols, and the symbols never need to be sorted. >To improve performance, I'd like to replace the B-tree with a hash table, >but I don't want to rewrite the symbol table code. How can the symbol >table be designed to make it easy make the swap? > >There are a number of ways I can think of to glue the stuff together: > >Which of these methods have people used in the past? Which have worked best, >and which have worked worst? Are there any others people have tried that >aren't listed here? The suggestions that you made seem reasonable. Here is another way to approach the problem: For each reusable component create two parts: (1) a formal specification (conceptualization) of its signature and of its behavior (2) one or more implementations (realizations) A component instance could be created by choosing a conceptualization and then specifying a realization. Both the conceptualization and the realization can be parameterized, so instantiation of a component would include fixing the parameters. E.g., a symbol table component could be specified by modelling the type it provides as a function from domain_type to range_type. The behavior of the operations on a variable of type symbol table could be specified in terms of the function. The conceptualization of the symbol table would be parameterized by the domain_type and the range_type. There would be (at least) two realizations of the component---one that uses a b-tree and the other that uses a hash table. One realization would be parameterized with an instance of the b-tree module, and the other would be parameterized with an instance of the module providing the hash function. To change representations of the symbol table, you would change the realization chosen where the component providing the type symbol table is instantiated. Multiple symbol tables could be used in a program, with each symbol table containing different types of items and with each implemented differently; this could be achieved by instantiating the symbol table component with different parameters and different realizations. I hope this is helpful. I don't know if there is currently a language that provides these capabilities, but if one did exist, I think that reuse would become easier. --Lonnie Welch