paul@athertn.Atherton.COM (Paul Sander) (08/30/90)
Now that the "What is a reusable software component?" thread has died down, how about starting a new topic about the issues involved in designing a good one? I can think of several that developers should consider; some of the more obvious ones are: If it's a data structure, it should be able to store anything a client can throw at it (ideally); its interface should be as simple as possible; it should contain a minimal but complete set of operations, from which other more complex operations should be built; it should be portable; is the "reuse" done a compile time, link time, or run time? Other issues that might not be as obvious are: Consideration of how the client might glue several components together; how components might be catalogged and providing an automatic way of installing the documentation into the appropriate browsing and cross-referencing tools; how to flatten the learning curve for a set of related components (i.e. designing interfaces so that knowledge of one component can be applied to another); provide a way of automatically trimming capabilities that aren't needed by particular clients. Taking the B-tree example used in the previous thread, there are some deficiencies in the interface that show up when it is used in a larger data structure. Let's say, for example, that I wish to build a symbol table, using the tree as an index for some identifiers whose symbols I wish to store. It's easy to build the symbol table on top of the tree, but suppose I find that the vast majority of calls to the symbol table are random access, retrieving symbols, and the symbols never need to be sorted. To improve performance, I'd like to replace the B-tree with a hash table, but I don't want to rewrite the symbol table code. How can the symbol table be designed to make it easy make the swap? There are a number of ways I can think of to glue the stuff together: - Write the symbol table so that it calls generic functions to access the index structure. This makes the symbol table library totally separate from the index, but the client must link an additional "wrapper" between the symbol table and the index. Also, only one type of index may be used per program; if I need two symbol tables, both must the same index. It also weakens the client's option to simplify the linking process by creating its own libraries. - Write all suitable index structures with a uniform interface. This makes it possible to swap index structures at link time, and eliminates the additional wrapper that the client must provide. But it also means that every developer of any kind of indexing structure must provide an interface that can be used by the symbol table. It also means that implementation details that may be exploited for performance are hidden from the client. - Produce a different symbol table component for each index structure available. Again, the client may easily swap out the indices; in fact, it doesn't even need to know what kind of index is built in. But then the symbol table code is needlessly replicated if two or more symbol tables use different indices; there is also a higher likelihood of namespace collisions here. - Register callback functions with the symbol table that are called when the index is accessed. Again, the client must provide appropriate wrappers, but no strange things need to be done at link time. It also complicates the symbol table interface, and additional components built on top of the symbol table must pass the callbacks through to their interfaces. Performance trade-offs are made in the wrappers, but they may need additional global information to do the job (e.g. this symbol table uses a B-tree of order m, that symbol table uses a hash table and uses hash function H1, this other symbol table uses another hash table and uses hash function H2, and so on). The global state may be in the form of nearly replicated callback functions, or perhaps configuration variables that are set up before the callbacks are invoked. - Pass a configuration structure to all reusable components. This structure may be different for each type of component, but any such structure must be able to contain (or refer to) any other. Such structures may contain B-tree node sizes, hash functions, pointers to functions that insert keys into indices, etc. The calling interfaces to each component are simple (only one additional parameter is added), and the linking process is straightforward, but setting up the configuration structures may be complicated. Several such structures may be needed for various calls to a component (e.g. creation may require a different structure than traversal; creation is done once, so the information stored in the structure can be copied somehow, but the traversal configuration is much more volatile). Which of these methods have people used in the past? Which have worked best, and which have worked worst? Are there any others people have tried that aren't listed here? -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"
welch-l@bowling.cis.ohio-state.edu (Lonnie R Welch) (09/04/90)
In article <29597@athertn.Atherton.COM> paul@athertn.Atherton.COM (Paul Sander) writes: > >Now that the "What is a reusable software component?" thread has died down, >how about starting a new topic about the issues involved in designing a good >one? > >I can think of several that developers should consider; some of the more >obvious ones are: If it's a data structure, it should be able to store anything >a client can throw at it (ideally); its interface should be as simple as >possible; it should contain a minimal but complete set of operations, from >which other more complex operations should be built; it should be portable; >is the "reuse" done a compile time, link time, or run time? Other issues that >might not be as obvious are: Consideration of how the client might glue >several components together; how components might be catalogged and providing >an automatic way of installing the documentation into the appropriate browsing >and cross-referencing tools; how to flatten the learning curve for a set of >related components (i.e. designing interfaces so that knowledge of one >component can be applied to another); provide a way of automatically trimming >capabilities that aren't needed by particular clients. > >Taking the B-tree example used in the previous thread, there are some >deficiencies in the interface that show up when it is used in a larger >data structure. Let's say, for example, that I wish to build a symbol >table, using the tree as an index for some identifiers whose symbols I wish >to store. It's easy to build the symbol table on top of the tree, but >suppose I find that the vast majority of calls to the symbol table are >random access, retrieving symbols, and the symbols never need to be sorted. >To improve performance, I'd like to replace the B-tree with a hash table, >but I don't want to rewrite the symbol table code. How can the symbol >table be designed to make it easy make the swap? > >There are a number of ways I can think of to glue the stuff together: > >Which of these methods have people used in the past? Which have worked best, >and which have worked worst? Are there any others people have tried that >aren't listed here? The suggestions that you made seem reasonable. Here is another way to approach the problem: For each reusable component create two parts: (1) a formal specification (conceptualization) of its signature and of its behavior (2) one or more implementations (realizations) A component instance could be created by choosing a conceptualization and then specifying a realization. Both the conceptualization and the realization can be parameterized, so instantiation of a component would include fixing the parameters. E.g., a symbol table component could be specified by modelling the type it provides as a function from domain_type to range_type. The behavior of the operations on a variable of type symbol table could be specified in terms of the function. The conceptualization of the symbol table would be parameterized by the domain_type and the range_type. There would be (at least) two realizations of the component---one that uses a b-tree and the other that uses a hash table. One realization would be parameterized with an instance of the b-tree module, and the other would be parameterized with an instance of the module providing the hash function. To change representations of the symbol table, you would change the realization chosen where the component providing the type symbol table is instantiated. Multiple symbol tables could be used in a program, with each symbol table containing different types of items and with each implemented differently; this could be achieved by instantiating the symbol table component with different parameters and different realizations. I hope this is helpful. I don't know if there is currently a language that provides these capabilities, but if one did exist, I think that reuse would become easier. --Lonnie Welch