paj@uk.co.gec-mrc (Paul Johnson) (04/11/91)
I would like to propose a guideline for designing abstract data types (ADTs). It is When writing an ADT, pick a minimum spanning set of operations and define all other operations in terms of that set. Then include a list of these operations in the documentation. ADTs are things like sets, lists, hash tables, arrays, matrices, and complex numbers, as opposed to the more concrete objects like airplanes or hire_cars. There is no sharp boundary, but some things have the ADT nature more than others. By `minimum spanning set' I mean the smallest set of operations necessary to support the functionality of the class. The justification for this rule is that subsequent reuse of a class may include a descendant which must implement new invarients. These may require the interception of operations on the instances in order to maintain these invarients. If a minimum spanning set of operations is used consistently then it is much easier to do this, since only the relevant routines in the spanning set need to be redefined. For example, consider the ISE list hierarchy. The deferred class LIST has only a few deferred methods. All the position features are defined in terms of `count' and `position', so `offright' returns True if `position > count'. Movement is (mostly) defined in terms of `go_offleft', `go_offright' and `forth'. This means that subsequent implementors only need to write a few operations. On the other hand the LINKED_LIST class does not follow this philosophy. Suppose I want a list of REALs which also includes statistics such as average and standard deviation. I define a class STATISTIC_LIST which inherits from LINKED_LIST[REAL] and also defines the features `average', `standard_deviation', `sigma_x' and `sigma_x_squared'. In my implementation `sigma_x' and `sigma_x_squared' are variables rather than functions, since functions would have to walk the list for each evaluation. Every time the list is changed (element changed, new element inserted, element deleted) these variables will have to be updated. This presents a problem because the update operations in LINKED_LIST are all implemented separately. The minimum spanning set for updates is `put', `put_right' and `remove', but these are not used. Hence I must go through LINKED_LIST re-implementing all the update methods. One problem with the spanning set concept is the requirement for efficiency. As an example, take the merging of linked lists. Using the minimum spanning set, each item in the source list would have to be inserted into the target list and then removed from the source. This would make the procedure an O(n) operation. Instead the lists are simply spliced together through their links, a much quicker operation. (Detailed point: it is still necessary to zip through the source list in order to get to the last element, but even so the operation is still much faster than with the minimum spanning set) The answer to this point is that the minimum spanning set algorithm should be used in an ancestor class (such as D_LIST or L_LIST) and then redefined for efficiency in the effective classes while the original routines are renamed and not exported. Hence the minimum spanning set algorithm is still available if it is required in subsequent descendants. If LINKED_LIST had followed this philosophy then I could create my STATISTIC_LIST by redefining `put', `put_right' and `remove', and then renaming and redefining in the `inherit' clause to bring back the original spanning set implementations of the other methods. (At the International Eiffel Users Conference in Paris, Dr. Meyer said that Eiffel 3 will include `synonym' and `frozen' constructs to help with the management of features in descendant classes.) Paul. Paul Johnson UUCP: <world>!mcvax!ukc!gec-mrc!paj --------------------------------!-------------------------|------------------- GEC-Marconi Research is not | Telex: 995016 GECRES G | Tel: +44 245 73331 responsible for my opinions. | Inet: paj@gec-mrc.co.uk | Fax: +44 245 75244