bertrand@eiffel.UUCP (Bertrand Meyer) (05/10/89)
An important aspect of object-oriented design of reusable components is the proper choice of names for exported features of each class. The basic rule is that these names should be both simple (which usually implies that they should be short) and chosen according to consistent conventions.

One consequence is that one should resist the temptation to over-qualify names. For example, a procedure for inserting elements into a dictionary should not be called ``insert_in_dictionary'' or ``dictionary_insert'', but (barring any better choice, as discussed below) just ``insert''.

This would not necessarily be true in a less typed language because of ambiguities and errors that might result if the same simple names (insert, delete, put, ...) are used in many different classes. In Eiffel, however, typing averts these problems. When you see

        d.insert (...)

the type of d (as declared in the class in which this appears) immediately tells you which ``insert'' is meant.

These ideas were applied to the design of the Basic Eiffel Library. We recently took a closer look at naming conventions for the library, however, especially after some criticisms were made regarding their consistency (see the presentation by John Anderson of Cognos at the recent Eiffel conference in Paris).

For version 2.2 we have decided to take an extremist approach to name consistency by focusing on a small number of names, especially for ``container'' classes (those which describe data structures used as repositories of objects, such as sets, arrays, lists etc.). Examples of these basic names are

        at      (for accessing an element)

        put     (for inserting an element)

        force   (same as ``put'', but will work in cases in which
                put might have failed; for arrays, for example,
                put only works for indices between the current
                bounds, whereas force applied to an out-of-bounds
                index will silently resize the array. This feature
                of arrays was previously called ``enter_force'')

and so on.
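
The difference between ``put'' and ``force'' on arrays can be sketched as follows. This is a minimal illustration in Python, not the library's Eiffel code: the class name BoundedArray, the (value, index) argument order, and the use of IndexError in place of a violated precondition are all assumptions made for the example.

```python
class BoundedArray:
    """Hypothetical array with explicit integer bounds (illustration only)."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper
        self._items = [None] * (upper - lower + 1)

    def _check(self, index):
        # Stands in for a precondition: index must lie within current bounds.
        if not self.lower <= index <= self.upper:
            raise IndexError(f"index {index} outside {self.lower}..{self.upper}")

    def at(self, index):
        """Standard access: the element stored at index."""
        self._check(index)
        return self._items[index - self.lower]

    def put(self, value, index):
        """Standard insertion: only valid between the current bounds."""
        self._check(index)
        self._items[index - self.lower] = value

    def force(self, value, index):
        """Like put, but silently resizes when index is out of bounds."""
        if index < self.lower:
            self._items = [None] * (self.lower - index) + self._items
            self.lower = index
        elif index > self.upper:
            self._items = self._items + [None] * (index - self.upper)
            self.upper = index
        self._items[index - self.lower] = value
```

With an array created as BoundedArray(1, 3), put at index 5 is rejected, whereas force at index 5 extends the upper bound to 5 and keeps the existing elements at their indices.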
The names are used consistently, but the corresponding routines do not necessarily have identical signatures; for example:

        at (index: INTEGER): T
                in class ARRAY [T]: access to element through its index

        at: T
                in class STACK [T] and its descendants: access to top element

        at (key: U): T
                in class H_TABLE [T, U -> HASHABLE]: access to element
                through its key

and so on.

Of course synonyms may be needed for client programmers who want more specific terminology. In class STACK and its descendants, for example, a function called ``top'' is still available (as it was before) and yields the same result as ``at''.

When different classes are combined through multiple inheritance, identically named features will be distinguished through renaming. For example the implementation of stacks by arrays is of the form

        class FIXED_STACK [T] export
                at, ...
        inherit
                ARRAY [T]
                        rename at as array_at, ...
                STACK [T]
        feature
                nb_elements: INTEGER;
                        -- Redefined from STACK as an attribute

                at: T is
                                -- Last element pushed;
                                -- same as top.
                        require
                                not_empty: not empty
                        do
                                Result := array_at (nb_elements)
                        end; -- at

                ...

        end -- class FIXED_STACK

Again, the typed nature of the language is essential here to make sure that any error due to a confusion between two identically named features (for example ``at'' from ARRAY and ``at'' from FIXED_STACK) is caught right away by the compiler.

As a result, the vocabulary of recommended feature names for the library will significantly decrease. (I use the term ``recommended names'' because the old ones are usually kept as synonyms for compatibility; in a forthcoming message I will describe the 2.2 ``obsolete'' facility which helps in this respect.)

It might be argued, of course, that the use of the same name for operations with different signatures (such as the three versions of ``at'' above) is confusing for programmers of client classes. We considered this argument but it does not seem to hold on closer inspection.
Regardless of the names chosen, the client programmer who needs to access elements in arrays and stacks as well as hash tables (to continue using this example) must somehow master the information that:

        - For an array you must provide an integer index.

        - For a stack you don't provide any argument since you can
          only access the last element pushed (top).

        - For a hash table you must provide the key, which must be
          of ``hashable'' type defined for the table (e.g. STRING).

Some effort is needed to understand and remember this information. If in addition the routine names are different, the effort required is higher, not lower.

If instead you can rely on the systematic convention that regardless of the data structure standard access is always called ``at'', standard addition of an element is always called ``put'' and so on, then you can concentrate on learning the really meaningful differences: the signatures of the operations.
-- 
-- Bertrand Meyer
bertrand@eiffel.com
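
The convention argued for in this post (one name, three signatures) can be sketched as below. This is a hedged illustration in Python, not the Basic Eiffel Library: the class names, the 0-based indexing, and the (value, key) argument order are assumptions. Python also does not check the annotations by itself, so the compile-time guarantee the post relies on would need a separate static checker.

```python
from typing import Dict, Generic, Hashable, List, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


class Array(Generic[T]):
    """Hypothetical array: access needs an integer index."""

    def __init__(self, items: List[T]) -> None:
        self._items = list(items)

    def at(self, index: int) -> T:
        # 0-based indexing is an assumption of this sketch.
        if not 0 <= index < len(self._items):
            raise IndexError("index out of bounds")
        return self._items[index]


class Stack(Generic[T]):
    """Hypothetical stack: access takes no argument (top element only)."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def put(self, value: T) -> None:
        self._items.append(value)

    def at(self) -> T:
        if not self._items:
            raise IndexError("stack is empty")
        return self._items[-1]

    # Synonym kept for clients who prefer stack-specific terminology.
    top = at


class HashTable(Generic[T, K]):
    """Hypothetical hash table: access needs a hashable key."""

    def __init__(self) -> None:
        self._items: Dict[K, T] = {}

    def put(self, value: T, key: K) -> None:
        self._items[key] = value

    def at(self, key: K) -> T:
        return self._items[key]
```

A client then writes arr.at(1), stack.at() and table.at("answer"): the name is the same throughout, and only the arguments (the meaningful difference) vary.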
day@grand.UUCP (Dave Yost) (05/12/89)
In article <137@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:
> An important aspect of object-oriented design of reusable components
>is the proper choice of names for exported features of each class.
>The basic rule is that these names should be both simple (which usually
>implies that they should be short) and chosen according to consistent
>conventions.

I am glad to see this revision toward regularization of the feature names in the library.

I might add that I think the best way to standardize such names is for them to appear in a very basic deferred class, parent of all similar descendents in which they would be used. For example, a deferred base class COLLECTION could have an "nb_elements" feature, and all descendent COLLECTION classes would be obliged to use that name for the number of items in the collection. So, strings and arrays which are obviously collections would each have an nb_elements, instead of a STRING having a length, and an array having an nb_elements. (My preferred name for this feature would be simply, "size").

--dave
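
The suggestion above can be sketched with an abstract base class playing the role of the deferred class. This is only an illustration in Python under stated assumptions: the names Collection, Text and ItemArray are hypothetical, and ``size'' is used as the standard name as proposed in the post.

```python
from abc import ABC, abstractmethod


class Collection(ABC):
    """Hypothetical deferred base class: it fixes the name 'size' for all descendants."""

    @property
    @abstractmethod
    def size(self) -> int:
        """Number of items in the collection."""

    def empty(self) -> bool:
        # Written once here, in terms of the standardized name.
        return self.size == 0


class Text(Collection):
    """A string seen as a collection of characters."""

    def __init__(self, value: str) -> None:
        self._value = value

    @property
    def size(self) -> int:
        return len(self._value)


class ItemArray(Collection):
    """An array seen as a collection of items."""

    def __init__(self, items: list) -> None:
        self._items = list(items)

    @property
    def size(self) -> int:
        return len(self._items)
```

A descendant that does not define size cannot be instantiated, which is roughly how the common ancestor obliges every collection class to use the same name.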
bertrand@eiffel.UUCP (Bertrand Meyer) (05/14/89)
From <493@grand.UUCP>, day@grand.UUCP (Dave Yost):
> I think the best way to standardize names [of features] is for them
> to appear in a very basic deferred class, parent of all
> similar descendents in which they would be used. For
> example, a deferred base class COLLECTION could have
> an "nb_elements" feature, and all descendent COLLECTION
> classes would be obliged to use that name for the number
> of items in the collection. So, strings and arrays
> which are obviously collections would each have an
> nb_elements, instead of a STRING having a length, and
> an array having an nb_elements. (My preferred name for
> this feature would be simply, "size").

The example given is typical of the need for name standardization and I agree with the use of ``size'' as standard name.

I also agree with the desirability of having a deferred base class whenever possible. It is of course preferable if you perceive the need for such a class right from the start, although sometimes you will recognize it only as an afterthought.

(Speaking of afterthoughts, it has been pointed out to me that the reference to ``l'esprit de l'escalier'' in my message <138@eiffel.UUCP>, coming as it does from an apartment-oriented civilization, was culturally opaque in a suburban, one-story-house society. There the correct form is ``l'esprit du driveway''.)

When it is possible to devise such a common ancestor, however, name consistency is usually achieved fairly naturally, since by default the names will be the same in all descendants. Differences only arise in descendants that explicitly rename the feature - presumably for a good reason.

My note was going further by suggesting that whenever appropriate the names should be the same for all features of a certain broad category even if the signatures are not the same, precluding the use of a single deferred routine in a common ancestor.
For example, if every container class contains a feature that represents the basic access mechanism associated with the corresponding data structures, we may decide to call it ``at'' throughout, even though the signatures are different:

        at (index: INTEGER): T
                in ARRAY [T] (and STRING where T is CHARACTER)

        at: T
                in STACK [T]

and so on (see original note).

In this case, because the signatures are different and the language is typed, there cannot be a common deferred routine ``at'' in a common ancestor. Nor should there be, as these routines are really different, not only in their signatures but more generally in their specifications.

It is because they are different that different names (such as ``entry'' and ``top'') may initially have been chosen. Because they share the same general goal, however, it is appropriate on further reflection to use identical names so as to facilitate the task of the client programmers, who in any case must learn and remember the significant differences (differences of specification) but with this approach won't also need to remember irrelevant name differences.

After a while they should feel more comfortable with the basic classes by being able to guess the name of a feature they don't immediately recall.
-- 
-- Bertrand Meyer
bertrand@eiffel.com