bertrand@eiffel.UUCP (Bertrand Meyer) (05/10/89)
An important aspect of object-oriented design of reusable components
is the proper choice of names for exported features of each class.
The basic rule is that these names should be both simple (which usually
implies that they should be short) and chosen according to consistent
conventions.
One consequence is that one should resist the temptation to
over-qualify names. For example, a procedure for inserting elements into a
dictionary should not be called ``insert_in_dictionary'' or
``dictionary_insert'', but (barring any better choice, as discussed
below) just ``insert''.
This would not necessarily be true in a less strongly typed language because of
ambiguities and errors that might result if the same simple names
(insert, delete, put, ...) are used in many different classes. In Eiffel,
however, typing averts these problems. When you see
        d.insert (...)
the type of d (as declared in the class in which this appears) immediately
tells you which ``insert'' is meant.
These ideas were applied to the design of the Basic Eiffel Library.
We recently took a closer look at naming conventions for the library,
however, especially after some criticisms were made regarding their
consistency (see the presentation by John Anderson of Cognos
at the recent Eiffel conference in Paris).
For version 2.2 we have decided to take an extremist approach to name
consistency by focusing on a small number of names, especially for
``container'' classes (those which describe data structures used as
repositories of objects, such as sets, arrays, lists etc.). Examples of
these basic names are
    at      (for accessing an element)
    put     (for inserting an element)
    force   (same as ``put'', but will work in cases in which put might
            have failed; for arrays, for example, put only works for
            indices between the current bounds, whereas force applied
            to an out-of-bounds index will silently resize the array.
            This feature of arrays was previously called ``enter_force'')
and so on. The names are used consistently, but the corresponding routines
do not necessarily have identical signatures; for example:
    at (index: INTEGER): T      in class ARRAY [T]:
                                access to element through its index
    at: T                       in class STACK [T] and its descendants:
                                access to top element
    at (key: U): T              in class H_TABLE [T, U -> HASHABLE]:
                                access to element through its key
and so on.
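
As a rough illustration of the convention (not the library's code), here is a
small Python sketch in which three container classes all use ``at'' and
``put'' while keeping different signatures. The class names, the argument
order (value first, then index or key) and the bounds handling are
assumptions made for this sketch. Python also checks calls at run time rather
than at compile time, so it only approximates the Eiffel behaviour.

```python
class Array:
    """Hypothetical stand-in for ARRAY [T], indexed from lower to upper."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper
        self._items = [None] * (upper - lower + 1)

    def _check(self, index):
        # Plays the role of a precondition on the index.
        if not (self.lower <= index <= self.upper):
            raise IndexError("index out of bounds")

    def at(self, index):
        """Access to element through its index."""
        self._check(index)
        return self._items[index - self.lower]

    def put(self, value, index):
        """Insert; only works for indices between the current bounds."""
        self._check(index)
        self._items[index - self.lower] = value

    def force(self, value, index):
        """Same as put, but silently resizes for an out-of-bounds index."""
        if index < self.lower:
            self._items = [None] * (self.lower - index) + self._items
            self.lower = index
        elif index > self.upper:
            self._items.extend([None] * (index - self.upper))
            self.upper = index
        self.put(value, index)


class Stack:
    """Hypothetical stand-in for STACK [T]."""

    def __init__(self):
        self._items = []

    def at(self):
        """Access to top element; takes no argument."""
        if not self._items:
            raise IndexError("stack is empty")
        return self._items[-1]

    def put(self, value):
        self._items.append(value)


class HashTable:
    """Hypothetical stand-in for H_TABLE [T, U -> HASHABLE]."""

    def __init__(self):
        self._items = {}

    def at(self, key):
        """Access to element through its key."""
        return self._items[key]

    def put(self, value, key):
        self._items[key] = value
```

The names are the same in all three classes; what the client has to learn is
only what each ``at'' expects as argument.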
Of course synonyms may be needed for client programmers who want
more specific terminology. In class STACK and its descendants, for example,
a function called ``top'' is still available (as it was before) and
yields the same result as ``at''.
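
A synonym of this kind can be sketched in Python as a second method that
simply delegates to the standard one. The class below is a hypothetical
stand-in for STACK, not the library's implementation; delegating (rather than
binding a second name to the same function) is a choice made here so that a
descendant which redefines ``at'' automatically gets a consistent ``top''.

```python
class Stack:
    """Hypothetical stand-in for STACK [T]."""

    def __init__(self):
        self._items = []

    def put(self, value):
        self._items.append(value)

    def at(self):
        """Standard access: the last element pushed."""
        if not self._items:
            raise IndexError("stack is empty")
        return self._items[-1]

    def top(self):
        """Synonym kept for clients who prefer stack terminology."""
        return self.at()
```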
When different classes are combined through multiple inheritance,
identically named features will be distinguished through renaming. For
example the implementation of stacks by arrays is of the form
    class FIXED_STACK [T] export
        at, ...
    inherit
        ARRAY [T]
            rename
                at as array_at,
                ...
        STACK [T]
    feature
        nb_elements: INTEGER;
                -- Redefined from STACK as an attribute
        at: T is
                -- Last element pushed;
                -- same as top.
            require
                not_empty: not empty
            do
                Result := array_at (nb_elements)
            end; -- at
        ...
    end -- class FIXED_STACK
Again, the typed nature of the language is essential here to make sure
that any error due to a confusion between two identically named features
(for example ``at'' from ARRAY and ``at'' from FIXED_STACK) is caught right
away by the compiler.
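
Python has no rename clause, but the shape of FIXED_STACK can be approximated
by keeping the array's ``at'' reachable under a second name before the stack
version replaces it. Everything below is a sketch under assumptions: the
minimal Array and Stack parents, the 1-based indexing, and the ``put''
routine of FixedStack are illustrative and do not come from the library.

```python
class Array:
    """Hypothetical fixed-size array, indexed from 1."""

    def __init__(self, size):
        self._items = [None] * size

    def _check(self, index):
        if not (1 <= index <= len(self._items)):
            raise IndexError("index out of bounds")

    def at(self, index):
        self._check(index)
        return self._items[index - 1]

    def put(self, value, index):
        self._check(index)
        self._items[index - 1] = value


class Stack:
    """Hypothetical deferred stack; descendants supply at and nb_elements."""

    def empty(self):
        return self.nb_elements == 0

    def at(self):
        raise NotImplementedError

    def top(self):
        return self.at()


class FixedStack(Array, Stack):
    # Stand-ins for "rename at as array_at, ...": the inherited array
    # routines stay reachable under new names.
    array_at = Array.at
    array_put = Array.put

    def __init__(self, capacity):
        Array.__init__(self, capacity)
        self.nb_elements = 0  # plain attribute, as in the Eiffel class

    def at(self):
        """Last element pushed; same as top."""
        if self.empty():
            raise IndexError("stack is empty")
        return self.array_at(self.nb_elements)

    def put(self, value):
        # Fails with IndexError when the underlying array is full.
        self.array_put(value, self.nb_elements + 1)
        self.nb_elements += 1
```

One difference is worth keeping in mind: a call such as ``at(1)'' on a
FixedStack is rejected here only when it executes (with a TypeError), whereas
the point made above is that in Eiffel the compiler reports the confusion.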
As a result, the vocabulary of recommended feature names for the
library will significantly decrease. (I use the term ``recommended names''
because the old ones are usually kept as synonyms for compatibility; in a
forthcoming message I will describe the 2.2 ``obsolete'' facility which
helps in this respect.)
It might be argued, of course, that the use of the same name for
operations with different signatures (such as the three versions of ``at''
above) is confusing for programmers of client classes. We considered
this argument but it does not seem to hold on closer inspection.
Regardless of the names chosen, the client programmer who needs
to access elements in arrays and stacks as well as hash tables
(to continue using this example) must somehow master the information that:
    - For an array you must provide an integer index.
    - For a stack you don't provide any argument since you can only
      access the last element pushed (top).
    - For a hash table you must provide the key, which must be of the
      ``hashable'' type defined for the table (e.g. STRING).
Some effort is needed to understand and remember this information.
If in addition the routine names are different, the effort required is
higher, not lower. If instead you can rely on the systematic convention
that, regardless of the data structure, standard access is always
called ``at'', standard addition of an element is always called ``put'' and
so on, then you can concentrate on learning the really meaningful
differences: the signatures of the operations.
--
-- Bertrand Meyer
bertrand@eiffel.com

day@grand.UUCP (Dave Yost) (05/12/89)
In article <137@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:
> An important aspect of object-oriented design of reusable components
> is the proper choice of names for exported features of each class.
> The basic rule is that these names should be both simple (which usually
> implies that they should be short) and chosen according to consistent
> conventions.

I am glad to see this revision toward regularization of the
feature names in the library.

I might add that I think the best way to standardize such
names is for them to appear in a very basic deferred class,
parent of all similar descendents in which they would be used.
For example, a deferred base class COLLECTION could have
an "nb_elements" feature, and all descendent COLLECTION
classes would be obliged to use that name for the number
of items in the collection. So, strings and arrays
which are obviously collections would each have an
nb_elements, instead of a STRING having a length, and
an array having an nb_elements. (My preferred name for
this feature would be simply, "size").

--dave
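
The deferred-class suggestion above can be sketched in Python with an
abstract base class. This is only an illustration: the class names Collection,
Text (standing in for STRING) and Array are assumptions, and ``size'' is the
name preferred in the post rather than an existing library feature.

```python
from abc import ABC, abstractmethod


class Collection(ABC):
    """Hypothetical deferred base class for all collections."""

    @property
    @abstractmethod
    def size(self):
        """Number of items in the collection."""


class Text(Collection):
    """Stand-in for STRING: a collection of characters."""

    def __init__(self, chars):
        self._chars = str(chars)

    @property
    def size(self):
        return len(self._chars)


class Array(Collection):
    """Stand-in for ARRAY [T]."""

    def __init__(self, items):
        self._items = list(items)

    @property
    def size(self):
        return len(self._items)
```

A descendant that does not define ``size'' cannot be instantiated, which is
how the common name is enforced in this sketch.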
bertrand@eiffel.UUCP (Bertrand Meyer) (05/14/89)
From <493@grand.UUCP>, day@grand.UUCP (Dave Yost):
> I think the best way to standardize names [of features] is for them
> to appear in a very basic deferred class, parent of all
> similar descendents in which they would be used. For
> example, a deferred base class COLLECTION could have
> an "nb_elements" feature, and all descendent COLLECTION
> classes would be obliged to use that name for the number
> of items in the collection. So, strings and arrays
> which are obviously collections would each have an
> nb_elements, instead of a STRING having a length, and
> an array having an nb_elements. (My preferred name for
> this feature would be simply, "size").

The example given is typical of the need for name standardization
and I agree with the use of ``size'' as standard name. I also agree with
the desirability of having a deferred base class whenever possible.
It is of course preferable if you perceive the need for such a class
right from the start, although sometimes you will recognize it only as
an afterthought.

(Speaking of afterthoughts, it has been pointed out to me that
the reference to ``l'esprit de l'escalier'' in my message
<138@eiffel.UUCP>, coming as it does from an apartment-oriented
civilization, was culturally opaque in a suburban, one-story-house
society. There the correct form is ``l'esprit du driveway''.)

When it is possible to devise such a common ancestor, however,
name consistency is usually achieved fairly naturally, since by default
the names will be the same in all descendants. Differences only arise
in descendants that explicitly rename the feature - presumably for a
good reason.

My note was going further by suggesting that whenever appropriate
the names should be the same for all features of a certain broad category
even if the signatures are not the same, precluding the use of a single
deferred routine in a common ancestor.

For example if every container class contains a feature that
represents the basic access mechanism associated with the corresponding
data structures, we may decide to call it ``at'' throughout, even though
the signatures are different:

    at (index: INTEGER): T      in ARRAY [T] (and STRING where T is
                                CHARACTER)
    at: T                       in STACK [T]

and so on (see original note).

In this case, because the signatures are different and the language
is typed, there cannot be a common deferred routine ``at'' in a common
ancestor. Nor should there be, as these routines are really different,
not only in their signatures but more generally in their specifications.
It is because they are different that different names (such as ``entry''
and ``top'') may initially have been chosen.

Because they share the same general goal, however, it is appropriate
on further reflection to use identical names so as to facilitate the task
of the client programmers, who in any case must learn and remember the
significant differences (differences of specification) but with this
approach won't also need to remember irrelevant name differences. After
a while they should feel more comfortable with the basic classes by being
able to guess the name of a feature they don't immediately recall.
--
-- Bertrand Meyer
bertrand@eiffel.com