[comp.lang.eiffel] Feature names

bertrand@eiffel.UUCP (Bertrand Meyer) (05/10/89)

    An important aspect of object-oriented design of reusable components
is the proper choice of names for exported features of each class.
The basic rule is that these names should be both simple (which usually
implies that they should be short) and chosen according to consistent
conventions.

    One consequence is that one should resist the temptation to
over-qualify names. For example, a procedure for inserting elements into a
dictionary should not be called ``insert_in_dictionary'' or
``dictionary_insert'', but (barring any better choice, as discussed
below) just ``insert''.

    This would not necessarily be true in a less strictly typed language,
because of the ambiguities and errors that might result if the same simple
names (insert, delete, put, ...) are used in many different classes. In
Eiffel, however, typing averts these problems. When you see

    d.insert (...)

the type of d (as declared in the class in which this appears) immediately
tells you which ``insert'' is meant.

    These ideas were applied to the design of the Basic Eiffel Library.

    We recently took a closer look at naming conventions for the library,
however, especially after some criticisms were made regarding their
consistency (see the presentation by John Anderson of Cognos
at the recent Eiffel conference in Paris).
For version 2.2 we have decided to take an extremist approach to name
consistency by focusing on a small number of names, especially for
``container'' classes (those which describe data structures used as
repositories of objects, such as sets, arrays, lists, etc.). Examples of
these basic names are

    at       (for accessing an element)
    put      (for inserting an element)
    force    (same as ``put'', but will work in cases in which put might
             have failed; for arrays, for example, put only works for
             indices between the current bounds, whereas force applied
             to an out-of-bounds index will silently resize the array.
             This feature of arrays was previously called ``enter_force'')  

and so on. The names are used consistently, but the corresponding routines
do not necessarily have identical signatures; for example:

    at (index: INTEGER): T        in class ARRAY [T]:
                                    access to element through its index

    at: T                        in class STACK [T] and its descendants:
                                    access to top element

    at (key: U): T                in class H_TABLE [T, U -> HASHABLE]:
                                    access to element through its key

and so on.
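
    As an illustration only (a sketch, not text from the library), a
client using the three versions of ``at'' might read as follows. The
entity names a, s and h, the routine name show_access and the key "pi"
are hypothetical; the structures are assumed to have been created and
filled elsewhere, so that the index is within bounds, the stack is not
empty and the key is present.

    a: ARRAY [REAL];
    s: STACK [REAL];
    h: H_TABLE [REAL, STRING];

    show_access is
            -- Read one element from each structure
            -- (hypothetical client routine).
        local
            x: REAL
        do
            x := a.at (3);        -- ARRAY: integer index
            x := s.at;            -- STACK: no argument, top element
            x := h.at ("pi")      -- H_TABLE: key of the hashable type
        end; -- show_access

In each call the declared type of the target (a, s or h) determines
which ``at'' is meant and which arguments it expects.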

    Of course synonyms may be needed for client programmers who want
more specific terminology. In class STACK and its descendants, for example,
a function called ``top'' is still available (as it was before) and
yields the same result as ``at''. 


    When different classes are combined through multiple inheritance,
identically named features will be distinguished through renaming. For
example, the implementation of stacks by arrays is of the form

class FIXED_STACK [T] export

    at, ...

inherit

    ARRAY [T]
        rename
            at as array_at,
            ...

    STACK [T]
feature


    nb_elements: INTEGER;
            -- Redefined from STACK as an attribute

    at: T is
            -- Last element pushed;
            -- same as top.
        require
            not_empty: not empty
        do
            Result := array_at (nb_elements)
        end; -- at

    ...

end -- class FIXED_STACK


    Again, the typed nature of the language is essential here to make sure
that any error due to a confusion between two identically named features
(for example ``at'' from ARRAY and ``at'' from FIXED_STACK) is caught right
away by the compiler.
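
    A sketch of the kind of confusion meant here, with hypothetical
entity names s and x, and assuming the class text shown above: a client
carries over the array habit of passing an index and applies it to a
stack.

    s: FIXED_STACK [REAL];
    x: REAL;
    ...
    x := s.at;        -- valid: ``at'' of FIXED_STACK takes no argument
    x := s.at (3)     -- rejected: ``at'' of FIXED_STACK has no
                      -- integer argument; that signature belongs
                      -- to ``at'' of ARRAY, renamed array_at here

The second call does not match the signature declared for ``at'' in
FIXED_STACK, so the mistake is reported at compile time rather than
surfacing during execution.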

    As a result, the vocabulary of recommended feature names for the
library will significantly decrease. (I use the term ``recommended names''
because the old ones are usually kept as synonyms for compatibility; in a
forthcoming message I will describe the 2.2 ``obsolete'' facility which
helps in this respect.)

    It might be argued, of course, that the use of the same name for
operations with different signatures (such as the three versions of ``at''
above) is confusing for programmers of client classes. We considered
this argument, but it does not seem to hold on closer inspection.
Regardless of the names chosen, the client programmer who needs
to access elements in arrays and stacks as well as hash tables
(to continue using this example) must somehow master the information that:

    - For an array you must provide an integer index.

    - For a stack you don't provide any argument since you can only
      access the last element pushed (top).

    - For a hash table you must provide the key, which must be of
      ``hashable'' type defined for the table (e.g. STRING).

    Some effort is needed to understand and remember this information.
If in addition the routine names are different, the effort required is
higher, not lower. If instead you can rely on the systematic convention
that, regardless of the data structure, standard access is always called
``at'', standard addition of an element is always called ``put'', and
so on, then you can concentrate on learning the really meaningful
differences: the signatures of the operations.


-- 

-- Bertrand Meyer
bertrand@eiffel.com

day@grand.UUCP (Dave Yost) (05/12/89)

In article <137@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:
>    An important aspect of object-oriented design of reusable components
>is the proper choice of names for exported features of each class.
>The basic rule is that these names should be both simple (which usually
>implies that they should be short) and chosen according to consistent
>conventions.

I am glad to see this revision toward regularization of
the feature names in the library.  I might add that I
think the best way to standardize such names is for them
to appear in a very basic deferred class, parent of all
similar descendants in which they would be used.  For
example, a deferred base class COLLECTION could have
an "nb_elements" feature, and all descendant COLLECTION
classes would be obliged to use that name for the number
of items in the collection.  So strings and arrays,
which are obviously collections, would each have an
nb_elements, instead of a STRING having a length and
an ARRAY having an nb_elements.  (My preferred name for
this feature would be simply "size".)

 --dave

bertrand@eiffel.UUCP (Bertrand Meyer) (05/14/89)

From <493@grand.UUCP>, day@grand.UUCP (Dave Yost):

> I think the best way to standardize names [of features] is for them
> to appear in a very basic deferred class, parent of all
> similar descendants in which they would be used.  For
> example, a deferred base class COLLECTION could have
> an "nb_elements" feature, and all descendant COLLECTION
> classes would be obliged to use that name for the number
> of items in the collection.  So strings and arrays,
> which are obviously collections, would each have an
> nb_elements, instead of a STRING having a length and
> an ARRAY having an nb_elements.  (My preferred name for
> this feature would be simply "size".)

    The example given is typical of the need for name standardization
and I agree with the use of ``size'' as standard name. I also agree
with the desirability of having a deferred base class whenever possible.
It is of course preferable if you perceive the need for such a class right
from the start, although sometimes you will recognize it only as an
afterthought.

    (Speaking of afterthoughts, it has been pointed out to me
    that the reference to ``l'esprit de l'escalier'' in my message
    <138@eiffel.UUCP>, coming as it does from an apartment-oriented
    civilization, was culturally opaque in a suburban, one-story-house
    society.  There the correct form is ``l'esprit du driveway''.)

    When it is possible to devise such a common ancestor, however,
name consistency is usually achieved fairly naturally, since by default
the names will be the same in all descendants. Differences only arise in
descendants that explicitly rename the feature - presumably for a good reason.
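
    A minimal sketch of this situation follows. It assumes a
hypothetical deferred class COLLECTION along the lines suggested (no
such class is claimed to exist in the library) and a hypothetical
descendant SIMPLE_LIST; neither class text is taken from the library.

    deferred class COLLECTION export

        size

    feature

        size: INTEGER is
                -- Number of items in the collection
            deferred
            end; -- size

    end -- class COLLECTION


    class SIMPLE_LIST [T] export

        size, ...

    inherit

        COLLECTION

    feature

        size: INTEGER;
                -- Effected from COLLECTION as an attribute

        ...

    end -- class SIMPLE_LIST

Because SIMPLE_LIST has no rename clause for ``size'', its clients
see the same name as clients of any other descendant of COLLECTION.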

    My note went further, suggesting that whenever appropriate the names
should be the same for all features of a certain broad category even if
the signatures differ, which precludes the use of a single deferred
routine in a common ancestor. For example, if every container class
contains a feature that represents the basic access mechanism associated
with the corresponding data structure, we may decide to call it ``at''
throughout, even though the signatures are different:

    at (index: INTEGER): T  in ARRAY [T]
                             (and STRING where T is CHARACTER)
    at: T  in STACK [T]

and so on (see original note). In this case, because the signatures are
different and the language is typed, there cannot be a common deferred
routine ``at'' in a common ancestor. Nor should there be, as these
routines are really different, not only in their signatures but more
generally in their specifications. It is because they are different that
different names (such as ``entry'' and ``top'') may initially have been
chosen.

    Because they share the same general goal, however, it is appropriate
on further reflection to use identical names so as to facilitate the task
of the client programmers, who in any case must learn and remember the
significant differences (differences of specification) but with
this approach won't also need to remember irrelevant name differences.
After a while they should feel more comfortable with the basic classes
by being able to guess the name of a feature they don't immediately recall.
-- 

-- Bertrand Meyer
bertrand@eiffel.com