[comp.lang.eiffel] The indexing clause

bertrand@eiffel.UUCP (Bertrand Meyer and Philippe Stephan) (07/14/89)

                    THE INDEXING CLAUSE

              Bertrand Meyer, Philippe Stephan

           Interactive Software Engineering Inc.

                     bertrand@eiffel.com
                     stephan@eiffel.com




     As mentioned in the recent message about innovations in
2.2,  Eiffel  classes  may  now  begin  with  an  ``indexing
clause''.  This note explains the purpose of the clause  and
raises  some  questions  on  which  input  from  the  Eiffel
community will be welcome.

     The general principle of documenting Eiffel software is
that  as  much  of  the  documentation as possible should be
within the software texts (that is to say, the class  texts)
themselves.  Documentation  tools  (such as the ``flat'' and
``short'' commands) and class retrieval tools, also known as
browsers  (such as the ``eb'' and ``GOOD'' commands), should
use these texts as their primary source of information.

     Some properties of class designs,  however,  are  of  a
higher  level  of abstraction than what is usually expressed
in the class text proper.  They include indexing categories,
descriptions   of   design   and  implementation  decisions,
references to algorithms or data structures as published  in
the  literature etc.  The indexing clause is meant to record
them  in  a  standardized  format  for  use  by  tools   for
documentation, archival and retrieval.

     Although the tools of the environment make only limited
use of this clause in version 2.2, we believe it will play a
major part in the future.  We think Eiffel will be the basis
for  the  first  real industry of software components, which
should be the major development in software technology  over
the  next  fifteen  years.   But  as  hundreds  of  reusable
software components become available, it will  be  essential
to  have  a  systematic way of indexing and retrieving them.
The indexing clause should be the one of the key  mechanisms
for achieving this goal.

     The syntactical  format  is  simple.  If  present,  the
clause  comes  at  the  beginning  of  a  class  and has the
following form:

       indexing
          index: value, value, value, ...;
          index: value, value, value, ...
          ...
     class (etc.)

The indices are identifiers; the values are  identifiers  or
manifest  constants  (integers,  strings,  reals).  In  each
subclause, the part ``index:'' is  optional,  so  that  some
indexing  values  may  be  given  without index. Each of the
semicolon-separated components is called  an  entry  of  the
indexing clause.

     Thanks  to  this  clause,  it  should  be  possible  to
retrieve archived classes using query languages that express
queries based on <index, value> pairs,  and  the  associated
tools.

     Because of the very nature of the clause, the choice of
indices  and values is free. Obviously, we should not overly
restrict the developers' choice in this area.

     If users are to be able  to  formulate  meaningful  and
successful  queries, however, it will be necessary to define
consistent and systematic  indexing  conventions.  Since  we
wanted  the  Data Structure Library, one of the key elements
of the Basic Eiffel Library, to be properly ``indexed''  for
2.2,  we  have defined a first set of such conventions. They
are not the result  of  very  deep  thinking,  but  seem  to
succeed  in  capturing  the essential properties of the data
structure library classes.

     We offer them here for discussion and improvement. It's
too  late  for  the  improvements to affect 2.2, but history
does not stop  here.   Suggestions  and  criticism  will  be
welcome.

     Some of the guidelines were the following:

     + Keep the indexing clauses short (2 to  5  entries  is
       typical).

     + Avoid repeating information which is in the  rest  of
       the class text.

     + Use a set of standardized indices for properties that
       apply   to   many   structures  (such  as  choice  of
       representation).

     +  For   values,   define   a   set   of   standardized
       possibilities for the common cases.

     + Include positive information  only.  For  example,  a
       ``representation''  index  is  used  to  describe the
       choice of representation  (linked,  array,  ...).   A
       deferred  class  does  not have a representation. For
       such a class the clause should not contain an   entry
       ``representation: none'' but simply no entry with the
       index ``representation''. A reasonable query language
       will make it possible to use a query pair of the form
       <representation, $NONE$>, where $NONE$  is  a special
       value indicating absence.

     The indices chosen for  the 2.2 data structure  library
are the following, along with the most frequent values.

     An entry of index ``names'' is used  to record the names
under  which  the  corresponding  data structures are known.
Although a class has only one official name, the abstraction
it implements may be known under other names. For example, a
``list'' is also known as a ``sequence''.

     An entry  of  index  ``access'' records  the  mode  of
access of  the  data  structures.  Standard values include:
``fixed'' (only one element is accessible at any given time,
as  in  a  stack  or  queue);  ``fifo'' (first-in-first-out
policy); ``lifo'' (last-in-first-out); ``index'' (access by
an integer index); ``key'' (access by  a  non-integer key);
``cursor''   (access through a client-controlled cursor, as
with  the list  classes);  ``membership'' (availability  of
a membership test); ``min, max'' (availability of operations
to  access  the  minimum  or  the  maximum). Obviously, more
than  one of these values may be used in the same ``access''
entry.

     An entry of index ``size'' indicates a size limitation.
Value ``fixed''  means the size of the structure is fixed at
Create time and cannot be changed later (there are few  such
cases  in the library). Value  ``resizable''  means  that an
initial size is chosen but  the  structure  may  be  resized
(possibly  at  some  cost)  if  it  outgrows that size.  For
extendible structures without size restrictions  this  entry
should not be present.

     An entry of index ``representation'' indicates a choice
of representation. Value ``array'' indicates representation
by contiguous, direct-access memory areas. Value  ``linked''
indicates a linked structure.

     An  entry  of  index  ``contents'' is appropriate   for
container  data  structures.   It  indicates  the  nature of
the contents. Possible values include generic  (for  generic
classes),  int,  real,  bool, char (for classes representing
containers of objects of basic types).

     For  example,  the  ARRAY_LIST  class  describes  lists
implemented  by  one  or more arrays, chained to each other.
The clause in this case is:

     indexing
          names: list, sequence;
          representation: array, linked; -- In this case it is both!
          access: fixed, cursor;
          size: resizable;
          contents: generic

jacob@gore.com (Jacob Gore) (07/15/89)

/ comp.lang.eiffel / bertrand@eiffel.UUCP (Bertrand Meyer and Philippe Stephan) / Jul 13, 1989 /
     + Include positive information  only.  For  example,  a
       ``representation''  index  is  used  to  describe the
       choice of representation  (linked,  array,  ...).   A
       deferred  class  does  not have a representation. For
       such a class the clause should not contain an   entry
       ``representation: none'' but simply no entry with the
       index ``representation''. A reasonable query language
       will make it possible to use a query pair of the form
       <representation, $NONE$>, where $NONE$  is  a special
       value indicating absence.
----------

Why?

It is not clear to me that in a database that the combined indexing entries
will form lack of information must imply negation.  Why is it better to
assume that the lack of "representation:" means "$NONE$" rather than
"$UNKNOWN$" (or, perhaps, simply "$UNSPECIFIED$")?

--
Jacob Gore	Jacob@Gore.Com		{nucsrl,boulder}!gore!jacob

bertrand@eiffel.UUCP (Bertrand Meyer) (07/16/89)

From <120001@gore.com> by jacob@gore.com (Jacob Gore):

> It is not clear to me that in a database [of reusable classes]
> lack of information [about a certain property of a class]
> must imply negation.  Why is it better to
> assume that the lack of a "representation:" [index entry] means
> "$NONE$" rather than "$UNKNOWN$" (or, perhaps,
> simply "$UNSPECIFIED$")?

I agree. In my hypothetical query language, a query pair of the form
<index, $NONE$> was meant as expressing absence of information
about ``index'', not negation. $ABSENT$ or $UNSPECIFIED$ would
be less ambiguous keywords than $NONE$.

True, one may argue that in the example at hand (the ``representation''
entry for a deferred class), what is desired is indeed negation,
not just absence:

    Not only does a deferred class have no ``representation'' property:
	not having a representation is one of its properties.

(It took me some time to phrase this right.) The indexing clause
is not needed, however, to capture such a negative property.
which is readily deduced from the deferred nature of the class.

Clearly, any browsing tool must have access to the information that
the class is deferred, and the query language should allow
a user to specify that the target of a query may, must or may not
be deferred.
-- 

-- Bertrand Meyer
bertrand@eiffel.com

alonzo@microsoft.UUCP (Alonzo Gariepy) (07/19/89)

In article <120001@gore.com> jacob@gore.com (Jacob Gore) writes:
>/ comp.lang.eiffel / bertrand@eiffel.UUCP (B. Meyer and P. Stephan) / Jul 13, 1989 /
>>       deferred  class  does  not have a representation. For
>>       such a class the clause should not contain an   entry
>>       ``representation: none'' but simply no entry with the
>>       index ``representation''. A reasonable query language
>>       will make it possible to use a query pair of the form
>>       <representation, $NONE$>, where $NONE$  is  a special
>>       value indicating absence.
>----------
>
>Why?
>
>It is not clear to me that in a database that the combined indexing entries
>will form lack of information must imply negation.  Why is it better to
>assume that the lack of "representation:" means "$NONE$" rather than
>"$UNKNOWN$" (or, perhaps, simply "$UNSPECIFIED$")?
>
>--
>Jacob Gore	Jacob@Gore.Com		{nucsrl,boulder}!gore!jacob

I have to agree strongly with Jacob Gore.  I have found it very 
very convenient to distinguish between an index with no value and 
one that is not present.  The existence of a value such as "none"
provides a third distinction.

The meaning of an index that is not present is that the classification
has no relevance to the object (in this case, an Eiffel class).  For
example, "RetrievalMethod:" has no relevance to floating point numbers,
and "EncodingStandard:" has no relevance to arrays of cake recipes.

An index that is present, but that has no value, might indicate a variety
of things: the value has not been put in yet; the proper value has yet
to be determined; the value is null (not to say irrelevant); the value
is some kind of default.  Eventually, these indexed Eiffel classes will
exist in a database containing both finished classes and those under
development.

The alternatives for abstract classes might then be:
Representation: deferred    -- specifies subclass responsibility
Representation: none        -- there is no representation 
Representation:             -- unspecified 
< missing altogether >      -- Representation?  What's that?

This is not the best example however.  

Consider an abstract class COMPRESS that deals with compressed data.  
It leaves the actual compression method to be defined in a subclass.
For illustration we will have three subclasses of COMPRESS called
RUNLENGTH, HUFFMAN, and PLAINTEXT.  One can then imagine the following
indexing:

class COMPRESS indexing

Representation: deferred;    -- abstract class
Compression:    deferred;    -- subclass responsibility
...

class RUNLENGTH indexing

Representation: byte, array;
Compression:    runlength;
...

class HUFFMAN indexing
Representation: word, array;
Compression:    two byte, huffman;
...

class PLAINTEXT indexing
Representation: character, array;
Compression:    none   ;   -- this subclass isn't actually compressed
...

class LMPLZFF indexing
Representation: 12bit, bitfield, array;
Compression:           ;   -- blank. I can't remember how to spell this.
...

================
Alonzo Gariepy
alonzo@microsoft
================