bertrand@eiffel.UUCP (Bertrand Meyer and Philippe Stephan) (07/14/89)
THE INDEXING CLAUSE Bertrand Meyer, Philippe Stephan Interactive Software Engineering Inc. bertrand@eiffel.com stephan@eiffel.com As mentioned in the recent message about innovations in 2.2, Eiffel classes may now begin with an ``indexing clause''. This note explains the purpose of the clause and raises some questions on which input from the Eiffel community will be welcome. The general principle of documenting Eiffel software is that as much of the documentation as possible should be within the software texts (that is to say, the class texts) themselves. Documentation tools (such as the ``flat'' and ``short'' commands) and class retrieval tools, also known as browsers (such as the ``eb'' and ``GOOD'' commands), should use these texts as their primary source of information. Some properties of class designs, however, are of a higher level of abstraction than what is usually expressed in the class text proper. They include indexing categories, descriptions of design and implementation decisions, references to algorithms or data structures as published in the literature etc. The indexing clause is meant to record them in a standardized format for use by tools for documentation, archival and retrieval. Although the tools of the environment make only limited use of this clause in version 2.2, we believe it will play a major part in the future. We think Eiffel will be the basis for the first real industry of software components, which should be the major development in software technology over the next fifteen years. But as hundreds of reusable software components become available, it will be essential to have a systematic way of indexing and retrieving them. The indexing clause should be the one of the key mechanisms for achieving this goal. The syntactical format is simple. If present, the clause comes at the beginning of a class and has the following form: indexing index: value, value, value, ...; index: value, value, value, ... ... class (etc.) The indices are identifiers; the values are identifiers or manifest constants (integers, strings, reals). In each subclause, the part ``index:'' is optional, so that some indexing values may be given without index. Each of the semicolon-separated components is called an entry of the indexing clause. Thanks to this clause, it should be possible to retrieve archived classes using query languages that express queries based on <index, value> pairs, and the associated tools. Because of the very nature of the clause, the choice of indices and values is free. Obviously, we should not overly restrict the developers' choice in this area. If users are to be able to formulate meaningful and successful queries, however, it will be necessary to define consistent and systematic indexing conventions. Since we wanted the Data Structure Library, one of the key elements of the Basic Eiffel Library, to be properly ``indexed'' for 2.2, we have defined a first set of such conventions. They are not the result of very deep thinking, but seem to succeed in capturing the essential properties of the data structure library classes. We offer them here for discussion and improvement. It's too late for the improvements to affect 2.2, but history does not stop here. Suggestions and criticism will be welcome. Some of the guidelines were the following: + Keep the indexing clauses short (2 to 5 entries is typical). + Avoid repeating information which is in the rest of the class text. + Use a set of standardized indices for properties that apply to many structures (such as choice of representation). + For values, define a set of standardized possibilities for the common cases. + Include positive information only. For example, a ``representation'' index is used to describe the choice of representation (linked, array, ...). A deferred class does not have a representation. For such a class the clause should not contain an entry ``representation: none'' but simply no entry with the index ``representation''. A reasonable query language will make it possible to use a query pair of the form <representation, $NONE$>, where $NONE$ is a special value indicating absence. The indices chosen for the 2.2 data structure library are the following, along with the most frequent values. An entry of index ``names'' is used to record the names under which the corresponding data structures are known. Although a class has only one official name, the abstraction it implements may be known under other names. For example, a ``list'' is also known as a ``sequence''. An entry of index ``access'' records the mode of access of the data structures. Standard values include: ``fixed'' (only one element is accessible at any given time, as in a stack or queue); ``fifo'' (first-in-first-out policy); ``lifo'' (last-in-first-out); ``index'' (access by an integer index); ``key'' (access by a non-integer key); ``cursor'' (access through a client-controlled cursor, as with the list classes); ``membership'' (availability of a membership test); ``min, max'' (availability of operations to access the minimum or the maximum). Obviously, more than one of these values may be used in the same ``access'' entry. An entry of index ``size'' indicates a size limitation. Value ``fixed'' means the size of the structure is fixed at Create time and cannot be changed later (there are few such cases in the library). Value ``resizable'' means that an initial size is chosen but the structure may be resized (possibly at some cost) if it outgrows that size. For extendible structures without size restrictions this entry should not be present. An entry of index ``representation'' indicates a choice of representation. Value ``array'' indicates representation by contiguous, direct-access memory areas. Value ``linked'' indicates a linked structure. An entry of index ``contents'' is appropriate for container data structures. It indicates the nature of the contents. Possible values include generic (for generic classes), int, real, bool, char (for classes representing containers of objects of basic types). For example, the ARRAY_LIST class describes lists implemented by one or more arrays, chained to each other. The clause in this case is: indexing names: list, sequence; representation: array, linked; -- In this case it is both! access: fixed, cursor; size: resizable; contents: generic
jacob@gore.com (Jacob Gore) (07/15/89)
/ comp.lang.eiffel / bertrand@eiffel.UUCP (Bertrand Meyer and Philippe Stephan) / Jul 13, 1989 / + Include positive information only. For example, a ``representation'' index is used to describe the choice of representation (linked, array, ...). A deferred class does not have a representation. For such a class the clause should not contain an entry ``representation: none'' but simply no entry with the index ``representation''. A reasonable query language will make it possible to use a query pair of the form <representation, $NONE$>, where $NONE$ is a special value indicating absence. ---------- Why? It is not clear to me that in a database that the combined indexing entries will form lack of information must imply negation. Why is it better to assume that the lack of "representation:" means "$NONE$" rather than "$UNKNOWN$" (or, perhaps, simply "$UNSPECIFIED$")? -- Jacob Gore Jacob@Gore.Com {nucsrl,boulder}!gore!jacob
bertrand@eiffel.UUCP (Bertrand Meyer) (07/16/89)
From <120001@gore.com> by jacob@gore.com (Jacob Gore): > It is not clear to me that in a database [of reusable classes] > lack of information [about a certain property of a class] > must imply negation. Why is it better to > assume that the lack of a "representation:" [index entry] means > "$NONE$" rather than "$UNKNOWN$" (or, perhaps, > simply "$UNSPECIFIED$")? I agree. In my hypothetical query language, a query pair of the form <index, $NONE$> was meant as expressing absence of information about ``index'', not negation. $ABSENT$ or $UNSPECIFIED$ would be less ambiguous keywords than $NONE$. True, one may argue that in the example at hand (the ``representation'' entry for a deferred class), what is desired is indeed negation, not just absence: Not only does a deferred class have no ``representation'' property: not having a representation is one of its properties. (It took me some time to phrase this right.) The indexing clause is not needed, however, to capture such a negative property. which is readily deduced from the deferred nature of the class. Clearly, any browsing tool must have access to the information that the class is deferred, and the query language should allow a user to specify that the target of a query may, must or may not be deferred. -- -- Bertrand Meyer bertrand@eiffel.com
alonzo@microsoft.UUCP (Alonzo Gariepy) (07/19/89)
In article <120001@gore.com> jacob@gore.com (Jacob Gore) writes: >/ comp.lang.eiffel / bertrand@eiffel.UUCP (B. Meyer and P. Stephan) / Jul 13, 1989 / >> deferred class does not have a representation. For >> such a class the clause should not contain an entry >> ``representation: none'' but simply no entry with the >> index ``representation''. A reasonable query language >> will make it possible to use a query pair of the form >> <representation, $NONE$>, where $NONE$ is a special >> value indicating absence. >---------- > >Why? > >It is not clear to me that in a database that the combined indexing entries >will form lack of information must imply negation. Why is it better to >assume that the lack of "representation:" means "$NONE$" rather than >"$UNKNOWN$" (or, perhaps, simply "$UNSPECIFIED$")? > >-- >Jacob Gore Jacob@Gore.Com {nucsrl,boulder}!gore!jacob I have to agree strongly with Jacob Gore. I have found it very very convenient to distinguish between an index with no value and one that is not present. The existence of a value such as "none" provides a third distinction. The meaning of an index that is not present is that the classification has no relevance to the object (in this case, an Eiffel class). For example, "RetrievalMethod:" has no relevance to floating point numbers, and "EncodingStandard:" has no relevance to arrays of cake recipes. An index that is present, but that has no value, might indicate a variety of things: the value has not been put in yet; the proper value has yet to be determined; the value is null (not to say irrelevant); the value is some kind of default. Eventually, these indexed Eiffel classes will exist in a database containing both finished classes and those under development. The alternatives for abstract classes might then be: Representation: deferred -- specifies subclass responsibility Representation: none -- there is no representation Representation: -- unspecified < missing altogether > -- Representation? What's that? This is not the best example however. Consider an abstract class COMPRESS that deals with compressed data. It leaves the actual compression method to be defined in a subclass. For illustration we will have three subclasses of COMPRESS called RUNLENGTH, HUFFMAN, and PLAINTEXT. One can then imagine the following indexing: class COMPRESS indexing Representation: deferred; -- abstract class Compression: deferred; -- subclass responsibility ... class RUNLENGTH indexing Representation: byte, array; Compression: runlength; ... class HUFFMAN indexing Representation: word, array; Compression: two byte, huffman; ... class PLAINTEXT indexing Representation: character, array; Compression: none ; -- this subclass isn't actually compressed ... class LMPLZFF indexing Representation: 12bit, bitfield, array; Compression: ; -- blank. I can't remember how to spell this. ... ================ Alonzo Gariepy alonzo@microsoft ================