bertrand@eiffel.UUCP (Bertrand Meyer) (07/11/89)
This article is the discussion on Eiffel types
which has been announced for a rather long time.
I apologize for the delay. Finding the few hours needed to
finish this paper turned out to be more difficult than I
expected when I announced its coming availability in
January after completing the first draft.
The preparation of version 2.2 of the Eiffel system, addressing
problems that are more relevant to Eiffel use, seemed to command
higher priority.
All those people (several hundred) who requested a paper
version back in February will receive one. The mailing won't
be sent until early August.
IMPORTANT: The text below is still a draft.
-------------------------------------------------------
Warning:
This text has been designed with laser printing in mind
(fonts, subscripts), and some parts may be hard to read
without these. Sorry.
-------------------------------------------------------
Because its size is > 50K, this article has been split
into two parts for posting to Usenet. Please make sure you have
both parts.
PART 1 OF 2
STATIC TYPING FOR EIFFEL
Bertrand Meyer
January 1989 (updated, July 2-4, 1989)
1 OVERVIEW
The object-oriented language Eiffel [5] is statically typed:
in other words, type errors may be caught by mere
examination of software texts.
Unfortunately, neither the description of the basic type rules
in [5] nor the type checking performed by the
current Eiffel implementations covers certain kinds
of type errors. These errors have two interesting
properties: they almost never occur in the practical
development of software systems if proper management
procedures are observed; yet it is easy, even for a
newcomer, to come up with artificial examples that exhibit
some of these errors.
In recent months several people have independently come
across these problems and in some cases have even published
their findings as evidence of purported flaws in the Eiffel
type system.
This article shows that the type policy adopted by
Eiffel is sound, and actually claims that it is the only
policy compatible with Eiffel's aim to provide a clean,
powerful basis for the practical development of medium- and
large-scale software systems in industrial environments.
The article gives the precise rule for static typing in
Eiffel and describes the strategy for implementing this rule
as part of the incremental, modular compilation method used
for the implementation of Eiffel. Lack of time has prevented
the full inclusion of these checks in release 2.2; they
will, however, be part of the next release (3).
Most of the arguments that have been voiced in previous
discussions are theoretical. In contrast, this article
considers in detail the pragmatic aspects of using classes
and inheritance in an industrial environment. A purely
theoretical approach misses the software engineering
criteria without which the entire discussion is meaningless.
Section 2 discusses the notion of typing in a general
context. Section 3 explains what typing and type failures
mean for Eiffel. Section 4 introduces the rationale behind
the Eiffel type policy. Section 5 gives the precise type
rule. Section 6 outlines the strategy for enforcing this
rule as part of Eiffel compiling. Section 7 is a conclusion
and assessment.
2 BACKGROUND
During the execution of a computer program, an attempt may
be made to apply an operation to operands which are not of
the form, or type, expected by the operation. Such a
regrettable course of affairs may be called a type failure.
The rest of this article discusses how to avoid type
failures in object-oriented programming.
2.1 Checking types
One possible defense against type failures is dynamic
checking, whereby the applicability of an operation to
candidate operands is ascertained at run-time, just before
the operation is to be applied. For anyone concerned with
software reliability, however, this is not a satisfactory
solution, since run-time is too late a stage to find a
suitable recovery from a type failure. A better strategy is
static checking, where potential type failures are detected
by an analysis of software texts, without execution. Such
an analysis is performed by a mechanism called a static
checker, or just checker in the rest of this discussion,
since no form of checking other than static will be
considered here.
Formally, a checker is a predicate on software texts -
that is to say, a boolean function which, for any such text,
expresses whether or not the text is typewise correct. In
practice, the only checkers of interest are those which can
be implemented by computable algorithms. The practical
requirements are even stronger: to be worthy of
consideration, the algorithm must be efficient, and it must
be possible to integrate it in a compiler for the
programming language used to write the software.
A full checker is a [static] checker which is able to
guarantee that no type failure may result from the execution
of certain software systems. The ability to build full
checkers depends on the programming language being used. A
programming language for which it is possible to write a
practical and useful full checker will be said to be fully
typed, or just ``typed'' for short. (``Typable'' would be a
more accurate term since the use of such a language will
only guarantee the absence of type failures if an
appropriate checker is applied.)
To make full checking possible, typed programming
languages require programmers to provide supplementary
information in the form of type declarations. Programmers
will explicitly declare every program entity, such as a
variable or a formal routine argument, as being of a certain
type, identified by a type name. The static checker can
then check that every use of the entity is consistent with
this declaration. In Pascal, for example, an entity declared
as integer may only participate in operations such as
addition or comparison, which are applicable to integer
arguments.
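The same principle applies in Eiffel. For instance (a
minimal sketch; the names n and acc and the class ACCOUNT are
hypothetical, used only for illustration):
   n: INTEGER; acc: ACCOUNT;
   ...
   n := n + 1     -- Accepted: addition applies to integers
   n := acc       -- Rejected statically: ACCOUNT does not conform to INTEGER
The second assignment is flagged by mere examination of the
text, before any execution.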
2.2 The typing problem in object-oriented programming
With the exception of Trellis-Owl [8], most object-oriented
languages other than Eiffel do not support static typing.
Some, such as Smalltalk and Lisp derivatives, have followed
the route of dynamic typing. C extensions such as C++
include a notion of type, but defeat any attempt at static
checking by providing loopholes (``type casts'') inherited
from C; these loopholes are required for any realistic use
of such a language, in particular because of the absence of
genericity (as discussed below). Such a lax attitude
towards typing is all the more regrettable in that static
checking is particularly desirable in an object-oriented
context.
The fundamental construct in object-oriented
programming is the application of an operation to an object;
this is written entity.feature in Eiffel, meaning ``Apply
the operation denoted by feature to the object denoted by
entity''. (Arguments may be given if feature is a routine.)
This will cause a type failure if no such operation is
applicable to the given object.
Under its basic form, the static typing problem for
Eiffel is simple: this is the problem of statically
guaranteeing that such a type failure will never occur. (A
more complete description of the problem will be given
below.)
In other words, the basic problem is the possibility of
writing a checker which, whenever it accepts a set of
classes containing a case of the notation entity.feature,
guarantees that the run-time support mechanism will always
be able to find an operation for feature, applicable to the
object associated with entity.
The problem might appear trivial in any language that
requires every entity x to be explicitly declared of a
certain type T, defined by a class: then, for a call x.f to be accepted, f must be an
exported feature of that class. The reason things are not
that simple is the presence of two powerful mechanisms:
polymorphism, which at run time allows x to denote an object
of a type T', possibly different from T and statically
unpredictable; and dynamic binding, which implies that in
such a case the version of f is the version applicable to
instances of T', which is not necessarily the same as the
version for T. (Dynamic binding should not be confused with
dynamic typing; in fact the Eiffel combination of static
typing and dynamic binding is a particularly productive
one.)
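As a minimal illustration of this combination (the classes
FIGURE and CIRCLE and the routine display are hypothetical
names, used here only for the example), assume CIRCLE is an
heir of FIGURE and redefines display:
   f: FIGURE; c: CIRCLE;
   ...
   c.Create;
   f := c;        -- Polymorphism: f now denotes an object of type CIRCLE
   f.display      -- Dynamic binding: the CIRCLE version of display is applied
The static type of f is FIGURE, but its dynamic type at the
time of the call is CIRCLE; a static checker must guarantee
that some version of display is applicable in all such cases.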
2.3 Limits of typing
Before analyzing in more detail how Eiffel addresses the
typing problem, it is important to consider three practical
aspects of typing, which are a consequence of the preceding
discussion.
The first observation is that static type checking
cannot guarantee that no run-time failure will occur: it
will only remove what has been called ``type failures'' at
the beginning of this discussion. Take for example a typed
language where the division operation expects arguments of
type REAL. Then the language and its checkers will flag a/b
as illegal if b is a variable of type CHARACTER; but if both
operands are of the correct type, the expression will be
considered legal even though the value of b might be zero at
execution time, producing a failure.
A similar problem exists in the type system of object-
oriented programming: a type failure has been defined above
as a case in which entity.feature is executed when entity
denotes an object which has no operation for feature. But
all that a type checker for Eiffel will be able to guarantee
is that if the object exists, there will be an operation
feature applicable to the object denoted by entity; it cannot
guarantee that entity will indeed be attached to an object,
rather than void, when the call is executed.
This aspect of the typing problem is frustrating
because it means that type checking may be used to catch the
``easy'' errors, such as using a character as denominator in
a division, but not the ``hard'' ones, such as using a real
denominator which in some cases may be zero. Unfortunately,
to catch the hard errors statically would require formal
proofs of program correctness, a goal that, even in
decidable cases, is beyond today's technology (progress
towards this goal is definitely possible in Eiffel
thanks to the assertion mechanism, but this is beyond the
present discussion). So type checking, which may be called
the ``poor man's proof'', only addresses errors that may be
statically detected through relatively straightforward
mechanisms.
This observation does not mean that type checking is
useless, if only because the ``easy'' errors that it detects
are among the most important ones in practice, reflecting
inconsistencies which often result from serious design
mistakes. But it reminds us that type checking is only a
partial answer to the problem of program correctness, and
that the distinction between the errors which are detectable
by type checking and those which are not is a pragmatic one,
having to do with the power of the detection mechanisms
available to us.
A program that has been type-checked and found typewise
correct is guaranteed to execute without type failure; it is
not guaranteed to execute without failure.
In Eiffel, the remaining failures will cause
exceptions, which lead to abnormal termination unless the
application deals with them explicitly.
2.4 Cost issues
In the preceding discussion two kinds of errors were
considered: ones that are very difficult or impossible to
detect in the current state of the technology; others that
are relatively easy for a compiler to find in a language
designed for static typing.
In practice the situation may not be so clearcut. Some
checks may be tractable in principle but costly to implement
as part of standard compiling mechanisms. In traditional
languages, this is often the case with inter-procedural
checks. A frequent solution is to provide a static checker
(such as the lint tool for C), which is separate from the
compiler and performs more advanced checks.
One may contend, of course, that the need for such
tools is a consequence of poorly designed type systems, and
in many cases it is. But the underlying idea is not without
merit. Checking implies definite costs in time. If one is
able to classify errors according to their practical
probability or the damages they cause, then the benefit-to-
cost ratio of static checks will usually show a point of
diminishing returns. Then it may make sense to have the
compiler, under its default options, perform the checks with
the highest benefit-to-cost ratio, the others being carried
out by explicit request only.
A related practical requirement arises in the case of
Eiffel, a modular language meant for incremental
compilation. In any good compiled implementation of Eiffel,
classes are compiled separately; if a class C has been
type-checked and compiled, a programmer who writes a new
class D depending on C must clearly be able to check D
without having to check C again. (Dependents in Eiffel
include clients, which have access to the class through its
interface only, and heirs.) This implies a strong constraint
on the type system, which must be designed in such a way
that the addition of a new descendant never makes an
existing class type-wise incorrect.
Satisfying this requirement means that it will be
possible to integrate full type checking with the automatic
incremental compilation mechanism used in the current Eiffel
implementation, which after a change recompiles only the
minimum set of impacted classes, taking export controls into
account (clients of a class which has undergone changes
affecting only hidden features are not recompiled).
2.5 Pessimism
The third important observation is that type checking is
always a pessimistic policy. Pascal compilers, for example,
will reject a program extract of the form
procedure p (var i: integer; r: real; var b: boolean);
begin
   if b then i := r else i := i + 1
end
because the first branch of the conditional attempts to
assign a real value to an integer variable (a forbidden
combination). Yet in a given program it may very well be the
case that the procedure is always called with b false, so
that no type failure would result.
The problem is not just that of determining whether a given
program can ever produce a type failure. If the
instruction
b := false
is inserted at the beginning of the above procedure, no
execution of this procedure may ever produce a type failure.
Yet all Pascal compilers will still reject the above
procedure, because they do not perform the kind of flow
analysis that would be necessary in the general case to
discover that the type failure may not occur.
The behavior of compilers (and designers of statically
typed languages) in such cases is easy to understand.
Compiling time is a precious resource and should not be
wasted in trying to salvage programs that look hopelessly
suspicious, even if once in a while one of these programs
would in fact work. The principle followed here is ``better
safe than sorry''. If the checker is to err, then it should
err on the safe side by rejecting some programs that would
work, rather than accepting ones that will not.
So in all non-trivial cases typing is pessimistic. This
means that every language is typed in a way: a universal
full checker (applicable to Smalltalk, Lisp and APL as well
as Basic) is the program
print "Program rejected"
This example is ludicrous, of course, but shows that
typing is a relative, not an absolute notion. The above
definition of a typed language should thus be amended: A
programming language is fully typed, or just typed, if it is
possible to write a practical and useful full checker which
will not reject any reasonable program.
This definition leaves some room for judgment as
to what constitutes a ``reasonable'' program, a notion that may not
be defined rigorously, although ``reasonable'' people should
be able to agree on whether a specific kind of program is
reasonable or not.
2.6 The role of typing: an assessment
The common lesson from the above three comments on typing is
that in any full discussion of typing the pragmatic
considerations are as important as the theoretical
ones.
The three key questions of typing for a programming
language are:
1. Among all possible run-time failures, which ones are
to be considered type failures, to be totally avoided
by the checking mechanism, and which ones should be
considered as non-type failures (such as division by
zero)?
2. Among all possible checks, which ones should always
be performed by the compiler, and which ones (if any)
should be done on request only?
3. Given that the checker will reject all programs
containing potential type failures as defined under
1, how far can it go in rejecting programs that might
not lead to such failures if executed?
The answers to all of these questions must of necessity
rely on an assessment of highly practical issues.
For question 1, the issue is what checks can
realistically be performed by a compiler or other checker.
This requires familiarity with the current technology of
compilers and other software tools.
For question 2, the issue is the benefit-to-cost ratio
of various checks. This adds to the previous requirement
some knowledge of what errors programmers actually make and
their actual degrees of gravity.
For question 3, the issue is what kinds of potentially
correct programs may be rejected by the checker. This
demands a practical understanding of how programmers use the
language and what is essential for them.
What is required in all cases is a software engineering
view which, beyond the language and its theory, encompasses
its practical application to the construction of useful
software.
3 THE EIFFEL TYPING PROBLEM
3.1 Terminology
It is assumed that the reader is familiar with Eiffel and
has read [5] or at least the 12-page introduction in [7].
Any meaningful discussion of type issues in Eiffel requires
using the proper terminology, which will be recalled here.
Software texts in Eiffel are made of classes. An
executable software product is obtained by assembling a
system, or combination of one or more classes, one of which
is called the root of the system.
A class describes a number of potential run-time
objects, called its instances. A class is characterized by
a number of features. A feature is either an attribute
(describing a modifiable component present in each instance
of the class) or a routine (describing a computation
applicable to each instance of the class).
A routine is either a function, which returns a result,
or a procedure, which does not. A routine may have formal
arguments; if so, calls to the routine must include the
corresponding actual arguments (expressions).
An entity is a name in a class text which stands for a
value at run-time. For this discussion we ignore entities of
``expanded types'' which include basic types (integer, real,
boolean, character) and for which polymorphism is not
supported; so the only entities of interest are those whose
run-time values are references. A reference either points to
an object or is void. The following forms of entities are
provided:
+ Attributes of a class.
+ Local variables in routines.
+ The predefined entity Result which can only appear in
a function. (The value returned by a function call
is the final value of Result on termination of the
function's execution.)
+ Formal arguments of a routine.
An attribute or local variable may be used as target of
an assignment, written
x := y
where x is an entity and y is an expression. The effect of
such an assignment is to make x void if the value of y is
void, and otherwise to make x refer to the same object as y.
The latter case produces aliasing between x and y; two
entities are said to be aliased if their values are
references to the same object. Aliasing may also
result from a routine call where y is the actual argument
and x the corresponding formal argument.
What makes the object-oriented typing problem non-
trivial is the presence of polymorphic aliasing, where x and
y are of different class types. The typing problem is then
that the same object is accessible through names of two
different types, which may have different rules with respect
to what features are exported and what feature arguments are
permissible.
3.2 The basic type rules
Two fundamental type rules (which have always been
statically enforced by our implementation of Eiffel) serve
to avoid most of the cases that might lead to a type
failure. They are given in sections 10.1.4 and 11.3 of [5]
and repeated here in simplified form.
The first rule governs feature application: x.f (...),
where x is of class type C, is only legal if one of the
ancestors of C (possibly C itself) defines a feature f, and
C exports f.
The second rule governs polymorphic aliasing: x := y is
only legal if the type of y, say D, conforms to the type of x, say C. (For
non-generic classes, this means that D is a descendant of C;
for generic classes, conformance also includes recursive
conditions on the generic parameters.) This also applies to
argument passing if x is the formal argument and y the
corresponding actual argument.
This second rule is also what lies behind the
possibility of defining flexible generic data structures
such as
stack_of_C: STACK [C]
with the effect that the push routine applied to stack_of_C
will accept as argument any expression whose type conforms
to C.
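For instance, if D is a descendant of C (a sketch; D and the
entity names are illustrative):
   stack_of_C: STACK [C]; d: D;
   ...
   d.Create;
   stack_of_C.push (d)     -- Legal: D conforms to C
The formal argument of push is declared of the type of the
generic parameter, so any expression whose type conforms to C
is acceptable.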
Taken together, the two rules exclude most type errors.
Nevertheless some type failures are still possible in
principle.
3.3 Violations
Type failures may occur in Eiffel in spite of the above
rules because of the following two properties of the
language:
+ The information hiding mechanism is orthogonal to the
inheritance mechanism. In other words, a feature
hidden (exported) by a class may be exported (hidden)
by any of its proper descendants.
+ When a routine is redefined in a descendant, the type
of a formal argument may be redefined to be a proper
descendant of the original type. This is sometimes
called the covariant rule.
Both of these rules make it easy to construct examples
that lead to type failures. Let PARENT be a class and HEIR
one of its heirs. Assume PARENT exports feature f, but HEIR
does not. Then the first type rule makes the following
legal:
/1/
p: PARENT; h: HEIR;
...
h.Create;
p := h;
p.f
Here because of dynamic binding we are sneakily
applying operation f to an object of type HEIR, even though
the designer of class HEIR has expressly removed f from the
list of operations applicable to such objects. Why this is
indeed bad will be analyzed in more detail below.
As an example of the second kind, assume that routine g
is defined in PARENT with an argument of type PARENT, and
redefined in HEIR with an argument of type HEIR. (More
generally the respective argument types might be any P and H
such that H conforms to P; here for simplicity we have
chosen P and H to be the enclosing classes themselves. In
practice, explicit redeclaration is avoided in such a case
by using a ``declaration by association'' of the form like
Current.) Then the second type rule makes the following
legal:
/2/
p: PARENT; h: HEIR;
...
h.Create;
p := h;
p.g (p)
Here we are sneakily calling g on an object of type
HEIR with an argument of dynamic type PARENT, not HEIR.
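For concreteness, the class texts assumed by extracts /1/
and /2/ might look as follows (a simplified sketch showing
only the relevant clauses; routine bodies are elided):
   class PARENT export
      f, g
   feature
      f is
         do ... end; -- f
      g (a: PARENT) is
         do ... end -- g
   end -- class PARENT

   class HEIR export
      g
   inherit
      PARENT
         redefine g
   feature
      g (a: HEIR) is
         do ... end -- g
   end -- class HEIR
HEIR does not list f in its export clause (the first property
above) and redefines the argument of g covariantly (the
second).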
3.4 Analysis of failures
Why are the above situations examples of type failures?
In case /2/, the problem is clear: since the version of
g redefined in HEIR assumes an argument of type HEIR, it
might well access an attribute defined for objects of type
HEIR but not for those of type PARENT. Such a type failure
will usually trigger a run-time exception.
Case /1/ is more subtle, as will be seen if we compare
it to the following almost identical variant, which assumes
the reverse export policy: f exported by HEIR but not by
PARENT.
/3/
p: PARENT; h: HEIR;
...
h.Create;
p := h;
h.f
In /3/ too, we again apparently use aliasing (the ability
to refer to an object through more than one name) as a way
to cheat with export restrictions. Even though p.f is
illegal, we can apply f to the object associated with p by
accessing it through its HEIR name.
In /3/, however, nothing bad can really occur; because
the designer of HEIR decided to export f, it means that (if
the class is correct) f may be correctly applied to objects
of this type. In precise terms, f, being exported by HEIR,
must preserve the invariant of this class. (No discussion of
these issues makes sense unless it refers to the notion of
class invariant.)
For /1/ the situation is different. Feature f is
exported by PARENT, meaning that it preserves the invariant
of this class. But it may well fail to preserve the
invariant of HEIR (this by itself being a sufficient reason
for HEIR not to export f). By using polymorphism and
dynamic binding, the client class containing extract /1/
succeeds in applying f to an object of type HEIR; as a
result, the object may enter into an incorrect state (which
usually will soon produce an exception).
Cases /1/ and /2/ are prototypical examples of the bad
cases that may occur. Others (involving in particular
genericity and declaration by association) are mere
variations on these basic patterns. /3/, as noted above,
does not lead to type failures.
For all their obviousness, these cases very seldom
occur in practice in a project whose aim is to construct
useful software rather than to crash the Eiffel environment.
For example, it is interesting to mention that no type
failure has ever been recorded in the usage of Eiffel at
Interactive Software Engineering. This explains why fixing
the type rules and the corresponding type checking performed
by the compiler has not been a top priority.
But of course the loopholes must eventually be closed.
THIS IS THE END OF PART 1 (OF 2)
--
-- Bertrand Meyer
bertrand@eiffel.com

bertrand@eiffel.UUCP (Bertrand Meyer) (07/11/89)
THIS IS THE BEGINNING OF PART 2 (OF 2)
4 RATIONALE FOR THE EIFFEL RULES
Before studying the solution, we will pause to consider the
Eiffel type rules giving rise to the above problems and
assess whether they make sense from a software engineering
perspective. These rules have indeed been criticized [1, 3,
4] (the second of these references even includes a rather
baroque proposal for ``fixing'' these rules). Several
related messages have also been posted to the
comp.lang.eiffel newsgroup.
These rules were not designed hastily, however, and the
experience of several years and hundreds of thousands of
lines of useful Eiffel software even suggests that they are
the only acceptable ones.
4.1 Rationale for the export rule
Consider first the export-inheritance rule, which often
surprises newcomers as an apparent violation of information
hiding principles. How can a descendant override export
restrictions of an ancestor, or hide what the ancestor
exported?
To obtain a better perspective on this rule it is
useful to examine its application to what may be the
archetypal example of inheritance. Assume PARENT is a class
POLYGON and HEIR is a class RECTANGLE. The inheritance
relation in this case is clear and natural.
It is not absurd, however, to assume that class POLYGON
has a procedure
add_vertex (new: POINT) is
      -- Insert new vertex new at current cursor position
   do
      ...
   ensure
      nb_vertices = old nb_vertices + 1
   end -- add_vertex
For general polygons, this procedure may make sense. It
is not, however, applicable to rectangles, whose class
invariant should contain the clause
nb_vertices = 4
The Eiffel solution is to ensure that class RECTANGLE
does not export procedure add_vertex. This is the kind of
case that led to situations such as /1/, with f being
add_vertex.
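In outline, the class might read as follows (a sketch; features
other than the relevant ones are elided, and side1, side2 are
hypothetical attribute names):
   class RECTANGLE export
      side1, side2, ...     -- but not add_vertex
   inherit
      POLYGON
   feature
      ...
   invariant
      nb_vertices = 4
   end -- class RECTANGLE
The inherited add_vertex is simply absent from the export
clause, so clients of RECTANGLE may not call it.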
                        POLYGON
                        ^     ^
                       /       \
                      /         \
                     /           \
                    /             \
        FIXED_POLYGON             VARIABLE_POLYGON
              ^                      feature
              |                         add_vertex
              |
              |
         RECTANGLE
          invariant
             nb_vertices = 4

   Figure 1: Alternative inheritance structures
Three policies are possible in the face of such cases.
+ Policy 1 considers that this use of inheritance is
inadequate: RECTANGLE is not a subtype of POLYGON
because not all operations applicable to the latter
are applicable to the former. According to this
policy, the proper solution (see figure 1) is to
consider two heirs to POLYGON: FIXED_POLYGON and
VARIABLE_POLYGON. Procedure add_vertex is a feature
of VARIABLE_POLYGON only, whereas RECTANGLE inherits
from FIXED_POLYGON. Under this policy the Eiffel rule could be
changed to require that every class export all
features exported by its ancestors. (Even with this
policy, however, there is no reason to prohibit a
class from exporting a feature hidden by a proper
ancestor. This technique is commonly used in
practical Eiffel programming and, as discussed above,
entails no danger of type failure.)
+ Policy 2 considers that inheritance is used properly
in this case, but only for half of its capacity: as a
module enrichment mechanism, not as a subtyping
facility. The definition of class RECTANGLE should be
permitted, but not polymorphic assignments of the
form p := r (with p of type POLYGON and r of type
RECTANGLE). The conformance rules governing such
assignments (as given in section 11.3.1 of [5])
should be updated to exclude such cases. If
polymorphic assignments are needed, then the
inheritance structure should be redesigned as shown
in figure 1.
+ Policy 3 is more liberal: it permits the above
polymorphic assignments as long as it can be
ascertained statically at reasonable cost, for any
Eiffel system involving these classes, that no type
failure may result; in other words, that no
add_vertex operation may ever be applied to p or an
entity that may become associated with p. This is the
policy applied in Eiffel, for which the exact rules
will be given below.
What speaks in favor of policies 1 and 2 is that they
are extremely easy to implement for the compiler writer.
Restricting exports in descendant classes (policy 1) is
trivial; restricting polymorphic aliasing (policy 2) is
almost as immediate. These solutions would also have the
advantage of quieting the theoreticians and removing the
fears of prospective users.
In short, all obvious arguments seemed to push the
designers of Eiffel towards policy 1 or 2: convenience
(since the same group is also in charge of an
implementation) and ease of convincing future users. So it
takes some dose of courage to choose policy 3 and stick to
it. Why should this be the Eiffel policy?
The reasons can only be understood by going beyond the
purely theoretical discussions and considering the practice
of object-oriented software construction.
Policies 1 and 2 assume that programming is for gods.
Gods never make mistakes. Those among them who, aside
from their other business, indulge in the heavenly pleasures
of object-oriented programming always get their inheritance
structures right the first time around.
In today's job market, however, most employers must
resort to hiring programmers who are only demigods, or even
in some cases (although they will deny it) mere mortals.
Then we have to accept the need to work with inheritance
structures that may already be in place and have been
designed with less-than-perfect foresight.
Assume for example that POLYGON had already been
designed and has descendants such as CONVEX_POLYGON and
CONCAVE_POLYGON. In other words, the existing inheritance
structure does not take into account the difference between
``variable'' and ``fixed'' polygons. Then a new programmer
comes in and has a need for RECTANGLE; writing it as an heir
to POLYGON seems the obvious solution.
Policy 1 precludes doing this without first
reorganizing the entire inheritance structure. This is
unrealistic in a practical development environment. True,
we should accept the need for regular improvements of
inheritance structures, reflecting improved understanding of
software artefacts. But as anyone who has managed a class
library in an industrial environment will appreciate, such
evolution should be carefully planned and properly
controlled. One cannot accept a situation in which new but
legitimate uses of a class require constant changes in the
design of existing libraries.
Policy 2 is only marginally better. It does allow the
programmer to define RECTANGLE as an heir to POLYGON but
prevents him from using polymorphism in this case; it is not
possible, for example, to define a data structure such as
polygon_stack: STACK [POLYGON]
and push a RECTANGLE object onto polygon_stack. This is
particularly frustrating in an application that never even
uses add_vertex. Again, the only solution is to redesign the
inheritance structure.
Policies 1 and 2 are overly inflexible. In contrast,
the idea of policy 3 is to avoid bothering programmers with
unnecessary restrictions when their use of inheritance
cannot possibly lead to any type failure. The criterion to
apply is obvious in the example given: the checker should
reject any software system that contains both of the
operations
+ p := r
+ p.add_vertex (...)
Because checkers cannot realistically carry out flow
analysis, it is not necessary to check whether or not these
two instructions can be executed on the same control flow
path; the mere presence of both in the same system is reason
enough for the checker to reject that system. In other
words, we accept without regret that the following sequence
(with n an integer entity) will be rejected even though it
cannot possibly lead to a type failure:
if n >= 0 then p := r end
if n < 0 then p.add_vertex (...) end
The reason for this was discussed in section 2.4:
checkers should only be bound to perform ``reasonable''
controls. For a checker to recognize that the above is safe
requires control flow analysis, whose cost, if it is at all
possible, is not justified by the benefit. Using the
terminology of section 2.4, we are past the point of
diminishing returns.
As noted, type checking is always a pessimistic
strategy; the only question is what degree of pessimism is
acceptable. Rejecting extracts of the above form appears
acceptable because it is hard to conceive of them being used
in useful programs. This is not true, however, of the much
stronger pessimism implied by policies 1 and 2, which leads
to rejection of programs that, for the reasons discussed
above, are useful and even needed.
How policy 3 can be implemented at reasonable cost will
be discussed below.
4.2 Further reflections on inheritance
The argument made above for orthogonal export and
inheritance mechanisms was that inheritance structures may
be temporarily imperfect. The implied consequence is that
eventually these structures will be perfected; then policies
1 and 2 could be implemented if we were willing to implement
tougher library management procedures (forcing restructuring
when appropriate, and requiring new heir designs to wait in
the meantime).
Unfortunately, even such an approach may not be viable.
In practice we must probably accept that some inheritance
structures will always be imperfect.
Inheritance is the application to software engineering
of the most fundamental of all scientific techniques:
classification. The modern forms of the natural sciences,
since Linnaeus, and of mathematics, at least since Bourbaki,
were born out of systematic efforts to classify natural and
artificial objects; object-oriented programming attempts the
same for software objects.
Classification is humanity's attempt to bring a
semblance of order to the description of an otherwise rather
chaotic world. As anyone who has ever tried to use a botany
book to recognize actual flowers knows, classifications
never quite achieve perfection. Close as one can get to a
fully satisfactory system, a few exceptions and special
examples will remain.
Moving from botany to zoology, the cliché example of
ostriches and flying illustrates these problems well. In
class BIRD, it seems appropriate to include and export a
feature fly. But a descendant class such as STRUTHIO (the
name of the genus that includes ostriches) should not export
fly. This is really the Eiffel form of selective
inheritance, or rejecting part of one's heritage: the
rejected feature is still there internally, but not visible
to its clients. (The discussion of selective inheritance in
section 10.5.3 of [5] is too restrictive in this respect:
it seems to reject any form of selective inheritance, even
though Eiffel supports the form discussed here.)
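In outline (a sketch, with illustrative feature names):
   class STRUTHIO export
      run, ...              -- fly is deliberately not re-exported
   inherit
      BIRD
   feature
      ...
   end -- class STRUTHIO
The inherited fly remains available to the routines of
STRUTHIO itself but is hidden from its clients.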
In a case such as this one there does not seem to be a
much better solution. Any other classification of birds, for
example into flying and non-flying ones, would miss key
criteria for distinguishing between various classes of
birds, and would undoubtedly cause bigger problems than the
original solution. Multiple inheritance from a class
FLYING_THING would not help much.
Admittedly, so far only a minority of Eiffel users have
reported ostrich-oriented programming as their major area of
technical interest. But the need to deal with imperfect
inheritance structures seems universal. Of course, we
should strive to make these structures as complete and
regular as possible. But we must also accept that unexpected
special cases and exceptions may always occur. When they do
and the programmer is only performing safe manipulations,
the programming environment should help him, not put undue
restrictions in his way.
4.3 Rationale for the redefinition rule
The analysis of the redefinition rule is similar to the
preceding discussion of the export rule. It is in fact hard
to see what convention other than the one used in Eiffel can
have any practical value.
Consider classes HEIR and PARENT as above, and add
HEIR_LIST and PARENT_LIST representing lists of objects of
these respective types. Of course, the normal Eiffel method
is to use genericity and declare these classes as LIST
[HEIR] and LIST [PARENT]. But the use of genericity does not
affect this discussion.
Clearly, if every instance of HEIR ``is-an'' instance
of PARENT, a list of HEIR objects also ``is-a'' list of
PARENT objects. This and other reasons suggest that
HEIR_LIST should inherit from PARENT_LIST. The conformance
rules indeed ensure this if genericity is used.
Assume that PARENT_LIST contains a procedure insert
with an argument of type PARENT. Clearly, we only want to
insert HEIR objects into an HEIR_LIST. So insert should be
redefined in HEIR_LIST so as to take an argument of type
HEIR. This satisfies the Eiffel rule for redefining routine
arguments - the covariant policy. (In practical Eiffel
programming, no explicit redefinition will be used; the
effect is achieved more elegantly by a like something
declaration, also known as declaration by association.)
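A sketch of the technique (the anchor attribute first is a
hypothetical name; PARENT_LIST is written here as a
non-generic class to match the discussion):
   class PARENT_LIST export
      first, insert, ...
   feature
      first: PARENT;
         -- First element of the list
      insert (new: like first) is
         -- Add new to the list
         do ... end -- insert
   end -- class PARENT_LIST

   class HEIR_LIST export
      first, insert, ...
   inherit
      PARENT_LIST
         redefine first
   feature
      first: HEIR
   end -- class HEIR_LIST
Redefining the anchor first in HEIR_LIST automatically gives
insert an argument of type HEIR, without any explicit
redefinition of the routine itself.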
Interestingly, denotational models for inheritance (see
[2] or the brief presentation in [6]) prescribe the reverse,
``contravariant'' rule: arguments can only be redefined to
ancestor types of the original. This is apparently [3] the
rule implemented in Trellis-Owl.
Examples such as the above, of which there are
thousands in practical Eiffel applications, make it very
hard to imagine how significant object-oriented software can
be written in a typed language without a covariant policy.
Many of these examples use declaration by association, which
is only a syntactical abbreviation, but in practice an
essential one; its very availability for routine arguments
is only possible because of the covariant rule.
Pronouncements that the contravariant rule is ``the
correct policy'', based on the revealed truths of
denotational semantics, are surprising (when the facts
revolt, send in the lambda troops!). To be sure, the
simplicity of mathematical models is an important guideline
for assessing programming language decisions. Yet a model
that conflicts with practical needs of software development
loses its usefulness. A model is only an operational
description of some reality. If the two do not match, the
scientific method suggests that we should change the model,
not the reality.
The problem with the covariant rule, of course, is that
it opens the possibility of type failures of the kind called
/2/ above, which readily transposes to our latest example:
/4/
pl: PARENT_LIST; hl: HEIR_LIST;
p: PARENT;
...
hl.Create;
pl := hl;
pl.insert (p)
The answer to such a case is in line with the result of
the discussion on the export-inheritance rule: put the
burden on the compiler, not the programmer. Any occurrence of
the last two above instructions in an Eiffel system should
be detected by the checker; once again, their mere
coexistence, without any consideration of whether the two
can be part of the same control flow path, is sufficient
cause for rejection. No flow analysis should be imposed on
the checker.
The next section specifies the type rules in light of
the preceding discussion. The following one will outline the
method to be followed by a checker to enforce these rules at
reasonable cost.
5 THE TYPE RULE
The type rule will be introduced under a number of
simplifying assumptions, meant to keep the discussion clear.
5.1 Simplifications
An Eiffel type will be taken to be defined by a class. This
includes in fact two simplifications:
+ ``Expanded types'', which in particular include the
basic types INTEGER, REAL etc., are not covered.
These, however, are not involved in polymorphism, and
hence do not raise any of the problems discussed
above.
+ Generically parameterized types are also not covered.
Genericity, essential in practical uses of Eiffel,
does not fundamentally affect the current discussion.
A consequence of this simplification is that ``B
conforms to A'' is the same relation as ``B is a
descendant of A''.
For the purpose of this discussion, Eiffel may be
viewed as having only two primitives: assignment and feature
application.
An assignment is of the form
x := y
and is a reference assignment. The left-hand side is an
entity; the right-hand side is an expression.
A feature application is either ``remote'' or
``local'', corresponding to the two forms
x.f (a, b, ...)
f (a, b, ...)
where feature f may be an attribute or a routine.
All feature applications will be considered to be
remote applications of routines with exactly one argument,
of the form:
x.r (a)
meaning ``Apply routine r to the object associated with x,
if any, with the given argument''. This simplification
does not entail any loss of generality:
+ For local feature applications, we can consider that
x is Current, the predefined entity representing the
current object.
+ A routine with no argument may always be rewritten as
having one argument, not used in the body.
+ For a feature f which is an attribute, we can write
an associated function r with one argument (not
used), returning the value of the attribute, and
replace all applications of f by applications of r.
+ For a routine with more than one argument, the rules
on a as given below are easily extended to rules
applying to every formal argument and the
corresponding actual arguments.
The above does not consider ``multidot'' feature
applications of the form a.b.c.d.... Although useful in
practice, they can safely be ignored from a theoretical
viewpoint since they can always be replaced by sequences of
one-dot applications by using auxiliary variables (x := a.b;
y := x.c; etc.)
As a further simplification, we assume that no renaming
is used, so that a feature defined in a class has the same
name in all descendant classes, including those where it is
redefined. This entails no loss of generality since renaming
is a purely syntactic facility, albeit a very important one
in practice. The only case for which renaming is
indispensable is removing name clashes in multiple
inheritance; for the purpose of this discussion we can
assume that the original classes are rewritten so as to
avoid any such clash. In the same spirit we assume that when
a routine is redefined the name of its formal argument
(whose type may be redefined according to the covariant
rule) does not change.
5.2 The goal of typing
The preceding simplifications make it possible to express
the goal of Eiffel typing in a complete yet simple way:
_________________________________________________
Type safety constraint: An Eiffel system is
said to be type-safe if for any routine
application x.r (a), and for any dynamic types
TX and TA that x and a may take on during the
execution of the system, TX includes and
exports a routine corresponding to r, with a
formal argument declared of a type which is an
ancestor of TA.
_________________________________________________
A system satisfying this constraint cannot suffer type
failures at run time. It can still, however, suffer other
failures, most notably by attempting to apply a routine r on
a reference x that happens to be void. But this, like the
division by zero case discussed at the beginning of this
article, is not within the scope of typing strategies.
Clearly, the contravariant rule, combined with either
of policies 1 and 2 for export and inheritance, guarantees
the type safety constraint. We must now see how to ensure it
with the more liberal rules: policy 3 and the covariant
rule.
5.3 The type rule
The fundamental type rule will address the preceding
requirements; it replaces (by strengthening it) the first
rule of section 3.2. The rule uses the set Aliases (x)
which, for any entity x in a system, contains all entities
to which x may become aliased through assignment or routine
call. A precise definition of this set will be given below.
The type rule is the following:
_________________________________________________
Type rule: An Eiffel system is type-correct if
and only if the following two conditions are
satisfied for any routine application x.r (z):
+ For every y in Aliases (x), S [y] exports
r.
+ For every y in Aliases (x) and every t in
Aliases (z), S [t] conforms to the type of the
formal argument of r in S [y].
_________________________________________________
Pending a proper definition of Aliases, it is clear
that this rule will satisfy the above type constraint.
5.4 Aliasing
It remains to define Aliases (x) precisely. A more correct
notation would be Aliases [S] (x) denoting the possible
aliases of x with respect to a given system (set of classes)
S. In what follows S will be considered to be fixed and the
subscript will be dropped.
Repeating the definition of a few basic concepts will
be useful.
+ An entity is a name declared in a class, representing
a reference which may be assigned values at run-time.
Formally, then, an entity of name x appearing in
class C is defined by the pair <C, x>. For
simplicity, entities will in fact be designated below
by their names only, but we must be
careful to make it clear, for every x, what class C
it belongs to. C will be called the class of x.
+ Any entity x is declared of a certain type, called
the static type of x and written S [x]. The
declaration may appear in the class of the entity or,
for an inherited entity, in an ancestor.
+ At any time during execution, the type of the object
associated with a non-void entity is called the
dynamic type of the entity and written D [x]. The
second basic type rule (section 3.2) guarantees that
the dynamic type conforms to the static type.
+ The definition of entities includes the predefined
entity Result, denoting the result of a function. To
resolve any ambiguity, the function name will be used
in this case as the name for the entity.
Aliases is introduced through two binary relations.
First, entity y is said to be assignable to entity x if and
only if one of the following assignments appears in the
system:
A.1
+ x := y
A.2
+ x := b.y (t) for any b and t.
In the second case, the relation holds not only for the
entity y declared in S [b] but also for the corresponding
entities in all the descendants of this class.
Next, y is said to be bindable to x if and only if one
of the following calls appears in S:
B.1
+ a.r (y)
B.2
+ a.r (b.y(t))
In both cases the relation holds whenever x is the
formal argument for r in any descendant of S [a]; in the
second case the relation extends to all y in any descendant
of class S [b].
We say that y is a direct alias for x if y is
assignable to x or bindable to x.
Aliases (x) may now be defined by transitive reflexive
closure: for any entity x, Aliases (x) is the set containing
x itself and all entities y such that, for some z, y is a
direct alias for z and (recursively) z belongs to
Aliases (x).
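As a small illustration, consider a system containing the
following instructions (a hypothetical fragment; the entities
and the call target a are not meant to come from any
particular class):
   p := h;       -- h is assignable to p (case A.1)
   q := p;       -- p is assignable to q (case A.1)
   a.r (q)       -- q is bindable to the formal argument of r (case B.1)
Then Aliases (q) contains q, p and h; and if x denotes the
formal argument of r (in S [a] or any of its descendants),
Aliases (x) contains x, q, p and h.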
The above definitions consider all assignments and
routine calls present in the system regardless of whether
they may or may not be executed in actual runs of the
system. So no flow analysis will be needed to enforce the
type rule; as announced in 4.1, this will mean a pessimistic
policy, although a less pessimistic one than policies 1 and
2. (The inclusion of all descendants of S [a] and S [b] in
the above rules is also pessimistic; the aim here was to
keep the definitions simple at no significant loss of useful
systems. The interested reader may wish to examine how these
conditions can be relaxed.)
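To see the rule at work on the earlier extracts: in /1/, the
assignment p := h makes h assignable to p, so h belongs to
Aliases (p); the call p.f then requires S [h], that is HEIR,
to export f, which it does not, so the system is rejected. In
/2/, the call p.g (p) requires, for y = h in Aliases (p), that
S [p] = PARENT conform to the type of the formal argument of
g in HEIR, namely HEIR; it does not, and the system is again
rejected.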
6 IMPLEMENTING THE RULE
[This section is only relevant for readers who are
interested in the details of the incremental verification
mechanism. Others are invited to proceed to the last
section.]
As introduced so far, the computation of Aliases (x)
for every entity of a system does not take into account the
order of compilation. As discussed in section 2.4, an
essential requirement is the ability for the automatic
incremental compilation mechanism to check a new class
without re-checking the classes on which it depends - its
suppliers and ancestors - if they have already been checked.
Although the Eiffel incremental recompilation algorithm is
not in the public domain, we may outline the strategy here.
The first observation is that case B.2, which seems the
most formidable one, can safely be ignored. By conceptually
introducing a new entity z, we can rewrite the corresponding
call as
z := b.y (t);
a.r (z)
making B.2 a combination of A.2 and B.1.
The incremental compilation requirement precludes the
straightforward implementation of type checking, whereby we
would compute Aliases (x) for every entity x, and then
examine every feature call x.f (z) to see if it satisfies
the type rule. True, this is appropriate for case A.2:
whenever this case yields a new direct alias y for
x, then the class of x is a client of the class of y, so
that the set Aliases (x) may be updated in an order
compatible with the compilation order. But this does not
work for case B.1, where it is in fact the class of y which
is a client of the class of x. (In case A.1 x and y belong
to the same class.)
As a consequence we must adopt a mixed strategy; the
sets to be examined must be computable and adaptable in an
order compatible with the dependency relation (and hence the
compilation order). For every entity x, these sets will be:
+ Remote_aliases (x), the set of y for which A.2 holds;
for convenience, this set also contains x.
+ Required_exports (x), the set of routines f which
must be exported by the class of x because the system
contains a call z.f (t) for some z such that x is in
Aliases (z).
+ Required_types (x), the set of types C to which
S [x] must conform because the system contains a call
z.f (t), where f requires a formal argument of type
C, for some z such that x is in Aliases (z).
The strategy to follow conceptually involves four steps
for each new class.
Step 1 initializes the above sets for each entity x of
the class: the first set is initialized to the set
containing just x itself; the second, to the set of f such
that x.f (...) appears in the class; the last, to the types
of the corresponding arguments.
Step 2 propagates aliasing information to descendants.
Whenever a new class HEIR is defined as heir to an existing
one PARENT, step 2 performs the following for every entity x
of HEIR (where x' is the corresponding entity in PARENT):
+ Remote_aliases (x) := Remote_aliases (x) U
Remote_aliases (x')
+ Required_exports (x) := Required_exports (x) U
Required_exports (x')
+ Required_types (x) := Required_types (x) ^
Required_types (x')
[Nroff note: ^ means intersection, U means union].
This step implements the rule given in the previous
section for both the ``assignable'' and ``bindable''
relations: any alias of an entity of class PARENT is
considered to be also an alias of the corresponding entities
in descendants of PARENT.
For each new class, step 3 will perform the incremental
computation of the three above sets and step 4 will verify
that these sets satisfy the type rule. As will be seen
below, these two steps may be performed simultaneously on
each class.
Step 3, when encountering any of the first three types
of potential aliasing introduced above, must update the sets
as follows:
A.1
+ Required_exports (y) := Required_exports (y) U
Required_exports (x)
Required_types (y) := Required_types (y) ^
Required_types (x)
A.2
+ Remote_aliases (x) := Remote_aliases (x) U
Remote_aliases (y)
B.1
+ Required_exports (y) := Required_exports (y) U
Required_exports (x)
Required_types (y) := Required_types (y) ^
Required_types (x)
In each of these cases the updating has been chosen so
that it only affects the sets attached to entities of the
new class, not those of its (possibly previously compiled) suppliers.
Using the above updates, the given sets are computed
incrementally in an order which is compatible with inter-
class dependencies, so that the computation may be
integrated with the automatic compiling mechanism.
Once this has been done, step 4 establishes the type
rule for every new class C by checking the following two
properties for every entity x in C:
P1
+ Every routine r in Required_exports (y), for every y
in Remote_aliases (x), is exported by C.
P2
+ S [a] belongs to Required_types (y) for all y in
Remote_aliases (x).
In practice, step 4 may be performed at the same time
as step 3 on each class C. Property P1 may be verified
incrementally by checking in cases A.1 and B.1 that r
belongs to Required_exports (y). P2 is verified
incrementally by checking in cases A.1 and B.1 that S [a]
belongs to Required_types (y).
This incremental checking assumes that some initial
verification has been integrated with steps 1 and 2. The
details are easily filled in. It is interesting to note that
this initialization of the complete checking process is
precisely the set of verifications performed by the current
Eiffel compiler, reflecting the first and second rules
described in section 3.2. As this discussion has shown, the full
checking strategy is an extension of this partial approach.
7 ASSESSMENT
To conclude, it is useful to examine a few possible
objections to the rules and enforcement strategy described
above.
A first possible objection is that the algorithm
sketched in the previous section is too difficult to
integrate with the current Eiffel compiler. Of course, the
only way to refute this objection would be to present the
implementation, which is still some time away. The objection
can only be made, however, by someone who does not realize
the sophistication of some of the tasks already performed by
the current Eiffel compiler (for example the implementation
of repeated inheritance).
A related objection addresses the cost of the new
checking, especially in space. The precise impact needs to
be assessed more carefully, but I believe the checking can
reasonably be integrated within the normal compiling
mechanism. If not, the mechanism would still be available as
a compiler option à la lint, to be applied to systems at
validation time.
Perhaps the most serious objection has to do with the
relative complexity of the mechanism described in the
preceding section. But this mechanism is of no interest to
normal users of the language; only implementors (and others
doubtful of the soundness and enforceability of the type
rules) need understand it. Eiffel programmers will need to
learn the type rule of section 5.3. This rule is only
slightly more complex than the old type rule (first rule of
section 3.2), which it implies; the advantage of the new rule
should be clear to programmers reading a simplified form of
the ``rationale'' section of this article (4). An attractive
property of the incremental strategy described in the last
section is that it appears to make it possible for the
checker to produce clear and precise error messages, of the
form
``In line m, feature f is applied to x; however,
because of the assignment on line n, x might
dynamically be of type D, which does not export f.''
If, as seems to be the case, the Eiffel compiler can
reach this level of sophistication, then we will have
achieved our goal. The purpose of a typed language is the
same as the purpose of the programming environment of which
it is a part: to help programmers, not bother them.
Restrictions should boost programmer productivity by
enabling an automatic checker to catch errors that would be
difficult for a human being to detect. The liberal typing
policy that has been integrated in Eiffel (``policy 3'') was
designed to make programmers' life easier. The type failures
to which it may lead are results of abnormal situations.
Detecting such situations statically is a tricky and tedious
business; like any such business, it should be handled by
computers, not by humans.
Acknowledgments
I am deeply indebted to all the people who contributed
to the type debate, including those with whose conclusions I
sharply disagree, for forcing me to clarify the matter of
types in Eiffel.
References
1. Pierre America, Review of ``Object-Oriented Software
Construction'', Science of Computer Programming, 1989,
to appear (or having appeared already?)
2. Luca Cardelli, "A Semantics of Multiple Inheritance,"
in Semantics of Data Types (Eds. Gilles Kahn, David B.
McQueen and Gordon Plotkin), Lecture Notes in Computer
Science 173, pp. 51-67, Springer-Verlag, New York,
1984.
3. William Cook, "A Proposal for Making Eiffel Type-Safe,"
ECOOP Conference, 1989.
4. Philippe Elinck, "De la Conception-Programmation par
Objets," Mémoire de Licence en Informatique (BS
Thesis), Université Libre de Bruxelles, 1988.
5. Bertrand Meyer, Object-Oriented Software Construction,
Prentice-Hall, 1988.
6. Bertrand Meyer, An Introduction to the Theory of
Programming Languages, Prentice-Hall, 1989. To appear.
7. Bertrand Meyer, "Eiffel: An Introduction," Technical
Report TR-EI-3/GI, Interactive Software Engineering
Inc., 1986 (Last revision: 1988).
8. Craig Schaffert, Topher Cooper, Bruce Bullis, Mike
Kilian, and Carrie Wilpolt, "An Introduction to
Trellis-Owl," in OOPSLA '86 Conference Proceedings,
Portland (Oreg.), Sept. 29-Oct. 2, 1986, pp. 9-16,
1986. (Published as SIGPLAN Notices, 21, 11, Nov.
1986.)
--
-- Bertrand Meyer
bertrand@eiffel.com

cook@hplabsz.HPL.HP.COM (William Cook) (07/21/89)
In my last note (on which news has been oddly silent, although I have
gotten some nice mail messages) I argued that there are some
insecurities in the Eiffel type system and made a proposal for fixing
them. Bertrand Meyer then posted the reply "Static typing for Eiffel".
I would like to reply to his proposal.

One way of looking at the insecurities in Eiffel (not necessarily the
best way) is that certain combinations of assignments of subclass
objects to superclass variables (basic uses of subtype polymorphism
essential to OOP) with certain feature accesses can lead to run-time
errors in the presence of genericity, feature type redefinition, and
association types. I suggested that in some cases redefinition is not
safe, and in others that the assumption of assignment compatibility is
unreasonable. Dr. Meyer claims that these are conceptually correct,
but that the use of certain assignments and certain feature accesses
in the same program is invalid. I have two comments.

First, it seems pretty clear to me that most features are accessed
somewhere in the program; otherwise, why would they be there? Thus we
may focus on the use of "certain assignments", namely those that
assign a subclass object to a superclass variable in the situations
that I pointed out were unreasonable. If a single assignment of this
form appears in the program, then the program will not compile and an
error will be signalled. Yet this is exactly what I propose: to make
the assignment illegal by eliminating the erroneous conformance
relation that made it apparently legal in the first place.

Second, it seems apparent to me that the kind of assignments used in a
class should not be part of its "type" but should be encapsulated
within its implementation. My proposal defines the legality of
assignments in terms of types, which are something the user can be
expected to know and understand. Dr. Meyer's proposal is based on
supposedly hidden information, namely the assignments a class "in fact
uses" as opposed to "is allowed to use". Thus rewriting the
implementation of a class (e.g. to improve efficiency) may cause all
programs based on it to break if it happens to use a different set of
assignments. This violates the goal of encapsulated implementation.

I think that if you rewrote the program, or planned it in advance at
the design stage, so that the types were an accurate description of
what you were trying to do, then the program would have better
structure, be more robust, and be easier to teach/maintain.

I also can't believe that the LIST example in section 4.3, the
"rationale for the redefinition rule", is very good evidence. It seems
to me that LISTs are naturally generic, and if one doesn't realize
this at the first attempt it would be a good idea to redesign as soon
as possible rather than building more code on a faulty design until it
becomes really difficult to change the situation. This also brings up
the possibility that a non-generic type (the first attempt) could
reasonably conform to a generic type (based on the redesign) and thus
ease the transition...

-william cook