[comp.object] Void references

bertrand@eiffel.UUCP (Bertrand Meyer) (11/10/90)

From <1990Nov9.010930.26493@Neon.Stanford.EDU>,
craig@Neon.Stanford.EDU (Craig D. Chambers) writes:

> [...] 
> What are you saying?  That you didn't make the comments that I
> attributed to you?

In mail to Mr. Chambers I listed the points on which I believe he
attributed to me ideas which are the contrary of what I wrote, which
is why I just cannot participate in that part of the debate (static
typing). I'd like to confine any controversy to technical issues,
rather than each person's (recursive) understanding of another's
comments, so I'll stop here. By the way, it is quite clear that
if Mr. Chambers did misunderstand my points that may be partially
or even totally my fault - for not being clear enough. 

On void references:

1. A void reference by any other name (nil, NONE, null object etc.)
is still a void reference.

2. 

> In Self we are rewriting our code to avoid using a
> generic nil object, always initializing variables to "real" objects.
> In some cases we have type-specific null objects that implement all
> the messages that should be understood by objects of the type,
> typically by doing nothing and terminating a recursion.

	This is the worst idea I have heard in a
long time (which means something). When you find that an operation
is impossible to execute, you just ignore it! What about the
specification - the contract - that your client expects?

	In the end, THIS is the real difference. Of course Self,
like everything else, has void references - they are just called
null objects.  Where Self, from this description, differs
from Eiffel, is that an erroneous feature call will simply
be ignored. This policy, of course, could have been implemented in Eiffel
(as it could be in Simula or anything else) by accepting that x.f
does nothing if x is void. But this is not acceptable!
If there is an illegal call, we want it to signal failure by raising
an exception - NOT to fake success!

	Just to drive the point closer: if I have the call

	landing_gear.raise

in a flight control software system, then I certainly would check
very carefully, by static means, that landing_gear will never be
void (null, nil ...) when this is executed. Once I have indeed
checked, if, for some reason (which can only be a bug,
although if there is some fault-tolerance in the software a
rescue mechanism may be available), landing_gear does happen to be
void at run-time, is the proper reaction to do nothing and
wait quietly?

	``Ladies and Gentlemen, we are starting our final
	approach towards Stanford. For reasons too complicated
	to explain in detail, you may wish to fasten your seat belts.
	We hope that if your future plans call for air travel ...''

	I believe that this is not proper. The proper
reaction is to trigger an exception, with the hope, of course,
that there is some exception handling code - even if all that code
does is to flash a light on the pilot's control board.
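Since this contrast is the crux of the disagreement, here is a minimal sketch of the two policies in Python; the names (VoidReferenceError, call_feature, LandingGear) are invented for illustration, and Python's None stands in for a void reference:

```python
class VoidReferenceError(Exception):
    """Signals a feature call on a void (nil) reference."""

class LandingGear:
    def raise_gear(self):
        return "gear raised"

def call_feature(target, feature_name):
    # The Eiffel-style policy: a feature call on a void reference
    # must signal failure by raising an exception, never fake
    # success by silently doing nothing.
    if target is None:
        raise VoidReferenceError(
            "call to '%s' on a void reference" % feature_name)
    return getattr(target, feature_name)()

# Normal case: the call goes through.
assert call_feature(LandingGear(), "raise_gear") == "gear raised"

# Void case: an exception is raised, giving handler code (even if
# all it does is flash a light on the control board) a chance to act.
try:
    call_feature(None, "raise_gear")
except VoidReferenceError:
    pass  # here the "flash a light" recovery code would run
```

The silent-ignore policy would simply replace the raise with a no-op return, which is exactly what the landing-gear example argues against.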

-- Bertrand Meyer
bertrand@eiffel.com

johnson@m.cs.uiuc.edu (11/11/90)

craig@Neon.Stanford.EDU (Craig D. Chambers) writes:

> In Self we are rewriting our code to avoid using a
> generic nil object, always initializing variables to "real" objects.
> In some cases we have type-specific null objects that implement all
> the messages that should be understood by objects of the type,
> typically by doing nothing and terminating a recursion.

to which bertrand@eiffel.com (Bertrand Meyer) replies:

>	This is the worst idea I have heard in a
>long time (which means something). When you find that an operation
>is impossible to execute, you just ignore it! What about the
>specification - the contract - that your client expects?

>	In the end, THIS is the real difference. Of course Self,
>like everything else, has void references - they are just called
>null objects.  Where Self, from this description, differs
>from Eiffel, is that an erroneous feature call will simply
>be ignored.

This is not exactly the case.  It is clear that Craig was not
being clear, though that is not exactly his fault.  I first heard
this idea from Allen Wirfs-Brock about two or three years ago,
and it took me a while to get it.

There are several different ways to use nil in Smalltalk.  One
is as the value of an unassigned variable.  The other is as a generic
"other" or "don't care" value.  A common idiom in Smalltalk is
	x isNil ifFalse: [ x fee. x fi. x foe. x foo ]
nil doesn't understand fee, fi, foe, or foo, but if x is not
nil then x will understand those messages.  Type-checking this
code is funny.   One alternative is to say that nil is a subtype
of every type.  This implies that programs that have been type
checked could still send messages to objects that do not understand
them.  Another is to have the type-checking algorithm use flow
analysis and case analysis (which is the alternative we chose in
Typed Smalltalk).  This makes type-checking expensive and hard
to understand.

The solution is to say that nil is used ONLY for undefined values.
"Don't care" values and "error" values should be user defined objects,
defined specifically for a particular type.  Thus, performing an
operation on an "error" value raises an error, while performing
an operation on a "don't care" object does nothing.  Which one
you use depends on which one you need.
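Johnson's distinction between "error" values and "don't care" values can be sketched in Python (a hypothetical illustration, not Smalltalk; the class names are invented):

```python
class ErrorValue:
    """Type-specific "error" object: any operation raises at once."""
    def __getattr__(self, name):
        raise RuntimeError("operation '%s' on an error value" % name)

class DontCare:
    """Type-specific "don't care" object: any operation is a no-op."""
    def __getattr__(self, name):
        return lambda *args: None

sink = DontCare()
sink.fee()         # silently does nothing
sink.foo(1, 2)     # likewise, whatever the arguments

bad = ErrorValue()
try:
    bad.fee()
except RuntimeError:
    pass           # the error surfaces immediately, not later
```

The point is that the programmer chooses, per type, whether an operation on the null value should terminate a recursion quietly or fail loudly.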

The result is that programs type-check and the type system has the
property that programs that have been type-checked only send
messages that are understood by their receivers.  It might be
that some checking is given up, since the programmer might
mistakenly give a "don't care" object when an "error" object
should have been given.  However, the increase in the quality
of the type-checking might overcome this.

I would be happy to try this technique in Typed Smalltalk,
and indeed we often use something like it in the code that
we write.  However, one of the goals of the Typed Smalltalk
project is to handle typical Smalltalk code the way that it
is written by typical Smalltalk programmers, and this is not
yet the way most Smalltalk programmers program.  Thus, we
cannot depend on it in our type system.

Ralph Johnson -- University of Illinois at Urbana-Champaign

bertrand@eiffel.UUCP (Bertrand Meyer) (11/12/90)

There is something strange in this discussion. In the messages
by Ralph Johnson and Craig Chambers one gets the impression
(please correct me if this is a wrong impression) that
it is possible to guarantee statically that no error will
ever occur at run-time. I don't see this as a fulfillable
goal theoretically or practically, short of taking either
or both of the following two measures:

1. Full-fledged program proving.
2. Unacceptable restrictions on the expressive power
   of the language (e.g. no references).

The aims of typing are more modest: guaranteeing statically
that a certain class of run-time errors will never occur.
Of course we should make that class as broad as possible.

Object-oriented technology is a particularly illuminating
context for discussing these issues because in a pure object-oriented
language such as Eiffel or Smalltalk, once you remove the supporting
machinery, the analysis/design/programming language has essentially
ONE operation: feature call, of the form

	x.f

Let us assume for a moment that f has no precondition (i.e. is always
applicable to an instance of the class in which f was declared).
Then the only problem of software reliability becomes:

/A/
	Can we statically guarantee that for every possible execution
	of x.f at run time, x will be attached to an object capable
	of handling feature f?

Because answering this question positively would require either
or both of 1. and 2. above and hence seems impossible, the most
useful practical solution seems to be to settle for the question

/B/
	Can we statically guarantee that for every possible execution
	of x.f at run time, if x is attached to an object, that object
	will be capable of handling feature f?

The answer in Eiffel to question /B/ is  ``yes in principle; with
Interactive's current compiler, yes in almost every case that counts'',
and we are working to transform this into a ``yes'' without qualification.

[Anyone familiar with axiomatic semantics will recognize in
the difference between /A/ and /B/ something similar to the difference
between total and partial correctness.]

The idea behind /B/ is to cover statically as many as possible of
the potential erroneous cases through type checking,
and to relegate every other case to a single type of error,
feature application to a void reference.

In practice, there remains another erroneous case, the one in
which f does have an object to work on, but the object does not satisfy
the precondition of f (e.g. f is extraction of the real square root
and x is an object representing a negative real number). Then /B/ becomes:

/B'/
	Can we statically guarantee that for every possible execution
	of x.f at run time, if x is attached to an object that satisfies
	the precondition of f, that object will be capable of handling f?

(Some practical cases, such as arithmetic overflow or operating
system signals, are handled differently at the implementation
level, although theoretically they too fall in this framework.)

The situation with respect to precondition violations is the same as
for vacuity tests (checking whether x may be void): everyone would like
to check statically, but that is impossible in the general case
within the current state of program proving technology.

Of course, if we choose /B/ it is still essential to try to check
statically, as extensively as possible, that x will always be
non-void. In many cases this is possible or even trivial, as in

	x.Create;
	x.f ...

Similarly, if we consider /B'/ instead (which is necessary in practice)
we will try to ascertain statically, in as many cases as possible,
that all feature calls guarantee the corresponding precondition.
However:

	+ When it comes to static checking of both non-vacuity properties
	  and preconditions, we have to accept that the techniques that
	  work are NOT those commonly used for static type checking,
	  and that, short of solving 1. and 2. above, there is no universal
	  solution, only individual, specific and partial techniques.

	+ Whenever we have less than perfect faith in the exhaustiveness
	  of the static verification we may have performed, we need to rely
	  on dynamic checking: for every execution of x.f that has not been
	  proved correct, check at run-time that x is non-void and the
	  associated object satisfies the precondition of f.

If dynamic checking does detect an anomaly, the only acceptable
solution is to raise an exception, which is why it is so essential
for any attempt at reliable object-oriented software to have a clean
exception handling facility, tied in to the assertion mechanism.
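The run-time monitoring of preconditions described above can be sketched in Python; `require` is a hypothetical decorator standing in for Eiffel's assertion mechanism, not Eiffel itself:

```python
import math

class PreconditionError(Exception):
    """Raised when a feature's precondition is violated at run time."""

def require(predicate, description):
    # A sketch of dynamic assertion checking: for every call that
    # has not been proved correct statically, verify the
    # precondition at run time and raise an exception on violation.
    def wrap(feature):
        def checked(self, *args):
            if not predicate(self, *args):
                raise PreconditionError(
                    "precondition violated: " + description)
            return feature(self, *args)
        return checked
    return wrap

class Real:
    def __init__(self, value):
        self.value = value

    @require(lambda self: self.value >= 0.0, "value >= 0")
    def sqrt(self):
        return math.sqrt(self.value)

assert Real(4.0).sqrt() == 2.0   # precondition satisfied

try:
    Real(-1.0).sqrt()            # precondition violated: raises
except PreconditionError:
    pass                         # exception handling code runs here
```

The raised exception is what ties the assertion mechanism to the exception handling facility: the anomaly is signalled, never silently absorbed.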

-- Bertrand Meyer
bertrand@eiffel.com

craig@Neon.Stanford.EDU (Craig D. Chambers) (11/12/90)

In article <447@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:
>From <1990Nov9.010930.26493@Neon.Stanford.EDU>,
>craig@Neon.Stanford.EDU (Craig D. Chambers) writes:
>> In Self we are rewriting our code to avoid using a
>> generic nil object, always initializing variables to "real" objects.
>> In some cases we have type-specific null objects that implement all
>> the messages that should be understood by objects of the type,
>> typically by doing nothing and terminating a recursion.
>
>	This is the worst idea I have heard in a
>long time (which means something). When you find that an operation
>is impossible to execute, you just ignore it! What about the
>specification - the contract - that your client expects?

OK.  Let me try to be clear here.  I'm not just calling nil a new
name, and implementing messages that mask errors.  That's clearly
silly, and I try not to advocate silly things.  Give me a little
credit.  This business with type-specific null/empty objects is a
useful technique for writing programs in a good object-oriented
language; I had hoped it was a little more widely known.

Let's consider a linked-list abstraction, implemented with cons-cell
objects and some sort of end-of-list object.  In most implementations
of this abstraction, the end-of-list object is either a void reference
(null pointer) (e.g. Eiffel) or a pointer to the generic nil object
(e.g. Lisp, Smalltalk).  The code for iterating down the list,
starting at a cons-cell, typically tests whether the next element
(cdr) is nil, and if so, returns.  If the next element is non-nil,
then the iteration recursively invokes the iteration function on the
next element of the list.  Here's some Smalltalk code for this
function:

"In class ConsCell"

do: aBlock
	aBlock value: car.
	cdr isNil ifTrue: [ "do nothing " ]
		 ifFalse: [ cdr do: aBlock ].

Most routines in the ConsCell class will have similar tests for nil of
the cdr instance variable.  Plus all users of a list will have to test
for empty list (i.e. isNil) before sending a message like "do:" to the
cons-cell.  This is pretty ugly, and all the testing operations are
contrary to the spirit of object-oriented programming.

A better way of implementing this list abstraction is to recognize
that there are two representations of a list link (ConsCell and
EndOfList) and that both have the same interface.  EndOfList is an
example of a type-specific null object, but it isn't just another name
for the generic nil, since it is a perfectly reasonable representation
of an instance of the list abstraction, namely, an empty list.  Here's
some Smalltalk code for this kind of implementation:

"In class RevisedConsCell"

do: aBlock
	aBlock value: car.
	"continue the recursion with the next list element"
	cdr do: aBlock.

"In class EndOfList"

do: aBlock
	"end of the list; do nothing"
	self.

Both RevisedConsCell and EndOfList are subtypes/subclasses/whatever of
some List abstraction, but are different representations appropriate to
different subsets of the instances of List.  Since EndOfList is a
perfectly reasonable List instance, understanding all the List
protocol, neither the code in the RevisedConsCell class, nor the users
of the list, need to check for some nil object.  This is a more
object-oriented way of implementing lists, and this same style works
for other kinds of "behavior modes" that exist within an abstraction:
expanded-vs.-iconified windows, empty-vs.-non-empty collections, etc.

To preempt the next comment/criticism of this scheme, I'll claim that
it works well even for mutable objects, given the right language
support.  One problem with the EndOfList class above is that it's hard
to add an element to an empty list in place, mutating an empty list
into a non-empty list.  Self offers a natural solution to this problem
using dynamic inheritance.  The list instance is represented by an
empty object that is initially a *child* of the EndOfList object.  To
allow the user to add an element to a (currently) empty list, the
EndOfList object defines the "append:" method that creates a new
ConsCell object containing the new element, and then switches the
parent pointer of the child list instance to point to the new instance
of ConsCell.  No references to the list object need to be updated
(i.e. no "become:" was used), but the list now behaves as if it were a
singleton list object, since it inherits from the ConsCell object.
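Chambers's dynamic-inheritance trick can be approximated in Python by reassigning an object's `__class__` in place; this is a simulation of the parent-pointer switch, not actual Self, and the class names are invented:

```python
class List:
    """A list instance; it starts out behaving as an empty list."""
    def __init__(self):
        self.__class__ = EmptyList   # analogue of the initial parent

class EmptyList(List):
    def do(self, block):
        pass                         # end of list: nothing to visit

    def append(self, element):
        # Analogue of switching the Self parent pointer: the very
        # same object, with no references updated (no "become:"),
        # now behaves as a cons cell.
        self.car = element
        self.cdr = List()
        self.__class__ = ConsCell

class ConsCell(List):
    def do(self, block):
        block(self.car)
        self.cdr.do(block)

    def append(self, element):
        self.cdr.append(element)

lst = List()
lst.append(1)
lst.append(2)
seen = []
lst.do(seen.append)
assert seen == [1, 2]
```

Note that `lst` itself mutates from empty-list behavior to cons-cell behavior; every existing reference to it sees the change, which is the property the parent-pointer switch provides in Self.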

To find out more about dynamic inheritance and its usefulness for
dynamically-changing behavior modes within an abstraction, you can
read the paper "Organizing Programs without Classes" in the set of
papers available via anonymous ftp from otis.stanford.edu.  It is also
scheduled to appear in an upcoming issue of "Lisp and Symbolic
Computation" devoted entirely to Self.

I hope this discussion is a little more clear on what I mean by
type-specific null/empty objects and why they are a better way of
writing many kinds of abstractions.

-- Craig Chambers

craig@Neon.Stanford.EDU (Craig D. Chambers) (11/12/90)

In article <451@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:
>
>There is something strange in this discussion. In the messages
>by Ralph Johnson and Craig Chambers one gets the impression
>(please correct me if this is a wrong impression) that
>it is possible to guarantee statically that no error will
>ever occur at run-time.

You are so corrected.  I'm not making such a claim (in fact, in my
earlier posting I presented concrete examples that are extremely
difficult to type check statically, let alone verify for
"correctness"), and I'm certain that Ralph Johnson isn't making such a
claim, either.  Again, give us some credit.

> [stuff deleted]
>
>1. Full-fledged program proving.
>2. Unacceptable restrictions on the expressive power
>   of the language (e.g. no references).
> 
> [stuff deleted]
>
> x.f
>
> [stuff deleted]
>
>Then the only problem of software reliability becomes:
>
>/A/
>	Can we statically guarantee that for every possible execution
>	of x.f at run time, x will be attached to an object capable
>	of handling feature f?
>
>Because answering this question positively would require either
>or both of 1. and 2. above and hence seems impossible, the most
>useful practical solution seems to be to settle for the question
>
>/B/
>	Can we statically guarantee that for every possible execution
>	of x.f at run time, if x is attached to an object, that object
>	will be capable of handling feature f?

Why do you insist that void references are required?  I've given you
examples of languages that include neither void references nor generic
nil objects (CLU, Trellis/Owl), and real programs can be written in
them.  Implementations of both are available on various platforms
(although I think you probably have to pay money to DEC for
Trellis/Owl).  Can you present examples that *require* void references
to be practical?  I can think of a few examples that without void
references may be a little uglier or require a slightly more
sophisticated type checker, but allowing void references throughout a
program introduces a potential class of programming errors that is
much more costly than a bit of complexity in a few instances.

My position is that static type checking may be useful for developing
large, reliable systems.  It is certainly not *the* solution, but may
be one of several useful programming tools.  I feel that static type
checking, if it exists, should be as unrestrictive as possible
*without* accepting without notice programs that contain what I would
call type errors.  [I know what I mean by "type error", but I don't
have the time or inclination to define it completely here.]  To me,
this implies contravariant subtyping rules, support for multiple
dispatching, and (as a last resort) the ability of the programmer to
ignore compile-time type errors (of course the implementation will
contain a run-time type test at every ignored type error).  I consider
void references to be an undesirable language feature, since they are
difficult to check statically (thus reducing the reliability of
programs) and offer little additional expressive power in return.  I
agree with most people that subtyping is different from implementation
inheritance, and that the two mechanisms should be linguistically
distinct.  This doesn't necessarily imply that classes cannot be
interpreted as both types and chunks of code, but instead that at
least two kinds of subclassing (subtyping and code inheritance) should
be supported.  I hope that clears up any mistaken impressions of my
position.

-- Craig Chambers

moss@cs.umass.edu (Eliot Moss) (11/14/90)

I have two things to add to this discussion:

(1) First, the idea of multiple representations for individual objects of the
same type, and specifically its application to situations such as the empty
list, has already appeared in the literature, so let's give some authors
credit: 

@INPROCEEDINGS{LTP86,
	AUTHOR = "Wilf R. {LaLonde} and Dave A. Thomas and John R. Pugh",
	TITLE = "An Exemplar Based {Smalltalk}",
	BOOKTITLE = oopsla,
	ADDRESS = "Portland, OR",
	YEAR = 1986,
	MONTH = sep,
	ORGANIZATION = acm,
	PAGES = "322--330",
	SERIES = sigplan,
	VOLUME = "21(11)"
}

It does seem like a reasonable idea in that it cleans up code and takes
advantage of the implicit "type test" performed by the dynamic method lookup
mechanisms of OO languages. And it does so without violating static type
constraints. It is not clear if there is a performance difference between the
techniques. With nil you can detect end of list without having to follow that
last pointer (it has a distinguished magic value, frequently 0; in my
Smalltalk implementation nil is also a static constant rather than a pointer
to the Nil object precisely to speed up the code in a lot of places (we can
still send messages to it, etc.)); without nil you eliminate a number of
conditional checks (but will always have to do dynamic calls). A detailed
study of the relative performance of the techniques would be interesting; I am
not aware of its having been done.

(2) Since I was in on the design of CLU and also participated in Trellis/Owl,
perhaps I can add a little w.r.t. them as well. First, variables can
potentially be uninitialized; this is supposed to be detected by the run time
system. A compiler can, with simple data flow analysis, sort the variables
into three categories: definitely initialized before use, definitely *not*
initialized before use (produce an error message and code to raise an
exception if that point is reached in the program); and uncertain (requires
run time check and possible exception at run time). The uncertain category
will tend to be pretty rare if reasonable programming style is used. This does
not directly relate to the void pointer question, but I thought it might help
clarify things w.r.t. Bertrand's points about total/partial correctness
analogies, etc.
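Moss's three categories can be illustrated with a toy forward data-flow pass over a small statement representation (invented for illustration; nothing like the actual CLU compiler):

```python
def classify_uses(stmts, assigned=frozenset(), maybe=frozenset()):
    """Classify each variable use as 'initialized', 'uncertain' or
    'uninitialized' -- the three categories described above -- by a
    simple forward data-flow pass over a toy statement list."""
    report = []
    for stmt in stmts:
        if stmt[0] == "assign":
            assigned = assigned | {stmt[1]}
            maybe = maybe | {stmt[1]}
        elif stmt[0] == "use":
            var = stmt[1]
            if var in assigned:
                report.append((var, "initialized"))
            elif var in maybe:
                report.append((var, "uncertain"))      # run-time check
            else:
                report.append((var, "uninitialized"))  # compile-time error
        elif stmt[0] == "branch":
            r1, a1, m1 = classify_uses(stmt[1], assigned, maybe)
            r2, a2, m2 = classify_uses(stmt[2], assigned, maybe)
            report += r1 + r2
            assigned = a1 & a2   # assigned on *both* paths
            maybe = m1 | m2      # assigned on *some* path
    return report, assigned, maybe

prog = [
    ("branch", [("assign", "x")], []),  # x assigned on one path only
    ("use", "x"),                       # -> uncertain
    ("assign", "y"),
    ("use", "y"),                       # -> initialized
    ("use", "z"),                       # -> uninitialized
]
report, _, _ = classify_uses(prog)
```

Only the "uncertain" uses need a run-time check and a possible exception, which is why the category tends to be rare in reasonably written code.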

To do a linked list in CLU you use a "oneof" (or "variant"; the distinction is
not all that relevant here) type, which is similar to a variant record in
Pascal. Such an object has a tag, and there is a tagcase statement for testing
the tag, which allows you to interpret the contents of the oneof according to
its actual tag, in a type safe manner. This avoids the concept of a void
pointer and replaces it with the concept of a void case in an object format.
The CLU approach leads to a nicer type system, but the representation is not
as compact and the coding perhaps more verbose than if we had void pointers.
Some ML compilers are smart enough to use the actual void pointer
representation for ML's analog of "oneof" when there are only two cases, one of
them void. The exemplar approach does not exactly get around the data
compactness problem -- the "tag" is the type field in the object, used by the
run time for method lookup. Of course, if the type field *must* be there for
other reasons, then there is no *additional* overhead. The end of list object
must also be represented explicitly. But the main thing is that the code is
compact.
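The oneof/tagcase style can be sketched in Python with a tagged union over two dataclasses (a loose analogy to CLU, with invented names; `isinstance` plays the role of the tagcase):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Cons:
    car: int
    cdr: "IntList"

@dataclass
class Empty:
    pass

IntList = Union[Cons, Empty]

def total(lst: IntList) -> int:
    # The analogue of CLU's tagcase: dispatch on the variant
    # actually held, type-safely, with no void pointer anywhere.
    if isinstance(lst, Cons):
        return lst.car + total(lst.cdr)
    return 0

assert total(Cons(1, Cons(2, Empty()))) == 3
assert total(Empty()) == 0
```

The void-pointer concept is replaced by an explicit Empty variant, exactly the "void case in an object format" described above.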

Personally, I would not call the end of list object a "don't care". Among
other things, it probably says nasty things if you send car or cdr to it, but
it *does* implement them, so it is type correct in the sense that the method
is guaranteed to exist. If one had full signatures including exceptions, as in
CLU, Trellis/Owl, Modula-3, etc., then car and cdr should be noted as possibly
raising an exception.

Well, I hope this was helpful to some of you anyway .....
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu

bertrand@eiffel.UUCP (Bertrand Meyer) (11/15/90)

Two more comments after Craig Chambers's and Eliot Moss's
last postings.

1. The technique using two variants of list cell descriptions
(two heirs to a class CELL or LINKABLE), one to represent
internal cells and another to represent terminal cells,
can be implemented in any object-oriented language.
I do not find it particularly attractive:

	A. It seems to require at least as much code as
	using a vacuity test.

	B. One may also question the claim that it is
	``more object-oriented''. Are we to understand that
	the conditional instruction
	is not object-oriented? This is going too far.
	Good object-oriented style implies getting rid
	of certain conditional instructions (those which
	discriminate among a number, often large, of
	data types, especially if the list of choices
	is bound to change). The philosophy that I see behind
	Mr. Chambers's method (although he did not express it
	in that way) is that whenever technically feasible
	an explicit test should be replaced by dynamic binding.
	I may be misrepresenting his position, in which case
	he'll correct me; if so, however, this means there is
	room for conditional instructions, and the discrimination
	between void and non-void references (or terminal and
	non-terminal cells) seems to be a legitimate case for
	such instructions.

	C. The prospect of having to write TWO classes for
	every kind of cell (list element, tree node, ...),
	is rather unpleasant.

	Apparently this is easier in Self because it's
	not really classes but objects.
	But here I quit. Object-oriented programming
	as I understand it has little to do with objects;
	it is really about classes. Why one should try to have
	O-O Programming without classes, the major facility of
	the method, is beyond my understanding.  I should probably
	have kept my mouth shut, but it's too late.
	(The term ``Class-based analysis, design and programming''
	is what I personally would use to characterize the field.
	By the way, I have never understood the rationale behind
	distinctions such as ``object-based'' vs. ``object-oriented''.
	If we are using English, it would seem that terms such as
	``based'', ``oriented'' and the like should retain a meaning
	which is reasonably close to their usual sense, and technical
	terms such as ``object'' should be used in their accepted technical
	sense. What does the distinction between ``based''
	and ``oriented'', if any, have in common with the distinction
	between encapsulation languages such as Ada, supporting a certain
	module structure, and approaches such as Smalltalk's or Eiffel's
	supporting a module and typing structure based on classes?
	How do Ada ``objects'' differ from those of Pascal?
	In what way does the use of such terminology improve our
	understanding of the issues? But by now I have probably opened
	too many cans of worms anyway.)

2. On the topic of errors: It is nice to see that apparently
everyone agrees that there may remain errors at execution-time.
The next question is: which ones? I have described a framework
(Eiffel's) in which, assuming a perfect compiler, only three kinds
of errors may remain:

	A. Attempt to call a feature on a void reference.

	B. Assertion violation.

	C. Hardware or operating system fault (e.g. arithmetic overflow).

In a theoretical study, case C would disappear, being covered by
A and B.

I believe this is clean, intellectually satisfying, practically
convenient, and easy to explain. It is certainly not the only
possible solution.  But it is essential that anyone claiming that
language or environment X ``solves the problems'' of errors of
a certain type (e.g. vacuity of references) should also state which
categories of possible run-time errors DO remain.
 
-- 
-- Bertrand Meyer
bertrand@eiffel.com

craig@Neon.Stanford.EDU (Craig D. Chambers) (11/15/90)

In article <454@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:
>	Apparently this is easier in Self because it's
>	not really classes but objects.
>	But here I quit. Object-oriented programming
>	as I understand it has little to do with objects;
>	it is really about classes. Why one should try to have
>	O-O Programming without classes, the major facility of
>	the method, is beyond my understanding.  I should probably
>	have kept my mouth shut, but it's too late.

I'm glad you didn't.  There seems to be a common misconception that
languages without classes cannot be used for normal kinds of
object-oriented programming, in a "structured" way.  This, I'd argue,
is not the case.

The linguistic facility in many object-oriented languages called a
"class" provides a number of features:

    Repository for shared behavior (e.g. inherited methods)

    Repository for shared state (e.g. class variables)

    Template for format of instances (e.g. instance variables)

    Object containing instance-independent behavior and state (e.g.
    class methods, class variables)

And for statically-typed languages:

    Specification of interface (e.g. signatures of methods, deferred
    methods)

All of these features are useful, and any reasonable object-oriented
language should provide these mechanisms somehow.  Class-based
languages tie these features together into one all-consuming
linguistic construct, the class.  This in itself is not a real
problem.  The problem arises when one class is defined as a subclass
of another.  In most existing class-based languages, it is not
possible to inherit only part of the aspects provided by the
superclass; it is impossible in most class-based languages to inherit
the behavior of a superclass (i.e. the methods) but not the
representation.  For example, suppose a new Rectangle class wants to
inherit most of the implementation of the existing Polygon class, but
replace the list-of-vertices representation with four integers.  In
most existing class-based languages, the Rectangle class would be
forced to accept the vertices instance variable as well.  Trellis/Owl
is an example of a class-based language that does not suffer from this
problem (at least as documented in the manuals I have; perhaps the
implementation is not this smart yet).  Languages with extensive
metaclass and reflective support, like CLOS, can probably solve this
problem as well by doing extra programming at the metaclass level, but
this seems a bit complicated for what should be a simple task.
Another example of two facilities being intertwined in some existing
class-based languages is the inheritance of implementation and
interface as a unit (i.e. subtyping = submoduling). The specification
part of classes must be inherited if the behavior part is inherited,
and vice versa.  This issue has been beaten to death already.

In a classless language like Self, you'd use separate objects to
provide each of the above facilities.  One object holds shared behavior
and shared state for an abstraction (called the traits object).
Another object serves as a template for instances of the abstraction
(called the prototype).  Depending on the mood of the Self programmer,
instance-independent behavior may be included in the traits object or
moved out into a separate object.  Objects are also used as name
spaces, so that a simple message like "traits rectangle" will evaluate
to the rectangle traits object.  Object inheritance subsumes the
subclassing hierarchy, the instantiation hierarchy, and even the
variable name scoping rules.  Since different objects support
different aspects of classes, refining objects can inherit behavior
without representation, or vice versa, by inheriting from only the
appropriate objects.  In Self defining two or three objects for what
would normally be part of a single class declaration is a bit more
verbose (by a few lines), but future classless languages need not have
this problem.
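The traits/prototype split can be sketched as a toy prototype-object model in Python (loosely modeled on Self; the class and slot names are invented, and this is not Self's actual lookup semantics):

```python
class SelfObject:
    """Minimal prototype object: local slots plus parent lookup."""
    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def lookup(self, name):
        # Search local slots first, then delegate to the parent.
        if name in self.slots:
            return self.slots[name]
        if self.parent is not None:
            return self.parent.lookup(name)
        raise AttributeError("message not understood: " + name)

    def send(self, name, *args):
        return self.lookup(name)(self, *args)

def clone(obj, **overrides):
    # Copying a prototype keeps the same parent (the traits object)
    # but gives the copy its own data slots.
    return SelfObject(parent=obj.parent, **{**obj.slots, **overrides})

# Shared behavior lives in a traits object ...
rectangle_traits = SelfObject(
    area=lambda self: self.slots["width"] * self.slots["height"])

# ... while the prototype holds the representation (data slots).
rectangle_proto = SelfObject(parent=rectangle_traits, width=1, height=1)

r = clone(rectangle_proto, width=3, height=4)
assert r.send("area") == 12
```

Behavior (in the traits object) and representation (in the prototype) are separate objects, so a refining object can inherit one without the other, which is the point being made about breaking classes into more primitive pieces.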

People who've seen some of our talks on Self, or who have read some of
our papers, sometimes feel like we've just re-invented classes and
called them traits objects.  In a way, they're right: traits objects
are similar to class objects in that both fulfill the need to provide
shared behavior.  But classes bring along lots of other baggage (like
representation) that may not always be needed or desired.  What we've
really done is to break up classes into several more primitive pieces
that can be combined in useful ways that aren't possible in typical
class-based languages.

My feeling is that "ideal" class-based languages and "ideal" classless
languages turn out to be very similar.  Ideal class-based languages
would allow the different facilities provided by the class construct
to be mixed and matched in subclasses (thus extending the idea of two
kinds of inheritance links that I proposed earlier to allow
inheritance of particular aspects of classes).  Ideal classless
languages might be able to define some objects as "abstract" (implying
use as a traits object) and other objects as "concrete" (implying use
as a prototype object); the syntax for defining these objects could be
designed so that a single declaration would define several related
objects.  In the end, programs in the two kinds of languages would end
up looking remarkably similar, and both would afford similar kinds of
power and flexibility.  One remaining difference
is that class-based languages include an instantiation hierarchy while
classless languages do not; I'm not sure how important this
distinction is.  Another difference is that classless languages
support inheritance from other objects (either abstract or concrete
objects), while class-based languages only support inheritance from
classes (analogous to inheriting from only abstract objects).  This
distinction may be more interesting, and in the end ideal classless
languages may win out over ideal class-based languages for this very
reason.

-- Craig Chambers

P.S.  A less speculative paper on this topic, "Organizing Programs
without Classes," is available via anonymous ftp from
otis.stanford.edu.

ddean@rain.andrew.cmu.edu (Drew Dean) (11/16/90)

From a relative neophyte: if a subclass only inherits parts of its superclass
(i.e. some subset of the methods and/or variables), how can an instance
of the subclass be used anywhere an instance of the superclass can be used?
In my (comparably primitive) understanding of OOP, this is important.  Without
this property, what good are abstract classes, which seem to be a useful
organizational tool?

Drew Dean
Drew_Dean@rain.andrew.cmu.edu

barmar@think.com (Barry Margolin) (11/16/90)

In article <1990Nov15.011702.25087@Neon.Stanford.EDU> craig@Neon.Stanford.EDU (Craig D. Chambers) writes:
>  In most existing class-based languages, it is not
>possible to inherit only part of the aspects provided by the
>superclass; it is impossible in most class-based languages to inherit
>the behavior of a superclass (i.e. the methods) but not the
>representation.  

While this may be strictly true, it is easily solved in class-based
languages with multiple inheritance, such as CLOS.  In such languages you
can put the representations in one set of classes and the behavioral
implementations in another set of classes.  You would then define new
classes that inherit one from column A and one from column B (am I coining
a new metaphor, related to Flavors' use of "mixins"?), and only make
instances of these classes.
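Barry's "one from column A, one from column B" recipe can be sketched with Python's multiple inheritance; all of the class and method names below are invented for illustration, and this is not CLOS code:

```python
class ArrayRepresentation:
    """Column A: representation only, exposed through a small
    internal interface (_get/_set)."""
    def __init__(self, size):
        self._items = [None] * size

    def _get(self, i):
        return self._items[i]

    def _set(self, i, v):
        self._items[i] = v

class StackBehavior:
    """Column B: behavior only.  It relies solely on the internal
    interface and a _top counter, never on the representation itself."""
    def push(self, v):
        self._set(self._top, v)
        self._top += 1

    def pop(self):
        self._top -= 1
        return self._get(self._top)

class ArrayStack(ArrayRepresentation, StackBehavior):
    """One from column A, one from column B."""
    def __init__(self, size):
        super().__init__(size)
        self._top = 0

s = ArrayStack(10)
s.push(1)
s.push(2)
print(s.pop())  # -> 2
```

The `_get`/`_set` internal interface is the part that needs the careful design: swapping in a different column-A class only works if it honors the same internal contract.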

Implementing things this way requires extreme care.  Normally, the code
implementing the behavior of a class likes to have direct access to the
representation of the class.  In many OO languages the components of the
representation ("slots" in CLOS, "instance variables" in Flavors and
Smalltalk, "members" in C++, etc.) can be accessed directly as variables in
the methods (in CLOS this requires the WITH-SLOTS macro, since methods
are not attached to a single class, so there is no built-in notion of
"this" or "self").  In order to cope with the flexibility of the Chinese
Menu style of OO programming you must take as much care in designing the
internal interfaces of the classes as you do in designing the external
interfaces.

Separating out the interface specification can be done similarly, but it
may require extra work.  There can be a third set of classes that actually
provide the externally visible interfaces.  The interfaces to the above two
sets of classes would not be available outside these three sets of classes
(either using the language's visibility mechanisms or simply by adhering to
naming conventions, e.g. calling the external interface FOO and the
internal one FOO-INTERNAL).  A particular specification class would only
provide the interfaces that are part of the specification it models, but
its implementations would simply pass the call on to the corresponding
internal interface.  In languages with explicit abstract superclasses you
may be able to avoid the dual interfaces, and simply specify the interface
in the abstract superclasses.  Either way, the specification classes
become column C in the class menu.
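The "column C" specification classes, with each external name passing the call on to an internal counterpart (FOO calling FOO-INTERNAL), might look like this hedged Python sketch; every name here is invented:

```python
class _CounterInternal:
    """Columns A and B combined: representation plus behavior,
    published only under internal names."""
    def __init__(self):
        self._n = 0

    def increment_internal(self):
        self._n += 1

    def value_internal(self):
        return self._n

class CounterSpec:
    """Column C: the externally visible interface.  Each external
    operation simply forwards to its internal counterpart."""
    def increment(self):
        self.increment_internal()

    def value(self):
        return self.value_internal()

class Counter(CounterSpec, _CounterInternal):
    """The menu choice: one specification, one implementation."""
    pass

c = Counter()
c.increment()
c.increment()
print(c.value())  # -> 2
```

A different specification class exposing, say, only `value` could be combined with the same internal class to model a read-only contract.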
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

sakkinen@tukki.jyu.fi (Markku Sakkinen) (11/16/90)

In article <11113@pt.cs.cmu.edu> ddean@rain.andrew.cmu.edu (Drew Dean) writes:
>From a relative neophyte,

Keep on asking these questions!  I suppose that the object-oriented people
(like many others) live under a small but constant threat of
"the emperor's new clothes" (H.C. Andersen).

>        if a subclass only inherits parts of its superclass
>(i.e. some subset of the methods and/or variables), how can an instance
>of the subclass be used anywhere an instance of the superclass can be used?
>In my (comparably primitive) understanding of OOP, this is important.  Without
>this property, what good are abstract classes, which seem to be a useful
>organizational tool?

Such partial inheritance can only be used for inheriting implementation
(what I called "incidental inheritance" in my ECOOP'89 paper),
thus more typically with concrete than abstract superclasses.
An instance of the subclass can _not_ be substituted for an instance
of the superclass.

In my opinion, inheriting only parts of a superclass, say C,
into a subclass, say D, is a highly dubious idea even for this purpose.
One should rather have defined a fully-fledged class B containing
those parts first, then both C and D as subclasses of B.
It would be nice to have the kind of generalisation facilities
that Claus Pedersen suggested in his OOPSLA'89 paper:
that one could derive _super_classes approximately as easily
as existing languages allow us to derive _sub_classes.
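Sakkinen's suggestion (factor the shared parts into a fully-fledged class B first, then derive both C and D from it) can be sketched in Python; the names B, C, and D follow the text, while the method names are invented:

```python
class B:
    """The shared parts, factored out ahead of time."""
    def shared(self):
        return "shared behavior"

class C(B):
    """The original class: the shared parts plus its own additions."""
    def c_specific(self):
        return "only in C"

class D(B):
    """The would-be partial heir, now an honest subclass of B."""
    def d_specific(self):
        return "only in D"

# D gets exactly the shared parts, without pretending that its
# instances can substitute for instances of C.
print(D().shared())                # -> shared behavior
print(hasattr(D(), "c_specific"))  # -> False
```

Instances of D are substitutable for instances of B (which really does describe them), which is exactly the property partial inheritance from C would have destroyed.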

Markku Sakkinen
Department of Computer Science and Information Systems
University of Jyvaskyla (a's with umlauts)
Seminaarinkatu 15
SF-40100 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/19/90)

On 15 Nov 90 16:22:19 GMT, ddean@rain.andrew.cmu.edu (Drew Dean) said:

ddean> From a relative neophyte, if a subclass only inherits parts of
ddean> its superclass (ie. some subset of the methods and/or variables),
ddean> how can an instance of the subclass be used anywhere an instance
ddean> of the superclass can be used ?  In my (comparably primitive)
ddean> understanding of OOP, this is important.

Well, this type of polymorphism does not arise in conventional OO
languages, where the algebra of interface and implementation reuse is
just inheritance via prefixing: it links the two together, and only for
supersetting, not for subsetting.  Often the reason for this restriction
is, more or less explicitly, precisely to avoid this problem.

But if you look at less restrictive languages you see that the algebras
of interface and implementation reuse can be considerably more flexible,
and less interlinked; in another article Barry Margolin points out that
the problem is...

barmar> ... easily solved in class-based languages with multiple
barmar> inheritance, such as CLOS.  In such languages you can put the
barmar> representations in one set of classes and the behavioral
barmar> implementations in another set of classes.

What he really means is that in CLOS (like in some sense in Smalltalk)
you have a functional interface to both methods and attributes, and you
can obtain a new class interface by merging the interfaces (or subsets
thereof) of one (or more) class that gives you the attributes interface
with one (or more) that gives you the methods interface.

The problem, for data traits or attributes, is naturally how much of the
(name,offset,type) table that describes the components of the state of
an object we need to keep at runtime. At one extreme, in C and C++ 1.x
and similar languages, all attributes have characteristics known at
compile time, even if their number may change (a subclass of A will have
more attributes, but you are forbidden from accessing these, except via
virtual functions, which indeed are accessed via a table).

With virtual multiple inheritance, as in C++ 2.x (or Eiffel), some
attributes have offsets that vary at run time, so the offset must be
stored and looked up at run time; in some languages, such as Smalltalk,
even the *type* of an attribute is not known until runtime.

In the fully general case the state of an object can be dynamic in all
three dimensions; the number of attributes may be variable, their
offsets (representations!) may vary, and their types too.  Now consider
the domain of applicability (or polymorphism) of a method; a method may
depend at the very least on the names of the attributes it uses in the
object, or also on their offsets (representations!), or also on their
types.

The most extreme cases can be implemented easily: if the set of
attributes of an object is known at compile time and so are their
offsets (the C/C++ 1.x case above) we do not need any runtime table; we
can match a method against an object by comparing their interfaces in
the compiler. If nothing is known at compile time, the solution is
equally easy: we associate a full (name,offset,type) table with each
object, and we match it against that of the method and its implementation
at runtime. A method (or its implementation) can be applied to any
object whose interface is 'stronger' than the one it was defined for.

    Whether it is a mistake to associate a stronger implementation
    with a weaker method interface is largely a matter of taste. On the
    other hand, applying methods with a stronger interface to objects
    with a weaker one is a partial function, and its validity must be
    determined at runtime, just as applying a method with a stronger
    specification to an object with a weaker one. Example of the latter:
    applying 'factorial' to a numeric object that is not guaranteed positive
    and integral. Example of the former: applying 'square(int)' to a
    numeric object that is declared as 'union(int,float)'.
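The runtime matching just described (comparing a method's required interface against an object's table) can be illustrated with a toy Python sketch; offsets are omitted, since Python resolves attribute access by name anyway, and all the names below are invented:

```python
def applicable(method_interface, object_table):
    """A method applies to any object whose interface is 'stronger':
    the object's table must supply at least the required names, each
    with a compatible (sub)type."""
    return all(
        name in object_table and issubclass(object_table[name], required)
        for name, required in method_interface.items()
    )

# The interface an 'area' method depends on: attribute names and types.
area_needs = {"width": int, "height": int}

rect_table = {"width": int, "height": int, "tag": str}   # stronger interface
point_table = {"x": int, "y": int}                       # weaker interface

print(applicable(area_needs, rect_table))   # -> True
print(applicable(area_needs, point_table))  # -> False
```

The first check succeeds because the rectangle's table offers everything the method demands (and more); the second fails at "runtime", which is exactly the partial-function situation described above.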

The runtime table based solution allows for easy mix and match of
interfaces, by mixing and matching the tables; but, especially for
attribute access, a table lookup on every use is not attractive, except
to AI people, who have been doing this forever (alists, frames, ...),
also because in their applications a trait which is an attribute may be
*dynamically* reimplemented as a function, or vice versa, so
table/alist based approaches are nearly essential. But in many cases
the full generality is not necessary, even if static foreknowledge is
not possible either.

Implementation techniques for the intermediate cases are being
researched. Things like Self and automatically typed Smalltalk are
examples of what can be done to reduce the cost of full generality when
it can be constrained either statically (at least part of the table for
an object can be deduced at compile time) or dynamically (it is expected
that the contents of the table do not change frequently).

There are other tricks; I am terribly sorry that I do not remember the
exact reference, but in a recent SIGPLAN conference (1989, I believe) on
languages and their implementation there is a paper on laying out
different object types in memory so that subobjects that have the same
name and type in all those object types are at the same offset relative
to the object pointer. This naturally means that the access functions to
the subobjects are constant, known at compile time, and the same for all
those object types.  The goal is to minimize holes, of course; we can
make the layouts of two classes compatible, in the sense of having
attributes of the same name and type at the same offset, by creating
holes, but this is hardly desirable.

In the simple inheritance case, implemented with prefixing, avoiding this
is trivial; with multiple prefixing things become more difficult (C++
has to resort to indirection via hidden pointers for virtual base
classes). The paper I mention above demonstrates that a fairly simple
trick, using postfixing (putting subobjects at negative offsets from the
object pointer), can solve the problem in many interesting cases with
greater efficiency, that is, no holes and no extra levels of indirection.

ddean> Without this property, what good are abstract classes, which seem
ddean> to be a useful organizational tool ?

Abstract (deferred) classes are no good of course. They are just a hack
to decouple a bit the algebra of interfaces from that of
implementations, which are otherwise coupled in inheritance.

The best thing is instead to uncouple them completely: have
different algebras for implementations and interfaces, and more
general algebras as well, not just supersetting/subsetting.

On this point I would also like to point you to some work in the
software engineering reuse field on interface and component description
languages, which allow you to describe how to mix and match interfaces
(module A has an interface that is the union of those of modules B and C
minus that of D).
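That last parenthetical interface algebra (A = the union of B and C minus D) can be sketched with plain sets of operation names; the CDL specifics are not shown here, and the operation names are invented:

```python
# Each interface is modeled as a set of operation names.
interface_B = {"open", "close", "read"}
interface_C = {"open", "write", "flush"}
interface_D = {"flush"}

# "Module A has an interface that is the union of those of modules
# B and C minus that of D."
interface_A = (interface_B | interface_C) - interface_D

print(sorted(interface_A))  # -> ['close', 'open', 'read', 'write']
```

The point is that interfaces compose under ordinary set operations, independently of any implementation hierarchy.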

You could contact Bob Gautier (rjg%uk.ac.aber.cs@nsfnet-relay.ac.uk),
who has done some work on this issue and on CDL, an interface
description language, which I think is interesting.

Some other interesting work is being done by the functional language
people when they try to define clean type systems. Polymorphism is very
important to them, and they have expended a lot of effort into
dealing with it.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

davis@barbes.ilog.fr (Harley Davis) (11/20/90)

In article <454@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:

   2. On the topic of errors: It is nice to see that apparently
   everyone agrees that there may remain errors at execution-time.
   The next question is: which ones? I have described a framework
   (Eiffel's) in which, assuming a perfect compiler, only three kinds
   of errors may remain:

	   A. Attempt to call a feature on a void reference.

	   B. Assertion violation.

	   C. Hardware or operating system fault (e.g. arithmetic overflow).

"Assertion violation" is a little vague as a type of error.  What
doesn't it cover?  In particular, why can't it include type assertions
(or other static errors normally caught by the perfect compiler) and
void references?

-- Harley Davis
--
------------------------------------------------------------------------------
Harley Davis			internet: davis@ilog.fr
ILOG S.A.			uucp:  ..!mcvax!inria!ilog!davis
2 Avenue Galliéni, BP 85	tel:  (33 1) 46 63 66 66	
94253 Gentilly Cedex		
France

bertrand@eiffel.UUCP (Bertrand Meyer) (11/20/90)

From <DAVIS.90Nov19190243@barbes.ilog.fr> by davis@barbes.ilog.fr
(Harley Davis):

> "Assertion violation" is a little vague as a type of error.  What
> doesn't it cover?

An assertion violation is a routine precondition that is not
satisfied on routine entry, a routine postcondition not satisfied
on routine exit, or a class invariant not satisfied in either case.
(There are also other cases involving loop invariants and check
instructions.)
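For readers unfamiliar with the terminology, the three assertion kinds can be sketched in Python with plain `assert` statements standing in for Eiffel's precondition, postcondition, and invariant clauses; the `Account` class and its contract are invented for illustration:

```python
class Account:
    """An invented class whose contract is: the balance is never negative."""

    def __init__(self):
        self.balance = 0
        self._invariant()

    def _invariant(self):
        # Class invariant: must hold on entry to and exit from every routine.
        assert self.balance >= 0, "class invariant violated"

    def deposit(self, amount):
        # Routine precondition: checked on routine entry.
        assert amount > 0, "precondition violated"
        old = self.balance
        self.balance += amount
        # Routine postcondition: checked on routine exit.
        assert self.balance == old + amount, "postcondition violated"
        self._invariant()

a = Account()
a.deposit(10)
a.deposit(5)
print(a.balance)  # -> 15
```

A call such as `a.deposit(-1)` raises an exception at the precondition check, which is the "assertion violation" category of runtime error: the contract is broken, and failure is signaled rather than faked.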

This is discussed in detail in various references
(``Object-Oriented Software Construction'', chapter 7;
``Eiffel: The Language'', chapters 12 and 13;
``Programming by Contract''). On the other hand,
since I am the author of these references, I am not
the best judge of just how ``vague'' they are.

> In particular, why can't it include type assertions
> (or other static errors normally caught by the perfect compiler)
> and void references?

If some condition can be expressed as a type constraint and
caught statically we don't need assertions to express it.
(There was a comment of a similar nature in a recent message by
LaPalice@tauto.fr, although I can't find the exact reference.)
-- 
-- Bertrand Meyer
bertrand@eiffel.com

johnson@m.cs.uiuc.edu (11/21/90)

pcg@cs.aber.ac.uk said:

>Abstract (deferred) classes are no good of course. They are just a hack
>to decouple a bit the algebra of interfaces from that of
>implementations, which are otherwise coupled in inheritance.

I am shocked!  What can this mean?  Abstract classes are one of
the most important design techniques in object-oriented programming!

Reading on, I decided that Piercarlo and I have different definitions
of abstract classes.  I think that he defines an abstract class as
one that has NO implementation, in which every method is deferred.
However, my definition of an abstract class is that it is a class
that does not have a COMPLETE definition, in which there is SOME
method that is deferred.  Abstract classes according to my definition
are a crucial technique for reusing classes.  An abstract class is a 
template for its subclasses, and replaces other ways of reusing design 
such as code skeletons.  I agree with Piercarlo that abstract classes
should not be used just to represent types (though of course if you
are programming in C++ then you have no other choice).  Thus, we
probably don't really disagree.
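Ralph's definition (a class with SOME deferred method, serving as a template for its subclasses) corresponds to what Python's `abc` module supports; here is a minimal sketch with invented class names:

```python
from abc import ABC, abstractmethod

class Report(ABC):
    """An abstract class in Ralph's sense: partly implemented."""

    def render(self):
        # Concrete "template" behavior, reused as-is by subclasses...
        return "== %s ==" % self.title()

    @abstractmethod
    def title(self):
        # ...built around one deferred method each subclass must fill in.
        ...

class SalesReport(Report):
    def title(self):
        return "Sales"

print(SalesReport().render())  # -> == Sales ==
```

The subclass inherits the finished `render` logic and supplies only the deferred piece, which is exactly the design reuse (replacing code skeletons) that the post describes.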

Ralph Johnson -- University of Illinois at Urbana-Champaign

davidm@uunet.UU.NET (David S. Masterson) (11/25/90)

>>>>> On 20 Nov 90 06:45:55 GMT, bertrand@eiffel.UUCP (Bertrand Meyer) said:

Bertrand> An assertion violation is a routine precondition that is not
Bertrand> satisfied on routine entry, a routine postcondition not satisfied on
Bertrand> routine exit, or a class invariant not satisfied in either case.
Bertrand> (There are also other cases involving loop invariants and check
Bertrand> instructions.)

Hmmm.  A question from relational database constraint resolution.

Within a relational database, one can define constraints that updates to the
database must satisfy before the update is allowed.  Might this be considered
an assertion (most likely a routine postcondition)?

If so, it's possible to define constraints on two (or more) different tables
(objects?) that are mutually exclusive (updating any one of the tables violates
a constraint).  I can't think of an example at the moment of when this would
occur (it's usually rather complex), but the resolution of the problem is to
check constraints at transaction commit time (when all updates are done)
rather than after every update.  The user of the objects makes the
determination whether to consider each update atomic or to take all updates
together as atomic.
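The commit-time resolution described above can be sketched as a transaction object that batches updates and evaluates its constraints only at commit; everything here is an invented illustration, not any particular database API:

```python
class Transaction:
    """An invented sketch: updates are batched, and the constraints
    (assertions) are evaluated only once, at commit time."""

    def __init__(self, tables, constraints):
        self.tables = tables            # table name -> dict of fields
        self.constraints = constraints  # predicates over all the tables
        self._pending = []

    def update(self, table, field, value):
        # Record the update; no constraint is checked here.
        self._pending.append((table, field, value))

    def commit(self):
        for table, field, value in self._pending:
            self.tables[table][field] = value
        # Only now, with all updates in place, check every constraint.
        for check in self.constraints:
            assert check(self.tables), "constraint violated at commit"

tables = {"a": {"total": 0}, "b": {"total": 0}}
# A cross-table constraint no single update could satisfy on its own:
balanced = lambda t: t["a"]["total"] == t["b"]["total"]

txn = Transaction(tables, [balanced])
txn.update("a", "total", 5)   # would violate 'balanced' if checked now
txn.update("b", "total", 5)
txn.commit()                  # both updates in place: constraint holds
print(tables["a"]["total"], tables["b"]["total"])  # -> 5 5
```

Checked after every individual update, the first `update` alone would fail; deferring the check to `commit` is what makes the pair of mutually dependent updates legal.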

Would this be expressible under assertion methodologies (control of assertion
checking being outside of the object), or is this something that is just
avoided in the development of objects?
--
====================================================================
David Masterson					Consilium, Inc.
(415) 691-6311					640 Clyde Ct.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"