[net.lang] Type Checking in Smalltalk-80

pavel@RICE@cornell.UUCP (pavel@RICE) (12/05/83)

From: Pavel Curtis <pavel@rice>
To: net-lang@CORNELL

    Date: Fri, 2-Dec-83 11:22:49 EST
    From: hou3c!ka
    Subject: Re: Anti-CLU... Anti-Strong-Typing...
    Newsgroups: net.lang
        
    Last I heard, Xerox was experimenting with compile time type checking
    in Smalltalk.  The goal was to detect programmer errors at compile time
    and to use the type information to generate more efficient code.  The
    type information could be incomplete or omitted so that it would still
    be possible to implement sets, sort routines, and the like.
        
    It is interesting that Ada took the concept of strong typing from Pascal
    and added features like overloading and generics which make it possible
    to come closer to an "object oriented" style of programming while
    Smalltalk is moving in the direction of compile time type checking.  Perhaps
    we are witnessing a convergence of ideas here?
				    Kenneth Almquist

I should speak up here.  The Software Concepts Group at Xerox PARC (aka the
Smalltalk Group) is, indeed, investigating compile-time type inference and
checking for the Smalltalk-80 language.  The key question at present is
whether or not it is possible to do an acceptably-good job of type
inference without any programmer-supplied declarations.

(Type inference is the process of discovering what types a given
variable or expression will take on during execution of the program.
In languages with normal declarations, such as C, Pascal, Ada and
others, this problem is trivial.  In languages which do not declare
    their types, or declare them in non-trivial ways, such as
Russell, Smalltalk, SETL and Lisp, it is far more difficult and, in
some cases and depending upon one's definition of `type', perhaps
undecidable.)
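As a small illustration, in Python (a similarly declaration-free language; this toy pass is my own sketch, not any algorithm from the Smalltalk work), even discovering the possible types of a variable bound only to literals requires tracking every assignment:

```python
import ast

def infer_types(source):
    """Toy inference: record the type of each literal a variable is bound to."""
    types = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    # A variable rebound to a different kind of literal takes
                    # on a *set* of possible types, not a single one.
                    types.setdefault(target.id, set()).add(
                        type(node.value.value).__name__)
    return types

program = "x = 1\ny = 'hello'\nx = 2.5\n"
for name, ts in sorted(infer_types(program).items()):
    print(name, sorted(ts))
# prints:
# x ['float', 'int']
# y ['str']
```

A real inferencer must also follow message sends, control flow, and the types flowing out of every method, which is where the difficulty lies.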

The approach I am taking to the type-inference problem in Smalltalk is to
define a program with system-supplied declarations to be `type correct' if
and only if we can guarantee that, at no time during execution of that
program, a message will be sent to an object which does not understand it.
Please note that it is easy to show that, for any type-inference algorithm
for Smalltalk (or SETL or Lisp), there will be programs which the algorithm
will claim are not type correct and yet will never produce a `message not
understood' error at runtime.  That is to say that the problem of inferring
exact types in these languages is undecidable.  What we are after,
therefore, is an algorithm which performs `well enough', which produces an
`acceptably small' number of false alarms.
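Here is a concrete instance of the false-alarm problem, sketched in Python rather than Smalltalk (the program is my own illustration): neither send below can ever go wrong at runtime, yet a checker which merges the types of `x` from both branches must flag both of them.

```python
def demo(flag):
    if flag:
        x = "hello"
    else:
        x = 42
    # A checker that only knows "x is a string or an integer" must reject
    # both of the following lines, though each is reached only when x has
    # the right type -- and proving reachability in general is undecidable.
    if flag:
        return x.upper()   # only reached when x is a str
    return x + 1           # only reached when x is an int

print(demo(True))   # prints HELLO
print(demo(False))  # prints 43
```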

In order to achieve this goal, we need a far more flexible and expressive
system of types than is provided by any of Pascal, ALGOL, C, or even CLU.
We need types which can be parameterised by any other type, and the ability
to describe functions which take arbitrary functions as arguments and
produce functions of arbitrary type as output.  We need function types
whose result-types depend in arbitrarily-complex ways upon the
argument-types.
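In modern notation (Python's type-hint syntax, used purely as an illustration; this is not the notation being developed at PARC), a function whose result type is parameterised by, and depends upon, its argument types looks like this:

```python
from typing import TypeVar, Callable

T = TypeVar("T")
R = TypeVar("R")

def map_list(f: Callable[[T], R], items: list[T]) -> list[R]:
    """Parameterised by two types: the result type list[R] is determined
    entirely by the type of the function argument f, not by any fixed
    declaration."""
    return [f(x) for x in items]

lengths = map_list(len, ["sets", "sort routines"])   # [4, 13]
shouts  = map_list(str.upper, ["ada", "smalltalk"])  # ['ADA', 'SMALLTALK']
print(lengths, shouts)
```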

It is this complex and subtle system of types which makes the problem
interesting.

Now, after all this, to correct the statement in Kenneth Almquist's article
which got me started:
    The type information could be incomplete or omitted so that it would
    still be possible to implement sets, sort routines, and the like.

When one has a type system as described above, there is no need to
partially or totally omit any type information at all.  Indeed, the system
allows such expressive power that it is possible to completely and
confidently check all of the current Smalltalk system with only a few,
truly minor, exceptions.  Please note that these exceptions have absolutely
nothing to do with the kind of polymorphism spoken of in Almquist's note.
They occur in low-level methods for essentially peeking and poking the
values of another object's instance variables, a facility needed by the
debugger and a few other low-level utilities.  \All/ of the rest of the
system can be checked within this framework.
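That peek/poke facility has a rough analogue in Python's reflective attribute access (the example and names below are mine, not from the Smalltalk system): when the slot being read or written is chosen by a runtime value, no compile-time analysis of the accessing code alone can determine the type involved.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def poke(obj, slot_name, value):
    # The slot is named by a runtime string, so the type of what is
    # stored or fetched is unknowable to static analysis of this
    # function by itself -- exactly the debugger-style exception above.
    setattr(obj, slot_name, value)
    return getattr(obj, slot_name)

p = Point(1, 2)
print(poke(p, "x", 99))  # prints 99
```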

    It is interesting that Ada took the concept of strong typing from Pascal
    and added features like overloading and generics which make it possible
    to come closer to an "object oriented" style of programming while
    Smalltalk is moving in the direction of compile time type checking.  Perhaps
    we are witnessing a convergence of ideas here?

I think that the convergence we are witnessing is a desire for the great
flexibility of the object-oriented and inheritance paradigms without the
loss of the confidence and opportunity for optimisation granted by the
ability to statically check program types.  As time goes on, it will become
more and more common for all languages, \especially/ the so-called very
high level ones, to accommodate a great deal of static analysis; it is
helpful and indeed necessary for producing large correct programs.

Another thing which is occurring is an even greater extension of the notion
of data-type than the one I alluded to above: data-types as predicates.
Greg Nelson at PARC is working on a language which he calls Juno in which
all data belongs to a single domain: integers, reals and Lisp-like atoms,
closed under the formation of ordered-pairs (as in the CONS-nodes of
Lisp).  In this language, it is impossible to determine by examination the
unique `type' of a given value.  Types are predicates and any number of
predicates are true of any given item.  The interesting property of this
approach is that semantic-constraint information can be encoded in a
data-type specification, not just structural information.  Thus, one could,
for example, define the type `ReducedFraction' as a pair of mutually-prime
integers.  I know of no other language which can express such ideas.
Clearly, one could specify one's types strongly enough to express the
correctness conditions of the program as a whole.  To statically check this
language depends upon the system employing some form of automatic theorem-
prover (such as that of Boyer and Moore) and, when that prover fails,
requiring the user to submit his own proof.  A fascinating concept.
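The `ReducedFraction' idea can be rendered as an executable predicate (in Python; Juno's actual notation does not appear in this note, so the names here are illustrative): membership is a semantic condition, mutual primality, not a structural one.

```python
from math import gcd

def is_pair(v):
    return isinstance(v, tuple) and len(v) == 2

def is_reduced_fraction(v):
    # Type-as-predicate: a pair of integers whose numerator and
    # denominator are mutually prime.  Structure alone (a pair of
    # integers) is not enough to decide membership.
    return (is_pair(v)
            and all(isinstance(n, int) for n in v)
            and v[1] != 0
            and gcd(v[0], v[1]) == 1)

print(is_reduced_fraction((3, 4)))  # prints True
print(is_reduced_fraction((6, 8)))  # prints False: gcd is 2, not reduced
```

Checking such a predicate at runtime is easy; discharging it statically is what requires the theorem-prover.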

        Pavel Curtis