pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/18/90)
Another wonderful treatise from your very own! :-). This is in reaction to the recent discussion on overloading, and to my perplexities on seeing unions called 'space saving structs' in both Lippman's and the C++ ARM. Please note that followups are to comp.lang.misc, because while there are OO, sw. eng. and C++ aspects to this discussion, it is really about language design. ACHIEVING REUSE First of all, let's remember that what we want to achieve is reuse of interface/protocol The way a software module is accessed. reuse of semantics/behaviour The effect of executing a software module (specification). reuse of implementation/representation The way the software module is actually realized. The reason is, as Dijkstra says, the limited capacity of our mind to deal with structural complexity. As he says, our main weapon in overcoming our limitations on dealing with complexity is reuse via abstraction. It is also more economical... USING ABSTRACTION In order to achieve reuse we want to be able to use abstraction (parametrization), and therefore define abstract interfaces, semantics, implementations. In particular we may want to do abstraction over control and data, but for the time being we will not consider the two seaprate issues in detail. Once we are able to support abstractions, we also want some algebra for abstractions, so we can mix and match them, e.g. we want to have an algebra for abstractions over control and data, so that we can do multilevel abstractions and achieve even greater reuse. For example, procedures are a way to abstract control, and structs are another way to abstract data. A procedure *definition* is reuse of implementation; every time we call a procedure its code is abstracted with respect to the actual data it has to operate upon. OVERLOADING Now, let's consider the problem of reusing a procedure interface; one way to reuse is to make its *declaration* parametric with respect to not just the values of the data it operates upon but also their types. This is called overloading. The most common overloading is static; the programmer defines a series of procedure interfaces which differ in the type of one or more arguments, and the right interface will be automatically chosen by the implementation (and the right implementation associated to that interface) based on the type of the arguments. Dynamic overloading is where the type of the arguments is not known until the procedure is invoked, but it is essentially similar. GENERICITY A different thing is genericity -- in it the implementation of a procedure is parametric with respect to the type of its arguments. For a procedure to be generic its interface must be overloaded on some type, and all the operations used in the procedure implementation must also be overloaded on the same type. In this way the implementation is parametric on that type as well. Dynamic genericity (also called unrestrained polymorphism) is where the interface of a procedure does not depend on the type of its arguments, and all the operations used in the procedure implementation are also dynamically generic. So so far we have overloading, which is a property of a procedure interface, and genericity, which is property of a procedure implementation. We also have static and dynamic versions of each, depending on whether the parameter over which we abstract is a compile time constant or not. We will not discuss genericity much in the rest of this treatise, as our main interest will be overloading, but usually the same arguments apply similarly to both. OPEN AND CLOSED ABSTRACTION We have another conceptual category to discuss before turning to examples, and it is whether the abstraction device we use is open or close ended. Abstraction is open ended where we cannot limit to a finite set the possible values over which we abstract, and is close ended where it is known the possible finite set of values over which abstraction may be instantiated. For example, static overloading is, in the context of a program, close ended, in that by looking at all the source modules of a program where a statically overaloded interface is used it is possible to determine the full set of types over which a procedure interface is abstracted. The same is true of dynamic overloading but again if and only if the full source of a program, whether or not there is a use of a dynamically overloaded procedure interface, is available. If we have strict separate compilation static overloading is close ended if we accept some limitations, but dynamic overloading must be open ended. EFFICIENT ABSTRACTION Abstraction to work in practice must be efficient, i.e. economical of resources. In particular better efficiency is achieved by having greater information about the context of the abstraction rather than less, because this allows for early resolution of as many issues as possible at the time the abstraction is defined rather than used, thus saving on run time space and time. For example, inlining allows early resolution at compile time of procedural abstraction, with potentially significant benefits where early resolution is advantageous. I have long reckoned that good compilers should be first and foremost symbolic execution engines. Now there is an interesting efficiency question related to the open or close endedness of overloading (and genericity). If overloading is close ended identification of the actual interface inteded is so much easier and potentially faster. There are some interesting design decisions to to make here: 1) Do we want open abstraction at all? 2) How closed do we want closed abstraction? 3) How do we express the allowed set in the case of closed abstraction? 4) What about dynamic overloading? Different language features give different answers to these questions. SOME LANGUAGES AND INTERFACE/TYPE ABSTRACTION/REUSE Classic Lisp gives close ended dynamic overloading and genericity. Since the set of types cannot be extended by the programmer abstraction is by necessity close ended. On the other hand primitive operations mostly work on any object type (for "symmetry" many Lisp implementations allowed CAR and CDR on atoms as well), and therefore a procedure that uses them is more or less generic. The type of an object is only known at run time, unless opportune hints are given to the compiler. There exist runtime type predicates because of this. Smalltalk gives dynamic overloading and genericity and open abstraction. When you are given an object identifier you are not constrained in any way as to the protocol you can apply to it, whether in time or type. Clever implementations try to deduce constraints based on usage patterns at compile time, or cache constrainsts at run time expecting time wise correlated reuse of the same concrete interface and implementation. Smalltalk constrains overloading of interface to happen only on the first parameter to the procedure interface (self). Eiffel gives dynamic overloading and genericity and closed abstraction on both. Abstraction is closed in two different ways; the interface to an object is constrained to be a superset of that of its superclass, and genericity is constrained because the set of possible types is also limited by listing (the builder) all the types actually present in the entire program that satisfy a given procedure's implementation interfaces. Algol 68 and C give dynamic overloading, static genericity and closed abstraction on both by using *unions*. A value of a union type can belong to any of the united types, but only to those explicitly listed as such; a tag tells a procedure the current type of the union object. Thus once can write an implementation that is Algol 68, Ada, C++ give static overloading, open ended using procedure interface abstraction over types, but genericity is not supported with same mechanism because overloading must be strictly on the interface only (static type resolution). Simula 67 and C++ offer both static and dynamic overloading, no genericity (templates are not yet official), partially close ended abstraction. Dynamic overloading is done via the virtual mechanism, and close endedness is assured by the requirement that (all in Simula 67 that only has member functions, member function's in C++) dynamic overloading be restricted by the derived-from relationship. In particular C++'s abstract base classes restrict dynamic overloading to all classes whose interface is a superset of that of the base class. In the case of a non abstract base class the data representation must also be a superset of that of the base class. Note that C++'s partial closededness of the abstraction applies *only* to the interface and data representation; it does not apply to the implementation, which is open ended, because it is now known, because C++ supports strictly separate compilation, which classes will be derived from a base class and which of them redefine the implementation of the overloaded interface. For non virtual procedures this is not terribly important because it is a mistake not to make this information available at the compiler before run time, but for virtual procedures it means that interface/implementation matching must be done in part at run time, even if given the fact that the interface is statically known this is not as expensive as a totally unrestricted system. EXAMPLES Just to be clearer, I will give some (contrived, and taking some liberties with syntax) examples. Suppose we want to reuse the 'float areaOf(entity)' interface. We could have struct Square { float side; }; struct Circle ( float radius; }; float areaOf(Square s) { return s.side*s.side; } float areaOf(Circle c) { return c.radius*c.radius*3.14159; } which illustrates close ended static overloading, no genericity. struct Square { float side; }; struct Circle ( float radius; }; struct Shape { char t; union { Square s; Circle c; }; }; float areaOf(Shape *x) { switch (x->t) { case 'S': return x->s.side*x->s.side; break; case 'C': return x->c.radius*x->c.radius*3.14159; break; } } This is close ended dynamic overloading, static genericity for users of this procedure. struct Shape { virtual float areaOf() = 0; }; struct Square : Shape { float side; }; struct Circle : Shape ( float radius; }; float Square::areaOf() { return side*side; } float Circle::areaOf() { return radius*radius*3.14159; } Dynamic overloading, partially open ended, reuse of interface. Note that when it is known that a pointer points to a leaf class, optimizations are possible. The reason is that overloading is close ended in leaf classes. Note also that in derivation the union tag is the so called vtbl-pointer, and the vtbl is the strict equivalent of the switch in the previous example. struct Square { float side; }; struct Circle ( float radius; }; /* virtual indicates dynamic overloading allowed */ float areaOf(virtual Square s) { return s.side*s.side; } float areaOf(virtual Circle c) { return c.radius*c.radius*3.14159; } Square s; Circle c; /* this implies type is deducible from a 'void *' at runtime */ void *shape = (random0or1()) ? &Square : &Circle; float a = areaOf(shape); This, in the presence of strict separate compilation, cannot be in general constrained (the assignment to shape and one of the overloadings of areaOf might be in different files), and so is open ended, because 'void *' gives no type indication. Objective C style. Incidentally, all this makes for a good argument that void is not a primitive type at all, but should be defined as 'union {};'. It also suggests that one might want open unions, i.e. unions that define only the interface, or a set of base types that can be extended. This would obviate the need for abstract base classes in C++. Of course just like struct is an abstraction operator that realizes the cartesian product of some types, union realizes the union of some types. We could have a rule that any procedure defined on a type united in a union can be applied to a union pointer or reference, with suitable checks that it is either dynamically or statically resolvable. My thinking on the issue is not yet definite; I think that it is not impossible with some cleverness to define a union type constructor that also does the right thing with respect to interfaces, static or dynamic overloading, and genericity of implementation. I am considering how to cover the spectrum between static closed overloading, passing thru traditional closed unions, open derivation, to full dynamic open overloading, using a single algebra with suitable compiler hints. In way of principle the Smalltak/Objective C/CLOS/SELF advocates are right; we can just have full open dynamic reuse of interface and implementation and then it is the implementor's task to analyze the program and deduce the safety and efficiency constrainsts, either in time or applicability, that staticize or restrict either. Unfortunately the relative technology is not yet quire ready (SELF, some CLOS/flavours/PCL), and in any case I am in favour of programmer supplied constraints and smaller compilers, because I reckon that this is both more cost effective, safer, and more explicit than leaving such information implicit to be synthetized. Of course fully orthogonal facilities for separately defining interface, implementations and semantics abstractions, and the relative algebras, would be wonderful and ideal, but I think we still have a long way to go (I am busy doing other things, that is :->). Research is still needed in better insight in these issues. I hope I have however provided some insight on the more or less hidden rationales and consequences and alternatives for providing reuse of interface (and also of implementation) that are currently popular. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk