[comp.software-eng] The state of the union

pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/18/90)
Another wonderful treatise from your very own! :-). This is in reaction
to the recent discussion on overloading, and to my perplexities on
seeing unions called 'space saving structs' in both Lippman's and the
C++ ARM.

Please note that followups are to comp.lang.misc, because while there
are OO, sw. eng. and C++ aspects to this discussion, it is really about
language design.

		ACHIEVING REUSE

First of all, let's remember that what we want to achieve is

reuse of interface/protocol

   The way a software module is accessed.

reuse of semantics/behaviour

    The effect of executing a software module (specification).

reuse of implementation/representation

    The way the software module is actually realized.

The reason is, as Dijkstra says, the limited capacity of our mind to
deal with structural complexity. As he says, our main weapon in
overcoming our limitations on dealing with complexity is reuse via
abstraction. It is also more economical...

		USING ABSTRACTION

In order to achieve reuse we want to be able to use abstraction
(parametrization), and therefore define abstract interfaces, semantics,
implementations.  In particular we may want to do abstraction over
control and data, but for the time being we will not consider the two
seaprate issues in detail.

Once we are able to support abstractions, we also want some algebra for
abstractions, so we can mix and match them, e.g. we want to have an
algebra for abstractions over control and data, so that we can do
multilevel abstractions and achieve even greater reuse.

For example, procedures are a way to abstract control, and structs are
another way to abstract data.

A procedure *definition* is reuse of implementation; every time we call
a procedure its code is abstracted with respect to the actual data it
has to operate upon.


		OVERLOADING

Now, let's consider the problem of reusing a procedure interface; one
way to reuse is to make its *declaration* parametric with respect to not
just the values of the data it operates upon but also their types.

This is called overloading. The most common overloading is static; the
programmer defines a series of procedure interfaces which differ in the
type of one or more arguments, and the right interface will be
automatically chosen by the implementation (and the right implementation
associated to that interface) based on the type of the arguments.

Dynamic overloading is where the type of the arguments is not known
until the procedure is invoked, but it is essentially similar.


		GENERICITY

A different thing is genericity -- in it the implementation of a procedure
is parametric with respect to the type of its arguments. For a procedure
to be generic its interface must be overloaded on some type, and all the
operations used in the procedure implementation must also be overloaded
on the same type. In this way the implementation is parametric on that
type as well.

Dynamic genericity (also called unrestrained polymorphism) is where the
interface of a procedure does not depend on the type of its arguments,
and all the operations used in the procedure implementation are also
dynamically generic.

So so far we have overloading, which is a property of a procedure
interface, and genericity, which is property of a procedure
implementation. We also have static and dynamic versions of each,
depending on whether the parameter over which we abstract is a compile
time constant or not.

	We will not discuss genericity much in the rest of this
	treatise, as our main interest will be overloading, but
	usually the same arguments apply similarly to both.


		OPEN AND CLOSED ABSTRACTION

We have another conceptual category to discuss before turning to
examples, and it is whether the abstraction device we use is open or
close ended. Abstraction is open ended where we cannot limit to a finite
set the possible values over which we abstract, and is close ended where
it is known the possible finite set of values over which abstraction may
be instantiated.

For example, static overloading is, in the context of a program, close
ended, in that by looking at all the source modules of a program where a
statically overaloded interface is used it is possible to determine the
full set of types over which a procedure interface is abstracted.

The same is true of dynamic overloading but again if and only if the
full source of a program, whether or not there is a use of a dynamically
overloaded procedure interface, is available.

If we have strict separate compilation static overloading is close ended
if we accept some limitations, but dynamic overloading must be open
ended.


		EFFICIENT ABSTRACTION

Abstraction to work in practice must be efficient, i.e. economical of
resources. In particular better efficiency is achieved by having greater
information about the context of the abstraction rather than less,
because this allows for early resolution of as many issues as possible
at the time the abstraction is defined rather than used, thus saving on
run time space and time.

	For example, inlining allows early resolution at compile time
	of procedural abstraction, with potentially significant
	benefits where early resolution is advantageous. I have long
	reckoned that good compilers should be first and foremost
	symbolic execution engines.

Now there is an interesting efficiency question related to the open or
close endedness of overloading (and genericity). If overloading is close
ended identification of the actual interface inteded is so much easier
and potentially faster. There are some interesting design decisions to
to make here:

1) Do we want open abstraction at all?

2) How closed do we want closed abstraction?

3) How do we express the allowed set in the case of closed abstraction?

4) What about dynamic overloading?

Different language features give different answers to these questions.


		SOME LANGUAGES AND INTERFACE/TYPE ABSTRACTION/REUSE

Classic Lisp gives close ended dynamic overloading and genericity. Since
the set of types cannot be extended by the programmer abstraction is by
necessity close ended. On the other hand primitive operations mostly
work on any object type (for "symmetry" many Lisp implementations
allowed CAR and CDR on atoms as well), and therefore a procedure that
uses them is more or less generic. The type of an object is only known
at run time, unless opportune hints are given to the compiler. There
exist runtime type predicates because of this.

Smalltalk gives dynamic overloading and genericity and open abstraction.
When you are given an object identifier you are not constrained in any
way as to the protocol you can apply to it, whether in time or type.
Clever implementations try to deduce constraints based on usage patterns
at compile time, or cache constrainsts at run time expecting time wise
correlated reuse of the same concrete interface and implementation.
Smalltalk constrains overloading of interface to happen only on the
first parameter to the procedure interface (self).

Eiffel gives dynamic overloading and genericity and closed abstraction
on both. Abstraction is closed in two different ways; the interface to
an object is constrained to be a superset of that of its superclass, and
genericity is constrained because the set of possible types is also
limited by listing (the builder) all the types actually present in the
entire program that satisfy a given procedure's implementation
interfaces.

Algol 68 and C give dynamic overloading, static genericity and closed
abstraction on both by using *unions*. A value of a union type can
belong to any of the united types, but only to those explicitly listed
as such; a tag tells a procedure the current type of the union object.
Thus once can write an implementation that is

Algol 68, Ada, C++ give static overloading, open ended using procedure
interface abstraction over types, but genericity is not supported with
same mechanism because overloading must be strictly on the interface
only (static type resolution).

Simula 67 and C++ offer both static and dynamic overloading, no
genericity (templates are not yet official), partially close ended
abstraction.  Dynamic overloading is done via the virtual mechanism, and
close endedness is assured by the requirement that (all in Simula 67
that only has member functions, member function's in C++) dynamic
overloading be restricted by the derived-from relationship. In
particular C++'s abstract base classes restrict dynamic overloading to
all classes whose interface is a superset of that of the base class. In
the case of a non abstract base class the data representation must also be a
superset of that of the base class.

Note that C++'s partial closededness of the abstraction applies *only*
to the interface and data representation; it does not apply to the
implementation, which is open ended, because it is now known, because
C++ supports strictly separate compilation, which classes will be
derived from a base class and which of them redefine the implementation
of the overloaded interface. For non virtual procedures this is not
terribly important because it is a mistake not to make this information
available at the compiler before run time, but for virtual procedures it
means that interface/implementation matching must be done in part at run
time, even if given the fact that the interface is statically known this
is not as expensive as a totally unrestricted system.


		EXAMPLES

Just to be clearer, I will give some (contrived, and taking some
liberties with syntax) examples. Suppose we want to reuse the 'float
areaOf(entity)' interface. We could have

	struct Square { float side; };
	struct Circle ( float radius; };

	float areaOf(Square s) { return s.side*s.side; }
	float areaOf(Circle c) { return c.radius*c.radius*3.14159; }

which illustrates close ended static overloading, no genericity.

	struct Square { float side; };
	struct Circle ( float radius; };

	struct Shape { char t; union { Square s; Circle c; }; };

	float areaOf(Shape *x)
	{
	     switch (x->t)
	     {
	     case 'S':	return x->s.side*x->s.side; break;
	     case 'C':	return x->c.radius*x->c.radius*3.14159; break;
	     }
	}

This is close ended dynamic overloading, static genericity for users of this
procedure.

	struct Shape { virtual float areaOf() = 0; };

	struct Square : Shape { float side; };
	struct Circle : Shape ( float radius; };

	float Square::areaOf() { return side*side; }
	float Circle::areaOf() { return radius*radius*3.14159; }

Dynamic overloading, partially open ended, reuse of interface. Note that
when it is known that a pointer points to a leaf class, optimizations
are possible. The reason is that overloading is close ended in leaf
classes. Note also that in derivation the union tag is the so called
vtbl-pointer, and the vtbl is the strict equivalent of the switch
in the previous example.

	struct Square { float side; };
	struct Circle ( float radius; };

	/* virtual indicates dynamic overloading allowed */

	float areaOf(virtual Square s) { return s.side*s.side; }
	float areaOf(virtual Circle c) { return c.radius*c.radius*3.14159; }

	Square s; Circle c;

	/* this implies type is deducible from a 'void *' at runtime */

	void *shape = (random0or1()) ? &Square : &Circle;
	float a = areaOf(shape);

This, in the presence of strict separate compilation, cannot be in
general constrained (the assignment to shape and one of the overloadings
of areaOf might be in different files), and so is open ended, because
'void *' gives no type indication. Objective C style.

Incidentally, all this makes for a good argument that void is not a
primitive type at all, but should be defined as 'union {};'. It also
suggests that one might want open unions, i.e. unions that define only
the interface, or a set of base types that can be extended. This would
obviate the need for abstract base classes in C++.

Of course just like struct is an abstraction operator that realizes the
cartesian product of some types, union realizes the union of some types.

We could have a rule that any procedure defined on a type united in a
union can be applied to a union pointer or reference, with suitable
checks that it is either dynamically or statically resolvable. My
thinking on the issue is not yet definite; I think that it is not
impossible with some cleverness to define a union type constructor that
also does the right thing with respect to interfaces, static or dynamic
overloading, and genericity of implementation. I am considering how to
cover the spectrum between static closed overloading, passing thru
traditional closed unions, open derivation, to full dynamic open
overloading, using a single algebra with suitable compiler hints.

In way of principle the Smalltak/Objective C/CLOS/SELF advocates
are right; we can just have full open dynamic reuse of interface and
implementation and then it is the implementor's task to analyze the
program and deduce the safety and efficiency constrainsts, either in
time or applicability, that staticize or restrict either.

Unfortunately the relative technology is not yet quire ready (SELF, some
CLOS/flavours/PCL), and in any case I am in favour of programmer
supplied constraints and smaller compilers, because I reckon that this
is both more cost effective, safer, and more explicit than leaving such
information implicit to be synthetized.

Of course fully orthogonal facilities for separately defining interface,
implementations and semantics abstractions, and the relative algebras,
would be wonderful and ideal, but I think we still have a long way to go
(I am busy doing other things, that is :->).

Research is still needed in better insight in these issues. I hope I
have however provided some insight on the more or less hidden rationales
and consequences and alternatives for providing reuse of interface (and
also of implementation) that are currently popular.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk