[comp.lang.c++] Generic functions -- a step toward polymorphism

david@june.cs.washington.edu (David Callahan) (11/13/88)

In the following, I assume a knowledge of Stroustrup's
papers on Parameterized types and pointers to member functions
published in the USENIX C++ Workshop 87/Conference 88 proceedings
(I don't have them with me so I'm not sure which).

There is an undeniable need for "parameterized" class definitions.
The oft-used example of a "vector" class that has different
base elements is compelling. Stroustrup has proposed syntax
for parameterized classes and functions. His intent was
that the compiler instantiate specific types and function 
implementations as needed. I think of this as a type-safe
macro preprocessor. (This is simplified of course, please
correct me if its actually wrong.)

Example:

	template<class T> vector {
		T* data ;
		int length ;
	public:
		vector(int l) { data = new T[length] ; }
		T& operator[](int i) { return T[i] ; }
	} ;

Even if the bodies of the member functions are defined 
"out of line", they must be instantiated for each type that 
appears as a parameter to a "vector" in the program.
One of the problems with this approach is the proliferation
of function implementations. In "vector", these functions are
trivial and so the impact is small, but in general
there may be a lot of essentially redundant code.
Another problem is that it requires the source for all
member functions of the parameterized type to be
available at every compile. This makes libraries,
especially libraries of proprietary software, 
more complex to use.

A second example is an homogenous list:

	template<class T> list {
		class link {
		public:
			T* data ; link* Next ;
			link(T* d, link* n) 
				{ data = d ; Next = n;  } 
			link* next() {
				link* temp = this->Next ;
				delete this ;
				return temp ;
			} 
		} *head ;
	public: 
		void insert(T* d) {
			head = new link(d,head) ;
			}
		T* head() {
			T* temp  = head->data ;
			head = head->Next() ;
			return temp ;
		}
	} ;

(I don't know if that's quite right, but you get the idea).
How do implementations of this example differ for list<int> and
list<double>? They don't unless "int*" and "double*" have different
sizes which I assume is not the case. 
The only operator applied here is pointer assignment without
casting and "new". The implementation of "vector<T>" does depend on 
"sizeof(T)" but the "list<T>" template is truly polymorphic with
respect to type T and it is undesirable to instantiate an implementation
for each declared type (I know the member functions will be inlined,
assume for the sake of discussion that I declared them out of line).

Let me define the term "generic" to refer to a function implementation
that can take different types for some argument and return different
types of results as long as the types associated with each argument
position are "structurally" identical. Let me reserve the term
"polymorphic" for the more general case where all actual arguments share 
a set operations but their implementations may differ structurally.
So, the "list<T>" class is generic but the "vector<T>" 
class is polymorphic.

Languages like SmallTalk (correct me if I'm wrong here) do not 
use the structure of an argument object when a function is implemented.
This allows functions to be polymorphic. C++ on the other hand binds
types to structures at compile time and exploits that binding in function
implementations. Thus, the most the language can support are 
generic function (I'll return to this latter).

Currently, C++ programmers  get generic functions by explicitly
casting objects to base types, invoking functions on those base
types and then casting the result back to a derived type. The above
list example could be handled by replacing "T" with "void"
everywhere and then deriving a new class for each parameter
that simply performs type casting:

	class list<int> : public list<void> {
	public:
		int* head {return (int*) list<void>::head(); }
	} ;

Is this sufficient? Maybe. Is it cumbersome? I think so.
Let me now make a proposal for a language extension to support
generic function definition. The syntax is modeled after templates:

	generic< class T : S, class X : Y, class Z > name { 
	 	// class definition referring to objects 
		// of classes T, X, Z
	} ;

The arguments indicate that "T" is derived from base class "S",
"X" is derived from base class "Y" and "Z" is any class (derived
from void?). There is also a requirement inside the class 
definition that only operations legal for "S" may be applied 
to arguments of type  "T" and similarly, only operations legal 
for "Y" may be applied to arguments of type "X" and only operations 
legal for "void" (such as "&" to "void*") may be applied to arguments 
of type "Z". In particular, the declaration of a member function may 
refer to "T", "X", and "Z" only as types of arguments and
the result type. The body of the member functions and the 
member data objects may not refer to "T", "X", and "Z" but
may refer to "S", "Y" , and "void". One exception is allowed
to this rule: the cast operation from "S*" to "T*" is defined
and assignment of T*'s and application of 
the "->*" (pointer-to-member dereferencing)
is allowed.  When a member function 
is defined "out of line", the same "generic< ... >" 
specification must proceed the function definition.

How is the conversion from S* to T* accomplished? In 
a single-inheritance implementation, we can assume that
no modification to an S* will be needed to get a T*. 
In a multiple inheritance-implementation, 
the value of S* may need to be offset by an amount depending on 
the type T. These offset amounts could be held in a
static table like virtual functions pointers are and their 
values determined at compile time. The effect of this implementation 
is to embed a very small amount of structural information in the object
that can be used efficiently at runtime.

"Polymorphic" functions can be simulated using generic functions
if the generic functions are explicitly parameterized by all of
the operations needed by the polymorphic function. The 
"pointer to member object" type provides a way to do this.
Assume that we want to write a function that sums integer
values from each object in a list. Using templates:

	template<class T> list : { 
		... 
		int list_sum(list<T>* L) {
	 		link* cur = L->head ;
			int sum = 0 ;
			while (cur != NIL) {
				sum += cur->data->value ;
				cur = cur->next ;
			}
			return sum ;
		} 
	} ;

Note that this will generate a compile-time type fault if 
"T" does not have a "value" field. Consider now a "generic"
version:

	generic<class T> list : { 
		...
		int list_sum(int (T::*) valuepm) {
	 		link* cur = this->head ;
			int sum = 0 ;
			while (cur != NIL) {
				sum += ((T*)(cur->data))->*valuepm ;
				cur = cur->next ;
			}
			return sum ;
		} 
	}
   /* example */ 
	class temp { public: int i } ;
	list<temp> *L ; 
	... L->list_sum(&temp::i) ... 

Here we have added an extra parameter that selects a field.
This is information that is dynamically available in SmallTalk
but must be statically supplied in C++. A side effect of this
transformation is that the actual name of the field is no longer
used and so this function could be used to sum along different
integer fields. The same effect could be had in the template 
case as well. To restore the field name to the type specification
would require a more powerful pointer-to-member type structure that
would allow you to specify a type that is a "pointer to <type> member
of class <class> with name <name>". Perhaps something like:
	member "value" int  (T::*) valuepm ;
but I see no use for this since it only restricts the applicability
of an otherwise generic function.

I'd like to return briefly to the question of polymorphic
functions, in particular, polymorphic implementations of 
"template" functions. The process of transforming a "template"
function to a "generic" function could be done automatically
(hence a need for pointers to members with particular names).
It requires that all of the members used be known, which 
generates a compilation order problem which may include
circular dependencies. Efficient implementations will also require
that all types be known so that "constant" member functions
(especially operators like "->") can be "compiled in" (one of 
the virtues of specifying base types for type parameters
is that is limits what operations may vary, also note that
we are assuming that operator "->*" can not be redefined).
The problem is straightforward given an adequate programming
environment and may be important if there is a significant 
space/time or debuggability/time or compile-time/time or
copy-protection/time trade off.

A more difficult problem is deciding when to replicate
implementations (clone) to exploit parameter-specific
information. The "template" implementation outlined by 
Stroustrup takes the approach "clone everywhere" which
in a large program could be a problem. The transformation
to polymorphic takes the other extreme: only one implementation.
Which solution is "right" is application/machine/life cycle
dependent and probably requires a meta-language solution.

These are early thoughts and comments/questions are 
encouraged.

David Callahan, Tera Computer Co., Seattle WA.

shap@polya.Stanford.EDU (Jonathan S. Shapiro) (11/14/88)

In article <6403@june.cs.washington.edu> david@uw-june.UUCP (David Callahan) writes:
>
>So, the "list<T>" class is generic but the "vector<T>" 
>class is polymorphic.

Uh, sorry, but it isn't.  The fact that the member types have
supertypes simply means that you are confining yourself to the
supertype. While that is in some very weak sense polymorphic, in that
anything derived from the same supertype has the same implementation,
it isn't fully polymorphic.  The point of parameterized types is to be
able to build truly generic objects.  Further, the current C++
language can already do what you suggest.

The real issue with C++ parameterized types will be proliferation of
copies of the same code [as you suggest] and the library management
involved in the proliferation.

With regard to the proliferation of copies problem, there is a simpler
solution than the one you propose.  Given a class (forgive me the
syntax, it has been some time since I read the paper):

	class Stack <T> {
		T *box;
	public:
		T& operator<<(T&);
		T& operator>>();
		...
	} ;

The issue is can we show that for some set of needed types T the
implementations of the functions are the same.  The simplest way to
determine this is to generate the functions for both cases and see if
they are the same code modulo name substitution.  This could be done
in the code tree easily.

In this case, if the idea is to resize Stack::box each time, then both
functions would need to know sizeof(T), so they are not equivalent.
This could be eliminated by adding a private variable size and filling
it in in the constructor, which would make the constructor type
dependent and the other function implementations type independent.
This optimization might even be done (or at least suggested)
automatically.

The harder problem is making it work across object files.  Essentially
the problem here is that an object file is an inappropriate scoping
constraint with respect to this issue.  Unfortunately, to do the
problem right implies some fairly sophisticated project management and
compilation tools.

Jon Shapiro

david@june.cs.washington.edu (David Callahan) (11/14/88)

In article <5031@polya.Stanford.EDU> shap@polya.Stanford.EDU (Jonathan S. Shapiro) writes:
;>In article <6403@june.cs.washington.edu> david@uw-june.UUCP (David Callahan) writes:
 
;>So, the "list<T>" class is generic but the "vector<T>" 
;>class is polymorphic.

;Uh, sorry, but it isn't.  The fact that the member types have
;supertypes simply means that you are confining yourself to the
;supertype. While that is in some very weak sense polymorphic, in that
;anything derived from the same supertype has the same implementation,
;it isn't fully polymorphic.  

Perhaps my use of "polymorphic" was imprecise, that is why I gave
a definition in the article. My intent was to characterize the
difference between "list<T>" which can be implemented once
for all T and "vector<T>" which can not because of the internal
use of type attributes. If you would prefer different terms,
please define them and we can use them. Based on this characterization,
I wanted to provide language support for "generic" classes that
worked with the strongly-typed nature of C++. By this I mean
that the programmer must make assumptions about the type
parameter explicit and the compiler enforces these assumptions both
in the class and at points of use. Beyond that, I outlined how a 
smart compiler could provide a "polymorphic" implementation of 
general parameterized class specified with a "template".

; ...
; Given a class ... :
;
	[ parameterized stack<T> class, deleted ]
;
; The issue is can we show that for some set of needed types T the
; implementations of the functions are the same.  The simplest way to
; determine this is to generate the functions for both cases and see if
; they are the same code modulo name substitution.  This could be done
; in the code tree easily.

Simplicity must also be in the eye of the beholder. Won't this approach
have complexity exponential in the number of needed types? What if not
all types are known? My suggestion was to examine the text and have the 
compiler extend the interface specification and, at the point of use,
have the compiler added extra parameters to satisfy that specification.
A compile-time type fault would occur if the compiler could not satisfy
the specification.

; In this case, if the idea is to resize Stack::box each time, then both 
; functions would need to know sizeof(T), so they are not equivalent.

Won't they also need a pointer to the "initialization from a reference"
constructor for T (typed: T (T&)), if one exists?

; This could be eliminated by adding a private variable size and filling
; it in in the constructor, which would make the constructor type
; dependent and the other function implementations type independent.

The point of the "generic" class was to make this operation easy
for the current generation of separate-compilation-is-god compilers.

; This optimization might even be done (or at least suggested)
; automatically.

; The harder problem is making it work across object files.  Essentially
; the problem here is that an object file is an inappropriate scoping
; constraint with respect to this issue.  Unfortunately, to do the
; problem right implies some fairly sophisticated project management and
; compilation tools.

I agree mostly, I'm not sure how "sophisticated" the tools really
need to be to be sufficient.

; Jon Shapiro

David Callahan,  Tera Computer Co.   Seattle, WA.