david@june.cs.washington.edu (David Callahan) (11/13/88)
In the following, I assume a knowledge of Stroustrup's papers on Parameterized types and pointers to member functions published in the USENIX C++ Workshop 87/Conference 88 proceedings (I don't have them with me so I'm not sure which). There is an undeniable need for "parameterized" class definitions. The oft-used example of a "vector" class that has different base elements is compelling. Stroustrup has proposed syntax for parameterized classes and functions. His intent was that the compiler instantiate specific types and function implementations as needed. I think of this as a type-safe macro preprocessor. (This is simplified of course, please correct me if its actually wrong.) Example: template<class T> vector { T* data ; int length ; public: vector(int l) { data = new T[length] ; } T& operator[](int i) { return T[i] ; } } ; Even if the bodies of the member functions are defined "out of line", they must be instantiated for each type that appears as a parameter to a "vector" in the program. One of the problems with this approach is the proliferation of function implementations. In "vector", these functions are trivial and so the impact is small, but in general there may be a lot of essentially redundant code. Another problem is that it requires the source for all member functions of the parameterized type to be available at every compile. This makes libraries, especially libraries of proprietary software, more complex to use. A second example is an homogenous list: template<class T> list { class link { public: T* data ; link* Next ; link(T* d, link* n) { data = d ; Next = n; } link* next() { link* temp = this->Next ; delete this ; return temp ; } } *head ; public: void insert(T* d) { head = new link(d,head) ; } T* head() { T* temp = head->data ; head = head->Next() ; return temp ; } } ; (I don't know if that's quite right, but you get the idea). How do implementations of this example differ for list<int> and list<double>? They don't unless "int*" and "double*" have different sizes which I assume is not the case. The only operator applied here is pointer assignment without casting and "new". The implementation of "vector<T>" does depend on "sizeof(T)" but the "list<T>" template is truly polymorphic with respect to type T and it is undesirable to instantiate an implementation for each declared type (I know the member functions will be inlined, assume for the sake of discussion that I declared them out of line). Let me define the term "generic" to refer to a function implementation that can take different types for some argument and return different types of results as long as the types associated with each argument position are "structurally" identical. Let me reserve the term "polymorphic" for the more general case where all actual arguments share a set operations but their implementations may differ structurally. So, the "list<T>" class is generic but the "vector<T>" class is polymorphic. Languages like SmallTalk (correct me if I'm wrong here) do not use the structure of an argument object when a function is implemented. This allows functions to be polymorphic. C++ on the other hand binds types to structures at compile time and exploits that binding in function implementations. Thus, the most the language can support are generic function (I'll return to this latter). Currently, C++ programmers get generic functions by explicitly casting objects to base types, invoking functions on those base types and then casting the result back to a derived type. The above list example could be handled by replacing "T" with "void" everywhere and then deriving a new class for each parameter that simply performs type casting: class list<int> : public list<void> { public: int* head {return (int*) list<void>::head(); } } ; Is this sufficient? Maybe. Is it cumbersome? I think so. Let me now make a proposal for a language extension to support generic function definition. The syntax is modeled after templates: generic< class T : S, class X : Y, class Z > name { // class definition referring to objects // of classes T, X, Z } ; The arguments indicate that "T" is derived from base class "S", "X" is derived from base class "Y" and "Z" is any class (derived from void?). There is also a requirement inside the class definition that only operations legal for "S" may be applied to arguments of type "T" and similarly, only operations legal for "Y" may be applied to arguments of type "X" and only operations legal for "void" (such as "&" to "void*") may be applied to arguments of type "Z". In particular, the declaration of a member function may refer to "T", "X", and "Z" only as types of arguments and the result type. The body of the member functions and the member data objects may not refer to "T", "X", and "Z" but may refer to "S", "Y" , and "void". One exception is allowed to this rule: the cast operation from "S*" to "T*" is defined and assignment of T*'s and application of the "->*" (pointer-to-member dereferencing) is allowed. When a member function is defined "out of line", the same "generic< ... >" specification must proceed the function definition. How is the conversion from S* to T* accomplished? In a single-inheritance implementation, we can assume that no modification to an S* will be needed to get a T*. In a multiple inheritance-implementation, the value of S* may need to be offset by an amount depending on the type T. These offset amounts could be held in a static table like virtual functions pointers are and their values determined at compile time. The effect of this implementation is to embed a very small amount of structural information in the object that can be used efficiently at runtime. "Polymorphic" functions can be simulated using generic functions if the generic functions are explicitly parameterized by all of the operations needed by the polymorphic function. The "pointer to member object" type provides a way to do this. Assume that we want to write a function that sums integer values from each object in a list. Using templates: template<class T> list : { ... int list_sum(list<T>* L) { link* cur = L->head ; int sum = 0 ; while (cur != NIL) { sum += cur->data->value ; cur = cur->next ; } return sum ; } } ; Note that this will generate a compile-time type fault if "T" does not have a "value" field. Consider now a "generic" version: generic<class T> list : { ... int list_sum(int (T::*) valuepm) { link* cur = this->head ; int sum = 0 ; while (cur != NIL) { sum += ((T*)(cur->data))->*valuepm ; cur = cur->next ; } return sum ; } } /* example */ class temp { public: int i } ; list<temp> *L ; ... L->list_sum(&temp::i) ... Here we have added an extra parameter that selects a field. This is information that is dynamically available in SmallTalk but must be statically supplied in C++. A side effect of this transformation is that the actual name of the field is no longer used and so this function could be used to sum along different integer fields. The same effect could be had in the template case as well. To restore the field name to the type specification would require a more powerful pointer-to-member type structure that would allow you to specify a type that is a "pointer to <type> member of class <class> with name <name>". Perhaps something like: member "value" int (T::*) valuepm ; but I see no use for this since it only restricts the applicability of an otherwise generic function. I'd like to return briefly to the question of polymorphic functions, in particular, polymorphic implementations of "template" functions. The process of transforming a "template" function to a "generic" function could be done automatically (hence a need for pointers to members with particular names). It requires that all of the members used be known, which generates a compilation order problem which may include circular dependencies. Efficient implementations will also require that all types be known so that "constant" member functions (especially operators like "->") can be "compiled in" (one of the virtues of specifying base types for type parameters is that is limits what operations may vary, also note that we are assuming that operator "->*" can not be redefined). The problem is straightforward given an adequate programming environment and may be important if there is a significant space/time or debuggability/time or compile-time/time or copy-protection/time trade off. A more difficult problem is deciding when to replicate implementations (clone) to exploit parameter-specific information. The "template" implementation outlined by Stroustrup takes the approach "clone everywhere" which in a large program could be a problem. The transformation to polymorphic takes the other extreme: only one implementation. Which solution is "right" is application/machine/life cycle dependent and probably requires a meta-language solution. These are early thoughts and comments/questions are encouraged. David Callahan, Tera Computer Co., Seattle WA.
shap@polya.Stanford.EDU (Jonathan S. Shapiro) (11/14/88)
In article <6403@june.cs.washington.edu> david@uw-june.UUCP (David Callahan) writes: > >So, the "list<T>" class is generic but the "vector<T>" >class is polymorphic. Uh, sorry, but it isn't. The fact that the member types have supertypes simply means that you are confining yourself to the supertype. While that is in some very weak sense polymorphic, in that anything derived from the same supertype has the same implementation, it isn't fully polymorphic. The point of parameterized types is to be able to build truly generic objects. Further, the current C++ language can already do what you suggest. The real issue with C++ parameterized types will be proliferation of copies of the same code [as you suggest] and the library management involved in the proliferation. With regard to the proliferation of copies problem, there is a simpler solution than the one you propose. Given a class (forgive me the syntax, it has been some time since I read the paper): class Stack <T> { T *box; public: T& operator<<(T&); T& operator>>(); ... } ; The issue is can we show that for some set of needed types T the implementations of the functions are the same. The simplest way to determine this is to generate the functions for both cases and see if they are the same code modulo name substitution. This could be done in the code tree easily. In this case, if the idea is to resize Stack::box each time, then both functions would need to know sizeof(T), so they are not equivalent. This could be eliminated by adding a private variable size and filling it in in the constructor, which would make the constructor type dependent and the other function implementations type independent. This optimization might even be done (or at least suggested) automatically. The harder problem is making it work across object files. Essentially the problem here is that an object file is an inappropriate scoping constraint with respect to this issue. Unfortunately, to do the problem right implies some fairly sophisticated project management and compilation tools. Jon Shapiro
david@june.cs.washington.edu (David Callahan) (11/14/88)
In article <5031@polya.Stanford.EDU> shap@polya.Stanford.EDU (Jonathan S. Shapiro) writes: ;>In article <6403@june.cs.washington.edu> david@uw-june.UUCP (David Callahan) writes: ;>So, the "list<T>" class is generic but the "vector<T>" ;>class is polymorphic. ;Uh, sorry, but it isn't. The fact that the member types have ;supertypes simply means that you are confining yourself to the ;supertype. While that is in some very weak sense polymorphic, in that ;anything derived from the same supertype has the same implementation, ;it isn't fully polymorphic. Perhaps my use of "polymorphic" was imprecise, that is why I gave a definition in the article. My intent was to characterize the difference between "list<T>" which can be implemented once for all T and "vector<T>" which can not because of the internal use of type attributes. If you would prefer different terms, please define them and we can use them. Based on this characterization, I wanted to provide language support for "generic" classes that worked with the strongly-typed nature of C++. By this I mean that the programmer must make assumptions about the type parameter explicit and the compiler enforces these assumptions both in the class and at points of use. Beyond that, I outlined how a smart compiler could provide a "polymorphic" implementation of general parameterized class specified with a "template". ; ... ; Given a class ... : ; [ parameterized stack<T> class, deleted ] ; ; The issue is can we show that for some set of needed types T the ; implementations of the functions are the same. The simplest way to ; determine this is to generate the functions for both cases and see if ; they are the same code modulo name substitution. This could be done ; in the code tree easily. Simplicity must also be in the eye of the beholder. Won't this approach have complexity exponential in the number of needed types? What if not all types are known? My suggestion was to examine the text and have the compiler extend the interface specification and, at the point of use, have the compiler added extra parameters to satisfy that specification. A compile-time type fault would occur if the compiler could not satisfy the specification. ; In this case, if the idea is to resize Stack::box each time, then both ; functions would need to know sizeof(T), so they are not equivalent. Won't they also need a pointer to the "initialization from a reference" constructor for T (typed: T (T&)), if one exists? ; This could be eliminated by adding a private variable size and filling ; it in in the constructor, which would make the constructor type ; dependent and the other function implementations type independent. The point of the "generic" class was to make this operation easy for the current generation of separate-compilation-is-god compilers. ; This optimization might even be done (or at least suggested) ; automatically. ; The harder problem is making it work across object files. Essentially ; the problem here is that an object file is an inappropriate scoping ; constraint with respect to this issue. Unfortunately, to do the ; problem right implies some fairly sophisticated project management and ; compilation tools. I agree mostly, I'm not sure how "sophisticated" the tools really need to be to be sufficient. ; Jon Shapiro David Callahan, Tera Computer Co. Seattle, WA.