[comp.lang.c++] coercions and nested subclasses

dl@rocky.oswego.edu (Doug Lea) (02/15/89)

My point takes a little bit of code to set up:

class A
{
public:
    int x;
    A() { x = 0; } 
};

class B : public A
{
public:
    B() { x = 1; } // anything to make a `B' somehow different than an `A'
};

class C : public B
{
public:
    C() { x = 2; }
};

overload f;
    
void f(A& p, A& q) { q.x += p.x; }      // actual definition of f irrelevant
void f(B& p, B& q) { q.x += p.x + 1; }  // so long as A and B versions differ

A operator + (A& p, A& q) { A r; r.x = p.x + q.x; return r; } // ditto
B operator + (B& p, B& q) { B r; r.x = p.x + q.x + 1; return r; }

void g()
{
  A a1, a2;
  B b1, b2;
  C c1, c2;

  f(a1, a2); // use f(A, A)
  f(b1, b2); // use f(B, B)
  f(a1, b2); // use f(A, A)
  f(c1, a2); // use f(A, A)
  f(c1, c2); //               (*)
  f(c1, b2); //               (**)

  a1 = a1 + a2; // use A + A
  b1 = b1 + b2; // use B + B
  a1 = a1 + b1; // use A + A
  a1 = c1 + a2; // use A + A
  b1 = c1 + c2; //            (***)
  b1 = c1 + b2; //            (****)
}

In line (*), c++ reports an ambiguity. Since a `C' is both a `B' and
an `A', it doesn't know which version of `f' to use. 

The ambiguity is reported merely as a warning. The compiler (cfront or
g++) then picks (randomly, for all I know) one of the versions of `f' to
call and continues. (Whether this *should* be a warning versus a fatal
error is unclear from my reading of the relevant sections of Stroustrup.)

A similar ambiguity arises in (**). (***) and (****) meet the same
problem, but here the choice of which operator to apply leads to a
fatal error if the compiler picks the A operator+ instead of the B
version.
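
To see why picking the A operator+ is fatal in (***) and (****) rather
than merely surprising: an `A' result cannot be assigned to a `B'. A
minimal sketch of that check follows. It uses <type_traits> from
present-day C++ (an assumption; nothing like it existed in cfront), and
the helper names are made up for illustration.

```cpp
#include <type_traits>

struct A { int x; A() : x(0) {} };
struct B : A { B() { x = 1; } };

// Assigning a B to an A is allowed (the B is sliced down to its A part).
bool a_accepts_b() { return std::is_assignable<A&, B>::value; }

// Assigning an A to a B is not: B has no assignment or conversion from A.
// This is the failure hit when `b1 = c1 + c2' resolves to A + A.
bool b_accepts_a() { return std::is_assignable<B&, A>::value; }
```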

I think that this is, at best, counterintuitive, and perhaps something
that should be remedied: I cannot think of a reason why anyone would
want any invocation of f(c, c), f(b, c), or f(c, b) to use the f(A, A)
version, and similarly for operator `+'. The usual motivation for
overloading functions like `f' to handle derived types is to override
default behavior in such circumstances in ways that ought to carry
through to subclasses in the absence of further overrides. Perhaps there
are counterexamples, but I expect that they are pretty rare.

Thus, the c++ modification that would be nicest here is to *promise*
that matches requiring coercions for derived types always be selected
in a `minimum-derived-distance' (i.e., nearest ancestor first)
fashion, and to otherwise *preserve* the current ambiguity warning
(but stating which version was chosen), as a reminder of a potential
problem. 
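
As a rough illustration of what the nearest-ancestor rule would select,
here is a small sketch. The string-returning versions of `f' and the
pick_* helpers are hypothetical, introduced only to make the choice
visible. The overload ranking later standardized in ISO C++ prefers the
conversion to the nearer base class, so a current compiler should
resolve these calls as the comments say; the cfront and g++ releases
discussed here did not.

```cpp
#include <string>

struct A { int x; A() : x(0) {} };
struct B : A { B() { x = 1; } };
struct C : B { C() { x = 2; } };

// Each overload reports which version was chosen.
std::string f(A&, A&) { return "f(A, A)"; }
std::string f(B&, B&) { return "f(B, B)"; }

// (*): both arguments are C; the nearest ancestor with an overload is B.
std::string pick_cc() { C c1, c2; return f(c1, c2); }

// (**): C and B; again the B version is the closer match.
std::string pick_cb() { C c1; B b2; return f(c1, b2); }

// An A argument cannot bind to B&, so only f(A, A) is viable here.
std::string pick_ca() { C c1; A a2; return f(c1, a2); }
```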

This is nearly the same rule that is already in place when coercing
arguments along the pretend hierarchy of builtins
double:float:long:int:short:char. This would probably require
implementation via a simple distance metric to be minimized.  (In
accordance with existing coercion rules, this implies that
non-hierarchical programmer-defined coercions would be applied only in
the absence of available hierarchical coercions.)  Of course, this
rule may not lead to a unique solution for functions with multiple
parameters, especially under multiple inheritance.  However, it
appears that it would almost always arrive at exactly the coercion
that programmers have in mind, as do the usual rules for builtins.
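
One possible shape for such a distance metric is sketched below. This
is a toy model, not how any compiler does it: the hierarchy is a table
of class names (single inheritance only), the cost of a candidate is
assumed to be the sum of per-argument derivation steps, and all names
(Hierarchy, derived_distance, select_overload, abc_hierarchy) are
invented for the illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy single-inheritance hierarchy: class name -> direct base ("" for a root).
typedef std::map<std::string, std::string> Hierarchy;

// The A <- B <- C hierarchy used throughout this post.
Hierarchy abc_hierarchy() {
    Hierarchy h;
    h["A"] = "";
    h["B"] = "A";
    h["C"] = "B";
    return h;
}

// Derivation steps from `from' up to `to'; -1 if `to' is not an ancestor.
int derived_distance(const Hierarchy& h, std::string from, const std::string& to) {
    int steps = 0;
    while (from != to) {
        Hierarchy::const_iterator it = h.find(from);
        if (it == h.end() || it->second.empty()) return -1;
        from = it->second;
        ++steps;
    }
    return steps;
}

// Index of the unique candidate with the smallest summed distance,
// -1 if no candidate is viable, -2 if the minimum is shared (ambiguous).
int select_overload(const Hierarchy& h,
                    const std::vector<std::vector<std::string> >& candidates,
                    const std::vector<std::string>& args) {
    int best = -1, best_cost = -1;
    bool tie = false;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (candidates[i].size() != args.size()) continue;
        int cost = 0;
        bool viable = true;
        for (std::size_t j = 0; j < args.size(); ++j) {
            int d = derived_distance(h, args[j], candidates[i][j]);
            if (d < 0) { viable = false; break; }
            cost += d;
        }
        if (!viable) continue;
        if (best < 0 || cost < best_cost) {
            best = static_cast<int>(i);
            best_cost = cost;
            tie = false;
        } else if (cost == best_cost) {
            tie = true;
        }
    }
    return tie ? -2 : best;
}
```

With candidates f(A, A) and f(B, B), arguments (C, C) select the B
version. With candidates g(A, B) and g(B, A), arguments (C, C) tie,
which is the non-unique multiple-parameter case mentioned above.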

Here are a few possible arguments against this proposal, along with
brief replies:

1. There already is a way to ensure desired behavior in this and other
cases by replacing (*) with the nonobvious, messy and error-prone
  f(*(B*)(&c1), *(B*)(&c2));
and similarly for the others. But using this construct in programs
where this kind of ambiguity abounds quickly leads to thoroughly
unreadable code.
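
For concreteness, a small sketch of forcing either version by explicit
coercion, using the classes and `f' from the top of this post. The
helper names forced_b and forced_a are hypothetical; the reference cast
in forced_a is just a shorter spelling of the same idea.

```cpp
struct A { int x; A() : x(0) {} };
struct B : A { B() { x = 1; } };
struct C : B { C() { x = 2; } };

void f(A& p, A& q) { q.x += p.x; }
void f(B& p, B& q) { q.x += p.x + 1; }

// Force the B version with the pointer-cast construct shown above.
int forced_b() {
    C c1, c2;
    f(*(B*)(&c1), *(B*)(&c2));
    return c2.x;   // 2 + (2 + 1)
}

// Force the A version, overriding what a nearest-ancestor rule would pick.
int forced_a() {
    C c1, c2;
    f((A&)c1, (A&)c2);
    return c2.x;   // 2 + 2
}
```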

2. Given this, you could pre-empt ambiguities while avoiding so many ugly
constructions by inserting simple inlines like
  inline void f(C& p, C& q) { f(*(B*)(&p), *(B*)(&q)); }
  inline void f(C& p, B& q) { f(*(B*)(&p), q); }
  inline void f(B& p, C& q) { f(p, *(B*)(&q)); }
but in applications where there are several subclasses of `B',
the number of such declarations you need quickly gets out of hand
(e.g., if another subclass of `B' were written, 5 more such
declarations of `f' are needed, a third needs 7, etc... If you had
ten subclasses and ten binary functions, you'd need 1200 of these, most
of which will never actually be invoked in any given program.)
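
The growth can be checked with a few lines. This sketch assumes one
binary function whose subclass arguments should all forward to the
(B, B) version, so every ordered pair of types drawn from `B' and its
subclasses needs a forwarder except (B, B) itself. The function names
are made up for the illustration.

```cpp
// Forwarding inlines needed for one binary function, given the number
// of subclasses of B.
long forwarders_needed(long subclasses) {
    long types = subclasses + 1;   // B itself plus its subclasses
    return types * types - 1;      // every ordered pair except (B, B)
}

// Extra forwarders required when one more subclass is added.
long extra_for_next_subclass(long subclasses) {
    return forwarders_needed(subclasses + 1) - forwarders_needed(subclasses);
}
```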

3. One could argue that this whole example is suspect. Why aren't `f'
and `+' members of A and B? This is indeed possible in this example,
but generally, you don't want to make every single `utility' procedure
or operator that only uses the public interface of a class into a
member just to get around this problem, right? One way of looking at
this proposal is that it just makes the coercion rules for non-member
functions more similar to the means by which the appropriate member
function is selected under the usual inheritance rules. 
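
For comparison, a sketch of the member-function case, where the
nearest ancestor's version is found without any ambiguity. The member
name add_to and the helper member_call are hypothetical; add_to mirrors
the body of `f' above.

```cpp
struct A {
    int x;
    A() : x(0) {}
    void add_to(A& q) { q.x += x; }
};

struct B : A {
    B() { x = 1; }
    void add_to(B& q) { q.x += x + 1; }   // hides A::add_to
};

struct C : B {
    C() { x = 2; }                        // no add_to of its own
};

// Lookup starts in C, finds nothing, and stops at B::add_to,
// so the B version is used for a pair of C objects.
int member_call() {
    C c1, c2;
    c1.add_to(c2);
    return c2.x;   // 2 + (2 + 1)
}
```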

4. Jonathan Shopiro's suggestion (which I assume will be adopted) of
supporting inheritable static member functions may help in some cases
like these, but only if such functions are declared as static members,
which, again, is otherwise undesirable for ordinary unprivileged
utility functions, especially since writing any such function would
require modification of the class declaration.

5. Finally, one could argue that the proposal adds unnecessary
complexity to the language. While a matter of taste, I think not: the
modest amount of complexity it introduces to compiler implementations
seems more than offset by the complexity it removes from user programs
that must otherwise circumvent the problem. In fact, when such cases
arise, it seems that
executing automated coercion rules should be faster than parsing all the
explicit coercions and/or inlines mentioned above. And while the
current coercion rules are arguably too complex already, extending
them in order to minimize unexpected undesirable results does not
seem a liability.

I think that implementing the minimum-derived-distance rule, along
with the existing means to force a particular coercion (both to avoid
warning messages and to override the default) solves this particular
problem in a reasonably pleasant fashion.  Does anyone think differently?


-Doug

Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@rocky.oswego.edu        or dl%rocky.oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl