[comp.lang.c++] A treatise on overloading quasi-operators

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/23/90)

On 21 Sep 90 03:25:05 GMT, rfg@NCD.COM (Ron Guilmette) said:

rfg> In article <57570@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK)
rfg> writes:

jimad> Please join me in lobbying the ANSI C++ committee to correct
jimad> this oversight in the language definition.  Overloading
jimad> operator.() makes equal sense for reference classes as
jimad> overloading operator->() for pointer classes...

rfg> Finally, Jim and I have found something to agree on. :-)

rfg> I'd say that Jim is correct in saying that treating either `.' or
rfg> `->' as if they were OPERATORS (kinda like other operators) makes
rfg> equal sense, i.e. *NONE*!  Allowing overloading for either of these
rfg> makes about as much sense as allowing overloading for `{' or `}'.

I tend to agree, but for other reasons, as the case for overloading '->'
is better than you think. I'd allow it, but *indirectly*, via
overloading '*'.

rfg> Look folks, just because a particular C++ token contains some non-
rfg> alphanumeric characters does not make it an operator!

The *binary* '.' syntax can be viewed as a *compile time* operator;
given a typed (l)value to an object and an identifier, select the slot
in the object named by that identifier. In languages that implement
objects as "frames" (more or less loosely, e.g.  Self, in some way
Smalltalk, CLOS), you *can* do it. Maybe Jim Adcock has got a case of
CLOS-envy?  :-) :-) :-)

rfg> OK.  Back to basics.  What is `->'? Well, it is (syntactic sugar)
rfg> shorthand notation for `*.' (i.e.  dereference and select). [ ... ]
rfg> The meaning is clear and unambiguous.  Apply the prefix unary
rfg> dereference operator to the thing on the left, and then select out
rfg> a member from the result.

Note: there may be a good argument that '->' is more fundamental than
'.', so it may be better to say that 'a.b' can be rewritten as '(&a)->b'
rather than 'a->b' can be rewritten as '(*a).b', even if there is then a
problem with register structs.

rfg> For all of the binary operators that *I* know of, either the left
rfg> or the right operands may be arbitrarily complex *expressions* (so
rfg> long as these expressions evaluate to some type of value which is
rfg> appropriate for the given category of operator).

This definition would seem to justify overloading ',' which instead I
think makes about as much sense as overloading ';', which is not a lot,
as they both are really just 'and-also' style syntax words, one for
expressions, the other for statements. Too bad that the temptation to
overload ',' to create pairs such as complex numbers or cartesian
coordinates has been overwhelming. Too bad that ',' cannot be overloaded
on fundamental types like 'int' or 'float', which makes things less
enjoyable in this respect.
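
To make the restriction concrete, here is a minimal sketch in present-day
C++. The types 'Num' and 'Pair' and both function names are hypothetical,
invented for illustration only. An overloaded ',' is picked when the
operands are of class type; with 'int' operands only the builtin ','
applies, which evaluates the left operand and discards it.

```cpp
#include <cassert>

// Hypothetical types for illustration; they do not come from the thread.
struct Num  { double v; };
struct Pair { double first, second; };

// Overloaded ',' builds a Pair out of two class-type operands.
Pair operator,(Num a, Num b) { return Pair{a.v, b.v}; }

// Class operands: the overload above is selected.
Pair make_pair_demo() { return (Num{1.5}, Num{2.5}); }

// Fundamental operands: only the builtin ',' is available, so the left
// operand is evaluated for its side effect and the right one is the result.
int builtin_comma_demo() {
    int a = 1, b = 2;
    int x = (a += 10, b);
    return x;
}
```

Calling 'make_pair_demo()' yields a 'Pair', while 'builtin_comma_demo()'
just yields the value of 'b'; no declaration of 'operator,(int, int)' is
accepted by the language.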

rfg> Likewise, since `->' is just a shorthand notation for a combination
rfg> of an operator (i.e. unary prefix dereference) and the selector,
rfg> allowing overloading for `->' makes as little sense as allowing it
rfg> for the selector all by itself, i.e.  none whatsoever.

But overloaded '->' is a *unary* postfix operator, unintuitive as it
may seem. You seem to have fallen into the trap (in which I had fallen
myself when first meeting the idea of overloading '->') of thinking that
it is overloaded as a binary operator.  Actually '->' is a binary
selector (compile time operator) *plus* a unary (runtime) operator,
and C++ allows you magically to overload the latter only.

Overloaded '->' is essentially equivalent to a postfix overloaded unary
'*', but *only* in the context of 'casting' an object to a pointer to an
object for access to a member of that object; while just overloading
unary '*' currently works if one wants to access the entire object
*only*.

	This cast-like nature on 'this' is why '->' is only overloadable
	as a member operator, not as a (friend, usually) unary operator
	proper. Note that the '->' signature may somehow include its
	result type, like a cast.

When '->' is overloaded, it results in something like

    class anyclass { ... b; ... };
    class someclass { ... anyclass *operator ->(); ... };
    someclass a;

    assert (&a->b == &(operator ->(a))->b); // funny syntax here

I say 'funny syntax' because the expression '(operator ->(a))->b' should
have been written as '(*(a).operator ->()).b', where the '.' denotes the
builtin selector here, and '*' may be builtin or overloaded in
'anyclass'.
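
The same point as a compilable sketch, assuming 'anyclass' has a single
'int' member and 'someclass' simply holds one 'anyclass' (the member name
'target' and the function 'arrow_is_unary' are made up for illustration).
The overloaded operator takes no right-hand operand; the member name is
resolved afterwards by the builtin selector.

```cpp
#include <cassert>

struct anyclass { int b; };

// 'someclass' forwards member access to the 'anyclass' it holds.
struct someclass {
    anyclass target;
    anyclass *operator->() { return &target; }   // no right-hand operand
};

// Returns true when all three spellings name the same member.
bool arrow_is_unary() {
    someclass a;
    a.target.b = 7;
    int *sugar    = &a->b;                    // what one normally writes
    int *unary    = &(a.operator->())->b;     // unary call, then builtin '->'
    int *expanded = &(*(a.operator->())).b;   // builtin '*' and builtin '.'
    return sugar == unary && unary == expanded;
}
```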

rfg> I do object (most violently) however to any attempts to call the
rfg> selector `.' a `binary operator' (or to treat it as though it were
rfg> one).  Obviously, it isn't.

By the same device used for '->' you could have overloading of '.' as a
unary postfix operator that took a *reference* to an object, and
returned one; for example,

    class anyclass { ... y; ... };
    class someclass { ... class anyclass &operator .(); ... };
    someclass x;

    assert (&x.y == &(operator .(x)).y); // funny syntax here

	Note that while '->' would return a *pointer*, '.' would return a
	*reference*, as Jim Adcock says; this entails assuming that each
	object may be a reflexive reference to itself. This may or may
	not be fishy.

The fundamental difference between overloading '->' or '.'  (but also a
few others!) and something like '+' is that the latter does not involve
a pseudo recursive invocation of the builtin '+', but the former two
eventually resolve some underlying builtin.

	For Interlisp speakers: we are essentially advising the builtin
	'->' and '.' syntax words.

By further analogy one could argue that '[]' should overload as a unary
operator too. To illustrate the difference between these two types of
overloading, let's take this point further:

	class coord { int x,y; ... coord(int,int); ... };
	class triangularMatrix {
		... friend float &operator[](coord &xy); ...
	};

	triangularMatrix triangle(50,50);

	float middle = triangle[coord(25,25)];

This is clearly the current style of overloading of '[]'. Here '[]' is a
binary operator totally unrelated to '*' or '+'.
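
A runnable version of the binary case might look as follows. This is only
a sketch under stated assumptions: the packed lower-triangular layout, the
single-argument constructor, and the function 'triangle_demo' are mine, and
'operator[]' is written as a member because C++ requires that (the friend
form above is shorthand).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct coord {
    int x, y;
    coord(int x_, int y_) : x(x_), y(y_) {}
};

// Assumed layout: row x stores columns 0..x, rows packed one after another.
class triangularMatrix {
    std::vector<float> cells;
public:
    explicit triangularMatrix(int n)
        : cells(static_cast<std::size_t>(n) * (n + 1) / 2, 0.0f) {}

    // Binary '[]': left operand is the matrix, right operand is a coord.
    float &operator[](const coord &xy) {
        return cells[static_cast<std::size_t>(xy.x) * (xy.x + 1) / 2 + xy.y];
    }
};

float triangle_demo() {
    triangularMatrix triangle(50);
    triangle[coord(25, 25)] = 3.5f;
    triangle[coord(25, 24)] = 1.0f;   // a neighbouring cell, left untouched below
    return triangle[coord(25, 25)];
}
```

Nothing here involves pointer arithmetic or the builtin '[]'; the index is
an arbitrary user-defined value.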

	class hashTable {
		... symbol &operator[](); ...
	};

	hashTable table(...);

	symbol first = table[0];

Here instead we assume '[]' to be a *unary* operator, and the last line
would be equivalent to something like

	symbol first = (operator [](table))[0]; // funny syntax here
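
C++ has no unary '[]', but something close can be had by giving the class
only a conversion to a builtin pointer and letting the builtin '[]' do the
indexing. A minimal sketch, assuming a fixed four-slot table; the slot
contents and the name 'unary_subscript_demo' are hypothetical.

```cpp
#include <cassert>

struct symbol { int id; };

class hashTable {
    symbol slots[4];
public:
    hashTable() { for (int i = 0; i < 4; ++i) slots[i].id = 100 + i; }

    // No operator[] is declared; only a conversion to the builtin pointer.
    operator symbol *() { return slots; }
};

int unary_subscript_demo() {
    hashTable table;
    symbol second = table[1];                        // conversion, then builtin []
    symbol same   = (table.operator symbol *())[1];  // the same thing spelled out
    return second.id == same.id ? second.id : -1;
}
```

The user-defined part takes the table alone and the builtin '[]' then
applies to its result, which is the shape of the 'funny syntax' line.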

rfg> If there are any rules for what should be called an operator and
rfg> what should not, and what should be overloadable and what should
rfg> not, I'd like to see them!

Well, if anybody can produce consistent reasons (instead of nice
arguments) for many of the design decisions of any language out there
I'd call that a major advance of the state of the art. The only major
case where I have seen this happen is for type theories -- Pascal's based on
Hoare's (Structured Programming, 2nd essay), and there are more recent
examples, e.g. in the functional language group. The people working in
the latter seem to have some idea of what they are doing, but they tend
to limit their languages to what their *mathematical* based reasoning
amounts to. Even in Eiffel, where Meyer seems to make an effort to
justify his decisions by reasoning, there are many non-sequiturs (vide
the issue of references) in that reasoning, IMNHO.

rfg> If these rules are at all consistent, if they make any sense
rfg> whatsoever, and if they still would seem to permit -> to be
rfg> overloaded, I'll eat my hat.  --

Ahem -- I think I have found some rationales that would allow apparently
*consistent* overloading for '->' and '.', interpreting
them as unary postfix operators redefining part of a builtin.

    Note that I say *consistent* -- I leave the question of whether
    the whole business is the best way to achieve certain admittedly
    desirable effects, or instead rather a messy way, to posterity :-)
    (well actually to the last part of this treatise).

By applying the same rules consistently one could also make the case
that '[]' should also be interpretable as a unary operator, overloading
the builtin '[]' type unconstructor.

I am using here 'unconstructor' as a neologism, for want of better
terminology.  When you say 'int *p;' the '*' is a type constructor (old
reminiscences of EL/1, more recent ones of ML etc...), because it is a
function on types (mapping 'int' to 'int *'). When you thereafter say
'*p' this '*' in a sense 'undoes' the other, because it *both* has a
runtime effect (indirection, which undoes the '&' operator), and a
compile time effect (mapping the type 'int *' to 'int'). Overloading
'->' and '.' (the unconstructors for the 'struct' like type
constructors), '*' and '[]' (those for '*' and '[]', of course
stretching things a bit for '[]') essentially amounts to overloading the
runtime effect.
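
The two effects can be seen separately in present-day C++, where the
compile time half is observable with the standard '<type_traits>'
facilities. This is only a sketch; the function name 'unconstructor_demo'
is made up.

```cpp
#include <cassert>
#include <type_traits>

int unconstructor_demo() {
    int value = 42;
    int *p = &value;          // '*' as type constructor: int -> int *

    // Compile time effect of the unconstructor: int * -> int.
    static_assert(std::is_same<decltype(p), int *>::value,
                  "p has the constructed type");
    static_assert(std::is_same<std::remove_pointer<decltype(p)>::type, int>::value,
                  "stripping the constructor gives back int");
    static_assert(std::is_same<decltype(*p), int &>::value,
                  "'*p' is an lvalue of type int");

    // Runtime effect: one indirection, undoing the '&' above.
    return *p;
}
```

Overloading in C++ can only replace the last line's behaviour; the
'static_assert' half stays with the compiler.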

As binary operators, '[]' may be reasonably interpreted as a *just
runtime* operator, and thus overloadable within C++; '->' and '.'  only
as *compiletime* (too) operators, and therefore they could only be
overloaded in both their runtime and compiletime dimensions in meta-C++,
which we do not have access to.

Unfortunately for your hat (:->) C++ *does* define '->' as a _unary_
operator only, and according to my rationale above, it is consistent to
treat this as an operator (I very much doubt that any of the people
involved in designing C++ did actually consider all these subtle points,
because they left out unary []), as a kind of unary postfix '*', as
shown above.

This operator is however different from '*', because the overloaded
operator '*' is free as to the type it can return, because it does
not overload (part of) an unconstructor.
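
A sketch of that difference, with hypothetical names throughout ('record',
'handle', 'outer' and the two demo functions). Overloaded '*' may return
something unrelated to the class, whereas overloaded '->' must return a
pointer, or an object that itself has an overloaded '->', to which the
operator is then applied again until a builtin pointer is reached.

```cpp
#include <cassert>
#include <cstring>

struct record { int field; };

struct handle {
    record r;
    // '*' is an ordinary operator: any result type is accepted.
    const char *operator*() const { return "not a record"; }
    // '->' must eventually produce a builtin pointer for the selector.
    record *operator->() { return &r; }
};

struct outer {
    handle h;
    // A class-type result is allowed: '->' is then applied to it in turn.
    handle &operator->() { return h; }
};

int drill_down_demo() {
    outer o;
    o.h.r.field = 9;
    return o->field;      // o.operator->().operator->()->field
}

bool star_is_unconstrained() {
    handle h;
    h.r.field = 0;
    return std::strcmp(*h, "not a record") == 0;
}
```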

rfg> Now I've got no problem with (or objection to) allowing overloading
rfg> for the unary prefix dereference operator BECAUSE THAT *IS* AN
rfg> OPERATOR IN EVERY SENSE OF THE WORD.

Indeed now that I think of it, a case could be made that one should be
able to overload '*' as part of an unconstructor as well, like for '[]'
in both circumstances. I mean that currently, assuming

	class mypointer { ... friend int operator *(); ... };

	mypointer m(...);
	mypointer *mp = &m;

	assert (&(*mp) == &m);

'*' is the builtin one.

	assert (*m == ... /* some 'int' value */);

We could extend this as in:

	class mypointer {
		... friend void *operator *(); ...
		... mypointer *operator *(); ...
	};

Now the second overloading of '*' also overloads an unconstructor,
when applied to a pointer type, and not just to a base type. Naturally
we have some difficulty in deciding which to apply if both are defined,
because they are both unary, so we should forbid defining both... (note
that with unary and binary '[]' we can disambiguate, if the binary
version signature does not have 'int' as second operand).

Now let's turn to whether it is desirable (not just consistent) to
overload not just operators on object types, but also operators on
unconstructors, i.e. on pointers, references and arrays to
objects.

First note that in C it is true that 'a.b' and '(&a)->b' are the same
thing, as it is also true that 'a[b]' and '*(a+b)' are equivalent, just
as '*(&a)' and 'a' and '&(*a)'. This equivalence has been broken in C++.
This may be criticizable in itself, admittedly. We would be adding new
cases where traditional equivalences are broken.
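
For builtin types, which is the C case, the equivalences can be checked
directly by comparing addresses. A small sketch; 'point' and the function
name are hypothetical.

```cpp
#include <cassert>

struct point { int x; int y; };

// True when every traditional C equivalence holds for builtin operands.
bool c_equivalences_hold() {
    point a = {3, 4};
    point *pa = &a;
    int v[3] = {10, 20, 30};
    int i = 2;

    bool member    = (&a.y == &(&a)->y);                       // a.b  ~ (&a)->b
    bool index     = (&v[i] == &*(v + i)) && (&v[i] == &i[v]); // a[b] ~ *(a+b)
    bool roundtrip = (&*(&a) == &a) && (&(*pa) == pa);         // *(&a) ~ a ~ &(*a)
    return member && index && roundtrip;
}
```

Once any of these operators is overloaded for a class, nothing in the
language forces the corresponding line to stay true.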

For '[]' there is some rationale; it is convenient to be able to express
the notion of indexing objects (directly, I mean) with arbitrary values,
not just integers, and while this could, with suitable definitions, be
done overloading '+', it just *seems* a little more readable to be able
to overload directly '[]'.  But IMNHO this is scant justification.

The same problem would recur with '->' and '.': we can have '->'
overload '.' as well (by using the traditional equivalence between the
two), or make them separately overloadable.

I think that since the goal of overloading '->' and '.' and '[]' is
really implementing "intelligent" pointers ('->') or references ('.') or
arrays('[]'), this could be better achieved by overloading just unary
'*', by having the traditional equivalences, in all of which we may
overload '*':

	a->b	(*a).b		(. is builtin)
	a.b	(*(&a)).b	(& may be overloaded, . is builtin)
	a[b]	*(a+b)		(+ may be overloaded)

We would just lose, by having the ability to overload '*' (and '&'), the
equivalences involving it and '&' ('*(&a)' equivalent to 'a' and
'&(*a)'), which seems unavoidable.
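
Since the language does not derive the other forms automatically, the
proposal can only be imitated by hand: write '*' and '+' as the primitives
and define '->' and '[]' mechanically from them. A minimal sketch, assuming
a hypothetical 'smart' pointer over a hypothetical 'item' type.

```cpp
#include <cassert>
#include <cstddef>

struct item { int b; };

// Hypothetical "intelligent" pointer; only '*' and '+' carry real logic.
class smart {
    item *p;
public:
    explicit smart(item *q) : p(q) {}

    // Primitives.
    item &operator*() const { return *p; }
    smart operator+(std::ptrdiff_t n) const { return smart(p + n); }

    // Derived by the traditional equivalences, never written independently.
    item *operator->() const { return &**this; }                       // a->b ~ (*a).b
    item &operator[](std::ptrdiff_t n) const { return *(*this + n); }  // a[n] ~ *(a+n)
};

bool derived_forms_agree() {
    item items[3] = {{1}, {2}, {3}};
    smart a(items);
    return &a->b == &(*a).b
        && &a[2] == &*(a + 2)
        && a[2].b == 3;
}
```

With this discipline any refinement put into '*' (checking, counting,
locking) is inherited by the other two spellings for free.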

But maybe people would prefer, in the IMNHO mistaken idea that it is
clearer, to overload '->', '.' and '[]' etc... directly, even if this
breaks more equivalences.

Breaking more equivalences creates bizarre situations, just like for
'[]'; '(*a).b' in C++ would be no more equivalent to 'a->b' than
'a[b]' is equivalent to '*(a+b)', for non primitive types.
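
That this can already happen for '->' and '*' is easy to show. The class
below is deliberately perverse and entirely hypothetical; it only
illustrates that the compiler does not tie the two operators together.

```cpp
#include <cassert>

struct rec { int b; };

// '*' and '->' are overloaded separately and lead to different objects.
struct inconsistent {
    rec via_star;
    rec via_arrow;
    rec &operator*()  { return via_star; }
    rec *operator->() { return &via_arrow; }
};

bool equivalence_is_broken() {
    inconsistent a;
    a.via_star.b  = 1;
    a.via_arrow.b = 2;
    return (*a).b != a->b;   // 1 versus 2
}
```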

If we are not discouraged by this, then overloading '->' and '[]' etc...
makes sense.

jimad> Overloading operator.() makes equal sense for reference classes
jimad> as overloading operator->() for pointer classes...

The monstrosity mentioned above is that if '.' is overloadable
separately, we may have the following interesting consequence:

	void fun /* pun intended! */(someclass *ptr)
	{
	    someclass &ref = *ptr;

	    assert (ref.field != ptr->field);
	}

Bah. Gasp! Uhm... This seems to stretch things much further than

	assert (ptr["key"] != *(ptr + "key")); // in general


So, in the end I think that

1) The whole area merits much further investigation -- do we really want
to overload type unconstructors? In which form? For example, do we want
them to be strictly reflexive (the overloading of '->' mapping a
'someclass *' value into another 'someclass *' value) or not (returning
an 'otherclass *' value)? And so on...

2) If we really want to overload type unconstructors because we want to
refine their semantics (hints of reflexivity here!) we had better be
able to overload *all* of them for consistency, and with some nicer
(macro-like?) syntax, maybe.

3) It would be much better not to overload separately the '->' and '.'
type unconstructors, as it would be much better not to overload
separately the '[]' and '+' binary operators, etc..., in order to
preserve or restore the traditional 'C' equivalences, except for those
involving '*' and '&', to which the others can be reduced (with the help
of the builtin . selector).


Or we could (if only...) adopt a Forth or Lisp style syntax for C++, in
which punctuation is nearly non existent and everything is a (compile
time or runtime) function or 'word', and can be executed at compile
time (macro, compiler dictionary word) or at run time (lambda, execution
dictionary word) as the user wishes. We could add a few more things,
and we would have CLOS :-).

We could even at this point adopt an OO, extensible, compiler and
runtime execution design. Just joking of course :-).


If you have followed me down to this line (only a few more to go :->),
and all this has made you think that consistency and orthogonality,
simplicity and completeness, etc... are important, you win your honorary
membership of the Algol 68 Revised Report lover's club, and you are
disqualified from membership in any language standardization committee
set up by a certain USA organization related to a popular character
code, or any standardization committee related to CCITT :-).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

ge@wn3.sci.kun.nl (Ge' Weijers) (09/25/90)

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

>[An awful lot about overloading * . and -> deleted]
>If you have followed me down to this line (only a few more to go :->),
>and all this has made you think that consistency and orthogonality,
>simplicity and completeness, etc... are important, you win your honorary
>membership of the Algol 68 Revised Report lover's club, and you are
>disqualified from membership in any language standardization committee
>set up by a certain USA organization related to a popular character
>code, or any standardization committee related to CCITT :-).

And what about passing operators as parameters? Even Algol68 disallowed
that. What about

	PROC add = (OP * (INT, INT)INT, INT a, b)INT:
		a * b; # let's confuse the reader #

	print(add(+,4,4));
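
For comparison, the nearest thing in present-day C++ passes the operator
wrapped as a function object, not as a bare token. A rough sketch only:
the template parameter 'Op' and the two wrapper functions are assumptions
of mine, and the misleading name 'add' is kept on purpose.

```cpp
#include <cassert>
#include <functional>

// 'op' is whatever binary operation the caller hands in, so 'add' adds
// only if it is given an addition.
template <typename Op>
int add(Op op, int a, int b) { return op(a, b); }

int pass_plus()  { return add(std::plus<int>(), 4, 4); }
int pass_times() { return add(std::multiplies<int>(), 4, 4); }
```

The standard library has no comparable wrapper for '->' or '.', since
their right-hand side is a member name and not a value.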

Think of the unknown possibilities of passing -> as a parameter. It gives
me a headache, especially when combined with parameterised types and
polymorphism 

	(((INT n)VOID:print((" :-)" * n, newline)))(10))

(Is C++ going in the kitchen-sink language direction?)

Ge' Weijers


--
Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1         
6525 ED Nijmegen, the Netherlands              tel. +3180652483 (UTC-2)