pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/23/90)
On 21 Sep 90 03:25:05 GMT, rfg@NCD.COM (Ron Guilmette) said:

rfg> In article <57570@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK)
rfg> writes:

jimad> Please join me in lobbying the ANSI C++ committee to correct
jimad> this oversight in the language definition. Overloading
jimad> operator.() makes equal sense for reference classes as
jimad> overloading operator->() for pointer classes...

rfg> Finally, Jim and I have found something to agree on. :-)

rfg> I'd say that Jim is correct in saying that treating either `.' or
rfg> `->' as if they were OPERATORS (kinda like other operators) makes
rfg> equal sense, i.e. *NONE*! Allowing overloading for either of these
rfg> makes about as much sense as allowing overloading for `{' or `}'.

I tend to agree, but for other reasons, as the case for overloading '->'
is better than you think. I'd allow it, but *indirectly*, via
overloading '*'.

rfg> Look folks, just because a particular C++ token contains some
rfg> non-alphanumeric characters does not make it an operator!

The *binary* '.' syntax can be viewed as a *compile time* operator:
given a typed (l)value to an object and an identifier, select the slot
in the object named by that identifier. In languages that implement
objects as "frames" (more or less loosely, e.g. Self, in some way
Smalltalk, CLOS), you *can* do it. Maybe Jim Adcock has got a case of
CLOS-envy? :-) :-) :-)

rfg> OK. Back to basics. What is `->'? Well, it is (syntactic sugar)
rfg> shorthand notation for `*.' (i.e. dereference and select).
rfg> [ ... ]
rfg> The meaning is clear and unambiguous. Apply the prefix unary
rfg> dereference operator to the thing on the left, and then select out
rfg> a member from the result.

Note: there may be a good argument that '->' is more fundamental than
'.', so it may be better to say that 'a.b' can be rewritten as
'(&a)->b' rather than that 'a->b' can be rewritten as '(*a).b', even if
there is then a problem with register structs (whose address cannot be
taken).
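For builtin types both rewritings can be checked mechanically. The sketch below is a minimal C++ illustration (the struct `point` and the helper `builtin_equivalences_hold` are made up for the purpose); it only compares addresses, so it says nothing about overloaded operators, and simply confirms that 'p->x' and '(*p).x' name the same slot, as do 'a.y' and '(&a)->y'.

```cpp
#include <cassert>

// Hypothetical plain struct: every operator applied to it below is builtin.
struct point { int x, y; };

// Returns true when both traditional rewritings select the same slot.
bool builtin_equivalences_hold() {
    point a = {1, 2};
    point *p = &a;
    bool arrow_as_star_dot = (&p->x == &(*p).x);  // p->x  is  (*p).x
    bool dot_as_amp_arrow  = (&a.y == &(&a)->y);  // a.y   is  (&a)->y
    return arrow_as_star_dot && dot_as_amp_arrow;
}
```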
rfg> For all of the binary operators that *I* know of, either the left
rfg> or the right operands may be arbitrarily complex *expressions* (so
rfg> long as these expressions evaluate to some type of value which is
rfg> appropriate for the given category of operator).

This definition would seem to justify overloading ',', which instead I
think makes about as much sense as overloading ';', which is not a lot,
as they both are really just 'and-also' style syntax words, one for
expressions, the other for statements. Too bad that the temptation to
overload ',' to create pairs such as complex numbers or cartesian
coordinates has been overwhelming. Too bad that ',' cannot be
overloaded on fundamental types like 'int' or 'float', which makes
things less enjoyable in this respect.

rfg> Likewise, since `->' is just a shorthand notation for a combination
rfg> of an operator (i.e. unary prefix dereference) and the selector,
rfg> allowing overloading for `->' makes as little sense as allowing it
rfg> for the selector all by itself, i.e. none whatsoever.

But '->' is overloaded as a *unary* postfix operator, unintuitive as it
may seem. You seem to have fallen into the trap (into which I had fallen
myself when first meeting the idea of overloading '->') of thinking that
it is overloaded as a binary operator. Actually '->' is a binary
selector (a compile time operator) *plus* a unary (runtime) operator,
and C++ magically allows you to overload only the latter.

Overloaded '->' is essentially equivalent to a postfix overloaded unary
'*', but *only* in the context of 'casting' an object to a pointer to an
object for access to a member of that object; overloading unary '*', as
is currently possible, works only if one wants to access the entire
object. This cast-like nature on 'this' is why '->' is overloadable only
as a member operator, not as a (usually friend) unary operator proper.
Note that the '->' signature may somehow include its result type, like a
cast.
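The "unary postfix" reading can be seen in a small C++ sketch (the names `anyclass`, `someclass` and `arrow_is_unary_then_select` are hypothetical, chosen to echo the discussion). In C++ as later standardized, 'operator->' must be a non-static member taking no operand besides 'this', and 'a->b' is interpreted as '(a.operator->())->b': the user supplies only the step that yields a pointer, and the builtin selector picks 'b'.

```cpp
#include <cassert>

// Hypothetical target class with one data member to select.
struct anyclass { int b; };

// Hypothetical "intelligent pointer": only the unary, runtime half of '->'
// is user-defined; selecting the member 'b' is still done by the builtin.
class someclass {
    anyclass *target;
public:
    explicit someclass(anyclass *t) : target(t) {}
    // Must be a non-static member and takes no operand besides *this.
    anyclass *operator->() const { return target; }
};

// Checks that a->b means (a.operator->())->b.
bool arrow_is_unary_then_select() {
    anyclass obj = {42};
    someclass a(&obj);
    return &a->b == &(a.operator->())->b && a->b == 42;
}
```

If 'operator->' returned a class object instead of a raw pointer, C++ would apply that object's 'operator->' in turn, repeating until a builtin pointer is reached, so an overloaded '->' always bottoms out in the builtin one.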
When '->' is overloaded, it results in something like

    class anyclass { ... b; ... };
    class someclass { ... anyclass *operator ->(); ... };

    someclass a;
    assert (&a->b == &(operator ->(a))->b); // funny syntax here

I say 'funny syntax' because the expression '(operator ->(a))->b' should
have been written as '(*(a).operator ->()).b', where the '.' denotes the
builtin selector, and '*' may be builtin or overloaded in 'anyclass'.

rfg> I do object (most violently) however to any attempts to call the
rfg> selector `.' a `binary operator' (or to treat it as though it were
rfg> one). Obviously, it isn't.

By the same device used for '->' you could have overloading of '.' as a
unary postfix operator that took a *reference* to an object, and
returned one; for example,

    class anyclass { ... y; ... };
    class someclass { ... class anyclass &operator .(); ... };

    someclass x;
    assert (&x.y == &(operator .(x)).y); // funny syntax here

Note that while '->' would return a *pointer*, '.' would return a
*reference*, as Jim Adcock says; this entails assuming that each object
may be a reflexive reference to itself. This may or may not be fishy.

The fundamental difference between overloading '->' or '.' (but also a
few others!) and something like '+' is that the latter does not involve
a pseudo-recursive invocation of the builtin '+', while the former two
eventually resolve to some underlying builtin. For Interlisp speakers:
we are essentially advising the builtin '->' and '.' syntax words.

By further analogy one could argue that '[]' should overload as a unary
operator too. To illustrate the difference between these two types of
overloading, let's take this point further:

    class coord { int x,y; ... coord(int,int); ... };
    class triangularMatrix { ... float &operator[](const coord &xy); ... };

    triangularMatrix triangle(50,50);
    float middle = triangle[coord(25,25)];

This is clearly the current style of overloading of '[]'. Here '[]' is a
binary operator totally unrelated to '*' or '+'.
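A runnable version of the current-style (binary) '[]' might look like the C++ sketch below. The row-by-row storage layout, the single size argument (the text's 'triangle(50,50)' passes two) and all names are assumptions made for illustration; the point is only that '[]' here takes a whole 'coord' object as its second operand and has no relationship with '*' or '+'.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate pair used as a non-integer subscript.
struct coord {
    int x, y;
    coord(int x_, int y_) : x(x_), y(y_) {}
};

// Hypothetical lower-triangular matrix storing only cells with y <= x,
// row by row: row x holds x + 1 cells and starts at offset x*(x+1)/2.
class triangularMatrix {
    std::vector<float> cells;
public:
    explicit triangularMatrix(int size)
        : cells(static_cast<std::size_t>(size) * (size + 1) / 2, 0.0f) {}
    // Current-style binary '[]': a member taking an arbitrary index type.
    // Assumes 0 <= y <= x < size; no bounds checking is done.
    float &operator[](const coord &xy) {
        std::size_t row = static_cast<std::size_t>(xy.x);
        return cells[row * (row + 1) / 2 + static_cast<std::size_t>(xy.y)];
    }
};
```

Usage mirrors the text: 'triangularMatrix triangle(50); float middle = triangle[coord(25,25)];'.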
    class hashTable { ... symbol &operator[](); ... };

    hashTable table(...);
    symbol first = table[0];

Here instead we assume '[]' to be a *unary* operator, and the last line
would be equivalent to something like

    symbol first = (operator [](table))[0]; // funny syntax here

rfg> If there are any rules for what should be called an operator and
rfg> what should not, and what should be overloadable and what should
rfg> not, I'd like to see them!

Well, if anybody can produce consistent reasons (instead of nice
arguments) for many of the design decisions of any language out there,
I'd call that a major advance in the state of the art. The only major
case where I have seen this happen is type theory: Pascal's, based on
Hoare's (Structured Programming, 2nd essay), and there are more recent
examples, e.g. in the functional language group. The people working in
the latter seem to have some idea of what they are doing, but they tend
to limit their languages to what their *mathematically* based reasoning
amounts to. Even in Eiffel, where Meyer seems to make an effort to
justify his decisions by reasoning, there are many non sequiturs (vide
the issue of references) in that reasoning, IMNHO.

rfg> If these rules are at all consistent, if they make any sense
rfg> whatsoever, and if they still would seem to permit -> to be
rfg> overloaded, I'll eat my hat.

Ahem. I think I have found some rationales that would allow apparently
*consistent* overloading for '->' and '.', interpreting them as unary
postfix operators redefining part of a builtin. Note that I say
*consistent*: I leave the question of whether the whole business is the
best way to achieve certain admittedly desirable effects, or instead
rather a messy way, to posterity :-) (well, actually to the last part of
this treatise). By applying the same rules consistently one could also
make the case that '[]' should be interpretable as a unary operator,
overloading the builtin '[]' type unconstructor.
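C++ as later standardized never gained a unary '[]', but its effect (user code supplying only the step that yields a pointer, the builtin '[]' doing the rest) can be approximated with an implicit conversion to a pointer. This is a substitute technique, not the proposal above, and 'hashTable' and 'symbol' are hypothetical: 'table[i]' compiles because overload resolution converts 'table' to 'symbol *' and then applies the builtin subscript.

```cpp
#include <cassert>
#include <string>

// Hypothetical element type.
struct symbol { std::string name; };

// Hypothetical table with no operator[] of its own. Its conversion to
// 'symbol *' supplies the "unary" step; the builtin '[]' does the rest.
class hashTable {
    symbol slots[4];
public:
    hashTable() {
        slots[0].name = "first";
        slots[1].name = "second";
    }
    // table[i] is resolved as (static_cast<symbol *>(table))[i].
    operator symbol *() { return slots; }
};
```

The price of this design is that the conversion also lets 'table + 1', '*table' and friends compile, which is one reason implicit conversions to pointers are usually discouraged.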
I am using 'unconstructor' here as a neologism, for want of better
terminology. When you say 'int *p;' the '*' is a type constructor (old
reminiscences of EL/1, more recent ones of ML etc...), because it is a
function on types (mapping 'int' to 'int *'). When you thereafter say
'*p', this '*' in a sense 'undoes' the other, because it has *both* a
runtime effect (indirection, which undoes the '&' operator) and a
compile time effect (mapping the type 'int *' to 'int').

Overloading '->' and '.' (the unconstructors for the 'struct'-like type
constructors), '*' and '[]' (those for '*' and '[]', of course
stretching things a bit for '[]') essentially amounts to overloading the
runtime effect. As binary operators, '[]' may reasonably be interpreted
as a *just runtime* operator, and thus be overloadable within C++; '->'
and '.' only as (also) *compile time* operators, and therefore they
could be overloaded in both their runtime and compile time dimensions
only in meta-C++, which we do not have access to.

Unfortunately for your hat (:->), C++ *does* define '->' as a _unary_
operator only, and according to my rationale above it is consistent to
treat this as an operator, a kind of unary postfix '*', as shown above
(I very much doubt that any of the people involved in designing C++
actually considered all these subtle points, because they left out
unary '[]'). This operator is however different from '*', because the
overloaded operator '*' is free as to the type it can return, since it
does not overload (part of) an unconstructor.

rfg> Now I've got no problem with (or objection to) allowing overloading
rfg> for the unary prefix dereference operator BECAUSE THAT *IS* AN
rfg> OPERATOR IN EVERY SENSE OF THE WORD.

Indeed, now that I think of it, a case could be made that one should be
able to overload '*' as part of an unconstructor as well, like for '[]'
in both circumstances. I mean that currently, assuming

    class mypointer { ... friend int operator *(mypointer &); ...
    };

    mypointer m(...);
    mypointer *mp = &m;

    assert (&(*mp) == &m);                     // '*' is the builtin one
    assert (*m == ... /* some 'int' value */);

We could extend this as in:

    class mypointer {
        ...
        friend void *operator *();
        ...
        mypointer *operator *();
        ...
    };

Now the second overloading of '*' also overloads an unconstructor, when
applied to a pointer type, and not just to a base type. Naturally we
have some difficulty in deciding which to apply if both are defined,
because they are both unary, so we should forbid defining both... (note
that with unary and binary '[]' we can disambiguate, if the binary
version's signature does not have 'int' as second operand).

Now let's turn to whether it is desirable (not just consistent) to
overload not just operators on object types, but also operators on
unconstructors, i.e. on pointers, references and arrays to objects.

First note that in C it is true that 'a.b' and '(&a)->b' are the same
thing, as it is also true that 'a[b]' and '*(a+b)' are equivalent, just
as '*(&a)' and 'a' and '&(*a)' are. This equivalence has been broken in
C++. This may be criticizable in itself, admittedly. We would be adding
new cases where traditional equivalences are broken.

For '[]' there is some rationale: it is convenient to be able to express
the notion of indexing objects (directly, I mean) with arbitrary values,
not just integers, and while this could, with suitable definitions, be
done by overloading '+', it just *seems* a little more readable to be
able to overload '[]' directly. But IMNHO this is scant justification.

The same problem would recur with '->' and '.': we can have '->'
overload '.' as well (by using the traditional equivalence between the
two), or make them separately overloadable. I think that since the goal
of overloading '->' and '.'
and '[]' is really implementing "intelligent" pointers ('->') or
references ('.') or arrays ('[]'), this could be better achieved by
overloading just unary '*', keeping the traditional equivalences, in
all of which we may overload '*':

    a->b     (*a).b        (. is builtin)
    a.b      (*(&a)).b     (& may be overloaded, . is builtin)
    a[b]     *(a+b)        (+ may be overloaded)

By having the ability to overload '*' (and '&') we would just lose the
equivalences involving them ('*(&a)' equivalent to 'a' and to '&(*a)'),
which seems unavoidable.

But maybe people would prefer, in the IMNHO mistaken idea that it is
clearer, to overload '->', '.' and '[]' etc... directly, even if this
breaks more equivalences. Breaking more equivalences creates bizarre
situations, just like for '[]': '(*a).b' in C++ would be no more
equivalent to 'a->b' than 'a[b]' is equivalent to '*(a+b)', for
non-primitive types. If we are not discouraged by this, then overloading
'->' and '[]' etc... makes sense.

jimad> Overloading operator.() makes equal sense for reference classes
jimad> as overloading operator->() for pointer classes...

The monstrosity mentioned above is that if '.' is overloadable
separately, we may have the following interesting consequence:

    void fun /* pun intended! */(someclass *ptr)
    {
        someclass &ref = *ptr;
        assert (ref.field != ptr->field);
    }

Bah. Gasp! Uhm... This seems to stretch things much further than

    assert (ptr["key"] != *(ptr + "key")); // in general

So, in the end I think that

1) The whole area merits much further investigation: do we really want
   to overload type unconstructors? In which form? For example, do we
   want them to be strictly reflexive (the overloading of '->' mapping a
   'someclass *' value into another 'someclass *' value) or not
   (returning an 'otherclass *' value)? And so on...

2) If we really want to overload type unconstructors because we want to
   refine their semantics (hints of reflexivity here!)
   we had better be able to overload *all* of them for consistency, and
   with some nicer (macro-like?) syntax, maybe.

3) It would be much better not to overload the '->' and '.' type
   unconstructors separately, just as it would be much better not to
   overload the '[]' and '+' binary operators separately, etc..., in
   order to preserve or restore the traditional C equivalences, except
   for those involving '*' and '&', to which the others can be reduced
   (with the help of the builtin '.' selector).

Or we could (if only...) adopt a Forth- or Lisp-style syntax for C++, in
which punctuation is nearly nonexistent and everything is a (compile
time or runtime) function or 'word', and can be executed at compile time
(macro, compiler dictionary word) or at run time (lambda, execution
dictionary word) as the user wishes. We could add a few more things, and
we would have CLOS :-). We could even at this point adopt an OO,
extensible, compiler and runtime execution design. Just joking of
course :-).

If you have followed me down to this line (only a few more to go :->),
and all this has made you think that consistency and orthogonality,
simplicity and completeness, etc... are important, you win your honorary
membership of the Algol 68 Revised Report lover's club, and you are
disqualified from membership in any language standardization committee
set up by a certain USA organization related to a popular character
code, or any standardization committee related to CCITT :-).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
ge@wn3.sci.kun.nl (Ge' Weijers) (09/25/90)
pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

>[An awful lot about overloading * . and -> deleted]
>If you have followed me down to this line (only a few more to go :->),
>and all this has made you think that consistency and orthogonality,
>simplicity and completeness, etc... are important, you win your honorary
>membership of the Algol 68 Revised Report lover's club, and you are
>disqualified from membership in any language standardization committee
>set up by a certain USA organization related to a popular character
>code, or any standardization committee related to CCITT :-).

And what about passing operators as parameters? Even Algol68 disallowed
that. What about

    PROC add = (OP * (INT, INT)INT, INT a, b)INT: a * b; # let's confuse the reader #

    print(add(+,4,4));

Think of the unknown possibilities of passing -> as a parameter. It gives
me a headache, especially when combined with parameterised types and
polymorphism

    (((INT n)VOID:print((" :-)" * n, newline)))(10))

(Is C++ going in the kitchen-sink language direction?)

Ge' Weijers
--
Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1
6525 ED Nijmegen, the Netherlands              tel. +3180652483 (UTC-2)