[comp.lang.c++] User defined operators

db@its63b.ed.ac.uk (D Berry) (04/26/88)

In article <4444@ihlpf.ATT.COM>  writes:
>>In article <8804140925.AA13150@klaus.olsen.uucp> Info-Modula2 Distribution List <INFO-M2%UCF1VM.bitnet@jade.berkeley.edu> writes:
>>
>>Does C++ allow infix procedures other than the standard set?
>
>No.  Doing this tends to lead to unreadable code.  For example:  If I
>overload the word 'or' as an infix operator, this sentence no longer has
>the same meaning that I intended (this is because 'word' becomes 'w or d').

Does this mean that "newton" isn't a legal C++ identifier because it will
be parsed as "new" "ton"?  I doubt it.
Most languages that allow user defined infix operators let them be any
(of a subrange of) lexically distinct token(s).  Often alphanumeric and symbolic
tokens will be different sets, allowing expressions such as "w+d" to be parsed
correctly, while requiring the spaces in "w or d" to distinguish this case
from "word".

>It also leads to nightmares for the parser (is '/+' an error or an overload
>operator, etc.).

The easiest way to handle this is to take the rule for distinguishing between
alphanumeric identifiers -- read the longest -- and use it for symbolic
identifiers as well.  So Nevin's example would be a single identifier "/+".
If this were done to C++ (I'm not suggesting it should be done), it would
parse some expressions differently from C.  E.g.

	Expression	C parse				(C++)++ parse 

	a+++++b   	"a" "++" "+" "++" "b"		"a" "+++++" "b"
	*++p		"*" "++" "p"			"*++" "p"

However, C++ isn't source code compatible with C anyway, and this scheme
would make the existing treatment of "/*" and "*/" examples of a general rule
rather than a specific case.  It would probably also make cases like the above
easier to read, as they would have to be broken up:

	a++ + ++b		*(++p)

(Really basic symbols such as brackets and quotes shouldn't be allowed to
appear in symbolic identifiers or things get out of hand).
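
[A minimal sketch of the "read the longest" rule applied to both alphanumeric
and symbolic identifiers. The names `CharClass`, `classify` and `tokenize`, and
the exact set of "really basic" symbols, are assumptions made for illustration;
nothing here is taken from an actual C++ translator.]

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical character classes for the scheme described above.
enum class CharClass { Alnum, Symbol, Basic, Space };

CharClass classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isspace(u)) return CharClass::Space;
    if (std::isalnum(u) || c == '_') return CharClass::Alnum;
    // "Really basic" symbols (assumed set): each is always a token by itself.
    if (std::string("()[]{}\"';,").find(c) != std::string::npos) return CharClass::Basic;
    return CharClass::Symbol;
}

// Split the source into tokens, taking the longest run of characters
// that belong to the same class (alphanumeric or symbolic).
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        CharClass cls = classify(src[i]);
        if (cls == CharClass::Space) { ++i; continue; }
        std::size_t start = i++;
        if (cls != CharClass::Basic) {
            while (i < src.size() && classify(src[i]) == cls) ++i;
        }
        tokens.push_back(src.substr(start, i - start));
    }
    return tokens;
}
```

[Under this rule `tokenize("a+++++b")` yields "a" "+++++" "b", while the
broken-up form `a++ + ++b` yields "a" "++" "+" "++" "b".]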

Defining your own operators is subject to the same cautions as overloading
existing ones.  It can make code easier to read, but you can also use it to
make a real dog's breakfast.

It also requires scope rules for the infix nature of the token.  One person
might define "or" to be an infix operator in class A while someone else defines
"or" as a function in class B.  How does a function parse "or" in a program that
uses both classes?

The rule used in Standard ML would translate to C++ as follows: an infix
operator is infix in the class in which it's declared and in all subclasses
and member functions.  From outside the class, it's treated as a prefix
function of two arguments (e.g. A::or (x, y);   B::** (x, n);).  A keyword can
make all infix tokens of a class be parsed as infix in the current file
(e.g. acceptinfix A;  x or y; acceptinfix B; x ** n;).

An alternative would be to let "A::or" be used infix all the time (e.g.
x A::or y;  x B::** n;).  This would probably be better for C++.
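
[For comparison, existing C++ already lets an overloaded operator be written
either infix or as a prefix function of two arguments. The sketch below uses a
hypothetical type `Vec` and hypothetical helpers `add_infix` and `add_prefix`;
it shows only what the language does for its built-in operator symbols, not the
proposed extension.]

```cpp
// Hypothetical two-component vector type, used only for illustration.
struct Vec {
    int x;
    int y;
};

// An ordinary overload of an existing operator symbol.
Vec operator+(Vec a, Vec b) {
    return Vec{a.x + b.x, a.y + b.y};
}

// Infix use of the overloaded operator.
Vec add_infix(Vec a, Vec b) { return a + b; }

// The same call written as a prefix function of two arguments.
Vec add_prefix(Vec a, Vec b) { return operator+(a, b); }
```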

Presumably to be orthogonal this scheme would have to allow user defined
prefix operators as well.  Functions and prefix operators would have to
be distinguished by the same rule as functions and infix operators.
C++ can already distinguish between prefix and infix operators.

I'm not proposing that user defined operators should be added to C++; I'm just
attempting to show that it could be done and to point out some of the problems
that would need to be resolved.  Please follow up if I've missed anything.
 
>
> _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194

-- 
"The answer is simple, they could do it with ease;
 stop attacking the patients, and attack the disease."	-- Tom Robinson.

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/27/88)

BTW, in my original article I was merely explaining why (I thought)
C++ did not allow user-definable operators.  I hope that I didn't give the
impression that I thought this was impossible to define within the
language.

In article <1206@its63b.ed.ac.uk> db@itspna.ed.ac.uk (Dave Berry) writes:

>	Expression	C parse				(C++)++ parse 
>
>	a+++++b   	"a" "++" "+" "++" "b"		"a" "+++++" "b"
>	*++p		"*" "++" "p"			"*++" "p"
>
>However, C++ isn't source code compatible with C anyway, and this scheme
>would make the existing treatment of "/*" and "*/" examples of a general rule
>rather than a specific case.

There is very little that makes C programs choke in the C++ translator.  If
you fix the header files up (put in function prototypes, which will be
required by ANSI C anyway) and don't use any of the new keywords (I know
there are a few more little things but I don't have the C++ book handy
right now), I can compile my C program with my C++ compiler.  Changing the
rules from C to C++ is very undesirable.

>It would probably also make cases like the above
>easier to read, as they would have to be broken up:

I agree, but this should not be a design goal of the language (this does
not give me more expressive power, so why put it in?).

A few things.  First, a syntax would have to be devised to allow the
overloading of the user-defined operators for the builtin types, such as
char, int, etc.

Secondly, what do you do about defining the order of precedence and
associativity of these user-defined operators?  If you make them all the
same, then what do you really gain by having infix instead of prefix; you
still need parens all over the place.  And where in the chart would you put
it?

Thirdly, there might be some problem in differentiating (at compile time)
which operators return lvalues and which do not.

Fourthly (is that a word? :-)), I would also like to be able to define my
own unary operators (both prefix and postfix).  Otherwise the same question
which started this discussion comes up :-).


I am not saying that it is impossible to allow user-definable operators in
C++, only that it may not be desirable and that if it is desirable it needs
a lot more planning than we have done so far.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

daniels@teklds.TEK.COM (Scott Daniels) (04/28/88)

In article <1206@its63b.ed.ac.uk> db@itspna.ed.ac.uk (Dave Berry) writes:
>...I'm not proposing user defined operators should be added to C++; I'm just
>attempting to show that it could be done and to point out some of the problems
>that would need to be resolved.  Please follow up if I've missed anything.

Most of the explained non-problems with operators have to do with lexical 
analysis, solvable easily if by nothing else than whitespace rules.  The 
other problem with operators is determining their precedence.  This is 
a substantial problem to the code reader as well as the compiler.
How does the following associate?:

	a + b operator c * d;

If the answer is not obvious to the code reader as well as the translator,
you have a nightmare.  One possible solution is to take advantage of an
observation about precedence in grammars:
	The vast majority of expressions do not require precedence to
	resolve them.
Therefore, you could restrict user operators to use in contexts that provide
unambiguous groupings.  This would make the above illegal, requiring:
	a + (b operator c) * d;
   or   (a + b) operator (c * d);
   or ...

However, you might not be too happy with:
	a = b op c; // illegal
	a = (b op c); // legal
Since '=' is itself an operator.

Perhaps all user-operators have a fixed precedence?  All in all, I think 
the can of worms is larger than the benefit.
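
[A minimal sketch of why the precedence question matters: a precedence-climbing
grouper driven by a table, so the same token string groups differently
depending on the level assigned to the user operator. `OpInfo`, `OpTable`,
`Grouper` and `group` are hypothetical names, tokens are assumed to be
separated by whitespace, and operands are assumed to be single tokens.]

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct OpInfo {
    int precedence;    // larger values bind tighter
    bool right_assoc;  // true for right-associative operators
};

using OpTable = std::map<std::string, OpInfo>;

class Grouper {
public:
    Grouper(const std::string& expr, const OpTable& ops) : ops_(ops) {
        std::istringstream in(expr);
        for (std::string tok; in >> tok;) toks_.push_back(tok);
    }
    std::string parse() { return parse_expr(0); }

private:
    // Precedence climbing: fold in operators whose precedence is >= min_prec.
    std::string parse_expr(int min_prec) {
        std::string lhs = next();  // operands are single tokens in this sketch
        while (pos_ < toks_.size()) {
            auto it = ops_.find(toks_[pos_]);
            if (it == ops_.end() || it->second.precedence < min_prec) break;
            std::string op = next();
            int next_min = it->second.right_assoc ? it->second.precedence
                                                  : it->second.precedence + 1;
            std::string rhs = parse_expr(next_min);
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }
    // Returns "?" if the input ends where an operand was expected.
    std::string next() { return pos_ < toks_.size() ? toks_[pos_++] : std::string("?"); }

    OpTable ops_;
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// Return the expression with every binary operation fully parenthesized.
std::string group(const std::string& expr, const OpTable& ops) {
    return Grouper(expr, ops).parse();
}
```

[With "+" at level 1 and "*" at level 2, giving "op" level 0 groups
`a + b op c * d` as ((a + b) op (c * d)); giving it level 3 groups it as
(a + ((b op c) * d)). The reader has to know the table to tell which.]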

-Scott Daniels  (daniels@teklds.UUCP)

djones@megatest.UUCP (Dave Jones) (04/28/88)

in article <4549@ihlpf.ATT.COM>, nevin1@ihlpf.ATT.COM (00704a-Liber) says:
> 

...

> There is very little that makes C programs choke in the C++ translator.  

  In this case, "very little" is quite a bit.

  You didn't mention that C++ treats the name-space differently: There
  is no separate lookup table for "struct this" and "struct that".  That's
  another "little thing" that makes C++ not a superset of C.  And as
  you mentioned in a part that I edited out, ANSI C is adding new keywords,
  so C and C++ will diverge even further (if you will allow that ANSI C is
  C, and not a new language per se, an arguable point).
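
  [A small illustration of the name-space difference, assuming nothing beyond
  standard C++. The function name `size_seen_inside` is hypothetical. In C,
  struct tags live in their own name space, so the `sizeof` below would still
  refer to the global array; in C++ the local struct name hides it.]

```cpp
#include <cstddef>

int x[99];  // a global array named x

// Reports what sizeof(x) means in a scope that also declares "struct x".
std::size_t size_seen_inside() {
    struct x { int a; };
    return sizeof(x);  // C++: size of the struct; C: size of the global array
}
```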

> If you fix the header files up (put in function prototypes, which will be
> required by ANSI C anyway) ...

  Will they?  I don't know much about ANSI C, but I certainly hope not!
  I hope that all old C programs will compile just fine under ANSI C.
  Is that not the intention of the committee?

  Except for old files which use the new reserved words, of course.  Those
  programs will identify themselves (syntax error), and will be easy 
  to fix -- just change the name.  They will be quite rare.  I can't
  remember ever having named a variable "volatile". 

  (This makes a good case for not reserving key-words in a language.
  But there are good arguments on the other side too. )

  I am under the impression that the old function declarations will
  work just fine.  A function prototype for a function requiring no
  parameters does not look like an old style function declaration.
  Instead it looks like this:

  int foo(void);

  The different form of function prototypes will make C++ and ANSI C
  even "more incompatible" than C++ and old C, not less so.

  Since you were discussing changes to C++, here's my preference:
  Do function prototypes the ANSI C way, and do the name-spaces the
  old C way.  But there may already be too large a body of C++ code out
  there to do that.  And then there's the books in print.
  Still, for my own purposes, that's what I would like to see.  It 
  would be great if C++ could be made a proper superset of ANSI C.
 
> ...

		Dave J.

shankar@hpclscu.HP.COM (Shankar Unni) (04/28/88)

>	Expression	C parse				(C++)++ parse 
>
>	a+++++b   	"a" "++" "+" "++" "b"		"a" "+++++" "b"
>

BZZZZZZZZ! There has already been a prolonged discussion about this
example. C does *not* parse it like this. It parses it as

    a "++" "++" "+" b

which is syntactically incorrect. Therefore, your example will choke any
decent C compiler. Remember, "longest sequence of chars".
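
[A minimal sketch of the "longest sequence of chars" rule with a fixed operator
set. The function name `c_tokenize` is hypothetical and only a handful of C
operators are listed; it is an illustration, not a C lexer.]

```cpp
#include <cctype>
#include <string>
#include <vector>

// Tokenize with a fixed operator set, always taking the longest operator
// spelling that matches at the current position.
std::vector<std::string> c_tokenize(const std::string& src) {
    // Illustrative subset of C operators; longer spellings are tried first.
    static const std::vector<std::string> ops = {"++", "+=", "+", "*", "="};
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char u = static_cast<unsigned char>(src[i]);
        if (std::isspace(u)) { ++i; continue; }
        if (std::isalnum(u) || src[i] == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
            continue;
        }
        bool matched = false;
        for (const std::string& op : ops) {
            if (src.compare(i, op.size(), op) == 0) {
                tokens.push_back(op);
                i += op.size();
                matched = true;
                break;
            }
        }
        // Anything not in the operator list becomes a one-character token.
        if (!matched) tokens.push_back(std::string(1, src[i++]));
    }
    return tokens;
}
```

[`c_tokenize("a+++++b")` gives "a" "++" "++" "+" "b", the sequence a parser
then rejects; the spaced form `a++ + ++b` gives "a" "++" "+" "++" "b".]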

--scu

jima@hplsla.HP.COM ( Jim Adcock) (04/28/88)

From the little bit I've messed around with writing "C" language compilers,
I'd guess the restrictions put on overloading operators [IE you're not 
allowed to define new operators, just to overload existing ones ] 
were chosen to allow C++ to remain compatible with the traditional "C"
compiler approach -- with fixed definitions of what it is we need to lex and
parse [other than the traditional "C" hack of passing typedef info back to the
"lex" part of the compiler]

As your example points out, allowing the user to define his/her own operator
symbols greatly changes how we must interpret a string of non-alphanumerics
in the input file.  Which makes it potentially very difficult for the
reader of that input file to figure out what was meant.

Plus you'd have to give the user means to specify the new operator's precedence
and binding relationships......

Thus, the design of the compilers, the tools used to design the compilers,
and the way C++ users go about trying to read and interpret C++ program
sources would have to change considerably to be able to handle defining new
operators.

I believe the restrictive C++ approach to operator overloading is a good 
practical choice.  To give more flexibility in this area would cause C++
to diverge too greatly from C.

jima@hplsla.HP.COM ( Jim Adcock) (04/30/88)

| Most of the explained non-problems with operators have to do with lexical 
| analysis, solvable easily if by nothing else than whitespace rules.   

Where "whitespace rules" means that if users were allowed to define their
own operators, then you'd be forced to start separating operators with
whitespace, the way you presently have to separate "identifiers" with
whitespace.

I can't imagine writing C[++] code where operators "always" have to be
separated using whitespace! What a pain!  Just try going over your
C[++] programs, separating adjacent operators with whitespace, and you'll
realize why we don't want this "feature" in C++!

crowl@cs.rochester.edu (Lawrence Crowl) (05/10/88)

In article <6590049@hplsla.HP.COM> jima@hplsla.HP.COM (Jim Adcock) writes:
>I can't imagine writing C[++] code where operators "always" have to be
>separated using whitespace! What a pain!  Just try going over your C[++]
>programs, separating adjacent operators with whitespace, and you'll realize
>why we don't want this "feature" in C++!  

This is not a problem.  Consider three classes of characters, those for
identifiers (also keywords and literals), those for grouping (eg parentheses),
and those for operators.  If you do not mix classes, then you need no spaces
between tokens in different classes.  This covers most tokens in the stream. 
The major cases where this does not happen are variable declarations (which
currently require the space) and unary operators adjacent to other operators.
If you are not putting a space in your code in the latter case, your code is
probably confusing anyway.  Consider: 

    a+=b+++*c;  versus  a+=b++ + *c;

The latter provides for user-defined operators with minimal additional typing
burden.  
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

kurt@color.ctt.bellcore.com (Kurt Gluck(PICS)) (05/11/88)

How about going with SNOBOL's method?  Predefine a small number of
additional unused operator symbols that can be used.

In SNOBOL's case the operator symbols are:

BINARY OPERATORS
Graphic	Definition		    Associativity	    Precedence
=======	==========================  =============	    ==========
    ~	UNUSED			    right		    12
    ?	UNUSED			    left		    12
    $	immediate value assignment  left		    11
    .	conditional value assignment left		    11
    !	exponentiation		    right		    10
    **	exponentiation		    right		    10
    %	UNUSED			    left		    9
    *	multiplication		    left		    8
    /	division		    left		    7
    #	UNUSED			    left		    6
    +	addition		    left		    5
    -	subtraction		    left		    5
    @	UNUSED			    left		    4
 blank	concatenation		    left		    3
    |	alternation		    left		    2
    &	UNUSED			    left		    1

UNARY OPERATORS
graphic	definition
=======	==========
    ~	negation
    ?	interrogation
    $	indirect reference
    .	name
    !	UNUSED
    %	UNUSED
    *	unevaluated expression
    /	UNUSED
    #	UNUSED
    +	positive
    -	negative
    @	cursor position
    |	UNUSED
    &	keyword
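
[A minimal sketch of how such predefined slots might look as data: the binary
symbols marked UNUSED above, each with its listed precedence and associativity,
to which a user function can be bound. `Slot`, `make_free_slots` and
`bind_operator` are hypothetical names, and restricting operands to `int` is an
assumption made to keep the example short.]

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

struct Slot {
    int precedence;
    bool right_assoc;
    std::function<int(int, int)> impl;  // empty until the user binds something
};

// The binary operators marked UNUSED in the table above, with their listed
// precedence and associativity.
std::map<char, Slot> make_free_slots() {
    return {
        {'~', {12, true, nullptr}},
        {'?', {12, false, nullptr}},
        {'%', {9, false, nullptr}},
        {'#', {6, false, nullptr}},
        {'@', {4, false, nullptr}},
        {'&', {1, false, nullptr}},
    };
}

// Bind a user function to one of the free symbols; any other symbol is rejected.
void bind_operator(std::map<char, Slot>& slots, char symbol,
                   std::function<int(int, int)> fn) {
    auto it = slots.find(symbol);
    if (it == slots.end()) {
        throw std::invalid_argument("not a user-definable operator");
    }
    it->second.impl = std::move(fn);
}
```

[Since precedence and associativity are fixed in advance, the parser never
changes; only the meaning attached to a symbol does.]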