mnc@m10ux.UUCP (Michael Condict) (04/20/87)
These arguments about whether C is "allowed to disregard parentheses" seem to be degenerating into confusion based on lack of precise terminology and/or understanding of the concepts involved. In particular, some people seem to be needlessly alarmed about the possibility of the compiler misinterpretating their parenthesized expressions. At the risk of appearing pedantic, I'd like to inject a couple of definitions and a little careful exposition into the conversation, in the hope of unmuddying the water. First the definitions (please bear with me -- I get to the point between and following the definitions): OPERATOR PRECEDENCE - a ranking of operators (e.g. +,-,*,/) in a language from "most tightly binding" to "least tightly binding". The purpose is to define the meaning of expressions of the form "v1 op1 v2 op2 v3", where the vn's are operands (such as numbers) and the op's are operators. In the example, assuming op1 has higher precedence (binds more tightly) than op2, the meaning would be the same as if "(v1 op1 v2) op2 v3" were written. Notice the mnemonicity of "more tightly binding". Op1 grabs onto its arguments more tightly than op2, hence op1 gets to use v2 as its argument and op2 does not. That is, op1 must be applied first to v1 and v2, then the result supplied as left operand to op2. Conversely, if the precedence were reversed, the interpretation would be "v1 op1 (v2 op2 v3)". This leaves open the question of what to do when op1 and op2 have the same precedence (see ASSOCIATIVITY, below). Note that parentheses can always be used to explicitly alter the default precedence rules. For instance, in "(v1 op1 v2) op2 v3", op1 must be applied to v1 and v2, then the result used as the left operand of op2 (actually, I mean that the result must be as if the evaluation happened in this order, al- though any evaluation mechanism guaranteed to produce the right answer is okay). In C, as in mathematics, such parentheses cannot in general be ignored, because that would lead to the wrong answer. For instance, do not worry that any C compiler will treat "(5+6)*7", which is 77, as "5+6*7", which is 47. OPERATOR ASSOCIATIVITY - a set of operators of the same precedence (e.g., +,-) is defined to be either left or right associative. If left associative, an expression of the form "v1 op1 v2 op2 v3 ...", where all the op's are the same precedence, is to be treated the same as "((v1 op1 v2) op2 v3) ...". That is, each op is treated as having higher precedence than an op of same precedence (and at the same level in the expression) but farther to the right. In mathematics, and most computer languages, virtually all operators are left associative (with the possible exception of the power operator (x ** y ** z). Finally, we say that an operator is "associative" if it does not matter (to the final result) whether multiple occurrences of that operator in an expression are treated as left associative or right associative. For example, "+" is associative (in mathematics, and computers that don't overflow!), because (x+y)+z = x+(y+z), but "-" is not. Again, parentheses can be used to bypass the default associativity rules where needed. And again, C does not in general allow these parentheses to be ignored. No C compiler will treat "5-(6-3)" the same as "5-6-3" (but see the exception below). EVALUATION ORDER - The order in which operations are applied to values, during the computation of the value of an expression. In general, the set of possible evaluation orders is constrained by the precedence, associa- tivity of the operators in the expression and by the explicit use of parentheses. (Hence, those who say it is completely independent of parsing/precedence are just plain wrong -- it may be independent of the order in which a parse tree is constructed, but is not independent of the shape of the parse tree.) Often, there is more than one correct evaluation order. For example, to compute "2 * 3 + 5 * 7" one can evaluate "2 * 3" before or after "5 * 7", but both must be evaluated before "+". The cases where this definition gets sticky are when, due to the occurrence of associative operators, an evaluation order which would not ordinarily be allowed, is guaranteed to produce a correct result, such as evaluating "(4+5)+6" by doing "5+6" first then "4+11". Here is the crux of the confusion and disagreement we have seen in recent articles. Is the evaluation just de- scribed a legal evaluation order for "(4+5)+6"? I'm inclined to say, no, it is a legal evaluation order for the different, but mathematically equivalent expression "4+(5+6)", but this is semantics for semantics' sake. The evaluation of an expression is a private matter between consenting compilers and computers and is not any business of the programmer, given that the correct result is produced. Now finally, the main point: The source of this whole unpleasant business is that C is defined as though its "+" and "*" operators were associative (int or float, it doesn't matter to the discussion). That is it allows the evaluation of "x+y+z" (or even"(x+y)+z") as though they were written "x+(y+z)". This does not necessarily produce the same answer in the presence of overflow and roundoff errors, hence the C operators are NOT associative, unlike the corresponding ideal mathematical operators. Is this a bug in the C definition? It depends on your point of view (or your programming needs), I guess. To summarize, C does not allow compilers to "disregard parentheses" in general. It only allows the reparenthesization of the expression in a manner that would be mathematically equivalent if the operators were "ideal", i.e., perfectly accurate and never over- or underflowing. While this may be an unfortunate violation of some persons' notion of mathematical correctness, it allows a class of optimizations which the majority of C programmers apparently want. -- Michael Condict {ihnp4|vax135|cuae2}!m10ux!mnc AT&T Bell Labs (201)582-5911 MH 3B-416 Murray Hill, NJ
chris@mimsy.UUCP (Chris Torek) (04/21/87)
In article <200@m10ux.UUCP> mnc@m10ux.UUCP (Michael Condict) writes some nice definitions that might possibly mislead someone, so I am adding this note. >Now finally, the main point: The source of this whole unpleasant business >is that C is defined as though its "+" and "*" operators were associative [and commutative] >(int or float, it doesn't matter to the discussion). >To summarize, C does not allow compilers to "disregard parentheses" >in general. It only allows the reparenthesization of the expression >in a manner that would be mathematically equivalent if the operators >were "ideal", i.e., perfectly accurate and never over- or underflowing. The summary is correct. I think, though, that a few more examples are in order, just to show what else can be done by C compilers: C also believes in distribution: (a * b) + (a * c) can become a * (b + c) (or vice versa!) or even (c + b) * a Multipliers and addends can be propagated madly: a + (((((i * 5) + j) * 6) + k) * /* sizeof (short): */ 2) is a likely candidate, having been constructed by the declaration and reference short a[4][5][6]; a[i][j][k] which might be transformed as a + (i * 60) + (j * 12) + (k * 2) = a + (i * 64) - (i * 4) + (j * 16) - (j * 4) + (k * 2) = a + (i * 64) + (j * 16) - (i + j) * 4 + (k * 2) = a + ((i * 4) + j) * 16 - (i + j) * 4 + (k * 2) which can be compiled using only shift and add/sub instructions (in case you are comping for, e.g., a processor with no integer multiply). (Merging the shifts, the last step, is useful if the processor does not have a barrel shifter.) According to K&R p. 185, Expressions involving a commutative and associative operator ( * , + , & , | , ^ ) may be rearranged arbitrarily, even in the presence of parentheses; to force a particular order of evaluation an explicit temporary must be used. I have no idea how the corresponding idea is phrased in the dpANS, but at least they provided *some* sort of escape (the dreaded unary plus). -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690) UUCP: seismo!mimsy!chris ARPA/CSNet: chris@mimsy.umd.edu
mnc@m10ux.uucp (04/21/87)
These arguments about whether C is "allowed to disregard parentheses" seem to be degenerating into confusion based on lack of precise terminology and/or understanding of the concepts involved. In particular, some people seem to be needlessly alarmed about the possibility of the compiler misinterpretating their parenthesized expressions. At the risk of appearing pedantic, I'd like to inject a couple of definitions and a little careful exposition into the conversation, in the hope of unmuddying the water. First the definitions (please bear with me -- I get to the point between and following the definitions): OPERATOR PRECEDENCE - a ranking of operators (e.g. +,-,*,/) in a language from "most tightly binding" to "least tightly binding". The purpose is to define the meaning of expressions of the form "v1 op1 v2 op2 v3", where the vn's are operands (such as numbers) and the op's are operators. In the example, assuming op1 has higher precedence (binds more tightly) than op2, the meaning would be the same as if "(v1 op1 v2) op2 v3" were written. Notice the mnemonicity of "more tightly binding". Op1 grabs onto its arguments more tightly than op2, hence op1 gets to use v2 as its argument and op2 does not. That is, op1 must be applied first to v1 and v2, then the result supplied as left operand to op2. Conversely, if the precedence were reversed, the interpretation would be "v1 op1 (v2 op2 v3)". This leaves open the question of what to do when op1 and op2 have the same precedence (see ASSOCIATIVITY, below). Note that parentheses can always be used to explicitly alter the default precedence rules. For instance, in "(v1 op1 v2) op2 v3", op1 must be applied to v1 and v2, then the result used as the left operand of op2 (actually, I mean that the result must be as if the evaluation happened in this order, al- though any evaluation mechanism guaranteed to produce the right answer is okay). In C, as in mathematics, such parentheses cannot in general be ignored, because that would lead to the wrong answer. For instance, do not worry that any C compiler will treat "(5+6)*7", which is 77, as "5+6*7", which is 47. OPERATOR ASSOCIATIVITY - a set of operators of the same precedence (e.g., +,-) is defined to be either left or right associative. If left associative, an expression of the form "v1 op1 v2 op2 v3 ...", where all the op's are the same precedence, is to be treated the same as "((v1 op1 v2) op2 v3) ...". That is, each op is treated as having higher precedence than an op of same precedence (and at the same level in the expression) but farther to the right. In mathematics, and most computer languages, virtually all operators are left associative (with the possible exception of the power operator (x ** y ** z). Finally, we say that an operator is "associative" if it does not matter (to the final result) whether multiple occurrences of that operator in an expression are treated as left associative or right associative. For example, "+" is associative (in mathematics, and computers that don't overflow!), because (x+y)+z = x+(y+z), but "-" is not. Again, parentheses can be used to bypass the default associativity rules where needed. And again, C does not in general allow these parentheses to be ignored. No C compiler will treat "5-(6-3)" the same as "5-6-3" (but see the exception below). EVALUATION ORDER - The order in which operations are applied to values, during the computation of the value of an expression. In general, the set of possible evaluation orders is constrained by the precedence, associa- tivity of the operators in the expression and by the explicit use of parentheses. (Hence, those who say it is completely independent of parsing/precedence are just plain wrong -- it may be independent of the order in which a parse tree is constructed, but is not independent of the shape of the parse tree.) Often, there is more than one correct evaluation order. For example, to compute "2 * 3 + 5 * 7" one can evaluate "2 * 3" before or after "5 * 7", but both must be evaluated before "+". The cases where this definition gets sticky are when, due to the occurrence of associative operators, an evaluation order which would not ordinarily be allowed, is guaranteed to produce a correct result, such as evaluating "(4+5)+6" by doing "5+6" first then "4+11". Here is the crux of the confusion and disagreement we have seen in recent articles. Is the evaluation just de- scribed a legal evaluation order for "(4+5)+6"? I'm inclined to say, no, it is a legal evaluation order for the different, but mathematically equivalent expression "4+(5+6)", but this is semantics for semantics' sake. The evaluation of an expression is a private matter between consenting compilers and computers and is not any business of the programmer, given that the correct result is produced. Now finally, the main point: The source of this whole unpleasant business is that C is defined as though its "+" and "*" operators were associative (int or float, it doesn't matter to the discussion). That is it allows the evaluation of "x+y+z" (or even"(x+y)+z") as though they were written "x+(y+z)". This does not necessarily produce the same answer in the presence of overflow and roundoff errors, hence the C operators are NOT associative, unlike the corresponding ideal mathematical operators. Is this a bug in the C definition? It depends on your point of view (or your programming needs), I guess. To summarize, C does not allow compilers to "disregard parentheses" in general. It only allows the reparenthesization of the expression in a manner that would be mathematically equivalent if the operators were "ideal", i.e., perfectly accurate and never over- or underflowing. While this may be an unfortunate violation of some persons' notion of mathematical correctness, it allows a class of optimizations which the majority of C programmers apparently want. -- Michael Condict {ihnp4vax135cuae2}!m10ux!mnc AT&T Bell Labs (201)582-5911 MH 3B-416 Murray Hill, NJ
hansen@mips.UUCP (Craig Hansen) (04/21/87)
In article <200@m10ux.UUCP>, mnc@m10ux.UUCP (Michael Condict) writes: [an excellent summary that distinguishes precedence, associativity, and order of evaluation] > Now finally, the main point: The source of this whole unpleasant business > is that C is defined as though its "+" and "*" operators were associative (int > or float, it doesn't matter to the discussion). That is it allows the > evaluation of "x+y+z" (or even"(x+y)+z") as though they were written > "x+(y+z)". This does not necessarily produce the same answer in the presence > of overflow and roundoff errors, hence the C operators are NOT associative, > unlike the corresponding ideal mathematical operators. Is this a bug in the > C definition? It depends on your point of view (or your programming needs), > I guess. This _is_ a bug in the C definition, particularly with respect to "float." Clearly, floating-point operations aren't associative, with particular problems in floating-point addition and subtraction. Since C doesn't check for integer overflow, most common reassociation involving integers are OK. Life would be much easier if compilers would stick to optimizations that are safe. ...aren't compilers here on this earth simply to make our lives a little easier, after all? All I'm proposing is a upward-compatible extension to ANSI C: you can conform to ANSI C and not perform these unsafe reassociations. If ANSI C prohibits these unsafe optimizations, it will be a better standard, but you can choose to build a better, safer compiler yourself. -- Craig Hansen Manager, Architecture Development MIPS Computer Systems, Inc. ...decwrl!mips!hansen
herndon@umn-cs.UUCP (Robert Herndon) (04/22/87)
ARG!! I feel I must applaud the efforts of X3J11 for their heroic efforts to supply a solution to the problem of expression rearrangement in C. I doubly applaud them if they've been suffering through all the nonsense posted so far in this group. First: In programming in C, I almost invariably write expressions in what I think is THE MOST EASILY UNDERSTOOD manner. Thus I may write #define PI 3.14159265 ... x = (PI / 4) * (hairy expression, perhaps with a multiplier of 2 or 8); and ignorantly expect the compiler to rearrange the expression at its whim. And happily so. Often-times, I also use C as a "machine independent assembly code". When I do so, I often define macros for performing various operations, e.g., #define ABS(x) ((x) < 0 ? -(x) : (x)) and again, hope the C compiler doesn't literally evaluate x repeatedly if it's a macro or expression that can be rearranged. Those of us who have a hammer called C available and insist on looking at numerically critical nails have generally had to suffer with temporaries. Those who insist that C should honor parentheses "because the programmer knows what he means" obviously don't. This kind of thinking brings us true behemoths like {pick your favorite humongous language} that nobody uses. X3J11 has tried to satisfy these people without destroying the reasonable rearrangements C compilers often make. As a solution: 1) Unary + is a kluge, but a reasonable, workable one that may help some people. 2) If a solution is REQUIRED, then it or one of the other solutions proposed should be used. 3) If a solution is NOT required, then conforming compilers should be allowed to either: a) cause the compiler to issue a warning that the expression may be rearranged and proceed; or b) report a syntax error on encountering the syntax; or c) cause the compiler to honor the parentheses AT THE COMPILER'S DISCRETION. C has a 'minimalist' tradition. Unfortunately, with the standardization efforts, creeping featurism is setting in. This is to be expected to some extent, but obviates many of the advantages of C -- small, fast, portable... My opinion is that neither unary + NOR an explicit primitive to force associativity should be included in the C standard. Keep our compilers simple! They're ALREADY complicated enough. While I feel that unary + or an equivalent syntax may help me occasionally, I am not thoroughly convinced that the added complexity is worth it. As an author of occasional translators, I'm sure it's not worth it. I therefore feel that: 1. UNARY PLUS SHOULD BE THROWN OUT. 2. NO EQUIVALENT SYNTAX SHOULD BE PROVIDED. -- Robert Herndon Dept. of Computer Science, ...!ihnp4!umn-cs!herndon Univ. of Minnesota, herndon@umn-cs.ARPA 136 Lind Hall, 207 Church St. SE herndon.umn-cs@csnet-relay.ARPA Minneapolis, MN 55455
mckeeman@wanginst.EDU (William McKeeman) (04/22/87)
In article <6378@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >According to K&R p. 185, > > Expressions involving a commutative and associative operator > ( * , + , & , | , ^ ) may be rearranged arbitrarily, even in > the presence of parentheses; to force a particular order of > evaluation an explicit temporary must be used. > >I have no idea how the corresponding idea is phrased in the dpANS, >but at least they provided *some* sort of escape (the dreaded unary >plus). >-- I know some folks are getting tired of all of this. Maybe some facts would help cut down on the various opinions in reaction to guesses. [begin of quote of x3j11 document July 9, 1986, available from Computer and Business Equipment Manufacturers Association, 311 First Street, N.W., Suite 500, Washington, DC 20001-2178. (202)737-8888] 3.3 EXPRESSIONS An _expression_ is a sequence of operators and operands that specifies how to compute a value, or how to generate side effects, or both. Except as indicated by the syntax [see footnote 15] or otherwise specified later (for the function-call operator (), the unary plus operator, &&, ||, ?:, and comma operators), the order of evaluation of an expression is unspecified. The implementation may evaluate subexpressions in any order, even if the subexpressions produce side effects. The order in which side effects take place is unspecified, except that the evaluation of the operands of an operator that involves a sequence point shall not be interleaved with other evaluations. An expression involving more than one occurrence of the same commutative and associate binary operator (*, +, &, ^, |) may be regrouped arbitrarily, even in the presence of parentheses, provided the types of the operands or of the results are not changed by this regrouping. To force a particular grouping of operations, either the value of the expression to be grouped may be explicitly assigned to an object, or grouping parentheses may be preceded by a unary plus operator. ... (skipping) If an expection occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not representable), the behavior is undefined. 3.6 STATEMENTS ... (skipping) A _full_expression_ is an expression that is not part of another expression. Each of the following is a full expression: an initializer; the expression in an expression statement; the controlling expression of a selection statement (if or switch); the controlling expression of an iteration statement (while, do, or for); the expression in a return statement. The end of a full expression is a sequence point. [end of quote] Footnote 15 tells how to imply precedence from the syntax and order of presentation in the draft standard. [begin of quote of Rationale, dated 7 July 1986, section 2.1.2.3, Program execution.] ... A question sometimes asked regarding optimization is, "Is the rearrangement still conforming if the pre-computed expression might raise a signal (such as division by zero)?" Fortunately for optimizers, the answer is "Yes", because any evaluation that raises a computational signal has fallen into an _undefined_behavior_ (refer to section 3.3) for which any action is allowable. [end of quote] [begin comment by McKeeman] The only constraint on compiler rearrangements is that the TYPE of the operands and results not thereby be changed. (emphasis mine) [end of comment] -- W. M. McKeeman mckeeman@WangInst Wang Institute decvax!wanginst!mckeeman Tyngsboro MA 01879
jjw@celerity.UUCP (Jim ) (04/24/87)
I hesitate to contradict what a standard says about itself but... In article <1038@wanginst.EDU> mckeeman@wanginst.UUCP (William McKeeman) quotes 3j11 document July 9, 1986: >Footnote 15 tells how to imply precedence from the syntax and order of >presentation in the draft standard. > >[begin of quote of Rationale, dated 7 July 1986, section 2.1.2.3, Program >execution.] > > ... A question sometimes asked regarding optimization is, "Is the >rearrangement still conforming if the pre-computed expression might raise a >signal (such as division by zero)?" Fortunately for optimizers, the answer >is "Yes", because any evaluation that raises a computational signal has >fallen into an _undefined_behavior_ (refer to section 3.3) for which any >action is allowable. > >[end of quote] > >[begin comment by McKeeman] > >The only constraint on compiler rearrangements is that the TYPE of the >operands and results not thereby be changed. (emphasis mine) > >[end of comment] The operative paragraph is: > An expression involving more than one occurrence of the same commutative >and associate binary operator (*, +, &, ^, |) may be regrouped arbitrarily, >even in the presence of parentheses, provided the types of the operands or of >the results are not changed by this regrouping. In my opinion, an exception resulting in an "undefined behavior" is a different result from an un-regrouped expression that does not generate an exception and therefore provides a "defined behavior" -- J. J. Whelan
levy@ttrdc.UUCP (04/26/87)
In article <1504@umn-cs.UUCP>, herndon@umn-cs.UUCP (Robert Herndon) writes: < Often-times, I also use C as a "machine independent assembly < code". When I do so, I often define macros for performing < various operations, e.g., < #define ABS(x) ((x) < 0 ? -(x) : (x)) < and again, hope the C compiler doesn't literally evaluate x < repeatedly if it's a macro or expression that can be rearranged. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Just being able to be rearranged isn't enough! It can be perfectly rearrange- able and still have side effects. Note that if MACRO_1 is called multiple times by MACRO_2, and MACRO_1 contains something with side effects like x++ or getchar() then you had jolly well realize that the side-effects in MACRO_1 WILL get evaluated as many times as it was called, in an undefined order. Even if there are function calls in MACRO_1 which do not change the state of the computation, the compiler doesn't know to optimize the multiple calls to that function into one call, which is wasteful of CPU time. -- |------------dan levy------------| Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa, | an engihacker @ | vax135}!ttrdc!ttrda!levy | at&t computer systems division | Disclaimer: try datclaimer. |--------skokie, illinois--------|
ado@elsie.UUCP (Arthur David Olson) (04/26/87)
> Footnote 15 tells how to imply precedence from the syntax and order of > presentation in the draft standard. Remember, though, that neither the footnotes nor the Rationale are part of the Standard. -- UUCP: ..seismo!elsie!ado ARPA: elsie!ado@seismo.CSS.GOV Elsie and Ado are trademarks of Borden, Inc. and Ampex.
mouse@mcgill-vision.UUCP (05/07/87)
In article <1038@wanginst.EDU>, mckeeman@wanginst.EDU (William McKeeman) writes: > [begin of quote of x3j11 document July 9, 1986, available from > Computer and Business Equipment Manufacturers Association, 311 First > Street, N.W., Suite 500, Washington, DC 20001-2178. (202)737-8888] > 3.3 EXPRESSIONS [...] > An expression [...] may be regrouped [...], provided the types of the > operands or of the results are not changed by this regrouping. [...] > [end of quote] > [begin comment by McKeeman] > The only constraint on compiler rearrangements is that the TYPE of > the operands and results not thereby be changed. (emphasis mine) Does this mean that after extern char _ctypes_[]; #define _ctypes (_ctypes_+1) /* so _ctypes[EOF] is ok */ the compiler is not permitted to change _ctypes['x'] into _ctypes_['y'] because (_ctypes_+1)['x'] <-> *((_ctypes_+1)+'x') cannot be transformed into *(_ctypes_+(1+'x')) <-> _ctypes_[1+'x']? (Yes, I realize that in the example above the entire subscripting operation would (hopefully) computed at compile time. It was intended to illustrate a point.) der Mouse (mouse@mcgill-vision.uucp)