chris@mimsy.UUCP (01/24/87)
>>>	x = (a | b | c);
>>>if the variable `a' contains zero, the compiler must still OR the
>>>contents of `b' and `c' to determine the result.

Almost true.

>>>These are bitwise logical operators.  Short-circuiting these makes
>>>no more sense than short-circuiting a sequence of multiplies as
>>>soon as one of the operands evaluates to `1'.
>>>Mike McNally  Digital Lynx Inc.

Not so!

>In article <102600001@datacube> stephen@datacube.UUCP writes:
>>This posting indicates a misunderstanding of how short-circuit
>>evaluation works.  In the case of the '|' expression above, the
>>decision not to evaluate would occur when a or b is all ones, NOT
>>when a or b is zero.

Correct.

In article <34@umich.UUCP> jtr485@umich.UUCP (Johnathan Tainter) writes:
>You mean if the '|' had been a '||', of course.

No, he means in the case of the `|' expression above.

>And actually, when a or b is NONZERO not ALL ONES.

Yes for `||', no for `|'.  C guarantees short-circuit left-to-right
evaluation for `||' and `&&'.  It makes no guarantees for `|' and `&'.
Here is a table enumerating all the possibilities.

[For these, read `all zeroes bit pattern' for `OFF', and `all ones
bit pattern' for `ON'.  This is a text compression device.]

Given:	Possible methods of evaluation:
------	------------------------------
&&	Left side evaluated.  If zero, result is zero, and
	evaluation stops.  If nonzero, right side evaluated;
	result is 0 if zero, 1 if nonzero.

||	Left side evaluated.  If nonzero, result is 1, and
	evaluation stops.  If zero, right side evaluated;
	result is 1 if nonzero, 0 if zero.

&	1. Left side evaluated.  If OFF, evaluation stops;
	   result is OFF.  If not, right side evaluated,
	   and both results ANDed.
	2. Left side evaluated.  Right side evaluated.
	   Results ANDed.
	3. Right side evaluated.  If OFF, evaluation stops;
	   result is OFF.  If not, left side evaluated, and
	   both results ANDed.

|	1. Left side evaluated.  If ON, evaluation stops;
	   result is ON.  If not, right side evaluated,
	   and both results ORed.
	2. Left side evaluated.  Right side evaluated.
	   Results ORed.
	3. Right side evaluated.  If ON, evaluation stops;
	   result is ON.  If not, left side evaluated, and
	   both results ORed.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu
tim@amdcad.UUCP (01/25/87)
In article <5178@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
(OFF is all zeros, ON is all ones)
+---------------------------------------
| &	1. Left side evaluated.  If OFF, evaluation stops;
|	   result is OFF.  If not, right side evaluated,
|	   and both results ANDed.
|	2. Left side evaluated.  Right side evaluated.
|	   Results ANDed.
|	3. Right side evaluated.  If OFF, evaluation stops;
|	   result is OFF.  If not, left side evaluated, and
|	   both results ANDed.
|
| |	1. Left side evaluated.  If ON, evaluation stops;
|	   result is ON.  If not, right side evaluated,
|	   and both results ORed.
|	2. Left side evaluated.  Right side evaluated.
|	   Results ORed.
|	3. Right side evaluated.  If ON, evaluation stops;
|	   result is ON.  If not, left side evaluated, and
|	   both results ORed.
+---------------------------------------

What about if either the left side expression or the right side
expression contained a side-effect (or a procedure call, which also
may have a side-effect)?  These cannot be short-circuited when
bit-wise operators are used.

-- 
	Tim Olson
	Advanced Micro Devices
	"We plan products, not lunches"
tps@sdchem.UUCP (01/25/87)
In article <5178@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>...
> [For these, read `all zeroes bit pattern' for `OFF', and `all ones
> bit pattern' for `ON'.  This is a text compression device.]
>
> Given:	Possible methods of evaluation:
> ------	------------------------------
> &	1. Left side evaluated.  If OFF, evaluation stops;
>	   result is OFF.  If not, right side evaluated,
>	   and both results ANDed.
>	2. Left side evaluated.  Right side evaluated.
>	   Results ANDed.
>	3. Right side evaluated.  If OFF, evaluation stops;
>	   result is OFF.  If not, left side evaluated, and
>	   both results ANDed.
>
> |	1. Left side evaluated.  If ON, evaluation stops;
>	   result is ON.  If not, right side evaluated,
>	   and both results ORed.
>	2. Left side evaluated.  Right side evaluated.
>	   Results ORed.
>	3. Right side evaluated.  If ON, evaluation stops;
>	   result is ON.  If not, left side evaluated, and
>	   both results ORed.
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)

I don't think so.  "&" and "|" are like "+" and "-" in that you are
not guaranteed that the left side will be evaluated before the right.
Read K&R sections 7.8-10, p. 190:

	"The [&|^] operator is associative and expressions involving
	[&|^] may be rearranged."

Otherwise, how could an optimizing compiler change

	(a | b) & (b | c)

to

	(a & c) | b

or fold constants together, as in changing

	(a | CONST1) & (a | CONST2)

to

	a | CONST3

where CONST3 is (CONST1 & CONST2)?

I agree with you that the evaluation might be short-circuited if the
result is already known.  However, I don't think this is guaranteed;
it might happen in the reverse order (right operand evaluated, left
operand skipped), and it is guaranteed not to happen if the
to-be-skipped operand has a side effect.

|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.UCSD
chris@mimsy.UUCP (01/26/87)
In article <14479@amdcad.UUCP> tim@amdcad.UUCP (Tim Olson) writes:
>What about if either the left side expression or the right side
>expression [of an otherwise short-circuitable expression] contained
>a side-effect (or a procedure call, which also may have a side-
>effect)?  These cannot be short-circuited when bit-wise operators
>are used.

	c = *p++ & *q++;	/* are both p and q always incremented? */

I can find no promise in K&R that bitwise expressions are not short
circuited even in the presence of side effects.  The ANSI draft may
have more to say.  In any case, I would advise not counting on full
evaluation.  It is even conceivable that a compiler might generate

	if ((c = *p) != 0)
		c &= *q;
	p++;
	q++;

which, if p and q point to device registers, is not the same!

In general, existing C compilers will not attempt to optimise away
the AND above if *p++ (or *q++) is zero, simply because the `C
philosophy' is that if you *meant*

	c = *p++ ? p[-1] & *q++ : 0;

you would have written that.  But is it mandated?  I cannot say.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu
chris@mimsy.UUCP (01/26/87)
In article <621@sdchema.sdchem.UUCP> tps@sdchem.UUCP (Tom Stockfisch) writes:
>"&" and "|" are like "+" and "-" in that you are not guaranteed that
>the left side will be [fully] evaluated before the right [or vice versa].
>Read K&R sections 7.8-10, p. 190:
>	"The [&|^] operator is associative and expressions involving [&|^]
>	may be rearranged."

This is a good point too.  It gets worse all the time! :-)

>I agree with you that the evaluation might be short circuited if the result
>is already known.  However, I don't think this is guaranteed,

Of course not.  Intuitive languages make few guarantees.... :-)

>it might happen in the reverse order (right operand evaluated,
>left operand skipped),

That was one of the possibilities I enumerated.

>and it is guaranteed not to happen if the to-be-skipped operand has a
>side effect.

Is it?  (This is one reason optimising compilers are so difficult to
write: no one is sure just how much they are allowed to do....)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu
m5d@bobkat.UUCP (01/28/87)
In article <5178@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>>>[ he's quoting me here ]
>>>>These are bitwise logical operators.  Short-circuiting these makes
>>>>no more sense than short-circuiting a sequence of multiplies as
>>>>soon as one of the operands evaluates to `1'.
>>>>Mike McNally  Digital Lynx Inc.
>
>Not so!

I know, I know, I'm sorry, I'm sorry!

>[ Chris then goes on to accurately describe the possible paths of
>  evaluation for |, &, ||, and &&, although he omits ^, which is OK
>  I think, because you can't say much about ^ given an intermediate
>  result ]
>
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
>UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

I realize that my first posting(s? I can't remember how many I did)
were a little confused.  I do know what's going on, and would like to
ask a related question.  If the compiler decides to generate code that
notices when part of a sequence of sibling `|' or `&' operations has
generated an inevitable result, do side effects of sub-expressions
still have to be performed?  In other words, let's look at this:

	x = f1(something) | f2(something) | f3(something) ...

If I wrote this, I would expect all the functions to be called, even
if one of them returns ~0.  Am I nuts?  Seems to me that an
optimization like that would be naughty.

As I look at that example, I realize that people might still think
I'm confused.  Try this example:

	x = f1(something) * f2(something) * f3(something) ...

Shouldn't I expect all the multiplications to be performed, even if
one function returns zero?  I don't think I've ever used a compiler
that does such optimizations (not that that really means much).  Has
anyone else?  If I actually *am* lost, please don't tell my employer.
-- 
**** **** ****	At Digital Lynx, we're almost in Garland, but not quite
**** **** ****	Mike McNally  (feeling sheepish nowadays)
Digital Lynx Inc.  Software (not hardware) Person  Dallas TX 75243
uucp: {texsun,killer,infotel}!pollux!bobkat!m5d	(214) 238-7474
jpn@teddy.UUCP (01/29/87)
Why is there so much misinformation in this group?!?!  I wish people
who don't know what they are talking about would not post here!

>I can find no promise in K&R that bitwise expressions are not short
>circuited even in the presence of side effects.

60 seconds with my copy of K&R yields this (Appendix A):

	"The & operator is associative and expressions involving &
	may be rearranged"

	"Unlike &, && guarantees left-to-right evaluation; moreover,
	the second operand is not evaluated if the first operand is
	zero"

>In any case, I would advise not counting on full evaluation (of bitwise).

True, it is not explicitly stated that & and | are not short-circuited.
It also doesn't explicitly say that + and * are not short-circuited!
However, && and || are EXPLICITLY defined as having short-circuit
behavior; no other operators are described this way, so it is safe to
assume that all other operators do NOT short-circuit.
henry@utzoo.UUCP (Henry Spencer) (01/31/87)
> x = f1(something) * f2(something) * f3(something) ...
>
> Shouldn't I expect all the multiplications to be performed, even if one
> function returns zero?
> I don't think I've ever used a compiler that does such optimizations...
> ... Has anyone else?

Yeah, some variants of the V7 pdp11 C compiler did, in cases where one
of the operands was a compile-time constant known to be 0.  (This can
arise legitimately when doing arithmetic on configuration-dependent
#defined constants, for example.)  Never bothered me, since I view
side effects inside expressions as being unjustifiable pornography
except for some very specific cases.  I did ask Dennis about it at one
point, since a strict reading of K&R suggested it was a bug.  His
reply, as I recall it, was roughly "I think it is defensible in
principle, but it caused so many complaints that newer versions of the
compiler don't do it".
-- 
Legalize			Henry Spencer @ U of Toronto Zoology
freedom!			{allegra,ihnp4,decvax,pyramid}!utzoo!henry
henry@utzoo.UUCP (Henry Spencer) (02/03/87)
> I did ask Dennis about [short-circuiting of multiplies with a 0 operand]
> ... His reply, as I recall it, was
> roughly "I think it is defensible in principle, but it caused so many
> complaints that newer versions of the compiler don't do it".

I should add that X3J11 has come down quite firmly on the pragmatic
side of this issue.  With the obvious exceptions of && and friends,
which are explicitly short-circuited, the compiler is not permitted to
leave out side effects of expressions.  (Personally, as you might have
gathered from my earlier posting, I like the purist approach to this
one, but I realize how much havoc it could cause...)
-- 
Legalize			Henry Spencer @ U of Toronto Zoology
freedom!			{allegra,ihnp4,decvax,pyramid}!utzoo!henry
jas@rtech.UUCP (02/03/87)
Apropos of whether it is legitimate to short-circuit an expression
evaluation when a short-circuited operand has side effects, Henry
Spencer writes:

> Yeah, some variants of the V7 pdp11 C compiler did, in cases where one of
> the operands was a compile-time constant known to be 0....
> Never bothered me, since I view side effects
> inside expressions as being unjustifiable pornography except for some very
> specific cases.  I did ask Dennis about it....  His reply ... was
> roughly "I think it is defensible in principle, but it caused so many
> complaints that newer versions of the compiler don't do it".

I don't know, Henry, I think you and Dennis are on thin ice here.  In
Pascal, you're right; but C is close to being a pure expression
language, in which the only way to change machine state is by a side
effect.  Three examples:

(1) An assignment is an expression; as a side effect, it stores the
    value of its right operand in the memory location represented by
    the left operand.

(2) The comma is a binary operator whose semantics are: evaluate the
    left operand, discard the result, then evaluate the right operand,
    whose value is the value of the whole expression.  Obviously, this
    is only useful if the left operand has side effects.  If a
    short-circuiting compiler decides never to evaluate the left
    operand, because it will not affect the value of the whole
    expression, then the comma operator becomes meaningless.

(3) The most common form of statement in C is (syntactically)
    "<stmt> ::= <expression> ';'".  The semantics associated with this
    production are, "evaluate the expression and discard the result."
    If a compiler writer were to decide to short-circuit the
    evaluation, since the result is being discarded, s/he would end up
    with a pretty useless compiler.

Now, one might say, "It is poor style for an expression with side
effects to itself be an operand of an expression other than a `comma'
expression."  (Though this prohibits the ever-popular
"if ((c = getchar()) != EOF)".)  But though this may be a valid
stylistic constraint, I would hate to see it embedded in the semantics
of the language.
-- 
Jim Shankland
 ..!ihnp4!cpsc6a!\
		  rtech!jas
 ..!ucbvax!mtxinu!/
franka@mmintl.UUCP (02/04/87)
In article <3721@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
>60 seconds with my copy of K&R yields this (Appendix A):
>
>	"The & operator is associative and expressions involving &
>	may be rearranged"
>
>	"Unlike &, && guarantees left-to-right evaluation; moreover,
>	the second operand is not evaluated if the first operand is
>	zero"
>
>True, it is not explicitly stated that & and | are not short-circuited.
>It also doesn't explicitly say that + and * are not short-circuited!
>However, && and || are EXPLICITLY defined as having short-circuit
>behavior; no other operators are described this way, so it is safe to
>assume that all other operators do NOT short-circuit.

As a general rule, whatever is not specified in the C description
(whether K&R or ANSI) is just that: not specified.  Just because
certain operators are described as having certain behavior does not
guarantee that those operators for which the behavior is not specified
behave in the opposite way.

I would not write code which assumes that & and | are not
short-circuited; nor would I write code which assumes that + and * are
not short-circuited.  (Although + really can't be.)  But then, I use
side effects of expressions only in time-critical portions of the
code, and only when the side effect is *guaranteed* to work by the
language specification, or in pieces of code marked as machine
dependent.  (However, my tendency is to write machine-dependent code
in assembler.)

Frank Adams			ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate	52 Oakland Ave North	E. Hartford, CT 06108
mouse@mcgill-vision.UUCP (02/10/87)
In article <637@rtech.UUCP>, jas@rtech.UUCP (Jim Shankland) writes:
> Now, one might say, "It is poor style for an expression with side
> effects to itself be an operand of an expression other than a "comma"
> expression."  (Though this prohibits the ever-popular
> "if ((c = getchar()) != EOF)".)

It also prohibits *ptr++ (which I don't think is excessive).

					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu
hoey@nrl-aic.arpa (02/12/87)
    From: Chris Torek <mimsy!chris>
    Message-Id: <5199@mimsy.UUCP>
    Date: 26 Jan 87 02:17:54 GMT
    ...
	c = *p++ & *q++;	/* are both p and q always incremented? */

    I can find no promise in K&R that bitwise expressions are not
    short circuited even in the presence of side effects.  The ANSI
    draft may have more to say.  In any case, I would advise not
    counting on full evaluation.

K&R's note that ``The order in which side-effects take place is ...
unspecified'' conspicuously fails to lend credence to this point of
view.  Why should they unspecify the order, and not unspecify whether
side-effects take place at all?

It seems to me that the idea of bitwise operators (or other operators
lacking sequence breaks) not performing all their side-effects is so
counterintuitive that it should be outlawed.  At the very least, a
prominent warning must be included in the standard.  Since this is a
contentious issue, a standard silent on the point is no standard.

If compilers are allowed to fail to increment both p and q, I would
assume that they are also allowed to short-circuit evaluation of *, /,
%, <<, >>, >=, and <= when the result is known by evaluation of one
argument.  This may also occur in evaluation of <, >, ==, and != when
one of the arguments is being promoted from a restricted range.  Shall
we perhaps leave unspecified whether

	*p++ = 0;

increments p when the compiler can deduce that *p is already 0?

    It is even conceivable that a compiler might generate

	if ((c = *p) != 0)
		c &= *q;
	p++;
	q++;

    which, if p and q point to device registers, is not the same!

I find this perfectly acceptable, due to the above quote from K&R.  If
you are accessing device registers in C, you are expected to know what
the compiler is going to do with your code.  If you have an optimizing
compiler, you also have to worry about dead variables and code
movement for code in which pointer dereference can cause side effects.

    From: Henry Spencer <utzoo!henry>
    Message-Id: <7588@utzoo.UUCP>
    Date: 30 Jan 87 22:08:26 GMT
    ...
    Never bothered me, since I view side effects inside expressions as
    being unjustifiable pornography except for some very specific
    cases.  I did ask Dennis about it at one point, since strict
    reading of K&R suggested it was a bug.  His reply, as I recall it,
    was roughly "I think it is defensible in principle, but it caused
    so many complaints that newer versions of the compiler don't do
    it".

Are your cases of justifiable pornography sufficiently specific that
they should appear in the standard?  I believe the standard must be
precise here, and the ``so many complaints'' lead me to believe that
the answer must be in favor of guaranteed execution of side-effects.

Dan Hoey
chris@mimsy.umd.edu (02/12/87)
Dan Hoey <hoey@nrl-aic.arpa> writes:
>    From: Henry Spencer <utzoo!henry>
>
>    I did ask Dennis about [a PDP-11 C compiler short circuiting
>    multiplies by constant zeroes, thus discarding side effects]
>    at one point, since strict reading of K&R suggested it was a
>    bug.  His reply, as I recall it, was roughly "I think it is
>    defensible in principle, but it caused so many complaints that
>    newer versions of the compiler don't do it".
>
>Are your cases of justifiable pornography sufficiently specific that
>they should appear in the standard?  I believe the standard must be
>precise here, and the ``so many complaints'' lead me to believe that
>the answer must be in favor of guaranteed execution of side-effects.

I would prefer to see some guarantee that side effects will occur.
Guarantees as to precisely *when* are problematical, and since such
have never been made, should not be part of the standard.  So let us
say that

	if (*p++ & *q++)

and

	a = *p++ * *q++;

and even

	a = 0 * *p++ * *q++;

must always increment both p and q, but not in any particular order,
so long as the memory references use the values p and q have before
they are incremented.  Should we also say that the memory references
in the third expression must occur?  If not, the compiler can optimise
this into p++, q++, a = 0; but if so, must the compiler also do the
multiply?  (It might overflow, which could be important.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu
chris@mimsy.UUCP (02/12/87)
Concerning guarantees as to evaluations, in article <4391@brl-adm.ARPA>
I wrote:

>	a = 0 * *p++ * *q++;
>
>Should we also say that the memory references in the third expression
>must occur?  ... if so, must the compiler also do the multiply?  (It
>might overflow, which could be important.)

Bad example; the compiler is certainly allowed to compute (0 * *p++)
giving 0, then 0 * *q++ giving 0, which does not overflow.  Suppose
instead we have

	double *p, r;
	...
	r = *p++ / 1.0;

Must this do the divide, possibly causing (e.g.) a reserved operand
fault on a Vax due to an invalid float value at *p?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu
john@viper.UUCP (02/13/87)
I agree totally with Dan Hoey's statement on this subject.  There is
-FAR- too much code already existing which requires non-short-circuit
evaluation of `|' and `&' operations.  Since one of the primary
considerations stated by the standards committee was to avoid
"breaking" as much existing software as possible, and since, as Dan
pointed out, "a standard which is silent on this point is no
standard", it seems that at the very least all side effects -must- be
carried out.

As long as the code produces the same results, I doubt any of us will
object if the compiler creates some bizarre method of getting things
done in the least amount of time.  What this means to me is that code
written in optimized C, where side effects are known and taken into
account, won't break just because some tight loop got short-circuited
and an increment got skipped.  This kind of code exists -everywhere-.
If we allow side effects like increments or function calls to be
skipped in one "optimized" compiler and not in the standard
run-of-the-mill compilers that most of us use, the standard itself
will have introduced portability problems by an omission...

If the compiler people want to create code which stops evaluating
after a fixed result is known, I say fine.  BUT, they must take the
responsibility of picking up the pieces and making sure the incidental
increments, etc. are taken care of!
mckeeman@wanginst.UUCP (02/16/87)
The discussion about side effects is the tip of a very subtle iceberg.
I would like to separate concerns as follows: visible side effects are
those which are reflected in specific C constructs (essentially all of
these are assignments of one kind or another, embedded within larger
expressions or hidden in arithmetic tautologies), and invisible side
effects are properties of the underlying hardware (a mixed bag
including setting condition codes, causing overflow, causing
normalization of floating point numbers by known hardware hacks, going
through states that cannot be unwound by a symbolic debugger, and so
on).

The compiler writer is surely allowed any optimization that will not
change the program results.  Some will work harder on this than
others, but the programmer should not care.  The question arises when
an unlikely set of circumstances would change the results (like
aborting on overflow).  This is a hard problem because the compiler
writer cannot know what the programmer is looking for.  If, for
instance, run-time is a legitimate "result", then no optimization can
be allowed at all.

The safest approach is to warn the programmer when the compiler is
about to throw away some of his/her carefully crafted C.  A value that
is computed and not used is a significant programming error, not an
optimization opportunity.  With that approach, the following fragment
would generate a single store of 0 to x and several warning messages
about discarded code for the visible C operations /, +, *, *p and ++.

	x = 0;
	x /= 1;
	x += 0 * *p++;

I regard this approach as an indication that the compiler writer
respects the users of the product; the programmer would not write code
unless the programmer expected it to matter.

A numerical analyst once complained bitterly that my throwing away the
assignment X = X + 0.0 prevented him from normalizing floating point
numbers on the IBM 7090.  I was unable to give him advice on what he
could have written that I could not (at least in theory) have
optimized away.  And he didn't want to stand on his head to get
numbers normalized, either.  Enough said: compiler writers cannot know
what is in the mind of the programmer, and anyway do not want to make
all their object code worse because of rarely occurring special cases.

The invisible side effects are more difficult to deal with at the
language definition level.  I often hear complaints, backed by
debugger output, that some assignment never happens.  True enough: the
value is held in a register for use until context exit, and never ends
up in memory.  There is no consequence except for a bedeviled
programmer trying to find a bug with optimizer-obscured clues.  In
this case the program is not being looked at as an input/output
mapping, but rather as a living thing to be examined in vivo.

A solution to both problems is some sort of construct in C that says
"from here to here, do everything by the abstract machine rules":
associate left-to-right, evaluate operands left-to-right, do stores
immediately, and so on.  This would make the unwelcome warning
messages go away (because the compiler would not be throwing away the
code after all) and allow the rare but necessary forcing of
non-optimal code.
-- 
W. M. McKeeman			mckeeman@WangInst
Wang Institute			decvax!wanginst!mckeeman
Tyngsboro MA 01879
mccaugh@uiucdcsm.UUCP (02/21/87)
This is not submitted as a criticism of the foregoing erudite
discussion of side effects, but is rather an innocent question about C
in particular: why does C refuse to abide by the
associativity/precedence rules for expression evaluation that even
BASIC guarantees?  I can well understand "optimization" as an excuse,
but I can easily imagine cases where normally-evaluated expressions
would crash a system when "optimized" for evaluation without the
programmer's express consent.  Isn't it a little arbitrary for C to
manipulate the parts of an expression to its satisfaction (or whim)?
In particular, this renders the formal verification of C code
impractical.
michael@crlt.UUCP (02/22/87)
[Line eater shouldn't eat: no leading whitespace - but why take chances?]

In article <7610@utzoo.UUCP>, henry@utzoo.UUCP writes:
>
> I should add that X3J11 has come down quite firmly on the pragmatic side
> of this issue.  With the obvious exceptions of && and friends, which are
> explicitly short-circuited, the compiler is not permitted to leave out
> side effects of expressions.

Let me applaud X3J11 on this issue.

Regardless of whether side effects are allowed to be optimized out or
not, conforming compilers should all do things predictable from the
standard.  Thus they should all either squeeze them out or not.
Otherwise you can all too easily write code that works with some
conforming compilers and not others - which is exactly what a standard
is intended to prevent, or at least reduce and make easy to avoid.

When a programmer writes a statement with side effects, it is probably
because he wanted them to happen.  Therefore, the logical choice is to
force the compiler to let them happen, rather than to force it to
squeeze them out.

(In the same vein, perhaps my biggest objection to C is the
possibility that associative operators may be rearranged, even if I
try to coerce the order with parentheses.  This means that if, say, I
want to coerce multiplies and divides into a certain order to insure
against overflow, I must break expressions into several statements.)

===========================================================================
"I've got code in my node."	| UUCP: ...!ihnp4!itivax!node!michael
				| AUDIO: (313) 973-8787
Michael McClary			| SNAIL: 2091 Chalmers, Ann Arbor MI 48104
---------------------------------------------------------------------------
Above opinions are the official position of McClary Associates.  Customers
may have opinions of their own, which are given all the attention paid for.
===========================================================================
henry@utzoo.UUCP (Henry Spencer) (02/24/87)
> I can find no promise in K&R that bitwise expressions are not short
> circuited even in the presence of side effects.  The ANSI draft
> may have more to say.  In any case, I would advise not counting on
> full evaluation.
>
> K&R's note that ``The order in which side-effects take place is ...
> unspecified'' conspicuously fails to lend credence to this point of
> view.  Why should they unspecify the order, and not unspecify whether
> side-effects take place at all?

You are using what is sometimes known as the "Biblical Theory of
Standards": if the standard is by definition complete, an omission is
highly significant and may be taken to imply answers to questions that
the standard does not explicitly address.  In real life, "insufficient
data" is a legitimate answer to a question.  The fact is, K&R simply
doesn't discuss the issue.  Your reasoning does *suggest* that the
*intent* was to require side effects, but the fact remains that the
document DOES NOT RESOLVE THE QUESTION.

In response to this, one can take one of two approaches.  One may
assume that all implementors will read the same message between the
lines, and thus that it is proper to rely on side effects being
present.  Alternatively, if one has a bit less faith in the
interlineal reading ability of implementors, one has no choice but to
assume that "no specification" means "unspecified".

> It seems to me that the idea of
> bitwise operators (or other operators lacking sequence breaks) not
> performing all their side-effects is so counterintuitive that it should
> be outlawed...

Why so?  I know of at least one language which explicitly says, right
up front, that no such assumptions are allowed and the compiler is
free to short-circuit anything it pleases.  This will affect
programming style in small ways (in my view for the better), but it's
not an unthinkable heresy.  The language in question is Bliss, which
has been quite successful in its limited niche.

> ...Shall we perhaps leave unspecified whether
>	*p++ = 0;
> increments p when the compiler can deduce that *p is already 0?

Whether one thinks this is legitimate depends on fine points of
semantics; you are trying to push the line of argument beyond the
point where most people would defend it.  The crucial observations
here are that (a) the value of the = operator is not being used, so
its whole purpose is a side effect, and (b) the postfix ++ operator
likewise exists solely to cause a side effect.  This isn't quite the
same sort of animal as the situations we were originally discussing.

> I find this perfectly acceptable, due to the above quote from K&R.  If
> you are accessing device registers in C, you are expected to know
> what the compiler is going to do with your code...

Actually, the right solution to this is X3J11's "volatile" modifier
for types.  (They haven't handled it quite right, but the underlying
idea is correct.)  Then you do not need to be intimate with the
compiler, because you can explicitly tell the compiler "hands off this
one".

> Are your cases of justifiable pornography sufficiently specific that
> they should appear in the standard?...

Probably not; I've never bothered pinning them down in detail.

> ...the standard must be precise here, and the ``so many complaints'' lead
> me to believe that the answer must be in favor of guaranteed execution of
> side-effects.

This is in fact what X3J11 has done.  I concede the practical
necessity while finding it aesthetically displeasing.
-- 
Legalize			Henry Spencer @ U of Toronto Zoology
freedom!			{allegra,ihnp4,decvax,pyramid}!utzoo!henry
greg@utcsri.UUCP (Gregory Smith) (02/24/87)
In article <4700004@uiucdcsm> mccaugh@uiucdcsm.UUCP writes:
>
> This is not submitted as a criticism to the foregoing erudite discussion of
> side-effects, but is rather an innocent question about C in particular: why
> does C refuse to abide by the associativity/precedence rules for expression-
> evaluation that even BASIC guarantees?  I can well understand "optimization"
> as an excuse but can easily imagine cases where normally-evaluated expressions
> can crash a system when "optimized" for evaluation without the programmer's
> express consent.  Isn't it a little arbitrary for C to manipulate the parts
> of an expression to its satisfaction (or whim)?  In particular, this renders
> the formal verification of C-code impractical.

C does abide by associativity and precedence rules, which are laid out
in the appendix of K&R.  A-B+C means (A-B)+C and not A-(B+C).  A*B+C/D
means (A*B)+(C/D), and not A*((B+C)/D).  What you are complaining about
is sequence of evaluation.  If I write a=b*c+d*e, do I care which
multiply is done first?

The reason you have not seen anybody complaining about this in other
languages is that these other languages do not have side-effects in
expressions (at least not to the same extent).  If I write
a=foo()+bar(), I may indeed care whether foo() is called before bar(),
in which case I *won't write that*.

The problem has arisen through the notation we use in writing
expressions.  An expression has a tree structure, but it is written in
a linear left-to-right fashion.  Associativity and precedence rules
serve merely to define how an expression tree is extracted from its
linear form.  Unfortunately, the linear form imposes an artificial
ordering on an expression.  The '+' operator is perfectly commutative,
so that A+B and B+A are the same.  Unfortunately, you have to write
either A or B first.
Even an operator like / has the same problem:  if there were an 'under'
operator \ you could have your choice of A/B or B\A; the justifiable
lack of a \ operator forces you to write A before B even though they
are at equal levels in the expression tree.

The philosophy of the C language is that these limitations of the
linear notation should not be allowed to restrict code generation.  In
an expression where two subexpressions X and Y must be added together,
it may be vastly more efficient to evaluate Y before X.  The programmer
is not able to determine this, and is nevertheless forced to write one
before the other.

Putting it another way, a tree which represents an expression may be
converted to linear form ( traversed ) in many ways.  Some ways will be
more efficient for evaluation, and that is hopefully what the compiler
will generate.  Some ways may be more useful for human reading, and
that is what the programmer will write ( E.g. in a divide, the human
always writes the dividend before the divisor, due only to lack of the
aforementioned '\' operator ).  As long as the same tree is involved in
both cases, what's wrong with that?

It is worth noting at this point that the parentheses '(' and ')' serve
only in the process of converting a linear expression to a tree.  Thus
(a+b)*c is different from a+b*c but a+(b+c) can be treated the same as
(a+b)+c.  Simply allow your tree to contain a node which forms the sum
of its three, equally ranked, children, and throw in the fact that
redundant ()'s grow in C like mushrooms.  Then a+(b+c) should be
treated as a+b+c or SUM(a,b,c).  ( I know, there are overflow
considerations).

If the expression tree contains no side-effects, it will make no
difference in which order it is evaluated (just like in BASIC).
However, if one branch of the tree changes the value of X and another
branch uses the value of X, the result will depend on which branch is
done first.
Some would say, "the compiler should detect such cases and then use the
order of the original expression".  However, it is *impossible* to
detect all of these cases at compile time.  So the programmer is simply
warned of the cases where expressions may be rearranged, and avoids
expressions which work differently depending on how they are arranged.
Note that these cases (where results are undefined) are *well defined*
by the language, so you know what to avoid.

[ I know this is an oversimplification and doesn't deal with delaying
of side-effects, or with sequence points, etc.. but I wanted to explain
this stuff to those who just can't fathom what all the fuss is about
and why the damn compilers don't just do what they're told ]
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...
john@viper.UUCP (03/02/87)
In article <4211@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>
>It is worth noting at this point that the parentheses '(' and ')' serve
>only in the process of converting a linear expression to a tree.  Thus
>(a+b)*c is different from a+b*c but a+(b+c) can be treated the same as
>(a+b)+c.  Simply allow your tree to contain a node which forms the sum
>of its three, equally ranked, children, and throw in the fact that
>redundant ()'s grow in C like mushrooms.  Then a+(b+c) should be
>treated as a+b+c or SUM(a,b,c).  ( I know, there are overflow
>considerations).
>
>----------------------------------------------------------------------
>Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
>Have vAX, will hack...

The example Greg provides here is misleading.

If you have three expressions:
	1:  a+b+c
	2:  (a+b)+c
	3:  a+(b+c)
all three of them should not be evaluated the same way.  Greg implies
that they should be.  This is not so.  When an equation contains '('
and ')' it intentionally (and explicitly) defines the parse tree
structure that will result.  The statement "redundant ()'s grow in C
like mushrooms" may be true, but it doesn't give anyone the right to
arbitrarily ignore explicit cues to the compiler.  When I don't care, I
don't use them.  When I do, I do so for a reason..........

Since there is additional, undesirable, and unnecessary overhead in the
detection of this SUM(a1,a2,a3...,aN) special case, and since there
appears to be little or no advantage to doing so (you have to add them
up in -some- order, you might as well let the programmer decide as
anyone), why bother?

---
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john
m5d@bobkat.UUCP (03/05/87)
In article <609@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>
>The example Greg provides here is misleading.
>
>If you have three expressions:
>	1:  a+b+c
>	2:  (a+b)+c
>	3:  a+(b+c)
>all three of them should not be evaluated the same way.  Greg implies
>that they should be.  This is not so.  When an equation contains '('
>and ')' it intentionally (and explicitly) defines the parse tree
>structure that will result.  The statement "redundant ()'s grow in C
>like mushrooms" may be true, but it doesn't give anyone the right to
>arbitrarily ignore explicit cues to the compiler.  When I don't care,
>I don't use them.  When I do, I do so for a reason..........
>
> Since there is additional, undesirable, and unnecessary overhead in
>the detection of this SUM(a1,a2,a3...,aN) special case, and since there
>appears to be little or no advantage to doing so (you have to add them
>up in -some- order, you might as well let the programmer decide as
>anyone), why bother?
>
>---
>John Stanley (john@viper.UUCP)
>Software Consultant - DynaSoft Systems
>UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

Sorry, but the compiler writer DOES have the right to use algebraic
manipulation to improve code.  Why?  Because the specification of the
language says so.  There is some difference between "a + (b + c)" and
"(a + b) + c":

	    +                  +
	   / \                / \
	  a   +              +   c
	     / \            / \
	    b   c          a   b

A typical code generation scheme will prefer one to the other (one
saves a register).

What happens in other languages (like FORTRAN) if a programmer doesn't
want code re-arranged?  At least in C, the compiler is not supposed to
mess with stuff across statement boundaries.  In a language like
FORTRAN or Pascal, the compiler is liable to do common subexpression
elimination across an entire basic block (or maybe even globally), not
to mention loop unravelling, code hoisting, induction variable
elimination, and removal of dead code (a real bad one for you device
driver fans).
Since lots of people seem outraged at the license given to C compilers,
perhaps they should give examples of mechanisms which allow
optimization to be prevented in other languages.  I would hope that
such mechanisms are portable.
-- 
Mike McNally, mercifully employed at Digital Lynx ---
Where Plano Road the Mighty Flood of Forest Lane doth meet,
And Garland fair, whose perfumed air flows soft about my feet...
uucp: {texsun,killer,infotel}!pollux!bobkat!m5d    (214) 238-7474
greg@utcsri.UUCP (Gregory Smith) (03/06/87)
In article <609@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>The example Greg provides here is misleading.
>
>If you have three expressions:
>	1:  a+b+c
>	2:  (a+b)+c
>	3:  a+(b+c)
>all three of them should not be evaluated the same way.  Greg implies
>that they should be.  This is not so.  When an equation contains '('
>and ')' it intentionally (and explicitly) defines the parse tree
>structure that will result.  The statement "redundant ()'s grow in C
>like mushrooms" may be true, but it doesn't give anyone the right to
>arbitrarily ignore explicit cues to the compiler.  When I don't care,
>I don't use them.  When I do, I do so for a reason..........

How do you tell an explicitly defined tree from an implicit one?  a+b+c
parses to the same expression tree as (a+b)+c.  That could be changed
(i.e. () info could be included in the expr. tree).  But what about
(a+b)*c?  There is no way to represent that *expression tree* without
parentheses.  How do I then write this in such a way that the compiler
is allowed to rearrange it?

My point is that ()'s serve to override the binding strengths of
operators, and thus allow arbitrary expressions to be constructed.
They cannot be used to prevent optimizations, since they are *required*
in many cases, simply to write the expression in the first place.  I
want my compiler to be free to change (i+3)*4+6 to i*4+18 (more on this
later).  If I don't want that, there are means of avoiding it.
Furthermore, a redundant () may very well not be "intentional", in the
applicable sense of the word, and therefore should not be construed as
"explicit".  More later.

> Since there is additional, undesirable, and unnecessary overhead in
>the detection of this SUM(a1,a2,a3...,aN) special case, and since there
>appears to be little or no advantage to doing so (you have to add them
>up in -some- order, you might as well let the programmer decide as
>anyone), why bother?

Wrong.  There is an advantage to doing the sum in an arbitrary order.
First of all, it rarely makes a difference to the result (if it does,
it is the programmer's fault, by definition :-) ).  It allows constants
to be folded to the end:  (a+2)+(b+9)+4 = (a+b)+15.  It allows fewer
registers to be used.  ((a+b)+(c+d))+((w+x)+(y+z)), if done literally,
requires 3 working values at one point, while a running sum never
requires more than one.  The programmer cannot determine the best
order, because the programmer is not writing code specifically for a
certain machine, right? :-) the compiler is supposed to do that.

Multiple constants in running sums tend to pop up (1) from macro
expansions (2) in expressions like (p+3)->foo.y[i-1].abc ( after the
semantics have been applied ).  In the latter case the programmer can't
control them because s/he can't really see them.  One could of course
distinguish the programmer-created weirdness from the
internally-created ones, but why not optimize both of them?  One
cannot, of course, distinguish macro-created weirdness under the
current preprocessor paradigm.

And once you convert arbitrary additions into rearrangeable running
sums, it becomes very attractive to convert things like (i+7)*4 + 2
into i*4 + 28 + 2 into i*4 + 30.  Again, this comes up a *lot* with
array operations, and again, if it breaks a program, that program was
probably doomed anyway.  For more on this sort of thing, see "A Tour
through the UN*X C Compiler" in the Version 7 books.

PS if you don't believe me about the mushrooms, I give you the
expansion of putchar(getchar()):

	(--((&_iob[1]))->_cnt>=0? ((int)(*((&_iob[1]))->_ptr++=
	(unsigned)((--((&_iob[0]))->_cnt>=0? *((&_iob[0]))->_ptr++&0377:
	_filbuf((&_iob[0])))))):_flsbuf((unsigned)((--((&_iob[0]))->_cnt>=0?
	*((&_iob[0]))->_ptr++&0377:_filbuf((&_iob[0])))),(&_iob[1])));

Most of those '()'s are in there to enforce precedence against
arbitrary parameters.  E.g. if I write

	#define INCH_TO_CM(x) x*2.54

then INCH_TO_CM(a+b) becomes a+b*2.54, which is wrong.
To be safe, I have to write

	#define INCH_TO_CM(x) ((x)*2.54)

>John Stanley (john@viper.UUCP)
>Software Consultant - DynaSoft Systems
>UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...
root@sdd.UUCP (Root) (03/07/87)
In article <609@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>
>If you have three expressions:
>	1:  a+b+c
>	2:  (a+b)+c
>	3:  a+(b+c)
>all three of them should not be evaluated the same way.  Greg implies
>that they should be.  This is not so.  When an equation contains '('
>and ')' it intentionally (and explicitly) defines the parse tree
>structure that will result.  The statement "redundant ()'s grow in C
>like mushrooms" may be true, but it doesn't give anyone the right to
>arbitrarily ignore explicit cues to the compiler.  When I don't care,
>I don't use them.  When I do, I do so for a reason..........

NO!!!!  The new DRAFT ANSI standard explicitly says:

	"An expression involving more than one occurrence of the same
	commutative and associative binary operator (*, +, &, ^, |) may
	be regrouped arbitrarily, EVEN IN THE PRESENCE OF PARENTHESES,
	provided the types of the operands or of the results are not
	changed by this regrouping.  To force a particular grouping of
	operations, either the value of the expression to be grouped
	may be explicitly assigned to an object, or grouping
	parentheses may be preceded by a unary plus operator."

Therefore if one desires to explicitly specify a grouping, in the new
ANSI standard one can use a unary plus operator as follows:

	2:  +(a + b) + c
	3:  a + (+(b + c))

This should guarantee the grouping!

Daniel Corbett
Software Design & Development Corp
360 Mobil, Suite 103
Camarillo, CA 93010
drw@cullvax.UUCP (03/11/87)
john@viper.UUCP (John Stanley) writes:
> If you have three expressions:
>	1:  a+b+c
>	2:  (a+b)+c
>	3:  a+(b+c)
> all three of them should not be evaluated the same way.  Greg implies
> that they should be.  This is not so.  When an equation contains '('
> and ')' it intentionally (and explicitly) defines the parse tree
> structure that will result.  The statement "redundant ()'s grow in C
> like mushrooms" may be true, but it doesn't give anyone the right to
> arbitrarily ignore explicit cues to the compiler.  When I don't care,
> I don't use them.  When I do, I do so for a reason..........

Read the opening paragraphs of section 3.3 of X3J11.  The compiler is
allowed to re-associate multiple uses of *, +, &, ^, and |.  If you
want to prevent this, you have to write:  +(a+b) + c, etc.  The
parentheses you wrote aren't explicit cues to the compiler, by
definition.  (*Why* X3J11 did this is another question...)

Dale
-- 
Dale Worley		Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu
Un*x (a generic name for a class of OS's) != Unix (AT&T's brand of such)
gwyn@brl-smoke.ARPA (Doug Gwyn ) (03/14/87)
In article <903@cullvax.UUCP> drw@cullvax.UUCP (Dale Worley) writes:
>The parentheses you wrote aren't explicit cues to the compiler, by
>definition.  (*Why* X3J11 did this is another question...)

But an easily answered one:
	1.  X3J11's base document was K&R Appendix A.
	2.  Existing practice.
	3.  C is not Fortran.
	4.  Many parentheses are introduced during macro expansion, and
	    it is undesirable to prohibit optimization of these.
	5.  An explicit mechanism (unary +) was provided that can be
	    used to obtain the desired functionality.
gram@uctcs.uucp (Graham Wheeler) (05/24/91)
I've had several responses to the question I posted, one or two of them
being no help at all and just short of abusive.  I guess I phrased
things a bit badly, as only one response actually had an answer to the
*real* question.  To recap, I wanted to know what ANSI C specified
about:

	 i) short-circuit evaluation of Boolean expressions
	ii) order of evaluation of expressions

I was 99% sure that C used short-circuit evaluation (thanks to those
who confirmed this, anyway).  My real question, and one which is
relevant to me as I teach a compiler course, was:  if short-circuit
evaluation is allowed, can the compiler still rearrange expressions?
If so, a check like:

	if (p && p->next)...

could cause a segmentation violation if p is NULL, *IF* the compiler
rearranged the expression so that the p->next was tested first.  In
practice, a compiler that rearranges expressions will usually do so by
putting the cheaper expression first, especially if short-circuit
evaluation is supported.  The issue is that in C expressions may have
side effects (by that I include the possibility of a segmentation
violation 8-) ) and so the consequences of such a rearrangement may be
serious.  All but one seemed to miss this point.  The logical answer is
no, the compiler must evaluate Boolean expressions in the same order as
specified in the source program.

The two most useful responses I got were:
=========================================================================
From: Xing Wu <wuxing@comp.mscs.mu.EDU>

In article <1991May22.092404.25297@ucthpx.uct.ac.za> you write:
>ii) Does it say that Boolean expressions must be evaluated with short-
>    circuit evaluation?

Yes.  Take a look at A7.14 and A7.15 in K&RII (pages 207-8 in my copy).
=========================================================================
and, more pertinently:
=========================================================================
From: Richard Flamsholt S0rensen <richard@iesd.auc.DK>

C has *always* guaranteed short-circuit evaluation - you may safely use
the practice "if (ptr != NULL && ptr->next ...) ..".  This also implies
that it will *never* rearrange expressions.
=========================================================================

Thanks to those who responded.  No thanks to those who were insulting.

Graham Wheeler <gram@cs.uct.ac.za> | "That which is weak conquers the strong,
Data Network Architectures Lab     |  that which is soft conquers the hard.
Dept. of Computer Science          |  All men know this; none practise it"
University of Cape Town            |  Lao Tzu - Tao Te Ching Ch.78