diamond@csl.sony.co.jp (Norman Diamond) (11/16/89)
Sorry for the repost, but the original posting has not drawn any replies.
Perhaps it was buried in kddlabs again.

Both the standard and the rationale say that in the pp-number
	0x7e-getchar()
it is illegal for my preprocessor to expand the getchar() macro.
If there is a real getchar() function, it is guaranteed that the
real function must be invoked by this expression. This appears
to match the committee's intention, is not optional, and is not
implementation-defined. Why?

I will have to add code to my scanner, and slow it down, so that
it will not call the preprocessor if it finds a macro in the middle
of a pp-number.

We have recently had discussions of what-is-reasonable vs.
what-is-written. Does anyone think we can appeal to reason in this case,
so that implementations might be allowed to expand macros that are
found as independent real-tokens even though they're not separate
preprocessor-tokens?
--
Norman Diamond, Sony Corp. (diamond%ws.sony.junet@uunet.uu.net seems to work)
Should the preceding opinions be caught or     | James Bond asked his
killed, the sender will disavow all knowledge  | ATT rep for a source
of their activities or whereabouts.            | licence to "kill".
datanguay@watmath.waterloo.edu (David Adrien Tanguay) (11/17/89)
In article <11134@riks.csl.sony.co.jp> diamond@ws.sony.junet (Norman Diamond) writes:
>Sorry for the repost, but the original posting has not drawn any
>	0x7e-getchar()
>it is illegal for my preprocessor to expand the getchar() macro.
>If there is a real getchar() function, it is guaranteed that the
>real function must be invoked by this expression. This appears
>to match the committee's intention, is not optional, and is not
>implementation-defined. Why?

The "0x7e-getchar" is picked up as a pre-processor number and later
converted into a token. In section 3.1, under constraints, it says
"Each preprocessing token that is converted into a token shall have the
lexical form of a keyword, an identifier, a constant, a string literal,
an operator, or a punctuator." "0x7e-getchar" is none of these, so I
think a diagnostic message must be issued at that point. However, there
might be a statement elsewhere that says that a pre-processor token can
be converted into a sequence of tokens.

>We have recently had discussions of what-is-reasonable vs. what-is-written.
>Does anyone think we can appeal to reason in this case,
>so that implementations might be allowed to expand macros that are
>found as independent real-tokens even though they're not separate
>preprocessor-tokens?

This problem was brought to the committee's attention, but it took them
a while to understand the problem (they thought everybody was complaining
about the concept of a pre-processor number, rather than the specific
definition). By the time they did figure it out, they had already declared
that the botched definition would stand. (Hopefully a committee member will
inject some reality into the previous sentence.) Oh well, you should be
using white space anyway.

David Tanguay
henry@utzoo.uucp (Henry Spencer) (11/18/89)
In article <11134@riks.csl.sony.co.jp> diamond@ws.sony.junet (Norman Diamond) writes:
>... Does anyone think we can appeal to reason in this case,
>so that implementations might be allowed to expand macros that are
>found as independent real-tokens even though they're not separate
>preprocessor-tokens?

I don't think the situation can arise, actually. A careful reading of
2.1.1.2 item 7 yields: "Each preprocessing token is converted into a
token." Note the singular pronoun; it's in there because I pointed out
that there was no requirement elsewhere that the conversion be
one-to-one. A preprocessing token which cannot be converted into a
single real token is illegal.
--
A bit of tolerance is worth a  | Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
walter@hpclwjm.HP.COM (Walter Murray) (11/18/89)
Norman Diamond writes:
> Both the standard and the rationale say that in the pp-number
> 	0x7e-getchar()
> it is illegal for my preprocessor to expand the getchar() macro.
> If there is a real getchar() function, it is guaranteed that the
> real function must be invoked by this expression.

Isn't this overlooking the constraint in 3.1? As I read it,
<0x7e-getchar> is a pp-number. In translation phase 7, the translator
attempts to convert each preprocessing token into a token. At that
point, each preprocessing token must have the form of a keyword, an
identifier, a constant, a string literal, an operator, or a punctuator.
Because <0x7e-getchar> doesn't match any of these, the constraint is
violated, the program is illegal, and a diagnostic must be produced.

Walter Murray
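The sketch below makes the phase-7 constraint concrete by implementing one piece of the check: deciding whether the spelling of a pp-number is a valid C89 integer constant. The function name is_integer_constant is hypothetical. The sketch deliberately ignores floating constants, so a real translator would need more than this. What it shows is that text such as 0x7e-getchar has characters left over after the digits and optional suffix, so it cannot become a single token.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: does the pp-number text s[0..len) spell a C89
   integer constant?  Floating constants are NOT handled, so this is
   only part of the phase-7 "one pp-token -> one token" check. */
int is_integer_constant(const char *s, size_t len)
{
    size_t i = 0;
    int u = 0, l = 0;

    if (len >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        i = 2;
        if (i == len || !isxdigit((unsigned char)s[i]))
            return 0;                      /* "0x" needs at least one digit */
        while (i < len && isxdigit((unsigned char)s[i]))
            i++;
    } else if (len >= 1 && s[0] == '0') {
        i = 1;
        while (i < len && s[i] >= '0' && s[i] <= '7')
            i++;                           /* octal digits only */
    } else if (len >= 1 && isdigit((unsigned char)s[0])) {
        while (i < len && isdigit((unsigned char)s[i]))
            i++;
    } else {
        return 0;
    }

    /* Optional suffix: at most one u/U and at most one l/L, either order. */
    while (i < len) {
        char c = s[i];
        if ((c == 'u' || c == 'U') && !u)
            u = 1;
        else if ((c == 'l' || c == 'L') && !l)
            l = 1;
        else
            return 0;                      /* leftover characters: no single token */
        i++;
    }
    return 1;
}
```

Under this check, 0x7e and 42UL pass. 0x7e-getchar, 0x14e+3, 018 and 4s all fail, which is where the required diagnostic would come from.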
karl@haddock.ima.isc.com (Karl Heuer) (11/18/89)
In article <11134@riks.csl.sony.co.jp> diamond@ws.sony.junet (Norman Diamond) writes:
>Both the standard and the rationale say that in the pp-number
>	0x7e-getchar()
>it is illegal for my preprocessor to expand the getchar() macro.
>If there is a real getchar() function, it is guaranteed that the
>real function must be invoked by this expression.

"0x7e-getchar" scans as a single pp-number. It fails to convert into a
token when it hits translation phase 7, and hence the program containing
it is not strictly conforming. We are now in the realm of undefined
behavior. An ANSI-conforming compiler is required to issue a diagnostic,
but is then permitted to guess what the user meant and continue
processing the program. In particular, it should be legal for it to
consider "getchar" for macro replacement.

>I will have to add code to my scanner, and slow it down, so that
>it will not call the preprocessor if it finds a macro in the middle
>of a pp-number.

Why should it "find" a macro in the middle of a pp-number, any more than
it should try to expand the substring "getc" in the token "fgetc"?
--
Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

(For those readers who don't know why "0x7e-getchar" is a single
pp-token: The Committee defined a single pattern to cover all variants
of numeric constants, including floating-point with exponents as well
as hex integers. They chose to accept the resulting wart (that hex
constants ending with "e" must not be immediately followed by a sign)
rather than rewrite the pattern to fix it. Yes, I think this was a
mistake. No, it can't be changed.)
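The greedy pp-number pattern Karl describes can be sketched as a small scanner. This is an illustration, not code from any implementation. The name pp_number_length is hypothetical, and trigraphs and backslash-newline splicing are assumed to have been handled in earlier phases. The grammar followed is the C89 one: a digit or a period followed by a digit, then any run of digits, letters, underscores, periods, or an e/E followed by a sign.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: length of the pp-number starting at s, or 0.
   C89 grammar:
     pp-number: digit | . digit | pp-number digit | pp-number nondigit
              | pp-number e sign | pp-number E sign | pp-number .
   (nondigit = letter or underscore) */
size_t pp_number_length(const char *s)
{
    size_t n;

    if (isdigit((unsigned char)s[0]))
        n = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        n = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[n];
        /* Check e/E + sign first, since 'e' is also an ordinary letter. */
        if ((c == 'e' || c == 'E') && (s[n + 1] == '+' || s[n + 1] == '-'))
            n += 2;
        else if (isalnum(c) || c == '_' || c == '.')
            n += 1;            /* digit, nondigit, or '.' */
        else
            return n;
    }
}
```

With this rule, "0x7e-getchar()" yields a single 12-character pp-number. "0x7e - getchar()" stops after the four characters of 0x7e, because the space ends the pp-number before the sign can be absorbed.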
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/18/89)
In article <11134@riks.csl.sony.co.jp>, diamond@csl.sony.co.jp (Norman Diamond) writes:
- Both the standard and the rationale say that in the pp-number
- 0x7e-getchar()
- it is illegal for my preprocessor to expand the getchar() macro.
- If there is a real getchar() function, it is guaranteed that the
- real function must be invoked by this expression. This appears
- to match the committee's intention, is not optional, and is not
- implementation-defined. Why?
The Rationale explains that.
- I will have to add code to my scanner, and slow it down, so that
- it will not call the preprocessor if it finds a macro in the middle
- of a pp-number.
No. As far as I can tell you really have to implement a tokenizing
preprocessor anyway, and in such a preprocessor the pp-number is a
single preprocessing token and thus will not match "getchar" naturally;
no additional kludgery is required to ensure this.
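Doug's point, that a tokenizing preprocessor never sees "getchar" inside the pp-number, can be illustrated with a toy scanner that only compares whole identifier tokens against a macro name. The name count_identifier is hypothetical. The scanner is a simplification: it treats every other character as a one-character token and ignores strings, character constants and comments.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: count identifier pp-tokens in src that are exactly
   `name`.  Only identifiers are candidates for macro replacement, so text
   swallowed by a pp-number (or by a longer identifier) is never looked up. */
int count_identifier(const char *src, const char *name)
{
    size_t name_len = strlen(name);
    int count = 0;
    const char *p = src;

    while (*p != '\0') {
        unsigned char c = (unsigned char)*p;
        if (isdigit(c) || (c == '.' && isdigit((unsigned char)p[1]))) {
            /* pp-number: greedy, including e+ e- E+ E- */
            p++;
            for (;;) {
                if ((*p == 'e' || *p == 'E') && (p[1] == '+' || p[1] == '-'))
                    p += 2;
                else if (isalnum((unsigned char)*p) || *p == '_' || *p == '.')
                    p++;
                else
                    break;
            }
        } else if (isalpha(c) || c == '_') {
            /* identifier: also greedy */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            if ((size_t)(p - start) == name_len &&
                strncmp(start, name, name_len) == 0)
                count++;
        } else {
            p++;  /* white space or a one-character punctuator */
        }
    }
    return count;
}
```

"getchar" inside 0x7e-getchar and "getc" inside fgetc are skipped for the same reason: a longer token swallows them before any macro lookup happens.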
- We have recently had discussions of what-is-reasonable vs. what-is-
- written. Does anyone think we can appeal to reason in this case,
- so that implementations might be allowed to expand macros that are
- found as independent real-tokens even though they're not separate
- preprocessor-tokens?
There ARE no "real-tokens" until translation phase 7. Preprocessing
is REQUIRED to deal solely with preprocessing tokens. I think that
general framework is fairly easy to defend on "reasonable" grounds.
You could of course complain that you don't want to have to implement
a tokenizing preprocessor, or that you don't want to have to wait
until translation phase 7 to convert pp-numbers to C numbers, or that
you just don't like the whole notion of pp-numbers. Believe me,
X3J11 has heard all the arguments before.
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/18/89)
In article <15217@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
| (For those readers who don't know why "0x7e-getchar" is a single pp-token: The
| Committee defined a single pattern to cover all variants of numeric constants,
| including floating-point with exponents as well as hex integers. They chose
| to accept the resulting wart (that hex constants ending with "e" must not be
| immediately followed by a sign) rather than rewrite the pattern to fix it.
| Yes, I think this was a mistake. No, it can't be changed.)

Correct on all three. That is the way the standard works, it is a
mistake, and it can't be changed. I *don't* believe that there is a
body of existing programs using hex constants with exponential notation,
and I *do* believe that it breaks existing programs. I think the
committee got tired of the job and decided that it was good enough.

I admit I only found one program it actually broke, although I did find
about 30 instances of #defined hex constants ending in e which *could*
break if used with +/-.

Example, for those not following this:

	#define F_LIMIT 0x14e

	/* and later in the program */
	long error_count[F_LIMIT+5];	/* room for quadrant totals */

	/* and also */
	top3 = triad(bset, F_LIMIT+3);

The last is interesting, because if F_LIMIT+3 is taken as a float value,
and if there is a prototype, the number gets converted back to an int.
This gives the arg the correct type but a vastly wrong value.

I'm happy to say that for the moment I haven't seen any compilers
implement this, even those which have many other ANSI features. I'm
really hoping that this could be treated as a wording change, not
requiring a vote, but I suspect it is too big for that.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon
quiroz@cs.rochester.edu (Cesar Quiroz) (11/18/89)
In <1643@crdos1.crd.ge.COM>, davidsen@crdos1.UUCP (bill davidsen) wrote:
| #define F_LIMIT 0x14e
|
| /* and later in the program */
| long error_count[F_LIMIT+5]; /* room for quadrant totals */

Aside: Over-parenthesizing your defines for paranoid reasons would
have saved this program. Of course, the criticized behavior remains
buggy in the general case.
--
Cesar Augusto Quiroz Gonzalez
Department of Computer Science
University of Rochester
Rochester, NY 14627
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/18/89)
In article <15217@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
-We are now in the realm of undefined behavior. An ANSI-conforming compiler is
-required to issue a diagnostic, but is then permitted to guess what the user
-meant and continue processing the program. In particular, it should be legal
-for it to consider "getchar" for macro replacement.
That's a useful point that has application in other areas as well.
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/18/89)
In article <1643@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes:
>Example, for those not following this:
>	#define F_LIMIT 0x14e
>	/* and later in the program */
>	long error_count[F_LIMIT+5];	/* room for quadrant totals */
>	/* and also */
>	top3 = triad(bset, F_LIMIT+3);

There is no problem here, because the result of macro substitution is
not retokenized.

I don't feel like recounting the entire history of pp-numbers, even if
I could remember it all, but it was a reasonable solution to a very
difficult technical problem. Earlier drafts tried to do it the way you
and Norman seem to think is "right", and it got more and more snarled as
we tried to get it untangled. pp-numbers work well for the intended
purpose and cause problems only in very rare circumstances and only for
programmers with an obsessive aversion to white space.
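Doug's point can be checked with Bill's own macro. In this sketch, limit_plus is a hypothetical helper added only for testing. F_LIMIT+5 is scanned as three tokens before any replacement happens: the identifier F_LIMIT, the punctuator +, and the pp-number 5. The replacement 0x14e is then not rescanned together with the characters that follow it. Writing the expansion out by hand as 0x14e+5 would instead form the single pp-number 0x14e+5 and require a diagnostic.

```c
/* Bill Davidsen's macro: F_LIMIT is its own identifier token, so
   F_LIMIT+5 is three pp-tokens (F_LIMIT, +, 5).  The replacement 0x14e
   is an already-formed pp-number and is not re-scanned with "+5". */
#define F_LIMIT 0x14e

long error_count[F_LIMIT+5];          /* 0x14e + 5 == 339 elements */

/* Hypothetical helper mirroring triad(bset, F_LIMIT+3): plain int math. */
int limit_plus(int k)
{
    return F_LIMIT+k;
}
```

Both uses get the integer values Bill intended: 339 array elements, and 337 for F_LIMIT+3.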
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/18/89)
In article <1989Nov17.205004.19236@cs.rochester.edu> quiroz@cs.rochester.edu (Cesar Quiroz) writes:
| Aside: Over-parenthesizing your defines for paranoid reasons would
| have saved this program. Of course, the criticized behavior remains
| buggy in the general case.

That's what I had to go thru and do, but if that doesn't constitute
"egregiously breaking existing programs" I don't know what does. If I
ever get the time I'll grep thru the net sources and see how many have
defined hex constants ending in e. Note that of the programs which did,
only one actually failed; the rest were time-bombs, waiting until
someone used them in an expression.

I don't think this will bring a huge number of programs crashing down,
but it does look like a case of a committee whose majority is vendors
(or was during the two years I was there) choosing a behavior which has
no benefit other than to simplify the writing of the parser. If Global
sends me the rationale with this order I'll look to see if the thought
process is described.
karl@haddock.ima.isc.com (Karl Heuer) (11/18/89)
In article <1643@crdos1.crd.ge.COM> davidsen@crdos1 (bill davidsen) writes:
>	#define F_LIMIT 0x14e
>	top3 = triad(bset, F_LIMIT+3);
>The last is interesting, because if F_LIMIT+3 is taken as a float
>value, and if there is a prototype...

As Doug has already pointed out, this is not a problem because there is
a token delimiter following F_LIMIT. Besides, there are no hex-floats in
ANSI C (nor in any extension that I'm aware of); "0x14e+3" is a pp-token
that cannot be resolved into a real token. (As is also, for example,
"018" or "4s".) If the Committee had tried to make this a legal token,
it would have been a Quiet Change--and *that* would have been a lot more
controversial!

>I'm really hoping that this could be treated as a wording change, not
>requiring a vote, but I suspect it is too big for that.

The guiding principle is that a change that reflects the Committee's
original intent is editorial; one that does not is substantive.
Unfortunately, this behavior appeared as an explicit example in the
Rationale, so it's hard to argue that the Committee didn't intend it.
pkturner@cup.portal.com (Prescott K Turner) (11/19/89)
> Both the standard and the rationale say that in the pp-number
> 	0x7e-getchar()
> it is illegal for my preprocessor to expand the getchar() macro.
> If there is a real getchar() function, it is guaranteed that the
> real function must be invoked by this expression.

The draft standard says, "Each preprocessing token is converted into a
token". Since 0x7e-getchar cannot be converted into a token, there is an
error, and the real getchar() function need not be invoked.

> I will have to add code to my scanner, and slow it down, so that
> it will not call the preprocessor if it finds a macro in the middle
> of a pp-number.

"getchar" in the middle of a pp-number should give a scanner no more
difficulty than "getchar" in the middle of an identifier, e.g.
	mygetchar
--
Prescott K. Turner, Jr.
13 Burning Tree Rd., Natick, MA 01760 USA   (508) 653-0357
UUCP: ...sun!cup.portal.com!pkturner   Internet: pkturner@cup.portal.com
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/19/89)
In article <1653@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes:
>I don't think this will bring a huge number of programs crashing down,

It won't. Several committee members "grepped" for such usage in existing
code to see if it would be a significant factor. For example, the whole
source code for UNIX was scanned. We determined that it would not be a
significant problem for existing source code.

>but it does look like a case of a committee whose majority is
>vendors (or was during the two years I was there) choosing a behavior
>which has no benefit other than to simplify the writing of the parser.

That's a significant reason for this particular feature of the
specification. However, you seem to be implying that selfish
considerations by vendors are acting to the detriment of users. There
were a significant number of user-oriented X3J11 committee members
(including myself), and of course most vendor representatives also
function as C users themselves. We bought into the notion of simplicity
in this case as being of value to programmers as well as implementors.
Only ignorant programmers might have a problem, but that is true of very
many aspects of C; C is not a language for those who refuse to learn
before doing. We expect that this particular quirk will be taught in C
textbooks much as the need for () around parameters in macro definitions
already is taught.
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/19/89)
In article <31615@watmath.waterloo.edu> datanguay@watmath.waterloo.edu (David Adrien Tanguay) writes:
>However, there might be a statement elsewhere that says that a
>pre-processor token can be converted into a sequence of tokens.

No; the conversion in translation phase 7 is one-to-one.

>This problem was brought to the committee's attention, but it took them
>a while to understand the problem (they thought everybody was complaining
>about the concept of a pre-processor number, rather than the specific
>definition). By the time they did figure it out, they had already declared
>that the botched definition would stand. (Hopefully a committee member will
>inject some reality into the previous sentence.) Oh well, you should be
>using white space anyway.

This is misleading, because whenever really solid arguments were made,
X3J11 was always willing to fix a demonstrated error in the draft
specification; there were numerous occasions when this did occur. As I
recall the committee sentiment, it wasn't felt that this slightly
over-generous glomming onto source characters for pp-numbers posed a
serious practical problem, and it did drastically simplify that part of
the preprocessor. The trade-off seemed worthwhile.
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/19/89)
In article <12570031@hpclwjm.HP.COM> walter@hpclwjm.HP.COM (Walter Murray) writes:
>Because <0x7e-getchar> doesn't match any of these, the constraint
>is violated, the program is illegal, and a diagnostic must be produced.

Right, and thus this is not a "quiet change". Fixing the source code
might be a nuisance, but fortunately the cases where it would be
necessary are exceedingly rare.
jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) (11/19/89)
In article <11641@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>pp-numbers work well for the
>intended purpose and cause problems only in very rare circumstances
>and only for programmers with an obsessive aversion to white space.

Uhm, Doug...does this mean that the behavior of a program differs with
the use or non-use of white space? Isn't this different from the rest
of C?
--
Jay Maynard, EMT-P, K5ZC, PP-ASEL   | Never ascribe to malice that which can
jay@splut.conmicro.com (eieio)      | adequately be explained by stupidity.
{attctc,bellcore}!texbell!splut!jay +----------------------------------------
   Shall we try for comp.protocols.tcp-ip.eniac next, Richard? - Brandon Allbery
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/19/89)
In article <3060@splut.conmicro.com> jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) writes:
>Uhm, Doug...does this mean that the behavior of a program differs with
>the use or non-use of white space? Isn't this different from the rest of C?

No, white space has always been significant for preprocessing.
Consider:
	#define foo (void)
	foo bar();
	foobar();
White space is also sometimes significant outside preprocessing:
	a = b / *c;	/* comment */ ;
	a = b/*c;	/* comment */ ;
This was even worse when we had =op assignment operators. The latter
example strikes me as quite analogous to the pp-number situation.
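Doug's second pair compiles as written. The function names in this sketch are hypothetical. The first body divides b by *c. In the second, the characters /* begin a comment that runs to the first */, so the statement reduces to a = b and c is never used. Some compilers may warn about the "/*" inside the comment.

```c
/* Hypothetical wrappers around Doug's two statements. */
int divide_via_pointer(int b, const int *c)
{
    int a;
    a = b / *c;	/* comment */ ;
    return a;                 /* b divided by the int c points to */
}

int comment_eats_division(int b, const int *c)
{
    int a;
    a = b/*c;	/* comment */ ;
    (void)c;                  /* c was swallowed by the comment above */
    return a;                 /* just b */
}
```

With b = 12 and *c = 4, the first returns 3 and the second returns 12.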
scjones@sdrc.UUCP (Larry Jones) (11/20/89)
In article <11645@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
> In article <1653@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes:
> >but it does look like a case of a committee whose majority is
> >vendors (or was during the two years I was there) choosing a behavior
> >which has no benefit other than to simplify the writing of the parser.
>
> There were a significant number of user-oriented X3J11 committee
> members (including myself), and of course most vendor representatives
> also function as C users themselves. We bought into the notion of
> simplicity in this case as being of value to programmers as well as
> implementors. Only ignorant programmers might have a problem, but
> that is true of very many aspects of C; C is not a language for those
> who refuse to learn before doing. We expect that this particular
> quirk will be taught in C textbooks much as the need for () around
> parameters in macro definitions already is taught.

As another user-oriented committee member, I agree with Doug's
assessment: the decision was made to keep the specification simple, not
to make implementers' jobs easier. In fact, a fair number of committee
members found "greedy" pp-numbers to be aesthetically repugnant, and a
few tried to rewrite the spec to make them less so. All of these
attempts were found to be defective in one way or another, although one
was so subtle that it nearly got adopted! In the end, most everyone
agreed that the existing spec does not cause any serious problems, is
simple, and, most importantly, does cover all the desirable cases.
----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@SDRC.UU.NET
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
"You know how Einstein got bad grades as a kid? Well MINE are even WORSE!" -Calvin
scjones@sdrc.UUCP (Larry Jones) (11/20/89)
In article <3060@splut.conmicro.com>, jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) writes:
> In article <11641@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
> >pp-numbers work well for the
> >intended purpose and cause problems only in very rare circumstances
> >and only for programmers with an obsessive aversion to white space.
>
> Uhm, Doug...does this mean that the behavior of a program differs with
> the use or non-use of white space? Isn't this different from the rest of
> C?

Well, not to put words in Doug's mouth ;-), but consider the following:

	i--1	vs	i - -1
	i/*p	vs	i / *p
	i=-1	vs	i = -1	(obsolete)

Although whitespace (or lack thereof) USUALLY doesn't make a difference,
it's not unprecedented.
diamond@csl.sony.co.jp (Norman Diamond) (11/20/89)
In article <1989Nov17.171025.18983@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>2.1.1.2 item 7 yields: "Each preprocessing token is converted into a
>token."

Argh; you're right. OK, next question: if a program violates 2.1.1.2, is
it necessary to issue a warning? I think not; 2.1.1.2 doesn't say
"constraints." Therefore a macro imbedded in a pp-number results in
undefined behavior, and my processor is allowed to expand the macro,
right? (I hope.)
diamond@csl.sony.co.jp (Norman Diamond) (11/20/89)
I posted:
>>I will have to add code to my scanner, and slow it down, so that
>>it will not call the preprocessor if it finds a macro in the middle
>>of a pp-number.

In article <15217@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>Why should it "find" a macro in the middle of a pp-number, any more than it
>should try to expand the substring "getc" in the token "fgetc"?

Because my scanner finds the hex number, and then finds the operator,
and then finds the macro. My scanner does not look for pp-numbers. (When
it calls the preprocessor, the preprocessor looks for pp-numbers
internal to its own operation.)

In a related posting, I have already asked if it is really necessary to
give a warning when a part of the program is processed which depends on
undefined behavior. I guessed no, because 2.1.1.2 (and various other
sections) do not say "constraints." But Mr. Heuer says yes. Why? (I will
have to add code to my scanner, and slow it down, so that it will give a
warning if it finds a macro in the middle of a pp-number.)
diamond@csl.sony.co.jp (Norman Diamond) (11/20/89)
In article <11134@riks.csl.sony.co.jp>, I wrote:
>> in the pp-number 0x7e-getchar() it is illegal for my preprocessor to
>> expand the getchar() macro. This appears
>> to match the committee's intention, is not optional, and is not
>> implementation-defined. Why?

In article <11637@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>The Rationale explains that.

The Rationale explains why the committee did not want to force
preprocessors to do a complete true scan of numerics. Fine. My question
is, why did the committee go to the other extreme and prohibit it? Why
is it not optional, and why is it not implementation-defined? The
Rationale does not explain that.

You say that the committee heard all the arguments? Fine, so what reason
did they give?
diamond@csl.sony.co.jp (Norman Diamond) (11/20/89)
In article <12570031@hpclwjm.HP.COM> walter@hpclwjm.HP.COM (Walter Murray) writes:
[about the conversion of pp-numbers to tokens]
>Isn't this overlooking the constraint in 3.1?

Oh yes! So this issue did make it into a constraint somewhere, and a
diagnostic has to be issued.

Someone should produce a concordance so that for each section that gives
some rules, the reader can find all the other sections scattered
throughout the document that finish giving the rules for exactly the
same conversion (or storage class declaration, or ...). 'xcept, Global
would probably charge $150.00 and refuse to sell it to most of the
globe.
diamond@csl.sony.co.jp (Norman Diamond) (11/20/89)
In article <11641@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>pp-numbers work well for the
>>intended purpose and cause problems only in very rare circumstances
>>and only for programmers with an obsessive aversion to white space.

(Well it also causes a problem for me in scanning ALL source programs,
and I have no aversion to white space. But that's beside the point.)

In article <3060@splut.conmicro.com> jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) writes:
>Uhm, Doug...does this mean that the behavior of a program differs with
>the use or non-use of white space? Isn't this different from the rest of
>C?

I'm not Doug, but this question is very easy to answer.
	a+++b
is different from
	a+ ++b
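The a+++b example follows from the same "longest token first" rule that produces greedy pp-numbers. In this sketch the function and struct names are hypothetical. The first statement is read as a++ + b and the second as a + ++b, so both the sums and the side effects differ.

```c
/* Hypothetical helpers showing maximal munch on '+' characters. */
struct munch_result {
    int sum;  /* value of the expression */
    int a;    /* a after the statement */
    int b;    /* b after the statement */
};

struct munch_result plus_plus_plus(int a, int b)
{
    struct munch_result r;
    r.sum = a+++b;      /* tokenized as  a ++ + b  */
    r.a = a;
    r.b = b;
    return r;
}

struct munch_result plus_space_plus_plus(int a, int b)
{
    struct munch_result r;
    r.sum = a+ ++b;     /* tokenized as  a + ++ b  */
    r.a = a;
    r.b = b;
    return r;
}
```

With a = 1 and b = 10, the first gives a sum of 11 and leaves a at 2. The second gives a sum of 12 and leaves b at 11.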
tom@ssd.harris.com (Tom Horsley) (11/20/89)
>As I recall the committee sentiment, it wasn't felt that this slightly
>over-generous glomming onto source characters for pp-numbers posed a
>serious practical problem, and it did drastically simplify that part
>of the preprocessor. The trade-off seemed worthwhile.

I am sorry, I can't watch this discussion passively anymore. This is
simply wrong. I was one of the first to complain to the committee about
this bug. The reason I noticed it was that I was writing a tokenizing
pre-processor as the standard was under development.

In my implementation, I did not find *ANY* simplification that
pp-numbers provided. When you reach phase 7, you have to have the
ability to lex only legal numbers to determine if the conversion of a
pp-token to a token is correct. By requiring you to match illegal
tokens in early phases, then in a later phase determine if the token is
actually legal, the scanner is considerably *COMPLICATED*, NOT
SIMPLIFIED! There are more states required to recognize gibberish
first, then legal numbers later, than there would have been if you only
had to recognize legal numbers in the first place.

My proposed change to the standard called for a pp-token to be the
longest sequence of characters that would match a valid legal token
prefix (or a single character that does not match any legal token).
This is unambiguously defined in the standard and would have actually
been a simplification, since it would not require a separate definition
of pp-tokens and real tokens.

The committee response to this was that it would allow too much stuff
that appears to be gibberish lexically to actually be a legitimate C
program. I consider this to be the lamest excuse I have ever heard;
after all, when hasn't gibberish been legal C? And it is a particularly
lame excuse when the alternative the committee selected makes code that
looks like perfectly ordinary (formerly) legal C, illegal instead.
If the committee wants to justify this by saying that they were in a
hurry to get the standard out, that they didn't notice the problem with
pp-numbers until it was too late, that they would have had to do another
round of public review, delaying the standard again, and that they
didn't think the problem was serious enough to take that hit, then I
might agree. But for God's sake, *DON'T* try to claim that it simplifies
things...

(Of course the standard wound up being delayed by stupidity anyway, but
that's another story...)
--
=====================================================================
domain: tahorsley@ssd.csd.harris.com  USMail: Tom Horsley
  uucp: ...!novavax!hcx1!tahorsley            511 Kingbird Circle
    or  ...!uunet!hcx1!tahorsley              Delray Beach, FL 33444
======================== Aging: Just say no! ========================
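Tom's alternative rule, which takes the longest sequence that is still a prefix of some valid token, can be sketched for numeric constants. This is a hypothetical, simplified illustration rather than his implementation. number_prefix_length and int_suffix_length are invented names. Only C89 decimal, octal and hex integers and decimal floating constants are considered. An incomplete prefix such as 1e with no exponent digits would still have to be diagnosed when it is converted in phase 7.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: length of an integer suffix (u, l, ul, lu; any case). */
static size_t int_suffix_length(const char *s)
{
    size_t n = 0;
    int u = 0, l = 0;
    for (;;) {
        if ((s[n] == 'u' || s[n] == 'U') && !u) { u = 1; n++; }
        else if ((s[n] == 'l' || s[n] == 'L') && !l) { l = 1; n++; }
        else return n;
    }
}

/* Hypothetical sketch of "longest valid-token prefix" scanning for
   numbers: stop as soon as the next character could not extend any
   C89 numeric constant. */
size_t number_prefix_length(const char *s)
{
    size_t n = 0;
    int is_float = 0;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        n = 2;
        while (isxdigit((unsigned char)s[n]))
            n++;
        return n + int_suffix_length(s + n);   /* hex: no exponent */
    }
    if (!isdigit((unsigned char)s[0]) &&
        !(s[0] == '.' && isdigit((unsigned char)s[1])))
        return 0;

    while (isdigit((unsigned char)s[n]))
        n++;
    if (s[n] == '.') {                         /* fraction part */
        is_float = 1;
        n++;
        while (isdigit((unsigned char)s[n]))
            n++;
    }
    if (s[n] == 'e' || s[n] == 'E') {          /* exponent part */
        is_float = 1;
        n++;
        if (s[n] == '+' || s[n] == '-')
            n++;
        while (isdigit((unsigned char)s[n]))
            n++;
    }
    if (is_float)                              /* one optional f/F/l/L */
        return n + ((s[n] == 'f' || s[n] == 'F' ||
                     s[n] == 'l' || s[n] == 'L') ? 1 : 0);
    return n + int_suffix_length(s + n);
}
```

Under this rule, 0x7e-getchar() would scan as 0x7e, -, getchar, (, and ). getchar would then be an ordinary identifier, available for macro replacement.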
peter@ficc.uu.net (Peter da Silva) (11/22/89)
Someone remind me of the motivation for a tokenising preprocessor
instead of the old text-based preprocessor again. It's sounding more
and more like a whole lot of work for little gain...

(maybe it's time to dust `m4' off again)
--
`-_-' Peter da Silva <peter@ficc.uu.net> <peter@sugar.lonestar.org>.
 'U`  --------------  +1 713 274 5180.
"I agree 0bNNNNN would have been nice, however, and I sure wish X3J11 had
taken time off from rabbinical hairsplitting to add it." -- Tom Neff <tneff@bfmny0>
martin@mwtech.UUCP (Martin Weitzel) (11/23/89)
In article <7076@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
[lines deleted]
>(maybe it's time to dust `m4' off again)

Agreed! I'm teaching courses on C, and following this discussion I'll
strongly recommend not doing anything too tricky with the preprocessor
if you want portable code. Rather, use a tool which is more under your
control (sed, m4, awk, ...).

Why not start a newsgroup 'comp.lang.m4'?