leendert@cs.vu.nl (Leendert van Doorn) (12/22/88)
In article <11262@haddock.ima.isc.com> karl@haddock.ima.isc.com writes:
>>anything using @ or $ (since they're not part of the C character set)
>
>That's what I thought, too, until I noticed that 3.1.7 in the May88 dpANS
>contains an example wherein @ and $ are scanned as separate preprocessing
>tokens. The accompanying text does not mention whether or not this behavior
>is required of a conforming implementation.

This example is valid. Hence the behaviour is required of a conforming
implementation. This has to do with section 2.1.1.2 (phases of translation).
In phase 3 the source file is decomposed into preprocessing tokens, and in
phase 7 the preprocessor tokens are converted into (normal) tokens. This
allows the @ and $ characters to be part of the preprocessor token set, but
not part of the (normal) token set.

However, nowhere in the standard is the conversion of preprocessor tokens
to (normal) tokens described. This is an issue that should be clarified.

In the compiler I wrote, the lexical analyser simply breaks up the input into
preprocessor tokens, and these go (without any conversion) into the
compilation process. The latter filters out illegal things like $ and @
(the parser chokes and starts up an error recovery routine).
--
Leendert P. van Doorn <leendert@cs.vu.nl>
Vrije Universiteit / Dept. of Maths. & Comp. Sc.
De Boelelaan 1081
1081 HV Amsterdam / The Netherlands
tel. +31 20 548 5302
gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/27/88)
In article <1844@zell.cs.vu.nl> leendert@cs.vu.nl (Leendert van Doorn) writes:
>However, nowhere in the standard is the conversion of preprocessor tokens
>to (normal) tokens described. This is an issue that should be clarified.

That too is covered under Phases of Translation, in recent drafts.
karl@haddock.ima.isc.com (Karl Heuer) (01/05/89)
Let's see if I've got this straight yet.

o `$' is required to scan as a separate pp-token, despite existing practice
  making it an optional identifier-character.

o When converting pp-tokens to tokens, an implementation is free to merge
  {foo}{$}{bar} into a single token {foo$bar}. (I'm guessing on this one.)

o But, since macro expansion happens first, it is {foo}, and not {foo$bar},
  that is subject to macro replacement, even if the above is true.

o Hence, certain features of DEC and APOLLO implementations cannot be
  conforming.

o DEC and APOLLO, through their representatives on X3J11, are aware of the
  above and accept it. Their ANSI C implementations, if any, will not use
  `$' in identifiers.

o Non-English letters, which are clearly not usable in a strictly conforming
  program, are in fact not usable in *any* conforming program, for the same
  reasons that apply to `$'.

o The international community is aware of this and accepts it.

How much of the above is correct?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
blarson@skat.usc.edu (Bob Larson) (01/05/89)
In article <11343@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
[discussion of using $ in identifiers]
>o DEC and APOLLO,

Prime should be added to the list of compiler vendors who use (and require
in their non-portable libraries) $ in identifiers.
--
Bob Larson      Arpa: Blarson@Ecla.Usc.Edu      blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list: info-prime-request%ais1@ecla.usc.edu
                    oberon!ais1!info-prime-request
leendert@cs.vu.nl (Leendert van Doorn) (01/05/89)
The following comments are based on the X3J11/88-090 (May 88) version of the
dpANS report. In a couple of days I'll get the latest version, but for now
it will do.

In article <11343@haddock.ima.isc.com> karl@haddock.ima.isc.com writes:
> Let's see if I've got this straight yet.
>
>o `$' is required to scan as a separate pp-token, despite existing practice
> making it an optional identifier-character.

Yes. The syntax of an identifier is (par. 3.1.2):

    identifier: nondigit
              | identifier nondigit
              | identifier digit
              ;
    nondigit:   "_[a-z][A-Z]"
    digit:      "0-9"

Whether the '$' should be scanned as a separate pp-token depends on the
source character set.

>o When converting pp-tokens to tokens, an implementation is free to merge
> {foo}{$}{bar} into a single token {foo$bar}. (I'm guessing on this one.)

No, in this conversion the '$' is a garbage character. So what you get is
{foo} <ERROR> {bar} (the $ character is not part of the non-terminal
identifier, see above).

>o But, since macro expansion happens first, it is {foo}, and not {foo$bar},
> that is subject to macro replacement, even if the above is true.

{foo$bar} can never be subject to any macro replacement, since it's not an
identifier (see 3.8.3).

>o Hence, certain features of DEC and APOLLO implementations cannot be
> conforming.

I don't know about DEC or APOLLO, but if they allow things like those
described above, their implementations are not strictly conforming (perhaps
there is a flag like -pedantic, as with the GNU C compiler?).

>o DEC and APOLLO, through their representatives on X3J11, are aware of the
> above and accept it. Their ANSI C implementations, if any, will not use
> `$' in identifiers.

Depends on their policy. They are free to add features. Perhaps they will
make a flag (if $ is the only nonconforming aspect).

>o Non-English letters, which are clearly not usable in a strictly conforming
> program, are in fact not usable in *any* conforming program, for the same
> reasons that apply to `$'.

The basic source set, the set in which source files are written, does not
contain $, umlaut, accent grave, etc. Strings, however, may contain these
characters (depending on the size of the character representation you could
use single or multibyte character strings).

>o The international community is aware of this and accepts it.

Yep, why not?

BTW: Best wishes for 1989. "Hope it's a good one"
--
Leendert P. van Doorn <leendert@cs.vu.nl>
Vrije Universiteit / Dept. of Maths. & Comp. Sc.
De Boelelaan 1081
1081 HV Amsterdam / The Netherlands
tel. +31 20 548 5302
scjones@sdrc.UUCP (Larry Jones) (01/06/89)
In article <11343@haddock.ima.isc.com>, karl@haddock.ima.isc.com (Karl Heuer) writes:
> Let's see if I've got this straight yet.
>
> o `$' is required to scan as a separate pp-token, despite existing practice
> making it an optional identifier-character.

I don't believe that '$' is required to scan as anything. Since it is not
in the C source character set, a conforming compiler is under no obligation
to do anything in particular with it and so is at liberty to do anything at
all with it. If an implementation chooses to allow it in identifiers,
that's fine (although it should diagnose the syntax violation, perhaps by
congratulating you for seeing the value of using names containing dollar
signs).
----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@sdrc.uucp
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150                  AT&T: (513) 576-2070
"Save the Quayles" - Mark Russell
karl@haddock.ima.isc.com (Karl Heuer) (01/11/89)
In article <1858@zell.cs.vu.nl> leendert@cs.vu.nl () writes:
>In article <11343@haddock.ima.isc.com> karl@haddock.ima.isc.com writes:
>> Let's see if I've got this straight yet.
>>
>>o `$' is required to scan as a separate pp-token, despite existing practice
>> making it an optional identifier-character.
>
>Yes. The syntax of an identifier is [the pattern /[_a-zA-Z][_a-zA-Z0-9]*/].
>
>Whether the '$' should be scanned as a separate pp-token depends on the source
>character set.

In the environment I'm thinking of, `$' should be legal in strings (where it
represents the same symbol in the execution character set), hence it must be
a member of the source character set, and by 3.1 it scans as a pp-token.

>>o Hence, certain features of DEC and APOLLO implementations cannot be
>> conforming.
>
>I don't know about DEC or APOLLO, but if they allow things like those
>described above, their implementations are not strictly conforming (perhaps
>there is a flag like -pedantic, as with the GNU C compiler?).

`Strictly conforming' is an attribute of programs, not implementations. An
implementation is either ANSI C, or it isn't. According to the rules,
accepting `$' in an identifier seems to yield a non-ANSI implementation.

>>o DEC and APOLLO, through their representatives on X3J11, are aware of the
>> above and accept it. Their ANSI C implementations, if any, will not use
>> `$' in identifiers.
>
>Depends on their policy. They are free to add features. Perhaps they will
>make a flag (if $ is the only nonconforming aspect).

Hmm, assuming they do, I wonder if they'll follow Doug's suggestion of
turning off __STDC__ whenever `$' is enabled.

>>o Non-English letters, which are clearly not usable in a strictly conforming
>> program, are in fact not usable in *any* conforming program, for the same
>> reasons that apply to `$'.
>
>The basic source set, the set in which source files are written, does not
>contain $, umlaut, accent grave, etc. Strings, however, may contain these
>characters (depending on the size of the character representation you could
>use single or multibyte character strings).

The source character set is used both inside and outside of string literals;
those characters within string literals (or character constants) are mapped
to the execution character set as they are tokenized. For the purposes of
this discussion, I'm assuming that the source and execution character sets
are identical, and that they contain `$' and/or non-English letters in
addition to the minimal character set of 2.2.1.

>>o The international community is aware of this and accepts it.
>
>Yep, why not?

Because the users can't use their native languages to name their variables.
Doesn't it bother you that you can't have a variable named `IJspret' with a
proper ligature instead of separate letters? It bothers me, and I don't even
have any plans to use such a feature.

(Actually, the problem occurs even in English; I once had a set of constants
named DONT_xxx to selectively suppress individual features of a large system.
I didn't worry about the lack of an apostrophe, because (a) there's nothing
to be done about it, since the symbol is already in use, and (b) the meaning
was clear without it. The correct use of the apostrophe seems to be
declining in American English anyway. But that's a topic for a different
group.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
henry@utzoo.uucp (Henry Spencer) (01/17/89)
In article <11383@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>`Strictly conforming' is an attribute of programs, not implementations. An
>implementation is either ANSI C, or it isn't. According to the rules,
>accepting `$' in an identifier seems to yield a non-ANSI implementation.

Only if it is not diagnosed (e.g. by a warning message). I'm getting a
bit tired of repeating this: accepting extensions does not make a compiler
non-conforming. The requirements for a conforming implementation are that
it handle all strictly conforming programs correctly, and that it diagnose
(not necessarily reject, just diagnose) any construct which is illegal
according to the standard.

Actually, the character-set issue may be even less severe than this,
depending on how the wording goes, but my copy of the October draft went
out on a few days' loan a month ago (sigh) and isn't back yet, so I can't
check the fine print just now.
--
"God willing, we will return."  | Henry Spencer at U of Toronto Zoology
-Eugene Cernan, the Moon, 1972  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
scjones@sdrc.UUCP (Larry Jones) (01/19/89)
In article <1989Jan16.204214.15979@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> In article <11383@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
> >`Strictly conforming' is an attribute of programs, not implementations. An
> >implementation is either ANSI C, or it isn't. According to the rules,
> >accepting `$' in an identifier seems to yield a non-ANSI implementation.
>
> Only if it is not diagnosed (e.g. by a warning message). I'm getting a
> bit tired of repeating this: accepting extensions does not make a compiler
> non-conforming. The requirements for a conforming implementation are that
> it handle all strictly conforming programs correctly, and that it diagnose
> (not necessarily reject, just diagnose) any construct which is illegal
> according to the standard.

That's what I thought, too. But Karl pointed out to me that it is
possible to write a strictly conforming program that will NOT be
interpreted correctly by an implementation that allows '$' in
identifiers. All you need do is something like:

    #define foo$bar
    #ifdef foo
    . . .
    #endif

The standard requires the #ifdef to be true (the directive defines foo,
with replacement text $bar), but any implementation that allows '$' in an
identifier will define foo$bar instead and evaluate the #ifdef as false.
----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@sdrc.UU.NET
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150                  AT&T: (513) 576-2070
"When all else fails, read the directions."
gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/19/89)
In article <504@sdrc.UUCP> scjones@sdrc.UUCP (Larry Jones) writes:
>That's what I thought, too. But Karl pointed out to me that it is
>possible to write a strictly conforming program that will NOT
>be interpreted correctly by an implementation that allows '$' in
>identifiers.

No, it isn't. Use of the $ character in an identifier produces
"undefined behavior". The implementation is free to treat $ like _
in identifiers, because that cannot affect translation of any strictly
conforming program.
henry@utzoo.uucp (Henry Spencer) (01/21/89)
In article <9438@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>That's what I thought, too. But Karl pointed out to me that it is
>>possible to write a strictly conforming program that will NOT
>>be interpreted correctly by an implementation that allows '$' in
>>identifiers.
>
>No, it isn't. Use of the $ character in an identifier produces
>"undefined behavior"...

Doug, can you cite chapter and verse for this? Remember that preprocessor
tokens which are never converted to tokens are one of the exemptions from
the rule in 2.2.1.

After some study of the matter, I'm afraid my tentative conclusion is that
when a funny character disappears before pptoken->token conversion time,
the Oct draft is not entirely clear about whether its use is undefined,
implementation-defined, or neither. In reality it must be considered to
be at least implementation-defined, since it may not even exist in the
source character set on some weird system, but I cannot find explicit
words to that effect. One would actually prefer that it be undefined,
but I doubt that you can do that without making it difficult to have
funny characters in sections of code that are #ifdefed out, and it is
highly desirable that *that* be legitimate.

A.6.3.4 thinks that 2.2.1 says that any extra members of either character
set are implementation-defined, but those words are not found in 2.2.1.

I think the right approach would be to tighten the "preprocessor token"
exemption in 2.2.1 so that it refers only to the none-of-the-above
single-character preprocessor tokens, but it's too late.

In practice one can argue that the behavior is undefined under the
"anything not mentioned is undefined" rule in 1.6, but this is not really
entirely satisfactory.
--
Allegedly heard aboard Mir: "A  | Henry Spencer at U of Toronto Zoology
toast to comrade Van Allen!!"   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
scjones@sdrc.UUCP (Larry Jones) (01/21/89)
In article <9438@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <504@sdrc.UUCP> scjones@sdrc.UUCP (Larry Jones) writes:
> >That's what I thought, too. But Karl pointed out to me that it is
> >possible to write a strictly conforming program that will NOT
> >be interpreted correctly by an implementation that allows '$' in
> >identifiers.
>
> No, it isn't. Use of the $ character in an identifier produces
> "undefined behavior". The implementation is free to treat $ like
> _ in identifiers, because that cannot affect translation of any
> strictly conforming program.

But the critical point is that the $ character ISN'T in an
identifier if the implementation is conforming: foo$bar gets
parsed as three tokens just like foo+bar would. As long as the $
doesn't make it past the preprocessor phases of translation, I
don't see anything in the standard that makes the program
non-conforming, and that makes any implementation that allows $ in
identifiers non-conforming, since such implementations do not parse
the program correctly and thus do not translate it correctly.

Please take another look at my (well, actually Karl's) example.
----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@sdrc.UU.NET
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150                  AT&T: (513) 576-2070
"When all else fails, read the directions."
gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/21/89)
In article <511@sdrc.UUCP> scjones@sdrc.UUCP (Larry Jones) writes:
>But the critical point is that the $ character ISN'T in an
>identifier if the implementation is conforming: foo$bar gets
>parsed as three tokens just like foo+bar would.

It's still the case that $ is not going to appear in foo$bar
context in a strictly conforming application. I think the problem
you have in mind is that foo$bar leads to surprises if foo or bar
is a macro, just as use of EGAD when <errno.h> is included can
lead to surprises.

Perhaps the best way to implement extended identifier character
sets would be with a non-conforming mode flag to the compiler to
enable such an extension.

I can see serious problems with use of non-Roman characters in
foreign-language contexts. What did we respond to the Japanese
comment about this?
gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/21/89)
In article <1989Jan20.175532.7447@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>No, it isn't. Use of the $ character in an identifier produces
>>"undefined behavior"...
>Doug, can you cite chapter and verse for this?

I was concerned only with identifiers that made it through the
preprocessing phase, since that is the situation I'm familiar with
where $ in identifiers really is wanted in some implementations.

Obviously, I don't recommend using $ in identifiers unless you HAVE to.
henry@utzoo.uucp (Henry Spencer) (01/24/89)
In article <9470@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>It's still the case that $ is not going to appear in foo$bar
>context in a strictly conforming application...

I repeat a previous comment: as far as I can tell, there is no rule saying
that a strictly conforming program can't use $ in this context, provided
that it disappears (or hides inside a string or whatever) before the end
of preprocessing. I do believe that appearance of such a character in a
context where it isn't being ignored *ought* to make a program
non-strictly-conforming, but I cannot find anything in the Oct draft that
*says* this.
--
Allegedly heard aboard Mir: "A  | Henry Spencer at U of Toronto Zoology
toast to comrade Van Allen!!"   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu