hansen@pegasus.UUCP (09/06/84)
One of the minor(?) changes being proposed for the C standard is a change in the C Pre-Processor to change it from a totally character oriented processor closer to a token processor. It already does this to a certain degree by recognizing comments as separate tokens that don't get scanned for text to be replaced. However, this idea is being extended to include strings and character constants as tokens that don't get scanned for replacement text. The idea is to prevent bugs similar to the following:

    #define foo(d,g) printf("%d,%d", d, g)

This would expand

    foo(f,e);

to

    printf("%f,%f", f, e);

and suddenly your integer variables f and e would be getting printed out in %f format rather than %d format. Under the new rules, the expansion would be:

    printf("%d,%d", f, e);

This certainly solves the above problem. However, I have seen plenty of programs which use some of the following constructs:

    #define libpath(x) "/usr/lib/x"
    #define CTRL(x) ('x'&037)
    #define PRINT1(format,arg) printf("arg=%format.\n", arg);

A common place to find the libpath construct is in uparm.h used by (among many others) the vi, curses, terminfo and termcap packages on both System Vr2 machines and 32V/BSD machines. I don't know of any system code that depends on the CTRL example, but I know of a number of people who have used it in the past. Those of you who have read the C Puzzle Book will realize that NONE of Alan Feuer's programs will work anymore!

The questions are: Should this change be endorsed? If so, what should be done to bring back the lost functionality? If not, how would you make CPP more regular in its scanning rather than that which is the de-facto standard from Reiser?

One possibility would be to introduce a new construct #sdefine which has the special property that strings and character constants would also be scanned for replacement; otherwise it would be identical to #define. The programs which use the above constructs would have to be changed to use #sdefine instead of #define, but no other changes would have to be made.

Without something to replace the lost functionality, I feel that the number of programs which would have to be changed would be major.

Tony Hansen
pegasus!hansen
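For comparison, the following is a minimal sketch of how the foo macro behaves under a preprocessor that treats string literals as opaque tokens, which is what the ANSI standard eventually required. The buffer and the helper name format_pair are assumptions added here so the result can be inspected; they do not appear in the original post, which called printf directly.

```c
#include <stdio.h>

/* Hansen's macro, redirected into a buffer (an assumption made for this
 * sketch; the original expanded to a plain printf call). */
#define foo(d, g) snprintf(buf, sizeof buf, "%d,%d", d, g)

/* Hypothetical helper: formats two ints named f and e, as in the post. */
static const char *format_pair(int f, int e)
{
    static char buf[32];
    foo(f, e);   /* the literal "%d,%d" is left intact, not turned into "%f,%f" */
    return buf;
}
```

Calling format_pair(3, 4) yields the text 3,4 because the parameter names d and g are only replaced where they appear as tokens outside the string.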
hamilton@uiucuxc.UUCP (09/08/84)
#R:pegasus:-169100:uiucuxc:21000013:000:211
uiucuxc!hamilton Sep 7 21:03:00 1984

c'mon, is it all that hard to avoid using "d" as a formal parameter for your "foo" macro? why break so many working programs to "fix" such a nonproblem?

wayne ({decvax,ucbvax}!pur-ee!uiucdcs!uiucuxc!)hamilton
henry@utzoo.UUCP (Henry Spencer) (09/18/84)
> ............ However, this idea is being extended to include strings and
> character constants as tokens that don't get scanned for replacement text.

K+R, section 12.1: "Text inside a string or a character constant is not subject to replacement." In other words, this is not something new: the language has always been specified to behave that way.

> ........................................ However, I have seen plenty of
> programs which use some of the following constructs:
>
> #define libpath(x) "/usr/lib/x"
> #define CTRL(x) ('x'&037)
> #define PRINT1(format,arg) printf("arg=%format.\n", arg);

Such programs are broken and unportable. Most non-Unix C compilers have been implemented "by the book", which means that none of the above things will work on them unless the implementors had a lot of Unix experience or had a Unix system to compare against.

> The questions are: Should this change be endorsed? If so, what should be
> done to bring back the lost functionality? If not, how would you make CPP
> more regular in its scanning rather than that which is the de-facto standard
> from Reiser?

Of course it should be endorsed, since it's not really a change at all. The standard is the documentation, not Reiser's code.

As for what should be done to bring back the lost functionality... the ANSI C folks have basically said "if you want a general-purpose macro processor, use m4". The programs that this "change" will break are broken already, and should be fixed to do it right.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
kpmartin@watmath.UUCP (Kevin Martin) (09/22/84)
>> ........................................ However, I have seen plenty of
>> programs which use some of the following constructs:
>>
>> #define libpath(x) "/usr/lib/x"
>> #define CTRL(x) ('x'&037)
>> #define PRINT1(format,arg) printf("arg=%format.\n", arg);
>Such programs are broken and unportable.
>
>> The questions are: Should this change be endorsed? If so, what should be
>> done to bring back the lost functionality? If not, how would you make CPP
>> more regular in its scanning rather than that which is the de-facto standard
>> from Reiser?
>Of course it should be endorsed, since it's not really a change at all.
> Henry Spencer @ U of Toronto Zoology

For a change, I disagree with Henry. However, there are two questions here, and I am not sure everyone is making the distinction:

1) Should strings be scanned for token replacement (i.e. look for #define'd names and replace them with their expansion)? (I call this "token replacement" or "macro substitution")

2) When a #define'd token is being inserted, and its expansion contains a string, should that string be scanned for formal parameters to the macro? (I call this "parameter substitution")

It is fairly evident that the answer to (1) is NO. Otherwise, no string would be safe. You couldn't have the name 'putc' in a string, for instance.

I think the answer to (2) is YES. It is often useful to have the formal parameters substituted into string or character constants, and it is not only possible but EASY for the programmer to avoid using any formal parameters which match tokens in any string in the expansion. e.g. it is easy to avoid

    #define f(d,x) printf( "%d %d", d, x )

The borderline between (1) and (2) is the size of the area which must be examined for conflicting identifiers. For (1), you must check every include file (and the source file up to the occurrence of the string in question). For (2), you only have to check that the formal parameters don't clash. And correcting clashes is far easier for (2) than for (1).

Kevin Martin, UofW Software Development Group
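The "parameter substitution" question in (2) can be made concrete with a small sketch. The function below is hypothetical and is not taken from Reiser's cpp or any other real preprocessor: it replaces one formal parameter in a macro body, and the scan_literals flag (an assumed name) selects between copying string and character constants verbatim and scanning inside them. It assumes param is a valid C identifier.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `body` into `out`, replacing each identifier
 * equal to `param` with `arg`. With scan_literals == 0, string and character
 * constants are copied untouched; otherwise their contents are scanned too.
 * Returns 0 on success, -1 if `out` is too small. */
static int substitute(const char *body, const char *param, const char *arg,
                      int scan_literals, char *out, size_t outsz)
{
    size_t n = 0, plen = strlen(param), alen = strlen(arg);
    const char *p = body;

    if (outsz == 0)
        return -1;
    while (*p) {
        const char *q = p + 1;      /* default: a one-character token */
        const char *src = p;
        size_t len;

        if (!scan_literals && (*p == '"' || *p == '\'')) {
            /* Treat the whole literal as one token, honouring escapes. */
            while (*q && *q != *p) {
                if (*q == '\\' && q[1])
                    q++;
                q++;
            }
            if (*q)
                q++;                /* include the closing quote */
        } else if (isalnum((unsigned char)*p) || *p == '_') {
            /* An identifier or a number: take the whole run. */
            while (isalnum((unsigned char)*q) || *q == '_')
                q++;
        }
        len = (size_t)(q - p);
        if (len == plen && memcmp(p, param, plen) == 0) {
            src = arg;              /* the token is the formal parameter */
            len = alen;
        }
        if (n + len >= outsz)
            return -1;
        memcpy(out + n, src, len);
        n += len;
        p = q;
    }
    out[n] = '\0';
    return 0;
}
```

With the body printf("%d,%d", d, g), parameter d and argument f, a nonzero scan_literals rewrites the format string as well, while zero leaves it alone. A real preprocessor substitutes all parameters in one pass; this sketch handles one at a time to stay short.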
joemu@tekecs.UUCP (Joe Mueller) (09/27/84)
> As for what should be done to bring back the lost functionality... the
> ANSI C folks have basically said "if you want a general-purpose macro
> processor, use m4". The programs that this "change" will break are
> broken already, and should be fixed to do it right.

As Henry stated, the X3J11 committee (ANSI C) felt that the preprocessor was not intended to be a general purpose macro processor, BUT, we did acknowledge that there was a large body of code that used these types of "features". The committee is currently considering proposals for

a) token concatenation operations within the preprocessor. It will definitely NOT be startoftoken/**/argument. Currently it looks like the # will be used like this: startoftoken#argument. I don't believe we have definitely decided the syntax for the operation. I think that the committee did decide that the functionality was needed.

b) "stringizing" (I didn't make up this term, someone else did) arguments is also under consideration. One proposal is to do the substitution if the argument name is the only thing within the quotes. i.e.
    #define foo(bar) printf("bar")
will expand bar within the quotes where
    #define foo(bar) printf("the argument was bar")
will not expand bar.

The committee is not as dogmatic as it sounds on the net. It is our intention to produce a standard that will allow someone to do serious work without resorting to non-portable extensions. Please continue to discuss concerns about the developing standard on the net. I know several committee members (including myself) read it regularly. If you have alternate proposals for the machinery to do the above operations, let me know.
mwm@ea.UUCP (09/28/84)
/***** ea:net.lang.c / utzoo!henry / 4:20 am Sep 25, 1984 */
As for what should be done to bring back the lost functionality... the
ANSI C folks have basically said "if you want a general-purpose macro
processor, use m4". The programs that this "change" will break are
broken already, and should be fixed to do it right.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
/* ---------- */

Great. Another tool that's nearly vital for writing C, but not available on most (all) non-Unix systems. Anybody got pointers to a public domain m4?

<mike
kre@mulga.OZ (Robert Elz) (09/29/84)
From Henry Spencer:
|
| > ............ However, this idea is being extended to include strings and
| > character constants as tokens that don't get scanned for replacement text.
|
| K+R, section 12.1: "Text inside a string or a character constant is
| not subject to replacement." In other words, this is not something new:
| the language has always been specified to behave that way.

I think it is instructional to consider the wording of the proposed (draft) standard. [This is from the July version, I doubt that it's changed in the Sept one].

    Sect 9.2: ..... Character constants and strings in the token sequence or in the rest of the program are not scanned for defined identifiers or formal parameters. ....

Now consider the wording in the April version (it was sect 9.1 then)

    Sect 9.1: ..... Character strings in the token sequence or in the rest of the program are not scanned for defined identifiers. ....

Note the difference. K&R was never clear on this point - its wording on this point (and others) was ambiguous. That is, a perfectly viable interpretation, taken by Reiser, was that strings in the token sequence could be scanned for parameters.

There are (as has been pointed out many times) many reasons for allowing this. The ONLY one for denying it, that I can see, is that some people get confused (don't understand what's happening). The right way to solve that problem is to clearly document what happens - no-one will have any problems with it if it's made clear what will happen.

Henry continues:
|
| > The questions are: Should this change be endorsed?
|
| Of course it should be endorsed, since it's not really a change at all.
| The standard is the documentation, not Reiser's code.

The problem is that K&R is *not* a standard. If it was, we wouldn't need X3J11. In the absence of a standard, and in the presence of ambiguous documentation, the only place to look is in the implementations.
Henry also stated (quote omitted) that most non-Unix C compilers adopted the restrictive approach. So, now we have a conflict - no immediate practical reason (in terms of broken code) for jumping one way or the other. In short, nearly the ideal situation for adopting the best solution.

If C were a language for amateur programmers, beginners, etc, I would tend to favour the restricted approach. But that's not what C is. It's a dangerous language, filled with dangerous features. It's for professionals. We should adopt the most useful approach - the one that gives the greatest power to the programmer - that is clearly the liberal approach.

Pragmatically too, it will be much easier to convert programs broken by this strategy (those in which macro replacement text contains strings containing "accidental" references to parameters) than those broken by the current draft proposed standard (those that use replacement inside strings to good effect). In the former case, all that needs to be done is to rename the formal parameter. In the latter, some whole new mechanism needs to be devised - possibly requiring changes in the source. I also suspect that fewer programs would be broken by the former.

Henry again:
|
| As for what should be done to bring back the lost functionality... the
| ANSI C folks have basically said "if you want a general-purpose macro
| processor, use m4". The programs that this "change" will break are
| broken already, and should be fixed to do it right.

No-one is asking for a full blown macro processor, just that subset that is really useful for C programs. If the committee were to take the "use m4" attitude, they would logically have to standardize m4 as a (possibly optional) part of the C compiler. Otherwise all those programs that go to the trouble of adopting their recommendation, and use m4, will stop being portable, which can hardly be the aim.
Joe Mueller replied:
|
| As Henry stated, the X3J11 committee (ANSI C), felt that the preprocessor
| was not intended to be a general purpose macro processor, BUT, we did
| acknowledge that there was a large body of code that used these types
| of "features". The committee is currently considering proposals for
|
| a) token concatenation operations within the preprocessor. It will
| definitely NOT be startoftoken/**/argument. Currently it looks like
| the # will be used like this: startoftoken#argument. I don't believe
| we have definitely decided the syntax for the operation. I think that
| the committee did decide that the functionality was needed.

I agree that this is needed - while I regret the need to alter some of my source (I am a xxx/**/yyy user) I admit that this is a revolting way of forming tokens, something better, anything better, would be welcome. [No, please don't tell me about your favourite revolting way of avoiding xxx/**/yyy, I've seen most of them, none of the existing ones is clearly better.] The '#' operator proposal looks reasonable to me.

When you're considering this, please also remember to do something about the problems of blanks in the actual parameter strings - are they significant, or not? That is, spaces between the preceding comma or '(' and the start of the replacement text, and blanks after the text before the ')' or next comma. I would prefer that the standard make it clear that these should not be included as part of the replacement text.

Joe:
|
| b) "stringizing" (I didn't make up this term, someone else did) arguments
| is also under consideration. One proposal is to do the substitution
| if the argument name is the only thing within the quotes. i.e.
| #define foo(bar) printf("bar")
| will expand bar within the quotes where
| #define foo(bar) printf("the argument was bar")
| will not expand bar.

Ugh! How could you justify that!
I appreciate that, combined with constant string concatenation, it would give all the functionality that is needed - the second example could be rephrased as:

    #define foo(bar) printf("the argument was ""bar")

but that's going to be a nasty distinction to try to explain to anyone. And that would break ALL existing implementations.

Seems to me that in this case, adopting the Reiser interpretation is the better thing to do. Document it clearly, so people aren't trapped, and that should end the problems.

Robert Elz
decvax!mulga!kre
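On the question of blanks in actual parameters: the standard as eventually published answers it for the stringizing operator. White space before the first token and after the last token of an argument is deleted, and each run of white space between tokens becomes a single space. A minimal sketch follows; the macro name STR and the two array names are assumptions made for this illustration.

```c
/* Stringizing with the standard '#' operator. */
#define STR(x) #x

/* Leading and trailing blanks vanish; interior runs collapse to one space. */
static const char padded[] = STR(   a   +   b   );   /* "a + b" */

/* No blanks in the argument means none in the resulting string. */
static const char tight[]  = STR(a+b);               /* "a+b" */
```

So blanks around an argument are not significant, but the presence or absence of blanks between its tokens is preserved in normalized form.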
henry@utzoo.UUCP (Henry Spencer) (09/30/84)
> ............................ One proposal is to do the substitution
> if the argument name is the only thing within the quotes. i.e.
> #define foo(bar) printf("bar")
> will expand bar within the quotes where
> #define foo(bar) printf("the argument was bar")
> will not expand bar.

Some folks may not understand why the approach Joe describes is a full solution to "stringizing". Remember that the draft standard specifies that consecutive string constants are concatenated at compile time, so you could say something like

    #define foo(bar) printf("the argument was " "bar")

to get the effect of substitution within a string.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
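Concatenation of adjacent string constants did become part of the published standard, although the whole-string substitution rule did not; the # operator took its place. The sketch below shows the concatenation on its own and then combined with #. The names joined and foo_text are assumptions for this illustration, and foo_text returns the string rather than calling printf so that the result is easy to inspect.

```c
/* Adjacent string literals are joined into one at compile time. */
static const char joined[] = "the argument was " "bar";

/* The same idea with a stringized macro argument in place of "bar". */
#define foo_text(bar) "the argument was " #bar
```

With these definitions, joined holds "the argument was bar" and foo_text(x + 1) expands to the single string "the argument was x + 1".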
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (09/30/84)
Hurray for Robert Elz!
henry@utzoo.UUCP (Henry Spencer) (10/02/84)
> .................... K&R was never clear on this point - its
> wording on this point (and others) was ambiguous. That is,
> a perfectly viable interpretation, taken by Reiser, was that
> strings in the token sequence could be scanned for parameters.

Point taken, K&R was never entirely clear on this, but it *sounded* clear enough that a lot of people just assumed "no substitution inside strings". Including a lot of implementors.

> The problem is that K&R is *not* a standard. If it was, we wouldn't
> need X3J11...

On the contrary, ask most C compiler implementors outside Bell and they will tell you that K&R is the standard they worked from. It is true that K&R is not precise enough or complete enough to be an ANSI-quality standard, but anyone who denies that K&R has been a *de facto* standard for quite some time is kidding himself. A poor one, yes, but a standard.

> No-one is asking for a full blown macro processor, just that subset
> that is really useful for C programs...

The trouble is that "really useful" is a subjective judgement. My personal view is that both token concatenation and in-string substitution are useless junk. I am quite aware that other people feel otherwise. Anyone who has looked at the implementation of the S statistics language has some idea of just how far "really useful" can be pushed. (Much of the S stuff needs *several passes* through m4 before compilation!)

In practice, one has to draw the line somewhere. The question is not "is this feature useful?" but "is this feature useful *enough* to force *everyone* to implement it?". Saying "if you want fancy stuff, use m4" is not a cop-out, it is a statement that the committee is not going to solve all the world's problems.

> Seems to me that in this case, adopting the Reiser interpretation
> is the better thing to do. Document it clearly, so people aren't
> trapped, and that should end the problems.
My impression is that the committee's biggest problem with this is that retrofitting the Reiser interpretation into existing compilers is not necessarily easy. A good many compilers do *not* do the preprocessing as a separate text-manipulation step first; their "preprocessors" are integrated into the scanner, or following it. Pulling tokens apart again into text, and then reassembling them, isn't trivial. The current committee notion (substitution only on whole strings) avoids much of the complexity of this.

One can argue that issues of current implementation are not significant, that the future users should be the primary consideration. This ignores a nasty pragmatic consideration: if the standard is going to fly, standard-conforming implementations are going to have to be common. It would really be nice if existing compilers didn't have to be rewritten from scratch.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
henry@utzoo.UUCP (Henry Spencer) (10/02/84)
> Great. Another tool that's nearly vital for writing C, but not available
> on most (all) non-Unix systems. Anybody got pointers to a public domain
> m4?

Gee, I've never found any of the stuff under discussion (token concatenation and substitution inside strings) either "nearly vital" or even particularly useful. The new string-constant-concatenation feature would answer the one or two places where I've wanted to use such things.

My point is not that these features aren't useful in some sense -- it is quite possible that I just haven't encountered the particular situations where they are useful -- but that they are not, in any realistic sense, "nearly vital" for writing C. The existence of active C programmers who have never used them and don't miss them is notable, as is the existence and continuing use of C compilers that don't implement them.

I don't know of a public-domain m4, but there are public-domain macro processors of other kinds.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
bsa@ncoast.UUCP (Brandon Allbery) (10/04/84)
> Article <>
> From: mwm@ea.UUCP
> /***** ea:net.lang.c / utzoo!henry / 4:20 am Sep 25, 1984 */
> As for what should be done to bring back the lost functionality... the
> ANSI C folks have basically said "if you want a general-purpose macro
> processor, use m4". The programs that this "change" will break are
> broken already, and should be fixed to do it right.
> --
> Henry Spencer @ U of Toronto Zoology
> {allegra,ihnp4,linus,decvax}!utzoo!henry
> /* ---------- */
>
> Great. Another tool that's nearly vital for writing C, but not available
> on most (all) non-Unix systems. Anybody got pointers to a public domain
> m4?
>
> <mike

Anybody got pointers to a sane ANSI committee? We just got a C compiler on CSUOHIO.BITNET (VM/370) and I intend to port quite a few of my compatible (i.e. not based on Unix peculiarities) programs. If the ANSI committee thinks I'm going to use m4 on Unix and lose ALL portability, they've another think coming.

--bsa
henry@utzoo.UUCP (Henry Spencer) (10/07/84)
>> Great. Another tool that's nearly vital for writing C, but not available
>> on most (all) non-Unix systems. Anybody got pointers to a public domain
>> m4?
>>
>> <mike
>
>Anybody got pointers to a sane ANSI committee? We just got a C compiler
>on CSUOHIO.BITNET (VM/370) and I intend to port quite a few of my compatible
>(i.e. not based on Unix peculiarities) programs. If the ANSI committee
>thinks I'm going to use m4 on Unix and lose ALL portability, they've another
>think coming.

My personal impression is that the committee is saner than most of the people flaming on this issue. If they say "if you want a general-purpose macro processor, use m4", all this means is that they are not able to solve all the world's problems. At some point, it is necessary to give up and say "the tool we are trying to settle on is not powerful enough to solve your problem". Otherwise they never produce a standard, since the number and complexity of problems that people would *like* their tool to solve tends to grow without bound.

The committee, as nearly as I can see, is *not* crazy and is quite concerned about portability. They have simply judged that the problems that are under discussion are (a) sufficiently uncommon, (b) sufficiently ill-understood, and (c) sufficiently difficult, that attempting to solve them in the C standard is inappropriate. I agree.

Bear in mind that we do not **WANT** a C standard committee that is bent on solving every possible problem. The result would look nothing like C. This has happened to other languages; ever looked at some of the recent output from the ANSI BASIC effort? If you are a serious C user, it is appropriate for you to thank whatever gods you believe in that the ANSI C committee hasn't gone that way.
--
"If you ask for the moon, you may get the shaft instead."
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry