[comp.std.c] 0x47e+barney not considered C

tom@hcx2.SSD.HARRIS.COM (06/29/88)

I tried posting a comment about this before, but never got any
responses, so I suspect it became trapped in a maze of twisty little
passages in our local net without reaching the world at large.

To get your attention right away, I will point out the following
true fact:

      fred = 0x47e+barney ;

is NOT a legal ANSI standard C statement. It contains a lexical
error.
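(With whitespace, of course, it is fine:

      fred = 0x47e + barney ;

is three perfectly ordinary tokens.  The draft's rules, quoted below,
glue the unspaced version together into a single preprocessing token.)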

Background:

ANSI C introduced the concept of pre-processing numbers; I quote
from the standard:

   Preprocessing numbers

   Syntax
           pp-number:
                   digit
                   . digit
                   pp-number digit
                   pp-number nondigit
                   pp-number e sign
                   pp-number E sign
                   pp-number .

   Description

   A preprocessing number begins with a digit optionally preceded by a
   period ( . ) and may be followed by letters, underscores, digits,
   periods, and e+, e-, E+, or E- character sequences.

   Preprocessing number tokens lexically include all floating and integer
   constant tokens.

   Semantics

   A preprocessing number does not have a type or a value; it must be
   converted (as part of phase 7) to a floating constant token or an
   integer constant token to acquire both.
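To spell out how that grammar swallows the example, here is my own
trace (each step applies one rule from above):

      0              digit
      0x             pp-number nondigit
      0x4, 0x47      pp-number digit
      0x47e+         pp-number e sign    (the e and the + in one step)
      0x47e+b        pp-number nondigit
      ...            pp-number digit/nondigit through the rest of barney

so the whole of 0x47e+barney is a single preprocessing number.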

Prior to the introduction of pre-processing numbers (which, according
to the rationale, were introduced to `simplify' the definition of
tokens) I had always assumed that a pre-processing token consisted of
the longest valid prefix of a token according to the fairly
unambiguous definition at the front of the standard. I made this rash
assumption because it was sensible, easy to implement, well defined,
and only caused confusion when someone was doing something that was
just asking for trouble.
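To illustrate the rule I had assumed, the lexer always bites off the
longest prefix that forms a valid token:

      0x47e+barney   lexes as   0x47e   +   barney
      1ex            lexes as   1   ex     ("1e" alone is not a valid token)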

Apparently the C committee spent a lot of time looking at the last
case and decided that it was far more important to serve the needs of
the contestants in the annual obfuscated C contest than it was to
serve the needs of people who happen to have code in which a hex
constant ending in e is added to something. Thus, the origin of the
obscure and pointless `pre-processing number'.

I pointed out this error in a formal response to the standard, but the
committee was apparently too tired of arguing to do anything about it;
as a result, they decided to leave pre-processing numbers alone.

The best thing to do is get lots of complaints about this in to the
committee during this next review period. I am willing to put up with
pre-processing numbers, but they really need to be changed to separate
ones that lead off with 0x from the others.  I still like the longest
valid prefix rule much better, but I can live with anything that does
not turn perfectly valid C code into a lexical error.

If anyone out there can find some existing code that does this, that
would be excellent support for the need to fix this major flaw.

=====================================================================
    usenet: tahorsley@ssd.harris.com  USMail: Tom Horsley
compuserve: 76505,364                         511 Kingbird Circle
     genie: T.HORSLEY                         Delray Beach, FL  33444
====================== Astrology: Just say no! ======================

jss@hector.UUCP (Jerry Schwarz) (06/30/88)

In article <120200001@hcx2> tom@hcx2.UUCP writes:
>
>
>Prior to the introduction of pre-processing numbers (which, according
>to the rationale, were introduced to `simplify' the definition of
>tokens) I had always assumed that a pre-processing token consisted of
>the longest valid prefix of a token according to the fairly
>unambiguous definition at the front of the standard. I made this rash
>assumption because it was sensible, easy to implement, well defined,
>and only caused confusion when someone was doing something that was
>just asking for trouble.
>

Although I am not a member of the committee, I have seen many of
their working drafts and do not believe that there was ever any such
definition.  Nor do I believe it was ever the committee's intention.
Before the introduction of pp-numbers there was always a comment
about 1ex being an "illegal token", not two tokens. 

The rule you suggest is a bad one because it does not provide an
easy way to extend the syntax of numbers.  For example, if I have an
implementation with a "long long" type  I might want to allow 6LL as
an integer constant of that type. Under the "longest legal prefix"
rule I can't. It gets tokenized as "6L" "L".  
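That is, sketching the hypothetical extension:

      long long big = 6LL;    /* wanted: the single token 6LL        */
                              /* longest-legal-prefix: "6L" then "L" */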

>Apparently the C committee spent a lot of time looking at the last
>case and decided that it was far more important to serve the needs of
>the contestants in the annual obfuscated C contest than it was to
>serve the needs of people who happen to have code in which a hex
>constant ending in e is added to something. Thus, the origin of the
>obscure and pointless `pre-processing number'.
>

As the author of the comment (during the first public review period)
that proposed pp-numbers, I resent the implication of the preceding
paragraph. I anticipate an apology.

The earlier drafts were ambiguous about several lexical issues,
especially about the interactions between tokens and pp-tokens.   The
current proposal is not.  The main motivation was to clean up
issues surrounding things like the "1ex" question.  

I believe an explicit syntax for pp-numbers is required.  The "0xe+b"
example points to a flaw in the current definition. But, in my
opinion, it is a minor flaw and does not require a change at this
stage in the standardization process.

Jerry Schwarz
Bell Labs, Murray Hill

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/01/88)

In article <120200001@hcx2> tom@hcx2.SSD.HARRIS.COM writes:
>Apparently the C committee spent a lot of time looking at the last
>case and decided that it was far more important to serve the needs of
>the contestants in the annual obfuscated C contest than it was to
>serve the needs of people who happen to have code in which a hex
>constant ending in e is added to something. Thus, the origin of the
>obscure and pointless `pre-processing number'.

Revisionist.  I don't remember quite how pp-numbers arrived,
but certainly it had nothing to do with this fanciful scenario.

>I pointed out this error in a formal response to the standard, but the
>committee was apparently too tired of arguing to do anything about it,
>as a result, they decided to leave pre-processing numbers alone.

Excuse me, but that was NOT the X3J11 committee's response to your
comment.  In case you haven't received the official response document
yet (which is possible due to confusion between CBEMA and X3J11 as to
who was supposed to do what), here is the raw input for the response
to that item in your comment letter:

P02	88-032	X1	6	3.1.8	PD	Eliminate preprocessing numbers.
'\"PD:	The Standard reflects the result of previous discussion of this issue.
'\"
Before the introduction of the \fIpreprocessing number\fP syntax,
the Committee had considered other proposals
similar to the one suggested in this public comment.
We were uncomfortable with preprocessing behavior that
could parse ``garbage'' into a sequence that contained an identifier,
which is then macro-replaced to form a ``sensible'' statement.
[END RESPONSE]

Now, it is entirely possible that you do not consider this to be
an adequate response, and you are entitled to respond to X3J11's
response within 15 working days after you receive the official
response.  (This posting is NOT official, nor are responses posted
here or mailed to committee members instead of X3.)  Should you send
a re-response, it would be added to comments from the third formal
public review currently underway (ending 1-Sep-1988) and would be
considered at the next X3J11 full committee meeting, which has been
rescheduled to late September.

>The best thing to do is get lots of complaints about this in to the
>committee during this next review period.

NO!  X3J11 will be entitled to ignore such comments from anyone other
than the original second-round commenter, since it has requested that
comments in the third formal public review be limited to remarks on
substantive CHANGES made as a result of the second public review.
(Of course, if the third public review document is really being
sent out without accompanying Rationale, as someone reported, it is
possible that the cover letter containing this restriction got left
out too.)

If X3J11 happens to decide that there is a significant technical
flaw in this area that is worth delaying publication of the Standard
for yet another six months in order to fix, then I expect that they
will do so.  Or, if the fix could legitimately be considered
"editorial", meaning that the committee really intended it all along
but didn't state the specification quite right, then perhaps this
could be fixed without delaying the final Standard.

Why do you think it so important for "0x47e" to be considered a
preprocessing number token?  Just what is it that needs "fixing"?
Is it that "0x47e" is supposed to be split into preprocessing tokens
"0" and "x47e" (the second of which may be subject to macro
replacement!) and in translation phase 7 they are not said to be
spliced back together into a single (regular) token, so that it is
impossible for an integer constant "0x47e" to ever be seen after
phase 6?  If so, that does seem to me to be a problem, but it has
nothing to do with "+barney" or with the final "e" on the constant;
it's a generic problem for all hex constants (and was certainly not
the committee's intention, so fixing this would presumably be
considered editorial).

P.S.  I don't think the committee was "too tired of arguing to
do anything about it".  More likely the review subgroup that
tackled your comments didn't fully understand the problem.  If I've
correctly summarized it in the previous paragraph, then try an
argument along those lines in your re-response.

P.P.S.  I was the only committee member who voted against sending
out the revised draft for the third public review, on the grounds
that there had been insufficient time allotted to study second-
round comments before responses were required.  This may be an
example of that.  I do think the committee did a remarkably good
job under the [self-imposed] circumstances.

P.P.P.S.  No, I did NOT purposely cause the next meeting to be
delayed long enough to give more careful consideration to third-
round comments, even though that's how it seems to have turned out!

tom@hcx2.SSD.HARRIS.COM (07/01/88)

jss@hector.UUCP writes:

>Although I am not a member of the committee, I have seen many of
>their working drafts and do not believe that there was ever any such
>definition [longest valid prefix].

You are absolutely right; there never was. As I said, it was a rash
assumption on my part because the area of "What is a token?" needed a
definition and that was the only one I could imagine that made any
sense. The area definitely needed defining, and preprocessing numbers
are certainly better than the complete lack of definition that existed
before. [Feel free to consider this an apology].

>The rule you suggest is a bad one because it does not provide an
>easy way to extend the syntax of numbers.  For example, if I have an
>implementation with a "long long" type  I might want to allow 6LL as
>an integer constant of that type. Under the "longest legal prefix"
>rule I can't. It gets tokenized as "6L" "L".  

This does not make sense. If I am extending C with new features and I
make LL a valid suffix, then I have changed the definition of what a
valid token *is*, so I would not tokenize 6LL as 6L L, but as 6LL.
Obviously, I may have problems if some twisted programmer somewhere
has previously written code which tries to take advantage of the 6L L
tokenization by making L a macro that expands to +5 or something, but
I am not sure I care much, because on the other side of the coin, I
can probably port your code that uses 6LL to my system by just
defining L to the empty string.

And gwyn@brl-smoke.UUCP writes:

>In case you haven't received the official response document
>yet (which is possible due to confusion between CBEMA and X3J11 as to
>who was supposed to do what),

Yeah, you're right, I haven't gotten it yet, although I did get a nice
letter telling me that I haven't gotten it yet.

>We were uncomfortable with preprocessing behavior that
>could parse ``garbage'' into a sequence that contained an identifier,
>which is then macro-replaced to form a ``sensible'' statement.

I guess I just don't care what happens to ``garbage'' when the
alternative definition in the standard turns ``sensible'' C into
``garbage''.

>Why do you think it so important for "0x47e" to be considered a
>preprocessing number token?  Just what is it that needs "fixing"?
>Is it that "0x47e" is supposed to be split into preprocessing tokens
>"0" and "x47e" (the second of which may be subject to macro
>replacement!) and in translation phase 7 they are not said to be
>spliced back together into a single (regular) token, so that it is
>impossible for an integer constant "0x47e" to ever be seen after
>phase 6?  If so, that does seem to me to be a problem, but it has
>nothing to do with "+barney" or with the final "e" on the constant;

Goodness gracious no. My proposal of longest valid prefix would parse
0x47e+barney into 0x47e + barney NOT 0 x47e + barney. The point is
that the definition of preprocessing numbers calls 0x47e+barney a
SINGLE token. This means it will be treated as a single unit all the
way up until it is converted to a token.  The standard says that
behavior is undefined if a pp-token cannot be converted to a token,
this (presumably) gives an implementation the right to convert this
single pp-token "0x47e+barney" into the three tokens "0x47e" "+"
"barney", but the major problem is that "barney" might have been a
macro. It is now too late to expand it.
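To make that concrete (count is just a stand-in name):

      #define barney (count+1)

      fred = 0x47e + barney;   /* barney expands during preprocessing */
      fred = 0x47e+barney;     /* one pp-token; the barney buried inside
                                  it is never seen as an identifier, so
                                  it can never be expanded */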

Obviously, there is no reason to actually remove pp-numbers at this
point in the evolution of the standard; it would be too big a change.
But I do feel that the definition should be changed to allow something
that is currently perfectly legal C to remain legal. The real problem
is that it is actually fairly hard to write the grammar so that it
works. For what it is worth, here is an attempt:

           pp-hex-prefix:       (the two chars that can start a hex number)
                   0 x
                   
           pp-not-hex-prefix:   (the things that can start other numbers)
                   . digit
                   digit .
                   digit digit
                   digit e sign
                   digit E sign
                   digit nondigit-except-x
                   
           pp-number:           (single digits, or hex or not-hex pp numbers)
                   digit
                   pp-hex-prefix
                   pp-not-hex-prefix
                   pp-hex-prefix pp-hex-suffix
                   pp-not-hex-prefix pp-not-hex-suffix
                   
           pp-not-hex-suffix:   (anything the current pp-number allows,
                                 recursing on itself, not on pp-number)
                   digit
                   nondigit
                   . digit
                   pp-not-hex-suffix digit
                   pp-not-hex-suffix nondigit
                   pp-not-hex-suffix e sign
                   pp-not-hex-suffix E sign
                   pp-not-hex-suffix .
           
           pp-hex-suffix:       (digits and letters only; no e+ or e-)
                   digit
                   nondigit
                   pp-hex-suffix digit
                   pp-hex-suffix nondigit

I think that does it, but I can't be sure. A pp-number is now a number
that starts with "0x" and is followed by any number of digits or
letters, or it is a number that starts with something other than "0x"
and contains all the stuff the current pp-number definition has
(decimal points, e+, E+, e-, E-, letters).

With this definition 0x47e+barney will now parse as 0x47e + barney and
6LL will still parse as 6LL.
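(Checking those against the grammar above: 0x47e derives as the
pp-hex-prefix 0x followed by the pp-hex-suffix 47e, and nothing in
pp-hex-suffix can absorb a sign, so + and barney lex as separate
tokens; 6LL derives as the pp-not-hex-prefix 6L followed by the
pp-not-hex-suffix L.)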

It sure seems like it ought to be possible to simplify this grammar.
Do I hear any alternate definitions? Or should the committee just
leave the grammar the way it is and stick in some language about
leading 0x not allowing the e+ e- stuff?  Also, I took out '.' while I
was splitting the definition by prefix, but if someone has a good
reason to leave '.'s in hex pp-numbers I don't much care. It is the e+
that causes all the trouble. I would really like this fixed for the
final standard and something that could be considered an editorial
change is probably the only thing that stands a chance.

P.S. I am glad to see that stuff I post can actually make it out to
the net.

=====================================================================
    usenet: tahorsley@ssd.harris.com  USMail: Tom Horsley
compuserve: 76505,364                         511 Kingbird Circle
     genie: T.HORSLEY                         Delray Beach, FL  33444
======================== Aging: Just say no! ========================

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/01/88)

In article <120200001@hcx2> tom@hcx2.SSD.HARRIS.COM writes:
| 
| I tried posting a comment about this before, but never got any
| responses, so I suspect it became trapped in a maze of twisty little
| passages in our local net without reaching the world at large.
| 
| To get your attention right away, I will point out the following
| true fact:
| 
|       fred = 0x47e+barney ;
| 
| is NOT a legal ANSI standard C statement. It contains a lexical
| error.

  If the standard really says that this is not legal C, the standard is
broken, not the program. The language was changed once to replace
operators like "=*" with "*=" to avoid nonsense like this. I assume that
the standard is poorly stated rather than intended to break existing
programs, but after introducing trigraphs into an *American* standard to
make the language acceptable *elsewhere*, I wouldn't bet on it.
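The classic case, as I recall it, was the ambiguity of

      x=-1;      /* old C read this as x =- 1, i.e. x = x - 1 */
                 /* most programmers meant x = -1             */

which is exactly the same flavor of lexical surprise.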

  At the early meetings the committee expressed a desire to "codify
existing practice without egregiously breaking existing programs."
Obviously the desire to be inventive has modified that somewhat.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/01/88)

In article <10413@ulysses.homer.nj.att.com> jss@hector (Jerry Schwarz) writes:

| I believe an explicit syntax for pp-numbers is required.  The "0xe+b"
| example points to a flaw in the current definition. But, in my
| opinion, it is a minor flaw and does not require a change at this
| stage in the standardization process.

  I hate to say this, but a bad standard which breaks existing programs
should not be rushed out the door, even if everyone is tired of working
or waiting on it. If it breaks existing programs, there had better be a
better rationale than the convenience of the compiler implementors. If
the intent really was that a hex number ending in e can't be followed
by a variable, then I don't see any rationale at all. If the wording is
poor it should be changed.

  Was it intended that exponential values be allowed on hex (and I assume
octal) values? If so, how about fractional values, such as "0x1ea.b", or
even worse "0x1ea.e+2"? Is the final e a fractional part or an exponent?

  I will assume that the committee intended this to work as it always
has, and not have a program which will fail if there is no whitespace
somewhere. This should just be an editorial change.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/01/88)

In article <8194@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:

| NO!  X3J11 will be entitled to ignore such comments from anyone other
| than the original second-round commentor, since it has requested that
| comments in the third formal public review be limited to remarks on
| substantive CHANGES made as a result of the second public review.
| (Of course, if the third public review document is really being
| sent out without accompanying Rationale, as someone reported, it is
| possible that the cover letter containing this restriction got left
| out too.)

  Regardless of what X3J11 is entitled to do, it would be foolish to let
a mistake like this go through. "entitled to ignore" seems to be a
poorly chosen way of putting it.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/02/88)

In article <8194@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> Is it that "0x47e" is supposed to be split into preprocessing tokens
> "0" and "x47e" ...

Oops, Dave Prosser straightened me out on this.  When I looked at the
pp-number grammar in section 3.1.8, somehow the construction
	pp-number:	pp-number nondigit
did not register.  Perhaps my eyes saw it but my mind refused to
believe what I saw.

Anyway, the problem is that the pp-number syntax is TOO greedy, not
(as I feared) that it was insufficiently broad.

Although it is somewhat annoying to realize that
	123zzz456
is a single valid preprocessing token, that wouldn't really matter
so long as it became invalid in translation phase 7, as in fact it does.
The truly serious problem is that
	0x47e+barney
or even
	0xE+0x10
appears to be invalid C because of this error in specification; it's
turned into a single token in phase 7 instead of the three tokens one
might expect.  I don't know a simple way to fix this (which certainly
was unintended), but probably other X3J11 committee members can figure
out how.  This does deserve a re-response and committee reconsideration.

jss@hector.UUCP (Jerry Schwarz) (07/03/88)

In article <11445@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>In article <10413@ulysses.homer.nj.att.com> jss@hector (Jerry Schwarz) writes:
>
>| I believe an explicit syntax for pp-numbers is required.  The "0xe+b"
>| example points to a flaw in the current definition. But, in my
>| opinion, it is a minor flaw and does not require a change at this
>| stage in the standardization process.
>
>  I hate to say this, but a bad standard which breaks existing programs
>should not be rushed out the door, even if everyone is tired of working
>or waiting on it. If it breaks existing programs, there better be a
>better rationale than the convenience of the compiler implementors. If
>this really was the intent that hex number ending in e can't be followed
>by a variable, then I don't see any rationale at all. If the wording is
>poor it should be changed.
>

My purpose in proposing pp-numbers was ease of understanding of the
rules. That would frequently make it easier to implement, but that is
a happy side effect, not the motivation.

I suppose "rush" is a subjective notion, but given the time that has
elapsed since the committee began work nothing that emerges at this
point would seem rushed to me.

If you believe that any standard that breaks any existing program is
bad then I'm afraid you will have to accept a bad standard.  There
are several points at which the current proposal may break existing
K&R programs.  See the "silent changes" discussed in the rationale.

>  Was it intended that exponential values be allowed on hex (and I assume
>octal) values? If so, how about fractional values, such as "0x1ea.b", or
>even worse "0x1ea.e+2"? Is the final e a fractional part or an exponent?
>

I can't speak to what the committee believes, but my intention when I
proposed pp-numbers was that all of these are pp-tokens.  They are
not, however, legal tokens.  Because they are illegal tokens it makes
no sense to ask what the "." or the "e+" are. As I pointed out in my
earlier note, the committee has always had a vague notion of "illegal
token".  The syntax of pp-number is intended to be part of the
formalization of that notion. Any change for "0xe+y" should not change
the status of the above.

>  I will assume that the committee intended this to work as it always
>has, and not have a program which will fail if there is no whitespace
>somewhere. This should just be an editorial change.

Whitespace has always been required in some contexts.  A well known
example is "x+++++y"  versus "x++ + ++y".

hugh@dgp.toronto.edu ("D. Hugh Redelmeier") (07/03/88)

In article <120200001@hcx2> tom@hcx2.UUCP points out that under the
draft ANSI standard for C, preprocessor numbers are too greedy.
He gave the example of

	0x47e+barney

which is parsed as a preprocessor number, and then rejected when it
cannot be converted to a legitimate C token.  He is correct, and I
agree with him in considering this a mistake.  I too submitted a
comment on this in the last public review period, and mine too seems
to have been ignored (I have not received the response document).

In article <10413@ulysses.homer.nj.att.com>, Jerry Schwarz says that
Tom's article insults his construct, the pp-number, and that Tom's
fix is bad.  Furthermore, Jerry thinks the problem does not warrant a
fix.

I think that the pp-number construct did clean up a mess (which I
too had been pointing out for a while).  But it can and should be
fixed to not break formerly valid and perfectly reasonable C
programs.

In article <8194@brl-smoke.ARPA>, Doug Gwyn asks:

| Why do you think it so important for "0x47e" to be considered a
| preprocessing number token?  Just what is it that needs "fixing"?
| Is it that "0x47e" is supposed to be split into preprocessing tokens
| "0" and "x47e" (the second of which may be subject to macro
| replacement!) and in translation phase 7 they are not said to be
| spliced back together into a single (regular) token, so that it is
| impossible for an integer constant "0x47e" to ever be seen after
| phase 6?  If so, that does seem to me to be a problem, but it has
| nothing to do with "+barney" or with the final "e" on the constant;
| it's a generic problem for all hex constants (and was certainly not
| the committee's intention, so fixing this would presumably be
| considered editorial).

For me, the problem is that the +barney is absorbed into the hex
constant.  The + clearly ought to be a separate token, and so should
the barney.

Here is what I submitted to the committee in the previous round:

Page 33, line 36, Section 3.1.8:
preprocessor number too greedy (consider 0xABCDE+1)

The current rules for parsing preprocessor numbers are too greedy.
They are willing to match + or - after an e or E.  If the e came
from a floating point number, that is fine, but if it came from a
hexadecimal number, it is not.  Consider the following examples:

	0xABCDE+1
	0xABCDE+cat

	0xABCDEF+1
	0xABCDEF+cat

The first two lines used to be legitimate C expressions.  Now each
is a pp-number that cannot be turned into a valid C token.  The
second two lines were and remain legitimate expressions.

Although I think that the whole concept is wrong, it can be patched
up to solve this problem.

Proposed grammar:

pp-number:
	integer-constant
	floating-constant
	pp-number digit
	pp-number nondigit
	pp-number .

I find this definition intuitively appealing: it reflects what is
really going on.  Others may prefer one that is simpler to implement:

Alternate grammar:

pp-number:
	pp-floating-constant
	pp-number digit
	pp-number nondigit
	pp-number .

pp-floating-constant:
	digit
	. digit
	pp-floating-constant .
	pp-floating-constant digit
	pp-floating-constant e sign
	pp-floating-constant E sign

Note that in pathological cases, these differ.  Consider:

	1.1.e+5
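To spell that out (my own trace of the two grammars): the first
grammar can build 1.1 as a floating-constant and then absorb the
second "." and the "e", but it has no rule that accepts a sign, so it
stops with the pp-number 1.1.e followed by the tokens + and 5.  The
alternate grammar can derive 1.1. as a pp-floating-constant, so its
"e sign" rule applies and all of 1.1.e+5 becomes a single pp-number.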

----------------------------------------------------------------

Further notes on Doug's comments:

| P.S.  I don't think the committee was "too tired of arguing to
| do anything about it".  More likely the review subgroup that
| tackled your comments didn't fully understand the problem.  If I've
| correctly summarized it in the previous paragraph, then try an
| argument along those lines in your re-response.

As I understand it, most committee members saw most comments for the
first time during the meeting (I am a member; I got only the early
comments in a mailing).  Since the meeting is a very busy period,
most comments could not have been read by very many committee
members, and certainly not read very carefully.

| P.P.S.  I was the only committee member who voted against sending
| out the revised draft for the third public review, on the grounds
| that there had been insufficient time allotted to study second-
| round comments before responses were required.  This may be an
| example of that.  I do think the committee did a remarkably good
| job under the [self-imposed] circumstances.

I think that you put it well, and very diplomatically (perhaps too
diplomatically).

Hugh Redelmeier
{utcsri, utzoo, yunexus, hcr}!redvax!hugh
In desperation: hugh@csri.toronto.edu
+1 416 482 8253

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/04/88)

In article <8807030058.AA18406@explorer.dgp.toronto.edu> hugh@dgp.toronto.edu ("D. Hugh Redelmeier") writes:
>I too submitted a comment on this in the last public review period,
>and mine too seems to have been ignored
>(I have not received the response document).

I assure you that none of the public comments were ignored.
The response document should arrive "soon", last I heard.
Meanwhile, here's the raw formatter input for the committee
response to your comment on this issue:

P24	88-054	3	33/36	3.1.8	N	\fIPreprocessor number\fP
						is too greedy.
'\" N:	The Committee discussed this proposal but decided against it.
'\"
The behavior pointed out is admittedly surprising,
but the Committee feels that this area is sufficiently complex
that trying to fix this particular case is very likely to lead to
surprises in other related areas.
Neither proposed solution is acceptable to the Committee.
The first proposal defines preprocessor grammar elements
by means of elements of the C language proper.
The second proposal would cause the number \f\*(cW1e4+6\fP
to be accepted as one preprocessing number,
which is behavior as surprising as that noted in the comment.

As I said earlier, this is not official (the hardcopy document
you receive will be), and you may well consider the response
to be inadequate, in which case you'll have 15 days after
receipt of the official response document to reply to that effect.

>Furthermore, Jerry thinks the problem does not warrant a fix.

That wasn't the impression I got.  I think everyone in this
discussion so far has agreed that the current specification
needs to be fixed to allow obviously correct constructs.

>As I understand it, most committee members saw most comments for the
>first time during the meeting (I am a member; I got only the early
>comments in a mailing).  Since the meeting is a very busy period,
>most comments could not have been read by very many committee
>members, and certainly not read very carefully.

Yes, that is essentially correct.  The tight schedule made it
difficult, in my opinion, to do justice to the second round
comments (which was the basis for my "no" vote).  They all did
receive consideration from at least a handful of committee
members, and in many cases that was sufficient since the issue
had been previously settled.  Questionable issues were brought
before the full committee.  A few public comment letters (for
example, ones dealing with internationalization or floating-
point issues) had suggested responses prepared in advance by
committee members with special expertise in the technical area;
even in those cases a committee subgroup discussed each issue.
I think proper procedure was followed; there just wasn't enough
time to study all comments carefully in advance.

I urge second-round commenters to check the official responses
when they receive them to ascertain whether the committee appeared
to understand the essential point of each comment.  (However,
just because a suggestion was not adopted does not mean that it
wasn't understood!)  Just see whether the response makes sense.
Two committee members did review the responses, and in a few
cases modified the responses when they seemed inappropriate, but
we might have missed a few things.

gordan@maccs.McMaster.CA (gordan) (07/05/88)

Somebody pointed out a while ago that whitespace is sometimes needed in
programs.  For instance,

   a / *p

is not the same as

   a/*p



So type   0x47e + barney.

And smile.
-- 
                 Gordan Palameta
            uunet!mnetor!maccs!gordan

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/06/88)

In article <1287@maccs.McMaster.CA> gordan@maccs.UUCP () writes:
| Somebody pointed out a while ago that whitespace is sometimes needed in
| programs.  For instance,

| [ ... instance deleted ... ]

| So type   0x47e + barney.
| 
| And smile.
 And don't EVER use a pretty formatter again, because the option to
delete whitespace may be on. Maybe we should put (parentheses)(on)
(every)(token).

It sounds as if the response is that the fix is complicated, so it won't
get done.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

msb@sq.uucp (Mark Brader) (07/06/88)

[If you see this article twice, my apologies]

The important question that hasn't been mentioned is this:

    How do existing compilers treat 0x47e+barney?

If -- as I would guess -- it is generally accepted, then the Standard should
accept it, and the current Draft needs a fix.
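Here is a minimal test case for anyone with a compiler handy (the
names are just stand-ins, as in the original posting):

	int barney = 2;
	int fred;

	int main()
	{
	    fred = 0x47e+barney;	/* three tokens, or a lexical error? */
	    return 0;
	}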

Without deep consideration I can't see why preprocessing numbers can't just
be assigned the same syntax as ordinary numbers.  Can't we have something
like
	preprocessing-number:
		floating-constant
		decimal-integer-constant
		hex-integer-constant
etc.?
(I'm winging this, but you get the idea.)

Mark Brader, Toronto		sed -e "s;??\\([-=(/)'<!>]\\);?\\\\?\\1;g"
utzoo!sq!msb, msb@sq.com	will fix them...	-- Karl Heuer

henry@utzoo.uucp (Henry Spencer) (07/08/88)

> Without deep consideration I can't see why preprocessing numbers can't just
> be assigned the same syntax as ordinary numbers...

Well, speaking as someone who has implemented a C lexical analyzer, I was
very happy when preprocessing numbers arrived.  Without them, you have to
implement lexical analysis of C numbers -- which are a baroque mess --
*twice*.  Why?  Because a lot of validity checks that are needed for real,
live numbers cannot be applied to preprocessing numbers, since abominations
like the # and ## operators may alter the tokens before the preprocessor
is finished with them.  Short of storing numbers in some complex broken-
down form -- and remember that those same two operators require that the
original text be recoverable! -- there is just no way to avoid repeating
the whole ugly lexical analysis when you've got the final tokens on hand.
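For instance (my own sketch; PASTE is a hypothetical macro):

	#define PASTE(a, b) a ## b

	double d = PASTE(1e, 5);   /* "1e" is only a pp-number, but the
	                              paste makes the valid token 1e5 */
	/* int bad = PASTE(0x, g);    would paste to 0xg, a fine pp-number
	                              that can never become a valid token */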

Personally, my favorite solution to this involves eliminating the # and
## operators, by saturation thermonuclear bombing if nothing less will
kill them, but not everyone likes this idea.

turner@sdti.UUCP (Prescott K. Turner) (07/12/88)

>In article <1988Jul6.142014.6116@sq.uucp>, msb@sq.uucp (Mark Brader) writes:
>Without deep consideration I can't see why preprocessing numbers can't just
>be assigned the same syntax as ordinary numbers.

The first public review version of the standard seemed to do this.  But it
had a problem because this would cause
                                   1Ex
to be lexed into two tokens as
                                 {1}{Ex} 
whereas it also gave 1Ex as an example of something which gets lexed into
a single illegal token.

>The important question that hasn't been mentioned is this:
>    How do existing compilers treat 0x47e+barney?

How do existing compilers treat 1Ex?
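One quick probe (Ex is just my stand-in macro name):

	#define Ex + 2
	int n = 1Ex;	/* lexed as {1}{Ex} this is n = 1 + 2;
			   lexed as one pp-token it is an error */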
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA        (617) 443-5779
UUCP:...genrad!mrst!sdti!turner