[mod.std.c] mod.std.c Digest V10#3

osd@hou2d.UUCP (Orlando Sotomayor-Diaz) (09/19/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>


mod.std.c Digest            Wed, 18 Sep 85       Volume 10 : Issue   3

Today's Topics:
                Comments on draft C standard (General)
                        Comments on Section B
                        Comments on Section C
----------------------------------------------------------------------

Date: Mon, 9 Sep 85 16:48:16 mdt
From: ihnp4!alberta!myrias!cg (Chris Gray)
Subject: Comments on draft C standard (General)
To: alberta!ihnp4!cbosgd!std-c

Why not add some definitions and use them throughout:

charspace: any number of ' ', '\t', '\b'
linespace: one of '\n', '\r', '\v', '\f'
whitespace: any amount of charspace intermixed with comments (which are
    allowed to contain linespace)
Now, are there any places in C where charspace is allowed, but whitespace
isn't? Should there be? (My definitions don't match with the draft's, but
at least they are consistent.) (My intent is that the preprocessor grammar
use linespace at the end of its productions.) (discussed a bit July 16)
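The proposed definitions are easy to pin down in code. This is a minimal sketch that follows the letter's definitions rather than the draft's. The names is_charspace, is_linespace and skip_whitespace are hypothetical. Only /* */ comments are recognized, since those were the only kind C had.

```c
#include <string.h>

/* Hypothetical classifiers following the definitions proposed above,
   not the draft's own definitions. */
int is_charspace(int c)
{
    return c == ' ' || c == '\t' || c == '\b';
}

int is_linespace(int c)
{
    return c == '\n' || c == '\r' || c == '\v' || c == '\f';
}

/* Skips "whitespace" in the proposed sense: charspace mixed with comments.
   A linespace character ends the run unless it sits inside a comment. */
const char *skip_whitespace(const char *p)
{
    for (;;) {
        if (is_charspace((unsigned char)*p)) {
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            if (end == NULL)
                return p;       /* unterminated comment: stop at its start */
            p = end + 2;
        } else {
            return p;
        }
    }
}
```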

------------------------------

Date: Mon, 9 Sep 85 16:48:16 mdt
From: ihnp4!alberta!myrias!cg (Chris Gray)
Subject: Comments on Section B
To: alberta!ihnp4!cbosgd!std-c

B.1.1.2 Translation phases

Wouldn't it be better to NOT delete all newline backslash sequences, but
rather to specify those places where a backslash token followed by a
newline token can be deleted (macro bodies, macro calls, strings)? The
current definition allows them inside keywords, identifiers, character
constants, preprocessor lines, etc. This flexibility doesn't buy anything.
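The concern can be made concrete. Under the splicing rule as written, which ANSI C later adopted as translation phase 2, the function below is legal. One splice falls inside an identifier and another falls inside a keyword. The name splice_demo is illustrative only. Each backslash must be the last character on its line.

```c
/* Phase-2 line splicing happens before tokenization, so a backslash-newline
   may fall in the middle of an identifier or even a keyword. */
int splice_demo(void)
{
    int coun\
t = 42;          /* the identifier is "count" */
    ret\
urn count;       /* the keyword is "return" */
}
```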

The only use I can see for separating steps 3 and 4 is the special parsing
of #include file names using angle brackets. What am I missing which
requires character constants, string literals, and comments to be done
specially (other than that they allow newlines in them)? Also, given that
#include has to be fudged anyway, why not allow the rules that some older
compilers did, such as the file name (including delimiters) extending from
the first occurrence of the opening delimiter to the LAST occurrence of
the closing delimiter? Thus I could say

	#include <B0:<XYZ>.3>

and get file name   B0:<XYZ>.3   which might be valid (and maybe even
needed) on some weird system. The same special processing is needed
for #pragma directives as well. This should be stated under step 3.
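Here is a minimal sketch of the older "first opening delimiter to last closing delimiter" rule. It assumes the directive line has already been isolated as a string. extract_header_name is a hypothetical helper, and no draft specifies it.

```c
#include <string.h>

/* Hypothetical helper: copies into out the text between the FIRST '<' and
   the LAST '>' on the line. Returns 0 on success, -1 if the delimiters are
   missing or out is too small. */
int extract_header_name(const char *line, char *out, size_t outsz)
{
    const char *open = strchr(line, '<');
    const char *close = strrchr(line, '>');
    size_t len;

    if (open == NULL || close == NULL || close <= open)
        return -1;
    len = (size_t)(close - open - 1);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

With this rule, "#include <B0:<XYZ>.3>" yields B0:<XYZ>.3. A rule that stops at the first '>' would yield B0:<XYZ instead.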

Step 6 mentions newline characters. What newline characters? After step
4 there aren't any.

In step 6, the current rules indicate that adjacent string literals are
concatenated. Do we really intend that to happen if, by some chance
(or due to a programmer that should be shot), the last token in a
#include file is a string and the first token on the line after the
#include is another string? A compiler will need some sort of indication
of file transitions in order to produce useful error messages, so
disallowing this shouldn't be much of a burden.

Step 6's retokenization is a bit unclear. In order to retokenize the
source, it must first (conceptually at least) be untokenized. To preserve
meaning, some pairs of tokens must have spaces added between them, but
tokens concatenated by ## explicitly don't have this done to them. Perhaps
the step could be reworded to say that character sequences resulting from
token concatenation are retokenized according to the normal tokenization
rules. Another unclear aspect is that of exactly what happens when two
tokens are concatenated - if the input tokens (perhaps coming from
macro expansion) were 100L and 33L, I gather the result is NOT 10033L.
Tokenization is often an information-losing process. It might be better
to state exactly which combinations ## supports and what each yields.
(E.g., what does   33L ## 25   yield? Does the size of the target int
affect what happens? That is, does the tokenizer have to distinguish
between a number that is long because it doesn't fit in the target int
and one that has an 'L' on the end?)
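For comparison, ANSI C as eventually adopted (C89) defines ## on spellings, not values. The two tokens are glued together, and the result must be a single valid preprocessing token. Under that rule 100L ## 33L gives 100L33L, and 33L ## 25 gives 33L25. Neither is a valid integer constant, so a compiler must diagnose either one if it survives preprocessing. The sketch below shows cases that do work. PASTE and the function names are hypothetical.

```c
/* Hypothetical helper macro: under ANSI C, ## pastes spellings, and the
   result must form one valid token. */
#define PASTE(a, b) a ## b

long paste_decimal(void) { return PASTE(100, 33); }   /* 10033 */
unsigned paste_hex(void) { return PASTE(0x, 1F); }    /* 0x1F, i.e. 31 */
```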

B.2.1 Character sets

The draft should perhaps state that other sequences that merely look like
trigraphs are not trigraphs, and that they do not produce any error
messages. (People who use things like "p < 0 ??????" would be upset
otherwise.)
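A translator can honor this with a simple left-to-right scan. Below is a sketch that assumes the nine trigraphs the standard eventually adopted. count_trigraphs is a hypothetical name. Any other run of question marks is passed through silently.

```c
#include <string.h>

/* Hypothetical scanner: counts genuine trigraphs, i.e. two question marks
   followed by one of the nine characters listed in `thirds`. Any other
   run of question marks is left alone and is not an error. */
int count_trigraphs(const char *s)
{
    static const char thirds[] = "=(/)'<!>-";
    int n = 0;
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i] == '?' && s[i + 1] == '?' && s[i + 2] != '\0'
            && strchr(thirds, s[i + 2]) != NULL) {
            n++;
            i += 3;   /* the three characters become one */
        } else {
            i++;
        }
    }
    return n;
}
```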

B.2.2 Character display semantics

If you're gonna make backspacing past the beginning of a line undefined,
then printing past the end of the line should be as well. Thus the first
paragraph should end in something like " if there is a next position on
the current line, else the effect is undefined".  (mentioned ~ Jun 30)

------------------------------

Date: Mon, 9 Sep 85 16:48:16 mdt
From: ihnp4!alberta!myrias!cg (Chris Gray)
Subject: Comments on Section C
To: alberta!ihnp4!cbosgd!std-c

C.1 Lexical Elements

The list of token types, which does not include white space, conflicts
with B.1.1.2, which talks about white-space tokens.

C.1.2 Identifiers - semantics

Just when identifiers defined as macros are replaced by their bodies
is a lot more complicated than stated. The replacement can be inhibited
by the "defined" construct and by the fact that the macro name is being
produced by its own expansion.
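Both inhibitions appear in the short sketch below, which assumes ANSI C (C89) preprocessing. FEATURE, FEATURE_ON and counter are hypothetical names.

```c
/* 1. The operand of "defined" is not macro-replaced: FEATURE expands to
      nothing, yet defined(FEATURE) is still true. */
#define FEATURE
#if defined(FEATURE)
#define FEATURE_ON 1
#else
#define FEATURE_ON 0
#endif

/* 2. A macro name produced by its own expansion is not replaced again. */
static int counter = 10;
#define counter (counter + 1)
int self_reference_demo(void) { return counter; }   /* (counter + 1) == 11 */
#undef counter
```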

C.1.2.5 Types

page 17, near top. Are not unions also classed as aggregates?

C.1.4 String literals - semantics

"Adjacent string literals" should be defined better. Consider:
    #define BLAH   "hello"
    "there"
I would imagine the intent is that the strings are NOT concatenated.
(It all works better if it's explicitly stated that string concatenation
isn't done until AFTER preprocessing.)
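ANSI C did adopt this ordering. Adjacent literals are concatenated in translation phase 6, after macro expansion in phase 4. A #define body also ends at its newline. Here is a sketch with hypothetical names.

```c
#include <string.h>

/* The directive's body ends at the newline, so "there" below is an
   ordinary string literal and is not glued onto BLAH's replacement. */
#define BLAH   "hello"
static const char *stray = "there";

/* Concatenation happens after expansion, once the literals are adjacent
   in the token stream. */
static const char *joined = BLAH " there";
```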

C.8 Syntax & Constraints

Again, the discussion concerning the newline character is not
appropriate, since, according to section B.1.1.2 there won't be
any when preprocessing is done (they have been turned into newline
TOKENs).

C.8 Semantics

Given that tokenization has been done, and that tokenization removed
all sign of comments, it would appear that comments are allowed
before the '#' and between it and the preprocessor command.
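ANSI C settled this by allowing it. Each comment becomes a single space in phase 3, before directives are recognized in phase 4. The sketch below assumes such a compiler. ANSWER and answer are hypothetical names.

```c
/* Both comments on the directive line are legal under ANSI C. */
/* leading */ # /* between */ define ANSWER 42

int answer(void) { return ANSWER; }
```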

C.8.1 Source file inclusion

The form   # identifier new-line   is stated to allow the identifier to
expand into either form (".." or <..>). Given that the macro body was
tokenized just like everything else, the second form is impossible. E.g.

	#define STANDARDINCLUDE	<stdio.h>
	...
	#include STANDARDINCLUDE

would result in trying to process

	#-token include <-token stdio .-token h >-token

Also, given that an identifier must be expanded, it's no harder to allow
a macro call with parameters.
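ANSI C did allow both. In the <...> form, the way the separate tokens are combined into a header name is implementation-defined, which reflects exactly the problem described above. The "..." form built with the # operator is an ordinary string literal, so a macro with parameters works cleanly. This is a sketch assuming an ANSI C compiler. HEADER_NAME and the function names are hypothetical.

```c
/* <...> form: the tokens < string . h > are recombined in an
   implementation-defined way (common compilers accept this). */
#define STANDARDINCLUDE <string.h>
#include STANDARDINCLUDE

/* "..." form via a function-like macro: #base yields "limits.h". */
#define HEADER_NAME(base) #base
#include HEADER_NAME(limits.h)

size_t length_of(const char *s) { return strlen(s); }
int largest_int(void) { return INT_MAX; }
```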

C.8.2 Macro Replacement

In the third last paragraph on page 62 - "white space preceding the first
token or following the last token is deleted." What does this mean?
I conclude that the intent is that the string generated should have no
space before the generated representation of the first token or after
the representation of the last token. What happens if one of the tokens
is a string - are its quotes conceptually removed and its body used
in the generated string, or are its quotes escaped and included in the
generated string? The third alternative (which is easy if the preprocessing
is done using only characters, and not tokens) is to effectively
retokenize the resulting character sequence - what was in a string
before is now outside of one - ughh!
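ANSI C eventually chose the second alternative. A backslash is inserted before each " and \ of a string-literal argument. White space between tokens becomes a single space, and leading and trailing white space is dropped. The sketch below uses a hypothetical STR macro.

```c
#include <string.h>

/* Hypothetical helper: stringize the argument's spelling. */
#define STR(x) #x

static const char *quoted = STR("hi");          /* the literal "\"hi\"" */
static const char *spaced = STR(  a   +   b  ); /* the literal "a + b"  */
```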

Are newline tokens allowed in the parameter list? They need not be, given
that backslash-newline pairs were previously deleted, but see my earlier
comment regarding that.
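ANSI C answered yes. Inside the invocation of a function-like macro, a newline counts as ordinary white space, so arguments may span lines. Here is a sketch using a hypothetical ADD macro.

```c
/* The invocation below spans two lines; the newline is just white space. */
#define ADD(a, b) ((a) + (b))

int add_demo(void)
{
    return ADD(1,
               2);
}
```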

The last paragraph on page 62 and the first on page 63 are unclear. How
about:

"... The token sequence resulting from the macro expansion can be divided
into two parts - those tokens coming directly from the macro body, and
those coming from the macro parameters. Macro calls in the former are
expanded only if they are calls to a different macro. All macro calls in
the latter are expanded.

   After all such replacements have taken place... ... to a single token.
The result is not reprocessed as a preprocessing directive, even if it
resembles one."

This is still a bit unclear. Is the intent that macro calls that are
recursive through other macros not be expanded? Also, what if an inner
macro call is generated as a result of concatenating the two kinds of
replacement tokens - what rules does the expansion follow?
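ANSI C as adopted answers the first question with yes. While a macro's replacement is being rescanned, including during nested expansions, that macro's own name is not replaced again. Recursion through another macro therefore stops as well. The sketch below uses the hypothetical names ping and pong.

```c
static int ping = 5;
#define ping (pong + 1)
#define pong (ping * 2)

/* ping -> (pong + 1) -> ((ping * 2) + 1); the inner ping is not expanded
   again, so it names the variable and the result is 11. */
int mutual_demo(void) { return ping; }

#undef ping
#undef pong
```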

Again, token concatenation should be explained in more detail. For example,
newline tokens must not be concatenated with anything (it doesn't make
sense). I suggest putting in a table of those that ARE supported; e.g.

	string	string	(yields string)		(redundant)
	char	char	(yields string)
	string	char	(yields string)
	char	string	(yields string)
	string	int	(yields string, use decimal form, no 'L' or 'U')
	char	int	(yields string)
	int	int	(decimal forms, some sort of rules for
			 handling 'L' and 'U' combinations and
			 for out-of-range problems)
	id	id	(yields id)
	id	int	(yields id)

I vote for not allowing operator concatenation, and other funny things
that only lead to unreadable, unportable programs. Also note that allowing
id ## string => id  can result in illegal ids. This may in fact be useful,
but has implications for external character sets, etc.
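The two id rows of the table can be illustrated with ANSI C rules, under which the pasted result must be a single valid token. GLUE and the slot_ variables are hypothetical.

```c
#define GLUE(a, b) a ## b

static int slot_1 = 1, slot_2 = 2;
static int slot_total = 3;

int glue_demo(void)
{
    return GLUE(slot_, 1) + GLUE(slot_, 2);    /* id ## int -> slot_1, slot_2 */
}

int glue_id_demo(void)
{
    return GLUE(slot_, total);                 /* id ## id  -> slot_total */
}
```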

Is there any reason for restricting the '#' enstringing (?) operator to
inside macro bodies?

The example given on page 63:

#define f(x) f(a * (x))

adds a lot to the complexity of macro expansion and results in nothing
except unreadable code. Programmers who use it should be shot. A much
more readable form of modifying a function call would be:

#define FUNC(x) func(a * (x))

Here at least the reader of the program has some warning that calls to
FUNC may not be quite what they seem to be. (For much the same reason,
I am opposed to ANY macro names that contain lower case letters. Thus,
I would suggest that <stdefs.h> define ERRNO, not errno. The others
I won't argue about too much, since they seem destined to become part
of the language, and the programmer/reader must be aware of all of them.)
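Here is a sketch of both forms that assumes ANSI C rules. A macro name is not re-expanded within its own replacement, so the inner f in the first form calls the real function. The variable a and the functions f and func are stand-ins for whatever the surrounding program defines.

```c
static int a = 3;
static int f(int x) { return x + 1; }
static int func(int x) { return x + 1; }

#define f(x) f(a * (x))        /* the form used in the draft's example */
#define FUNC(x) func(a * (x))  /* the upper-case alternative */

int f_demo(void) { return f(2); }        /* f(a * (2)), i.e. f(6) == 7 */
int func_demo(void) { return FUNC(2); }  /* func(a * (2)) == 7 */

#undef f
```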

C.8.4 Line control

Why not allow macro expansion here as well? It's conceivable that some
processor might put out #line directives in standard positions, but that
in some cases it doesn't have a new value to give and would want the
effect to be nil. In that case, allowing

	#line __LINE__ __FILE__

would be nice. In fact, allowing full macros with parameters here is no
harder than similar things on #include.
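ANSI C did adopt macro replacement on #line. The tokens after "line" are processed as in normal text, and the resulting line number applies to the line that follows the directive. RESTART_AT and the file name "renamed.c" are hypothetical.

```c
#include <string.h>

#define RESTART_AT 1000
#line RESTART_AT "renamed.c"
int line_after_directive(void) { return __LINE__; }
const char *file_after_directive(void) { return __FILE__; }
```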

------------------------------

End of mod.std.c Digest - Wed, 18 Sep 85 18:58:36 EDT
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.