rms@frosted-flakes.ai.mit.edu (Richard Stallman) (01/06/87)
These are comments I'm about to mail to CBEMA: I have implemented what I believe is a complete freestanding implementation of this draft, except for the insuperable problem described in item 1, some preprocessor features not yet finished, and some extensions which may be forbidden by 2.1.1.3. Disastrous Deficiencies. ITEM 1, 3.4. Arbitrary arithmetic and casts in static initializers cannot be implemented in most existing operating systems. In the Rationale, section 3.1.2, it is explained that extending the 31-character minimum significance to names with external linkage is not worth the price of requiring changes to existing assemblers and linkers. Concern is expressed for implementors of C compilers that must work with independently-maintained linkers and operating systems that they cannot change. Support for arbitrary casts and arithmetic in static initializers also requires changes to linkers. Consider int foo = ((int)&bar * 3) % 5001 | (int)&baz; The Rationale in 3.4 suggests that this initial value be computed and installed at run time. However, this is usually impossible. Just generating instructions to compute the value and store it is easy; the problem is how to cause them to be executed at a suitable time. The value of `foo' could be examined by code in a different source file before any function in this source file has been called. Only a special linker feature would make it possible for each separate compilation to specify code to be executed before `main' is called. If requiring linker changes is too great a price for fixing a flaw such as external name restrictions that bothers users greatly, then it is inconsistent to require them to fix a minor flaw that users do not care about. Therefore, add the following text to 3.4: The effect of using `&' in an initialization expression that is required to be constant is implementation-defined unless the `&' is the outermost operator in the expression or else appears within the operand of a `sizeof' operator. 
However, initialization expressions may use the standard `offsetof' macro with defined results, even in implementations in which the expansion of this macro makes use of the `&' operator in a fashion that does not fit the above rule. This allows constructs such as &variable, &variable.component and &array[index].

ITEM 2, 2.2.4.2. Use an underscore prefix for library `#define's.

The standard specifies many macro names such as `CHAR_BIT' for definition by the standard header files, and these macro names can potentially conflict with existing and future user programs. These names follow no system; they are just like the names recommended for programmers to define, so programmers must check each name they plan to use by looking it up in the full list of standard macro names.

If the C standard, unchanging, were the only source of header files, this solution might be adequate. But operating systems such as Unix and Posix provide header files in profusion. They follow the lead of the C standard in choosing names. Unfortunately, these lists are long and programmers cannot know them all. It is not practical for programmers to avoid all the names in all the system header files.

Granted that the only names that cause problems for an application program are those defined by header files it includes, this does not mean the problem can be solved by suggesting that programmers need only avoid the names in the header files they actually use. As a program evolves, it may need to use an additional header file, perhaps because it needs to use a library facility that it did not previously need. Yet it may already contain conflicting names, added by a programmer who was following this practice. In addition, operating systems evolve, and new names often need to be added to existing header files.

What is worse, hosted implementations of the standard as written may be impossible because of this. Consider the identifier `read'.
This is a Unix system call which has been rightly omitted from the C standard. Therefore, according to the standard, it may not be reserved by the implementation. But, in fact, any program that uses the standard input facilities `getchar' or `fread' will also get the `read' system call which they use, and any attempt in the program to define `read' in some other fashion will conflict.

The C standard is an unequalled opportunity to establish a new convention for choosing system-defined names, one which will systematically separate them from names that application programmers should define. If the C standard adopts such a convention, operating system implementors will naturally follow it as well.

The C standard is also defining many new names. If the problem is not solved now, these new names will make it more expensive to solve in the future. Changing the new names now is cheap because no programs use them yet. It is safe to retain a few traditional names that don't fit the new convention. For example, retaining `NULL' will do no harm because every C programmer already knows about `NULL'.

I propose that *all* names defined by the standard be renamed with the addition of an initial underscore, with the exception of `NULL'. Those of the names that are traditional should be put in an explicit list of permitted synonyms. Standard header files should be permitted to define these synonyms as well as the recommended names. For example, stdio.h would be permitted to define `FILE' as well as the new name `_FILE'. This way, it is not required to remove the old names and break existing C programs in order to implement the standard. Existing C programs would remain conforming but could become strictly conforming only with name changes.

Yes, this is brutal, but I don't see any other way to avoid chaos. Can you find another way?

Important Deficiencies.

ITEM 3, 3.8.3.4. Nested macro problems.
3.8.3.4 says that after the macro arguments are substituted, the entire replacement token string is subject to further macro expansion. This, together with the separate preexpansion of macro arguments described in 3.8.3.1, appears to have the result that after

    #define h(x) 5+h(x)
    #define f(x) g(x)

the string `f(h(y))' expands into `g(5+5+h(y))'.

It took me a long time to figure out that the sentence

    These nonreplaced macro name tokens are not available for further
    replacement even if they are reexamined in contexts in which that
    macro name would otherwise have been replaced.

in 3.8.3.4 was intended to apply to this case. It therefore needs to be clarified. (A friend of mine thought it had a completely different meaning: that once the macro name `h' was seen in the replacement for `h', the name `h' would effectively be undefined for the rest of the source file. He knew that this was ridiculous, and we both looked for another meaning, but we could not find one.)

But with this rule, it becomes very difficult to implement a character-based preprocessor. Such a preprocessor has no way to distinguish an `h' that should not be replaced from another `h' that may be replaced.

In a function-like macro it is possible to use the macro's name in its own definition without any special rule, simply by writing parentheses around the macro name where it occurs in the definition. This also avoids the problem described above. Thus, if the definition of `h' is rewritten as

    #define h(x) 5+(h)(x)

then `f(h(y))' expands to `g(5+(h)(y))' even without the special rule. This makes the special nonrecursion rule mostly superfluous.
I agree that it would be nice to allow macros' names to appear in their own expansions without causing infinite recursion, but in the light of these difficulties, and the ability to get the result without a special nonrecursion rule, I believe the nonrecursion rule should be removed by replacing the second paragraph of 3.8.3.4 with

    If the name of a macro appears within the replacement text of an
    expansion of the same macro, or in nested replacements resulting
    from that replacement text, the result is undefined, except in the
    case of a function-like macro whose name appears followed by a
    character other than `('.

ITEM 4, 3.4. Don't forbid floating arithmetic in integral constant expressions.

3.4, combined with 2.1.1.3, appears to require compilers to report an error for things like

    char x[(int)(3.5)];

because floating constants are not in the list of what may appear in an integral constant expression. But in a compiler that does constant folding, `(int)3.5' will be changed into `3' before the consideration of the array declaration begins. There is no natural way to report an error for this code.

Perhaps I have misunderstood 3.4. It also says that casts from arithmetic types to integer types are allowed, which appears to imply that there may be subexpressions whose types are arithmetic but not integral. Yet the list of allowed constructs does not include any subexpressions that could have floating type, making it superfluous to list the possibility of such casts. This suggests that perhaps the intent of the constraint was to allow any kind of constant expression of arithmetic type as long as it is used in a cast to integer type.

If that was the intent of the constraint, it needs to be rewritten unambiguously, but that's not all. This meaning causes a serious problem because it implies that expressions such as `((float)((int)&foo | 38))' (where `foo' is static) are valid. Yet such expressions are impossible to compute at compile time, so they cannot be implemented.
I propose the following constraint for integral constant expressions:

    An integral constant expression is an expression of integral type
    which does not, except within a `sizeof' operator, refer to any
    function or variable or use the unary `*' or `&' operators or the
    postfix `[...]' or `(...)' operators.  (Footnote: It follows that
    integral constant expressions contain no lvalues and cannot
    validly use the increment or assignment operators.)  However,
    integral constant expressions may always use the standard
    `offsetof' macro, even in implementations in which the expansion
    of this macro makes use of the `&' operator.

ITEM 5, 3.8.3. Allow preprocessor to forget macro argument spelling.

The constraints in 3.8.3 say that macro redefinitions must use the same spelling of arguments. This, combined with 2.1.1.3, implies that redefinitions that differ only in the spelling of the arguments must get an error. In other words, the preprocessor is required to remember the spelling of the arguments. There is no other reason why a preprocessor would record how the arguments of a macro were spelled after finishing processing the #define line.

It is good to require strictly conforming programs to redefine only with the same argument spellings, because this makes it possible to use preprocessors that work in other ways. However, forbidding preprocessors to allow equivalent definitions with different argument spellings serves no useful purpose. The spirit of the constraints in 3.8.3 is to allow redefinitions that make no change and forbid those that would alter the meaning of the macro. A preprocessor that ignores the argument spellings when comparing definitions actually fits this spirit better than what is currently required by the standard.

ITEM 6, 4.9.6.5. `sprintf' is unsuitable for robust programs.

For most format strings, no fixed size of buffer is safe to use with `sprintf' because some data would cause it to overflow.
This means that the usefulness of `sprintf' is limited to a few kinds of format strings. Moreover, the standard offers no robust way to do output formatting into buffers in memory. Rather than promoting this dangerous construct, the standard ought to define a similar function which accepts a buffer size as argument and guarantees not to write beyond that size. Thus, replace the text of 4.9.6.5 with the following:

    int snprintf (char *s, int len, const char *format, ...);

    The `snprintf' function is equivalent to `fprintf', except that
    the argument `s' specifies an array into which the generated
    output is to be written, rather than to a stream.  No more than
    `len' characters of output are written to the array.  The returned
    sum is the number of characters that the output would contain.

    If the output would properly consist of fewer than `len'
    characters, then all the output is written to the array `s',
    followed by a null character that is not counted as part of the
    returned sum.  In this case, the returned sum is less than `len'.

    If more than `len' characters of output would result from the
    specified format string and arguments, the first `len' characters
    of output are written to the array `s' without a terminating null
    character and the rest of the output is discarded.  The returned
    sum is the number of characters that would be output if the array
    were big enough; therefore, it is greater than or equal to `len'.

`vsprintf' has the same problem, so a `vsnprintf' function should be created along the same lines.

Easy Minor Improvements.

ITEM 7, 3.3.4. Allow casts to union type.

For example,

    union foo { int x; double y; };
    void bar (union foo);

    union foo
    hack ()
    {
      bar ((union foo) 78);
      return (union foo) 1.3;
    }

Right now it is necessary to assign temporary variables explicitly to construct a union to be passed or returned. This feature is very easy to implement in a way that would generate the same code that results from the explicit assignments now required.
With a little more work, compilers can generate much better code than is possible now. This requires no new syntax, and its meaning is obvious. It cannot be hard to implement. It breaks no existing C programs.

This change is done by changing 3.3.4's constraints as follows:

    If the type name specifies void type, the operand may be any
    expression other than a void expression.  If the type name
    specifies a union type, the operand may be any expression whose
    type is that of any member of the union.  Otherwise ...

and adding a section 3.2.2.4:

    A value of any type may be converted to a particular union type
    provided the value's type is the same as some member of the union.
    The result of the conversion is a value of union type such that
    access to such a member in it would yield the value converted.  If
    multiple values of the union type could have this property, it is
    undefined which of them is the actual result.

It would be even more convenient to be able to declare `real_bar'

    int real_bar (int x, union foo y);

and then write `real_bar (1, 2)' or `real_bar (1, 1.5)'. This might merit an addition to the constraints of 3.3.2.2 as follows:

    The types shall be such that each formal parameter may be assigned
    the value of the corresponding argument; or else a formal
    parameter may be of union type and the corresponding argument of a
    type that can be converted to the union.

ITEM 8, 3.3.8 and 3.3.9. Allow comparison of types such as `int *' and `const int *' that differ only in the presence or absence of `const' or `volatile' in the type pointed to.

For example, the following code is currently invalid but should be valid.

    char *p;
    const char *q;
    if (p == q) ...

This change would parallel the handling of assignments. Add to the constraints of sections 3.3.8 and 3.3.9:

    In addition, an expression that has type ``pointer to type without
    the const attribute'' may be compared with a pointer to a type
    with the const attribute.
    An expression that has type ``pointer to type without the volatile
    attribute'' may be compared with a pointer to a type with the
    volatile attribute.

ITEM 9, 3.5.3.2. Allow the length of an array to be zero.

An array of length zero is very useful in structures like this:

    struct table { int length; int contents[0]; };

so you can do

    malloc (sizeof (struct table) + length * sizeof (int))

instead of having to use `(length - 1)'. Allowing this does not alter the meaning of any conforming program or any existing C program, it gives a construct the meaning that C programmers would expect it to have by analogy, and it is very easy to implement.

To make this change, alter the constraints of 3.5.3.2 (first sentence) as follows:

    The constant expression that specifies the size of the array shall
    have integral type and value greater than or equal to zero.  ...

Structures of zero length are also useful, in examples such as

    struct feature_for_next_year { };

    struct forever
    {
      struct last_year a;
      struct this_year b;
      struct feature_for_next_year c;
    };

Here the idea is to add the structure `feature_for_next_year' to the program even though there is as yet no requirement to give it any members. The standard currently requires (in 3.5.2.1) at least one member in a structure. Making this change requires changing the syntax rules in 3.5.2.1.

The Rationale, in section 4.10.3, speaks of the "theoretical disadvantage of requiring the concept of a zero-length object". However, there is no indication of what such a disadvantage might be. The examples above show why zero-length arrays are useful; the burden of proof is now on whoever wishes to show there is a reason not to allow them.

I do not propose any change to the specifications of `malloc', etc., in 4.10.3 in connection with this. The currently specified behavior for size zero is adequate even when there are types of size zero.

Significant Improvements.

ITEM 10, 3.6.
Provide a "frequency" statement to tell optimizing compilers which inner loops should be considered more important than the containing code.

I know that the immediate first response will be, "Use #pragma." However, #pragma is unsuitable for this use for two reasons:

1. #pragma is not standard; therefore, one can never be sure that the same #pragma line will not have a completely different and disastrous meaning on some other implementation. It follows that no strictly conforming program can use #pragma. Yet it is desirable to be able to state frequency information in strictly conforming programs.

2. 3.8.3.4 says that the result of macro expansion is not taken as a preprocessor line even if it looks like one. This would appear to imply that macros cannot generate #pragma lines. (If that section is not intended to have this meaning, it should be clarified.) Being unable to put frequency information in code generated by macros is an undesirable restriction.

What is really needed here is a standard construct that is guaranteed to mean either a standard thing or nothing (no effect on execution). I propose that `frequency (NN)' be syntactically equivalent to `while (NN)' but have no effect on the execution of the abstract machine, serving only as a declaration that its body will be executed an average of NN times as often as the smallest containing `if', `while', `for' or `switch' statement. NN must be a constant expression, perhaps an integral constant expression or perhaps allowing floating point values.

Examples of use include

    if (used == allocated)
      frequency (0)   /* Initial size is almost always enough */
        {
          allocated *= 2;
          ... call realloc ...;
        }

and

    while (c = *p++)
      frequency (50)  /* 50 is a typical length for these strings */
        { ... }

Instead of a new kind of statement, a new kind of declaration could be used. It would apply to the code within the block in which it is used. It could have the syntax

    .
    frequency (N);

where the period at the beginning prevents conflict with any currently defined syntax; `frequency' would not be a keyword and could still be used as a variable name. The above examples, rewritten to use this construct, look like

    if (used == allocated)
      {
        .frequency 0;   /* Initial size is almost always enough */
        allocated *= 2;
        ... call realloc ...;
      }

and

    while (c = *p++)
      {
        .frequency 50;  /* 50 is a typical length for these strings */
        ...
      }

Someone else suggested that the construct `#pragma frequency N' be given a standard meaning and used for this. However, this is not a solution. It would resolve objection 1 to the use of `#pragma', but objection 2 would still stand.

Omitted Issues.

ITEM 11, 2.1.2.3. The standard ought to say more explicitly when aliasing can validly take place in a strictly conforming C program.

When implementing an optimizing C compiler, the question always arises of when the compiler must assume that two pointers may be aliases. In some cases where the address of an object of block scope is taken, it might be possible to find all the places this address can reach, and then assume that no other pointers can alias with the object. However, this case is infrequent. Therefore, it is important to know what assumptions must be made by the compiler in other cases.

One safe choice is to assume that any pointer that is not the address of a known static or automatic object may alias with any object. But the code that results from this assumption contains many instructions that humans know are wasteful. A compiler using a more restrictive rule would be much better if it were correct.

I have heard suggestions of rules based on the types of objects involved. For example, one person who has read the standard suggests that casting a pointer to a different pointer type and accessing the object pointed to is always undefined, and that this could be the basis of aliasing determinations. But this is not true.
If the pointer points to a member of a union, it could safely be cast to point to a different member of the union. This could make it defined to cast any pointer type to any other. Letting T1 and T2 be arbitrary types, here is an example that produces aliasing between them that is valid according to the current standard as far as I can tell.

    union { T1 a; T2 b; } myunion;

    ... foo (&myunion.a) ...

    foo (p)
         T1 *p;
    {
      ... (T2 *) p ...
    }

One might consider a rule that a static scalar object which is not part of a union object cannot alias with a pointer pointing to any other type. But I cannot determine with certainty whether this rule is valid. I can see how to produce that kind of aliasing with an example such as

    static double foo;
    ...
    *(int *)&foo = 1;

but I am not sure whether anything about the behavior of this example is defined. If all cases that produce such aliasing are undefined under the standard, then the rule is valid. I cannot tell whether this is so.

I am not sure whether the standard implies that

    short in, out;
    {
      char *inptr, *outptr;
      int i;
      inptr = (char *) &in;
      outptr = (char *) &out;
      for (i = 0; i < sizeof (short); i++)
        outptr[i] = inptr[i];
    }

is defined and equivalent to

    short out, in;
    out = in;

If it does, the compiler may not assume that the previous value of `out' is still valid after the assignment to `outptr[i]'.

I urge the committee to determine whether the standard implies that this rule, or some other rule of the same nature, can validly be used to assume aliasing is not taking place, and to state in the standard which rules are recommended. If this is not done, implementors will search, separately, for valid rules, thus duplicating effort. Some of these rules will make certain C programs not work as intended, and then users and implementors will argue inconclusively over whether the actual behavior violates the standard.

Note that the first example in section 2.1.2.3 of the Rationale gives an example where this issue is relevant.
If the variable `sum' is normally held in memory, keeping its value in a register during the loop will give incorrect results if `a' is equal to `&sum - 1'.

Controversial Issues.

ITEM 12, 2.1.1.3. Permit extensions.

Very often the rationale says that certain proposals for new standard features were rejected because of a "lack of prior art". This by itself is good practice. It is wise not to include a feature in the standard when people don't yet see clearly what form it should take or whether it is truly useful. However, when 2.1.1.3 is added to these decisions, it has the effect of forbidding any subsequent art that would ever shed light on the matter. 2.1.1.3 should be amended so that documented extensions are allowed to give meaning to constructs that are invalid according to the standard:

    A conforming implementation shall produce at least one diagnostic
    message for every source file that contains a violation of any
    syntax rule or constraint, unless the violation is in accord with
    a documented extension of the implementation.

Here is a list of many extensions that would be useful and interesting but are forbidden by the standard:

1. A preprocessor feature to test for the existence of an include file, such as the `definedfile' operator.

2. A preprocessor directive that allows defining a macro so that each time it is called it appends some text to the definition of another macro. Eventually the other macro would be expanded so as to get out the text previously appended to it. This is useful for making entries in a table of commands as the individual commands are defined.

3. Arrays of size zero.

4. Arrays whose sizes are not constant.

5. Aggregate initializer elements that are not constant.

6. A way of declaring labels with block scope. This is useful for certain macros.
The only way to break out of nested loops in C is with a label; as a result, it is impossible to write a clean macro that expands into code containing such nested loops, because if the macro is used twice in one function there will be a conflict of label names.

7. Compound statements within expressions. This would allow clean definition of safe macros to replace simple functions. Consider, for example,

    #define fmin(A,B) \
      ({ double a = (A); double b = (B); (a < b) ? a : b; })

(the value of the compound statement being that of the last expression in it). It would be natural to forbid gotos into the ({...}) construct by giving labels within it a local scope. This would enable the safe use of labels in macros.

8. String literals that are written with embedded significant newline characters.

9. Ranges in case statements.

10. Casts to union types.

11. Aggregate constructor expressions.

ITEM 13, 2.1.1.3. It is unclear.

The concept of "violation of any syntax rule" is unclear because the syntax rules used in the standard are generative, and the nature of generative grammar is to specify what is allowed, not what is prohibited. Thus, invalid syntax typically violates no particular rule, but rather fails to correspond to any rule. I am not sure whether 2.1.1.3 as now written forbids new syntax rules that give meaning to previously invalid syntax, such as a rule to allow new kinds of declarations:

    declaration:
         . <identifier> <integral-constant-expression> ;

(where <identifier> would be constrained to be one of a specific list of identifiers defined by the implementation which has this extension).

ITEM 14, 2.1.1.3. The idea of erroneous program has been misapplied.

To say that a certain construct is erroneous and must generate a diagnostic message has both advantages and disadvantages. The advantages are that it might prevent what would otherwise be mysterious unpredictable behavior, and that it helps keep unportable constructs supported by one implementation from creeping into programs which then become difficult to port.
The disadvantages are that it restricts methods of implementation, interferes with improvement of the language, and can require a great deal of extra work.

These advantages and disadvantages are always present, but not uniformly. In some cases the advantages are great while in others they are small. By adopting this position as a blanket policy rather than in the specific cases where the advantages are great, the standard can impose great burdens on implementors with no benefit to users. 2.1.1.3 does not help the applications programmer much, because the worst kinds of unportability are the unspecified behaviors that abound in C--cases such as `foo (p++, p++)'.

Suppose that one C compiler nonstandardly allows the size of an array to be zero. A programmer might start using zero-size arrays, and then on moving to another compiler the program would not work. But the trouble this would cause is limited by the fact that the other compiler would print an error message identifying where the zero-sized array was used. Changing the program would then be straightforward; a nuisance at worst. By contrast, if the programmer starts to use `foo (p++, p++)', he will get no diagnostic but much perplexity on moving to another compiler. It is disproportionate to pay any important price to protect the programmer from the nuisance resulting from some compiler's failure to support zero-length arrays while doing nothing to remove the real danger of ambiguous evaluation order.

ITEM 15, 2.2.1.1. Eliminate the ?? trigraphs.

The trigraphs, unlike the other internationalization changes, are not necessary. The belief that they are necessary comes from linking two independent questions:

1. Which character sets C programs can operate on properly.

2. Which character sets C programs can be written in.

There is a great need for C programs to be able to operate on all the European character sets. The internationalization changes in the library make this possible.
There is no such need to be able to write C programs in non-ASCII character sets. A program to interact with users in a French character set can do its job just as well if written in ASCII. The Rationale, in 2.2.1, says that the goal is to make sure that it is possible to translate a C translator written in C. This requirement can be met even if C programs must be written in ASCII, as long as every C translator can operate on ASCII.

Now, if each C installation had to choose one character set and all C programs running on that installation were compelled to use this character set, this set would have to be that of the local country, and therefore it would be necessary to be able to write C in all of those character sets. But C programs are not limited to operating on one character set. The internationalization changes in the library make it easy for C programs to choose even at run time among several supported character sets. As long as ASCII is always one of these supported character sets, C programs written in ASCII can be compiled everywhere. There is no need to be able to write C programs in anything else.

Trigraphs cause an obscure problem in addition to the ones that are apparent at first glance: they make it much more difficult for programs to understand C syntax. Consider a text editor that needs to understand C syntax only to the extent of matching beginnings and ends of strings and balanced expressions. This becomes very difficult in the presence of the trigraph ??/. For example, complicated special-purpose code would be needed to be able to find the beginning of the string

    "foo? ??/" ??/??/"

given the position of the end of the string. Most likely such editors will simply not support the use of trigraphs.

A German friend whom I asked says that his colleagues would rather use ASCII for their C programs, and if forced to use a German character set would rather have braces display as umlauted letters than use trigraphs.

ITEM 16, 3.5.6.
Allow variable elements in aggregate initializers.

The constraints of this section, together with what 2.1.1.3 says about required diagnostics, appear to forbid the use of an extension in which the elements of initializers for automatic aggregates could be other than constant. The Rationale mentions no objection to this usage, even as a proposed standard feature, except for cases of the form

    int x[2] = { f(x[1]), g(x[0]) };

and appeals to the difficulty of writing rules that would exclude such cases or specify their order of evaluation. The Rationale thus implicitly assumes that such a feature would have to be accompanied by rules to eliminate the ambiguity of ordering, but gives no justification for this assumption, which goes against the spirit of C. The Rationale gives no arguments against the idea of including this feature without such rules.

The ambiguities of examples such as the one given above are not new. They exist in C statements already. There are four possible orders of execution in this example, with different results:

1. f is called first (receiving garbage), and g receives f's result.

2. f is called first, both f and g receive garbage.

3. g is called first (receiving garbage), and f receives g's result.

4. g is called first, both f and g receive garbage.

(It should also be noted that the example involves, for any possible order, passing an undefined value to at least one of `f' and `g'. Therefore, this particular initializer would probably be useless even if the order of evaluation could be predicted. Most of the unspecified cases share this property.)

The following C statement, which is valid according to the standard, shows the same problem:

    (x[0] = f(x[1])) + (x[1] = f(x[0]));

This statement contains no sequence points except for the function calls, so it has the same four possibilities.
Since rejecting variable elements of aggregate initializers does not accomplish the goal of eliminating these ambiguities, variable elements should be allowed, with the order of evaluation of such elements and the storing of the results all being unspecified.  If that is not done, perhaps out of a desire to avoid requiring any additional features at this time, at least the rule in 2.1.1.3 that forbids this feature to be provided as an extension should be relaxed.

ITEM 17, 3.8.3.  Don't say whether keywords can be #defined.

It is not necessary to choose between allowing and forbidding the definition of keywords as macros.  Another alternative is to make it undefined (or perhaps implementation-defined).

I consider allowing macro definition of keywords to be somewhat preferable to making it undefined, but making it undefined greatly preferable to forbidding it.  I expect that most fans of character-based preprocessors will share this feeling.  Many of those who like token-based preprocessors are likely to have a similar attitude, in reverse: that forbidding definition of keywords is a little better than making it undefined, which is much better than allowing it.  If this is how people feel, "undefined" gets a higher combined rating from the entire community than either "allowed" or "forbidden".  It is a natural compromise.

Clarifications Needed.

ITEM 18, 2.1.1.2.  Converting preprocessor tokens.

Is the intention of step 7 that each preprocessor token is converted individually to normal tokens, so that it is impossible for two adjacent preprocessor tokens such as `+' and `=' to form one normal token `+='?  I believe this is what is meant, but it is not clear.

ITEM 19, 2.2.2.  The wording of the definition of `\f' should be changed.

The current wording, by speaking in terms of moving the cursor on a display device, creates a spurious conflict with issues of user interface design.
It is not desirable for an operating system to move the cursor in this way when a formfeed character is output to a display by an ordinary program.  Some operating systems do this, or clear the screen, but the only result is confusion for users when programs that were not designed for explicit display control output samples of the user's data that happen to contain formfeeds.  This problem has nothing to do with the spirit of the standard, so a change in wording would make it go away.  I propose:

    \f (form feed)  Is regarded as dividing a document into pages.

ITEM 20, 2.2.4.2.  Why no FLOAT_ROUNDS?

The example of float.h values for IEEE standard floating point does not define FLOAT_ROUNDS.  Is this an omission?  What is the reason for not specifying the value FLOAT_ROUNDS should have in an implementation that does rounding?  Is some specific application envisioned for conveying additional nonstandard information through the value of this macro?  If so, it should be described in a footnote.  If no such use is envisioned, then it would be better to specify the standard value `1' for implementations that round.  Any nonstandard additional information could be conveyed by some other nonstandard name.

ITEM 21, 3.1.2.5 says that `int' and `long' are different types even if they are identical in range.

I expect that this is intended to have operational consequences for C translators, but it is not clear what those consequences are.  After long thought, I arrived at the idea that the intent might be to require an error message from the following fragment.

    int *foo;
    long *bar;
    foo = bar;

If this is the intended meaning, it should be stated explicitly with an example.

ITEM 22, 3.2.2.1.

This says that arrays are coerced to pointers "where an lvalue is not permitted".  I cannot find any coherent meaning for this statement.  Lvalues are permitted (but so are other expressions) as operands to all the arithmetic operators, for example, but arrays are coerced in those places.
Also, searching through the standard, I find that most if not all places that call for lvalues require modifiable lvalues, which excludes arrays.  As far as I know, only within the `sizeof' and `&' operators is an array not converted to a pointer.  If the intention is to specify these places, it would be cleaner to do so by listing them explicitly.

ITEM 23, 3.3, paragraph 3.

It would be natural to allow associative regrouping for `+' and `-' together, as in `a - (b - c)' => `(a - b) + c' and `a - (b + c)' => `(a - b) - c'.  But the wording in use seems to rule this out.  However, at the end of section 3.3 in the Rationale it says that a decision was made against "extending grouping semantics [of unary plus] to the unary minus operator".  This would seem to mean that regrouping through unary minus is permitted.

In addition, it is conceptually simple to regard `a-b' as equivalent to `a+-b'.  `(a + -b) + -c' can be regrouped into `a + (-b + -c)', so if it is not possible to regroup `(a - b) - c' into `(a + -(b + c))' the result is that users will be confused.

If regrouping through unary and binary minus along with binary plus is not allowed, I believe this needs to be stated explicitly.  I think it would be better to state explicitly that it is allowed, and here is how it might be done.  Add to 3.3:

    ... are not changed by this regrouping.  An expression consisting
    of the binary operator `-' applied to two expressions m1 and m2
    may be regrouped as m1+(-(m2)), and an expression of the form
    m1+(-p1) may be regrouped as m1-p1.  An expression of the form
    -(m1+m2) may be regrouped through the distributive law as
    (-(m1))+(-(m2)), and an expression of the form (-p1)+(-p2) may be
    regrouped as -(p1+p2).  Here m1 and m2 stand for arbitrary
    multiplicative-expressions and p1 and p2 stand for arbitrary
    primary-expressions.  To force a particular grouping...

ITEM 24, 3.3.3.4.

What is the value of `sizeof' applied to an array whose size has not been declared, as in this example:

    extern char x[];
    ...
    sizeof x
    ...

ITEM 25, 3.5.2.1.

It is not clear whether an `unsigned int' bit field of fewer bits than the width of an `int' undergoes an integral promotion to type `int'.  3.2.1.1 suggests that it does.  3.5.2.1 suggests that it does not.  It would be useful for both of these sections to state explicitly what happens to these bit fields.

ITEM 26, 3.5.5 says, "two types are the same if they have the same ordered set of type specifiers and abstract declarators".

Does this mean that `long unsigned int' and `unsigned long int' might be different?  Must be different?  Does this mean that in the following fragment, `x' and `y' have different types?

    typedef const int foo;
    volatile foo x;
    typedef volatile int bar;
    const bar y;

I believe that a clarification is required.

ITEM 27, 3.7.6.  The examples here are misleading.

By showing the use of a formal parameter declared as a pointer to a function and called with explicit use of `*', and also showing a formal parameter declared as a function and called with no `*', they seem to suggest that the two choices (declaration and call) are coupled.  But my reading of the standard says they are independent; either call will work with either declaration.  It would be better to pick examples that don't suggest a nonexistent correlation.

ITEM 28, 3.8.

The constraints in this section appear to imply that comments are not allowed on preprocessor lines except at the end.  I do not believe that is what it is intended to mean, because 3.8.3 contains examples in which comments appear within the replacement text in a #define.  I propose adding the following footnote to the constraints of 3.8:

    The horizontal spaces allowed within a preprocessing directive
    include horizontal spaces resulting from the elimination of
    comments, which has taken place at an earlier phase of
    translation.

ITEM 29, 3.8.3.

The semantics portion says that initial and final whitespace are not considered part of a macro's replacement token list.
This appears to imply that whitespace is significant and required not to be copied through by a preprocessor.  As I understand it, whitespace other than newlines is not significant at the stage of preprocessor tokens, so this statement is misleading.

What is worse is that the only way a separate preprocessor can make sure that two preprocessor tokens don't convert to one normal token is to output whitespace between them.  Specifically, this must be done at the beginning and end of a macro's replacement text.  Thus, the semantics portion of 3.8.3 appears to forbid the only practical method of implementing the standard's specifications for token conversion (assuming that this is what the standard specifies; see the item above that refers to 2.1.1.2).

I propose the following change to the semantics section of 3.8.3:

    Any whitespace characters preceding or following the replacement
    list of tokens are not considered significant when comparing
    macro definitions to determine the validity of a redefinition.

ITEM 30, 3.8.3.

Is there some motivation for not standardly supporting the use of empty macro arguments?  If so, it would be useful to have a footnote explaining why they might fail to be straightforwardly handled by the mechanisms used to handle nonempty arguments.

ITEM 31, 3.8.8.

This would appear to forbid the common practice of predefining some macro names that identify the type of hardware and software that are in use.  A.6.5.13 says that this practice is expected to continue.  A.6.5 says that only names beginning with an underscore could be predefined.  No two of the above can be true.

Actually, the predefined names currently in use are undesirable because they do not begin with underscore.  Since they are chosen by implementors, no one can predict what names might be predefined in some implementation.  Thus, all names chosen by applications programmers are vulnerable to conflicts.
However, the need for some way to indicate to the C program what kind of environment it is being compiled for is a great one and has to be filled somehow.  It would be better to predefine names that begin with underscore for this purpose, so as to avoid conflict with names chosen by applications programmers.

ITEM 32, 4.9.4.1.

The Rationale says that `remove' was defined because the definition of `unlink' is too Unix-specific.  As far as I can see, `remove' may differ from `unlink' only in that certain behavior is implementation-defined.  If so, it would have been just as good to call this `unlink'.  Many traditional constructs are included in the standard with certain cases undefined.

The definition of `remove' could be read as requiring that the file (not just one of its names) actually disappear immediately unless it is currently open.  Under this interpretation, `remove' cannot be implemented on Unix systems.  If the definition is read as requiring that only the specified name disappear, and that the file remain accessible under any other names it may have had, `remove' cannot be implemented on some other systems such as VMS.  I think the definition of `remove' should clearly indicate that the choice between those two behaviors is implementation-defined.  A similar clarification may be required for `rename'.

ITEM 33, 4.9.4.3.  Mention effect of abnormal termination on `tmpfile'.

Elsewhere it is stated that it is not defined whether files created by `tmpfile' are removed on abnormal termination.  This is a good specification, but it needs to be stated with the definition of `tmpfile'.  The definition could now be read as implying that the file will be deleted on normal termination.

ITEM 34, A.6.3.3.  What does the "order of bits in a character" mean?

I do not know what operational definition could be assigned to the order of bits in a character.  Can this be explained?
I know about the difference between byte ordering among machines, but I don't see that it constitutes any ordering of the bits within a byte.
faustus@ucbcad.berkeley.edu (Wayne A. Christopher) (01/06/87)
A lot of people have been worrying about the proliferation of names that don't begin with `_' which are pre-defined by the implementation.  But seriously, we can't expect `read' to be re-defined as `_read' in UNIX -- the things that UNIX defines are going to stay defined.  How many programmers have had serious problems with conflicts like this?

The problem of non-standard identifiers in libraries isn't a problem at all, as I pointed out in a previous message (just make sure other library routines use "hidden" variations), and macros defined in header files usually will cause an error, so at least bugs aren't going to remain hidden because of this.

Before we spend too much time fixing problems, let's make sure that they're problems in the first place.

	Wayne
jss@ulysses.homer.nj.att.com (Jerry Schwarz) (01/07/87)
In article <2144@brl-adm.ARPA> rms@frosted-flakes.ai.mit.edu makes many sensible comments on the ANSI proposal.  Here are some comments on his comments.  (My failure to comment on a particular ITEM does not imply either agreement or disagreement.)

>ITEM 1, 3.4.  Arbitrary arithmetic and casts in static initializers
>cannot be implemented in most existing operating systems.
>
> [extended discussion omitted.]
>
>Therefore, add the following text to 3.4:
>
>    The effect of using `&' in an initialization expression that is
>    required to be constant is implementation-defined unless the `&'
>    is the outermost operator in the expression or else appears
>    within the operand of a `sizeof' operator.
>
> [ ... ]
>This allows constructs such as &variable, &variable.component,
>and &array[index].

I agree that there is a problem, but the proposed addition is inadequate.  For example it still permits

    (short)array

A simpler fix is to forbid casts from pointers to arithmetic types.  The change required is to add "except casts from pointer types to arithmetic types," after "arbitrary casts" on line 18 page 48.

>ITEM 5, 3.8.3.  Allow preprocessor to forget macro argument spelling.
>
>The spirit of the constraints in 3.8.3 is to allow redefinitions
>that make no change and forbid those that would alter the meaning
>of the macro.  A preprocessor that ignores the argument spellings
>when comparing definitions actually fits this spirit better than
>what is currently required by the standard.

I disagree that this is the intention.  In fact I think the opposite: the intention is to allow redefinition only when the redefinition is identical in every respect, not just in meaning.  In any event, if you just eliminate the phrase requiring the spelling of parameters to be identical, the preprocessor will normally have to preserve the spelling of parameters that are used in order to test for identity of the replacement lists.
Finding a clear concise change in the definition of equality of replacement lists might be tricky.

>ITEM 8, 3.3.8 and 3.3.9.  Allow comparison of types such as `int *'
>and `const int *' that differ only in the presence or absence of
>`const' or `volatile' in the type pointed to.
>
>For example, the following code is currently invalid but should be
>valid.
>
>    char *p;
>    const char *q;
>
>    if (p == q)...
>
>This change would parallel the handling of assignments.

The problem is deeper.  Suppose instead of the above, the code was

    char **p;
    const char **q;
    ... p == q ...

This would still be illegal under your suggestion.  What is needed is a more careful examination of the notion of type equality.  I'm not sure what such a proposal would be.  If I work one out I will post it.

>ITEM 11, 2.1.2.3.  The standard ought to say more explicitly when
>aliasing can validly take place in a strictly conforming C program.

Such a discussion might be useful in an appendix or the rationale, but not in the standard proper.  Either such rules are already implied by the semantics, in which case they are redundant, or these rules would contradict the semantics, in which case we wouldn't know whether the semantics or the rules were to govern.

>I have heard suggestions of rules based on the types of objects
>involved.  For example, one person who has read the standard
>suggests that casting a pointer to a different pointer type and
>accessing the object pointed to is always undefined,

Looking at 3.3.4 it seems that although casts between pointer types are allowed, nothing explicit is said about their meaning beyond the general assertion that the cast converts the value.  Although I cannot find any explicit language that requires it, I think the intention is that, for example,

    ... (char*)&obj ...

should point to the first byte of "obj", whatever "obj's" type.  This is certainly implicit in library functions like "memcpy" (although these use void*, rather than char*).
Also the standard takes care with defining "bytes" and giving rules for layout in structures and unions.  I don't think there are any syntactic rules of the kind you want.  This makes life hard on compiler writers, but it is part of the "spirit of C".

>ITEM 16, 3.5.6.  Allow variable elements in aggregate initializers.
>
>The constraints of this section, together with what 2.1.1.3
>says about required diagnostics, appear to forbid the use of an
>extension in which the elements of initializers for automatic
>aggregates could be other than constant.

I don't think so.  All 2.1.1.3 seems to require is that a warning message be generated if the extension is used.

>ITEM 20, 2.2.4.2.  Why no FLOAT_ROUNDS?
>
>The example of float.h values for IEEE standard floating point
>does not define FLOAT_ROUNDS.  Is this an omission?

The rationale explains this in 3.2.1.4.  Briefly, IEEE chips use the same bit to control rounding of floating arithmetic and the conversion from floating to integral.  Since C requires that the latter truncate, an implementation might choose to have floating arithmetic truncate as well.

>ITEM 22, 3.2.2.1.  This says that arrays are coerced to pointers
>"where an lvalue is not permitted".  I cannot find any coherent
>meaning for this statement.  Lvalues are permitted (but so are
>other expressions) as operands to all the arithmetic operators,
>for example, but arrays are coerced in those places.

I think it should read "where an lvalue is required".  In describing the type constraints of the various kinds of expressions, some sections of 3.3 assert "... shall be a modifiable lvalue" and others don't.

Jerry Schwarz
Bell Labs, Murray Hill
ulysses!jss
ron@brl-sem.ARPA (Ron Natalie <ron>) (01/07/87)
In article <1202@ucbcad.berkeley.edu>, faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
> A lot of people have been worrying about the proliferation of names
> that don't begin with `_' which are pre-defined by the implementation.
> But seriously, we can't expect `read' to be re-defined as `_read' in
> UNIX -- the things that UNIX defines are going to stay defined.  How
> many programmers have had serious problems with conflicts like this?

Easy, UNIX can have "read" in libc (or liba for those running Version 6).  It is just prohibited that any of the Standard C routines such as PRINTF use "read."  That way if a user defines his own function called read, he doesn't break any calls he made to the "Standard Set of Routines."

-Ron
mat@mtx5a.UUCP (01/13/87)
> These are comments I'm about to mail to CBEMA:
>
> I have implemented what I believe is a complete freestanding
> implementation of this draft, except for the insuperable problem
> described in item 1, ...
>
> Disastrous Deficiencies.
>
> ...
> Support for arbitrary casts and arithmetic in static initializers
> also requires changes to linkers.  Consider
>
>     int foo = ((int)&bar * 3) % 5001 | (int)&baz;
>
> The Rationale in 3.4 suggests that this initial value be computed
> and installed at run time.  However, this is usually impossible.
> Just generating instructions to compute the value and store it is
> easy; the problem is how to cause them to be executed at a suitable
> time.  The value of `foo' could be examined by code in a different
> source file before any function in this source file has been
> called.  Only a special linker feature would make it possible for
> each separate compilation to specify code to be executed before
> `main' is called.

This is the problem faced by C++ in dealing with ``static constructors.''  Solutions are being found; they are generally dependent upon the machine environment, but the problem is not insurmountable, at least on the machines that C++ has thus far been ported to.  I suspect that there is even a solution for the HP3000, and that is a bad machine to write language systems for (the linker must be a trusted program to protect the system).

--
from Mole End			Mark Terribile
(scrape .. dig)			mtx5b!mat
(Please mail to mtx5b!mat, NOT mtx5a!mat, or to mtx5a!mtx5b!mat)
(mtx5b!mole-end!mat will also reach me)
,..  .,,  ,,,  ..,***_*.