phr@mit-hermes.UUCP (01/22/87)
[This is from RMS.  Please mail responses to rms@prep.ai.mit.edu since he doesn't read this newsgroup.]

>Support for arbitrary casts and arithmetic in static initializers
>also requires changes to linkers.  Consider
>	int foo = ((int)&bar * 3) % 5001 | (int)&baz;

    There would seem to be the problem that RMS describes only if the
    implementation chooses to support casts of pointers to arithmetic
    types.  However, the draft standard does not require this (3.3.4,
    Semantics).  Any use of such address arithmetic is system-dependent.
    It would perhaps be worth adding a footnote to 3.4, Constraints or
    3.3.4, Semantics to the effect that support for such pointer casts
    implies full address arithmetic support for external linkage;
    certainly this should be pointed out in the Rationale.

If I read this right, it proposes to make it clearer that the standard
requires a choice between 1) changing the linker, which is impossible,
and 2) ceasing to support casts from pointer types to int, thus becoming
unable to compile most existing Unix C programs.

Neither of these choices is acceptable.  Every C compiler except on the
few systems with unusually powerful linkers will be forced to disregard
the standard rather than choose either of them.  This is a serious
problem, and it requires a solution, not a clarification.

>... serious problem because it implies that expressions such as
>`((float)((int)&foo | 38))' (where `foo' is static) are valid.

    I discussed this above.  The draft standard does not require
    support for casting pointers to other types.

The conclusion was omitted from this statement, but I believe it means,
"Since the standard doesn't require support for casting pointers to
other types, you shouldn't complain if the standard has requirements
that cause trouble if you *do* support such casts.  Just don't allow
casting pointers to int if you can't meet the requirements."
But look at the consequence of these requirements: 3.3.4 says that the
result of casting a pointer to an int is implementation-defined.  It
would follow that in any implementation either

1) the implementation defines that you can cast a pointer to an int, and

	static int foo;
	char x[(int)&foo];

   is a legitimate declaration, or

2) the implementation defines that you can't cast a pointer to an int,
   and any attempt to do so, *in any context*, gets an error message.

Nobody can implement 1; therefore, all implementations must choose 2.
Even though the standard says that implementations may allow casts from
pointer to int, it contains other requirements which effectively forbid
them!

[Regarding reserved macro names and a proposed `_' prefix for them]

    Sure.  Keep a list of reserved names nearby while programming...

"A list".  But which list?

The list of names reserved by ANSI C?  That solves nothing; a name
conflict with a file such as termio.h or files.h or types.h or
sys/machine.h or sys/param.h potentially causes just as much trouble as
a conflict with float.h.  (I'd be much more likely to include
sys/param.h at a future time than float.h.)

The list of reserved words for all the header files defined by the
system I am using?  Such a list might come with the system, but it
doesn't solve the problem.  Names that aren't reserved on that system
could be reserved by other systems I don't know about.  That won't hurt
me, but it could hurt someone else.

Perhaps the system I am using doesn't have TCSETA.  So it would not be
in the list that I look at, and I might use the name, not knowing that
most of you use a system where TCSETA is pseudo-reserved.  Then my
program is distributed everywhere, and one day you want to change it to
do a little terminal control.  You add #include <termio.h>.  Surprise!

What I would need is a list of all the names defined in headers in ANY
system.  Such a list is not easy to maintain.

These system-defined reserved macro names are not part of ANSI C at all.
I think some of the designers of ANSI C think that means they can
ignore the problem that they cause.  That would be true if the purpose
of ANSI C were to evade blame for problems.  The committee would simply
say, "From the point of view of ANSI C, that file termio.h which you
are having trouble with is just a user program.  It's not our fault
that it contains symbols that conflict with the ones used in your C
program."  However, the ANSI C standard will help C programmers a lot
more by *solving* problems than by shifting the blame for them.

[Use of (int)3.5 in integral constant expressions.]

    Since the operands can include casts of arithmetic types to
    integral types, and because floating constants have some
    floating-point type (3.1.3.1, Semantics), (int)3.5 is okay.

I think it is reasonable to say that (int)3.5 is okay, but 3.4 does not
unambiguously say so.  The kind of reasoning quoted above is sensible,
but we cannot rely on it when interpreting the standard, so the
standard must be written so as to be clear without the need for such
reasoning.  The reason that we cannot rely on such reasoning is that it
is too easy for people to use it to reach conflicting conclusions.  I
give an example farther on of how the same reasoning can be used to
derive a conclusion that contradicts other parts of the standard.

[Casting to union types.]

    The problem is, a union is not a scalar type.  In general, it
    cannot be.  To make this suggestion work, we would have to
    distinguish between scalar-unions and aggregate-unions, and for
    symmetry the same should be done for structures.

It is true that a union type is not a scalar type, but I don't see why
this constitutes a problem.  I am not proposing to change the fact that
union types are not scalar types.  Given the declarations

	struct foo {int a, b;} structure;
	union bar {struct foo x; double y;};

what I am proposing is to make it possible to cast either a `double' or
a `struct foo' into a `union bar'.
It is true that `double' is a scalar type and the struct and union are
aggregate types, but I do not see how that causes either a conceptual
problem or an implementation problem with code such as `(union bar)
structure' or `(union bar) 4.3'.

[Comparison of an (int *) with a (const int *).]

>ITEM 8, 3.3.8 and 3.3.9.  Allow comparison of types such as `int *'
>and `const int *' that differ only in the presence or absence of
>`const' or `volatile' in the type pointed to.

    I don't believe that there is any constraint with regard to this
    now.  Certainly const and volatile are not part of the object's
    type, and the constraint is only that the pointers be to objects
    of the same type.

Ok, I now see the subtlety of saying that the pointers are "to objects
of the same type".  I think that the implications of this for
comparisons should be stated explicitly in the section on comparisons.

However, some of the people writing the standard didn't always keep
this distinction in mind.  3.3.8 says, "both may be pointers to objects
that have the same type."  3.3.9 says, "both may be pointers of the
same type."  Therefore, the standard allows

	(int *) x < (const int *) y

but forbids

	(int *) x == (const int *) y

[Regarding zero-length arrays]

The spirit of C is that a construct that is useful in some cases should
not be forbidden entirely just because there are other contexts where
it makes no sense.

    This runs afoul of the common idiom for the number of elements in
    an array: sizeof array / sizeof array[0], since sizeof array[0]
    might be 0.

This is not a problem.  It is true that `sizeof array / sizeof
array[0]' cannot produce any useful value (and might be undefined, or
cause a trap) when `array' is an array of zero-size objects, but it is
not a *problem*.  Actual uses of zero-size objects will not run afoul
of this idiom because there will be no reason to try to use the idiom.
    For logical consistency, malloc(0) should return an actual storage
    address of a zero-sized object (furthermore, each such object
    should be at a different address).

To demand such consistency is against the spirit of C, because it is
not important in practice.  In actual use, no one will try to malloc a
zero-length object, but rather will use them as parts of other
structures that have nonzero overall length.  Therefore, no one will
care, in connection with zero-length objects, what malloc(0) does.
Therefore, I propose to leave malloc(0) unchanged, so that it does what
is best for whatever other reasons there are, while adding zero-length
object types.

[Confusion about #pragma]

    Use of #pragma is not permitted to alter the virtual-machine
    semantics of the code, so it isn't totally unsafe.

3.8.6 says, "a #pragma ... causes the implementation to behave in an
implementation-defined manner."  It says nothing about what kinds of
effects there may be on the semantics.  I can't find anything that
restricts what #pragma can do.  If the committee intends to require
that #pragma not affect the meaning of a program, the standard should
say so.

[Aliasing]

>If this is not done, implementors will search, separately, for valid
>rules, thus duplicating effort.  Some of these rules will make certain
>C programs not work as intended, and then users and implementors will
>argue inconclusively over whether the actual behavior violates the
>standard.

    Not conforming to the operation of the abstract machine described
    in the draft standard is definitely a compiler bug; if some
    characteristic of the abstract machine is left undefined, then it
    is folly to rely on it.

What is explicitly stated above is something we all agree with, but it
does not address the problem I am talking about.
I think that the reply does address the right problem, but implicitly:
by emphasizing what is to be done in the two clear extremes (clear
nonconformance and clear undefinedness), it claims that one of the two
will always clearly apply in any actual case.

I disagree completely.  Maybe the implications of the standard are
clear to Gwyn, but they aren't clear to everyone else.  And no two
people seem to agree on exactly what they are.  This very discussion
proves it's not clear.

Here is my scenario in greater detail: A user program does not compile
as intended.  The user points to one part of the standard and says that
the compiler is not conforming to the operation of the abstract
machine.  The compiler implementor points to another part of the
standard and claims to show that the compiler is conforming and the
user is relying on something undefined.  Both arguments will be
plausible, but not conclusive.

What is to be done about this?  I say, make the standard address
aliasing issues explicitly so that unclear cases will be rare.

    I also wonder just how much typical code would be sped up by the
    sort of optimizer you're concerned about.  I think this is another
    case of worrying about "microefficiency", which is usually
    misplaced concern.

My reading of output from my compiler gives me the feeling that it's
important.  I see many repeated fetches of a value from memory after a
store elsewhere in memory.  I know there is no aliasing and the old
common subexpression is still valid, but the compiler can't prove it.
Some short loops could be sped up by sizable fractions.

    This doesn't seem much different to me from what happens for
    i += i;  The wise programmer avoids writing code that could fall
    into such situations.

Maybe he does, but that's not the question.  The question is, "What
does the standard require a C compiler to do when it gets such a
program?"
If the standard doesn't specify the behavior, then the compiler writer
can disregard the effects of his optimizations on such code because
wise programmers won't write like that anyway.  If the standard
specifies the behavior, then the compiler writer can't do so.  I appeal
to the committee to make it clear, in the standard, whether the
behavior is specified or not.

    If this isn't sufficient for pointers, I would like to know why.
    Note that not specifying something is sometimes a deliberate
    decision intended to avoid unduly constraining implementations in
    non-crucial areas.

There is no need to persuade me that it might be desirable to make this
unspecified behavior.  The question is, *is it* unspecified in the
current draft?  Not as far as I can tell.

Also, while the specific example I used was something that no wise
programmer would write, I used that example only because it appears
verbatim in the Rationale.  I don't see a clean rule to distinguish
those cases from other cases that programmers do use.  At least, the
standard doesn't *give* such a rule.  And the standard is where the
rule has to be.

[Unclarity re converting preprocessor tokens]

>ITEM 18, 2.1.1.2.  Converting preprocessor tokens.

    + is already an operator, therefore a preprocessing token;
    therefore in step 7 it is converted to a normal token, by itself
    (not combined with other pp tokens such as =).

I agree with what you say, but the standard should make this
unmistakably clear without the need for you to add explanations.

[Severe unclarity regarding coercing arrays]

>ITEM 22, 3.2.2.1.  This says that arrays are coerced to pointers
>"where an lvalue is not permitted".

    This is not only correct, it is absolutely essential.

It is not correct.  For example, lvalues are permitted as operands of
the binary `+' operator.  According to the standard, it would follow
that arrays are not coerced to pointers when used as operands of `+'.
I think there is no disagreement in the C community on when arrays
should be coerced.
But the words of the standard today disagree totally with the community
and make no sense.

[Unsigned bit-fields]

>ITEM 25, 3.5.2.1.  It is not clear whether an `unsigned int' bit-field
>of fewer bits than the width of an `int' undergoes an integral
>promotion of type to type `int'.  3.2.1.1 suggests that it does.
>3.5.2.1 suggests that it does not.

    Each bit-field already has a type: int, unsigned int, or signed
    int, as specified in 3.5.2.1, Semantics.  Conversions in
    expressions occur by the usual rules given for these types.

I follow this reasoning.  This is why 3.5.2.1 seems to imply that an
unsigned int bit-field is treated as an unsigned int.

However, 3.2.1.1 says, "A char, a short int, or an int bit-field, or
their signed or unsigned varieties, may be used in an expression
wherever an int may be used.  In all cases, the value is converted to
an int if an int can represent all the values of the original type."

If we apply the kind of reasoning quoted in a previous section as a
justification for the validity of `char x[(int)3.5]', we can say that
the presence of bit-fields in the list implies that it makes a
difference that they are listed.  The only way it can make a difference
is if bit-fields are promoted according to the rules stated, as if they
had short integer types.  Therefore, an unsigned int bit-field with
fewer bits than an int must promote to int -- contradicting the
conclusion reached from 3.5.2.1.

If bit-fields are not expected to receive this special treatment, and
are treated as having the type they are declared with, then they should
not be mentioned specifically in 3.2.1.1.  However, I hope I have
demonstrated the unreliability of a certain sort of reasoning about the
standard, and why the standard needs to be written so that its
consequences are clear without such reasoning.
gwyn@brl-smoke.UUCP (01/24/87)
(Reminder: this is not an official X3J11 response.)

Thanks to RMS again for additional discussion of these matters.  I
believe that he has indeed uncovered several potential areas of
ambiguity or confusion that the committee should address.  I wish he
had been able to participate in formulating the draft, but am happy
that he has taken the trouble to scrutinize it so carefully.  Hopefully
the next publication will be "good enough" for use as the initial
official C standard.  (People wanting extensions could prepare
implementations of them and develop supporting evidence for their
desirability for the next standard, which would probably come at least
5 years after the initial one.)

I really don't know what we will end up doing about the linker
external-symbol arithmetic issue.  Dave Prosser remarked to me that even

	extern int foo;
	static char *bar = (char *)&foo;

are impossible for some linkers.  It may well turn out that C a la
X3J11 will pretty much force initialization via "constructor thunks"
(performed once, at run-time start-up or upon first access to the
object).  I sure hope we can avoid demanding that.  However, I don't
know how to formulate appropriate (universal) restrictions on
initializers to prevent this (other than outlawing initializing with
addresses of externs, which we really do need to have).  Suggestions
for this are solicited.
gwyn@brl-smoke.UUCP (01/24/87)
In article <5556@brl-smoke.ARPA> I wrote:

>	extern int foo;
>	static char *bar = (char *)&foo;
>are impossible for some linkers.

Prosser actually had in mind something like:

	extern int foo;
	static char bar = (char)(long)&foo;

but it looks more useless than the other example.  Don't blame him for
my changing the example!