[comp.lang.c] RMS's reply to Doug Gwyn's reply to RMS's comments on ANSI C

phr@mit-hermes.UUCP (01/22/87)

[This is from RMS.  Please mail responses to rms@prep.ai.mit.edu since
he doesn't read this newsgroup.]

    >Support for arbitrary casts and arithmetic in static initializers
    >also requires changes to linkers.  Consider
    >    int foo = ((int)&bar * 3) % 5001 | (int)&baz;

    There would seem to be the problem that RMS describes only if the
    implementation chooses to support casts of pointers to arithmetic
    types.  However, the draft standard does not require this (3.3.4,
    Semantics).  Any use of such address arithmetic is system-dependent.
    It would perhaps be worth adding a footnote to 3.4, Constraints or
    3.3.4, Semantics to the effect that support for such pointer casts
    implies full address arithmetic support for external linkage;
    certainly this should be pointed out in the Rationale.

If I read this right, it proposes to make it clearer that the
standard requires a choice between
1) changing the linker, which is impossible, and
2) ceasing to support casts from pointer types to int,
   thus becoming unable to compile most existing Unix C programs.

Neither of these choices is acceptable.  Every C compiler, except on
the few systems with unusually powerful linkers, will be forced to
disregard the standard rather than choose either of them.

This is a serious problem, and it requires a solution, not a clarification.
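
To see the difficulty concretely: a conventional Unix-style linker
can apply relocations of essentially one form, `symbol + constant'.
(A sketch; the details vary by system.)

        extern int bar, baz;

        /* Expressible as an ordinary relocation (symbol + constant): */
        char *ok = (char *)&bar + 2;

        /* Allowed by the draft wherever pointer-to-int casts are
           supported, but no common linker can evaluate this: */
        int hard = ((int)&bar * 3) % 5001 | (int)&baz;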

    >... serious problem because it implies that expressions such as
    >`((float)((int)&foo | 38))' (where `foo' is static) are valid.

    I discussed this above.  The draft standard does not require
    support for casting pointers to other types.

The conclusion was omitted from this statement, but I believe it
means, "Since the standard doesn't require support for casting
pointers to other types, you shouldn't complain if the standard has
requirements that cause trouble if you *do* support such casts.  Just
don't allow casting pointers to int if you can't meet the
requirements."

But look at the consequence of these requirements:

3.3.4 says that the result of casting a pointer to an int is
implementation-defined.  It would follow that in any implementation
either

1) the implementation defines that you can cast a pointer to an int, and
       static int foo;
       char x[(int)&foo];
   is a legitimate declaration, or

2) the implementation defines that you can't cast a pointer to an int,
   and any attempt to do so, *in any context*, gets an error message.

Nobody can implement 1; therefore, all implementations must choose 2.
Even though the standard says that implementations may allow casts from
pointer to int, it contains other requirements which effectively forbid them!


[Regarding reserved macro names and a proposed `_' prefix for them]

    Sure.  Keep a list of reserved names nearby while programming...

"A list".  But which list?  The list of names reserved by ANSI C?
That solves nothing; a name conflict with a file such as termio.h or
files.h or types.h or sys/machine.h or sys/param.h potentially causes
just as much trouble as a conflict with float.h.  (I'd be much more
likely to include sys/param.h at a future time than float.h.)

The list of reserved words for all the header files defined by the
system I am using?  Such a list might come with the system, but it
doesn't solve the problem.  Names that aren't reserved on that
system could be reserved by other systems I don't know about.
That won't hurt me, but it could hurt someone else.

Perhaps the system I am using doesn't have TCSETA.  So it would not be
in the list that I look at, and I might use the name, not knowing that
most of you use a system where TCSETA is pseudo-reserved.  Then my
program is distributed everywhere, and one day you want to change it
to do a little terminal control.  You add #include <termio.h>.
Surprise!
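
In concrete terms (the program is made up, but TCSETA really is
defined by System V's <termio.h>):

        /* myprog.c, written on a system that lacks termio.h: */
        #define TCSETA 1        /* my own flag; the name looked free */

        /* ...years later, someone adds terminal control: */
        #include <termio.h>     /* on System V, this also defines TCSETA */

        /* Now there are two conflicting definitions of TCSETA; the
           compiler complains, or worse, one silently wins. */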

What I would need is a list of all the names defined in headers in ANY
system.

Such a list is not easy to maintain.

These system-defined reserved macro names are not part of ANSI C at
all.  I think some of the designers of ANSI C think that means they
can ignore the problem that they cause.

That would be true if the purpose of ANSI C were to evade blame for
problems.  The committee would simply say, "From the point of view of
ANSI C, that file termio.h which you are having trouble with is just a
user program.  It's not our fault that it contains symbols that
conflict with the ones used in your C program."

However, the ANSI C standard will help C programmers a lot more
by *solving* problems than by shifting the blame for them.


[Use of (int)3.5 in integral constant expressions.]

    Since the operands can include casts of arithmetic types to
    integral types, and because floating constants have some
    floating-point type (3.1.3.1, Semantics), (int)3.5 is okay.

I think it is reasonable to say that (int)3.5 is okay,
but 3.4 does not unambiguously say so.  The kind of reasoning
quoted above is sensible, but we cannot rely on it when
interpreting the standard, so the standard must be written
so as to be clear without the need for such reasoning.

The reason that we cannot rely on such reasoning is that it
is too easy for people to use it to get conflicting conclusions.
I give an example farther on of how the same reasoning can be
used to derive a conclusion that contradicts other parts of
the standard.
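
For concreteness, the construct at issue in this item:

        char x[(int)3.5];       /* valid (an array of 3 chars) only if
                                   (int)3.5 is an integral constant
                                   expression under 3.4 */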


[Casting to union types.]

    The problem is, a union is not a scalar type.  In general,
    it cannot be.  To make this suggestion work, we would have
    to distinguish between scalar-unions and aggregate-unions,
    and for symmetry the same should be done for structures.

It is true that a union type is not a scalar type, but I don't
see why this constitutes a problem.  I am not proposing to change
the fact that union types are not scalar types.

Given the declarations

   struct foo {int a, b;} structure;
   union bar {struct foo x; double y;};

what I am proposing is to make it possible to cast either a `double'
or a `struct foo' into a `union bar'.

It is true that `double' is a scalar type and the struct and union are
aggregate types, but I do not see how that causes either a conceptual
problem or an implementation problem with code such as `(union bar)
structure' or `(union bar) 4.3'.
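
A sketch of how the proposal would read in use (this is not valid
under the current draft; it is the extension being proposed):

        struct foo {int a, b;};
        union bar {struct foo x; double y;};

        void f(struct foo s, double d)
        {
            union bar u;

            u = (union bar) s;      /* would store s in u.x */
            u = (union bar) d;      /* would store d in u.y */
        }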


[Comparison of an (int *) with a (const int *).]

    >ITEM 8, 3.3.8 and 3.3.9.  Allow comparison of types such as `int *'
    >and `const int *' that differ only in the presence or absence of
    >`const' or `volatile' in the type pointed to.

    I don't believe that there is any constraint with regard to this
    now.  Certainly const and volatile are not part of the object's
    type, and the constraint is only that the pointers be to objects
    of the same type.

Ok, I now see the subtlety of saying that the pointers are "to objects
of the same type".  I think that the implications of this for comparisons
should be stated explicitly in the section on comparisons.

However, some of the people writing the standard didn't
always keep this distinction in mind.
3.3.8 says, "both may be pointers to objects that have the same type."
3.3.9 says, "both may be pointers of the same type."

Therefore, the standard allows

   (int *) x < (const int *) y

but forbids

   (int *) x == (const int *) y
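
Spelled out as a complete fragment (my own example):

        int n;
        int *x = &n;
        const int *y = &n;

        int f(void)
        {
            return (x < y)      /* 3.3.8, "objects that have the
                                   same type": allowed */
                && (x == y);    /* 3.3.9, "pointers of the same
                                   type": apparently forbidden */
        }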


[Regarding zero-length arrays]

The spirit of C is that a construct that is useful in some cases
should not be forbidden entirely just because there are other contexts
where it makes no sense.

    This runs afoul of the common idiom for the number of elements
    in an array: sizeof array / sizeof array[0], since sizeof array[0]
    might be 0.

This is not a problem.  It is true that `sizeof array / sizeof array[0]'
cannot produce any useful value (and might be undefined, or cause a trap)
when `array' is an array of zero-size objects, but it is not a *problem*.
Actual uses of zero-size objects will not run afoul of this idiom because
there will be no reason to try to use the idiom.

    For logical consistency, malloc(0) should return an actual storage
    address of a zero-sized object (furthermore, each such object
    should be at a different address).

To demand such consistency is against the spirit of C, because it is
not important in practice.  In actual use, no one will try to malloc a
zero-length object, but rather will use such objects as parts of other
structures that have nonzero overall length.  Therefore, no one will
care, in connection with zero-length objects, what malloc(0) does.
Therefore, I propose to leave malloc(0) unchanged, so that it does
what is best for whatever other reasons there are, while adding
zero-length object types.
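
The sort of use in question looks like this (a sketch; the names are
illustrative):

        #include <stdlib.h>

        /* A variable-sized object: fixed header, zero-length tail. */
        struct line {
            int  length;
            char contents[0];   /* the zero-length array */
        };

        struct line *
        make_line(int n)
        {
            struct line *p =
                (struct line *) malloc(sizeof(struct line) + n);

            if (p != NULL)
                p->length = n;
            return p;
        }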


[Confusion about #pragma]

    Use of #pragma is not permitted to alter the virtual-machine
    semantics of the code, so it isn't totally unsafe.

3.8.6 says, "a #pragma ... causes the implementation to behave in an
implementation-defined manner."  It says nothing about what kinds of
effects there may be on the semantics.  I can't find anything that
restricts what #pragma can do.

If the committee intends to require that #pragma not affect the
meaning of a program, the standard should say so.


[Aliasing]

    >If this is not done, implementors will search, separately, for valid
    >rules, thus duplicating effort.  Some of these rules will make certain
    >C programs not work as intended, and then users and implementors will
    >argue inconclusively over whether the actual behavior violates the
    >standard.

    Not conforming to the operation of the
    abstract machine described in the draft standard is definitely a
    compiler bug; if some characteristic of the abstract machine is
    left undefined, then it is folly to rely on it.

What is explicitly stated above is something we all agree with, but
it does not address the problem I am talking about.

I think that the reply does address the right problem, but implicitly:
by emphasizing what is to be done in the two clear extremes (clear
nonconformance and clear undefinedness), it claims that one of the two
will always clearly apply in any actual case.

I disagree completely.  Maybe the implications of the standard are
clear to Gwyn, but they aren't clear to everyone else.  And no two
people seem to agree on exactly what they are.  This very discussion
proves it's not clear.


Here is my scenario in greater detail:

A user program does not compile as intended.  The user points to one
part of the standard and says that the compiler is not conforming to
the operation of the abstract machine.  The compiler implementor
points to another part of the standard and claims to show that the
compiler is conforming and the user is relying on something undefined.
Both arguments will be plausible, but not conclusive.

What is to be done about this?  I say, make the standard address
aliasing issues explicitly so that unclear cases will be rare.

						     I also wonder
    just how much typical code would be sped up by the sort of
    optimizer you're concerned about.  I think this is another case of
    worrying about "microefficiency", which is usually misplaced concern.

My reading of output from my compiler gives me the feeling that it's
important.  I see many repeated fetches of a value from memory after a
store elsewhere in memory.  I know there is no aliasing and the old
common subexpression is still valid, but the compiler can't prove it.
Some short loops could be sped up by sizable fractions.
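
A typical instance (sketch):

        void f(int *a, int *b, int n)
        {
            int i;

            for (i = 0; i < n; i++)
                b[i] = *a + 1;  /* must *a be re-fetched on every
                                   iteration?  Only if b[i] might
                                   alias *a, which the compiler
                                   usually cannot disprove. */
        }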

    This doesn't seem much different to me from what happens for
	    i += i;
    The wise programmer avoids writing code that could fall into such
    situations.

Maybe he does, but that's not the question.  The question is,
"What does the standard require a C compiler to do when it
gets such a program?"

IF the standard doesn't specify the behavior, then the compiler
writer can disregard the effects of his optimizations on such code
because wise programmers won't write like that anyway.

If the standard specifies the behavior, then the compiler writer
can't do so.

I appeal to the committee to make it clear, in the standard,
whether the behavior is specified or not.

		 If this isn't sufficient for pointers, I would like
    to know why.  Note that not-specifying something is sometimes a
    deliberate decision intended to avoid unduly constraining
    implementations in non-crucial areas.

There is no need to persuade me that it might be desirable to make
this unspecified behavior.  Question is, *is it* unspecified in
the current draft?  Not as far as I can tell.

Also, while the specific example I used was something that no wise
programmer would write, I used that example only because it appears
verbatim in the Rationale.  I don't see a clean rule to distinguish
those cases from other cases that programmers do use.  At least, the
standard doesn't *give* such a rule.  And the standard is where the
rule has to be.


[Unclarity re Converting preprocessor tokens]

    >ITEM 18, 2.1.1.2.  Converting preprocessor tokens.

    + is already an operator, therefore a preprocessing token,
    therefore in step 7 it is converted to a normal token,
    by itself (not combined with other pp tokens such as =).

I agree with what you say, but the standard should make this
unmistakably clear without the need for you to add explanations.
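
For instance (a made-up macro, to show what step 7 implies):

        #define PLUS +

        void f(void)
        {
            int i = 1;

            i PLUS= 2;  /* expands to the tokens `i' `+' `=' `2' `;'.
                           The `+' and `=' are separate preprocessing
                           tokens and are converted to tokens
                           separately, so this is a syntax error,
                           not `i += 2;'. */
        }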


[Severe unclarity regarding coercing arrays]

    >ITEM 22, 3.2.2.1.  This says that arrays are coerced to pointers
    >"where an lvalue is not permitted".

    This is not only correct, it is absolutely essential.

It is not correct.  For example, lvalues are permitted as operands of
the binary `+' operator.  According to the standard, it would follow
that arrays are not coerced to pointers when used as operands of `+'.

I think there is no disagreement in the C community on when arrays
should be coerced.  But the words of the standard today disagree
totally with the community and make no sense.
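
For instance:

        int a[10];
        int *p = a + 1;         /* binary `+' permits lvalue operands,
                                   so under the quoted wording `a'
                                   would not be coerced to a pointer
                                   here; yet everyone agrees this must
                                   mean &a[0] + 1 */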


[Unsigned bit-fields]

    >ITEM 25, 3.5.2.1.  It is not clear whether an `unsigned int' bit field
    >of fewer bits than the width of an `int' undergoes an integral
    >promotion to type `int'.  3.2.1.1 suggests that it does.
    >3.5.2.1 suggests that it does not.

    Each bit field already has a type: int, unsigned int, or signed
    int, as specified in 3.5.2.1, Semantics.  Conversions in
    expressions occur by the usual rules given for these types.

I follow this reasoning.  This is why 3.5.2.1 seems to imply that
an unsigned int bit field is treated as an unsigned int.  However,
3.2.1.1 says:

"A char, a short int, or an int bit-field, or their
signed or unsigned varieties, may be used in an expression wherever an
int may be used.  In all cases, the value is converted to an int if an
int can represent all the values of the original type."

If we apply the kind of reasoning quoted in a previous section as a
justification for the validity of `char x[(int)3.5]', we can say that
the presence of bit-fields in the list implies that it makes a difference
that they are listed.  The only way it can make a difference is if
bit-fields were promoted according to the rules stated, as if they
had short integer types.  Therefore, an unsigned int bit-field
with fewer bits than an int must promote to int.

If bit-fields are not expected to receive this special treatment, and
are treated as having the type they are declared with, then they should
not be mentioned specifically in 3.2.1.1.
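
The two readings are distinguishable in practice; a small test case
of my own:

        struct s { unsigned int f : 8; } x = { 1 };

        int ambiguous(void)
        {
            /* Under the 3.2.1.1 reading, x.f promotes to int:
               x.f - 2 is -1 and the test succeeds.  Under the
               3.5.2.1 reading, x.f has type unsigned int:
               x.f - 2 is a huge unsigned value and the test fails. */
            return x.f - 2 < 0;
        }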

However, I hope I have demonstrated the unreliability of a certain sort
of reasoning about the standard, and why the standard needs to be written
so that its consequences are clear without such reasoning.

gwyn@brl-smoke.UUCP (01/24/87)

(Reminder: this is not an official X3J11 response.)

Thanks to RMS again for additional discussion of these matters.
I believe that he has indeed uncovered several potential areas
of ambiguity or confusion that the committee should address.  I
wish he had been able to participate in formulating the draft,
but am happy that he has taken the trouble to scrutinize it so
carefully.  Hopefully the next publication will be "good enough"
for use as the initial official C standard.  (People wanting
extensions could prepare implementations of them and develop
supporting evidence for their desirability for the next standard,
which would probably be at least 5 years after the initial one.)

I really don't know what we will end up doing about the linker
external-symbol arithmetic issue.  Dave Prosser remarked to me
that even
	extern int	foo;
	static char	*bar = (char *)&foo;
are impossible for some linkers.  It may well turn out that C
a la X3J11 will pretty much force initializers via "constructor
thunks" (performed once, at run-time start-up or upon first
access to the object).  I sure hope we can avoid demanding that.
However, I don't know how to formulate appropriate (universal)
restrictions on initializers to prevent this (other than outlawing
initializing with addresses of externs, which we really do need
to have).  Suggestions for this are solicited.
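
For what it's worth, a sketch of the thunk approach (the hook name,
and how the start-up code finds it, are implementation-specific):

        extern int      foo;
        static char     *bar;

        /* Instead of asking the linker to compute (char *)&foo, the
           compiler emits a thunk that runs once before main: */
        static void
        __init_bar(void)        /* hypothetical start-up hook */
        {
                bar = (char *)&foo;
        }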

gwyn@brl-smoke.UUCP (01/24/87)

In article <5556@brl-smoke.ARPA> I wrote:
>	extern int	foo;
>	static char	*bar = (char *)&foo;
>are impossible for some linkers.

Prosser actually had in mind something like:
	extern int	foo;
	static char	bar = (char)(long)&foo;
but it looks even less useful than the other example.
Don't blame him for my changing the example!