rms@frosted-flakes.ai.mit.edu (Richard Stallman) (01/06/87)
These are comments I'm about to mail to CBEMA: I have implemented what I believe is a complete freestanding implementation of this draft, except for the insuperable problem described in item 1, some preprocessor features not yet finished, and some extensions which may be forbidden by 2.1.1.3. Disastrous Deficiencies. ITEM 1, 3.4. Arbitrary arithmetic and casts in static initializers cannot be implemented in most existing operating systems. In the Rationale, section 3.1.2, it is explained that extending the 31-character minimum significance to names with external linkage is not worth the price of requiring changes to existing assemblers and linkers. Concern is expressed for implementors of C compilers that must work with independently-maintained linkers and operating systems that they cannot change. Support for arbitrary casts and arithmetic in static initializers also requires changes to linkers. Consider int foo = ((int)&bar * 3) % 5001 | (int)&baz; The Rationale in 3.4 suggests that this initial value be computed and installed at run time. However, this is usually impossible. Just generating instructions to compute the value and store it is easy; the problem is how to cause them to be executed at a suitable time. The value of `foo' could be examined by code in a different source file before any function in this source file has been called. Only a special linker feature would make it possible for each separate compilation to specify code to be executed before `main' is called. If requiring linker changes is too great a price for fixing a flaw such as external name restrictions that bothers users greatly, then it is inconsistent to require them to fix a minor flaw that users do not care about. Therefore, add the following text to 3.4: The effect of using `&' in an initialization expression that is required to be constant is implementation-defined unless the `&' is the outermost operator in the expression or else appears within the operand of a `sizeof' operator. 
However, initialization expressions may use the standard `offsetof' macro with defined results, even in implementations in which the expansion of this macro makes use of the `&' operator in a fashion that does not fit the above rule. This allows constructs such as &variable, &variable.component and &array[index].

ITEM 2, 2.2.4.2. Use an underscore prefix for library `#define's.

The standard specifies many macro names such as `CHAR_BIT' for definition by the standard header files, and these macro names can potentially conflict with existing and future user programs. These names follow no system; they are just like the names recommended for programmers to define, so programmers must check each name they plan to use by looking it up in the full list of standard macro names.

If the C standard, unchanging, were the only source of header files, this solution might be adequate. But operating systems such as Unix and Posix provide header files in profusion. They follow the lead of the C standard in choosing names. Unfortunately, these lists are long and programmers cannot know them all. It is not practical for programmers to avoid all the names in all the system header files.

Granted that the only names that cause problems for an application program are those defined by header files it includes, this does not mean the problem can be solved by suggesting that programmers need only avoid the names in the header files they actually use. As a program evolves, it may need to use an additional header file, perhaps because it needs to use a library facility that it did not previously need. Yet it may already contain conflicting names, added by a programmer who was following this practice. In addition, operating systems evolve, and new names often need to be added to existing header files.

What is worse, hosted implementations of the standard as written may be impossible because of this. Consider the identifier `read'.
This is a Unix system call which has been rightly omitted from the C standard. Therefore, according to the standard, it may not be reserved by the implementation. But, in fact, any program that uses the standard input facilities `getchar' or `fread' will also get the `read' system call which they use, and any attempt in the program to define `read' in some other fashion will conflict.

The C standard is an unequalled opportunity to establish a new convention for choosing system-defined names, one which will systematically separate them from names that application programmers should define. If the C standard adopts such a convention, operating system implementors will naturally follow it as well.

The C standard is also defining many new names. If the problem is not solved now, these new names will make it more expensive to solve in the future. Changing the new names now is cheap because no programs use them yet. It is safe to retain a few traditional names that don't fit the new convention. For example, retaining `NULL' will do no harm because every C programmer already knows about `NULL'.

I propose that *all* names defined by the standard be renamed with the addition of an initial underscore, with the exception of `NULL'. Those of the names that are traditional should be put in an explicit list of permitted synonyms. Standard header files should be permitted to define these synonyms as well as the recommended names. For example, stdio.h would be permitted to define `FILE' as well as the new name `_FILE'. This way, it is not required to remove the old names and break existing C programs in order to implement the standard. Existing C programs would remain conforming but could become strictly conforming only with name changes.

Yes, this is brutal, but I don't see any other way to avoid chaos. Can you find another way?

Important Deficiencies.

ITEM 3, 3.8.3.4. Nested macro problems.
3.8.3.4 says that after the macro arguments are substituted, the entire replacement token string is subject to further macro expansion. This, together with the separate preexpansion of macro arguments described in 3.8.3.1, appears to have the result that after

    #define h(x) 5+h(x)
    #define f(x) g(x)

the string `f(h(y))' expands into `g(5+5+h(y))'.

It took me a long time to figure out that the sentence

    These nonreplaced macro name tokens are not available for further
    replacement even if they are reexamined in contexts in which that
    macro name would otherwise have been replaced.

in 3.8.3.4 was intended to apply to this case. It therefore needs to be clarified. (A friend of mine thought it had a completely different meaning: that once the macro name `h' was seen in the replacement for `h', the name `h' would effectively be undefined for the rest of the source file. He knew that this was ridiculous, and we both looked for another meaning, but we could not find one.)

But with this rule, it becomes very difficult to implement a character-based preprocessor. Such a preprocessor has no way to distinguish an `h' that should not be replaced from another `h' that may be replaced.

In a function-like macro it is possible to use the macro's name in its own definition without any special rule, simply by writing parentheses around the macro name where it occurs in the definition. This also avoids the problem described above. Thus, if the definition of `h' is rewritten as

    #define h(x) 5+(h)(x)

then `f(h(y))' expands to `g(5+(h)(y))' even without the special rule. This makes the special nonrecursion rule mostly superfluous.
I agree that it would be nice to allow macros' names to appear in their own expansions without causing infinite recursion, but in the light of these difficulties, and the ability to get the result without a special nonrecursion rule, I believe the nonrecursion rule should be removed by replacing the second paragraph of 3.8.3.4 with

    If the name of a macro appears within the replacement text of an
    expansion of the same macro, or in nested replacements resulting
    from that replacement text, the result is undefined, except in the
    case of a function-like macro whose name appears followed by a
    character other than `('.

ITEM 4, 3.4. Don't forbid floating arithmetic in integral constant expressions.

3.4, combined with 2.1.1.3, appears to require compilers to report an error for things like

    char x[(int)(3.5)];

because floating constants are not in the list of what may appear in an integral constant expression. But in a compiler that does constant folding, `(int)3.5' will be changed into `3' before the consideration of the array declaration begins. There is no natural way to report an error for this code.

Perhaps I have misunderstood 3.4. It also says that casts from arithmetic types to integer types are allowed, which appears to imply that there may be subexpressions whose types are arithmetic but not integral. Yet the list of allowed constructs does not include any subexpressions that could have floating type, making it superfluous to list the possibility of such casts. This suggests that perhaps the intent of the constraint was to allow any kind of constant expression of arithmetic type as long as it is used in a cast to integer type.

If that was the intent of the constraint, it needs to be rewritten unambiguously, but that's not all. This meaning causes a serious problem because it implies that expressions such as `((float)((int)&foo | 38))' (where `foo' is static) are valid. Yet such expressions are impossible to compute at compile time, so they cannot be implemented.
I propose the following constraint for integral constant expressions:

    An integral constant expression is an expression of integral type
    which does not, except within a `sizeof' operator, refer to any
    function or variable or use the unary `*' or `&' operators or the
    postfix `[...]' or `(...)' operators.  (Footnote: It follows that
    integral constant expressions contain no lvalues and cannot
    validly use the increment or assignment operators.)  However,
    integral constant expressions may always use the standard
    `offsetof' macro, even in implementations in which the expansion
    of this macro makes use of the `&' operator.

ITEM 5, 3.8.3. Allow preprocessor to forget macro argument spelling.

The constraints in 3.8.3 say that macro redefinitions must use the same spelling of arguments. This, combined with 2.1.1.3, implies that redefinitions that differ only in the spelling of the arguments must get an error. In other words, the preprocessor is required to remember the spelling of the arguments. There is no other reason why a preprocessor would record how the arguments of a macro were spelled after finishing processing the #define line.

It is good to require strictly conforming programs to redefine only with the same argument spellings, because this makes it possible to use preprocessors that work in other ways. However, forbidding preprocessors to allow equivalent definitions with different argument spellings serves no useful purpose. The spirit of the constraints in 3.8.3 is to allow redefinitions that make no change and forbid those that would alter the meaning of the macro. A preprocessor that ignores the argument spellings when comparing definitions actually fits this spirit better than what is currently required by the standard.

ITEM 6, 4.9.6.5. `sprintf' is unsuitable for robust programs.

For most format strings, no fixed size of buffer is safe to use with `sprintf' because some data would cause it to overflow.
This means that the usefulness of `sprintf' is limited to a few kinds of format strings. Moreover, the standard offers no robust way to do output formatting into buffers in memory. Rather than promoting this dangerous construct, the standard ought to define a similar function which accepts a buffer size as argument and guarantees not to write beyond that size. Thus, replace the text of 4.9.6.5 with the following:

    int snprintf (char *s, int len, const char *format, ...);

    The `snprintf' function is equivalent to `fprintf', except that
    the argument `s' specifies an array into which the generated
    output is to be written, rather than to a stream.  No more than
    `len' characters of output are written to the array.  The returned
    sum is the number of characters that the output would contain.

    If the output would properly consist of fewer than `len'
    characters, then all the output is written to the array `s',
    followed by a null character that is not counted as part of the
    returned sum.  In this case, the returned sum is less than `len'.

    If more than `len' characters of output would result from the
    specified format string and arguments, the first `len' characters
    of output are written to the array `s' without a terminating null
    character and the rest of the output is discarded.  The returned
    sum is the number of characters that would be output if the array
    were big enough; therefore, it is greater than or equal to `len'.

`vsprintf' has the same problem, so a `vsnprintf' function should be created along the same lines.

Easy Minor Improvements.

ITEM 7, 3.3.4. Allow casts to union type.

For example,

    union foo { int x; double y; };
    void bar (union foo);

    union foo
    hack ()
    {
      bar ((union foo) 78);
      return (union foo) 1.3;
    }

Right now it is necessary to assign temporary variables explicitly to construct a union to be passed or returned. This feature is very easy to implement in a way that would generate the same code that results from the explicit assignments now required.
With a little more work, compilers can generate much better code than is possible now. This requires no new syntax, and its meaning is obvious. It cannot be hard to implement. It breaks no existing C programs.

This change is done by changing 3.3.4's constraints as follows:

    If the type name specifies void type, the operand may be any
    expression other than a void expression.  If the type name
    specifies a union type, the operand may be any expression whose
    type is that of any member of the union.  Otherwise ...

and adding a section 3.2.2.4:

    A value of any type may be converted to a particular union type
    provided the value's type is the same as some member of the union.
    The result of the conversion is a value of union type such that
    access to such a member in it would yield the value converted.  If
    multiple values of the union type could have this property, it is
    undefined which of them is the actual result.

It would be even more convenient to be able to declare `real_bar'

    int real_bar (int x, union foo y);

and then write `real_bar (1, 2)' or `real_bar (1, 1.5)'. This might merit an addition to the constraints of 3.3.2.2 as follows:

    The types shall be such that each formal parameter may be assigned
    the value of the corresponding argument; or else a formal
    parameter may be of union type and the corresponding argument of a
    type that can be converted to the union.

ITEM 8, 3.3.8 and 3.3.9. Allow comparison of types such as `int *' and `const int *' that differ only in the presence or absence of `const' or `volatile' in the type pointed to.

For example, the following code is currently invalid but should be valid.

    char *p;
    const char *q;
    if (p == q) ...

This change would parallel the handling of assignments. Add to the constraints of sections 3.3.8 and 3.3.9:

    In addition, an expression that has type ``pointer to type without
    the const attribute'' may be compared with a pointer to a type
    with the const attribute.
    An expression that has type ``pointer to type without the volatile
    attribute'' may be compared with a pointer to a type with the
    volatile attribute.

ITEM 9, 3.5.3.2. Allow the length of an array to be zero.

An array of length zero is very useful in structures like this:

    struct table { int length; int contents[0]; };

so you can do

    malloc (sizeof (struct table) + length * sizeof (int))

instead of having to use `(length - 1)'. Allowing this does not alter the meaning of any conforming program or any existing C program, it gives a construct the meaning that C programmers would expect it to have by analogy, and it is very easy to implement.

To make this change, alter the constraints of 3.5.3.2 (first sentence) as follows:

    The constant expression that specifies the size of the array shall
    have integral type and value greater than or equal to zero.  ...

Structures of zero length are also useful, in examples such as

    struct feature_for_next_year { };

    struct forever
    {
      struct last_year a;
      struct this_year b;
      struct feature_for_next_year c;
    };

Here the idea is to add the structure `feature_for_next_year' to the program even though there is as yet no requirement to give it any members. The standard currently requires (in 3.5.2.1) at least one member in a structure. Making this change requires changing the syntax rules in 3.5.2.1.

The Rationale, in section 4.10.3, speaks of the "theoretical disadvantage of requiring the concept of a zero-length object". However, there is no indication of what such a disadvantage might be. The examples above show why zero-length arrays are useful; the burden of proof is now on whoever wishes to show there is a reason not to allow them.

I do not propose any change to the specifications of `malloc', etc., in 4.10.3 in connection with this. The currently specified behavior for size zero is adequate even when there are types of size zero.

Significant Improvements.

ITEM 10, 3.6.
Provide a "frequency" statement to tell optimizing compilers which inner loops should be considered more important than the containing code.

I know that the immediate first response will be, "Use #pragma." However, #pragma is unsuitable for this use for two reasons:

1. #pragma is not standard; therefore, one can never be sure that the same #pragma line will not have a completely different and disastrous meaning on some other implementation. It follows that no strictly conforming program can use #pragma. Yet it is desirable to be able to state frequency information in strictly conforming programs.

2. 3.8.3.4 says that the result of macro expansion is not taken as a preprocessor line even if it looks like one. This would appear to imply that macros cannot generate #pragma lines. (If that section is not intended to have this meaning, it should be clarified.) Being unable to put frequency information in code generated by macros is an undesirable restriction.

What is really needed here is a standard construct that is guaranteed to mean either a standard thing or nothing (no effect on execution). I propose that `frequency (NN)' be syntactically equivalent to `while (NN)' but have no effect on the execution of the abstract machine, serving only as a declaration that its body will be executed an average of NN times as often as the smallest containing `if', `while', `for' or `switch' statement. NN must be a constant expression, perhaps an integral constant expression or perhaps allowing floating point values.

Examples of use include

    if (used == allocated)
      frequency (0)   /* Initial size is almost always enough */
        {
          allocated *= 2;
          ... call realloc ...;
        }

and

    while (c = *p++)
      frequency (50)  /* 50 is a typical length for these strings */
        { ... }

Instead of a new kind of statement, a new kind of declaration could be used. It would apply to the code within the block in which it is used. It could have the syntax

    .
    frequency (N);

where the period at the beginning prevents conflict with any currently defined syntax; `frequency' would not be a keyword and could still be used as a variable name. The above examples, rewritten to use this construct, look like

    if (used == allocated)
      {
        .frequency 0;   /* Initial size is almost always enough */
        allocated *= 2;
        ... call realloc ...;
      }

and

    while (c = *p++)
      {
        .frequency 50;  /* 50 is a typical length for these strings */
        ...
      }

Someone else suggested that the construct `#pragma frequency N' be given a standard meaning and used for this. However, this is not a solution. It would resolve objection 1 to the use of `#pragma', but objection 2 would still stand.

Omitted Issues.

ITEM 11, 2.1.2.3. The standard ought to say more explicitly when aliasing can validly take place in a strictly conforming C program.

When implementing an optimizing C compiler, the question always arises of when the compiler must assume that two pointers may be aliases. In some cases where the address of an object of block scope is taken, it might be possible to find all the places this address can reach, and then assume that no other pointers can alias with the object. However, this case is infrequent. Therefore, it is important to know what assumptions must be made by the compiler in other cases.

One safe choice is to assume that any pointer that is not the address of a known static or automatic object may alias with any object. But the code that results from this assumption contains many instructions that humans know are wasteful. A compiler using a more restrictive rule would be much better if it were correct.

I have heard suggestions of rules based on the types of objects involved. For example, one person who has read the standard suggests that casting a pointer to a different pointer type and accessing the object pointed to is always undefined, and that this could be the basis of aliasing determinations. But this is not true.
If the pointer points to a member of a union, it could safely be cast to point to a different member of the union. This could make it defined to cast any pointer type to any other. Letting T1 and T2 be arbitrary types, here is an example that produces aliasing between them that is valid according to the current standard as far as I can tell.

    union { T1 a; T2 b; } myunion;

    ... foo (&myunion.a) ...

    foo (p)
         T1 *p;
    {
      ... (T2 *) p ...
    }

One might consider a rule that a static scalar object which is not part of a union object cannot alias with a pointer pointing to any other type. But I cannot determine with certainty whether this rule is valid. I can see how to produce that kind of aliasing with an example such as

    static double foo;
    ...
    *(int *)&foo = 1;

but I am not sure whether anything about the behavior of this example is defined. If all cases that produce such aliasing are undefined under the standard, then the rule is valid. I cannot tell whether this is so.

I am not sure whether the standard implies that

    short in, out;
    {
      char *inptr, *outptr;
      int i;
      inptr = (char *) &in;
      outptr = (char *) &out;
      for (i = 0; i < sizeof (short); i++)
        outptr[i] = inptr[i];
    }

is defined and equivalent to

    short out, in;
    out = in;

If it does, the compiler may not assume that the previous value of `out' is still valid after the assignment to `outptr[i]'.

I urge the committee to determine whether the standard implies that this rule, or some other rule of the same nature, can validly be used to assume aliasing is not taking place, and to state in the standard which rules are recommended. If this is not done, implementors will search, separately, for valid rules, thus duplicating effort. Some of these rules will make certain C programs not work as intended, and then users and implementors will argue inconclusively over whether the actual behavior violates the standard.

Note that the first example in section 2.1.2.3 of the Rationale gives an example where this issue is relevant.
If the variable `sum' is normally held in memory, keeping its value in a register during the loop will give incorrect results if `a' is equal to `&sum - 1'.

Controversial Issues.

ITEM 12, 2.1.1.3. Permit extensions.

Very often the rationale says that certain proposals for new standard features were rejected because of a "lack of prior art". This by itself is good practice. It is wise not to include a feature in the standard when people don't yet see clearly what form it should take or whether it is truly useful. However, when 2.1.1.3 is added to these decisions, it has the effect of forbidding any subsequent art that would ever shed light on the matter. 2.1.1.3 should be amended so that documented extensions are allowed to give meaning to constructs that are invalid according to the standard:

    A conforming implementation shall produce at least one diagnostic
    message for every source file that contains a violation of any
    syntax rule or constraint, unless the violation is in accord with
    a documented extension of the implementation.

Here is a list of many extensions that would be useful and interesting but are forbidden by the standard:

1. A preprocessor feature to test for the existence of an include file, such as the `definedfile' operator.

2. A preprocessor directive that allows defining a macro so that each time it is called it appends some text to the definition of another macro. Eventually the other macro would be expanded so as to get out the text previously appended to it. This is useful for making entries in a table of commands as the individual commands are defined.

3. Arrays of size zero.

4. Arrays whose sizes are not constant.

5. Aggregate initializer elements that are not constant.

6. A way of declaring labels with block scope. This is useful for certain macros.
The only way to break out of nested loops in C is with a label; as a result, it is impossible to write a clean macro that expands into code containing such nested loops, because if the macro is used twice in one function there will be a conflict of label names.

7. Compound statements within expressions. This would allow clean definition of safe macros to replace simple functions. Consider, for example,

    #define fmin(A,B) \
      ({ double a = (A); double b = (B); (a < b) ? a : b; })

(the value of the compound statement being that of the last expression in it). It would be natural to forbid gotos into the ({...}) construct by giving labels within it a local scope. This would enable the safe use of labels in macros.

8. String literals that are written with embedded significant newline characters.

9. Ranges in case statements.

10. Casts to union types.

11. Aggregate constructor expressions.

ITEM 13, 2.1.1.3. It is unclear.

The concept of "violation of any syntax rule" is unclear because the syntax rules used in the standard are generative, and the nature of generative grammar is to specify what is allowed, not what is prohibited. Thus, invalid syntax typically violates no particular rule, but rather fails to correspond to any rule. I am not sure whether 2.1.1.3 as now written forbids new syntax rules that give meaning to previously invalid syntax, such as a rule to allow new kinds of declarations:

    declaration:
         . <identifier> <integral-constant-expression> ;

(where <identifier> would be constrained to be one of a specific list of identifiers defined by the implementation which has this extension).

ITEM 14, 2.1.1.3. The idea of erroneous program has been misapplied.

To say that a certain construct is erroneous and must generate a diagnostic message has both advantages and disadvantages. The advantages are that it might prevent what would otherwise be mysterious unpredictable behavior, and that it helps keep unportable constructs supported by one implementation from creeping into programs which then become difficult to port.
The disadvantages are that it restricts methods of implementation, interferes with improvement of the language, and can require a great deal of extra work.

These advantages and disadvantages are always present, but not uniformly. In some cases the advantages are great while in others they are small. By adopting this position as a blanket policy rather than in the specific cases where the advantages are great, the standard can impose great burdens on implementors with no benefit to users. 2.1.1.3 does not help the applications programmer much, because the worst kinds of unportability are the unspecified behaviors that abound in C--cases such as `foo (p++, p++)'.

Suppose that one C compiler nonstandardly allows the size of an array to be zero. A programmer might start using zero-size arrays, and then on moving to another compiler the program would not work. But the trouble this would cause is limited by the fact that the other compiler would print an error message identifying where the zero-sized array was used. Changing the program would then be straightforward; a nuisance at worst. By contrast, if the programmer starts to use `foo (p++, p++)', he will get no diagnostic but much perplexity on moving to another compiler. It is disproportionate to pay any important price to protect the programmer from the nuisance resulting from some compiler's failure to support zero-length arrays while doing nothing to remove the real danger of ambiguous evaluation order.

ITEM 15, 2.2.1.1. Eliminate the ?? trigraphs.

The trigraphs, unlike the other internationalization changes, are not necessary. The belief that they are necessary comes from linking two independent questions:

1. Which character sets C programs can operate on properly.

2. Which character sets C programs can be written in.

There is a great need for C programs to be able to operate on all the European character sets. The internationalization changes in the library make this possible.
There is no such need to be able to write C programs in non-ASCII character sets. A program to interact with users in a French character set can do its job just as well if written in ASCII. The Rationale, in 2.2.1, says that the goal is to make sure that it is possible to translate a C translator written in C. This requirement can be met even if C programs must be written in ASCII, as long as every C translator can operate on ASCII.

Now, if each C installation had to choose one character set and all C programs running on that installation were compelled to use this character set, this set would have to be that of the local country, and therefore it would be necessary to be able to write C in all of those character sets. But C programs are not limited to operating on one character set. The internationalization changes in the library make it easy for C programs to choose even at run time among several supported character sets. As long as ASCII is always one of these supported character sets, C programs written in ASCII can be compiled everywhere. There is no need to be able to write C programs in anything else.

Trigraphs cause an obscure problem in addition to the ones that are apparent at first glance: they make it much more difficult for programs to understand C syntax. Consider a text editor that needs to understand C syntax only to the extent of matching beginnings and ends of strings and balanced expressions. This becomes very difficult in the presence of the trigraph ??/. For example, complicated special-purpose code would be needed to be able to find the beginning of the string

    "foo? ??/" ??/??/"

given the position of the end of the string. Most likely such editors will simply not support the use of trigraphs.

A German friend whom I asked says that his colleagues would rather use ASCII for their C programs, and if forced to use a German character set would rather have braces display as umlauted letters than use trigraphs.

ITEM 16, 3.5.6.
Allow variable elements in aggregate initializers.

The constraints of this section, together with what 2.1.1.3 says about required diagnostics, appear to forbid the use of an extension in which the elements of initializers for automatic aggregates could be other than constant. The Rationale mentions no objection to this usage, even as a proposed standard feature, except for cases of the form

    int x[2] = { f(x[1]), g(x[0]) };

and appeals to the difficulty of writing rules that would exclude such cases or specify their order of evaluation. The Rationale thus implicitly assumes that such a feature would have to be accompanied by rules to eliminate the ambiguity of ordering, but gives no justification for this assumption, which goes against the spirit of C. The Rationale gives no arguments against the idea of including this feature without such rules.

The ambiguities of examples such as the one given above are not new. They exist in C statements already. There are four possible orders of execution in this example, with different results:

1. f is called first (receiving garbage), and g receives f's result.

2. f is called first, both f and g receive garbage.

3. g is called first (receiving garbage), and f receives g's result.

4. g is called first, both f and g receive garbage.

(It should also be noted that the example involves, for any possible order, passing an undefined value to at least one of `f' and `g'. Therefore, this particular initializer would probably be useless even if the order of evaluation could be predicted. Most of the unspecified cases share this property.)

The following C statement, which is valid according to the standard, shows the same problem:

    (x[0] = f(x[1])) + (x[1] = f(x[0]));

This statement contains no sequence points except for the function calls, so it has the same four possibilities.
Since rejecting variable elements of aggregate initializers does not accomplish the goal of eliminating these ambiguities, variable elements should be allowed, with the order of evaluation of such elements and the storing of the results all being unspecified.  If that is not done, perhaps out of a desire to avoid requiring any additional features at this time, at least the rule in 2.1.1.3 that forbids this feature to be provided as an extension should be relaxed.

ITEM 17, 3.8.3.  Don't say whether keywords can be #defined.

It is not necessary to choose between allowing and forbidding the definition of keywords as macros.  Another alternative is to make it undefined (or perhaps implementation-defined).

I consider allowing macro definition of keywords to be somewhat preferable to making it undefined, but making it undefined greatly preferable to forbidding it.  I expect that most fans of character-based preprocessors will share this feeling.  Many of those who like token-based preprocessors are likely to have a similar attitude, in reverse: that forbidding definition of keywords is a little better than making it undefined, which is much better than allowing it.  If this is how people feel, "undefined" gets a higher combined rating from the entire community than either "allowed" or "forbidden".  It is a natural compromise.

Clarifications Needed.

ITEM 18, 2.1.1.2.  Converting preprocessor tokens.

Is the intention of step 7 that each preprocessor token is converted individually to normal tokens, so that it is impossible for two adjacent preprocessor tokens such as `+' and `=' to form one normal token `+='?  I believe this is what is meant, but it is not clear.

ITEM 19, 2.2.2.  The wording of the definition of `\f' should be changed.

The current wording, by speaking in terms of moving the cursor on a display device, creates a spurious conflict with issues of user interface design.
It is not desirable for an operating system to move the cursor in this way when a formfeed character is output to a display by an ordinary program.  Some operating systems do this, or clear the screen, but the only result is confusion for users when programs that were not designed for explicit display control output samples of the user's data that happen to contain formfeeds.  This problem has nothing to do with the spirit of the standard, so a change in wording would make it go away.  I propose:

    \f (form feed)  Is regarded as dividing a document into pages.

ITEM 20, 2.2.4.2.  Why no FLOAT_ROUNDS?

The example of float.h values for IEEE standard floating point does not define FLOAT_ROUNDS.  Is this an omission?  What is the reason for not specifying the value FLOAT_ROUNDS should have in an implementation that does rounding?  Is some specific application envisioned for conveying additional nonstandard information through the value of this macro?  If so, it should be described in a footnote.  If no such use is envisioned, then it would be better to specify the standard value `1' for implementations that round.  Any nonstandard additional information could be conveyed by some other nonstandard name.

ITEM 21, 3.1.2.5 says that `int' and `long' are different types even if they are identical in range.

I expect that this is intended to have operational consequences for C translators, but it is not clear what those consequences are.  After long thought, I arrived at the idea that the intent might be to require an error message from the following fragment.

    int *foo;
    long *bar;
    foo = bar;

If this is the intended meaning, it should be stated explicitly with an example.

ITEM 22, 3.2.2.1.

This says that arrays are coerced to pointers "where an lvalue is not permitted".  I cannot find any coherent meaning for this statement.  Lvalues are permitted (but so are other expressions) as operands to all the arithmetic operators, for example, but arrays are coerced in those places.
Also, searching through the standard, I find that most if not all places that call for lvalues require modifiable lvalues, which excludes arrays.  As far as I know, only within the `sizeof' and `&' operators is an array not converted to a pointer.  If the intention is to specify these places, it would be cleaner to do so by listing them explicitly.

ITEM 23, 3.3, paragraph 3.

It would be natural to allow associative regrouping for `+' and `-' together, as in `a - (b - c)' => `(a - b) + c' and `a - (b + c)' => `(a - b) - c'.  But the wording in use seems to rule this out.  However, at the end of section 3.3 in the Rationale it says that a decision was made against "extending grouping semantics [of unary plus] to the unary minus operator".  This would seem to mean that regrouping through unary minus is permitted.

In addition, it is conceptually simple to regard `a-b' as equivalent to `a+-b'.  `(a + -b) + -c' can be regrouped into `a + (-b + -c)', so if it is not possible to regroup `(a - b) - c' into `(a + -(b + c))' the result is that users will be confused.

If regrouping through unary and binary minus along with binary plus is not allowed, I believe this needs to be stated explicitly.  I think it would be better to state explicitly that it is allowed, and here is how it might be done.  Add to 3.3:

    ... are not changed by this regrouping.  An expression consisting
    of the binary operator `-' applied to two expressions m1 and m2
    may be regrouped as m1+(-(m2)), and an expression of the form
    m1+(-p1) may be regrouped as m1-p1.  An expression of the form
    -(m1+m2) may be regrouped through the distributive law as
    (-(m1))+(-(m2)), and an expression of the form (-p1)+(-p2) may be
    regrouped as -(p1+p2).  Here m1 and m2 stand for arbitrary
    multiplicative-expressions and p1 and p2 stand for arbitrary
    primary-expressions.  To force a particular grouping...

ITEM 24, 3.3.3.4.

What is the value of `sizeof' applied to an array whose size has not been declared, as in this example:

    extern char x[];
    ...
    sizeof x
    ...

ITEM 25, 3.5.2.1.

It is not clear whether an `unsigned int' bit field of fewer bits than the width of an `int' undergoes an integral promotion to type `int'.  3.2.1.1 suggests that it does.  3.5.2.1 suggests that it does not.  It would be useful for both of these sections to state explicitly what happens to these bit fields.

ITEM 26, 3.5.5 says, "two types are the same if they have the same ordered set of type specifiers and abstract declarators".

Does this mean that `long unsigned int' and `unsigned long int' might be different?  Must be different?  Does this mean that in the following fragment, `x' and `y' have different types?

    typedef const int foo;
    volatile foo x;
    typedef volatile int bar;
    const bar y;

I believe that a clarification is required.

ITEM 27, 3.7.6.  The examples here are misleading.

By showing the use of a formal parameter declared as a pointer to a function and called with explicit use of `*', and also showing a formal parameter declared as a function and called with no `*', they seem to suggest that the two choices (declaration and call) are coupled.  But my reading of the standard says they are independent; either call will work with either declaration.  It would be better to pick examples that don't suggest a nonexistent correlation.

ITEM 28, 3.8.

The constraints in this section appear to imply that comments are not allowed on preprocessor lines except at the end.  I do not believe that is what it is intended to mean, because 3.8.3 contains examples in which comments appear within the replacement text in a #define.  I propose adding the following footnote to the constraints of 3.8:

    The horizontal spaces allowed within a preprocessing directive
    include horizontal spaces resulting from the elimination of
    comments, which has taken place at an earlier phase of
    translation.

ITEM 29, 3.8.3.

The semantics portion says that initial and final whitespace are not considered part of a macro's replacement token list.
This appears to imply that whitespace is significant and required not to be copied through by a preprocessor.  As I understand it, whitespace other than newlines is not significant at the stage of preprocessor tokens, so this statement is misleading.

What is worse is that the only way a separate preprocessor can make sure that two preprocessor tokens don't convert to one normal token is to output whitespace between them.  Specifically, this must be done at the beginning and end of a macro's replacement text.  Thus, the semantics portion of 3.8.3 appears to forbid the only practical method of implementing the standard's specifications for token conversion (assuming that this is what the standard specifies; see the item above that refers to 2.1.1.2).

I propose the following change to the semantics section of 3.8.3:

    Any whitespace characters preceding or following the replacement
    list of tokens are not considered significant when comparing
    macro definitions to determine the validity of a redefinition.

ITEM 30, 3.8.3.

Is there some motivation for not standardly supporting the use of empty macro arguments?  If so, it would be useful to have a footnote explaining why they might fail to be straightforwardly handled by the mechanisms used to handle nonempty arguments.

ITEM 31, 3.8.8.

This would appear to forbid the common practice of predefining some macro names that identify the type of hardware and software that are in use.  A.6.5.13 says that this practice is expected to continue.  A.6.5 says that only names beginning with an underscore could be predefined.  No two of the above can be true.

Actually, the predefined names currently in use are undesirable because they do not begin with underscore.  Since they are chosen by implementors, no one can predict what names might be predefined in some implementation.  Thus, all names chosen by applications programmers are vulnerable to conflicts.
However, the need for some way to indicate to the C program what kind of environment it is being compiled for is a great one and has to be filled somehow.  It would be better to predefine names that begin with underscore for this purpose, so as to avoid conflict with names chosen by applications programmers.

ITEM 32, 4.9.4.1.

The Rationale says that `remove' was defined because the definition of `unlink' is too Unix-specific.  As far as I can see, `remove' may differ from `unlink' only in that certain behavior is implementation-defined.  If so, it would have been just as good to call this `unlink'.  Many traditional constructs are included in the standard with certain cases undefined.

The definition of `remove' could be read as requiring that the file (not just one of its names) actually disappear immediately unless it is currently open.  Under this interpretation, `remove' cannot be implemented on Unix systems.  If the definition is read as requiring that only the specified name disappear, and that the file remain accessible under any other names it may have had, `remove' cannot be implemented on some other systems such as VMS.  I think the definition of `remove' should clearly indicate that the choice between those two behaviors is implementation-defined.  A similar clarification may be required for `rename'.

ITEM 33, 4.9.4.3.  Mention effect of abnormal termination on `tmpfile'.

Elsewhere it is stated that it is not defined whether files created by `tmpfile' are removed on abnormal termination.  This is a good specification, but it needs to be stated with the definition of `tmpfile'.  The definition could now be read as implying that the file will be deleted on normal termination.

ITEM 34, A.6.3.3.  What does the "order of bits in a character" mean?

I do not know what operational definition could be assigned to the order of bits in a character.  Can this be explained?
I know about the difference between byte ordering among machines, but I don't see that it constitutes any ordering of the bits within a byte.
faustus@ucbcad.berkeley.edu (Wayne A. Christopher) (01/06/87)
A lot of people have been worrying about the proliferation of names that don't begin with `_' which are pre-defined by the implementation.  But seriously, we can't expect `read' to be re-defined as `_read' in UNIX -- the things that UNIX defines are going to stay defined.  How many programmers have had serious problems with conflicts like this?

The problem of non-standard identifiers in libraries isn't a problem at all, as I pointed out in a previous message (just make sure other library routines use "hidden" variations), and macros defined in header files usually will cause an error, so at least bugs aren't going to remain hidden because of this.

Before we spend too much time fixing problems, let's make sure that they're problems in the first place.

	Wayne
jss@ulysses.homer.nj.att.com (Jerry Schwarz) (01/07/87)
In article <2144@brl-adm.ARPA> rms@frosted-flakes.ai.mit.edu makes many sensible comments on the ANSI proposal.  Here are some comments on his comments.  (My failure to comment on a particular ITEM does not imply either agreement or disagreement.)

>ITEM 1, 3.4.  Arbitrary arithmetic and casts in static initializers
>cannot be implemented in most existing operating systems.
>
> [extended discussion omitted.]
>
>Therefore, add the following text to 3.4:
>
>    The effect of using `&' in an initialization expression that is
>    required to be constant is implementation-defined unless the `&'
>    is the outermost operator in the expression or else appears
>    within the operand of a `sizeof' operator.
>
> [ ... ]
>This allows constructs such as &variable, &variable.component,
>and &array[index].

I agree that there is a problem, but the proposed addition is inadequate.  For example it still permits

    (short)array

A simpler fix is to forbid casts from pointers to arithmetic types.  The change required is to add "except casts from pointer types to arithmetic types," after "arbitrary casts" on line 18 page 48.

>ITEM 5, 3.8.3.  Allow preprocessor to forget macro argument spelling.
>
>The spirit of the constraints in 3.8.3 is to allow redefinitions
>that make no change and forbid those that would alter the meaning
>of the macro.  A preprocessor that ignores the argument spellings
>when comparing definitions actually fits this spirit better than
>what is currently required by the standard.

I disagree that this is the intention.  In fact I think the opposite: the intention is to allow redefinition only when the redefinition is identical in every respect, not just in meaning.  In any event, if you just eliminate the phrase requiring the spelling of parameters to be identical, the preprocessor will normally have to preserve the spelling of parameters that are used in order to test for identity of the replacement lists.
Finding a clear concise change in the definition of equality of replacement lists might be tricky.

>ITEM 8, 3.3.8 and 3.3.9.  Allow comparison of types such as `int *'
>and `const int *' that differ only in the presence or absence of
>`const' or `volatile' in the type pointed to.
>
>For example, the following code is currently invalid but should be
>valid.
>
>    char *p;
>    const char *q;
>
>    if (p == q)...
>
>This change would parallel the handling of assignments.

The problem is deeper.  Suppose instead of the above, the code was

    char **p;
    const char **q;
    ... p == q ...

This would still be illegal under your suggestion.  What is needed is a more careful examination of the notion of type equality.  I'm not sure what such a proposal would be.  If I work one out I will post it.

>ITEM 11, 2.1.2.3.  The standard ought to say more explicitly when
>aliasing can validly take place in a strictly conforming C program.

Such a discussion might be useful in an appendix or the rationale, but not in the standard proper.  Either such rules are already implied by the semantics, in which case they are redundant, or these rules would contradict the semantics, in which case we wouldn't know whether the semantics or the rules were to govern.

>I have heard suggestions of rules based on the types of objects
>involved.  For example, one person who has read the standard
>suggests that casting a pointer to a different pointer type and
>accessing the object pointed to is always undefined,

Looking at 3.3.4 it seems that although casts between pointer types are allowed, nothing explicit is said about their meaning beyond the general assertion that the cast converts the value.  Although I cannot find any explicit language that requires it, I think the intention is that, for example,

    ... (char*)&obj ...

should point to the first byte of "obj", whatever "obj's" type.  This is certainly implicit in library functions like "memcpy" (although these use void*, rather than char*).
Also the standard takes care with defining "bytes" and giving rules for layout in structures and unions.  I don't think there are any syntactic rules of the kind you want.  This makes life hard on compiler writers, but it is part of the "spirit of C".

>ITEM 16, 3.5.6.  Allow variable elements in aggregate initializers.
>
>The constraints of this section, together with what 2.1.1.3
>says about required diagnostics, appear to forbid the use of an
>extension in which the elements of initializers for automatic
>aggregates could be other than constant.

I don't think so.  All 2.1.1.3 seems to require is that a warning message be generated if the extension is used.

>ITEM 20, 2.2.4.2.  Why no FLOAT_ROUNDS?
>
>The example of float.h values for IEEE standard floating point
>does not define FLOAT_ROUNDS.  Is this an omission?

The rationale explains this in 3.2.1.4.  Briefly, IEEE chips use the same bit to control rounding of floating arithmetic and the conversion from floating to integral.  Since C requires that the latter truncate, an implementation might choose to have floating arithmetic truncate as well.

>ITEM 22, 3.2.2.1.  This says that arrays are coerced to pointers
>"where an lvalue is not permitted".  I cannot find any coherent
>meaning for this statement.  Lvalues are permitted (but so are
>other expressions) as operands to all the arithmetic operators,
>for example, but arrays are coerced in those places.

I think it should read "where an lvalue is required".  In describing the type constraints of the various kinds of expressions, some sections of 3.3 assert "... shall be a modifiable lvalue" and others don't.

Jerry Schwarz
Bell Labs, Murray Hill
ulysses!jss
ron@brl-sem.ARPA (Ron Natalie <ron>) (01/07/87)
In article <1202@ucbcad.berkeley.edu>, faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
> A lot of people have been worrying about the proliferation of names
> that don't begin with `_' which are pre-defined by the implementation.
> But seriously, we can't expect `read' to be re-defined as `_read' in
> UNIX -- the things that UNIX defines are going to stay defined.  How
> many programmers have had serious problems with conflicts like this?

Easy, UNIX can have "read" in libc (or liba for those running Version 6).  It is just prohibited that any of the Standard C routines such as PRINTF use "read."  That way if a user defines his own function called read, he doesn't break any calls he made to the "Standard Set of Routines."

-Ron
mat@mtx5a.UUCP (01/13/87)
> These are comments I'm about to mail to CBEMA:
>
> I have implemented what I believe is a complete freestanding
> implementation of this draft, except for the insuperable problem
> described in item 1, ...
>
> Disastrous Deficiencies.
>
> ...
> Support for arbitrary casts and arithmetic in static initializers
> also requires changes to linkers.  Consider
>
>     int foo = ((int)&bar * 3) % 5001 | (int)&baz;
>
> The Rationale in 3.4 suggests that this initial value be computed
> and installed at run time.  However, this is usually impossible.
> Just generating instructions to compute the value and store it is
> easy; the problem is how to cause them to be executed at a suitable
> time.  The value of `foo' could be examined by code in a different
> source file before any function in this source file has been
> called.  Only a special linker feature would make it possible for
> each separate compilation to specify code to be executed before
> `main' is called.

This is the problem faced by C++ in dealing with ``static constructors.''  Solutions are being found; they are generally dependent upon the machine environment, but the problem is not insurmountable, at least on the machines that C++ has thus far been ported to.  I suspect that there is even a solution for the HP3000, and that is a bad machine to write language systems for (the linker must be a trusted program to protect the system).

--
from Mole End			Mark Terribile
(scrape .. dig)			mtx5b!mat
(Please mail to mtx5b!mat, NOT mtx5a!mat, or to mtx5a!mtx5b!mat)
(mtx5b!mole-end!mat will also reach me)
,..  .,,  ,,,  ..,***_*.