[comp.lang.c] Clarifications on ANSI C nits

gnu@hoptoad.uucp (John Gilmore) (12/24/87)

While testing the GNU C compiler using the MetaWare C Validation Suite,
I found a bunch of things that are not clearly marked in the Oct 86 copy
of the C draft standard.  I'm interested in clarification from the group
and/or from the standards committee on these:

 * I've heard a rumor that in newer drafts, hex escape sequences inside
 strings are no longer limited to 3 characters, e.g. "abc\x00345"
produces "abcE" since 'E' (0x45) is (char)0x00345.  This strikes me as odd.

 * Is a "const void *" a void pointer?  Is a "volatile void *"?  How about
a "const volatile void *"?  How about a "void *const"?  A "pointer to void"
gets special treatment in a few places (e.g. in assignment) but it's not
clear whether these are "pointers to void".  (How about "volatile void
*noalias const foo", just for fun?)

 * It appears from the text in section 3.5.3.3 that variable-argument-list
functions can ONLY be defined in the 

	foo(int c, float bar, ...)

syntax, and not in the

	foo(c, bar, ...)
		int c;
		float bar;

syntax.  GCC implements it this way.  However, this makes it impossible
to support <varargs.h> since no existing code uses the new declaration
method.  It also seems to be a silly inconsistency.

 * Though a union can now be initialized, you can only initialize one
member, but you have to surround it with { } anyway.  To correctly
initialize the structure below, ALL the braces used are required:

	struct s{union {int x,y; int z;} u; int q;} s[2] =
		{{{1}, 2}, {{3}, 4}}; 

I would have expected:

		{1, 2, 3, 4};

to work, but the wording of the standard does not support it.  Furthermore,
the standard does not allow extra braces (e.g. around 2 and 4, or around the
whole thing), so you have to get it exactly right.  Is this what was intended?

(These next two aren't from the validation suite.)

 * When calling a function, side effects caused by evaluating the arguments
must be complete before the call takes place.  What about side effects
caused by evaluating the function name?  Ron Light gave this example:

	typedef int (*Inst)();          /* machine instruction */
	Inst *pc;                       /* program counter during execution */

	execute(p)                      /* run the machine */
	Inst *p;
	{
		for(pc = p;;)
			(*(*pc++))();
	}

If I reference "pc" from inside a function called from the for loop, is
its value guaranteed to be incremented, to not be incremented, or not
guaranteed?

 * Are null statements (extra semicolons) allowed between declarations?
Between struct/union members' declarations?
-- 
{pyramid,ptsfa,amdahl,sun,ihnp4}!hoptoad!gnu			  gnu@toad.com
  I foresee a day when there are two kinds of C compilers: standard ones and 
  useful ones ... just like Pascal and Fortran.  Are we making progress yet?
	-- ASC:GUTHERY%slb-test.csnet

gnu@hoptoad.uucp (John Gilmore) (12/27/87)

I wrote:
>  * When calling a function, side effects caused by evaluating the arguments
> must be complete before the call takes place.  What about side effects
> caused by evaluating the function name?

I found this answer myself:  section 3.3.2.2 says:

	The order of evaluation of the function designator, the arguments,
	and subexpressions within the arguments is unspecified, but
	there is a sequence point before the actual call.

Thus all side effects in the function name and/or arguments must take place
before the call.

mnc@m10ux.UUCP (Michael Condict) (12/28/87)

In article <3725@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
> While testing the GNU C compiler using the MetaWare C Validation Suite,
> I found a bunch of things that are not clearly marked in the Oct 86 copy
> of the C draft standard.  I'm interested in clarification from the group
> and/or from the standards committee on these:
> 
> . . .
> 
>  * It appears from the text in section 3.5.3.3 that variable-argument-list
> functions can ONLY be defined in the 
> 
> 	foo(int c, float bar, ...)
> 
> syntax, and not in the
> 
> 	foo(c, bar, ...)
> 		int c;
> 		float bar;
> 
> syntax.  GCC implements it this way.  However, this makes it impossible
> to support <varargs.h> since no existing code uses the new declaration
> method.  It also seems to be a silly inconsistency.

Another silly inconsistency (although it is probably too late to fix in the
current standard) is the use of comma instead of semicolon in the new
declaration syntax.  There already were at least three places in the existing
language where one could write a sequence of declarations of names, e.g. in
struct declarations between { and }, and in the old-style declaration of the
formal args of a function.  In all these places, semicolons are used to
terminate each declaration, not comma.  This is crucial, because comma is
also allowed in these constructs and serves to concisely declare a list of
identifiers of the same type:

		int a,b,c;

The ANSI committee's adopted syntax introduces two defects in the
language:

(1) It confuses users by being inconsistent with these other, highly analogous
    syntactic constructs, and for the same reason makes parsers unnecessarily
    complex.

(2) It eliminates the possibility of allowing the concise declaration syntax
    shown above in the new declaration syntax.

My guess is that the committee's choice of syntax was based on reasoning
something like: "currently, commas are required between formal arguments
in the header of the function (i.e., between '(' and ')'), so we must
preserve that to avoid confusing users."  This argument however is less
than persuasive if we note that the addition of the new declaration syntax
so radically alters what is allowed inside the () that some users
are bound to be confused anyway.  And besides, it is easy to describe my
advocated version of the new declaration syntax in words that make it out to
be a natural extension of the old syntax:

(1) In K&R C, the construct "f(a,b,c) ... {" is an abbreviation for
    "f(int a,b,c;) ... {". (Note that these same two abbreviations are allowed
    elsewhere in the language.)  The meaning is that a, b and c are all
    declared to be ints, as is the case elsewhere in the language,
    except that their declaration(s) can be overridden by a redeclaration in
    the ... stuff between ')' and '{'.  This is consistent with how things
    work now.

(2) In ANSI C, the type word "int" may not be omitted, just as it may no longer
    be omitted elsewhere in the language.  The trailing ";" is still optional.

(3) Furthermore, in ANSI C, we extend the syntax and semantics to allow an
    arbitrary sequence of declarations inside the (), with the semicolon
    optional for the last one.  E.g.:

		f(int a,b; struct {float i,r;}) {

    (We should probably also allow bit field declarations, since our syntax
    and semantics are essentially equivalent to the case where every function
    takes one argument, but that argument is a struct type.  No new
    implementation difficulties arise, since it is already legal to
    declare a function that takes as argument a struct with bitfields.)

(4) Now, since any types of arguments can be declared without putting stuff
    between ')' and '{', we don't allow you to redeclare args there, with one
    exception: for backward compatibility, we allow the old style
    declaration, at least until the next version of the standard, but it is a
    deprecated feature.  That is, you can still abbreviate "int a,b,c" to
    "a,b,c", and if your entire declaration sequence within () consists
    of such an abbreviation, you can redeclare the types of some or all of
    the args, using declarations occurring between ')' and '{'.

Described this way, it is clear why (my version of) the new syntax is to be
preferred to K&R syntax: it doesn't make sense to be declaring the args as
ints inside of the () then redeclaring them afterwards.

Am I the only one bothered by this?  I've noticed no other discussion of it.
-- 
Michael Condict		{ihnp4|vax135|cuae2}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ

msb@sq.uucp (Mark Brader) (12/29/87)

Michael Condict (mnc@m10ux.UUCP) expresses regret that the new function
prototype syntax uses commas rather than semicolons as delimiters, and asks:


> Am I the only one bothered by this?  I've noticed no other discussion of it.

No, I asked the same question well over a year ago.  As I recall, the answer
given was that if semicolons were allowed then error recovery became very hard.

Notice that the following would be VALID input:

	int f (int a, b;
	float c;
	char *p, s[20];
	int p (int q, r;);
	);

Now that I think of it, the force of this argument seems somewhat weakened
since, if I understand correctly (my copy of the latest Draft being at the
office, and me not), even under the existing syntax a declaration such as

	int f (struct {int p, q;} r);

is legal and does contain embedded semicolons.  Hmm.

Mark Brader				"C takes the point of view
SoftQuad Inc., Toronto			 that the programmer is always right"
utzoo!sq!msb, msb@sq.com				-- Michael DeCorte

OWENSJ%VTVM1.BITNET@CUNYVM.CUNY.EDU (John Owens) (12/30/87)

[Michael Condict suggests allowing declaration syntax, separated by
 semicolons, in the function definition argument lists.]

While this may sound clean from a language-design perspective, it
makes the resulting definitions harder to use.  With the current
proposed syntax, someone wanting to know the types and number of
arguments can see them easily, separated by commas, just as they
are specified in the calling sequence.  Michael's syntax would
lose the one-to-one correspondence both with the calling sequence
*and* the function prototypes.  I think this correspondence is
important to preserve, unless we want to see C go the way of
Algol 68....

        -John Owens                     +1 703 961 7827
        Virginia Tech   Communications Network Services
        OWENSJ@VTVM1.CC.VT.EDU      OWENSJ@VTVM1.BITNET

gwyn@brl-smoke.ARPA (Doug Gwyn) (01/06/88)

In article <3725@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
> * I've heard a rumor that in newer drafts, hex escape sequences inside
> strings are no longer limited to 3 characters, e.g. "abc\x00345"
>produces "abcE" since 'E' (0x45) is (char)0x00345.  This strikes me as odd.

Yes, hex escapes are arbitrarily long now.  I initiated the action that
ended up with this, although it's not what I originally proposed.  The
problem was that a 3-character limit is not enough for implementations
with char sizes > 12 bits.  I proposed that the implementation define
what the limit is, but the committee preferred to remove the limit
(for hex escapes only; it's too late to change octals).

\x00345 is no weirder than \x345 on an 8-bit machine.

The problem of wanting to follow a hex sequence with a digit character
can be solved by using string concatenation: "\x003""45".
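
A minimal sketch of the concatenation trick, assuming a compiler that
converts escape sequences before it joins adjacent string literals (as
the later standard specifies).  The names split_escape and
split_escape_length are invented for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* The hex escape ends at the closing quote of the first literal, so the
   array holds 'a', 'b', 'c', the character with value 3, '4', '5' and
   the terminating null. */
static const char split_escape[] = "abc\x003" "45";

/* Hypothetical helper: number of characters before the terminator. */
size_t split_escape_length(void)
{
    return strlen(split_escape);
}
```

Written as a single literal, "abc\x00345", the digits 4 and 5 would be
absorbed into the escape instead of standing as separate characters.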

> * Is a "const void *" a void pointer?  Is a "volatile void *"?  How about
>a "const volatile void *"?  How about a "void *const"?  A "pointer to void"
>gets special treatment in a few places (e.g. in assignment) but it's not
>clear whether these are "pointers to void".

I think they're all just "void pointers".  I didn't find any special
meaning specified for qualified void pointer types.
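
As a rough illustration of what treating a qualified void pointer as a
"void pointer" means in practice, the sketch below relies on the implicit
conversions that compilers following the later standard accept: an object
pointer converts to "const void *" without a cast, and a "const void *"
converts to another const-qualified object pointer without a cast.  The
functions sum_bytes and demo_sum are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: sums the bytes of any object, reached through a
   const-qualified void pointer. */
unsigned long sum_bytes(const void *buf, size_t n)
{
    const unsigned char *p = buf;   /* no cast needed from const void * */
    unsigned long total = 0;

    while (n-- > 0)
        total += *p++;
    return total;
}

/* Hypothetical caller: passes an ordinary array through const void *. */
unsigned long demo_sum(void)
{
    unsigned char data[3] = {1, 2, 3};
    const void *cv = data;          /* implicit conversion, qualifier added */

    return sum_bytes(cv, sizeof data);
}
```

Going the other way, from "const void *" to a pointer without the const,
still needs a cast, since that would drop a qualifier.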

> * It appears from the text in section 3.5.3.3 that variable-argument-list
>functions can ONLY be defined in the 
>	foo(int c, float bar, ...)
>syntax, and not in the
>	foo(c, bar, ...)
>		int c;
>		float bar;
>syntax.  GCC implements it this way.  However, this makes it impossible
>to support <varargs.h> since no existing code uses the new declaration
>method.  It also seems to be a silly inconsistency.

Yes, the ", ..." is not existing practice.  Existing practice (non-
prototype declarations) was retained simply to "grandfather" in existing
code, but it has been flagged "obsolescent" to permit its removal in
some future revision of the standard.  There was little sentiment for
propping up the obsolescent syntax by adding ", ..." to it.  If you have
to add the variadic indicator ", ..." to a declaration, you might as
well convert it to prototype form at the same time.
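
For reference, a small sketch of a variadic function written in prototype
form with the <stdarg.h> macros.  The function name sum_ints and its
count-first calling convention are assumptions made for this example, not
anything taken from the draft.

```c
#include <stdarg.h>

/* Hypothetical example: adds up `count` int arguments that follow the
   fixed parameter. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);            /* start after the last named parameter */
    while (count-- > 0)
        total += va_arg(ap, int);   /* each variable argument is read as int */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) passes the count first, then the
values; the caller is responsible for making the two agree.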

> * Though a union can now be initialized, you can only initialize one
>member, but you have to surround it with { } anyway.  To correctly
>initialize the structure below, ALL the braces used are required: ...

Logically, it could have been made more convenient for unions, but it
apparently didn't occur to anyone to do so.

> Furthermore,
>the standard does not allow extra braces (e.g. around 2 and 4, or around the
>whole thing), so you have to get it exactly right.  Is this what was intended?

I think extra {} are allowed by the grammar.  There was some small change
made to the bracketing wording at the December meeting, but I don't recall
what it was.  (It seemed correct at the time, so I quit worrying about it.)
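
The fully braced form from the original question can be checked with a
sketch like the one below.  It uses only the bracketing that the
question says is required; whether fewer or extra braces are accepted is
left open here.  The array name table and the two accessor functions are
invented for the example.

```c
/* Same shape as the structure in the question: a union followed by an
   int.  Only the first union member (x) can be initialized this way. */
struct s { union { int x, y; int z; } u; int q; };

static struct s table[2] = { { {1}, 2 }, { {3}, 4 } };

/* Hypothetical accessors so the values can be inspected from a caller. */
int first_member(int i) { return table[i].u.x; }
int trailing_q(int i)   { return table[i].q; }
```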

> * When calling a function, side effects caused by evaluating the arguments
>must be complete before the call takes place.  What about side effects
>caused by evaluating the function name?  Ron Light gave this example: ...

Using a pointer to a function does not constitute evaluating the function
name.  Quoting the latest draft: "The order of evaluation of the function
designator [postfix-expression], the arguments, and subexpressions within
the arguments is unspecified, but there is a sequence point before the
actual call."  I added the [] remark for clarity.  It seems simple enough
to me: because of the sequence point, the increment must occur before the
actual call.

>If I reference "pc" from inside a function called from the for loop, is
>its value guaranteed to be incremented, to not be incremented, or not
>guaranteed?

Guaranteed to be incremented.
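
A sketch of how one might observe this, assuming the sequence point
quoted above.  It adapts the example from the question: the loop is
given a stopping condition, and the names start, seen_offset, record and
run_once are hypothetical additions.

```c
#include <stddef.h>

typedef int (*Inst)(void);      /* machine instruction */

static Inst *pc;                /* program counter during execution */
static Inst *start;             /* hypothetical: where execution began */
static ptrdiff_t seen_offset;   /* hypothetical: pc offset seen by callee */

/* Records how far pc has advanced at the moment the callee runs. */
static int record(void)
{
    seen_offset = pc - start;
    return 1;                   /* nonzero return stops the loop below */
}

/* Runs a one-instruction program and reports what record() observed. */
ptrdiff_t run_once(void)
{
    Inst prog[1];

    prog[0] = record;
    start = prog;
    pc = prog;
    while ((*(*pc++))() == 0)   /* increment completes before the call */
        ;
    return seen_offset;
}
```

Because the increment is complete before record() is entered, the offset
it stores is 1 rather than 0.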

> * Are null statements (extra semicolons) allowed between declarations?
>Between struct/union members' declarations?

I don't see how they can be; a null statement is an expression-statement,
which involves "evaluation".

P.S.  Of course, the above are merely my own opinions.  Send in comments
to X3J11 during the next formal public review if you remain unsatisfied
about any of these issues.

gwyn@brl-smoke.ARPA (Doug Gwyn) (01/06/88)

In article <458@m10ux.UUCP> mnc@m10ux.UUCP (Michael Condict) writes:
>Another silly inconsistency (although it is probably too late to fix in the
>current standard) is the use of comma instead of semicolon in the new
>declaration syntax.

In C, semicolon has always been a statement terminator (it can be
considered as such even in the "for(;;)" kludge), while comma has
been used as a separator in lists (and, much less often, as a
sequencing operator).  If you're going to make "consistency"
arguments, you should also deal with this one.