[comp.lang.c] ANSI C -- miscellaneous suggestions

minow@decvax.UUCP (Martin Minow) (12/14/86)

This is one of a collection of comments on the Draft Standard, posted to
comp.lang.c for discussion before I mail a final draft to the ANSI C
committee.  Each message discusses one problem I have found with the Draft
Standard that I feel warrants a "no" vote.  Note that this message is my
personal opinion, and does not reflect on the opinions of my employer.

This message lists concerns -- these are questions or problems,
but they are not serious enough to preclude my acceptance of
the standard.

----

Page 1, line 14.  The standard should specify the total list of words
reserved to the compiler and its libraries.

Page 6, line 40ff.  It is unclear whether the main() function may be
declared or invoked with more than 2 parameters.  One common extension
is to invoke main with a third parameter which specifies a list of
"environment variables."

Page 7, line 12.  Must the string in argv[0] be modifiable?

Page 10, line 27.  The horizontal tab, vertical tab, and form feed characters
are not needed by the language.  The standard should declare that
horizontal tab is identical to space except in character and string
literals, and vertical tab and form feed are everywhere identical to newline.

Page 11, lines 29ff.  The standard should specify the internal representations
for the predefined escape sequences for implementations that use the
USASCII or Latin 1 alphabets, 

Page 12, line 29.  The minimum significance for external identifiers
should be changed to ``6 significant monocase initial characters in
an external identifier.''

Page 14, line 20ff.  FLT should be FLOAT. DBL should be DOUBLE. etc.  As
the first 31 characters of macro definitions are significant, there is no
need to sacrifice legibility (and maintainability) for conciseness.

Page 26, line 13.  The exceptions (the characters that may not appear
in string literals) should include the vertical tab character
and the form feed character, as these are equivalent to newlines.

Page 74, line 28.  Horizontal tab does not have an independent existence
during preprocessing.  The example should note that comments may precede
or follow the # that introduces a preprocessing directive.

Page 75, line 36.  An arithmetic error in an #if expression (such as divide
by zero) shall result in a diagnostic error message.  However, a sequence
such as:

    #if (foo == 0) ? 0 : (10 / foo)

should not result in a diagnostic error message for any value of foo.

Page 82, line 24ff.  I would recommend the following clarifications to
the definition of the predefined macro names:

   __LINE__	The line number shall be as defined in section 3.8.4,
		page 81, line 30.

   __FILE__	There is no presumption that this string can be used to
		open a file during execution of the program.

   __DATE__	Neither this value nor the value of __TIME__ change during
		compilation.

A predefined name should be redefinable (by #undef). (The identifier
"defined" may not be redefined.)

Page 83, line 15ff.  Function prototypes with separate parameter identifier
and declaration lists offer a better environment for documentation than
the more concise function prototype format.  I would strongly recommend
that they not be marked obsolescent.

Page 85, line 35.  The ability to redefine any function declared in
a header as a macro may break existing programs that write, e.g.,

    #include <stdlib.h>
    extern long rand();

If rand() is declared as a macro, 

Page 89, line 13 (footnote 64):  The Standard should note that, in an
implementation that uses the Latin 1 character set, the printing
characters are those whose values lie from 0x20 through 0x7E or from
0xA0 through 0xFF. Control characters are those whose values lie from
0x00 through 0x1F, 0x7F, or from 0x80 through 0x9F.  The ranges for the other
<ctype.h> macros should be similarly extended.

Page 91, line 46ff. Note that, in a Latin 1 environment, the ispunct() and
isspace() functions should test for the non-breaking space at 0xA0.

Page 102, line 46ff. If longjmp() is called from a signal handler, volatile
objects may have indeterminate values as they cannot always be updated by
atomic (one machine cycle) operations. It is unrealistic to require an
implementation to lock interrupts before modifying a volatile object.  The
Standard should note that volatile objects are indeterminate when longjmp()
is called from an interrupt or signal handler.

Page 128, line 7.  Is one character of pushback guaranteed even before
anything has been read from the stream or after end of file or error?
The standard should be clarified on this point. (I don't care either way,
but would prefer permitting one character pushback at any time.)

Page 140, line 21ff. Predefined values for "successful termination"
and "unsuccessful termination" (arguments to exit()) should be provided.

Page 142, line 16ff. An unsigned division function analogous to
div() would be useful.

----

Martin Minow
decvax!minow

gwyn@brl-smoke.ARPA (Doug Gwyn) (12/15/86)

In article <112@decvax.UUCP> minow@decvax.UUCP (Martin Minow) writes:
>Page 1, line 14.  The standard should specify the total list of words
>reserved to the compiler and its libraries.

While this would be "nice", one can pretty much find this out
from the index, and the standard isn't intended to be either a
tutorial or a user reference manual.  I would hope that vendors
and textbook authors will consider providing such a list.

>Page 6, line 40ff.  It is unclear whether the main() function may be
>declared or invoked with more than 2 parameters.

I thought this was clear: main() can be defined with either 0 or 2
parameters.  Other schemes are not defined, which allows extensions
such as UNIX's envp but does not mandate them for all implementations.
(Note that envp is not normally necessary, given getenv().)
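For illustration, a minimal sketch of the getenv() route, for a program that
wants one environment value without a third main() parameter.  The helper
name env_or_default is made up for this example and is not part of any
library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return the value of an environment variable,
 * or the caller's fallback string when the variable is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL);
    value = getenv(name);            /* NULL when name is not set */
    return value != NULL ? value : fallback;
}
```

A caller might write env_or_default("TERM", "dumb"); the variable name here
is only an example.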

>Page 7, line 12.  Must the string in argv[0] be modifiable?

That's what the draft says.  Is this a problem?

>Page 10, line 27.  The horizontal tab, vertical tab, and form feed characters
>are not needed by the language.  The standard should declare that
>horizontal tab is identical to space except in character and string
>literals, and vertical tab and form feed are everywhere identical to newline.

There are several flavors of whitespace in C (including the
preprocessor).  Some generalization was done where possible;
did we miss any?

>Page 11, lines 29ff.  The standard should specify the internal representations
>for the predefined escape sequences for implementations that use the
>USASCII or Latin 1 alphabets, 

So long as we don't mandate ASCII/ISO character sets, this is
infeasible.

>Page 12, line 29.  The minimum significance for external identifiers
>should be changed to ``6 significant monocase initial characters in
>an external identifier.''

Section 3.1.2 permits implementations to ignore case distinctions.
2.2.4.1 is merely to establish that at least 6 significant characters
can be used in external identifiers simultaneously with meeting other
implementation limit requirements, and nothing is gained by mentioning
case-mapping in this context.

>Page 14, line 20ff.  FLT should be FLOAT. DBL should be DOUBLE. etc.  As
>the first 31 characters of macro definitions are significant, there is no
>need to sacrifice legibility (and maintainability) for conciseness.

That would be nice, but we also have SHRT_MAX, for example, which
is defined in a header that is shared between two standards bodies
and is therefore difficult to redefine.  (It's also possible that these
names were chosen to agree with the new Fortran standard; I forget.)

>Page 26, line 13.  The exceptions (the characters that may not appear
>in string literals) should include the vertical tab character
>and the form feed character, as these are equivalent to newlines.

Where are these characters declared to be "equivalent to newlines"?

>Page 74, line 28.  Horizontal tab does not have an independent existence
>during preprocessing.  The example should note that comments may precede
>or follow the # that introduces a preprocessing directive.

Section 2.1.1.2 (Translation phases) states that an implementation
MAY retain distinct white-space characters at the point of
preprocessing.  However, comments must have been turned into
single space characters at that point.

>Page 75, line 36.  An arithmetic error in an #if expression (such as divide
>by zero) shall result in a diagnostic error message.  However, a sequence
>such as:
>
>    #if (foo == 0) ? 0 : (10 / foo)
>
>should not result in a diagnostic error message for any value of foo.

[The page/line reference seems wrong.]  I think the error handling
is already implied by the syntax, but perhaps explicit wording
would help.  (Note that the example is correct code and should
not cause a diagnostic in any case.)

>Page 82, line 24ff.  I would recommend the following clarifications to
>the definition of the predefined macro names:
>
>   __LINE__	The line number shall be as defined in section 3.8.4,
>		page 81, line 30.

That's already my understanding of the draft.

>   __FILE__	There is no presumption that this string can be used to
>		open a file during execution of the program.

That's the way it is now.  The sources clearly need not even exist
in the run-time environment!

>   __DATE__	Neither this value nor the value of __TIME__ change during
>		compilation.

That might be nice, but how important is such a constraint on
implementations?  I bet there even are people who would prefer
the __TIME__ clock to continue to tick during compilation.

>A predefined name should be redefinable (by #undef). (The identifier
>"defined" may not be redefined.)

No, since these names begin with underscore, the user cannot safely
redefine them anyway; they're not in his "allowable name space".

>Page 83, line 15ff.  Function prototypes with separate parameter identifier
>and declaration lists offer a better environment for documentation than
>the more concise function prototype format.  I would strongly recommend
>that they not be marked obsolescent.

The intent is to eliminate any requirement that old-style function
parameter declarations be supported in a future revision of the
standard.  The only way (it appears) that we can do that is by
calling them "obsolescent" in a previous draft.

>Page 85, line 35.  The ability to redefine any function declared in
>a header as a macro may break existing programs that write, e.g.,
>
>    #include <stdlib.h>
>    extern long rand();
>
>If rand() is declared as a macro, 

First of all, I doubt that existing programs #include <stdlib.h>.
When adding such an #include to existing source, you should also
remove any explicit redundant declarations (except when they are
really necessary, in which case use #undef or one of the other
usual tricks to force use of a genuine function).
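A rough sketch of two of the usual tricks, assuming only that <stdlib.h>
declares rand() and may additionally define a macro of the same name.  The
helper names roll_die and roll_die_paren are hypothetical.  The sketch relies
on the header's declaration rather than repeating an explicit extern.

```c
#include <assert.h>
#include <stdlib.h>

/* If the implementation supplies rand as a macro, removing the macro
 * leaves the genuine function declared by the header. */
#undef rand

/* Hypothetical helper: roll a die using the real function. */
int roll_die(void)
{
    int r = rand();              /* a function call; no macro is left */

    assert(r >= 0);              /* rand() yields a value in 0..RAND_MAX */
    return r % 6 + 1;
}

/* Alternative: parenthesizing the name suppresses expansion of a
 * function-like macro, so this also reaches the function. */
int roll_die_paren(void)
{
    return (rand)() % 6 + 1;
}
```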

I'll be among the first to admit that this approach has its
problems, but I don't know of anything better.  If you can
suggest a better way to handle this, please write it up and
mail it in to ANSI.

>Page 89, line 13 (footnote 64):  The Standard should note that, in an
>implementation that uses the Latin 1 character set, the printing
>characters are those whose values lie from 0x20 through 0x7E or from
>0xA0 through 0xFF. Control characters are those whose values lie from
>0x00 through 0x1F, 0x7F, or from 0x80 through 0x9F.  The ranges for the other
><ctype.h> macros should be similarly extended.
>
>Page 91, line 46ff. Note that, in a Latin 1 environment, the ispunct() and
>isspace() functions should test for the non-breaking space at 0xA0.

No particular character set is required, so we can't make such
remarks in the standard itself.  Perhaps the Rationale should
give such examples.
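As an example of the sort of thing the Rationale could show, here is a sketch
of the two classifications using exactly the ranges quoted above.  The names
latin1_isprint and latin1_iscntrl are hypothetical; they are not <ctype.h>
functions, and the ranges are the poster's, not anything the draft requires.

```c
#include <assert.h>

/* Hypothetical: printing characters in a Latin 1 implementation,
 * 0x20 through 0x7E or 0xA0 through 0xFF. */
int latin1_isprint(int c)
{
    assert(c >= 0 && c <= 0xFF);
    return (c >= 0x20 && c <= 0x7E) || (c >= 0xA0 && c <= 0xFF);
}

/* Hypothetical: control characters in a Latin 1 implementation,
 * 0x00 through 0x1F, 0x7F, or 0x80 through 0x9F. */
int latin1_iscntrl(int c)
{
    assert(c >= 0 && c <= 0xFF);
    return (c >= 0x00 && c <= 0x1F) || (c >= 0x7F && c <= 0x9F);
}
```

With these ranges every value from 0x00 through 0xFF falls into exactly one
of the two classes.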

>Page 102, line 46ff. If longjmp() is called from a signal handler, volatile
>objects may have indeterminate values as they cannot always be updated by
>atomic (one machine cycle) operations. It is unrealistic to require an
>implementation to lock interrupts before modifying a volatile object.  The
>Standard should note that volatile objects are indeterminate when longjmp()
>is called from an interrupt or signal handler.

I don't know that anything needs to be said about this.  The only
object type for which atomic operations are guaranteed is sig_atomic_t.
[longjmp() vs. signal handlers was discussed in a previous note]
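For what it's worth, the conventional pattern that stays within that
guarantee is a handler that does nothing but set a volatile sig_atomic_t
flag.  A minimal sketch follows; the names got_signal, on_signal, and
signal_flag_demo are invented for the example, and SIGINT is an arbitrary
choice of signal.

```c
#include <signal.h>

/* Flag of the one type for which atomic access is guaranteed. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;              /* the handler only sets the flag */
}

/* Hypothetical demo: install the handler, raise SIGINT in this
 * program, and report the flag (or -1 if installation failed). */
int signal_flag_demo(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);               /* handler has run when this returns */
    signal(SIGINT, SIG_DFL);     /* restore default handling */
    return got_signal;
}
```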

>Page 128, line 7.  Is one character of pushback guaranteed even before
>anything has been read from the stream or after end of file or error?
>The standard should be clarified on this point. (I don't care either way,
>but would prefer permitting one character pushback at any time.)

Yes, since this is not specifically excepted it is required.
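A quick way to probe an implementation on this point, sketched under the
assumption that the caller supplies a readable stream to experiment on.  The
helper name pushback_roundtrip is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: push one character back onto a stream and
 * read it again, returning whatever getc() delivers. */
int pushback_roundtrip(FILE *fp, int ch)
{
    assert(fp != NULL);
    if (ungetc(ch, fp) == EOF)   /* EOF here means the pushback failed */
        return EOF;
    return getc(fp);
}
```

Calling it on a stream from which nothing has been read yet, and again after
getc() has returned EOF, exercises the two cases asked about.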

>Page 140, line 21ff. Predefined values for "successful termination"
>and "unsuccessful termination" (arguments to exit()) should be provided.

Done at last week's meeting, via a compromise solution that
requires that 0 also always be taken to mean success.

>Page 142, line 16ff. An unsigned division function analogous to
>div() would be useful.

This keeps getting proposed and defeated.  Basically, the only
reason div() etc. are defined is because we didn't want to
insist that / and % work "correctly"; that's not an issue for
unsigned integers.  (It's also nice that both the quotient and
remainder are returned simultaneously; this can be exploited by
some implementations to improve efficiency in the frequent
situation where both values are needed.)
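As a small illustration of getting both results from one call, here is a
sketch using div().  The helper name split_seconds is made up for the
example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: split a count of seconds into minutes and
 * seconds with a single div() call. */
void split_seconds(int total, int *minutes, int *seconds)
{
    div_t r;

    assert(minutes != NULL && seconds != NULL);
    r = div(total, 60);          /* quotient and remainder together */
    *minutes = r.quot;
    *seconds = r.rem;
}
```

For a negative total, div() truncates the quotient toward zero, so the
remainder takes the sign of the numerator.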

A lot of proposals for new features have been rejected in an
attempt to keep the size of the language and its environment
relatively small.  (This attempt hasn't been totally successful,
but it's certainly a worthwhile goal.)  Therefore, please don't
interpret failure to adopt a suggestion as necessarily implying
that there is something wrong with the idea, although often
there is (in which case the response should point out what).

Reminder:  Current public review period ends 07-Mar-1986.
There WILL be another, 2-month, public review, since X3J11
has decided to make substantive changes to the current draft
[as reported in another note].

faustus@ucbcad.BERKELEY.EDU (Wayne A. Christopher) (12/17/86)

Regarding the requirement that exit(0) be success -- this will break a lot
of VMS C programs, which use 1 for success and 0 for "undefined error"
(I think -- I'm not a big VMS fan...)

	Wayne

minow@decvax.UUCP (Martin Minow) (12/17/86)

Sorry about the length of this, but my original comments apparently
require clarification.  I'm grateful to Doug Gwyn (@ brl-smoke.arpa)
for his comments.

1. I recommend that the total list of words be standardized (excepting
   those defined with an initial underscore).  This (hopefully) prevents
   proliferation of new quasi-reserved words. I.e. I want a guarantee
   that ANSI will never add a foo() function to <math.h>.

2. Horizontal tab has an independent existence in the pre-processor
   (page 74, lines 26-29).  It shouldn't (if I understand translation
   phases).

3. (Specifying internal representations linked to Latin 1) -- I understand
   that it is infeasible to *require* ANSI (or Latin 1), but I would recommend
   defining Latin 1 as a reference, and stating that, for implementations
   supporting Latin 1, the internal representations of the specified
   characters *shall* be that given by Latin 1.  (You are also free to
   give representations for EBCDIC, if you can find a standard.)

4. Doug asks where VT and FF are declared "equivalent to newlines."
   That's my reading of section 2.2.2 (page 11) defining character
   display semantics.  If this is not the case, perhaps a clarification
   is in order.

5. I note that a sequence such as
	#define foo	0
	#if 0 && 10 / foo
	int this;
	#else
	int that;
	#endif
   should not result in an error.  Doug seems to agree.  Unfortunately,
   this bugchecks at least one C compiler.  (And I had to work hard
   in Decus cpp to prevent it.)  The problem is that some preprocessors
   do not properly "short-circuit" evaluate && || and ?:.  Also, the
   standard should clarify just what happens if you do write
	#if 10 / 0
   I doubt that bugchecking is correct behavior.  I don't see anything
   in section 3.8.1 (pp. 75ff) discussing this.
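The cases in point 5 can be collected into a small test file for a
preprocessor.  This is only a sketch: BRANCH_AND, BRANCH_COND, and the two
functions are names invented here so the chosen branch can be inspected at
run time.  A preprocessor that short-circuits should accept the file without
a diagnostic and select the #else branch both times.

```c
#define foo 0

/* The division sits in an operand that is never evaluated. */
#if 0 && 10 / foo
#define BRANCH_AND 1
#else
#define BRANCH_AND 0
#endif

/* Same idea with ?: -- the unselected arm holds the division. */
#if (foo == 0) ? 0 : (10 / foo)
#define BRANCH_COND 1
#else
#define BRANCH_COND 0
#endif

/* Hypothetical accessors reporting which branch was compiled. */
int branch_and(void)  { return BRANCH_AND; }
int branch_cond(void) { return BRANCH_COND; }
```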

Martin Minow
decvax!minow

joemu@nscpdc.NSC.COM (Joe Mueller) (12/17/86)

In article <1171@ucbcad.BERKELEY.EDU>, faustus@ucbcad.BERKELEY.EDU (Wayne A. Christopher) writes:
> Regarding the requirement that exit(0) be success -- this will break a lot
> of VMS C programs, which use 1 for success and 0 for "undefined error"
> (I think -- I'm not a big VMS fan...)


The question of exit status came up again during the last meeting. The position
the committee eventually adopted is this:

exit(0) always indicates success (for unix code)
exit(EXIT_SUCCESS) always indicates success
exit(EXIT_FAILURE) always indicates failure
exit(anything else) implementation defined

The EXIT* macros will be defined in (I believe) stddefs.h.
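A minimal sketch of how a program might use the macros.  It assumes they are
reachable through <stdlib.h>, where exit() itself is declared; the header
named above is from memory and may differ.  The helper name exit_status_for
is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: map a success flag to a portable exit status
 * instead of hard-coding 0 or 1. */
int exit_status_for(int ok)
{
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A program would then finish with something like
exit(exit_status_for(errors == 0)), where errors is the program's own count.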

							Joe Mueller
							...!nsc!nscpdc!joemu

bzs@bu-cs.BU.EDU (Barry Shein) (12/18/86)

>Regarding the requirement that exit(0) be success -- this will break a lot
>of VMS C programs, which use 1 for success and 0 for "undefined error"
>(I think -- I'm not a big VMS fan...)
>
>	Wayne

There's no reason that the run-time support for VMS/C can't return 1
to the O/S if the program exits 0. Unfortunately, there's really no
other resolution. Unix and IBM systems both treat zero exits as
success; lord knows why VMS decided to be different.  But the problem
is not really a problem: the O/S can be handed whatever's correct.

		-Barry Shein, Boston University