[mod.std.c] mod.std.c Digest V8#3

osd@hou2d.UUCP (Orlando Sotomayor-Diaz) (07/01/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>


mod.std.c Digest            Mon,  1 Jul 85       Volume 8 : Issue   3 

Today's Topics:
         (B.1.1.2, C.8) Use of Whitespace in the Preprocessor
                 (B.2.2) Character display semantics
                     (B.2.4.1) Translation limits
              (B.2.4.2, D.1, etc.) Quasi-reserved words.
                      (C.8.2) Macro replacement
                    (C.8.3) Conditional inclusion
        (D.) Operating-system defined values and C data types.
                            (D.10.2) Rand
                          (D.12.3.1) asctime
                          (D.12.3.4) gmtime
                  (D.3) Character Testing Functions
                       (D.8) Variable arguments
                        (D.9.9) File pointers
----------------------------------------------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.1.1.2, C.8) Use of Whitespace in the Preprocessor
To: std-c@cbosgd

Comments on ANSI C Draft Standard (X3J11/85-045, April 30, 1985).
**
** Note: these are my personal comments and not necessarily those
** of my employer.
**

There seems to be some confusion (in my mind, anyway) regarding
the interaction of "tokenization" (sec B.1.1.2) and the preprocessor.
According to the description of translation phases (phases 1 and 2
are irrelevant to this discussion):
  Phase
   3. Comments are replaced by one space character.
   4. The source text is completely tokenized... Each sequence
      of other [not newline] white-space characters becomes a single
      white-space token; alternatively, each other white-space
      token becomes a unique token.  [I don't understand this and
      am not sure that "alternatives" are appropriate here.]
   5. The source text is preprocessed.

However, in the description of the preprocessor (sec C.8),
   "there may be any number of space and horizontal-tab characters
   between the # token and the identifier that constitutes the next
   token, and before the new-line character that terminates the directive."

Some questions:

1.  If I understand the translation phases correctly, there are no
    "space and horizontal-tab characters", but rather "white-space
    tokens."  By explicitly specifying space and tab, and not
    specifying the other two white-space characters (vertical-tab
    and form-feed), it appears that the translation phase description
    should be changed as follows:
    3. ... Each comment is replaced by one <SPACE> token.  Space and
       horizontal-tab characters are replaced by <SPACE> tokens.
       Vertical-tab and form-feed characters are replaced by <FEED>
       tokens.
    6. ... The preprocessing concatenation operation is applied and the
       full source is retokenized.  <FEED> tokens become <SPACE> tokens.

    While this appears to clarify the intent of the Draft Standard,
    it would seem simpler to drop all distinction between the
    various white-space tokens.  It would also eliminate confusion
    if the Draft Standard were to emphasize that comments may
    be placed anywhere in preprocessor command lines and may
    also cause such lines to extend over more than one physical
    input source line.  For example, comments are legal in the
    following contexts:
      /*1*/ # /*2*/ include /*3*/ "filename" /*4*/
    and any of these comments may extend over multiple source lines.
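For what it's worth, here is a small C sketch that follows the rules eventually adopted in C89. Comments are replaced by one space in phase 3, before directives are recognized, so under those rules it should compile. The header <stdio.h>, the macro GREETING, and the function greeting() are chosen only for illustration.

```c
/*1*/ # /*2*/ include /*3*/ <stdio.h> /*4*/

/* A comment may also carry a directive across physical lines: the whole
   comment becomes a single space in phase 3, so the #define below is
   still one logical line when the preprocessor sees it. */
#define GREETING /* this comment spans
                    two physical lines */ "hello"

const char *greeting(void)
{
    return GREETING;   /* expands to "hello" */
}
```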

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.2.2) Character display semantics
To: std-c@cbosgd

Section B.2.2 states that

  "The effect of writing a printable character ... to a display device
  is to display a graphic representation of that character at the current
  printing position and then advance the printing position to the next
  position on the current line."

I would recommend adding:

  "The effect of writing a printable character at the final printing
  position of a line is implementation defined."

I would recommend deleting \a and \v as they offer no useful capability
and cannot be implemented in an implementation-independent manner.

I would recommend that, for all \ escapes except \n, the Standard
provide a definition in terms of the draft ISO DIS 8859/1 (or the
equivalent draft ANSI X3.134.2 and approved ECMA-94) 8-bit code
standards, with reference to the earlier ANSI X3.4-1977, ISO 646,
ISO DIS 2022.2, etc. standards and that all actions be labeled
"implementation dependent."  \n should have the semantics described
in (B.2.2) and the Draft Standard should note that the semantics of
the ANSI line-feed code (code position 0/10) are implementation-dependent.
(I.e., putchar('\n') is not necessarily equivalent to putchar(0xA).)
This section might be rewritten in roughly the following manner:

  The preferred implementation character set for C is specified in
  draft ISO DIS 8859/1, called Latin-1 in this Draft Standard.  If
  the implementation supports the Latin-1 character set, escape codes
  have the following representation:

    Escape  ANSI code  Action
      \a      0/7      Audible alert
      \b      0/8      Backspace
      \f      0/12     Form feed
      \r      0/13     Carriage return without advance to new line
      \t      0/9      Horizontal tab
      \v      0/11     Vertical tab

  If the implementation does not support the Latin-1 character set,
  the above escapes have implementation-dependent value and actions.
  They must, however, describe different character values (so that
  case statements will not fail).
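A minimal C sketch of the table above follows. It assumes an ASCII/Latin-1 execution character set; the "ANSI code" column is column/row notation, so 0/12 is decimal 12. Using every escape as a case label in a single switch enforces the distinctness requirement, because a duplicate value would be rejected at compile time. The function name escape_action is hypothetical.

```c
#include <stddef.h>

/* Returns a description of a control escape, or NULL for anything else.
   The switch only compiles if all six escapes have distinct values. */
const char *escape_action(int c)
{
    switch (c) {
    case '\a': return "Audible alert";
    case '\b': return "Backspace";
    case '\f': return "Form feed";
    case '\r': return "Carriage return";
    case '\t': return "Horizontal tab";
    case '\v': return "Vertical tab";
    default:   return NULL;
    }
}
```

On a non-Latin-1 implementation, the numeric values may differ, but the switch must still compile.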

It should be noted that, on a display device implementing ANSI-standard
character set invocation and designation, "putchar('a')" does not
necessarily display the first character of the Roman alphabet in lower
case. The presentation layer of the display device will show its
representation of the character currently invoked into GL at position
6/1. Actually, since devices can interpret information between the
DCS, OSC, APC, or PM introducers and the ST delimiter in just about
any way they please, it does not even promise to display anything!

Should the standard (especially an ANSI one) promise more than 
sending the bit pattern to the display device? Perhaps "\n" should be
special.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.2.4.1) Translation limits
To: std-c@cbosgd

Section B.2.4.1 states that "the implementation must be able to compile
at least one program that meets or exceeds all of the following
translations limits."  Note, however, that a program with 15 nesting
levels for compounds, 31-character identifiers, and 1024 identifiers
in each nested block will require about 500,000 bytes of symbol table
storage, which is probably not feasible for any but the largest
implementations.  I would suggest removing "all" or changing "maximum
number of identifiers with block scope in one block" to "... in one
block and all of its parents.", if for no other reason than making it
unnecessary for implementors to ignore this portion of the standard.
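Here is a rough check of the 500,000-byte figure. It assumes each of the 1024 identifiers at each of the 15 nesting levels stores only its 31 significant characters plus a terminating NUL. Real symbol tables also hold type and scope information, so this is a lower bound.

```latex
15 \times 1024 \times (31 + 1)\ \text{bytes}
  = 15\,360 \times 32\ \text{bytes}
  = 491\,520\ \text{bytes} \approx 500{,}000\ \text{bytes}
```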

The following translation limits seem unreasonably small:

    Conditional compilation nesting levels (6) -- I would recommend 16.

    Case labels in a switch (255) -- I would recommend at least 512
    and preferably 1024.  The yacc grammar for Pascal offers an
    example of a large switch statement.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.2.4.2, D.1, etc.) Quasi-reserved words.
To: std-c@cbosgd

The Draft Standard has added a large number of #defined symbols
and type definitions.  For example, section B.2.4.2 adds 30 numerical
limits.  I would recommend that all new definitions (i.e. #defined
symbols and type definitions that are not currently in widespread
use) be specified with a leading _ so they cannot conflict with
user code.  I.e., the user should have a reasonable chance of defining
variables that can never conflict with reserved words.  As it is now,
there is no way that I can tell whether a symbol conflicts
with a symbol in some library header file.
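Here is a minimal sketch of the kind of conflict meant here. A name that pre-draft user code could legitimately have declared is now a macro in <limits.h>. The exact expansion of SHRT_MIN is implementation specific, and the name APP_SHORT_MIN is hypothetical.

```c
#include <limits.h>

/* Before the draft, a program could contain
 *     static short SHRT_MIN = -32767;
 * Once <limits.h> defines SHRT_MIN as a macro, that declaration expands
 * to something like "static short (-32767-1) = -32767;" and no longer
 * compiles.  One defence is to test for the macro before using the name: */
#ifdef SHRT_MIN
#define APP_SHORT_MIN SHRT_MIN      /* hypothetical application name */
#else
#define APP_SHORT_MIN (-32767)      /* fallback: minimum guaranteed range */
#endif

int app_short_min(void)
{
    return APP_SHORT_MIN;
}
```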

I would further recommend that the symbols be composed of full English
words, even if this means more typing.  Thus "SHRT_MIN" should be
"_SHORT_MIN".

Similarly, I would recommend "readonly" rather than "const."

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (C.8.2) Macro replacement
To: std-c@cbosgd

The current Draft Standard added the following to C.8.2:

    If the identifier following the initial # in a directive has been
    defined as a macro name, the identifier is not replaced by an expansion
    of the macro.

I think this means that if you have written

    #define foo endif
    #foo

you don't get #endif, but an example would be helpful.
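Here is one possible example, as a sketch under the rule quoted above. The directive name itself is never expanded, but the tokens after it can be (C89 allows "#include" followed by a macro). The macro HEADER and the function show_compiled_in are hypothetical.

```c
#define foo endif          /* a macro whose replacement is the word endif */
#define HEADER <stdio.h>   /* hypothetical macro naming a header */

#include HEADER            /* tokens AFTER the directive name are expanded */

#if 1
static int compiled_in = 1;
/* A line reading "#foo" placed here would NOT close this #if: the
   identifier naming the directive is never macro-expanded, so a
   compiler would typically diagnose "#foo" as an invalid directive. */
#endif

int show_compiled_in(void)
{
    return printf("%d\n", compiled_in);   /* prints "1", returns 2 */
}
```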


(C.8.2) ## ambiguities.

It appears from my reading that the token created by ## concatenation
cannot cause further macro expansion, but this is not clearly stated.
For example, what is the result of the following?

    #define concat(a, b) a ## b
    #line 123
    int line = concat(__LI, NE__);

Is it
    line = __LINE__;
or
    line = 123;
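For comparison, the rules eventually adopted in C89 (and unchanged since) rescan the token produced by ## for further macro names. Under those rules the answer is the second one. Here is a minimal check:

```c
#define concat(a, b) a ## b

/* The pasted token __LINE__ is rescanned and replaced by the current
   line number, which the #line directive has just set to 123. */
#line 123
int line = concat(__LI, NE__);
```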

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (C.8.3) Conditional inclusion
To: std-c@cbosgd

            #if directives verified for correctness?

The Draft Standard now specifies that

    Directives are verified for correctness, but processed only to keep
    track of the level of nested conditionals.

Does this mean that

	#if 0
	/* never compiled */
	#if )syntax error(
	#endif
	#endif

should print a compiler error message?  What about

	#undef never_defined
	#ifdef never_defined
	#define xyz(a)	((a) * 1)
	#endif
	...
	#ifdef never_defined
	#if xyz(123)
	#endif
	#endif

Should the use of xyz() result in an error message, or be replaced
by zero (undefined preprocessor symbol) or what?
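For reference, the final C89 text settled this. Directives inside a skipped group are processed only through the directive name, and the rest of their tokens are ignored. The sketch below combines both cases (the "..." omitted), and under those rules it should compile without any diagnostic. The function skipped_groups_ok is only a hook for testing.

```c
#if 0
/* never compiled */
#if )syntax error(
#endif
#endif

#undef never_defined
#ifdef never_defined
#define xyz(a)  ((a) * 1)
#endif

#ifdef xyz
#error "xyz should not be defined here"
#endif

#ifdef never_defined
#if xyz(123)   /* skipped group: this expression is never evaluated */
#endif
#endif

int skipped_groups_ok(void)
{
    return 1;
}
```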

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.) Operating-system defined values and C data types.
To: std-c@cbosgd

Several functions (notably fseek/ftell and kill) take parameters
that are defined in terms of C data types (the ftell result is
a long, and kill takes an integer program identifier).  This
cannot be made to work correctly on many systems.  For example,
a process is specified on RSX-11M by a vector of three 16-bit words.
Since the language supports passing structures to functions,
I would recommend redefining these functions in terms of
structures (whose contents are defined by appropriate #include
files) and adding functions to perform implementation-dependent
arithmetic in an implementation-dependent manner.
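Here is a minimal sketch of what such an #include file might provide on a system that names a process with three 16-bit words. The type proc_id and the function proc_id_equal are hypothetical. User code passes the structure by value and never touches its fields directly.

```c
/* Hypothetical header contents: the process identifier is an
   implementation-defined structure rather than an int. */
typedef struct {
    unsigned short word[3];   /* three 16-bit words, as on RSX-11M */
} proc_id;

/* Implementation-supplied comparison, so portable code need not know
   the layout of proc_id. */
int proc_id_equal(proc_id a, proc_id b)
{
    return a.word[0] == b.word[0]
        && a.word[1] == b.word[1]
        && a.word[2] == b.word[2];
}
```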

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.10.2) Rand
To: std-c@cbosgd

The definition of rand semantics is reasonable for systems
with 32-bit two's-complement long integers.  It is not clear
from Knuth (volume 2) that it will yield the same
sequence of numbers for implementations with greater numeric
precision or different arithmetic behavior.

Unless, of course, rand is implemented as a function that reads
a file of 2^32 pre-compiled integers.  But, since the size of this
file cannot be expressed as a long, srand cannot correctly reposition
the file.
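The draft's exact text is not reproduced here, so this sketch assumes a linear congruential generator like the portable example in the final C89 standard (multiplier 1103515245, increment 12345). Keeping the state in unsigned arithmetic and masking it to 32 bits gives the same sequence whatever the width of long. That is one way to meet the objection above. The names portable_rand and portable_srand are hypothetical.

```c
/* Generator state, kept modulo 2^32 explicitly. */
static unsigned long rand_state = 1;

void portable_srand(unsigned int seed)
{
    rand_state = seed & 0xFFFFFFFFUL;
}

int portable_rand(void)
{
    /* Unsigned long arithmetic wraps modulo 2^N for N >= 32; masking to
       32 bits yields the same state whether N is 32, 36 or 64. */
    rand_state = (rand_state * 1103515245UL + 12345UL) & 0xFFFFFFFFUL;
    return (int)((rand_state / 65536UL) % 32768UL);
}
```

With seed 1, the first value is 16838 on any of these widths.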

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.12.3.1) asctime
To: std-c@cbosgd

In order to provide for orderly development of local-language
variants of C, the alphabetic words in the returned string
should be standardized to their current (English) values --
which should be included in the Draft Standard.  Alternatively,
the Standard should explicitly state that the actual contents of
these fields may be implementation dependent.  I would prefer the former.
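Here is a sketch of the first alternative: the English abbreviations are fixed in tables rather than left to the implementation. C89 as adopted specifies asctime with equivalent tables and the format string used below. The function name english_asctime is hypothetical, and the code does no range checking on tm_wday or tm_mon.

```c
#include <stdio.h>
#include <time.h>

/* Fixed English names, independent of any local-language variant. */
static const char wday_name[7][4] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};
static const char mon_name[12][4] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Produces e.g. "Sun Sep 16 01:03:52 1973\n" in a static buffer. */
char *english_asctime(const struct tm *t)
{
    static char buf[64];   /* the normal result needs only 26 bytes */
    snprintf(buf, sizeof buf, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
             wday_name[t->tm_wday], mon_name[t->tm_mon], t->tm_mday,
             t->tm_hour, t->tm_min, t->tm_sec, 1900 + t->tm_year);
    return buf;
}
```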

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.12.3.4) gmtime
To: std-c@cbosgd

The standard should note that Greenwich Mean Time is more properly
known as UTC (Universal Coordinated Time.)

[ You mean UCT? -- Mod -- ]

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.3) Character Testing Functions
To: std-c@cbosgd

The "~" character has the hexadecimal value 0x7E, not 0xFE.
The DEL character has the hexadecimal value 0x7F, not 0xFF.  As
noted above, they should be defined in terms of the Latin-1 alphabet
and a particular set of presentation layer designations/invocations
(ASCII_G in GL, Latin-1 in GR). This avoids all of the issues with NRCs
(national replacement characters).
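Here is a small C sketch of the corrected values. It assumes an ASCII execution character set and the default "C" locale. The printable range runs from space (0x20) through "~" (0x7E), and DEL (0x7F) is a control character. The names ascii_isprint and ctype_disagreements are hypothetical.

```c
#include <ctype.h>

#define ASCII_TILDE 0x7E   /* "~", not 0xFE */
#define ASCII_DEL   0x7F   /* DEL, not 0xFF */

/* Printable 7-bit ASCII: space through tilde. */
int ascii_isprint(int c)
{
    return c >= 0x20 && c <= ASCII_TILDE;
}

/* Counts 7-bit codes where the table above and <ctype.h> disagree;
   zero is expected in the "C" locale on an ASCII implementation. */
int ctype_disagreements(void)
{
    int c, n = 0;
    for (c = 0; c < 0x80; c++)
        if (!ascii_isprint(c) != !isprint(c))
            n++;
    return n;
}
```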

I would suggest that the C Standards Committee coordinate with
the X3 character set committees.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.8) Variable arguments
To: std-c@cbosgd

This section refers to section C.7.7.1, which doesn't exist.

What is the behavior of an implementation when the number and
type of the arguments, as accessed by va_arg, disagree with
those of the actual function call?
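For context, C89 as adopted makes such a mismatch undefined behavior. In practice, the callee relies on a convention such as an explicit count to know which va_arg calls are valid. Here is a minimal sketch; the function sum_ints is hypothetical.

```c
#include <stdarg.h>

/* Sums 'count' int arguments.  Calling va_arg more than 'count' times,
   or with a type other than int, is the mismatch asked about above. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```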

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.9.9) File pointers
To: std-c@cbosgd

Defining the value returned by ftell() as a long cannot work
on some implementations and unnecessarily limits the size of
files on all implementations.  I would recommend that ftell return
an implementation-defined structure and that functions be
provided to manipulate the values of such structures.
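C89 eventually added something close to this: fgetpos() and fsetpos() with an opaque fpos_t, although ftell() itself still returns long. Here is a minimal sketch using those standard functions on a temporary file; the helper name reread_saved_position is hypothetical.

```c
#include <stdio.h>

/* Saves a position with fgetpos, writes past it, then returns to it with
   fsetpos and reads one character.  Returns EOF on any failure. */
int reread_saved_position(void)
{
    FILE *fp = tmpfile();
    fpos_t mark;
    int c = EOF;

    if (fp == NULL)
        return EOF;
    fputs("abc", fp);
    if (fgetpos(fp, &mark) == 0) {      /* position just after "abc" */
        fputs("XYZ", fp);
        if (fsetpos(fp, &mark) == 0)    /* no long offset involved */
            c = fgetc(fp);              /* reads 'X' */
    }
    fclose(fp);
    return c;
}
```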

------------------------------

End of mod.std.c Digest - Mon,  1 Jul 85 10:23:12 EDT
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.