osd@hou2d.UUCP (Orlando Sotomayor-Diaz) (07/01/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>

mod.std.c Digest        Mon, 1 Jul 85        Volume 8 : Issue 3

Today's Topics:
        (B.1.1.2, C.8) Use of Whitespace in the Preprocessor
        (B.2.2) Character display semantics
        (B.2.4.1) Translation limits
        (B.2.4.2, D.1, etc.) Quasi-reserved words.
        (C.8.2) Macro replacement
        (C.8.3) Conditional inclusion
        (D.) Operating-system defined values and C data types.
        (D.10.2) Rand
        (D.12.3.1) asctime
        (D.12.3.4) gmtime
        (D.3) Character Testing Functions
        (D.8) Variable arguments
        (D.9.9) File pointers
----------------------------------------------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.1.1.2, C.8) Use of Whitespace in the Preprocessor
To: std-c@cbosgd

Comments on ANSI C Draft Standard (X3J11/85-045, April 30, 1985).

**
** Note: these are my personal comments and not necessarily those
** of my employer.
**

There seems to be some confusion (in my mind, anyway) regarding the
interaction of "tokenization" (sec B.1.1.2) and the preprocessor.
According to the description of translation phases (phases 1 and 2
are irrelevant to this discussion):

    Phase
      3. Comments are replaced by one space character.
      4. The source text is completely tokenized...  Each sequence of
         other [not newline] white-space characters becomes a single
         white-space token; alternatively, each other white-space
         token becomes a unique token.  [I don't understand this and
         am not sure that "alternatives" are appropriate here.]
      5. The source text is preprocessed.

However, in the description of the preprocessor (sec C.8), "there may
be any number of space and horizontal-tab characters between the #
token and the identifier that constitutes the next token, and before
the new-line character that terminates the directive."

Some questions:

1. If I understand the translation phases correctly, there are no
"space and horizontal-tab characters", but rather "white-space tokens."
By explicitly specifying space and tab, and not specifying the other
two white-space characters (vertical-tab and form-feed), it appears
that the translation phase description should be changed as follows:

      3. ... Each comment is replaced by one <SPACE> token.  Space and
         horizontal-tab characters are replaced by <SPACE> tokens.
         Vertical-tab and form-feed characters are replaced by <FEED>
         tokens.
      6. ... The preprocessing concatenation operation is applied and
         the full source is retokenized.  <FEED> tokens become <SPACE>
         tokens.

While this appears to clarify the intent of the Draft Standard, it
would seem simpler to drop all distinction between the various
white-space tokens.

It would also eliminate confusion if the Draft Standard were to
emphasize that comments may be placed anywhere in preprocessor command
lines and may also cause such lines to extend over more than one
physical input source line.  For example, comments are legal in the
following contexts:

    /*1*/ # /*2*/ include /*3*/ "filename" /*4*/

and any of these comments may extend over multiple source lines.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.2.2) Character display semantics
To: std-c@cbosgd

Section B.2.2 states that "The effect of writing a printable character
... to a display device is to display a graphic representation of that
character at the current printing position and then advance the
printing position to the next position on the current line."

I would recommend adding: "The effect of writing a printable character
at the final printing position of a line is implementation defined."

I would recommend deleting \a and \v as they offer no useful capability
and cannot be implemented in an implementation-independent manner.
I would recommend that, for all \ escapes except \n, the Standard
provide a definition in terms of the draft ISO DIS 8859/1 (or the
equivalent draft ANSI X3.134.2 and approved ECMA-94) 8-bit code
standards, with reference to the earlier ANSI X3.4-1977, ISO 646,
ISO DIS 2022.2, etc. standards and that all actions be labeled
"implementation dependent."

\n should have the semantics described in (B.2.2) and the Draft
Standard should note that the semantics of the ANSI line-feed code
(code position 0/10) are implementation-dependent.  (I.e.,
putchar('\n') is not necessarily equivalent to putchar(0xA)).

This section might be rewritten in roughly the following manner:

    The preferred implementation character set for C is specified in
    draft ISO DIS 8859/1, called Latin-1 in this Draft Standard.  If
    the implementation supports the Latin-1 character set, escape
    codes have the following representation:

        Escape  ANSI code  Action
        \a      0/7        Audible alert
        \b      0/8        Backspace
        \f      0/12       Form feed
        \r      0/13       Carriage return without advance to new line
        \t      0/9        Horizontal tab
        \v      0/11       Vertical tab

    If the implementation does not support the Latin-1 character set,
    the above escapes have implementation-dependent values and
    actions.  They must, however, describe different character values
    (so that case statements will not fail.)

It should be noted that on a display device implementing ANSI standard
character set invocation and designation, "putchar('a')" does not
necessarily display the first character of the roman alphabet in lower
case.  The presentation layer of the display device will show its
representation of the character currently invoked into GL at position
6/1.  Actually, since devices can interpret information between the
DCS, OSC, APC, or PM introducers and the ST delimiter in just about any
way they please, it does not even promise to display anything!  Should
the standard (especially an ANSI one) promise more than sending the bit
pattern to the display device?  Perhaps "\n" should be special.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.2.4.1) Translation limits
To: std-c@cbosgd

Section B.2.4.1 states that "the implementation must be able to compile
at least one program that meets or exceeds all of the following
translation limits."  Note, however, that a program with 15 nesting
levels for compounds, 31 character identifiers, and 1024 identifiers in
each nested block will require about 500,000 bytes of symbol table
storage (15 x 1024 x 31 = 476,160 bytes for the names alone), which is
probably not feasible for any but the largest implementations.  I would
suggest removing "all" or changing "maximum number of identifiers with
block scope in one block" to "... in one block and all of its
parents.", if for no other reason than making it unnecessary for
implementors to ignore this portion of the standard.

The following translation limits seem unreasonably small:

    Conditional compilation nesting levels (6) -- I would recommend 16.

    Case labels in a switch (255) -- I would recommend at least 512 and
    preferably 1024.  The yacc grammar for Pascal offers an example of
    a large switch statement.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (B.2.4.2, D.1, etc.) Quasi-reserved words.
To: std-c@cbosgd

The Draft Standard has added a large number of #defined symbols and
type definitions.  For example, section B.2.4.2 adds 30 numerical
limits.  I would recommend that all new definitions (i.e. #defined
symbols and type definitions that are not currently in widespread use)
be specified with a leading _ so they cannot conflict with user code.
I.e., the user should have a reasonable chance of defining variables
that can never conflict with reserved words.  As it is now, there is no
way that I can tell that a symbol does not conflict with a symbol in
some library header file.

I would further recommend that the symbols be composed of full English
words, even if this means more typing.
Thus "SHRT_MIN" should be "_SHORT_MIN".  Similarly, I would recommend
"readonly" rather than "const."

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (C.8.2) Macro replacement
To: std-c@cbosgd

The current Draft Standard added the following to C.8.2:

    If the identifier following the initial # in a directive has been
    defined as a macro name, the identifier is not replaced by an
    expansion of the macro.

I think this means that if you have written

    #define foo endif
    #foo

you don't get #endif, but an example would be helpful.

(C.8.2) ## unclarities.

It appears from my reading that the token created by ## concatenation
cannot cause further macro expansion, but this is not clearly stated.
For example, what is the result of the following:

    #define concat(a, b) a ## b
    #line 123
    int line = concat(__LI, NE__);

Is it

    line = __LINE__;

or

    line = 123;

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (C.8.3) Conditional inclusion
To: std-c@cbosgd

#if directives verified for correctness?

The Draft Standard now specifies that

    Directives are verified for correctness, but processed only to keep
    track of the level of nested conditionals.

Does this mean that

    #if 0   /* never compiled */
    #if )syntax error(
    #endif
    #endif

should print a compiler error message?  What about

    #undef never_defined
    #ifdef never_defined
    #define xyz(a) ((a) * 1)
    #endif
    ...
    #ifdef never_defined
    #if xyz(123)
    #endif
    #endif

Should the use of xyz() result in an error message, or be replaced by
zero (undefined preprocessor symbol) or what?

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.) Operating-system defined values and C data types.
To: std-c@cbosgd

Several functions (notably fseek/ftell and kill) take parameters that
are defined in terms of C data types (the ftell result is a long, and
kill takes an integer program identifier).
This cannot be made to work correctly on many systems.  For example, a
process is specified on RSX-11M by a 3 (16-bit) word vector.  Since the
language supports passing structures to functions, I would recommend
redefining these functions in terms of structures (whose contents are
defined by appropriate #include files) and adding functions to perform
implementation-dependent arithmetic in an implementation-dependent
manner.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.10.2) Rand
To: std-c@cbosgd

The definition of rand semantics is reasonable for systems with 32-bit
two's-complement long integers.  It is not clear from Knuth (volume 2)
that it will yield the same sequence of numbers for implementations
with greater numeric precision or different arithmetic behavior.

Unless, of course, rand is implemented as a function that reads a file
of 2^32 pre-compiled integers.  But, since the size of this file cannot
be expressed as a long, srand cannot correctly reposition the file.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.12.3.1) asctime
To: std-c@cbosgd

In order to provide for orderly development of local-language variants
of C, the alphabetic words in the returned string should be
standardized to their current (English) values -- which should be
included in the Draft Standard.  Alternatively, the Standard should
explicitly state that the actual contents of these fields may be
implementation dependent.  I would prefer the former.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.12.3.4) gmtime
To: std-c@cbosgd

The standard should note that Greenwich Mean Time is more properly
known as UTC (Universal Coordinated Time.)

[ You mean UCT?
-- Mod -- ]

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.3) Character Testing Functions
To: std-c@cbosgd

The "~" character has the hexadecimal value 0x7E, not 0xFE.  The DEL
character has the hexadecimal value 0x7F, not 0xFF.  As noted above,
they should be defined in terms of the Latin-1 alphabet and a
particular set of presentation layer designations/invocations (ASCII_G
in GL, Latin-1 in GR).  This avoids all of the issues with NRC's.

I would suggest that the C Standards Committee coordinate with the X3
character set committees.

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.8) Variable arguments
To: std-c@cbosgd

This section refers to section C.7.7.1, which doesn't exist.

What is the behavior of an implementation when the number and types of
the arguments, as accessed by va_arg, disagree with those of the actual
function call?

------------------------------

Date: Sun, 30 Jun 85 12:45:44 edt
From: decvax!minow (Martin Minow)
Subject: (D.9.9) File pointers
To: std-c@cbosgd

Defining the value returned by ftell() as a long cannot work on some
implementations and unnecessarily limits the size of files on all
implementations.  I would recommend that ftell return an
implementation-defined structure and that functions be provided to
manipulate the values of such structures.

------------------------------

End of mod.std.c Digest - Mon, 1 Jul 85 10:23:12 EDT
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.