wittig@gmdzi.UUCP (Georg Wittig) (10/02/89)
Maybe the following are RTFM questions, but I don't have the ANSI C papers, and Harbison & Steele II doesn't seem to cover them. My questions are about the legal characters in a C source programme.

[1] There exist editors that allow you to enter any ASCII character. Consider the following program fragment:

    /* in the following lines let @ be the character '\0' */
    int x;
    x = 1 + /* foo @ bar */
    2 /* */
    ;

Is this program fragment equivalent to

  [a] ``int x; x = 1 + 2;''
      In this case C compilers cannot use ``fgets'' to read the source lines.

or

  [b] ``int x; x = 1 + ;''
      This will result in a syntax error message in later compiler phases.

What about a '\0' outside a C comment? Does it terminate the current line, or must it be kept so that a syntax error message will be the result? What about a '\0' in a string constant?

[2] Furthermore, there are (non-UNIX) operating systems that encode the end of a source line by the number of bytes of that line instead of inserting a newline character (\x0a or \x0d in ASCII, \x15 in EBCDIC) at the end of that line. As an example, the line ``abc'' could be encoded as ``\3abc'', and not as ``abc\x0d''. In those environments ``[f]getc'' must generate an artificial '\n' character at the end of the line. Or am I mistaken? What if exactly this artificial '\n' is also a character of the line? What is a ``line'' in this context? Consider a (perverse looking) macro like the following:

    /* in the following line let @ be the character '\n' */
    #define X(a,b) foo@#define X(a,b) ((a)+(b))
    i = X(27,38);

Is this required to pass the preprocessor phase without an error message, and if so, what is the output of that phase? I can think of at least 5 different ways to process such a crazy macro.

[3] Line continuation by `\': Does it only apply to #define contexts and string constant contexts, or is it a general rule? Example:

    int terrible_long_identifier;
    terrible_lon\
    g_identifier = 1;

Does the assignment statement alter the value of that terrible long variable, or is it a syntax error (``terrible_lon'' and ``g_identifier'' undeclared)?

Thanks in advance,
-- 
Georg Wittig GMD-Z1.BI P.O. Box 1240 D-5205 St. Augustin 1 (West Germany)
email: wittig@gmdzi.uucp phone: (+49 2241) 14-2294
-------------------------------------------------------------------------------
"Freedom's just another word for nothing left to lose" (Kris Kristofferson)
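The fgets concern in question [1] can be made concrete. The following is a minimal sketch, not taken from any compiler: the helper name fgets_visible_length is hypothetical, and it assumes tmpfile() is usable on the host. It writes raw bytes to a temporary file, reads one line back with fgets(), and reports what strlen() sees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes n raw bytes to a temporary file, reads the
   first line back with fgets(), and returns the length strlen() reports.
   Returns (size_t)-1 if the temporary file cannot be created or used. */
size_t fgets_visible_length(const char *bytes, size_t n)
{
    char buf[256];
    size_t len = (size_t)-1;
    FILE *fp;

    assert(bytes != NULL && n < sizeof buf);

    fp = tmpfile();
    if (fp == NULL)
        return len;

    if (fwrite(bytes, 1, n, fp) == n) {
        rewind(fp);
        /* fgets() copies the embedded '\0' like any other byte and only
           stops at the newline, but it does not say how many bytes it read. */
        if (fgets(buf, sizeof buf, fp) != NULL)
            len = strlen(buf);  /* stops at the first '\0' in the buffer */
    }
    fclose(fp);
    return len;
}
```

Called as fgets_visible_length("ab\0cd\n", 6), the sketch reports a length of 2 even though six bytes were read, so a reader built only on fgets() and strlen() would lose the rest of the line, which is case [b]. A reader that wants to keep or diagnose the NUL has to track the byte count itself, for example with getc().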
minow@mountn.dec.com (Martin Minow) (10/03/89)
In article <1302@gmdzi.UUCP> wittig@gmdzi.UUCP (Georg Wittig) writes:
>[1] There exist editors that allow you to enter any ASCII character. Consider
> the following program fragment:
>
> /* in the following lines let @ be the character '\0' */
> int x;
> x = 1 + /* foo @ bar */
> 2 /* */
> ;

This is probably a "quality of implementation" issue (because of NUL's specific use in C to terminate strings). A good implementation ought to sweep out such characters (my opinion).

More interesting is whether the '@' can stand for one of the national letters in the ISO Latin-1 alphabet (these have values from 0xA0 to 0xFF). Again, "good" implementations will allow characters that aren't in the C source alphabet to appear in comments, 'char' constants, and "string" constants.

>
>[2] Furthermore, there are (non-UNIX) operating systems that encode the end of
> a source line by the number of bytes of that line instead of inserting a
> newline character

fgets() should encode these lines as "string\n" -- how it would treat an embedded \n is a quality of implementation issue. I would suggest that there should be no difference between an explicit \n and one generated to signal an end-of-record.

> I can think of at least 5
> different ways to process such a crazy macro.

>[3] Line continuation by `\'

May occur anywhere (ignoring trigraphs). Thus "terrible_lon\" followed on the next line by "g_identifier" is legal anywhere.

Martin Minow minow@thundr.dec.com
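As an illustration of the record-to-line mapping discussed under [2], here is a minimal sketch. The record format (one length byte followed by that many data bytes) and the function name records_to_text are assumptions made for the example; they do not describe any particular operating system or C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record format: one length byte, then that many data bytes.
   Copies each record to out and appends a generated '\n', then terminates
   the result with '\0'. Returns the number of bytes written before the
   terminator, or (size_t)-1 if the input is truncated or out is too small. */
size_t records_to_text(const unsigned char *in, size_t n,
                       char *out, size_t cap)
{
    size_t i = 0;
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (cap == 0)
        return (size_t)-1;

    while (i < n) {
        size_t len = in[i++];
        /* Need len data bytes in the input, and room in the output for
           the data, the generated '\n', and the final '\0'. */
        if (len > n - i || len + 2 > cap - o)
            return (size_t)-1;
        memcpy(out + o, in + i, len);
        out[o + len] = '\n';    /* artificial end-of-record newline */
        i += len;
        o += len + 1;
    }
    out[o] = '\0';
    return o;
}
```

With this mapping, the single record { 3, 'a', '\n', 'b' } and the two records { 1, 'a' } and { 1, 'b' } both come out as "a\nb\n". Once the text reaches the translator, the explicit newline and the generated one can no longer be told apart, which is the behaviour suggested above.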
gwyn@smoke.BRL.MIL (Doug Gwyn) (10/03/89)
In article <1302@gmdzi.UUCP> wittig@gmdzi.UUCP (Georg Wittig) writes:
> /* in the following lines let @ be the character '\0' */
> int x;
> x = 1 + /* foo @ bar */
> 2 /* */
> ;

The character you're representing by "@" is not in the standard C source character set, so such a program is not strictly conforming. Some implementations may be able to deal with that source code but others will not. If an implementation does deal with it, it is up to that implementation how to interpret this non-standard extension.

>[2] Furthermore, there are (non-UNIX) operating systems that encode the end of
> a source line by the number of bytes of that line ...

There is a misunderstanding here. The specifications for the C source character set do not constrain how C source code files are represented in a particular implementation, nor how text editors present C source code visually, nor myriad other similar issues. C source code characters must be seen as distinct units by the conforming C translator; what mapping is done from physical source character encoding before that point lies beyond the scope of the C standard. Presumably it will be similar to that done for "text" files in the hosted C library text-stream support, but it need not be.

>[3] Line continuation by `\': Does it only apply to #define contexts and string
> constant contexts, or is it a general rule?

It's a general rule. The first translation phase is physical-to-C source code character mapping, then trigraph replacement, then \ newline splicing. Preprocessing occurs after that.
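The splicing step can be sketched in a few lines of C. This is only an illustration of backslash-newline removal, not code from any real translator: the name splice_lines is hypothetical, trigraph replacement is left out, and the caller is assumed to supply a destination buffer at least as large as the source string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src to dst, deleting every backslash that is
   immediately followed by a newline (both characters are dropped).
   dst must have room for strlen(src) + 1 bytes.
   Returns the length of the spliced text. */
size_t splice_lines(const char *src, char *dst)
{
    size_t o = 0;

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;           /* join this physical line with the next */
        } else {
            dst[o++] = *src++;
        }
    }
    dst[o] = '\0';
    return o;
}
```

Applied to "terrible_lon\\\ng_identifier = 1;" (written as a C string literal), the result is "terrible_long_identifier = 1;". Because splicing happens before the text is split into tokens, the later phases see one identifier, so the assignment refers to the declared variable.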