[comp.std.c] translation phases

daniel@terra.ucsc.edu (Daniel Edelson) (02/23/91)

Can someone please tell me if this is a correct interpretation
of section 2.1.1.2 of the standard? (I apologize if this is something
that has been thrashed to death already in this newsgroup.)

According to the translation phases, line splicing occurs before
escape sequence replacement. Therefore, a '\\newline' sequence
in a character or string constant should translate to a single
backslash. This will be a syntax error unless it is part of
a valid escape sequence.

In particular:

char msg[] = "\\
t";

This should translate to:

msg[0] = '9';   /* tab */
msg[1] = '0';   /* null sentinel */

rather than

msg[0] = '92';  /* backslash */
msg[1] = '10';  /* newline */
msg[2] = '116'; /* `t' */
msg[3] = '0';   /* null sentinel */


Thanks,

Daniel Edelson
uunet!peren!daniel
 or
daniel@cis.ucsc.edu

gwyn@smoke.brl.mil (Doug Gwyn) (02/24/91)

In article <12696@darkstar.ucsc.edu> daniel@terra.ucsc.edu (Daniel Edelson) writes:
>According to the translation phases, line splicing occurs before
>escape sequence replacement.

But after trigraph replacement.

>Therefore, a '\\newline' sequence
>in a character or string constant should translate to a single
>backslash.

Correct.

>This will be a syntax error unless it is part of
>a valid escape sequence.

Or comment, header name, etc.

>char msg[] = "\\
>t";
>This should translate to:
>msg[0] = '9';   /* tab */
>msg[1] = '0';   /* null sentinel */

Assuming the ASCII code set, the effect of the initializer is:
	msg[0] = 9;
	msg[1] = 0;
(Not quite what you said.)

I suspect there are so-called "ANSI C" compilers in existence that
get this wrong.
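
One way to test a particular compiler is sketched below. This is an
illustration only, not code from the thread: the function name
check_splice_order is hypothetical, and it assumes a hosted implementation
that provides <assert.h>. The literal must be entered exactly as shown,
with nothing after the second backslash and the t in the first column of
the next line.

```c
#include <assert.h>

/* The second backslash and the newline after it are deleted in phase 2,
   so the literal reaches escape-sequence processing as a backslash
   followed by t. */
static const char msg[] = "\\
t";

/* Hypothetical check: aborts if the compiler did not splice first. */
void check_splice_order(void)
{
    assert(sizeof msg == 2);   /* one tab plus the terminating null */
    assert(msg[0] == '\t');    /* 9 in ASCII */
    assert(msg[1] == '\0');
}
```

Comparing against '\t' rather than 9 keeps the check independent of the
character set; a compiler that processes the escape before splicing would
either reject the literal or fail the size check.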

ccplumb@rose.uwaterloo.ca (Colin Plumb) (02/24/91)

>In article <12696@darkstar.ucsc.edu> daniel@terra.ucsc.edu (Daniel Edelson) writes:
>>char msg[] = "\\
>>t";

gwyn@smoke.brl.mil (Doug Gwyn) wrote:
> Assuming the ASCII code set, the effect of the initializer is:
>	msg[0] = 9;
>	msg[1] = 0;
>
> I suspect there are so-called "ANSI C" compilers in existence that
> get this wrong.

Just for the record, gcc gets it right.
-- 
	-Colin

garry@ithaca.uucp (Garry Wiegand) (02/26/91)

bhoughto@pima.intel.com (Blair P. Houghton) writes:
>Line splicing (the deletion of the sequence {backslash,newline}
>from the translation unit; thus reducing the unit's "length" by
>two characters) occurs before anything else other than trigraph
>replacement and translation of funny newline markers into
>real newline characters.  Strings don't even exist at this
>point; just lots of characters, a few of them to be deleted.
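
The deletion described above can be sketched as a small filter over raw
characters. This is only an illustration under stated assumptions:
splice_lines is a hypothetical name, the input is assumed to have already
had its trigraphs replaced (so ??/ followed by a newline is not handled),
and the caller is assumed to supply an output buffer at least as large as
the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src to dst, dropping every backslash that
   is immediately followed by a newline, together with that newline.
   dst must have room for strlen(src) + 1 bytes.
   Returns the length of the spliced text. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;            /* delete the two-character pair */
        } else {
            dst[n++] = *src++;   /* everything else passes through */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Note that the filter has no notion of strings, comments, or tokens, which
is the point being made: it sees only characters. Fed the six characters
quote, backslash, backslash, newline, t, quote, it yields the four
characters quote, backslash, t, quote.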

Was this a late addition to the standard? I tried:

    \
    \
    \
    z() {}

on all the compilers I could get to quickly. DEC (Ultrix and VMS),
HP, SG, Sun, and the Sony all complained. Gcc passed it. (Gcc is
also the only one that so far is brave enough to define __STDC__.)

I changed this to:

    z\
    z\
    z\
    z() {}

which made the VMS compiler smile upon it (making a global symbol
named ZZZZ - I checked :-) and didn't change the others. 
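
A variant of the same experiment that can be checked from code is sketched
below. It is not the file that was actually tested: the return value and
the name check_spliced_name are assumptions added for illustration. Each
continuation line must start in the first column, since any leading
whitespace would survive the splice and separate the pieces into distinct
tokens.

```c
#include <assert.h>

/* After line splicing, the next four physical lines form the single
   logical line: int zzzz(void) { return 4; } */
int z\
z\
z\
z(void) { return 4; }

/* Hypothetical check: only compiles and links if the compiler joined the
   pieces into the one identifier zzzz. */
void check_spliced_name(void)
{
    assert(zzzz() == 4);
}
```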

Interesting. Not useful, but interesting.

Garry Wiegand    ---    Ithaca Software, Alameda, California
...!uunet!ithaca!garry, garry%ithaca.uucp@uunet.uu.net