[comp.lang.c] Trigraphs: Is sed sufficient?

cline@suntan.ece.clarkson.edu (Marshall Cline) (07/01/89)

In article <1989Jun27.164758.1379@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

>In article <2029@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>>	1. Trigraph support significantly slows down the scanner, which is
>>	   the most time-consuming part of a compiler. Trigraphs are useless,
>>	   and so are left out of the Useful C mode.

>It's not necessary for trigraphs to be in the scanner at all, provided the
>implementation supports them *somehow* (a sed script is what I'd use) for
>official conformance.

Ah TriGraphs.  Henry's comment about using "sed" is interesting.  But is it
true that trigraphs change the contents of string literals??.  Example:
"Is this a trigraph --> ??."  What is printed by:
		printf("Foo ??. bar ??; baz ??? barf ??$");
(I don't even know if the ".;?$" are valid endings for trigraphs, but you
get the idea...)

If these nasty little fellers are gonna chomp down on my existing C code and
munge my string literals, I'd like to know about it!

But the real point of my posting is: "sed" is _only_ appropriate if trigraphs
are expanded _WHEREVER_ they appear (including inside strings, in char
literals, etc., etc.).  Otherwise the regular expression support in sed isn't
powerful enough to parse a Context Free Grammar such as the BNF _syntax_
for ANSI C.  Recall that parsing a CFG requires a Push-Down Automaton, which
is strictly more powerful than any Finite Automaton.  (_Semantic_ aspects such
as whether variable names are declared and/or are of compatible types are
issues which can't even be resolved by a PDA; they require at least a Context
Sensitive Grammar, and probably a full Turing Machine.)

Marshall
--
	________________________________________________________________
	Marshall P. Cline	ARPA:	cline@sun.soe.clarkson.edu
	ECE Department		UseNet:	uunet!sun.soe.clarkson.edu!cline
	Clarkson University	BitNet:	BH0W@CLUTX
	Potsdam, NY  13676	AT&T:	315-268-6591

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/01/89)

In article <CLINE.89Jun30154443@suntan.ece.clarkson.edu> cline@sun.soe.clarkson.edu (Marshall Cline) writes:
>Ah TriGraphs.  Henry's comment about using "sed" is interesting.  But is it
>true that trigraphs change the contents of string literals??.

If you think about what trigraphs are intended for, the answer is obvious.
Yes, trigraph replacement is the first thing done after mapping the
physical source file characters to the internal source character set.
Note that the physical-to-internal mapping provides another opportunity
for handling local character set problems, and indeed is where I recommend
that it be done whenever possible.
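
To make the above concrete, here is a minimal sketch of trigraph
replacement as a context-free pass over the source text, done before any
tokenizing, so string and character literals get no special treatment.
It is an illustration only, not code from any poster's compiler; the names
replace_trigraphs, tri_third, and tri_repl are hypothetical.  The table
assumes the nine trigraphs of ANSI C (??= ??( ??/ ??) ??' ??< ??! ??> ??-).

```c
#include <string.h>

/* Each trigraph is two question marks followed by one character from
 * tri_third; tri_repl holds the replacement at the same index.
 * (The table is written this way so that this file contains no
 * trigraph sequences itself.) */
static const char tri_third[] = "=(/)'<!>-";
static const char tri_repl[]  = "#[\\]^{|}~";

/* Copy src to dst, replacing every trigraph with its single character.
 * dst must have room for strlen(src) + 1 bytes; the output is never
 * longer than the input.  No attention is paid to quotes or comments. */
static void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *hit = strchr(tri_third, src[2]);
            if (hit != NULL) {
                *dst++ = tri_repl[hit - tri_third];
                src += 3;       /* consume the whole trigraph */
                continue;
            }
        }
        *dst++ = *src++;        /* ordinary character: copy as-is */
    }
    *dst = '\0';
}
```

Given the line from the earlier article, "Foo ??. bar ??; baz ??? barf ??$",
this sketch returns it unchanged, because none of ??. ??; ??? ??$ is among
the nine trigraphs.  A string literal containing ??= or ??( would come out
with # or [ in its place, exactly as the same sequence would outside a
literal.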