[comp.lang.c] trigraphs

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (06/07/88)

Let's consider the various combinations of compilers and terminals.
Commonly, either of these can be US-ASCII, 7-bit French-ASCII (or
some other national character set), or 8-bit ISO-ASCII.

1) if I am using a US-ASCII terminal, I have the full C source
character set at my fingertips and all three types of compilers
must accept these characters according to the way they appear
on my screen.  Thus, I have no need for trigraphs.

2) similarly, if I am using an ISO-ASCII terminal, the keyboard will
contain the full C source character set, and all three types of
compilers must accept these characters.  Thus, I still have no need
for trigraphs.

3) finally, if I am using a 7-bit French-ASCII terminal, the situation
is a little more complicated.

3a) if the compiler only knows about US-ASCII I have a choice of
entering "\" either as "??/" or as "cedilla-c".

3b) if the compiler uses ISO-ASCII, then again I must enter "\"
either as "??/" or as "cedilla-c".

3c) and finally, if the compiler knows about French-ASCII, then I
would think that I must enter "\" as "??/", since the compiler
will treat "cedilla-c" as a real letter.  But if I try to define
static char language??(??) = "FranCais";
where the "C" is actually the cedilla-c character, then strange
things will happen since the standard says that the character set
must include the "\" character, and so the string will actually
contain "Fran\ais", which is "Fran<beep>is".  Thus again I still
have the choice of entering "\" as either "??/" or as "cedilla-c".

So, putting this all together, regardless of what the compiler's
character set is, it is only the French-ASCII terminal that has
any need of the trigraphs.  Now, on such a terminal I cannot
use the cedilla-c character as anything but a back-slash since
all three types of compilers must interpret this as a back-slash,
and not as a cedilla.

So, the only case that needs trigraphs is the French-ASCII terminal,
and such a terminal will have nine keys that I am better off not
using since they appear to give me something that I don't really
get.

People using French have three choices.  Use the trigraphs and
avoid those 9 keys; use those 9 keys, remembering their special
meanings and forget about trigraphs; or get a different terminal
and forget about trigraphs.

That reduces the cases that need trigraphs to those that have
French-ASCII terminals and that also prefer to avoid using the
national keys.

From what I can gather, there are not many people still buying
French-ASCII terminals and those that have such terminals seem
to prefer using the funny characters to using the trigraphs.
Consider that at the moment trigraphs don't even exist outside
the minds of the X3J11 Committee, and decide how many of the people
who now use the funny characters are going to switch to
using trigraphs.

The number of people that would actually use trigraphs must
be amazingly small.  For what it is costing the Committee in
time, the publishers in paper, the net in shipping articles
denouncing trigraphs, and the readers in time to read these
articles, I'm sure it would be cheaper if we all chipped in
and bought new terminals for those few individuals and then
completely dropped the concept of trigraphs from the Standard.

scs@athena.mit.edu (Steve Summit) (06/17/88)

Here's what I don't understand about trigraphs in character
strings (the only kind I'm worried about): of what possible
utility are they?  As I understand it, trigraphs let you utter
characters, which you need in C, which your local terminal
doesn't understand.  However, the thing you usually do with
strings is print them out (usually on your local terminal) so if
your local terminal can't handle the character, why is it
important to have a special way to encode it within a string?

If I am overlooking some obvious or oft-discussed fact, or if I
am repeating Ray Butterworth's argument, please respond by mail
or not at all; the net has had about enough trigraph articles.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

chris@mimsy.UUCP (Chris Torek) (02/14/89)

In article <1875@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>It's irritating to have to implement a feature that nobody in their right
>mind is going to use, and that has such a negative impact on the product.

Indeed (he says, wincing at the managerial misuse of the word `impact').
My suggestion is to provide two separate versions of the compiler, one
that completely ignores trigraphs, and one that optionally scans them.
The installation sequence, then, might go like this:

	This package comes with two versions of the compiler.
	The fast one does not implement trigraphs, and is therefore
	not an ANSI C compiler.  The slow one does implement trigraphs.
	If you want to use trigraphs, install the slow compiler,
	otherwise use the fast compiler.  See Appendix A if you
	decide you want to switch.

	Do you want to have trigraphs available?

If the user answers `yes', the next prompt is:

	Why?

:-)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn) (02/15/89)

In article <15941@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>	Do you want to have trigraphs available?
>If the user answers `yes', the next prompt is:
>	Why?

I'd be about the last person to defend trigraphs as a technical element
of the C language, as anyone who has attended X3J11 meetings could
confirm.  However, by now I've heard the official party line enough
times that I think I can answer questions about this "feature".

Trigraphs are intended as a means of portably transmitting maximally
portable C programs between systems with potentially different character
sets.  Because separate preprocessors, data transmission protocols, etc.
were outside the charter of X3J11 but nevertheless the Committee desired
to ensure this degree of source code portability, they agreed that the
minimal ISO character set requirements could be taken as the basis for
such source code transfer.  Because C traditionally uses symbols not in
the ISO base character set, some substitutes for such symbols, that could
be expressed entirely within the ISO base set, had to be found.  The ??*
form of trigraphs was chosen as the least problematic of all suggested
alternatives.

The important practical point is that C programmers are NOT expected to
use trigraphs when they type in their source code, and they should not
see trigraphs when displaying source code on any device on common modern
computing systems.  Trigraphs are intended for program interchange only.
(Quite honestly, I doubt that everyone in X3J11 originally had this
notion, but it appears to be the current party line.)

Note that trigraphs may best be dealt with by a separate translator,
ideally a separate program that could practically be skipped except
the first time that code is imported from another site.  The translator
could be officially defined as part of one's Standard-conforming
implementation, but in practice used only for validation testing
and for translating imported source code.  One can imagine
circumstances in which some such translation would always be necessary,
for example in some existing European character set environments.
An extra level of translation (having nothing to do with trigraphs) is
allowed in translation phase 1 to deal with such environments, which are
beyond the scope of X3J11 or indeed any programming language standards
group.  In fact the C source code character "x" need not look anything
like a Roman "X" as stored, displayed, or manipulated externally, and it
can occupy any number of bytes in external storage.  Therefore, even
in character sets lacking a representation for the letter "x" it is
possible to devise an encoding for C program source that might contain
instances of source code character "x".  Fortunately the ISO base set
includes all the traditional C alphanumerics, just not all its special
symbols such as "\".  Thus in some ISO environments, "\" and other
special C source symbols must be mapped into external encodings.
Trigraphs were an attempt to standardize this mapping for ISO-based
systems.  Looking back at the consequent noise and confusion, I think
many X3J11 members now wish we hadn't tried to "pioneer" in this area.
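
For reference, the mapping in question covers nine trigraphs, each made of
two question marks and a third character.  A minimal sketch of that table in
C follows.  The function name trigraph_char is invented here for
illustration, and only the third character of each trigraph is spelled out,
so the code contains no trigraphs of its own.

```c
/* Return the character that the trigraph ending in `third` stands for,
 * or 0 if "question mark, question mark, third" is not a trigraph.
 * (trigraph_char is a hypothetical helper, not part of any library.) */
char trigraph_char(char third)
{
    switch (third) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;   /* e.g. '&': not a trigraph, leave untouched */
    }
}
```

A separate translator of the kind described above would only need to look
for two adjacent question marks and consult a table like this one.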

dg@lakart.UUCP (David Goodenough) (02/16/89)

From article <15941@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
I In article <1875@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
n>It's irritating to have to implement a feature that nobody in their right
e>mind is going to use, and that has such a negative impact on the product.
w 
s My suggestion is to provide two separate versions of the compiler, one
. that completely ignores trigraphs, and one that optionally scans them.
i The installation sequence, then, might go like this:
s 
. 	Do you want to have trigraphs available?
d 
u If the user answers `yes', the next prompt is:
m 
b 	Why?

Because he's trying to install a C compiler on a Commodore Pet with a silly
64-character non-ASCII character set :-) :-) :-) :-)
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

henry@utzoo.uucp (Henry Spencer) (02/19/89)

In article <9650@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>Note that trigraphs may best be dealt with by a separate translator,
>ideally a separate program that could practically be skipped except
>the first time that code is imported from another site...

And let us not forget that in a Unix-like environment, a reasonably (not
wonderfully, but reasonably) efficient implementation of such a translator
is the following:

	#! /bin/sh
	sed "/??/ {
		s/??=/#/g
		s/??(/[/g
		s;??/;\\\\;g
		s/??)/]/g
		s/??'/^/g
		s/??</{/g
		s/??!/|/g
		s/??>/}/g
		s/??-/~/g
		}" "$@"

The one possible problem here is that old implementations of sed may have
annoyingly low limits on input line length.
-- 
The Earth is our mother;       |     Henry Spencer at U of Toronto Zoology
our nine months are up.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

acu@mentor.cc.purdue.edu (Floyd McWilliams) (08/29/89)

In article <10859@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <1392@atanasoff.cs.iastate.edu> John Hascall writes:

	(Discussion on EBCDIC deleted.)

>>(pps. I think trigraphs were a misguided effort as well)

>I think that most of X3J11 might even privately agree with that
>assessment.  However, they serve a possibly useful function with
>very little adverse impact (mainly on idiots who use "??!").

	Not to be a pain, but why did X3J11 use ??! for a trigraph?
They only needed 8 or 9 distinct trigraphs, and "!" is one of the
two characters I can think of that have any meaning after "??". 
	I realize that over-emphasizing with ??! is bad style, but
that's not within the scope of X3J11... :-)

"Life's for my own, to live my own way."
Floyd McWilliams			mentor.cc.purdue.edu!acu

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)

In article <3776@mentor.cc.purdue.edu> acu@mentor.cc.purdue.edu (Floyd McWilliams) writes:
>They only needed 8 or 9 distinct trigraphs, and "!" is one of the
>two characters I can think of that have any meaning after "??". 

The only meaning I've ever seen that was considered correct usage is in
chess notation.

I think ??! was chosen as being more mnemonic than other alternatives.

bls@u02.svl.cdc.com (Brian Scearce) (03/14/91)

What should "???-" turn into under ANSI? "???-" or "?~"?  My H&S
says that "all other trigraph sequences (including relatives such
as ??&) should be left untranslated", so I could see it going either
way, although I would expect "?~".

Please email me, I will post a summary.

--
     Brian Scearce (bls@robin.svl.cdc.com  -or-  robin!bls@shamash.cdc.com)
    "Don't be surprised when a crack in the ice appears under your feet..."
 Any opinions expressed herein do not necessarily reflect CDC corporate policy.
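
One way to check the "???-" case is to mimic the replacement step by hand.
The sketch below is not taken from any compiler; replace_trigraphs is a
hypothetical helper that scans left to right and replaces only the nine
defined trigraphs.  Since "???" is not a trigraph but the "??-" starting at
the second character is, the input "???-" comes out as "?~".

```c
#include <string.h>

/* Copy `in` to `out`, replacing each of the nine trigraphs by its
 * single-character equivalent.  `out` must have room for strlen(in) + 1
 * bytes; the result is never longer than the input.
 * (replace_trigraphs is an illustrative name, not a standard function.) */
void replace_trigraphs(const char *in, char *out)
{
    /* Third characters and their replacements, in matching order. */
    static const char third[] = "=(/)'<!>-";
    static const char repl[]  = "#[\\]^{|}~";

    while (*in != '\0') {
        const char *hit = NULL;

        /* Only look up in[2] when it is a real character, because
         * strchr would otherwise match the terminating null. */
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0')
            hit = strchr(third, in[2]);

        if (hit != NULL) {
            *out++ = repl[hit - third];
            in += 3;            /* consume the whole trigraph */
        } else {
            *out++ = *in++;     /* ordinary character, copied as is */
        }
    }
    *out = '\0';
}
```

To try it without putting a trigraph in the test program itself, the input
can be written as "?\?\?-".  Relatives such as ??& pass through unchanged,
which matches the H&S wording quoted above.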

henry@zoo.toronto.edu (Henry Spencer) (05/29/91)

In article <1991May28.231253.5226@csrd.uiuc.edu> bliss@sp64.csrd.uiuc.edu (Brian Bliss) writes:
>which brings up the question:  what if I want to use the
>sequence "??!" within a string?

Write "?\?!" instead.  That's why there is a \? escape in ANSI C.

Bletch.
-- 
"We're thinking about upgrading from    | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."              |  henry@zoo.toronto.edu  utzoo!henry
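
A small sketch of the \? workaround, assuming nothing beyond the standard
library.  The names emphasis and emphasis_is_intact are made up for this
example.

```c
#include <string.h>

/* The escape \? denotes a plain question mark, so this literal holds
 * three characters: question mark, question mark, exclamation mark.
 * Because no two '?' are adjacent in the source text, trigraph
 * replacement has nothing to match. */
static const char emphasis[] = "?\?!";

/* Returns 1 if the literal really is the three expected characters. */
int emphasis_is_intact(void)
{
    return strlen(emphasis) == 3
        && emphasis[0] == '?'
        && emphasis[1] == '?'
        && emphasis[2] == '!';
}
```

The same literal behaves identically whether or not the compiler has
trigraph processing switched on, which is the point of the escape.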

wollman@emily.uvm.edu (Garrett Wollman) (06/03/91)

I'm certain the Committee must have had a good reason, but I agree
(mostly) with the following quote from "Using and Porting GNU CC"
(about page 21 in the 8.5x11 hardcopy):

     @item -trigraphs
     Support ANSI C trigraphs.  You don't want to know about this
     brain-damage.  The @samp{-ansi} option also has this effect.

I also agree with the spirit of the following option (page 14):

     @item -Wtrigraphs
     Warn if any trigraphs are encountered (assuming they are
     enabled).


-GAWollman

Garrett A. Wollman - wollman@emily.uvm.edu

Disclaimer:  I'm not even sure this represents *my* opinion, never
mind UVM's, EMBA's, EMBA-CF's, or indeed anyone else's.