rbutterworth@watmath.waterloo.edu (Ray Butterworth) (06/07/88)
Let's consider the various combinations of compilers and terminals. Commonly, either of these can be US-ASCII, 7-bit French-ASCII (or some other national character set), or 8-bit ISO-ASCII.

1) If I am using a US-ASCII terminal, I have the full C source character set at my fingertips and all three types of compilers must accept these characters according to the way they appear on my screen. Thus, I have no need for trigraphs.

2) Similarly, if I am using an ISO-ASCII terminal, the keyboard will contain the full C source character set, and all three types of compilers must accept these characters. Thus, I still have no need for trigraphs.

3) Finally, if I am using a 7-bit French-ASCII terminal, the situation is a little more complicated.

3a) If the compiler only knows about US-ASCII I have a choice of entering "\" either as "??/" or as "cedilla-c".

3b) If the compiler uses ISO-ASCII, then again I must enter "\" either as "??/" or as "cedilla-c".

3c) And finally, if the compiler knows about French-ASCII, then I would think that I must enter "\" as "??/", since the compiler will treat "cedilla-c" as a real letter. But if I try to define

    static char language??(??) = "FranCais";

where the "C" is actually the cedilla-c character, then strange things will happen since the standard says that the character set must include the "\" character, and so the string will actually contain "Fran\ais", which is "Fran<beep>is". Thus again I still have the choice of entering "\" as either "??/" or as "cedilla-c".

So, putting this all together, regardless of what the compiler's character set is, it is only the French-ASCII terminal that has any need of the trigraphs. Now, on such a terminal I cannot use the cedilla-c character as anything but a back-slash since all three types of compilers must interpret this as a back-slash, and not as a cedilla.

So, the only case that needs trigraphs is the French-ASCII terminal, and such a terminal will have nine keys that I am better off not using since they appear to give me something that I don't really get. People using French have three choices: use the trigraphs and avoid those 9 keys; use those 9 keys, remembering their special meanings, and forget about trigraphs; or get a different terminal and forget about trigraphs.

That reduces the cases that need trigraphs to those that have French-ASCII terminals and that also prefer to avoid using the national keys. From what I can gather, there are not many people still buying French-ASCII terminals, and those that have such terminals seem to prefer using the funny characters to using the trigraphs. Consider that at the moment trigraphs don't even exist outside the minds of the X3J11 Committee, and decide how many of the people that now use the funny characters are going to switch to using trigraphs.

The number of people that would actually use trigraphs must be amazingly small. For what it is costing the Committee in time, the publishers in paper, the net in shipping articles denouncing trigraphs, and the readers in time to read these articles, I'm sure it would be cheaper if we all chipped in and bought new terminals for those few individuals and then completely dropped the concept of trigraphs from the Standard.
scs@athena.mit.edu (Steve Summit) (06/17/88)
Here's what I don't understand about trigraphs in character strings (the only kind I'm worried about): of what possible utility are they? As I understand it, trigraphs let you utter characters, which you need in C, which your local terminal doesn't understand. However, the thing you usually do with strings is print them out (usually on your local terminal) so if your local terminal can't handle the character, why is it important to have a special way to encode it within a string? If I am overlooking some obvious or oft-discussed fact, or if I am repeating Ray Butterworth's argument, please respond by mail or not at all; the net has had about enough trigraph articles. Steve Summit scs@adam.pika.mit.edu
chris@mimsy.UUCP (Chris Torek) (02/14/89)
In article <1875@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
>It's irritating to have to implement a feature that nobody in their right
>mind is going to use, and that has such a negative impact on the product.

Indeed (he says, wincing at the managerial misuse of the word `impact').

My suggestion is to provide two separate versions of the compiler, one that completely ignores trigraphs, and one that optionally scans them. The installation sequence, then, might go like this:

    This package comes with two versions of the compiler. The fast one does not implement trigraphs, and is therefore not an ANSI C compiler. The slow one does implement trigraphs. If you want to use trigraphs, install the slow compiler, otherwise use the fast compiler. See Appendix A if you decide you want to switch.

    Do you want to have trigraphs available?

If the user answers `yes', the next prompt is:

    Why?

:-)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/15/89)
In article <15941@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
> Do you want to have trigraphs available?
>If the user answers `yes', the next prompt is:
> Why?

I'd be about the last person to defend trigraphs as a technical element of the C language, as anyone who has attended X3J11 meetings could confirm. However, by now I've heard the official party line enough times that I think I can answer questions about this "feature".

Trigraphs are intended as a means of portably transmitting maximally portable C programs between systems with potentially different character sets. Because separate preprocessors, data transmission protocols, etc. were outside the charter of X3J11 but nevertheless the Committee desired to ensure this degree of source code portability, they agreed that the minimal ISO character set requirements could be taken as the basis for such source code transfer. Because C traditionally uses symbols not in the ISO base character set, some substitutes for such symbols, that could be expressed entirely within the ISO base set, had to be found. The ??* form of trigraphs was chosen as the least problematic of all suggested alternatives.

The important practical point is that C programmers are NOT expected to use trigraphs when they type in their source code, and they should not see trigraphs when displaying source code on any device on common modern computing systems. Trigraphs are intended for program interchange only. (Quite honestly, I doubt that everyone in X3J11 originally had this notion, but it appears to be the current party line.)

Note that trigraphs may best be dealt with by a separate translator, ideally a separate program that could practically be skipped except the first time that code is imported from another site. The translator could be officially defined as part of one's Standard-conforming implementation, but in practice used only for validation testing and for translating imported source code.

One can imagine circumstances in which some such translation would always be necessary, for example in some existing European character set environments. An extra level of translation (having nothing to do with trigraphs) is allowed in translation phase 1 to deal with such environments, which are beyond the scope of X3J11 or indeed any programming language standards group. In fact the C source code character "x" need not look anything like a Roman "X" as stored, displayed, or manipulated externally, and it can occupy any number of bytes in external storage. Therefore, even in character sets lacking a representation for the letter "x" it is possible to devise an encoding for C program source that might contain instances of source code character "x".

Fortunately the ISO base set includes all the traditional C alphanumerics, just not all its special symbols such as "\". Thus in some ISO environments, "\" and other special C source symbols must be mapped into external encodings. Trigraphs were an attempt to standardize this mapping for ISO-based systems. Looking back at the consequent noise and confusion, I think many X3J11 members now wish we hadn't tried to "pioneer" in this area.
dg@lakart.UUCP (David Goodenough) (02/16/89)
From article <15941@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
I In article <1875@dataio.Data-IO.COM> bright@Data-IO.COM (Walter Bright) writes:
n>It's irritating to have to implement a feature that nobody in their right
e>mind is going to use, and that has such a negative impact on the product.
w
s My suggestion is to provide two separate versions of the compiler, one
. that completely ignores trigraphs, and one that optionally scans them.
i The installation sequence, then, might go like this:
s
. Do you want to have trigraphs available?
d
u If the user answers `yes', the next prompt is:
m
b Why?
Because he's trying to install a C compiler on a Commodore Pet with a silly
64 Character non-ascii character set :-) :-) :-) :-)
--
dg@lakart.UUCP - David Goodenough +---+
IHS | +-+-+
....... !harvard!xait!lakart!dg +-+-+ |
AKA: dg%lakart.uucp@xait.xerox.com +---+
henry@utzoo.uucp (Henry Spencer) (02/19/89)
In article <9650@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>Note that trigraphs may best be dealt with by a separate translator,
>ideally a separate program that could practically be skipped except
>the first time that code is imported from another site...

And let us not forget that in a Unix-like environment, a reasonably (not wonderfully, but reasonably) efficient implementation of such a translator is the following:

    #! /bin/sh
    sed "/??/ {
        s/??=/#/g
        s/??(/[/g
        s;??/;\\\\;g
        s/??)/]/g
        s/??'/^/g
        s/??</{/g
        s/??!/|/g
        s/??>/}/g
        s/??-/~/g
    }" $*

The one possible problem here is that old implementations of sed may have annoyingly low limits on input line length.
--
The Earth is our mother;     | Henry Spencer at U of Toronto Zoology
our nine months are up.      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
acu@mentor.cc.purdue.edu (Floyd McWilliams) (08/29/89)
In article <10859@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <1392@atanasoff.cs.iastate.edu> John Hascall writes:

(Discussion on EBCDIC deleted.)

>>(pps. I think trigraphs were a misguided effort as well)

>I think that most of X3J11 might even privately agree with that
>assessment. However, they serve a possibly useful function with
>very little adverse impact (mainly on idiots who use "??!").

Not to be a pain, but why did X3J11 use ??! for a trigraph? They only needed 8 or 9 distinct trigraphs, and "!" is one of the two characters I can think of that have any meaning after "??".

I realize that over-emphasizing with ??! is bad style, but that's not within the scope of X3J11... :-)

"Life's for my own, to live my own way."
Floyd McWilliams mentor.cc.purdue.edu!acu
gwyn@smoke.BRL.MIL (Doug Gwyn) (08/30/89)
In article <3776@mentor.cc.purdue.edu> acu@mentor.cc.purdue.edu (Floyd McWilliams) writes:
>They only needed 8 or 9 distinct trigraphs, and "!" is one of the
>two characters I can think of that have any meaning after "??".

The only meaning I've ever seen that was considered correct usage is in chess notation. I think ??! was chosen as being more mnemonic than other alternatives.
bls@u02.svl.cdc.com (Brian Scearce) (03/14/91)
What should "???-" turn into under ANSI? "???-" or "?~"? My H&S says that "all other trigraph sequences (including relatives such as ??&) should be left untranslated", so I could see it going either way, although I would expect "?~". Please email me, I will post a summary. -- Brian Scearce (bls@robin.svl.cdc.com -or- robin!bls@shamash.cdc.com) "Don't be surprised when a crack in the ice appears under your feet..." Any opinions expressed herein do not necessarily reflect CDC corporate policy.
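[Editorial sketch, not part of the original posts.] The question above comes down to how the replacement scan proceeds. Below is a minimal sketch in C, assuming a plain left-to-right pass in which only the nine defined three-character sequences are replaced and everything else is copied through unchanged. The names `trigraph_value` and `replace_trigraphs` are hypothetical helpers introduced for illustration; they are not taken from any compiler. The code itself avoids writing two adjacent question marks so that it compiles the same way whether or not the compiler has trigraphs enabled.

```c
#include <stddef.h>
#include <string.h>

/* Map the third character of a candidate trigraph to its replacement.
   Returns 0 when that character does not complete one of the nine
   defined sequences.  (Hypothetical helper, for illustration only.) */
static char trigraph_value(char third)
{
    switch (third) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing each trigraph with its single character.
   The scan is left to right: a question mark that does not start a
   defined sequence is copied as-is and the scan moves on by one.
   dst must have room for strlen(src) + 1 bytes; the output is never
   longer than the input.  (Hypothetical helper, an assumption of this
   sketch rather than any real implementation.) */
static void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char r = 0;

        /* src[2] is safe to read here: src[1] is non-zero, so src[2]
           is at worst the terminating null character. */
        if (src[0] == '?' && src[1] == '?')
            r = trigraph_value(src[2]);

        if (r != 0) {
            *dst++ = r;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Under this reading, an input of three question marks followed by a hyphen comes out as a question mark followed by a tilde: the first question mark does not begin a defined sequence, so it is copied, and the remaining three characters form the tilde trigraph. A sequence such as two question marks followed by an ampersand passes through untouched.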
henry@zoo.toronto.edu (Henry Spencer) (05/29/91)
In article <1991May28.231253.5226@csrd.uiuc.edu> bliss@sp64.csrd.uiuc.edu (Brian Bliss) writes:
>which brings up the question: what if I want to use the
>sequence "??!" within a string?

Write "?\?!" instead. That's why there is a \? escape in ANSI C. Bletch.
--
"We're thinking about upgrading from | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 to SunOS 3.5."           | henry@zoo.toronto.edu utzoo!henry
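[Editorial sketch, not part of the original posts.] The escape described above can be checked with a few lines of C. This is a small sketch, assuming only standard C string literal rules: `\?` denotes a plain question mark, and because the backslash separates the two question marks in the source text, no trigraph is formed. The function name `escaped_literal_is_plain` is a hypothetical helper made up for this illustration.

```c
#include <string.h>

/* Returns 1 when the escaped literal holds exactly the characters
   '?', '?', '!' plus the terminating null, and 0 otherwise.
   (Hypothetical helper, for illustration only.) */
static int escaped_literal_is_plain(void)
{
    /* The backslash keeps the two question marks apart in the source,
       so trigraph replacement has nothing to match. */
    static const char escaped[] = "?\?!";

    /* The same three characters spelled one at a time. */
    static const char expected[] = { '?', '?', '!', '\0' };

    return sizeof escaped == sizeof expected
        && memcmp(escaped, expected, sizeof expected) == 0;
}
```

The comparison holds whether or not the compiler performs trigraph replacement, which is the point of the escape: the source text stays safe on both kinds of compiler.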
wollman@emily.uvm.edu (Garrett Wollman) (06/03/91)
I'm certain the Committee must have had a good reason, but I agree (mostly) with the following quote from "Using and Porting GNU CC" (about page 21 in the 8.5x11 hardcopy):

    @item -trigraphs
    Support ANSI C trigraphs. You don't want to know about this
    brain-damage. The @samp{-ansi} option also has this effect.

I also agree with the spirit of the following option (page 14):

    @item -Wtrigraphs
    Warn if any trigraphs are encountered (assuming they are enabled).

-GAWollman

Garrett A. Wollman - wollman@emily.uvm.edu
Disclaimer: I'm not even sure this represents *my* opinion, never mind UVM's, EMBA's, EMBA-CF's, or indeed anyone else's.