rcd@ico.ISC.COM (Dick Dunn) (05/20/88)
I've talked to a few people about this, but I'd like to see if there's
more info floating around.
Background: "Trigraphs" in dpANS C are a way of avoiding the problems of
character-set restrictions, by introducing 3-character replacements for
those characters which are required for C but do not exist in the ISO 7-bit
set. For example, if your character set doesn't have braces {}, you can
use ??< and ??> to denote them. The behavior is as if trigraphs were
replaced by the corresponding single characters in a prepass to the compiler,
*including* replacement within strings. All trigraphs begin with "??".
The draft standard seems to be written in such a way that a compiler MUST
accept these trigraph sequences. I'm perplexed on a couple of points here.
1. Replacement within strings: This is a change to the existing language.
It breaks existing programs. I looked through existing source code
that we have here and found several programs which get broken or
significantly altered. Here's an example--sanitized, but typical of
what can happen. Suppose you now have:
printf("bad status ??<%x>??--device %n\n", st, dev);
What you're going to get, according to the draft standard, is something
that has the effect of:
printf("bad status {%x>~-device %n\n", st, dev);
Point: The sequence "??" is not at all rare. Why was it chosen as the
introducer? (I think people who start getting messages about using
`/dev/tty^ are going to be confused.)
Note also that it is common practice to use "?" in initializing strings
where the "?" positions will be replaced at execution time. Pity the
poor programmer who sets up something like:
char ta[] = "/tmp/d?????/a", tb[] = "/tmp/d?????/b";
and discovers (eventually) that these strings are each two characters
shorter than they used to be; if he tries to replace the ?s, he'll
write off the ends of the strings!
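For anyone who wants to check the two examples above by hand, here is a minimal sketch in C of the prepass as described in this post. It is not taken from the draft or from any compiler; the names `trigraph_value` and `replace_trigraphs` are made up for illustration, and it models only the trigraph step, not the escape processing that happens later. Test inputs should be written with the `\?` escape so that the sketch's own source contains no trigraphs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map the third character of a trigraph to its
 * replacement, or return 0 if the character does not complete one of
 * the nine trigraphs. */
static char trigraph_value(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing every trigraph, scanning left to right.
 * dst must have room for strlen(src) + 1 bytes; the result is never
 * longer than the input. Returns the length of the result. */
size_t replace_trigraphs(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char r = 0;

        /* Two question marks plus one of the nine characters. */
        if (src[0] == '?' && src[1] == '?')
            r = trigraph_value(src[2]);

        if (r != 0) {
            dst[n++] = r;
            src += 3;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Fed the "bad status" format string, this produces the `{%x>~-device` text shown above. Fed the `/tmp/d?????/a` template, it produces `/tmp/d???` followed by a backslash and `a`, which is two characters shorter; as far as I can tell the compiler would then read that backslash-`a` pair as an escape sequence, so the compiled string ends up shorter still.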
NOW, before you light 'em up and blast me, YES, I realize it's a hard
problem. There aren't many safe character sequences to use--and YES, I
know that you can't use backslash because that's one of the possibly-
missing characters. What I don't understand is why it was decided to
introduce a brand-new (I assume) mechanism which breaks existing code.
2. Replacement in program text: My philosophical objections to
replacement of trigraphs within a program are much less...but I wonder
who might ever use them. Is there any precedent for these sequences?
Is there any reason to think they'll be used? Let's take another
(slightly contrived but realistic) example here--I'll construct a
piece of code which says, roughly, "If the first character of `line'
is a sharp or percent, call function prepro to handle the rest of the
line, then increment linect". We would now write this as:
    if (line[0]=='#' || line[0]=='%') {
        prepro(&line[1]);
        linect++;
    }
Replacing all the nasty characters with corresponding trigraphs gives:
    if (line??(0??)=='??=' ??!??! line??(0??)=='%') ??<
        prepro(&line??(1??));
        linect++;
    ??>
I submit that this will produce code which is so near to unreadable
that there is virtually no prospect of the mechanism ever seeing
significant use. If you believe that, you have to wonder why every
standard compiler should have to carry the extra baggage. If you don't
believe that, I'd like to see some real evidence to show that
programmers might use it.
A general question: Has the trigraph mechanism been tried out, in real
practice, anywhere prior to the introduction in X3J11? If so, I'd like to
hear about how it's worked out.
--
Dick Dunn UUCP: {ncar,cbosgd,nbires}!ico!rcd (303)449-2870
...Never attribute to malice what can be adequately explained by stupidity.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/20/88)
In article <5215@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>The draft standard seems to be written in such a way that a compiler MUST
>accept these trigraph sequences.

Yes, a standard-conforming implementation MUST understand trigraphs.

>1. Replacement within strings: This is a change to the existing language.
> It breaks existing programs. ...
> Point: The sequence "??" is not at all rare.

Trigraphs ARE relatively rare in existing code. Yours is the first example I've seen, in fact. Most applications think ? should be used as a question mark in messages, perhaps ?? at the end of a few message strings or in a chess program.

> Why was it [??] chosen as the introducer?

Because all single characters in the ISO invariant code set already had valid C meanings. Many double-character sequences also already have meanings. ?? seemed to cause the least disruption to existing practice.

> What I don't understand is why it was decided to
> introduce a brand-new (I assume) mechanism which breaks existing code.

Because nobody, including you, has proposed anything that the Committee agreed was better, and many C users (for example, Europeans) have a perceived need that the parochial American outlook does not meet. The point is that existing practice was deemed unsatisfactory, so SOMEthing had to change. X3J11 tried to minimize the impact of this "quiet change".

>Has the trigraph mechanism been tried out, in real practice, anywhere
>prior to the introduction in X3J11?

This specific mechanism is an invention of X3J11, so far as I can determine. However, use of multi-byte sequences to encode things that cannot be represented by a single byte is extremely common practice.

Note, by the way, that I oppose trigraphs, but I can provide a definite explanation of how the European needs can be met without them, just as I can explain how the Japanese needs can be met without introducing the wchar_t stuff.

My feeling is that people develop mindsets based on previous non-optimal design that preclude their understanding what an optimal design would be like. Probably the difficulty of learning how to deal with a kludge causes a psychological investment that is hard to give up.

None of the above, of course, should be construed as official X3J11 information.
alan@Apple.COM (Alan Mimms) (05/21/88)
Perhaps the best solution to the trigraph dilemma is to make available some public-domain filters for converting from- and to- the trigraph notation. This would permit those unfortunate enough to have strange character sets to write C code and to port that code to a machine whose C compiler does NOT support the trigraph notation and back again with minimal pain.

I BELIEVE I understand that the trigraph notation is a simple transformation of the normal ASCII-based C notation. Consequently, it should be quite simple to convert in both directions. The only problem might be in strings in programs which produce C programs as their output -- in which case, the filters come to the rescue by converting the program's output before it is compiled.

Doesn't this make most of the flamers happy?
--
Alan Mimms                      My opinions are generally
Communications Products Group   pretty worthless, but
Apple Computer                  they *are* my own...
...it's so simple that only a child can do it! -- Tom Lehrer, "New Math"
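One direction of such a filter is easy to sketch. The C fragment below is only an illustration of the idea in this post, not an existing tool: the names `trigraph_suffix` and `to_trigraphs` are hypothetical, and the sketch rewrites every occurrence of the nine problem characters without looking at whether it sits in code, a string, or a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the third character of the trigraph that
 * stands for c, or 0 if c needs no replacement. */
static char trigraph_suffix(char c)
{
    switch (c) {
    case '#':  return '=';
    case '[':  return '(';
    case '\\': return '/';
    case ']':  return ')';
    case '^':  return '\'';
    case '{':  return '<';
    case '|':  return '!';
    case '}':  return '>';
    case '~':  return '-';
    default:   return 0;
    }
}

/* Copy src to dst, spelling each of the nine problem characters as a
 * trigraph. dst must have room for 3 * strlen(src) + 1 bytes.
 * Returns the length of the result. */
size_t to_trigraphs(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        char s = trigraph_suffix(*src);

        if (s != 0) {
            /* Emit the two-question-mark introducer, then the suffix. */
            dst[n++] = '?';
            dst[n++] = '?';
            dst[n++] = s;
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Applied to the first line of Dick Dunn's `prepro` example, this yields exactly the trigraph form he posted. The catch is the round trip: text that already contains two question marks followed by one of the nine characters cannot be told apart from a real trigraph on the way back, which is the string problem the thread started with.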
chuck@eneevax.UUCP (Chuck Harris) (05/21/88)
In article <5215@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>
>2. Replacement in program text: My philosophical objections to
> replacement of trigraphs within a program are much less...but I wonder
> who might ever use them. Is there any precedent for these sequences?

Yes, back in the olden days, some implementations of APL had a "digraph" character set that was composed of combinations of "$" and another char. I <<had>> to use this set while I was at UoM, using Model 33's. It was pretty disgusting, but worked. Our particular implementation was controlled by an option flag, so it didn't harm native mode APL work. A clear deficiency in the ANSI proposal.

> Is there any reason to think they'll be used? Let's take another

Not in my opinion. The "digraph" set was simple enough, and APL's needs easy enough to accommodate, that it didn't cause any real confusion. (You ended up with something that looked a little like DEC's FOCAL.) APL needed an <- ($P), matrix divide ($#), "lamp" ($.), delta ($F) (or was it $D), index ($I), ... It's been so long, I forget most of it.

"C" uses a very rich set of characters, even when compared with APL. Many of its most used characters are not representable in ISO (e.g. {|\}[]).

I LOVE IT!! 8-)

>
> if (line??(0??)=='??=' ??!??! line??(0??)=='%') ??<
> prepro(&line??(1??));
> linect++;
> ??>
>
> I submit that this will produce code which is so near to unreadable
> that there is virtually no prospect of the mechanism ever seeing
> significant use. If you believe that, you have to wonder why every
>--

The last time I railed about Trigraphs, I caused quite a stir. I gave a few examples of the garbage that would result, likened the use of trigraphs to the techniques used to "enhance" the deficiencies of the old Model 33 TTY, called the offending ISO terminals "Braindamaged", ranted and raved about how simple it was for anybody who was stuck with the ISO terminals to implement their own "trigraph" preprocessor and leave the language intact.

For my efforts, I got called a "Chauvinistic American", a fool, and a few other things that might have harmed my EGO.

So, outside of it being too late to change things, there is NO way that I will risk posting anything on this subject. :-)

Chuck Harris
C.F. Harris - Consulting
beckenba@cit-vax.Caltech.Edu (Joe Beckenbach) (05/21/88)
I'm not sure how nit-picky a detail this is, but the impression I've gotten from the trigraph postings of late is that the compiler would rather not deal with it. Isn't that what a preprocessor is for? (Or is the preprocessor considered part of the compiler?)

For C code using trigraphs, I assume that judicious use of spacing will ease matters, e.g.

    {int garbage[MAX];}

goes to

    ??< int garbage??( MAX ??) ; ??>

much more legibly than a direct substitution without spacing

    ??<int garbage??(MAX??);??>

Of course trigraphs mean more characters in a source file. But spaces and \n's are cheap, or at least were until the compiler broke. :-)

BTW, I had to program on an old IBM4381 workstation in Pascal. There was no way to get the curly braces AT ALL from the keyboard; the language support kludge was a trigraph sequence. It worked, but it was hard to spot the comments for a while. The machine could display the curly braces, but the machine couldn't generate them from any of the input devices! [Bad design in action. :-( ]
--
Joe Beckenbach beckenba@csvax.caltech.edu Caltech 1-58, Pasadena CA 91125
Graduating in June, knowing that C ain't bad, tools exist and are useful, and that digital watches could be a neat idea. :-)
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/21/88)
In article <10626@apple.Apple.Com> alan@apple.UUCP (Alan Mimms) writes:
>Perhaps the best solution to the trigraph dilemma is to make available some
>public-domain filters for converting from- and to- the trigraph notation.

Since the mapping is done during translation phase 1, this should be feasible. I would also suggest removing #pragma lines.
mcdonald@uxe.cso.uiuc.edu (05/21/88)
The solution to the trigraph botch is simple: have compiler vendors make it optional. Provide an option in the "install" program for the compiler to turn it on only if the user wants it. Otherwise, don't do it.

I don't see why it is necessary, anyway. The character set of ANSI C is presumably the ANSI character set. (Oh! Can you say EBCDIC? Not if you don't want to throw up on your keyboard! Besides, EBCDIC has plenty of characters.) If someone wants to use a non-standard set, let them find appropriate characters.

I don't care if it is in the standard, so long as it doesn't appear in my compiler.

Doug McDonald
nather@ut-sally.UUCP (Ed Nather) (05/23/88)
In article <10626@apple.Apple.Com>, alan@Apple.COM (Alan Mimms) writes:
>
> Doesn't this make most of the flamers happy?
>

By definition, *nothing* can make a flamer happy. You may be able to satisfy a few grumps or malcontents, but a true flamer yields to no solution. If one ever did, he would be exiled to Bitnet.
--
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.AS.UTEXAS.EDU
jas@rain.rtech.UUCP (Jim Shankland) (05/23/88)
And then there's what Stallman has to say about trigraphs, in *Internals of GNU CC*:

    You don't want to know about this brain-damage.

Jim Shankland
..!ihnp4!cpsc6a!\
                 sun!rtech!jas
..!ucbvax!mtxinu!/
henry@utzoo.uucp (Henry Spencer) (05/23/88)
> Our particular implementation was controlled by an option flag, so it didn't
> harm native mode APL work. A clear deficiency in the ANSI proposal.

I would expect that this is the way many C compilers will implement trigraphs; I know of some that already take that approach. Lots of people share your view that trigraphs are ugly.
--
NASA is to spaceflight as    | Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.  | {ihnp4,decvax,uunet!mnetor}!utzoo!henry
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/23/88)
In article <7937@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
| In article <5215@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
| >The draft standard seems to be written in such a way that a compiler MUST
| >accept these trigraph sequences.
|
| Yes, a standard-conforming implementation MUST understand trigraphs.

This is the first case I've seen where the committee really blew it, in my opinion (yes, I could live with noalias). I completely agree with the need to do this, but as currently implemented it will cause a number of problems. A preprocessor function could specify the trigraph inducer, with the default "none" to avoid breaking existing programs. The committee seems to have lost sight of that goal in this case.

The same functionality could be provided by a new preprocessor function (can't break existing programs). Consider:

    #trigraph ??

Now your program can run on my machine, using the notation you used. If I choose, I can run it through a filter and convert to full ASCII. Better yet, I can take my existing programs and convert them before sending them to you.

Why do it this way? If I want to send you a program of mine, which I wrote filled with the ?? sequence **like many machine control programs which have to get certain ASCII characters to the device**, I can give you another sequence:

    #trigraph TX

This is ugly as hell, but it will let you edit the program, and not break it.

PLEASE X3J11, fix this sucker! It CAN be done without breaking existing programs. It makes more sense in the preprocessor. Best reason is that as specified it will lead to compilers which don't do full ANSI by default, or even subset compilers.

I scanned my local source directory and found three programs of 102 which would break. I don't know if that's typical, but why do it wrong when it can be done another way?

I did NOT scan the directory of programs which do device control, since I have made that point and every one would break and have to be handcoded with escape sequences, etc., to get by this.
--
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
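To make the `#trigraph` idea concrete, here is a rough sketch in C of a replacement pass whose introducer is a parameter. Everything in it is hypothetical: `#trigraph` was never part of the draft, the post does not say how the directive would be parsed or scoped, and the names `replacement_for` and `replace_with_introducer` are invented here. The sketch only shows the core point, that with no introducer selected the text passes through untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the character that a sequence ending in c stands
 * for, or 0 if c does not complete a replacement sequence. */
static char replacement_for(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing <intro><char> sequences. intro is the
 * introducer a (hypothetical) #trigraph line would have selected;
 * NULL or "" stands for the proposed default of "none", in which case
 * nothing is replaced. dst must have room for strlen(src) + 1 bytes.
 * Returns the length of the result. */
size_t replace_with_introducer(const char *src, const char *intro, char *dst)
{
    size_t n = 0;
    size_t ilen = (intro != NULL) ? strlen(intro) : 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char r = 0;

        /* A successful strncmp guarantees src[ilen] is readable. */
        if (ilen > 0 && strncmp(src, intro, ilen) == 0)
            r = replacement_for(src[ilen]);

        if (r != 0) {
            dst[n++] = r;
            src += ilen + 1;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

With the introducer set to `TX`, a string full of question marks is left alone while `TX<` and `TX>` become braces, which is the behavior the post is asking for.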
karl@haddock.ISC.COM (Karl Heuer) (05/24/88)
In article <1988May23.000451.751@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>[It should be controlled by an option flag]
>I would expect that this is the way many C compilers will implement trigraphs;
>I know of some that already take that approach. Lots of people share your
>view that trigraphs are ugly.

I countersuggest that the compiler should always recognize trigraphs, and issue a warning message if any are encountered. Then add an option to suppress this warning. This way, the compiler would still be conforming, and in the unlikely event that some of my code uses a string containing two question marks followed by one of the magic characters, I'd find out about it.

(This is based on the assumption that trigraphs stay in. I'd prefer that they be removed, but in any case I don't expect them to get in my way.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
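The check behind such a warning is small. The following C sketch counts the sequences a trigraph-aware compiler would rewrite; it is an assumption about how one might do it, not a description of any real compiler, and the names `completes_trigraph` and `count_trigraphs` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c is one of the nine characters that turn a preceding
 * pair of question marks into a trigraph. */
static int completes_trigraph(char c)
{
    /* strchr would also match the terminating NUL, so exclude it. */
    return c != '\0' && strchr("=(/)'<!>-", c) != NULL;
}

/* Count the trigraphs in text, scanning left to right the way the
 * replacement pass would. A caller could warn when the count is
 * nonzero. */
size_t count_trigraphs(const char *text)
{
    size_t count = 0;

    assert(text != NULL);
    while (*text != '\0') {
        if (text[0] == '?' && text[1] == '?' && completes_trigraph(text[2])) {
            count++;
            text += 3;  /* a trigraph consumes all three characters */
        } else {
            text++;
        }
    }
    return count;
}
```

A string that merely ends in two question marks scores zero, while a parenthesized run of three question marks scores one, because its last two question marks and the closing parenthesis form a trigraph.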
lenoil@Apple.COM (Robert Lenoil) (05/24/88)
In article <5215@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
> Note also that it is common practice to use "?" in initializing strings
> where the "?" positions will be replaced at execution time.

Dick is dead right here. What is the justification for breaking existing programs when the ability to include untypeable characters into strings already exists via the \xxx mechanism? Instead of introducing a totally new notion (to C, anyway) of trigraphs, why not simply extend the backslash escape mechanism to be valid outside of strings? This would allow the use of #defines to perform the same function as trigraphs:

#define ??< \173 /* open brace */
#define ??> \175 /* close brace */

By using the backslash escapes in strings and your favorite synonym outside of strings, the same effect is reached without breaking any existing code. If people don't want to use the backslash escapes in strings, they can make use of the new stringizing operators to get the #define'd constants into their strings.

Robert Lenoil
Apple Computer, Inc.
jss@hector.UUCP (Jerry Schwarz) (05/24/88)
In article <10941@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>
> PLEASE X3J11, fix this sucker! It CAN be done without breaking
>existing programs. It makes more sense in the preprocessor. Best
>reason is that as specified it will lead to compilers which don't do
>full ANSI by default, or even subset compilers.
>

Attached below is the text of the official committee response to Letter P02 (during the first public review period around a year ago).

The standard is now in the third public review period and the committee is only accepting comments on changes made between the second and third drafts. Thus Trigraphs (which were accepted very early on by the committee and were discussed extensively in this group several times since) are almost certainly going to be in the final version.

I can understand that it may be frustrating for someone to come upon the proposed standard today, see something they don't like, and feel it is being rammed down their throats without due consideration. Especially if they think they have a better way to solve the problem. However, I hope such people will try to understand that the process of creating a standard goes on for a long time and that suggestions made toward the end of the process may not receive the same consideration as suggestions made earlier.

For the record: I think trigraphs are a bad idea.

Jerry Schwarz

-----------------------------
X3J11 Response to Letter P02

Summary of Issue: Eliminate trigraphs.

Committee Response: The Committee has reaffirmed this decision on more than one occasion.

The Committee discussed alternatives to trigraphs on a number of occasions, but always decided that they fill a need. C must support a wide variety of terminals and keyboards, many of which lack the full C character set.
thorinn@diku.dk (Lars Henrik Mathiesen) (05/24/88)
As one who regularly uses a non-ASCII terminal setup, I'd better explain a little. In Danish (my native language) we have three `extra' letters which we much prefer to use when writing Danish text - it is possible to get by with two-letter replacements, but it's not very readable. By the way, these are not `accented letters;' they are separate letters of the alphabet, with their own place at the end of the sorting sequence. Much the same applies to German, Swedish, Norwegian, and many other European languages.

That's not usually a problem as most modern terminals have provisions for various national character sets, which are defined in an ISO standard. This standard allows the glyphs at some eight or ten positions to vary, including @, $, [, \, ], {, | and }. The latter six are used for the non-ASCII letters in Danish, as they follow the other letters nicely.

So, the X3J11 people think, the poor Europeans can't use ASCII: we'll have to invent some kludge to bring C to their benighted shores. The only excuse for inventing something so horrible is that it only breaks a very few programs, and that it won't be used anyway.

You see, over here we get by just fine without trigraphs. The less fortunate are stuck with a national character set, and have to put up with seeing the various punctuation as letters - they are not as visually distinctive (and the brackets and braces don't pair naturally), but with a little attention to layout one gets by quite well. And it's _much_ better than trigraphs.

The lucky ones have terminals which can switch between ASCII and national character sets. If not for the warped minds of the terminal manufacturers, this would be the perfect solution. But we (at this institute) have yet to see a terminal with an escape sequence to switch character sets, or (and this is worse) one whose keyboard layout did _not_ change with the character set shown on the screen. (And none of them had LCD keytops). So we have to pay the importer to hack new PROMs to enable us to switch without moving the keys around. But I digress.

By the way, I find that it's easier to read Danish with ASCII characters than it is to parse convoluted C code in Danish characters, so I hardly ever bother to switch any more.

To make it pleasant to use C and national letters in the same file, there would have to be _convenient_ replacements for the ASCII characters in question, and it would have to allow the national letters to be used in identifiers (trigraphs don't). This cannot be done as an extension of the ASCII C input format because the national letters are punctuation in ASCII.

Now we're talking about an alternate input format for C - we'll have to tell the compiler if a given source file is in the `old' or the `new' format. On the other hand this frees us to use extra keywords etc.

The new format shouldn't use any characters that may be replaced in national character sets. The tokens [ ] { } | || (and in some compilers |=) must be replaced; one off-the-cuff possibility is (. .) beg end or cor (or=). We need a new pre-processor escape and a new string escape, which can't very well be keywords. // might be a possibility for both, as it's rare in C, but does it look too much like JCL?

This new format could probably be implemented by a little lex pre-pre-processor; national characters in identifiers would have to be encoded somehow (e.g. using Q as an escape), increasing the identifier length. This would cause problems with symbolic debuggers and short-name compilers, but could easily be retrofitted on old compilers (write your own cc ...). Oh well, it wouldn't be portable anyway. Hey, anybody from GNU reading this?

By the way, Standard Pascal is designed to be possible to write without specific ASCII characters: It allows (. .) for [ ] (indexing), and (* *) for { } (comments). Since e.g. .5 is a legal constant, this may cause unexpected parse errors for programmers who're unaware of the feature.
--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark [uunet!]mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.
chris@mimsy.UUCP (Chris Torek) (05/25/88)
In article <10949@apple.Apple.Com> lenoil@Apple.COM (Robert Lenoil) writes:
>... why not simply extend the backslash escape mechanism to be valid
>outside of strings?

Backslash is one of the characters that cannot be represented in some character sets (the trigraph ??/ is a synonym for it in the dpANS).

>This would allow the use of #defines to perform the same function
>as trigraphs:
>
>#define ??< \173 /* open brace */
>#define ??> \175 /* close brace */

This would be almost as big a change as trigraphs; the #define syntax is now

    # define <identifier><arglist_opt> <replacement-text>

and `??' is not part of an <identifier>.

I think the `#trigraph' suggestion is a suitable way to keep trigraphs from affecting old code and/or infesting new code.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
karl@haddock.ISC.COM (Karl Heuer) (05/25/88)
In article <10949@apple.Apple.Com> lenoil@apple.UUCP (Robert Lenoil) writes:
>Instead of introducing [trigraphs], why not simply extend the backslash
>escape mechanism to be valid outside of strings?

That's an easy one: backslash is one of the characters that may not exist!

>This would allow the use of #defines
>#define ??< \173 /* open brace */
>#define ??> \175 /* close brace */

Not unless you extend the preprocessor's notion of what constitutes a valid macro name. Note also that the magic constants \173 and \175 are unportable.

In article <10941@steinmetz.ge.com> davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes:
>I scanned my local source directory and found three programs of 102 which
>would break. ... I did NOT scan the directory of programs which do device
>control, since I have made that point and every one would break and have to
>be handcoded with escape sequences, etc., to get by this.

Are you sure you have that many programs that would break? Note that `??' alone is not a problem; it becomes a trigraph only when followed by one of the nine characters "=(/)'<!>-". (Unlike backslash, which is reserved even if the following character is unrecognized.)

Assuming trigraphs stay in, the fix is simple: filter your code through

    sed -e "s;??\\([-=(/)'<!>]\\);?\\\\?\\1;g"

as part of the ANSIfication process. (Better yet, do it now before you run into a compiler with trigraphs. It won't hurt, unless your current compiler complains about the unrecognized escape "\?".)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
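For those who would rather not decode the sed quoting, the same rewrite can be sketched in C. This is an illustration under the assumption that the sed command is meant to turn each would-be trigraph into question mark, backslash, question mark, character; the names `completes_trigraph` and `escape_trigraphs` are invented here and the sketch, like the sed line, makes no attempt to tell strings from other text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c is one of the nine characters that turn a preceding
 * pair of question marks into a trigraph. */
static int completes_trigraph(char c)
{
    return c != '\0' && strchr("=(/)'<!>-", c) != NULL;
}

/* Copy src to dst, breaking up every would-be trigraph by putting a
 * backslash in front of its second question mark. dst must have room
 * for 2 * strlen(src) + 1 bytes (the text grows by at most one
 * character per three). Returns the length of the result. */
size_t escape_trigraphs(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && completes_trigraph(src[2])) {
            dst[n++] = '?';
            dst[n++] = '\\';   /* the inserted backslash */
            dst[n++] = '?';
            dst[n++] = src[2];
            src += 3;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Inside a string literal the rewritten text means the same thing as before on a compiler that accepts the `\?` escape, and it no longer contains anything a trigraph pass would touch.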
rcd@ico.ISC.COM (Dick Dunn) (05/25/88)
> > Our particular implementation was controlled by an option flag, so it didn't
> > harm native mode APL work. A clear deficiency in the ANSI proposal.
> I would expect that this is the way many C compilers will implement trigraphs;
> I know of some that already take that approach. Lots of people share your
> view that trigraphs are ugly.

Adding a compiler switch may be the "least bad" solution that a rational compiler writer can find...but it gives programmers an ugly choice: The compiler either transmogrifies trigraphs or it compiles programs in a nonstandard way. If the programmer writes:

    printf("What on earth??!\n");

a standard-conforming compiler should produce code which will cause the program to print:

    What on earth|

If instead it produces code which causes the program to print:

    What on earth??!

it's violating the standard.

If you're compiling your own code, you know when to turn on the trigraph switch on the compiler...but if you're compiling jrandom.c that you got on a tape from somebody, what do you do? Is it a standard program? Was it written before the standard came out? (There are a couple of files in the netnews source which fall into just this hole.)
--
Dick Dunn UUCP: {ncar,cbosgd,nbires}!ico!rcd (303)449-2870
...If you get confused just listen to the music play...
ok@quintus.UUCP (Richard A. O'Keefe) (05/25/88)
In article <10949@apple.Apple.Com>, lenoil@Apple.COM (Robert Lenoil) writes:
> Instead of introducing a totally new notion
> (to C, anyway) of trigraphs, why not simply extend the backslash escape
> mechanism to be valid outside of strings?

Because backslash itself is one of the missing characters. (This is all fixed in the ISO 8859 character set family anyway.)
rcd@ico.ISC.COM (Dick Dunn) (05/25/88)
In article <10949@apple.Apple.Com>, lenoil@Apple.COM (Robert Lenoil) writes:
> Dick is dead right here. What is the justification for breaking existing
> programs when the ability to include untypeable characters into strings already
> exists via the \xxx mechanism?...

The problem is that backslash is one of the characters which does not exist in the European character sets of concern! You can't use backslash to dodge the problem, because you don't have backslash! (That is one of the little nits that makes it such a nasty problem.)
--
Dick Dunn UUCP: {ncar,cbosgd,nbires}!ico!rcd (303)449-2870
...If you get confused just listen to the music play...
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (05/25/88)
I'm not asking for the removal of the feature, I'm pointing out that it is currently done in a way which breaks existing programs, and that there are ways to prevent that from happening.

I was on the committee for the first two years, and I can't find any references to trigraphs in my old notes. Bill Plauger's original comment on things like this (from my notes on the Washington meeting) was that "we should not egregiously break existing programs." I think that the current implementation is a major deviation from that philosophy, justified only if there is no other way.

As for last minute things, the vendors wanted to add noalias at the last minute to allow better code generation (I actually didn't object to that), so changing the implementation of a feature which (a) no one is currently using, and (b) breaks existing programs is certainly NOT an impossibility.

Please remember the A in ANSI stands for American, as does the A in ASCII. In an effort to make this a viable international standard, X3J11 may not have considered the impact of this implementation.
--
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
rcd@ico.ISC.COM (Dick Dunn) (05/25/88)
Thanks to Doug Gwyn for some answers on trigraphs. Unfortunately, the more I learn, the less I like them...but that's not Doug's fault. >=me, >>=Doug

> >1. Replacement within strings: This is a change to the existing language.
> > It breaks existing programs. ...
> > Point: The sequence "??" is not at all rare.
> Trigraphs ARE relatively rare in existing code. Yours is the first
> example I've seen, in fact. Most applications think ? should be used
> as a question mark in messages, perhaps ?? at the end of a few message
> strings or in a chess program.

Wait. I said that ?? (the trigraph introducer, if you will) is not at all rare, and this is easy to confirm. Occurrences of ?? are important because they represent situations where the next character could cause trouble. Go look at source code! If you're on a UNIX system, find some source and:

    find . -name '*.[ch]' -exec grep '??' '{}' ';'

I suggest that you look for all ?? instead of just trigraphs so that you can get an appreciation of where ?? appears.

When I first found trigraphs, I said "WTF??!" and immediately looked at my own source code. I found one conflict. So I went to a UNIX source tree and found several occurrences in Sys V code. More poking around turned up scattered others--some netnews source, some networking stuff. There aren't a lot of them, but they *do* exist. I would have expected the committee to do as I did--search large piles of source code to look for conflicts. It only took me a little while one evening. Some repeats--??! as an expletive; (???) for a questionable item.

The following is NOT meant as a flame against Doug (who has stuck his neck out to explain some of what has gone on), but I think the committee reneged on its responsibilities in putting trigraphs in. From the X3J11 rationale:

| The X3J11 charter clearly mandates the Committee to _codify_existing_
| _practice_. (emphasis present; "_" is italics)
| ...
| Existing code is important.
| ...
| Avoid "quiet changes."

Trigraphs are not existing practice; apparently they have not even been really tried out! They break existing code in a "quiet change" fashion. There are real examples of code currently in use which will be "broken" if recompiled by a compiler conforming to this part of the draft standard.

> > What I don't understand is why it was decided to
> > introduce a brand-new (I assume) mechanism which breaks existing code.
> Because nobody, including you, has proposed anything that the Committee
> agreed was better,...

I intentionally avoided any sort of counterproposal in the first posting because I wanted to focus on what the committee had done and why; I didn't want to start with a debate over anything I would propose. I have a philosophical view that this problem would be better off with no solution than with a clumsy solution that breaks existing code. (I don't agree that "a bad solution is better than none at all.") There are other areas where X3J11 said "there's no prior art" and/or deferred work on a problem to extension work.

Trigraphs in strings are the important issue; trigraph symbols in code are ugly but don't break anything. So, just for the sake of argument I'll toss out some ideas for strings: There is already one form for an alternate interpretation of the mapping of a literal character or string into its memory representation, namely L"stuff" for wide chars and strings. Why not use the same model--say, precede the string with R for restricted or T for trigraph; thus R"stuff??/n" would mean R"stuff\n". Even if you think L"stuff" is a mistake, this would only be a second occurrence of the same class of mistake. (Karl Heuer noted that L"stuff" is a quiet change too, but it's highly unlikely to hit; I've found no occurrences.)

As I said, that was JUST a proposal for the sake of argument. You might equally well construct names for the problem characters and build them into a header file; then construct strings by the compile-time concatenation business. There are other ways. YES, they're ugly, BUT they don't have to break existing code, while the draft standard method is ugly AND breaks code.

What about an ISO 8859 character set? Wouldn't that cover a lot of the problem area?

>...and many C users (for example, Europeans) have a
> perceived need that the parochial American outlook does not meet.

I understand their need. I agree that it's "parochial" to ignore the problem, but I don't think it's parochial to say "we don't have a good solution yet, so let's not cast a bad one in concrete."

=>What do Europeans do about C now?<= Is there NO prior art? If not, it's certainly not ready to be standardized!

> >Has the trigraph mechanism been tried out, in real practice, anywhere
> >prior to the introduction in X3J11?
> This specific mechanism is an invention of X3J11, so far as I can
> determine. However, use of multi-byte sequences to encode things
> that cannot be represented by a single byte is extremely common
> practice.

I know that multi-byte sequences are common--I worked with 370ish Pascal quite a while back, and we had to use digraphs for about six characters. These digraphs became part of the Pascal standard, BUT there's a big difference: the digraphs were established practice long before the standard was done. They were in use, known to be practical (if ugly), and didn't break anything on machines that didn't need them.

It is also clear that you don't get very far trying to invent believable digraphs for C, so you need trigraphs if you go that route. The objection is that they haven't been tried out. You're standardizing something you haven't really used in practice, and since C is not Ada (oops; sorry:-), that's just not wise.

> Note, by the way, that I oppose trigraphs, but I can provide a definite
> explanation of how the European needs can be met without them...

Then I wish folks had pushed against them harder. (Maybe you did, Doug; I don't know.)
--
Dick Dunn UUCP: {ncar,cbosgd,nbires}!ico!rcd (303)449-2870
...If you get confused just listen to the music play...
henry@utzoo.uucp (Henry Spencer) (05/26/88)
> I countersuggest that the compiler should always recognize trigraphs, and
> issue a warning message if any are encountered. Then add an option to
> suppress this warning.

Actually, in the experimental scanner I'm playing with, they are always
recognized, but how they are interpreted depends on an option. If the
trigraph option is on, they are interpreted as per X3J11. If the option
is off -- the default -- a warning message is produced and each trigraph
is interpreted as three characters.
henry@utzoo.uucp (Henry Spencer) (05/26/88)
> Consider:
> #trigraph ??

You can't even write this without trigraphs, because # is one of the
magic characters that may not exist in the source character set. I don't
much like trigraphs, and I think there are more graceful approaches (like
saying "use ISO Latin 1", which eliminates the problem), but you can be
fairly sure that X3J11 has already thought of all the simplistic quick
fixes and turned them down for one reason or another.
henry@utzoo.uucp (Henry Spencer) (05/26/88)
> ... What is the justification for breaking existing programs when the > ability to include untypeable characters into strings already > exists via the \xxx mechanism? Instead of introducing a totally new notion > (to C, anyway) of trigraphs, why not simply extend the backslash escape... Because, for openers, backslash is one of those ASCII-specific characters that you can't even *write* without trigraphs in some of the European character sets. I do wish people who want to sound off about this problem would first spend some time understanding it!
ok@quintus.UUCP (Richard A. O'Keefe) (05/26/88)
In article <5424@ico.ISC.COM>, rcd@ico.ISC.COM (Dick Dunn) writes: > Wait. I said that ?? (the trigraph introducer, if you will) is not at all > rare, and this is easy to confirm. Occurrences of ?? are important because > they represent situations where the next character could cause trouble. I just checked a directory containing 126 utility sources (some of which I got from the net, some of which I got from a friendly wizard years ago) and 4 of them contained ?? inside strings. If I've understood the rules, only two of them would actually break. Rather alarming: before I made this check I was happy about trigraphs: they won't break _my_ code, I said!
johnl@n3dmc.UUCP (John Limpert) (05/26/88)
I think a simple solution to this problem is possible. Why not have the compiler print a warning if it detects a trigraph? This would reduce the chances of breaking a program when it was recompiled with an ANSI C compiler. If the warning was only printed the first time a trigraph was encountered, it wouldn't be too annoying. Restricting the check to literal strings might be worthwhile. Many compilers print warnings about constructs that are legal, but are often unintentional coding errors. -- John A. Limpert UUCP: johnl@n3dmc.UUCP uunet!n3dmc!johnl PACKET: n3dmc@n3dmc.ampr.org n3dmc@wa3pxx
faustus@ic.Berkeley.EDU (Wayne A. Christopher) (05/26/88)
Nobody has said what the existing practice is with regard to European character sets. Do Europeans just use an ascii keyboard when they want to use C? Or do they use u-umlaut for backslash (or whatever it is)? Trigraphs are so ugly I can't believe anybody actually uses them, or will use them if they're part of C. I think trigraphs are a trick of American terminal manufacturers who want to fool Europeans into thinking they can use their terminals for writing programs. Wayne
flaps@dgp.toronto.edu (Alan J Rosenthal) (05/26/88)
In article <5391@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
[ re a compiler switch for trigraphs ]
>If the programmer writes:
>	printf("What on earth??!\n");
>a standard-conforming compiler should produce code which will cause the
>program to print:
>	What on earth|
>If instead it produces code which causes the program to print:
>	What on earth??!
>it's violating the standard.

Just goes to show you that conforming compilers are not likely to be the
most useful compilers. However, the ANSI standard doesn't say that the
conforming compiler must be called `cc'. Most implementors will probably
say that it's called `cc -trigraph' (with probably other switches as well).

Or, to put it another way, I fully expect all ANSI-conforming compilers
to come in two flavours: a strictly conforming one and a useful one.

>If you're compiling your own code, you know when to turn on the trigraph
>switch on the compiler...but if you're compiling jrandom.c that you got on
>a tape from somebody, what do you do? Is it a standard program? Was it
>written before the standard came out? (There are a couple of files in the
>netnews source which fall into just this hole.)

Ahem, all C programs were written before the standard comes out. The
standard has not yet come out.

Anyway, to answer your question, you simply compile it without the
-trigraph switch and also without the -nocomplaintrigraph switch. In other
words, a useful compiler will not implement trigraphs but will give a
warning message when it encounters them.

One might still argue that you then have to decide whether to recompile
with the -trigraph switch or not. However I maintain that this problem
exists whether or not there is a compiler switch to solve it.

ajr
--
- Any questions?
- Well, I thought I had some questions, but they turned out to be a trigraph.
meissner@xyzzy.UUCP (Michael Meissner) (05/26/88)
In article <11655@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
|
| I think the `#trigraph' suggestion is a suitable way to keep trigraphs
| from affecting old code and/or infesting new code.

Unfortunately, the problem with #trigraph and others of its ilk is that
'#' is one of the characters replaced in European 7-bit character sets.
--
Michael Meissner, Data General.	Uucp: ...!mcnc!rti!xyzzy!meissner
Arpa: meissner@dg-rtp.DG.COM (or) meissner%dg-rtp.DG.COM@relay.cs.net
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/26/88)
In article <10941@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
> #trigraph ??

This is a nice idea, assuming that one remains committed to having the
compiler deal with trigraphs at all, which in my opinion was never
necessary even for European C users. There have been further ISO
developments affecting character sets since the invention of trigraphs,
so it might be appropriate to reexamine this invention to see whether or
not it could be totally removed. However, at this stage of the approval
process, the only way I can imagine a substantive change to the trigraph
specs would be for a serious objection ("veto") to be raised at the ISO
level. X3J11 has indicated a desire for the next round of public review
to be the last, which it cannot be if substantive changes are made.

> PLEASE X3J11, fix this sucker! It CAN be done without breaking
>existing programs. It makes more sense in the preprocessor. Best
>reason is that as specified it will lead to compilers which don't do
>full ANSI by default, or even subset compilers.

Trigraph mapping is specified as being done in translation phase 1,
which precedes what is normally considered "preprocessing", but could
certainly be handled by separate preprocessors. I think your proposal
could be fit into the translation-phase scheme adequately if it were
accepted by the committee.

I don't really think there will be any compilers that will fully conform
to all ANSI/ISO C specs, except for trigraph handling, as the default
case. Much more likely is that there would be separate PCC-like and
ANSI-conforming compilers (perhaps controlled by a command-line "switch").
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/27/88)
In article <5424@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes: >Thanks to Doug Gwyn for some answers on trigraphs. Unfortunately, the more >I learn, the less I like them...but that's not Doug's fault. Thanks for recognizing that I don't like them either and am just trying to explain what I think X3J11's motivation/reasoning was. Of course, I'm not speaking officially for X3J11 here and may have gotten this wrong (the original decision was made before I started attending meetings). One thing to keep in mind is that almost everyone agrees that it is important for the ANSI and ISO standards for C to be technically identical. Therefore X3J11 is dealing with internationalization issues, even though this might seem unnecessary for ANSI purposes. >| Avoid "quiet changes." The proposed ANSI/ISO C standard introduces several "quiet changes", as noted in the Rationale document. Certainly one guideline was to minimize these, but there were many guidelines and they conflicted to some degree. Therefore compromises had to be worked out; if it makes you feel better, call these "optimal solutions to constrained problems" instead of "compromises". >There are real examples of code currently in use which will be "broken" >if recompiled by a compiler conforming to this part of the draft standard. Yes, that's true for all "quiet changes". >I have a philosophical view that this problem would be better off with >no solution than with a clumsy solution that breaks existing code. I don't think "no solution" was considered acceptable to ISO at the time. >... R"stuff??/n" would mean "stuff\n". This is not a bad idea, but as the proposed standard stands trigraphs are mapped well before anything else is done to analyze the source code, so the "??/" would not hang around long enough for this method to be applied. If it weren't for the need to deal with {} etc. then trigraph mapping could possibly be deferred, but the main use of trigraphs is for {} etc. 
so the mapping cannot be deferred long enough.

>What about an ISO 8859 character set? Wouldn't that cover a lot of the
>problem area?

It was considered inappropriate for the C standard to constrain the
choice of character set like that. However, it recently was revised to
promise that '0' through '9' have ascending numerical representations,
and of course it does require that a large set of characters be
representable, so there is some precedent. I doubt that enough vendors
would support such a requirement, though.

The ISO 646-1983 invariant code set was taken as the least common
denominator for representable character glyphs. I think that was the
real mistake; glyphs are just silly marks on paper or displays, and we
aren't really interested in their shapes other than that all of the ones
we need for C be unique. I don't much care if { sometimes looks like [[
or \(lb, so long as I have tools for dealing with it when I program.

>What do Europeans do about C now?

The only existing practice I had heard about was use of (< >) etc.,
details varying from place to place. Perhaps some Europeans can
contribute more info here.

>Then I wish folks had pushed against them harder.

Two factors conspired here. One is that many existing environments don't
offer much support for better source code import/export/printing
translation, which is how I think this issue should be dealt with. The
other is that "ISO insisted on this sort of solution", which may or may
not be true but it certainly makes it hard to deal with since X3J11 and
the ISO C people don't meet concurrently.
ado@elsie.UUCP (Arthur David Olson) (05/27/88)
> I think a simple solution to this problem is possible. Why not
> have the compiler print a warning if it detects a trigraph?

No. . .the compiler should only print a warning if a particular file
contains both trigraphs *and* trigraphable characters. This way folks who
write "pure trigraph" code won't get inundated with warnings. The scheme
is effective since '#' is a trigraphable character.
--
	ado@ncifcrf.gov			ADO is a trademark of Ampex.
minow@thundr.dec.com (Martin Minow THUNDR::MINOW ML3-5/U26 223-9922) (05/27/88)
In a message to comp.lang.c, faustus@ic.Berkeley.EDU asks what Europeans
use to write C programs.

As you no doubt know, there are about a dozen code positions in "Ascii"
that are reserved for national use. The C language uses most of these
for syntactic purposes. X3J11 invented the "trigraph" notation to allow
C programming on European terminals without the (current) kludge of
interpreting, say, "upper-case A-umlaut" as "left square bracket".

The problem only occurs for terminals that are limited to a single
seven-bit ISO-646 based character set. EBCDIC terminals, and terminals
that conform to the newer ISO 8859 (Latin-1) or that are compatible with
DEC's VT200 series can use a coherent 8-bit character set that permits
C programming in its current form without loss of national characters.

Central to this is operating system support for 8-bit characters. Some
operating systems (and utilities) assume that the eighth bit is free
for "flagging" which causes problems.

Although ISO 8859 is the best base for future programming, it should be
noted that non-ISO workstations such as the IBM PC, the Atari ST and the
Macintosh support a mixture of national letters and the ISO invariant set.

The only problem, then, is caused by "old-style" terminals combined with
seven-bit limited operating environments. At the time trigraphs were
proposed, these were fairly common. They are much less common now, and
are quickly being replaced by ISO-compliant terminals and workstations.

Imagine if C were being standardized in, say, 1974, when there were very
few terminals that supported lower-case: one could well imagine a kludge
to allow mixed case programming on monocase terminals. One such kludge
was, in fact, provided in the Unix operating system. It finds little,
if any, use today -- and you would have to search carefully to find
an upper-case only terminal.

Because of the speed of conversion to ISO-8859 (and similar 8-bit
environments), coupled with ambiguities in the definition of trigraphs,
I recommended in my comments to the standard that they be dropped.
The committee rejected my arguments, but I would hope they reconsider
before release of the standard.

Martin Minow
minow%thundr.dec@decwrl.dec.com

PS: there was some question of "American Chauvinism". For the record,
I have a European university degree, and worked as a programmer in Europe
for ten years.

The above does not represent the position of Digital Equipment Corporation.
henry@utzoo.uucp (Henry Spencer) (05/28/88)
> [warning of trigraphs] Restricting the check to literal strings might > be worthwhile... Practical but a bit of a nuisance. Ideally one would like to use the same code for the checking and (if enabled) the actual interpretation of trigraphs. This is almost impossible to do if one wants to be selective about issuing warnings, because trigraph interpretation is defined to happen at a time when you don't even know whether you're inside a comment or not, never mind what kind of token you're examining. -- "For perfect safety... sit on a fence| Henry Spencer @ U of Toronto Zoology and watch the birds." --Wilbur Wright| {ihnp4,decvax,uunet!mnetor}!utzoo!henry
thorinn@diku.dk (Lars Henrik Mathiesen) (05/28/88)
In article <3655@pasteur.Berkeley.Edu> faustus@ic.Berkeley.EDU (Wayne A. Christopher) writes:
>Nobody has said what the existing practice is with regard to European
>character sets.

I posted an article the other day, but maybe it didn't get past mcvax.
I shall include it here.

>I think trigraphs are a trick of American terminal manufacturers who
>want to fool Europeans into thinking they can use their terminals for
>writing programs.

Think again: If we use American ASCII-only terminals on an operating
system and compiler designed for ASCII, as most of them are, there's no
problem in writing C code, only in getting our national characters in the
output. I think a similar confusion may be part of the reason why
trigraphs are so badly conceived.

My prior article follows; I apologize if it's been seen before, but I
haven't seen any signs that it has.

As one who regularly uses a non-ASCII terminal setup, I'd better explain a
little. In Danish (my native language) we have three `extra' letters which
we much prefer to use when writing Danish text - it is possible to get by
with two-letter replacements, but it's not very readable. By the way, these
are not `accented letters;' they are separate letters of the alphabet, with
their own place at the end of the sorting sequence. Much the same applies
to German, Swedish, Norwegian, and many other European languages.

That's not usually a problem as most modern terminals have provisions for
various national character sets, which are defined in an ISO standard. This
standard allows the glyphs at some eight or ten positions to vary,
including @, $, [, \, ], {, | and }. The latter six are used for the
non-ASCII letters in Danish, as they follow the other letters nicely.

So, the X3J11 people think, the poor Europeans can't use ASCII: we'll have
to invent some kludge to bring C to their benighted shores. The only
excuse for inventing something so horrible is that it only breaks a very
few programs, and that it won't be used anyway.

You see, over here we get by just fine without trigraphs. The less fortunate are stuck with a national character set, and have to put up with seeing the various punctuation as letters - they are not as visually distinctive (and the brackets and braces don't pair naturally), but with a little attention to layout one gets by quite well. And it's _much_ better than trigraphs. The lucky ones have terminals which can switch between ASCII and national character sets. If not for the warped minds of the terminal manufacturers, this would be the perfect solution. But we (at this institute) have yet to see a terminal with an escape sequence to switch character sets, or (and this is worse) one whose keyboard layout did _not_ change with the character set shown on the screen. (And none of them had LCD keytops). So we have to pay the importer to hack new PROMs to enable us to switch without moving the keys around. But I digress. By the way, I find that it's easier to read Danish with ASCII characters than it is to parse convoluted C code in Danish characters, so I hardly ever bother to switch any more. To make it pleasant to use C and national letters in the same file, there would have to be _convenient_ replacements for the ASCII characters in question, and it would have to allow the national letters to be used in identifiers (trigraphs don't). This cannot be done as an extension of the ASCII C input format because the national letters are punctuation in ASCII. Now we're talking about an alternate input format for C - we'll have to tell the compiler if a given source file is in the `old' or the `new' format. On the other hand this frees us to use extra keywords etc. The new format shouldn't use any characters that may be replaced in national character sets. The tokens [ ] { } | || (and in some compilers |=) must be replaced; one off-the-cuff possibility is (. .) beg end or cor (or=). We need a new pre-processor escape and a new string escape, which can't very well be keywords. 
// might be a possibility for both, as it's rare in C, but does it look
too much like JCL?

This new format could probably be implemented by a little lex
pre-pre-processor; national characters in identifiers would have to be
encoded somehow (e.g. using Q as an escape), increasing the identifier
length. This would cause problems with symbolic debuggers and short-name
compilers, but could easily be retrofitted on old compilers (write your
own cc ...). Oh well, it wouldn't be portable anyway. Hey, anybody from
GNU reading this?

By the way, Standard Pascal is designed to be possible to write without
specific ASCII characters: It allows (. .) for [ ] (indexing), and (* *)
for { } (comments). Since e.g. .5 is a legal constant, this may cause
unexpected parse errors for programmers who're unaware of the feature.
--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark [uunet!]mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.
nather@ut-sally.UUCP (Ed Nather) (05/29/88)
In article <8805271311.AA12359@decwrl.dec.com>, minow@thundr.dec.com (Martin Minow THUNDR::MINOW ML3-5/U26 223-9922) writes: [much clearly stated wisdom omitted] > > Because of the speed of conversion to ISO-8859 (and similar 8-bit > environments), coupled with ambiguities in the definition of trigraphs, > I recommended in my comments to the standard that they be dropped. > The committee rejected my arguments, but I would hope they reconsider > before release of the standard. > So would I. The many, many negative comments about trigraphs on the net, some from Europeans who would be expected to "benefit" from this new, ugly and totally untested idea, say it is not just bad, but very bad. Why mess up a fine job (according to dmr) of standardizing by quietly introducing something that is so ugly it will never be used? Of course, compilers which comply with the new standard might be advertized as "Not Including Trigraphs" to gain sales, in the same way the ads say "Not Copy Protected." -- Ed Nather Astronomy Dept, U of Texas @ Austin {allegra,ihnp4}!{noao,ut-sally}!utastro!nather nather@astro.AS.UTEXAS.EDU
bill@proxftl.UUCP (T. William Wells) (05/29/88)
In article <5215@ico.ISC.COM>, rcd@ico.ISC.COM (Dick Dunn) writes:
> [lots of stuff demonstrating how trigraphs break existing code].
> Replacing all the nasty characters with corresponding trigraphs gives:
>
>	if (line??(0??)=='??=' ??!??! line??(0??)=='%') ??<
>		prepro(&line??(1??));
>		linect++;
>	??>

Ugh. How horrible. However, I imagine that few programmers will actually
have to cope with this. As you suggest, the effort of using the trigraphs
would not be well rewarded; however, mechanical translation of programs
without the trigraphs into those with trigraphs would permit compilation
of existing programs (and those written offline) on a machine without the
characters.

> A general question: Has the trigraph mechanism been tried out, in real
> practice, anywhere prior to the introduction in X3J11? If so, I'd like to
> hear about how it's worked out.

I remember all too well writing APL on a machine that had two kinds of
terminals: those with the APL character set and those without; digraphs
were used for entry using the latter. I also remember the intense
competition to get the terminals with the APL set. BUT, we did write an
awful lot of code with the digraphs.
bts@sas.UUCP (Brian T. Schellenberger) (05/30/88)
In article <10949@apple.Apple.Com> lenoil@apple.UUCP (Robert Lenoil) writes:
|In article <5215@ico.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
|> Note also that it is common practice to use "?" in initializing strings
|> where the "?" positions will be replaced at execution time.
|
|Dick is dead right here. What is the justification for breaking existing
|programs when the ability to include untypeable characters into strings already
|exists via the \xxx mechanism? Instead of introducing a totally new notion
|(to C, anyway) of trigraphs, why not simply extend the backslash escape
|mechanism to be valid outside of strings? This would allow the use of #defines
|to perform the same function as trigraphs:
|
|#define ??< \173 /* open brace */
|#define ??> \175 /* close brace */

No, you are both DEAD WRONG here. This will break badly on the IBM, PR1ME,
and other non-ASCII machines. You should *NEVER* assume anything (that the
ANSI C standard doesn't guarantee) about the character set in portable
programs. And if your program isn't intended to be portable, ANSI is
irrelevant anyway.
--
--Brian, the man from Babble-on.
|Brian T. Schellenberger| ...!mcnc!rti!sas!bts     |
|104 Willoughby Lane    |work: (919) 467-8000 x7783|
|Cary, NC 27513         |home: (919) 469-9389      |
aledm@cvaxa.sussex.ac.uk (Aled Morris) (05/30/88)
In article <10949@apple.Apple.Com>, lenoil@Apple.COM (Robert Lenoil) writes:
> Instead of introducing a totally new notion
> (to C, anyway) of trigraphs, why not simply extend the backslash escape
> mechanism to be valid outside of strings?

I strongly agree with this proposal. Trigraphs introduce a totally new
feature into the language, which is going to take some getting used to.
I can see some bugs creeping into my strings (and I bet they won't be in
the strings that get used very often, so they won't be easy to spot!)

Just one minor problem---isn't the backslash character one of the glyphs
missing from the Invariant Code Set? Ah well....

Aled Morris
Janet/Arpa: aledm@uk.ac.sussex.cvaxa  | School of Cognitive Science
uucp: ..!mcvax!ukc!cvaxa!aledm        | University of Sussex
talk: +44-(0)273-606755 e2372         | Falmer, Brighton, England
"I'm living in the future/I feel wonderful/I'm tipping over backwards...
I'm so ambitious/I'm looking back/I'm running a race/and your the book i read"
karl@haddock.ISC.COM (Karl Heuer) (06/01/88)
The story so far: X3J11/ISO says that trigraphs have to exist because some important character sets don't include symbols like "#". However, some external representation of this character has to exist anyway. After all, I can do putc('#', outf) to a text stream and read it back in, whereupon it must compare equal to '#'; hence there is already some mapping, independent of trigraphs, between the source character set and the external character set. Why can't the translator use this mapping instead of trigraphs? Example: suppose I don't have '#' but I do have at least one character which is not part of ISO 646 (say, '$'). When writing to a text stream, in addition to possibly mucking around with newlines I convert '#' to the digraph '$='. I do the opposite conversion on input. There is no '$' in the source character set. My compiler and text editor are both written in portable C, and neither knows about this translation (only the stdio library does). There's no need for '$' to even be printable. Rebuttal, anyone? Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
boby@pyuxf.UUCP (robert yaeger) (06/01/88)
In article <3655@pasteur.Berkeley.Edu>, faustus@ic.Berkeley.EDU.UUCP writes: > Nobody has said what the existing practice is with regard to European > character sets. Do Europeans just use an ascii keyboard when they want > to use C? Or do they use u-umlaut for backslash (or whatever it is)? > Trigraphs are so ugly I can't believe anybody actually uses them, or > will use them if they're part of C. > > I think trigraphs are a trick of American terminal manufacturers who > want to fool Europeans into thinking they can use their terminals for > writing programs. Well just to let you know, trigraphs are indeed needed in the good ol' USA. Try writing MVS/c programs using a 3270! Fortunately, the only trigraphs needed are the ??( and ??) ( ie., [ and ] ). The practice we've adopted is to code trigraphs only when declaring arrays. All references to these arrays in the code use ptr arithmetic. This contains the ugliness of them to the declare sections. -- Bob Yaeger uucp : ...!inhp4!bellcore!pyuxf!boby phone: 1-201-699-5128
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/01/88)
In article <4314@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes: >However, some external representation of this character has to exist anyway. >After all, I can do putc('#', outf) to a text stream and read it back in, ... >Rebuttal, anyone? How can it be rebutted? It's exactly correct, and is why I think trigraphs were unnecessary in the first place. Note: the code might be written "putc('??=', outf)" but it's still a distinct character represented in the proposed C standard by the glyph "#". Sites that want to import strictly conforming programs have to be able to handle non-trigraph sources anyway. Trigraphs are a (poor) solution to the wrong problem.
minow@thundr.dec.com (Martin Minow THUNDR::MINOW ML3-5/U26 223-9922) (06/02/88)
Perhaps one of the trigraph experts could answer a simple question:

Suppose I've written a fully-compliant C compiler (that handles trigraphs)
that I sell to my friend in Visby, Sweden who needs trigraphs since
his language has national letters replacing the "[\]{|}" of US ASCII.

He writes his first program as:

	??= include <stdio.h>
	main() ??< printf("H{lsningar fr}n Visby p} \land!??/n"); ??>

When he runs my compiler, how does it know that the character whose value
is decimal 92 is a national letter, and not a backslash that crept in?
Do I need command line arguments or a ??=pragma? Are they permitted by
the standard?

Will all ??=include files be required to be distributed in their
trigraphed format?

Martin Minow
minow%thundr.dec@decwrl.dec.com
karl@haddock.ISC.COM (Karl Heuer) (06/03/88)
In article <343@pyuxf.UUCP> boby@pyuxf.UUCP (robert yaeger) writes:
>Well just to let you know, trigraphs are indeed needed in the good ol' USA.
>Try writing MVS/c programs using a 3270! Fortunately, the only trigraphs
>needed are the ??( and ??) ( ie., [ and ] ).

And what, pray tell, do you see on your terminal if you run the program

	#include <stdio.h>
	main() { printf("??(??)\n"); }

>The practice we've adopted is to code trigraphs only when declaring arrays.
>All references to these arrays in the code use ptr arithmetic.

I once wrote a program using a certain style because it happened to look
better on the one printer that was then available. I soon regretted that
decision.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
karl@haddock.ISC.COM (Karl Heuer) (06/03/88)
In article <8806021259.AA21135@decwrl.dec.com> minow@thundr.dec.com (Martin Minow THUNDR::MINOW ML3-5/U26 223-9922) writes:
>[A Swedish user] writes his first program as:
>	??= include <stdio.h>
>	main() ??< printf("H{lsningar fr}n Visby p} \land!??/n"); ??>
>When he runs my compiler, how does it know that the character whose value
>is decimal 92 is a national letter, and not a backslash that crept in?
>Do I need command line arguments or a ??=pragma? Are they permitted by
>the standard?

It's up to the implementation to specify the character set. You could
have one translator which believes `\' is a backslash, and a different one
which believes it's a national letter. You can select which of these two
implementations is to compile the program by using a command-line argument.

>Will all ??=include files be required to be distributed in their
>trigraphed format?

It isn't necessary; you could supply a different set of include files with
the two implementations. (E.g. `cc -{' could mean `interpret {|}[\] as
national characters and use /usr/include/swedish/*.h', while `cc +{' means
`interpret them as punctuation and use /usr/include/ascii/*.h'.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
ok@quintus.UUCP (Richard A. O'Keefe) (06/03/88)
In article <343@pyuxf.UUCP>, boby@pyuxf.UUCP (robert yaeger) writes:
> Well just to let you know, trigraphs are indeed needed in the good ol' USA.
> Try writing MVS/c programs using a 3270! Fortunately, the only trigraphs
> needed are the ??( and ??) ( ie., [ and ] ).
The irony of this is that the manufacturer's (IBM's) character set (EBCDIC)
*does* include codes for "[" and "]", it's just that a lot of their
equipment doesn't quite support their own character set.
The pre-ANSI method used in the SAS C compiler ("(|" for "[" and "|)" for
"]") strikes me as far more readable, and neither combination is otherwise
legal C.
gwyn@brl-smoke.UUCP (06/05/88)
In article <8805261740.AA00659@explorer.dgp.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
>Or, to put it another way, I fully expect all ansi-conforming compilers
>to come in two flavours: a strictly conforming one and a useful one.
I've already demonstrated that trigraph mapping is virtually a non-problem,
since accidental trigraph sequences in existing code are quite rare.
As long as we're guessing about the future, what I expect to see on many
systems is a choice (perhaps via "switches", more usefully just as a
separate name for the compile command) between (a) backward-compatible C,
probably with most of the newer non-conflicting Standard C features and
(b) fully-conforming Standard C. A vendor who tries to modify (b) to
provide the vendor's notion of what is "useful" will not be selling any
compilers to me, since I will need full (b) for my strictly conforming
applications. That is what having a standard is all about.
The main reason for (a) on UNIX systems would be to support Reiser cpp
abuse, which many programmers have been guilty of. Otherwise, Standard C
is pretty much upward compatible with old Random C.
karl@haddock.ISC.COM (Karl Heuer) (06/06/88)
In article <8805261740.AA00659@explorer.dgp.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
>In other words, a useful compiler will not implement trigraphs but will
>give a warning message when it encounters them.
If it issues warning messages, why should the compiler bother to implement
the old meaning? I'd think that the set of users who have programs
containing accidental trigraphs *and* can look at a compiler warning
without wanting to make it go away (by fixing their programs) would be
very small.
Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
boby@pyuxf.UUCP (robert yaeger) (06/06/88)
In article <4393@haddock.ISC.COM>, karl@haddock.UUCP writes:
> >Try writing MVS/c programs using a 3270! Fortunately, the only trigraphs
> >needed are the ??( and ??) ( ie., [ and ] ).
> And what, pray tell, do you see on your terminal if you run the program
>     #include <stdio.h>
>     main() { printf("??(??)\n"); }
The answer is :: (: is the 3270 representation for an unprintable character).
> >The practice we've adopted is to code trigraphs only when declaring arrays.
> >All references to these arrays in the code use ptr arithmetic.
> I once wrote a program using a certain style because it happened to look
> better on the one printer that was then available. I soon regretted that
> decision.
I don't see the connection here. If you decide to use trigraphs instead of
ptr arithmetic, then it won't matter what printer you use: the code will
always be ugly and hard to maintain.
BTW, there are other solutions:
1. You can hard-code the EBCDIC codes, i.e. x'ad' and x'bd', but these show
   up as unprintables on the 3270 and are also hard to edit after they are
   embedded in the source. This is what was done before trigraphs.
2. You can use an APL terminal, which does support these characters. As
   another posting has pointed out, these chars are in EBCDIC but are not
   supported on the 3270 terminal.
--
Bob Yaeger
uucp : ...!inhp4!bellcore!pyuxf!boby
phone: 1-201-699-5128