ok@quintus.UUCP (Richard A. O'Keefe) (05/28/88)
There has recently been some discussion of trigraphs in this newsgroup,
with distaste and apprehension being the predominant themes.  I share the
distaste, but I decided to do something about the apprehension.  Here is
a program which can be used to determine whether ANSI trigraph processing
will have an adverse effect on your code.  It is a filter which copies its
standard input to its standard output, replacing trigraphs by the
corresponding ASCII characters (even in comments).  Now you can fix your
programs _before_ the ANSI compiler arrives.

Be warned: it is _your_ responsibility to check this program before you
use it.  I believe it to be correct, but I'm not getting any money and
I'm not taking any responsibility.  Ying tong iddle i po!

-------------------------------- cut here --------------------------------
/*  File   : 3g.c
    Author : Richard A. O'Keefe @ Quintus Computer Systems, Inc.
    Updated: 27 May 1988
    Purpose: Trigraph elimination for C.

    The draft ANSI standard for C introduces so-called "trigraphs" so
    that certain characters in ASCII which are not in the ISO 646 base
    can be represented.  The trigraphs are

        ??= #       ??( [       ??/ \
        ??) ]       ??' ^       ??< {
        ??! |       ??> }       ??- ~

    Although there are other characters which could benefit from such
    treatment, C doesn't use them.  The ?? combination is left as is if
    it is not part of one of these sequences.

    Trigraphs are not a popular feature, and people are worried about
    whether their programs will work in ANSI C.  This program is meant
    to serve as a tool for finding out.

        3g <stdin >stdout

    replaces all the trigraph sequences in its standard input stream by
    the appropriate ASCII characters, and otherwise copies its standard
    input to its standard output.  To find out whether a program of
    yours will be adversely affected by trigraphs, filter it through
    this program and compare the result with the original.  In UNIX:

        #!/bin/sh
        #Usage: 3gc foobaz.c
        3g <$1 | diff - $1

    Note that the ease with which a filter like this can be written
    makes the claim that such a facility is needed in the _language_
    somewhat dubious.
*/

#include <stdio.h>

#define TGCHAR '?'

/*ARGSUSED*/
main(argc, argv)
    int argc;
    char **argv;
    {
        register FILE *card = stdin;
        register FILE *line = stdout;
        register int c;
        register int state;
        /*  There are three states:
            0 : not in a trigraph sequence
            1 : first character of a possible trigraph sequence read
            2 : second character of a possible trigraph sequence read
        */

        for (state = 0; (c = getc(card)) != EOF; ) {
            if (c == TGCHAR) {
                if (state == 2) putc(c, line); else state++;
            } else
            switch (state) {
                case 1:
                    state = 0;
                    putc(TGCHAR, line);
                    /* FALL THROUGH */
                case 0:
                    putc(c, line);
                    break;
                case 2:
                    switch (c) {
                        case '=':  c = '#';  break;
                        case '(':  c = '[';  break;
                        case '/':  c = '\\'; break;
                        case ')':  c = ']';  break;
                        case '\'': c = '^';  break;
                        case '<':  c = '{';  break;
                        case '!':  c = '|';  break;
                        case '>':  c = '}';  break;
                        case '-':  c = '~';  break;
                        default:
                            putc(TGCHAR, line);
                            putc(TGCHAR, line);
                            break;
                    }
                    putc(c, line);
                    state = 0;
            }
        }
        switch (state) {
            case 2: putc(TGCHAR, line); /* FALL THROUGH */
            case 1: putc(TGCHAR, line); /* FALL THROUGH */
            case 0: break;
        }
        exit(0);
    }
-------------------------------- cut here --------------------------------
rcd@ico.ISC.COM (Dick Dunn) (06/02/88)
> Here is a program which can be used to determine whether ANSI trigraph
> processing will have an adverse effect on your code...

...and then, instructions on how to use it in UNIX...
but I wonder why a UNIX user would write a program to do what a one-line
command can do?

And, all seriousness aside, shouldn't the program have been written with
trigraphs?  (It IS kind of fun to see what it looks like.)
-- 
Dick Dunn      UUCP: {ncar,cbosgd,nbires}!ico!rcd       (303)449-2870
   ...If you get confused just listen to the music play...
ok@quintus.UUCP (Richard A. O'Keefe) (06/03/88)
In article <5611@ico.ISC.COM>, rcd@ico.ISC.COM (Dick Dunn) writes:
> > Here is a program which can be used to determine whether ANSI trigraph
> > processing will have an adverse effect on your code...
> ...and then, instructions on how to use it in UNIX...
> but I wonder why a UNIX user would write a program to do what a one-line
> command can do?

Thanks I wasn't expecting, but sneers I can do without.

(1) I'd like to know what that one-line command is.
    tr can't do the job, because it maps single characters to single
    characters.  You can do it with sed, with a script like
        s/??</{/g
        ...
        s/??\//\\/g
    but that hardly counts as a one-line command.
    (One of the messages in comp.lang.c proposed
        sed -e "s;??\\([-=(/)'<!>]\\);?\\\\?\\1;g"
    *as a method of protecting against trigraphs*, but that is not what
    3g does.)

(2) Surely it must be obvious that the program was not provided for the
    sole benefit of UNIX users.  Most of the trigraph characters are
    used heavily by 'sh' and 'csh'.  So anyone who is using UNIX is
    *already* using some solution to the non-ISO-646-character problem.
    I don't see why people using VMS or MSDOS should be left without a
    tool for checking whether trigraphs will hurt them just because they
    aren't running UNIX.

> And, all seriousness aside, shouldn't the program have been written with
> trigraphs?  (It IS kind of fun to see what it looks like.)

(3) The program was provided to let people check whether their code
    would be adversely affected if and when trigraph-processing
    compilers arrived.  Trigraphs not being widely supported yet, using
    them in the program would have been a good way of making it
    unusable.  That was not my goal.