[comp.lang.c] Trigraphs: a program

ok@quintus.UUCP (Richard A. O'Keefe) (05/28/88)

There has recently been some discussion of trigraphs in this newsgroup,
with distaste and apprehension being the predominant themes.  I share
the distaste, but I decided to do something about the apprehension.
Here is a program which can be used to determine whether ANSI trigraph
processing will have an adverse effect on your code.  It is a filter
which copies its standard input to its standard output, replacing
trigraphs by the corresponding ASCII characters (even in comments).
Now you can fix your programs _before_ the ANSI compiler arrives.

Be warned: it is _your_ responsibility to check this program before you
use it.  I believe it to be correct, but I'm not getting any money and
I'm not taking any responsibility.  Ying tong iddle i po!

-------------------------------- cut here --------------------------------
/*  File   : 3g.c
    Author : Richard A. O'Keefe @ Quintus Computer Systems, Inc.
    Updated: 27 May 1988
    Purpose: Trigraph elimination for C.

    The draft ANSI standard for C introduces so-called "trigraphs" so
    that certain characters in ASCII which are not in the ISO 646 base
    can be represented.  The trigraphs are
	??=	#
	??(	[
	??/	\
	??)	]
	??'	^
	??<	{
	??!	|
	??>	}
	??-	~
    Although there are other characters which could benefit from such
    treatment, C doesn't use them.  The ?? combination is left as is
    if it is not part of one of these sequences.

    Trigraphs are not a popular feature, and people are worried about
    whether their programs will work in ANSI C.  This program is meant
    to serve as a tool for finding out.

    3g <stdin >stdout
	replaces all the trigraph sequences in its standard input stream
	by the appropriate ASCII characters, and otherwise copies its
	standard input to its standard output.

    To find out whether a program of yours will be adversely affected by
    trigraphs, filter it through this program and compare the result with
    the original.  In UNIX:
	#!/bin/sh
	#Usage: 3gc foobaz.c
	3g <"$1" | diff - "$1"

    Note that the ease with which a filter like this can be written makes
    the claim that such a facility is needed in the _language_ somewhat
    dubious.
*/

#include <stdio.h>
#define TGCHAR '?'
				/*ARGSUSED*/
main(argc, argv)
    int argc;
    char **argv;
    {
	register FILE *card = stdin;
	register FILE *line = stdout;
	register int c;
	register int state;

	/*  There are three states:
	    0 : not in a trigraph sequence
	    1 : first character of a possible trigraph sequence read
	    2 : second character of a possible trigraph sequence read
	*/
	for (state = 0; (c = getc(card)) != EOF; ) {
	    if (c == TGCHAR) {
		if (state == 2) putc(c, line);
		else state++;
	    } else
	    switch (state) {
		case 1:
		    state = 0;
		    putc(TGCHAR, line);
		    /* FALL THROUGH */
		case 0:
		    putc(c, line);
		    break;
		case 2:
		    switch (c) {
			case '=':   c = '#';   break;
			case '(':   c = '[';   break;
			case '/':   c = '\\';  break;
			case ')':   c = ']';   break;
			case '\'':  c = '^';   break;
			case '<':   c = '{';   break;
			case '!':   c = '|';   break;
			case '>':   c = '}';   break;
			case '-':   c = '~';   break;
			default:    putc(TGCHAR, line);
				    putc(TGCHAR, line); break;
		    }
		    putc(c, line);
		    state = 0;
	    }
	}
	switch (state) {
	    case 2: putc(TGCHAR, line); /* FALL THROUGH */
	    case 1: putc(TGCHAR, line); /* FALL THROUGH */
	    case 0: break;
	}
	return 0;
    }

-------------------------------- cut here --------------------------------
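
For trying the substitution rules on small strings rather than whole
files, the same mapping can be sketched as a string-to-string function
in ANSI C.  This is a lookahead-based variant, not the three-state
machine used in 3g.c, and the names trigraph_value and untrigraph are
made up for this illustration.  Test inputs are best written with the
\? escape (for example "?\?=") so that a trigraph-aware compiler does
not rewrite them first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the original post): returns the character
   that the three-character sequence '?', '?', x stands for, or 0 if
   that sequence is not one of the nine trigraphs. */
static char trigraph_value(char x)
{
    static const char from[] = "=(/)'<!>-";
    static const char to[]   = "#[\\]^{|}~";
    const char *p;

    if (x == '\0')          /* strchr would match the terminator */
        return 0;
    p = strchr(from, x);
    return p ? to[p - from] : 0;
}

/* Copies src to dst with every trigraph replaced by its ASCII
   character.  dst must have room for strlen(src) + 1 bytes.
   Returns the number of trigraphs that were replaced. */
static size_t untrigraph(const char *src, char *dst)
{
    size_t replaced = 0;

    while (*src != '\0') {
        char v = 0;

        /* src[2] is only examined once src[1] is known to be '?',
           so the read never runs past the terminator. */
        if (src[0] == '?' && src[1] == '?')
            v = trigraph_value(src[2]);

        if (v != 0) {
            *dst++ = v;
            src += 3;
            replaced++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return replaced;
}
```

Calling untrigraph(text, buf) and looking at the return value plays the
same role as the diff in the 3gc script: a count of zero means the text
contains no trigraphs and will be unaffected.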

rcd@ico.ISC.COM (Dick Dunn) (06/02/88)

> Here is a program which can be used to determine whether ANSI trigraph
> processing will have an adverse effect on your code...
...and then, instructions on how to use it in UNIX...
but I wonder why a UNIX user would write a program to do what a one-line
command can do?

And, all seriousness aside, shouldn't the program have been written with
trigraphs?  (It IS kind of fun to see what it looks like.)
-- 
Dick Dunn      UUCP: {ncar,cbosgd,nbires}!ico!rcd       (303)449-2870
   ...If you get confused just listen to the music play...

ok@quintus.UUCP (Richard A. O'Keefe) (06/03/88)

In article <5611@ico.ISC.COM>, rcd@ico.ISC.COM (Dick Dunn) writes:
> > Here is a program which can be used to determine whether ANSI trigraph
> > processing will have an adverse effect on your code...
> ...and then, instructions on how to use it in UNIX...
> but I wonder why a UNIX user would write a program to do what a one-line
> command can do?

Thanks I wasn't expecting, but sneers I can do without.
(1) I'd like to know what that one-line command is.  tr can't do the
    job, because it maps single characters to single characters.  You
    can do it with sed, with a script like
	s/??</{/g
	...
	s/??\//\\/g
    but that hardly counts as a one-line command.  (One of the messages
    in comp.lang.c proposed
	sed -e "s;??\\([-=(/)'<!>]\\);?\\\\?\\1;g"
    *as a method of protecting against trigraphs*, but that is not what
    3g does.)
	
(2) Surely it must be obvious that the program was not provided for the
    sole benefit of UNIX users.  Most of the trigraph characters are used
    heavily by 'sh' and 'csh'.  So anyone who is using UNIX is *already*
    using some solution to the non-ISO-646-character problem.
    I don't see why people using VMS or MSDOS should be left without a
    tool for checking whether trigraphs will hurt them just because they
    aren't running UNIX.

> And, all seriousness aside, shouldn't the program have been written with
> trigraphs?  (It IS kind of fun to see what it looks like.)

(3) The program was provided to let people check whether their code would
    be adversely affected if and when trigraph-processing compilers arrived.
    Trigraphs not being widely supported yet, using them in the program
    would have been a good way of making it unusable.  That was not my goal.
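
To make the contrast in point (1) concrete, here is a rough C
equivalent of the protective sed command quoted there, assuming it is
meant to be applied to the contents of string and character literals,
where \? is a valid escape for a question mark.  The function name
protect_trigraphs is hypothetical.  Unlike 3g, it leaves the meaning of
the text unchanged and merely breaks up each would-be trigraph.

```c
#include <string.h>

/* Hypothetical helper: copies src to dst, rewriting each sequence
   '?', '?', x that would form a trigraph as '?', '\', '?', x.
   Every rewrite adds one character, so dst must have room for
   strlen(src) + strlen(src)/3 + 1 bytes. */
static void protect_trigraphs(const char *src, char *dst)
{
    /* The nine characters that can end a trigraph. */
    static const char third[] = "-=(/)'<!>";

    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' &&
            src[2] != '\0' && strchr(third, src[2]) != NULL) {
            *dst++ = '?';
            *dst++ = '\\';
            *dst++ = '?';
            *dst++ = src[2];
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Applied to source text outside literals this would produce invalid C,
which is one more reason it is a different tool from 3g: 3g shows what
a trigraph-processing compiler will see, while this rewrites literals
so that the compiler sees what was originally intended.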