[comp.lang.c] Stop adding types, Let's remove Trigraphs instead!!

chuck@eneevax.UUCP (Chuck Harris) (05/12/87)

All this talk about adding different types might lead one to conclude that
nothing reasonable can be done in "c".  Bullbeep!!  If you really want
all the features that C++ has, USE C++!!

Seriously, why did ANSI decide to add trigraphs to the standard?
For those of you who entered late, ANSI decided that all of us
who have modern terminals wouldn't mind "Escaping" all of our question marks
so "c" can be used on model 33 teletypes (You know kerchunk..kerchunk...).

In ANSI:     '??=' -> '#'
             '??(' -> '['
             '??/' -> '\'
             '??)' -> ']'
             '??'' -> '^'
             '??<' -> '{'
             '??!' -> '|'
             '??>' -> '}'
             '??-' -> '~'

so that simple programs can look like:

??=include <stdio.h>
main()
??<
	printf("Hello World??/???/n");
??>

A lovely sight to be seen?
Looks almost as clear as pascal (oooh!..cheap shot).
I can think of much better ways to waste time in a lexer than trigraphs.

Let's become more modern with ANSI, not more archaic.

		Chuck Harris
		C.F. Harris - Consulting

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/13/87)

In article <879@eneevax.UUCP> chuck@eneevax.umd.edu.UUCP (Chuck Harris) writes:
>For those of you who entered late, ANSI decided that all of us
>who have modern terminals wouldn't mind "Escaping" all of our question marks
>so "c" can be used on model 33 teletypes (You know kerchunk..kerchunk...).

If you don't know what you're talking about, you should shut up --
that way you leave people merely wondering if you're a fool rather
than removing all doubt.

I don't happen to like trigraphs either, but at least I know
what problem they were intended to solve.

minow@decvax.UUCP (Martin Minow) (05/14/87)

In article <879@eneevax.UUCP> chuck@eneevax.umd.edu.UUCP (Chuck Harris)
suggests removing trigraphs from the ANSI spec.  I agree with him on
this (and have argued for it in postings to the net, and in a formal
comment to the ANSI committee).  Harris asks why this was done.  The
motivation was to make C accessible to European users who have terminals
that do not image #[\]^{|}~ according to ASCII_G.  Such terminals implement
ISO standards (ISO 646-1977, ISO draft 2022.2, ISO draft 6429.2)  [I'm
not sure if all of these are relevant].  Note that the "C definition"
of these code values is not internationally standardized, but may be
(and has been) redefined by national standards.  This allowed manufacturers
to produce terminals that image alphabets used for non-English languages.

In addition to the obvious problems with trigraphs, the current standard
(accepted as an ISO and ANSI standard) uses the US ASCII_G definition
for the 7-bit character set, reserving a 96 character set, called
ISO Latin 1, that contains the letters needed by almost all languages
used in Western Europe.  Latin 1 is very similar, though not identical,
to "Dec Multinational" (implemented in the VT220), and has been implemented
in Dec's new VT300 series terminals.  Information on this standard was
posted to net.internat and -- I believe -- comp.lang.c a few months ago.
In general, this defines an 8-bit character set, with "Ascii" in the
low 128 code positions and "Latin 1" in the upper range.  It is possible
to transmit text in existing 7-bit environments, by the way.

Hope this clarifies matters.

Martin Minow
decvax!minow

The above does not represent the position of Digital Equipment Corporation.

nw@amdahl.amdahl.com (Neal Weidenhofer) (05/14/87)

In article <879@eneevax.UUCP>, chuck@eneevax.UUCP (Chuck Harris) writes:
> Seriously, why did ANSI decide to add trigraphs to the standard?
> For those of you who entered late, ANSI decided that all of us
> who have modern terminals wouldn't mind "Escaping" all of our question marks
> so "c" can be used on model 33 teletypes (You know kerchunk..kerchunk...).

I know this has been discussed before but, in case you missed it,
trigraphs have nothing to do with archaic terminals.  They were put in
so that people with Non-English terminals could use C.  Many languages
use our same alphabet with a few added "letters" (typically letters
with some kind or other of accent marks added on.  French, for example,
can put "`" or "'" over "A", "E", "a", or "e"--eight "letters" that
English and ANSI don't have.)  Terminals (as modern as any) in the
countries that speak these languages use the codes that we use for
"{", "}", "|", etc. to represent these extra "letters".

>??=include <stdio.h>
>main()
>??<
>	printf("Hello World??/???/n");
>??>

is much to be preferred over:

e`include <stdio.h>
main()
E`
	printf("Hello World?e'");
E'

Unfortunately, our terminals won't overstrike so you'll have to imagine
the ` and ' on top of the e's they follow in my example.  The
correspondences were also picked at random and probably are not accurate.

>		Chuck Harris
>		C.F. Harris - Consulting

The opinions expressed above are mine (but I'm willing to share.)

			Regards,
Blame it on                     Neal Weidenhofer
     the Rolling Stones         ...{hplabs|ihnp4|seismo|decwrl}!amdahl!nw
				Amdahl Corporation
				1250 E. Arques Ave. (M/S 316)
				Sunnyvale, CA 94088-3470
				(408)737-5007
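
To make the point about reused code positions concrete, here is a minimal
sketch in C.  It assumes the French national variant of ISO 646
(NF Z 62-010) as the example character set; that choice, the table
entries, and the names french_glyph and render_french are assumptions
added for illustration and do not come from the posts above.  The
replacement glyphs are written as UTF-8 byte escapes so the source
itself stays in plain ASCII.

```c
#include <stddef.h>

/* UTF-8 text for the glyph shown at the code position where US ASCII
 * has the given character, or NULL if the glyph is unchanged.
 * Assumed table for one national variant (illustration only). */
static const char *french_glyph(char c)
{
    switch (c) {
    case '#':  return "\xC2\xA3";  /* pound sign */
    case '[':  return "\xC2\xB0";  /* degree sign */
    case '\\': return "\xC3\xA7";  /* c with cedilla */
    case ']':  return "\xC2\xA7";  /* section sign */
    case '{':  return "\xC3\xA9";  /* e with acute accent */
    case '|':  return "\xC3\xB9";  /* u with grave accent */
    case '}':  return "\xC3\xA8";  /* e with grave accent */
    case '~':  return "\xC2\xA8";  /* diaeresis */
    default:   return NULL;
    }
}

/* Write to dst how src would look on such a terminal, as UTF-8.
 * dst needs room for up to 2 * strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t render_french(const char *src, char *dst)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        const char *g = french_glyph(*src);
        if (g != NULL) {
            while (*g != '\0')
                dst[n++] = *g++;
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Under this assumed table, '^' keeps its usual glyph, so not every one of
the nine trigraph characters is affected in every national set.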

roy@gitpyr.gatech.EDU (Roy Mongiovi) (05/14/87)

In article <6603@amdahl.amdahl.com>, nw@amdahl.amdahl.com (Neal Weidenhofer) writes:
] >??=include <stdio.h>
] >main()
] >??<
] >	printf("Hello World??/???/n");
] >??>
] 
] is much to be preferred over:
] 
] e`include <stdio.h>
] main()
] E`
] 	printf("Hello World?e'");
] E'
] 
] Unfortunately, our terminals won't overstrike so you'll have to imagine
] the ` and ' on top of the e's they follow in my example.  The
] correspondences were also picked at random and probably are not accurate.

It seems to me that both are pretty illegible.
It looks a lot like substituting one unreadable code for another.
It seems to me that if you're going to break programs
(e.g. by coopting the ? character) to fix a problem,
the solution ought to be better than the problem.

I personally wouldn't program in a language that forced me
to use such kludges.  I find programming quite complicated
enough without adding illegible, non-intuitive character
mappings like the trigraphs.

But then, think of the advantages for the Obfuscated C contest.
                          :-)
-- 
Roy J. Mongiovi		Systems Analyst		Office of Computing Services
Georgia Institute of Technology		Atlanta GA  30332.	(404) 894-4660
 ...!{akgua, allegra, amd, hplabs, ihnp4, masscomp, ut-ngp}!gatech!gitpyr!roy

jpn@teddy.UUCP (John P. Nelson) (05/28/87)

>> Seriously, why did ANSI decide to add trigraphs to the standard?
>
>They were put in so that people with Non-English terminals could use C.

That may be, but they are still UGLY!  The proposed standard says that
a conforming C implementation must be able to accept source code with
trigraphs in it:  I would rather see it be optional, or implemented as
a preprocessor external to the compiler itself.

I sympathise with those with terminals that don't display {}#'~ etc,
but that is no reason to screw up the C language!

gwyn@brl-smoke.UUCP (05/28/87)

In article <4051@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
>That may be, but they are still UGLY!  The proposed standard says that
>a conforming C implementation must be able to accept source code with
>trigraphs in it:  I would rather see it be optional, or implemented as
>a preprocessor external to the compiler itself.

Absolutely; such issues are properly in the domain of the programming
environment, not the language.  My guess is that, unless there is
strong sentiment expressed FOR trigraphs at the Paris meeting,
trigraphs stand a good chance of being removed from the final ANS.

Incidentally, I've threatened to publish a portable (ANSI C!)
preprocessor that will process portable ANSI C source text files and
strip #pragma directives and substitute normal C source code
characters for trigraphs.  I'm waiting to see whether it's going to
be necessary.
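
The substitution half of such a filter is small.  What follows is a
minimal sketch, not Gwyn's program (which the thread does not show).
It assumes a string-to-string interface, it leaves #pragma stripping
out, and the names trigraph_value and trigraph_decode are hypothetical.
The mapping is the nine-entry table given at the top of the thread.

```c
#include <stddef.h>

/* Return the character that a trigraph ending in c stands for, or 0 if
 * two question marks followed by c do not form one of the nine
 * trigraphs. */
static char trigraph_value(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing each trigraph with its single character.
 * The output never grows, so dst needs strlen(src) + 1 bytes.
 * Returns the length of the decoded string. */
size_t trigraph_decode(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        char v;
        if (src[0] == '?' && src[1] == '?' &&
            (v = trigraph_value(src[2])) != 0) {
            dst[n++] = v;       /* three input characters become one */
            src += 3;
        } else {
            dst[n++] = *src++;  /* anything else is copied unchanged */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Calling trigraph_decode on each line read with fgets and writing the
result with fputs would turn this into a stand-alone filter.  The scan
is strictly left to right, so a question mark that is not the start of a
trigraph is simply copied and matching resumes at the next character.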

gwyn@brl-smoke.UUCP (05/28/87)

In article <5899@brl-smoke.ARPA> I wrote:
>I've threatened to publish a portable (ANSI C!)
>preprocessor that will process portable ANSI C source text files and
>substitute normal C source code characters for trigraphs.

I should have pointed out that the point of my threat is that the
dpANS GUARANTEES that I will be able to implement this portably
(in a hosted environment, of course), thereby reinforcing my
position that this issue is environmental.  Note that the dpANS
does not specify the particular character set encoding, for reasons
that should apply equally well to (not) specifying the characteristics
of text output devices.  It is implied that the set of glyphs set
forth for the C source character set in the ANS is somehow the
"official" set, but of course few sites (other than those with
Imagen 300dpi laser printers and a particular version of UNIX DWB)
will be able to exactly reproduce the glyphs.  Most will settle for
a good approximation.  What is considered to be a good approximation
is really not the business of the ANS, just as magtape source
interchange formats are outside the scope of the ANS.

I am particularly sensitive to these issues since I'm implementing
cryptanalytic software, where one has to be careful to properly
distinguish between encodings of alphabets, normal alphabets, etc.
For example, the glyph "J9" may be used to represent a single
letter, even though the underlying language does not normally
use that glyph for a letter.  This strikes me as quite analogous
to having locally-encoded "??!" (if that's what a site chooses to
use; frankly, I think they would invariably use something better)
represent the underlying "|" character of the normal C "plain-text"
alphabet.  (By the way, I'm not at all sure that "??!" stands for "|";
I left the dpANS at work and that detail doesn't matter for purposes
of this discussion.)

stuart@bms-at.UUCP (Stuart D. Gathman) (05/31/87)

In article <5899@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:

> Incidentally, I've threatened to publish a portable (ANSI C!)
> preprocessor that will process portable ANSI C source text files and
> strip #pragma directives and substitute normal C source code
> characters for trigraphs.  I'm waiting to see whether it's going to
> be necessary.

We are using the GNU cpp (from beta test version of cpp).  It has worked
well and is much faster than the junk we have been using.  One caveat:
it needs mucho hacking for 16-bit architectures.  We use it on our
Motorola 6350.

GNU handles trigraphs in the preprocessor.  The code to do so is not
very big.  What is the big deal?  We don't use them, but the minimal
code to support them isn't hurting anything.

GNU's handling of #pragma is very interesting.  Here it is for your
enjoyment:

/*
 * the behavior of the #pragma directive is implementation defined.
 * this implementation defines it as follows.
 */
#include <fcntl.h>
do_pragma ()
{
  close (0);
  if (open ("/dev/tty", O_RDONLY) != 0)
    goto nope;
  close (1);
  if (open ("/dev/tty", O_WRONLY) != 1)
    goto nope;
  execl ("/usr/games/hack", "#pragma", 0);
  execl ("/usr/games/rogue", "#pragma", 0);
  execl ("/usr/new/emacs", "-f", "hanoi", "9", "-kill", 0);
  execl ("/usr/local/emacs", "-f", "hanoi", "9", "-kill", 0);
nope:
  fatal ("You are in a maze of twisty compiler features, all different");
}
-- 
Stuart D. Gathman	<..!seismo!dgis!bms-at!stuart>

franka@mmintl.UUCP (Frank Adams) (06/09/87)

In article <4051@teddy.UUCP> jpn@teddy.UUCP (John P. Nelson) writes:
>[Trigraphs]
>
>That may be, but they are still UGLY!  The proposed standard says that
>a conforming C implementation must be able to accept source code with
>trigraphs in it:  I would rather see it be optional, or implemented as
>a preprocessor external to the compiler itself.

Is the standard really written in such a way that trigraphs *cannot* be
implemented as a pre-processor external to the compiler itself?  I find this
hard to believe.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

drw@cullvax.UUCP (06/10/87)

franka@mmintl.UUCP (Frank Adams) writes:
> Is the standard really written in such a way that trigraphs *cannot* be
> implemented as a pre-processor external to the compiler itself?  I find this
> hard to believe.

No, you can process them as a preprocessor, but still, any program
that contains "???" and doesn't use it to mean "?" is non-conforming.
So even if you don't use the preprocessor, you can't escape from its
shadow.  (Though I don't think trigraphs are really so bad.)

Dale
-- 
Dale Worley	Cullinet Software		ARPA: cullvax!drw@eddie.mit.edu
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
"President Nixon has just lowered the speed of light to 55 mph.  At what
speed can 2 colliding VW's of mass m = (number) produce a 3rd VW?"

karl@haddock.UUCP (Karl Heuer) (06/12/87)

In article <1266@cullvax.UUCP> drw@cullvax.UUCP (Dale Worley) writes:
>No, you can process [trigraphs with] a preprocessor, but still, any program
>that contains "???" and doesn't use it to mean "?" is non-conforming.  So
>even if you don't use the preprocessor, you can't escape from its shadow.
>(Though I don't think trigraphs are really so bad.)

A nit here.  "???" is not a trigraph for "?"; in particular "???=" represents
"?#" rather than "?=".  (I.e. it's the character "?" followed by the trigraph
"??=".)  To prevent an apparent trigraph from being interpreted as such, one
must backslash the second question mark.  For example, the program that prints
	The trigraph for \ is ??/
looks like this in ASCII:
	main() { printf("The trigraph for \\ is ?\?/\n"); }
or, in a less complete alphabet, like this:
	main() ??< printf("The trigraph for ??/??/ is ???/?/??/n"); ??>

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
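
The step from the ASCII form to the "less complete alphabet" form in
the example above is mechanical.  Here is a minimal sketch in C of that
direction, assuming the input is already valid ANSI C in which any
accidental trigraph has been backslashed as described; the name
trigraph_encode is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Encode src for a character set lacking # [ \ ] ^ { | } ~ by writing
 * each of those nine characters as its trigraph.  dst needs room for
 * up to 3 * strlen(src) + 1 bytes.  Returns the encoded length. */
size_t trigraph_encode(const char *src, char *dst)
{
    /* plain[i] is written as two question marks followed by third[i]. */
    static const char plain[] = "#[\\]^{|}~";
    static const char third[] = "=(/)'<!>-";
    size_t n = 0;

    for (; *src != '\0'; src++) {
        const char *hit = strchr(plain, *src);
        if (hit != NULL) {
            dst[n++] = '?';
            dst[n++] = '?';
            dst[n++] = third[hit - plain];
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Applied to the ASCII one-liner above, this yields the trigraph one-liner
character for character.  The question mark that precedes a backslash
needs no special handling, because three question marks in a row never
form a trigraph on their own.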