[comp.std.c] trigraphs

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (06/07/88)

Let's consider the various combinations of compilers and terminals.
Commonly, either of these can be US-ASCII, 7-bit French-ASCII (or
some other national character set), or 8-bit ISO-ASCII.

1) if I am using a US-ASCII terminal, I have the full C source
character set at my fingertips and all three types of compilers
must accept these characters according to the way they appear
on my screen.  Thus, I have no need for trigraphs.

2) similarly, if I am using an ISO-ASCII terminal, the keyboard will
contain the full C source character set, and all three types of
compilers must accept these characters.  Thus, I still have no need
for trigraphs.

3) finally, if I am using a 7-bit French-ASCII terminal, the situation
is a little more complicated.

3a) if the compiler only knows about US-ASCII I have a choice of
entering "\" either as "??/" or as "cedilla-c".

3b) if the compiler uses ISO-ASCII, then again I must enter "\"
either as "??/" or as "cedilla-c".

3c) and finally, if the compiler knows about French-ASCII, then I
would think that I must enter "\" as "??/", since the compiler
will treat "cedilla-c" as a real letter.  But if I try to define
static char language??(??) = "FranCais";
where the "C" is actually the cedilla-c character, then strange
things will happen since the standard says that the character set
must include the "\" character, and so the string will actually
contain "Fran\ais", which is "Fran<beep>is".  Thus again I still
have the choice of entering "\" as either "??/" or as "cedilla-c".
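To make case 3c concrete, here is a minimal C sketch of what the compiler actually receives. It assumes the French ISO 646 variant, in which the cedilla-c key sends code 0x5C, the code US-ASCII assigns to the backslash. The array name comes from the example above, with language??(??) already written as the [] it stands for.

```c
/* What the compiler sees when "Francais" (with a cedilla-c) is typed
   on a 7-bit French ISO 646 terminal.  Assumption: that variant puts
   the cedilla-c at code 0x5C, which US-ASCII uses for the backslash.
   The backslash then starts the escape sequence \a (alert, BEL). */
const char language[] = "Fran\ais";

/* Resulting contents: 'F' 'r' 'a' 'n' '\a' 'i' 's' '\0' */
```

The string holds seven characters, not eight: the backslash and the following a collapse into a single alert character, which is the "Fran<beep>is" described above.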

So, putting this all together, regardless of what the compiler's
character set is, it is only the French-ASCII terminal that has
any need of the trigraphs.  Now, on such a terminal I cannot
use the cedilla-c character as anything but a back-slash since
all three types of compilers must interpret this as a back-slash,
and not as a cedilla.

So, the only case that needs trigraphs is the French-ASCII terminal,
and such a terminal will have nine keys that I am better off not
using since they appear to give me something that I don't really
get.

People using French have three choices.  Use the trigraphs and
avoid those 9 keys; use those 9 keys, remembering their special
meanings and forget about trigraphs; or get a different terminal
and forget about trigraphs.

That reduces the cases that need trigraphs to those that have
French-ASCII terminals and that also prefer to avoid using the
national keys.

From what I can gather, there are not many people still buying
French-ASCII terminals and those that have such terminals seem
to prefer using the funny characters to using the trigraphs.
Consider that at the moment trigraphs don't even exist outside
the minds of the X3J11 Committee, and decide how many people
that now use the funny characters and are going to switch to
using trigraphs.

The number of people that would actually use trigraphs must
be amazingly small.  For what it is costing the Committee in
time, the publishers in paper, the net in shipping articles
denouncing trigraphs, and the readers in time to read these
articles, I'm sure it would be cheaper if we all chipped in
and bought new terminals for those few individuals and then
completely dropped the concept of trigraphs from the Standard.

daveb@geac.UUCP (David Collier-Brown) (06/08/88)

In article <19345@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
| Let's consider the various combinations of compilers and terminals.
| Commonly, either of these can be US-ASCII, 7-bit French-ASCII (or
| some other national character set), or 8-bit ISO-ASCII.

[ case analysis elided]

| That reduces the cases that need trigraphs to those that have
| French-ASCII terminals and that also prefer to avoid using the
| national keys.

| Consider that at the moment trigraphs don't even exist outside
| the minds of the X3J11 Committee, and decide how many people
| that now use the funny characters and are going to switch to
| using trigraphs.

  Ok, can someone quote the approximate reasoning behind the
consideration of trigraphs?  
  As Ray has made a good case against the problem's existence, I
therefore wonder
	1) if some "outside" body has dictated that the standard
committee "solve" it[1]  or 
	2) if the committee merely misestimated the significance of
the problem.

--dave c-b
[1] Suggested without proof earlier in the discussion, source not
    recorded.
-- 
 David Collier-Brown.  {mnetor yunexus utgpu}!geac!daveb
 Geac Computers Ltd.,  | "His Majesty made you a major 
 350 Steelcase Road,   |  because he believed you would 
 Markham, Ontario.     |  know when not to obey his orders"

swarbric@tramp.Colorado.EDU (Frank Swarbrick) (06/09/88)

I'm curious, does the ISO-ASCII standard have foreign characters such as
cedilla-c, the characters with umlauts, accents, etc.?  I know that IBM-PC's
have a way for you to get characters such as these (by using Alt and the
numeric keypad), but Apples, Commodores, many terminals, etc. don't allow
them at all.  I think it would be great if all computers/terminals could
generate these in some way or another, but I guess it's more than a little
too late for that...

s-set, anyone?

Frank Swarbrick (and, yes, the net.cat)           swarbric@tramp.Colorado.EDU
...!{ncar|nbires}!boulder!tramp!swarbric
"...This spells out freedom, it means nothing to me, as long as there's a PMRC"

guido@cwi.nl (Guido van Rossum) (06/09/88)

In article <...> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>From what I can gather, there are not many people still buying
>French-ASCII terminals and those that have such terminals seem
>to prefer using the funny characters to using the trigraphs.
>Consider that at the moment trigraphs don't even exist outside
>the minds of the X3J11 Committee, and decide how many people
>that now use the funny characters and are going to switch to
>using trigraphs.

Although I would love to see that Ray is right, there is one unproven
premise here: "not many people are still buying French-ASCII terminals".
Here at CWI in Holland we usually have to fight to get US style
keyboards on our equipment instead of Dutch national keyboards.  I have
the feeling that this might be the same or worse in other European
countries, perhaps more so than in Canada (an "international" standard
requires agreement from more countries than Canada and the US :-).

I can't believe that in France, for instance, with a large autonomous
computer industry, many US style keyboards are sold.  Especially since
the number of keyboards used for data entry will always outnumber those
used for programming (unless the software crisis really gets a hold of
us :-), I'm not so sure US-ASCII keyboards will win.  Would a company
with lots of data typists and some programmers buy special keyboards for
them?  Those programmers will then have to get used to both keyboard
styles used in their organization (if they are involved in any form of
user support).

A different solution of the problem would be a tendency for keyboards to
comprise both national and US-ASCII characters, in an ISO-ASCII set.  If
this is the development Ray is referring to, I just hope he's right.
Neither the VAXstation 2000s nor the Sun-3s we have here have anything
but US-ASCII (and zillions of unused function keys).
--
Guido van Rossum, Centre for Mathematics and Computer Science (CWI), Amsterdam
guido@piring.cwi.nl or mcvax!piring!guido or guido%piring.cwi.nl@uunet.uu.net

alex@umbc3.UUCP (06/09/88)

In article <19345@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>Let's consider the various combinations of compilers and terminals.
>Commonly, either of these can be US-ASCII, 7-bit French-ASCII (or
>some other national character set), or 8-bit ISO-ASCII.

	I thought that tri-graphs were invented for IBM (EBCDIC) terminals,
and that IBM deserved them.





-- 
					:alex.

nerwin!alex@umbc3.umd.edu
alex@umbc3.umd.edu

daveh@marob.MASA.COM (Dave Hammond) (06/13/88)

In article <19345@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>So, putting this all together, regardless of what the compiler's
>character set is, it is only the French-ASCII terminal that has
>any need of the trigraphs....
>..........I'm sure it would be cheaper if we all chipped in
>and bought new terminals for those few individuals and then
>completely dropped the concept of trigraphs from the Standard.

I have been following the trigraphs discussion in comp.unix.wizards closely
to try and determine what exactly a trigraph is and why it has caused
so much discussion. Not being a wizard, I have no intention of interrupting
an ongoing discussion just to say 'hey folks, what is this thing?'.

If all this brouhaha is over a method of representing non-standard characters
on non-standard terminals in minority situations, then I (like rbutterworth)
feel there has been far too much ado over a trivial problem (not meaning to
trivialize the French terminals, mind you). If there is a farther-reaching
concept which I am not grasping, please e-mail a definition of trigraphs.

Dave Hammond
UUCP:   ...!marob!daveh
--------------------------------

daniels@teklds.TEK.COM (Scott Daniels) (06/17/88)

I was at the meeting in which trigraphs first came in, and the reason
I voted for them is fairly simple.  They were not presented as a means for
people who actually wrote C source to deal with missing characters, but as
a means for mechanical translators to pass un-encodable characters.  The
setup I imagine actually being used is:

	Programmers in ??land whose  national character set uses { for the
all-important qz ligature  (and who write comments using this a lot) happen
to have a graphic on the $ character which looks just fine as an open brace.
The programmers code away in this format locally, and (having hacked their
C compiler) everything works out.  When they decide to port their code to
another country, they can mechanically translate those chars to the proper
trigraph, and thus (1) mail source code, and (2) rely on the destination to
use their best guess for those characters.

	It was considered a great advantage by many that the trigraphs 
chosen were ugly: this meant that nobody would be tempted to write with
them, they were only for mechanical translations (a sort of least-common-
denominator format).
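The mechanical translation described above might look something like the following minimal C sketch. The function names are hypothetical, not taken from any X3J11 document. It spells each of the nine characters that ISO 646 national variants may replace as its trigraph and copies everything else unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the trigraph for one of the nine characters
   ISO 646 variants may lack, or NULL if c can be sent as-is.  The
   \? escape keeps these literals correct whether or not the compiler
   building this file processes trigraphs itself. */
static const char *trigraph_for(char c)
{
    switch (c) {
    case '#':  return "?\?=";
    case '[':  return "?\?(";
    case '\\': return "?\?/";
    case ']':  return "?\?)";
    case '^':  return "?\?'";
    case '{':  return "?\?<";
    case '|':  return "?\?!";
    case '}':  return "?\?>";
    case '~':  return "?\?-";
    default:   return NULL;
    }
}

/* Hypothetical translator: copies C source text from in to out,
   spelling each of the nine characters as a trigraph.  out must
   hold 3 * strlen(in) + 1 bytes.  Returns the encoded length. */
size_t encode_trigraphs(const char *in, char *out)
{
    size_t n = 0;
    for (; *in != '\0'; in++) {
        const char *t = trigraph_for(*in);
        if (t != NULL) {
            memcpy(out + n, t, 3);   /* every trigraph is 3 chars */
            n += 3;
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return n;
}
```

Because each trigraph ends with the character that identifies it, a ? already sitting in front of an encoded character still decodes correctly: ?[ becomes ???( and replacement, scanning left to right, finds no trigraph in the first three characters and turns the last three into [.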

Scott Daniels	(I was only briefly on the committee, another startup died)
		-daniels@teklds.TEK.COM (or @teklds.UUCP)

scs@athena.mit.edu (Steve Summit) (06/17/88)

Here's what I don't understand about trigraphs in character
strings (the only kind I'm worried about): of what possible
utility are they?  As I understand it, trigraphs let you utter
characters, which you need in C, which your local terminal
doesn't understand.  However, the thing you usually do with
strings is print them out (usually on your local terminal) so if
your local terminal can't handle the character, why is it
important to have a special way to encode it within a string?

If I am overlooking some obvious or oft-discussed fact, or if I
am repeating Ray Butterworth's argument, please respond by mail
or not at all; the net has had about enough trigraph articles.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

karl@haddock.ima.isc.com (Karl Heuer) (04/19/89)

In article <10159@socslgw.csl.sony.JUNET> diamond@ (Norman Diamond) writes:
>In article <12629@haddock.ima.isc.com> karl@haddock (Karl Heuer) writes:
>>`printf("??=")' will output `#', not `??='.
>
>Yes indeed, such conversions take place even in strings.  I wonder how
>such programs execute in environments that don't have a '#' character.

The C constant expression ('#'), or its alternate spelling ('??='), even
though it might not correspond to any printable glyph, must have a value
distinct from any other character.  Since the implementation is allowed to
apply a fairly arbitrary mapping when writing to a text stream, it's entirely
possible for this to be written to the device as a digraph `$=', for example,
where `$' is any convenient unused value.  (It needn't even be printable,
though this would be convenient for a terminal device.)

Given such a mapping, and its inverse on input streams, it will appear to any
conforming C program *as if* the execution environment really did have a `#'
character.  In particular, an editor or compiler written in C would
automatically do the right thing.  This is why I consider trigraphs to be
unnecessary: a transparent mapping is already guaranteed to exist.  Alas, the
feature was too deeply entrenched in the Draft by the time I realized this.
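The transparent mapping described above can be sketched in a few lines of C. This is an illustration under stated assumptions, not any real implementation's text-stream code: the device is hypothetical, $ plays the role of the "convenient unused value", and, as an assumption added here, a literal $ is sent doubled so the mapping stays reversible for arbitrary data. In a real system this translation would sit inside the library's text-stream I/O, invisible to the program.

```c
#include <stddef.h>

/* Hypothetical device with no '#' glyph: '#' is sent as the pair
   "$=".  Assumption (not from the original post): a '$' in the data
   is sent as "$$" so that any text can be mapped back exactly. */
#define MAP_ESC '$'

/* Output direction.  out must hold 2 * strlen(in) + 1 bytes. */
size_t to_device(const char *in, char *out)
{
    size_t n = 0;
    for (; *in != '\0'; in++) {
        if (*in == '#') {
            out[n++] = MAP_ESC;
            out[n++] = '=';
        } else if (*in == MAP_ESC) {
            out[n++] = MAP_ESC;
            out[n++] = MAP_ESC;
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return n;
}

/* Input direction, the inverse of to_device.
   out must hold strlen(in) + 1 bytes. */
size_t from_device(const char *in, char *out)
{
    size_t n = 0;
    while (*in != '\0') {
        if (in[0] == MAP_ESC && in[1] == '=') {
            out[n++] = '#';
            in += 2;
        } else if (in[0] == MAP_ESC && in[1] == MAP_ESC) {
            out[n++] = MAP_ESC;
            in += 2;
        } else {
            out[n++] = *in++;   /* includes a lone trailing '$' */
        }
    }
    out[n] = '\0';
    return n;
}
```

For instance, #define X $1 goes to the device as $=define X $$1 and comes back unchanged, so a C editor or compiler running on that system never sees the difference.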

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

daniels@ogccse.ogc.edu (Scott David Daniels) (04/27/89)

One fact that seems not to have come out yet is that trigraphs as added
by X3J11 were an existing scheme that was proposed for adoption, not a
new design that simply seemed to be a good idea.  I suspect that is part
of the problem with the Danish proposal: the trigraphs we voted on were
something that had been in use for a couple of years, whereas the Danish 
scheme sounded like "this would be an even better idea:...", with very little
evidence that there would be no nasty surprises later.
-Scott Daniels (short-term X3J11 member many moons ago)

karl@haddock.ima.isc.com (Karl Heuer) (04/28/89)

In article <2469@ogccse.ogc.edu> daniels@ogccse.UUCP (Scott David Daniels) writes:
>One fact that seems not to have come out yet is that trigraphs as added
>by X3J11 were an existing scheme that was proposed for adoption, not a
>new design that simply seemed to be a good idea.

They were?  I was under the impression that X3J11 had invented them.  (In
which case they made a good argument for why the X3J11 charter generally
forbade such inventions.)

>... whereas the Danish scheme sounded like "this would be an even better...

What exactly was the Danish proposal, and in what ways is it alleged to be
better than the pANS trigraphs?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

minow@mountn.dec.com (Martin Minow) (04/28/89)

It was suggested that trigraphs were an established practice before ANSI
added them to the Draft Standard language definition.  Could someone
post a reference to a widely-distributed compiler that supported trigraphs
before 1984? As far as I know, none of pcc, Berkeley Unix, Decus C, VAX C
(VMS), or Think (Lightspeed) C for the Macintosh supports (or supported) trigraphs.

I had argued against them on comp.std.c and during all public comment periods
(though I've never actually received a written reply directly from the
committee). My argument is that ISO 646 is a dead standard, having been
supplemented by ISO 8859 (Latin-x).  In the first public review responses,
a half-dozen writers, including at least one from Sweden and one from Canada,
suggested removing trigraphs. The committee response -- in full -- was
  "The Committee discussed alternatives to trigraphs on a number
  of occasions, but always decided that they fill a need.  C must
  support a wide variety of terminals and keyboards, many of which
  lack the full C character set."
While I understand the issues and sympathize with the problems the USASCII-
specific characters pose for implementors (I am bilingual Swedish-English
and have worked as a programmer in Sweden), trigraphs pose unsolvable problems
for implementors and are as necessary today as a modified C for upper-case-only
terminals was in 1978 (when the VT05 and ASR33 were still in wide use).

Martin Minow
minow%thundr.dec@decwrl.dec.com
The above does not represent the position of Digital Equipment Corporation.

mcdonald@uxe.cso.uiuc.edu (04/28/89)

>One fact that seems not to have come out yet is that trigraphs as added
>by X3J11 were an existing scheme that was proposed for adoption, not a
>new design that simply seemed to be a good idea.  I suspect that is part

What do you mean by "existing scheme"? What significant compiler
(i.e. one selling more than 10000 copies per year) implemented
them? They seem to be the worst misfeature of ANSI C:
one that actually breaks working code. I have some code,
based on K&R C that uses the sequences ??(, ??), and ??! as
delimiters in a text file format - they are used in jillions
of string constants. Wouldn't trigraphs break such schemes?
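In C terms the answer is yes: under ANSI C a literal containing ??( is rewritten to [ before the string is even recognized. Below is a minimal sketch, with hypothetical names, of two standard ways to keep such delimiters: the \? escape sequence, which ANSI C introduced alongside trigraphs, and splitting the literal so that no three adjacent source characters form a trigraph.

```c
/* Delimiters from a K&R-era text format (names are hypothetical).
   Written naively, a trigraph-processing compiler would rewrite
   them to single characters.  Two ANSI C spellings that keep all
   three characters: */
const char open_delim[]  = "?\?(";    /* the \? escape sequence */
const char close_delim[] = "??" ")";  /* adjacent literals: the two
                                         pieces are joined only after
                                         trigraph replacement */
const char bar_delim[]   = "?\?!";
```

Either spelling yields the same three characters whether or not the compiler has trigraph processing enabled, so existing code can be converted mechanically.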

Doug McDonald

nevin1@ihlpb.ATT.COM (Liber) (04/29/89)

In article <2469@ogccse.ogc.edu> daniels@ogccse.UUCP (Scott David Daniels) writes:
>One fact that seems not to have come out yet is that trigraphs as added
>by X3J11 were an existing scheme that was proposed for adoption, not a
>new design that simply seemed to be a good idea.

Just wondering:  where exactly had they been used for C before??
                                (Is this a trigraph sequence? ^^^ :-))
The Rationale implies that the Committee came up with this solution on
their own.
-- 
 _ __	NEVIN ":-)" LIBER  nevin1@ihlpb.ATT.COM  (312) 979-4751  IH 4F-410
' )  )			 "I will not be pushed, filed, stamped, indexed,
 /  / _ , __o  ____	  briefed, debriefed or numbered!  My life is my own!"
/  (_</_\/ <__/ / <_	As far as I know, these are NOT the opinions of AT&T.

gwyn@smoke.BRL.MIL (Doug Gwyn) (04/29/89)

In article <12840@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>I was under the impression that X3J11 had invented them.

Yes, they were an X3J11 invention.
Similar (but not identical) schemes have been used for a variety of
purposes for a long time, of course.

>which case they made a good argument for why the X3J11 charter generally
>forbade such inventions.)

I don't think they're too bad for their intended purpose of C source file
transmission to sites having only limited character set support (ISO-646).
They would certainly be awful to have to use while writing programs, but
that's not the intention.  I personally don't think the C Standard needed
to address this particular issue at all, and certainly not when so much
public confusion and criticism resulted.  Since the introduction of
trigraphs, there has been further ISO code set standardization that may
have obviated the need for trigraphs, but in case there are limited code-
set environments still in use somewhere trigraphs may yet be of use.

daveb@gonzo.UUCP (Dave Brower) (04/30/89)

In article <229900002@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>What do you mean by "existing scheme"? What significant compiler
>(i.e. one selling more than 10000 copies per year) 

Just to be pedantic, at the time X3J11 formed, there were (and are
today) quite a number of "significant" compilers that ship in
substantially smaller numbers.  Like in the tens and hundreds.

-dB
-- 
"An expert is someone who's right 75% of the time"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb	daveb@gonzo.uucp

kemnitz@mitisft.Convergent.COM (Gregory Kemnitz) (05/04/89)

I just started reading this newsgroup (and don't have access to a lot of
the standards committee stuff at this time).  What is a trigraph??
How is one used??
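For anyone arriving with the same question: a trigraph is a three-character sequence beginning with ?? that an ANSI C compiler replaces, before any other processing of the source, with one of nine characters: ??= is #, ??( is [, ??/ is \, ??) is ], ??' is ^, ??< is {, ??! is |, ??> is }, and ??- is ~. So ??=include <stdio.h> is read as #include <stdio.h>. The minimal C sketch below imitates that replacement step on a string; the function name is hypothetical, and it is an illustration rather than any particular compiler's code.

```c
#include <stddef.h>
#include <string.h>

/* The nine trigraphs: "??" followed by TRIGRAPH_KEY[i] stands for
   TRIGRAPH_VALUE[i]. */
static const char TRIGRAPH_KEY[]   = "=(/)'<!>-";
static const char TRIGRAPH_VALUE[] = "#[\\]^{|}~";

/* replace_trigraphs is a hypothetical name, not a library function.
   It rewrites s in place, scanning left to right and turning every
   trigraph into its single character.  Returns the new length. */
size_t replace_trigraphs(char *s)
{
    size_t r = 0, w = 0;   /* read and write positions */

    while (s[r] != '\0') {
        /* The s[r + 2] != '\0' test matters: strchr would otherwise
           match the key string's own terminator. */
        if (s[r] == '?' && s[r + 1] == '?' && s[r + 2] != '\0') {
            const char *k = strchr(TRIGRAPH_KEY, s[r + 2]);
            if (k != NULL) {
                s[w++] = TRIGRAPH_VALUE[k - TRIGRAPH_KEY];
                r += 3;
                continue;
            }
        }
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```

Because the scan advances one character at a time, ??? followed by = becomes ?#: only the last three characters form a trigraph.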

						Greg Kemnitz