[net.works] Range of ASCII, alias ISO 646-1973

MACKAY@WASHINGTON@sri-unix (11/19/82)

From: Pierre MacKay <MACKAY at WASHINGTON>
Your 8-bit ASCII message of 10 Nov 1982 found its way
to me by a somewhat roundabout route, since I am not on the
WorkS list and, given the size of my mail file as it is,
I am hesitant to join it.
     You underestimate the range of even 7-bit ASCII.
In conjunction with the appropriate escape sequences from
ISO 2022-1973, alias (for all practical purposes) ANSI X3.41-1974,
the good old 7-bit table speaks several languages.
For instance:
Greek---ISO 5428-1980 (I haven't actually seen this yet).
Japanese---national standard JIS C 6220-1969 (katakana only, of course;
   in the form JISCII this is a true 8-bit code, with ASCII
   residing in columns 0..7 and katakana in columns 10..13).
Russian---GOST 13052-67, a dreadful aberration designed for
   SO/SI (shift-out/shift-in) coding, with the Cyrillic alphabet scrambled
   to match the similar-sounding Latin letters.  Why even a Commissar
   would want to do that to his own language is beyond me, but
   it is AUTHORITATIVE, under the circumstances.
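The SO/SI mechanism works as a locking shift: after SO (0x0E) the 94 graphic byte values are read from whatever set sits in G1, until SI (0x0F) returns to G0, and an ISO 2022 escape sequence such as ESC ) F decides which set is designated to G1. The sketch below shows that mechanism only. It assumes a toy two-letter G1 table registered under the private-use final byte '0'; that table is illustrative and is NOT the GOST 13052-67 layout, and the names decode_7bit, TOY_G1 and REGISTRY are hypothetical.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# ISO 2022 intermediate bytes that designate a 94-character set to G0..G3:
# ESC ( F -> G0, ESC ) F -> G1, ESC * F -> G2, ESC + F -> G3.
DESIGNATE_94 = {0x28: 0, 0x29: 1, 0x2A: 2, 0x2B: 3}


def decode_7bit(data: bytes, registry: dict) -> str:
    """Decode a 7-bit stream using ESC designations and SO/SI locking shifts.

    `registry` maps a final byte F to a {byte: character} table for the
    graphic cells 0x21..0x7E. Final bytes missing from the registry
    (e.g. 'B' for ASCII) leave that slot as plain ASCII in this sketch.
    """
    g = [None, None, None, None]  # None means "plain ASCII" here
    shifted = False               # False: G0 invoked (SI); True: G1 invoked (SO)
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 2 < len(data) and data[i + 1] in DESIGNATE_94:
            g[DESIGNATE_94[data[i + 1]]] = registry.get(data[i + 2])
            i += 3
            continue
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif 0x21 <= b <= 0x7E:
            table = g[1] if shifted else g[0]
            out.append(chr(b) if table is None else table.get(b, "?"))
        else:
            out.append(chr(b))  # space and other controls pass through
        i += 1
    return "".join(out)


# Illustrative G1 table (hypothetical, not from the standard): two Cyrillic
# letters placed on Latin positions, under the private-use final byte '0'.
TOY_G1 = {ord("a"): "\u0430", ord("b"): "\u0431"}
REGISTRY = {0x30: TOY_G1}

print(decode_7bit(b"\x1b)0ab \x0eab\x0f ab", REGISTRY))  # prints "ab аб ab"
```

The same byte values ('a', 'b') come out as Latin or Cyrillic depending only on the shift state, which is why an SO/SI code has to stay synchronized across the whole stream.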
     The Arabic case is chaos.  There is no reason why a good,
efficient Arabic script coding table cannot be included in a
7-bit range.  I am working with one now, but it is rather my own
invention.  It resembles some of the work done by ISO TC-46
and similar work done at the Library of Congress.  There was a
fine suggestion put forward at Riyadh, Saudi Arabia, about two and a
half years ago, but it came to nothing.  Meanwhile a dreadful Moroccan
notion, cobbled up out of a set of Linotype matrices, now has
a certain currency, in that it has been registered with ISO (whatever
that means) as Number 59, dated June 1, 1982.  It includes
four ISO 2022 escape sequences to designate G0, G1, G2, and G3 graphic
sets, but does not say what is to be done with all these alternatives.
ECMA has plunged into the same waters with an entirely different
proposal, which may even be worse.  They all seem to assume that
all Arabic ligature forms must be shown in the coding table, rather
as if Don Knuth's TeX were to require the elimination of the
open and close brace character positions so that you could code
the double-f ligatures directly.  The implications of microprocessor
technology have not yet got through.
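To make the TeX comparison concrete: a processor can store only the basic Arabic letters and choose each letter's isolated, initial, medial, or final shape from its neighbours at display time, so the contextual forms need not occupy code positions. The sketch below assumes a toy joining table of ten letters (Unicode code points are used only for convenience), and ignores the lam-alif ligature, vowel marks, and everything else a real shaper handles; positional_forms is a hypothetical name.

```python
# Toy joining table (assumption: a small subset of Arabic letters).
# 'D' = dual-joining: connects to both the preceding and following letter.
# 'R' = right-joining: connects only to the preceding letter.
JOINING = {
    "\u0627": "R",  # alif
    "\u062F": "R",  # dal
    "\u0631": "R",  # ra
    "\u0648": "R",  # waw
    "\u0628": "D",  # ba
    "\u062A": "D",  # ta
    "\u0633": "D",  # sin
    "\u0644": "D",  # lam
    "\u0645": "D",  # mim
    "\u0646": "D",  # nun
}


def positional_forms(word: str) -> list[str]:
    """Pick a contextual form for each letter from its neighbours."""
    forms = []
    for i, ch in enumerate(word):
        kind = JOINING.get(ch)
        if kind is None:
            forms.append("none")  # not a letter in this toy table
            continue
        prev = JOINING.get(word[i - 1]) if i > 0 else None
        nxt = JOINING.get(word[i + 1]) if i + 1 < len(word) else None
        # Every letter here can join backwards, but only if the preceding
        # letter is dual-joining and so reaches forward to it.
        joins_prev = prev == "D"
        # Only a dual-joining letter connects to a following letter.
        joins_next = kind == "D" and nxt is not None
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_next:
            forms.append("initial")
        elif joins_prev:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms


print(positional_forms("\u0633\u0644\u0645"))  # ['initial', 'medial', 'final']
```

Ten code positions plus a few lines of logic stand in for what would otherwise be roughly forty positional glyphs, which is the trade a microprocessor makes cheap.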
     Urdu, Pashto and Sindhi would probably overload a 7-bit table,
since you are really dealing with two incompatible alphabets mashed
into one in those cases.  Malay and Chinese-Turkish (as seen on the
lower right corner of PRC banknotes) will fit.  Persian, of course, will
fit easily, as will Ottoman Turkish, a language for which I have
a bizarre atavistic affection.  Western Europe and Hungary have
national versions of ISO 646 to account for heavily used diacriticals.
I don't know about Czech, which is a bit overloaded.  Modern Turkish
is a nice problem too.  
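A national variant of ISO 646 simply reassigns a handful of "national use" positions and leaves the rest of the table alone. The sketch below uses the German variant (DIN 66003) as its example; the Hungarian and other national tables are not reproduced, and decode_646 is a hypothetical helper. Note that the positions ASCII uses for the braces become umlauts, much as in the TeX comparison above.

```python
# Positions reassigned by the German ISO 646 variant (DIN 66003);
# every other 7-bit byte keeps its ASCII meaning.
GERMAN_646 = {
    0x40: "\u00a7",  # '@' -> section sign
    0x5B: "\u00c4",  # '[' -> A umlaut
    0x5C: "\u00d6",  # backslash -> O umlaut
    0x5D: "\u00dc",  # ']' -> U umlaut
    0x7B: "\u00e4",  # '{' -> a umlaut
    0x7C: "\u00f6",  # '|' -> o umlaut
    0x7D: "\u00fc",  # '}' -> u umlaut
    0x7E: "\u00df",  # '~' -> sharp s
}


def decode_646(data: bytes, variant: dict) -> str:
    """Decode 7-bit bytes using ASCII plus a national-variant override table."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the 7-bit range")
        out.append(variant.get(b, chr(b)))
    return "".join(out)


print(decode_646(b"Gr}~e", GERMAN_646))  # Grüße
```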
     I believe the Sanskrit-derived Indian languages would fit, and
the Tamil family would certainly fit in a 7-bit table.
     Chinese and Japanese Kanji would not.  The Japanese use a manageable
subset of Chinese ideographs and have already established a multi-byte
code.  One proposal for Chinese uses the 94 cells available in the
graphic area of ISO 646 in a three-level code: 94 books of 94 pages,
each page holding 94 characters, for 94 to the third power (830,584)
possible characters.  That should suffice even for Chinese.
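The arithmetic of the three-level scheme is easy to check. The sketch below assumes each level is carried by one byte in the graphic range 0x21..0x7E and that characters are numbered consecutively by book, page, and cell; the proposal's actual layout is not described here, and encode3/decode3 are hypothetical names.

```python
CELLS = 94    # graphic cells 0x21..0x7E in a 7-bit table
FIRST = 0x21  # first graphic cell ('!')

# Capacity of a three-level code: 94 * 94 * 94.
CAPACITY = CELLS ** 3


def encode3(n: int) -> bytes:
    """Map a character number 0..CAPACITY-1 to (book, page, cell) bytes."""
    if not 0 <= n < CAPACITY:
        raise ValueError(f"character number {n} out of range")
    book, rest = divmod(n, CELLS * CELLS)
    page, cell = divmod(rest, CELLS)
    return bytes([FIRST + book, FIRST + page, FIRST + cell])


def decode3(code: bytes) -> int:
    """Inverse of encode3: three graphic bytes back to a character number."""
    if len(code) != 3 or any(not FIRST <= b < FIRST + CELLS for b in code):
        raise ValueError("expected three bytes in 0x21..0x7E")
    book, page, cell = (b - FIRST for b in code)
    return (book * CELLS + page) * CELLS + cell


print(CAPACITY)                            # 830584
print(encode3(0), encode3(CAPACITY - 1))   # b'!!!' b'~~~'
```

Because every byte stays inside the graphic range, such a code never collides with the control characters, including ESC, SO, and SI.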
					--Pierre MacKay
-------