MACKAY@WASHINGTON@sri-unix (11/19/82)
From: Pierre MacKay <MACKAY at WASHINGTON>

Your 8-bit ASCII message of 10 Nov 1982 found its way to me by a somewhat roundabout route, since I am not on the WorkS list and, given the size of my mail file as it is, I am hesitant to join it.

You underestimate the range of even 7-bit ASCII. In conjunction with the appropriate escape sequences from ISO 2022-1973, alias (for all practical purposes) ANSI X3.41-1974, the good old 7-bit table speaks several languages. For instance:

Greek---ISO 5428-1980 (I haven't actually seen this yet.)
Japanese---National standard C6220-1969 (katakana only, of course; in the form JISCII this is a true 8-bit code, with ASCII residing in columns 0..7 and katakana in columns 10..13).
Russian---GOST 13052-67, a dreadful aberration set up for the use of SO and SI coding, with the Cyrillic alphabet scrambled to match the visually similar Latin letters. Why even a Commissar would want to do that to his own language is beyond me, but it is AUTHORITATIVE, under the circumstances.

The Arabic case is chaos. There is no reason why a good, efficient Arabic-script coding table cannot be fitted into a 7-bit range. I am working with one now, but it is rather my own invention; it resembles some of the work done by ISO TC-46 and similar work done at the Library of Congress. A fine suggestion was put forward at Riyadh, Saudi Arabia, about two and a half years ago, but it came to nothing. Meanwhile a dreadful Moroccan notion, cobbled up out of a set of linotype matrices, now has a certain currency, in that it has been registered with ISO, whatever that means, as Number 59, dated June 1, 1982. It includes four ISO 2022 escape sequences to identify G0, G1, G2, and G3 graphic sets, but does not say what is to be done with all these alternatives. ECMA has plunged into the same waters with an entirely different proposal, which may be even worse.
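For readers unfamiliar with the ISO 2022 machinery referred to above, here is a minimal sketch, not taken from any of the standards named. It assumes the usual ISO 2022 intermediates for designating a 94-character set into G0..G3 (ESC ( , ESC ) , ESC * , ESC + followed by a final byte F), the SO/SI locking shifts that switch between G0 and G1, and the 8-bit katakana layout described for JISCII, where a byte's code-table column is its high four bits. The final byte 0x4C in the usage example is hypothetical and is not the registered value of any of the sets mentioned; the helper names are illustrative only.

```python
from typing import List, Tuple

ESC, SO, SI = 0x1B, 0x0E, 0x0F

# ISO 2022 intermediate bytes that designate a 94-character set into G0..G3.
DESIGNATE_94 = {"G0": 0x28, "G1": 0x29, "G2": 0x2A, "G3": 0x2B}  # ( ) * +


def designate(slot: str, final: int) -> bytes:
    """Build the three-byte sequence ESC I F loading a 94-character set into slot."""
    if not 0x30 <= final <= 0x7E:
        raise ValueError("final byte must lie in columns 3..7")
    return bytes([ESC, DESIGNATE_94[slot], final])


def shift_decode(data: bytes) -> List[Tuple[str, int]]:
    """Tag each graphic byte of a 7-bit stream with the set (G0 or G1) it belongs to.

    SO (shift out) invokes G1 and SI (shift in) returns to G0, as in SO/SI coding.
    Simplification: every ESC is assumed to start a three-byte designation, which
    is skipped rather than interpreted.
    """
    current, i, out = "G0", 0, []
    while i < len(data):
        b = data[i]
        if b == ESC:
            i += 3  # skip ESC I F
            continue
        if b == SO:
            current = "G1"
        elif b == SI:
            current = "G0"
        elif 0x21 <= b <= 0x7E:  # graphic positions of the 7-bit table
            out.append((current, b))
        i += 1
    return out


def jis8_column(byte: int) -> str:
    """Classify a byte of the 8-bit katakana code by its code-table column."""
    column = byte >> 4
    if column <= 7:
        return "ASCII"
    if 10 <= column <= 13:
        return "katakana"
    return "other"
```

In the usage `shift_decode(designate("G1", 0x4C) + b"AB\x0eCD\x0fE")`, the bytes A, B and E come back tagged G0 and C, D tagged G1: the same 7-bit codes stand for different letters depending on the shift state, which is the whole trick of SO/SI coding.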
They all seem to assume that every Arabic ligature form must be shown in the coding table, rather as if Don Knuth's TeX were to require the elimination of the open and close brace character positions so that you could code the double-f ligatures directly. The implications of microprocessor technology have not yet got through.

Urdu, Pashto and Sindhi would probably overload a 7-bit table, since in those cases you are really dealing with two incompatible alphabets mashed into one. Malay and Chinese-Turkish (as seen on the lower right corner of PRC banknotes) will fit. Persian, of course, will fit easily, as will Ottoman Turkish, a language for which I have a bizarre atavistic affection.

Western Europe and Hungary have national versions of ISO 646 to account for heavily used diacriticals. I don't know about Czech, which is a bit overloaded. Modern Turkish is a nice problem too. I believe the Sanskrit-derived Indian languages would fit, and the Tamil family would certainly fit in a 7-bit table.

Chinese and Japanese Kanji would not. The Japanese use a manageable subset of Chinese ideographs and have already established a multi-byte code. One proposal for Chinese uses the 94 cells available in the graphic area of ISO 646 in a three-level code: there are 94 books of 94 pages each, each page holding 94 characters, or 94 to the third power (830,584) possible characters. That should suffice even for Chinese.

--Pierre MacKay
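The arithmetic of the three-level scheme can be sketched as a plain base-94 numbering. The message does not say how that proposal actually assigns characters to cells, so this assumes the simplest mapping: book, page and character are each one byte drawn from the 94 graphic positions 0x21..0x7E of ISO 646, sent in that order. `encode` and `decode` are hypothetical names.

```python
BASE = 94     # graphic cells per level in ISO 646 (0x21..0x7E)
FIRST = 0x21  # first graphic position, '!'


def encode(n: int) -> bytes:
    """Map a character number in [0, 94**3) to a (book, page, character) byte triple."""
    if not 0 <= n < BASE ** 3:
        raise ValueError("character number out of range")
    book, rest = divmod(n, BASE * BASE)
    page, char = divmod(rest, BASE)
    return bytes([FIRST + book, FIRST + page, FIRST + char])


def decode(code: bytes) -> int:
    """Invert encode: recover the character number from three graphic bytes."""
    if len(code) != 3 or any(not FIRST <= b < FIRST + BASE for b in code):
        raise ValueError("expected three bytes in 0x21..0x7E")
    book, page, char = (b - FIRST for b in code)
    return (book * BASE + page) * BASE + char
```

Every byte stays inside the printable 7-bit range, so such a code never collides with the control characters (including ESC, SO and SI) used for switching sets; the cost is three bytes per ideograph.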