LES@SU-AI@sri-unix (11/10/82)
From: Les Earnest <LES at SU-AI>

I come not to praise ASCII but to bury it.  It was a nice solution to the limited need for (predominantly English) communication among teleprinters with paper tape punching capability.  It shows signs of strain when one extends or modifies it to work with other European languages or more advanced terminals.

ASCII doesn't cope with the Greek, Russian, Hebrew, or Arabic alphabets and is incapable of dealing with ideographic languages such as Chinese and Japanese.  Closer to home, it doesn't let you use integral signs or many other symbols that mathematicians have come to know and love, nor can it deal with the special symbols of meteorologists, astronomers, astrologers, electronic engineers, or whatnot.  If we extend it to include control codes appropriate to today's terminals, we will have to modify or perhaps repudiate it for the next generation.

Is it hopeless, then, to try to standardize symbol codes?  Certainly not.  What we need is standardization on a much grander scale.  For example, a 16-bit code (65K symbols) would provide enough space to allocate codes to all the symbols currently in use on planet Earth, with quite a bit of room to spare.  Of course, developing a standard of this sort would be a nontrivial exercise, but I believe that this issue must be faced in some form before truly worldwide communications and digital libraries can come into existence.  In addition to representing various symbol sets, the standard should also include graphical primitives in the form of control codes with parameters.

Obviously, it would not be as efficient to communicate with 16-bit codes as with 7- or 8-bit codes.  Fortunately, we can have both generality and efficiency if, instead of standardizing communication codes, we standardize a "code definition language" -- i.e. a way of describing a given communication code in terms of the 16-bit standard.  A simple form of this idea would be to preface a communication with a list of (say) 7-bit codes and their 16-bit supercode equivalents.  Once the correspondence had been given, the rest of the communication could be sent in the 7-bit code.  Using this scheme, the more common variants of ASCII could each be unambiguously defined in terms of the supercode, as could EBCDIC and other abominations.  The existence of such a standard would substantially aid the writing of translation programs among the more commonly used codes.

Of course, it should not be necessary to redefine codes in every transmission if the recipient can preserve code definitions.  Once a code has been defined, it can be made the default for a given sender, or it can be given a short name that is invoked at the beginning of a transmission.

If we had a standard code description language, it could also help in achieving more compact representations of text without loss of information.  For example, we could assign codes to certain letter sequences so as to exploit redundancies in the particular language.  As long as the code definition is preserved with the text, the latter can be fully reconstructed.

In summary, what we need is not an 8- or 9-bit ASCII but a standard that can hold *all* the symbols that are used to represent the accumulated knowledge of this planet.  A 16-bit supercode is about right.  We also need a code description language that is both efficient and sufficiently general to represent the more useful communication codes.
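A minimal sketch of this preface scheme, written in Python for concreteness.  The 16-bit supercode values and the 7-bit assignments below are invented for illustration; they are not taken from any actual standard.

    def encode_message(supercodes, table):
        """Send a preface listing (7-bit code, 16-bit supercode) pairs,
        then the message body in the compact 7-bit code."""
        reverse = {big: small for small, big in table.items()}
        preface = sorted(table.items())
        body = [reverse[s] for s in supercodes]
        return preface, body

    def decode_message(preface, body):
        """Rebuild the 16-bit supercode sequence from the preface table."""
        table = dict(preface)
        return [table[small] for small in body]

    # Invented supercode points: 0x0041 for 'A', 0x0042 for 'B',
    # 0x222B for an integral sign (values chosen only for this example).
    table = {0x01: 0x0041, 0x02: 0x0042, 0x03: 0x222B}
    message = [0x0041, 0x222B, 0x0042]

    preface, body = encode_message(message, table)
    assert body == [0x01, 0x03, 0x02]
    assert decode_message(preface, body) == message

The same mechanism would also cover the compression idea above: the preface could map short codes to multi-symbol sequences rather than to single supercode points, and as long as the table travels with the text, the original can be fully reconstructed.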
Even if we achieve a consensus on the need for such standards, there will remain a great deal of work in assigning codes to the major alphabets and ideographic sets.  Fortunately, there would be enough room in the symbol space that this task could be partitioned and tackled in parallel by the interested parties.  Anyone want to start?

	Les Earnest
bcw (11/11/82)
From: Bruce C. Wright @ Duke University
Re: 16-bit codes

While I think this effort would be laudable, I think that a 16-bit code would probably be too inefficient for most purposes, even with prefixed headers and so forth.  What would probably be more efficient would be to make the code a variable-length code (sort of like PDP-11 instruction op codes).  This would also remove the silly restriction of 65K symbols, which may run out some time -- Chinese has a *lot* of symbols, and you might want to encode some Western words like articles, and even idiomatic phrases, as single codes so as not to store the individual characters.

It wouldn't be necessary to make the code variable down to the bit level; it would probably be sufficient to make it variable to something like the byte level or thereabouts.  It might even be possible to make ASCII or even (shudder) EBCDIC be special modes with lead-in codes.  The only problem with this is the enormous amount of software which doesn't know about such things...

	Bruce C. Wright @ Duke University
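A rough sketch of the byte-level variable-length idea, again in Python; the layout is made up for this example (codes below 128 take one byte, larger codes take two bytes with the high bit of the first byte serving as the lead-in marker) and is not any actual standard.

    def encode(codes):
        out = bytearray()
        for c in codes:
            if c < 0x80:                     # common symbols: one byte, high bit clear
                out.append(c)
            elif c < 0x8000:                 # rarer symbols: two bytes, lead-in bit set
                out.append(0x80 | (c >> 8))  # lead-in byte carries the high 7 bits
                out.append(c & 0xFF)         # second byte carries the low 8 bits
            else:
                raise ValueError("code out of range for this toy scheme")
        return bytes(out)

    def decode(data):
        codes, i = [], 0
        while i < len(data):
            b = data[i]
            if b < 0x80:                     # one-byte form
                codes.append(b)
                i += 1
            else:                            # two-byte form
                codes.append(((b & 0x7F) << 8) | data[i + 1])
                i += 2
        return codes

    sample = [0x41, 0x7F, 0x1234, 0x2202]
    assert decode(encode(sample)) == sample

Frequent symbols cost one byte and everything else two; a "special mode with lead-in codes" of the sort Wright mentions could presumably be grafted on by reserving a few of the two-byte values as mode switches.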
drd (11/11/82)
Is there any basis for thinking that 64K characters would be enough?