[net.works] 8 bit ASCII versus 16 bit supercode

LES@SU-AI@sri-unix (11/10/82)

From: Les Earnest <LES at SU-AI>
I come not to praise ASCII but to bury it.  It was a nice solution to the
limited need for (predominantly English) communication among teleprinters
with paper tape punching capability.  It shows signs of strain when one
extends or modifies it to work with other European languages or more
advanced terminals.

ASCII doesn't cope with Greek, Russian, Hebrew, or Arabic alphabets and is
incapable of dealing with ideographic languages such as Chinese and
Japanese.  Closer to home, it doesn't let you use integral signs or many
other symbols that mathematicians have come to know and love, nor can it
deal with the special symbols of meteorologists, astronomers,
astrologists, electronic engineers, or whatnot.

If we extend it to include control codes appropriate to today's terminals,
we will have to modify or perhaps repudiate it for the next generation.
Is it hopeless, then, to try to standardize symbol codes?  Certainly not.

What we need is standardization on a much grander scale.  For example, a
16 bit code (65K symbols) would provide enough space to allocate codes to
all the symbols currently in use on planet Earth with quite a bit of room
to spare.  Of course, developing a standard of this sort would be a
nontrivial exercise, but I believe that this issue must be faced in some
form before truly worldwide communications and digital libraries can come
into existence.

In addition to representing various symbol sets, the standard should also
include graphical primitives in the form of control codes with parameters.

Obviously, it would not be as efficient to communicate with 16 bit codes
as with 7 or 8 bit codes.  Fortunately, we can have both generality and
efficiency if, instead of standardizing communication codes, we standardize
a "code definition language" -- i.e. a way of describing a certain
communication code in terms of the 16 bit standard.  A simple form of this
idea would be to preface a communication with a list of (say) 7 bit codes
and their 16 bit supercode equivalents.  Once the correspondence had been
established, the rest of the communication could be sent in the 7 bit code.
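
To make the idea concrete, here is a rough sketch in C of how a sender
might build such a prefix table and how a receiver could use it.  The
names, the table layout, and the use of ASCII values as stand-ins for the
16 bit supercode are purely illustrative, not a proposed format.

#include <stdio.h>

#define TABLE_MAX 128                     /* at most 128 codes fit in 7 bits */

struct codedef {
    int            count;                 /* number of 7 bit codes defined    */
    unsigned short supercode[TABLE_MAX];  /* 16 bit value for each 7 bit code */
};

/* Assign (or reuse) a 7 bit code for a 16 bit supercode value. */
int define_code(struct codedef *t, unsigned short sc)
{
    int i;
    for (i = 0; i < t->count; i++)
        if (t->supercode[i] == sc)
            return i;                     /* already in the table */
    t->supercode[t->count] = sc;
    return t->count++;                    /* next free 7 bit code */
}

int main()
{
    struct codedef table;
    const char *text = "hello, world";
    int body[64], n = 0, i;

    table.count = 0;
    for (i = 0; text[i] != '\0'; i++)     /* pretend ASCII values are the supercode */
        body[n++] = define_code(&table, (unsigned short) text[i]);

    /* The receiver needs only the table plus the 7 bit body. */
    for (i = 0; i < n; i++)
        putchar(table.supercode[body[i]]);
    putchar('\n');
    return 0;
}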

Using this scheme, the more common variants of ASCII could each be
unambiguously defined in terms of the supercode, as could EBCDIC and
other abominations.  The existence of such a standard would
substantially aid the writing of translation programs among the more
commonly used codes.

Of course, it should not be necessary to redefine codes in every
transmission if the recipient can preserve code definitions.  Once a code
has been defined, it can be made the default for a given sender, or can be
given a short name that is invoked at the beginning of transmission.

If we had a standard code description language, it could also aid in
achieving more compact representations of text without loss of
information.  For example, we could code certain letter sequences so as to
exploit redundancies in the particular language.  As long as the code
definition is preserved with the text, the latter can be fully
reconstructed.
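
A toy sketch in C of that idea: a small table of common English letter
sequences (the sequences and the code range assigned to them are purely
illustrative) travels along as the code definition, so nothing is lost.

#include <stdio.h>
#include <string.h>

static const char *seq[] = { "the", "ing", "er", "th" };
#define NSEQ     4
#define SEQ_BASE 0xE000   /* illustrative code range reserved for sequences */

int compress(const char *text, unsigned short *out)
{
    int n = 0;
    while (*text) {
        int k, matched = 0;
        for (k = 0; k < NSEQ; k++) {
            int len = (int) strlen(seq[k]);
            if (strncmp(text, seq[k], len) == 0) {
                out[n++] = SEQ_BASE + k;          /* one code for the sequence */
                text += len;
                matched = 1;
                break;
            }
        }
        if (!matched)
            out[n++] = (unsigned short) *text++;  /* ordinary character */
    }
    return n;
}

void expand(const unsigned short *in, int n, char *out)
{
    int i;
    *out = '\0';
    for (i = 0; i < n; i++) {
        if (in[i] >= SEQ_BASE)
            strcat(out, seq[in[i] - SEQ_BASE]);   /* look up the definition */
        else {
            int len = (int) strlen(out);
            out[len] = (char) in[i];
            out[len + 1] = '\0';
        }
    }
}

int main()
{
    unsigned short coded[64];
    char back[64];
    int n = compress("the thing", coded);
    expand(coded, n, back);
    printf("%d codes -> \"%s\"\n", n, back);      /* 4 codes -> "the thing" */
    return 0;
}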

In summary, what we need is not an 8 or 9 bit ASCII but a standard that
can hold *all* the symbols that are used to represent the accumulated
knowledge of this planet.  A 16 bit supercode is about right.
We also need a code description language that is both efficient and
sufficiently general to represent the more useful communication codes.

Even if we could achieve a consensus on the need for such standards,
there would remain a great deal of work in assigning codes to the
major alphabets and ideographic sets.  Fortunately, there would be
enough room in the symbol space so that this task could be partitioned
and tackled in parallel by the interested parties.

Anyone want to start?

	Les Earnest

bcw (11/11/82)

From:	Bruce C. Wright @ Duke University
Re:	16-bit codes

While I think this effort would be laudable, I think that a 16-bit code
would probably be too inefficient for most purposes, even with prefixed
headers and so forth.  What would probably be more efficient (and would
also remove the silly restriction of 65K symbols which may run out some
time - Chinese has a *lot* of symbols, and you might want to encode some
Western words like articles and even idiomatic phrases as codes so as not
to store the individual characters) would be to make the code a variable-
length code (sort of like PDP-11 instruction op codes).  It wouldn't be
necessary to make the code variable down to the bit level;  it would probably
be sufficient to make it variable to something like the byte level or
thereabouts.  It might even be possible to make ASCII or even (shudder)
EBCDIC be special modes with lead-in codes.
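
A rough sketch in C of such a variable-length, byte-level code: values
below 128 take one byte, larger values take a lead-in byte plus a
continuation byte.  The particular layout is only an illustration; two
bytes cover 2^14 symbols here, and longer lead-ins could extend the range
indefinitely, which is exactly what removes the 65K ceiling.

#include <stdio.h>
#include <assert.h>

int encode(const unsigned short *sym, int nsym, unsigned char *out)
{
    int i, n = 0;
    for (i = 0; i < nsym; i++) {
        if (sym[i] < 0x80) {
            out[n++] = (unsigned char) sym[i];   /* one byte, high bit clear */
        } else {
            out[n++] = 0x80 | (sym[i] >> 7);     /* lead-in: high bit set, top 7 bits */
            out[n++] = sym[i] & 0x7F;            /* continuation: low 7 bits */
        }
    }
    return n;                                    /* bytes produced */
}

int decode(const unsigned char *in, int nbytes, unsigned short *sym)
{
    int i = 0, n = 0;
    while (i < nbytes) {
        if (in[i] < 0x80) {
            sym[n++] = in[i++];
        } else {
            sym[n++] = ((in[i] & 0x7F) << 7) | in[i + 1];
            i += 2;
        }
    }
    return n;                                    /* symbols recovered */
}

int main()
{
    unsigned short msg[3] = { 'A', 12354, 'B' }; /* one "wide" symbol in the middle */
    unsigned short back[3];
    unsigned char buf[8];
    int nbytes = encode(msg, 3, buf);
    int nsym   = decode(buf, nbytes, back);
    assert(nsym == 3 && back[1] == 12354);
    printf("%d symbols in %d bytes\n", nsym, nbytes);
    return 0;
}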

The only problem with this is the enormous amount of software which doesn't
know about such things...

			Bruce C. Wright @ Duke University

drd (11/11/82)

Is there any basis for thinking that 64K characters would be enough?