[comp.std.internat] "FOOTWEAR SWOOSH" and other missing characters

djbpitt@unix.cis.pitt.edu (David J Birnbaum) (04/28/91)
In article <ENAG.91Apr25234756@maud.ifi.uio.no> enag@ifi.uio.no 
(Erik Naggum) writes:

(a number of important technical and diplomatic comments that
deserve careful consideration)

That message wisely points out that "my character set standard
has more characters than yours" somewhat misses the point, as long
as there is a procedure to add overlooked or newly discovered
characters when the time or need arises.  10646 is not a good or bad
proposal based on the presence or absence of a single character.

But arguments about missing characters are not restricted to quibbles
about the presence or absence of "FOOTWEAR CAPITAL LETTER SWOOSH WITH
AIR BELOW."  Certain architectural features, such as a provision for
coding applied diacritics as separate machine characters, build in
support for combinations that might not have been foreseen.  In the
case of Slavic Cyrillic, there are hundreds of combinations of base +
diacritic that are represented in books published by the academies of
sciences and other reputable and established entities in several
Slavic and non-Slavic countries.

As has been mentioned on the ISO10646 ListServ, 10646 must decide
whether to support this inventory; in the current incarnation it has
decided not to.  Unicode, on the other hand, does not have to decide
explicitly; the base characters and diacritic characters (sic) are 
there for those who need them.  Separable diacritics are not merely
a programming issue; they influence the adequacy of coverage of a
character set in a fundamental manner that is quite different from
quibbles about single characters. 
 
I have in front of me at the moment a recent monograph written in
Russian by Professor Lars Steensland and published as volume 19 of
Stockholm Slavic Studies, part of the Acta Universitatis 
Stockholmiensis.  The monograph in question analyzes the linguistic
features of a fifteenth-century accented Slavic Cyrillic manuscript.
The thousands of specific citations from the manuscript are printed
with diacritics.  In the absence of any international standard for
representing medieval Slavic Cyrillic with diacritics, Professor
Steensland probably developed his own coding system.  Does the ISO
feel that publications by professors of Stockholm University are so
insignificant that universal multilingual character sets should
exclude them?

>I'd like to see some discussion on these topics, instead of the
>useless quibbling over which character set does or does not have
>"FOOTWEAR CAPITAL LETTER SWOOSH WITH AIR BELOW" or any other favorite
>"required" character.

I would also be interested in seeing some discussion of the issues
raised in the posting to which I am responding.  But it is misleading
to caricature the question of adequacy of coverage as quibbling over
the presence or absence of individual characters.  The real issue is 
whether a multilingual character set should be able to support texts
such as Professor Steensland's monograph.  And this monograph is only
an example, chosen to demonstrate that Europeans have a stake in such
matters even if the scholars who would be affected don't know what the
ISO is and don't realize that such issues could be standardized.
There are Soviet, German, Dutch, and U.S. publications of a similar
nature ... and surely many others.

There are three basic choices:

1) We tell Stockholm University and other interested parties that
there is no room for them in the computer age.

2) We support the needs of such users by adding hundreds of
base+diacritic combinations.  There will be resistance to such additions
("_I_ don't need these characters and why don't those Slavists go out and
get real jobs?").  If the combinations are added, there will be endless
calls to add the various combinations that were inadvertently
overlooked, resulting in a standard that will require frequent revision
if it is to remain adequate.  Medieval orthography was not codified as
clearly as modern writing, and it is not possible to enumerate
exhaustively all possible combinations.

3) We support the needs of such users by supporting separate diacritics.
This aspect of the ECMA proposal is a serious and constructive attempt
to serve the needs of a broad range of users.  I would like to quibble
on another occasion about certain details of that proposal, but
alternative #2 above seems unreasonably difficult and #1 is beneath the
dignity of the ISO.
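A minimal sketch of the arithmetic behind choices 2 and 3, written in
Python against the combining-mark model of present-day Unicode as
exposed by Python's standard unicodedata module (an assumption: this
is today's Unicode behaviour, not the 1991 ECMA or DIS 10646 drafts).
The base letters and marks are a small hypothetical sample, not an
inventory taken from Professor Steensland's monograph.

```python
import unicodedata
from itertools import product

# Hypothetical sample, not a real inventory: four Cyrillic base letters
# and three combining marks of the kind used in accented Slavic texts.
BASES = ["\u0430", "\u0435", "\u043e", "\u0443"]  # а, е, о, у
MARKS = ["\u0300", "\u0301", "\u0311"]  # combining grave, acute, inverted breve


def separable_size(bases, marks):
    """Code points needed when diacritics are separate characters (choice 3)."""
    return len(bases) + len(marks)


def precomposed_size(bases, marks, max_stack=1):
    """Code points needed to give every base followed by an ordered sequence
    of 1..max_stack marks (repeats allowed) its own character (choice 2)."""
    return sum(len(bases) * len(marks) ** k for k in range(1, max_stack + 1))


def lacks_precomposed_form(seq):
    """True if NFC normalization cannot fold the sequence into one code point."""
    return len(unicodedata.normalize("NFC", seq)) > 1


# With separable diacritics, any base + mark pairing is just a string,
# whether or not a precomposed character happens to exist for it.
pairs = [b + m for b, m in product(BASES, MARKS)]
missing = [p for p in pairs if lacks_precomposed_form(p)]

for stack in (1, 2, 3):
    print(f"up to {stack} mark(s): separable={separable_size(BASES, MARKS)}, "
          f"precomposed={precomposed_size(BASES, MARKS, stack)}")
print(f"{len(missing)} of {len(pairs)} pairs have no single precomposed code point")
```

The separable count stays at seven however many marks are stacked,
while the enumerated count grows multiplicatively with each extra
mark.  NFC also leaves most of these sample pairs as two code points
because Unicode has no precomposed character for them, which is the
gap that choice 2 would have to keep filling by revision.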

This issue is not a squabble about a single missing character with
a funny name.  It is about how separable diacritics affect not only
programming concerns but also, in a substantial way, the adequacy of
coverage of the character set.  And the latter must be paramount: we
first decide what must be represented and only then can we evaluate
the ramifications of one or another system of representation.  And
what must be represented is written culture, not just vendors'
databases of clients.

--David 
=======================================================================
Professor David J. Birnbaum         djbpitt@vms.cis.pitt.edu [Internet]
The Royal York Apartments, #802     djbpitt@pittvms.bitnet   [Bitnet]
3955 Bigelow Boulevard              voice: 1-412-687-4653
Pittsburgh, PA  15123  USA          fax:   1-412-624-9714