djbpitt@unix.cis.pitt.edu (David J Birnbaum) (04/28/91)
In article <ENAG.91Apr25234756@maud.ifi.uio.no> enag@ifi.uio.no (Erik Naggum) writes:
(a number of important technical and diplomatic comments that deserve careful consideration)

That message wisely points out that "my character set standard has more characters than yours" somewhat misses the point, as long as there is a procedure for adding overlooked or newly discovered characters when the time or need arises. 10646 is not a good or bad proposal because of the presence or absence of a single character.

But arguments about missing characters are not restricted to quibbles about the presence or absence of "FOOTWEAR CAPITAL LETTER SWOOSH WITH AIR BELOW." Specifically, certain architectural features, such as a provision for coding applied diacritics as separate machine characters, build in support for combinations that might not have been foreseen. In the case of Slavic Cyrillic, there are hundreds of base + diacritic combinations that appear in books published by the academies of sciences and other reputable, established institutions in several Slavic and non-Slavic countries. As has been mentioned on the ISO10646 ListServ, 10646 must decide whether to support this inventory; in its current incarnation it has decided not to. Unicode, on the other hand, does not have to decide explicitly; the base characters and diacritic characters (sic) are there for those who need them.

Separable diacritics are not merely a programming issue; they affect the adequacy of a character set's coverage in a fundamental way that is quite different from quibbles about single characters.

I have in front of me at the moment a recent monograph written in Russian by Professor Lars Steensland and published as volume 19 of Stockholm Slavic Studies, part of the Acta Universitatis Stockholmiensis. The monograph analyzes the linguistic features of a fifteenth-century accented Slavic Cyrillic manuscript.
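The difference between separate and precomposed diacritics can be sketched in a few lines of modern Python. The sketch assumes present-day Unicode, which encodes combining marks such as U+0301 COMBINING ACUTE ACCENT as characters in their own right, together with the standard `unicodedata` module. The sample letters and the helpers `compose` and `has_precomposed` are hypothetical, chosen only for illustration and not taken from Steensland's monograph. Any base + mark sequence can be written down. NFC normalization merges a sequence into one code point only where a precomposed character happens to exist.

```python
import unicodedata

# Illustrative sample only (hypothetical choice of letters and marks):
# Cyrillic base letters and combining accents of the kind that occur
# in accented Slavic Cyrillic texts.
BASES = ["\u0430", "\u0435", "\u043a", "\u0463"]  # a, ie, ka, yat
MARKS = ["\u0301", "\u0300", "\u0308"]  # combining acute, grave, diaeresis


def compose(base: str, *marks: str) -> str:
    """Spell a letter as base + separate combining marks, then apply NFC.

    NFC replaces the sequence with a precomposed code point only where one
    exists; otherwise the base and marks stay as separate characters.
    """
    return unicodedata.normalize("NFC", base + "".join(marks))


def has_precomposed(base: str, mark: str) -> bool:
    """True if base + mark collapses to a single precomposed code point."""
    return len(compose(base, mark)) == 1


if __name__ == "__main__":
    for base in BASES:
        for mark in MARKS:
            text = compose(base, mark)
            names = " + ".join(unicodedata.name(ch) for ch in text)
            print(f"{text}\t{names}")
    # Separate diacritics need one code point per base and one per mark;
    # a fully precomposed repertoire needs an extra code point per pair
    # (and still more once marks are stacked).
    print("separate code points:", len(BASES) + len(MARKS))
    print("extra precomposed pairs:", len(BASES) * len(MARKS))
```

When run, the script prints each combination with its character names. A pair such as ie + diaeresis collapses to one precomposed letter (CYRILLIC SMALL LETTER IO). A pair such as a + acute or yat + acute has no precomposed form and stays a base + mark sequence, but it is still representable.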
The thousands of specific citations from the manuscript are printed with diacritics. In the absence of any international standard for representing medieval Slavic Cyrillic with diacritics, Professor Steensland probably developed his own coding system. Does the ISO feel that publications by professors of Stockholm University are so insignificant that universal multilingual character sets should exclude them?

>I'd like to see some discussion on these topics, instead of the
>useless quibbling over which character set does or does not have
>"FOOTWEAR CAPITAL LETTER SWOOSH WITH AIR BELOW" or any other favorite
>"required" character.

I would also be interested in seeing some discussion of the issues raised in the posting to which I am responding. But it is misleading to caricature the question of adequacy of coverage as quibbling over the presence or absence of individual characters. The real issue is whether a multilingual character set should be able to support texts such as Professor Steensland's monograph. And this monograph was chosen exempli gratia, to demonstrate that Europeans have a stake in such matters even if the scholars who would be affected don't know what the ISO is and don't realize that such issues could be standardized. There are Soviet, German, Dutch, and U.S. publications of a similar nature ... and surely many others.

There are three basic choices:

1) We tell Stockholm University and other interested parties that there is no room for them in the computer age.

2) We support the needs of such users by adding hundreds of base + diacritic combinations. There will be resistance to such additions ("_I_ don't need these characters, and why don't those Slavists go out and get real jobs?"). If the combinations are added, there will be endless calls to add the various combinations that were inadvertently overlooked, resulting in a standard that will require frequent revision if it is to remain adequate.
Medieval orthography was not codified as clearly as modern writing, and it is not possible to enumerate all possible combinations exhaustively.

3) We support the needs of such users by supporting separate diacritics. This aspect of the ECMA proposal is a serious and constructive attempt to serve the needs of a broad range of users. I would like to quibble on another occasion about certain details of that proposal, but alternative #2 above seems unreasonably difficult, and #1 is beneath the dignity of the ISO.

This issue is not a squabble about a single missing character with a funny name. Separable diacritics affect not only programming concerns; they also affect, in a substantial way, the adequacy of the character set's coverage. And the latter must be paramount: we first decide what must be represented, and only then can we evaluate the ramifications of one or another system of representation. And what must be represented is written culture, not just vendors' databases of clients.

--David
=======================================================================
Professor David J. Birnbaum         djbpitt@vms.cis.pitt.edu  [Internet]
The Royal York Apartments, #802     djbpitt@pittvms.bitnet    [Bitnet]
3955 Bigelow Boulevard              voice: 1-412-687-4653
Pittsburgh, PA 15123 USA            fax:   1-412-624-9714