[comp.std.misc] Int'n Character Set

dankg@volcano.Berkeley.EDU (Dan KoGai) (06/05/90)

In article <3137@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

>By the way, "other Indo-European language character sets" would have to
>include many of the Indian languages, and the Devanagari scripts are not
>covered by ISO 646 or ISO 8859.  Indo-European is a _large_ language family.
>As far as I'm concerned, the great thing about ISO 8859/1 is that at long
>last it is _almost_ possible to type English text on a computer.

	Others have suggested that it will take up to 32 bits to include
virtually all languages.  But size is not the only problem.
	Local differences range from the sheer number of characters to
writing direction:  Arabic and Hebrew are written from right to left.
Mongolian is written top to bottom.  Japanese and Chinese allow all 3--left to
right, right to left and top to bottom.  Korean resembles Indo-European
writing in the sense that it has primitives for both consonants and vowels
(in Japanese kana each symbol is a consonant-vowel cluster), but each
syllable has to fit in one "character".
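
	A minimal sketch of how per-character writing direction can be
recorded as data.  It assumes Python and its standard unicodedata module,
neither of which is mentioned in this thread (Unicode itself postdates the
post); the helper name direction_class is made up for the example.  It covers
horizontal direction only and says nothing about vertical layout.

```python
import unicodedata

def direction_class(ch: str) -> str:
    """Return the Unicode bidirectional class of one character."""
    return unicodedata.bidirectional(ch)

# Sample characters chosen for illustration only.
samples = {
    "Latin A": "A",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "Kanji (U+6F22)": "\u6f22",
}

for name, ch in samples.items():
    # "L" = left to right, "R" = right to left,
    # "AL" = right to left (Arabic letter)
    print(f"{name}: {direction_class(ch)}")
```

	Here the direction is a property of each character rather than of the
whole text, which is one way mixed-direction lines could be handled.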
	And while the simple character set of English allows single-stroke
typing (or WYTIWYG--what you type is what you get), Chinese and Japanese have
so many characters that they require an on-line dictionary to type on a
"small" keyboard.  Justification in alphabetic languages is word by word,
with each word typically delimited by a space.  Chinese and Japanese, on the
other hand, need no word justification, since each character is so "dense" in
meaning that it's almost a word in itself.
	And I wonder how thorough and complete (or loose and incomplete) ISO's
proposal is.  Is it just a matter of mapping each character?  Even so, it's
tough to say whether a letter with a diacritic is one distinct character or
two characters, a base char plus a diacritical character.  That also applies
to such languages as Korean, where each "block" contains multiple phonemes
that form a syllable.  If those are represented just by combination, that
saves space (the same applies to Chinese, where each character is made up of
primitives).
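
	The "one character or base plus diacritic" question can be made
concrete with a small sketch.  It assumes Python's standard unicodedata
module and Unicode code points, which are not part of the ISO proposal
discussed here; compose_hangul is a hypothetical helper written for this
example, using the Hangul syllable arithmetic defined by Unicode.

```python
import unicodedata

# A precomposed letter versus base letter plus combining mark.
precomposed = "\u00e9"                                 # e with acute, one code point
decomposed = unicodedata.normalize("NFD", precomposed) # "e" + U+0301
print(len(precomposed), len(decomposed))               # 1 2

# A Korean syllable block versus its consonant/vowel primitives (jamo).
hangul = "\ud55c"                                      # the syllable "han"
jamo = unicodedata.normalize("NFD", hangul)
print([f"U+{ord(c):04X}" for c in jamo])               # ['U+1112', 'U+1161', 'U+11AB']

def compose_hangul(lead: int, vowel: int, tail: int = 0) -> str:
    """Build a syllable block from jamo indices (hypothetical helper).

    Unicode lays out 19 leading consonants x 21 vowels x 28 tail slots
    (27 final consonants plus "no final") starting at U+AC00.
    """
    return chr(0xAC00 + (lead * 21 + vowel) * 28 + tail)

# Both forms name the same text; normalization converts between them.
assert compose_hangul(18, 0, 4) == hangul
assert unicodedata.normalize("NFC", jamo) == hangul
```

	Storing only the primitives keeps the code table small, at the cost
of making one visible "character" span several codes.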
	And how about mixtures of languages?  It's common in Japanese text to
use alphabetic letters and sometimes even Hebrew characters (in math).  And
mixing is vital for such areas as foreign language education.
	I think Xerox's implementation is very elegant, but it still has
problems.  The Xerox way was inherited by the Mac (does Xerox include this
issue in its lawsuit against Apple?), and I have found the Mac very powerful
in foreign language processing--I have KanjiTalk, the Japanese OS, and there
you get Japanese not only in the text you type but also in menu definitions,
punctuation, date and unit definitions and more.  But it still lacks vertical
typing capability (that's up to the application, not the OS), which is
crucial for Japanese DTP.  And I'd be stuck if I wanted to use more than
Japanese and English.  Worst of all, it still takes a different OS to handle
other languages (ArabicTalk and KanjiTalk won't run together).
	So I ask my question again.  How many of these local differences does
ISO's new "Whole Earth Character Set" cover?

----------------
____  __  __    + Dan The "<- That still needs ascii pic in Usenet" Man
    ||__||__|   + E-mail:	dankg@ocf.berkeley.edu
____| ______ 	+ Voice:	+1 415-549-6111
|     |__|__|	+ USnail:	1730 Laloma Berkeley, CA 94709 U.S.A
|___  |__|__|	+	
    |____|____	+ "What's the biggest U.S. export to Japan?" 	
  \_|    |      + "Bullshit.  It makes the best fertilizer for their rice"

src@scuzzy.uucp (Source Admin) (06/08/90)

there is an article about computers using foreign character sets like chinese,
hebrew, arabic etc. in the may issue of BYTE. quite interesting! they also
talk about a program that displays your typing in hebrew, coptic and some
other charset simultaneously. have a look.
-- 
Heiko Blume		blume@scuzzy.UUCP	FAX   (+49 30) 882 50 65
Kottbusser Damm 28	blume@netmbx.UUCP	VOICE (+49 30) 691 88 93
D-1000 Berlin 61				TELEX 184174 intro d