erik@srava.sra.co.jp (Erik M. van der Poel) (04/12/91)
I'm directing followups to comp.std.internat. I apologize to comp.std.c
readers for the current noise level, which I seem to have started.

Al Harkcom writes:
> Though the term EUC is used as the name of an encoding scheme, it is
> also the name used for the multibyte encoding of the JIS standard using
> SS2 and SS3 single shifts.

Yes, people often say "EUC" when they mean "Japanese EUC". That doesn't
mean that they are right. Think of it this way: EUC is the generic
international `class', while UJIS is a name for the particular Japanese
`instance'.

Also, you refer to "the JIS standard". This is rather misleading,
since several implementations use *two* JIS standards, namely JIS X
0208 (Kanji, etc) and the right-hand part of JIS X 0201 (`half-sized'
Katakana, etc).

> UJIS is the name used to refer to the 2 byte
> encoding of the EUC scheme JIS standard. The 2 byte (4 byte on HP) wide
> character encodings for Japanese are usually UJIS...

Perhaps we're getting confused because we are looking at different
documents. I got my information from a paper by Yasushi Nakahara,
"Nihongo Koodo No Genjo To Mondaiten", Jan. 1988. In this paper, he
says that UJIS was the name that the Sigma project gave to a Japanese
usage of EUC. He refers to codesets 1, 2 and 3 (i.e. not only 0208
Kanji, etc).

According to this paper, UJIS is not a 2 byte code. It is an encoding
in which characters require 1, 2 or 3 bytes each. I.e. it is an mb
code, definitely not a wc code.

--
Erik M. van der Poel                                      erik@sra.co.jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692
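To make the "1, 2 or 3 bytes per character" point concrete, here is a minimal sketch (in Python, purely for illustration) of scanning the multibyte form: in EUC the lead byte alone tells you the codeset and the character length. It assumes the usual Japanese EUC layout: codeset 0 is a single byte below 0x80, codeset 1 (JIS X 0208) is two bytes in 0xA1-0xFE, codeset 2 is introduced by SS2 (0x8E) plus one byte, and codeset 3 by SS3 (0x8F) plus two bytes. The function names are hypothetical. For what it's worth, Python's own codec registry lists `ujis` as an alias of its multibyte `euc_jp` codec.

```python
from __future__ import annotations

SS2 = 0x8E  # single shift 2: next byte is from codeset 2 (JIS X 0201 katakana)
SS3 = 0x8F  # single shift 3: next two bytes are from codeset 3


def euc_jp_char_len(buf: bytes, i: int) -> tuple[int, int]:
    """Return (codeset, length in bytes) of the EUC character at buf[i]."""
    b = buf[i]
    if b < 0x80:
        return 0, 1          # codeset 0: one byte, high bit clear
    if b == SS2:
        return 2, 2          # codeset 2: SS2 + one byte
    if b == SS3:
        return 3, 3          # codeset 3: SS3 + two bytes
    if 0xA1 <= b <= 0xFE:
        return 1, 2          # codeset 1: two bytes, both with high bit set
    raise ValueError(f"invalid EUC lead byte 0x{b:02X} at offset {i}")


def split_euc_jp(buf: bytes) -> list[tuple[int, bytes]]:
    """Split a Japanese EUC (UJIS) byte string into (codeset, bytes) pairs."""
    out = []
    i = 0
    while i < len(buf):
        cs, n = euc_jp_char_len(buf, i)
        tail = buf[i + 1:i + n]
        # Every byte after the lead byte must also be in 0xA1-0xFE.
        if len(tail) != n - 1 or any(not 0xA1 <= t <= 0xFE for t in tail):
            raise ValueError(f"malformed EUC character at offset {i}")
        out.append((cs, buf[i:i + n]))
        i += n
    return out
```

Feeding it an ASCII letter, a JIS X 0208 kanji, a half-sized katakana and an SS3 sequence yields codesets 0, 1, 2, 3 with lengths 1, 2, 2 and 3 bytes: a variable-length mb code.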
harkcom@spinach.pa.yokogawa.co.jp (04/15/91)
In article <1130@sranha.sra.co.jp> erik@srava.sra.co.jp (Erik M. van der Poel) writes:
=}Also, you refer to "the JIS standard". This is rather misleading,
=}since several implementations use *two* JIS standards, namely JIS X
=}0208 (Kanji, etc) and the right-hand part of JIS X 0201 (`half-sized'
=}Katakana, etc).

Actually, three popular codesets are JIS X 0201, 0208, and 0212.
JIS X 0212 is a set of additional kanzi.

=}Perhaps we're getting confused because we are looking at different
=}documents.
=} [...]
=}He refers to codesets 1, 2 and 3 (i.e. not only 0208
=}Kanji, etc).

Yes, I'm looking at the documentation from various software packages
which use the UJIS encoding. They refer to four code sets:

    G0: ASCII
    G1: KANZI (JIS X 0208)
    G2: HANKAKU (JIS X 0201)
    G3: GAIZI

All four code sets are 16 bits wide.

=}According to this paper, UJIS is not a 2 byte code. It is an encoding
=}in which characters require 1, 2 or 3 bytes each. I.e. it is an mb
=}code, definitely not a wc code.

I hate to disagree, but all of the implementations I have seen which
use an mb encoding refer to the Japanese EUC as EUC, and those which use
a wc encoding refer to it as UJIS (except, of course, HP, which refers
to both as UJIS).

Al
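Both descriptions can hold at once, because the mb form and the fixed-width wc form are two encodings of the same four code sets. The sketch below is a hypothetical 16-bit packing, not HP's 4-byte layout and not any particular vendor's wchar_t. It assumes codesets 1 and 3 are two-byte sets (e.g. JIS X 0208 and JIS X 0212 or gaizi) and uses the high bit of each byte as a codeset tag, so every 1-, 2- or 3-byte character fits in one 16-bit value and converts back losslessly. Function names are illustrative only.

```python
from __future__ import annotations

SS2, SS3 = 0x8E, 0x8F  # EUC single-shift bytes

# Hypothetical 16-bit layout (high-bit pattern of high byte / low byte):
#   codeset 0: 0x00xx, low byte < 0x80     (ASCII)
#   codeset 1: 0x8080 pattern, both set    (JIS X 0208)
#   codeset 2: 0x0080 pattern, low set     (JIS X 0201 katakana)
#   codeset 3: 0x8000 pattern, high set    (codeset 3, two bytes)


def mbs_to_wcs(buf: bytes) -> list[int]:
    """Convert well-formed UJIS bytes to hypothetical 16-bit wide chars."""
    wcs = []
    i = 0
    while i < len(buf):
        b = buf[i]
        if b < 0x80:                                  # codeset 0, 1 byte
            wcs.append(b)
            i += 1
        elif b == SS2:                                # codeset 2, 2 bytes
            wcs.append(buf[i + 1])
            i += 2
        elif b == SS3:                                # codeset 3, 3 bytes
            wcs.append((buf[i + 1] << 8) | (buf[i + 2] & 0x7F))
            i += 3
        else:                                         # codeset 1, 2 bytes
            wcs.append((b << 8) | buf[i + 1])
            i += 2
    return wcs


def wcs_to_mbs(wcs: list[int]) -> bytes:
    """Inverse of mbs_to_wcs: rebuild the variable-length EUC bytes."""
    out = bytearray()
    for wc in wcs:
        hi, lo = wc >> 8, wc & 0xFF
        if hi == 0 and lo < 0x80:
            out.append(lo)                            # codeset 0
        elif hi == 0:
            out += bytes([SS2, lo])                   # codeset 2
        elif lo & 0x80:
            out += bytes([hi, lo])                    # codeset 1
        else:
            out += bytes([SS3, hi, lo | 0x80])        # codeset 3
    return bytes(out)
```

Under this packing the codeset 2 sequence 8E B1 becomes 0x00B1 and the codeset 3 sequence 8F B0 A1 becomes 0xB021; converting back restores the original bytes. The point is only that a fixed 16-bit form has room for all four code sets, not that any shipped implementation used exactly this layout.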