shirriff@sprite.berkeley.edu (Ken Shirriff) (10/30/90)
Can anyone explain the order of the kanji characters in the X font "k14" (JISX0208.1983 encoding)? It has several thousand kanji characters, but as far as I can tell, they're in a random order.
--
Ken Shirriff shirriff@sprite.Berkeley.EDU
"Look! Sunglasses! EXACTLY like the ones worn by the American Don Johnson!"
henry@angel.Eng.Sun.COM (Henry McGilton) (10/31/90)
In article <29326@pasteur.Berkeley.EDU>, shirriff@sprite.berkeley.edu (Ken Shirriff) writes:
* Can anyone explain the order of the kanji characters in
* the X font "k14" (JISX0208.1983 encoding)? It has
* several thousand kanji characters, but as far as I can
* tell, they're in a random order.
To the best of my admittedly minuscule knowledge, the characters
are arranged in phonetic order according to their ON readings.
At least that's how they're arranged in my Ryumin-Light and Gothic
fonts on the laser printer.
........ Henry
rcd@ico.isc.com (Dick Dunn) (10/31/90)
shirriff@sprite.berkeley.edu (Ken Shirriff) writes:
> Can anyone explain the order of the kanji characters in the X font "k14"
> (JISX0208.1983 encoding)? It has several thousand kanji characters, but
> as far as I can tell, they're in a random order.

It's a standard encoding of a Japanese character set commonly referred to--with horrendous over-abbreviation--as "JIS". It's actually JIS C 6226, "Code of the Japanese Graphic Character Set for Information Interchange". (BTW, this is a 2-byte code; don't confuse it with "JISCII" (6220), which is a one-byte code.)

The arrangement of characters in 6226 goes something like this:
    punctuation (including some for vertical writing)
    special characters
    arabic numerals, 26-char Roman alphabet
    Hiragana
    Katakana
    Greek
    Cyrillic
    line-drawing characters
    two sets of Kanji

In the Hiragana and Katakana, there are separate codes for all possible characters including the "small" forms and forms with the "diacritical marks". (Apologies to Japanese speakers/writers; I am trying to use terms here which will be understood by western readers.)

There are two groups of Kanji in the remainder of the character set. The basis for assigning encodings is different for the two, which is possibly why it looked "random" to you. Level 1 Kanji (0x3021-0x4f53) contains the more common characters; it is arranged by pronunciation. Level 2 Kanji (0x5021-end) contains less common characters, arranged by their primary radicals. Again, for the western view: radicals are essentially the major stroke groups which make up the ideogram. If you look through Level 2, you'll see "lots of characters which have similar pieces" grouped together.

If you think about it, the matter of a lexical ordering in a writing system using a large number of distinct symbols, instead of composing from a small alphabet, is an interesting (challenging) exercise.
The actual characters in the font look as if they follow the JIS 16x16 standard bitmaps, although my weary eyes aren't up to checking that very thoroughly. I heard a mention that there may be a copyright or license problem with the k14 font, but I don't know what it is; if you are considering using it, I'd look further.
--
Dick Dunn rcd@ico.isc.com -or- ico!rcd Boulder, CO (303)449-2870
...but Meatball doesn't work that way!
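The Level 1 / Level 2 split described in the post above can be spot-checked without the font itself. The following is only a sketch: it assumes Python 3 and leans on the standard `euc_jp` codec, which carries JIS X 0208 codes with the high bit set on each byte. The helper names `jis_to_char` and `sample` are made up for this illustration and come from neither the post nor any X tool.

```python
def jis_to_char(code: int) -> str:
    """Decode one 16-bit JIS X 0208 code such as 0x3021 to a Unicode string."""
    hi, lo = code >> 8, code & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a two-byte JIS code: {code:#06x}")
    # EUC-JP stores a JIS X 0208 code with the high bit set on both bytes.
    return bytes([hi | 0x80, lo | 0x80]).decode("euc_jp")


def sample(start: int, count: int = 8) -> str:
    """Return `count` consecutive characters starting at JIS code `start`.

    Assumes the run stays inside one row (low byte <= 0x7E) and that every
    code in the run is assigned; unassigned codes raise UnicodeDecodeError.
    """
    return "".join(jis_to_char(start + i) for i in range(count))
```

Printing `sample(0x3021)` next to `sample(0x5021)` shows the difference in ordering: the first run starts at the beginning of the reading-ordered block, while the second starts the radical-ordered block.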
mleisher@nmsu.edu (Mark Leisher) (10/31/90)
If you look at the Japanese fonts using 'xfd', you will notice that not all of the characters are defined. Usually, there will be a section of non-Kanji at the start of the font, and some empty space before the Kanji starts. JIS X0208 fonts are ordered starting at ENCODING 8481[0x2121] and can go to ENCODING 32382[0x7e7e]. The k14.bdf file seems to follow this order, skipping entries here and there.
-----------------------------------------------------------------------------
mleisher@nmsu.edu                "I laughed.
Mark Leisher                      I cried.
Computing Research Lab            I fell down.
New Mexico State University       It changed my life."
Las Cruces, NM                    - Rich [Cowboy Feng's Space Bar and Grille]
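The decimal ENCODING values quoted above are just the two JIS bytes packed into one integer (high byte times 256 plus low byte). A small sketch, assuming Python 3, makes that arithmetic explicit. `split_encoding` and `encodings_in_bdf` are hypothetical helper names, and the row/cell numbers (each byte minus 0x20, giving 1 to 94) follow the usual JIS numbering convention rather than anything stated in the post.

```python
def split_encoding(encoding):
    """Split a BDF ENCODING value into (high byte, low byte, row, cell)."""
    hi, lo = divmod(encoding, 256)
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"outside the 0x2121-0x7e7e grid: {encoding}")
    # Row and cell are the 1-based positions in the 94 x 94 grid.
    return hi, lo, hi - 0x20, lo - 0x20


def encodings_in_bdf(text):
    """Collect the integer following each ENCODING keyword in BDF source text."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ENCODING":
            found.append(int(fields[1]))
    return found
```

For example, `split_encoding(8481)` gives bytes 0x21, 0x21 (row 1, cell 1) and `split_encoding(32382)` gives bytes 0x7e, 0x7e (row 94, cell 94). Running `encodings_in_bdf` over the text of a BDF file and looking for jumps between successive values is one way to see where entries are skipped.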
harkcom@potato.pa.Yokogawa.Co.jp (Alton Harkcom) (11/07/90)
In article <29326@pasteur.Berkeley.EDU> shirriff@sprite.berkeley.edu (Ken Shirriff) writes:
=}Can anyone explain the order of the kanji characters in the X font "k14"
=}(JISX0208.1983 encoding)? It has several thousand kanji characters, but
=}as far as I can tell, they're in a random order.

The JIS coding can be broken into several sections...

0x2120 - 0x227f   special characters
0x2330 - 0x237f   numerals & roman characters
0x2420 - 0x247f   hiragana
0x2520 - 0x257f   katakana
0x2620 - 0x265f   greek characters
0x2720 - 0x277f   russian characters
0x2820 - 0x284f   line primitives
0x3020 - 0x4f5f   first level kanji (there are many gaps between them)
                  in chinese (not japanese) reading order
0x5020 - 0x742f   second level kanji (there are many gaps between them)
                  in radical (radical as in bushu) order

If you can get ahold of a book which lists Japanese Industrial Standards, it should be in it...

Hope this helps.
--
-- 横河電機株式会社 PA１技２課
TEL 0422-52-5748 FAX 0422-55-1728
E-mail harkcom@pa.yokogawa.co.jp
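The section table in the post above translates directly into a lookup. This is a rough sketch, assuming Python 3 and using the ranges exactly as posted. The names `SECTIONS` and `classify` are invented for the example, and the extra low-byte check (0x21 to 0x7e) is an added assumption, since a plain integer range would otherwise accept codes that fall between rows.

```python
# (start, end, label) triples copied from the table in the post.
SECTIONS = [
    (0x2120, 0x227F, "special characters"),
    (0x2330, 0x237F, "numerals & roman characters"),
    (0x2420, 0x247F, "hiragana"),
    (0x2520, 0x257F, "katakana"),
    (0x2620, 0x265F, "greek characters"),
    (0x2720, 0x277F, "russian characters"),
    (0x2820, 0x284F, "line primitives"),
    (0x3020, 0x4F5F, "first level kanji"),
    (0x5020, 0x742F, "second level kanji"),
]


def classify(code):
    """Return the section label for a 16-bit JIS code, or None if unlisted."""
    low = code & 0xFF
    if not 0x21 <= low <= 0x7E:
        # Not a valid second byte, so the code sits in a gap between rows.
        return None
    for start, end, label in SECTIONS:
        if start <= code <= end:
            return label
    return None
```

With this, `classify(0x3021)` reports first level kanji and `classify(0x5021)` reports second level kanji, while a code in an unlisted row such as 0x2930 returns `None`. The gaps the post mentions inside the kanji ranges are not modelled here.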