[comp.fonts] X kanji font k14 question

shirriff@sprite.berkeley.edu (Ken Shirriff) (10/30/90)

Can anyone explain the order of the kanji characters in the X font "k14"
(JISX0208.1983 encoding)?  It has several thousand kanji characters, but
as far as I can tell, they're in a random order.

Ken Shirriff			shirriff@sprite.Berkeley.EDU
"Look!  Sunglasses!  EXACTLY like the ones worn by the American Don Johnson!"

henry@angel.Eng.Sun.COM (Henry McGilton) (10/31/90)

In article <29326@pasteur.Berkeley.EDU>, shirriff@sprite.berkeley.edu (Ken Shirriff) writes:

    *  Can anyone explain the order of the kanji characters in
    *  the X font "k14" (JISX0208.1983 encoding)?  It has
    *  several thousand kanji characters, but as far as I can
    *  tell, they're in a random order.

To the best of my admittedly minuscule knowledge, the characters
are arranged in phonetic order according to their ON readings.
At least that's how they're arranged in my Ryumin-Light and Gothic
fonts on the laser printer.

	........  Henry

rcd@ico.isc.com (Dick Dunn) (10/31/90)

shirriff@sprite.berkeley.edu (Ken Shirriff) writes:

> Can anyone explain the order of the kanji characters in the X font "k14"
> (JISX0208.1983 encoding)?  It has several thousand kanji characters, but
> as far as I can tell, they're in a random order.

It's a standard encoding of a Japanese character set commonly referred to--
with horrendous over-abbreviation--as "JIS".  It's actually JIS C 6226,
"Code of the Japanese Graphic Character Set for Information Interchange".

(BTW, this is a 2-byte code; don't confuse it with "JISCII" (6220) which is
a one-byte code.)

The arrangement of characters in 6226 goes something like this:
	punctuation (including some for vertical writing)
	special characters
	arabic numerals, 26-char Roman alphabet
	Hiragana
	Katakana
	Greek
	Cyrillic
	line-drawing characters
	two sets of Kanji

In the Hiragana and Katakana, there are separate codes for all possible
characters including the "small" forms and forms with the "diacritical
marks".  (Apologies to Japanese speakers/writers; I am trying to use terms
here which will be understood by western readers.)

There are two groups of Kanji in the remainder of the character set.  The
basis for assigning encodings is different for the two, which is possibly
why it looked "random" to you.  Level 1 Kanji (0x3021-4f53) contains the
more common characters; it is arranged by pronunciation.  Level 2 Kanji
(0x5021-end) contains less common characters, arranged by their primary
radicals.
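
The two ranges above translate directly into a lookup.  What follows is
a minimal sketch in Python, assuming the boundaries quoted in this post
(0x3021-4f53 for Level 1, 0x5021 onward for Level 2) and assuming "end"
means the top of the two-byte code space.  The names jis_code and
kanji_level are hypothetical helpers, not part of any library; the
sketch relies on Python's built-in iso2022_jp codec to obtain the
two-byte code of a character.

```python
ESC_JISX0208 = b"\x1b$B"  # ISO-2022-JP escape that switches to JIS X 0208


def jis_code(ch):
    """Return the two-byte JIS X 0208 code of one character as an int."""
    data = ch.encode("iso2022_jp")
    if not data.startswith(ESC_JISX0208):
        raise ValueError(f"{ch!r} is not encoded as JIS X 0208")
    high, low = data[3], data[4]  # the two bytes right after the escape
    return (high << 8) | low


def kanji_level(code):
    """Classify a code using the ranges quoted in the post (an assumption)."""
    low = code & 0xFF
    if not 0x21 <= low <= 0x7E:   # second byte outside the printable range
        return None
    if 0x3021 <= code <= 0x4F53:
        return 1                  # common kanji, ordered by pronunciation
    if 0x5021 <= code <= 0x7E7E:  # "end" assumed to be the top of the code space
        return 2                  # less common kanji, ordered by radical
    return None                   # not in either kanji block


for ch in "亜電":
    code = jis_code(ch)
    print(ch, hex(code), kanji_level(code))
```

Sorting a handful of kanji by jis_code and comparing the result with a
dictionary ordering is a quick way to see the pronunciation ordering of
Level 1 for yourself.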

Again, for the western view: radicals are essentially the major stroke
groups which make up the ideogram.  If you look through Level 2, you'll see
"lots of characters which have similar pieces" grouped together.  If you
think about it, the matter of a lexical ordering in a writing system using
a large number of distinct symbols, instead of composing from a small
alphabet, is an interesting (challenging) exercise.

The actual characters in the font look as if they follow the JIS 16x16
standard bitmaps, although my weary eyes aren't up to checking that very
thoroughly.  I heard a mention that there may be a copyright or license
problem with the k14 font, but I don't know what it is; if you are
considering using it, I'd look further.
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...but Meatball doesn't work that way!

mleisher@nmsu.edu (Mark Leisher) (10/31/90)

If you look at the Japanese fonts using 'xfd', you will notice that
not all of the characters are defined.  Usually, there will be a
section of non-Kanji at the start of the font, and some empty space
before the Kanji starts.

JIS X0208 fonts are ordered starting at ENCODING 8481[0x2121] and can
go to ENCODING 32382[0x7e7e].  The k14.bdf file seems to follow this
order, skipping entries here and there.
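
The decimal ENCODING values are just the two code bytes read as one
16-bit number, which is also why xfd shows stretches of empty space:
each byte only ranges over 0x21-0x7e, so most integers between 8481 and
32382 are not valid codes at all.  A minimal sketch in Python of that
arithmetic follows; the function names are hypothetical, chosen only for
illustration.

```python
def encoding_to_bytes(encoding):
    """Split a BDF ENCODING value into its (high, low) code bytes."""
    return divmod(encoding, 256)


def is_valid_jisx0208(encoding):
    """True if both bytes fall in the 0x21-0x7e range used by the code."""
    high, low = divmod(encoding, 256)
    return 0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E


def to_kuten(encoding):
    """Convert to the 1-based (row, cell) position in the 94x94 grid."""
    high, low = divmod(encoding, 256)
    return high - 0x20, low - 0x20


# Count how many values in the ENCODING range are usable code positions.
total = 0x7E7E - 0x2121 + 1
valid = sum(1 for e in range(0x2121, 0x7E7E + 1) if is_valid_jisx0208(e))
print(f"{valid} valid positions out of {total} integers in the range")
```

Only the valid positions can carry a glyph, and a given font such as k14
may leave some of those undefined as well.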

-----------------------------------------------------------------------------
mleisher@nmsu.edu                      "I laughed.
Mark Leisher                                I cried.
Computing Research Lab                          I fell down.
New Mexico State University                        It changed my life."
Las Cruces, NM                     - Rich [Cowboy Feng's Space Bar and Grille]

harkcom@potato.pa.Yokogawa.Co.jp (Alton Harkcom) (11/07/90)

In article <29326@pasteur.Berkeley.EDU>
   shirriff@sprite.berkeley.edu (Ken Shirriff) writes:

 =}Can anyone explain the order of the kanji characters in the X font "k14"
 =}(JISX0208.1983 encoding)?  It has several thousand kanji characters, but
 =}as far as I can tell, they're in a random order.

The JIS coding can be broken into several sections...

    0x2120 - 0x227f    special characters
    0x2330 - 0x237f    numerals & Roman characters
    0x2420 - 0x247f    hiragana
    0x2520 - 0x257f    katakana
    0x2620 - 0x265f    Greek characters
    0x2720 - 0x277f    Russian characters
    0x2820 - 0x284f    line primitives
    0x3020 - 0x4f5f    first level kanji (there are many gaps between them)
                           in Chinese (not Japanese) reading order
    0x5020 - 0x742f    second level kanji (there are many gaps between them)
                           in radical (radical as in bushu) order
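
The table above can be turned into a small lookup.  This is a rough
sketch in Python that copies the ranges exactly as listed in this post
(they are the poster's approximate boundaries, not the official ones
from the standard); SECTIONS and section_of are hypothetical names.

```python
# (first code, last code, description), copied from the table in the post.
SECTIONS = [
    (0x2120, 0x227F, "special characters"),
    (0x2330, 0x237F, "numerals & roman characters"),
    (0x2420, 0x247F, "hiragana"),
    (0x2520, 0x257F, "katakana"),
    (0x2620, 0x265F, "greek characters"),
    (0x2720, 0x277F, "russian characters"),
    (0x2820, 0x284F, "line primitives"),
    (0x3020, 0x4F5F, "first level kanji"),
    (0x5020, 0x742F, "second level kanji"),
]


def section_of(code):
    """Return the section name for a two-byte code, or None for a gap."""
    for first, last, name in SECTIONS:
        if first <= code <= last:
            return name
    return None


for code in (0x2422, 0x3021, 0x5021, 0x2F21):
    print(hex(code), section_of(code))
```

Codes that fall between the listed ranges, such as 0x2f21, come back as
None, which matches the empty stretches visible in the font.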

If you can get ahold of a book which lists Japanese Industrial Standards,
it should be in it...

Hope this helps. 
--
  横河電機株式会社 PA１技２課
  TEL 0422-52-5748  FAX 0422-55-1728
  E-mail harkcom@pa.yokogawa.co.jp