[comp.text.tex] Japanese TeX

jalbert@cs.ubc.ca (Francois Jalbert) (10/12/90)

Hallo TeXperts. I have been working for a while on my own Japanese TeX
system, but before I invest more time, I thought I would mention what I am
doing and how I am doing it to all. Perhaps I am repeating another person's
work. Perhaps some of you have some advice to give me.

The biggest problem seems to be the large number of symbols. I first decided
I would limit myself to a few thousand, but which ones? The answer came
with the simple Japanese vi editor for MS-DOS machines called MOKE. It
includes a file called JIS24 which contains about 7802 Japanese symbols at
24 by 24 pixel resolution. JIS stands for Japanese Industrial Standard and
could act for me as some sort of extended ASCII table. I decided to limit
myself to these symbols. I don't know where that file JIS24 comes from.
The documentation seems to imply it was derived from some X-Window file.
Any info regarding that and possible copyright violation is welcome.

I quickly wrote a few utilities with my Turbo-Pascal 5.0 which allowed me
to browse through JIS24, dump it all on my printer (+/- 50 pages), and also
manipulate the information for each individual symbol.

I decided to try the following approach. Write a small picture environment
for each Japanese symbol. The picture will be a simple 24 by 24 matrix with
circle*{1} put at the right places. If one assumes 10 such Japanese characters
per inch, that gives us a density of 240DPI. Of course, that is not "true"
240DPI since the characters don't have a continuous boundary, but it might
be enough for my simple needs.
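
A minimal sketch of this picture-per-symbol idea, in Python rather than the
Turbo Pascal used for the real utilities. The input format (rows of '#' and
'.' characters) and the name glyph_to_picture are assumptions made for
illustration only; the actual layout of the JIS24 file is not described here.
The constants spell out the density arithmetic: 24 dots per character times
10 characters per inch.

```python
GRID = 24                               # dots per character side
CHARS_PER_INCH = 10                     # assumed character pitch
DOTS_PER_INCH = GRID * CHARS_PER_INCH   # 24 * 10 = 240
UNITLENGTH_PT = 72.27 / DOTS_PER_INCH   # one dot in TeX points (72.27pt = 1in)


def glyph_to_picture(rows):
    """Turn a square bitmap into LaTeX picture source.

    rows is a list of equal-length strings, top row first, where '#'
    marks an inked dot (hypothetical format, not the JIS24 layout).
    """
    size = len(rows)
    lines = [r"\begin{picture}(%d,%d)" % (size, size)]
    for r, row in enumerate(rows):
        y = size - 1 - r                # picture origin is at the bottom left
        for x, cell in enumerate(row):
            if cell == "#":
                lines.append(r"\put(%d,%d){\circle*{1}}" % (x, y))
    lines.append(r"\end{picture}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(r"\setlength{\unitlength}{%.4fpt}" % UNITLENGTH_PT)
    print(glyph_to_picture(["#.", ".#"]))   # toy 2 by 2 glyph
```

Every inked dot becomes its own \put, which is consistent with the large
.dvi files and memory use such a document produces.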

So I wrote a utility which automatically translated JIS24 into a large number
of small .tex files, each containing the right pattern of circle*{1}. I could
then use LaTeX as usual, and here and there use commands like \jap{3056} to
get Japanese character #3056 to appear. That works fine. It looks great with
my screen previewer, and so-so with my lousy dot matrix printer. My current
complaints are the size of the .dvi generated (typically 20 times the size of
the main .tex document), the amount of memory used (big emTeX blows up with
half a page of Japanese text), and the large CPU time required.

I thought it ought to be possible to use METAFONT to generate fonts of small
matrices of dots. After all, METAFONT must have primitive operators to draw
outlines of symbols. There is probably some sort of circle*{1} operator in
there. I could automate the creation of these .mf files in the same way I did
it for my .tex files. That's no problem. Does anybody have examples of such
fonts? I could just change the size to 24 by 24 and the dot patterns.
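
One possible shape for such a generator, sketched in Python. It emits one
beginchar ... endchar block per symbol and draws each dot with drawdot, using
only plain METAFONT macros (mode_setup, define_pixels, beginchar, pickup
pencircle, drawdot, endchar). The bitmap format, the function names, and the
10pt design size are assumptions for illustration, and the generated source
has not been run through METAFONT here.

```python
def glyph_to_metafont(code, rows):
    """Emit plain METAFONT source for one square dot-matrix glyph.

    rows is a list of equal-length strings, top row first, where '#'
    marks an inked dot (hypothetical format, not the JIS24 layout).
    """
    size = len(rows)
    out = ["beginchar(%d, %du#, %du#, 0);" % (code, size, size),
           "  pickup pencircle scaled u;"]
    for r, row in enumerate(rows):
        y = size - 1 - r                    # METAFONT y grows upward
        for x, cell in enumerate(row):
            if cell == "#":
                # place the dot at the centre of its grid cell
                out.append("  drawdot (%.1fu, %.1fu);" % (x + 0.5, y + 0.5))
    out.append("endchar;")
    return "\n".join(out)


def font_source(glyphs, design_size_pt=10, grid=24):
    """Wrap the glyphs (a dict of code -> rows) in a minimal font file."""
    header = ["mode_setup;",
              "u# := %dpt# / %d;" % (design_size_pt, grid),
              "define_pixels(u);"]
    body = [glyph_to_metafont(c, rows) for c, rows in sorted(glyphs.items())]
    return "\n".join(header + body + ["end"])


if __name__ == "__main__":
    print(font_source({0: ["#.", ".#"]}, grid=2))
```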

The problem now is the number of different fonts needed. At 128 Japanese
symbols per font, I need around 60 fonts (7802/128 rounds up to 61), all of
which might potentially be needed in a given document. Is that too much for
LaTeX? Is it possible to load a font, grab a character, and then discard the
font? That would slow things down, but would allow me to at least process the
document. Each font could be numbered like JAP23.TFM, and it would be "easy"
from something like \jap{3056} to deduce the font number and the symbol offset.
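
The deduction is a single integer division. A small Python sketch, assuming
symbols are numbered from 0 and packed 128 to a font in order (the function
name locate is made up for the example):

```python
GLYPHS_PER_FONT = 128


def locate(symbol_number):
    """Map a symbol number to (font file name, offset within that font)."""
    font_index, offset = divmod(symbol_number, GLYPHS_PER_FONT)
    return "JAP%d.TFM" % font_index, offset


def fonts_needed(total_symbols):
    """Number of 128-glyph fonts needed to hold total_symbols (rounded up)."""
    return -(-total_symbols // GLYPHS_PER_FONT)


if __name__ == "__main__":
    print(locate(3056))         # ('JAP23.TFM', 112)
    print(fonts_needed(7802))   # 61
```

Under these assumptions \jap{3056} lands in JAP23.TFM at offset 112, and the
7802 symbols of JIS24 fill 61 fonts.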

Anyway, I sure would appreciate any advice or information anyone could have
for me. I want to avoid \specials since they are PostScript dependent. I also
know quality won't be great, so please no flames regarding the spirit of TeX
being violated. If this works fine, I may look at generating better fonts. But
right now, I just want a bare-bones system running.

A million thanks in advance. Franky, hacker at large.

mzw_t@hpujsda.HP.COM (Matsuzawa Takashi) (10/16/90)

----
There already exist two Japanese TeX's that are widely used in Japan.
They are `jTeX', ported at an NTT (Nippon Telegraph and Telephone) lab, and
Nihongo-TeX, ported by ASCII co., a Japanese private company.  Both are
based on ctex 2.95 (or, pre-3.0) UNIX implementations.  jTeX first
ran on TOPS-20 and was then ported to VAX/VMS and UNIX.  Both are publicly
available, and you can obtain them free of charge from the following Internet
hosts via anonymous FTP.

	miki.cs.titech.ac.jp		(Tokyo Institute of Technology)
	utsun.is.s.u-tokyo.ac.jp	(Tokyo University)

Their archives are named `ASCII-jTeX' and `NTT-jTeX' there.

I believe there is no widely used public port of Japanese TeX to PC's yet.
(ASCII co. is already selling the commercial version of Nihongo-TeX on
NEC's PC-9801 computers, the major force in Japanese PC world.)  So, you
are encouraged to work on your Japanese TeX!

---
The JIS kanji set (JIS X 0208) is a good choice for your character set, and
it is sufficient.  It includes hiragana, katakana, miscellaneous punctuation,
and kanji, so you will not meet serious difficulties writing ordinary Japanese
sentences.  I cannot be sure where your JIS24 data came from, but
you can obtain public kanji fonts in X11 BDF format from the sites
I noted above.  (You can find k14.tar.Z, etc.)  They may not be
large enough to meet your needs, but they are in the public domain.

Note:  I will use the term `kanji's to denote the non-ASCII characters that
appear in Japanese texts hereafter --- although `kanji's are just the subset
of Japanese characters, as you might know.

My only suggestion for your implementation is to use the Shift-JIS kanji code
or the EUC (UJIS) kanji code for your input texts. (Or, you can also use the
complicated ISO escape sequences to invoke kanji character sets from within
ASCII texts.)  They are standard multiple-byte encoding schemes for
handling Japanese text as computer data.  If your TeX accepts these
character codes, you can enjoy printing out whatever Japanese-language text
files you have obtained from somewhere.
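
For reference, both encodings map back to the two 7-bit JIS X 0208 bytes with
simple arithmetic. The Python sketch below shows the usual conversions; the
function names are made up for the example, and only double-byte characters
are handled (no half-width katakana, no ISO escape sequences).

```python
def euc_to_jis(b1, b2):
    """EUC (UJIS) double-byte character -> JIS X 0208 byte pair.

    EUC is simply the JIS code with the high bit set on both bytes.
    """
    return b1 & 0x7F, b2 & 0x7F


def sjis_to_jis(s1, s2):
    """Shift-JIS double-byte character -> JIS X 0208 byte pair.

    s1 is a lead byte in 0x81-0x9F or 0xE0-0xEF, s2 the trail byte.
    """
    if s1 >= 0xE0:
        s1 -= 0x40                  # close the gap between the lead ranges
    j1 = (s1 - 0x81) * 2 + 0x21     # each lead byte covers two JIS rows
    if s2 >= 0x9F:                  # upper half: the second row of the pair
        return j1 + 1, s2 - 0x7E
    if s2 >= 0x80:
        s2 -= 1                     # 0x7F is skipped in trail bytes
    return j1, s2 - 0x1F


if __name__ == "__main__":
    # Hiragana "a" (U+3042) is JIS 0x2422 in either encoding.
    print(euc_to_jis(*"\u3042".encode("euc_jp")))
    print(sjis_to_jis(*"\u3042".encode("shift_jis")))
```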

----
And, here is a brief description of NTT's jTeX implementation.
(Nihongo-TeX has made major enhancements to the TeX font file formats that are
incompatible with ordinary TeX, and I think their approach is too drastic.
They are planning to implement Nihongo-TeX with a vertical writing mode,
which is itself a very interesting attempt, though...)

In fact, I am currently using jTeX (jLaTeX) on my Apollo workstations,
and am a bit knowledgeable about it.  I hope this will give you some hints for
your Japanese TeX implementation.  The main reason that jTeX does not
`blow up' is that it treats Japanese text as a series of character codes,
not as a series of graphic patterns.  There is a limit on the number of
loadable fonts, though.  (I think it might be good to look into jtex.ch,
the TeX change file which is the core of the jTeX implementation.  You can
also find working implementations of jLaTeX, jBibTeX, etc.)

---
jTeX reads the input text (which is generally a mixture of ordinary ASCII
codes and kanji codes).  It detects the kanjis in it and encodes them into
special internal codes (a pair of bytes specifies one Japanese character).
jTeX applies Japanese-language-specific formatting rules to them.  For
example, jTeX has the concept of a `current kanji-font' in addition to TeX's
`current font', so you have two `current font's in jTeX.  Kanji characters
have special glues, etc.

----
jTeX's internal representation of a Japanese character is as follows.

  <sub-font#><char#-within-subfont>

As you wrote, because the Japanese language has so many characters, the TeX
font file limit (256 glyphs) is not enough.  jTeX uses multiple TeX font
files for one font.  For example, you need just one file for the 10pt
Computer Modern font (cmr10.300pk, which contains the necessary 128 glyphs).
But if you need the 10pt DNP-Mincho font, then you need the following files.
Each of them contains up to 255 glyphs, approximately seven thousand glyphs
in total.

	dmjsy10.{tfm|300pk}		(punctuations)
	dmjroma10.{tfm|300pk}		(alpha-numerics)
	dmjhira10.{tfm|300pk}		(hiraganas)
	dmjkata10.{tfm|300pk}		(katakanas)
	dmjgreek10.{tfm|300pk}		(greek characters)
	dmjrussian10.{tfm|300pk}	(cyrillic characters)
	dmjkeisen10.{tfm|300pk}		(line drawing characters)
	dmjka10.{tfm|300pk}		(kanjis - 1st level)
	dmjkb10.{tfm|300pk}		( " )
	dmjkc10.{tfm|300pk}		( " )
	dmjkd10.{tfm|300pk}		( " )
	dmjke10.{tfm|300pk}		( " )
	dmjkf10.{tfm|300pk}		( " )
	dmjkg10.{tfm|300pk}		( " )
	dmjkh10.{tfm|300pk}		( " )
	dmjki10.{tfm|300pk}		( " )
	dmjkj10.{tfm|300pk}		( " )
	dmjkk10.{tfm|300pk}		( " )
	dmjkl10.{tfm|300pk}		( " )
	dmjkm10.{tfm|300pk}		(kanjis - 2nd level)
	dmjkn10.{tfm|300pk}		( " )
	dmjko10.{tfm|300pk}		( " )
	dmjkp10.{tfm|300pk}		( " )
	dmjkq10.{tfm|300pk}		( " )
	dmjkr10.{tfm|300pk}		( " )
	dmjks10.{tfm|300pk}		( " )
	dmjkt10.{tfm|300pk}		( " )
	dmjku10.{tfm|300pk}		( " )
	dmjkv10.{tfm|300pk}		( " )
	dmjkw10.{tfm|300pk}		( " )
	dmjkx10.{tfm|300pk}		( " )
	dmjky10.{tfm|300pk}		( " )
	dmjkz10.{tfm|300pk}		( " )

Imagine needing several magnifications of all this, and `Mincho' is just
one font design among Japanese fonts.  (A resource hog!)
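
To make the <sub-font#><char#-within-subfont> idea concrete, here is a Python
sketch that maps a JIS X 0208 byte pair to one of the file names listed above.
The row ranges are those of JIS X 0208 itself (rows 1-2 symbols, 3
alphanumerics, 4 hiragana, 5 katakana, 6 Greek, 7 Cyrillic, 8 line drawing,
16-47 level 1 kanji, 48-84 level 2 kanji).  How jTeX actually assigns
characters to files is NOT stated in this post; packing the kanji 256 slots
to a file is purely an assumption (it happens to give 12 level 1 files and
14 level 2 files, matching the list), and jtex.ch is the authority.

```python
# Hypothetical row -> sub-font table; the names come from the file list above.
NON_KANJI = {1: "dmjsy", 2: "dmjsy", 3: "dmjroma", 4: "dmjhira",
             5: "dmjkata", 6: "dmjgreek", 7: "dmjrussian", 8: "dmjkeisen"}
SLOTS_PER_FILE = 256    # assumption: the TeX font file limit


def subfont_for(jis1, jis2, size=10):
    """Map a JIS X 0208 byte pair to (sub-font name, char# within it)."""
    row, cell = jis1 - 0x20, jis2 - 0x20    # 1-based row and cell numbers
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("not a JIS X 0208 code")
    if row in NON_KANJI:
        # rows 1 and 2 share one symbol font; other rows start at 0
        offset = (row - 1) * 94 + (cell - 1) if row <= 2 else cell - 1
        return "%s%d" % (NON_KANJI[row], size), offset
    if 16 <= row <= 47:                     # level 1 kanji: files ka..kl
        base_row, first_letter = 16, "a"
    elif 48 <= row <= 84:                   # level 2 kanji: files km..kz
        base_row, first_letter = 48, "m"
    else:
        raise ValueError("unassigned row %d" % row)
    index = (row - base_row) * 94 + (cell - 1)
    letter = chr(ord(first_letter) + index // SLOTS_PER_FILE)
    return "dmjk%s%d" % (letter, size), index % SLOTS_PER_FILE


if __name__ == "__main__":
    print(subfont_for(0x24, 0x22))   # hiragana "a"
    print(subfont_for(0x30, 0x21))   # first level 1 kanji
```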

----
Please note that jTeX did not modify the format of TeX font files.  You
can use the DVI-wares written for TeX, without modification, to process
jTeX output DVI files.  Even Imagen or LaserJet (which will not output
Japanese texts in general) will be able to output beautiful Japanese
texts.

Unfortunately, there is no public jTeX kanji font of better quality.
The jTeX distribution contains JIS 24x24 fonts (*.tfm and *.pk) at several
magnifications, but I believe their quality is the same as what you have
already.  It *is* a hard task to develop a new kanji font from scratch
(you have to design several thousand glyphs at a time!)

There is a proprietary jTeX kanji font called the DNP kanji font, which is
widely used by jTeX users.  It is provided by Dai-Nippon-Printing, one of the
largest printing companies in Japan.  The data are provided in the form of
*.pk and *.tfm files (no *.mf files).  Because this font is generated
directly from DNP's professional outline font data, it has a quality
that can be used for professional publications.  It comes in several
magnifications, and includes two fonts, `Mincho' and `Gothic', the two major
Japanese font designs.  Compared to them, JIS 24x24 is just a `courier'.
(But they cost several tens of thousands of yen.)

For further information on jTeX, Nihongo-TeX or the DNP fonts, you had
better contact the authors of the Japanese TeX's.  Here are their
network addresses, from the software README's.

	ryo-i@ascii.co.jp	(Nihongo-TeX)
	tony-o@ascii.co.jp	( " )

	isozaji@ntt-20.ntt.jp	(jTeX)
	a87480@tansei.cc.u-tokyo.ac.jp	( " )

(Some of the above are JUNET addresses, not Internet addresses.  So I am not
sure whether your mail will arrive or not.  Someone else on the net might be
more knowledgeable than I am.)

---
Because jTeX's kanji fonts occupy several tens of megabytes of disk space, you
will have difficulty installing them on PCs.  One practical approach is
to use Japanese printers' internal kanji fonts.  If you can provide
appropriate *.tfm files and DVI-wares, you do not have to install *.pk
files on your disk.  When you use kanji-PostScript printers, you can
get professional quality.  Some public Japanese dvi2ps programs use
this method and work fine.  You can also obtain them from the Internet hosts
noted above.  (But kanji-PostScript printers will cost you several
hundred thousand yen...)

					Good luck and best regards;

					Takashi Matsuzawa.
					(Yokogawa-Hewlett-Packard)
					Email: mzw_t@apollo.hp.com