[comp.text] Chinese TeX

dhosek@jarthur.Claremont.EDU (D.A. Hosek) (11/30/89)

In article <20926@unix.cis.pitt.edu> jbw@unix.cis.pittsburgh.edu (Jingbai  Wang) writes:

>I strongly doubt whether a Metafont type of approach can successfully solve
>Chinese formatting with TeX. According to the mainland China GB standard
>(equivalent to ASCII in the USA), there are 87x94 Chinese characters. If each
>Metafont font can carry 127 of them, how many fonts would you need? And they
>would all have to be defined with \font. I am afraid TeX (especially LaTeX)
>memory will be blown up.

>JTeX used PK and TFM files which are not derived from Metafont, but they seem
>to have ways to reduce the number of sets. By the way, JIS has 94x94 characters.

>I have completed a project of Chinese TeX, but it only supports PostScript
>for the time being. It has nothing to do with Metafont. I have talked to your
>friend Nelson Beebe lately to have it installed on science.utah.edu for
>distribution. Of course, if Metafont is available, I will modify it to use
>Metafont files indirectly. I don't want to see TeX/LaTeX memory and printer
>VM blown up.

Actually, Metafont can generate up to 65536 distinct character codes which
is sufficient for all existing character sets (although I've heard a
proposed 24-bit Japanese set mentioned as a possibility for the future).

JTeX works by breaking down the JIS set into 256-character subfonts.
I believe that TeX retains the TFM organization of information in its
own font info tables, in which case a 256-character Kanji font would
probably take no more space than the info for a font like cmex10 (only
one height, width, and depth would be necessary). There is another
version of jTeX which uses 65536-character fonts as well.
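
To make the subfont arithmetic concrete, here is a minimal Python sketch.
It assumes the 94x94 row/cell layout mentioned in this thread (GB fills
87 of those rows) and a simple linear packing into fixed-size subfonts.
The function names and the packing order are hypothetical; this is not
necessarily how JTeX actually divides the JIS set.

```python
from typing import Tuple

SIDE = 94  # rows in the table, and cells per row (#161~#254 in each byte)


def subfonts_needed(rows: int, cells: int, chars_per_font: int) -> int:
    """How many fonts are needed to hold rows x cells characters (ceiling division)."""
    return (rows * cells + chars_per_font - 1) // chars_per_font


def to_subfont(row: int, cell: int, chars_per_font: int = 256) -> Tuple[int, int]:
    """Map a 1-based (row, cell) pair to (subfont number, code within subfont).

    Hypothetical packing: characters are numbered row by row, then cut
    into consecutive blocks of chars_per_font characters.
    """
    if not (1 <= row <= SIDE and 1 <= cell <= SIDE):
        raise ValueError("row and cell must both be in 1..94")
    index = (row - 1) * SIDE + (cell - 1)
    return divmod(index, chars_per_font)


print(subfonts_needed(87, 94, 127))  # GB at 127 per font: 65
print(subfonts_needed(87, 94, 256))  # GB at 256 per font: 32
print(subfonts_needed(94, 94, 256))  # full 94x94 table at 256 per font: 35
print(to_subfont(16, 1))             # (5, 130)
```

At 127 characters per font the GB set needs 65 fonts; 256-character
subfonts cut that to 32, which is the kind of reduction Wang alludes to.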

The printer VM problem really isn't one because any decent DVI driver
only downloads those characters that are actually used, and if VM is
cleared after each page, it would be difficult to run out of VM.
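
A minimal sketch of why per-page clearing bounds VM use, assuming a
driver that downloads each glyph the first time it appears on a page
and clears printer VM at every page boundary. Peak usage is then the
largest number of distinct glyphs on any single page, not the size of
the font. Representing a page as a list of (font, code) pairs, and the
font names below, are hypothetical simplifications of what a DVI driver
actually tracks.

```python
from typing import List, Sequence, Set, Tuple

# A glyph is identified by (font name, character code).
Glyph = Tuple[str, int]


def glyphs_per_page(pages: Sequence[Sequence[Glyph]]) -> List[Set[Glyph]]:
    """Distinct glyphs each page needs, assuming VM is cleared between pages."""
    return [set(page) for page in pages]


def peak_vm_glyphs(pages: Sequence[Sequence[Glyph]]) -> int:
    """Largest number of glyphs resident in the printer at once."""
    return max((len(g) for g in glyphs_per_page(pages)), default=0)


def total_downloads(pages: Sequence[Sequence[Glyph]]) -> int:
    """Total glyph downloads; a glyph reused on a later page is sent again."""
    return sum(len(g) for g in glyphs_per_page(pages))


pages = [
    [("cmr10", 65), ("cmr10", 66), ("cmr10", 65)],
    [("hz24", 1410), ("hz24", 1411), ("cmr10", 65), ("hz24", 1410)],
]
print(peak_vm_glyphs(pages))   # 3
print(total_downloads(pages))  # 5
```

The trade-off shows in the second number: clearing VM keeps the peak
small but resends glyphs that recur across pages, and a single page
dense with distinct characters still pushes the peak up.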

The bigger problem is the sheer tedium of writing and debugging
all the individual character programs. See my paper presented at the
TUG conference in August (to appear in the proceedings issue of TUGboat)
for details of one approach.

-dh
-- 
"Odi et amo, quare id faciam, fortasse requiris?
   nescio, sed fieri sentio et excrucior"          -Catullus
D.A. Hosek.                        UUCP: uunet!jarthur!dhosek
                               Internet: dhosek@hmcvax.claremont.edu

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (12/01/89)

From article <3313@jarthur.Claremont.EDU>, by dhosek@jarthur.Claremont.EDU (D.A. Hosek):
>In article <20926@unix.cis.pitt.edu> jbw@unix.cis.pittsburgh.edu (Jingbai  Wang) writes:
>...

I've been working at printing Chinese, too.  I'll be eager to use
JB's Chinese TeX (but where does the actual font come from?).
Here is what I have so far, in case it might be of interest to anyone:

1) A set of 34 TeX-compatible subfonts, in 4 sizes, derived from the
24x24 bit Chinese font available by ftp from hanauma.stanford.edu
in pub/zhongwen.  The subfonts are pk and tfm files, meant to
be used with JTeX (or JTeX slightly modified).

2) A program p2ps, derived from the JTeX utility k2ps (which came
in turn from a2ps) for printing unformatted text with a mixture
of ordinary roman and Chinese on a PostScript printer.  It uses
the fonts mentioned in 1).

3) A partially working modification of JTeX to use the Chinese
fonts in place of the JIS Japanese fonts.  (At the moment, not
all the Chinese characters can be printed.)

Now, it may be I'll just give up my little project once I can
try out JB's Chinese TeX.  I don't know -- my real interest in
all this is in working toward some generalized facilities for
composing and using large fonts -- not just Japanese and
Chinese.  But now I have some questions:

Don Hosek mentions a variety of JTeX that uses one big font
instead of a bunch of subfonts (did I get that right?).  That
interests me.  Where can I get it?

What's the right convention for escaping Chinese text?  I'm
just using the JIS conventions now.  What about texts that
have roman + Japanese + Chinese and maybe other character
sets?  Is there any agreed-on convention?

What about editing?  Is there any public domain editing
software for Chinese, like maybe a Chinese version of emacs?

Does anyone have good ways of extending character bit maps
to other sizes (e.g. 24x24 to 36x36)?  (My way of doing this
has some problems.)

			Greg, lee@uhccux.uhcc.hawaii.edu

jbw@unix.cis.pitt.edu (Jingbai Wang) (12/01/89)

In article <5578@uhccux.uhcc.hawaii.edu> lee@uhccux.uhcc.hawaii.edu (Greg Lee) writes:
>From article <3313@jarthur.Claremont.EDU>, by dhosek@jarthur.Claremont.EDU (D.A. Hosek):
|>In article <20926@unix.cis.pitt.edu> jbw@unix.cis.pittsburgh.edu (Jingbai  Wang) writes:
|>...
|Here is what I have so far, in case it might be of interest to anyone:
|...
|1) A set of 34 TeX-compatible subfonts, in 4 sizes, derived from the
|24x24 bit Chinese font available by ftp from hanauma.stanford.edu
|in pub/zhongwen.  The subfonts are pk and tfm files, meant to
|be used with JTeX (or JTeX slightly modified).
|
Yeah, that's how jTeX fonts were built.
|2) A program p2ps, derived from the JTeX utility k2ps (which came
|in turn from a2ps) for printing unformatted text with a mixture
|of ordinary roman and Chinese on a PostScript printer.  It uses
|the fonts mentioned in 1).

I am not impressed by k2ps; try out my WStroff, which can not only
print unformatted text but also format text with Chinese fonts of
different sizes and Adobe fonts in any family. Its Chinese fonts come
from a whole set instead of subsets.

|Now, it may be I'll just give up my little project once I can
|try out JB's Chinese TeX.  I don't know -- my real interest in
|all this is in working toward some generalized facilities for
|composing and using large fonts -- not just Japanese and
|Chinese.  But now I have some questions:

Why? We are using totally different approaches. It is always good to have
different solutions to a problem, as in academic journals.

|Don Hosek mentions a variety of JTeX that uses one big font
|instead of a bunch of subfonts (did I get that right?).  That
|interests me.  Where can I get it?

I don't read TUGboat (because I was really a Scribe hacker and C
programmer rather than a TeX one), but I knew there were articles there
about it. Well, a font of more than 256 characters should not surprise
anybody as computer text evolves: 256 = 2^8 is the limit of an 8-bit
(one-byte) representation, while JIS (Japanese), GB (Chinese) and Big-5
(Taiwan Chinese) use 2 bytes, which gives 2^16 = 65536. However, we only
use #161~#254 in both bytes (94 values each, so at most 94x94 = 8836
characters), because there are only 7000-some commonly used Chinese
characters or Japanese Kanji (HanZi, in Chinese PinYin), and we do want
to distinguish Chinese bytes from standard ASCII ones (#33~#126),
remembering also not to use the control characters (#0~#31 and
#127~#159). #32 and #160 (128+32) are reserved for <space>. Thus, 65536
is only of theoretical beauty. If METAFONT also used 16-bit encoding
instead of 7-bit or 8-bit, a 65536-character font should not scare anybody.

In my previous posting, I did not mean that it was impossible to generate
Chinese fonts with Metafont; of course you can, even with an 8-bit scheme.
Just break the set up into subsets. The real problem is the efficiency of
TeX and of printing. VM is a serious problem if you have many distinct
characters on a page. After each page you can flush it, as Hosek said, but
not in the middle of the page.
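
The byte ranges given above are enough to split a mixed roman/Chinese
stream without any escape sequences. Here is a minimal Python sketch,
assuming an EUC-style layout in which both bytes of a Chinese character
fall in #161~#254 and everything below #128 is ASCII, with #160 treated
as a space as described above. It does not attempt Big-5, whose second
byte can also fall in the ASCII range, and the function name
`split_mixed` is hypothetical.

```python
from typing import List, Tuple, Union

# An item is either a single ASCII character or a 1-based (row, cell) pair.
Item = Union[str, Tuple[int, int]]

LOW, HIGH = 161, 254  # range used for both bytes of a Chinese character


def split_mixed(data: bytes) -> List[Item]:
    """Split an ASCII/double-byte stream into ASCII characters and (row, cell) pairs."""
    out: List[Item] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 128:
            out.append(chr(b))  # ordinary ASCII byte
            i += 1
        elif b == 160:
            out.append(" ")  # #160 (128+32) treated as a space
            i += 1
        elif LOW <= b <= HIGH:
            if i + 1 >= len(data) or not (LOW <= data[i + 1] <= HIGH):
                raise ValueError(f"incomplete double-byte character at offset {i}")
            # Subtract 160 so each byte becomes a row or cell number in 1..94.
            out.append((b - 160, data[i + 1] - 160))
            i += 2
        else:
            # #128~#159 (control range) and #255 are not part of this scheme.
            raise ValueError(f"unexpected byte {b} at offset {i}")
    return out


print(split_mixed(b"TeX " + bytes([0xB0, 0xA1])))
# ['T', 'e', 'X', ' ', (16, 1)]
```

The bytes B0 A1 decode to row 16, cell 1, the position GB 2312 assigns
to 啊. Because every double-byte character has its high bit set, the
decoder never has to track a shift state.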

Adobe is making an effort to support multiple-byte encoding schemes (two
to three bytes), which will make programmers' lives easier. The only catch
is that customers will have to pay bigger bucks for memory expansion and
Adobe PS licensing (which is included in the printer price).

|
|What's the right convention for escaping Chinese text?  I'm
|just using the JIS conventions now.  What about texts that
|have roman + Japanese + Chinese and maybe other character
|sets?  Is there any agreed-on convention?

I developed ChTeX before I saw JTeX, so I do not stick to JIS, and I
don't think any Chinese users from China, Hong Kong, Taiwan, Singapore or
overseas will. GB is the way to go as far as I can see.

|
|What about editing?  Is there any public domain editing
|software for Chinese, like maybe a Chinese version of emacs?

I have one in ChTeX.tar.Z for mainframe systems; it is called ChText. It
follows the most natural way (plus some completely new ideas) for you to
type in Chinese just like typing English. A Chinese emacs may not be too
hard to design for some particular hardware with graphics capability, and
indeed for DOS PCs somebody has already adopted emacs/epsilon commands in
a Chinese editor. The key issue in inputting Chinese, however, is how to
make input natural to the human mind and to an international keyboard.

|			Greg, lee@uhccux.uhcc.hawaii.edu


JB Wang