[comp.archives] [sci.lang.japan] New Version of JDIC released

jwb@monu6.cc.monash.edu.au (Jim Breen) (06/20/91)

Archive-name: text/japanese/jdic/1991-06-18
Archive: monu6.cc.monash.edu.au:/pub/Nihongo/jdic*.zoo [130.194.32.106]
Original-posting-by: jwb@monu6.cc.monash.edu.au (Jim Breen)
Original-subject: New Version of JDIC released
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)


Version 1.3 of JDIC (Simple Japanese/English Dictionary Display
Program) is released, and is now in the pub/Nihongo directory on
monu6.cc.monash.edu.au (130.194.32.106). (This directory also
contains a copy of MOKE 1.1.)

V1.3 fixes known bugs, interworks properly with MOKE's
environment and includes immediate romaji -> kana translation
when searching by yomikata.

I will NOT be emailing the distribution; there is so much trouble
with mungeing mailers that it does not seem to be worth the
effort.

For those interested, I am including the .doc file below:
----------------------------------------------------------------

J D I C 

Simple English Japanese Dictionary Display 
========================================== 

Version 1.3 (June 1991) 

Introduction 
------------ 

This  program  provides a simple English/Japanese (kana & kanji) display of 
selected entries of a dictionary file.  While it will work (more  or  less) 
with  any text file containing a mix of Japanese and English words,  it has 
been designed specifically to operate on a dictionary in the "EDICT" format 
used by the MOKE (Mark's Own Kanji Editor) Japanese text editor. 

The executable code and documentation of JDIC is  hereby  released  to  the 
"public domain". All usage of this program is at the user's risk, and there 
is no warranty on its performance. 

All  the Japanese displayed is in kana and kanji,  so if you cannot read at 
least hiragana and katakana, this is not the program for you. 

Installation
------------

This program is distributed as a "zoo" archive (jdic13.zoo) containing  the 
following files: 

    jdic.exe    (the executable)
    jdic13.doc  (this documentation file)
    edict       (a sample Japanese/English dictionary file)
    *.bgi       (Borland Graphics drivers for various cards)

The files will need to be unpacked and copied into a directory on your hard 
disk.  If  you  are  storing  them in the same directory as your MOKE files 
(e.g. \kanji) be careful not to overwrite MOKE's "edict" file. In addition, 
the 16-bit JIS font files "k16jis1.fnt" and "k16jis2.fnt" must be  in  this 
directory.  These  latter  files are not included in this distribution.  If 
you use MOKE,  you have them already.  If not, you will need to track them 
down at one of the FTP sites. 

The  executable  (jdic.exe)  will  need to be stored in a directory on your 
path if you wish to invoke JDIC from any directory.  The simplest  approach 
is to add \kanji to your path. 

The  following environment variables may be set (note that they are exactly 
the same environment variables used by MOKE.) 

    bgi (the directory containing the bgi files. E.g. c:\tc or c:\kanji. If 
        this is not present,  the bgi files must be  in  the  directory  in 
        which JDIC is invoked.) 

    mokerc (the directory containing the moke.rc file.  E.g.  c:\kanji.  If 
           this is not present,  the current directory will be searched for 
           a file called moke.rc, and the directory details extracted.) 

    jgraphic  (set this to ATT400 if you have an AT&T high-resolution card. 
              Otherwise it will default to CGA.  NB: MOKE does not use this 
              variable.) 

If you wish to operate JDIC from other directories,  you must have  a  file 
"moke.rc" containing the following line: 

    kanjipath    directory-path    (e.g. C:\KANJI)

to  tell  the  program  the  location  of  the control and font files.  The 
environment variable  "mokerc"  must  be  used  to  specify  the  directory 
containing  "moke.rc".  (If  you  use  MOKE,  you  will have a moke.rc file 
already.) 
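
The lookup described above can be sketched in C as follows.  This is only
an illustration,  not JDIC's actual source:  the function names are
hypothetical, and it assumes the fields in moke.rc are separated by blanks
or tabs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if `line` is a "kanjipath <directory>" line, copy
   the directory into `out` and return 1; otherwise return 0. */
int parse_kanjipath(const char *line, char *out, size_t outlen)
{
    static const char key[] = "kanjipath";
    size_t klen = sizeof key - 1;
    size_t n;

    assert(line != NULL && out != NULL && outlen > 0);
    line += strspn(line, " \t");              /* skip leading blanks */
    if (strncmp(line, key, klen) != 0)
        return 0;
    line += klen;
    if (*line != ' ' && *line != '\t')        /* keyword must end here */
        return 0;
    line += strspn(line, " \t");
    n = strcspn(line, " \t\r\n");             /* directory ends at blank/EOL */
    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, line, n);
    out[n] = '\0';
    return 1;
}

/* Locate moke.rc through the "mokerc" environment variable, falling back
   to the current directory, and fetch the kanjipath directory from it.
   Returns 1 on success, 0 if the file or the line is missing. */
int find_kanjipath(char *out, size_t outlen)
{
    char name[260], line[256];
    const char *dir = getenv("mokerc");
    FILE *fp;
    int found = 0;

    if (dir != NULL && *dir != '\0')
        snprintf(name, sizeof name, "%s\\moke.rc", dir);
    else
        snprintf(name, sizeof name, "moke.rc");
    fp = fopen(name, "r");
    if (fp == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, fp) != NULL)
        found = parse_kanjipath(line, out, outlen);
    fclose(fp);
    return found;
}
```

Splitting the line parsing from the file search keeps the parser easy to
check on its own, without needing a moke.rc file on disk.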

Operation
---------

JDIC requires a PC or AT with a graphics card.  It  has  been  written 
using  Turbo C 2.0,  and has been tested on VGA,  CGA,  ATT and HERC cards. 
Auto-detection is used to determine the type of graphics card. 

The invocation of JDIC is: 

    jdic               [uses a dictionary called "edict"]

or

    jdic dicname       [where dicname is the name of your dictionary]

The  default  dictionary  "edict"  is,  of  course,   the  name  of  MOKE's 
English/Japanese  dictionary  file.  It  will  be  located in the directory 
specified in your moke.rc file.  If you use an alternative  dictionary,  it 
can be in any directory. 

JDIC also needs an index file "<dicname>.jdx". If it is not present, it will
be created.  JDIC saves the length of the dictionary file and the JDIC
version in the .jdx file,  and if it detects that either has changed,  it
will insist on recreating the index file,  since otherwise the  dictionary
look-up would be useless. 
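
The staleness check can be sketched as below.  The header layout, the names
and the version encoding are all assumptions made for illustration;  the
real .jdx format is not described in this document.

```c
#include <assert.h>
#include <stdio.h>

#define JDX_VERSION 13L   /* assumed encoding of "V1.3" */

/* Hypothetical header stored at the front of the index file. */
struct jdx_header {
    long dict_length;     /* size in bytes of the dictionary when indexed */
    long version;         /* version of the program that built the index */
};

/* Return the size of an open file in bytes, or -1 on error. */
long file_length(FILE *fp)
{
    long len;

    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    len = ftell(fp);
    rewind(fp);
    return len;
}

/* Return 1 if the index must be rebuilt, i.e. the dictionary length or
   the program version differs from what the header recorded. */
int index_is_stale(const struct jdx_header *h, long dict_length, long version)
{
    assert(h != NULL);
    return h->dict_length != dict_length || h->version != version;
}
```

A length comparison is cheap, but it will not notice an edit that leaves
the file size unchanged.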

Operation is very simple.  After loading the  dictionary,  index  and  font 
files,  the  full-screen working window is displayed with the "Enter Search 
String:" prompt. Type a few letters from the *start* of the word(s) you are 
seeking. JDIC does not match on strings in the middle of words. The scan is 
case-insensitive. 

A multi-line display is produced for all the matches  against  the  string. 
The display format is: 

Japanese(in kanji and/or kana) [yomikata in kana] english1, english2, etc.

A line is only displayed once per search, regardless of the number of hits. 

After a search,  a further prompt occurs at the bottom of the screen giving 
you the option of quitting (Q), requesting another search (A) or,  if there 
is still more information to display, requesting the next screen-full (M). 

You will notice an "(A)" in the top lefthand corner of the screen.  This is 
to indicate you are entering search strings in ascii (i.e. in English).  If 
you  press  F3  before  entering  a  string,  you  toggle  between (A)scii, 
(H)iragana and (K)atakana.  (Why F3?  Well, that is the key that MOKE uses 
for this function.) 

To  enter  a  search  string  in  kana,  type  it  in romaji and it will be 
converted to kana as you  type.  The  romaji->kana  translation  is  almost 
identical to that used in MOKE, i.e.  for a small "tsu" you can type either 
a double consonant, e.g. "shippai", or "t-", e.g. shit-pai, and for "n" you 
can type "n'" if necessary (e.g.  as in "kon'yaku").  Most of the time just 
typing  ordinary Hepburn or kunrei romaji works.  Note that the romaji must 
follow the kana style for long vowels. Tokyo must be toukyou, NOT tookyoo. 
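
A table-driven conversion following the rules above might look like this.
It is a sketch under several assumptions:  the table is a tiny hypothetical
subset,  the output is UTF-8 hiragana rather than the EUC that JDIC uses,
and none of the names come from JDIC or MOKE.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct kana_pair { const char *romaji; const char *kana; };

/* Tiny illustrative table, longest romaji first so that the first hit
   is the longest match.  A real table covers every syllable. */
static const struct kana_pair table[] = {
    {"kya","きゃ"}, {"kyu","きゅ"}, {"kyo","きょ"}, {"nyu","にゅ"},
    {"shi","し"}, {"chi","ち"}, {"tsu","つ"},
    {"ka","か"}, {"ki","き"}, {"ku","く"}, {"ke","け"}, {"ko","こ"},
    {"sa","さ"}, {"su","す"}, {"se","せ"}, {"so","そ"},
    {"ta","た"}, {"te","て"}, {"to","と"},
    {"na","な"}, {"ni","に"}, {"nu","ぬ"}, {"ne","ね"}, {"no","の"},
    {"pa","ぱ"}, {"pi","ぴ"}, {"pu","ぷ"}, {"pe","ぺ"}, {"po","ぽ"},
    {"ya","や"}, {"yu","ゆ"}, {"yo","よ"},
    {"a","あ"}, {"i","い"}, {"u","う"}, {"e","え"}, {"o","お"},
    {"n","ん"}, {"-","う"}        /* "-" gives "u" in hiragana mode */
};

/* Convert romaji to hiragana.  Returns 0 on success, or -1 if a syllable
   is not in the table or `out` is too small. */
int romaji_to_hiragana(const char *in, char *out, size_t outlen)
{
    size_t used = 0, i, n;

    assert(in != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    while (*in != '\0') {
        const char *kana = NULL;
        size_t eat = 0;

        if (in[0] == 't' && in[1] == '-') {            /* "t-": small tsu */
            kana = "っ"; eat = 2;
        } else if (in[0] == 'n' && in[1] == '\'') {    /* "n'": syllabic n */
            kana = "ん"; eat = 2;
        } else if (in[0] == in[1] && isalpha((unsigned char)in[0])
                   && strchr("aiueon", in[0]) == NULL) {
            kana = "っ"; eat = 1;                      /* double consonant */
        } else {
            for (i = 0; i < sizeof table / sizeof table[0]; i++) {
                n = strlen(table[i].romaji);
                if (strncmp(in, table[i].romaji, n) == 0) {
                    kana = table[i].kana; eat = n;
                    break;
                }
            }
        }
        if (kana == NULL)
            return -1;
        n = strlen(kana);
        if (used + n + 1 > outlen)
            return -1;
        memcpy(out + used, kana, n + 1);               /* copies the NUL too */
        used += n;
        in += eat;
    }
    return 0;
}
```

With this table,  "shippai" and "shit-pai" both come out as しっぱい,  and
"kon'yaku" as こんやく.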

The matching of kana strings is insensitive to whether they are katakana or 
hiragana.  The ONE difference between them is that typing a "-" in hiragana 
gets a "u", and in katakana gets a "-", just as in MOKE. 
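
One way to get kana-insensitive matching in EUC is to fold katakana onto
hiragana before comparing.  In EUC the hiragana occupy 0xA4A1-0xA4F3 and
the katakana 0xA5A1-0xA5F6,  so folding only changes the lead byte.  The
sketch below assumes the text holds nothing but ASCII and two-byte EUC
characters;  whether JDIC works this way is not stated,  and the function
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fold EUC katakana (lead byte 0xA5) to hiragana (lead byte 0xA4) in
   place.  Only 0xA5A1..0xA5F3 have hiragana counterparts. */
void fold_katakana(unsigned char *s)
{
    assert(s != NULL);
    while (s[0] != '\0') {
        if (s[0] >= 0x80 && s[1] != '\0') {     /* two-byte EUC character */
            if (s[0] == 0xA5 && s[1] >= 0xA1 && s[1] <= 0xF3)
                s[0] = 0xA4;
            s += 2;
        } else {
            s += 1;                             /* ASCII byte */
        }
    }
}
```

Folding both the search key and the index words lets an ordinary byte
comparison do the rest.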

The  display  is  in  "dictionary"  order  for  the  words  matched,   i.e. 
alphabetical for the ascii search,  and EUC order for the kana search.  EUC 
order is very close to the "gojuon" kana order  in  Japanese  dictionaries 
except that it separates the syllables with nigori and maru. 

There  is  also an "Unlimited Display Mode" which is invoked by pressing F1 
before or during the entering of the search string.  In this mode you  will 
just  keep  scrolling  through the dictionary instead of stopping when  you 
run out of matching strings.  Also in this mode entries are displayed every 
time  there  is  a match in the index table (normally an entry is displayed 
once only.) This mode is useful for doing maintenance  on  the  dictionary, 
and for just browsing. 

Dictionary
----------

Clearly,  to be of any use,  JDIC must have a reasonably good  dictionary. 
Unfortunately there are no good machine readable dictionary  files  in  the 
public  domain yet.  Included with this distribution is the tiny EDICT file 
from MOKE 1.1 (the shareware version).  There is a bigger, but still rather 
limited EDICT supplied with the MOKE 2.0 release; however, Mark Edwards, the 
author, has not placed it in the public domain.  JDIC's author is compiling 
a supplement to MOKE (2.0)'s EDICT which will fill in the gaps,  but unless 
you buy MOKE 2.0 (after all,  it's only $US50) you will miss out on a  lot. 
(If  anyone  feels like contributing to a public domain dictionary in EDICT 
format, the author is willing to collate and distribute it.  Just email the 
pieces.) 

The  dictionary  file  must  use  the "EUC" coding for Japanese characters. 
MOKE's EDICT does this, so that was the coding adopted in JDIC. Files using 
JIS codings can be converted to EUC  using  MOKE  itself,  or  Ken  Lunde's 
"JIS.C" program. 

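For readers curious about what the JIS to EUC conversion involves:  7-bit
JIS marks two-byte text with escape sequences (ESC $ @ or ESC $ B to start,
ESC ( B or ESC ( J to finish),  and EUC is the same two bytes with the top
bit set.  The following is a minimal sketch of that idea only;  it is not
Ken Lunde's program, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convert 7-bit JIS text to EUC.  Returns the number of bytes written
   (not counting the terminating NUL), or (size_t)-1 if `out` is too
   small. */
size_t jis_to_euc(const char *in, char *out, size_t outlen)
{
    size_t used = 0;
    int kanji = 0;                      /* 1 while in two-byte mode */

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == 0x1B && in[1] == '$' && (in[2] == '@' || in[2] == 'B')) {
            kanji = 1;                  /* ESC $ @ or ESC $ B */
            in += 3;
            continue;
        }
        if (in[0] == 0x1B && in[1] == '(' && (in[2] == 'B' || in[2] == 'J')) {
            kanji = 0;                  /* ESC ( B or ESC ( J */
            in += 3;
            continue;
        }
        if (used + 1 >= outlen)         /* keep room for the NUL */
            return (size_t)-1;
        if (kanji && *in >= 0x21 && *in <= 0x7E)
            out[used++] = (char)(*in | 0x80);
        else
            out[used++] = *in;
        in++;
    }
    if (used >= outlen)
        return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

Files that have passed through a mailer which strips the ESC characters
cannot be converted this way, since the mode switches are lost.
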
The format of each entry of EDICT is:

Japanese [yomikata] /english1/english2/..../

If the word is in kana alone, the yomikata is omitted. 
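
A line in this format can be split in place with the standard string
functions,  because every byte of an EUC character has its top bit set and
so can never be mistaken for "[",  "]",  "/" or a blank.  The structure and
function below are hypothetical, and the limit of 16 glosses is arbitrary.

```c
#include <assert.h>
#include <string.h>

struct edict_entry {
    char *japanese;       /* headword (kanji and/or kana) */
    char *yomikata;       /* reading, or NULL if the headword is kana only */
    char *english[16];    /* English glosses */
    int   n_english;
};

/* Split one EDICT line in place.  Returns the number of English glosses
   found, or -1 if the line is malformed. */
int parse_edict_line(char *line, struct edict_entry *e)
{
    char *p, *slash, *end_br;

    assert(line != NULL && e != NULL);
    e->japanese = line;
    e->yomikata = NULL;
    e->n_english = 0;

    slash = strchr(line, '/');          /* start of the English part */
    if (slash == NULL || slash == line)
        return -1;
    *slash = '\0';

    p = strchr(line, '[');              /* optional [yomikata] */
    if (p != NULL) {
        end_br = strchr(p, ']');
        if (end_br == NULL)
            return -1;
        *p = '\0';
        *end_br = '\0';
        e->yomikata = p + 1;
    }
    line[strcspn(line, " ")] = '\0';    /* cut the headword at the blank */

    p = slash + 1;
    while (*p != '\0' && *p != '\r' && *p != '\n' && e->n_english < 16) {
        size_t n = strcspn(p, "/\r\n");
        if (p[n] != '/')                /* each gloss must end with "/" */
            break;
        p[n] = '\0';
        if (n > 0)
            e->english[e->n_english++] = p;
        p += n + 1;
    }
    return e->n_english;
}
```

The caller must pass a writable buffer,  since the separators are
overwritten with NULs.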

Technical
---------

JDIC  holds  the  complete  dictionary  in  RAM,  along with the first 3490 
bitmaps of the JIS character set and  the  index  table.  The  index  table 
contains  an  entry  for each word in the dictionary,  sorted in alpha/kana 
order.  This enables a fast search, and allows the display to  be  in 
alphabetical order by keyword.  Common words like:  "of", "to", "the", etc. 
and grammatical terms like: "adj", "vi", "vt", etc. are not indexed. 
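
A sorted index of this kind can be sketched with qsort and a binary
search.  The structure,  the names and the short skip list (just the words
quoted above) are assumptions for illustration;  JDIC's own table layout is
not described here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct index_entry {
    const char *word;   /* NUL-terminated keyword */
    long line;          /* dictionary line the keyword came from */
};

/* Case-insensitive string comparison; kana bytes compare in EUC order. */
static int ci_compare(const char *a, const char *b)
{
    while (*a != '\0'
           && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

static int entry_compare(const void *a, const void *b)
{
    const struct index_entry *x = (const struct index_entry *)a;
    const struct index_entry *y = (const struct index_entry *)b;
    return ci_compare(x->word, y->word);
}

/* Return 1 for words that are left out of the index. */
int is_unindexed(const char *word)
{
    static const char *const skip[] = { "of", "to", "the", "adj", "vi", "vt" };
    size_t i;

    for (i = 0; i < sizeof skip / sizeof skip[0]; i++)
        if (ci_compare(word, skip[i]) == 0)
            return 1;
    return 0;
}

/* Drop unindexed words, then sort.  Returns the new entry count. */
size_t build_index(struct index_entry *tab, size_t n)
{
    size_t i, kept = 0;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (!is_unindexed(tab[i].word))
            tab[kept++] = tab[i];
    if (kept > 0)
        qsort(tab, kept, sizeof tab[0], entry_compare);
    return kept;
}

/* Binary search: position of the first entry whose word is >= key. */
size_t first_at_or_after(const struct index_entry *tab, size_t n,
                         const char *key)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ci_compare(tab[mid].word, key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

Because the table is sorted,  every word beginning with the search key
sits in one contiguous run starting at the position returned,  which is
also why the matches come out in alphabetical order.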

If a kanji is required that is not in the ~3000 most  common  ones,  it  is 
read from disk into a circular cache buffer. This happens rarely. 
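
A circular cache of this sort overwrites its oldest slot whenever a glyph
has to be fetched.  The sketch below assumes 16x16 bitmaps (32 bytes each)
and an 8-slot buffer;  both figures,  the names and the stand-in loader are
hypothetical, as the real sizes are not given.

```c
#include <assert.h>
#include <string.h>

#define BITMAP_BYTES 32     /* assumed: 16 rows of 2 bytes */
#define CACHE_SLOTS  8      /* assumed buffer size */

struct glyph_cache {
    unsigned code[CACHE_SLOTS];                       /* JIS code per slot */
    unsigned char bitmap[CACHE_SLOTS][BITMAP_BYTES];
    int used;                                         /* slots filled so far */
    int next;                                         /* slot to overwrite next */
};

/* Loader callback: fills `bitmap` with the glyph for `code`. */
typedef void (*glyph_loader)(unsigned code, unsigned char *bitmap);

/* Return the bitmap for `code`, calling `load` only on a cache miss. */
const unsigned char *cache_get(struct glyph_cache *c, unsigned code,
                               glyph_loader load)
{
    int i;

    assert(c != NULL && load != NULL);
    for (i = 0; i < c->used; i++)
        if (c->code[i] == code)
            return c->bitmap[i];                      /* hit */
    i = c->next;                                      /* miss: reuse oldest */
    load(code, c->bitmap[i]);
    c->code[i] = code;
    c->next = (c->next + 1) % CACHE_SLOTS;
    if (c->used < CACHE_SLOTS)
        c->used++;
    return c->bitmap[i];
}

/* Stand-in loader for demonstration: counts "disk reads" and fills the
   bitmap with a recognisable pattern instead of reading a font file. */
int disk_reads = 0;

void demo_loader(unsigned code, unsigned char *bitmap)
{
    disk_reads++;
    memset(bitmap, (int)(code & 0xFF), BITMAP_BYTES);
}
```

A cache must start zeroed (for example "struct glyph_cache c = {0};").
Replacement is strictly in order of arrival,  which is simple and adequate
when misses are rare.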

JDIC  can  cope  with  dictionaries up to about 180 kbytes (MOKE's EDICT is 
about 60k).  If a larger dictionary ever becomes available, another version 
could  operate with the dictionary on disk.  The parsing and sort to set up 
the index table would be slower,  but the searching would still  be  quite 
fast. 

Changes in Version 1.1
----------------------

o ATT graphics card handling.

o  fixes to the parsing of kanji/kana strings.  The result is that the .jdx 
  file is about 20% larger than in V1.0. 

Changes in Version 1.2
----------------------

o fixes to the romaji->kana code to handle "nyu" properly.

o facility to use dictionaries other than "edict".

o Unlimited Display Mode.

Changes in Version 1.3
----------------------

o immediate romaji->kana conversion (suggested by David Cowhig).

o examination of the "bgi" and  "mokerc"  environment  variables,  and  the 
"moke.rc" control file. 

It Doesn't Work!
----------------

Oh  dear.  If you do not get the introductory message,  you probably have a 
corrupted .exe. Try and get a clean copy.  Also your environment might have 
trouble with the output of a Turbo C 2.0 compilation/link. 

If you actually get started,  but cannot find anything,  even when you  put 
"a" as a search key,  delete your .jdx file and start again.  If  it  still 
doesn't work, mail the author a sample of your dictionary. 

Acknowledgements
----------------

A message from the author: 

I  wrote  this  program  to  gain experience in handling and displaying the 
Japanese character set,  and to exploit the dictionary that  came  with  my 
copy of MOKE.  I also wanted to brush up my C skills.  I make no claims for 
it,  but I am pleased with how it turned out.  I will consider releasing the 
source (if anyone is actually interested in it) at a later date. 

I welcome suggestions, comments and constructive criticism. 

I  wrote  about  two-thirds of this program.  Great lumps of it were lifted 
with minor modifications from "KD" (Kanji Driver),  which  was  written  by 
Izumi  Ohzawa  at Berkeley,  in particular the JIS handling module (kjis.c) 
which was a port of "jis.pas" by Seiichi Nomura and Seke Wei. 

Ken Lunde's "japan.inf" and his elegant "jis.c" explained the  workings  of 
EUC and old/new JIS codes. 

Mark  Edwards'  MOKE  remains  the  tour  de  force  in this field,  and an 
inspiration for us all.  I regard JDIC as a humble and minor  accessory  to 
MOKE.  (I  use  tables lifted from two of the ".hlp" files in MOKE to drive 
the romaji->kana code.) 

Jim Breen
Department of Robotics & Digital Technology
Monash University
Melbourne, Australia
(jwb@monu6.cc.monash.edu.au)

May-June 1991
-- 
Jim Breen                                   AARNet:jwb@monu6.cc.monash.edu.au  
Department of Robotics & Digital Technology. 
Monash University. PO Box 197 Caulfield East VIC 3145 Australia
(ph) +61 3 573 2552 (fax) +61 3 573 2745           JIS: ジム　ブリーン

-- comp.archives file verification
monu6.cc.monash.edu.au
-rw-r--r--  1 886      729         88014 Jun 13 17:35 /pub/Nihongo/jdic13.zoo
found jdic ok
monu6.cc.monash.edu.au:/pub/Nihongo/jdic*.zoo