[comp.archives] [sci.lang] pckimmo etc. by ftp

lee@uhccux.uhcc.Hawaii.Edu (Greg Lee) (10/22/90)

Archive-name: uhccux-linguist/21-Oct-90
Original-posting-by: lee@uhccux.uhcc.Hawaii.Edu (Greg Lee)
Original-subject: pckimmo etc. by ftp
Archive-site: uhccux.uhcc.hawaii.edu [128.171.7.2]
Archive-directory: linguist
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)


Several files that may interest people who work with language are
available by anonymous ftp from uhccux.uhcc.hawaii.edu, in the
directory `linguist'.  Among them is the C source code for pckimmo;
information about it is appended below, after the list of available
files.

------------------------------------
README file from linguist directory:
------------------------------------
This directory contains programs or data for natural language work.
See the packages themselves for docs.  Use binary mode for ftp'ing.
Files whose names end in .Z must be uncompressed.

Questions to Greg Lee (lee@uhccux.uhcc.hawaii.edu) or David Stampe
(stampe@uhccux.uhcc.hawaii), or to the original authors.

Here is some information about the files here:

tree1.1.tar.Z
	program, utilities, documentation for displaying trees
	on screen or (using TeX) to print; this is version 1.1
	of July, 1990 -- gl
phon.tar.Z
	a little phonemic translator program, slightly adapted
	from a version posted in sci.lang in August, 1989 -- gl
pck10B.tar.Z
	C source code for pckimmo, the SIL implementation of the Kimmo
	morphological analyzer; this is version 1.0B of Sept. 7, 1990
rc0.1.tar.Z
	a theoretically biased rule compiler for pckimmo; this is
	version 0.1 of August, 1990 -- gl
bible.Z
	text of the King James Bible.
eth.Z
	database for Barbara Grimes' Ethnologue, 11th edition.
wgt.lst.Z
	World Genetic Tree [of languages], by J. Grimes, B. Bright,
	and B. Comrie.  Supplements and corrects genetic information
	in Ethnologue.

--------------------------------
README file from pckimmo distribution, version 1.0B:
--------------------------------
                                                Dallas, Texas
                                                September 7, 1990

PC-KIMMO: a two-level processor for morphological analysis

PC-KIMMO is an implementation for microcomputers of a program dubbed
KIMMO after its inventor Kimmo Koskenniemi.  It is of interest to
computational linguists, descriptive linguists, and those developing
natural language processing systems.  The program is designed to
generate (produce) and/or recognize (parse) words using a two-level
model of word structure in which a word is represented as a
correspondence between its lexical level form and its surface level
form.  PC-KIMMO is language-independent.  For each language
description the user prepares two input files: (1) a set of rules
that govern phonological/orthographic alternations and (2) a lexicon
that lists all words (morphemes) in their lexical form and specifies
constraints on their order.  The rules and lexicon are implemented
computationally using finite state machines.

The purpose of developing PC-KIMMO is to provide a version of the
two-level processor that runs on an IBM PC compatible computer.
The PC-KIMMO program is actually a shell program that serves as an
interactive user interface to the primitive PC-KIMMO functions.  It
provides an environment for developing, testing, and debugging
two-level descriptions.  The primitive PC-KIMMO functions are also
available as a C-language source code library that can be included
in a program written by the user.  This means that the user can
develop and debug a two-level description using the PC-KIMMO shell
and then link PC-KIMMO's functions into his own program.  For
example, a syntactic parsing program could use PC-KIMMO as a
morphological preprocessor.

PC-KIMMO will run on the following systems:
    MS-DOS or PC-DOS (any IBM PC compatible)
    UNIX System V (SCO UNIX V/386 and A/UX) and 4.2 BSD UNIX (SunOS)
    Apple Macintosh

The PC-KIMMO software is packaged with the book that describes how
to use it:

    Antworth, Evan L. 1990. PC-KIMMO: a two-level processor
      for morphological analysis. Occasional Publications in
      Academic Computing No. 16. Dallas, TX: Summer Institute of
      Linguistics. ISBN 0-88312-639-7, 273pp., $23.00.

The book is a full-length tutorial on writing two-level linguistic
descriptions with PC-KIMMO.  It also fully documents the PC-KIMMO
user interface and the source code function library.  The book with
release diskette(s) is available from:

    International Academic Bookstore
    7500 W. Camp Wisdom Road
    Dallas, TX 75236
    phone 214/709-2404

There are two versions of the PC-KIMMO release diskette(s), one
for IBM PC compatibles and one for the Macintosh.  Each contains
the executable PC-KIMMO program, examples of language
descriptions, and the source code library for the primitive
PC-KIMMO functions.  The PC-KIMMO executable program and the
source code library are copyrighted but are made freely available
to the general public under the condition that they not be resold
or used for commercial purposes.

PC-KIMMO is a research project in progress, not a finished
commercial product.  In this spirit, we invite your response to
the software and the book. Please direct your comments to:

    Academic Computing Department
    PC-KIMMO project
    7500 W. Camp Wisdom Road
    Dallas, TX 75236

    phone: 214/709-2418

    Internet: evan@txsil.lonestar.org (Evan Antworth)
    (via Compuserve: >Internet evan@txsil.lonestar.org)

===============================================================================

This is the source code for Version 1.0B (6 September 1990) of the PC-KIMMO
program described above.  (Changes from Version 1.0 relate only to
portability problems and other bugs that have been discovered and fixed.)
You really should have a copy of the (printed and bound) book to know what
this program does and how to use this particular implementation.  (In
addition to an MS-DOS or Macintosh executable, the normal distribution has
a number of sample files illustrating the program.)

You should have these files:

Makefile        generic UNIX makefile
Makefile.MSC    Microsoft C makefile (needs better make than Microsoft's)
Makefile.SCO    SCO UNIX V/386 makefile (3 Unix compilers and a cross-compiler
                for MS-DOS)
README          this file
g.c             toy generator program (trivial user interface)
generate.c      source code for word generation
lexicon.c       source code for dealing with the lexicon
pckfuncs.c      source code for functions used throughout the program
pckimmo.c       source code for main() and global variables
pckimmo.h       header file with structure definitions, etc.
r.c             toy recognizer program (trivial user interface)
recogniz.c      source code for word recognition
rules.c         source code for dealing with the rules
usercmd.c       source code for user interface
userfunc.c      source code for functions used by user interface

For the paranoid, here's a listing of the checksum and size information for
each of these files, as output by the UNIX sum (System V "sum -r") and wc
programs:

                        sum [-r]       wc
file                    cksum  blks   lines  words  chars
----                    -----------    ------------------
Makefile                17720     3      68    165   1491
Makefile.MSC            57298     2      30     80   1012
Makefile.SCO            02163     7     105    354   3147
README                  *****   ***     139    890   6681
g.c                     43535     3      56    164   1106
generate.c              49461    31     522   2188  15658
lexicon.c               37281    50     878   3468  25583
pckfuncs.c              59123    61    1124   4712  31123
pckimmo.c               03870    17     302   1299   8540
pckimmo.h               12730    19     275   1274   9339
r.c                     64881     3      55    165   1140
recogniz.c              23611    46     741   3135  23252
rules.c                 19803    90    1577   6284  45716
usercmd.c               09851   132    2375   9133  67269
userfunc.c              38313   100    1992   7108  50874

Have fun.  Comments regarding program bugs or desired features should be
sent to Evan Antworth at the addresses given above.  (Compliments are also
welcome... :-)  If you have problems compiling the code for your system,
you may do better to contact me (Steve) rather than Evan, since he's a
linguist and I'm a computerist.
--
Stephen McConnel
Summer Institute of Linguistics  PHONE: 214-709-2418
7500 W. Camp Wisdom Road          UUCP: ...!{convex|utafll}!txsil!steve
Dallas, TX 75236              Internet: steve@txsil.lonestar.org