lee@uhccux.uhcc.Hawaii.Edu (Greg Lee) (10/22/90)
Archive-name: uhccux-linguist/21-Oct-90
Original-posting-by: lee@uhccux.uhcc.Hawaii.Edu (Greg Lee)
Original-subject: pckimmo etc. by ftp
Archive-site: uhccux.uhcc.hawaii.edu [128.171.7.2]
Archive-directory: linguist
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

Several files that may interest people who work with language are
available by anonymous ftp from uhccux.uhcc.hawaii.edu.  Look in the
directory `linguist'.  Among them is the C source code for pckimmo;
information about it is appended below, after the list of available
files.

------------------------------------
README file from linguist directory:
------------------------------------

This directory contains programs or data for natural language work.
See the packages themselves for docs.  Use binary mode for ftp'ing.
Files whose names end in .Z must be uncompressed.  Questions to
Greg Lee (lee@uhccux.uhcc.hawaii.edu) or David Stampe
(stampe@uhccux.uhcc.hawaii), or to the original authors.

Here is some information about the files here:

tree1.1.tar.Z   program, utilities, documentation for displaying
                trees on screen or (using TeX) to print; this is
                version 1.1 of July, 1990 -- gl

phon.tar.Z      a little phonemic translator program, slightly
                adapted from a version posted in sci.lang in
                August, 1989 -- gl

pck10B.tar.Z    C source code for pckimmo, the SIL implementation of
                the Kimmo morphological analyzer; this is version
                1.0B of Sept. 7, 1990

rc0.1.tar.Z     a theoretically biased rule compiler for pckimmo;
                this is version 0.1 of August, 1990 -- gl

bible.Z         text of King James Bible.

eth.Z           database for Barbara Grimes' Ethnologue, 11th
                edition.

wgt.lst.Z       World Genetic Tree [of languages], by J. Grimes,
                B. Bright, and B. Comrie.  Supplements and corrects
                genetic information in Ethnologue.
--------------------------------
README file from pckimmo distribution, version 1.0B:
--------------------------------

Dallas, Texas
September 7, 1990

PC-KIMMO: a two-level processor for morphological analysis

PC-KIMMO is an implementation for microcomputers of a program dubbed
KIMMO after its inventor Kimmo Koskenniemi.  It is of interest to
computational linguists, descriptive linguists, and those developing
natural language processing systems.  The program is designed to
generate (produce) and/or recognize (parse) words using a two-level
model of word structure in which a word is represented as a
correspondence between its lexical level form and its surface level
form.

PC-KIMMO is language-independent.  For each language description the
user prepares two input files: (1) a set of rules that govern
phonological/orthographic alternations and (2) a lexicon that lists
all words (morphemes) in their lexical form and specifies constraints
on their order.  The rules and lexicon are implemented computationally
using finite state machines.

The purpose of developing PC-KIMMO is to provide a version of the
two-level processor that runs on an IBM PC compatible computer.  The
PC-KIMMO program is actually a shell program that serves as an
interactive user interface to the primitive PC-KIMMO functions.  It
provides an environment for developing, testing, and debugging
two-level descriptions.  The primitive PC-KIMMO functions are also
available as a C-language source code library that can be included in
a program written by the user.  This means that the user can develop
and debug a two-level description using the PC-KIMMO shell and then
link PC-KIMMO's functions into his own program.  For example, a
syntactic parsing program could use PC-KIMMO as a morphological
preprocessor.
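[Editorial illustration, not part of the original README.]  To make the
idea of a rule "implemented using a finite state machine" concrete, here
is a minimal sketch in C.  Everything in it is an assumption made for
illustration: the toy rule (lexical t is realized as surface c if and
only if an i follows), the state table, and the names pair_column and
two_level_accepts.  None of it is taken from the PC-KIMMO source or its
function library.  The sketch also requires the lexical and surface
strings to be the same length, so it does not model null
correspondences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy rule (not from the PC-KIMMO distribution):
 *     t:c <=> ___ i
 * The automaton reads lexical:surface character pairs.
 * Columns: 0 = t:c, 1 = t:t, 2 = i:i, 3 = any other identical pair.
 * Rows are states; state 0 is the failure state. */
enum { N_STATES = 3, N_COLS = 4 };

static const int next_state[N_STATES + 1][N_COLS] = {
    /* state 0: failure            */ {0, 0, 0, 0},
    /* state 1: start              */ {3, 2, 1, 1},
    /* state 2: saw t:t, no i next */ {3, 2, 0, 1},
    /* state 3: saw t:c, need i    */ {0, 0, 1, 0},
};

/* State 3 is non-final: a t:c pair must be followed by i:i. */
static const int is_final[N_STATES + 1] = {0, 1, 1, 0};

/* Map a lexical:surface pair to a table column, or -1 if the pair
 * is not a feasible pair in this toy description. */
static int pair_column(char lex, char surf)
{
    if (lex == 't' && surf == 'c') return 0;
    if (lex == 't' && surf == 't') return 1;
    if (lex == 'i' && surf == 'i') return 2;
    if (lex == surf) return 3;
    return -1;
}

/* Return 1 if the surface form is a legal realization of the lexical
 * form under the toy rule, 0 otherwise. */
int two_level_accepts(const char *lex, const char *surf)
{
    assert(lex != NULL && surf != NULL);

    size_t n = strlen(lex);
    if (strlen(surf) != n)
        return 0;               /* no null correspondences here */

    int state = 1;
    for (size_t i = 0; i < n; i++) {
        int col = pair_column(lex[i], surf[i]);
        if (col < 0)
            return 0;           /* infeasible pair */
        state = next_state[state][col];
        if (state == 0)
            return 0;           /* rule violated */
    }
    return is_final[state];
}
```

With this table, two_level_accepts("tati", "taci") succeeds, while
two_level_accepts("tati", "tati") fails because a lexical t before i
must surface as c.  Because the same table checks a pairing rather than
rewriting a string, it could in principle serve both generation and
recognition, which is the point of the two-level model.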
PC-KIMMO will run on the following systems:

    MS-DOS or PC-DOS (any IBM PC compatible)
    UNIX System V (SCO UNIX V/386 and A/UX) and 4.2 BSD UNIX (SunOS)
    Apple Macintosh

The PC-KIMMO software is packaged with the book that describes how to
use it:

    Antworth, Evan L.  1990.  PC-KIMMO: a two-level processor for
    morphological analysis.  Occasional Publications in Academic
    Computing No. 16.  Dallas, TX: Summer Institute of Linguistics.
    ISBN 0-88312-639-7, 273pp., $23.00.

The book is a full-length tutorial on writing two-level linguistic
descriptions with PC-KIMMO.  It also fully documents the PC-KIMMO user
interface and the source code function library.  The book with release
diskette(s) is available from:

    International Academic Bookstore
    7500 W. Camp Wisdom Road
    Dallas, TX 75236
    phone 214/709-2404

There are two versions of the PC-KIMMO release diskette(s), one for
IBM PC compatibles and one for the Macintosh.  Each contains the
executable PC-KIMMO program, examples of language descriptions, and
the source code library for the primitive PC-KIMMO functions.  The
PC-KIMMO executable program and the source code library are
copyrighted but are made freely available to the general public under
the condition that they not be resold or used for commercial purposes.

PC-KIMMO is a research project in progress, not a finished commercial
product.  In this spirit, we invite your response to the software and
the book.  Please direct your comments to:

    Academic Computing Department
    PC-KIMMO project
    7500 W. Camp Wisdom Road
    Dallas, TX 75236
    phone: 214/709-2418
    Internet: evan@txsil.lonestar.org (Evan Antworth)
    (via Compuserve: >Internet evan@txsil.lonestar.org)

===============================================================================

This is the source code for Version 1.0B (6 September 1990) of the
PC-KIMMO program described above.  (Changes from Version 1.0 relate
only to portability problems and other bugs that have been discovered
and fixed.)
You really should have a copy of the (printed and bound) book to know
what this program does and how to use this particular implementation.
(In addition to an MS-DOS or Macintosh executable, the normal
distribution has a number of sample files illustrating the program.)

You should have these files:

    Makefile        generic UNIX makefile
    Makefile.MSC    Microsoft C makefile (needs better make than
                    Microsoft's)
    Makefile.SCO    SCO UNIX V/386 makefile (3 Unix compilers and a
                    cross-compiler for MS-DOS)
    README          this file
    g.c             toy generator program (trivial user interface)
    generate.c      source code for word generation
    lexicon.c       source code for dealing with the lexicon
    pckfuncs.c      source code for functions used throughout the
                    program
    pckimmo.c       source code for main() and global variables
    pckimmo.h       header file with structure definitions, etc.
    r.c             toy recognizer program (trivial user interface)
    recogniz.c      source code for word recognition
    rules.c         source code for dealing with the rules
    usercmd.c       source code for user interface
    userfunc.c      source code for functions used by user interface

For the paranoid, here's a listing of the checksum and size
information for each of these files, as output by the UNIX sum
(System V "sum -r") and wc programs:

    file            sum [-r]       wc
    ----            -----------    ------------------
    Makefile        17720     3      68    165   1491
    Makefile.MSC    57298     2      30     80   1012
    Makefile.SCO    02163     7     105    354   3147
    README          *****   ***     139    890   6681
    g.c             43535     3      56    164   1106
    generate.c      49461    31     522   2188  15658
    lexicon.c       37281    50     878   3468  25583
    pckfuncs.c      59123    61    1124   4712  31123
    pckimmo.c       03870    17     302   1299   8540
    pckimmo.h       12730    19     275   1274   9339
    r.c             64881     3      55    165   1140
    recogniz.c      23611    46     741   3135  23252
    rules.c         19803    90    1577   6284  45716
    usercmd.c       09851   132    2375   9133  67269
    userfunc.c      38313   100    1992   7108  50874

Have fun.  Comments regarding program bugs or desired features should
be sent to Evan Antworth at the addresses given above.  (Compliments
are also welcome... :-)

If you have problems compiling the code for your system, it may be
worth communicating with me (Steve) rather than Evan since he's a
linguist, and I'm a computerist.

--
Stephen McConnel
Summer Institute of Linguistics    PHONE:    214-709-2418
7500 W. Camp Wisdom Road           UUCP:     ...!{convex|utafll}!txsil!steve
Dallas, TX 75236                   Internet: steve@txsil.lonestar.org
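
[Editorial illustration, not part of the original posting.]  On a system
without a System V style "sum -r", the first two numeric columns of the
checksum listing in the README can probably be reproduced with a few
lines of C.  The sketch below uses the 16-bit rotating checksum commonly
associated with BSD sum and "sum -r".  The 512-byte block size is an
assumption inferred from the listing itself (for example, 1491 bytes is
reported as 3 blocks), and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit rotating checksum: rotate the running sum right by one bit,
 * add the next byte, and keep the low 16 bits. */
uint16_t sum_r_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (sum >> 1) + ((sum & 1u) << 15);  /* rotate right by 1 */
        sum += data[i];
        sum &= 0xFFFFu;                         /* stay within 16 bits */
    }
    return (uint16_t)sum;
}

/* Assumption: block counts are in 512-byte units, rounded up.  This is
 * inferred from the sizes in the listing, not stated in the README. */
unsigned long sum_r_blocks(unsigned long nbytes)
{
    return (nbytes + 511UL) / 512UL;
}
```

To check a file, read its whole contents into a buffer in binary mode,
pass the buffer to sum_r_checksum, and pass the byte count to
sum_r_blocks; the results should correspond to the two "sum [-r]"
columns if the assumptions above hold for the system that produced the
listing.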