alex@dutirt2.tudelft.nl (Alexander Vonk) (07/01/90)
L.S.,

Some three weeks ago I posted a request for information concerning error handling in natural language processing, specifically error handling with respect to parsing with a (context-free) grammar. I received six reactions, and they certainly helped me get more acquainted with the subject. Thanks to all who were so kind to respond.

The rest of this article is a somewhat abridged version of the collected responses, split into text and address information ('----' separates entries, '****' separates text and addresses). Feel free to comment or react to this article. Some time next week I hope to have collected all articles referred to below (except the book on probabilistic grammars; sorry Chuck, but maybe someone else can use it), so anyone in the neighbourhood can ask me where to find the material.

Thanks everyone,
Alexander Vonk (alex@dutirt2.tudelft.nl)
C.S. student, Technical University of Delft, Netherlands

***************************************************************************

HARPER@ccvax.ucd.ie writes:

There is so much written on this that it is difficult to find a convenient starting point. Misspellings are usually corrected at a pre-processing stage (is the word in the lexicon, or close to an entry in the lexicon?). A trigram lookup approach is one way of correcting this problem; however, if each word is matched to its nearest equivalent, a lot of extraneous legitimate parse trees become possible. The real problem is pragmatic "overshoot", where the interlocutor presumes more about the extent of the KB than is there. Correcting presupposition failure is still devoid of any uncontentious methodology.

Get from the CEC in Brussels copies of deliverables 8 and 11 (the final report) on ESPRIT 1 project 527 CFID: Communication Failure In Dialogue, techniques for detection and repair. This project lasted for almost five years (I worked on it for four) and is definitely worth looking at.
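[To illustrate the trigram lookup Harper mentions, here is a minimal sketch of my own; it is not the CFID project's actual method, and the lexicon and threshold are invented for the example. Each word is reduced to its set of character trigrams, and an out-of-lexicon word is mapped to the lexicon entry whose trigram set is most similar (Dice coefficient here).]

```python
# Sketch of trigram-based spelling correction against a (hypothetical) lexicon.

def trigrams(word):
    """Character trigrams of a padded word, e.g. 'cat' -> {'##c','#ca','cat','at#','t##'}."""
    padded = "##" + word.lower() + "##"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def dice(a, b):
    """Dice similarity of two trigram sets: 2|A&B| / (|A|+|B|)."""
    return 2.0 * len(a & b) / (len(a) + len(b))

def correct(word, lexicon, threshold=0.5):
    """Return word if known, else the most trigram-similar lexicon entry."""
    if word in lexicon:
        return word
    best, score = max(((w, dice(trigrams(word), trigrams(w))) for w in lexicon),
                      key=lambda ws: ws[1])
    return best if score >= threshold else word  # leave truly unknown words alone

lexicon = {"parse", "grammar", "sentence", "lexicon"}
print(correct("grammer", lexicon))   # -> grammar
```

[Note that mapping every unknown word to its nearest entry is exactly what produces the extraneous parse trees Harper warns about; the threshold is there so that genuinely unknown words are left untouched.]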
Also request reports on the LOKI project (ESPRIT 1 also).

--------------------------------------------------------------------------

Dan Fass <fass@cs.sfu.ca> writes:

There's quite a literature on error handling in Natural Language Processing systems. The error handling problem is often called "handling ill-formed input." A good place to start is the Special Issue on Ill-Formed Input in the American Journal of Computational Linguistics (now called Computational Linguistics), volume 9, issues 3-4. Probably the best known people in the field are (1) Carbonell and Hayes and their colleagues Mouradian and Fain, and (2) Weischedel and his colleagues Sondheimer and Ramshaw. Some references of theirs include:

Carbonell, Jaime G. (1979). Towards a Self-Extending Parser. Proceedings of the 17th Annual Meeting of the Association for Computational Linguistics, San Diego, CA, pp. 3-7.

Carbonell, Jaime G., and Philip J. Hayes (1983). Recovery Strategies for Parsing Extragrammatical Language. American Journal of Computational Linguistics, 9, (3-4), pp. 123-146.

Carbonell, Jaime G., and Philip J. Hayes (1987). Robust Parsing Using Multiple Construction-Specific Strategies. In Leonard Bolc (Ed.) Natural Language Parsing Systems. Heidelberg, West Germany: Springer-Verlag, pp. 1-32.

Hayes, Philip J., and G.V. Mouradian (1981). Flexible Parsing. American Journal of Computational Linguistics, 7, (4), pp. 232-241.

Lehman, Jill Fain, and Jaime G. Carbonell (1989). Learning the User's Language: A Step Towards Automated Creation of User Models. In Alfred Kobsa and Wolfgang Wahlster (Eds.) User Models in Dialog Systems. Berlin, West Germany: Springer-Verlag, pp. 163-194.

Weischedel, Ralph M., and J. Black (1980). Responding Intelligently to Unparsable Inputs. American Journal of Computational Linguistics, 6, (2), pp. 97-109.

Weischedel, Ralph M., and Lance A. Ramshaw (1987). Reflections on the Knowledge Needed to Process Ill-Formed Language. In Sergei Nirenburg (Ed.) Machine Translation: Theoretical and Methodological Issues. Cambridge, England: Cambridge University Press, pp. 155-167.

Weischedel, Ralph M., and Norman J. Sondheimer (1983). Meta-Rules as a Basis for Processing Ill-Formed Input. American Journal of Computational Linguistics, Special Issue on Ill-Formed Input, 9, (3-4), pp. 161-177.

Other recent papers that might be of interest:

Dale, Robert (1989). Computer-Based Editorial Aids. In Jeremy Peckham (Ed.) Recent Developments and Applications of Natural Language Processing. London, England: Kogan Page Limited, pp. 8-22.

Fass, Dan C., Nicholas J. Cercone, Gary Hall, Chris Groeneboer, Paul McFetridge and Fred Popowich (to appear). A Classification of User-System Interactions in Natural Language, with Special Reference to "Ill-Formed Input." In 5th Rocky Mountain Conference on Artificial Intelligence (RMCAI-90), Las Cruces, NM, June 28-30 1990.

Fass, Dan C., and Gary Hall (to appear). A Belief-Based View of Ill-Formed Input. In Computational Intelligence '90, Milan, Italy, September 24-28 1990.

The above three papers all contain discussions of the spelling error/unknown word problem. The Hayes/Mouradian and Carbonell/Hayes papers contain extensive discussion of treating partial sentences.

--------------------------------------------------------------------------

lavelli@irst.it (Alberto Lavelli) writes:

There are many papers about error handling in NLP. I'll try to choose the references most relevant to what you wrote in your letter. First of all, there is the Special Issue on Ill-Formed Input of the American Journal of Computational Linguistics (volume 9, numbers 3-4, 1983). This issue can give you a general (but not particularly up-to-date) overview of the field. Papers more connected with the examples in your letter (i.e., missing words, spurious words, etc.) and based on chart parsing are:

Mellish - Some Chart-Based Techniques for Parsing Ill-Formed Input - Proceedings of ACL 89.
Lang - Parsing Incomplete Sentences - Proceedings of Coling 88.

I have tried to be concise and to give you only a very limited number of relevant references; if you need more information, let me know. If you have problems getting any of these papers, let me know and I'll send them to you.

--------------------------------------------------------------------------

Ei-ichi Osawa <osawa%csl.sony.jp@RELAY.CS.NET> writes:

As I am not an expert in that field, I cannot give you a well-informed pointer, but your article quickly reminded me of the following paper by Chris S. Mellish:

Chris S. Mellish, "Some Chart-Based Techniques For Parsing Ill-Formed Input", in Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 1989.

I think the error handling techniques he proposes in the paper are closely related to your idea, so the article should be worth reading for you. Here follows the abstract of his paper.

ABSTRACT: We argue for the usefulness of an active chart as the basis of a system that searches for the globally most plausible explanation of failure to syntactically parse a given input. We suggest semantics-free, grammar-independent techniques for parsing inputs displaying simple kinds of ill-formedness and discuss the search issues involved.

Also, you will be able to obtain related articles by writing to:

Chris S. Mellish
Department of Artificial Intelligence
University of Edinburgh
80 South Bridge
Edinburgh EH1 1HN
Scotland

--------------------------------------------------------------------------

houpt@svax.cs.cornell.edu (Charles E. Houpt) writes:

One book you might be interested in is:

"The Computational Analysis of English: A Corpus-Based Approach"
Edited by Roger Garside, Geoffrey Leech, and Geoffrey Sampson
1987 Longman, London, ISBN 0-582-29149-6

It has a chapter on dealing with ill-formed text. Rather than using the traditional Chomskyan/theoretical approach to NLP, this book takes an empirical approach. Large corpora of text are used to build up statistical models of grammar. A probabilistic grammar is used instead of formal grammars (context-free, ATN, etc.). Probabilistic grammars are robust and thus can handle ill-formed text. This book won't help you directly with error correction in formal grammars, but it provides a fascinating alternative.

--------------------------------------------------------------------------

Eric Bilange <bilange@capsogeti.fr> writes:

I've read your question in the news about error handling. I assume your question focuses on error handling in parsing, and that the type of error is: what to do when the sentence you are parsing is not in your grammar (which is what makes the word "error" apt).

In such a case, I think you should look at the state of the art in speech understanding technologies. I don't know whether you are familiar with this activity. From the acoustic signal you build a lattice of objects (syllables, phonemes or words), where one axis is time and the other the vocabulary, and the best candidates have the highest scores. This lattice is the input to the parser. Various techniques are available (charts, UCG, ...), but the most up to date is based on island parsing: you look for islands (syntactically and semantically grounded) and you try to find links between the islands. The translation between written and oral language is simple: in written language you have the links, but your grammar rejects them, so the first principle is to conjecture grounded links. You may find references to such work in the ICASSP, EUROSPEECH, COLING and ACL proceedings (even if the latter are not speech oriented). Another entry point could be the techniques used in spelling correctors (but this is not my cup of tea).

Finally, I suggest you read a paper by Jacques Vergnes, to appear in COLING this year (August). You will be surprised to see that he has no grammar and yet his parser is efficient!
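[To make Bilange's word lattice concrete, here is a small sketch of my own; it is not taken from any particular speech system, and the hypotheses and scores are invented. Each hypothesis is an edge (start time, end time, word, score), and a simple dynamic program extracts the highest-scoring chain of time-adjacent hypotheses, i.e. the raw material an island parser would then try to link syntactically and semantically.]

```python
# Sketch: best-scoring path through a word lattice (hypothetical data;
# a real recognizer would supply the word hypotheses and acoustic scores).
from collections import defaultdict

# Each hypothesis: (start_time, end_time, word, score); times are frame indices.
lattice = [
    (0, 2, "the", 0.9), (0, 2, "a", 0.4),
    (2, 5, "cat", 0.8), (2, 5, "cap", 0.5),
    (5, 8, "sat", 0.7), (5, 8, "sad", 0.6),
]

def best_path(lattice, start, end):
    """Highest-scoring chain of time-adjacent hypotheses from start to end."""
    by_start = defaultdict(list)
    for s, e, w, p in lattice:
        by_start[s].append((e, w, p))
    best = {start: (0.0, [])}  # reachable time -> (total score, word sequence)
    for t in sorted({s for s, _, _, _ in lattice} | {start}):
        if t not in best:
            continue
        score, words = best[t]
        for e, w, p in by_start[t]:
            cand = (score + p, words + [w])
            if e not in best or cand[0] > best[e][0]:
                best[e] = cand
    return best.get(end)  # None if the end time is unreachable

score, words = best_path(lattice, 0, 8)
print(words)   # -> ['the', 'cat', 'sat']
```

[A real island parser would not simply take the single best path: it would pick out high-confidence islands anywhere in the lattice first and then search for grammatically grounded links between them, which is precisely the "conjecture grounded links" principle Bilange describes.]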
****************************************************************************

Condensed address information about received articles:

HARPER@ccvax.ucd.ie
Jerry Harper
Computer Science Department, University College Dublin, Dublin 4, IRELAND
harper@ccvax.ucd.ie

--------------------------------------------------------------------------

Dan Fass <fass@cs.sfu.ca>

--------------------------------------------------------------------------

lavelli@irst.it (Alberto Lavelli)
IRST, Istituto per la Ricerca Scientifica e Tecnologica
I-38050 Povo TN, ITALY
from ARPA: lavelli%irst@uunet.uu.net
phone: +39-461-810105

--------------------------------------------------------------------------

Ei-ichi Osawa <osawa%csl.sony.jp@RELAY.CS.NET>
Sony Computer Science Laboratory Inc., Tokyo, Japan

--------------------------------------------------------------------------

houpt@svax.cs.cornell.edu (Charles E. Houpt)
Chuck Houpt
houpt@cs.cornell.edu

--------------------------------------------------------------------------

Eric Bilange <bilange@capsogeti.fr>
Research Engineer
CAP GEMINI INNOVATION, Paris Research Centre
118, rue de Tocqueville, 75017 Paris, France
Tel: +33 (1) 40 54 66 25 / +33 (1) 40 54 66 66 ext. 66 25
Fax: +33 (1) 42 67 41 39
bilange@csinn.UUCP
bilange@crp.capsogeti.fr

****************************************************************************

+++ Alexander Vonk - Technical Univ. Delft, Netherlands   +++
+++ Phone: (NL) 015 - 78 64 12   (world) 31 15 78 64 12   +++
+++ Mail: alex@dutirt2.tudelft.nl or alex@dutirt2.UUCP    +++