[comp.ai] [Summary] Info on Error Handling in Natural Language Processing

alex@dutirt2.tudelft.nl (Alexander Vonk) (07/01/90)

L.S.,

Some three weeks ago, I posted a request for information concerning
error handling in natural language processing, specifically error
handling with respect to parsing with a (context-free) grammar.

I received six replies, and they certainly helped me become somewhat
more acquainted with the subject.  Thanks to all who were so kind as
to respond.  The rest of this article is a somewhat abridged version
of the collected responses, split into text and address information
('----' separates entries, '****' separates text and addresses).

Feel free to comment or react to this article.  Some time next week
I hope to have collected all the articles referred to below (except
the book on probabilistic grammars; sorry Chuck, but maybe someone
else can use it), so anyone in the neighbourhood can ask me about the
location of the material.

Thanks everyone,

Alexander Vonk (alex@dutirt2.tudelft.nl)
C.S. student
Technical University of Delft, Netherlands.

***************************************************************************

HARPER@ccvax.ucd.ie writes:

There is so much written on this that it is difficult to find a
convenient starting point.  Misspellings are usually corrected at a
pre-processing stage (is the word in the lexicon? or close to an
entry in the lexicon?).  A trigram lookup approach is one way of
correcting this problem; however, if each word is simply matched to
its nearest equivalent, then a lot of extraneous yet legitimate parse
trees become possible.  The real problem is pragmatic "overshoot",
where the interlocutor presumes more about the extent of the
knowledge base than is actually there.  Correcting presupposition
failure is still devoid of any uncontentious methodology.
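
To make the trigram lookup idea concrete, here is a minimal sketch
(in Python); the lexicon, the padding scheme and the threshold are
invented for illustration, not taken from any of the systems cited:

def trigrams(word):
    """Character trigrams of a padded word: 'cat' -> ##c #ca cat at# t##."""
    padded = "##" + word.lower() + "##"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def build_index(lexicon):
    """Map each trigram to the set of lexicon entries containing it."""
    index = {}
    for entry in lexicon:
        for tri in trigrams(entry):
            index.setdefault(tri, set()).add(entry)
    return index

def correct(word, lexicon, index, threshold=0.5):
    """Return the lexicon entry most similar to `word` by the Dice
    coefficient over trigram sets, or None if nothing is close."""
    if word in lexicon:
        return word                          # already well-formed
    word_tris = trigrams(word)
    candidates = set()
    for tri in word_tris:                    # only entries sharing a trigram
        candidates |= index.get(tri, set())
    best, best_score = None, threshold
    for cand in candidates:
        cand_tris = trigrams(cand)
        score = 2 * len(word_tris & cand_tris) / (len(word_tris) + len(cand_tris))
        if score > best_score:
            best, best_score = cand, score
    return best

lexicon = {"parse", "parser", "grammar", "lexicon"}
index = build_index(lexicon)
print(correct("gramar", lexicon, index))     # -> 'grammar'

Note that snapping every unknown word to its single nearest neighbour
in this way is exactly what multiplies the extraneous parse trees
mentioned above; a safer design keeps the top few candidates as
ranked hypotheses for the parser to choose among.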

Get copies from the CEC in Brussels of deliverables 8 and 11 (the
final report) of ESPRIT 1 project 527, CFID: Communication Failure in
Dialogue, Techniques for Detection and Repair.  This project lasted
for almost five years (I worked on it for four) and is definitely
worth looking at.  Also request reports on the LOKI project (also an
ESPRIT 1 project).
 
--------------------------------------------------------------------------
 
Dan Fass <fass@cs.sfu.ca> writes:

There's quite a literature on error handling in Natural Language
Processing systems. The error handling problem is often called
``handling ill-formed input.'' A good place to start is the Special
Issue on Ill-Formed Input in the American Journal of Computational
Linguistics (now called Computational Linguistics), volume 9, issues
3-4.

Probably the best known people in the field are 
(1) Carbonell and Hayes and their colleagues Mouradian and Fain Lehman, and
(2) Weischedel and his colleagues Sondheimer and Ramshaw.

		Some references of theirs include:

Carbonell, Jaime G. (1979).
Towards a Self-Extending Parser.
Proceedings of the 17th Annual Meeting of the Association for Computational
Linguistics, San Diego, CA, pp. 3-7.

Carbonell, Jaime G., and Philip J. Hayes (1983). 
Recovery Strategies for Parsing Extragrammatical Language.
American Journal of Computational Linguistics, 9, (3-4), pp. 123-146.

Carbonell, Jaime G., and Philip J. Hayes (1987). 
Robust Parsing Using Multiple Construction-Specific Strategies.
In Leonard Bolc (Ed.) Natural Language Parsing Systems.
Heidelberg, West Germany: Springer-Verlag, pp. 1-32.

Hayes, Philip J., and G.V. Mouradian (1981).
Flexible Parsing.
American Journal of Computational Linguistics, 7, (4), pp. 232-241.

Lehman, Jill Fain, and Jaime G. Carbonell (1989).
Learning the User's Language: A Step Towards Automated Creation of User Models.
In Alfred Kobsa and Wolfgang Wahlster (Eds.) User Models in Dialog
Systems. Berlin, West Germany: Springer-Verlag, pp. 163-194.

Weischedel, Ralph M., and J. Black (1980).
Responding Intelligently to Unparsable Inputs.
American Journal of Computational Linguistics, 6, (2), pp. 97-109.

Weischedel, Ralph M., and Lance A. Ramshaw (1987).
Reflections on the Knowledge Needed to Process Ill-Formed Language.
In Sergei Nirenburg (Ed.) Machine Translation: Theoretical and Methodological 
Issues. Cambridge, England: Cambridge University Press, pp. 155-167.

Weischedel, Ralph M., and Norman J. Sondheimer (1983).
Meta-Rules as a Basis for Processing Ill-Formed Input. 
American Journal of Computational Linguistics, Special Issue on Ill-Formed 
Input, 9, (3-4), pp. 161-177.

		Other recent papers that might be of interest:

Dale, Robert (1989).
Computer-Based Editorial Aids.
In Jeremy Peckham (Ed.) Recent Developments and Applications of Natural
Language Processing. London, England: Kogan Page Limited, pp. 8-22.

Fass, Dan C., Nicholas J. Cercone, Gary Hall, Chris Groeneboer, Paul McFetridge
and Fred Popowich (to appear).
A Classification of User-System Interactions in Natural Language, with Special 
Reference to ``Ill-Formed Input.''
In 5th Rocky Mountain Conference on Artificial Intelligence (RMCAI-90), Las 
Cruces, NM, June 28-30 1990.

Fass, Dan C., and Gary Hall (to appear).
A Belief-Based View of Ill-Formed Input.
In Computational Intelligence '90, Milan, Italy, September 24-28 1990.

The above three papers all contain discussions of the spelling error/unknown
word problem.
The Hayes/Mouradian and Carbonell/Hayes papers contain extensive
discussion of how to handle partial sentences.

--------------------------------------------------------------------------

lavelli@irst.it (Alberto Lavelli) writes:

There are many papers on error handling in NLP.  I'll try to pick
the references most relevant to what you wrote in your letter.

First of all, there is the special issue on Ill-Formed Input of the
American Journal of Computational Linguistics (volume 9, Numbers 3-4,
1983).  This issue can give you a general (but not particularly
up-to-date) overview of the field.

Papers more closely connected with the examples in your letter
(i.e., missing words, spurious words, etc.) and based on chart
parsing are:

Mellish - Some Chart-Based Techniques for Parsing Ill-Formed Input - 
        Proceedings of ACL 89.
Lang - Parsing Incomplete Sentences - 
     Proceedings of Coling 88.

I have tried to be concise and to give you only a very limited number
of relevant references; if you need more information, let me know.  If
you have problems in getting any of these papers, let me know and I'll
send them to you.
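
As a toy illustration of the idea behind Lang's paper (this is not
his actual algorithm): a CYK recognizer over a small invented grammar
in Chomsky normal form, where the placeholder '?' stands for an
unknown or missing word and matches any terminal:

UNARY = {                        # A -> terminal rules (invented grammar)
    "Det": {"the", "a"},
    "N":   {"dog", "cat"},
    "V":   {"sees", "chases"},
}
BINARY = {                       # A -> B C rules
    "S":  [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
}

def recognize(words, start="S"):
    """True iff `words` (with '?' wildcards) is derivable from `start`."""
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat, terms in UNARY.items():
            if w == "?" or w in terms:       # wildcard matches anything
                chart[i][i + 1].add(cat)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, rules in BINARY.items():
                    for left, right in rules:
                        if left in chart[i][k] and right in chart[k][j]:
                            chart[i][j].add(parent)
    return start in chart[0][n]

print(recognize("the dog sees a cat".split()))   # True
print(recognize("the ? sees a cat".split()))     # True: '?' acts as a noun
print(recognize("dog the sees".split()))         # False: not repaired here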

--------------------------------------------------------------------------

Ei-ichi Osawa <osawa%csl.sony.jp@RELAY.CS.NET> writes:

As I am not an expert in that field, I cannot give you a
well-informed pointer, but your article immediately reminded me of
the following paper by Chris S. Mellish.

Chris S. Mellish, "Some Chart-Based Techniques for Parsing Ill-Formed
	Input", in Proceedings of the 27th Annual Meeting of the
	Association for Computational Linguistics, Vancouver, Canada,
	1989.

I think the error handling techniques he proposes in the paper are
closely related to your idea, so the paper should be well worth
reading.

Here follows the abstract of his paper. 

	ABSTRACT: We argue for the usefulness of an active chart as
	the basis of a system that searches for the globally most
	plausible explanation of failure to syntactically parse a
	given input.  We suggest semantics-free, grammar-independent
	techniques for parsing inputs displaying simple kinds of
	ill-formedness and discuss the search issues involved.
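
The following is not Mellish's algorithm, only a much simpler sketch
in the same spirit: a least-errors recognizer that charges one
"error" for each word that must be reinterpreted and returns the
globally cheapest explanation of the whole input.  (Real chart-based
repair also hypothesizes insertions and deletions; the toy grammar is
the same invented one as in the earlier sketch.)

import math

UNARY = {"Det": {"the", "a"}, "N": {"dog", "cat"}, "V": {"sees", "chases"}}
BINARY = {"S": [("NP", "VP")], "NP": [("Det", "N")], "VP": [("V", "NP")]}

def min_errors(words, start="S"):
    """Fewest word substitutions needed for `start` to derive `words`."""
    n = len(words)
    # cost[i][j][A] = fewest substitutions so that A derives words[i:j]
    cost = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat, terms in UNARY.items():
            cost[i][i + 1][cat] = 0 if w in terms else 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, rules in BINARY.items():
                    for left, right in rules:
                        c = (cost[i][k].get(left, math.inf)
                             + cost[k][j].get(right, math.inf))
                        if c < cost[i][j].get(parent, math.inf):
                            cost[i][j][parent] = c
    return cost[0][n].get(start, math.inf)

print(min_errors("the dog sees a cat".split()))  # 0: parses as-is
print(min_errors("the dog seen a cat".split()))  # 1: cheapest explanation
                                                 # reads 'seen' as a verb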

You can also obtain related articles by writing to:

	Chris S. Mellish
	Department of Artificial Intelligence
	University of Edinburgh
	80 South Bridge
	Edinburgh EH1 1HN
	Scotland

--------------------------------------------------------------------------

houpt@svax.cs.cornell.edu (Charles E. Houpt) writes:

One book you might be interested in is:

"The Computational Analysis of English: a corpus-based approach"
Edited by Roger Garside, Geoffrey Leech, and Geoffrey Sampson
1987 Longman, London, ISBN 0-582-29149-6

It has a chapter on dealing with ill-formed text.

Rather than taking the traditional Chomskyan/theoretical approach to
NLP, this book takes an empirical approach: large corpora of text are
used to build up statistical models of grammar, and a probabilistic
grammar is used instead of formal grammars (context-free, ATN, etc.).
Probabilistic grammars are robust and can therefore handle ill-formed
text.

This book won't help you directly with error correction in formal grammars,
but it provides a fascinating alternative.
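
A toy in the spirit of the book's corpus-based approach (among other
things it describes the CLAWS word-tagging work, which disambiguates
using tag-transition statistics gathered from corpora).  All the
probabilities below are invented; a real system would estimate them
from a tagged corpus:

TAGS = ["DET", "N", "V"]
TRANS = {                       # P(tag | previous tag), invented numbers
    ("<s>", "DET"): 0.6, ("<s>", "N"): 0.3, ("<s>", "V"): 0.1,
    ("DET", "N"): 0.9,   ("DET", "V"): 0.1,
    ("N", "V"): 0.7,     ("N", "N"): 0.3,
    ("V", "DET"): 0.8,   ("V", "N"): 0.2,
}
EMIT = {                        # P(word | tag), invented numbers
    ("DET", "the"): 0.7, ("DET", "a"): 0.3,
    ("N", "dog"): 0.4,   ("N", "man"): 0.5,
    ("V", "sees"): 0.4,  ("V", "bites"): 0.6,
}
UNSEEN = 0.001                  # crude smoothing for unknown events

def viterbi(words):
    """Most probable tag sequence under the bigram model."""
    paths = {"<s>": (1.0, [])}  # tag -> (probability, best sequence so far)
    for w in words:
        new = {}
        for tag in TAGS:
            emit = EMIT.get((tag, w), UNSEEN)
            new[tag] = max(
                ((p * TRANS.get((prev, tag), UNSEEN) * emit, seq + [tag])
                 for prev, (p, seq) in paths.items()),
                key=lambda t: t[0])
        paths = new
    return max(paths.values(), key=lambda t: t[0])[1]

print(viterbi("the dog sees the man".split()))
# -> ['DET', 'N', 'V', 'DET', 'N']
print(viterbi("the dgo sees the man".split()))
# the misspelt 'dgo' is still tagged N from context rather than rejected

Because every analysis merely becomes more or less probable, nothing
is ever flatly ungrammatical, which is where the robustness comes from.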

--------------------------------------------------------------------------

Eric Bilange <bilange@capsogeti.fr> writes:

I read your question in the news about error handling. I assume your
question focused on error handling in parsing, and that the type of
error is: what to do when the sentence you are parsing is not covered
by your grammar definition (which is what makes the word "error"
sensible here). In such a case, I think you should look at the state
of the art in speech understanding technology. I don't know whether
you are familiar with that work. From the acoustic signal you build a
lattice of objects (syllables, phonemes or words), where one axis is
time and the other the vocabulary, and where the best candidates have
the highest scores. This lattice is the input to the parser. Various
techniques are available (charts, UCG, ...), but the most up-to-date
ones are based on island parsing: you look for islands (syntactically
and semantically grounded) and try to find links between the islands.
The translation from spoken to written language is simple: in written
language you have the links, but your grammar rejects them, so the
first principle is to conjecture grounded links. You can find
references to such work in the ICASSP, EUROSPEECH, COLING and ACL
proceedings (even though the last two are not speech-oriented).
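
To make the lattice half of this concrete, here is a minimal sketch
of a scored word lattice and a best-path search through it (the edges
and scores are invented; genuine island parsing would instead grow
syntactically and semantically grounded islands and conjecture links
between them):

# Edges: (start_node, end_node, word, score); node numbers increase
# along the time axis, so sorting the edges gives a topological order.
EDGES = [
    (0, 1, "the", 0.9), (0, 1, "a", 0.4),
    (1, 2, "dog", 0.7), (1, 2, "fog", 0.5),
    (2, 3, "barks", 0.8), (2, 3, "parks", 0.6),
]

def best_path(edges, last):
    """Highest-scoring word sequence from node 0 to `last`
    (a path's score is the product of its edge scores)."""
    best = {0: (1.0, [])}            # node -> (score, words so far)
    for start, end, word, score in sorted(edges):
        if start in best:
            s, words = best[start]
            cand = (s * score, words + [word])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    return best[last]

print(best_path(EDGES, 3))
# best hypothesis: ['the', 'dog', 'barks'] with score ~0.504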

Another entry point could be the techniques used in spelling
correctors (but this is not my cup of tea).
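
The classic machinery behind such correctors is minimum edit
(Levenshtein) distance: the fewest single-character insertions,
deletions and substitutions turning one word into another.  A
textbook dynamic-programming sketch:

def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))       # row for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,              # delete ca
                curr[j - 1] + 1,          # insert cb
                prev[j - 1] + (ca != cb)  # substitute (free if equal)
            ))
        prev = curr
    return prev[len(b)]

print(edit_distance("grammer", "grammar"))   # 1: one substitution

A corrector then proposes the lexicon entries within a small distance
(typically 1 or 2) of the unknown word.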

Finally, I suggest you read a paper by Jacques Vergnes, to appear in
COLING this year (August). You will be surprised to see that he has
no grammar, yet his parser is efficient!

***************************************************************************
Condensed address information about received articles:

HARPER@ccvax.ucd.ie
Jerry Harper:
Computer Science Department,University College Dublin,Dublin 4,IRELAND
harper@ccvax.ucd.ie
--------------------------------------------------------------------------
Dan Fass <fass@cs.sfu.ca>
--------------------------------------------------------------------------
lavelli@irst.it (Alberto Lavelli)
IRST, Istituto per la Ricerca Scientifica e Tecnologica
I-38050 Povo TN, ITALY
from ARPA: lavelli%irst@uunet.uu.net
phone:  +39-461-810105
--------------------------------------------------------------------------
Ei-ichi Osawa <osawa%csl.sony.jp@RELAY.CS.NET>
Sony Computer Science Laboratory Inc., Tokyo, Japan
--------------------------------------------------------------------------
houpt@svax.cs.cornell.edu (Charles E. Houpt)
-Chuck Houpt
 houpt@cs.cornell.edu
--------------------------------------------------------------------------
Eric Bilange <bilange@capsogeti.fr>
		Research Engineer
CAP GEMINI INNOVATION
Paris Research Centre                    Tel: +33 (1) 40 54 66 25
					      +33 (1) 40 54 66 66 ext. 66 25
118, rue de Tocqueville                  Fax: +33 (1) 42 67 41 39
75017 Paris France                       bilange@csinn.UUCP
					 bilange@crp.capsogeti.fr
****************************************************************************

+++	Alexander Vonk - Technical Univ. Delft, Netherlands	+++
+++Phone:	(NL) 015 - 78 64 12	(world) 31 15 78 64 12	+++
+++Mail:	alex@dutirt2.tudelft.nl	or alex@dutirt2.UUCP	+++