[net.ai] Natural Language Understanding

shebs@utah-cs.UUCP (Stanley Shebs) (09/20/83)

Lest usenet readers think things have gone silent all at once,
here's an article by Fernando Pereira that (apparently and
inexplicably) was *not* sent to usenet, along with my reply.
(Fortunately, I now have read-only access to Arpanet, so I was
able to find out about it.)
_____________________

Date: Wed 31 Aug 83 18:42:08-PDT
From: PEREIRA@SRI-AI.ARPA
Subject: Solutions of the natural language analysis problem

Given the downhill trend of some contributions on natural language
analysis in this group, this is my last comment on the topic, and is
essentially an answer to Stan the leprechaun hacker (STLH for short).

I didn't "admit" that grammars only reflect some aspects of language.
(Using loaded verbs such as "admit" is not conducive to the best
quality of discussion.)  I just STATED THE OBVIOUS. The equations of
motion only reflect SOME aspects of the material world, and yet no
engineer goes without them. I presented this point at greater length
in my earlier note, but the substantive presentation of method seems
to have gone unanswered. Incidentally, I worked for several years in a
civil engineering laboratory where ACTUAL dams and bridges were
designed, and I never saw there the preference for alchemy over
chemistry that STLH suggests is the necessary result of practical
concerns. Elegance and reproducibility do not seem to be enemies of
generality in other scientific or engineering disciplines.  Claiming
for AI an immunity from normal scientific standards (however flawed
those standards may be) is a dangerous move: AI's critics are now
on the defensive because of media hype, but will surely come back to
the fray, with that weapon plus a long list of unfulfilled promises
and irreproducible "results."

Lack of rigor follows from lack of method. STLH tries to bludgeon us
with "generating *all* the possible meanings" of a sentence.  Does he
mean ALL of the INFINITY of meanings a sentence has in general? Even
leaving aside model-theoretic considerations, we are all familiar with

        he wanted me to believe P so he said P
        he wanted me to believe not P so he said P because he thought
           that I would think that he said P just for me to believe P
           and not believe it
        and so on ...

in spy stories.

The observation that "we need something that models human cognition
closely enough..." begs the question of what human cognition looks
like. (Silly me, it looks like STLH's program, of course.)  STLH also
forgets that it is often better for a conversation partner (whether man
or machine) to say "I don't understand" than to go on saying "yes,
yes, yes ..." and get it all wrong, as people (and machines) that are
trying to disguise their ignorance do.

It is indeed not surprising that "[his] problems are really concerned
with the acquisition of linguistic knowledge." Once every grammatical
framework is thrown out, it is extremely difficult to see how new
linguistic knowledge can be assimilated, whether automatically or even
by programming it in. As to the notion that "everyone is an expert on
the native language", it is similar to the claim that everyone with
working ears is an expert in acoustics.

As to "pernicious behavior", it would be better if STLH would first
put his own house in order: he seems to believe that to work at SRI
one needs to swear eternal hatred of the "Schank camp" (whatever that
is); and useful criticism of other people's papers requires at least a
mention of the title and of the objections. A bit of that old battered
scientific protocol would help...

Fernando Pereira
___________________

The level of discussion *has* degenerated somewhat, so let me try
to bring it back up again.  I was originally hoping to stimulate
some debate about certain assumptions involved in NLP, but
instead I seem to see a lot of dogma, which is *very* dismaying.
Young idealistic me thought that AI would be the field where the
most original thought was taking place, but instead everyone
seems to be divided into warring factions, each of which refuses
to accept the validity of anybody else's approach.  Hardly seems
scientific to me, and certainly other sciences don't exhibit
this problem.  (Perhaps there's some fundamental truth here - that
the nature of epistemology and other AI activities is such that
it's very difficult to prevent one's thought from being trapped
into certain patterns - I know I've been caught a couple of times,
and it was hard to break out of the habit - more on that later.)

As a colleague of mine put it, we seem to be suffering from a
"difference in context".  So let me describe the assumptions
underpinning my theory (yes I do have one):

1. Language is a very fuzzy thing.  More precisely, the set of
sound strings meaningful to a human is almost (if not exactly)
the set of all possible sound strings.  Now, before you flame,
consider:  Humans can get at least *some* understanding out of a
nonsense sequence - though that understanding will likely be wrong -
especially if they have any expectations about what they're hearing
(this has been demonstrated experimentally).  Also, they can understand
mispronounced or misspelled words, sentences with missing words,
sentences with repeated words, sentences with scrambled word
order, sentences with mixed languages (I used to have fun by
speaking English using German syntax, and you can sometimes see
signs using English syntax with "German" words), and so forth.
Language is also used creatively (especially by netters!).  Words
are continually invented, metaphors are created and mixed in
novel ways. I claim that there is no rule of grammar that cannot
be violated.  Note that I have said *nothing* about changes of
meaning, nor have I claimed that one could get much of anything
out of a random sequence of words strung together.  I have only
claimed that the set of linguistically valid utterances is
actually a large fuzzy set (in the technical sense of "fuzzy").
If you accept this, the implications for grammar are far-reaching
- in fact, it may be that classical grammar is a curious but
basically irrelevant description of language (however, I'm not
completely convinced of that).
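
To make the "fuzzy set" claim a bit more concrete, here is a toy
sketch in Python.  The word list and the penalty numbers are invented
purely for illustration - this is not any actual system's scoring,
just the shape of the idea: membership in "meaningful utterances" is
a matter of degree, not a yes/no question.

KNOWN_WORDS = {"the", "dog", "dogs", "chased", "a", "cat", "cats"}

def membership(utterance: str) -> float:
    """Assign a crude degree of 'meaningfulness' to a word string."""
    words = utterance.lower().split()
    if not words:
        return 0.0
    score = 1.0
    # Unknown words hurt the score, but do not zero it out.
    unknown = sum(1 for w in words if w not in KNOWN_WORDS)
    score -= 0.15 * unknown
    # No verb-like word (crudely, any -ed form) hurts a little more.
    if not any(w.endswith("ed") for w in words):
        score -= 0.3
    return max(0.0, min(1.0, score))

if __name__ == "__main__":
    for s in ["the dog chased a cat",
              "the dogs chased a catz",   # misspelled word
              "dog the chased cat a",     # scrambled order (order is ignored here)
              "florp glorp snorp"]:       # mostly nonsense, yet still > 0
        print(f"{membership(s):.2f}  {s}")

The point is only that misspellings and even outright nonsense
degrade the score without driving it to zero.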

2. Meaning and interpretation are distinct.  Perhaps I should
follow convention and say "s-meaning" and "s-interpretation", to
avoid terminology trouble.  I think it's noncontroversial that
the "true meaning" of an utterance can be defined as the totality
of response to that utterance.  In that case, s-meaning is the
individual-independent portion of meaning (I know, that's pretty
vague.  But would saying that 51% of all humans must agree on a
meaning make it any more precise?  Or that there must be a
predicate to represent that meaning? Who decides which predicate
is appropriate?).  Then s-interpretation is the component that
depends primarily on the individual and his knowledge, etc.

Let's consider an example - "John kicked the bucket."  For most
people, this has two s-meanings - the usual one derived directly
from the words and an idiomatic way of saying "John died".  Of
course, someone may not know the idiom, so they can assign only
one s-meaning.  But as Mr. Pereira correctly points out, there
are an infinitude of s-interpretations, which vary completely
from individual to individual.  Most can be derived from the
s-meaning, for instance the convoluted inferences about belief
and intention that Mr. Pereira gave.  On the other hand, I don't
normally make those s-interpretations, and a "naive" person might
*never* do so.  Another part of the s-interpretation could be (if
the second s-meaning above was intended) that the speaker tends
to be rather blunt; that is certainly part of the response to the
utterance, but it is less clearly part of a "meaning".  Even s-
meanings are pretty volatile though - to use another spy story
example, the sentence might actually be a code phrase with a
completely arbitrary meaning!
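
Here is a tiny sketch of that split, again in Python and purely
illustrative - the string "meanings", the sample knowledge items, and
the inference rule are all made up for the example, not taken from
any real system:

# s-meanings: the (roughly) individual-independent readings.
S_MEANINGS = {
    "John kicked the bucket": [
        "John struck a bucket with his foot",   # literal reading
        "John died",                            # idiomatic reading
    ],
}

def s_interpretations(utterance, knowledge):
    """Derive individual-dependent responses from the shared s-meanings."""
    meanings = list(S_MEANINGS.get(utterance, []))
    if "knows the idiom" not in knowledge:
        # Someone who lacks the idiom can assign only the literal s-meaning.
        meanings = [m for m in meanings if m != "John died"]
    interpretations = list(meanings)
    if "John died" in meanings and "judges the speaker" in knowledge:
        # An individual-specific inference layered on top of the s-meaning.
        interpretations.append("the speaker is being rather blunt about it")
    return interpretations

if __name__ == "__main__":
    print(s_interpretations("John kicked the bucket", {"knows the idiom"}))
    print(s_interpretations("John kicked the bucket", set()))
    print(s_interpretations("John kicked the bucket",
                            {"knows the idiom", "judges the speaker"}))

The same utterance yields the same s-meanings for everyone who knows
the idiom; the extra line about bluntness appears only for the
individual whose knowledge licenses it.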

3. Cognitive science is relevant to NLP.  Let me be the first to
say that all of its results are at best suspect.  However,  the
apparent inclination of many AI people to regard the study of
human cognition as "unscientific" is inexplicable.  I won't claim
that my program defines human cognition, since that degree of
hubris requires at least a PhD  :-) .  But cognitive science does
have useful results, like the aforementioned result about making
sense out of nonsense.  Also, a lot of common-sense results can be
made more precise by doing experiments.  "Don't think of
a zebra for the next ten minutes" - my informal experimentation
indicates that *nobody* is capable of it - which seems to say a lot
about how humans operate.  Perhaps cognitive science gets a bad
review because much of it is Gedanken experiments;  I don't need
tests on a thousand subjects to know that most kinds of
ungrammaticality (such as number agreement) are noticeable, but
rarely affect my understanding of a sentence.  That's why I say
that humans are experts at their own languages - we all (at least
intuitively) understand the different parts of speech and how
sentences are put together, even though we have difficulty
expressing that knowledge (sounds like the knowledge engineer's
problems in dealing with experts!).  BTW, we *have* had a non-
expert (a CS undergrad) add knowledge to our NLP system, and the
folks at Berkeley have reported similar results [Wilensky81].

4.  Theories should reflect reality.  This is especially
important because the reverse is quite pernicious - one ignores
or discounts information not conforming to one's theories.  The
equations of motion are fine for slow-speed behavior, but fail as
one approaches c (the language or the velocity? :-) ).  Does this
mean that Lorentz contractions are experimental anomalies?  The
grammar theory of language is fine for very restricted subsets of
language, but it is less satisfactory for explaining the phenomena
mentioned in 1., and it does not suggest how organisms *learn*
language.  Mr. Pereira's suggestion that I do not have any kind
of theoretical basis makes me wonder if he knows what Phrase
Analysis *is*, let alone its justification.  Wilensky and Arens
of UCB have IJCAI-81 papers (and tech reports) that justify the
method much better than I possibly could.  My own improvement was
to make it follow multiple lines of parsing (have to be contrite
on this; I read Winograd's new book recently and what I have is
really a sort of active chart parser; I also noticed that he gives
nary a mention to Phrase Analysis, which is inexcusable - that's
the sort of thing I mean by "warring factions").
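
For those who haven't seen it, here is a rough sketch of the
pattern-to-meaning flavor of Phrase Analysis (after the PHRAN work in
[Wilensky80]), with toy patterns I made up for the occasion - it is
not the Berkeley code, and not mine either:

# Each phrasal pattern pairs a sequence of word/category tests with a
# meaning template (variable bindings are omitted to keep this short).
PATTERNS = [
    (["<person>", "kicked", "the", "bucket"], "DIED(<person>)"),
    (["<person>", "kicked", "the", "<object>"], "PROPEL-FOOT(<person>, <object>)"),
]

CATEGORIES = {
    "<person>": {"john", "mary"},
    "<object>": {"bucket", "ball", "door"},
}

def matches(pattern_item, word):
    if pattern_item.startswith("<"):
        return word in CATEGORIES[pattern_item]
    return word == pattern_item

def analyze(sentence):
    """Return every pattern-derived reading, keeping all hypotheses alive."""
    words = sentence.lower().rstrip(".").split()
    readings = []
    for pattern, meaning in PATTERNS:
        if len(pattern) == len(words) and all(
                matches(p, w) for p, w in zip(pattern, words)):
            readings.append(meaning)
    return readings

if __name__ == "__main__":
    print(analyze("John kicked the bucket."))   # both readings survive

The "multiple lines of parsing" I added amounts to not throwing any
of those readings away until later context does it for you.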

4a.  Reflecting reality means "all of it" or (less preferable)
"as much as possible".  Most of the "soft sciences" get their bad
reputation by disregarding this principle, and AI seems to have a
problem with that also.  What good is a language theory that
cannot account for language learning, creative use of language,
and the incredible robustness of language understanding?  The
definition of language by grammar cannot properly explain these -
the first because of results (again mentioned by Winograd) that
children receive almost no negative examples, and that a grammar
cannot be learned from positive examples alone; the third because
the grammar must be extended and extended until it recognizes all
strings as valid.  So perhaps the classical notion of grammar is
like classical mechanics - useful for simple things, but not so
good for photon drives or complete NLP systems.  The basic
notions in NLP have been thoroughly investigated;

IT'S TIME TO DEVELOP THEORIES THAT CAN EXPLAIN *ALL* ASPECTS OF
LANGUAGE BEHAVIOR!


5. The existence of "infinite garden-pathing".  To steal an
example from [Wilensky80],

        John gave Mary a piece of his.........................mind.

Only the last word disambiguates the sentence.  So now, what did
*you* fill in, before you read that last word?  There are even more
interesting situations.  Part of my secret research agenda (don't
tell Boeing!) has been the understanding of jokes, particularly
word plays.  Many jokes are multi-sentence versions of garden-
pathing, where only the punch line disambiguates.  A surprising
number of crummy sitcoms can fill a whole half-hour because an
ambiguous sentence is interpreted differently by two people (a
random thought - where *did* this notion of sentence as
fundamental structure come from?  Why don't speeches and
discourses have a "grammar" precisely defining *their*
structure?).  In general, language is LR(lazy eight).
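
Here is a word-by-word sketch of the garden-path point; the three
candidate completions are just ones I picked for the example, nothing
more:

HYPOTHESES = {
    "mind": "idiomatic: John rebuked Mary",
    "cake": "literal: John gave Mary some of his cake",
    "pie":  "literal: John gave Mary some of his pie",
}

def incremental_read(words):
    """Keep every reading live until a word actually disambiguates."""
    live = dict(HYPOTHESES)
    for w in words:
        if w in live:
            live = {w: live[w]}          # the punch-line word prunes the rest
        print(f"after {w!r}: {len(live)} reading(s) live")
    for reading in live.values():
        print("final reading:", reading)

if __name__ == "__main__":
    incremental_read("John gave Mary a piece of his mind".lower().split())

Nothing forces a commitment before the last word; a parser that
committed early would have to back up, which is the whole problem.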

Miscellaneous comments:

This has gotten pretty long (a lot of accusations to respond
to!), so I'll save the discussion of AI dogma, fads, etc for
another article.

When I said that "problems are really concerned with the
acquisition of linguistic knowledge", that was actually an
awkward way to say that, having solved the parsing problem,  my
research interests switched to the implementation of full-scale
error correction and language learning (notice that Mr. Pereira
did not say "this is ambiguous - what did you mean?", he just
assumed one of the meanings and went on from there.  Typical
human language behavior, and inadequately explained by most
existing theories...).  In fact, I have a detailed plan for
implementation, but grad school has interrupted that and it may
be a while before it gets done.  So far as I can tell, the
implementation of learning will not be unusually difficult.  It
will involve inductive learning, manipulation of analogical
representations to acquire meanings ("an mtrans is like a ptrans,
but with abstract objects", and so on), and other good things.  The
nonrestrictive nature of Phrase Analysis seems to be particularly
well-suited to language knowledge acquisition.
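
As a hint of what I mean by manipulating analogical representations,
here is a toy sketch: define a new primitive by copying a known one
and overriding the slots that differ.  The frame format and slot
names are invented for the example; this is a plan, not the finished
learner:

PTRANS = {
    "name": "ptrans",
    "roles": {"actor": "animate", "object": "physical-object",
              "from": "location", "to": "location"},
    "effect": "object changes location",
}

def define_by_analogy(base, name, overrides):
    """Copy a known concept and patch only the slots that differ."""
    return {
        "name": name,
        "roles": {**base["roles"], **overrides.get("roles", {})},
        "effect": overrides.get("effect", base["effect"]),
    }

# "an mtrans is like a ptrans, but with abstract objects"
MTRANS = define_by_analogy(
    PTRANS, "mtrans",
    {"roles": {"object": "abstract-object (an idea)",
               "from": "a mind", "to": "a mind"},
     "effect": "information changes possessor"})

if __name__ == "__main__":
    print(MTRANS)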

Thanks to Winograd's book [Winograd83] (really quite a good book, but
biased) I now know what DCG's are (the paper I referred to before was
[Pereira80]).  One of the first paragraphs in that paper was
revealing.  It said that language was *defined* by a grammar,
then proceeded from there.  (Different assumptions....) Since
DCG's were compared only to ATN's, it was of course easy to show
that they were better (almost any formalism is better than one
from ten years before, so that wasn't quite fair).  However, I
fail to see any important distinction between a DCG and a
production rule system with backtracking.  In that case, a DCG is
really a special case of a Phrase Analysis parser  (I did at one
time tinker with the notion of compiling phrase rules into OPS5
rules, but OPS5 couldn't manage it very well - no capacity for
the parallelism that my parser needed).  I am of course
interested in being contradicted on any of this.
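
To show what I mean by the correspondence, here is a toy backtracking
recognizer over DCG-shaped rules (s --> np, vp and so on).  The
grammar and lexicon are made up, and this is emphatically not the
formalism of [Pereira80] - just the production-rules-with-backtracking
reading of it:

RULES = {
    "s":  [["np", "vp"]],
    "np": [["det", "n"], ["n"]],
    "vp": [["v", "np"], ["v"]],
}
LEXICON = {"det": {"the", "a"},
           "n":   {"john", "bucket", "dog"},
           "v":   {"kicked", "barked"}}

def parse(cat, words, pos):
    """Yield every position reachable after recognizing `cat` from `pos`."""
    if cat in LEXICON:
        if pos < len(words) and words[pos] in LEXICON[cat]:
            yield pos + 1
        return
    for body in RULES[cat]:               # try each production in turn...
        positions = [pos]
        for sub in body:
            positions = [q for p in positions for q in parse(sub, words, p)]
        yield from positions              # ...failed bodies simply yield nothing

def recognize(sentence):
    words = sentence.lower().split()
    return any(end == len(words) for end in parse("s", words, 0))

if __name__ == "__main__":
    print(recognize("john kicked the bucket"))   # True
    print(recognize("the dog barked"))           # True
    print(recognize("kicked the john"))          # False

Each rule body is a production, and trying the alternatives in turn
is the backtracking; whether the real differences between this and a
DCG matter in practice is exactly what I'd like to hear about.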

Mr. Pereira says he doesn't know what the "Schank camp" is.  If
that's so then he's the only one in NLP who doesn't.  I have
heard some highly uncomplimentary comments about Schank and his
students.  But then that's the price for going against
conventional wisdom...

Sorry for the length, but it *was* time for some light rather
than heat!  I have refrained from saying much of anything about
my theories of language understanding, but will post details if
accusations warrant :-)

                                Theoretically yours*,
                                Stan (the leprechaun hacker) Shebs
                                utah-cs!shebs

* love those double meanings!

[Pereira80]  Pereira, F.C.N., and Warren, D.H.D. "Definite Clause
    Grammars for Language Analysis - A Survey of the Formalism and
    a Comparison with Augmented Transition Networks", Artificial
    Intelligence 13 (1980), pp 231-278.

[Wilensky80] Wilensky, R. and Arens, Y.  PHRAN: A Knowledge-based
    Approach to Natural Language Analysis (Memorandum No.
    UCB/ERL M80/34).  University of California, Berkeley, 1980.

[Wilensky81] Wilensky, R. and Morgan, M.  One Analyzer for Three
    Languages (Memorandum No. UCB/ERL M81/67). University of California,
    Berkeley, 1981.

[Winograd83] Winograd, T.  Language as a Cognitive Process, vol. 1: Syntax.
    Addison-Wesley, 1983.