[net.ai] Hearsay II question in AIList Digest V2 #110

sidney%MIT-OZ@MIT-MC.ARPA (08/25/84)

From:  Sidney Markowitz <sidney%MIT-OZ@MIT-MC.ARPA>


    Date: 22 Aug 1984 22:05:18-PDT
    From: doshi%umn-cs.csnet@csnet-relay.arpa
    Subject: Question about HEARSAY-II.

    I have a question about the HEARSAY-II system [Erman et al. 1980].

    What exactly is the HEARSAY system required/supposed to do?
    i.e., what is the meaning of the phrase:
          "Speech Understanding system"

I am not familiar with the HEARSAY-II system; however, I am answering
your question based on the following lines from the quotes you
provided, and on some comments of yours that suggest you are not
familiar with certain points of view common among natural language
researchers. The quotes:

(1) page 213: "The HEARSAY-II reconstructs an intention ...."
(2) on the strong syntactic/semantic/task constraints
(3) - with a slightly artificial syntax and highly constrained task
(4) - tolerating less than 10% semantic error

  Researchers pretty much agree that in order to understand natural
language, we need an understanding of the meaning and context of the
communication. It is not enough to simply look up words in a
dictionary, and/or apply rules of grammar to sentences. A classic
example is the pair of sentences: "Time flies like an arrow." and
"Fruit flies like a banana." The problem with speech is even worse --
it turns out that even to separate the syllables in continuous speech
you need to have some understanding of what the speaker is talking
about! You can discover this for yourself by trying to hear the sounds
of the words when someone is speaking a foreign language. You can't
even repeat them correctly as nonsense syllables.
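  To make the "flies" example concrete, here is a toy sketch in Python
(purely illustrative; the LEXICON table and the readings() function are my
own inventions, not part of any real parser). Dictionary lookup plus a
grammar licenses two structural readings for BOTH sentences, and nothing in
the syntax says which one is sensible:

LEXICON = {
    "time":  {"NOUN", "VERB"},  "fruit":  {"NOUN"},
    "flies": {"NOUN", "VERB"},  "like":   {"PREP", "VERB"},
    "an":    {"DET"},           "a":      {"DET"},
    "arrow": {"NOUN"},          "banana": {"NOUN"},
}

def readings(sentence):
    # Enumerate the two readings that the grammar alone cannot choose between.
    w = sentence.lower().rstrip(".").split()
    out = []
    if "VERB" in LEXICON[w[1]] and "PREP" in LEXICON[w[2]]:
        out.append(f"[{w[0]}] [{w[1]}: verb] [{w[2]} {w[3]} {w[4]}]")  # "X flies, the way an arrow does"
    if "NOUN" in LEXICON[w[1]] and "VERB" in LEXICON[w[2]]:
        out.append(f"[{w[0]} {w[1]}] [{w[2]}: verb] [{w[3]} {w[4]}]")  # "X-flies are fond of a Y"
    return out

for s in ["Time flies like an arrow.", "Fruit flies like a banana."]:
    print(s, readings(s))        # both sentences come back with both readings

  A system that stops there has only matched word patterns; picking the
sensible reading takes knowledge about time, insects, arrows and bananas.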
  What this implies is an approach to speech recognition that goes
beyond pattern recognition to include understanding of utterances.
This in turn implies that the system has some understanding of the
"world view" of the speaker, i.e., common sense knowledge and the
probable intentions of the speaker. AI researchers have attempted to
make the problem tractable by restricting the "domain" of a system. A
famous example is the "blocks world" used by Terry Winograd in his
doctoral thesis on a natural language understanding system, SHRDLU.
All SHRDLU knew about was its little world of various shapes and
colors of blocks, its robot arm and the possible actions and
interactions of those elements. Given those limitations, and the
additional assumption that anything said to it was either a question
about the state of its world or else a command, Winograd was able to
devise a system in which syntax, semantics and task performance all
interacted. For example, an ambiguity in syntax could be resolved if
only one grammatical interpretation made semantic sense.
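  As a concrete (and much simplified) illustration of that last point --
this is NOT how SHRDLU was implemented, and the ON/TYPE tables and the
find() function below are just things I made up for the example -- a
candidate parse can be kept or thrown away according to whether its
referring expressions pick out objects that actually exist in the blocks
world:

ON   = {"pyramid1": "block1", "block1": "table", "box1": "table"}   # what sits on/in what
TYPE = {"pyramid1": "pyramid", "block1": "block", "box1": "box", "table": "table"}

def find(kind, on_kind=None):
    # Objects of a given type, optionally required to sit on/in a given type.
    return [o for o, t in TYPE.items()
            if t == kind and (on_kind is None or TYPE.get(ON.get(o)) == on_kind)]

# "Put the pyramid on the block in the box" has two grammatical readings
# (the toy world collapses "on" and "in" into the single ON relation):
reading1 = find("pyramid", on_kind="block") and find("box")   # move [pyramid on block] into the box
reading2 = find("pyramid") and find("block", on_kind="box")   # move pyramid onto [block in the box]

print(bool(reading1))   # True  - there is a pyramid sitting on a block
print(bool(reading2))   # False - no block is inside the box

  Only one of the two readings refers to things that exist in the scene, so
the ambiguity disappears.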
 You can see how this approach is implied by the four quotes above.
With this as background, let's proceed to your questions...


    Let me explain my confusion with examples. Does the system do one of the
    following:
          - 1) Accept speech as input; then try to output what(ever) was
              spoken or might have been spoken?
          - 2) Or, accept speech as input and UNDERSTAND it?
    Now, 1) above is, I think, speech RECOGNITION. DARPA did not want just
    that.

    Then, what is(are) the meaning(s) of UNDERSTAND?
          - if I say "Alligators can fly", should the system repeat this and
            also tell me that that is "not true"; is this called UNDERSTANDING?
          - if I say "I go house", should the system repeat this and also add
            that there is a "grammatical error"; is this called UNDERSTANDING?
          - Or, if HAYES-ROTH claims "I am ERMAN", the system should say
            "No, you are not ERMAN" - I don't think that HEARSAY was supposed
            to do this (it does not have vision, etc.). But you will agree that
            that is also UNDERSTANDING. Note that the above claim by
            HAYES-ROTH would be true if :
                  - he had changed his last name
                  - he was merely QUOTING what ERMAN might have said somewhere
                  - etc

            In light of the above examples, what is meant by saying
            that HEARSAY-II understands speech?


  The references to "tasks" in the quotes you provided are a clue that
the authors are thinking of "understanding" in terms of the ability to
perform a task that is requested by the speaker. The examples in your
questions are statements that would need to be reframed as tasks. It
is possible that the system could be set up so that a statement like
"Alligators can fly" is an implied command to add that fact to the
knowledge base, perhaps first checking for contradictions. But you
probably ought to think of an example of a restricted task domain
first, and then think about what "understanding" would mean in that
context. For example, given a blocks world domain the system might
respond to a statement such as "Place a blue cube on the red pyramid"
by saying "I can't place anything on top of a pyramid". There's much
that can be done by modelling the speaker's intentions and
assumptions, which would affect the sophistication of the resulting
system, but that's the general idea.
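  In code, the flavor of such a system might look like the following toy
(again my own invention for illustration, nothing to do with HEARSAY-II's
actual implementation; STACKABLE, head_noun() and place() are all made up):

STACKABLE = {"cube": True, "block": True, "box": True, "pyramid": False, "ball": False}
world = []                                  # (thing, support) facts the system believes

def head_noun(phrase):
    # The last word of a noun phrase, e.g. "red pyramid" -> "pyramid".
    return phrase.split()[-1]

def place(thing, support):
    # Carry out "place <thing> on <support>", or explain the constraint it violates.
    if not STACKABLE.get(head_noun(support), False):
        return f"I can't place anything on top of a {head_noun(support)}."
    world.append((thing, support))
    return f"OK, the {thing} is now on the {support}."

print(place("blue cube", "red pyramid"))    # -> I can't place anything on top of a pyramid.
print(place("blue cube", "green block"))    # -> OK, the blue cube is now on the green block.

  "Understanding" here just means mapping the utterance onto the right
action, or the right refusal, within the restricted domain.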

-- Sidney Markowitz <sidney%mit-oz@mit-mc.ARPA>

mmt@dciem.UUCP (Martin Taylor) (08/31/84)

================
It turns out that even to separate the syllables in continuous speech
you need to have some understanding of what the speaker is talking
about! You can discover this for yourself by trying to hear the sounds
of the words when someone is speaking a foreign language. You can't
even repeat them correctly as nonsense syllables.
================
I used to believe this myth myself, but my various short visits to Europe
(mostly 1-3 week trips) have convinced me otherwise. There is no point in
trying to repeat syllables as nonsense, partly because the
sounds are not in your phonetic vocabulary.  More to the point, syllable
separation definitely preceded understanding.  I HAD to learn to separate
syllables of German long before I could understand anything (I still
understand only a tiny fraction, but now I can parse most sentences
into kernel and bound morphemes because I now know most of the common
bound ones).  My understanding of written German is a little better,
and when I do understand a German sentence, it is because I can transcribe
it into a visual representation with some blanks.

(Incidentally, I also do some research in speech recognition, so I am
well aware of the syllable segmentation problem.  There do exist
segmentation algorithms that correctly segment over 95% of the syllables
in connected speech without any attempt to identify phonemes, let
alone words or the "meaning" of speech.  Mermelstein, now in Montreal,
and Mangold in Ulm, Germany, are names that come to mind.)
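To give the flavour of what such a segmenter does -- this is NOT the
Mermelstein or Mangold algorithm, just a crude sketch of my own, with the
frame length, the 3 dB dip threshold and the syllable_boundaries() function
all invented for illustration -- it looks for sufficiently deep dips in a
smoothed energy contour, never asking which phoneme or word is being
spoken.  Here "samples" is an array of speech samples and "rate" is the
sampling rate:

import numpy as np

def syllable_boundaries(samples, rate, frame_ms=10, min_dip_db=3.0):
    # Rough boundary times (seconds) at sufficiently deep dips of the frame-energy contour.
    samples = np.asarray(samples, dtype=float)
    frame = int(rate * frame_ms / 1000)
    n = len(samples) // frame
    energy = np.array([np.sum(samples[i*frame:(i+1)*frame] ** 2) for i in range(n)])
    db = 10.0 * np.log10(energy + 1e-12)
    db = np.convolve(db, np.ones(5) / 5.0, mode="same")        # light smoothing
    bounds = []
    for i in range(1, n - 1):
        if db[i] <= db[i-1] and db[i] <= db[i+1]:               # local energy minimum
            left  = db[max(0, i - 10):i].max()                  # nearest high points either side
            right = db[i + 1:min(n, i + 11)].max()
            if min(left, right) - db[i] >= min_dip_db:          # dip deep enough to call a boundary
                bounds.append(i * frame / rate)
    return bounds

The published algorithms work on a proper loudness contour and are far more
careful about spurious dips than this, but the point stands: nothing in the
procedure needs to know what is being said.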
-- 

Martin Taylor
{allegra,linus,ihnp4,floyd,ubc-vision}!utzoo!dciem!mmt
{uw-beaver,qucis,watmath}!utcsrgv!dciem!mmt