AFB%MIT-OZ@MIT-MC.ARPA (08/29/84)
From: Aaron F. Bobick <AFB%MIT-OZ@MIT-MC.ARPA>

About speech recognition:

    From: Sidney Markowitz <sidney%MIT-OZ@MIT-MC.ARPA>
    It turns out that even to separate the syllables in continuous speech
    you need to have some understanding of what the speaker is talking
    about!  You can discover this for yourself by trying to hear the
    sounds of the words when someone is speaking a foreign language.  You
    can't even repeat them correctly as nonsense syllables.  What this
    implies is an approach to speech recognition that goes beyond pattern
    recognition to include understanding of utterances.  This in turn
    implies that the system has some understanding of the "world view" of
    the speaker, i.e., common sense knowledge and the probable intentions
    of the speaker.....

Many psycho-linguists would dispute this.  The problem with the foreign
language example is that you don't recognize WORDS, not that you don't
understand the utterance (for now let us define understanding as building
some sort of SEMANTIC model; the details don't matter).  Consider the
classic "Colorless green ideas sleep furiously."  I doubt one can
"understand" this in any plausible way, yet its encoding is easy.  Even if
one removes grammar and is listening to a randomized reading of Webster's
dictionary, one can easily parse the string into syllables and words.

In fact, *except under noise conditions much worse than normal
conversation*, there is psycho-linguistic evidence that context does not
greatly affect word recognition by humans in terms of the parsing of the
input signal.  (I am oversimplifying a little; there is also evidence that
context can help you make judgements about incoming words and syllables.
However, this may be a post-access phenomenon, a sort of surprise effect
when an anomalous word or syllable is encountered; the jury is still out.
Regardless, it is certainly reasonable to consider a context-independent
word recognition system.)

Therefore, it is clearly possible to consider speech *recognition* as
separate from understanding.  Hearsay (I or II) does not; some
psychologists (and, by the way, many AI speech hackers) do.

Stuck in the middle again ......

aaron bobick  (afb%mit-oz@mit-mc)
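As an illustration of the point about context-independent segmentation, here is a
toy sketch.  It is not from the post above: the lexicon, function name, and input
string are invented for the example, and a real recognizer works on acoustic
features rather than text.  It only illustrates the architectural claim that
carving a stream into known words can be posed without any semantic model.

# Toy sketch: segment a symbol stream into lexicon words by dynamic
# programming, with no model of meaning or context.  Lexicon and input
# are invented stand-ins for illustration only.

LEXICON = {"colorless", "green", "ideas", "sleep", "furiously"}

def segment(stream, lexicon=LEXICON):
    """Return one segmentation of `stream` into lexicon words, or None."""
    n = len(stream)
    # best[i] holds a word list covering stream[:i], or None if unreachable
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and stream[j:i] in lexicon:
                best[i] = best[j] + [stream[j:i]]
                break
    return best[n]

if __name__ == "__main__":
    # Semantically anomalous, yet it segments cleanly with no semantics:
    print(segment("colorlessgreenideassleepfuriously"))
    # -> ['colorless', 'green', 'ideas', 'sleep', 'furiously']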
polard@fortune.UUCP (Henry Polard) (09/05/84)
<fowniymz for dh6 layn iyt6r>

    Which hip was burned?
    Which ship was burned?
    Which chip was burned?
and
    Which Chip was spurned?

all sound the same when spoken at the speed of conversational speech.
This is evidence that in order to recognize words in continuous speech you
(and presumably a speech-recognition apparatus) need to understand what
the speaker is talking about.

There seem to be two reasons why understanding is necessary for word
recognition in continuous speech:

1. The existence of homonyms.  This is why "It's a good read." sounds the
   same as "It's a good reed," and why the two sentences could not be
   distinguished without knowledge of the context.

2. Sandhi, or sound changes at word boundaries.  In conversation the
   sounds at the end of a word tend to blend into the sounds at the
   beginning of the next, so that words run together and sound different
   from the way they would when said in isolation.  The resulting
   ambiguities are usually resolved by context.

Speech rarely occurs without some sort of context, and even when it does,
the first thing that usually happens is that a context is established for
what is to follow.

To paraphrase Edsger Dijkstra: "Asking whether computers will understand
speech is like asking whether submarines swim."
-- 
Henry Polard (You bring the flames - I'll bring the marshmallows.)
{ihnp4,cbosgd,amd}!fortune!polard
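To make the homonym point concrete, here is a toy sketch of picking between
homophones such as "read"/"reed" from surrounding words.  It is not from the
post above: the association table, function name, and scores are invented, and
a real system would use a statistical language model; the point it illustrates
is only that the acoustics cannot decide and the context must.

# Toy sketch: resolve a homophone by scoring each candidate against the
# words around it.  Association counts are invented stand-ins.

ASSOCIATIONS = {
    "read": {"book": 5, "novel": 4, "library": 3},
    "reed": {"marsh": 5, "clarinet": 4, "river": 3},
}

def disambiguate(candidates, context_words):
    """Pick the candidate whose associations best match the context."""
    def score(word):
        table = ASSOCIATIONS.get(word, {})
        return sum(table.get(c, 0) for c in context_words)
    return max(candidates, key=score)

if __name__ == "__main__":
    # "It's a good ___" after talk of a novel vs. after talk of a marsh:
    print(disambiguate(["read", "reed"], ["novel", "book"]))   # -> read
    print(disambiguate(["read", "reed"], ["marsh", "river"]))  # -> reed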