[comp.ai] Practical effects of AI

rolandi@gollum.Columbia.NCR.COM (rolandi) (10/31/87)

In article <6667@ut-ngp.UUCP> you write:
>I have a question for people:
>   What practical effects do you think AI will have in the next ten
>years?
>........[etc...]

I'd say that AI will have at least two real and immediate effects.

	1) Given AI programming tools and techniques, many processes
	   previously assumed to be too complicated for automation
	   will be automated.  The automation of these tasks will
	   take less time, given the productivity gains that AI tools
	   can provide.  Expert systems will be commonplace within
	   the DP/MIS world.

	2) AI will make computers easier to use and therefore extend
	   their usefulness to non-computer people.

Regarding #2 above...

It would seem to me that the single greatest practical advancement for
AI will be in speaker-independent, continuous speech recognition.  This
is NOT to imply total computer "comprehension" in the sense of being
able to carry on an unrestricted conversation.  I am NOT referring to
the ability to process natural language.  That is a long way off, and
will most likely come about through a redefinition of the NLP problem
as a machine learning problem.  What "simple" speaker-independent,
continuous speech recognition will provide is the ultimate alternative
to keyboard entry.  It would thereby offer all of the functionality of
current technology to anyone who could pronounce the commands.  This
capability will have a major impact on the industry and on society.  By
making "everybody" a user, more machines will be sold, and because
"everybody" will have different needs, the range of automation will be
widely extended.


-w.rolandi
ncrcae!gollum!rolandi

disclaimer: i speak for no one but myself and usually no one else is
	    listening.

kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) (11/01/87)

In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
> 
> In article <6667@ut-ngp.UUCP> you write:
> >I have a question for people:
> >   What practical effects do you think AI will have in the next ten
> >years?
> >........[etc...]

> It would seem to me that the single greatest practical advancement for
> AI will be in speaker-independent, continuous speech recognition.  This
> is NOT to imply total computer "comprehension" in the sense of being
> able to carry on an unrestricted conversation.  I am NOT referring to
> the ability to process natural language.  That is a long way off, and
> will most likely come about through a redefinition of the NLP problem
> as a machine learning problem.  What "simple" speaker-independent,
> continuous speech recognition will provide is the ultimate alternative
> to keyboard entry.  It would thereby offer all of the functionality of
> current technology to anyone who could pronounce the commands.  This
> capability will have a major impact on the industry and on society.  By
> making "everybody" a user, more machines will be sold, and because
> "everybody" will have different needs, the range of automation will be
> widely extended.
> 

Those of us who work on speech will be very encouraged by this enthusiasm.
However,

(1) Speaker-independent continuous speech is much farther from reality
    than some companies would have you think.  Currently, the best
    speech recognizer is IBM's Tangora, which makes about 6% errors
    on a 20,000-word vocabulary.  But the Tangora does speaker-
    dependent, isolated-word, grammar-guided recognition in a benign
    environment.  Each of these four constraints cuts the error rate
    by a factor of 3 or more when applied independently.  I don't know
    how well it would do if you removed all four constraints, but I
    would guess about a 70% error rate (see the back-of-the-envelope
    sketch after this list).  So while speech recognition has made a
    lot of progress, it is still far from usable in the application
    you mentioned.
(2) Spoken English is a harder problem than NLP of written English.
    If you make the recognizer too constrained (small vocabulary, fixed
    syntax, etc.), it will be harder to use than a keyboard.  If you
    don't, you have to understand spoken English, which is really hard.
(3) If this product were to materialize, it is far from clear that it
    would be an advancement for AI.  At present, the most promising
    techniques are based on stochastic modeling, pattern recognition,
    information theory, signal processing, auditory modeling, etc.
    So far, very few traditional AI techniques are used in, or work
    well for, speech recognition.
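To make the arithmetic in (1) concrete: if each constraint really cut
the error rate by a factor of about 3 independently, removing all four
would multiply the 6% rate by roughly 3^4 = 81, which saturates far
past 100%; that is why "about 70%" is only a guess.  A minimal Python
sketch of this back-of-the-envelope model (the factor of 3 and the
independence assumption are rough assumptions, not measured data):

    # Rough estimate of how Tangora's ~6% error rate might scale if
    # its four constraints were removed.  Assumes, as a toy model,
    # that each constraint independently cuts the error rate by a
    # factor of about 3; these numbers are illustrative, not measured.

    BASE_ERROR = 0.06          # Tangora on its constrained task
    FACTOR_PER_CONSTRAINT = 3  # assumed improvement per constraint
    NUM_CONSTRAINTS = 4        # speaker-dep., isolated words,
                               # tight grammar, benign environment

    raw = BASE_ERROR * FACTOR_PER_CONSTRAINT ** NUM_CONSTRAINTS
    # 0.06 * 81 = 4.86, far past 1.0, so the multiplicative model
    # saturates: the realistic reading is "mostly wrong", consistent
    # with the ~70% guess above.
    estimated = min(raw, 1.0)

    print(f"naive product: {raw:.2f}; capped estimate: {estimated:.0%}")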

Kai-Fu Lee
Computer Science Department
Carnegie-Mellon University

gt@hpfcmp.HP.COM (George Tatge) (11/03/87)

>
>Those of us who work on speech will be very encouraged by this enthusiasm.
>However,
>
>(1) Speaker-independent continuous speech is much farther from reality
>    than some companies would have you think.  Currently, the best
>    speech recognizer is IBM's Tangora, which makes about 6% errors
>    on a 20,000-word vocabulary.  But the Tangora does speaker-
>    dependent, isolated-word, grammar-guided recognition in a benign
>    environment.  Each of these four constraints cuts the error rate
>    by a factor of 3 or more when applied independently.  I don't know
>    how well it would do if you removed all four constraints, but I
>    would guess about a 70% error rate.  So while speech recognition
>    has made a lot of progress, it is still far from usable in the
>    application you mentioned.
>
>Kai-Fu Lee
>Computer Science Department
>Carnegie-Mellon University
>----------

Just curious what the definition of "best" is.  For example, I have
seen error rates of 6% and better on grammar-specific, speaker-dependent,
continuous speech recognition.  I would guess that for some applications
this is better than the "best" described above.

George (floundering in superlative ambiguity) Tatge

kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) (11/08/87)

In article <930001@hpfcmp.HP.COM>, gt@hpfcmp.HP.COM (George Tatge) writes:
> >
> >(1) Speaker-independent continuous speech is much farther from reality
> >    than some companies would have you think.  Currently, the best
> >    speech recognizer is IBM's Tangora, which makes about 6% errors
> >    on a 20,000-word vocabulary.  But the Tangora does speaker-
> >    dependent, isolated-word, grammar-guided recognition in a benign
> >    environment. . . .
> >
> >Kai-Fu Lee
> 
> Just curious what the definition of "best" is.  For example, I have
> seen error rates of 6% and better on grammar-specific, speaker-dependent,
> continuous speech recognition.  I would guess that for some applications
> this is better than the "best" described above.
> 

"Best" is not measured in terms of error rate alone.  More effort and
new technologies have gone into IBM's system than into any other, and
I believe it will do better than any other system on a comparable
task.  This definition is admittedly subjective, but I think if you
asked other speech researchers, you would find that most believe the
same.

I know many commercial (and research) systems have error rates lower
than 6%.  But you have to remember that the IBM system works on a
20,000-word vocabulary, and its grammar is a very loose one, accepting
arbitrary sentences from office correspondence.  That grammar has a
perplexity (roughly, the average number of word choices at each
decision point) of several hundred.  Nobody else has such a large
vocabulary or such a difficult grammar.  A small worked example of the
perplexity computation follows.
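A minimal Python sketch of that computation: perplexity is the
geometric mean of 1/P(word | history), equivalently 2 raised to the
cross-entropy of the model on the text.  The word probabilities below
are invented toy values, not from any real recognizer:

    import math

    def perplexity(word_probs):
        """word_probs: P(w_i | history) for each word of a test text."""
        log_sum = sum(math.log2(p) for p in word_probs)
        cross_entropy = -log_sum / len(word_probs)
        return 2 ** cross_entropy

    # A tight command grammar: few choices per word -> low perplexity.
    tight = [0.5, 0.2, 0.25, 0.5]
    # A loose dictation grammar: many plausible next words -> high.
    loose = [0.01, 0.002, 0.005, 0.001]

    print(f"tight grammar: {perplexity(tight):6.1f}")   # about 3
    print(f"loose grammar: {perplexity(loose):6.1f}")   # about 316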

IBM has experimented with tasks like the one you mentioned.  In 1978,
they tried a 1000-word task with a very tight grammar (perplexity = 5?),
the same task CMU used for Hearsay and Harpy.  They achieved a 0.1%
error rate.

> George (floundering in superlative ambiguity) Tatge

Kai-Fu Lee

jpdres10@usl-pc.UUCP (Green Eric Lee) (11/09/87)

In message <267@PT.CS.CMU.EDU>, kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) says:
>In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
>> It would seem to me that the single greatest practical advancement for
>> AI will be in speaker-independent, continuous speech recognition. . . .
>(3) If this product were to materialize, it is far from clear that it
>    would be an advancement for AI.  At present, the most promising 
>    techniques are based on stochastic modeling, pattern recognition, 
>    information theory, signal processing, auditory modeling, etc.
>    So far, very few traditional AI techniques are used in, or work
>    well for, speech recognition.

Very few traditional AI techniques have resulted in much at all :-)
(sorry, I couldn't help it).

But seriously, considering that sciences such as physics and
mathematics have been ongoing for centuries, can we REALLY say that AI
has "traditional techniques"? Certainly there is a large library of
techniques available to AI researchers today, but 30 years is hardly
a long enough time to call something "traditional". Remembering how
going beyond the "traditional" resulted in many breakthroughs in
mathematics and physics, saying that "it is far from clear that it
would be an advancement for AI" presupposes that one defines AI as
"that science which uses certain traditional methods", which, I
submit, is false.

--
Eric Green  elg@usl.CSNET       from BEYOND nowhere:
{ihnp4,cbosgd}!killer!elg,      P.O. Box 92191, Lafayette, LA 70509
{ut-sally,killer}!usl!elg     "there's someone in my head, but it's not me..."

lee@uhccux.UUCP (Greg Lee) (11/14/87)

In article <244@usl-pc.UUCP> jpdres10@usl-pc.UUCP (Green Eric Lee) writes:
>In message <267@PT.CS.CMU.EDU>, kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) says:
>>In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
>>> It would seem to me that the single greatest practical advancement for
>>> ...
>>    So far, very few traditional AI techniques are used in, or work
>>    well for, speech recognition.
>
>Very few traditional AI techniques have resulted in much at all :-)

	I suppose that applying AI to speech recognition would involve
making use of what we know about the perceptual and cognitive nature
of language sound-structures -- i.e., the results of phonology.  I
don't know that this has ever been tried.  If it has, could someone
supply references?  I'd be very interested to know what has been done
in this direction.
		Greg Lee, lee@uhccux.uhcc.hawaii.edu

kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) (11/16/87)

In article <244@usl-pc.UUCP>, jpdres10@usl-pc.UUCP (Green Eric Lee) writes:
> But seriously, considering that sciences such as physics and
> mathematics have been ongoing for centuries, can we REALLY say that AI
> has "traditional techniques"? . . .  "it is far from clear that it
> would be an advancement for AI" presupposes that one defines AI as
> "that science which uses certain traditional methods", which, I
> submit, is false.
> 

By "traditional techniques", I was referring to the older popular
techniques in AI, such as expert systems, predicate calculus, semantic
networks, etc.  Also, I was trying to exclude neural networks,
which may be promising for speech recognition.  I have heard of
"traditionalist vs. connectionist AI", and that is why I used the
term "traditional techniques".

Kai-Fu Lee
Computer Science Dept.
Carnegie-Mellon University

P.S. - I did not say that AI is a science.

goldfain@osiris.cso.uiuc.edu.UUCP (11/18/87)

I would like to echo the sentiment in Eric Green's comment.

Let us NOT try to define AI in terms of techniques.  It is defined by
its domain of inquiry, and that clearly includes speech recognition.  I
do not for a moment believe that continuous speaker-independent speech
recognition, if/when it is achieved, will be considered primarily a
work of physics.  No matter how it is achieved, that is just not a
viable statement.

                                                            - Mark Goldfain

mmt@dciem.UUCP (Martin Taylor) (11/19/87)

--        I suppose that applying AI to speech recognition would involve
--making use of what we know about the perceptual and cognitive nature
--of language sound-structures -- i.e. the results of phonology.  I don't
--know that this has ever been tried.  If it has, could someone supply
--references?  I'd be very interested to know what has been done in this
--direction.
--                Greg Lee, lee@uhccux.uhcc.hawaii.edu

I have been unable in a quick search to come up with exact references,
but Alinat, working at Thomson-DASM in Cros-de-Cagnes, France, has had
quite successful results using the phonological structure of French
as his basic database.  As I remember it, he does a quick-and-dirty
analysis of the phonetic structure of the incoming speech signal
(ranking candidate phone classes in order of preference), and then uses
fairly complex phonotactic rules, along with the (fairly strict) syntax
of the permitted sentence structure, to produce rather good
talker-independent results for native French talkers.  Non-native but
reasonably fluent talkers are not well recognized, probably because
they don't conform to the French phonotactic rules.  Alinat worked
essentially alone for a long time, but I understand he is now
cooperating with CRIN at the University of Nancy.


All the above is from memory, so details may be wrong.  If I find
the references, I'll post them.  If you are really interested, you
should be able to get hold of Alinat from the information above
(except I'm not sure whether it may be Cagnes-sur-Mer rather than
Cros de Cagnes; they are contiguous).
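For the curious, a toy Python sketch of the two-stage scheme described
above -- a coarse front end ranks candidate phone classes per segment,
then phonotactic rules prune illegal sequences.  The candidate lists
and rules are invented placeholders, not Alinat's actual system:

    from itertools import product

    # Stage 1 output: for each segment, phone classes in order of
    # preference (toy data).
    candidates = [
        ["p", "b", "t"],   # segment 1
        ["l", "r"],        # segment 2
        ["a", "o"],        # segment 3
    ]

    # Toy "phonotactic" rule: forbid certain adjacent pairs.
    FORBIDDEN_PAIRS = {("t", "l"), ("b", "r")}

    def legal(seq):
        return all(pair not in FORBIDDEN_PAIRS
                   for pair in zip(seq, seq[1:]))

    # Rank a hypothesis by how far down each preference list it reaches.
    def rank(seq):
        return sum(cands.index(ph)
                   for ph, cands in zip(seq, candidates))

    survivors = sorted(filter(legal, product(*candidates)), key=rank)
    print(survivors[0])   # best surviving hypothesis: ('p', 'l', 'a')

In the real system, as described above, syntactic constraints on the
permitted sentences would prune the hypotheses further still.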
-- 

Martin Taylor
{allegra,linus,ihnp4,floyd,ubc-vision}!utzoo!dciem!mmt
{uw-beaver,qucis,watmath}!utcsri!dciem!mmt
mmt@zorac.arpa
Magic is just advanced technology ... so is intelligence.  Before computers,
the ability to do arithmetic was proof of intelligence.  What proves
intelligence now?