rolandi@gollum.Columbia.NCR.COM (rolandi) (10/31/87)
In article <6667@ut-ngp.UUCP> you write:
>I have a question for people:
>   What practical effects do you think AI will have in the next ten
>years?
>........[etc...]

I'd say that AI will have at least two real and immediate effects.

1) Given AI programming tools and techniques, many processes previously
   assumed to be too complicated for automation will be automated.  The
   automation of these tasks will take less time, given the productivity
   gains that AI tools can provide.  Expert systems will be commonplace
   within the DP/MIS world.

2) AI will make computers easier to use and therefore extend their
   usefulness to non-computer people.

Regarding #2 above...

It would seem to me that the single greatest practical advancement for
AI will be in speaker-independent, continuous speech recognition.  This
is NOT to imply total computer "comprehension" in the sense of being
able to carry on an unrestricted conversation.  I am NOT referring to
the ability to process natural language.  That is a long way off, and
will most likely come about through a redefinition of the NLP problem
as a machine-learning issue.  What "simple" speaker-independent,
continuous speech recognition will provide is the ultimate alternative
to keyboard entry.  It would thereby offer all of the functionality of
current technology to anyone who could pronounce the commands.  This
will have a major impact on the industry and on society.  By making
"everybody" a user, more machines will be sold, and because "everybody"
will have different needs, the range of automation will be widely
extended.

-w.rolandi
ncrcae!gollum!rolandi

disclaimer: i speak for no one but myself and usually no one else is
listening.
kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) (11/01/87)
In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
> In article <6667@ut-ngp.UUCP> you write:
> >I have a question for people:
> >   What practical effects do you think AI will have in the next ten
> >years?
> >........[etc...]
>
> It would seem to me that the single greatest practical advancement for
> AI will be in speaker-independent, continuous speech recognition.  This
> is NOT to imply total computer "comprehension" in the sense of being
> able to carry on an unrestricted conversation.  I am NOT referring to
> the ability to process natural language.  That is a long way off, and
> will most likely come about through a redefinition of the NLP problem
> as a machine-learning issue.  What "simple" speaker-independent,
> continuous speech recognition will provide is the ultimate alternative
> to keyboard entry.  It would thereby offer all of the functionality of
> current technology to anyone who could pronounce the commands.  This
> will have a major impact on the industry and on society.  By making
> "everybody" a user, more machines will be sold, and because "everybody"
> will have different needs, the range of automation will be widely
> extended.

Those of us who work on speech will be very encouraged by this
enthusiasm.  However,

(1) Speaker-independent continuous speech is much farther from reality
    than some companies would have you think.  Currently, the best
    speech recognizer is IBM's Tangora, which makes about 6% errors on
    a 20,000-word vocabulary.  But the Tangora does speaker-dependent,
    isolated-word, grammar-guided recognition in a benign environment.
    Each of these four constraints cuts the error rate by 3 or more
    times if used independently.  I don't know how well it would do if
    you removed all four constraints, but I would guess about a 70%
    error rate.  So while speech recognition has made a lot of
    advancements, it is still far from usable in the application you
    mentioned.

(2) Spoken English is a harder problem than NLP of written English.
    If you make the recognizer too constrained (small vocabulary, fixed
    syntax, etc.), it will be harder to use than a keyboard.  If you
    don't, you have to understand spoken English, which is really hard.

(3) If this product were to materialize, it is far from clear that it
    would be an advancement for AI.  At present, the most promising
    techniques are based on stochastic modeling, pattern recognition,
    information theory, signal processing, auditory modeling, etc.  So
    far, very few traditional AI techniques are used in, or work well
    for, speech recognition.

> -w.rolandi
> ncrcae!gollum!rolandi

Kai-Fu Lee
Computer Science Department
Carnegie-Mellon University
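A back-of-envelope sketch of the compounding in point (1), in Python
(an editorial illustration, not Lee's calculation; it assumes the
factor-of-3 penalty applies multiplicatively as each constraint is
removed, capping the error at 100%):

    def relax_error(base_error, factor=3.0, n_constraints=4):
        """Naively compound a recognizer's error rate as constraints
        are removed: each removal multiplies the error by `factor`
        (the "3 or more times" above), capped at 100%."""
        errors, e = [], base_error
        for _ in range(n_constraints):
            e = min(e * factor, 1.0)
            errors.append(e)
        return errors

    # Tangora: ~6% error with all four constraints in place.
    print(relax_error(0.06))   # -> [0.18, 0.54, 1.0, 1.0]

The naive product blows past 100% after the third removal, so the
constraints clearly interact; a saturating guess like the 70% above is
as principled as any.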
gt@hpfcmp.HP.COM (George Tatge) (11/03/87)
>
>Those of us who work on speech will be very encouraged by this
>enthusiasm.  However,
>
>(1) Speaker-independent continuous speech is much farther from reality
>    than some companies would have you think.  Currently, the best
>    speech recognizer is IBM's Tangora, which makes about 6% errors on
>    a 20,000-word vocabulary.  But the Tangora does speaker-dependent,
>    isolated-word, grammar-guided recognition in a benign environment.
>    Each of these four constraints cuts the error rate by 3 or more
>    times if used independently.  I don't know how well it would do if
>    you removed all four constraints, but I would guess about a 70%
>    error rate.  So while speech recognition has made a lot of
>    advancements, it is still far from usable in the application you
>    mentioned.
>
>Kai-Fu Lee
>Computer Science Department
>Carnegie-Mellon University
>----------

Just curious what the definition of "best" is.  For example, I have
seen 6% error rates and better on grammar-specific, speaker-dependent,
continuous speech recognition.  I would guess that for some
applications this is better than the "best" described above.

George (floundering in superlative ambiguity) Tatge
kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) (11/08/87)
In article <930001@hpfcmp.HP.COM>, gt@hpfcmp.HP.COM (George Tatge) writes:
> >(1) Speaker-independent continuous speech is much farther from reality
> >    than some companies would have you think.  Currently, the best
> >    speech recognizer is IBM's Tangora, which makes about 6% errors on
> >    a 20,000-word vocabulary.  But the Tangora does speaker-dependent,
> >    isolated-word, grammar-guided recognition in a benign
> >    environment. . . .
> >
> >Kai-Fu Lee
>
> Just curious what the definition of "best" is.  For example, I have
> seen 6% error rates and better on grammar-specific, speaker-dependent,
> continuous speech recognition.  I would guess that for some
> applications this is better than the "best" described above.

"Best" is not measured in terms of error rate alone.  More effort and
newer technologies have gone into IBM's system than into any other, and
I believe it will do better than any other system on a comparable task.
This definition is subjective, I suppose, but if you asked other speech
researchers, I think you would find that most believe the same.

I know many commercial (and research) systems have error rates lower
than 6%.  But you have to remember that the IBM system works on a
20,000-word vocabulary, and its grammar is a very loose one, accepting
arbitrary sentences from office correspondence.  That grammar has a
perplexity (roughly speaking, the number of choices at each decision
point) of several hundred.  Nobody else has such a large vocabulary or
such a difficult grammar.

IBM has experimented with tasks like the one you mentioned.  In 1978,
they tried a 1000-word task with a very tight grammar (perplexity = 5?),
the same task CMU used for Hearsay and Harpy.  They achieved a 0.1%
error rate.

> George (floundering in superlative ambiguity) Tatge

Kai-Fu Lee
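Perplexity, for readers unfamiliar with the measure, can be computed as
the geometric mean of the inverse probabilities a grammar assigns to
each successive word.  A minimal Python sketch (an editorial
illustration with made-up probabilities, not IBM's or CMU's actual
grammars):

    import math

    def perplexity(word_probs):
        # Geometric mean of the inverse word probabilities: a grammar
        # that always allows K equally likely next words scores K.
        n = len(word_probs)
        log_sum = sum(math.log2(p) for p in word_probs)
        return 2 ** (-log_sum / n)

    print(perplexity([1.0/5] * 10))     # tight grammar    -> 5.0
    print(perplexity([1.0/300] * 10))   # loose dictation  -> 300.0

On this scale, the gap between the 1978 task (about 5) and the
Tangora's office-dictation grammar (several hundred) is what makes the
two error rates incomparable.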
jpdres10@usl-pc.UUCP (Green Eric Lee) (11/09/87)
In message <267@PT.CS.CMU.EDU>, kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) says:
>In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
>> It would seem to me that the single greatest practical advancement for
>> AI will be in speaker-independent, continuous speech recognition.
>
>(3) If this product were to materialize, it is far from clear that it
>    would be an advancement for AI.  At present, the most promising
>    techniques are based on stochastic modeling, pattern recognition,
>    information theory, signal processing, auditory modeling, etc.  So
>    far, very few traditional AI techniques are used in, or work well
>    for, speech recognition.

Very few traditional AI techniques have resulted in much at all :-)
(sorry, I couldn't help it).

But seriously, considering that sciences such as physics and
mathematics have been ongoing for centuries, can we REALLY say that AI
has "traditional techniques"?  Certainly there is a large library of
techniques available to AI researchers today, but 30 years is hardly
long enough to call anything "traditional".  Remembering how going
beyond the "traditional" produced many breakthroughs in mathematics and
physics, saying that "it is far from clear that it would be an
advancement for AI" presupposes that one defines AI as "that science
which uses certain traditional methods" -- a definition which, I
submit, is false.

--
Eric Green   elg@usl.CSNET              from BEYOND nowhere:
{ihnp4,cbosgd}!killer!elg               P.O. Box 92191, Lafayette, LA 70509
{ut-sally,killer}!usl!elg               "there's someone in my head, but it's not me..."
lee@uhccux.UUCP (Greg Lee) (11/14/87)
In article <244@usl-pc.UUCP> jpdres10@usl-pc.UUCP (Green Eric Lee) writes:
>In message <267@PT.CS.CMU.EDU>, kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) says:
>>In article <12@gollum.Columbia.NCR.COM>, rolandi@gollum.Columbia.NCR.COM (rolandi) writes:
>>> It would seem to me that the single greatest practical advancement for
>>> ...
>> So far, very few traditional AI techniques are used in, or work well
>> for, speech recognition.
>
>Very few traditional AI techniques have resulted in much at all :-)

I suppose that applying AI to speech recognition would involve making
use of what we know about the perceptual and cognitive nature of
language sound-structures -- i.e., the results of phonology.  I don't
know that this has ever been tried.  If it has, could someone supply
references?  I'd be very interested to know what has been done in this
direction.

Greg Lee, lee@uhccux.uhcc.hawaii.edu
kfl@SPEECH2.CS.CMU.EDU (Kai-Fu Lee) (11/16/87)
In article <244@usl-pc.UUCP>, jpdres10@usl-pc.UUCP (Green Eric Lee) writes:
> But seriously, considering that sciences such as physics and
> mathematics have been ongoing for centuries, can we REALLY say that AI
> has "traditional techniques"? . . . "it is far from clear that it
> would be an advancement for AI" presupposes that one defines AI as
> "that science which uses certain traditional methods", which, I
> submit, is false.

By "traditional techniques", I was referring to the older popular
techniques in AI, such as expert systems, predicate calculus, semantic
networks, etc.  Also, I was trying to exclude neural networks, which
may be promising for speech recognition.  I have heard of
"traditionalist vs. connectionist AI", and that is why I used the term
"traditional techniques".

Kai-Fu Lee
Computer Science Dept.
Carnegie-Mellon University

P.S. - I did not say that AI is a science.
goldfain@osiris.cso.uiuc.edu.UUCP (11/18/87)
I would like to echo the sentiment in Eric Green's comment.  Let us NOT
try to define AI in terms of techniques.  It is defined by its domain
of inquiry, and that clearly includes speech recognition.  I do not for
a moment believe that continuous, speaker-independent speech
recognition, if/when it is achieved, will be considered primarily a
work of physics.  No matter how it is achieved, that is just not a
viable statement.

- Mark Goldfain
mmt@dciem.UUCP (Martin Taylor) (11/19/87)
--I suppose that applying AI to speech recognition would involve making
--use of what we know about the perceptual and cognitive nature of
--language sound-structures -- i.e., the results of phonology.  I don't
--know that this has ever been tried.  If it has, could someone supply
--references?  I'd be very interested to know what has been done in this
--direction.
--    Greg Lee, lee@uhccux.uhcc.hawaii.edu

I have been unable in a quick search to come up with exact references,
but Alinat, working at Thomson-DASM in Cros-de-Cagnes, France, has had
quite successful results using the phonological structure of French as
his basic database.  As I remember it, he does a quick-and-dirty
analysis of the phonetic structure of the incoming speech signal
(classes in order of preference), and then uses fairly complex
phonotactic rules along with the (fairly strict) syntax of the
permitted sentence structure to produce rather good talker-independent
results for native French talkers.  Non-native but reasonably fluent
talkers are not well recognized, probably because they don't conform
to the French phonotactic rules.

Alinat was working essentially alone for a long time, but I understand
he is now cooperating with CRIN at the University of Nancy.

All the above is from memory, so details may be wrong.  If I find the
references, I'll post them.  If you are really interested, you should
be able to get hold of Alinat from the information above (except I'm
not sure whether it may be Cagnes-sur-Mer rather than Cros-de-Cagnes;
they are contiguous).

--
Martin Taylor
{allegra,linus,ihnp4,floyd,ubc-vision}!utzoo!dciem!mmt
{uw-beaver,qucis,watmath}!utcsri!dciem!mmt
mmt@zorac.arpa

Magic is just advanced technology ... so is intelligence.  Before
computers, the ability to do arithmetic was proof of intelligence.
What proves intelligence now?
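A toy sketch of the two-stage approach described above, in Python (an
editorial illustration only: the phone candidates, the single
phonotactic rule, and the tiny "lexicon" are invented, not Alinat's):

    from itertools import product

    # Stage 1: quick-and-dirty classification gives ranked phone
    # candidates for each segment (classes in order of preference).
    candidates = [["p", "b"], ["a", "o"], ["r", "l"]]

    # Stage 2a: a stand-in phonotactic rule -- forbid the adjacent
    # pair ("b", "o").  Real rules encode which clusters the
    # language actually allows.
    def phonotactically_legal(phones):
        return not any(a == "b" and b == "o"
                       for a, b in zip(phones, phones[1:]))

    # Stage 2b: the (strict) permitted "sentences" -- here just two
    # three-phone strings.
    lexicon = {("p", "a", "r"), ("b", "a", "l")}

    hypotheses = [seq for seq in product(*candidates)
                  if phonotactically_legal(seq) and seq in lexicon]
    print(hypotheses)   # -> [('p', 'a', 'r'), ('b', 'a', 'l')]

The idea being exploited is that the phonotactic and syntactic filters
prune so aggressively that the stage-1 classification can afford to be
rough -- which would also explain why talkers who violate the French
phonotactics (non-natives) fall outside the pruned search space.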