csirpd@quagga.ru.ac.za (Paul Ducklin) (12/18/90)
I'm trying to get a handle on the state-of-the-art in speech recognition
systems. Could anyone in netland let me have some idea of (a) where we
are now and (b) where we'll be in 2-5 years time re...

* for a specific voice, and a non-mega$ desktop machine, what's
  a good recognition vocabulary? 5000 words? 10000 words? I
  hear that there's a product out called "Dragon Dictate" for
  which a 30000 word vocabulary has been mentioned (h/w reqd.
  = 386-type). Is this genuwyne? What happens if you get a
  cold? Will you need to train all 30000 words again? What
  sort of recognition speed is likely on a non-Cray?

* for "generic voice" (eg: all American-speaking females), what is
  a good vocabulary? What sort of reliability is attainable?

* what's good vis-a-vis "natural" or continuous speech? How
  capable are recognition systems at handling speech without
  staccato-type interword pauses?

Paul Ducklin
------------ CSIR, Pretoria, RSA ------------
schultz@halley.tmc.edu (John C. Schultz) (12/20/90)
In article <1990Dec17.202616.3021@quagga.ru.ac.za> csirpd@quagga.ru.ac.za (Paul Ducklin) writes:
>
> * for a specific voice, and a non-mega$ desktop machine, what's
>   a good recognition vocabulary? 5000 words? 10000 words? I
>
> * for "generic voice" (eg: all American-speaking females), what is
>   a good vocabulary? What sort of reliability is attainable?
>
> * what's good vis-a-vis "natural" or continuous speech? How
>   capable are recognition systems at handling speech without
>   staccato-type interword pauses?

I would like to add to this list of questions

* How reliable is voice recognition in noisy environments with
  respect to vocabulary size? For example if the system only
  needs to recognize 50 or so words, is that more robust than
  a 5000 word system? How much more reliable? Don't know answers
  would be preferable to wrong answers.

--
John C. Schultz                  EMAIL: schultz@halley.est.3m.com
3M Company, Building 518-01-1    WRK: +1 (612) 733-4047
1865 Woodlane Drive, Dock 4, Woodbury, MN 55125
ray@ariel.ucs.unimelb.edu.au (Douglas Ray) (12/24/90)
From article <1990Dec19.215611.10659@mmm.serc.3m.com>, by schultz@halley.tmc.edu (John C. Schultz):
> In article <1990Dec17.202616.3021@quagga.ru.ac.za> csirpd@quagga.ru.ac.za (Paul Ducklin) writes:
>>
>> * for a specific voice, and a non-mega$ desktop machine, what's
>>   a good recognition vocabulary? 5000 words? 10000 words? I
>>
>> * for "generic voice" (eg: all American-speaking females), what is
>>   a good vocabulary? What sort of reliability is attainable?
>>
>> * what's good vis-a-vis "natural" or continuous speech? How
>>   capable are recognition systems at handling speech without
>>   staccato-type interword pauses?
>
> * How reliable is voice recognition in noisy environments with
>   respect to vocabulary size? For example if the system only
>   needs to recognize 50 or so words, is that more robust than
>   a 5000 word system? How much more reliable? Don't know answers
>   would be preferable to wrong answers.

Initial response: I'm not qualified in this field, but if I haven't
misinterpreted the figures, here are summaries of papers presented at
the 3rd International Conference on Speech Science and Technology,
Melbourne, Australia, November 1990.

The general attitude at the conference was to quote "small" vocabs as
20 - 200 words, and large as 500 - 1000 words.

[only first authors quoted]

C. Rowles (Telecom Australia)
    State of the art for speaker independent, continuous speech,
    modest vocab. (200-500w ?): 95% word recognition.

This 95% figure comes up a lot:

W.A. Smith (Waikato, N.Z.)
    Presented a feature selection algorithm for speaker independent,
    isolated word recognition, vocab. 20w: 95% word recognition.

Tracy Clark (Canterbury, N.Z.)
    Compares various methods in isolation, comments on accent
    dependence; speaker dependent, isolated word, 10w vocab.:
    best up to 96% word recognition.

but for larger vocabs you can't expect this:

Tony Robinson (Cambridge, U.K.)
    Preliminary work on word recognition without grammatic constraints:
    speaker independent, continuous speech, using the DARPA 1000 word
    Resource Management Task: 52.1% word recognition rate (43.3% accuracy),
    but quotes the Sphinx system at 81.9%.

There was also some work on language recognition, eg:

Walter Weigel (Munich, Germany)
    Speaker independent, continuous speech, 132w vocab., 40 rule
    context-free grammar subset of German: 74% sentence recognition.

[The conference proceedings contain around 80 papers in over 500 pp.;
inquiries to the Secretary, Australian Speech Science and Technology
Association, GPO Box 143, Canberra ACT 2601, Australia]
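
The Robinson entry above gives two different figures for the same test
(52.1% word recognition rate, 43.3% accuracy) without saying how they
differ. One widely used scoring convention, which is an assumption here
and not something the summary confirms, aligns the recognized words
against the reference transcript and counts substitutions (S), deletions
(D) and insertions (I) over N reference words: "percent correct" is
(N - S - D) / N and "accuracy" is (N - S - D - I) / N, so accuracy also
penalizes spurious extra words. A minimal Python sketch of that
convention follows; the function names and the example sentence are
made up for illustration.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit
    alignment of the hypothesis word list against the reference."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, S, D, I) for ref[:i] versus hyp[:j].
    # Tuples compare on total errors first, so min() picks a cheapest path.
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # reference words with no output: deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # output words with no reference: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins)              # correct word
            else:
                diag = (e + 1, s + 1, d, ins)      # substitution
            e, s, d, ins = cost[i - 1][j]
            dele = (e + 1, s, d + 1, ins)          # deletion
            e, s, d, ins = cost[i][j - 1]
            inse = (e + 1, s, d, ins + 1)          # insertion
            cost[i][j] = min(diag, dele, inse)
    _, s, d, ins = cost[n][m]
    return s, d, ins


def word_scores(ref, hyp):
    """Return (percent_correct, accuracy) under the assumed convention."""
    s, d, ins = align_counts(ref, hyp)
    n = len(ref)
    percent_correct = 100.0 * (n - s - d) / n
    accuracy = 100.0 * (n - s - d - ins) / n
    return percent_correct, accuracy


# Hypothetical example: every reference word is found, plus one extra "the".
reference = "show all ships in the gulf".split()
recognized = "show all the ships in the gulf".split()
print(word_scores(reference, recognized))
```

In this made-up example the recognizer gets all six reference words, so
percent correct is 100%, but the one inserted word pulls accuracy down
to 5/6, about 83.3%. Under this convention a gap like 52.1% versus 43.3%
would indicate a fair number of inserted words.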
Eric.Thayer@cs.cmu.edu (Eric H. Thayer) (01/03/91)
In article <377@ariel.ucs.unimelb.edu.au> ray@ariel.ucs.unimelb.edu.au (Douglas Ray) writes:
> but for larger vocabs you can't expect this:
>
> Tony Robinson (Cambridge, U.K.)
> Preliminary work on word recognition without grammatic constraints:
> speaker independent, continuous speech, using the DARPA 1000 word
> Resource Management Task: 52.1% word recognition rate (43.3% accuracy),
> but quotes the Sphinx system at 81.9%.

I am not sure what conditions he is quoting, but the 'best' results are
above 95% correct for the SPHINX system on the resource management task
(spkr. ind., continuous).

It is generally not very useful to quote raw performance numbers, because
many factors improve or degrade recognition accuracy.

----------------------------------
Replies can have NeXT attachments in them
Phone: (412)268-7679