karl@haddock.ISC.COM (Karl Heuer) (11/05/87)
This isn't a C issue. I am redirecting to comp.misc.

In article <10138@brl-adm.ARPA> fbaube@note.nsf.GOV (Fred Baube) writes:
>I have the new US Snooze & World Distort, and on p.66 is an IBM ad about
>voice recognition. The screen show "Testing one, two, three", complete with
>commas! Does anyone know how they do that?

The most likely scenario is that they have somebody type the string, complete with commas, at a keyboard. Or (slightly less deceptive) they have carefully arranged their demo so that that particular situation appears as shown, without handling the (impossible) general case.

Evidence to the contrary, anyone?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
stuart@bms-at.UUCP (Stuart D. Gathman) (11/18/87)
Your second guess is almost true. IBM's voice recognition technique is statistical: the machine has a large database of phoneme/timing distributions. Thus, the text is reproduced with punctuation! Unlike AI approaches, the text is not actually "understood" by the machine. The output is simply whatever text, punctuation included, goes with the statistically similar utterances used to build the database.

The results are not 100% correct, of course. But you can bet that it gets the example in the ad right every time!

I do not work for IBM. I personally think that a neural network approach is the way to go (similar in concept, but vastly different in implementation from the statistical route).
--
Stuart D. Gathman <stuart@bms-at.uucp>
<..!{vrdxhq|dgis}!bms-at!stuart>
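To make the punctuation point concrete, here is a toy sketch. It is not IBM's method, only an illustration of the general statistical idea. It assumes the acoustic side has already produced the bare words, and a bigram language model trained on punctuated transcripts then picks where commas go. The training sentences, the function names (`train_bigrams`, `log_prob`, `punctuate`), and the smoothing constant are all hypothetical.

```python
import itertools
import math
from collections import Counter

# Hypothetical training transcripts; punctuation is kept as separate tokens.
TRAINING_TEXT = [
    "testing one , two , three",
    "one , two , three , four",
    "testing the microphone",
]

def train_bigrams(sentences):
    """Count bigrams and their left contexts, with <s>/</s> sentence markers."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens[1:])           # every token that can be predicted
        contexts.update(tokens[:-1])       # every token that can precede another
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)

def log_prob(tokens, model, alpha=0.1):
    """Add-alpha smoothed bigram log-probability of a token sequence."""
    contexts, bigrams, vocab_size = model
    seq = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = contexts[prev] + alpha * vocab_size
        total += math.log(num / den)
    return total

def punctuate(words, model, marks=("", ",")):
    """Try every choice of mark between adjacent words; keep the likeliest."""
    if not words:
        return ""
    best, best_score = None, -math.inf
    for choice in itertools.product(marks, repeat=len(words) - 1):
        tokens = [words[0]]
        for mark, word in zip(choice, words[1:]):
            if mark:
                tokens.append(mark)
            tokens.append(word)
        score = log_prob(tokens, model)
        if score > best_score:
            best, best_score = tokens, score
    # Re-attach commas to the preceding word for display.
    return " ".join(best).replace(" ,", ",")

model = train_bigrams(TRAINING_TEXT)
print(punctuate(["testing", "one", "two", "three"], model))  # testing one, two, three
```

The commas appear only because they were in the training transcripts; the model has no idea what a list of numbers is. A real recognizer would combine language-model scores with acoustic scores during the search rather than punctuating afterward, and the exhaustive search here is only practical for a handful of words.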