lenny@icus.islp.ny.us (Lenny Tropiano) (11/08/88)
I've posed this before, but now I have proof that it's possible. I've
spoken with various people (some of whom were on the original Voice Power
development team) who couldn't give me "specifics" but said it was
possible.

Voice Recognition, how? That's the question...

Since my involvement with the Voice Power product on the UNIX pc, I've
learned a lot, picking up bits and pieces about CODECs, PCM (pulse code
modulation), DSPs (digital signal processors), sub-bands, mu-law, A-law,
etc... It's still very technical, and way over my head, but I'm
learning... [side note: if there is anyone out there who can give me help
with the above topics, please feel free to contact me].

I was fortunate to get a copy of the _AT&T Technical Journal_, Sept/Oct
1986, volume 65, issue 5, entitled "Speech Processing Theory", from
someone on the Voice Power team. This issue of the journal was dedicated
to the technical aspects of speech processing and shed some light on
certain topics [in most places it's very technical].

In the article, "Speech Processing for AT&T Workstations", by John G.
Ackenhusen, Syed S. Ali, James G. Josenhans, John W. Moffett, Reuel R.
Robertson and Jaime R. Tormos, they discuss the Voice Power product for
the UNIX PC.

-- the following is paraphrased from passages in the 8-page article --

..."
This paper describes the Voice Power speech processor, a speech
processing option for the AT&T UNIX PC. It is a peripheral card (with
software) that slides into an expansion slot on the workstation and adds
the capability for speech store and playback, speech recognition, and
text-to-speech synthesis to the UNIX PC. The initial offering of the
application software is also described. This software uses a subset of
the hardware's capabilities: speech storage and playback, and
text-to-speech synthesis.
...
Future Speech Processing Capabilities

The Voice Power speech processor can support speech recognition and
text-to-speech synthesis. These capabilities are under development.
Speech Recognition. The Voice Power speech processor's speech
recognition capability permits the automatic identification of an unknown
word or phrase from a vocabulary of 50 words or phrases. First the user
or an automatic application compiler must select the words; then, the
user trains the recognizer...
"

Text-to-speech works, and is implemented with the vtts(1) command plus
various library calls. All throughout the manuals they mention:

...
CAVEATS
     Text-to-speech components of Voice Power are provided for
     evaluation only and are not supported.
...

It does have some problems, but it does work nevertheless.

Speech recognition is mentioned in various header files and commands, but
there is *no* software that uses it and no documentation on how to
interface to it. How do you do it?! I spoke with a person whose name is
plastered all over everything having to do with Voice Power at AT&T, and
he said that the recognition parameters (and format) are proprietary and
he couldn't say any more ... He did say that the recognition parameters
are a representation of filters of the vocal tract. Gee, that helps me a
lot! :-} Anyone able to help?

The manual page for vrecord(1) also whets your appetite, but that's about
it.

vrecord(1)                  VOICE POWER                  vrecord(1)

NAME
     vrecord - record voice data file

SYNOPSIS
     vrecord [-c card] [-v] [-x] [-q] [-o] [-l n] [-f n] [-16]
             [-24] [-64] [-s] [-t n] [-e n] [-g] [-i] [-D] [file]
...
     -f n    Set format control word (v2_ctrl.format) to n.
             0    16k sub-band with silence compression
             4    16k sub-band, no silence compression, default.
             2    24k sub-band, with silence compression.
             6    24k sub-band, no silence compression.
             36   Recognition
                  ^^^^^^^^^^^
             8    64k mu Law

-Lenny
--
Lenny Tropiano             ICUS Software Systems         [w] +1 (516) 582-5525
lenny@icus.islp.ny.us      Telex; 154232428 ICUS         [h] +1 (516) 968-8576
{talcott,decuac,boulder,hombre,pacbell,sbcs}!icus!lenny  attmail!icus!lenny
        ICUS Software Systems -- PO Box 1; Islip Terrace, NY 11752
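For anyone puzzling over the "64k mu Law" entry in the format table: it presumably means ordinary telephone-style coding, 8000 samples per second at 8 bits each, which works out to 64000 bits per second. The Python sketch below shows the textbook continuous mu-law curve only. It is not the Voice Power encoder, the function names are made up for illustration, and real telephone codecs use a segmented approximation of this curve rather than the formula itself.

```python
import math

MU = 255  # value used in North American telephony


def mulaw_compress(x, mu=MU):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize8(y):
    """Round a companded value in [-1, 1] to one of 256 levels (8 bits)."""
    return int(round((y + 1) / 2 * 255))


def dequantize8(code):
    """Map an 8-bit code back to a companded value in [-1, 1]."""
    return code / 255 * 2 - 1


if __name__ == "__main__":
    for x in (0.01, 0.3, -0.5):
        code = quantize8(mulaw_compress(x))
        back = mulaw_expand(dequantize8(code))
        print(x, code, back)
    print("bits per second:", 8000 * 8)
```

The point of the curve is that quiet samples get a much larger share of the 256 codes than a straight linear 8-bit quantizer would give them, which is why 8 bits per sample is enough for telephone-quality speech.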
clb@loci.UUCP (Charles Brunow) (11/09/88)
In article <540@icus.islp.ny.us>, lenny@icus.islp.ny.us (Lenny Tropiano) writes:
> I've posed this before, but now I have proof that it's possible. I've
> spoken with various people (some who were on the original Voice Power
> development team) who couldn't give me "specifics" but said it was
> possible. Voice Recognition, how? That's the question... Since my
> involvement with the Voice Power product on the UNIX pc, I've learned
> a lot. Learning bits and pieces about CODEC's, PCM (pulse code modulation),
> DSP's (digital signal processors), sub bands, mu-law, a-law, etc... It's
> still very technical, and way over my head, but I'm learning... [side note:
> if there is anyone out there who can give me help in the above topics
> please feel free to contact me].
>

If you don't already know this stuff pat then you're years away from
speech recognition (SR). The coding method and companding are basic
stuff which you can find in telco references. There's a bit in
"Transmission Systems for Communications", by "Members of the Technical
Staff - Bell Telephone Laboratories", and you could profit from "Digital
Signal Processing" by Alan V. Oppenheim and Ronald W. Schafer
(Prentice-Hall, 1975). There are bound to be other references which are
basically equivalent.

Another source might be the app notes put out by TI a few years back
when they were trying to convince the world that they had the best
speech stuff. Some of it is very specific, like how the vocal tract
simulations work (schematics). My archives are too confused to find
copies, so maybe someone else can lay their hands on a copy for you.

Ultimately the process probably consists of determining the coefficients
for the filter nodes, looking for the best match with the set of known
words, and updating the coefficients either completely or with a damping
factor for learning. The problem is that knowing that doesn't get you
much closer to actually doing it.
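A rough Python sketch of that match-and-update loop, under heavy assumptions: each known word is stored as a single vector of filter coefficients, "best match" means smallest Euclidean distance, and learning blends the stored vector toward the new one by a damping factor. The names `best_match` and `update_template` and the toy numbers are hypothetical, and a practical recognizer would more likely compare a sequence of such vectors per word (with some time alignment) rather than one vector.

```python
import math


def distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(coeffs, templates):
    """Return (word, distance) for the stored template closest to coeffs."""
    return min(
        ((word, distance(coeffs, tmpl)) for word, tmpl in templates.items()),
        key=lambda pair: pair[1],
    )


def update_template(templates, word, coeffs, damping=0.25):
    """Move a stored template toward coeffs.

    damping=1.0 replaces the template completely; smaller values blend
    the new utterance in gradually.
    """
    old = templates[word]
    templates[word] = [(1 - damping) * o + damping * c
                       for o, c in zip(old, coeffs)]


if __name__ == "__main__":
    # Toy, made-up coefficient vectors for a two-word vocabulary.
    templates = {"yes": [0.9, -0.4, 0.1], "no": [0.2, 0.5, -0.3]}
    utterance = [0.8, -0.3, 0.0]
    word, dist = best_match(utterance, templates)
    print(word, dist)
    update_template(templates, word, utterance, damping=0.5)
    print(templates[word])
```

With these toy values the utterance lands closer to "yes" than to "no". The hard part, getting coefficient vectors that stay stable across speakers and repetitions, is exactly what the sketch leaves out.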
There is loads of raw data (assume an 8 kHz sample rate) which has to be
reduced to a form which can be efficiently processed while keeping
enough data to tell similar words apart across different speakers. Many
people have spent lots of time on it without significant breakthroughs.

--
CLBrunow - KA5SOF
clb@loci.uucp, loci@csccat.uucp, loci@killer.dallas.tx.us
Loci Products, POB 833846-131, Richardson, Texas 75083
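To give a feel for the data reduction step mentioned above, here is a small Python sketch that chops 8 kHz samples into fixed frames and keeps two numbers per frame. The 20 ms frame length and the choice of features (mean energy and zero-crossing count) are assumptions picked for illustration; they are stand-ins, not the vocal tract filter parameters Voice Power is said to use, and `frame_features` is a hypothetical name.

```python
import math

SAMPLE_RATE = 8000  # samples per second, as assumed in the post
FRAME_MS = 20       # hypothetical frame length in milliseconds


def frame_features(samples, rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Reduce raw samples to one (energy, zero_crossings) pair per frame.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n = rate * frame_ms // 1000  # samples per frame (160 at 8 kHz, 20 ms)
    feats = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        energy = sum(s * s for s in frame) / n
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a < 0) != (b < 0))
        feats.append((energy, crossings))
    return feats


if __name__ == "__main__":
    # One second of a 1 kHz test tone standing in for recorded speech.
    tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
            for t in range(SAMPLE_RATE)]
    feats = frame_features(tone)
    print(len(tone), "samples ->", len(feats), "frames of 2 numbers each")
```

One second of input goes from 8000 samples to 50 frames of two numbers. Features this crude are cheap but throw away most of what separates one word from another, which is the trade-off the post is pointing at.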