minow (04/14/83)
For the last year, I have been working on DECtalk, a high-quality text-to-speech synthesizer. An article on this work will appear in next week's Electronics (should be out the week of April 18th) and I thought Human Nets, Telecom and USEnet people might enjoy a brief summary and an early chance to hear the demo.

DECtalk uses a 68000 with 256K bytes of ROM to convert unrestricted English text to synthesizer parameters. These are transmitted to a TMS32010 digital signal processor, which generates the analog waveform. The 68000 uses a large lexicon and a set of about 400 letter-to-sound conversion rules. The lexicon (occupying about 1/2 of the ROM space) guarantees correct pronunciation for a large subset of English and cuts down processing time for common words. DECtalk also contains heuristics to process abbreviations, numbers, and acronyms.

To communicate with the outside world, DECtalk contains two asynchronous terminal lines (one with modem control) and a built-in telephone line interface with a DTMF (Touch-Tone) decoder.

The processing requirements are interesting: English text, entered at about 30 bytes per second, is converted by the 68000 to synthesizer parameter blocks (18 16-bit words each). The TMS DSP reads a new parameter block every 6.5 msec. and generates 10,000 12-bit samples per second. All processing is digital -- the only analog component on the board is the DAC anti-aliasing filter.

DECtalk software was written in C -- the board has a home-brew Unix-flavored real-time operating system. This allowed us to debug text-to-speech modules on a timesharing system. (The non-real-time components run on VMS, Unix, RSTS, RT11, and TOPS-20.)

If you would rather listen to DECtalk than read about it, feel free to call (617) 493-7625 (preferably from a Touch-Tone phone).

Your comments and suggestions are most welcome.

Regards.

Martin Minow
decvax!minow (USENET)
decvax!minow @ Berkeley (ARPA)
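
As a rough check on the processing figures quoted above (10,000 samples per second, a new parameter block every 6.5 msec., 18 16-bit words per block, about 30 bytes per second of text), the arithmetic can be sketched in C. This is only an illustration: the function and macro names are made up for this sketch and are not taken from the DECtalk sources, and the byte rate assumes each 16-bit word is counted as two bytes with no framing overhead.

```c
#include <assert.h>
#include <stdio.h>

/* Figures quoted in the post. All names here are hypothetical. */
#define SAMPLE_RATE_HZ   10000L  /* 12-bit samples per second */
#define FRAME_USEC       6500L   /* one parameter block every 6.5 msec */
#define WORDS_PER_BLOCK  18L     /* 16-bit words per parameter block */
#define TEXT_BYTES_SEC   30L     /* approximate text input rate */

/* Samples the DSP must generate from each parameter block. */
long samples_per_block(long sample_rate_hz, long frame_usec)
{
    assert(sample_rate_hz > 0 && frame_usec > 0);
    return sample_rate_hz * frame_usec / 1000000L;
}

/* Parameter bytes per second sent to the DSP (truncated to an integer).
 * Assumes two bytes per 16-bit word and no extra framing. */
long param_bytes_per_sec(long words_per_block, long frame_usec)
{
    assert(words_per_block > 0 && frame_usec > 0);
    return words_per_block * 2L * 1000000L / frame_usec;
}

/* Print the derived rates next to the quoted text input rate. */
void print_rates(void)
{
    long spb = samples_per_block(SAMPLE_RATE_HZ, FRAME_USEC);
    long pbs = param_bytes_per_sec(WORDS_PER_BLOCK, FRAME_USEC);

    printf("samples per parameter block: %ld\n", spb);
    printf("parameter bytes per second:  %ld\n", pbs);
    printf("text bytes per second:       %ld\n", TEXT_BYTES_SEC);
}
```

With the quoted figures this works out to 65 samples per parameter block and roughly 5,538 parameter bytes per second, against about 30 bytes per second of incoming text.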