[net.general] DECtalk -- a text-to-speech system

minow (04/14/83)

For the last year, I have been working on DECtalk, a high quality
text-to-speech synthesizer.  An article on this work will appear
in next week's Electronics (should be out the week of April 18th)
and I thought Human Nets, Telecom and USENET people might enjoy
a brief summary and an early chance to hear the demo.

DECtalk uses a 68000 with 256K bytes of ROM to convert unrestricted
English text to synthesizer parameters.  These are transmitted
to a TMS32010 digital signal processor to generate the analog
waveform.  The 68000 uses a large lexicon and a set of about 400
letter-to-sound conversion rules.  The lexicon (occupying about 1/2
of the ROM space) guarantees correct pronunciation for a large
subset of English and cuts down processing time for common words.
DECtalk also contains heuristics to process abbreviations, numbers,
and acronyms.
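
To illustrate the lexicon-first, rules-second idea described above, here
is a minimal sketch in C.  It is not DECtalk code: the function names,
the toy lexicon, the handful of "rules" and the phoneme spellings are
all made up for illustration, and the real system uses about 400
context-sensitive rules rather than a simple prefix table.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: a spelling fragment and its phonemes. */
struct entry {
    const char *text;
    const char *phonemes;
};

/* Toy exception lexicon: words that simple rules would get wrong. */
static const struct entry lexicon[] = {
    { "colonel", "k er n ax l" },
    { "one",     "w ah n" },
};

/* Toy letter-to-sound rules; longer patterns are listed first. */
static const struct entry rules[] = {
    { "sh", "sh" }, { "ch", "ch" }, { "th", "th" },
    { "a", "ae" }, { "e", "eh" }, { "i", "ih" },
    { "o", "aa" }, { "u", "ah" },
};

/* Return the lexicon pronunciation for word, or NULL if absent. */
static const char *lexicon_lookup(const char *word)
{
    size_t i;
    for (i = 0; i < sizeof lexicon / sizeof lexicon[0]; i++)
        if (strcmp(lexicon[i].text, word) == 0)
            return lexicon[i].phonemes;
    return NULL;
}

/* Scan word left to right, emitting the first matching rule's
 * phonemes; letters with no rule are copied through unchanged.
 * outlen must be at least 1.  Output is truncated if it does not fit. */
static void apply_rules(const char *word, char *out, size_t outlen)
{
    size_t used = 0;
    out[0] = '\0';
    while (*word != '\0') {
        char single[2];
        const char *ph = NULL;
        size_t step = 1, i;
        int n;

        for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
            size_t len = strlen(rules[i].text);
            if (strncmp(word, rules[i].text, len) == 0) {
                ph = rules[i].phonemes;
                step = len;
                break;
            }
        }
        if (ph == NULL) {
            single[0] = *word;
            single[1] = '\0';
            ph = single;
        }
        n = snprintf(out + used, outlen - used, "%s%s",
                     used ? " " : "", ph);
        if (n < 0 || (size_t)n >= outlen - used)
            break;                      /* buffer full: stop early */
        used += (size_t)n;
        word += step;
    }
}

/* Try the lexicon first, then fall back to the rules.
 * Returns 1 if the lexicon supplied the answer, 0 otherwise. */
int to_phonemes(const char *word, char *out, size_t outlen)
{
    const char *hit = lexicon_lookup(word);
    if (hit != NULL) {
        snprintf(out, outlen, "%s", hit);
        return 1;
    }
    apply_rules(word, out, outlen);
    return 0;
}
```

With these toy tables, to_phonemes("one", ...) returns 1 and yields
"w ah n" straight from the lexicon, while to_phonemes("ship", ...)
returns 0 and yields "sh ih p" from the rules.  The early lexicon hit is
also what saves processing time on common words.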

To communicate with the outside world, DECtalk contains two asynchronous
terminal lines (one with modem control) and a built-in telephone line
interface with a DTMF (Touch-Tone) decoder.

The processing requirements are interesting:  English text, entered
at about 30 bytes per second, is converted by the 68000 to synthesizer
parameter blocks (18 16-bit words).  The TMS DSP reads a new parameter
block every 6.5 msec. and generates 10,000 12-bit samples per second.
All processing is digital -- the only analog component on the board
is the DAC anti-aliasing filter.
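
The figures above imply a few derived rates, worked out in the short C
sketch below.  The constants come straight from the paragraph; the
function and macro names are hypothetical, and the arithmetic assumes
each parameter block is exactly 18 words with no framing overhead.

```c
#include <stdio.h>

#define WORDS_PER_BLOCK  18        /* 16-bit words per parameter block */
#define BITS_PER_WORD    16
#define BLOCK_PERIOD_MS  6.5       /* DSP reads a new block this often */
#define SAMPLE_RATE_HZ   10000.0   /* output samples per second */
#define BITS_PER_SAMPLE  12

/* Parameter blocks consumed by the DSP each second. */
double blocks_per_second(void)
{
    return 1000.0 / BLOCK_PERIOD_MS;
}

/* Bytes per second of parameter data flowing from 68000 to DSP. */
double param_bytes_per_second(void)
{
    return blocks_per_second() * WORDS_PER_BLOCK * BITS_PER_WORD / 8.0;
}

/* Output samples the DSP generates from each parameter block. */
double samples_per_block(void)
{
    return SAMPLE_RATE_HZ * BLOCK_PERIOD_MS / 1000.0;
}

/* Bits per second of sample data delivered to the DAC. */
double sample_bits_per_second(void)
{
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE;
}

/* Print the whole budget in one place. */
void print_budget(void)
{
    printf("blocks/sec:       %.1f\n", blocks_per_second());
    printf("param bytes/sec:  %.0f\n", param_bytes_per_second());
    printf("samples/block:    %.0f\n", samples_per_block());
    printf("sample bits/sec:  %.0f\n", sample_bits_per_second());
}
```

Under those assumptions, about 30 bytes per second of text expand to
roughly 154 parameter blocks (about 5.5K bytes) per second, and each
block drives 65 output samples.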

DECtalk software was written in C -- the board has a home-brew
Unix-flavored real-time operating system.  This allowed us to debug
text-to-speech modules on a timesharing system.  (The non-real-time
components run on VMS, Unix, RSTS, RT11, and TOPS-20.)

If you would rather listen to DECtalk than read about it, feel free to
call (617) 493-7625 (preferably from a Touch-Tone phone).

Your comments and suggestions are most welcome.

Regards.


Martin Minow
decvax!minow		(USENET)
decvax!minow @ Berkeley	(ARPA)