[comp.sys.amiga] more on phonemes and talking machines...

czhj@vax1.UUCP (02/14/87)

The problem with using a simple phoneme splicing algorithm is that words are
not spoken that way by people in the real world.  The phone company uses
spliced words for directory assistance - there you can say 3 - 1 - 4 - ...
and get reasonable-sounding voices.  However, if you really want to accurately
simulate a human voice, you have to do some funky merging of phonemes.  We start
preparing to utter a phoneme well before it is actually spoken.  Your tongue
glides from one phoneme to the next while uttering the phonemes themselves.
Try saying "hello".  Now say "howdy".  Notice that your tongue position on the
'h' in "hello" differs from that of "howdy".  You anticipate the following
vowel.  Then, the e -> l transition is made starting at the beginning of the
utterance of 'e'.  Any voice production system that is going to sound realistic
is going to have to base phoneme stress patterns on the rest of the word, not
just on the individual phoneme.
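
To make the hello/howdy point concrete, here is a rough sketch in Python.  It
is not how any real synthesizer works; the phoneme spellings and the function
names (spliced_units, context_units) are made up for illustration.  It only
shows the bookkeeping: simple splicing picks the same 'h' for both words,
while keying each phoneme by its neighbours gives you a different 'h' unit
depending on the vowel that follows.

```python
# Hypothetical phoneme spellings; not any real synthesizer's inventory.
HELLO = ["h", "eh", "l", "ow"]
HOWDY = ["h", "aw", "d", "iy"]


def spliced_units(phonemes):
    """Naive splicing: one fixed unit per phoneme, context ignored."""
    return list(phonemes)


def context_units(phonemes):
    """Key each phoneme by (previous, current, next) so that the chosen
    unit can reflect anticipation of what follows."""
    padded = ["#"] + list(phonemes) + ["#"]  # '#' marks silence at word edges
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    # Same 'h' unit for both words under simple splicing.
    print(spliced_units(HELLO)[0], spliced_units(HOWDY)[0])
    # Different 'h' units once the following vowel is part of the key.
    print(context_units(HELLO)[0], context_units(HOWDY)[0])
```

A lookup keyed this way still says nothing about how to actually blend the
sounds; it only picks which recording or parameter set to start from.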

In addition to these basic pronunciation mechanisms, you have to take context,
intonation, etc. into account when saying things.  Hence vocalization becomes
a much harder problem to solve well (though not an impossible one).

---Ted Inoue