sean@ukma.UUCP (02/10/87)
You know, with a 512k amiga and a low sampling rate, you could probably fit all the digitized phonemes you ever wanted in memory. They'd probably sound a hell of a lot better, too. And you could make them sound like Leonard Nimoy, or perhaps, Mae West :-).

BTW - I finally got my Amiga!!!!!

Sean
--
===========================================================================
Sean Casey
UUCP: cbosgd!ukma!sean
CSNET: sean@ms.uky.csnet
ARPA: ukma!sean@anl-mcs.arpa
BITNET: sean@UKMA.BITNET
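As a rough check on the memory claim above, the arithmetic can be sketched as follows. The 8-bit mono format, the 8000 Hz rate, and the 0.1 second average phoneme length are illustrative assumptions, not figures from the post, and the sketch ignores whatever memory the system itself uses.

```python
# Back-of-envelope estimate of how much sampled audio fits in memory.
# Assumptions (not from the thread): 8-bit mono samples, all 512K free.

BYTES_PER_SAMPLE = 1  # assumption: 8-bit mono samples


def seconds_that_fit(memory_bytes, sample_rate_hz):
    """Seconds of audio that fit in memory_bytes at the given rate."""
    return memory_bytes / (sample_rate_hz * BYTES_PER_SAMPLE)


def phonemes_that_fit(memory_bytes, sample_rate_hz, avg_phoneme_seconds):
    """Whole phoneme recordings that fit, given an assumed average length."""
    bytes_per_phoneme = round(sample_rate_hz * avg_phoneme_seconds) * BYTES_PER_SAMPLE
    return memory_bytes // bytes_per_phoneme


if __name__ == "__main__":
    memory = 512 * 1024
    print(seconds_that_fit(memory, 8000))        # about 65.5 seconds
    print(phonemes_that_fit(memory, 8000, 0.1))  # 655 recordings
```

Under these assumptions a few hundred short recordings fit, which is more than a phoneme inventory needs; whether splicing them sounds good is a separate question.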
flaps@utcsri.UUCP (Alan J Rosenthal) (02/11/87)
In a recent article sean@ukmj.ms.uky.csnet (Sean Casey) writes:
>You know, with a 512k amiga and a low sampling rate, you could probably
>fit all the digitized phonemes you ever wanted in memory. They'd
>probably sound a hell of a lot better, too.

They'd sound terrible. The narrator does a lot of intonation, etc. This is currently based on a model of human speech and could not be adapted directly to work with digitized sounds. Assuming you didn't have this in mind, just cutting up phonemes and pasting them together would sound like those talking clocks that say "The time. is? five! thirty.." except much worse because the oddness would be on a phoneme, and not word, level.

--
Alan J Rosenthal
UUCP: {backbone}!seismo!mnetor!utgpu!flaps, ubc-vision!utai!utgpu!flaps, or utzoo!utgpu!flaps (among other possibilities)
ARPA: flaps@csri.toronto.edu
CSNET: flaps@toronto
BITNET: flaps at utorgpu
lachac@topaz.UUCP (02/12/87)
In article <4111@utcsri.UUCP> flaps@utcsri.UUCP (Alan J Rosenthal) writes:
>
>In a recent article sean@ukmj.ms.uky.csnet (Sean Casey) writes:
>> {about digitized phonemes}
>
>They'd sound terrible. The narrator does a lot of intonation, etc. This
>is currently based on a model of human speech and could not be adapted
>directly to work with digitized sounds. Assuming you didn't have this in
>mind, just cutting up phonemes and pasting them together would sound like those
>talking clocks that say "The time. is? five! thirty.." except much worse
>because the oddness would be on a phoneme, and not word, level.

In that case then, how are the phone companies recorded messages done? ("The number you have reached..")??? Aren't those computer generated?

Also at work we have a Periphonics with a FANTASTIC female voice. I understand that the Amiga would probably be incapable of doing this, (maybe with 8 megs....?) I was wondering if the voice on the Amy could be improved...

--
----------------------------------------------------------------------------
"Isn't fun the best thing to have?"
lachac@topaz.rutgers.edu
cjp@vax135.UUCP (02/12/87)
While this subject is current, I'd like to post a few ramblings. First let me say that I value the accomplishments of the current Narrator, and to some extent appreciate the difficulty of the enhancements I propose here. I would, however, like to see something better.

In playing with Narrator through "say" in phoneme mode (i.e. not invoking Translator), I have found it difficult to achieve a satisfactorily natural-sounding voice. Impossible, really.

One problem is that there is not enough control available. The phoneme syntax gives you only one parameter, called "stress", that you can adjust (ignoring the parameters like "pitch" or "male" which affect the whole utterance). I feel there is a need for control of, say, three factors: volume, duration, and pitch. These things need to be controllable at the resolution *at least* of phonemes.

I argue that even finer control is necessary for fully expressive, natural sounding voice. It is not just Thai that needs the ability to change pitch during a vowel (or other voiced) sound. You need to be able to slide the pitch and volume of the phoneme between two limits from its start to its end. Perhaps you even need to say what the "shape" of that slide is, chosen from a few such as exponential, negative exponential, linear.

Try listening critically to the pitch and speed of someone talking -- using a tape recording (or digitized voice) helps -- and notice how much of a person's attitude and intent are communicated through intonation. I'm sure you've already noticed how little of it comes through in Narrator's voice.

Let me call this hypothetical voice generator the Expressor. Now clearly, this type of voice generator is not meant to be driven by an automatic text translator. There is generally not enough information in text for even humans to derive accurate intentions and attitudes, let alone the problem of generating parameters which re-evoke those attitudes.

But I think there would be many good and impressive uses for "canned" strings of phonemes, generated manually. I estimate that even a fully parameterized, inflected, modulated, and warbled word, expressed as a string of phonemes in Expressor syntax, would require a tiny fraction of the storage of a digitized sound sample saying more or less the same thing. One could store maybe hours of expressive, intelligible talk on a single disk instead of the (I forget the exact time) less than a minute of sampled sound. If done properly, if the parameters are given enough range and resolution (*much* more than 1 to 9), one could even take a good shot at synthesized singing.

Well, enough of me talking through my hat. I certainly don't know how hard it would be to implement. It would be neat though. Comments, especially informed comments, are requested.

Charles Poirier
USENET vax135!cjp
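The pitch or volume "slide" with a selectable shape that the post proposes could look something like the sketch below. This is only one reading of the idea: the function name `slide`, the shape names, and the curvature constant `K` are hypothetical, and nothing here corresponds to an actual Narrator or Expressor interface.

```python
import math

K = 3.0  # hypothetical curvature constant; larger means a more pronounced bend


def slide(start, end, t, shape="linear"):
    """Value of a parameter (pitch, volume) at fraction t of a phoneme.

    t runs from 0.0 (start of the phoneme) to 1.0 (end of the phoneme).
    "exponential" moves slowly at first and quickly at the end;
    "negative_exponential" moves quickly at first and then settles.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    if shape == "linear":
        frac = t
    elif shape == "exponential":
        frac = (math.exp(K * t) - 1.0) / (math.exp(K) - 1.0)
    elif shape == "negative_exponential":
        frac = (1.0 - math.exp(-K * t)) / (1.0 - math.exp(-K))
    else:
        raise ValueError("unknown shape: " + shape)
    return start + (end - start) * frac


if __name__ == "__main__":
    # Pitch rising from 100 Hz to 200 Hz across one phoneme, sampled 5 times.
    for shape in ("linear", "exponential", "negative_exponential"):
        print(shape, [round(slide(100, 200, i / 4, shape), 1) for i in range(5)])
```

All three shapes agree at the endpoints and differ only in how they travel between them, so a phoneme string would need just two limits and a shape name per parameter.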
flaps@utcsri.UUCP (Alan J Rosenthal) (02/13/87)
>>In a recent article sean@ukmj.ms.uky.csnet (Sean Casey) writes:
>>> {about digitized phonemes}

I, flaps@utcsri.UUCP (Alan J Rosenthal), wrote:
>>They'd sound terrible.

lachac@topaz.rutgers.edu (Gerard Lachac) writes:
>In that case then, how are the phone companies recorded messages done?
>("The number you have reached..")??? Aren't those computer generated?
>
>Also at work we have a Periphonics with a FANTASTIC female voice.

Well, I have a different phone company than you do, but I don't think this is really on the same topic. There is no obstacle to computer generated speech based on sampled phonemes, but it would have to be done with features like the attack and decay in the demo called "NewMusic" (the one with the window entitled "Audio Demo"), and not just by pasting recordings next to each other.

In any case, if something that always begins with "The number you have reached" is based on sampled sound, I would assume that they recorded someone saying "The number you have reached" rather than asking them to say each word separately. Does it say a telephone number? Listen to the relative inflexion between digits. It sounds terrible, as I originally said, if it's like the ones here (or like any others I've heard where beauty of speech sound was not a design objective).

And just to reiterate, my original article was a response to an article saying "why not just record each phoneme separately and splice them together? They'd probably sound better than the current narrator" or somesuch.

--
Alan J Rosenthal
UUCP: {backbone}!seismo!mnetor!utgpu!flaps, ubc-vision!utai!utgpu!flaps, or utzoo!utgpu!flaps (among other possibilities)
ARPA: flaps@csri.toronto.edu
CSNET: flaps@toronto
BITNET: flaps at utorgpu
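To make the contrast with plain pasting concrete, here is a minimal sketch of joining two sampled segments with an attack ramp and a decay ramp so that they overlap instead of butting up against each other. It is an assumption about what "attack and decay" might mean for spliced samples, using simple linear ramps; it is not how the NewMusic demo or the Narrator works, and the function names are hypothetical.

```python
def apply_envelope(samples, attack, decay):
    """Scale the first `attack` samples up from 0 and the last `decay` samples down."""
    n = len(samples)
    out = []
    for i, s in enumerate(samples):
        gain = 1.0
        if attack > 0 and i < attack:
            gain = min(gain, i / attack)        # linear fade-in
        if decay > 0 and i >= n - decay:
            gain = min(gain, (n - i) / decay)   # linear fade-out
        out.append(s * gain)
    return out


def splice(a, b, overlap):
    """Join segment a to segment b, crossfading over `overlap` samples."""
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than a segment")
    a = apply_envelope(a, 0, overlap)
    b = apply_envelope(b, overlap, 0)
    head = a[:len(a) - overlap]
    mixed = [x + y for x, y in zip(a[len(a) - overlap:], b[:overlap])]
    return head + mixed + b[overlap:]


if __name__ == "__main__":
    first = [1.0] * 8
    second = [0.5] * 8
    print(splice(first, second, 4))  # level glides from 1.0 down to 0.5
```

A crossfade like this only removes the click at the join. It does nothing about intonation across phonemes, which is the larger objection raised in the post.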
gwe@cbosgd.UUCP (02/13/87)
In article <9146@topaz.RUTGERS.EDU> lachac@topaz.rutgers.edu (Gerard Lachac) writes:
>In article <4111@utcsri.UUCP> flaps@utcsri.UUCP (Alan J Rosenthal) writes:
>>
>>In a recent article sean@ukmj.ms.uky.csnet (Sean Casey) writes:
>>> {about digitized phonemes}
>>talking clocks that say "The time. is? five! thirty.." except much worse
>>because the oddness would be on a phoneme, and not word, level.
>
>
>In that case then, how are the phone companies recorded messages done?
>("The number you have reached..")??? Aren't those computer generated?
>

While I don't work in that area, I do know that the recording you speak of is just that. It plays one section, "The number you have reached...", which was digitized as spoken by somebody, then assembles the digits (each one individually digitized), then plays the trailer, "has been disconnected".

The message isn't constructed from phonemes, or even at the word level. It's basically an answering machine with 4k RAM. :-)

------------------------------clip and save----------------------------------
Bill Thacker cbatt!cbosgd!cbdkc1!serial!wbt
DISCLAIMER: Farg 'em if they can't take a joke !
If you love something, set it free. If it doesn't come back to you, track it down and kill it.
-----------------------------valuable coupon---------------------------------
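The header, digits, trailer scheme described above amounts to building a playlist of prerecorded clips. A minimal sketch follows, with hypothetical clip names and placeholder byte strings standing in for the digitized recordings; it is not a description of any phone company's actual equipment.

```python
# Placeholder "recordings": in a real system each value would be sampled audio.
# All clip names and contents here are hypothetical.
CLIPS = {"header": b"[The number you have reached]",
         "trailer": b"[has been disconnected]"}
CLIPS.update({"digit_" + str(d): ("[" + str(d) + "]").encode() for d in range(10)})


def build_playlist(number):
    """Clip names for one message: header, one clip per digit, trailer."""
    digits = [ch for ch in number if ch.isdigit()]  # skip dashes and spaces
    return ["header"] + ["digit_" + d for d in digits] + ["trailer"]


def render(playlist):
    """Concatenate the recorded clips in playlist order."""
    return b"".join(CLIPS[name] for name in playlist)


if __name__ == "__main__":
    playlist = build_playlist("555-1234")
    print(playlist)
    print(render(playlist).decode())
```

Because each digit clip was recorded in isolation, the inflection never varies with position, which matches the complaint elsewhere in the thread about how spoken telephone numbers sound.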
jimh@hpsadla.UUCP (02/15/87)
Just a quick note on the Narrator...

Has anyone out there ever played with SAM (the Software Automatic Mouth), which was a program for the Apple ][, C-64, and Atari 6502 machines? Am I the only one who recognizes the voice in `SAY'?

Speaking of improved Amiga speech, has anyone else heard or played with the (gasp!) Macintosh's `Smoothtalker'? Do it. You will be pleasantly surprised. Now if a < 8MHz 68K can do that without DMA, why can't that code find its way to the Amiga?

Jim Horn
{The World}!hplabs!hpfcla!hpsrla!hpsadla!jimh