v092pxca@ubvmsb.cc.buffalo.edu (Paul D Fly) (04/16/91)
Does anyone know if anyone is working on building a voice synthesizer? This seems like one of the last frontiers still completely open. Lots of people, both pros and hobbyists, sit at home and compose fully orchestrated pieces on their music systems, but one thing is missing: vocals. I figure a voice synth would have to be a small computerized system/sequencer, independent of MIDI except for syncing. One could specify vowels (sweepable from front to back and from open to close; forgive me, I don't know enough about linguistics to know the proper terms), consonants, resonance, timbre, dynamics, etc. Producing a completely realistic voice synth would be a monster task, but isn't it within the realm of present technology to take the first steps? The best I've seen so far are cheesy "voice-waveform" generators that claim to make vowel sounds but really sound terrible. Sampling is okay, but limited, and of course it requires having someone actually perform what you want in the first place. With the arrival of such technology, we'd be one step closer to fully computerized music production.
bjornl@sics.se (Björn Lisper) (04/17/91)
In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu (Paul D Fly) writes:
>Does anyone know if anyone is working on building a voice synthesizer? This
>seems like one of the last frontiers completely open.
...

A "singing voice" synthesizer has been built as a research project at the Dept. of Music Acoustics (Institutionen for musikakustik) at the Royal Institute of Technology in Stockholm. Some years ago I heard a striking demonstration of it: starting from a sine wave, characteristics of a human voice were successively added until it suddenly sounded like a heroic opera tenor! Rumour has it that they once sent a tape of synthesized song to the entrance test of a school of music, and it passed...

This synthesizer was not capable of forming words, though, only of singing on a single vowel (which could be changed by changing the voice characteristics). The department in question has, however, conducted some quite interesting research on speech synthesis over the years too. I have no references at hand, but I expect their work is well documented in conference proceedings and such.

Bjorn Lisper
ron@vicorp.com (Ron Peterson) (04/19/91)
In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu writes:
>Does anyone know if anyone is working on building a voice synthesizer? This

In the cyberpunk novel "Little Heroes" a device called a VoxBox is used to create the lead and backing vocals for synthetically created music. It is described as requiring a real human voice as input, which is then modified to give it zing. Perhaps there is work of this nature going on somewhere. I know that Laurie Anderson has a device that she uses to make her voice sound male. Anyone know the details of how it works?

I've played with sending a voice through fuzzboxes, flangers, delays, equalizers and even synthesizers like the ARP Avatar (anyone have a PAIA Gnome they want to sell me? Mine got stolen), and while it's fun, it all sounds robotic.

I remember an ad in Electronic Musician that showed a box with hundreds of LEDs on the front and boasted real-time harmonic modification of guitar sounds. It costs more than 10K but might do some interesting things to a voice. Anyone ever try one? I'd be interested in knowing how hard it is to program and what it sounds like.

PAIA sells an affordable vocoder. Perhaps it could be modified to work with the digital recording systems that are appearing (or the samplers that already exist) as an input device for altering vocals. Like this: record a lyric; isolate the first word/sound; smooth the pitch fluctuations by specifying a pitch glide rate limit; using your voice as input, record an amplitude and pitch history; then cut-paste-shift-alter-scale-stretch-impose rules on the history; then apply it to the recorded lyric; then fiddle with it until it sounds good. Sounds tedious, but not much more so than twiddling sampled sounds.

Real-time modification of voice via DSP seems do-able. Just guessing, it seems like it would require recognising/detecting events in song/speech (start, stop, click, resonance shifts, ?) in order to know when to switch in the proper algorithm for modifying that part of the sound (in addition to things like pitch shifting and equalisation). Hook it up to a Max Headroom and then any dweeb (like me 8v) who can program can build themselves a new image and voice! (Pretty hard to take it into the bedroom though... robotics and telepresence?)

ron@vicorp.com or uunet!vicorp!ron
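The "pitch glide rate limit" and the editable pitch history in the recipe above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the post: the pitch history is a list of Hz values, one per analysis frame at a known frame rate, and the glide limit is expressed in semitones per second. The function names (limit_glide, shift_history, stretch_history) are hypothetical; pitch detection itself and re-applying the edited history to the recording are not shown.

```python
import math


def hz_to_semitones(f_hz, ref_hz=440.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_hz(st, ref_hz=440.0):
    """Inverse of hz_to_semitones."""
    return ref_hz * 2.0 ** (st / 12.0)


def limit_glide(pitch_hz, frame_rate, max_semitones_per_sec):
    """Slew-limit a pitch history so it never glides faster than the limit.

    pitch_hz: pitch estimates in Hz (all > 0), one per analysis frame.
    frame_rate: analysis frames per second.
    """
    if not pitch_hz:
        return []
    max_step = max_semitones_per_sec / frame_rate  # allowed change per frame
    smoothed = [hz_to_semitones(pitch_hz[0])]
    for f in pitch_hz[1:]:
        step = hz_to_semitones(f) - smoothed[-1]
        step = max(-max_step, min(max_step, step))  # clamp the glide
        smoothed.append(smoothed[-1] + step)
    return [semitones_to_hz(st) for st in smoothed]


def shift_history(pitch_hz, semitones):
    """Transpose a whole pitch history by a fixed interval."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio for f in pitch_hz]


def stretch_history(values, factor):
    """Time-stretch a pitch or amplitude history by linear interpolation."""
    if not values:
        return []
    n_out = max(1, round(len(values) * factor))
    if n_out == 1 or len(values) == 1:
        return [values[0]] * n_out
    out = []
    for i in range(n_out):
        pos = i * (len(values) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


# An abrupt octave jump becomes a two-frame glide when limited to
# 600 semitones/s at 100 frames/s (6 semitones per frame).
smoothed = limit_glide([220.0, 220.0, 440.0, 440.0], 100, 600)
print([round(f, 2) for f in smoothed])  # [220.0, 220.0, 311.13, 440.0]
```

Working in semitones rather than Hz makes the limit behave the same at any register, which is one reasonable reading of "glide rate"; a limit in Hz per second would also work but would glide audibly faster in low notes.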
melby@daffy.yk.Fujitsu.CO.JP (John B. Melby) (04/21/91)
Speaking of real-time voice processing, voice octaver chips seem to be quite popular over here in Japan. (They are most often used to make a voice unrecognizable, but I think they produce a somewhat more natural sound than the Chipmunks.... :-) )

-----
John B. Melby
Fujitsu Limited, Machida, Japan
melby%yk.fujitsu.co.jp@uunet
ogata@leviathan.cs.umd.edu (Jefferson Ogata) (04/24/91)
In article <1991Apr18.230956.20033@vicorp.com> ron@vicorp.com (Ron Peterson) writes:
|> In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu writes:
|> >Does anyone know if anyone is working on building a voice synthesizer? This
|>
|> In the cyberpunk novel "Little Heroes" a device called a VoxBox is used
|> to create the lead and backing vocals for synthetically created music.
|> It is described as requiring a real human voice as input which is then
|> modified to give it zing. Perhaps there is work of this nature going
|> on somewhere. I know that Laurie Anderson has a device that she uses
|> to make her voice sound male. Anyone know the details of how it works?

I believe that this is just a pitch transposer coupled with a slightly modified vocal inflection. I know that I've gotten similar effects messing with pitch transposers, although my voice *already* sounds male, so...

--
Jefferson Ogata   ogata@cs.umd.edu
University of Maryland Department of Computer Science
"Sure. Understanding today's complex world of the future *is* a little like having bees live in your head."
ron@vicorp.com (Ron Peterson) (04/26/91)
In article <33454@mimsy.umd.edu> ogata@leviathan.cs.umd.edu (Jefferson Ogata) writes:
>In article <1991Apr18.230956.20033@vicorp.com> ron@vicorp.com (Ron Peterson) writes:
>|> In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu writes:
>|> >Does anyone know if anyone is working on building a voice synthesizer? This
>|> In the cyberpunk novel "Little Heroes" a device called a VoxBox is used
>|> to create the lead and backing vocals for synthetically created music.
>|> It is described as requiring a real human voice as input which is then
>|> modified to give it zing. Perhaps there is work of this nature going
>|> on somewhere. I know that Laurie Anderson has a device that she uses
>|> to make her voice sound male. Anyone know the details of how it works?
>
>I believe that this is just a pitch transposer coupled with a slightly
>modified vocal inflection. I know that I've gotten similar effects
>messing with pitch transposers, although my voice *already* sounds
>male, so...

How do you transpose a voice in pitch without losing its natural sound? I've heard of pitch transposers that convert the input signal to a square wave and then multiply or divide it to get a fundamental pitch that is an octave higher or lower, but this destroys all of the information contained in the shape of the waves. Is there another way to do it? And how do you get sub-octave transposition?

ron@vicorp.com or uunet!vicorp!ron
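The square-wave approach Ron describes can be sketched as a comparator followed by a toggle flip-flop. This is a hypothetical illustration, not the circuit of any particular product: the signal is hard-limited at zero, and the flip-flop changes state on every rising edge, which halves the frequency (one octave down). Only the divide-down case is shown. The output only ever takes the values +1 and -1, which is exactly why the shape of the original waveform is lost.

```python
import math


def to_square(signal, threshold=0.0):
    """Comparator / fuzz: hard-limit the input to +1 or -1."""
    return [1.0 if x > threshold else -1.0 for x in signal]


def divide_by_two(square):
    """Toggle flip-flop clocked on rising edges: one octave down."""
    out = []
    state = -1.0
    prev = square[0] if square else -1.0
    for x in square:
        if prev < 0 < x:  # rising edge of the input square wave
            state = -state
        out.append(state)
        prev = x
    return out


def rising_edges(square):
    """Count -1 -> +1 transitions (one per cycle of a square wave)."""
    return sum(1 for a, b in zip(square, square[1:]) if a < 0 < b)


# A 100 Hz sine sampled at 8 kHz for 0.1 s: ten cycles in, five cycles out.
sr, f = 8000, 100.0
sine = [math.sin(2 * math.pi * f * n / sr) for n in range(800)]
sq = to_square(sine)
low = divide_by_two(sq)
print(rising_edges(sq), rising_edges(low))  # 10 5
```

A divider like this can only produce octave (or other integer-ratio) steps below the input, so it offers no route to intervals smaller than an octave.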
ogata@leviathan.cs.umd.edu (Jefferson Ogata) (04/27/91)
I wrote:
|> I believe that this is just a pitch transposer coupled with a slightly
|> modified vocal inflection. I know that I've gotten similar effects
|> messing with pitch transposers, although my voice *already* sounds
|> male, so...

In article <1991Apr25.210916.348@vicorp.com> ron@sunspark.UUCP (Ron Peterson) writes:
|> How do you transpose a voice in pitch without losing its natural sound?
|> I've heard of pitch transposers that convert the input signal to a
|> square wave and then multiply or divide it to get a fundamental pitch
|> that is an octave higher or lower, but this destroys all of the
|> information contained in the shape of the waves. Is there another
|> way to do it? And how do you get sub-octave transposition?
|> ron@vicorp.com or uunet!vicorp!ron

Here is a primitive description. Actual algorithms are more refined, especially in what data they decide to throw away.

Measure the frequency of the input (using zero crossings, for example). Digitize the input. Then:

Down an octave: Save every other input wave and throw the other one away. For each output wave period (twice the input period), play back your sampled input so that it takes twice as long. For example, for each sample of the input, output that sample twice. Or, for better results, interpolate each sample point with the following one to get your extra point.

Up an octave: Throw away every other sample point. For each input wave period (twice the output period), output the complete shortened sample stream twice.

For other intervals of transposition, you have to throw away/duplicate different amounts of information.

Now, regular pitch transposers don't really make a "natural" sounding voice, partly because the algorithm isn't so great (especially the frequency tracking, which is thrown off by the noise in sibilants) and partly because they transpose the sibilant noise too. Sibilant noise should stay in the same frequency range no matter what the pitch of the voice is: an S should sound the same whether I am singing high or low.
Other fricative noise has the same problem, but it really comes out in S, SH, TH, F, etc. I correct for this by adjusting my pronunciation of the sibilants. If I am transposing up, I sing an S as an SH, so it comes out sounding like an S. Transposing down, I do the opposite. It is very difficult to get a really natural sounding voice, but changing your sibilants makes a big difference.

For a good example, listen to the Chipmunks on the Saturday morning cartoons. These are voices transposed straight up with no sibilant adjustment. The further the transposition, the worse the sibilants are distorted. This is why pitch-riders don't screw up the voice; they typically transpose by less than a semitone, which is pretty much okay.

The pitch transposition machine I usually use (Digitech IPS-33) doesn't track frequency extremely fast, so it can avoid jumping all over the place during a sibilant. This is a big tradeoff: if the machine tracks pitch too quickly, it's wrong most of the time in a word like "fist", where the noise has no definite frequency. But if it doesn't track fast enough, there will be audible delay during melodic lines. The Digitech is set to a fairly reasonable tracking rate. I think there is a PLL tied to the input with a limit on slew rate, and the processor measures the frequency of the PLL rather than trying to decompose the input. I'm not sure about this, though; it just seems like the right way to do it.

Hope this helps.

--
Jefferson Ogata   ogata@cs.umd.edu
University of Maryland Department of Computer Science
"Sure. Understanding today's complex world of the future *is* a little like having bees live in your head."
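Jefferson's "primitive description" can be turned into a rough Python sketch. It assumes a steady, already digitized periodic input whose period is a whole number of samples, and it chops the signal into periods starting at sample 0 rather than at each detected zero crossing; a real transposer would track a changing pitch and splice far more carefully. Period estimation uses the zero-crossing method he mentions, and the octave-down path uses his linear-interpolation variant. All function names are hypothetical.

```python
import math


def zero_crossing_period(x):
    """Estimate the period in samples from upward zero crossings."""
    ups = [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]
    if len(ups) < 2:
        raise ValueError("need at least two upward zero crossings")
    return round((ups[-1] - ups[0]) / (len(ups) - 1))


def split_periods(x, period):
    """Chop the signal into whole periods, dropping any leftover tail."""
    return [x[i:i + period] for i in range(0, len(x) - period + 1, period)]


def stretch2(wave):
    """Play one period at half speed: each sample plus an interpolated point."""
    out = []
    for i, s in enumerate(wave):
        nxt = wave[(i + 1) % len(wave)]  # assume periodicity: wrap to the start
        out.append(s)
        out.append((s + nxt) / 2.0)
    return out


def octave_down(x, period):
    """Keep every other input wave and play each kept wave twice as long."""
    out = []
    for k, wave in enumerate(split_periods(x, period)):
        if k % 2 == 0:
            out.extend(stretch2(wave))
    return out


def octave_up(x, period):
    """Drop every other sample point and play the shortened wave twice."""
    out = []
    for wave in split_periods(x, period):
        half = wave[::2]
        out.extend(half + half)
    return out


sr, f = 8000, 100.0
x = [math.sin(2 * math.pi * f * n / sr) for n in range(800)]
period = zero_crossing_period(x)  # 80 samples for 100 Hz at 8 kHz
down = octave_down(x, period)
up = octave_up(x, period)
print(period, zero_crossing_period(down), zero_crossing_period(up))  # 80 160 40
```

With the 100 Hz test tone, both outputs keep the input's length (800 samples) while the period doubles or halves. Note that nothing here separates sibilant noise from voiced sound, so this sketch would exhibit exactly the Chipmunk-style sibilant distortion described above.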
ctdonath@rodan.acs.syr.edu (Carl T. Donath) (04/30/91)
Has anyone tried synthesizing phonemes on a music synth, esp. a TG77? I'd like to create a synthetic chorus - clarity is not a major concern, as long as phonemes can be built into roughly understandable words.

 - Carl
--
\-\-\  ctdonath@rodan.acs.syr.edu    Carl T Donath
/-/-/ ----------------------------------------------------------------
/-/-/  In most rationalized situations, logic need not apply.
\-\-\
scott@bbxsda.UUCP (Scott Amspoker) (04/30/91)
In article <1991Apr29.202550.12985@rodan.acs.syr.edu> ctdonath@rodan.acs.syr.edu (Carl T. Donath) writes:
>Has anyone tried synthesizing phonemes on a music synth, esp. a TG77?
>I'd like to create a synthetic chorus - clarity is not a major concern,
>as long as phonemes can be built into roughly understandable words.

There are patches for the DX7 (and probably the TG77) that attempt to sound like human voices. Of course, they are rather crude, but they do have a strange "vocal" quality to them. I examined some of these patches once to see what they were doing. They were composed of the usual FM operators creating a "generic" pitched tone, plus at least one operator fixed at a carefully chosen frequency, which seemed to do the trick.

--
Scott Amspoker                       | Touch the peripheral convex of every
Basis International, Albuquerque, NM | kind, then various kinds of blaming
(505) 345-5232                       | sound can be sent forth.
unmvax.cs.unm.edu!bbx!bbxsda!scott   | - Instructions for a little box that
                                     |   blurts out obscenities.
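Here is a rough sketch of the kind of patch Scott describes, assuming a three-operator layout: a modulator and a carrier that follow the played note, plus a third carrier in fixed-frequency mode (the DX7 lets each operator run in either ratio or fixed mode). The 700 Hz fixed frequency, the modulation index, the mix level, and routing the pitched modulator into the fixed operator are all illustrative guesses, not values taken from any actual patch. The idea behind a fixed operator is that a vowel's resonances stay roughly in place while the sung pitch moves.

```python
import math


def fm_vocal_tone(note_hz, dur_s=0.5, sr=22050, fixed_hz=700.0,
                  mod_index=1.5, fixed_level=0.5):
    """Hypothetical three-operator 'vocal' FM patch (illustrative values).

    Written in the phase-modulation form commonly used for DX-style FM.
    op 1: modulator at the note pitch (ratio 1)
    op 2: carrier at the note pitch (ratio 1), giving the "generic" tone
    op 3: carrier in fixed-frequency mode at fixed_hz, also modulated by
          op 1, so its sidebands sit at fixed_hz +/- k * note_hz
    """
    n = int(dur_s * sr)
    out = []
    for i in range(n):
        t = i / sr
        mod = mod_index * math.sin(2 * math.pi * note_hz * t)   # op 1
        pitched = math.sin(2 * math.pi * note_hz * t + mod)     # op 2
        fixed = math.sin(2 * math.pi * fixed_hz * t + mod)      # op 3
        # Normalise the mix so the output stays within [-1, 1].
        out.append((pitched + fixed_level * fixed) / (1.0 + fixed_level))
    return out


low_note = fm_vocal_tone(110.0)   # A2
high_note = fm_vocal_tone(220.0)  # A3; op 3 still centred on 700 Hz
```

Because op 3 ignores the keyboard, its energy stays centred on the same frequency for every note, which is presumably what gives such patches their vaguely vowel-like colour; a single fixed operator is still a long way from real formants, which matches Scott's "rather crude".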
rivero@dev8a.mdcbbs.com (05/01/91)
In article <1991Apr29.202550.12985@rodan.acs.syr.edu>, ctdonath@rodan.acs.syr.edu (Carl T. Donath) writes:
> Has anyone tried synthesizing phonemes on a music synth, esp. a TG77?
> I'd like to create a synthetic chorus - clarity is not a major concern,
> as long as phonemes can be built into roughly understandable words.
>
> - Carl

Phoneme generation requires a lot of slightly different sounds, which would quickly use up all your banks. More than that, it requires smooth transitions from phoneme to phoneme, something still lacking in all but the most expensive and experimental hardware.

For the poor man's electronic chorus, create a good "ahhhh" sound, then build the following. Mount a speaker onto a board centered in an airtight box. The speaker mounting board should have a hole about 1/2 inch in diameter drilled in it, directly above the speaker cone. A 6-foot plastic hose, 1/2 inch in diameter, is attached to this hole and leaves the box through another 1/2 inch hole.

Play your "ahhh" sound through the speaker inside the box. Hold the end of the plastic hose in your teeth (to one side of your mouth) and "mouth" the phoneme sounds, re-recording the new sounds with an open-air microphone. For about $40.00 in parts, you will get a good "vocal" track.

Hope this helps.
Michael
mike@ymt.com (Michael Czeiszperger) (05/02/91)
I remember hearing phonemes created with FM. They were the usual vowel simulations: recognizable, but not particularly musical. Unfortunately, I can't remember where I heard them.

--
Michael Czeiszperger | "I'm trying to teach a caveman to play scrabble
mike@ymt.com         |  but the only word he knows is 'uugh', and he
Greenbrae, CA        |  doesn't know how to spell it!"