[comp.music] voice synthesizer

v092pxca@ubvmsb.cc.buffalo.edu (Paul D Fly) (04/16/91)

Does anyone know if anyone is working on building a voice synthesizer?  This
seems like one of the last frontiers completely open.  Lots of people, both
pros and hobbyists, sit at home and compose fully orchestrated stuff on their
music systems, but one thing is missing: vocals.
	I figure a voice synth would have to be a little computerized
system/sequencer, independent of MIDI except for syncing.  One could specify
vowels (sweepable from frontal to rear, and open to close; forgive me, I don't
know enough about linguistics to know the proper terms...), consonants,
resonance, timbre, dynamics, etc.
	To produce a completely realistic voice synth would be a monster task,
but isn't it within the realm of present technology to make the first steps? 
The best I've seen so far are cheesy "voice-waveform" producers that claim to
make vowel sounds, but really sound terrible.  Sampling is okay, but limited,
and, of course, requires having someone actually do what you want for real in
the first place.
	With the coming of such technology, we'd be one more step closer to
fully computerized music production.

bjornl@sics.se (Björn Lisper) (04/17/91)

In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu
(Paul D Fly) writes:
>Does anyone know if anyone is working on building a voice synthesizer?  This
>seems like one of the last frontiers completely open. ...

A "singing voice" synthesizer has been built as a research project at the
Dept. of Music Acoustics (Institutionen for musikakustik) at the Royal
Institute of Technology in Stockholm. Some years ago I heard a striking
demonstration of it: starting from a sine wave, characteristics of a human
voice were successively added until it suddenly sounded like a heroic opera
tenor! Rumour has it that they once sent a tape of synthesized song to the
entrance test of a school of music, and it passed...

This synthesizer was not capable of forming words, though, just to sing with
a certain vowel (which could be changed by changing the voice
characteristics). The department in question has, however, over the years
conducted some quite interesting research on speech synthesis too. I have no
references at hand, but I expect their work to be well documented at
conferences and such.

Bjorn Lisper

ron@vicorp.com (Ron Peterson) (04/19/91)

In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu writes:
>Does anyone know if anyone is working on building a voice synthesizer?  This

In the cyberpunk novel "Little Heroes" a device called a VoxBox is used
to create the lead and backing vocals for synthetically created music.
It is described as requiring a real human voice as input which is then
modified to give it zing.  Perhaps there is work of this nature going
on somewhere.  I know that Laurie Anderson has a device that she uses
to make her voice sound male.  Anyone know the details of how it works?

I've played with sending a voice through fuzzboxes, flangers, delays,
equalizers and even synthesizers (anyone have a PAIA Gnome they want to
sell me? Mine got stolen) like the ARP Avatar and while fun, it all
sounds robotic.  A memory of an ad in Electronic Musician comes to my
mind--the one that showed a box with hundreds of LEDs on the front
and boasted real-time harmonic modification of guitar sounds.  Costs
more than 10K but might do some interesting things to a voice.  Anyone
ever try one?  I'd be interested in knowing how hard it is to program
and what it sounds like.
PAIA sells an affordable vocoder.  Perhaps this could be modified to
work with the digital recording systems that are appearing (or the
samplers that already exist) to function as an input device for 
altering vocals.  Like this: Record a lyric, isolate the first
word/sound, smooth the pitch fluctuations by specifying a pitch
glide rate limit, using your voice as input record an amplitude and
pitch history, then cut-paste-shift-alter-scale-stretch-impose rules
on the history, then apply it to the recorded lyric, then fiddle with it
to get it good.  Sounds tedious but not much more so than twiddling
sampled sounds.
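
A rough sketch of the "record an amplitude history, then apply it to the recorded lyric" step, in Python. Everything here is an assumption made for illustration: the function names, the frame length, and the two test tones that stand in for the guide voice and the recorded lyric. Pitch history would need a separate tracker and is left out.

```python
import math

def amplitude_history(signal, frame_len):
    """Return one RMS amplitude value per non-overlapping frame."""
    history = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        history.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return history

def impose_history(signal, history, frame_len):
    """Rescale each frame of `signal` so its RMS matches the history value."""
    out = list(signal)
    current = amplitude_history(signal, frame_len)
    for i, (have, want) in enumerate(zip(current, history)):
        gain = want / have if have > 0 else 0.0
        for n in range(i * frame_len, (i + 1) * frame_len):
            out[n] = signal[n] * gain
    return out

RATE = 8000    # samples per second (assumed)
FRAME = 200    # samples per frame (assumed)

# Guide: a tone that fades out. Lyric: a steady tone standing in for the recording.
guide = [(1.0 - n / 1600) * math.sin(2 * math.pi * 200 * n / RATE) for n in range(1600)]
lyric = [0.5 * math.sin(2 * math.pi * 300 * n / RATE) for n in range(1600)]

history = amplitude_history(guide, FRAME)
shaped = impose_history(lyric, history, FRAME)
```

The "cut-paste-shift-alter-scale-stretch" editing would then operate on the `history` list before it is imposed. Per-frame gain changes like this click at frame edges, so a real tool would presumably interpolate the gain between frames.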

Seems like real-time modification of voice via DSP is do-able.  Just
guessing, it seems like it would require recognising/detecting events
in song/speech (start, stop, click, resonance shifts, ?) in order to
know when to switch in the proper algorithm for modifying that part of
the sound (in addition to things like pitch shifting and equalisation.)
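
One crude way to detect such events is to label each short frame as pitched or noisy by its zero-crossing rate. This is only a sketch under assumptions: the threshold, the frame length, and the synthetic "vowel" and "hiss" test signals are all made up here, and real speech would need something more robust.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def label_frames(signal, frame_len, threshold=0.25):
    """Label each non-overlapping frame 'noisy' or 'pitched'."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        zcr = zero_crossing_rate(signal[start:start + frame_len])
        labels.append("noisy" if zcr > threshold else "pitched")
    return labels

RATE = 8000                      # samples per second (assumed)
rng = random.Random(0)           # fixed seed so the run is repeatable

vowel = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(800)]
hiss = [rng.uniform(-1.0, 1.0) for _ in range(800)]

labels = label_frames(vowel + hiss, 400)
print(labels)
```

A processor could use labels like these to decide, frame by frame, whether to apply pitch shifting or to pass the sound through untouched.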

Hook it up to a Max Headroom and then any dweeb (like me 8v) who can
program can build themselves a new image and voice!  (Pretty hard to
take it into the bedroom though...robotics and telepresence?)
ron@vicorp.com or uunet!vicorp!ron

melby@daffy.yk.Fujitsu.CO.JP (John B. Melby) (04/21/91)

Speaking of real-time voice processing, voice octaver chips seem to be quite
popular over here in Japan.  (They are most often used to make a voice
unrecognizable, but I think they produce a somewhat more natural sound than
the Chipmunks.... :-) )

-----
John B. Melby
Fujitsu Limited, Machida, Japan
melby%yk.fujitsu.co.jp@uunet

ogata@leviathan.cs.umd.edu (Jefferson Ogata) (04/24/91)

In article <1991Apr18.230956.20033@vicorp.com> ron@vicorp.com (Ron Peterson) writes:
|> In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu writes:
|> >Does anyone know if anyone is working on building a voice synthesizer?  This
|> 
|> In the cyberpunk novel "Little Heroes" a device called a VoxBox is used
|> to create the lead and backing vocals for synthetically created music.
|> It is described as requiring a real human voice as input which is then
|> modified to give it zing.  Perhaps there is work of this nature going
|> on somewhere.  I know that Laurie Anderson has a device that she uses
|> to make her voice sound male.  Anyone know the details of how it works?

I believe that this is just a pitch transposer coupled with a slightly
modified vocal inflection. I know that I've gotten similar effects
messing with pitch transposers, although my voice *already* sounds
male, so...

--
Jefferson Ogata                 ogata@cs.umd.edu
University of Maryland          Department of Computer Science
   "Sure. Understanding today's complex world of the future *is*
          a little like having bees live in your head."

ron@vicorp.com (Ron Peterson) (04/26/91)

In article <33454@mimsy.umd.edu> ogata@leviathan.cs.umd.edu (Jefferson Ogata) writes:
>In article <1991Apr18.230956.20033@vicorp.com> ron@vicorp.com (Ron Peterson) writes:
>|> In article <71181@eerie.acsu.Buffalo.EDU> v092pxca@ubvmsb.cc.buffalo.edu writes:
>|> >Does anyone know if anyone is working on building a voice synthesizer?  This
>|> In the cyberpunk novel "Little Heroes" a device called a VoxBox is used
>|> to create the lead and backing vocals for synthetically created music.
>|> It is described as requiring a real human voice as input which is then
>|> modified to give it zing.  Perhaps there is work of this nature going
>|> on somewhere.  I know that Laurie Anderson has a device that she uses
>|> to make her voice sound male.  Anyone know the details of how it works?
>
>I believe that this is just a pitch transposer coupled with a slightly
>modified vocal inflection. I know that I've gotten similar effects
>messing with pitch transposers, although my voice *already* sounds
>male, so...

How do you transpose a voice in pitch without losing its natural sound?
I've heard of pitch transposers that convert the input signal to a
square wave and then multiply or divide it to get a fundamental pitch
that is an octave higher or lower, but this destroys all of the 
information contained in the shape of the waves.  Is there another
way to do it?  And how do you get sub-octave transposition? 
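
To make the loss of wave-shape information concrete, here is a small Python sketch of the square-wave-and-divide approach. The function names and the test signals are assumptions for illustration; it models a divide-by-two flip-flop clocked by the input's upward zero crossings.

```python
import math

def square_and_divide(signal):
    """Square up the input, then toggle the output on each upward zero
    crossing, giving a square wave one octave below the input."""
    out = []
    state = 1.0
    prev_positive = signal[0] >= 0
    for x in signal:
        positive = x >= 0
        if positive and not prev_positive:   # upward zero crossing
            state = -state
        prev_positive = positive
        out.append(state)
    return out

def count_rising_edges(signal):
    """Count negative-to-non-negative transitions."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)

RATE = 8000   # samples per second (assumed)

def angle(n):
    # 100 Hz fundamental; the small phase offset keeps samples off exact zero
    return 2 * math.pi * 100 * n / RATE + 0.1

plain = [math.sin(angle(n)) for n in range(800)]
rich = [math.sin(angle(n)) + 0.3 * math.sin(2 * angle(n)) for n in range(800)]

low_plain = square_and_divide(plain)
low_rich = square_and_divide(rich)
```

The two inputs have different timbres but identical zero crossings, so the two outputs come out sample-for-sample identical: only the crossing times survive.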
ron@vicorp.com or uunet!vicorp!ron

ogata@leviathan.cs.umd.edu (Jefferson Ogata) (04/27/91)

I wrote:
|> I believe that this is just a pitch transposer coupled with a slightly
|> modified vocal inflection. I know that I've gotten similar effects
|> messing with pitch transposers, although my voice *already* sounds
|> male, so...

In article <1991Apr25.210916.348@vicorp.com> ron@sunspark.UUCP (Ron Peterson) writes:
|> How do you transpose a voice in pitch without losing its natural sound?
|> I've heard of pitch transposers that convert the input signal to a
|> square wave and then multiply or divide it to get a fundamental pitch
|> that is an octave higher or lower, but this destroys all of the 
|> information contained in the shape of the waves.  Is there another
|> way to do it?  And how do you get sub-octave transposition? 
|> ron@vicorp.com or uunet!vicorp!ron

Here is a primitive description. Actual algorithms are more refined,
especially in what data they decide to throw away.

Measure the frequency of the input (using zero-crossings, for example).
Digitize the input. Then:

Down an octave:
   Save every other input wave. Throw the other one away.
   For each output wave period (twice the input period), output your
sampled input so it takes twice as long. For example, for each sample
of the input, output that sample twice. Or for better results, interpolate
each sample point with the following one to get your extra point.

Up an octave:
   Throw away every other sample point.
   For each input wave period (twice the output period), output the
complete thinned sample stream twice.

For other intervals of transposition, you have to throw away/duplicate
different amounts of information.
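
The two octave cases above can be sketched in Python as follows. This is a minimal illustration, not anyone's actual product algorithm: it assumes the period has already been measured, is constant, and is an even whole number of samples, and the function names are made up here.

```python
import math

def octave_down(signal, period):
    """Keep every other input wave and play it back at twice the length."""
    out = []
    for start in range(0, len(signal) - 2 * period + 1, 2 * period):
        wave = signal[start:start + period]      # keep this wave
        # the following wave (the next `period` samples) is thrown away
        for i, x in enumerate(wave):
            nxt = wave[(i + 1) % period]         # wrap around to the wave's start
            out.append(x)
            out.append(0.5 * (x + nxt))          # interpolated extra point
    return out

def octave_up(signal, period):
    """Drop every other sample of each wave and play the short wave twice."""
    out = []
    for start in range(0, len(signal) - period + 1, period):
        half = signal[start:start + period:2]    # every other sample point
        out.extend(half)
        out.extend(half)
    return out

def count_rising_edges(signal):
    """Count negative-to-non-negative transitions (one per cycle here)."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)

PERIOD = 40   # samples per input cycle (assumed known)
tone = [math.sin(2 * math.pi * n / PERIOD - 0.1) for n in range(800)]

lower = octave_down(tone, PERIOD)
higher = octave_up(tone, PERIOD)
```

Both outputs have the same length as the input, so the duration is preserved while the number of cycles halves or doubles. For an interval of n equal-tempered semitones the resampling ratio within each wave would be 2^(n/12) instead of 2, with waves dropped or repeated as needed to keep the duration.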

Now regular pitch transposers don't really make a "natural" sounding
voice, because the algorithm isn't so great (especially the frequency
tracking, because of noise from sibilants) and also because they
transpose sibilant noise. Sibilant noise should be at the same
frequency no matter what the pitch of the voice is. An S should
sound the same whether I am singing high or low. Other aspirant
noise has the same problem, but it really comes out in S, SH, TH,
F, etc. I correct for this by adjusting my pronunciation of the
sibilants. If I am transposing up, I sing an S as a SH, so it
comes out sounding like S. Transposing down I do the opposite. It
is very difficult to get a really natural sounding voice, but
changing your sibilants makes a big difference. For a good example,
listen to the Chipmunks on Saturday morning cartoons. These are
voices transposed straight up with no sibilant adjustment.

The further the transposition, the worse the sibilants are
distorted. This is why pitch-riders don't screw up the voice;
they are typically transposing less than a semitone, which is
pretty much okay.
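
Some quick arithmetic shows why small shifts are tolerable. The 6000 Hz figure below is an assumed, purely illustrative centre frequency for sibilant noise; the ratio per semitone is the usual equal-tempered 2^(1/12).

```python
def transposed_frequency(freq_hz, semitones):
    """Frequency after shifting by a number of equal-tempered semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)

SIBILANT_HZ = 6000.0   # assumed value, for illustration only

for semitones in (0.5, 1, 7, 12):
    shifted = transposed_frequency(SIBILANT_HZ, semitones)
    change = 100.0 * (shifted / SIBILANT_HZ - 1.0)
    print(f"{semitones:>4} semitones: {shifted:8.1f} Hz ({change:+.1f}%)")
```

A shift of one semitone moves the noise by a little under 6 percent, while a full octave doubles its frequency.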

The pitch transposition machine I usually use (Digitech IPS-33)
doesn't guess frequency extremely fast so it can avoid tracking
all over the place during a sibilant. This is a big tradeoff: if
the machine tracks pitch too quickly, it's wrong most of the time
in a word like "fist", where the noise has indeterminate frequency.
But if it doesn't track fast enough there will be audible delay
during melodic lines. The Digitech is set to a fairly reasonable
tracking rate. I think there is a PLL tied to the input with a
limit on slew rate, and the processor measures the frequency of the
PLL rather than trying to decompose the input.  I'm not sure about
this, though; it just seems like the right way to do it.
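
The tracking-rate tradeoff can be illustrated with a slew-limited follower in Python. This is not a model of the Digitech's internals, which are only guessed at above; the per-frame frequency readings and the step limits are invented for the example.

```python
def slew_limited(estimates, max_step_hz):
    """Follow raw frequency estimates, moving at most max_step_hz per frame."""
    tracked = [estimates[0]]
    for raw in estimates[1:]:
        step = max(-max_step_hz, min(max_step_hz, raw - tracked[-1]))
        tracked.append(tracked[-1] + step)
    return tracked

# Hypothetical raw readings per frame: a held note at 220 Hz, a sibilant
# that produces wild readings, then a new note at 330 Hz.
raw = [220.0] * 5 + [3100.0, 900.0, 4200.0, 1500.0] + [330.0] * 12

fast = slew_limited(raw, max_step_hz=5000.0)   # effectively unlimited
slow = slew_limited(raw, max_step_hz=10.0)     # tightly limited

print("worst reading during sibilant, fast:", max(fast[5:9]))
print("worst reading during sibilant, slow:", max(slow[5:9]))
print("frames to settle on the new note, fast:", fast.index(330.0) - 9 + 1)
print("frames to settle on the new note, slow:", slow.index(330.0) - 9 + 1)
```

The fast tracker follows the sibilant garbage exactly but lands on the new note at once; the slow tracker barely moves during the sibilant but takes several frames to reach the new note, which is the audible delay mentioned above.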

Hope this helps.

--
Jefferson Ogata                 ogata@cs.umd.edu
University of Maryland          Department of Computer Science
   "Sure. Understanding today's complex world of the future *is*
          a little like having bees live in your head."

ctdonath@rodan.acs.syr.edu (Carl T. Donath) (04/30/91)

Has anyone tried synthesizing phonemes on a music synth, esp. a TG77?
I'd like to create a synthetic chorus - clarity is not a major concern, 
as long as phonemes can be built into roughly understandable words.

- Carl

-- 
\-\-\ ctdonath@rodan.acs.syr.edu   Carl T Donath             /-/-/
 ----------------------------------------------------------------
/-/-/ In most rationalized situations, logic need not apply. \-\-\

scott@bbxsda.UUCP (Scott Amspoker) (04/30/91)

In article <1991Apr29.202550.12985@rodan.acs.syr.edu> ctdonath@rodan.acs.syr.edu (Carl T. Donath) writes:
>Has anyone tried synthesizing phonemes on a music synth, esp. a TG77?
>I'd like to create a synthetic chorus - clarity is not a major concern, 
>as long as phonemes can be built into roughly understandable words.

There are patches for the DX7 (and probably the TG77) that attempt to
sound like human voices.  Of course, they are rather crude but they
do have a strange "vocal" quality to them.  I examined some of these
patches once to see what they were doing.  They were composed of the usual
FM operators that created a "generic", pitched tone plus at least
one operator that was fixed at a carefully chosen frequency which seemed
to do the trick.
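
A very rough Python sketch of that idea: one pitched FM pair that follows the note, plus one operator parked at a fixed frequency. The 800 Hz value, the modulation index, and the mix level are assumptions chosen for illustration, not values taken from any actual DX7 patch.

```python
import math

RATE = 8000         # samples per second (assumed)
FIXED_HZ = 800.0    # assumed value; the patches' actual frequency is not given

def fm_voice(note_hz, seconds=1.0, index=2.0, fixed_level=0.3):
    """Pitched FM tone plus a fixed-frequency operator mixed in."""
    out = []
    for n in range(int(RATE * seconds)):
        t = n / RATE
        modulator = math.sin(2 * math.pi * note_hz * t)
        pitched = math.sin(2 * math.pi * note_hz * t + index * modulator)
        fixed = math.sin(2 * math.pi * FIXED_HZ * t)
        out.append(pitched + fixed_level * fixed)
    return out

def level_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / RATE) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / RATE) for n, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)

low_note = fm_voice(220.0)
high_note = fm_voice(330.0)

print(level_at(low_note, FIXED_HZ), level_at(high_note, FIXED_HZ))
```

Whatever note is played, the component at the fixed frequency stays put, much as a vocal resonance stays put while the sung pitch changes.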

-- 
Scott Amspoker                       | Touch the peripheral convex of every
Basis International, Albuquerque, NM | kind, then various kinds of blaming
(505) 345-5232                       | sound can be sent forth.
unmvax.cs.unm.edu!bbx!bbxsda!scott   |    - Instructions for a little box that
                                     |      blurts out obscenities.

rivero@dev8a.mdcbbs.com (05/01/91)

In article <1991Apr29.202550.12985@rodan.acs.syr.edu>, ctdonath@rodan.acs.syr.edu (Carl T. Donath) writes:
> Has anyone tried synthesizing phonemes on a music synth, esp. a TG77?
> I'd like to create a synthetic chorus - clarity is not a major concern, 
> as long as phonemes can be built into roughly understandable words.
> 
> - Carl

Phoneme generation requires a lot of slightly different sounds, which would
quickly use up all your banks. More than that, it requires the smooth
transition from phoneme to phoneme, something still lacking in all but the
most expensive and experimental hardware.
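
As a sketch of what a smooth transition means in practice, the Python below glides the first two formant frequencies from one vowel to another in small steps. The formant figures are rough textbook values for an adult male voice and are meant as illustration only; the names are made up here.

```python
# Approximate first and second formant frequencies in Hz (illustrative only).
VOWEL_FORMANTS = {
    "ah": (730.0, 1090.0),
    "ee": (270.0, 2290.0),
}

def glide(start, end, steps):
    """Linearly interpolate formant pairs from one vowel to another."""
    f_start, f_end = VOWEL_FORMANTS[start], VOWEL_FORMANTS[end]
    frames = []
    for i in range(steps):
        t = i / (steps - 1)
        frames.append(tuple(a + t * (b - a) for a, b in zip(f_start, f_end)))
    return frames

path = glide("ah", "ee", 5)
for f1, f2 in path:
    print(f"F1 = {f1:6.1f} Hz   F2 = {f2:6.1f} Hz")
```

A bank of fixed patches gives only the endpoints; the in-between frames are what a synth would have to produce continuously to avoid audible jumps between phonemes.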

For the poor man's electronic chorus, create a good "ahhhh" sound, then build
the following.

 Mount a speaker onto a board centered in an airtight box. The speaker
mounting board should have a hole drilled in it about 1/2 inch in diameter,
directly above the speaker cone. A 1/2 inch diameter 6 foot plastic hose is
attached to this hole, and leaves the box through another 1/2 inch hole.
Play your "ahhh" sound through the speaker inside the box. Hold the end
of the plastic hose in your teeth (to one side of your mouth) and
"mouth" the phoneme sounds, re-recording the new sounds with an open
air microphone. For about the $40.00 in parts that it takes, you will get
a good "vocal" track.

Hope this helps.

Michael

mike@ymt.com (Michael Czeiszperger) (05/02/91)

I remember hearing phonemes created with FM.  It was the usual vowel
simulations, recognizable, but not particularly musical.
Unfortunately, I can't remember where I heard it.
-- 
Michael Czeiszperger  | "I'm trying to teach a caveman to play scrabble
mike@ymt.com          | but the only word he knows is 'uugh', and he
Greenbrae, CA         | doesn't know how to spell it!"