[rec.music.gaffa] Voice processing technology

boris@PRODIGAL.PSYCH.ROCHESTER.EDU (Boris "transceiver" Goldowsky) (11/05/89)

Ever since I heard Laurie Anderson I've been wondering... how exactly
do they do this filtering to change the sound of someone's voice, or
to make it sound like 3 people singing at once...?

Enlighten me, O net!    (or recommend a good book)

Bng

jsd@GAFFA.MIT.EDU (Jon Drukman) (11/05/89)

In article <3801@ur-cc.UUCP> boris@prodigal.psych.rochester.edu writes:
>Ever since I heard Laurie Anderson I've been wondering... how exactly
>do they do this filtering to change the sound of someone's voice, or
>to make it sound like 3 people singing at once...?

Well, there's two kinds of boxes.  The 'vocoder' takes a microphone
input and modulates the signal with an external source, usually a
synthesizer circuit of some kind playing one note.  The vocoder works
great for speech, not so much for singing, although there's a
brilliant passage in "Boom! There She Was" by Scritti Politti which
features Roger Troutman singing some scat vocals through a vocoder
which is being modulated by a minimoog... 
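
A rough sketch of the channel-vocoder idea, for the curious.  The post
doesn't say how the box works inside, so everything here is an assumption:
the band centers, the filter Q, the smoothing rate, and the function names
are made up for illustration.  The voice is split into bands, the loudness
of each band is tracked, and that loudness is imposed on the same band of
the synthesizer (carrier) signal.

```python
import math

def bandpass(signal, center_hz, rate, q=5.0):
    """Two-pole band-pass biquad (constant 0 dB peak gain form)."""
    w0 = 2.0 * math.pi * center_hz / rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this filter
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1, y2, y1 = x1, x, y1, y
    return out

def envelope(signal, rate, smooth_hz=30.0):
    """Rectify, then smooth with a one-pole low-pass, to follow loudness."""
    k = 1.0 - math.exp(-2.0 * math.pi * smooth_hz / rate)
    out, level = [], 0.0
    for x in signal:
        level += k * (abs(x) - level)
        out.append(level)
    return out

def vocode(voice, carrier, rate, bands=(300, 600, 1200, 2400)):
    """Impose the per-band loudness of `voice` onto `carrier`."""
    n = min(len(voice), len(carrier))
    out = [0.0] * n
    for hz in bands:
        env = envelope(bandpass(voice[:n], hz, rate), rate)
        car = bandpass(carrier[:n], hz, rate)
        for i in range(n):
            out[i] += env[i] * car[i]
    return out
```

With a one-note sawtooth as the carrier, the output keeps the synth's pitch
but follows the loudness contours of the voice; a silent voice gives a
silent output.  Real units use many more bands than the four assumed here.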

The harmonizer is a device which takes your voice and electronically
alters the frequencies in it (how, I'm not exactly sure) to produce a
harmony line with it. 

With the advent of digital sampling technology, all this stuff is now
a piece of cake, and you can buy cheap boxes to do it.  I have a
cartridge for my computer which when coupled with appropriate software
can transform the pitch of any incoming signal.  If you put a didgeridoo
into it playing only one note (since that's all they can play) and
then played a melody on a MIDI keyboard, it would 'play' the melody
with a didgeridoo sound.  I used this effect for the live version of
"Running Up That Hill" which was done at the Katemas party in July.  I
used it as basically a digital version of the Chipmunks vocal effect.
It has the advantage that you can sing in your normal voice at normal
speed and you still come out sounding like a weasel on helium, whereas
the Chipmunks stuff was all done by speeding up the tape and involved
speaking... really... slowly... so that the pacing came out right when
the tape was played back fast. 
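
To see why the tape trick forces the slow delivery, here is a minimal
sketch of plain variable-speed playback, assuming the recording is just a
list of sample values.  The function name and the linear interpolation are
illustrative choices, not a description of any particular box.

```python
def play_at_speed(samples, speed):
    """Read through `samples` at `speed` times normal rate.

    Positions that fall between two samples are linearly interpolated.
    speed > 1 raises the pitch and shortens the result; speed < 1 lowers
    the pitch and lengthens it.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += speed
    return out
```

At speed 2.0 everything comes out an octave higher but lasts only about
half as long, so pitch and pacing are tied together.  That coupling is what
a digital pitch transformer has to break.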

All clear?

+---------------------- Is there any ESCAPE from NOISE? ----------------------+
|  |   |\       | jsd@gaffa.mit.edu | "I like George Bush, but this `kinder,  |
| \|on |/rukman | jsd@umass.bitnet  | gentler' crap is killing us." - D.Trump |
+-----------------------------------------------------------------------------+

donley@BLAKE.ACS.WASHINGTON.EDU (E. Donley Olson) (11/05/89)

In article <3801@ur-cc.UUCP> boris@prodigal.psych.rochester.edu writes:
>
>Ever since I heard Laurie Anderson I've been wondering... how exactly
>do they do this filtering to change the sound of someone's voice, or
>to make it sound like 3 people singing at once...?
>

Jon Drukman has answered some of this...  regarding vocoders and such.
Harmonizers are strange little boxes.  I believe that the harmonizers may
work by digitally sampling (or otherwise) the incoming sound, i.e. Laurie's
voice, and then playing it back "sped up" almost instantaneously...  The
problem with this method is that the timing no longer matches: when you
play it back "sped up" it takes less time for the sound to occur, and when
you play it back "slowed down" it takes too MUCH time.  The solution found
in tape decks that use this technique is to sample very quickly and to
"cut out" the pieces that are extra if the voice is slowed down, or stick
in an extra sample every period if the voice is sped up.  This is the least
likely method because the "cuts" leave annoying glitches in the sound.
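
The chop-and-splice idea can be sketched roughly as follows.  This is only
an illustration under assumed details: the chunk size, the function name,
and the decision to leave the splices unsmoothed are all hypothetical.

```python
def shift_pitch_by_splicing(samples, ratio, chunk=256):
    """Pitch-shift by `ratio` while keeping the overall length.

    Each chunk is read back at `ratio` times normal speed.  Reading
    faster runs out of material early, so the read position wraps around
    and repeats part of the chunk.  Reading slower never reaches the end
    of the chunk, so the tail is simply dropped.  Nothing smooths the
    joins, which is where the glitches come from.
    """
    out = []
    for start in range(0, len(samples), chunk):
        block = samples[start:start + chunk]
        pos = 0.0
        for _ in range(len(block)):
            i = int(pos)
            frac = pos - i
            nxt = block[(i + 1) % len(block)]
            out.append(block[i] * (1.0 - frac) + nxt * frac)
            pos = (pos + ratio) % len(block)
    return out
```

The output has the same number of samples as the input, so the pacing is
preserved.  Unless the chunk happens to line up with the waveform, every
wrap or dropped tail leaves a discontinuity.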

The other possible way this is accomplished is by doing some sort of frequency
counting on the incoming sound and then dividing the frequency by a certain
amount (or multiplying to make her sound like Dolly Parton).  I once built
a divider of this sort, but it was rather crude...  But it might be what
they do...  People have been able to digitize the human voice into square
waves, so why not...
I would NOT expect that harmonizers do complete spectral analysis
on samples in real time -- even the Fairlight doesn't do that!
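
For what it's worth, a crude divider of the kind described can be sketched
in a few lines.  This assumes the simplest possible scheme, a flip-flop
that toggles on every upward zero crossing, which drops the pitch by one
octave and throws away the original tone colour.  The function names are
made up.

```python
def octave_down(samples):
    """Emit a square wave at half the input's frequency.

    The output level flips each time the input crosses zero going
    upward, so one full output cycle spans two input cycles.
    """
    out, level, prev = [], 1.0, 0.0
    for x in samples:
        if prev <= 0.0 < x:      # upward zero crossing of the input
            level = -level
        out.append(level)
        prev = x
    return out

def rising_edges(samples):
    """Count upward zero crossings, a rough stand-in for frequency."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a <= 0.0 < b)
```

Comparing rising_edges() on the input and on the output of octave_down()
should show roughly a two-to-one ratio.  A voice with strong overtones can
cross zero more than once per cycle, which is one reason such dividers
track poorly.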

Any other possibilities?
  - Eo

bloch%mandrill@UCSD.EDU (Steve Bloch) (11/05/89)

boris@prodigal.psych.rochester.edu writes:
>Ever since I heard Laurie Anderson I've been wondering... how exactly
>do they do this filtering to change the sound of someone's voice, or
>to make it sound like 3 people singing at once...?

Jon Drukman writes:
>Well, there's two kinds of boxes.  The 'vocoder' ...
>
>The harmonizer is a device which takes your voice and electronically
>alters the frequencies in it (how, I'm not exactly sure) to produce a
>harmony line with it. 

Donley describes a sample-and-chop approach and a frequency-dividing
approach.
>I would NOT expect that harmonizers do complete spectral analysis
>on samples in real time -- even the Fairlight doesn't do that!
Of course, the special-purpose FFT chips are getting faster every day.
Let's see... to do it in real time, assuming mono input and say a
24 kHz sampling rate, you need to do a 1024-point FFT in about 40 msec.
That's within the capabilities of current hardware, I think.  'Course,
you'd only get a precision of about 24 Hz, which wouldn't be good enough
to produce a clean harmony in the 500-2000 Hz range (1-3 quarter-tones).
If you can do a 2048-point FFT in real-time (here you have about 80 msec
to do it), the precision becomes about 12 Hz.  Check the newsgroup comp.dsp for
more accurate answers.
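
The arithmetic behind those figures can be spelled out as a small sketch.
The helper names are made up.  The only facts used are that an N-point
frame at sampling rate R lasts N/R seconds and has bins R/N Hz apart, and
that there are 24 quarter-tones per octave.

```python
import math

def frame_seconds(points, rate_hz):
    """Time it takes to collect one FFT frame of `points` samples."""
    return points / rate_hz

def bin_spacing_hz(points, rate_hz):
    """Distance in Hz between adjacent FFT bins."""
    return rate_hz / points

def spacing_in_quarter_tones(spacing_hz, pitch_hz):
    """Size of the interval from pitch_hz up to pitch_hz + spacing_hz."""
    return 24.0 * math.log2((pitch_hz + spacing_hz) / pitch_hz)

for points in (1024, 2048):
    ms = 1000.0 * frame_seconds(points, 24000)
    hz = bin_spacing_hz(points, 24000)
    print(points, "points:", round(ms, 1), "ms per frame,", round(hz, 1), "Hz per bin")
```

At 24 kHz this gives about 42.7 ms and 23.4 Hz for 1024 points, and about
85.3 ms and 11.7 Hz for 2048 points, in line with the rounded figures
above.  The last helper turns a spacing in Hz into quarter-tones at any
pitch of interest.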

>Any other possibilities?

Back to Jon:
>With the advent of digital sampling technology, all this stuff is now
>a piece of cake, and you can buy cheap boxes to do it.  I have a
>cartridge for my computer which when coupled with appropriate software
>can transform the pitch of any incoming signal.  If you put a didgeridoo
>into it playing only one note (since that's all they can play) and
>then played a melody on a MIDI keyboard, it would 'play' the melody
>with a didgeridoo sound.
If I understand this right, it's just straight play-back-faster-or-
slower, which changes the ADSR parameters if you take it very far
(like, more than half an octave or so).

But if all you want to do is echo a voice at a particular pitch (or
several pitches), and you don't want to change what pitch it is too
often, you can do it with an IIR filter, using very short digital
feedback to build resonances at whatever pitches strike your fancy.
I've always assumed that was how Laurie did it, as it's computationally
very easy (to resonate two pitches, you need a four-pole filter, which
only requires three adds and four multiplies per sample, and an 8086
can do that.)
The only problem is that DESIGNING the IIR filter, figuring out the
coefficients to suit the pitches you want to resonate, takes some
work, and a grasp of complex analysis doesn't hurt.  You don't want
to do it in real-time.
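
A minimal sketch of such a resonating filter, under some assumptions: each
pitch gets its own two-pole section with its poles at angle 2*pi*pitch/rate
and a chosen radius just inside the unit circle, and the sections run in
parallel.  Two sections in parallel have the same four poles as a single
four-pole filter, though not the same zeros.  The function names and the
default radius are made up.

```python
import math

def resonator_coeffs(pitch_hz, rate_hz, radius=0.999):
    """Feedback coefficients for one pole pair tuned to pitch_hz.

    radius (below 1) sets how close the poles sit to the unit circle:
    closer means a sharper resonance.
    """
    theta = 2.0 * math.pi * pitch_hz / rate_hz
    return 2.0 * radius * math.cos(theta), -radius * radius

def resonate(samples, pitches_hz, rate_hz, radius=0.999):
    """Sum the outputs of one two-pole resonator per requested pitch."""
    out = [0.0] * len(samples)
    for pitch in pitches_hz:
        a1, a2 = resonator_coeffs(pitch, rate_hz, radius)
        y1 = y2 = 0.0
        for n, x in enumerate(samples):
            y = x + a1 * y1 + a2 * y2   # short digital feedback
            out[n] += y
            y2, y1 = y1, y
    return out
```

Each section costs two multiplies per sample, so two pitches need four.
Finding the coefficients is done once, ahead of time, and the per-sample
loop stays cheap.  The gain at resonance is large for radii this close to
1, so a real design would scale the input down.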

By the way, you notice that whenever Laurie has her voice echoed on a
particular fixed harmony it "rings" for a while?  That's a direct
effect of the IIR ("infinite impulse response" means that technically
it rings forever, but it may drop below audibility in less than a
second).  How long it rings depends on how precise you want your
pitches to be; if you want absolutely perfect tuning, it WILL ring
forever, without attenuating.
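
The trade-off between tuning precision and ring time can be put in rough
numbers.  This sketch assumes a single pole pair at a given radius, whose
ringing shrinks by a factor of `radius` every sample, and it treats a
60 dB drop as "inaudible", which is only a convention.  The bandwidth
formula is the usual approximation for poles close to the unit circle.
Names are made up.

```python
import math

def ring_time_seconds(radius, rate_hz, drop_db=60.0):
    """Time for the ringing of a pole pair to fall by drop_db."""
    if radius >= 1.0:
        return math.inf          # poles on the unit circle never decay
    samples = (drop_db / 20.0) * math.log(10.0) / -math.log(radius)
    return samples / rate_hz

def bandwidth_hz(radius, rate_hz):
    """Approximate -3 dB width of the resonance (radius close to 1)."""
    return -math.log(radius) * rate_hz / math.pi
```

At an assumed 8 kHz sampling rate, a radius of 0.999 rings for a bit under
a second with a resonance a few Hz wide.  Pushing the radius toward 1
narrows the resonance and stretches the ring without limit.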

Boris writes again:
>Or suggest a good book to read? [or words to that effect]
How about _Digital_Audio_Signal_Processing_(an_Anthology)_, edited by
John Strawn, Wm. Kaufmann 1985?  Or you could type "g comp.dsp".

"Writers are a funny breed -- I should know." -- Jane Siberry

bloch%cs@ucsd.edu