[comp.dsp] Pitch shift / offset and FFT

d88-jwa@nada.kth.se (Jon Wätte) (09/27/89)

In article <9520001@hpsad.HP.COM> toma@hpsad.HP.COM (Tom Anderson) writes:
[ Hilbert transform ]

>The above technique has the advantage that it doesn't rely on an FFT, so
>that windowing issues are avoided.  The hardware is also easier than an

But it has the disadvantage that it doesn't do what we want, and that
it's completely, utterly unusable on any kind of signal containing
overtones.

>An interesting question is:  how many FFT points are required?  I think
>that you need to know the maximum frequency to be represented, the
>minimum frequency to be shifted, and the smallest amount of shift.  To
>shift from C0 at 16.35Hz to C#0 at 17.32Hz requires a shift of about
>1Hz, so the FFT bins should be spaced by about 1Hz.  An FFT with this
>spacing covering 0Hz-20kHz would need about 20,000 points.  To keep the
>fidelity high, one transform every few milliseconds or so would be
>required.  Such a brute force technique gets expensive quickly.

You're off by a mile, in my humble opinion... Most of those points
would be null.

I have heard it said (and have no reason to doubt) that any musical
signal (that is, inclusive of chords, multiple instruments et al)
can be faithfully reproduced using a 1024-point FFT at every sample.
That is, at any one time, you don't have more than 1024 (audible)
simple components in the sound (remember: a note isn't "one" frequency,
it's got one BASE NOTE and lots of harmonics) but these 1024
components may change between each and every sample...
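For reference, the bin spacing of an N-point DFT is the sample rate divided by N. The sketch below works through the sizing arithmetic, assuming real-valued input sampled at 40 kHz so that 0-20 kHz lies below the Nyquist frequency; the 40 kHz rate and the helper names are illustrative assumptions, not anything from the thread. Under that assumption, 1 Hz bins take a 40,000-point transform (roughly 20,000 non-negative-frequency bins), and a 1024-point transform has bins about 39 Hz apart, much wider than the roughly 1 Hz gap between C0 and C#0.

```python
import math


def bin_spacing_hz(sample_rate_hz: float, n_points: int) -> float:
    """Frequency spacing between adjacent bins of an n-point DFT."""
    return sample_rate_hz / n_points


def points_for_spacing(sample_rate_hz: float, spacing_hz: float) -> int:
    """Smallest DFT length whose bin spacing is no coarser than spacing_hz."""
    return math.ceil(sample_rate_hz / spacing_hz)


# Equal-tempered semitone above C0: C#0 = C0 * 2**(1/12)
c0 = 16.35
c_sharp0 = c0 * 2 ** (1 / 12)
gap = c_sharp0 - c0

fs = 40_000.0  # assumed sample rate; Nyquist = fs / 2 = 20 kHz
print(f"C0 -> C#0 gap: {gap:.3f} Hz")
print(f"points for 1 Hz bins at {fs:.0f} Hz: {points_for_spacing(fs, 1.0)}")
print(f"1024-point bin spacing at {fs:.0f} Hz: {bin_spacing_hz(fs, 1024):.2f} Hz")
```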

The "sliding window" FFT described earlier sounded great. Too bad it
appears to be copyrighted and proprietary, so we can't share it :-(

Also, how about an algorithm that removed the redundant frequency-mixing
products from the FFT? Or is that already taken care of?
I.e. if you have one 1000 Hz sine wave and one 1100 Hz sine wave, you'd
also get one 100 Hz harmonic (and probably a 2100 Hz harmonic as well?)

These wouldn't have to be represented in the FFT (would they?) since
they'll be re-created in the output.

Of course, this is provided the components are steady for at least a
few cycles, and that's explicitly NOT provided by the above-mentioned
"sliding window" method or new-FFT-in-each-point method.

h+@nada.kth.se
-- 
Just because you're paranoid doesn't mean they AREN'T after you.

wass@Apple.COM (Steve Wasserman) (09/27/89)

In article <1787@draken.nada.kth.se> h+@nada.kth.se (Jon Wätte) writes:
... much stuff deleted ...
>
>Also, how about an algorithm that removed the redundant frequency-mixing
>products from the FFT? Or is that already taken care of?
>I.e. if you have one 1000 Hz sine wave and one 1100 Hz sine wave, you'd
>also get one 100 Hz harmonic (and probably a 2100 Hz harmonic as well?)

If you have a 1000 Hz sine wave and an 1100 Hz sine wave, and you ADD
them, you will not get a 100 Hz harmonic, but rather the two
frequencies will BEAT together at 100 Hz.  The perceived effect (in
sound, at least) is a 1050 Hz signal that rapidly changes volume.
(Actually, I'm not totally sure what you'd hear with 1000 & 1100.
But, if the frequencies were closer, say 1000 and 1005, you would
certainly be able to hear the two beating.  Accurate tuning of musical
instruments is possible by listening for this beating against a
properly-tuned standard.)  Anyway, if you looked at 1000 Hz PLUS 1100
Hz on an oscilloscope, you would see 1050 Hz modulated by a 100 Hz
envelope.  By the nature of the Fourier Transform, nothing in this
signal would correlate with 100 Hz.  

Essentially what I'm saying is that linear superposition applies and
that when you linearly add two frequencies, that is exactly what
happens in the frequency domain - they are added to the original
spectrum and no harmonics are created.

It is a different situation, of course, if you MULTIPLY the two
signals.

1000 plus 1100 looks different than 1000 plus 1100 plus 100 plus 2100,
so it is not really redundant information.
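A minimal numerical check of the superposition argument, assuming an 8 kHz sample rate and one second of data so that each DFT bin falls on a whole number of hertz (both choices are illustrative). Only the few bins of interest are evaluated, directly from the DFT definition. The sum shows energy only at 1000 Hz and 1100 Hz; the product (the MULTIPLY case, i.e. ring modulation) shows energy only at 100 Hz and 2100 Hz, since cos(a)*cos(b) = [cos(a-b) + cos(a+b)]/2.

```python
import cmath
import math

FS = 8000  # assumed sample rate in Hz
N = 8000   # one second of samples, so DFT bin k lies exactly at k Hz


def tone(freq_hz):
    """Unit-amplitude cosine sampled at FS for N samples."""
    return [math.cos(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def bin_magnitude(x, k):
    """|X[k]| / N from the DFT definition; a unit cosine on bin k gives 0.5."""
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(x))
    return abs(acc) / N


a, b = tone(1000), tone(1100)
added = [p + q for p, q in zip(a, b)]       # linear superposition
multiplied = [p * q for p, q in zip(a, b)]  # mixing / ring modulation

probe = (100, 1000, 1100, 2100)
sum_spec = {k: bin_magnitude(added, k) for k in probe}
prod_spec = {k: bin_magnitude(multiplied, k) for k in probe}

for k in probe:
    print(f"{k:5d} Hz  added: {sum_spec[k]:.3f}  multiplied: {prod_spec[k]:.3f}")
```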

-- 
swass@apple.com

d88-jwa@nada.kth.se (Jon Wätte) (09/27/89)

In article <4384@internal.Apple.COM> wass@Apple.COM (Steve Wasserman) writes:

>If you have a 1000 Hz sine wave and an 1100 Hz sine wave, and you ADD
>them, you will not get a 100 Hz harmonic, but rather the two
>frequencies will BEAT together at 100 Hz.  The perceived effect (in
>sound, at least) is a 1050 Hz signal that rapidly changes volume.

Okay, I think I messed up. You're right and I confused things a bit...
Isn't it ring modulation that gives you the sums/differences between
the frequencies?

>certainly be able to hear the two beating.  Accurate tuning of musical
>instruments is possible by listening for this beating against a
>properly-tuned standard.)  Anyway, if you looked at 1000 Hz PLUS 1100

Yeah, I do that all the time... And you hear the beating, although
it gets weaker with increased frequency.

Thanx. I love this group :')

h+@nada.kth.se
-- 
History does not repeat itself, historians merely repeat each other.

mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) (09/28/89)

In a recent article by Steve Wasserman...

>If you have a 1000 Hz sine wave and an 1100 Hz sine wave, and you ADD
>them, you will not get a 100 Hz harmonic, but rather the two
>frequencies will BEAT together at 100 Hz.  The perceived effect (in
>sound, at least) is a 1050 Hz signal that rapidly changes volume.
>(Actually, I'm not totally sure what you'd hear with 1000 & 1100.
>But, if the frequencies were closer, say 1000 and 1005, you would
>certainly be able to hear the two beating.  Accurate tuning of musical
>instruments is possible by listening for this beating against a
>properly-tuned standard.)  Anyway, if you looked at 1000 Hz PLUS 1100
>Hz on an oscilloscope, you would see 1050 Hz modulated by a 100 Hz
>envelope.  By the nature of the Fourier Transform, nothing in this
>signal would correlate with 100 Hz.  

When you add two sinusoids together, you get exactly that: the sum of
two sinusoids.  Any apparent beating between two (or more) added 
sinusoids is purely a perceptual effect.  In the case of hearing a
beat between two summed sinusoids, the ear is acting as a mixer which
detects the sum/difference signals as well as detecting the base
signals.  Looking at the sum of two sinusoids on a `scope may lead one
to believe that somehow the sum of the two signals has yielded two
(or more) new, different signals.  This interpretation may seem valid,
but nothing magical is happening: you're only seeing the result of two
sinusoids being summed together, not a mixing of the two.
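Whatever the ear actually does, the underlying point that a nonlinearity is what creates a difference frequency is easy to check numerically. The sketch below uses squaring as a stand-in for a nonlinear "mixer"; that choice, the 8 kHz rate and the helper names are illustrative assumptions, not a model of the ear. The plain sum has nothing at 100 Hz, while its square has components at DC, 100, 2000, 2100 and 2200 Hz, because (cos a + cos b)^2 contains the product term 2*cos(a)*cos(b).

```python
import cmath
import math

FS = 8000  # assumed sample rate in Hz
N = 8000   # one second of data, so DFT bin k lies exactly at k Hz


def tone(freq_hz):
    return [math.cos(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def bin_magnitude(x, k):
    """|X[k]| / N: a unit cosine on bin k (k != 0) gives 0.5; a DC level d gives d."""
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(x))
    return abs(acc) / N


linear = [p + q for p, q in zip(tone(1000), tone(1100))]
squared = [v * v for v in linear]  # hypothetical square-law device

probe = (0, 100, 1000, 2000, 2100, 2200)
lin_spec = {k: bin_magnitude(linear, k) for k in probe}
sq_spec = {k: bin_magnitude(squared, k) for k in probe}

for k in probe:
    print(f"{k:5d} Hz  linear: {lin_spec[k]:.3f}  squared: {sq_spec[k]:.3f}")
```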

>-- 
>swass@apple.com

Mike Horne
mhorne@ka7axd.wv.tek.com

wass@Apple.COM (Steve Wasserman) (09/28/89)

In article <4725@orca.WV.TEK.COM> mhorne@ka7axd.wv.tek.com writes:
>> them, you will not get a 100 Hz harmonic, but rather the two
>> frequencies will BEAT together at 100 Hz.  The perceived effect (in
>> sound, at least) is a 1050 Hz signal that rapidly changes volume.
... stuff deleted ...
>
>two sinusoids added together.  Nothing magical happens; any perceived beating
>between the two (or more) sinusoids, such as what your ear hears, is caused
>by the mixing (multiplying) action of the ear itself (with a little help from
>the brain:).  Understanding this gives you a good feel for how subjective 
>music really is.

In the original posting, I forgot to say that I was assuming that the
two sinusoids in my example had the same magnitude.  I disagree with
you slightly - beating is *not* caused by your ears and brain, it
really happens.

Think of it this way: Imagine two vectors of unit length at the
origin.  At time t=0, they both point in the theta = 0 direction.  One
rotates around the origin 1000 times a second and the other
1100 times per second.  If you add these vectors and take the real
part of the result, you get the waveform created by adding 1000 Hz and
1100 Hz.

Now change your perspective a little bit.  Imagine this whole
arrangement (or yourself, if you prefer) spinning at 1050 Hz.
Relative to the new frame of reference, one vector is spinning at +50
Hz and the other is spinning at -50 Hz.  Adding these two gives a
vector that always points in the theta = 0 direction (i.e. it is
always midway between the two) and whose signed length varies between -2
and 2 at a rate of 50 Hz.  Going back to the original frame of
reference, we have a vector of length 2 * cos(2 * pi * 50 * t) and
direction theta = 0 that we have just started spinning at 1050 Hz.  In
other words, you get a 1050 Hz sinewave that gets louder and softer
with a 100 Hz envelope. (Some people probably call this a 50 Hz
envelope.)
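The rotating-frame argument can be checked numerically. The sketch below (the function names and the 20 kHz time grid are illustrative assumptions) multiplies the sum of the two unit phasors by exp(-j*2*pi*1050*t), which is what an observer spinning along at 1050 Hz would see, and asserts that the result lies on the real axis with signed length 2*cos(2*pi*50*t). A negative value simply means the resultant points along theta = pi.

```python
import cmath
import math

F1, F2 = 1000.0, 1100.0
FC = (F1 + F2) / 2  # 1050 Hz: rotation rate of the spinning observer


def phasor_sum(t):
    """Sum of two unit phasors rotating at F1 and F2 Hz, both starting at theta = 0."""
    return cmath.exp(2j * math.pi * F1 * t) + cmath.exp(2j * math.pi * F2 * t)


def in_rotating_frame(t):
    """The same sum as seen from a frame rotating at FC Hz."""
    return phasor_sum(t) * cmath.exp(-2j * math.pi * FC * t)


for k in range(400):  # 20 ms sampled at 20 kHz
    t = k / 20000
    z = in_rotating_frame(t)
    # The resultant stays on the real axis (theta = 0, or theta = pi when negative)...
    assert abs(z.imag) < 1e-9
    # ...and its signed length follows 2*cos(2*pi*50*t).
    assert abs(z.real - 2 * math.cos(2 * math.pi * 50 * t)) < 1e-9

print("rotating-frame resultant = 2*cos(2*pi*50*t) on the real axis")
```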

Algebraically, this is:

exp(j*f1* t) + exp(j*f2*t) = 	exp[j*t*(f1-f2)/2] * exp[j*t*(f1+f2)/2] +
				exp[j*t*(f2-f1)/2] * exp[j*t*(f1+f2)/2]

where f1 and f2 are the two angular frequencies and exp() is
exponentiation (e to the x).  Taking the real part of both sides:

cos(f1*t) + cos(f2*t) = cos[t*(f1-f2)/2]*cos[t*(f1+f2)/2] -
			sin[t*(f1-f2)/2]*sin[t*(f1+f2)/2] +
			cos[t*(f2-f1)/2]*cos[t*(f1+f2)/2] -
			sin[t*(f2-f1)/2]*sin[t*(f1+f2)/2]

We can simplify this because we know that f2-f1 = -(f1-f2),
sin(-a)=-sin(a), and cos(-a) = cos(a).  The answer is:

2*cos[t*(f1-f2)/2]*cos[t*(f1+f2)/2]

Which really does get louder and softer.  In our case, we have:

v = 2*cos[2*pi*50*t]*cos[2*pi*1050*t]
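A quick numerical sanity check of the simplified identity and of the numeric form of v, sampling 0.1 s at an assumed 48 kHz, with f1 and f2 taken as the angular frequencies of 1000 Hz and 1100 Hz. The worst-case difference between the sides should sit at floating-point rounding level.

```python
import math


def lhs(f1, f2, t):
    """cos(f1*t) + cos(f2*t), with f1 and f2 in rad/s."""
    return math.cos(f1 * t) + math.cos(f2 * t)


def rhs(f1, f2, t):
    """2*cos[t*(f1-f2)/2]*cos[t*(f1+f2)/2]."""
    return 2 * math.cos(t * (f1 - f2) / 2) * math.cos(t * (f1 + f2) / 2)


def v(t):
    """The same product with the numbers filled in: 50 Hz times 1050 Hz."""
    return 2 * math.cos(2 * math.pi * 50 * t) * math.cos(2 * math.pi * 1050 * t)


w1, w2 = 2 * math.pi * 1000, 2 * math.pi * 1100
ts = [k / 48000 for k in range(4800)]
worst = max(
    max(abs(lhs(w1, w2, t) - rhs(w1, w2, t)), abs(lhs(w1, w2, t) - v(t)))
    for t in ts
)
print(f"max deviation over 0.1 s: {worst:.1e}")
```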

>As far as viewing on a `scope, much of the same applies.  You can call the
>resulting waveform anything you want, but it still is a sum of sinusoids.

Try plotting out cos(2*pi*1000*t) + cos(2*pi*1100*t) on whatever is
most convenient.  It *does* look like a 1050 Hz wave with a 100 Hz
envelope.  Try plotting out some cases where the magnitudes are not
quite equal - like: 1.5*cos(2*pi*1000*t) + cos(2*pi*1100*t). You still
get beating; however, the magnitude never goes all the way down to
zero.
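For unequal magnitudes, the envelope is the length of the resultant phasor, |A*exp(j*w1*t) + B*exp(j*w2*t)|, which swings between A+B and |A-B| once per beat period. The sketch below (names and the 10 microsecond step are illustrative assumptions) samples one 10 ms beat period of 1.5*cos(2*pi*1000*t) + cos(2*pi*1100*t), checks that the real signal never pokes outside that envelope, and reports the envelope's range, which should come out at 0.5 to 2.5, so it never reaches zero.

```python
import cmath
import math

A, B = 1.5, 1.0
F1, F2 = 1000.0, 1100.0


def envelope(t):
    """Length of the resultant phasor; the real signal is its projection on theta = 0."""
    z = A * cmath.exp(2j * math.pi * F1 * t) + B * cmath.exp(2j * math.pi * F2 * t)
    return abs(z)


def signal(t):
    return A * math.cos(2 * math.pi * F1 * t) + B * math.cos(2 * math.pi * F2 * t)


ts = [k / 100000 for k in range(1000)]  # one 1/(F2 - F1) = 10 ms beat, 10 us steps
env = [envelope(t) for t in ts]

# |Re(z)| <= |z|, so the waveform always stays inside the envelope
assert all(abs(signal(t)) <= e + 1e-12 for t, e in zip(ts, env))
print(f"envelope ranges from {min(env):.3f} to {max(env):.3f}")
```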

As you can see, the phenomenon of beating is more a consequence of
living in the time domain than any processing done by our ears or
brains. Nothing magic about it at all.

>mhorne@ka7axd.wv.tek.com

-- 
swass@apple.com

toma@hpsad.HP.COM (Tom Anderson) (09/29/89)

> Taking the real part of both sides:

I don't know much about ears, but I believe that they hear:

sqrt(Real_part^2+Imaginary_part^2) 

and so it is incorrect to just look at the real part.

Also, if you look at two tones on a spectrum analyzer, you just see two
tones and no beat notes in a linear system.  (I just tried this to make
sure :-)).  It takes a nonlinearity to generate the "mixing products" in
the frequency domain.

Tom Anderson       "It's only hardware"
Hewlett-Packard    Signal Analysis Division
Opinions expressed are my own and not HP's.

wass@Apple.COM (Steve Wasserman) (09/29/89)

In article <4730@orca.WV.TEK.COM> mhorne@orca.WV.TEK.COM writes:
>
>In a recent article by Steve Wasserman...
>
>>If you have a 1000 Hz sine wave and an 1100 Hz sine wave, and you ADD
>>them 
... stuff deleted ...
>When you add two sinusoids together, you get exactly that: the sum of
>two sinusoids.  

I didn't ever say you'd get anything else.  However, I did use some
very simple trig identities (disguised as lots of complex algebra in
my last posting) to show that in the special case of adding two
sinusoids, you get a phenomenon called "beating", and that this
phenomenon is the consequence of basic physical principles and not any
processing done by human ears or minds.  The result from my last
posting was:

cos(f1*t) + cos(f2*t) = 2*cos[(f1-f2)*t/2] * cos[(f1+f2)*t/2]

Now look at the right side of this equation closely.  The first factor
is a cosine at a frequency equal to half of the difference between f1
and f2.  It *multiplies* the second cosine, whose frequency is the
average of f1 and f2, and so acts as its envelope.
When two sine waves of equal amplitude are combined, this is what you
get.  When you hear "beating", f1 and f2 are close to each other in
frequency, hence (f1-f2)/2 is small.  So what *you* hear, and what a
microphone also hears, and what really happens is a sinewave that gets
cyclically louder and softer.

So what if they don't have the same amplitude?  In general when you
add two sinusoids of arbitrary magnitude, you get:

A*cos(f1*t) + B*cos(f2*t) = [A+B]*cos[(f1-f2)*t/2]*cos[(f1+f2)*t/2] +
                            [B-A]*sin[(f1-f2)*t/2]*sin[(f1+f2)*t/2]

As you can see, if A and B are even *close*, the first term on the
right side will dominate and again you hear beating.
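The unequal-amplitude identity can be spot-checked the same way. This sketch draws random amplitudes, angular frequencies and times from a fixed-seed generator (the ranges are arbitrary choices) and reports the largest mismatch between the two sides.

```python
import math
import random


def lhs(a, b, f1, f2, t):
    return a * math.cos(f1 * t) + b * math.cos(f2 * t)


def rhs(a, b, f1, f2, t):
    d, s = (f1 - f2) * t / 2, (f1 + f2) * t / 2
    return (a + b) * math.cos(d) * math.cos(s) + (b - a) * math.sin(d) * math.sin(s)


rng = random.Random(0)  # fixed seed so the check is repeatable
worst = 0.0
for _ in range(10000):
    a, b = rng.uniform(-2, 2), rng.uniform(-2, 2)
    f1, f2 = rng.uniform(0, 2e4), rng.uniform(0, 2e4)  # rad/s
    t = rng.uniform(0, 0.1)
    worst = max(worst, abs(lhs(a, b, f1, f2, t) - rhs(a, b, f1, f2, t)))

print(f"max |lhs - rhs| over 10000 random cases: {worst:.1e}")
```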

>Any apparent beating between two (or more) added 
>sinusoids is purely a perceptual effect.

I disagree.  Try plotting cos(2*pi*1000.0*t)+cos(2*pi*1000.1*t).  Even
better, get two waveform generators and set them at the above
frequencies at approximately equal magnitudes.  Put one generator on
channel 1 of your 'scope and the other on channel 2 and hit the "add"
button.  Turn the scale waaaaay down so that you can see stuff at .1
Hz and trigger on the maximum (or minimum) amplitude of the whole
waveform.  (Since you're posting from Tektronix, you ought to be able
to find a few waveform generators and 'scopes lying around :-)

>  In the case of hearing a
>beat between two summed sinusoids, the ear is acting as a mixer which
>detects the sum/difference signals as well as detecting the base
>signals.  Looking at the sum of two sinusoids on a `scope may lead one
>to believe that somehow the sum of the two signals has yielded two
>(or more) new, different signals.  This interpretation may seem valid,
>but nothing magical is happening: you're only seeing the result of two
>sinusoids being summed together, not a mixing of the two.

The two *do* mix: they are added.  Linear superposition applies!
Of course they sum together.

>Mike Horne
>mhorne@ka7axd.wv.tek.com

-- 
swass@apple.com

ted@nmsu.edu (Ted Dunning) (09/29/89)

In article <4725@orca.WV.TEK.COM> mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) writes:


   As far as viewing on a `scope, much of the same applies.  You can
   call the resulting waveform anything you want, but it still is a
   sum of sinusoids.

absolutely right.  of course, it is _also_ the product of two other
sinusoids, and this second explanation may be the way that you hear
it.

--
ted@nmsu.edu
			remember, when extensions and subsets are outlawed,
			only outlaws will have extensions or subsets

mhorne@ka7axd.WV.TEK.COM (Michael T. Horne) (09/29/89)

In a recent article by Steve Wasserman:
>>
>>...Nothing magical happens; any perceived beating
>>between the two (or more) sinusoids, such as what your ear hears, is caused
>>by the mixing (multiplying) action of the ear itself (with a little help from
>>the brain :)...
>
>...I disagree with
>you slightly - beating is *not* caused by your ears and brain, it
>really happens.
>
>Algebraically, this is:
>
>exp(j*f1* t) + exp(j*f2*t) = 	exp[j*t*(f1-f2)/2] * exp[j*t*(f1+f2)/2] +
>				exp[j*t*(f2-f1)/2] * exp[j*t*(f1+f2)/2]
>
>where f1 and f2 are the two angular frequencies and exp() is
>exponentiation (e to the x).  Taking the real part of both sides:
>
>cos(f1*t) + cos(f2*t) = cos[t*(f1-f2)/2]*cos[t*(f1+f2)/2] -
>			sin[t*(f1-f2)/2]*sin[t*(f1+f2)/2] +
>			cos[t*(f2-f1)/2]*cos[t*(f1+f2)/2] -
>			sin[t*(f2-f1)/2]*sin[t*(f1+f2)/2]
>
>We can simplify this because we know that f2-f1 = -(f1-f2),
>sin(-a)=-sin(a), and cos(-a) = cos(a).  The answer is:
>
>2*cos[t*(f1-f2)/2]*cos[t*(f1+f2)/2]
>				
>Which really does get louder and softer.  In our case, we have:
>
>v = 2*cos[2*pi*50*t]*cos[2*pi*1050*t]

What you have shown, Steve, is a rather thorough example of a trig identity.
Take for example any given sinusoid represented as a complex exponential.
Maintaining the same notation that you have used, we can represent it as a
product of two exponentials:

	exp(j*f1*t) = exp(j*t*(f1+f2)/2) * exp(j*t*(f1-f2)/2)		(1)

where f1, f2 are any arbitrary angular frequencies.  It can be readily shown
that you can represent a single sinusoid as the sum-of-products of
sum/difference sinusoids.  Taking the real part of both sides of (1) above:

	cos(f1*t) =	cos(t*(f1+f2)/2) * cos(t*(f1-f2)/2) -		(2)
			sin(t*(f1+f2)/2) * sin(t*(f1-f2)/2)

I think that most of us are aware of the common trig identity:

	cos(u+v) =	cos(u)*cos(v) - sin(u)*sin(v)			(3)

letting u = t*(f1+f2)/2 and v = t*(f1-f2)/2, and substituting them into (3), we obtain:

	cos(t*(f1+f2+f1-f2)/2) = cos(t*(f1+f2)/2) * cos(t*(f1-f2)/2) -	(4)
				 sin(t*(f1+f2)/2) * sin(t*(f1-f2)/2)

which is of the same form as (2).  For example, letting f1 = 5, f2 = 1,
you get:

	cos(5t) = cos(3t + 2t) = cos(3t)*cos(2t) - sin(3t)*sin(2t)	(5)
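Equation (2) holds for any choice of f2, which is the point of the example. The sketch below evaluates the right side of (2) with f1 = 5 for several arbitrary values of f2 (Horne's f2 = 1 among them) and confirms that each reproduces cos(5t) to rounding error; the particular f2 values and the time grid are arbitrary choices.

```python
import math


def eq2_rhs(f1, f2, t):
    """Right side of equation (2): cos(u)*cos(v) - sin(u)*sin(v)."""
    u, v = t * (f1 + f2) / 2, t * (f1 - f2) / 2
    return math.cos(u) * math.cos(v) - math.sin(u) * math.sin(v)


f1 = 5.0
worst = 0.0
for f2 in (1.0, 0.3, 7.0, -2.5):  # any f2 works, including f2 = 1
    for k in range(2000):
        t = k * 0.005
        worst = max(worst, abs(math.cos(f1 * t) - eq2_rhs(f1, f2, t)))

print(f"max deviation from cos(5t): {worst:.1e}")
```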

In this example, we have shown that we can create a sinusoid with angular
frequency 5 by simply taking the difference (or sum, however you wish to
look at it) between the product of two cosines (one of frequency 3 and one
of frequency 2), and the product of two sines (of the same frequencies).
This identity applied in this manner is also expandable to additional "sums"
(i.e. cos(u+v+w)), even though it ultimately reduces to a single frequency.

One can easily argue that when you sum two sinusoids, you get `beating'
effects.  Steve's example above shows just such an apparent beating phenomenon;
however, it is actually just a particular arrangement of terms that represent the
sinusoids in forms similar to (2) above, grouped together to form what appears
to be an actual mix (multiplication) of frequencies other than the original
sums.  As I've shown above, even though you can manipulate an equation to
show this apparent multiplication, it still reduces to a single, simple
sinusoid.  This same rule holds for the sum of two sinusoids, that is:

	sin(u) + sin(v) = sin(u) + sin(v)

regardless of how u and v are represented.

What all of this shows (through both Steve's comments and mine) is that
it can be viewed either way; however, one must always remember that in
reduced form, it still is only a sum.  Summing two sinusoids does not generate
additional sinusoids; however, a non-linear operation such as mixing
(multiplying) *does* generate additional sinusoids.  The two operations are very
different.

All in all, the math is beautiful, isn't it? :)

>swass@apple.com

Mike

(Ever listen to two Piccolo Petes (fireworks) going off within a few seconds of
 each other? :)

wass@Apple.COM (Steve Wasserman) (09/30/89)

In article <9520003@hpsad.HP.COM> toma@hpsad.HP.COM (Tom Anderson) writes:
>> Taking the real part of both sides:
>
>I don't know much about ears, but I believe that they hear:
>
>sqrt(Real_part^2+Imaginary_part^2) 

Your ears hear the imaginary part??? No, the way I set up the problem,
I used the real part of a sum of exponentials to represent the signal
- *not* the magnitude as you suggested.  Look at the equations in the
original posting carefully.  This is a fairly standard way to set up a
problem of this sort (especially if you're not sure that you correctly
remember some trig identities :-)

>and so it is incorrect to just look at the real part.

I don't think so -- look at the original equations.

>Also, if you look at two tones on a spectrum analyzer, you just see two
>tones and no beat notes in a linear system.  (I just tried this to make
>sure :-)).

Quite true.  That is because what I call beating (a sinusoid at one
frequency that gets louder and softer within a sinusoidally varying
envelope of another frequency; or, in terms of what you hear, the slow
volume changes when two notes that are close together are played) is
best seen in the time domain.  May I suggest another experiment for
you?  Hack up a multiplier somehow (with a 741 or something), feed
it 50 Hz and 1050 Hz, and compare the output with 1000 Hz plus 1100 Hz.
Look similar?  That's because adding 1000 and 1100 is the same (apart
from a factor of 2 in amplitude) as multiplying 50 and 1050.  Try looking at both
signals in the time domain with your 'scope set up so you can see the
50 Hz envelope.

It still seems that some people doubt that a sum of sines can be
represented also as a product of sines.  Here is *another* explanation
of the above, this time drawing on frequency domain concepts,
especially the fact that convolution and multiplication are dual
properties, i.e. if you do one in one domain, the other happens in the
other domain.  Specifically, multiplying in time equals convolving in
frequency.  Here is what I hope to show by the following diagrams:
that the sum of an 1100 Hz sinewave and a 1000 Hz wave is EXACTLY the
same as the MULTIPLICATION of a 50 Hz wave and a 1050 Hz wave (equal
magnitudes assumed - see previous posting for a discussion of
magnitudes which are close but not quite equal).  The sum part is
easy.  I submit that it is:

   ^ ^                   +                   ^ ^
   | |                   +                   | |
   | |                   +                   | |
-----+-------------------+-------------------+-----
  -1000 Hz                                 1000 Hz

where the horizontal axis represents frequency, the vertical
represents magnitude and each character width horizontally represents
50 Hz.  Now let's do the problem the other way.  1050 Hz looks like this:

    ^                    +                    ^
    |                    +                    |
    |                    +                    |
-----+-------------------+-------------------+-----
  -1000 Hz                                 1000 Hz

50 Hz looks like this:

                        ^+^
                        |+|
                        |+|
-----+-------------------+-------------------+-----
  -1000 Hz                                 1000 Hz

Now let's *multiply* 1050 Hz times 50 Hz.  This is accomplished by
convolving the previous two diagrams.  The result is:

   ^ ^                   +                   ^ ^
   | |                   +                   | |
   | |                   +                   | |
-----+-------------------+-------------------+-----

which is exactly the same as the result for adding 1000 Hz and 1100 Hz.
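The convolution argument can also be carried out exactly with line spectra. In the sketch below (the dictionary representation and helper names are illustrative assumptions), each real cosine of amplitude a becomes two lines of height a/2 at plus and minus its frequency, multiplication in time is carried out as convolution of those line lists, and exact fractions avoid any rounding. The 1050 Hz tone is given amplitude 2, matching v = 2*cos(2*pi*50*t)*cos(2*pi*1050*t) from the earlier derivation; with unit amplitudes the product would be half the size of the sum.

```python
from collections import defaultdict
from fractions import Fraction


def cosine_lines(freq_hz, amplitude=1):
    """Two-sided line spectrum of amplitude*cos(2*pi*freq*t), for freq != 0."""
    half = Fraction(amplitude) / 2
    return {freq_hz: half, -freq_hz: half}


def convolve_lines(x, y):
    """Convolve two line spectra (dicts of frequency -> line height)."""
    out = defaultdict(Fraction)
    for fx, ax in x.items():
        for fy, ay in y.items():
            out[fx + fy] += ax * ay
    return {f: a for f, a in out.items() if a != 0}


def add_lines(x, y):
    """Add two line spectra (superposition in time is addition in frequency)."""
    out = defaultdict(Fraction)
    for spec in (x, y):
        for f, a in spec.items():
            out[f] += a
    return {f: a for f, a in out.items() if a != 0}


# 2*cos(1050 Hz) times cos(50 Hz), done as a convolution of spectra
product = convolve_lines(cosine_lines(1050, 2), cosine_lines(50))
# cos(1000 Hz) plus cos(1100 Hz)
summed = add_lines(cosine_lines(1000), cosine_lines(1100))

print(sorted(product.items()))
assert product == summed
```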

-- 
swass@apple.com