[comp.dsp] Autocorrelation Pitch Tracker

gbell@sdcc13.ucsd.edu (Greg Bell) (03/27/91)

Does anybody out there have experience implementing the
autocorrelation method of pitch detection?  I have the algorithm in
front of me, and some code that works for a large sample, but am
having problems getting the thing to reliably work on shorter
samples of data (30 ms, where the signal repeats only twice).

Thanks!

-- 
-----------------------------------------------------------------------------
  Who:  Greg Bell                            Address:  gbell@ucsd.edu
 What:  EE hobbyist and major                  Where:  UC San Diego
-----------------------------------------------------------------------------

pankaj@bass.bu.edu (Pankaj Tyagi) (04/05/91)

In article <17823@sdcc6.ucsd.edu> gbell@sdcc13.ucsd.edu (Greg Bell) writes:
>Does anybody out there have experience implementing the
>autocorrelation method of pitch detection?  I have the algorithm in
>front of me, and some code that works for a large sample, but am
>having problems getting the thing to reliably work on shorter
>samples of data (30 ms, where the signal repeats only twice).
>
>Thanks!

	The autocorrelation algorithm is not well suited to pitch detection or
spectral estimation on short data records. Try the covariance method if you
are sure that the data will give a stable system (stability is not guaranteed
with the covariance method). The autocorrelation method fails on short data
records because of the edge effects of the implicit window it uses in the
algorithm's formulation.

	Well, if you are dealing with narrowband signals and have a short data
record, then the best bet is probably Burg's method. It is very much
like Levinson's recursion for autocorrelation, only faster, and it
does not suffer from the 'edge effect' that autocorrelation does.
	You can find useful information on Burg's method in
"Advanced Digital Signal Processing", edited by Oppenheim and Lim, in
chapter 2.

Pankaj Tyagi

---------------------------------------------------------------------
p.s. I had some C code for Burg's algorithm, but I can't locate it right now.
	I'll email it to you if I can find it.  pt
---------------------------------------------------------------------

dandb+@cs.cmu.edu (Dean Rubine) (04/06/91)

>In article <17823@sdcc6.ucsd.edu> gbell@sdcc13.ucsd.edu (Greg Bell) writes:
>>Does anybody out there have experience implementing the
>>autocorrelation method of pitch detection? 

    I've played around with these, but not for really short signals.

In article <78395@bu.edu.bu.edu> pankaj@bass.bu.edu (Pankaj Tyagi) writes:
>	The autocorrelation algorithm is not suited for pitch-detection/
>spectral estimation for short range data. Try covariance method if you are
>sure that the data is bound to give a stable system(stability in covariance
>method is not guaranteed).

    I may be wrong, but I think Greg Bell is not referring to the
autocorrelation method for doing LPC (as Pankaj Tyagi seems to imply),
but something much simpler.

    In the autocorrelation method of pitch detection, one actually computes
the autocorrelation of a signal, and then identifies those lags for which the
value of the autocorrelation is maximal.  A periodic signal will have peaks in
its autocorrelation at multiples of the period.  This method returns an
integer number of samples for the period; various methods can be tried to
interpolate for a more accurate period.
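
A minimal Python sketch of this peak-picking idea, with one possible interpolation step (a parabola fitted through the peak and its two neighbours). The function name, the `min_lag`/`max_lag` search bounds, and the choice of parabolic interpolation are assumptions for illustration, not something specified in the post. A lower bound on the lag is needed because lag 0 always gives the largest autocorrelation. The sum starts at `max_lag + 1` so that every lag uses the same number of terms and never reads outside the data.

```python
import math

def autocorr_period(x, min_lag, max_lag):
    """Return (integer lag, interpolated lag) of the autocorrelation peak.

    min_lag and max_lag are assumed search bounds chosen by the caller.
    """
    n = len(x)
    start = max_lag + 1  # every lag then sums the same n - start terms
    if min_lag < 1 or start >= n:
        raise ValueError("need 1 <= min_lag and max_lag + 1 < len(x)")

    def r(m):
        # Autocorrelation at lag m, using only samples inside x.
        return sum(x[i] * x[i - m] for i in range(start, n))

    best = max(range(min_lag, max_lag + 1), key=r)

    # Parabolic interpolation through the peak and its two neighbours.
    y0, y1, y2 = r(best - 1), r(best), r(best + 1)
    denom = y0 - 2.0 * y1 + y2
    offset = 0.5 * (y0 - y2) / denom if denom != 0.0 else 0.0
    return best, best + offset

# Example: a sine with a period of exactly 25 samples.
x = [math.sin(2.0 * math.pi * i / 25.0) for i in range(200)]
lag, refined = autocorr_period(x, min_lag=10, max_lag=49)
print(lag, refined)
```

If the peak lands on the edge of the search range, the interpolated value is not meaningful and the integer lag is the safer answer.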

    There's also a comb-filter-based method that's kind of a poor man's
autocorrelation.  Here, for various delays, the signal is subtracted from
a delayed copy of itself and the sum of absolute values of the result
computed.  This sum will show minima for delays equal to the multiples
of the period.  It is thus very similar to the autocorrelation method,
but requires no multiplies, so is often preferred.

    Summarizing, the comb filter method computes the m which minimizes

     	sum_over_n |x[n] - x[n-m]|

    The autocorrelation method looks for the m which maximizes

     	sum_over_n x[n] * x[n-m]

    I'll talk about the bounds on the summation shortly.

    As for the effectiveness of these methods on really short pieces of
data there's probably not much hope.  To attempt to make it work, I 
offer the following suggestions.  My comments apply to both the auto-
correlation and comb-filter methods.

	1 It seems to me that windowing will make it harder to determine
	  periodicity so should be avoided.
	 
	2 For a valid determination of the extrema the same number of terms
	  needs to be present in the sum for each m.

	3 No term in the sum should ever use an x outside the range of
	  samples you have.  (In other words, don't pad your input with
	  zeros or anything else).

	4 The number of terms in the sum has to be at least the number of
	  samples in one period of the signal.  Since this is unknown, a
	  maximum period (minimum pitch) must be assumed. 

    Assuming the length of the signal is N, the above considerations imply
that the maximum period that may be detected is M=N/2.  Thus I suggest 
you look for 1 <= m <= M that minimizes

     	sum(from n=M to 2M-1) |x[n] - x[n-m]|
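
The search above can be sketched in Python roughly as follows. The function name is hypothetical. The sum always runs over n = M .. 2M-1, so each lag uses exactly M terms and no sample outside the frame is touched.

```python
def amdf_period(x):
    """Return the lag m in 1..M (M = len(x) // 2) that minimizes
    sum over n = M .. 2M-1 of |x[n] - x[n-m]|."""
    big_m = len(x) // 2
    if big_m < 1:
        raise ValueError("need at least two samples")

    def amdf(m):
        # Same M terms for every lag; x[n - m] stays inside the frame
        # because n >= M and m <= M.
        return sum(abs(x[n] - x[n - m]) for n in range(big_m, 2 * big_m))

    # min() returns the first lag among ties, i.e. the shortest one.
    return min(range(1, big_m + 1), key=amdf)

# Example: an 8-sample pattern repeated four times (N = 32, M = 16).
pattern = [0, 3, 5, 2, -1, -4, -2, 1]
print(amdf_period(pattern * 4))
```

On real, noisy data the minimum at the true period is not exactly zero, so a multiple of the period can win by a small margin.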

    This method could be refined further: once you find an m that gives a
global minimum you could check m/2, m/3, m/4, etc. to see if any are local
minima, to avoid settling on a period that's a multiple of the real period. 
Anyway, I don't have much faith in these methods.  Pankaj Tyagi's post could
be interpreted as a suggestion that maximum entropy or linear prediction
methods be tried for pitch detection on short samples.  I don't know if they
will work well or not.  I do know they're much more complicated than the
autocorrelation methods discussed so far, and might be out of the question if
you're trying to do some real time thing.   
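
The submultiple check suggested at the start of this paragraph might look something like the sketch below. It assumes the AMDF values are already available in a list indexed by lag. The `slack` parameter (how much worse than the global minimum a shorter lag may be and still be accepted) is an added assumption; the post only asks whether m/2, m/3, ... are local minima.

```python
def prefer_shorter_period(scores, m_best, slack):
    """scores[m] is the AMDF value at lag m (scores[0] is unused).

    Checks m_best/2, m_best/3, ... and returns the shortest candidate
    that is a local minimum and within `slack` of the global minimum.
    """
    chosen = m_best
    k = 2
    while True:
        cand = round(m_best / k)
        if cand < 2:  # need a neighbour on each side at lags >= 1
            break
        is_local_min = (scores[cand] <= scores[cand - 1]
                        and scores[cand] <= scores[cand + 1])
        if is_local_min and scores[cand] <= scores[m_best] + slack:
            chosen = cand
        k += 1
    return chosen

# Example: true period 4, but lag 12 happens to score slightly lower.
scores = [0, 5, 7, 5, 0.2, 5, 7, 5, 0.3, 5, 7, 5, 0.1, 5, 7, 5, 0.2]
print(prefer_shorter_period(scores, 12, slack=0.5))
```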

    Well, I've babbled on much too long, especially given that I don't
really know what the particular problem is and I'm not that sure the
term "autocorrelation method of pitch detection" means to you what it means
to me.

    Hope this helps anyway.  Thanks for giving me the opportunity to
procrastinate.

	Dean


--
ARPA:       Dean.Rubine@CS.CMU.EDU	
PHONE:	    412-268-2613		[ Free if you call from work ]
US MAIL:    Computer Science Dept / Carnegie Mellon U / Pittsburgh PA 15213
DISCLAIMER: My employer wishes I would stop posting and do some work.

malcolm@Apple.COM (Malcolm Slaney) (04/07/91)

>>>Does anybody out there have experience implementing the
>>>autocorrelation method of pitch detection? 
>
>    I've played around with these, but not for really short signals.
>

Everybody here has really been talking about periodicity detectors and NOT
pitch detection.

It is important to realize that pitch is a perceptual quality.  An official
definition of pitch (ANSI?) defines pitch as that quantity that humans would
perceive and try to sing to.  It is not defined in terms of the periodicities
of the signal....although if the signal is periodic then the pitch is
usually the same as the fundamental.  Once you add a bit of noise then your
signal is not periodic.

That said, you need more than two cycles of a signal to perceive its pitch!
I just tried it.  I synthesized (with MatLab) a signal that looks like
	1 second of silence 
	a small number of cycles of 220Hz sine wave
	1 second of silence
If there is only one or two cycles then all one hears is a click.  There is
nothing in this sound that would let me hear a pitch.  Not until I get to 
four or five cycles does it start to sound musical so I can assign a pitch
to it.  Try it, if you don't believe me.
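
For anyone who wants to try this without MatLab, here is a rough Python equivalent of the test signal. The 44.1 kHz sample rate, the function names, and the WAV output helper are assumptions; the post only specifies the 220 Hz tone and the one-second silences.

```python
import math
import struct
import wave

def tone_burst(cycles, freq_hz=220.0, rate_hz=44100, silence_s=1.0):
    """Silence, then `cycles` periods of a sine wave, then silence."""
    pad = [0.0] * int(rate_hz * silence_s)
    n_tone = round(cycles * rate_hz / freq_hz)
    tone = [math.sin(2.0 * math.pi * freq_hz * i / rate_hz)
            for i in range(n_tone)]
    return pad + tone + pad

def write_wav(path, samples, rate_hz=44100):
    """Write mono 16-bit PCM, scaled to 80% of full range."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate_hz)
        w.writeframes(b"".join(
            struct.pack("<h", int(32767 * 0.8 * s)) for s in samples))

burst = tone_burst(2)
print(len(burst))
# write_wav("burst2.wav", burst)  # uncomment to listen to it
```

Calling `tone_burst` with 1, 2, 4, 5, ... cycles and listening to each file reproduces the comparison described above.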

It is also important to realize that pitch is not a unique quantity.  There
are many examples where it is possible to perceive more than one pitch.  
Shepard tones (on the ASA Auditory Demonstrations CD) and creaky voice are
two common examples.  The engineering world likes to reduce pitch to a 
single number but that just isn't realistic.  If your system outputs a 
single number then it is probably a periodicity detector and not a pitch
detector.

If you want to know more about pitch there is a book that reviews most of
the pre-1983 literature
	Hess, Wolfgang. PITCH DETERMINATION OF SPEECH SIGNALS (Berlin ;
	       Springer-Verlag, 1983)

Within the psychoacoustics world the Journal of the Acoustical Society
of America has a few articles a year talking about pitch.  There seem
to be two camps: the place-based people, led by Julius Goldstein, and the
autocorrelation people.  I think both of these approaches are flawed; a
hybrid approach is described in my 1990 ICASSP paper and in an upcoming
JASA article by Ray Meddis.

						Malcolm Slaney
						Apple Perception Group

dandb+@cs.cmu.edu (Dean Rubine) (04/07/91)

In article <51258@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
>Everybody here has really been talking about periodicity detectors and NOT
>pitch detection.

    Come, now.  I think most of us realize that pitch is a perceptual
quantity, or would if we thought or read about it a bit.  That doesn't
lessen the desire to solve a practical problem: determine the instantaneous
fundamental frequency at successive points in a quasi-periodic signal.  Of
course even this problem statement is too vague, but it's probably good
enough, because it's usually considered as a means to some end.  For example,
the original poster might be building a real-time transcription system for
vocalists.  Or a system that listens to a trumpet and synthesizes an
accompaniment.  Or a better pitch-to-MIDI converter. Note I've used the
term "pitch" here, even though I am not overly concerned about perception.
I'm just going along with the rest of the world, as was the original poster.
Face it, these things are called "pitch detectors" even though the use of the
term "pitch" may be technically incorrect.  While pitch detectors may be
judged on how often they report the same pitch as a skilled human would,
in most applications the hard cases, where perception is ambiguous, can be
ignored. 

   As for his short signal not having a perceptible pitch, that may be true
but that still doesn't help him to solve his problem.  He knows what he means
by pitch, as do the rest of us, even Malcolm Slaney I suspect. 


gbell@sdcc13.ucsd.edu (Greg Bell) (04/07/91)

In article <1991Apr6.062906.11886@cs.cmu.edu> dandb+@cs.cmu.edu (Dean Rubine) writes:
>
>    I may be wrong, but I think Greg Bell is not referring to the
>autocorrelation method for doing LPC (as Pankaj Tyagi seems to imply),
>but something much simpler.

Exactly... I wasn't sure what PT was talking about!  This is
actually part of an LPC analysis package, but you need to determine
whether a signal is periodic or not, and if so, its period in order
to feed this info to one of TI's LPC synthesizers.
>interpolate for a more accurate period.

>    There's also a comb-filter-based method that's kind of a poor man's
>autocorrelation.  Here, for various delays, the signal is subtracted from

Also called, according to my book, AMDF for Average Magnitude
Difference Function.

>	1 It seems to me that windowing will make it harder to determine
>	  periodicity so should be avoided.

It does make it harder to determine the periodicity, but apparently
it is necessary.  There are methods that do not require windowing,
but the autocorr. method is not one of them.

>	 
>	2 For a valid determination of the extrema the same number of terms
>	  needs to be present in the sum for each m.
>

Hmm... good point.  My book says each sum goes from n=1 to N where N
is the length of the sample.  This didn't make sense since one of
the terms is s[n+k] so you go out of the range of samples.  So, I
made my sum go from n=1 to N-k.  This doesn't follow what you are
saying, but it did work well for a larger segment.  I'll try your
guidelines for my troublesome short segments.
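
A small Python sketch of the difference between the two summation ranges (function names are hypothetical; indices are 0-based here, where the book's are 1-based). With the sum running to N-k, the number of terms shrinks as the lag k grows, so larger lags are systematically scaled down. On a long segment N-k is close to N and the effect is minor, which may be why it worked there; on a 30 ms frame holding only two periods, the lag equal to the period loses a large share of its terms.

```python
def autocorr_shrinking(x, k):
    """Sum of x[n] * x[n+k] for n = 0 .. N-k-1: N - k terms."""
    return sum(x[n] * x[n + k] for n in range(len(x) - k))

def autocorr_fixed(x, k, max_lag):
    """Sum of x[n] * x[n+k] for n = 0 .. N-max_lag-1: the same
    N - max_lag terms for every lag k <= max_lag."""
    return sum(x[n] * x[n + k] for n in range(len(x) - max_lag))

# With an all-ones signal the result is just the number of terms.
ones = [1.0] * 10
print(autocorr_shrinking(ones, 1), autocorr_shrinking(ones, 4))
print(autocorr_fixed(ones, 1, 5), autocorr_fixed(ones, 4, 5))
```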

I'll have to read your guidelines for the number of sum terms later
when I can look 'em over a little more carefully.

Thanks for the input!


gbell@sdcc13.ucsd.edu (Greg Bell) (04/07/91)

In article <51258@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
>
>Everybody here has really been talking about periodicity detectors and NOT
>pitch detection.
>

I agree except that the text I'm using uses the two terms
interchangeably.  Maybe for a speech signal, the frequency and pitch
are the same.  I'm flying by the seat of my pants on that one, but
that would explain it.

The segment to be presented to the ear is, of course, a lot longer
than the 30 ms of signal I'm processing at a time.  But it's
necessary to process each chunk so that you know when the original
signal's pitch changes, or when it becomes pitchless (i.e. an
unvoiced sound such as "s").

By the way, since I keep mentioning the book I'm using, I might as
well give it credit:  it's Practical Approaches to Speech Coding by
Panos E. Papamichalis.  

I'll check out your recommendation for the pitch detection book.


tomh.bbs@shark.cs.fau.edu (Tom Holroyd) (04/10/91)

> I agree except that the text I'm using uses the two terms
> interchangeably.
Doesn't mean *you* have to.
> Maybe for a speech signal, the frequency and pitch
> are the same.
No, they aren't.  As Malcolm Slaney has pointed out, pitch is not
even an invariant: many experiments have been conducted which show
that identical stimuli can be perceived as being different*.
Sort of like when you walk from a very loud environment to a quiet one,
voices seem louder than they did in the loud environment.  This is only
an analogy, but the principle here is that what you perceive depends on
the background, your recent past, your auditory organs, etc.

* For example, work done in our lab by Janice Giangrande using pairs
of Shepard tones: the perceived pitch difference between the pair of
tones changes depending on whether the tones are part of an ascending
sequence or a descending one.

Tom Holroyd
Florida Atlantic University
Center for Complex Systems
tomh@bambi.ccs.fau.edu

doug@eris.berkeley.edu (Doug Merritt) (04/13/91)

In article <51258@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
>That said, you need more than two cycles of a signal to perceive its pitch!
>I just tried it.  I synthesized (with MatLab) a signal that looks like
>	1 second of silence 
>	a small number of cycles of 220Hz sine wave
>	1 second of silence
>If there is only one or two cycles then all one hears is a click.  There is
>nothing in this sound that would let me hear a pitch.  Not until I get to 
>four or five cycles does it start to sound musical so I can assign a pitch
>to it.  Try it, if you don't believe me.

This is an interesting subject. I have to wonder whether your experiment
might be faulty, though. What if you're simply uncovering funny behavior
in your sound system rather than in your ear, for instance? I think for
something like this you'd want to verify that the sound you intended is
really being created; try a mike & a digitizer to bring it back into your
system.

Besides all that, consider the Fourier domain of this. My intuition
is flaky here, but it seems to me that since the period of the boxcar
window is almost equal to the period of the sine, the effect in the Fourier
domain should be almost identical to a single pulse of a square wave in
the first place, no? Hence a click.
	Doug
-- 
--
Doug Merritt		doug@eris.berkeley.edu (ucbvax!eris!doug)
		or	uunet.uu.net!crossck!dougm

malcolm@Apple.COM (Malcolm Slaney) (04/15/91)

doug@eris.berkeley.edu (Doug Merritt) writes:
>This is an interesting subject. I have to wonder whether your experiment
>might be faulty, though. What if you're simply uncovering funny behavior
>in your sound system rather than in your ear, for instance? 

Well, nobody has ever accused the native Macintosh sound system of being
high fidelity but the effect is true.  I quote from the booklet that 
accompanies the Auditory Demonstrations CD (as done by the ASA):

	How long must a tone be heard in order to have an identifiable pitch? 
	Early experiments by Savart (1830) indicated that a sense of pitch 
	develops after only two cycles. Very brief tones are described as 
	"clicks," but as the tones lengthen, the clicks take on a sense of 
	pitch which increases upon further lengthening. 

	It has been suggested that the dependence of pitch salience on 
	duration follows a sort of "acoustic uncertainty principle," 

				    Delta f * Delta t = K, 

	where Delta f is the uncertainty in frequency and Delta t is the 
	duration of a tone burst.  K, which can be as short as 0.1 (Majernik
	and Kaluzny, 1979), appears to depend upon intensity and amplitude 
	envelope (Ronken, 1971). The actual pitch appears to have little or 
	no dependence upon duration (Doughty and Garner, 1948; Rossing
	and Houtsma, 1986). 

	In this demonstration, we present tones of 300, 1000, and 3000 Hz in 
	bursts of 1, 2, 4, 8, 16, 32, 64, and 128 periods. How many periods 
	are necessary to establish a sense of pitch? 

	Commentary

	"In this demonstration, three tones of increasing durations are 
	presented.  Notice the change from a click to a tone. Sequences are 
	presented twice." 
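
As a rough worked example of the relation quoted above, the Python sketch below evaluates Delta f = K / Delta t for the demonstration's tones. It assumes K = 0.1, the smallest value the booklet mentions, so the results are best read as optimistic lower bounds on the frequency uncertainty. The function name is hypothetical.

```python
def pitch_uncertainty_hz(freq_hz, periods, k=0.1):
    """Delta f = K / Delta t for a burst of `periods` cycles at freq_hz."""
    duration_s = periods / freq_hz  # Delta t
    return k / duration_s

for freq_hz in (300.0, 1000.0, 3000.0):
    for periods in (1, 2, 4, 8, 16, 32, 64, 128):
        delta_f = pitch_uncertainty_hz(freq_hz, periods)
        print(f"{freq_hz:6.0f} Hz, {periods:3d} periods: "
              f"Delta f >= {delta_f:8.3f} Hz")
```

Under this assumption a single period of a 300 Hz tone gives an uncertainty of about 30 Hz, or 10% of the frequency, and the relative uncertainty halves each time the number of periods doubles.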

>Besides all that, consider the Fourier domain of this. 
Bad move.  First the signals I am talking about were synthesized in the
time domain...Mr. Fourier wasn't involved.

Second, there is a famous hearing researcher (von Bekesy?) who once said
something like
	Dead cats and Fourier transforms have harmed hearing science more
	than anything else.
Dead cats are a no-no because it is now known that most of what makes the
ear work are the non-linearities and active gain control mechanisms in the
ear.  Taking measurements from a cochlea that is not living gives you a lot
of meaningless data.

Fourier transforms are not good because the ear is non-linear.  Fourier 
theory is great for linear systems but the ear is far from linear.  Sure,
we all talk about frequency and such but one must remember that it doesn't
always make sense in the ear.  A lot of people argue about how the ear works
but nobody thinks it computes a Fourier transform.

								Malcolm

doug@eris.berkeley.edu (Doug Merritt) (04/16/91)

In article <51534@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
>	Early experiments by Savart (1830) indicated that a sense of pitch 
>	develops after only two cycles. Very brief tones are described as 
>	"clicks," but as the tones lengthen, the clicks take on a sense of 

This was interesting, thanks.

>Bad move.  First the signals I am talking about were synthesized in the
>time domain...Mr. Fourier wasn't involved.
>
>Fourier transforms are not good because the ear is non-linear.  Fourier 
>theory is great for linear systems but the ear is far from linear.

I understand, I've read your "Lyon's Cochlea Model" paper (quite
interesting, BTW). But that's not the point. You're talking about the
ear and perception, I meant to talk about the sound itself.

What I meant (and didn't say very well) is that you can always look at
the Fourier domain for information about the sound itself; it makes no
difference whether it was synthesized in that domain or not (as I'm
sure you know). And what I predict you'll find is that the result is
very close to what you would see for a single cycle of a square wave,
which means that it would take only a small perturbation to transform
the one into the other.
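
One way to check this prediction numerically is to compare the magnitude spectra of a single sine cycle and a single square-wave cycle. The Python sketch below uses a plain O(N^2) DFT so it needs no libraries. The cycle length (64 samples), the zero-padded transform size (512), and the function name are arbitrary assumptions. It only computes the two spectra; it does not settle how close they are perceptually.

```python
import cmath
import math

def dft_magnitudes(x, n_fft):
    """Magnitudes of DFT bins 0 .. n_fft/2 of x, zero-padded to n_fft."""
    padded = list(x) + [0.0] * (n_fft - len(x))
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n_fft)
                    for i, v in enumerate(padded)))
            for k in range(n_fft // 2 + 1)]

n = 64  # samples in one cycle
sine_cycle = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
square_cycle = [1.0 if i < n // 2 else -1.0 for i in range(n)]

sine_spec = dft_magnitudes(sine_cycle, 512)
square_spec = dft_magnitudes(square_cycle, 512)

# Print the first few bins side by side, each normalized to its own peak.
for k in range(0, 40, 4):
    print(k, round(sine_spec[k] / max(sine_spec), 3),
          round(square_spec[k] / max(square_spec), 3))
```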

And in fact, the nonlinear characteristics of the ear may well perform
exactly such a perturbation. So far from being irrelevant, looking at
this in the Fourier domain may explain *why* the nonlinearity of the
ear produces perception of a click rather than a pitch.

Fair enough?
	Doug