gbell@sdcc13.ucsd.edu (Greg Bell) (03/27/91)
Does anybody out there have experience implementing the autocorrelation method of pitch detection? I have the algorithm in front of me, and some code that works for a large sample, but I am having problems getting it to work reliably on shorter samples of data (30 ms, where the signal repeats only twice).

Thanks!

--
Who: Greg Bell                 Address: gbell@ucsd.edu
What: EE hobbyist and major    Where: UC San Diego
pankaj@bass.bu.edu (Pankaj Tyagi) (04/05/91)
In article <17823@sdcc6.ucsd.edu> gbell@sdcc13.ucsd.edu (Greg Bell) writes:
>Does anybody out there have experience implementing the
>autocorrelation method of pitch detection? I have the algorithm in
>front of me, and some code that works for a large sample, but am
>having problems getting the thing to reliably work on shorter
>samples of data (30ms, where the signal only repeats only twice).

The autocorrelation algorithm is not suited to pitch detection or spectral estimation on short data records. Try the covariance method if you are sure that the data is bound to give a stable system (stability is not guaranteed in the covariance method). The autocorrelation method fails on short records because of the edge effects of the implicit window it uses in the formulation of the algorithm.

If you are dealing with narrowband signals and have a short data record, then the best bet is probably Burg's method. It is very much like Levinson's recursion for autocorrelation, only faster, and it does not suffer from the edge effect that autocorrelation does. You can find useful information on Burg's method in chapter 2 of "Advanced Digital Signal Processing", edited by Oppenheim and Lim.

Pankaj Tyagi

p.s. I had C code for Burg's algorithm but I can't trace it now. I'll email it to you if I can find it.
pt
dandb+@cs.cmu.edu (Dean Rubine) (04/06/91)
>In article <17823@sdcc6.ucsd.edu> gbell@sdcc13.ucsd.edu (Greg Bell) writes:
>>Does anybody out there have experience implementing the
>>autocorrelation method of pitch detection?

I've played around with these, but not for really short signals.

In article <78395@bu.edu.bu.edu> pankaj@bass.bu.edu (Pankaj Tyagi) writes:
> The autocorrelation algorithm is not suited for pitch-detection/
>spectral estimation for short range data. Try covariance method if you are
>sure that the data is bound to give a stable system(stability in covariance
>method is not guaranteed).

I may be wrong, but I think Greg Bell is not referring to the autocorrelation method for doing LPC (as Pankaj Tyagi seems to imply), but to something much simpler.

In the autocorrelation method of pitch detection, one actually computes the autocorrelation of a signal, and then identifies those lags for which the value of the autocorrelation is maximal. A periodic signal will have peaks in its autocorrelation at multiples of the period. This method returns an integer number of samples for the period; various methods can be tried to interpolate for a more accurate period.

There's also a comb-filter-based method that's kind of a poor man's autocorrelation. Here, for various delays, the signal is subtracted from a delayed copy of itself and the sum of absolute values of the result is computed. This sum will show minima for delays equal to multiples of the period. It is thus very similar to the autocorrelation method, but requires no multiplies, so it is often preferred.

Summarizing, the comb filter method computes the m which minimizes

    sum_over_n |x[n] - x[n-m]|

The autocorrelation method looks for the m which maximizes

    sum_over_n x[n] * x[n-m]

I'll talk about the bounds on the summation shortly.

As for the effectiveness of these methods on really short pieces of data, there's probably not much hope. To attempt to make it work, I offer the following suggestions.
My comments apply to both the autocorrelation and comb-filter methods.

1. It seems to me that windowing will make it harder to determine periodicity, so it should be avoided.

2. For a valid determination of the extrema, the same number of terms needs to be present in the sum for each m.

3. No term in the sum should ever use an x outside the range of samples you have. (In other words, don't pad your input with zeros or anything else.)

4. The number of terms in the sum has to be at least the number of samples in one period of the signal. Since this is unknown, a maximum period (minimum pitch) must be assumed.

Assuming the length of the signal is N, the above considerations imply that the maximum period that may be detected is M = N/2. Thus I suggest you look for the m, 1 <= m <= M, that minimizes

    sum(from n=M to 2M-1) |x[n] - x[n-m]|

(with samples numbered x[0] through x[N-1], so that x[n-m] never falls outside the data).

This method could be refined further: once you find an m that gives a global minimum, you could check m/2, m/3, m/4, etc. to see if any are local minima, to avoid settling on a period that's a multiple of the real period.

Anyway, I don't have much faith in these methods. Pankaj Tyagi's post could be interpreted as a suggestion that maximum entropy or linear prediction methods be tried for pitch detection on short samples. I don't know if they will work well or not. I do know they're much more complicated than the autocorrelation methods discussed so far, and might be out of the question if you're trying to do some real-time thing.

Well, I've babbled on much too long, especially given that I don't really know what the particular problem is and I'm not that sure the term "autocorrelation method of pitch detection" means to you what it means to me. Hope this helps anyway. Thanks for giving me the opportunity to procrastinate.

Dean
--
ARPA: Dean.Rubine@CS.CMU.EDU
PHONE: 412-268-2613 [ Free if you call from work ]
US MAIL: Computer Science Dept / Carnegie Mellon U / Pittsburgh PA 15213
DISCLAIMER: My employer wishes I would stop posting and do some work.
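The fixed-length comb-filter sum suggested in the post above can be sketched in Python roughly as follows. This is only an illustration, not code from the thread: the function names, the 0-based indexing, and in particular the `tolerance` used to decide whether a submultiple m/k is "good enough" are assumptions made for this sketch.

```python
import math


def amdf_scores(x):
    """Comb-filter score for every lag m in 1..M, with M = len(x) // 2.

    Every score sums exactly M terms, n = M .. 2M-1, so the lags are
    directly comparable and no sample outside x is ever read.
    """
    M = len(x) // 2
    return {m: sum(abs(x[n] - x[n - m]) for n in range(M, 2 * M))
            for m in range(1, M + 1)}


def is_local_min(scores, m):
    """True if lag m has both neighbours and scores no worse than either."""
    if m - 1 not in scores or m + 1 not in scores:
        return False
    return scores[m] <= scores[m - 1] and scores[m] <= scores[m + 1]


def estimate_period(x, tolerance=0.1):
    """Lag (in samples) with the smallest score, preferring submultiples.

    ASSUMPTION of this sketch: a submultiple m/k replaces the global
    minimum only if it is a local minimum AND its score lies within
    tolerance * (largest score) of the global minimum.
    """
    scores = amdf_scores(x)
    if not scores:
        raise ValueError("need at least two samples")
    best = min(scores, key=scores.get)
    slack = tolerance * max(scores.values())
    period = best
    k = 2
    while best / k >= 2:
        cand = round(best / k)
        if scores[cand] <= scores[best] + slack and is_local_min(scores, cand):
            period = cand  # a larger k gives a shorter candidate period
        k += 1
    return period


# A record holding exactly two periods of a 20-sample sawtooth.
short = [n % 20 for n in range(40)]
print(estimate_period(short))

# A longer record: fundamental plus second harmonic, period 20 samples.
sine = [math.sin(2 * math.pi * n / 20) + 0.5 * math.sin(4 * math.pi * n / 20)
        for n in range(100)]
print(estimate_period(sine))
```

With N samples the longest period this can report is N // 2, which is the M = N/2 limit derived in the post; the local-minimum test alone can be fooled by a strong harmonic, which is why the sketch adds the tolerance check.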
malcolm@Apple.COM (Malcolm Slaney) (04/07/91)
>>>Does anybody out there have experience implementing the
>>>autocorrelation method of pitch detection?
>
> I've played around with these, but not for really short signals.

Everybody here has really been talking about periodicity detectors and NOT pitch detection. It is important to realize that pitch is a perceptual quality. An official definition of pitch (ANSI?) defines pitch as that quantity that humans would perceive and try to sing to. It is not defined in terms of the periodicities of the signal, although if the signal is periodic then the pitch is usually the same as the fundamental. Once you add a bit of noise, your signal is no longer periodic.

That said, you need more than two cycles of a signal to perceive its pitch! I just tried it. I synthesized (with MatLab) a signal that looks like

    1 second of silence
    a small number of cycles of 220Hz sine wave
    1 second of silence

If there are only one or two cycles then all one hears is a click. There is nothing in this sound that would let me hear a pitch. Not until I get to four or five cycles does it start to sound musical so that I can assign a pitch to it. Try it, if you don't believe me.

It is also important to realize that pitch is not a unique quantity. There are many examples where it is possible to perceive more than one pitch. Shepard tones (on the ASA Auditory Demonstrations CD) and creaky voice are two common examples. The engineering world likes to reduce pitch to a single number, but that just isn't realistic. If your system outputs a single number then it is probably a periodicity detector and not a pitch detector.

If you want to know more about pitch, there is a book that reviews most of the pre-1983 literature:

    Hess, Wolfgang. PITCH DETERMINATION OF SPEECH SIGNALS
    (Berlin ; Springer-Verlag, 1983)

Within the psychoacoustics world, the Journal of the Acoustical Society of America has a few articles a year talking about pitch.
There seem to be two camps: the place-based people, led by Julius Goldstein, and the autocorrelation people. I think both of these approaches are flawed; a hybrid approach was described in my 1990 ICASSP paper and in an upcoming JASA article by Ray Meddis.

Malcolm Slaney
Apple Perception Group
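For anyone who wants to try the listening test described above without MatLab, a rough Python equivalent might look like the sketch below. The 44100 Hz sample rate, the 16-bit mono WAV output, the 0.8 amplitude scaling, and the function and file names are all assumptions of this sketch; the post only specifies one second of silence, a few cycles of a 220 Hz sine, and another second of silence.

```python
import math
import struct
import wave


def tone_burst(cycles, freq_hz=220.0, rate_hz=44100, silence_s=1.0):
    """Silence, then `cycles` periods of a sine at freq_hz, then silence."""
    pad = [0.0] * int(round(silence_s * rate_hz))
    n_burst = int(round(cycles * rate_hz / freq_hz))
    burst = [math.sin(2.0 * math.pi * freq_hz * n / rate_hz)
             for n in range(n_burst)]
    return pad + burst + pad


def write_wav(path, samples, rate_hz=44100):
    """Write samples in [-1, 1] as a 16-bit mono WAV file."""
    frames = b"".join(struct.pack("<h", int(0.8 * 32767 * s)) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate_hz)
        w.writeframes(frames)


if __name__ == "__main__":
    # Hypothetical file names; listen to each and note where a pitch appears.
    for cycles in (1, 2, 4, 8, 16):
        write_wav(f"burst_{cycles}_cycles.wav", tone_burst(cycles))
```

Because 44100 / 220 is not an integer, the burst length is rounded to the nearest sample, so the last cycle may end a fraction of a sample early or late.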
dandb+@cs.cmu.edu (Dean Rubine) (04/07/91)
In article <51258@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
>Everybody here has really been talking about periodicity detectors and NOT
>pitch detection.

Come, now. I think most of us realize that pitch is a perceptual quantity, or would if we thought or read about it a bit. That doesn't lessen the desire to solve a practical problem: determine the instantaneous fundamental frequency at successive points in a quasi-periodic signal. Of course even this problem statement is too vague, but it's probably good enough, because it's usually considered as a means to some end. For example, the original poster might be building a real-time transcription system for vocalists. Or a system that listens to a trumpet and synthesizes an accompaniment. Or a better pitch-to-MIDI converter.

Note I've used the term "pitch" here, even though I am not overly concerned about perception. I'm just going along with the rest of the world, as was the original poster. Face it, these things are called "pitch detectors" even though the use of the term "pitch" may be technically incorrect. While pitch detectors may be judged on how often they report the same pitch as a skilled human would, in most applications the hard cases, where perception is ambiguous, can be ignored.

As for his short signal not having a perceptible pitch, that may be true, but it still doesn't help him solve his problem. He knows what he means by pitch, as do the rest of us, even Malcolm Slaney I suspect.
gbell@sdcc13.ucsd.edu (Greg Bell) (04/07/91)
In article <1991Apr6.062906.11886@cs.cmu.edu> dandb+@cs.cmu.edu (Dean Rubine) writes:
>
> I may be wrong, but I think Greg Bell is not referring to the
>autocorrelation method for doing LPC (as Pankaj Tyagi seems to imply),
>but something much simpler.

Exactly... I wasn't sure what PT was talking about! This is actually part of an LPC analysis package, but you need to determine whether a signal is periodic or not and, if so, its period, in order to feed this info to one of TI's LPC synthesizers.

>interpolate for a more accurate period.
> There's also a comb-filter-based method that's kind of a poor man's
>autocorrelation. Here, for various delays, the signal is subtracted from

Also called, according to my book, AMDF, for Average Magnitude Difference Function.

> 1 It seems to me that windowing will make it harder to determine
> periodicity so should be avoided.

It does make it harder to determine the periodicity, but apparently it is necessary. There are methods that do not require windowing, but the autocorrelation method is not one of them.

>
> 2 For a valid determination of the extrema the same number of terms
> needs to be present in the sum for each m.
>

Hmm... good point. My book says each sum goes from n=1 to N, where N is the length of the sample. This didn't make sense, since one of the terms is s[n+k], so you go out of the range of samples. So I made my sum go from n=1 to N-k. This doesn't follow what you are saying, but it did work well for a larger segment. I'll try your guidelines for my troublesome short segments.

I'll have to read your guidelines for the number of sum terms later, when I can look 'em over a little more carefully. Thanks for the input!
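The difference between the two summation ranges discussed above can be seen on a record that holds exactly two periods. The sketch below is an illustration only; the function names and 0-based indexing are assumptions. It compares a sum over n = 0 .. N-k-1 (the 0-based form of n=1 to N-k, whose term count shrinks as the lag k grows) with a sum that always covers the second half of the record.

```python
import math


def acf_shrinking(x, k):
    """Sum of x[n] * x[n+k] for n = 0 .. N-k-1: N - k terms, fewer as k grows."""
    return sum(x[n] * x[n + k] for n in range(len(x) - k))


def acf_fixed(x, k):
    """Sum of x[n] * x[n-k] for n = M .. 2M-1, M = len(x) // 2: always M terms."""
    M = len(x) // 2
    if not 0 <= k <= M:
        raise ValueError("lag must lie in 0 .. len(x) // 2")
    return sum(x[n] * x[n - k] for n in range(M, 2 * M))


period = 20
x = [math.sin(2 * math.pi * n / period) for n in range(2 * period)]

# Height of the peak at the true period relative to the zero-lag value.
ratio_shrinking = acf_shrinking(x, period) / acf_shrinking(x, 0)
ratio_fixed = acf_fixed(x, period) / acf_fixed(x, 0)
print(round(ratio_shrinking, 3), round(ratio_fixed, 3))  # 0.5 1.0
```

With the shrinking sum, the peak at the true period reaches only half the zero-lag value simply because half the terms are missing, which may be part of why the method looks fine on long segments and unreliable on short ones.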
gbell@sdcc13.ucsd.edu (Greg Bell) (04/07/91)
In article <51258@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
>
>Everybody here has really been talking about periodicity detectors and NOT
>pitch detection.
>

I agree, except that the text I'm using uses the two terms interchangeably. Maybe for a speech signal, the frequency and pitch are the same. I'm flying by the seat of my pants on that one, but that would explain it.

The segment to be presented to the ear is, of course, a lot longer than the 30 ms of signal I'm processing at a time. But it's necessary to process each chunk so that you know when the original signal's pitch changes, or when it becomes pitchless (i.e. an unvoiced sound such as "s").

By the way, since I keep mentioning the book I'm using, I might as well give it credit: it's Practical Approaches to Speech Coding by Panos E. Papamichalis. I'll check out your recommendation for the pitch detection book.
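Chunk-by-chunk processing of the kind described above might be sketched as follows. None of this comes from the thread or from the book mentioned: the normalized correlation measure, the 0.5 voicing threshold, the `min_lag` parameter (shortest period to consider), and all function names are assumptions chosen for illustration.

```python
import math


def frame_voicing(frame, min_lag, threshold=0.5):
    """Return (voiced, lag) for one frame.

    The second half of the frame is correlated against equally long,
    delayed slices of the same frame, and the correlation is normalized
    by the energies of both slices. `threshold` is an arbitrary
    placeholder, not a recommended value.
    """
    M = len(frame) // 2
    ref = frame[M:2 * M]
    e_ref = sum(v * v for v in ref)
    if e_ref == 0.0:
        return False, 0  # silence: nothing to correlate
    best_lag, best_r = 0, 0.0
    for m in range(min_lag, M + 1):
        shifted = frame[M - m:2 * M - m]
        e_shift = sum(v * v for v in shifted)
        if e_shift == 0.0:
            continue
        r = sum(a * b for a, b in zip(ref, shifted)) / math.sqrt(e_ref * e_shift)
        if r > best_r:
            best_lag, best_r = m, r
    return (True, best_lag) if best_r >= threshold else (False, 0)


def track_frames(signal, frame_len, min_lag, threshold=0.5):
    """Apply frame_voicing to consecutive, non-overlapping frames."""
    return [frame_voicing(signal[i:i + frame_len], min_lag, threshold)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


# 480 samples of a zero-mean sawtooth with period 40, then 480 of silence.
signal = [(n % 40) - 19.5 for n in range(480)] + [0.0] * 480
print(track_frames(signal, frame_len=240, min_lag=20))
```

At an assumed 8 kHz sample rate a 30 ms frame is 240 samples, so the longest period this can report is 120 samples (about 67 Hz), which is the "signal repeats only twice" situation from the original question.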
tomh.bbs@shark.cs.fau.edu (Tom Holroyd) (04/10/91)
> I agree except that the text I'm using uses the two terms
> interchangably.

Doesn't mean *you* have to.

> Maybe for a speech signal, the frequency and pitch
> are the same.

No, they aren't. As Malcolm Slaney has pointed out, pitch is not even an invariant: many experiments have been conducted which show that identical stimuli can be perceived as being different*. It is sort of like when you walk from a very loud environment to a quiet one and voices seem louder than they did in the loud environment. This is only an analogy, but the principle here is that what you perceive depends on the background, your recent past, your auditory organs, etc.

* For example, work done in our lab by Janice Giangrande using pairs of Shepard tones: the perceived pitch difference between the pair of tones changes depending on whether the tones are part of an ascending sequence or a descending one.

Tom Holroyd
Florida Atlantic University
Center for Complex Systems
tomh@bambi.ccs.fau.edu
doug@eris.berkeley.edu (Doug Merritt) (04/13/91)
In article <51258@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
>That said, you need more than two cycles of a signal to perceive its pitch!
>I just tried it. I synthesized (with MatLab) a signal that looks like
> 1 second of silence
> a small number of cycles of 220Hz sine wave
> 1 second of silence
>If there is only one or two cycles then all one hears is a click. There is
>nothing in this sound that would let me hear a pitch. Not until I get to
>four or five cycles does it start to sound musical so I can assign a pitch
>to it. Try it, if you don't believe me.

This is an interesting subject. I have to wonder whether your experiment might be faulty, though. What if you're simply uncovering funny behavior in your sound system rather than in your ear, for instance? I think for something like this you'd want to verify that the sound you intended is really being created; try a mike and a digitizer to bring it back into your system.

Besides all that, consider the Fourier domain of this. My intuition is flaky here, but it seems to me that since the period of the boxcar window is almost equal to the period of the sine, the effect in the Fourier domain should be almost identical to a single pulse of a square wave in the first place, no? Hence a click.

Doug
--
Doug Merritt doug@eris.berkeley.edu (ucbvax!eris!doug) or uunet.uu.net!crossck!dougm
malcolm@Apple.COM (Malcolm Slaney) (04/15/91)
doug@eris.berkeley.edu (Doug Merritt) writes:
>This is an interesting subject. I have to wonder whether your experiment
>might be faulty, though. What if you're simply uncovering funny behavior
>in your sound system rather than in your ear, for instance?

Well, nobody has ever accused the native Macintosh sound system of being high fidelity, but the effect is real. I quote from the booklet that accompanies the Auditory Demonstrations CD (as done by the ASA):

    How long must a tone be heard in order to have an identifiable pitch? Early experiments by Savart (1830) indicated that a sense of pitch develops after only two cycles. Very brief tones are described as "clicks," but as the tones lengthen, the clicks take on a sense of pitch which increases upon further lengthening.

    It has been suggested that the dependence of pitch salience on duration follows a sort of "acoustic uncertainty principle,"

        Delta f * Delta t = K,

    where Delta f is the uncertainty in frequency and Delta t is the duration of a tone burst. K, which can be as short as 0.1 (Majernik and Kaluzny, 1979), appears to depend upon intensity and amplitude envelope (Ronken, 1971). The actual pitch appears to have little or no dependence upon duration (Doughty and Garner, 1948; Rossing and Houtsma, 1986).

    In this demonstration, we present tones of 300, 1000, and 3000 Hz in bursts of 1, 2, 4, 8, 16, 32, 64, and 128 periods. How many periods are necessary to establish a sense of pitch?

    Commentary
    "In this demonstration, three tones of increasing durations are presented. Notice the change from a click to a tone. Sequences are presented twice."

>Besides all that, consider the Fourier domain of this.

Bad move. First, the signals I am talking about were synthesized in the time domain... Mr. Fourier wasn't involved. Second, there is a famous hearing researcher (von Bekesy?) who once said something like

    Dead cats and Fourier transforms have harmed hearing science more than anything else.
Dead cats are a no-no because it is now known that most of what makes the ear work is the non-linearities and active gain control mechanisms in the ear. Taking measurements from a cochlea that is not living gives you a lot of meaningless data.

Fourier transforms are not good because the ear is non-linear. Fourier theory is great for linear systems, but the ear is far from linear. Sure, we all talk about frequency and such, but one must remember that it doesn't always make sense in the ear. A lot of people argue about how the ear works, but nobody thinks it computes a Fourier transform.

Malcolm
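To put numbers on the "acoustic uncertainty principle" quoted above, the short sketch below evaluates Delta f = K / Delta t for the burst lengths used in the demonstration. It takes K = 0.1, the smallest value mentioned in the quoted booklet, purely as an example; the function name is an assumption, and since K is said to depend on intensity and envelope, the figures are indicative at best.

```python
def frequency_uncertainty_hz(k, periods, freq_hz):
    """Delta f implied by Delta f * Delta t = K for a burst of `periods` cycles."""
    duration_s = periods / freq_hz  # Delta t, the burst duration
    return k / duration_s           # Delta f = K / Delta t


K = 0.1  # example value only; a larger K scales Delta f proportionally

for freq_hz in (300.0, 1000.0, 3000.0):
    for periods in (1, 2, 4, 8, 16, 32, 64, 128):
        df = frequency_uncertainty_hz(K, periods, freq_hz)
        print(f"{freq_hz:6.0f} Hz, {periods:3d} periods: Delta f ~ {df:7.2f} Hz")
```

For the two-cycle 220 Hz burst discussed earlier in the thread, Delta t is about 9.1 ms, so K = 0.1 gives a Delta f of about 11 Hz, or 5% of the tone's frequency; in general the relative uncertainty is K divided by the number of periods.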
doug@eris.berkeley.edu (Doug Merritt) (04/16/91)
In article <51534@apple.Apple.COM> malcolm@Apple.COM (Malcolm Slaney) writes:
> Early experiments by Savart (1830) indicated that a sense of pitch
> develops after only two cycles. Very brief tones are described as
> "clicks," but as the tones lengthen, the clicks take on a sense of

This was interesting, thanks.

>Bad move. First the signals I am talking about were synthesized in the
>time domain...Mr. Fourier wasn't involved.
>
>Fourier transforms are not good because the ear is non-linear. Fourier
>theory is great for linear systems but the ear is far from linear.

I understand; I've read your "Lyon's Cochlea Model" paper (quite interesting, BTW). But that's not the point. You're talking about the ear and perception; I meant to talk about the sound itself.

What I meant (and didn't say very well) is that you can always look at the Fourier domain for information about the sound itself; it makes no difference whether it was synthesized in that domain or not (as I'm sure you know).

And what I predict you'll find is that the result is very close to what you would see for a single cycle of a square wave, which means that it would take only a small perturbation to transform the one into the other. And in fact, the nonlinear characteristics of the ear may well perform exactly such a perturbation.

So far from being irrelevant, looking at this in the Fourier domain may explain *why* the nonlinearity of the ear produces perception of a click rather than a pitch.

Fair enough?

Doug
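The prediction above can be checked numerically. The following sketch evaluates the power spectrum of short sine bursts and of one cycle of a square wave directly from the definition of the discrete-time Fourier transform. The 8800 Hz sample rate (chosen so that one 220 Hz period is exactly 40 samples), the 10 Hz frequency grid, the cosine-similarity measure, and all function names are assumptions of this sketch, not anything specified in the thread.

```python
import cmath
import math

RATE_HZ = 8800   # assumed rate: one 220 Hz period is exactly 40 samples
F0_HZ = 220.0


def sine_burst(cycles):
    """`cycles` periods of a 220 Hz sine, with no surrounding silence."""
    n = int(round(cycles * RATE_HZ / F0_HZ))
    return [math.sin(2 * math.pi * F0_HZ * i / RATE_HZ) for i in range(n)]


def square_cycle():
    """One period of a 220 Hz square wave."""
    half = int(round(RATE_HZ / F0_HZ)) // 2
    return [1.0] * half + [-1.0] * half


def power_spectrum(x, step_hz=10.0):
    """|X(f)|^2 on a grid from 0 Hz up to RATE_HZ / 2."""
    freqs = [i * step_hz for i in range(int(RATE_HZ / 2 / step_hz) + 1)]
    power = []
    for f in freqs:
        w = -2j * math.pi * f / RATE_HZ
        X = sum(v * cmath.exp(w * n) for n, v in enumerate(x))
        power.append(abs(X) ** 2)
    return freqs, power


def band_fraction(x, lo_hz, hi_hz):
    """Share of the gridded spectral power that falls in [lo_hz, hi_hz]."""
    freqs, power = power_spectrum(x)
    inside = sum(p for f, p in zip(freqs, power) if lo_hz <= f <= hi_hz)
    return inside / sum(power)


def spectral_similarity(a, b):
    """Cosine similarity between the two power spectra (1.0 = same shape)."""
    _, pa = power_spectrum(a)
    _, pb = power_spectrum(b)
    dot = sum(u * v for u, v in zip(pa, pb))
    return dot / math.sqrt(sum(u * u for u in pa) * sum(v * v for v in pb))


if __name__ == "__main__":
    for cycles in (1, 2, 4, 8):
        share = band_fraction(sine_burst(cycles), 110.0, 330.0)
        print(cycles, "cycles: share of power within 110-330 Hz =", round(share, 3))
    print("one-cycle sine vs one-cycle square:",
          round(spectral_similarity(sine_burst(1), square_cycle()), 3))
```

The band share shows how the spectrum tightens around 220 Hz as cycles are added, and the similarity figure gives one crude way to compare the one-cycle sine with the one-cycle square wave. Neither number says anything about perception, which is the distinction being argued in the thread.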