[comp.dsp] Wavelet Transform for pitch-shifting, etc.

rjk@sequent.UUCP (Robert Kelley) (09/27/89)

I came across an article:

Kronland-Martinet, Richard. 1988. "The Wavelet Transform for Analysis, Synthesis,
and Processing of Speech and Music Sounds" Computer Music Journal 12(4):11-20.

It describes a method of signal analysis and synthesis which seems ideally suited
to the pitch-shifting or CD speed-altering task mentioned here recently.  I'm
fascinated with this article and would like to learn more about this technique.
It reminds me of a windowed DFT in a way, except that the analyzing function is
windowed, not the input data.  I would like to hear about anyone experimenting
with this technique, or even thinking about it.
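
To make the comparison with a windowed DFT concrete, here is a minimal sketch of a single wavelet coefficient. It assumes a Morlet-style analyzing function (a complex sinusoid under a Gaussian window), which is a choice made for illustration and not necessarily the one used in the article. The names morlet, scale_for_frequency, and wavelet_coefficient, and the value omega0 = 6.0, are hypothetical. The point is that the window belongs to the analyzing function and stretches with the scale, so every analysis frequency sees the same number of cycles.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed number of radians per unit of scaled time


def morlet(t, scale):
    """Complex sinusoid under a Gaussian window, both stretched by `scale` (seconds)."""
    u = t / scale
    return cmath.exp(1j * OMEGA0 * u) * math.exp(-0.5 * u * u) / math.sqrt(scale)


def scale_for_frequency(freq_hz):
    """Scale whose wavelet oscillates at roughly `freq_hz`."""
    return OMEGA0 / (2.0 * math.pi * freq_hz)


def wavelet_coefficient(signal, sample_rate, scale, center):
    """Correlate the raw, unwindowed signal with a wavelet centered at sample `center`."""
    total = 0j
    for n, x in enumerate(signal):
        t = (n - center) / sample_rate
        total += x * morlet(t, scale).conjugate()
    return total / sample_rate  # Riemann-sum approximation of the integral
```

Evaluating the coefficient of a 100 Hz sine at the scales for 50, 100, and 200 Hz should give a clearly larger magnitude at the matching scale. Sweeping `center` and `scale` over a grid yields the full time-scale picture.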

By the way, the EDN article confused me because what was being described was not
an FFT but a DFT.
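
For anyone unsure of the distinction: the DFT is the transform itself, and the FFT is a fast algorithm for computing it. The sketch below is only an illustration and has nothing to do with the EDN article's own listing. The function names dft and fft are hypothetical, and the fft shown is a plain recursive radix-2 version that assumes the input length is a power of two.

```python
import cmath
import math


def dft(x):
    """Direct evaluation of the DFT definition: O(N^2) operations."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def fft(x):
    """Recursive radix-2 FFT: same result as dft(), O(N log N) operations.

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out
```

Both functions return the same spectrum up to rounding error; only the amount of arithmetic differs.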

malcolm@Apple.COM (Malcolm Slaney) (10/01/89)

In article <22313@sequent.UUCP> rjk@sequent.UUCP (Robert Kelley) writes:
>I came across an article:
>Kronland-Martinet, Richard. 1988. "The Wavelet Transform for Analysis, 
> Synthesis, and Processing of Speech and Music Sounds" Computer Music 
> Journal 12(4):11-20.
>It describes a method of signal analysis and synthesis which seems ideally 
>suited to the pitch-shifting or CD speed-altering task mentioned here 
>recently.

I think the Wavelet Transform is a partially developed rehash of two ideas
that have been around for a long time.  If you are interested in recognition
and detection then you should look at the Scale Space theory first proposed
by Andy Witkin somewhere around 1984 in AI literature.  The other idea
is often called the Wigner distribution and was proposed in the 50's.  The
latest reference to it I have seen is an MIT thesis by Riley called something
like "Time Frequency Representation of Speech."  The thesis was published as
a book last year.

Drop me a note if you need better references.

						Malcolm Slaney
						Speech and Hearing Project
						malcolm@apple.com

dean@image.soe.clarkson.edu (Dean Swan) (10/03/89)

From article <35148@apple.Apple.COM>, by malcolm@Apple.COM (Malcolm Slaney):
> In article <22313@sequent.UUCP> rjk@sequent.UUCP (Robert Kelley) writes:
>>I came across an article:
>>Kronland-Martinet, Richard. 1988. "The Wavelet Transform for Analysis, 
>> Synthesis, and Processing of Speech and Music Sounds" Computer Music 
>> Journal 12(4):11-20.
>>It describes a method of signal analysis and synthesis which seems ideally 
>>suited to the pitch-shifting or CD speed-altering task mentioned here 
>>recently.
> 
> I think the Wavelet Transform is a partially developed rehash of two ideas
> that have been around for a long time.  If you are interested in recognition
> and detection then you should look at the Scale Space theory first proposed
> by Andy Witkin somewhere around 1984 in AI literature.  The other idea
> is often called the Wigner distribution and was proposed in the 50's.  The
> latest reference to it I have seen is an MIT thesis by Riley called something
> like "Time Frequency Representation of Speech."  The thesis was published as
> a book last year.
> 
> Drop me a note if you need better references.
> 
> 						Malcolm Slaney
> 						Speech and Hearing Project
> 						malcolm@apple.com