lchodgson@watale.waterloo.edu (Lauren Hodgson) (03/14/90)
I'm looking into vocal formant movements over time for _consonants_ in speech (vowels are easy!). Of course, one runs into the old tradeoff between time resolution and frequency resolution, and the analysis of speech consonants requires both to be good. I've a few thoughts:

1) Requirements are (approximately): 2 ms in time with 20 Hz resolution from 0 to 6 kHz.

2) FFTs are of little use for getting this time and frequency resolution, unless one zero-pads the input data a lot, and then it's really interpolation of the output spectrum rather than an actual increase in frequency resolution.

3) An AR model is probably superior to an FFT because peaks are of more interest than valleys in a speech spectrum.

4) Various time/frequency analysis tools exist (the FFT is frequency only), such as the Wigner distribution and the pseudo-Wigner distribution.

Does anyone have or know of any practical experience with time/frequency analysis methods such as the Wigner distribution? What are they good at, and what are their drawbacks?

Much thanks. I'll post if sufficient interest...
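
For anyone who wants numbers on the tradeoff in points 1) and 2), here is a rough back-of-the-envelope sketch. It assumes a rectangular analysis window, for which the resolvable frequency spacing is roughly 1/T, and a 12 kHz sampling rate (my assumption, chosen only because it is the Nyquist rate for a 0 to 6 kHz band). The function names are made up for illustration.

```python
# Rough arithmetic for the time/frequency tradeoff of a windowed FFT.
# Assumptions (not from any measurement): rectangular window, so the
# resolvable spacing is about 1/T; sampling rate of 12 kHz for 0-6 kHz.

def approx_resolution_hz(window_s):
    """Approximate frequency resolution (Hz) of a window lasting window_s seconds."""
    return 1.0 / window_s


def window_needed_s(resolution_hz):
    """Approximate window duration (s) needed to resolve resolution_hz."""
    return 1.0 / resolution_hz


def bin_spacing_hz(sample_rate_hz, fft_length):
    """Spacing (Hz) between FFT output bins; zero padding raises fft_length."""
    return sample_rate_hz / fft_length


sample_rate_hz = 12000.0                      # assumed
window_s = 0.002                              # the 2 ms requirement
n_samples = round(sample_rate_hz * window_s)  # data points in the window

print("samples in a 2 ms window:", n_samples)
print("approx. resolution of a 2 ms window (Hz):", approx_resolution_hz(window_s))
print("window needed for 20 Hz resolution (s):", window_needed_s(20.0))

# Zero padding the 2 ms window to 1024 points makes the bins closer together,
# but the underlying resolution is still set by the 2 ms of real data.
print("bin spacing after padding to 1024 points (Hz):",
      bin_spacing_hz(sample_rate_hz, 1024))
```

Under these assumptions a 2 ms window resolves only about 500 Hz, while 20 Hz needs about 50 ms of data, so the two requirements differ by a factor of roughly 25. Padding narrows the bin spacing well below 20 Hz without changing that.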