mlaprade@x102a.harris-atd.com (laprade maria 42641) (04/05/91)
I have a clean speaker database from which I extract features as follows: use energy and zero crossings to find voiced speech, pre-emphasize and high-pass filter, then autocorrelate. I was satisfied with those results and decided to add noise. I added white noise and processed it exactly the same way.

For most of the speakers I have lost about half the features extracted. (I think I'm losing them because the pitch is being estimated at too high a frequency. I do know that the zero-crossing count has now doubled, and the energy threshold has increased about 100x at an SNR of 10 dB.) That much I thought was reasonable. However, I am left with fewer than 10 samples for one particular speaker.

My question is: am I wrong to add white noise, or do I need to modify my processing equations? I'm neither a speech nor a signal-processing engineer; I just need a database to feed to my neural network, so please be explicit with your help. Thanks.

--
Maria Laprade                  ARPA: mlaprade@x102a.harris-atd.com
Harris Corporation - GASD      UUCP: ...uunet!x102a!mlaprade
Palm Bay, Florida              voice: (407) 727-4920
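[For reference, the noise-mixing step described above can be sketched as follows. This is a minimal illustration in Python/NumPy, not the poster's actual code; the function name and arguments are hypothetical. The key point is that the noise must be scaled so that the power ratio of signal to noise matches the target SNR in dB.]

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Mix white Gaussian noise into `signal` at a target SNR (in dB).

    SNR(dB) = 10 * log10(P_signal / P_noise), so the required noise
    power is P_signal / 10^(SNR/10).
    """
    rng = np.random.default_rng() if rng is None else rng
    sig_power = np.mean(signal ** 2)
    noise = rng.standard_normal(len(signal))
    # Scale the noise so its power hits the target exactly.
    target_noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise
```

Note that broadband noise raises the zero-crossing count and the short-time energy of every frame, so fixed thresholds tuned on clean speech will misclassify frames unless they are re-estimated on the noisy data.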
malcolm@Apple.COM (Malcolm Slaney) (04/06/91)
In article laprade@x102a.ess.harris.com (laprade maria 42641) writes:
>My question is am I incorrect to add in white noise, or do I need to
>modify my processing equations. I'm neither a speech nor a signal
>processing type engineer

Ooooh, I love it... the blind (somebody who doesn't understand the problem) leading the blind (a neural net).

Nothing personal... but wouldn't it make more sense to do NN research with a problem that you understand? If not, check out Ron Cole's paper in the Albuquerque ICASSP (1990). He compares several different representations as input to a NN. Your data isn't necessarily bad, but your representations aren't robust in the face of noise. Also check out Melvyn Hunt's paper in the NY ICASSP.

							Malcolm