mcguire@fornax.UUCP (Michael McGuire) (05/08/91)
A couple of weeks back I posted a question regarding the effects of scaling inputs into a back-prop net, and received several replies confirming that it should have no effect on classification performance. This leads to a question on normalizing network inputs. Scaling usually refers to biasing the entire input pattern set by some fixed amount. What is the effect of normalizing each input pattern individually, based on some criterion, in an attempt to remove pattern variation caused by such things as a signal's dynamic range? Does anyone have any experience with different normalization techniques?

Thanks in advance.

Mike McGuire
Engineering Science
Simon Fraser University
Canada
e-mail: mcguire@cs.sfu.ca
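[For concreteness, the distinction the question draws can be sketched in a few lines of Python. The function names and the choice of peak amplitude as the per-pattern criterion are hypothetical illustrations, not anything from the posts; a minimal sketch, assuming patterns are plain lists of floats:]

```python
def scale_set(patterns, offset):
    """Global scaling: bias every pattern in the set by one fixed amount.

    This is the kind of whole-set transformation the earlier thread said
    should not affect classification performance.
    """
    return [[x + offset for x in p] for p in patterns]


def normalize_pattern(pattern):
    """Per-pattern normalization: rescale one pattern by its own peak
    amplitude, so two patterns differing only in dynamic range become
    identical.  (A hypothetical example of a per-pattern criterion.)
    """
    peak = max(abs(x) for x in pattern) or 1.0  # avoid dividing by zero
    return [x / peak for x in pattern]
```

With this criterion, `normalize_pattern([1.0, 2.0])` and `normalize_pattern([10.0, 20.0])` give the same result, which is exactly the dynamic-range variation the question asks about removing.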
ajr@eng.cam.ac.uk (Tony Robinson) (05/09/91)
In article <2654@fornax.UUCP> mcguire@fornax.UUCP (Michael McGuire) writes:
...
>This leads to a question on normalizing network inputs. Scaling usually
>refers to biasing the entire input pattern set by some fixed amount.
>What is the effect of normalizing each input pattern individually based
>on some criterion in attempts to remove pattern variation caused by
>such things as a signals dynamic range? Does anyone have any experience
>with different normalization techniques?

Well, my current favourite is to compute the probability density function of each input node and then warp this to be a Gaussian. When I do this, the number of errors my phoneme classifier makes is reduced by a worthwhile amount. Perhaps there is some justification for this: if all inputs are independent, then the distribution of points in the input space would be nicely spherical, and a back-prop-type hyperplane node can lop any section off as easily as any other. If anybody could come up with a neat explanation, or find this of benefit in a real problem, then obviously I'd be interested.

Tony [Robinson]

Cambridge University Engineering Department, Trumpington Street, Cambridge, UK
Email: ajr@cam.eng.ac.uk, Phone: +44-223-332754, Fax: +44-223-332662
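[The warping Tony describes can be implemented several ways; one common rank-based sketch, assuming you only have samples of each input rather than its true density, replaces each value by its empirical CDF position and maps that probability through the inverse Gaussian CDF. The function name and the rank/(n+1) convention are assumptions for illustration, not Tony's actual code:]

```python
from statistics import NormalDist


def gaussianize(column):
    """Warp one input feature's empirical distribution toward a standard
    Gaussian.

    Hypothetical rank-based version of "compute the pdf and warp it to a
    Gaussian": each value is replaced by its rank-based probability
    rank/(n+1), which is then pushed through the inverse normal CDF.
    """
    n = len(column)
    # order the sample indices by value, then assign 1-based ranks
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std_normal = NormalDist(0.0, 1.0)
    # rank/(n+1) keeps every probability strictly inside (0, 1),
    # so inv_cdf never sees 0 or 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]
```

Applied to each input node independently, this makes every marginal distribution approximately spherical-Gaussian, which matches Tony's intuition about hyperplane nodes being able to lop off any section equally easily.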