sontag@fermat.rutgers.edu (Eduardo Sontag) (03/28/89)
There have been a number of postings on this issue lately. Some time ago a couple of abstracts were posted here. The papers behind them show:

(1) Even in the case of NO hidden layers and binary inputs, spurious local minima can occur. The smallest example I know of has 4 inputs (5 weights to be learned, including the threshold) and 125 training instances.

(2) Still in the NO hidden layer case, even perceptron-separable data, when fed to BP, can fail to be classified (Brady et al). BUT: if one uses a "threshold penalty" cost, i.e. one does not penalize an output higher (when "1" is desired) or lower (when "0" is desired) than some cutoff point (say, .75 and .25 respectively), then YES, BP classifies correctly in that case. (More precisely, the gradient differential equation converges, in finite time, from each initial condition.) A small illustrative sketch of such a cost appears after the signature below.

These results, and comparisons to related work, are included in:

Sontag, E.D. and H.J. Sussmann, "Backpropagation can give rise to spurious local minima even for networks without hidden layers," to appear in _Complex Systems_, February 1989 issue, I understand.

Sontag, E.D. and H.J. Sussmann, "Backpropagation Separates when Perceptrons Do," Rutgers Center for Systems and Control Technical Report 88-12, November 1988. Submitted to ICNN89.

LaTeX copies of these are available by email.

-eduardo sontag

(Sorry if a duplicate posting -- had an error message earlier)

--
Eduardo D. Sontag, Professor
Department of Mathematics
Rutgers Center for Systems and Control (SYCON)
Rutgers University, New Brunswick, NJ 08903, USA
(Phone: (201)932-3072; dept.: (201)932-2390)
sontag@fermat.rutgers.edu
...!rutgers!fermat.rutgers.edu!sontag
sontag@pisces.bitnet
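P.S. For anyone who wants to experiment numerically, here is a minimal sketch of a "threshold penalty" cost of the kind described above. It is purely illustrative: the .75 and .25 cutoffs are the values mentioned in the post, but the quadratic form outside the dead zone, the function name, and the NumPy framing are my own assumptions, not the exact cost used in the papers.

    import numpy as np

    def threshold_penalty_cost(y, t, hi=0.75, lo=0.25):
        """Squared-error cost that stops penalizing outputs past a cutoff.

        For a target of 1, any output at or above `hi` incurs no penalty;
        for a target of 0, any output at or below `lo` incurs no penalty.
        The quadratic penalty outside these dead zones is an assumption
        made for illustration only.
        """
        y = np.asarray(y, dtype=float)
        t = np.asarray(t, dtype=float)
        # Residual is clipped to zero once the output clears the cutoff
        # on the correct side of the target.
        err = np.where(t == 1,
                       np.maximum(hi - y, 0.0),
                       np.maximum(y - lo, 0.0))
        return 0.5 * np.sum(err ** 2)

    # Example: an output of 0.9 toward a target of 1 costs nothing,
    # while an output of 0.6 is still penalized (0.5 * 0.15**2).
    print(threshold_penalty_cost([0.9, 0.6], [1, 1]))

The point of the dead zone is that, on separable data, gradient descent is no longer pushed toward saturating the sigmoid exactly at 0 or 1, which is the mechanism the second paper connects to correct classification whenever a perceptron would separate the data.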