[comp.ai.neural-nets] BP, local minima, perceptrons

sontag@fermat.rutgers.edu (Eduardo Sontag) (03/28/89)

There have been a number of postings on this issue lately.  Some time ago a
couple of abstracts were posted here.  The papers behind those abstracts show:

(1) Even for the case of NO hidden layers and binary inputs, spurious local
minima can occur.  The smallest example I know of has 4 inputs (5 weights to
be learned, including the threshold) and 125 training instances.
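For concreteness, here is a minimal sketch of the kind of cost surface result
(1) is about: a single sigmoid output unit (no hidden layers), binary inputs,
a threshold weight, and the usual summed squared error.  The specific 4-input,
125-instance counterexample is in the paper; the data below are placeholders,
not that example.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def bp_cost(w, X, y):
        """Summed squared error of a single sigmoid unit (no hidden layers).

        w : weight vector of length n+1; the last entry is the threshold (bias).
        X : binary input patterns, shape (m, n).
        y : desired outputs in {0, 1}, shape (m,).
        Spurious local minima of this E(w) are what result (1) refers to.
        """
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append 1 for the threshold
        out = sigmoid(Xb @ w)
        return np.sum((out - y) ** 2)

    # Placeholder data only -- NOT the 4-input / 125-instance example from the paper.
    X = np.array([[0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [1, 1, 0, 0]])
    y = np.array([1, 0, 1])
    print(bp_cost(np.zeros(5), X, y))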

(2) Still with NO hidden layers: even perceptron-separable data, when fed to
BP, can fail to be classified (Brady et al.).  BUT: if one uses a "threshold
penalty" cost, i.e. one does not penalize an output that is higher than some
cutoff (say .75) when "1" is desired, or lower than some cutoff (say .25) when
"0" is desired, then YES, BP classifies correctly in that case.  (More
precisely, the gradient differential equation converges, in finite time, from
each initial condition.)
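To make the difference between the two costs concrete, here is one natural way
to write such a threshold-penalty error with cutoffs .75 and .25 as above.
This is an illustration under those assumptions, not necessarily the exact
functional form used in the paper.

    import numpy as np

    def threshold_penalty_cost(out, y, hi=0.75, lo=0.25):
        """Squared error that ignores outputs already past the cutoff.

        out : network outputs in (0, 1).
        y   : desired outputs in {0, 1}.
        When y = 1, only out < hi is penalized; when y = 0, only out > lo is.
        (Illustrative form; the paper's cost may be written differently.)
        """
        err_for_1 = np.maximum(0.0, hi - out)   # penalty only if output is below .75
        err_for_0 = np.maximum(0.0, out - lo)   # penalty only if output is above .25
        per_pattern = np.where(y == 1, err_for_1, err_for_0)
        return np.sum(per_pattern ** 2)

    out = np.array([0.9, 0.6, 0.1, 0.4])
    y   = np.array([1,   1,   0,   0  ])
    print(threshold_penalty_cost(out, y))   # only the 0.6 and 0.4 outputs contribute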

These results, and comparisons to related work, are included in:

Sontag, E.D. and H.J. Sussmann, "Backpropagation can give rise to spurious
local minima even for networks without hidden layers," to appear in
_Complex Systems_, February 1989 issue, I understand.

Sontag, E.D. and H.J. Sussmann, "Backpropagation Separates when Perceptrons
Do," Rutgers Center for Systems and Control Technical Report 88-12, November
1988.  Submitted to ICNN89.

LaTeX copies of these are available by email.

-eduardo sontag

(Sorry if a duplicate posting -- had an error message earlier)
-- 
Eduardo D. Sontag, Professor
Department of Mathematics
Rutgers Center for Systems and Control (SYCON)
Rutgers University
New Brunswick, NJ 08903, USA

(Phone: (201)932-3072; dept.: (201)932-2390)
sontag@fermat.rutgers.edu
...!rutgers!fermat.rutgers.edu!sontag
sontag@pisces.bitnet