[comp.ai.neural-nets] Backpropagation doesn't need layered networks

markh@csd4.csd.uwm.edu (Mark William Hopkins) (11/16/90)

In article <GREENBA.90Nov13115837@gambia.crd.ge.com> greenba@gambia.crd.ge.com (ben a green) writes:
>Backpropagation is nothing more than the application of the chain rule
>of differentiation to the task of calculating the gradient with
>respect to weights and biases of a cost function for a layered,
>feedforward net.

It can also be applied to neural nets that have feedback loops, not in the
relatively roundabout way described in the McClelland & Rumelhart PDP book,
but in a very direct way that involves iterating both the forward activation
and error propagation without breaking up the feedback loops into an infinite
series of identical layers.
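
Concretely, here is a minimal sketch (in NumPy) of the kind of scheme I mean,
assuming logistic units, a sum-squared error, and synchronous updates; the
names (W, I, eta, tol) and the particular update rule are filled in for
illustration, not taken from the PDP book:

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_step(W, I, target, out_idx, eta=0.5, tol=1e-6, max_iter=500):
    # One weight update for a net with arbitrary feedback connections.
    # W[i, j] is the weight from unit j to unit i; I is the external input.
    n = W.shape[0]

    # Forward activation: iterate x = f(Wx + I) until it settles below tol,
    # without unrolling the feedback loops into layers.
    x = np.zeros(n)
    for _ in range(max_iter):
        x_new = sigmoid(W @ x + I)
        if np.max(np.abs(x_new - x)) < tol:
            x = x_new
            break
        x = x_new

    fprime = x * (1.0 - x)        # logistic derivative at the settled state

    # Error injected at the output units only.
    e = np.zeros(n)
    e[out_idx] = target - x[out_idx]

    # Error propagation: iterate z = W^T (f'(u) * z) + e until it, too,
    # settles below the same tolerance.
    z = np.zeros(n)
    for _ in range(max_iter):
        z_new = W.T @ (fprime * z) + e
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new

    # Gradient step for the sum-squared error at the settled fixed point:
    # dE/dw_ij = -f'(u_i) * z_i * x_j.
    W += eta * np.outer(fprime * z, x)
    return x, 0.5 * np.sum(e[out_idx] ** 2)

Note that the error relaxation's update matrix is just the transpose of the
linearized forward update, so it contracts at the same rate; that is where
the remark below about both settling in the same number of steps comes from.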

Hopfield nets can be embedded in the larger picture as a special case...

The resulting error propagation is guaranteed to settle down in a finite number
of steps below any prespecified tolerance IF the forward activation does.
In my experience they both settle down in the same number of steps, and usually
a small number.  About the only time I found any instability was when I put it
there on purpose (running a simulated flip-flop in an undefined mode).

Which brings up another point: this kind of net with feedback can exist in
multiple states that depend not only on the current input but also on the
prior history of inputs.  You MAY be able to use backpropagation here to get
the net to *learn* to become bi-stable, like a flip-flop.  Then backpropagation
provides a natural way to train finite state machines.  The only problem
I've observed with using backpropagation this way so far is that it causes
the net to converge, but to converge by first passing through an unstable
region (which takes forever to do).  So it never quite crosses the threshold,
as it were...
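
To make the multiple-states point concrete, here is a toy latch with
hand-picked weights (again just a sketch, nothing trained): two units with
mutual inhibition, where the fixed point the forward relaxation settles into
depends on which pulse the net saw last, not just on the current input.

import numpy as np

def relax(W, bias, I, x, tol=1e-6, max_iter=500):
    # Iterate the forward activation x = f(Wx + bias + I) from the
    # current state x until it settles below the tolerance.
    for _ in range(max_iter):
        x_new = 1.0 / (1.0 + np.exp(-(W @ x + bias + I)))
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

# A hand-wired two-unit "latch": mutual inhibition plus a positive bias
# gives two stable fixed points, (on, off) and (off, on).
W = np.array([[0.0, -8.0],
              [-8.0, 0.0]])
bias = np.array([4.0, 4.0])

x = np.array([0.02, 0.98])                      # start in the (off, on) state
x = relax(W, bias, np.array([8.0, -8.0]), x)    # "set" pulse flips it ...
x = relax(W, bias, np.zeros(2), x)              # ... and it holds with no input
print(np.round(x, 2))                           # -> [0.98 0.02]
x = relax(W, bias, np.array([-8.0, 8.0]), x)    # "reset" pulse flips it back
x = relax(W, bias, np.zeros(2), x)
print(np.round(x, 2))                           # -> [0.02 0.98]

Getting backpropagation to *find* weights like these, rather than wiring them
in by hand, is exactly where the slow crawl through the unstable region
shows up.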

>It's amazing that it took so many years after Minsky and Papert's
>denunciation of the perceptron for people to think of BP as a solution
>to the training problem... (Maybe the discovery really was the use of
>differentiable node functions instead of flipflops, not
>backpropagation.)
>

Note the double irony here...