[comp.ai.neural-nets] Help for RTRL?

coms2146@waikato.ac.nz (Alistair Veitch, University of Waikato, New Zealand) (08/16/90)

Has anybody out there worked with Williams and Zipser's "real-time
recurrent learning" algorithm? [Connection Science, Vol. 1, No. 1]

We are currently trying to implement this algorithm, but have run into some
problems. We've got it to run successfully on the various XOR problems
described, the "ab" problem (recognise the first "b" after an "a") and the
oscillation problems. What we can't seem to achieve is success on the Turing
machine problem. As this is perhaps the major result of the paper, it seems
important to duplicate it to reassure ourselves that everything is correct. Has
anyone else had success/failure with this problem? If success, would it be
possible to post your source? (We think we've got it right, but...)
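
For reference, here is a minimal sketch of the core update as we understand
it (Python/NumPy, with our own variable names rather than the paper's
notation), in case someone can spot where our reading of the algorithm
differs:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RTRL:
    """Fully recurrent sigmoid net trained by real-time recurrent learning."""
    def __init__(self, n_in, n_units, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.n_in, self.n_units, self.lr = n_in, n_units, lr
        n_z = n_in + n_units                # external inputs + fed-back outputs
        self.W = rng.uniform(-0.5, 0.5, (n_units, n_z))
        self.y = np.zeros(n_units)          # unit outputs
        # p[k, i, j] = dy_k/dW_ij, carried forward through time
        self.p = np.zeros((n_units, n_units, n_z))

    def step(self, x, target=None):
        z = np.concatenate([x, self.y])     # z(t): inputs plus previous outputs
        s = self.W @ z                      # net input to every unit
        y_new = sigmoid(s)
        fprime = y_new * (1.0 - y_new)      # sigmoid derivative at s

        # Sensitivity update:
        #   p_k,ij(t+1) = f'(s_k) * (sum_l W_kl * p_l,ij(t) + delta_ki * z_j(t))
        Wrec = self.W[:, self.n_in:]        # weights on the fed-back outputs
        p_new = np.einsum('kl,lij->kij', Wrec, self.p)
        idx = np.arange(self.n_units)
        p_new[idx, idx, :] += z             # the delta_ki * z_j(t) term
        p_new *= fprime[:, None, None]

        if target is not None:
            # NaN in target marks units with no teacher signal this step
            e = np.where(np.isnan(target), 0.0, target - y_new)
            # dW_ij = lr * sum_k e_k * p_k,ij, applied at every time step
            self.W += self.lr * np.einsum('k,kij->ij', e, p_new)

        self.y, self.p = y_new, p_new
        return y_new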

--
Alistair Veitch                      Phone: +64 71 562889 ext. 8768
Internet: coms2146@waikato.ac.nz	    +64 71 562388 (home)
SNAIL: Computer Science Dept, University of Waikato, Hamilton, New Zealand

ins_atge@jhunix.HCF.JHU.EDU (Thomas G Edwards) (08/17/90)

In article <1243.26cac1c4@waikato.ac.nz> coms2146@waikato.ac.nz (Alistair Veitch, University of Waikato, New Zealand) writes:
>Has anybody out there worked with Williams and Zipser's "real-time
>recurrent learning" algorithm? [Connection Science, Vol. 1, No. 1]

I haven't actually implemented this algorithm, but I have heard
that it is important to use the "Teacher Forcing" method
they discuss to learn difficult problems.
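
As I understand it (untested, so take this as a sketch rather than gospel),
the modification is small: after the error and weight change for a step are
computed, the target values replace the corresponding unit outputs in the
state that gets fed back, and the sensitivities of those units are zeroed,
since a forced value no longer depends on the weights. In terms of the
variables in the sketch from the parent article:

# Teacher forcing, applied after W has been updated for this step:
forced = ~np.isnan(target)                 # units that got a teacher signal
y_new = np.where(forced, target, y_new)    # feed back the target, not the output
p_new[forced] = 0.0                        # forced outputs no longer depend on W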

You might also want to look at J. Schmidhuber, "Making the World
Differentiable: On using self-supervised fully recurrent neural networks
for dynamic reinforcement learning and planning in non-stationary
environments", FKI Report 125-90, Technische Universität München, 1990.
A pole-balancer is trained there by reinforcement learning (i.e. "pain"
is applied whenever the pole is dropped).

And for an explanation of why pure gradient-descent methods will probably
not give you reasonable temporal learning, see J. Schmidhuber, "Towards
compositional learning with dynamic neural networks",
FKI Report 129-90, TUM, April 1990.

He explains that gradient-descent-only methods must take into account what
was learned during all past time steps when dealing with a new problem.
For "toy" temporal learning problems this is not a big impediment. For
"serious" temporal learning problems, dynamic neural systems must develop
methods of breaking goals down into subgoals, most of which have already
been learned and some of which still need to be developed by gradient
descent. In this way only small problems are trained by gradient descent,
and the system combines them so that the network-of-networks can solve
real problems by "divide-and-conquer" methods.
Research in this area is very fresh, and I think that within about a year
there will be a move away from naive implementations of gradient-descent
learning, in both stationary and temporal settings, and towards
connectionist compositional learning (Cascade-Correlation is a simple
example of this).

-Thomas Edwards