thrun@gmdzi.uucp (Sebastian Thrun) (04/13/91)
Archive-name: ai/neural-nets/thrun-plan-explor/1991-03-19
Archive: cheops.cis.ohio-state.edu:/pub/neuroprose/thrun.plan-explor* [128.146.8.62]
Original-posting-by: Sebastian Thrun <thrun@gmdzi.uucp>
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)
Well, there is a new TR available on the neuroprose archive which is
more or less an extended version of the NIPS paper I announced some weeks
ago:
ON PLANNING AND EXPLORATION IN NON-DISCRETE WORLDS
Sebastian Thrun
German National Research Center for Computer Science
St. Augustin, FRG

Knut Moeller
Bonn University
Bonn, FRG
The application of reinforcement learning to control problems has
received considerable attention in the last few years
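[Anderson86,Barto89,Sutton84]. In general, there are two approaches to
solving reinforcement learning problems: direct techniques, which learn a
controller without an explicit model of the environment, and indirect
techniques, which first learn a world model and then plan with it. Each
has its own advantages and disadvantages.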
We present a system that combines both methods. By interacting with an
unknown environment, the system progressively constructs a world model
using the backpropagation algorithm. To optimize actions with respect to
future reinforcement, planning is applied in two steps: an experience
network proposes a plan, which is subsequently optimized by gradient
descent through a chain of model networks. While the system operates in a
goal-oriented manner due to the planning process, the experience network
is trained. Its accumulated experience is fed back into the planning
process in the form of initial plans, so that planning can be gradually
reduced. To ensure complete system identification, a competence network
is trained to predict the accuracy of the model. This network enables
purposeful exploration of the world.
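The two-step planning idea can be illustrated with a minimal sketch, not the authors' implementation. It assumes a fixed linear model x_next = A*x + B*u in place of the learned backpropagation model network, and a hard-coded initial plan in place of the experience network's proposal. The names A, B, model, rollout, cost, gradient and optimize_plan are hypothetical. Each call to model stands for one link in the chain of model networks, and the reverse loop in gradient is backpropagation through that chain. The competence network and exploration are not shown.

```python
# Sketch of the planning step: refine an initial plan by gradient descent
# through a chain of model copies, one copy per time step.
# ASSUMPTION: a fixed linear model x_next = A*x + B*u stands in for the
# learned backpropagation model network, so the gradient is written by hand.

A, B = 0.9, 0.5  # hypothetical model parameters


def model(x, u):
    """One copy of the stand-in world model: predicts the next state."""
    return A * x + B * u


def rollout(x0, plan):
    """Apply the model once per planned action; return predicted x_1..x_T."""
    states, x = [], x0
    for u in plan:
        x = model(x, u)
        states.append(x)
    return states


def cost(x0, plan, target, lam=0.01):
    """Predicted cost: squared distance to the target plus an action penalty."""
    states = rollout(x0, plan)
    return (sum((x - target) ** 2 for x in states)
            + lam * sum(u * u for u in plan))


def gradient(x0, plan, target, lam=0.01):
    """Backpropagate the cost through the chain of model copies."""
    states = rollout(x0, plan)
    grads = [0.0] * len(plan)
    dx = 0.0  # total derivative of the cost w.r.t. states[t]
    for t in reversed(range(len(plan))):
        dx = 2.0 * (states[t] - target) + A * dx
        grads[t] = B * dx + 2.0 * lam * plan[t]
    return grads


def optimize_plan(x0, initial_plan, target, lr=0.1, steps=200):
    """Gradient descent on the action sequence, starting from initial_plan."""
    plan = list(initial_plan)
    for _ in range(steps):
        g = gradient(x0, plan, target)
        plan = [u - lr * gu for u, gu in zip(plan, g)]
    return plan


if __name__ == "__main__":
    # Hypothetical stand-in for the experience network's proposed plan.
    initial = [0.0] * 5
    refined = optimize_plan(0.0, initial, target=1.0)
    print("initial cost:", cost(0.0, initial, 1.0))
    print("refined cost:", cost(0.0, refined, 1.0))
```

In this toy setting, a better starting plan means fewer gradient steps are needed to reach a low predicted cost. That is presumably the sense in which feeding experience back as initial plans lets planning be gradually reduced.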
The appropriateness of this approach to reinforcement learning is
demonstrated in three control experiments: a target-tracking task, a
robotics task, and a pole-balancing task.
Keywords: backpropagation, connectionist networks, control, exploration,
planning, pole balancing, reinforcement learning, robotics, neural
networks, and, and, and...
=-------------------------------------------------------------------------
The TR can be retrieved by ftp:
unix> ftp cheops.cis.ohio-state.edu
Name: anonymous
Guest Login ok, send ident as password
Password: neuron
ftp> binary
ftp> cd pub
ftp> cd neuroprose
ftp> get thrun.plan-explor.ps.Z
ftp> bye
unix> uncompress thrun.plan-explor.ps
unix> lpr thrun.plan-explor.ps
=-------------------------------------------------------------------------
If you have trouble ftping the files, do not hesitate to contact me.
--- Sebastian Thrun
(st@gmdzi.uucp, st@gmdzi.gmd.de)
-- comp.archives file verification
cheops.cis.ohio-state.edu
-rw-r--r-- 1 3169 274 307596 Mar 18 10:28 /pub/neuroprose/thrun.plan-explor.ps.Z
found thrun-plan-explor ok