[comp.archives] TR - On planning and exploration in non-discrete worlds

thrun@gmdzi.uucp (Sebastian Thrun) (04/13/91)

Archive-name: ai/neural-nets/thrun-plan-explor/1991-03-19
Archive: cheops.cis.ohio-state.edu:/pub/neuroprose/thrun.plan-explor* [128.146.8.62]
Original-posting-by:    Sebastian Thrun <thrun@gmdzi.uucp>
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

Well, there is a new TR available in the neuroprose archive, which is
more or less an extended version of the NIPS paper I announced some weeks
ago:



            ON PLANNING AND EXPLORATION IN NON-DISCRETE WORLDS

                    Sebastian Thrun                  Knut Moeller
            German National Research Center         Bonn University
                for Computer Science
                  St. Augustin, FRG                  Bonn, FRG



The application of reinforcement learning to control problems has
received considerable attention in the last few years
[Anderson86,Barto89,Sutton84].  In general, there are two approaches to
solving reinforcement learning problems, direct and indirect techniques,
each with its own advantages and disadvantages.

We present a system that combines both methods.  Through interaction with
an unknown environment, a world model is progressively constructed using
the backpropagation algorithm.  To optimize actions with respect to future
reinforcement, planning is applied in two steps: an experience network
proposes a plan, which is subsequently optimized by gradient descent with
a chain of model networks.  While the system operates in a goal-oriented
manner thanks to the planning process, the experience network is trained.
Its accumulating experience is fed back into the planning process in the
form of initial plans, so that planning can be gradually reduced.  To
ensure complete system identification, a competence network is trained to
predict the accuracy of the model.  This network enables purposeful
exploration of the world.
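
The second planning step (optimizing a proposed plan by gradient descent
with a chain of models) can be illustrated with a small sketch.  This is
not the code from the TR.  It assumes a one-dimensional world, replaces
the learned model network with a fixed hand-written function, and takes
the initial plan as an argument rather than from an experience network.
All names (model, rollout, optimize_plan) and the quadratic cost are
hypothetical choices made only for illustration.

```python
# Illustrative sketch only: a fixed toy model stands in for the learned
# model network, and the cost is an assumed squared distance to a goal.

def model(state, action):
    """Stand-in for a model network: predicts the next state."""
    return state + 0.5 * action

def model_grads(state, action):
    """Partial derivatives of model() w.r.t. state and action."""
    return 1.0, 0.5

def rollout(s0, plan):
    """Chain the model over the plan; returns [s0, s1, ..., sT]."""
    states = [s0]
    for a in plan:
        states.append(model(states[-1], a))
    return states

def plan_cost(states, goal):
    """Sum of squared distances to the goal over all predicted states."""
    return sum((s - goal) ** 2 for s in states[1:])

def optimize_plan(s0, plan, goal, lr=0.3, steps=200):
    """Gradient descent on the actions, backpropagating through the chain."""
    plan = list(plan)
    for _ in range(steps):
        states = rollout(s0, plan)
        grads = [0.0] * len(plan)
        grad_state = 0.0  # cost gradient flowing back from later steps
        for t in reversed(range(len(plan))):
            # total derivative of the cost w.r.t. the state s_{t+1}
            grad_state += 2.0 * (states[t + 1] - goal)
            d_state, d_action = model_grads(states[t], plan[t])
            grads[t] = grad_state * d_action
            grad_state *= d_state  # pass the gradient on to s_t
        plan = [a - lr * g for a, g in zip(plan, grads)]
    return plan

if __name__ == "__main__":
    initial_plan = [0.0, 0.0, 0.0]  # would come from the experience network
    best = optimize_plan(0.0, initial_plan, goal=1.0)
    print(best, plan_cost(rollout(0.0, best), 1.0))
```

In this toy setting a better initial plan would need fewer descent steps,
which is the effect the abstract attributes to the experience network.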

The appropriateness of this approach to reinforcement learning is
demonstrated in three control experiments: a target tracking task, a
robotics task, and a pole balancing task.

Keywords: backpropagation, connectionist networks, control, exploration,
planning, pole balancing, reinforcement learning, robotics, neural
networks, and, and, and...

=-------------------------------------------------------------------------

The TR can be retrieved by ftp:

             unix>         ftp cheops.cis.ohio-state.edu

             Name:         anonymous
             Guest Login ok, send ident as password
             Password:     neuron
             ftp>          binary
             ftp>          cd pub
             ftp>          cd neuroprose
             ftp>          get thrun.plan-explor.ps.Z
             ftp>          bye
             
             unix>         uncompress thrun.plan-explor.ps
             unix>         lpr thrun.plan-explor.ps

=-------------------------------------------------------------------------


If you have trouble ftping the files, do not hesitate to contact me.


                                            --- Sebastian Thrun
                                          (st@gmdzi.uucp, st@gmdzi.gmd.de)



-- comp.archives file verification
cheops.cis.ohio-state.edu
-rw-r--r--  1 3169     274        307596 Mar 18 10:28 /pub/neuroprose/thrun.plan-explor.ps.Z
found thrun-plan-explor ok
cheops.cis.ohio-state.edu:/pub/neuroprose/thrun.plan-explor*