thrun@gmdzi.uucp (Sebastian Thrun) (04/13/91)
Archive-name: ai/neural-nets/thrun-plan-explor/1991-03-19
Archive: cheops.cis.ohio-state.edu:/pub/neuroprose/thrun.plan-explor* [128.146.8.62]
Original-posting-by: Sebastian Thrun <thrun@gmdzi.uucp>
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

Well, there is a new TR available on the neuroprose archive which is
more or less an extended version of the NIPS paper I announced some
weeks ago:

          ON PLANNING AND EXPLORATION IN NON-DISCRETE WORLDS

     Sebastian Thrun                          Knut Moeller
     German National Research Center          Bonn University
     for Computer Science                     Bonn, FRG
     St. Augustin, FRG

The application of reinforcement learning to control problems has
received considerable attention in the last few years
[Anderson86,Barto89,Sutton84]. In general there are two approaches to
solving reinforcement learning problems: direct and indirect
techniques, both having their advantages and disadvantages.

We present a system that combines both methods. By interaction with an
unknown environment, a world model is progressively constructed using
the backpropagation algorithm. For optimizing actions with respect to
future reinforcement, planning is applied in two steps: an experience
network proposes a plan, which is subsequently optimized by gradient
descent with a chain of model networks. While the system operates in a
goal-oriented manner due to the planning process, the experience
network is trained. Its accumulating experience is fed back into the
planning process in the form of initial plans, such that planning can
be gradually reduced. In order to ensure complete system
identification, a competence network is trained to predict the
accuracy of the model. This network enables purposeful exploration of
the world.

The appropriateness of this approach to reinforcement learning is
demonstrated by three different control experiments, namely a target
tracking, a robotics and a pole balancing task.
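The second planning step in the abstract (optimizing a proposed plan by gradient descent through a chain of model networks) can be illustrated with a small sketch. This is not the TR's implementation: a fixed scalar linear map `s' = model_a * s + model_b * action` stands in for the learned model network, an all-zero plan stands in for the proposal of the experience network, and the squared distance of the final state to a goal stands in for the reinforcement. All function and parameter names are hypothetical.

```python
def rollout(model_a, model_b, state, plan):
    """Chain the stand-in model over the plan and return all visited states."""
    states = [state]
    for action in plan:
        states.append(model_a * states[-1] + model_b * action)
    return states


def optimize_plan(model_a, model_b, state, goal, plan, lr=0.05, steps=200):
    """Gradient descent on the actions of a plan through the chained model."""
    plan = list(plan)
    for _ in range(steps):
        states = rollout(model_a, model_b, state, plan)
        # Error signal at the end of the chain: d/ds_T of (s_T - goal)^2.
        grad_state = 2.0 * (states[-1] - goal)
        # Propagate the error backwards through each copy of the model.
        for t in reversed(range(len(plan))):
            # dE/da_t = model_b * dE/ds_{t+1}
            plan[t] -= lr * model_b * grad_state
            # dE/ds_t = model_a * dE/ds_{t+1}
            grad_state = model_a * grad_state
    return plan


# Assumed toy setting: start at 0, reach 1 in five steps, initial plan of zeros.
plan = optimize_plan(1.0, 1.0, state=0.0, goal=1.0, plan=[0.0] * 5)
final_state = rollout(1.0, 1.0, 0.0, plan)[-1]
```

In the system described above, the model would be a network trained by backpropagation, so the two derivative factors would come from backpropagating through that network rather than from fixed constants.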

Keywords: backpropagation, connectionist networks, control, exploration,
planning, pole balancing, reinforcement learning, robotics, neural
networks, and, and, and...

-------------------------------------------------------------------------

The TR can be retrieved by ftp:

     unix> ftp cheops.cis.ohio-state.edu

     Name: anonymous
     Guest Login ok, send ident as password
     Password: neuron
     ftp> binary
     ftp> cd pub
     ftp> cd neuroprose
     ftp> get thrun.plan-explor.ps.Z
     ftp> bye

     unix> uncompress thrun.plan-explor.ps
     unix> lpr thrun.plan-explor.ps

-------------------------------------------------------------------------

If you have trouble ftping the file, do not hesitate to contact me.

                                         --- Sebastian Thrun
                                       (st@gmdzi.uucp, st@gmdzi.gmd.de)

-- comp.archives file verification
cheops.cis.ohio-state.edu
-rw-r--r--  1 3169     274        307596 Mar 18 10:28 /pub/neuroprose/thrun.plan-explor.ps.Z
found thrun-plan-explor ok
cheops.cis.ohio-state.edu:/pub/neuroprose/thrun.plan-explor*