[comp.ai.neural-nets] Responses to my query about reinforcement learning tasks

finton@ai.cs.wisc.edu (David J. Finton) (09/12/90)

Here's a summary of the responses I received to my query about tasks
to demonstrate reinforcement learning (RL):

Chuck Anderson (canderson@gte.com) has compared RL with back-prop in 
his thesis:

   Anderson, C.W., 1986.  "Learning and problem solving with multilayered
      connectionist systems," Doctoral Dissertation, Department of Computer 
      and Information Science, University of Massachusetts, Amherst,
      Massachusetts.

GTE Laboratories has a group of people working primarily on extensions
to RL;  here's a survey paper:

   Franklin, Judy A., Sutton, Richard S., Anderson, Charles W., 
      Selfridge, Oliver G., and Schwartz, Daniel B. "Connectionist
      learning control at GTE Laboratories," in Proceedings of the
      SPIE 1989 Symposium on Advances in Intelligent Robotics Systems, 
      November 1989, Philadelphia, Pennsylvania.

Leslie Kaelbling (leslie@teleos.com, leslie%teleos.com@ai.sri.com) 
has just finished a dissertation in the area of RL:

   Kaelbling, Leslie Pack, 1990.  "Learning in embedded systems,"
      Doctoral Dissertation, Department of Computer Science, 
      Stanford University.  (Tech report No. TR-90-04)

   She is in the process of cleaning up code for a testbed, written 
   in Common Lisp, that makes it easy to run different RL algorithms 
   in different environments.  A technical note on it will be 
   published in Machine Learning.

Michael L. Littman (mlittman@breeze.bellcore.com), along with Dave Ackley, 
devised some RL problems for their algorithm, which was published in the 
1990 NIPS proceedings.  They don't compare against back-prop, although 
they use back-prop as a component of their algorithm.  Littman is dubious 
about the existence of "standard" datasets, noting that RL, unlike 
back-prop, is not a "standard" paradigm.

Rich Sutton is organizing a special issue on RL in the Machine Learning
journal, according to Littman.

Tony Robinson (ajr@engineering.cambridge.ac.uk) of Cambridge University 
suggests robot path planning or game playing as potential RL tasks.  He 
mentions an obstacle-avoidance problem from Andy Barto's work of about 
a year ago, described in Barto's lengthy review of reinforcement learning.


--David Finton
  finton@cs.wisc.edu