finton@ai.cs.wisc.edu (David J. Finton) (09/12/90)
Here's a summary of the responses I received to my query about tasks to demonstrate reinforcement learning (RL):

Chuck Anderson (canderson@gte.com) has compared RL with back-prop in his thesis:

  Anderson, C.W., 1986. "Learning and problem solving with multilayered connectionist systems," Doctoral Dissertation, Department of Computer and Information Science, University of Massachusetts, Amherst, Massachusetts.

GTE Laboratories has a group of people working primarily on extensions to RL; here's a survey paper:

  Franklin, Judy A., Sutton, Richard S., Anderson, Charles W., Selfridge, Oliver G., and Schwartz, Daniel B. "Connectionist learning control at GTE Laboratories," in Proceedings of the SPIE 1989 Symposium on Advances in Intelligent Robotics Systems, November 1989, Philadelphia, Pennsylvania.

Leslie Kaelbling (leslie@teleos.com, leslie%teleos.com@ai.sri.com) has just finished a dissertation in the area of RL:

  Kaelbling, Leslie Pack, 1990. "Learning in embedded systems," Doctoral Dissertation, Department of Computer Science, Stanford University. (Tech report No. TR-90-04)

She is in the process of cleaning up code, written in Common Lisp, for an environment which makes it easy to test different RL algorithms in different environments. A technical note on it will be published in Machine Learning.

Michael L. Littman (mlittman@breeze.bellcore.com), along with Dave Ackley, invented some RL problems for their algorithm, which was published in the 1990 NIPS proceedings. They don't do comparisons with back-prop, although they use back-prop as part of their algorithm. Littman is dubious about the existence of "standard" datasets, since he notes that RL is not a "standard" paradigm, as back-prop is. According to Littman, Rich Sutton is organizing a special issue on RL in the Machine Learning journal.

Tony Robinson (ajr@engineering.cambridge.ac.uk) of Cambridge University suggests robot path planning or game playing as potential RL tasks.
He mentions an obstacle-avoidance problem from Andy Barto's work of about a year ago, described in Barto's lengthy review of reinforcement learning.

--David Finton
finton@cs.wisc.edu