finton@ai.cs.wisc.edu (David J. Finton) (09/12/90)
Here's a summary of the responses I received to my query about tasks
to demonstrate reinforcement learning (RL):
Chuck Anderson (canderson@gte.com) has compared RL with back-prop in
his thesis:
Anderson, C.W., 1986. "Learning and problem solving with multilayered
connectionist systems," Doctoral Dissertation, Department of Computer
and Information Science, University of Massachusetts, Amherst,
Massachusetts.
GTE Laboratories has a group of people working primarily on extensions
to RL; here's a survey paper:
Franklin, Judy A., Sutton, Richard S., Anderson, Charles W.,
Selfridge, Oliver G., and Schwartz, Daniel B. "Connectionist
learning control at GTE Laboratories," in Proceedings of the
SPIE 1989 Symposium on Advances in Intelligent Robotics Systems,
November 1989, Philadelphia, Pennsylvania.
Leslie Kaelbling (leslie@teleos.com or leslie%teleos.com@ai.sri.com)
has just finished a dissertation in the area of RL:
Kaelbling, Leslie Pack, 1990. "Learning in embedded systems,"
Doctoral Dissertation, Department of Computer Science,
Stanford University. (Tech report No. TR-90-04)
She is cleaning up code for a testbed that makes it easy to try
different RL algorithms in different environments. It is written in
Common Lisp, and she will have a technical note published in
Machine Learning.
Michael L. Littman (mlittman@breeze.bellcore.com), along with Dave Ackley,
invented some RL problems for their algorithm, which was published in the
1990 NIPS proceedings. They don't do comparisons with back-prop, although
they use back-prop as part of their algorithm. Littman is dubious about
the existence of "standard" datasets, since he notes that RL is not a
"standard" paradigm, as back-prop is.
Rich Sutton is organizing a special issue on RL in the Machine Learning
journal, according to Littman.
Tony Robinson (ajr@engineering.cambridge.ac.uk) of Cambridge University
suggests robot path planning or game playing as potential RL tasks. He
mentions an obstacle avoidance problem in Andy Barto's lengthy review of
reinforcement learning, from about a year ago.
--David Finton
finton@cs.wisc.edu