[comp.ai] TD Model of Conditioning -- Paper Announcement

rich@GTE.COM (Rich Sutton) (05/20/89)

Andy Barto and I have just completed a major new paper relating
temporal-difference learning, as used, for example, in our
pole-balancing learning controller, to classical conditioning in
animals.  The paper will appear in the forthcoming book ``Learning and
Computational Neuroscience,'' edited by J.W. Moore and M. Gabriel, MIT
Press.  A preprint can be obtained by emailing
rich%gte.com@relay.cs.net with your physical-mail address.  The paper
has no abstract, but begins as follows:


	   TIME-DERIVATIVE MODELS OF PAVLOVIAN REINFORCEMENT

			   Richard S. Sutton
		     GTE Laboratories Incorporated

			    Andrew G. Barto
		      University of Massachusetts

This chapter presents a model of classical conditioning called the
temporal-difference (TD) model.  The TD model was originally developed
as a neuron-like unit for use in adaptive networks (Sutton & Barto,
1987; Sutton, 1984; Barto, Sutton & Anderson, 1983).  In this paper,
however, we analyze it from the point of view of animal learning theory.
Our intended audience is both animal learning researchers interested in
computational theories of behavior and machine learning researchers
interested in how their learning algorithms relate to, and may be
constrained by, animal learning studies.

We focus on what we see as the primary theoretical contribution of
the TD and related models to animal learning theory: the hypothesis that
reinforcement in classical conditioning is the time derivative of a
composite association combining innate (US) and acquired (CS)
associations.  We call models based on some variant of this hypothesis
``time-derivative models'', examples of which are the models by Klopf
(1988), Sutton & Barto (1981a), Moore et al. (1986), Hawkins & Kandel
(1984), Gelperin, Hopfield & Tank (1985), Tesauro (1987), and Kosko
(1986); we examine several of these models in relation to the TD model.
We also briefly explore relationships with animal learning theories of
reinforcement, including Mowrer's drive-induction theory (Mowrer, 1960)
and the Rescorla-Wagner model (Rescorla & Wagner, 1972).
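[A rough discrete-time rendering of the time-derivative hypothesis, in
LaTeX notation -- a sketch following common TD-learning conventions,
not equations quoted from the chapter:

        V(t)       = \sum_i w_i x_i(t)                  % acquired (CS) prediction
        \delta(t)  = \lambda(t) + \gamma V(t) - V(t-1)  % time-derivative reinforcement
        \Delta w_i = \alpha \, \delta(t) \, \bar{x}_i(t)  % change in CS association i

Here \lambda(t) is the innate US signal, \gamma a discount factor, and
\bar{x}_i(t) an eligibility trace of CS_i.  Taking \gamma = 1 roughly
recovers the simple difference Y(t) - Y(t-1) of the composite
association Y = \lambda + V used by earlier time-derivative models.]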

We motivate and explain time-derivative models from the point of view
of animal learning theory, and show that the TD model solves
significant problems with earlier time-derivative models.  We also
demonstrate the TD model's accord with empirical data in a range of
conditioning paradigms including conditioned inhibition, primacy
effects (Egger & Miller, 1962), facilitation of remote associations,
and second-order conditioning.
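[For readers who want to experiment, here is a minimal simulation
sketch of a discrete-time TD conditioning model in Python.  All names,
parameter values, and the trial structure are illustrative assumptions,
not taken from the paper.  It reproduces second-order conditioning
qualitatively: a CS paired only with an already-trained CS acquires
positive associative strength.

import numpy as np

def run_trial(w, cs, us, alpha=0.1, gamma=0.95, trace_decay=0.8):
    """One trial of a discrete-time TD conditioning sketch.

    w  : (n,) CS association strengths, updated in place
    cs : (T, n) CS intensities at each time step
    us : (T,)  US intensities (the innate signal lambda)
    """
    xbar = np.zeros_like(w)   # eligibility traces of the CSs
    v_prev = 0.0              # composite prediction at the previous step
    for t in range(len(us)):
        v = max(0.0, w @ cs[t])                # acquired (CS) prediction
        delta = us[t] + gamma * v - v_prev     # time-derivative reinforcement
        w += alpha * delta * xbar              # credit recently active CSs
        xbar = trace_decay * xbar + cs[t]      # decay and refresh the traces
        v_prev = v
    return w

# Second-order conditioning, qualitatively: phase 1 pairs CS1 with the
# US; phase 2 pairs CS2 with CS1 alone, never with the US.
T = 10
w = np.zeros(2)
cs_a = np.zeros((T, 2)); us_a = np.zeros(T)
cs_a[0:4, 0] = 1.0; us_a[4] = 1.0              # CS1 on, then US
for _ in range(100):
    run_trial(w, cs_a, us_a)
cs_b = np.zeros((T, 2)); us_b = np.zeros(T)
cs_b[0:4, 1] = 1.0; cs_b[4:8, 0] = 1.0         # CS2, then CS1, no US
for _ in range(10):
    run_trial(w, cs_b, us_b)
print(w)   # w[1] > 0: CS2 acquired strength via CS1's prediction

The same loop, run on other trial structures (e.g., a trained CS
presented in compound with a novel CS), can be used to explore
paradigms such as blocking and conditioned inhibition.]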