rich@GTE.COM (Rich Sutton) (05/20/89)
Andy Barto and I have just completed a major new paper relating temporal-difference learning, as used, for example, in our pole-balancing learning controller, to classical conditioning in animals. The paper will appear in the forthcoming book ``Learning and Computational Neuroscience,'' edited by J.W. Moore and M. Gabriel, MIT Press. A preprint can be obtained by sending email to rich%gte.com@relay.cs.net with your physical-mail address. The paper has no abstract, but begins as follows:

TIME-DERIVATIVE MODELS OF PAVLOVIAN REINFORCEMENT

Richard S. Sutton
GTE Laboratories Incorporated

Andrew G. Barto
University of Massachusetts

This chapter presents a model of classical conditioning called the temporal-difference (TD) model. The TD model was originally developed as a neuron-like unit for use in adaptive networks (Sutton & Barto, 1987; Sutton, 1984; Barto, Sutton & Anderson, 1983). In this paper, however, we analyze it from the point of view of animal learning theory. Our intended audience is both animal learning researchers interested in computational theories of behavior and machine learning researchers interested in how their learning algorithms relate to, and may be constrained by, animal learning studies.

We focus on what we see as the primary theoretical contribution to animal learning theory of the TD and related models: the hypothesis that reinforcement in classical conditioning is the time derivative of a composite association combining innate (US) and acquired (CS) associations. We call models based on some variant of this hypothesis ``time-derivative models'', examples of which are the models by Klopf (1988), Sutton & Barto (1981a), Moore et al. (1986), Hawkins & Kandel (1984), Gelperin, Hopfield & Tank (1985), Tesauro (1987), and Kosko (1986); we examine several of these models in relation to the TD model.
We also briefly explore relationships with animal learning theories of reinforcement, including Mowrer's drive-induction theory (Mowrer, 1960) and the Rescorla-Wagner model (Rescorla & Wagner, 1972). We motivate and explain time-derivative models from the point of view of animal learning theory, and show that the TD model solves significant problems with earlier time-derivative models. We also demonstrate the TD model's accord with empirical data in a range of conditioning paradigms including conditioned inhibition, primacy effects (Egger & Miller, 1962), facilitation of remote associations, and second-order conditioning.
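
As a rough illustration of the time-derivative hypothesis quoted above (this sketch is not taken from the paper), the reinforcement signal can be approximated in discrete time as the step-to-step change in a composite signal that sums an innate (US) term and an acquired (CS) term. The function name, the variable names, the example traces, and the choice of a plain first difference with an initial value of zero are all assumptions made for illustration; the TD model itself refines this basic idea.

```python
def time_derivative_reinforcement(us, cs_assoc):
    """Discrete-time sketch: r[t] = Y[t] - Y[t-1], with Y[t] = us[t] + cs_assoc[t].

    us       -- hypothetical innate (US) signal at each time step
    cs_assoc -- hypothetical acquired (CS) association at each time step
    """
    # Composite association: innate plus acquired contribution at each step.
    composite = [u + a for u, a in zip(us, cs_assoc)]

    reinforcement = []
    previous = 0.0  # assumption: the composite signal is zero before the trial starts
    for y in composite:
        # First difference stands in for the time derivative.
        reinforcement.append(y - previous)
        previous = y
    return reinforcement


# Hypothetical trial: the CS association is active on steps 1-2, the US arrives on step 3.
us = [0.0, 0.0, 0.0, 1.0, 0.0]
cs_assoc = [0.0, 0.5, 0.5, 0.0, 0.0]
r = time_derivative_reinforcement(us, cs_assoc)
print(r)
```

In this toy trial the signal is positive when the composite rises (at CS onset and again at US onset) and negative when it falls (at US offset), which is the qualitative behavior a time-derivative account of reinforcement implies.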