[mod.ai] TR Abstract -- Learning to Predict

rich@gte-labs.CSNET (Rich Sutton) (01/17/87)
------------------------------------------------------------------------


                        LEARNING TO PREDICT
              BY THE METHODS OF TEMPORAL DIFFERENCES
 
                         Richard S. Sutton
                             GTE Labs
                         Waltham, MA 02254
                        Rich@GTE-Labs.CSNet

This technical report introduces and provides the first formal results
in the theory of TEMPORAL-DIFFERENCE METHODS, a class of statistical
learning procedures specialized for prediction---that is, for using past
experience with an incompletely known system to predict its future
behavior.  Whereas in conventional prediction-learning methods the error
term is the difference between predicted and actual outcomes, in
temporal-difference methods it is the difference between temporally
successive predictions.  Although temporal-difference methods have been
used in Samuel's checker-player, Holland's Bucket Brigade, and the
author's Adaptive Heuristic Critic, they have remained poorly
understood.  Here we prove the convergence and optimality of
temporal-difference methods for special cases, and relate them to
supervised-learning procedures.  For most real-world prediction
problems, temporal-difference methods require less memory and peak
computation than conventional methods AND produce more accurate
predictions.  The report argues that most problems to which supervised
learning is currently applied are really prediction problems of the sort
to which temporal-difference methods can be applied to advantage.
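
The contrast between the two error terms can be made concrete with a
small sketch.  The abstract does not state the update rules, so
everything below is an illustrative assumption rather than the report's
own procedure: a table of per-state predictions, a fixed step size
alpha, a one-step temporal-difference update, and a hypothetical
five-state bounded random walk whose outcome is 1 if the walk exits on
the right and 0 if it exits on the left.  All function names are made up
for this example.

```python
import random


def generate_walk(n_states, rng):
    """Hypothetical test problem: one bounded random walk.

    States are 0..n_states-1, the walk starts in the middle, and it ends
    when it steps off either side. Returns (visited_states, outcome),
    where outcome is 1.0 for a right exit and 0.0 for a left exit.
    """
    s = n_states // 2
    visited = []
    while 0 <= s < n_states:
        visited.append(s)
        s += rng.choice((-1, 1))
    outcome = 1.0 if s >= n_states else 0.0
    return visited, outcome


def supervised_update(pred, visited, outcome, alpha):
    """Conventional rule: error = actual outcome - prediction.

    Nothing can be updated until the outcome is known, so the whole
    sequence of visited states has to be kept until the walk ends.
    """
    for s in visited:
        pred[s] += alpha * (outcome - pred[s])


def td_update(pred, visited, outcome, alpha):
    """Assumed one-step TD rule: error = next prediction - this prediction.

    The last prediction in the walk is compared with the actual outcome.
    """
    for t, s in enumerate(visited):
        if t + 1 < len(visited):
            target = pred[visited[t + 1]]  # temporally successive prediction
        else:
            target = outcome               # end of the walk
        pred[s] += alpha * (target - pred[s])


def train(update, n_walks=5000, alpha=0.01, seed=0):
    """Run one update rule over many walks; return the learned predictions."""
    rng = random.Random(seed)
    pred = [0.5] * 5
    for _ in range(n_walks):
        visited, outcome = generate_walk(5, rng)
        update(pred, visited, outcome, alpha)
    return pred


if __name__ == "__main__":
    print("supervised:", [round(p, 2) for p in train(supervised_update)])
    print("TD        :", [round(p, 2) for p in train(td_update)])
```

For this walk the true probability of a right exit from state i is
(i + 1)/6, so both sets of predictions can be checked against known
values.  Note that td_update is written over a stored walk only for
symmetry with supervised_update; each of its steps needs just the
current and the next prediction, so it could equally be applied as the
walk unfolds.  This sketch does not by itself demonstrate the accuracy
claim made above.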

--------------------------------------------------------------------------


p.s. Those who have previously requested a paper on "bootstrap learning"
are already on my mailing list and should receive the paper sometime
next week.