rich@gte-labs.CSNET (Rich Sutton) (01/17/87)
------------------------------------------------------------------------

LEARNING TO PREDICT BY THE METHODS OF TEMPORAL DIFFERENCES

Richard S. Sutton
GTE Labs
Waltham, MA 02254
Rich@GTE-Labs.CSNet

This technical report introduces and provides the first formal results
in the theory of TEMPORAL-DIFFERENCE METHODS, a class of statistical
learning procedures specialized for prediction---that is, for using past
experience with an incompletely known system to predict its future
behavior. Whereas in conventional prediction-learning methods the error
term is the difference between predicted and actual outcomes, in
temporal-difference methods it is the difference between temporally
successive predictions. Although temporal-difference methods have been
used in Samuel's checker-player, Holland's Bucket Brigade, and the
author's Adaptive Heuristic Critic, they have remained poorly understood.
Here we prove the convergence and optimality of temporal-difference
methods for special cases, and relate them to supervised-learning
procedures. For most real-world prediction problems, temporal-difference
methods require less memory and peak computation than conventional
methods AND produce more accurate predictions. It is argued that most
problems to which supervised learning is currently applied are really
prediction problems of the sort to which temporal-difference methods can
be applied to advantage.

------------------------------------------------------------------------

p.s. Those who have previously requested a paper on "bootstrap learning"
are already on my mailing list and should receive the paper sometime
next week.
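
Illustration (not taken from the report): the abstract contrasts two
kinds of error term, and a small sketch may make the difference
concrete. The function names supervised_errors and td_errors, and the
numbers used, are hypothetical and chosen only for this example. The
sketch computes error terms only; it assumes a single sequence of
predictions followed by one actual outcome, and it does not implement
any learning rule from the report.

```python
def supervised_errors(predictions, outcome):
    """Conventional error terms: actual outcome minus each prediction."""
    return [outcome - p for p in predictions]


def td_errors(predictions, outcome):
    """Temporal-difference error terms.

    Each prediction is compared with the prediction that follows it;
    the final prediction is compared with the actual outcome.
    """
    # Assumption for this sketch: the outcome plays the role of the
    # "next prediction" for the last step of the sequence.
    targets = predictions[1:] + [outcome]
    return [t - p for p, t in zip(predictions, targets)]


if __name__ == "__main__":
    preds = [0.5, 0.75, 0.25]   # made-up successive predictions
    z = 1.0                     # made-up actual outcome
    print("conventional:", supervised_errors(preds, z))
    print("temporal-difference:", td_errors(preds, z))
```

In this sketch, adding up the temporal-difference error terms from any
step through the end of the sequence gives the conventional error term
for that step, because the intermediate predictions cancel. Note also
that each temporal-difference term is available as soon as the next
prediction is made, whereas the conventional terms must wait for the
actual outcome.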