Abstract
The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future.
Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.
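For concreteness, below is a minimal sketch of tabular TD(λ) prediction on an absorbing Markov chain, using per-state step sizes 1/n that satisfy the usual Robbins-Monro conditions (the step sizes sum to infinity while their squares sum to a finite value), the kind of condition under which stochastic-approximation arguments give convergence with probability one. The random-walk example, function names, and parameters are illustrative assumptions, not the paper's exact construction.

import random
from collections import defaultdict

def td_lambda(episodes, gamma=1.0, lam=0.9):
    # Tabular TD(lambda) prediction. `episodes` is an iterable of trials,
    # each a list of (state, reward) pairs where the reward is received on
    # entering the state and the final state is absorbing.
    V = defaultdict(float)   # predictions of the expected terminal outcome
    n = defaultdict(int)     # per-state visit counts for the step sizes
    for episode in episodes:
        e = defaultdict(float)   # eligibility traces, cleared every trial
        for (s, _), (s_next, r) in zip(episode, episode[1:]):
            n[s] += 1
            delta = r + gamma * V[s_next] - V[s]   # temporal-difference error
            e[s] += 1.0                            # accumulating trace
            for x in list(e):
                V[x] += delta * e[x] / n[x]        # step size alpha(x) = 1/n(x)
                e[x] *= gamma * lam                # traces decay by gamma*lambda
    return V

def random_walk(n_states=5):
    # The bounded random walk of Sutton (1988): start in the middle, step
    # left or right uniformly, outcome 1 on absorbing at the right end,
    # outcome 0 at the left end.
    s, trial = n_states // 2, [(n_states // 2, 0.0)]
    while 0 <= s < n_states:
        s += random.choice((-1, 1))
        trial.append((s, 1.0 if s == n_states else 0.0))
    return trial

predictions = td_lambda(random_walk() for _ in range(5000))
# True values for the interior states are (i + 1) / 6, i.e. 1/6, ..., 5/6.
print({s: round(v, 3) for s, v in sorted(predictions.items()) if 0 <= s < 5})

With the 1/n step sizes the estimates settle toward the true absorption probabilities rather than oscillating, which is the practical face of the probability-one convergence the article establishes.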
References
Benveniste, A., Métivier, M., & Priouret, P. (1990). Adaptive algorithms and stochastic approximations. Berlin: Springer-Verlag.
Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8, 341–362.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1–58.
Kuan, C.M., & White, H. (1990). Recursive m-estimation, non-linear regression and neural network learning with dependent observations (discussion paper). Department of Economics, University of California at San Diego.
Kuan, C.M., & White, H. (1991). Strong convergence of recursive m-estimators for models with dynamic latent variables (discussion paper 91-05). Department of Economics, University of California at San Diego.
Kushner, H.J. (1984). Approximation and weak convergence methods for random processes, with applications to stochastic systems theory. Cambridge, MA: MIT Press.
Kushner, H.J., & Clark, D.S. (1978). Stochastic approximation methods for constrained and unconstrained systems. Berlin: Springer-Verlag.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22, 400–407.
Ross, S. (1983). Introduction to stochastic dynamic programming. New York: Academic Press.
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210–229.
Sutton, R.S. (1984). Temporal credit assignment in reinforcement learning. Ph.D. thesis, Department of Computer Science, University of Massachusetts, Amherst, MA.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Sutton, R.S., & Barto, A.G. (1987). A temporal-difference model of classical conditioning (GTE Laboratories Report TR87-509-2). Waltham, MA: GTE Laboratories.
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257–278.
Watkins, C.J.C.H. (1989). Learning from delayed rewards. Ph.D. thesis, King's College, University of Cambridge, England.
Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.
Cite this article
Dayan, P., & Sejnowski, T.J. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295–301. https://doi.org/10.1007/BF00993978