Analytical Mean Squared Error Curves for Temporal Difference Learning

Singh, Satinder; Dayan, Peter

doi:10.1023/A:1007495401240

Published: July 1998

Machine Learning, volume 32, pages 5–40 (1998)

Satinder Singh¹ & Peter Dayan²

Abstract

We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good.
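
As a rough illustration of what such a curve represents, the sketch below estimates a mean-square-error curve by brute-force averaging over a finite number of sampled runs, which is the quantity the analytical tool is described as computing exactly in the infinite-run limit. It is not the authors' method or software. The 5-state random-walk process, the offline TD(0) update, the initial value of 0.5, and every function and parameter name here are assumptions chosen only for illustration.

```python
import random

# Hypothetical example problem (not taken from the paper): a 5-state random
# walk with absorbing ends, reward 1 on exiting to the right and 0 otherwise.
N_STATES = 5
TRUE_VALUES = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]


def run_trial(rng):
    """Sample one trial as (state, reward, next_state); next_state is None on absorption."""
    state = N_STATES // 2
    steps = []
    while state is not None:
        nxt = state + rng.choice((-1, 1))
        if nxt < 0:
            steps.append((state, 0.0, None))
            state = None
        elif nxt >= N_STATES:
            steps.append((state, 1.0, None))
            state = None
        else:
            steps.append((state, 0.0, nxt))
            state = nxt
    return steps


def offline_td0_update(values, steps, alpha):
    """Accumulate undiscounted TD(0) increments over a trial, then apply them all at once."""
    deltas = [0.0] * len(values)
    for s, r, s_next in steps:
        target = r + (values[s_next] if s_next is not None else 0.0)
        deltas[s] += alpha * (target - values[s])
    return [v + d for v, d in zip(values, deltas)]


def mse(values):
    """Mean squared error of the lookup-table estimate against the true values."""
    return sum((v - t) ** 2 for v, t in zip(values, TRUE_VALUES)) / len(values)


def sample_mse_curve(n_runs=200, n_trials=50, alpha=0.05, seed=0):
    """Average per-trial MSE over n_runs independent runs (a finite-sample estimate)."""
    rng = random.Random(seed)
    curve = [0.0] * (n_trials + 1)
    for _ in range(n_runs):
        values = [0.5] * N_STATES
        curve[0] += mse(values)
        for t in range(1, n_trials + 1):
            values = offline_td0_update(values, run_trial(rng), alpha)
            curve[t] += mse(values)
    return [c / n_runs for c in curve]


if __name__ == "__main__":
    for trial, err in enumerate(sample_mse_curve()):
        if trial % 10 == 0:
            print(trial, round(err, 4))
```

Increasing `n_runs` reduces the sampling noise in the averaged curve; changing `alpha` gives a quick, informal view of the step-size sensitivity the abstract mentions.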

Author information

Authors and Affiliations

¹ Department of Computer Science, University of Colorado, Boulder, CO 80309-0430 (Satinder Singh)
² Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 (Peter Dayan)

About this article

Cite this article

Singh, S., Dayan, P. Analytical Mean Squared Error Curves for Temporal Difference Learning. Machine Learning 32, 5–40 (1998). https://doi.org/10.1023/A:1007495401240

Issue Date: July 1998
DOI: https://doi.org/10.1023/A:1007495401240
