
Theoretical Foundation for CMA-ES from Information Geometry Perspective

Abstract

This paper explores the theoretical basis of the covariance matrix adaptation evolution strategy (CMA-ES) from an information geometry viewpoint.

To establish a theoretical foundation for the CMA-ES, we focus on the geometric structure of the Riemannian manifold of probability distributions equipped with the Fisher metric. We define a function on this manifold, namely the expectation of the fitness over the sampling distribution, and regard the parameter update of the sampling distribution in the CMA-ES as aiming to maximize this expected fitness. We then investigate steepest-ascent learning for expected fitness maximization, where the steepest-ascent direction is given by the natural gradient: the product of the inverse of the Fisher information matrix and the conventional gradient of the function.
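
To make the quantities concrete (the notation here is ours and may differ from the paper's): writing p(x | θ) for the sampling distribution with parameter θ, J(θ) for the expected fitness, and F(θ) for the Fisher information matrix, the natural gradient ascent step described above reads

```latex
J(\theta) = \int f(x)\, p(x \mid \theta)\, \mathrm{d}x,
\qquad
\tilde{\nabla} J(\theta) = F(\theta)^{-1}\, \nabla_{\theta} J(\theta),
\qquad
\theta \leftarrow \theta + \eta\, \tilde{\nabla} J(\theta),
```

where η > 0 is a learning rate.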

Our first result is that, under certain parameterizations of the multivariate normal distribution, the natural gradient of the expected fitness can be obtained without inverting the Fisher information matrix. We find that the update of the distribution parameters in the CMA-ES is the same as natural gradient learning for expected fitness maximization. Our second result derives the range of learning rates for which a step in the direction of the exact natural gradient improves the expected fitness. From the close relation between the CMA-ES and natural gradient learning, the default learning rates in the CMA-ES appear suitable in terms of monotone improvement of the expected fitness. Finally, we discuss the relation to the expectation-maximization framework and give an information-geometric interpretation of the CMA-ES.
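
A minimal sketch of this connection in Python (our own illustration, not code from the paper; the rank-based weights and the learning-rate values eta_m and eta_C are assumptions chosen for demonstration): one natural-gradient ascent step for the mean m and covariance C of a multivariate normal sampling distribution, in the closed form that requires no inversion of the Fisher information matrix.

```python
import numpy as np

def natural_gradient_step(f, m, C, n_samples=100, eta_m=1.0, eta_C=0.5, rng=None):
    """One natural-gradient ascent step on the expected fitness of N(m, C).

    Under the (mean, covariance) parameterization the natural gradient has a
    closed form, so the Fisher information matrix is never inverted; with
    rank-based weights the step resembles the rank-mu update of the CMA-ES.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidate solutions from the current distribution.
    X = rng.multivariate_normal(m, C, size=n_samples)
    fitness = np.array([f(x) for x in X])
    # Rank-based weights: the better half of the samples receive positive,
    # log-decreasing weights (an illustrative choice, not the paper's).
    order = np.argsort(-fitness)                      # maximization
    mu = n_samples // 2
    w = np.zeros(n_samples)
    w[order[:mu]] = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    # Closed-form natural-gradient (rank-mu style) update of m and C.
    Y = X - m                                         # centered samples
    m_new = m + eta_m * (w @ Y)
    C_new = C + eta_C * (Y.T @ (w[:, None] * Y) - C)
    return m_new, C_new
```

Here eta_m and eta_C play the role of the learning rates whose admissible range the second result characterizes; taking eta_C far beyond that range can break monotone improvement of the expected fitness and, beyond 1, even the positive definiteness of C.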


Author information

Correspondence to Youhei Akimoto.

About this article

Cite this article

Akimoto, Y., Nagata, Y., Ono, I. et al. Theoretical Foundation for CMA-ES from Information Geometry Perspective. Algorithmica 64, 698–716 (2012). https://doi.org/10.1007/s00453-011-9564-8
