ABSTRACT
Evolutionary algorithms benefit greatly from applying the right genetic operators at the right time during the optimization process. It is therefore not surprising that several lines of research in the literature address the self-adaptation of operator activation probabilities. The current state of the art revolves around the Multi-Armed Bandit (MAB) and Dynamic Multi-Armed Bandit (D-MAB) paradigms, which modify the selection mechanism based on the rewards obtained by the different operators. Such methodologies, however, update the probabilities after each operator application, which can cause issues with positive feedback and impairs parallel evaluation, one of the strongest advantages of evolutionary computation from an industrial perspective. Moreover, D-MAB techniques often rely on measurements of population diversity, which might not be applicable to all real-world scenarios. In this paper, we propose a generalization of the D-MAB approach, paired with a simple mechanism for operator management, that aims to remove several limitations of other D-MAB strategies, allowing for parallel evaluations and self-adaptive parameter tuning. Experimental results show that the approach is particularly effective in frameworks containing many different operators, even when some of them are ill-suited to the problem at hand or fail sporadically, as commonly happens in the real world.
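To make the baseline concrete, the following is a minimal sketch of bandit-based operator selection in which statistics are updated after every single operator application, the behaviour the abstract identifies as an obstacle to parallel evaluation. It uses the plain UCB1 rule and is not the generalized D-MAB proposed in the paper; the class name, the `run` helper, the operator names, and the reward function are all hypothetical and chosen only for illustration.

```python
import math


class UCB1OperatorSelector:
    """Plain UCB1 bandit over a fixed set of operator names (illustrative only)."""

    def __init__(self, operators, exploration=math.sqrt(2)):
        self.operators = list(operators)
        self.exploration = exploration  # sqrt(2) gives the textbook UCB1 bonus
        self.counts = {op: 0 for op in self.operators}
        self.mean_reward = {op: 0.0 for op in self.operators}

    def select(self):
        # Try every operator once before trusting the statistics.
        for op in self.operators:
            if self.counts[op] == 0:
                return op
        total = sum(self.counts.values())

        def upper_bound(op):
            bonus = math.sqrt(math.log(total) / self.counts[op])
            return self.mean_reward[op] + self.exploration * bonus

        return max(self.operators, key=upper_bound)

    def update(self, op, reward):
        # Incremental mean: no need to store the reward history.
        self.counts[op] += 1
        self.mean_reward[op] += (reward - self.mean_reward[op]) / self.counts[op]


def run(selector, reward_of, steps):
    """Apply one operator at a time, updating right after each application."""
    for _ in range(steps):
        op = selector.select()
        selector.update(op, reward_of(op))  # the next choice depends on this update
    return selector.counts


if __name__ == "__main__":
    selector = UCB1OperatorSelector(["mutation", "crossover"])
    # Hypothetical reward: "mutation" always improves fitness, "crossover" never does.
    print(run(selector, lambda op: 1.0 if op == "mutation" else 0.0, 200))
```

Because each call to `select` depends on the `update` that precedes it, the loop in `run` is inherently sequential: offspring cannot be evaluated in parallel without deferring the updates.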
Index Terms
- Operator Selection using Improved Dynamic Multi-Armed Bandit