Abstract
Reinforcement learning and local search have been combined in a variety of ways to learn how to solve combinatorial problems more efficiently. Most approaches optimise the total reward, taking the reward for each action to be the resulting change in the objective function. We argue that it is more appropriate to optimise the average reward. We use R-learning to dynamically tune noise in standard SAT local search algorithms on single instances. Experiments show that noise can be successfully automated in this way.
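The idea can be illustrated with a minimal sketch: a single-state R-learning agent (Schwartz's average-reward method) whose actions are discrete noise levels for a WalkSAT-style local search, with the reward for each flip being the decrease in the number of unsatisfied clauses. This is not the paper's actual implementation; the function names, noise levels, and learning rates below are illustrative assumptions.

```python
import random

def unsat_clauses(clauses, assign):
    """Clauses (lists of signed literals) not satisfied by the assignment."""
    return [c for c in clauses if not any(assign[abs(l)] == (l > 0) for l in c)]

def walksat_rlearning(clauses, n_vars, max_flips=20000, alpha=0.1, beta=0.01,
                      eps=0.2, noise_levels=(0.0, 0.2, 0.4, 0.6), seed=0):
    """WalkSAT with the noise parameter tuned online by single-state R-learning."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    Q = {p: 0.0 for p in noise_levels}   # action values (one state, so Q is per-noise-level)
    rho = 0.0                            # running estimate of the average reward
    for _ in range(max_flips):
        unsat = unsat_clauses(clauses, assign)
        if not unsat:
            return assign                # all clauses satisfied
        # epsilon-greedy selection of a noise level
        greedy = max(Q, key=Q.get)
        p = rng.choice(list(noise_levels)) if rng.random() < eps else greedy
        clause = rng.choice(unsat)
        if rng.random() < p:
            var = abs(rng.choice(clause))          # random-walk move
        else:
            def cost(lit):                         # greedy move: minimise unsat count
                v = abs(lit)
                assign[v] = not assign[v]
                c = len(unsat_clauses(clauses, assign))
                assign[v] = not assign[v]
                return c
            var = abs(min(clause, key=cost))
        before = len(unsat)
        assign[var] = not assign[var]
        r = before - len(unsat_clauses(clauses, assign))   # reward: change in objective
        # R-learning update; with a single state, max_a Q plays the successor-value role
        Q[p] += alpha * (r - rho + max(Q.values()) - Q[p])
        if p == greedy:                  # average-reward estimate updated on greedy actions
            rho += beta * (r - rho)
    return None                          # no satisfying assignment found
```

The point of the average-reward formulation shows up in `rho`: actions are rated by how much better than the long-run average flip they are, rather than by a discounted sum of future objective changes.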
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Prestwich, S. (2008). Tuning Local Search by Average-Reward Reinforcement Learning. In: Maniezzo, V., Battiti, R., Watson, JP. (eds) Learning and Intelligent Optimization. LION 2007. Lecture Notes in Computer Science, vol 5313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92695-5_15
Print ISBN: 978-3-540-92694-8
Online ISBN: 978-3-540-92695-5