Abstract
Reinforcement learning and local search have been combined in a variety of ways to learn how to solve combinatorial problems more efficiently. Most approaches optimise the total reward, taking the reward for each action to be the resulting change in the objective function. We argue that it is more appropriate to optimise the average reward. We use R-learning to dynamically tune noise in standard SAT local search algorithms on single instances. Experiments show that noise can be successfully automated in this way.
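The idea can be illustrated with a minimal sketch: a single-state R-learning agent (Schwartz's average-reward method) whose actions are discrete noise levels for a WalkSAT-style local search, with the reward for each flip being the decrease in the number of unsatisfied clauses. This is not the paper's actual implementation; the function names, noise levels, and learning rates below are illustrative assumptions.

```python
import random

def unsat_clauses(clauses, assign):
    """Clauses (lists of signed literals) not satisfied by the assignment."""
    return [c for c in clauses if not any(assign[abs(l)] == (l > 0) for l in c)]

def walksat_rlearning(clauses, n_vars, max_flips=20000, alpha=0.1, beta=0.01,
                      eps=0.2, noise_levels=(0.0, 0.2, 0.4, 0.6), seed=0):
    """WalkSAT with the noise parameter tuned online by single-state R-learning."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    Q = {p: 0.0 for p in noise_levels}   # action values (one state, so Q is per-noise-level)
    rho = 0.0                            # running estimate of the average reward
    for _ in range(max_flips):
        unsat = unsat_clauses(clauses, assign)
        if not unsat:
            return assign                # all clauses satisfied
        # epsilon-greedy selection of a noise level
        greedy = max(Q, key=Q.get)
        p = rng.choice(list(noise_levels)) if rng.random() < eps else greedy
        clause = rng.choice(unsat)
        if rng.random() < p:
            var = abs(rng.choice(clause))          # random-walk move
        else:
            def cost(lit):                         # greedy move: minimise unsat count
                v = abs(lit)
                assign[v] = not assign[v]
                c = len(unsat_clauses(clauses, assign))
                assign[v] = not assign[v]
                return c
            var = abs(min(clause, key=cost))
        before = len(unsat)
        assign[var] = not assign[var]
        r = before - len(unsat_clauses(clauses, assign))   # reward: change in objective
        # R-learning update; with a single state, max_a Q plays the successor-value role
        Q[p] += alpha * (r - rho + max(Q.values()) - Q[p])
        if p == greedy:                  # average-reward estimate updated on greedy actions
            rho += beta * (r - rho)
    return None                          # no satisfying assignment found
```

The point of the average-reward formulation shows up in `rho`: actions are rated by how much better than the long-run average flip they are, rather than by a discounted sum of future objective changes.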
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Prestwich, S. (2008). Tuning Local Search by Average-Reward Reinforcement Learning. In: Maniezzo, V., Battiti, R., Watson, JP. (eds) Learning and Intelligent Optimization. LION 2007. Lecture Notes in Computer Science, vol 5313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92695-5_15
Print ISBN: 978-3-540-92694-8
Online ISBN: 978-3-540-92695-5