Abstract
The off-switch game is a game-theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic: the human is modelled as an irrational player, and the robot’s best action is calculated only under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game-theoretic by modelling the human as a rational player with a random utility function. As a consequence, we can easily calculate the robot’s best action for arbitrary belief and irrationality assumptions.
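As a rough illustration of what "calculating the robot’s best action" involves, the sketch below compares the robot's three options in the simple setting of the original off-switch game: act immediately, switch itself off, or wait and let the human decide. It is not the random-utility formulation developed in this paper. The function names, the sample-based belief representation, and the convention that switching off has utility 0 are all assumptions made for the example.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence, Tuple


def rational_human(u: float) -> float:
    """Probability that a fully rational human lets the robot act.

    Assumption: the human knows the true utility u of the action and
    allows it exactly when u is positive.
    """
    return 1.0 if u > 0 else 0.0


def expected_utilities(
    samples: Sequence[float],
    allow_prob: Callable[[float], float] = rational_human,
) -> Dict[str, float]:
    """Expected utility of each robot option under the robot's belief.

    samples: draws from the robot's belief over the action's utility
             (any distribution, not necessarily normal).
    allow_prob: hypothetical model of the human; maps a utility to the
                probability that the human lets the robot act.
    """
    act = fmean(samples)   # act without asking
    off = 0.0              # switching off is assumed to be worth 0
    # When waiting, the action only happens if the human allows it.
    wait = fmean([allow_prob(u) * u for u in samples])
    return {"act": act, "off": off, "wait": wait}


def best_action(
    samples: Sequence[float],
    allow_prob: Callable[[float], float] = rational_human,
) -> Tuple[str, Dict[str, float]]:
    """Return the option with the highest expected utility."""
    utilities = expected_utilities(samples, allow_prob)
    return max(utilities, key=utilities.get), utilities


if __name__ == "__main__":
    belief = [-2.0, 1.0, 3.0]
    print(best_action(belief))                            # rational human
    print(best_action(belief, allow_prob=lambda u: 0.5))  # coin-flip human
```

With a rational human, waiting is never worse than the other two options in this sketch, because the human filters out exactly the negative-utility cases. With a human who decides by coin flip, deferring loses value and the robot may prefer to act directly.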
The first four authors contributed roughly equally.
References
Allais, M.: Le comportement de l’homme rationnel devant le risque: critique des postulats et axiomes de l’école Américaine. Econometrica 21(4), 503–546 (1953). doi:10.2307/1907921
Armstrong, S.: Motivated value selection for artificial agents. In: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 12–20 (2015)
Armstrong, S.: Utility indifference. Technical report. Oxford University, pp. 1–5 (2010)
Armstrong, S., Leike, J.: Towards interactive inverse reinforcement learning. In: NIPS Workshop (2016)
Dewey, D.: Learning what to value. In: Artificial General Intelligence, vol. 6830, pp. 309–314 (2011). ISBN 978-3-642-22886-5. doi:10.1007/978-3-642-22887-2. arXiv: 1402.5379
Everitt, T., Filan, D., Daswani, M., Hutter, M.: Self-modification of policy and utility function in rational agents. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 1–11. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_1
Hadfield-Menell, D., et al.: Cooperative inverse reinforcement learning (2016a). arXiv: 1606.03137
Hadfield-Menell, D., et al.: The off-switch game, pp. 1–11 (2016b). arXiv: 1611.08219
Martin, J., Everitt, T., Hutter, M.: Death and suicide in universal artificial intelligence. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 23–32. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_3. arXiv: 1606.00652
Omohundro, S.M.: The basic AI drives. In: Wang, P., Goertzel, B., Franklin, S. (eds.) Artificial General Intelligence, vol. 171, pp. 483–493. IOS Press (2008)
Orseau, L., Armstrong, S.: Safely interruptible agents. In: 32nd Conference on Uncertainty in Artificial Intelligence (2016)
Rasmusen, E.: Games and Information, 2nd edn. Blackwell, Oxford (1994)
Soares, N., Fallenstein, B.: A technical research agenda. Technical report. Machine Intelligence Research Institute (MIRI), pp. 1–14
Soares, N., et al.: Corrigibility. In: AAAI Workshop on AI and Ethics, pp. 74–82 (2015)
Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton Classic Editions. Princeton University Press, Princeton (1947). ISBN 0691003629. doi:10.1177/1468795X06065810. Lambert, S., Deuber, O. (eds.)
Wiener, N.: Some moral and technical consequences of automation. Science 131(3410), 1355–1358 (1960). ISSN 0036-8075. doi:10.1126/science.132.3429.741
Acknowledgements
This work grew out of a MIRIx workshop, which Owen Cameron, John Aslanides, and Huon Puertas also attended. Thanks to Amy Zhang for proofreading multiple drafts. This work was supported in part by ARC grant DP150104590.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wängberg, T., Böörs, M., Catt, E., Everitt, T., Hutter, M. (2017). A Game-Theoretic Analysis of the Off-Switch Game. In: Everitt, T., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science, vol 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_16
DOI: https://doi.org/10.1007/978-3-319-63703-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63702-0
Online ISBN: 978-3-319-63703-7
eBook Packages: Computer Science, Computer Science (R0)