A Game-Theoretic Analysis of the Off-Switch Game

Wängberg, Tobias; Böörs, Mikael; Catt, Elliot; Everitt, Tom; Hutter, Marcus

doi:10.1007/978-3-319-63703-7_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10414)

Included in the following conference series:

International Conference on Artificial General Intelligence

Abstract

The off-switch game is a game-theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic, as the human is modelled as an irrational player, and the robot’s best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game-theoretic by modelling the human as a rational player with a random utility function. As a consequence, we are able to easily calculate the robot’s best action for arbitrary belief and irrationality assumptions.

The first four authors contributed roughly equally.

References

Allais, M.: Le comportement de l’homme rationnel devant le risque: critique des postulats et axiomes de l’école Américaine. Econometrica 21(4), 503–546 (1953). doi:10.2307/1907921
Armstrong, S.: Motivated value selection for artificial agents. In: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 12–20 (2015)
Armstrong, S.: Utility indifference. Technical report. Oxford University, pp. 1–5 (2010)
Armstrong, S., Leike, J.: Towards interactive inverse reinforcement learning. In: NIPS Workshop (2016)
Dewey, D.: Learning what to value. In: Artificial General Intelligence, vol. 6830, pp. 309–314 (2011). ISBN 978-3-642-22886-5. doi:10.1007/978-3-642-22887-2. arXiv: 1402.5379
Everitt, T., Filan, D., Daswani, M., Hutter, M.: Self-modification of policy and utility function in rational agents. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 1–11. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_1
Hadfield-Menell, D., et al.: Cooperative inverse reinforcement learning (2016a). arXiv: 1606.03137
Hadfield-Menell, D., et al.: The off-switch game 2008, pp. 1–11 (2016b). arXiv: 1611.08219
Martin, J., Everitt, T., Hutter, M.: Death and suicide in universal artificial intelligence. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 23–32. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_3. arXiv: 1606.00652
Omohundro, S.M.: The basic AI drives. In: Wang, P., Goertzel, B., Franklin, S. (eds.) Artificial General Intelligence, vol. 171, pp. 483–493. IOS Press (2008)
Orseau, L., Armstrong, S.: Safely interruptible agents. In: 32nd Conference on Uncertainty in Artificial Intelligence (2016)
Rasmusen, E.: Games and Information, 2nd edn. Blackwell, Oxford (1994)
Soares, N., Fallenstein, B.: A technical research agenda. Technical report. Machine Intelligence Research Institute (MIRI), pp. 1–14
Soares, N., et al.: Corrigibility. In: AAAI Workshop on AI and Ethics, pp. 74–82 (2015)
Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton Classic Editions. Princeton University Press, Princeton (1947). ISBN 0691003629. doi:10.1177/1468795X06065810. Lambert, S., Deuber, O. (eds.)
Wiener, N.: Some moral and technical consequences of automation. Science 131(3410), 1355–1358 (1960). ISSN 0036–8075. doi:10.1126/science.132.3429.741

Acknowledgements

This work grew out of a MIRIx workshop, which Owen Cameron, John Aslanides, and Huon Puertas also attended. Thanks to Amy Zhang for proofreading multiple drafts. This work was supported in part by ARC grant DP150104590.

Author information

Authors and Affiliations

Australian National University, Acton, 2601, Australia
Elliot Catt, Tom Everitt & Marcus Hutter
Linköping University, 581 83, Linköping, Sweden
Tobias Wängberg & Mikael Böörs

Corresponding author

Correspondence to Tom Everitt.

Editor information

Editors and Affiliations

Australian National University, Canberra, Australian Capital Territory, Australia
Tom Everitt
OpenCog Foundation, Hong Kong, China
Ben Goertzel
St. Petersburg State University, St. Petersburg, Russia
Alexey Potapov

About this paper

Cite this paper

Wängberg, T., Böörs, M., Catt, E., Everitt, T., Hutter, M. (2017). A Game-Theoretic Analysis of the Off-Switch Game. In: Everitt, T., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science, vol 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_16

DOI: https://doi.org/10.1007/978-3-319-63703-7_16
Published: 15 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63702-0
Online ISBN: 978-3-319-63703-7
eBook Packages: Computer Science, Computer Science (R0)
