A Game-Theoretic Analysis of the Off-Switch Game

  • Conference paper
Artificial General Intelligence (AGI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10414)

Abstract

The off-switch game is a game-theoretic model of a highly intelligent robot interacting with a human. In the original paper by Hadfield-Menell et al. (2016b), the analysis is not fully game-theoretic: the human is modelled as an irrational player, and the robot’s best action is only calculated under unrealistic normality and soft-max assumptions. In this paper, we make the analysis fully game-theoretic by modelling the human as a rational player with a random utility function. As a consequence, we can easily calculate the robot’s best action for arbitrary belief and irrationality assumptions.
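The robot's decision problem described above can be illustrated with a minimal sketch. This is not the paper's own model. It assumes the basic off-switch game of Hadfield-Menell et al. (2016b), in which the robot has three moves:

- take an action a directly;
- switch itself off, for utility 0;
- wait, and let the human either allow a or press the off switch.

Several pieces are illustrative assumptions introduced here. The robot's belief is a discrete list of (utility, probability) pairs for a. The `error` parameter is a simple stand-in for human irrationality. The helper names `expected_values` and `best_action` are hypothetical.

```python
from fractions import Fraction

def expected_values(belief, error=Fraction(0)):
    """Robot's expected utility for each move in a basic off-switch game.

    belief: hypothetical discrete belief, a list of (u, p) pairs giving the
            possible utilities u of the robot's action a and their
            probabilities p (summing to 1).
    error:  hypothetical probability that the human gives the wrong response
            when the robot waits; 0 models a fully rational human.
    """
    act = sum(p * u for u, p in belief)  # execute a without asking
    off = Fraction(0)                    # switch itself off: utility 0
    # Waiting: a rational human allows a iff u >= 0 and otherwise presses the
    # off switch; with probability `error` the human does the opposite.
    wait = sum(
        p * ((1 - error) * max(u, 0) + error * min(u, 0))
        for u, p in belief
    )
    return {"act": act, "off": off, "wait": wait}

def best_action(belief, error=Fraction(0)):
    """Move with the highest expected utility (first one listed on ties)."""
    values = expected_values(belief, error)
    return max(values, key=values.get)

# The robot is unsure whether a is harmful (u = -2) or helpful (u = 3).
belief = [(Fraction(-2), Fraction(1, 2)), (Fraction(3), Fraction(1, 2))]
print(expected_values(belief))  # act 1/2, off 0, wait 3/2
print(best_action(belief))      # wait
```

For a rational human (`error=0`), waiting is never worse than acting or switching off. The reason is that the expectation of max(u, 0) is at least max(E[u], 0). With a noisy human this guarantee can fail. For example, when the robot is certain that u = 1 and `error=Fraction(1, 10)`, acting yields 1 while waiting yields only 9/10.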

The first four authors contributed roughly equally.

References

  • Allais, M.: Le comportement de l’homme rationnel devant le risque: critique des postulats et axiomes de l’école Américaine. Econometrica 21(4), 503–546 (1953). doi:10.2307/1907921

  • Armstrong, S.: Motivated value selection for artificial agents. In: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 12–20 (2015)

  • Armstrong, S.: Utility indifference. Technical report. Oxford University, pp. 1–5 (2010)

  • Armstrong, S., Leike, J.: Towards interactive inverse reinforcement learning. In: NIPS Workshop (2016)

  • Dewey, D.: Learning what to value. In: Artificial General Intelligence, vol. 6830, pp. 309–314 (2011). ISBN 978-3-642-22886-5. doi:10.1007/978-3-642-22887-2. arXiv: 1402.5379

  • Everitt, T., Filan, D., Daswani, M., Hutter, M.: Self-modification of policy and utility function in rational agents. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 1–11. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_1

  • Hadfield-Menell, D., et al.: Cooperative inverse reinforcement learning (2016a). arXiv: 1606.03137

  • Hadfield-Menell, D., et al.: The off-switch game, pp. 1–11 (2016b). arXiv: 1611.08219

  • Martin, J., Everitt, T., Hutter, M.: Death and suicide in universal artificial intelligence. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 23–32. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_3. arXiv: 1606.00652

  • Omohundro, S.M.: The basic AI drives. In: Wang, P., Goertzel, B., Franklin, S. (eds.) Artificial General Intelligence, vol. 171, pp. 483–493. IOS Press (2008)

  • Orseau, L., Armstrong, S.: Safely interruptible agents. In: 32nd Conference on Uncertainty in Artificial Intelligence (2016)

  • Rasmusen, E.: Games and Information, 2nd edn. Blackwell, Oxford (1994)

  • Soares, N., Fallenstein, B.: A technical research agenda. Technical report. Machine Intelligence Research Institute (MIRI), pp. 1–14

  • Soares, N., et al.: Corrigibility. In: AAAI Workshop on AI and Ethics, pp. 74–82 (2015)

  • Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior. Princeton Classic Editions. Princeton University Press, Princeton (1947). ISBN 0691003629

  • Wiener, N.: Some moral and technical consequences of automation. Science 131(3410), 1355–1358 (1960). ISSN 0036-8075. doi:10.1126/science.131.3410.1355

Acknowledgements

This work grew out of a MIRIx workshop, which Owen Cameron, John Aslanides and Huon Puertas also attended. Thanks to Amy Zhang for proofreading multiple drafts. This work was supported in part by ARC grant DP150104590.

Author information

Corresponding author: Tom Everitt.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wängberg, T., Böörs, M., Catt, E., Everitt, T., Hutter, M. (2017). A Game-Theoretic Analysis of the Off-Switch Game. In: Everitt, T., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science, vol 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_16

  • DOI: https://doi.org/10.1007/978-3-319-63703-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63702-0

  • Online ISBN: 978-3-319-63703-7

  • eBook Packages: Computer Science, Computer Science (R0)
