SkillFence: A Systems Approach to Practically Mitigating Voice-Based Confusion Attacks

Published: 29 March 2022

Abstract

Voice assistants are widely deployed and provide useful functionality. However, recent work has shown that commercial systems such as Amazon Alexa and Google Home are vulnerable to voice-based confusion attacks that exploit design issues. We propose a systems-oriented defense against this class of attacks and demonstrate its functionality for Amazon Alexa. Our defense ensures that only the skills a user intends to run execute in response to voice commands. Our key insight is that we can interpret a user's intentions by analyzing their activity on counterpart systems on the web and on smartphones. For example, the Lyft ride-sharing Alexa skill has a counterpart Android app and website. Our work shows how information from counterpart apps can help reduce ambiguity in the skill invocation process. We build SkillFence, a browser extension that existing voice assistant users can install to ensure that only legitimate skills run in response to their commands. Using real user data from MTurk (N = 116) and experimental trials involving synthetic and organic speech, we show that SkillFence provides a balance between usability and security by securing 90.83% of the skills that a user will need, with a false acceptance rate of 19.83%.
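
As a rough illustration of the key insight, the sketch below derives an allowlist of skills from a user's counterpart activity (installed Android apps and visited websites) and admits an invoked skill only if it appears on that allowlist. The mapping table, skill identifiers, package names, and function names are all hypothetical and are not taken from the paper; SkillFence's actual matching and enforcement logic may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Skill:
    skill_id: str          # hypothetical identifier, not a real Alexa skill ID
    invocation_name: str   # the phrase a user speaks to launch the skill


@dataclass
class CounterpartActivity:
    # Evidence of user intent gathered outside the voice assistant.
    android_packages: Set[str] = field(default_factory=set)
    visited_domains: Set[str] = field(default_factory=set)


# Hypothetical mapping from a counterpart app or website to its skill.
COUNTERPART_TO_SKILL: Dict[str, Skill] = {
    "app:com.example.rides": Skill("skill-rides-official", "example rides"),
    "web:rides.example.com": Skill("skill-rides-official", "example rides"),
}


def build_allowlist(activity: CounterpartActivity,
                    mapping: Dict[str, Skill] = COUNTERPART_TO_SKILL) -> Set[str]:
    """Return the IDs of skills whose counterpart the user has actually used."""
    keys = {f"app:{pkg}" for pkg in activity.android_packages}
    keys |= {f"web:{dom}" for dom in activity.visited_domains}
    return {mapping[k].skill_id for k in keys if k in mapping}


def is_allowed(invoked: Skill, allowlist: Set[str]) -> bool:
    """Admit a skill only if its ID is on the allowlist; the spoken name alone is not trusted."""
    return invoked.skill_id in allowlist
```

Under these assumptions, a look-alike skill that shares the invocation name "example rides" but has a different identifier would be rejected, because the decision rests on the skill's identity rather than on how its name sounds.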

      • Published in

        Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 6, Issue 1
        March 2022
        1009 pages
        EISSN:2474-9567
        DOI:10.1145/3529514

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed