Abstract
Voice assistants are widely deployed and provide useful functionality. However, recent work has shown that commercial systems such as Amazon Alexa and Google Home are vulnerable to voice-based confusion attacks that exploit design issues. We propose a systems-oriented defense against this class of attacks and demonstrate its functionality for Amazon Alexa. Our defense ensures that only the skills a user intends to run are executed in response to voice commands. Our key insight is that we can infer a user's intentions by analyzing their activity on counterpart systems on the web and on smartphones. For example, the Lyft ride-sharing Alexa skill has a counterpart Android app and website. Our work shows how information from counterpart apps can help reduce ambiguity in the skill invocation process. We build SkillFence, a browser extension that existing voice assistant users can install to ensure that only legitimate skills run in response to their commands. Using real user data from MTurk (N = 116) and experimental trials involving synthetic and organic speech, we show that SkillFence balances usability and security by securing 90.83% of the skills that a user will need, with a false acceptance rate of 19.83%.
Index Terms
- SkillFence: A Systems Approach to Practically Mitigating Voice-Based Confusion Attacks