SkillFence: A Systems Approach to Practically Mitigating Voice-Based Confusion Attacks

Authors:
Ashish Hooda, University of Wisconsin-Madison, Madison, Wisconsin, USA
Matthew Wallace, University of Wisconsin-Madison, Madison, Wisconsin, USA
Kushal Jhunjhunwalla, University of Washington, Seattle, Washington, USA
Earlence Fernandes, University of Wisconsin-Madison, Madison, Wisconsin, USA
Kassem Fawaz, University of Wisconsin-Madison, Madison, Wisconsin, USA

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 6, Issue 1, Article No. 16, pp. 1–26. https://doi.org/10.1145/3517232

Published: 29 March 2022

Abstract

Voice assistants are deployed widely and provide useful functionality. However, recent work has shown that commercial systems like Amazon Alexa and Google Home are vulnerable to voice-based confusion attacks that exploit design issues. We propose a systems-oriented defense against this class of attacks and demonstrate its functionality for Amazon Alexa. Our defense ensures that only the skills a user intends to invoke execute in response to voice commands. Our key insight is that we can interpret a user's intentions by analyzing their activity on counterpart systems on the web and on smartphones. For example, the Lyft ride-sharing Alexa skill has an Android app and a website. Our work shows how information from counterpart apps can help reduce ambiguity in the skill invocation process. We build SkillFence, a browser extension that existing voice assistant users can install to ensure that only legitimate skills run in response to their commands. Using real user data from MTurk (N = 116) and experimental trials involving synthetic and organic speech, we show that SkillFence provides a balance between usability and security by securing 90.83% of skills that a user will need with a false acceptance rate of 19.83%.
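
The key insight in the abstract, using a user's activity on counterpart systems to decide which skills they intend to run, can be illustrated with a minimal sketch. This is not the SkillFence implementation. The `Skill` schema, the function names, the domain-matching rule, and the sample skill "Lift Rides" are all hypothetical and chosen only for illustration; the sketch assumes that a skill is allowed when the domain of its counterpart website appears in the user's browsing history.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Skill:
    """A voice skill and its counterpart website domain (hypothetical schema)."""
    name: str
    counterpart_domain: str


def visited_domains(history_urls):
    """Reduce browsing-history URLs to a set of hostnames without a leading 'www.'."""
    domains = set()
    for url in history_urls:
        host = urlparse(url).hostname  # lowercased by urlparse, None if absent
        if host:
            domains.add(host[4:] if host.startswith("www.") else host)
    return domains


def allowed_skills(catalog, history_urls):
    """Assumed rule: allow a skill only if the user has visited its counterpart site."""
    seen = visited_domains(history_urls)
    return {skill.name for skill in catalog if skill.counterpart_domain in seen}


def should_block(invoked_skill_name, allowlist):
    """Block any invoked skill that is not on the user's allowlist."""
    return invoked_skill_name not in allowlist


# Illustrative data: the second skill is a made-up, similar-sounding skill.
catalog = [
    Skill("Lyft", "lyft.com"),
    Skill("Lift Rides", "example.org"),
]
history = ["https://www.lyft.com/rider"]
allowlist = allowed_skills(catalog, history)
```

Under these assumptions, a voice command that resolves to "Lyft" would be allowed because the user has visited the counterpart site, while a command that the assistant resolves to the similar-sounding "Lift Rides" would be blocked.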


Index Terms

SkillFence: A Systems Approach to Practically Mitigating Voice-Based Confusion Attacks
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
  2. Security services
    1. Authentication


Published in

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Volume 6, Issue 1
March 2022
1009 pages
EISSN: 2474-9567
DOI: 10.1145/3529514

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Alexa
Defense
Skill
Skill-Squatting
Voice Attacks
Qualifiers
- research-article
- Research
- Refereed
