DOI: 10.1145/3384419.3430727
research-article
Open Access

"Alexa, stop spying on me!": speech privacy protection against voice assistants

Published: 16 November 2020

ABSTRACT

Voice assistants (VAs) have recently become highly popular as a general means of interacting with the Internet of Things. However, the always-on microphones on VAs pose a looming threat to users' privacy. In this paper, we propose MicShield, the first system that serves as a companion device to enforce privacy preservation on VAs. MicShield introduces a novel selective jamming mechanism that obfuscates the user's private speech while passing legitimate voice commands to the VA. It achieves this with a phoneme-level jamming control pipeline. Our implementation and experiments demonstrate that MicShield effectively protects a user's private speech without affecting the VA's responsiveness.


      • Published in

        ACM Conferences
        SenSys '20: Proceedings of the 18th Conference on Embedded Networked Sensor Systems
        November 2020
        852 pages
        ISBN:9781450375900
        DOI:10.1145/3384419

        Copyright © 2020 Owner/Author

        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 174 of 867 submissions, 20%
