research-article

VoiceListener: A Training-free and Universal Eavesdropping Attack on Built-in Speakers of Mobile Devices

Published: 28 March 2023

Abstract

Voice leakage is a growing concern for users because the voices processed by intelligent services carry sensitive and private information. Existing studies demonstrate that learning-based solutions applied to built-in sensor measurements can recover voices. However, because of privacy concerns, large-scale paired voice and sensor measurement samples for model training are not publicly available, so such an attack requires significant data collection effort. In this paper, we propose VoiceListener, a training-free and universal eavesdropping attack on built-in speakers that eliminates the data collection effort and adapts to various voices, platforms, and domains. In particular, VoiceListener develops an aliasing-corrected super-resolution mechanism, consisting of aliasing-based pitch estimation and aliasing-corrected voice recovery, to convert the undersampled narrow-band sensor measurements into wide-band voices. Extensive experiments demonstrate that VoiceListener accurately recovers voices from undersampled sensor measurements and is robust to different voices, platforms, and domains, realizing a universal eavesdropping attack.
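
The abstract does not spell out how the aliasing-based pitch estimation works, so the following is only a minimal sketch of the standard frequency-folding relation that makes undersampled measurements ambiguous; it is not the authors' algorithm. The function names, the 500 Hz sensor rate, and the 220 Hz pitch are hypothetical values chosen for illustration.

```python
def aliased_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz
    with no anti-aliasing filter (standard spectral folding)."""
    if fs_hz <= 0:
        raise ValueError("sampling rate must be positive")
    folded = f_hz % fs_hz                 # wrap into [0, fs)
    return min(folded, fs_hz - folded)    # reflect into [0, fs/2]


def harmonic_aliases(f0_hz: float, fs_hz: float, n_harmonics: int) -> list:
    """Where the first n_harmonics multiples of a pitch f0_hz appear
    in a signal sampled at fs_hz."""
    return [aliased_frequency(k * f0_hz, fs_hz) for k in range(1, n_harmonics + 1)]


if __name__ == "__main__":
    # Hypothetical numbers: a 220 Hz pitch observed by a sensor sampled at 500 Hz.
    # Harmonics at 220, 440, 660, 880 Hz fold into the 0-250 Hz band.
    print(harmonic_aliases(220, 500, 4))  # [220, 60, 160, 120]
```

In this toy case the harmonics no longer appear at integer multiples of the pitch, which suggests why recovering wide-band voice from such measurements needs some form of aliasing correction.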

      • Published in

        Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 1
        March 2023
        1243 pages
        EISSN: 2474-9567
        DOI: 10.1145/3589760

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed
