DOI: 10.1145/3325413.3329789
research-article

A Case for Two-stage Inference with Knowledge Caching

Published: 13 June 2019

ABSTRACT

Real-world intelligent services that employ deep learning typically adopt a two-tier system architecture: a dumb front-end device and smart back-end cloud servers. The front-end device simply forwards a human query, while the back-end servers run a complex deep model to resolve the query and respond to the front-end device. While simple and effective, the current architecture not only increases the load on the servers but also risks harming user privacy. In this paper, we present knowledge caching, which exploits the front-end device as a smart cache of a generalized deep model. The cache locally resolves a subset of popular or privacy-sensitive queries and forwards the rest to the back-end cloud servers. We discuss the feasibility of knowledge caching as well as technical challenges around deep model specialization and compression. We present a prototype two-stage inference system that populates a front-end cache with 10 of 35 voice commands. We demonstrate that our specialization and compression techniques reduce the cached model size by 17.4x relative to the original model, with a 1.8x improvement in inference accuracy.
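The control flow of two-stage inference described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the cache decides whether it can resolve a query, so the softmax-confidence threshold used here is an assumption, as are all names (`two_stage_infer`, `toy_local`, `toy_cloud`, `LABELS`) and the toy logits. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_infer(
    query: str,
    local_model: Callable[[str], Sequence[float]],
    remote_model: Callable[[str], str],
    cached_labels: Sequence[str],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> Tuple[str, str]:
    """Resolve the query on the front-end cache if it is confident enough,
    otherwise forward it to the back-end model."""
    probs = softmax(local_model(query))
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return cached_labels[best], "cache"
    return remote_model(query), "cloud"


# Hypothetical stand-ins for the small cached model and the full cloud model.
LABELS = ["on", "off", "stop"]


def toy_local(query: str) -> List[float]:
    # Confident logits for one known command, flat logits for anything else.
    known = {"lights on": [5.0, 0.0, 0.0]}
    return known.get(query, [0.1, 0.0, 0.1])


def toy_cloud(query: str) -> str:
    return "resolved-by-cloud:" + query


if __name__ == "__main__":
    print(two_stage_infer("lights on", toy_local, toy_cloud, LABELS))
    print(two_stage_infer("play jazz", toy_local, toy_cloud, LABELS))
```

In this sketch a query the cached model recognizes never leaves the device, which is the property the abstract relies on for both server load and privacy. The threshold trades the cache hit rate against the risk of answering a query the cached model was not specialized for.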


Published in

EMDL '19: The 3rd International Workshop on Deep Learning for Mobile Systems and Applications
June 2019
46 pages
ISBN: 9781450367714
DOI: 10.1145/3325413

Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
