DOI: 10.1145/3427921.3450248
research-article

SuanMing: Explainable Prediction of Performance Degradations in Microservice Applications

Published: 09 April 2021

ABSTRACT

Application performance management (APM) tools are useful for observing the performance properties of an application in production. However, APM is normally purely reactive; that is, it can only report current or past performance degradation. Although some approaches capable of predictive application monitoring have been proposed, they can only report a predicted degradation but cannot explain its root cause, making it hard to prevent the expected degradation.

In this paper, we present SuanMing, a framework for predicting performance degradation of microservice applications running in cloud environments. SuanMing predicts the root causes of anticipated performance degradations and thereby aims to prevent them before they occur. We evaluate SuanMing on two realistic microservice applications, TeaStore and TrainTicket, and show that our approach is able to predict and pinpoint performance degradations with an accuracy of over 90%.


Published in

ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering
April 2021
301 pages
ISBN: 9781450381949
DOI: 10.1145/3427921

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICPE '21 Paper Acceptance Rate: 16 of 61 submissions, 26%
Overall Acceptance Rate: 252 of 851 submissions, 30%
