research-article

SuanMing: Explainable Prediction of Performance Degradations in Microservice Applications

Authors:
Johannes Grohmann, University of Würzburg, Würzburg, Germany
Martin Straesser, University of Würzburg, Würzburg, Germany
Avi Chalbani, Huawei Technologies, Tel Aviv, Israel
Simon Eismann, University of Würzburg, Würzburg, Germany
Yair Arian, Huawei Technologies, Tel Aviv, Israel
Nikolas Herbst, University of Würzburg, Würzburg, Germany
Noam Peretz, Huawei Technologies, Tel Aviv, Israel
Samuel Kounev, University of Würzburg, Würzburg, Germany

ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering, April 2021, Pages 165–176
https://doi.org/10.1145/3427921.3450248

Published: 9 April 2021

ABSTRACT

Application performance management (APM) tools are useful for observing the performance properties of an application in production. However, APM is usually purely reactive: it can only report current or past performance degradations. Although some approaches to predictive application monitoring have been proposed, they can only report a predicted degradation and cannot explain its root cause, which makes it hard to prevent the expected degradation.

In this paper, we present SuanMing, a framework for predicting performance degradations of microservice applications running in cloud environments. SuanMing predicts the future root causes of anticipated performance degradations and therefore aims to prevent performance degradations before they actually occur. We evaluate SuanMing on two realistic microservice applications, TeaStore and TrainTicket, and show that our approach predicts and pinpoints performance degradations with an accuracy of over 90%.
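The distinction drawn above, between merely predicting a degradation and also explaining its cause, can be illustrated with a small sketch. This is a generic illustration under stated assumptions, not SuanMing's actual algorithm: it assumes that forecast per-service exclusive latencies for a future time step are already available, that calls are synchronous and sequential, and that a service is "degraded" when its forecast exceeds a fixed multiple of its baseline. The class `Service`, the functions `end_to_end_ms` and `explain_degradation`, the `tolerance` parameter, and all numbers are hypothetical; the service names only loosely follow TeaStore.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical forecast for one service at one future time step."""
    name: str
    predicted_ms: float  # forecast exclusive latency (time spent in this service only)
    baseline_ms: float   # typical exclusive latency under normal operation
    callees: list[str] = field(default_factory=list)


def end_to_end_ms(services: dict[str, Service], entry: str) -> float:
    """Forecast end-to-end latency, assuming synchronous, sequential calls."""
    svc = services[entry]
    return svc.predicted_ms + sum(end_to_end_ms(services, c) for c in svc.callees)


def explain_degradation(
    services: dict[str, Service], entry: str, slo_ms: float, tolerance: float = 1.5
) -> tuple[bool, list[tuple[str, float]]]:
    """Predict an SLO violation and rank the services that explain it."""
    if end_to_end_ms(services, entry) <= slo_ms:
        return False, []
    # Walk the call graph to find every service involved in this request.
    seen: set[str] = set()
    stack = [entry]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(services[name].callees)
    # A service counts as a culprit if its own forecast latency exceeds
    # `tolerance` times its baseline; culprits are ranked by excess latency.
    culprits = [
        (n, services[n].predicted_ms - services[n].baseline_ms)
        for n in seen
        if services[n].predicted_ms > tolerance * services[n].baseline_ms
    ]
    culprits.sort(key=lambda item: (-item[1], item[0]))
    return True, culprits


# Made-up forecast values; service names loosely follow TeaStore.
services = {
    "webui": Service("webui", 20.0, 18.0, ["auth", "recommender"]),
    "auth": Service("auth", 10.0, 9.0),
    "recommender": Service("recommender", 120.0, 30.0, ["persistence"]),
    "persistence": Service("persistence", 15.0, 14.0),
}
violated, culprits = explain_degradation(services, "webui", slo_ms=100.0)
print(violated, culprits)  # True [('recommender', 90.0)]
```

Ranking by excess over baseline rather than by absolute latency avoids blaming services that are simply slow under normal conditions. In practice, the baselines, the call graph, and the forecasts themselves would have to come from monitoring and tracing data; the fixed threshold here is only a placeholder for whatever degradation criterion a real system uses.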

Published in
ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering, April 2021, 301 pages
ISBN: 9781450381949
DOI: 10.1145/3427921
General Chairs: Johann Bourcier (University of Rennes 1, France), Zhen Ming (Jack) Jiang (York University, Canada)
Program Chairs: Cor-Paul Bezemer (University of Alberta, Canada), Vittorio Cortellessa (University of L'Aquila, Italy)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
explainability
forecasting
microservices
performance prediction
Acceptance Rates
ICPE '21 paper acceptance rate: 16 of 61 submissions (26%). Overall acceptance rate: 252 of 851 submissions (30%).