Making Runtime Data Useful for Incident Diagnosis: An Experience Report

  • Conference paper
Product-Focused Software Process Improvement (PROFES 2018)

Abstract

Important and critical aspects of technical debt often surface only at runtime and are difficult to measure statically. This is a particular challenge for cloud applications because of their highly distributed nature. Fortunately, mature frameworks for collecting runtime data exist, but they need to be integrated.

In this paper, we report on our experience from a project that implements a cloud application running in Kubernetes on Azure. To analyze the runtime data of this software system, we instrumented our services with Zipkin for distributed tracing; with Prometheus and Grafana for analyzing metrics; and with fluentd, Elasticsearch, and Kibana for collecting, storing, and exploring log files. However, project team members did not use these runtime data until we provided simple, unified access to them through a chat bot.

We argue that collecting runtime data in a project is not sufficient to guarantee that the data are used: to be useful, the different data sources need simple, unified access that is integrated into the tools team members commonly use.
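To make the idea of unified access concrete, the following is a minimal sketch of how a chat command could be routed to one of the three data sources. It is not the authors' implementation: the command keywords, the `BACKENDS` host names, and the function `build_request_url` are hypothetical and chosen only for illustration. The URL paths use the public HTTP APIs of Prometheus (instant query), Zipkin (v2 trace lookup), and Elasticsearch (URI search).

```python
from urllib.parse import quote, urlencode

# Hypothetical in-cluster base URLs; the abstract does not name any endpoints.
BACKENDS = {
    "metrics": "http://prometheus:9090",
    "trace": "http://zipkin:9411",
    "logs": "http://elasticsearch:9200",
}


def build_request_url(command: str) -> str:
    """Translate a chat command such as 'metrics up' into a backend URL."""
    keyword, _, argument = command.strip().partition(" ")
    argument = argument.strip()
    if keyword not in BACKENDS or not argument:
        raise ValueError(f"usage: <{'|'.join(BACKENDS)}> <argument>")
    base = BACKENDS[keyword]
    if keyword == "metrics":
        # Prometheus instant-query endpoint.
        return f"{base}/api/v1/query?{urlencode({'query': argument})}"
    if keyword == "trace":
        # Zipkin v2 lookup of a single trace by its ID.
        return f"{base}/api/v2/trace/{quote(argument, safe='')}"
    # Elasticsearch URI search across all indices (query-string syntax).
    return f"{base}/_search?{urlencode({'q': argument})}"
```

The sketch only builds URLs and performs no network calls, so a bot could log or display the request before sending it. A team member would type one short command in the chat tool instead of opening three separate web interfaces.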


Acknowledgments

We thank Robert Hoffmann from Deutsche Telekom for his support.

Author information

Corresponding authors

Correspondence to Florian Lautenschlager or Marcus Ciolkowski.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Lautenschlager, F., Ciolkowski, M. (2018). Making Runtime Data Useful for Incident Diagnosis: An Experience Report. In: Kuhrmann, M., et al. Product-Focused Software Process Improvement. PROFES 2018. Lecture Notes in Computer Science, vol 11271. Springer, Cham. https://doi.org/10.1007/978-3-030-03673-7_33

  • DOI: https://doi.org/10.1007/978-3-030-03673-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03672-0

  • Online ISBN: 978-3-030-03673-7

  • eBook Packages: Computer Science, Computer Science (R0)
