Abstract
Nowadays, machine learning (ML) is an integral component in a wide range of areas, including software analytics (SA) and business intelligence (BI). As a result, interest in custom ML-based SA and BI solutions is rising. In practice, however, such solutions often get stuck in the prototype stage because setting up an infrastructure for deployment and maintenance is considered complex and time-consuming. For this reason, we aim to structure the entire process and make it more transparent by deriving, from existing literature, an end-to-end framework for building and deploying ML-based SA and BI solutions. The framework is structured into three iterative cycles representing different stages in a model’s lifecycle: prototyping, deployment, and update. Consequently, the framework specifically supports the transitions between these stages while also covering all important activities, from data collection to retraining deployed ML models. To validate the framework's applicability in practice, we compare it to, and apply it in, a real-world ML-based SA/BI solution.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Figalist, I., Elsner, C., Bosch, J., Olsson, H.H. (2020). An End-to-End Framework for Productive Use of Machine Learning in Software Analytics and Business Intelligence Solutions. In: Morisio, M., Torchiano, M., Jedlitschka, A. (eds) Product-Focused Software Process Improvement. PROFES 2020. Lecture Notes in Computer Science, vol 12562. Springer, Cham. https://doi.org/10.1007/978-3-030-64148-1_14
DOI: https://doi.org/10.1007/978-3-030-64148-1_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64147-4
Online ISBN: 978-3-030-64148-1
eBook Packages: Computer Science, Computer Science (R0)