Abstract
We propose a novel technique based on recurrent artificial neural networks to generate test cases for black-box testing of reactive systems. We combine functional test inputs that are automatically generated from a model with test cases that are manually applied for robustness testing, and we use this combination to train a long short-term memory (LSTM) network. As a result, the network learns an implicit representation of the usage behavior that tends to lead to failures. We then use this network to generate new event sequences as test cases. We applied our approach in an industrial case study on the black-box testing of a digital TV system. The LSTM-generated test cases revealed several faults, including critical ones, that had not been detected by existing automated or manual testing activities. Our approach is complementary to model-based and exploratory testing, and the combined approach outperforms random testing in terms of both fault coverage and execution time.
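The generation step can be pictured as autoregressive sampling: given the events so far, the trained network outputs a probability distribution over the next event, one event is drawn and appended, and the loop repeats until the sequence is long enough or no continuation exists. The sketch below only illustrates that loop under stated assumptions and is not the authors' implementation. To stay self-contained it replaces the LSTM with a hypothetical bigram frequency model (`bigram_model`). The temperature parameter, the remote-control event names, and all function names are likewise illustrative assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

# Any callable mapping the event history to a distribution over the next event.
# In the approach described above, this role is played by the trained LSTM.
NextEventModel = Callable[[Sequence[str]], Dict[str, float]]


def bigram_model(training_sequences: List[List[str]]) -> NextEventModel:
    """Hypothetical stand-in for the LSTM: next-event frequencies given the last event."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in training_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    def predict(history: Sequence[str]) -> Dict[str, float]:
        followers = counts.get(history[-1])  # .get does not insert missing keys
        if not followers:
            return {}
        total = sum(followers.values())
        return {event: n / total for event, n in followers.items()}

    return predict


def apply_temperature(probs: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Rescale each p to p ** (1 / T) and renormalize; T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {event: p ** (1.0 / temperature) for event, p in probs.items()}
    total = sum(scaled.values())
    return {event: s / total for event, s in scaled.items()}


def generate_sequence(model: NextEventModel, start: str, max_length: int,
                      temperature: float = 1.0, seed: int = 0) -> List[str]:
    """Sample one event sequence, event by event, to be replayed as a test case."""
    rng = random.Random(seed)
    sequence = [start]
    while len(sequence) < max_length:
        probs = model(sequence)
        if not probs:  # no known continuation: end the test case here
            break
        probs = apply_temperature(probs, temperature)
        events = list(probs)
        weights = [probs[e] for e in events]
        sequence.append(rng.choices(events, weights=weights)[0])
    return sequence


if __name__ == "__main__":
    # Hypothetical remote-control event logs used as training data.
    training = [
        ["POWER_ON", "MENU", "DOWN", "OK", "BACK"],
        ["POWER_ON", "CH_UP", "CH_UP", "VOL_UP", "MENU", "BACK"],
    ]
    model = bigram_model(training)
    print(generate_sequence(model, "POWER_ON", max_length=8, temperature=0.8, seed=42))
```

Any callable with the same signature, for example a wrapper around the softmax output of a trained LSTM, can be passed as `model`. Lower temperatures keep the sampled sequences close to the training data, while higher temperatures produce more unusual event orderings.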
Notes
Some tasks require physical access to the system and cannot be automated.
Acknowledgment
We would like to thank the software developers, test engineers, and technicians at Vestel Electronics for sharing their resources with us and supporting our case study. We also thank the anonymous reviewers for their comments on this paper.
About this article
Cite this article
Kıraç, M.F., Aktemur, B., Sözer, H. et al. Automatically learning usage behavior and generating event sequences for black-box testing of reactive systems. Software Qual J 27, 861–883 (2019). https://doi.org/10.1007/s11219-018-9439-1
DOI: https://doi.org/10.1007/s11219-018-9439-1