ABSTRACT
GUI testing is an important but expensive activity. Recent research on test reuse approaches for Android applications has produced interesting results. Test reuse approaches automatically migrate human-designed GUI tests from a source app to a target app that shares similar functionalities. They achieve this by exploiting the semantic similarity of the textual information associated with GUI widgets. Semantic matching of GUI events therefore plays a crucial role in these approaches. In this paper, we present the first empirical study on semantic matching of GUI events. Our study involves 253 configurations of the semantic matching, 337 unique queries, and 8,099 distinct GUI events. We report several key findings that indicate how to improve the semantic matching of test reuse approaches, propose SemFinder, a novel semantic matching algorithm that outperforms existing solutions, and identify several interesting research directions.
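To make the notion of semantic matching concrete, the following minimal sketch ranks candidate GUI events of a target app against a query event from a source app by comparing their textual descriptors. It is an illustration only: the Jaccard token overlap is a deliberately simple lexical stand-in for the similarity measures the study evaluates, it is not SemFinder, and the function names and widget descriptors are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a widget descriptor and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(query, candidates):
    """Return (score, descriptor) pairs sorted by decreasing similarity to the query."""
    q = tokens(query)
    scored = [(jaccard(q, tokens(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical descriptors: one event from a source app, three from a target app.
query = "add new note"
candidates = ["create note", "delete note", "add new note item"]

ranking = rank_candidates(query, candidates)
best_score, best_match = ranking[0]
print(best_match, best_score)
```

A purely lexical measure like this one scores "create note" no higher than "delete note" for the query "add new note", which hints at why test reuse approaches rely on semantic rather than purely syntactic similarity.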
Index Terms
- Semantic matching of GUI events for test reuse: are we there yet?