ABSTRACT
Optical character recognition (OCR) algorithms are often slow. They may take several seconds to recognize the text on a GUI screen, which makes OCR-based widget localization inconvenient in test automation, especially on computers without a GPU. This paper first identifies a common type of widget text that must be located in GUI testing: label text, that is, the short text in widgets such as buttons, menu items, and window titles. We then investigate the characteristics of text on a GUI screen and introduce a fast, GPU-independent Label Text Screening (LTS) technique to accelerate OCR for label text localization. The technique opens the black box of OCR engines and combines several simple methods to avoid analyzing more of the screen's text than necessary. Experiments show that, on the subject datasets, LTS substantially reduces the average OCR-based label text localization time. On 4K-resolution GUI screens, it keeps the localization time below 0.5 seconds in more than roughly 60% of cases, without GPU support, on an ordinary laptop computer. In contrast, existing CPU-based approaches built on the popular OCR engines Tesseract, PaddleOCR, and EasyOCR usually need more than 2 seconds to achieve the same goal on the same platform. Even with GPU acceleration, they can hardly keep the analysis time within 1 second. We believe the proposed approach will be helpful for implementing OCR-based test automation tools.
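The general idea of screening before recognition can be illustrated with a small sketch. This is not the LTS technique itself, whose concrete methods the abstract does not describe; it only shows, under stated assumptions, how cheap geometric checks might discard regions that cannot hold a short label before an expensive recognizer is called. The `Box` type, the `screen_boxes` and `locate_label` functions, the `recognize` callback, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Candidate text region in pixels (hypothetical structure)."""
    x: int
    y: int
    w: int
    h: int


def screen_boxes(
    boxes: List[Box],
    target_len: int,
    min_h: int = 8,                                # assumed smallest label height
    max_h: int = 64,                               # assumed largest label height
    char_aspect: Tuple[float, float] = (0.3, 1.2)  # assumed width/height range per character
) -> List[Box]:
    """Keep only boxes whose size is plausible for a label of target_len characters."""
    lo, hi = char_aspect
    kept = []
    for b in boxes:
        if not (min_h <= b.h <= max_h):
            continue  # too small or too tall to be a one-line label
        if lo * b.h * target_len <= b.w <= hi * b.h * target_len:
            kept.append(b)  # width is consistent with the target's length
    return kept


def locate_label(
    boxes: List[Box],
    target: str,
    recognize: Callable[[Box], str],
) -> Optional[Box]:
    """Run the (expensive) recognizer only on boxes that pass the cheap screen."""
    for b in screen_boxes(boxes, len(target)):
        if recognize(b).strip().lower() == target.lower():
            return b
    return None
```

In this sketch, `recognize` stands in for a call to any OCR engine on a cropped region; the point is only that the number of such calls shrinks when implausible regions are filtered out first.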
Index Terms
- Accelerating OCR-Based Widget Localization for Test Automation of GUI Applications