Accelerating OCR-Based Widget Localization for Test Automation of GUI Applications

Authors:
Ju Qian, Nanjing University of Aeronautics and Astronautics, China (ORCID 0000-0001-8028-7213)
Yingwei Ma, Nanjing University of Aeronautics and Astronautics, China (ORCID 0000-0003-2352-2226)
Chenghao Lin, Nanjing University of Aeronautics and Astronautics, China
Lin Chen, Nanjing University, China

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, October 2022, Article No.: 6, Pages 1–13. https://doi.org/10.1145/3551349.3556966

Published: 05 January 2023

ABSTRACT

Optical character recognition (OCR) algorithms often run slowly. They may take several seconds to recognize the text on a GUI screen, which makes OCR-based widget localization in test automation inconvenient to use, especially on GPU-free computers. This paper first identifies a common type of widget text to be located in GUI testing: label text, the short text found in widgets such as buttons, menu items, and window titles. We then investigate the characteristics of text on a GUI screen and introduce a fast, GPU-independent Label Text Screening (LTS) technique to accelerate the OCR process for label text localization. The technique opens the black box of OCR engines and combines several simple methods to avoid, as far as possible, excessive text analysis on a screen. Experiments show that, on the subject datasets, LTS substantially reduces the average OCR-based label text localization time. On 4K-resolution GUI screens, it keeps the localization time below 0.5 seconds in more than about 60% of cases without GPU support on an ordinary laptop. In contrast, existing CPU-based approaches built on the popular OCR engines Tesseract, PaddleOCR, and EasyOCR usually need over 2 seconds to achieve the same goal on the same platform. Even with GPU acceleration, they can hardly keep the analysis time within 1 second. We believe the proposed approach will be helpful for implementing OCR-based test automation tools.
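The abstract does not spell out which screening methods LTS combines, so the following is only a minimal sketch of the general idea of screening before recognition, not the authors' algorithm. It assumes a text detector has already produced candidate boxes, discards boxes whose width could not plausibly hold the target label, and runs the expensive recognizer only on the survivors. The `Box` type, the `plausible_for_label` and `locate_label` helpers, the `recognize` callback, and the constants `char_aspect`, `tolerance`, and `min_similarity` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned candidate text region in screen pixels (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def plausible_for_label(box: Box, target: str,
                        char_aspect: float = 0.6,
                        tolerance: float = 0.5) -> bool:
    """Cheap geometric pre-check: could this box hold `target`?

    Assumes an average glyph is roughly `char_aspect` times as wide as the
    text line is tall. Both constants are illustrative guesses, not values
    taken from the paper.
    """
    if box.w <= 0 or box.h <= 0:
        return False
    estimated_chars = box.w / (box.h * char_aspect)
    lower = len(target) * (1 - tolerance)
    upper = len(target) * (1 + tolerance)
    return lower <= estimated_chars <= upper


def locate_label(boxes: Iterable[Box], target: str,
                 recognize: Callable[[Box], str],
                 min_similarity: float = 0.8) -> Optional[Box]:
    """Return the best-matching box for `target`, or None if nothing matches.

    `recognize` stands in for the costly OCR recognition step; it is only
    called for boxes that pass the geometric screen.
    """
    best: Optional[Box] = None
    best_score = min_similarity
    for box in boxes:
        if not plausible_for_label(box, target):
            continue  # screened out without running recognition
        text = recognize(box)
        score = SequenceMatcher(None, text.lower(), target.lower()).ratio()
        if score >= best_score:
            best, best_score = box, score
    return best
```

In this sketch a full-width paragraph line is rejected by arithmetic alone, so the recognizer is invoked only on button-sized regions. A real tool would need screening rules tuned to its fonts, languages, and detector output.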

Index Terms

Software and its engineering
  Software creation and management
    Software verification and validation
      Software defect analysis
        Software testing and debugging

Published in

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
October 2022
2006 pages
ISBN: 9781450394758
DOI: 10.1145/3551349

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GUI testing
OCR
computer vision
test automation
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 82 of 337 submissions, 24%