DOI: 10.1145/3551349.3556966
research-article

Accelerating OCR-Based Widget Localization for Test Automation of GUI Applications

Published: 5 January 2023

ABSTRACT

Optical character recognition (OCR) algorithms are often slow. They may take several seconds to recognize the text on a GUI screen, which makes OCR-based widget localization impractical for test automation, especially on computers without a GPU. This paper first identifies a common type of widget text to be located in GUI testing: label text, i.e., the short text in widgets such as buttons, menu items, and window titles. We then investigate the characteristics of text on GUI screens and introduce a fast, GPU-independent Label Text Screening (LTS) technique that accelerates the OCR process for label text localization. The technique opens the black box of OCR engines and combines simple methods to avoid unnecessary text analysis on a screen wherever possible. Experiments show that, on the subject datasets, LTS substantially reduces the average time of OCR-based label text localization. On 4K-resolution GUI screens, it keeps the localization time below 0.5 seconds in roughly 60% or more of the cases, without GPU support, on an ordinary laptop. In contrast, existing CPU-based approaches built on the popular OCR engines Tesseract, PaddleOCR, and EasyOCR usually need more than 2 seconds to achieve the same goal on the same platform, and even with GPU acceleration they rarely keep the analysis time under 1 second. We believe the proposed approach can help in implementing OCR-based test automation tools.
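
The abstract does not spell out how LTS screens text, so the following is only a minimal Python sketch of the general pattern it describes: apply a cheap check to each detected text-line box first, invoke the expensive recognition step only on boxes that survive, and match the recognized strings against the target label with an edit-distance tolerance. The names `TextRegion`, `plausible_width`, and `locate_label`, the `recognize` callback standing in for an OCR engine, and the width-per-character thresholds are all hypothetical assumptions for illustration, not the authors' LTS method or any OCR library's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class TextRegion:
    """A detected text-line box; `text` stays None until recognition runs."""
    box: Box
    text: Optional[str] = None


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def plausible_width(box: Box, label: str,
                    min_ratio: float = 0.3, max_ratio: float = 1.2) -> bool:
    """Hypothetical cheap screen: a line of height h holding n characters is
    assumed to be between min_ratio*h*n and max_ratio*h*n pixels wide."""
    _, _, w, h = box
    n = max(len(label), 1)
    return min_ratio * h * n <= w <= max_ratio * h * n


def locate_label(regions: Iterable[TextRegion], label: str,
                 recognize: Callable[[Box], str],
                 max_distance_ratio: float = 0.25) -> Optional[Tuple[int, int]]:
    """Return the center (click point) of the region whose text best matches
    `label`, or None. The expensive `recognize` call (a stand-in for an OCR
    engine) is only made for regions that pass the cheap width screen."""
    target = label.casefold()
    best: Optional[TextRegion] = None
    best_dist: Optional[int] = None
    for region in regions:
        if not plausible_width(region.box, label):
            continue  # skip recognition entirely for implausible boxes
        if region.text is None:
            region.text = recognize(region.box)
        dist = levenshtein(region.text.casefold(), target)
        if dist <= max_distance_ratio * len(target) and (
                best_dist is None or dist < best_dist):
            best, best_dist = region, dist
    if best is None:
        return None
    x, y, w, h = best.box
    return (x + w // 2, y + h // 2)


if __name__ == "__main__":
    # Fake screen: detected boxes mapped to what a hypothetical OCR engine reads.
    demo_screen = {
        (10, 10, 60, 20): "File",
        (100, 10, 70, 20): "Saue",  # misread "Save"; edit distance 1
        (300, 400, 600, 20): "A long paragraph of body text",  # too wide
    }
    demo_calls = []

    def fake_ocr(box: Box) -> str:
        demo_calls.append(box)
        return demo_screen[box]

    regions = [TextRegion(b) for b in demo_screen]
    print(locate_label(regions, "Save", fake_ocr), len(demo_calls))  # prints: (135, 20) 2
```

Keeping recognition behind a callback lets the screen decide how many regions ever reach the OCR engine: in the demo, the wide paragraph box is rejected before recognition, so only two of the three boxes are read, and the tolerant match still finds the misread "Save" button.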


    Published in

      ACM Other conferences
      ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
      October 2022
      2006 pages
      ISBN:9781450394758
      DOI:10.1145/3551349

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall acceptance rate: 82 of 337 submissions, 24%
