research-article

Improving the Accessibility of Mobile OCR Apps Via Interactive Modalities

Published: 09 August 2017
Abstract

We describe two experiments with a system designed to facilitate the use of mobile optical character recognition (OCR) by blind people. This system, implemented as an iOS app, supports two interaction modalities: autoshot and guidance. In the first study, augmented reality fiducials were used to track the smartphone’s camera, whereas in the second study, the extent of the text area was detected using a dedicated text spotting and text line detection algorithm. Although the guidance modality was expected to give faster access to text, this held only when certain conditions (involving the user interface and text detection modules) were met. Both studies also showed that our participants, after experimenting with the autoshot or guidance modality, appeared to have improved their skill at taking OCR-readable pictures even without using these interaction modalities.

      • Published in

        ACM Transactions on Accessible Computing, Volume 10, Issue 4
        October 2017
        129 pages
        ISSN: 1936-7228
        EISSN: 1936-7236
        DOI: 10.1145/3131767

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 August 2017
        • Accepted: 1 March 2017
        • Revised: 1 January 2017
        • Received: 1 September 2016
        Qualifiers

        • research-article
        • Research
        • Refereed
