Abstract
We describe two experiments with a system designed to facilitate the use of mobile optical character recognition (OCR) by blind people. The system, implemented as an iOS app, supports two interaction modalities: autoshot and guidance. In the first study, augmented reality fiducials were used to track the smartphone's camera, whereas in the second study, the extent of the text area was detected with a dedicated text spotting and text line detection algorithm. Although the guidance modality was expected to provide faster access to the text, this proved true only under certain conditions involving the user interface and the text detection module. Both studies also showed that, after experimenting with the autoshot or guidance modality, our participants appeared to have improved their skill at taking OCR-readable pictures even without the aid of either modality.
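To make the autoshot modality concrete, the following is a minimal sketch of the per-frame decision it implies: trigger the shutter only once the detected text area lies entirely within a central safe zone of the frame, and otherwise report how the camera should move. The sketch substitutes Apple's Vision text detector for the paper's dedicated text spotting and line detection algorithm; the safeMargin and minCoverage thresholds, the AutoshotDecision type, and the evaluate function are illustrative assumptions, not the authors' implementation.

```swift
import CoreGraphics
import CoreVideo
import Vision

/// Outcome of evaluating one camera frame (illustrative, not the paper's design).
enum AutoshotDecision {
    case noText                         // nothing detected; keep scanning
    case moveCloser                     // text found but too small to OCR reliably
    case pan(dx: CGFloat, dy: CGFloat)  // offset of text center from frame center
    case shoot                          // all text inside the safe zone: fire
}

/// Decide whether the current frame is worth capturing. Coordinates are
/// Vision's normalized image coordinates (origin at the lower left).
func evaluate(_ frame: CVPixelBuffer,
              safeMargin: CGFloat = 0.1,      // assumed margin, one tenth of the frame
              minCoverage: CGFloat = 0.05) -> AutoshotDecision {
    let request = VNDetectTextRectanglesRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: frame, options: [:])
    guard (try? handler.perform([request])) != nil,
          let boxes = (request.results as? [VNTextObservation])?.map(\.boundingBox),
          let first = boxes.first else {
        return .noText
    }

    // Bounding box enclosing all detected text lines.
    let textArea = boxes.dropFirst().reduce(first) { $0.union($1) }

    // Too small a footprint usually yields an unreadable OCR result.
    if textArea.width * textArea.height < minCoverage {
        return .moveCloser
    }

    // "Safe zone": the frame shrunk by a margin on every side, so small
    // hand tremors do not push text out of the captured image.
    let safeZone = CGRect(x: safeMargin, y: safeMargin,
                          width: 1 - 2 * safeMargin,
                          height: 1 - 2 * safeMargin)
    if safeZone.contains(textArea) {
        return .shoot
    }

    // Otherwise report how far the text center is from the frame center.
    return .pan(dx: textArea.midX - 0.5, dy: textArea.midY - 0.5)
}
```

Under this framing, the difference between the two modalities reduces to what is done with the pan case: autoshot simply waits for a shoot decision, while guidance would translate the offsets into spoken or sonified directions for the user.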