Purely vision-based segmentation of web pages for assistive technology
Introduction
We begin with a general overview of our proposed research into vision-based segmentation of web pages, clarifying the anticipated benefit for users with assistive needs. We then focus more specifically on the computer vision research, highlighting the main elements of what we are proposing and their intended contribution.
Today there are an increasing number of users on the Internet with specific assistive needs. Visually complex webpages with dense, information-rich structures can be difficult for these users to navigate. In this paper, we present research in support of providing systems that facilitate access to web content for users with visual or cognitive impairments. At the core of our solution are computer vision algorithms, applied to produce an effective segmentation of webpage content, which can then be employed to deliver alternative, useful depictions of those webpages to users with assistive needs.
Our approach aims to provide semantically rich representations of web page content structure by treating web pages as images to be interpreted using computer vision techniques. In developing this framework, we reflected upon its potential value for a wide range of users with challenges requiring assistive technology. Our initial motivation was to support improved audio screen readers for users who are visually impaired [1]. However, our system as designed could also support selective presentation of page content, reducing extraneous elements and emphasizing central ones. This may be of particular use for users such as the elderly.
We first present the proposed algorithms for segmentation, clarifying the novelty of the computer vision techniques and presenting a validation of the methodology as sound and effective in capturing webpage content. We then examine a host of user communities who may be well served by a system depicting web content that is guided by our algorithms. We also outline some directions for future research, both in extending the technical solution that is offered and also with respect to conducting user studies to demonstrate usability. In all, we emphasize the value of providing a solution that is not tied to the implementation and underlying code of the webpages, discussing how approaching the challenge from a computer vision standpoint offers important contributions for assistive technology.
The objective of our vision-based method is to determine the hierarchical structure of a web page layout using visual cues, without reference to the implementation of the web page. Our intention is for this system to serve as a back-end system, supporting front-end systems that reformat the web page for presentation to the user. Many such front-end systems, such as screen readers, exist today. Existing back-end systems for depicting web pages may use visual cues, but extract them from visual attributes defined in the code. As code-based analysis is brittle, we want to instead leverage the image of the rendered page. We believe that this approach has three principal advantages:
1. It does not depend on the quality or implementation language of the underlying code (provided that the browser’s rendering engine can handle it).
2. It allows for semantically significant divisions within images, Flash objects, and other entities that are treated monolithically in HTML or CSS code.
3. Perhaps most importantly, it analyzes the web page’s structure using as evidence the page designer’s view of the page (the appearance of the rendered web page).
Essentially, the advantage of an image-based analysis is that it depends not on the details of how the visual structure of the page is produced, but rather on what the visual appearance is. It uses exactly the information seen by users who do not require assistive technology to make the same type of inference about the structure of the page contents. In this paper we present a robust, extensible Bayesian framework, grounded in a formal model of web page appearance, for performing image-based segmentation of a web page, together with a comparison between the results of such an analysis and more traditional code-based techniques. As we shall see, assistive technology systems that rely on source code-based segmentation algorithms face challenges when there are, for example, images or Flash objects in the page. These algorithms would only be able to treat these objects atomically, and would be unable to detect their internal structure. As a result, users who require distracting content to be suppressed would not be able to select only parts of these objects for display.
Section snippets
Related work
Although relatively few researchers have attempted to use vision-based segmentation of web pages to support screen reader technology, there has been considerable work on using vision-based page segmentation in information retrieval and optical character recognition systems. This section examines some prominent or otherwise interesting techniques used in these and other fields (which could constitute the foundation of a back-end system designed to support effective depiction of web pages for
Our proposed vision-based method
Our system takes as input an image of a rendered web page and produces a hierarchical segmentation of the image. The original image is identical to the output of the browser’s rendering engine, as intended by the page’s designer. The image is segmented by first detecting edges in the image, then searching for the segmentation which is best supported by the edge structure. The system is best viewed at three levels, as follows. At the high-level, it takes as input the image of a rendered page
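The two steps named above (detect edges, then look for the segmentation the edges support) can be illustrated with a deliberately simplified sketch. It is not the Bayesian search described in the paper: it assumes a grayscale image given as a list of pixel rows, measures edge strength as the mean absolute intensity difference across a candidate cut line, and greedily splits each region at its strongest full-width or full-height edge. All names (`segment`, `best_cut`, `leaves`) and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]          # grayscale pixel rows (assumed input format)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def horizontal_edge_strength(img: Image, y: int, left: int, right: int) -> float:
    """Mean absolute intensity change between rows y-1 and y within the box."""
    return sum(abs(img[y][x] - img[y - 1][x]) for x in range(left, right)) / (right - left)


def vertical_edge_strength(img: Image, x: int, top: int, bottom: int) -> float:
    """Mean absolute intensity change between columns x-1 and x within the box."""
    return sum(abs(img[y][x] - img[y][x - 1]) for y in range(top, bottom)) / (bottom - top)


def best_cut(img: Image, box: Box, threshold: float) -> Optional[Tuple[str, int]]:
    """Return the strongest straight cut through the box, or None if too weak."""
    top, left, bottom, right = box
    candidates = []
    for y in range(top + 1, bottom):
        candidates.append((horizontal_edge_strength(img, y, left, right), "h", y))
    for x in range(left + 1, right):
        candidates.append((vertical_edge_strength(img, x, top, bottom), "v", x))
    if not candidates:
        return None
    strength, axis, pos = max(candidates)
    return (axis, pos) if strength >= threshold else None


def segment(img: Image, box: Box, threshold: float = 100.0) -> Dict:
    """Recursively split the box along its best-supported edge (greedy sketch)."""
    cut = best_cut(img, box, threshold)
    if cut is None:
        return {"box": box, "children": []}
    axis, pos = cut
    top, left, bottom, right = box
    if axis == "h":
        parts = [(top, left, pos, right), (pos, left, bottom, right)]
    else:
        parts = [(top, left, bottom, pos), (top, pos, bottom, right)]
    return {"box": box, "children": [segment(img, p, threshold) for p in parts]}


def leaves(node: Dict) -> List[Box]:
    """Collect the leaf regions of the segmentation tree, in traversal order."""
    if not node["children"]:
        return [node["box"]]
    return [b for child in node["children"] for b in leaves(child)]


# Toy "page": a dark header band above two differently shaded columns.
page = [
    [0] * 6,
    [0] * 6,
    [255, 255, 255, 128, 128, 128],
    [255, 255, 255, 128, 128, 128],
]
tree = segment(page, (0, 0, 4, 6))
```

On the toy page, the root is first split below the header band, and the lower region is then split between the two columns, giving a two-level hierarchy. A greedy cut of this kind is only a stand-in for the search over segmentations that the paper describes; it commits to each split without comparing alternatives.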
Implementation and experimental results
We present both qualitative and quantitative results of our algorithm. Qualitative results are shown first, followed by a quantitative comparison between segmentations produced by our algorithm and segmentations produced by taking the bounding boxes of the nodes of the DOM tree of the page. We were inspired to design our algorithm to address a variety of challenges faced by source code-based solutions; Appendix B discusses several of these challenges, with practical examples.
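This excerpt does not state which measure is used for the quantitative comparison. Purely as an illustration, and as an assumption rather than the paper's method, one common way to compare a set of segment boxes against DOM-node bounding boxes is intersection over union (IoU). The function names below are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, bottom - top) * max(0, right - left)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def mean_best_iou(predicted: List[Box], reference: List[Box]) -> float:
    """Average, over reference boxes, of the best IoU with any predicted box."""
    if not predicted or not reference:
        return 0.0
    return sum(max(iou(r, p) for p in predicted) for r in reference) / len(reference)
```

Here `reference` would hold the DOM bounding boxes and `predicted` the boxes from the image-based segmentation; swapping the two arguments measures coverage in the other direction.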
Our test dataset
Discussion
Although our primary focus is on the application of our method to web pages, there are other, similar domains to which it could be applied, perhaps in slightly modified form. Our method is designed for artificial, designed images (as opposed to the more common use of computer vision for natural images) which convey information about their semantic structure through their visual organization. Other cases of this include images of a desktop windowing system, infographics, and academic papers
Applications
In this section, we first discuss sample users with assistive needs in the context of web use. We then describe examples of assistive systems that could produce alternative depictions of web pages based on a segmentation of the page, and show mock-ups of these interface front-ends. Each of these very different interfaces depends on a high-quality segmentation to provide information about the page structure; segmentation algorithms such as ours are versatile back-end components that can be
Future work
There are two primary directions for future work. The first is to extend our existing model, to make it broader or deeper. This is research focused on computer vision. The second is to explore in greater detail the user community requiring assistive technology based on our proposed model. We outline both of these directions for future research below.
Conclusion
In this paper, we have developed a computer-vision model for determining the segmentation of webpages, which can then be leveraged to offer improved depictions of these pages for users with a variety of assistive needs. Our proposed system is a back-end for use in assistive technology systems. This system supplies the front-end with rich, semantically significant information. We have also explained how our system can be readily extended to provide higher-level information such as segment
Acknowledgments
Thanks to NSERC (Natural Sciences and Engineering Research Council of Canada) for financial support. We also wish to acknowledge the contributions of Shari Trewin from IBM TJ Watson during initial brainstorming of ideas on desiderata for reducing clutter in webpages, as HCI assistive technology; and John A. Doucette, for feedback on an earlier version of the paper. We are grateful as well to the anonymous reviewers for their very helpful comments.
References (62)
- Automated document segmentation, Pattern Recogn. Lett. (1994)
- Page segmentation using the description of the background, Comput. Vis. Image Understand. (1998)
- et al., Barriers common to mobile and disabled web users, Interact. Comput. (2011)
- et al., Personalising web page presentation for older people, Interact. Comput. (2006)
- et al., A performance evaluation of statistical tests for edge detection in textured images, Comput. Vis. Image Underst. (2014)
- et al., Revisiting breadth vs. depth in menu structures for blind users of screen readers, Interact. Comput. (2010)
- et al., A robust vision-based framework for screen readers, Proceedings of the 2014 Workshop on Assistive Computer Vision and Robotics (2014)
- et al., A perceptually-supported sketch editor, Proceedings of the 7th annual ACM symposium on User interface software and Technology (1994)
- et al., A document segmentation, classification and recognition system, Proceedings of the Second International Conference on Systems Integration, 1992 (ICSI ’92) (1992)
- K. Kise, Handbook of Document Image Processing and Recognition, Springer-Verlag,...
- Web document segmentation for better extraction of information: A review, Int. J. Comput. Appl.
- VIPS: A Vision-Based Page Segmentation Algorithm, Technical Report, MSR-TR-2003-79
- Learning important models for web page blocks based on layout and content analysis, SIGKDD Explor. Newsl.
- Structured document segmentation and representation by the modified x-y tree, Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR ’99
- Detecting web content function using generalized hidden Markov model, Proceedings of the 5th International Conference on Machine Learning and Applications, ICMLA ’06
- Prefab: Implementing advanced behaviors using pixel-based reverse engineering of interface structure, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
- A versatile model for web page representation, information extraction and content re-packaging, Proceedings of the 11th ACM Symposium on Document Engineering, DocEng ’11
- Vision-based SLAM and moving objects tracking for the perceptual support of a smart walker platform, Proceedings of the 2014 Workshop on Assistive Computer Vision and Robotics
- Descending stairs detection with low-power sensors, Proceedings of the 2014 Workshop on Assistive Computer Vision and Robotics
- An intelligent powered wheelchair for users with dementia: Case studies with noah (navigation and obstacle avoidance help), Proceedings of the AAAI Fall Symposium: Artificial Intelligence for Gerontechnology
- Regionspeak: Quick comprehensive spatial descriptions of complex images for blind users, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
- Trajectory capture in frontal plane geometry for visually impaired, Proceedings of 2006 International Conference on Auditory Displays
- Issues in web presentation for cognitive accessibility
- Evaluating websites for older adults: adherence to ‘senior-friendly’ guidelines and end-user performance, Behav. Inf. Technol.
- Improving accessibility through the visual structure of web contents, Proceedings of the 4th International Conference on Universal Access in Human-Computer Interaction (UAHCI ’07)
- Annotation-based transcoding for nonvisual web access, Proceedings of the ASSETS 2000
- Site-wide annotation: Reconstructing existing pages to be accessible, Proceedings of the ASSETS 2002
- Screen readers cannot see (ontology based semantic annotation for visually impaired web travellers)
- Csurf: a context-driven non-visual webbrowser, Proceedings of the 16th International Conference on World Wide Web, WWW 2007
Michael Cormier is a PhD student at the Cheriton School of Computer Science, University of Waterloo. He completed a Master’s thesis in computer science at the University of Waterloo in 2013. His current research interests are computer vision and its applications to assistive technology. He also has an interest in vision with unconventional image formation models. Michael is currently supported by a Canadian government NSERC PGS-D scholarship and by institutional scholarships from the University of Waterloo. He also serves as a Graduate Ambassador for the Cheriton School.
Karyn Moffatt is an assistant professor in the School of Information Studies at McGill University. Her broad research area is Human Computer Interaction (HCI), with a specific focus on the ways in which technology can be employed to meet the needs of older adults and people with disabilities. Prior to joining McGill University, Karyn was a post-doctoral fellow at the University of Toronto supported by awards from NSERC and CIHR’s Health Care, Technology, and Place strategic initiative. She received her doctorate in computer science from the University of British Columbia in 2010.
Robin Cohen is a Professor at the David R. Cheriton School of Computer Science at the University of Waterloo, in Waterloo, Ontario, Canada. Her research interests are in the subfields of user modeling and multiagent systems, within artificial intelligence. One focus of her current work is on providing streamlined presentation of content to users in online settings such as social networks. She has been a faculty member at the University of Waterloo for over 30 years and is a former Associate Dean of Research in the Faculty of Mathematics. She is also a Senior Member of the AAAI.
Richard Mann is an Associate Professor in the Cheriton School of Computer Science at University of Waterloo. His interests are in the areas of Artificial Intelligence, Perception and Learning, Computer Vision and Computer Audio.