Data & Knowledge Engineering
Volume 63, Issue 3, December 2007, Pages 795-810
25th International Conference on Conceptual Modeling (ER 2006) - Four of the best papers presented
 
doi:10.1016/j.datak.2007.04.009
Copyright © 2007 Elsevier B.V. All rights reserved.

Automatically maintaining navigation sequences for querying semi-structured web sources

Alberto Pan (a, corresponding author), Juan Raposo (a), Manuel Álvarez (a), Víctor Carneiro (a) and Fernando Bellas (a)

(a) Department of Information and Communication Technologies, Facultad de Informatica, Campus de Elviña s/n, University of A Coruña, 15071 A Coruña, Spain

Received 14 November 2006; 
revised 22 February 2007; 
accepted 13 April 2007. 
Available online 13 May 2007.


Abstract

A substantial subset of Web data has an underlying structure. For instance, the pages obtained in response to a query executed through a Web search form are usually generated by a program that accesses structured data in a local database and embeds them into an HTML template. For software programs to gain full benefit from these “semi-structured” Web sources, wrapper programs must be built to provide a “machine-readable” view over them. Since Web sources are autonomous, they may experience changes that invalidate the current wrapper; automatic maintenance is therefore an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, to the best of our knowledge the problem of maintaining the navigation sequences remains unaddressed. In this paper, we propose a set of novel techniques to fill this gap.

Keywords: Technologies of DBs/mediators and wrappers; Data mining/Web-based information; Web/Web-based information systems

Article Outline

1. Introduction
1.1. Wrappers for semi-structured Web sources
1.2. Wrapper maintenance
2. Web navigation and automatic maintenance in ITPilot
2.1. Navigation sequence model and execution
2.2. Wrapper model
2.3. Sequence of steps for maintaining wrappers in ITPilot
3. Maintaining navigation sequences
3.1. Maintaining the query sequence
3.1.1. Finding matches for searchable attributes in a form
3.1.2. Determining if the response pages obtained through a form are valid
3.1.3. Multi-step forms and login/password forms
3.2. “Next interval” sequences
3.3. Maintaining the “More detail” sequence
4. Experiments
4.1. Experiments monitoring real changes
4.2. Experiments simulating changes
5. Related work
6. Conclusions and future work
Acknowledgements
References
Vitae





