Journal of Systems and Software
Volume 79, Issue 2, February 2006, Pages 180-190
 
doi:10.1016/j.jss.2005.05.009
Copyright © 2005 Elsevier Inc. All rights reserved.

Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques

Young-Ho Park (a, corresponding author), Kyu-Young Whang (a), Byung Suk Lee (b) and Wook-Shin Han (c)

(a) Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), 373-1, Koo-Sung Dong, Yoo-Sung Ku, Daejeon 305-701, Republic of Korea
(b) Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
(c) Department of Computer Engineering, Kyungpook National University, Daegu 702-701, Republic of Korea

Received 28 May 2004; revised 9 May 2005; accepted 9 May 2005. Available online 5 July 2005.

Abstract

We propose XIR-Linear, a method for efficiently evaluating linear path expressions (LPEs) on large-scale heterogeneous XML documents using information retrieval (IR) techniques. LPEs are the primary form of XPath queries, and techniques for evaluating them have been actively researched. XPath queries in their general form are partial match queries, which are particularly useful for searching documents with heterogeneous schemas. XIR-Linear is therefore geared toward partial match queries expressed as LPEs. It builds on existing methods that use relational tables (e.g., XRel, XParent) and drastically improves their efficiency by using the inverted index technique. Specifically, it indexes the labels in label paths (i.e., sequences of node labels) like keywords in texts, and finds the label paths matching an LPE far more efficiently than the string matching used in the existing methods. We demonstrate the efficiency of XIR-Linear by comparing it with XRel and XParent on XML documents crawled from the Internet. The results show that XIR-Linear outperforms XRel and XParent by an order of magnitude, with the performance gap widening as the database size grows.

Keywords: XML; Inverted indexes; Partial match queries; Information retrieval
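
The core idea in the abstract, indexing the labels of label paths like keywords and using that index to find the paths matching an LPE, can be illustrated with a small sketch. This is not the paper's storage structure or query algorithm (those are described in Sections 4 and 5, which are not reproduced here); the function names, the sample label paths, and the matching rule (the last step of the LPE must match the last label of the path) are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def split_path(path):
    """Split a label path such as '/library/book/title' into its labels."""
    return path.strip("/").split("/")


def build_index(label_paths):
    """Build a positional inverted index: label -> {path id -> [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    lengths = []
    for pid, path in enumerate(label_paths):
        labels = split_path(path)
        lengths.append(len(labels))
        for pos, label in enumerate(labels):
            index[label][pid].append(pos)
    return index, lengths


def parse_lpe(lpe):
    """Parse an LPE such as '/a//b' into [('/', 'a'), ('//', 'b')]."""
    return re.findall(r"(//|/)([^/]+)", lpe)


def match_lpe(lpe, index, lengths):
    """Return ids of label paths matching the LPE, using only the index."""
    steps = parse_lpe(lpe)
    if not steps:
        return []
    postings = [index.get(label, {}) for _, label in steps]
    # Keyword-style filtering: keep paths that contain every query label.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)
    matches = []
    for pid in sorted(candidates):
        frontier = None  # positions where the steps seen so far can end
        for (axis, _), posting in zip(steps, postings):
            positions = posting[pid]
            if frontier is None:
                # A leading '/' anchors the first label at the root.
                frontier = {p for p in positions if axis == "//" or p == 0}
            elif axis == "/":
                frontier = {p for p in positions if p - 1 in frontier}
            else:
                lowest = min(frontier)
                frontier = {p for p in positions if p > lowest}
            if not frontier:
                break
        # The final step must land on the last label of the path.
        if frontier and lengths[pid] - 1 in frontier:
            matches.append(pid)
    return matches


# Hypothetical label paths, not taken from the paper's data sets.
paths = [
    "/library/book/title",
    "/library/book/author/name",
    "/store/item/title",
    "/library/journal/article/title",
]
index, lengths = build_index(paths)
print([paths[i] for i in match_lpe("//book/title", index, lengths)])
print([paths[i] for i in match_lpe("/library//title", index, lengths)])
```

The first query prints only `/library/book/title`; the second prints the two paths under `library` that end in `title`. The candidate set is obtained by intersecting posting lists rather than by scanning every stored path with a string pattern, which is the contrast with string matching that the abstract draws.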

Article Outline

1. Introduction
2. Preliminaries
2.1. XML document model
2.2. XML query model
3. Related work
3.1. Instance-level methods
3.2. Schema-level methods
3.2.1. XRel
3.2.2. XParent
4. XIR-Linear storage structures
5. XIR-Linear query processing algorithms
6. Performance evaluation
6.1. Experimental setup
6.1.1. Databases
6.1.2. Queries
6.1.3. Computing environment
6.2. Experimental results
7. Conclusions
References












 