Copyright © 2005 Elsevier Inc. All rights reserved.
Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques
Received 28 May 2004;
Abstract
We propose XIR-Linear, a method for efficiently evaluating linear path expressions (LPEs) on large-scale heterogeneous XML documents using information retrieval (IR) techniques. LPEs are the primary form of XPath queries, and techniques for evaluating them have been actively researched. XPath queries in their general form are partial match queries, which are particularly useful for searching documents with heterogeneous schemas. XIR-Linear is therefore geared toward partial match queries expressed as LPEs. XIR-Linear builds on existing methods that use relational tables (e.g., XRel, XParent) and drastically improves their efficiency using the inverted index technique. Specifically, it indexes the labels in label paths (i.e., sequences of node labels) like keywords in texts, and finds the label paths matching an LPE far more efficiently than the string matching used in the existing methods. We demonstrate the efficiency of XIR-Linear by comparing it with XRel and XParent on XML documents crawled from the Internet. The results show that XIR-Linear outperforms XRel and XParent by an order of magnitude, and the performance gap widens as the database size grows.
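The core idea in the abstract, treating the labels of a label path like keywords and using an inverted index to find the paths that match an LPE, can be illustrated with a small sketch. This is not the authors' XIR-Linear storage structure or algorithm (those appear in Sections 4 and 5). It is a minimal illustration under simplifying assumptions: an LPE uses only the child (`/`) and descendant (`//`) axes with plain label tests (no wildcards or predicates), a node's label path runs from the root to that node, and every name (`LabelPathIndex`, `parse_lpe`, `matches`, `scan`) is hypothetical. Candidate paths are obtained by intersecting the posting lists of the query's labels and are then verified for label order and axes. For contrast, `scan` checks every stored path, standing in for the string matching the abstract attributes to XRel and XParent.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical simplified model (not the paper's schema):
# a label path is a tuple of labels from the root, e.g. ("bib", "book", "title");
# an LPE step is (axis, label) with axis "/" (child) or "//" (descendant).
Step = Tuple[str, str]
LabelPath = Tuple[str, ...]


def parse_lpe(lpe: str) -> List[Step]:
    """Parse e.g. '//book/title' into [('//', 'book'), ('/', 'title')]."""
    steps: List[Step] = []
    i = 0
    while i < len(lpe):
        if lpe.startswith("//", i):
            axis, i = "//", i + 2
        elif lpe[i] == "/":
            axis, i = "/", i + 1
        else:
            raise ValueError(f"expected '/' at position {i}")
        j = i
        while j < len(lpe) and lpe[j] != "/":
            j += 1
        if j == i:
            raise ValueError("empty label in LPE")
        steps.append((axis, lpe[i:j]))
        i = j
    return steps


def matches(path: LabelPath, steps: List[Step]) -> bool:
    """True if the LPE selects the node whose root-to-node label path is `path`."""
    def match_from(k: int, pos: int) -> bool:
        # k: next step to match; pos: path index just after the last matched label.
        if k == len(steps):
            return pos == len(path)  # the last step must land on the node itself
        axis, label = steps[k]
        if axis == "/":
            return pos < len(path) and path[pos] == label and match_from(k + 1, pos + 1)
        # "//": try every later position (backtracking handles repeated labels).
        return any(path[p] == label and match_from(k + 1, p + 1)
                   for p in range(pos, len(path)))
    return match_from(0, 0)


class LabelPathIndex:
    """Inverted index from each label to the IDs of the distinct label paths containing it."""

    def __init__(self) -> None:
        self.paths: List[LabelPath] = []
        self.path_ids: Dict[LabelPath, int] = {}
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, path: LabelPath) -> int:
        if path in self.path_ids:
            return self.path_ids[path]
        pid = len(self.paths)
        self.paths.append(path)
        self.path_ids[path] = pid
        for label in path:
            self.postings[label].add(pid)
        return pid

    def evaluate(self, lpe: str) -> List[LabelPath]:
        steps = parse_lpe(lpe)
        # Candidate generation: a matching path must contain every query label,
        # so intersect posting lists, smallest first.
        lists = sorted((self.postings.get(label, set()) for _, label in steps), key=len)
        candidates = set.intersection(*lists) if lists else set()
        # Verification: check label order and axes only on the candidates.
        return sorted(self.paths[pid] for pid in candidates if matches(self.paths[pid], steps))


def scan(paths: List[LabelPath], lpe: str) -> List[LabelPath]:
    """Baseline: test every stored path (stand-in for per-path string matching)."""
    steps = parse_lpe(lpe)
    return sorted(p for p in paths if matches(p, steps))


if __name__ == "__main__":
    idx = LabelPathIndex()
    for p in [("bib", "book", "title"), ("bib", "book", "author", "name"),
              ("dblp", "article", "title"), ("site", "book", "title")]:
        idx.add(p)
    print(idx.evaluate("//book/title"))  # [('bib', 'book', 'title'), ('site', 'book', 'title')]
```

The design point the sketch tries to convey is that the posting-list intersection touches only the paths that contain every label in the query, whereas `scan` must examine every distinct path. This mirrors the intuition behind the reported speedup, though it does not reproduce the paper's data structures or measurements.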
Keywords: XML; Inverted indexes; Partial match queries; Information retrieval
Article Outline
- 1. Introduction
- 2. Preliminaries
- 2.1. XML document model
- 2.2. XML query model
- 3. Related work
- 4. XIR-Linear storage structures
- 5. XIR-Linear query processing algorithms
- 6. Performance evaluation
- 6.1. Experimental setup
- 6.1.1. Databases
- 6.1.2. Queries
- 6.1.3. Computing environment
- 6.2. Experimental results
- 7. Conclusions
- References