Copyright © 2008 Elsevier B.V. All rights reserved.
Integrating and querying distributed XML data via XLink
Received 11 December 2006;
Abstract
XML instances are not necessarily self-contained but may have connections to remote XML data residing on other servers. In this paper, we show that, despite its limited support and use in the XML world, the XLink language provides a powerful mechanism for expressing such links, both from the modeling point of view and for actually querying interlinked XML data. In our dbxlink approach, the links are not treated as explicit links (which users must be aware of and traverse explicitly in their queries); instead, they define views that combine into a logical, transparent XML model, which serves as an external schema and can be queried with XPath/XQuery. We motivate the underlying modeling and give a concise, declarative specification as an XML-to-XML mapping. We also describe the implementation of the model as an extension of the eXist [eXist: an Open Source Native XML Database, http://exist-db.org/] XML database system. The approach can be applied both to the distribution of data and to the integration of data from autonomous sources.
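The central idea, that XLinks are resolved into a transparent logical instance which ordinary path queries then navigate without mentioning the links, can be illustrated with a small Python sketch. It is not the paper's Phi/Gamma mapping or the dbxlink extension of eXist. It assumes that every element carrying `xlink:href` is simply replaced by the elements it references, that a hypothetical in-memory dictionary `REMOTE_DOCS` stands in for remote servers, and that hrefs have the simplified form `doc#path`, where `path` is an ElementTree path rather than a real XPointer expression. All names and data are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "{http://www.w3.org/1999/xlink}"
HREF = XLINK_NS + "href"

# Hypothetical stand-in for remote servers: document name -> parsed root element.
REMOTE_DOCS = {
    "authors.xml": ET.fromstring(
        "<authors>"
        "<author id='a1'><name>Alice</name></author>"
        "<author id='a2'><name>Bob</name></author>"
        "</authors>"
    ),
}


def resolve(href):
    """Resolve 'doc#path' to the elements it addresses.

    Simplification (assumption): 'path' is an ElementTree path evaluated
    from the document root, standing in for a real XPointer expression.
    """
    doc, _, path = href.partition("#")
    root = REMOTE_DOCS[doc]
    return root.findall(path) if path else [root]


def expand(elem):
    """Build the logical (virtual) view of elem.

    Every child carrying xlink:href is replaced, recursively, by the
    elements it references. There is no cycle detection, so a link
    cycle would recurse forever, and tail text after a link is dropped.
    """
    view = ET.Element(
        elem.tag,
        {k: v for k, v in elem.attrib.items() if not k.startswith(XLINK_NS)},
    )
    view.text = elem.text
    for child in elem:
        if HREF in child.attrib:
            # The link element disappears; its targets take its place.
            for target in resolve(child.attrib[HREF]):
                view.append(expand(target))
        else:
            sub = expand(child)
            sub.tail = child.tail
            view.append(sub)
    return view


# A local document that links to a remote author instead of copying it.
LOCAL = ET.fromstring(
    '<book xmlns:xlink="http://www.w3.org/1999/xlink">'
    "<title>XML Basics</title>"
    "<ref xlink:type=\"simple\" xlink:href=\"authors.xml#author[@id='a1']\"/>"
    "</book>"
)

view = expand(LOCAL)
# The query is written against the logical model; it never mentions the link.
print(view.find("author/name").text)  # Alice
print(ET.tostring(view, encoding="unicode"))
```

The query `author/name` succeeds only on the expanded view, which is the sense in which links become transparent. This sketch materializes the whole view eagerly for simplicity; the article outline indicates that the paper instead evaluates queries stepwise and restricts attention to the XLinks relevant for each navigation step.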
Keywords: XML; Distributed Data; Data Integration; Querying XML
Article Outline
- 1. Introduction
- 1.1. Aside: Data Integration Approaches
- 1.2. Structure of the paper
- 1.3. Running example
- 1.4. Relationship with previous publications
- 2. Linking XML data
- 2.1. XPointer
- 2.2. XLink
- 2.3. Usage of XLinks in XML Instances
- 2.4. XInclude
- 3. Query support for XLink references
- 4. Scenarios where XLink references can be applied
- 4.1. Data reorganization and splitting
- 4.2. Data integration
- 4.3. Data integration process
- 4.4. Requirements
- 5. Mapping XLink references to (virtual) XML instances
- 5.1. Retaining the logical model after splitting an XML instance
- 5.2. Operations for integrating XML data via XLinks
- 5.3. Modeling directives
- 6. Formal Specification
- 6.1. Phi: the XML-to-XML mapping
- 6.2. Gamma: expanding individual XLink elements
- 6.3. Details: handling of IDREF/IDREFS
- 6.4. Summary
- 7. Logical Model: Analysis, Problematic Cases and the Frontiers
- 7.1. Theorems
- 7.2. Integration Examples
- 7.3. Directed acyclic graphs and infinite trees
- 7.4. Further usage of XLink views
- 7.5. Considerations on databases and documents
- 7.6. Pathological cases
- 7.7. Frontiers
- 7.7.1. Upward and sideways axes
- 7.7.2. IDREF steps
- 7.8. Semantics of queries against linked XML instances
- 8. Enabling XPath/XQuery engines for handling XLinks
- 8.1. Stepwise query evaluation
- 8.2. Relevant XLinks for the current navigation step
- 8.3. Distributed evaluation strategies
- 8.3.1. Distributed evaluation—hybrid shipping
- 8.3.2. Local evaluation—data shipping
- 8.3.3. Remote evaluation—query shipping
- 8.4. Non-downward axes and absolute paths
- 8.5. IDs, virtual IDs and dereferencing
- 8.6. Default
- 8.7. Another design and evaluation example: nested linking
- 8.8. Evaluation timepoints/activating event
- 8.9. Caching strategies
- 9. Evaluation: controlling and pruning the search space
- 9.1. Search Space and Cycles
- 9.1.1. Cycles in the logical model
- 9.1.2. Cycles in the links
- 9.2. Optimization algorithms and data structures
- 9.2.1. Runtime metadata analysis
- 9.2.2. Static metadata analysis
- 9.2.3. Data guides
- 9.2.4. Indexes on local data
- 9.2.5. Indexing and query answering on distributed data
- 9.2.6. Requirements on indexing for dbxlink
- 9.2.7. Projecting XML fragments
- 9.2.8. Stream processing
- 9.2.9. Parallel evaluation
- 9.2.10. Query containment/Caching
- 10. Discussion and comparison
- 11. Conclusion and perspectives
- 11.1. Testbed and demonstrator
- 11.2. Application areas
- 11.3. Further work
- 11.3.1. Connecting to Web Services
- 11.3.2. Third-party links and data injection
- 11.3.3. Optimization
- References






