Information Systems
Volume 33, Issue 6, September 2008, Pages 508-566
 
doi:10.1016/j.is.2008.02.003
Copyright © 2008 Elsevier B.V. All rights reserved.

Integrating and querying distributed XML data via XLink

Wolfgang May (a, corresponding author), Erik Behrends (a) and Oliver Fritzen (a)

(a) Institut für Informatik, Lotzestrasse 16-18, D-37083 Göttingen, Germany

Received 11 December 2006; 
revised 26 October 2007; 
accepted 5 February 2008. 
Recommended by Prof. M. Yoshikawa. 
Available online 10 March 2008.


Abstract

XML instances are not necessarily self-contained but may have connections to remote XML data residing on other servers. In this paper, we show that the XLink language, in spite of its limited support and use in the XML world, provides a powerful mechanism for expressing such links, both from the modeling point of view and for actually querying interlinked XML data. In our dbxlink approach, the links are not seen as explicit links (where users must be aware of the links and traverse them explicitly in their queries); instead, they define views that combine into a logical, transparent XML model, which serves as an external schema and can be queried by XPath/XQuery. We motivate the underlying modeling and give a concise and declarative specification as an XML-to-XML mapping. We also describe the implementation of the model as an extension of the eXist [eXist: an Open Source Native XML Database, http://exist-db.org/] XML database system. The approach can be applied both for distribution of data and for integration of data from autonomous sources.

Keywords: XML; Distributed Data; Data Integration; Querying XML
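
The idea stated in the abstract, that links define a view which is then queried as if it were one XML instance, can be illustrated with a small sketch. It is not the paper's specification: the formal mapping and the modeling directives are not reproduced on this page, so everything below is an assumption made for illustration. The documents, the URL, the in-memory REMOTE_DOCS table (a stand-in for remote servers), the toy "url#id" fragment syntax (not real XPointer), and the choice to keep the link element and embed the target beneath it are all hypothetical. Python's standard xml.etree.ElementTree supports only a subset of XPath and is used here in place of a full XPath/XQuery engine.

```python
import copy
import xml.etree.ElementTree as ET

# Attribute name of xlink:href in ElementTree's {namespace}local notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical stand-in for remote servers: URL -> XML text.
REMOTE_DOCS = {
    "http://example.org/cities.xml": (
        "<cities>"
        "<city id='berlin'><name>Berlin</name></city>"
        "<city id='paris'><name>Paris</name></city>"
        "</cities>"
    ),
}

# Local instance whose <capital> element is an XLink simple link.
LOCAL_DOC = (
    "<country xmlns:xlink='http://www.w3.org/1999/xlink' name='Germany'>"
    "<capital xlink:type='simple' "
    "xlink:href='http://example.org/cities.xml#berlin'/>"
    "</country>"
)


def resolve(href):
    """Return the element addressed by 'url#id' (toy fragment syntax)."""
    url, _, frag = href.partition("#")
    root = ET.fromstring(REMOTE_DOCS[url])
    if not frag:
        return root
    for elem in root.iter():
        if elem.get("id") == frag:
            return copy.deepcopy(elem)
    raise KeyError(href)


def expand_links(elem):
    """Embed each link target under its link element, in place.

    Assumption of this sketch: no cycle detection, so cyclic links
    between documents would recurse without end.
    """
    for child in list(elem):
        expand_links(child)
    href = elem.attrib.pop(XLINK_HREF, None)
    if href is not None:
        target = resolve(href)
        expand_links(target)  # a target may itself contain links
        elem.append(target)
    return elem


# Build the combined view, then query it without mentioning the link.
view = expand_links(ET.fromstring(LOCAL_DOC))
names = [n.text for n in view.findall("./capital/city/name")]
print(names)  # ['Berlin']
```

The query path ./capital/city/name never refers to the link or to the remote URL, which is the sense in which the combined model is transparent to the user. This sketch materializes the whole view eagerly before querying; it does not attempt the stepwise or distributed evaluation strategies listed in the article outline.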

Article Outline

1. Introduction
1.1. Aside: Data Integration Approaches
1.2. Structure of the paper
1.3. Running example
1.4. Relationship with previous publications
2. Linking XML data
2.1. XPointer
2.2. XLink
2.3. Usage of XLinks in XML Instances
2.4. XInclude
3. Query support for XLink references
4. Scenarios where XLink references can be applied
4.1. Data reorganization and splitting
4.2. Data integration
4.3. Data integration process
4.4. Requirements
5. Mapping XLink references to (virtual) XML instances
5.1. Retaining the logical model after splitting an XML instance
5.2. Operations for integrating XML data via XLinks
5.3. Modeling directives
6. Formal Specification
6.1. Phi: the XML-to-XML mapping
6.2. Gamma: expanding individual XLink elements
6.3. Details: handling of IDREF/IDREFS
6.4. Summary
7. Logical Model: Analysis, Problematic Cases and the Frontiers
7.1. Theorems
7.2. Integration Examples
7.3. Directed acyclic graphs and infinite trees
7.4. Further usage of XLink views
7.5. Considerations on databases and documents
7.6. Pathological cases
7.7. Frontiers
7.7.1. Upward and sideways axes
7.7.2. IDREF steps
7.8. Semantics of queries against linked XML instances
8. Enabling XPath/XQuery engines for handling XLinks
8.1. Stepwise query evaluation
8.2. Relevant XLinks for the current navigation step
8.3. Distributed evaluation strategies
8.3.1. Distributed evaluation—hybrid shipping
8.3.2. Local evaluation—data shipping
8.3.3. Remote evaluation—query shipping
8.4. Non-downward axes and absolute paths
8.5. IDs, virtual IDs and dereferencing
8.5.1. IDREFs in referenced documents
8.5.2. Virtual IDs from make-attribute
8.6. Default
8.7. Another design and evaluation example: nested linking
8.8. Evaluation timepoints/activating event
8.9. Caching strategies
9. Evaluation: controlling and pruning the search space
9.1. Search Space and Cycles
9.1.1. Cycles in the logical model
9.1.2. Cycles in the links
9.2. Optimization algorithms and data structures
9.2.1. Runtime metadata analysis
9.2.2. Static metadata analysis
9.2.3. Data guides
9.2.4. Indexes on local data
9.2.5. Indexing and query answering on distributed data
9.2.6. Requirements on indexing for dbxlink
9.2.7. Projecting XML fragments
9.2.8. Stream processing
9.2.9. Parallel evaluation
9.2.10. Query containment/Caching
10. Discussion and comparison
10.1. Comparison with related approaches
10.2. Generalization of the approach
11. Conclusion and perspectives
11.1. Testbed and demonstrator
11.2. Application areas
11.3. Further work
11.3.1. Connecting to Web Services
11.3.2. Third-party links and data injection
11.3.3. Optimization
References

















