ABSTRACT
This paper introduces the MTree index algorithm, a special-purpose XML XPath index designed to meet the needs of the hierarchical XPath query language. With the increasing importance of XML, XPath, and XQuery, several methods have been proposed for creating XML structure indexes, along with many variants built on relational technology. This work proposes a new XML structure index, called MTree, which is designed to be optimal for traversing all XPath axes. The primary feature of MTree is its ability to provide each context node with the next subtree root node in document order, for all axes, in O(1) time. MTree is thus a special-purpose index structure matched to the special-purpose query requirements of XPath. This contrasts with approaches that map the problem domain onto general-purpose index structures such as B-Trees and must then reconstruct the XML tree from those structures for every query. MTree supports modification operations such as insert and delete. MTree has been implemented both in memory and on disk, and performance results on XMark benchmark data show up to two orders of magnitude improvement over other well-known implementations.
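The abstract does not describe MTree's internal layout, so the following is only a minimal sketch of the general idea it names: storing, on each node, a precomputed link to the next subtree root in document order so that an axis step costs O(1). The `Node` class, the `following` link, and the helper functions are hypothetical names chosen for illustration, and only two axes (descendant and following) are shown; this is not the paper's data structure.

```python
from typing import Iterator, List, Optional


class Node:
    """Tree node with precomputed links (hypothetical layout, not MTree itself)."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []
        self.next_sibling: Optional["Node"] = None
        # First node after this node's whole subtree, in document order.
        self.following: Optional["Node"] = None

    def add(self, name: str) -> "Node":
        """Append a child and maintain the sibling link."""
        child = Node(name, parent=self)
        if self.children:
            self.children[-1].next_sibling = child
        self.children.append(child)
        return child


def build_links(root: Node) -> None:
    """Precompute the `following` link of every node in one pass."""
    stack = [(root, None)]
    while stack:
        node, after = stack.pop()
        node.following = after
        for child in node.children:
            # A child's subtree is followed by its next sibling, or, for the
            # last child, by whatever follows the parent's subtree.
            nxt = child.next_sibling if child.next_sibling is not None else after
            stack.append((child, nxt))


def _next_in_document_order(node: Node) -> Optional[Node]:
    """One O(1) step: descend if possible, otherwise jump past the subtree."""
    return node.children[0] if node.children else node.following


def descendant_axis(ctx: Node) -> Iterator[Node]:
    """Yield the descendants of ctx in document order."""
    stop = ctx.following
    node = ctx.children[0] if ctx.children else None
    while node is not None and node is not stop:
        yield node
        node = _next_in_document_order(node)


def following_axis(ctx: Node) -> Iterator[Node]:
    """Yield every node after ctx's subtree in document order."""
    node = ctx.following
    while node is not None:
        yield node
        node = _next_in_document_order(node)
```

In this sketch each step of either iterator follows a single stored pointer, which is the property the abstract attributes to MTree; how MTree covers the remaining axes and supports insert and delete is not stated in the abstract and is not modeled here.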
Index Terms
- MTree: an XML XPath graph index