Abstract
Using an index to process structural queries on XML data is a natural approach. The F&B-Index has been proven to be the smallest index that covers all branching path queries. One disadvantage that prevents wide use of the F&B-Index is that its construction requires a great deal of time and a very large amount of main memory. However, little work has addressed this problem. In this paper, we propose an effective and efficient F&B-Index construction algorithm, SAM, for DAG-structured XML data. By maintaining only a small part of the index, SAM reduces the space required for construction. By avoiding complex computation for selecting which nodes to process, SAM takes less time than existing algorithms. Theoretical analysis and experimental results show that SAM is correct, effective and efficient.
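The abstract does not describe SAM's internals, so the sketch below does not reproduce it. It only illustrates what an F&B-Index is: a partition of the data nodes in which two nodes share an index node when they carry the same label and their children and their parents fall into the same index nodes, refined until nothing changes. The function names `fb_index` and `blocks` and the input format (a node-to-label dict plus an edge list) are hypothetical choices for illustration. This naive fixpoint loop holds the whole partition in memory and rescans every node each round, which is the kind of time and space cost that motivates a specialized construction algorithm such as SAM.

```python
from collections import defaultdict


def fb_index(labels, edges):
    """Naive F&B-Index sketch (hypothetical, not SAM): the coarsest partition
    that refines the label partition and is stable w.r.t. both children
    (forward) and parents (backward)."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)

    # Start with one block per distinct label.
    label_ids = {}
    block = {n: label_ids.setdefault(lab, len(label_ids)) for n, lab in labels.items()}

    while True:
        # A node's signature is its current block plus the set of blocks of
        # its children and the set of blocks of its parents.
        ids = {}
        new_block = {}
        for n in labels:
            sig = (block[n],
                   frozenset(block[c] for c in children[n]),
                   frozenset(block[p] for p in parents[n]))
            new_block[n] = ids.setdefault(sig, len(ids))
        # Signatures include the old block, so the new partition refines the
        # old one; an unchanged block count means no block was split.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def blocks(partition):
    """Group nodes by block id as sorted lists, for readable output."""
    groups = defaultdict(list)
    for node, b in partition.items():
        groups[b].append(node)
    return sorted(sorted(g) for g in groups.values())


# Two 'b' nodes, each with one 'c' child: each pair shares an index node.
print(blocks(fb_index({0: "root", 1: "b", 2: "b", 3: "c", 4: "c"},
                      [(0, 1), (0, 2), (1, 3), (2, 4)])))

# A DAG where 'c' node 3 is shared and only node 2 has a 'd' child.
print(blocks(fb_index({0: "root", 1: "b", 2: "b", 3: "c", 4: "d"},
                      [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)])))
```

In the first call the two `b` nodes and the two `c` nodes merge, giving three index nodes. In the second, nodes 1 and 2 end up in different index nodes because only node 2 has a `d` child, so a branching query such as `b[d]` can be answered from the index alone. A production construction would replace this quadratic-worst-case loop with a proper partition-refinement algorithm (for example, Paige and Tarjan's).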
Research supported by the Key Program of the National Natural Science Foundation of China (NSFC) under Grant No. 60533110, NSFC under Grant No. 60473075, the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000, and the Program for New Century Excellent Talents in University (NCET) under Grant No. NCET-05-0333.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Liu, X., Li, J., Wang, H. (2007). SAM: An Efficient Algorithm for F&B-Index Construction. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds) Advances in Data and Web Management. APWeb/WAIM 2007. Lecture Notes in Computer Science, vol 4505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72524-4_72
DOI: https://doi.org/10.1007/978-3-540-72524-4_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72483-4
Online ISBN: 978-3-540-72524-4
eBook Packages: Computer Science, Computer Science (R0)