A Novel Method for Mining Frequent Subtrees from XML Data

Zhang, Wan-Song; Liu, Da-Xin; Zhang, Jian-Pei

doi:10.1007/978-3-540-28651-6_44

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3177)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

In this paper, we focus on the problem of finding frequent subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, RSTMiner, that uses a special structure, the Me-tree, to compute all rooted subtrees whose frequency in a collection of XML data trees is above a user-specified threshold. In this algorithm, the Me-tree serves as a merging tree that supplies scheme information for efficient pruning and mining of frequent subtrees. The keys to the algorithm are efficient pruning of candidates with the Me-tree structure and incremental enumeration of all rooted subtrees in canonical form based on an extended rightmost expansion technique. Experimental results show that the RSTMiner algorithm is efficient and scalable.
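
To make the enumeration idea concrete, the following is a minimal sketch of plain rightmost expansion over labeled ordered trees with a frequency threshold. It is not RSTMiner: the Me-tree and the extended expansion are not specified in the abstract and are not modeled here. Several points are assumptions made only for illustration: trees are nested `(label, children)` tuples, the canonical form is taken to be the preorder `(depth, label)` sequence, a pattern is counted once per data tree in which it occurs as an induced ordered subtree, and all function names (`encode`, `decode`, `mine`, and so on) are hypothetical.

```python
# Sketch only: generic rightmost expansion, not the paper's RSTMiner or Me-tree.

def encode(tree, depth=0):
    """Preorder (depth, label) sequence of a nested (label, children) tree."""
    label, children = tree
    code = [(depth, label)]
    for child in children:
        code.extend(encode(child, depth + 1))
    return code

def decode(code):
    """Rebuild the nested tree from a preorder (depth, label) sequence."""
    root = (code[0][1], [])
    path = [root]                      # path[d] = most recent node at depth d
    for depth, label in code[1:]:
        node = (label, [])
        path[depth - 1][1].append(node)
        del path[depth:]
        path.append(node)
    return root

def matches_at(pattern, node):
    """True if pattern embeds at node: same label, and the pattern's children
    map in order onto a subsequence of the node's children."""
    if pattern[0] != node[0]:
        return False
    i = 0
    for child in node[1]:
        if i < len(pattern[1]) and matches_at(pattern[1][i], child):
            i += 1
    return i == len(pattern[1])

def occurs_in(pattern, tree):
    """True if pattern embeds at the root of tree or at any descendant."""
    return matches_at(pattern, tree) or any(occurs_in(pattern, c) for c in tree[1])

def support(code, forest):
    """Number of data trees that contain the pattern at least once."""
    pattern = decode(code)
    return sum(1 for tree in forest if occurs_in(pattern, tree))

def mine(forest, min_support):
    """Return {pattern code: support} for every pattern meeting min_support."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    labels = sorted({label for tree in forest for _, label in encode(tree)})
    frequent = {}
    stack = [((0, label),) for label in labels]
    while stack:
        code = stack.pop()
        count = support(code, forest)
        if count < min_support:
            continue                   # no extension of this pattern can be frequent
        frequent[code] = count
        # Rightmost expansion: attach one new node as the last child of some
        # node on the rightmost path, i.e. at depth 1 .. (last depth + 1).
        for depth in range(1, code[-1][0] + 2):
            for label in labels:
                stack.append(code + ((depth, label),))
    return frequent

forest = [
    ("a", [("b", []), ("c", [])]),
    ("a", [("b", []), ("d", [])]),
    ("a", [("c", []), ("b", [])]),
]
for code, count in sorted(mine(forest, min_support=2).items()):
    print(count, code)
```

Because each pattern is extended only along its rightmost path, every ordered tree is generated from exactly one parent pattern, so no duplicate check is needed. The sketch recounts support from scratch for each candidate, which is simple but slow; a structure such as the Me-tree described in the abstract would presumably be used to prune candidates before counting.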

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Engineering University, 150001, Harbin, China
Wan-Song Zhang, Da-Xin Liu & Jian-Pei Zhang

Editor information

Editors and Affiliations

School of Engineering, Computing, and Mathematics, University of Exeter, EX4 4QF, Exeter, UK
Zheng Rong Yang
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin
School of Engineering, Computer Science and Mathematics, University of Exeter, EX4 4QF, UK
Richard M. Everson

About this paper

Cite this paper

Zhang, WS., Liu, DX., Zhang, JP. (2004). A Novel Method for Mining Frequent Subtrees from XML Data. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_44

DOI: https://doi.org/10.1007/978-3-540-28651-6_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22881-3
Online ISBN: 978-3-540-28651-6
eBook Packages: Springer Book Archive
