Abstract
In this paper, we address the problem of finding frequent subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, RSTMiner, that uses a special structure, the Me-tree, to compute all rooted subtrees whose frequency in a collection of XML data trees is above a user-specified threshold. In this algorithm, the Me-tree serves as a merging tree that supplies schema information for efficient pruning and mining of frequent subtrees. The keys to the algorithm are efficient pruning of candidates with the Me-tree structure and incremental enumeration of all rooted subtrees in canonical form, based on an extended rightmost expansion technique. Experimental results show that the RSTMiner algorithm is efficient and scalable.
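To make the enumeration step concrete, the following is a minimal sketch of plain rightmost expansion over labeled ordered trees. It is not RSTMiner itself: the abstract does not specify the Me-tree or the extended expansion, so neither is reproduced here, and candidates are pruned only by the usual anti-monotone support check. All names (`build`, `matches_at`, `occurs_in`, `support`, `mine`) are hypothetical. The sketch also assumes that a pattern may occur at any node of a data tree, that parent-child edges and left-to-right sibling order must be preserved, and that frequency means the number of data trees containing the pattern.

```python
# Sketch only: generic rightmost expansion, NOT the paper's RSTMiner/Me-tree.
# A data tree is a nested pair (label, [children]).
# A pattern is a preorder list of (depth, label) pairs; the root has depth 0.

def build(seq):
    """Turn a preorder (depth, label) sequence into a nested (label, children) tree."""
    root = (seq[0][1], [])
    stack = [root]                      # stack[d] = latest node seen at depth d
    for depth, label in seq[1:]:
        node = (label, [])
        del stack[depth:]               # keep only the ancestors of the new node
        stack[depth - 1][1].append(node)
        stack.append(node)
    return root

def matches_at(pattern, node):
    """True if the pattern root maps to `node`, keeping edges and sibling order."""
    if pattern[0] != node[0]:
        return False
    children, i = node[1], 0
    for p_child in pattern[1]:
        # Greedy: take the earliest remaining data child that matches.
        while i < len(children) and not matches_at(p_child, children[i]):
            i += 1
        if i == len(children):
            return False
        i += 1
    return True

def occurs_in(pattern, node):
    """True if the pattern occurs at `node` or anywhere below it."""
    return matches_at(pattern, node) or any(occurs_in(pattern, c) for c in node[1])

def support(seq, forest):
    """Number of data trees in `forest` that contain the pattern `seq`."""
    pattern = build(seq)
    return sum(1 for tree in forest if occurs_in(pattern, tree))

def collect_labels(node, out):
    out.add(node[0])
    for child in node[1]:
        collect_labels(child, out)

def mine(forest, min_support):
    """Enumerate every frequent ordered subtree exactly once."""
    label_set = set()
    for tree in forest:
        collect_labels(tree, label_set)
    labels = sorted(label_set)
    frequent = []

    def grow(seq):
        if support(seq, forest) < min_support:
            return                      # no extension of seq can be frequent
        frequent.append(tuple(seq))
        last_depth = seq[-1][0]
        # Attach a new rightmost leaf to any node on the rightmost path.
        for depth in range(1, last_depth + 2):
            for label in labels:
                grow(seq + [(depth, label)])

    for label in labels:
        grow([(0, label)])
    return frequent
```

For example, with the two data trees `("a", [("b", []), ("c", [])])` and `("a", [("b", []), ("d", [])])` and a threshold of 2, `mine` returns the single node `a`, the two-node tree `a` with child `b`, and the single node `b`. Because each pattern is extended only on its rightmost path, every ordered tree has exactly one parent pattern in the search and is therefore generated once.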
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, WS., Liu, DX., Zhang, JP. (2004). A Novel Method for Mining Frequent Subtrees from XML Data. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_44
DOI: https://doi.org/10.1007/978-3-540-28651-6_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22881-3
Online ISBN: 978-3-540-28651-6
eBook Packages: Springer Book Archive