Abstract
This chapter discusses some new research directions in the field of frequent subtree mining (FSM), from both the application and the technical perspectives. Since FSM is a relatively new field compared with frequent itemset/sequence mining, many lessons can be learned from the more mature research in those areas. A drawback of frequent pattern mining in general is that, for a given support threshold, the number of frequent patterns often becomes very large because of characteristics of the database. This can cause not only algorithmic complexity problems but also significant delays in the analysis and interpretation of the results. Many of the patterns may be redundant, not useful for the application at hand, or of no interest to the user. Furthermore, it is not always clear which support threshold will yield reasonable results. These are all important research areas, and significant progress has been made in reducing complexity from both the algorithmic and the application perspectives. Some of these or similar ideas can, to a certain extent, already be applied in the FSM field, but others will need refinements and extensions to be flexible enough to cope with the additional structural properties of the data. In Section 12.2, we highlight some of the work in frequent itemset/sequence mining where the same or a similar idea can be applied and prove useful in the FSM field. At the end of Section 12.2, we look at some work that has already been initiated on frequent pattern filtering and the incorporation of application-oriented constraints.
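The effect of the support threshold on the number of frequent patterns can be illustrated with a minimal sketch. It uses plain itemsets rather than subtrees, a brute-force enumeration rather than any algorithm from the chapter, and a toy database; the names `TRANSACTIONS`, `support`, `frequent_itemsets` and `pattern_counts` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy transaction database (an assumption, not data from the chapter).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support.

    Exponential in the number of distinct items; suitable only for toy data.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support(set(candidate), transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def pattern_counts(transactions, thresholds):
    """Map each absolute support threshold to the number of frequent itemsets."""
    return {s: len(frequent_itemsets(transactions, s)) for s in thresholds}


if __name__ == "__main__":
    for threshold, n in pattern_counts(TRANSACTIONS, [4, 3, 2, 1]).items():
        print(f"min_support={threshold}: {n} frequent itemsets")
```

On this toy database, lowering the threshold from 4 to 1 raises the number of frequent itemsets from 4 to 15, which is every non-empty subset of the four items. Real databases, and tree-structured data in particular, have far larger pattern spaces, so the growth is correspondingly more severe.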
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hadzic, F., Tan, H., Dillon, T.S. (2011). New Research Directions. In: Mining of Data with Complex Structures. Studies in Computational Intelligence, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17557-2_12
DOI: https://doi.org/10.1007/978-3-642-17557-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17556-5
Online ISBN: 978-3-642-17557-2
eBook Packages: Engineering, Engineering (R0)