New Research Directions

Hadzic, Fedja; Tan, Henry; Dillon, Tharam S.

doi:10.1007/978-3-642-17557-2_12


Part of the book series: Studies in Computational Intelligence (SCI, volume 333)


Abstract

This chapter discusses new research directions in frequent subtree mining (FSM) from both the application and the technical perspective. Because FSM is a relatively new field compared with frequent itemset and sequence mining, many lessons can be learned from the more mature research in those areas. A general drawback of frequent pattern mining is that, for a given support threshold, the number of frequent patterns often becomes very large because of characteristics of the database. This can cause not only algorithmic complexity problems but also significant delays in analyzing and interpreting the results. Many of the patterns may be redundant, of no use for the application at hand, or of no interest to the user. Furthermore, it is not always clear which support threshold will yield reasonable results. These are all important research areas, and significant progress has been made in reducing complexity from both the algorithmic and the application perspective. Some of these or similar ideas can, to a certain extent, already be applied in FSM, while others need refinement and extension to be flexible enough to cope with the additional structural properties of the data. Section 12.2 highlights work in frequent itemset and sequence mining whose ideas, or similar ones, can be applied and prove useful in FSM. The end of Section 12.2 looks at work already initiated on frequent pattern filtering and the incorporation of application-oriented constraints.
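
The effect of the support threshold on the number of patterns can be illustrated with a small sketch. It is not taken from the chapter: it uses itemsets rather than subtrees to stay short, the transaction database and the function names `frequent_itemsets` and `closed_itemsets` are hypothetical, and the brute-force enumeration is only practical for toy inputs. Closed-pattern filtering is shown as one well-known way to drop redundant patterns, not necessarily the approach the chapter favors.

```python
from itertools import combinations

# Hypothetical toy transaction database (an assumption, not data from the chapter).
TRANSACTIONS = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset whose
    support count is at least min_support (brute-force enumeration)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support count = number of transactions containing the itemset.
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                result[itemset] = count
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same
    support count; the dropped ones are redundant in that sense."""
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
    }


if __name__ == "__main__":
    # Lowering the threshold increases the number of frequent patterns.
    for min_support in (5, 4, 3, 2):
        frequent = frequent_itemsets(TRANSACTIONS, min_support)
        closed = closed_itemsets(frequent)
        print(min_support, len(frequent), len(closed))
```

Even on six transactions, lowering the threshold steadily enlarges the result set, and a noticeable share of the patterns carries no information beyond what a superset with identical support already provides. In FSM the candidate space additionally depends on tree structure, so the growth can be expected to be at least as pronounced.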




About this chapter

Cite this chapter

Hadzic, F., Tan, H., Dillon, T.S. (2011). New Research Directions. In: Mining of Data with Complex Structures. Studies in Computational Intelligence, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17557-2_12


DOI: https://doi.org/10.1007/978-3-642-17557-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17556-5
Online ISBN: 978-3-642-17557-2
eBook Packages: Engineering, Engineering (R0)
