Part of the book series: Studies in Computational Intelligence (SCI, volume 333)

Abstract

This chapter discusses some new research directions in the frequent subtree mining (FSM) field from both the application and technical perspectives. Since FSM is a relatively new field compared with frequent itemset/sequence mining, many lessons can be learned from the more mature research in frequent itemset/sequence mining. A drawback of frequent pattern mining in general is that, for a given support threshold, the number of frequent patterns often becomes quite large due to characteristics of the database. This may cause not only algorithm complexity problems, but also significant delays in the analysis and interpretation of the results. Many of the patterns may be redundant, not useful for the application at hand, or not of interest to the user. Furthermore, it is not always clear what support threshold is satisfactory for obtaining reasonable results. These are all important research areas, and some significant achievements have been made in complexity reduction from the algorithmic and application perspectives. Some of these or similar ideas can, to a certain extent, already be applied in the FSM field, but others will need refinements and extensions to be flexible enough to cope with the additional structural properties of the data. In Section 12.2, we highlight some of the work in frequent itemset/sequence mining where the same or a similar idea can be applied and prove useful in the FSM field. At the end of Section 12.2, we look at some work that has already been initiated in frequent pattern filtering and the incorporation of application-oriented constraints.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hadzic, F., Tan, H., Dillon, T.S. (2011). New Research Directions. In: Mining of Data with Complex Structures. Studies in Computational Intelligence, vol 333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17557-2_12

  • DOI: https://doi.org/10.1007/978-3-642-17557-2_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17556-5

  • Online ISBN: 978-3-642-17557-2

  • eBook Packages: Engineering, Engineering (R0)
