Copyright © 2005 Elsevier B.V. All rights reserved.
FRACTURE mining: Mining frequently and concurrently mutating structures from historical XML documents
Received 20 July 2005.
Abstract
In the past few years, the fast proliferation of available XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, no existing work on XML mining has taken into account the dynamic nature of XML documents as online information. The present article proposes a novel type of frequent pattern, namely FRequently And Concurrently muTating substructUREs (FRACTURE), that is mined from the evolution of an XML document. A discovered FRACTURE is a set of substructures of an XML document that frequently change together. Knowledge obtained from FRACTUREs is useful in applications such as XML indexing and XML clustering. To keep the resulting patterns concise and explicit, we further formulate the problem of maximal FRACTURE mining. Two algorithms, which employ the level-wise and divide-and-conquer strategies respectively, are designed to mine the set of FRACTUREs. The second algorithm, which is more efficient, is also optimized to discover the set of maximal FRACTUREs. Experiments involving a wide range of synthetic and real-life datasets verify the efficiency and scalability of the developed algorithms.
Keywords: XML; Frequent pattern; Structural delta
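The idea of substructures that "frequently change together", mined with a level-wise strategy, can be illustrated with a minimal sketch. This is not the article's Apriori-FRACTURE or FPG-FRACTURE algorithm, and it does not use the article's metrics (degree of change, frequency of change, weight), which are not given on this page. It assumes a simplified, hypothetical input: each structural delta between two consecutive versions is reduced to the set of substructure identifiers that changed, and a set of substructures counts as frequent when it changes together in at least a `min_support` fraction of the deltas. The names `mine_cochanging_sets`, `maximal_sets`, and `min_support` are illustrative only.

```python
from itertools import combinations


def mine_cochanging_sets(deltas, min_support):
    """Level-wise (Apriori-style) search for substructure sets that change together.

    deltas: list of sets; each set holds the ids of the substructures that
            changed between two consecutive versions (hypothetical format).
    min_support: minimum fraction of deltas in which the whole set must change.
    Returns a dict mapping each frequent frozenset to its support.
    """
    n = len(deltas)
    if n == 0:
        return {}

    def support(candidate):
        # Fraction of deltas in which every member of the candidate changed.
        return sum(1 for d in deltas if candidate <= d) / n

    # Level 1: single substructures that change often enough.
    level = {}
    for s in {s for d in deltas for s in d}:
        c = frozenset([s])
        sup = support(c)
        if sup >= min_support:
            level[c] = sup

    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            sup = support(c)
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        k += 1
    return frequent


def maximal_sets(frequent):
    """Keep only the frequent sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


# Toy history: four deltas over hypothetical substructure ids a, b, c, d.
deltas = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"d"}]
frequent = mine_cochanging_sets(deltas, min_support=0.5)
maximal = maximal_sets(frequent)
```

In this toy history, `a` and `b` change together in three of the four deltas, and `{a, b, c}` is the only maximal set at a support threshold of 0.5. Reporting only maximal sets is what keeps the result concise, since every subset of a frequent set is frequent as well.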
Article Outline
- 1. Introduction
- 1.1. Motivation
- 1.2. Roadmap of the paper
- 2. Overview and contributions
- 3. Problem statement
- 3.1. Preliminary definitions
- 3.2. Metrics
- 3.2.1. Degree of change
- 3.2.2. Frequency of change
- 3.2.3. Weight
- 3.3. FRACTURE
- 3.4. Problem definition
- 4. Algorithms
- 4.1. FRACTURE mining
- 4.1.1. Apriori-FRACTURE
- 4.1.2. FPG-FRACTURE
- 4.1.2.1. Data structure
- 4.1.2.2. Mining algorithm
- 4.2. Maximal FRACTURE mining
- 5. Experimental results
- 5.1. Experiments on synthetic datasets
- 5.1.1. Datasets
- 5.1.2. Methodology and results for algorithms of FRACTURE mining
- 5.1.3. Methodology and results for algorithms of maximal FRACTURE mining
- 5.2. Experiments on real-life datasets
- 6. Applications
- 6.1. Native XML storage
- 6.2. Approximate XML change detection
- 6.3. Web crawling
- 6.4. Market basket analysis
- 7. Related works
- 8. Conclusions and future work
- Appendix
- References
- Vitae






