Data & Knowledge Engineering
Volume 59, Issue 2, November 2006, Pages 320-347
Including: Sixth ACM International Workshop on Web Information and Data Management
 
doi:10.1016/j.datak.2005.09.002
Copyright © 2005 Elsevier B.V. All rights reserved.

FRACTURE mining: Mining frequently and concurrently mutating structures from historical XML documents

Ling Chen (a, corresponding author), Sourav S. Bhowmick (a) and Liang-Tien Chia (a)

(a) School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore

Received 20 July 2005; revised 7 September 2005. Available online 11 October 2005.


Abstract

In the past few years, the fast proliferation of available XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, no existing work on XML mining has taken into account the dynamic nature of XML documents as online information. The present article proposes a novel type of frequent pattern, namely FRequently And Concurrently muTating substructUREs (FRACTURE), that is mined from the evolution of an XML document. A discovered FRACTURE is a set of substructures of an XML document that frequently change together. Knowledge obtained from FRACTUREs is useful in applications such as XML indexing and XML clustering. In order to keep the resulting patterns concise and explicit, we further formulate the problem of maximal FRACTURE mining. Two algorithms, which employ the level-wise and divide-and-conquer strategies respectively, are designed to mine the set of FRACTUREs. The second algorithm, which is more efficient, is also optimized to discover the set of maximal FRACTUREs. Experiments involving a wide range of synthetic and real-life datasets verify the efficiency and scalability of the developed algorithms.

Keywords: XML; Frequent pattern; Structural delta
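
The idea of mining substructures that frequently change together can be illustrated with a small level-wise (Apriori-style) sketch. This is not the article's Apriori-FRACTURE or FPG-FRACTURE algorithm, and it does not reproduce the article's metrics (degree of change, frequency of change, weight). It assumes, purely for illustration, that each structural delta between two consecutive versions has already been reduced to a set of substructure identifiers that changed, and that a set is "frequent" when all of its members changed together in at least a given fraction of the deltas. The function names, the identifiers and the support threshold are all hypothetical.

```python
from itertools import combinations


def mine_co_changing_sets(deltas, min_support):
    """Level-wise search for sets of substructures that change together.

    deltas: list of sets; each set holds hypothetical identifiers of the
            substructures that changed between two consecutive versions.
    min_support: minimum fraction of deltas in which every member changed.
    Returns a dict mapping frozenset -> support.
    """
    n = len(deltas)
    if n == 0:
        return {}

    def support(candidate):
        # Fraction of deltas that contain every member of the candidate.
        return sum(1 for d in deltas if candidate <= d) / n

    # Level 1: single substructures that change often enough.
    current = {}
    for s in sorted({s for d in deltas for s in d}):
        c = frozenset([s])
        sup = support(c)
        if sup >= min_support:
            current[c] = sup

    result = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-sets into k-set candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        current = {}
        for c in candidates:
            sup = support(c)
            if sup >= min_support:
                current[c] = sup
        result.update(current)
        k += 1
    return result


def maximal_sets(frequent):
    """Keep only the sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


if __name__ == "__main__":
    # Four hypothetical deltas over three substructures.
    deltas = [
        {"title", "price"},
        {"title", "price", "author"},
        {"author"},
        {"title", "price"},
    ]
    frequent = mine_co_changing_sets(deltas, min_support=0.5)
    for pattern in maximal_sets(frequent):
        print(sorted(pattern), frequent[pattern])
```

In this toy input, "title" and "price" change together in three of the four deltas, so they form the only frequent set with more than one member. Reporting only maximal sets keeps the output concise, which is the motivation the abstract gives for maximal FRACTURE mining.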

Article Outline

1. Introduction
1.1. Motivation
1.2. Roadmap of the paper
2. Overview and contributions
3. Problem statement
3.1. Preliminary definitions
3.2. Metrics
3.2.1. Degree of change
3.2.2. Frequency of change
3.2.3. Weight
3.3. FRACTURE
3.4. Problem definition
4. Algorithms
4.1. FRACTURE mining
4.1.1. Apriori-FRACTURE
4.1.2. FPG-FRACTURE
4.1.2.1. Data structure
4.1.2.2. Mining algorithm
4.2. Maximal FRACTURE mining
4.2.1. Optimization of subtree ordering
4.2.2. Optimization of selectively examining subtrees
4.2.3. Optimization of mining Signed-FPtree of single path
5. Experimental results
5.1. Experiments on synthetic datasets
5.1.1. Datasets
5.1.2. Methodology and results for algorithms of FRACTURE mining
5.1.3. Methodology and results for algorithms of maximal FRACTURE mining
5.2. Experiments on real-life datasets
5.2.1. Methodology and results on DBLP data
5.2.2. Methodology and results on Web log data
6. Applications
6.1. Native XML storage
6.2. Approximate XML change detection
6.3. Web crawling
6.4. Market basket analysis
7. Related works
7.1. XML structure mining
7.2. Frequent pattern mining
7.3. Maximal frequent pattern mining
8. Conclusions and future work
Appendix
References
Vitae