Information Sciences
Volume 178, Issue 6, 15 March 2008, Pages 1498-1518
 
doi:10.1016/j.ins.2007.10.014
Copyright © 2007 Elsevier Inc. All rights reserved.

Efficient strategies for tough aggregate constraint-based sequential pattern mining

Enhong Chen (a), Huanhuan Cao (a), Qing Li (b, corresponding author) and Tieyun Qian (c)

(a) Department of Computer Science, University of Science and Technology of China, Hefei, Anhui, PR China
(b) Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
(c) Department of Computer Science, Wuhan University, Wuhan, Hubei, PR China

Received 13 November 2006; 
revised 2 July 2007; 
accepted 12 October 2007. 
Available online 30 October 2007.


Abstract

Frequent sequential pattern mining with constraints is the task of discovering patterns while incorporating user-defined constraints into the mining process, which both improves mining efficiency and makes the discovered patterns better meet user requirements. Although many studies have addressed this task, few have dealt with the “tough aggregate constraints”, owing to the difficulty of pushing these constraints into the mining process. In this paper we provide efficient strategies for handling tough aggregate constraints. Through a theoretical analysis of tough aggregate constraints based on the concept of the total contribution of sequences, we first show that two typical kinds of constraints can be transformed into the same form and can therefore be processed in a uniform way. We then propose a novel algorithm called PTAC (sequential frequent Patterns mining with Tough Aggregate Constraints), which reduces the cost of using tough aggregate constraints by incorporating two effective strategies. The first avoids checking data items one by one by exploiting the promisingness exhibited by some other items and the validity of the corresponding prefix. The second avoids constructing unnecessary projected databases by pruning unpromising new patterns that would otherwise serve as new prefixes. With these strategies, our algorithm performs well in both running time and space, as demonstrated by experiments on synthetic datasets generated by the IBM sequence generator and on a real dataset.

Keywords: Frequent sequential pattern; Tough aggregate constraints
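
As a rough illustration of why such constraints are hard to push into the mining process, the Python sketch below assumes that a tough aggregate constraint has the form avg(pattern) >= v over per-item values. The abstract does not name the two constraint kinds, so this form, the flat-list pattern representation, and the function names satisfies_avg and satisfies_shifted_sum are all hypothetical. The sketch shows two things: the constraint can flip from satisfied to violated and back as a pattern grows, so it is neither monotone nor anti-monotone, and an average constraint can be rewritten as a sum constraint by subtracting the threshold from every item value. It is not the PTAC algorithm.

```python
from typing import Dict, Sequence


def satisfies_avg(pattern: Sequence[str], value: Dict[str, float], v: float) -> bool:
    """Check the (assumed) constraint avg(pattern) >= v."""
    if not pattern:
        raise ValueError("pattern must contain at least one item")
    total = sum(value[item] for item in pattern)
    return total / len(pattern) >= v


def satisfies_shifted_sum(pattern: Sequence[str], value: Dict[str, float], v: float) -> bool:
    """Equivalent sum form: sum(value[i] - v) >= 0.

    Each item contributes (value - v), which may be positive or negative,
    so the average constraint becomes a sum constraint over signed values.
    """
    if not pattern:
        raise ValueError("pattern must contain at least one item")
    return sum(value[item] - v for item in pattern) >= 0


if __name__ == "__main__":
    # Hypothetical item values and threshold, chosen only for illustration.
    value = {"a": 10, "b": 1, "c": 9}
    threshold = 6
    for pattern in (["a"], ["a", "b"], ["a", "b", "c"]):
        print(pattern,
              satisfies_avg(pattern, value, threshold),
              satisfies_shifted_sum(pattern, value, threshold))
```

In this toy run the pattern satisfies the constraint, then violates it after one extension, then satisfies it again after another, so a violating prefix cannot simply be discarded. Both functions agree on every pattern, which is the sense in which the two forms can be handled uniformly under the stated assumption.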

Article Outline

1. Introduction
2. Related work
3. Sequential pattern mining and tough aggregate constraints
3.1. Sequential pattern mining
3.2. Theoretical analysis
4. PTAC – a new algorithm for the tough aggregate constraints
4.1. The framework
4.2. Two new strategies
4.2.1. Pruning candidate sequences
4.2.2. Pruning new patterns before constructing projected databases
4.3. Room for further optimization
5. Experiment and analysis
5.1. Experimental datasets
5.2. Experimental platform
5.3. Experiments on synthetic datasets
5.3.1. Comparing the running time and scalability
5.3.2. Comparing the effectiveness of pruning strategies and the cost of space
5.4. Experiments on the real dataset
6. Conclusion
Acknowledgements
References