DELISP: Efficient Discovery of Generalized Sequential Patterns by Delimited Pattern-Growth Technology

Lin, Ming-Yen; Lee, Suh-Yin; Wang, Sheng-Shun

doi:10.1007/3-540-47887-6_19

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

An active research topic in data mining is the discovery of sequential patterns, which finds all frequent sub-sequences in a sequence database. Most studies specify no time constraints, such as maximum or minimum gaps between adjacent elements of a pattern, so the resulting patterns may be uninteresting. In addition, a data sequence is rigidly defined as containing a pattern only when each element of the pattern is contained in a distinct element of the sequence. This limitation might lose useful patterns for some applications, because the items of an element are sometimes spread across adjoining elements within a specified time period, or time window. Therefore, we propose a pattern-growth approach for mining generalized sequential patterns. Our approach reduces the size of sub-databases by bounded and windowed projection techniques. Bounded projections keep only time-gap valid sub-sequences, and windowed projections save non-redundant sub-sequences satisfying the sliding time window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfying patterns and speeds up the growing process. The empirical evaluations show that the proposed approach has good linear scalability and outperforms the well-known GSP algorithm in the discovery of generalized sequential patterns.
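To make the three constraints concrete, the following is a minimal sketch of a containment test for generalized sequential patterns. It is not the DELISP algorithm: it is a brute-force check that assumes the constraint definitions commonly used for GSP-style mining (a pattern element may be covered by the union of adjoining data elements whose timestamps differ by at most the window, the gap between consecutive matches must exceed the minimum gap, and their overall span must not exceed the maximum gap). The function name `contains`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

# A data sequence is a list of (timestamp, itemset) pairs sorted by timestamp.
Element = Tuple[int, Set[str]]


def contains(
    data: List[Element],
    pattern: List[Set[str]],
    min_gap: float = 0,
    max_gap: float = float("inf"),
    window: float = 0,
) -> bool:
    """Brute-force test of whether `data` contains `pattern` under the
    assumed min-gap, max-gap, and sliding-window constraints."""
    n = len(data)

    def match(i: int, prev_l: Optional[int], prev_u: Optional[int]) -> bool:
        # i indexes the pattern element to match next; data[prev_l..prev_u]
        # is the run of data elements matched to the previous pattern element.
        if i == len(pattern):
            return True
        start = 0 if prev_u is None else prev_u + 1
        for l in range(start, n):
            t_l = data[l][0]
            # min-gap: this match must start more than min_gap after the
            # previous match ended.
            if prev_u is not None and t_l - data[prev_u][0] <= min_gap:
                continue
            covered: Set[str] = set()
            for u in range(l, n):
                t_u = data[u][0]
                # sliding window: the merged elements span at most `window`.
                if t_u - t_l > window:
                    break
                # max-gap: this match must end no more than max_gap after
                # the previous match started.
                if prev_l is not None and t_u - data[prev_l][0] > max_gap:
                    break
                covered |= data[u][1]
                if pattern[i] <= covered and match(i + 1, l, u):
                    return True
        return False

    return match(0, None, None)


# Hypothetical data sequence with timestamps 1, 2, 5, and 9.
data = [(1, {"a"}), (2, {"b"}), (5, {"c"}), (9, {"d"})]

print(contains(data, [{"a", "b"}, {"c"}]))            # False: no single element holds a and b
print(contains(data, [{"a", "b"}, {"c"}], window=1))  # True: elements at times 1 and 2 are merged
print(contains(data, [{"a"}, {"d"}], max_gap=5))      # False: 9 - 1 exceeds the maximum gap
print(contains(data, [{"a"}, {"b"}], min_gap=1))      # False: 2 - 1 does not exceed the minimum gap
```

With `window=0` and no gap limits, the check reduces to the rigid definition of containment described in the abstract, in which each pattern element must fit inside one distinct data element.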

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Chiao Tung University, Taiwan 30050, R.O.C.
Ming-Yen Lin, Suh-Yin Lee & Sheng-Shun Wang

Editor information

Editors and Affiliations

EE Department, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, Taiwan, ROC
Ming-Syan Chen
IBM Thomas J. Watson Research Center, 30 Sawmill River Road, Hawthorne, NY, 10532, USA
Philip S. Yu
School of Computing, National University of Singapore, Lower Kent Ridge Road, Singapore, 119260
Bing Liu

About this paper

Cite this paper

Lin, MY., Lee, SY., Wang, SS. (2002). DELISP: Efficient Discovery of Generalized Sequential Patterns by Delimited Pattern-Growth Technology. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_19

DOI: https://doi.org/10.1007/3-540-47887-6_19
Published: 29 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive