Abstract
Databases of sequences can contain consecutive repetitions of items. This is the case in particular when some items represent discretized quantitative values. We show that on such databases, a typical algorithm like SPADE tends to lose its efficiency. SPADE is based on the use of lists containing the locations of the occurrences of a pattern in the sequences, and these lists are not appropriate for data with repetitions. We introduce the concept of generalized occurrences and the corresponding primitive operators to manipulate them. We present an algorithm called GO-SPADE that extends SPADE to incorporate generalized occurrences. Finally, we present experiments showing that GO-SPADE can handle sequences containing consecutive repetitions at nearly no extra cost.
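The contrast between a SPADE-style occurrence list and a more compact, run-based representation can be sketched as follows. This is a minimal illustration only. It assumes each sequence holds one item per time stamp, and it represents a "generalized occurrence" as a hypothetical (sid, start, end) triple covering a maximal run of consecutive occurrences. It does not reproduce the paper's exact definitions or GO-SPADE's primitive operators. The database `DB` and all function names are made up for the example.

```python
# Hypothetical sequence database: sequence id -> items, one item per time stamp.
# Item "a" appears in long consecutive runs, as with discretized values.
DB = {
    1: ["a", "a", "a", "b", "a", "a"],
    2: ["b", "a", "a", "a", "a", "c"],
}


def id_list(db, item):
    """SPADE-style occurrence list: one (sid, eid) pair per occurrence of item."""
    return [
        (sid, eid)
        for sid, seq in db.items()
        for eid, x in enumerate(seq, start=1)  # time stamps start at 1
        if x == item
    ]


def generalized_occurrences(occs):
    """Merge runs of consecutive eids within a sequence into (sid, start, end).

    Hypothetical representation; assumes occs is sorted by (sid, eid).
    """
    result = []
    for sid, eid in occs:
        last = result[-1] if result else None
        if last is not None and last[0] == sid and last[2] == eid - 1:
            # Extend the current run by one time stamp.
            result[-1] = (sid, last[1], eid)
        else:
            # Start a new run.
            result.append((sid, eid, eid))
    return result


def support(occs):
    """Number of distinct sequences with at least one occurrence."""
    return len({o[0] for o in occs})


occ = id_list(DB, "a")
gen = generalized_occurrences(occ)
print(len(occ), len(gen))  # 9 3
print(gen)                 # [(1, 1, 3), (1, 5, 6), (2, 2, 5)]
```

In this toy example, nine (sid, eid) entries collapse to three runs while the support (two sequences) is unchanged. This is the kind of saving on repetitive data that motivates manipulating occurrences in grouped form rather than one by one.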
References
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of the VLDB Conference, Santiago, Chile, September 1994.
R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the 11th International Conference on Data Engineering (ICDE’95), pages 3–14, Taipei, Taiwan, March 1995. IEEE Computer Society.
G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth. Rule discovery from time series. In Proc. of the 4th International Conference on Knowledge Discovery and Data Mining (KDD’98), pages 16–22, New York (USA), August 1998. AAAI Press.
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pages 355–359, August 2000.
M. Leleu and J.-F. Boulicaut. Signing stock market situations by means of characteristic sequential patterns. In Proc. of the 3rd International Conference on Data Mining (DM’02), Bologna, Italy, September 2002. WIT Press.
H. Mannila, H. Toivonen, and A. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3):259–298, November 1997.
F. Masseglia, F. Cathala, and P. Poncelet. The PSP approach for mining sequential patterns. In Proc. of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery in Databases (PKDD’98), pages 176–184, Nantes, France, September 1998. Lecture Notes in Artificial Intelligence, Springer Verlag.
J. Pei, J. Han, B. Mortazavi-Asl, and H. Pinto. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. of the 17th International Conference on Data Engineering (ICDE’01), 2001.
J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21–30, May 2000.
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. of the 5th International Conference on Extending Database Technology (EDBT’96), pages 3–17, Avignon, France, September 1996.
M. Zaki. Efficient enumeration of frequent sequences. In Proc. of the 7th International Conference on Information and Knowledge Management (CIKM’98), pages 68–75, November 1998.
M. Zaki. Sequence mining in categorical domains: incorporating constraints. In Proc. of the 9th International Conference on Information and Knowledge Management (CIKM’00), pages 422–429, Washington, DC, USA, November 2000.
M. Zaki. SPADE: an efficient algorithm for mining frequent sequences. Machine Learning, Special issue on Unsupervised Learning, 42(1/2):31–60, Jan/Feb 2001.
M. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In Proc. of the 2nd SIAM International Conference on Data Mining, Arlington, Virginia, USA, April 2002.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leleu, M., Rigotti, C., Boulicaut, JF., Euvrard, G. (2003). GO-SPADE: Mining Sequential Patterns over Datasets with Consecutive Repetitions. In: Perner, P., Rosenfeld, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2003. Lecture Notes in Computer Science, vol 2734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45065-3_26
DOI: https://doi.org/10.1007/3-540-45065-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40504-7
Online ISBN: 978-3-540-45065-8
eBook Packages: Springer Book Archive