Abstract
We propose a new method to classify patterns, using closed and maximal frequent patterns as features. Classification generally requires first mapping each pattern to be classified to a feature vector, and frequent patterns have been used as such features in the past. Closed patterns retain the same information as the full set of frequent patterns while using less space (the support of every frequent pattern can be recovered from the closed ones), and maximal patterns retain approximate information. We use them to reduce the number of classification features. We present a new framework for XML tree stream classification. For the first component of the framework, we use closed tree mining algorithms for evolving data streams. For the second component, we use state-of-the-art classification methods for data streams. To the best of our knowledge, this is the first work on tree classification in time-varying streaming data. We give a first experimental evaluation of the proposed classification method.
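To make the feature-reduction idea concrete, here is a minimal sketch in Python. It makes several simplifying assumptions: plain itemsets stand in for XML trees (set inclusion replaces subtree inclusion), a small static batch stands in for an evolving stream, and brute-force enumeration stands in for the closed tree mining algorithms mentioned above. All function names are hypothetical. The sketch computes the frequent, closed and maximal patterns of a toy dataset, checks that every frequent support can be recovered from the closed patterns alone, and maps a transaction to a binary feature vector.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            pattern = frozenset(cand)
            support = sum(1 for t in transactions if pattern <= t)
            if support >= min_support:
                freq[pattern] = support
                found = True
        if not found:
            # Anti-monotonicity: no frequent k-itemset implies no frequent (k+1)-itemset.
            break
    return freq


def closed_patterns(freq):
    """A frequent pattern is closed if no proper superset has the same support."""
    return {p: s for p, s in freq.items()
            if not any(p < q and s == freq[q] for q in freq)}


def maximal_patterns(freq):
    """A frequent pattern is maximal if no proper superset is frequent."""
    return {p: s for p, s in freq.items() if not any(p < q for q in freq)}


def support_from_closed(pattern, closed):
    """Support of a frequent pattern = max support among its closed supersets."""
    return max(s for c, s in closed.items() if pattern <= c)


def ordered(patterns):
    """Fix a deterministic feature order: by size, then lexicographically."""
    return sorted(patterns, key=lambda p: (len(p), sorted(p)))


def to_features(transaction, patterns):
    """Binary feature vector: 1 if the pattern occurs in the transaction."""
    return [1 if p <= transaction else 0 for p in patterns]


if __name__ == "__main__":
    transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
                    frozenset("ac"), frozenset("b")]
    freq = frequent_itemsets(transactions, min_support=2)
    closed = closed_patterns(freq)
    maximal = maximal_patterns(freq)
    print(len(freq), len(closed), len(maximal))  # 7 5 1

    # Closed patterns are lossless: every frequent support is recoverable.
    assert all(support_from_closed(p, closed) == s for p, s in freq.items())

    features = ordered(closed)
    print(["".join(sorted(p)) for p in features])  # ['a', 'b', 'ab', 'ac', 'abc']
    print(to_features(frozenset("ab"), features))  # [1, 1, 1, 0, 0]
```

On this toy data, 7 frequent patterns reduce to 5 closed ones with no loss of support information, and to a single maximal one. With only the maximal pattern {a, b, c} as a feature, the transactions {a, b} and {b} both map to [0], which illustrates the approximation that maximal patterns introduce.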
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bifet, A., Gavaldà, R. (2009). Adaptive XML Tree Classification on Evolving Data Streams. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol. 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_27
DOI: https://doi.org/10.1007/978-3-642-04180-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer Science, Computer Science (R0)