Abstract
This paper presents the Feature Decomposition Approach for improving supervised learning tasks. While in Feature Selection the aim is to identify a representative set of features from which to construct a classification model, in Feature Decomposition the goal is to decompose the original set of features into several subsets. A classification model is built for each subset, and all generated models are then combined. This paper presents theoretical and practical aspects of the Feature Decomposition Approach. A greedy procedure, called DOT (Decomposed Oblivious Trees), is developed to decompose the input feature set into subsets and to build a classification model for each subset separately. The results of an empirical comparison with well-known learning algorithms (such as C4.5) indicate the superiority of the feature decomposition approach in learning tasks that contain a high number of features and a moderate number of tuples.
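The abstract gives only the general scheme, not the details of DOT. The following minimal Python sketch illustrates that scheme under stated assumptions: a random disjoint partition stands in for DOT's greedy subset search, ordinary decision trees stand in for its oblivious trees, and a naive-Bayes-style product of class-probability estimates is one illustrative way to combine the models; the helper names (`decompose_features`, `fit_decomposed`, `predict_decomposed`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def decompose_features(n_features, n_subsets, rng):
    # Assumption: a random disjoint partition stands in for DOT's greedy search.
    indices = rng.permutation(n_features)
    return np.array_split(indices, n_subsets)

def fit_decomposed(X, y, subsets):
    # One classifier per feature subset; plain decision trees stand in
    # for the oblivious trees used by DOT.
    return [DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[:, s], y)
            for s in subsets]

def predict_decomposed(models, subsets, X):
    # Combine the models by summing log class-probability estimates
    # (a naive-Bayes-style product combiner; an illustrative choice).
    log_proba = sum(np.log(m.predict_proba(X[:, s]) + 1e-9)
                    for m, s in zip(models, subsets))
    return np.argmax(log_proba, axis=1)

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

subsets = decompose_features(X.shape[1], n_subsets=3, rng=rng)
models = fit_decomposed(X_train, y_train, subsets)
accuracy = np.mean(predict_decomposed(models, subsets, X_test) == y_test)
print(f"decomposed-model accuracy: {accuracy:.3f}")
```

Each model sees only its own feature subset, so the ensemble as a whole can cope with many features without any single tree having to search the full feature space, which is the intuition behind the approach's advantage on high-dimensional tasks.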
References
Ali K. M., Pazzani M. J., Error Reduction through Learning Multiple Descriptions, Machine Learning, 24(3): 173–202, 1996.
Almuallim H. and Dietterich T.G., Learning Boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1–2):279–306, 1994.
Attneave, F., Applications of Information Theory to Psychology. Holt, Rinehart and Winston, 1959.
Bay, S. Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis, 3(3): 191–209, 1999.
Bellman, R., Adaptive Control Processes: A Guided Tour, Princeton University Press, 1961.
Blum, A. and Mitchell, T., “Combining Labeled and Unlabeled Data with Co-Training”, COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann Publishers, 1998.
Buntine, W., “Graphical Models for Discovering Knowledge”, in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pp 59–82. AAAI/MIT Press, 1996.
Chan, P.K. and Stolfo, S.J., A Comparative Evaluation of Voting and Metalearning on Partitioned Data, Proc. 12th Intl. Conf. on Machine Learning ICML-95, 1995.
Dietterich, T. G., and Bakiri, G., Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.
Domingos, P., and Pazzani, M., “On the Optimality of the Simple Bayesian Classifier under Zero-One Loss,” Machine Learning, 29: 103–130, 1997.
Duda, R., and Hart, P., Pattern Classification and Scene Analysis, New-York, NY: Wiley, 1973.
Dunteman, G.H., Principal Components Analysis, Sage Publications, 1989.
Fayyad, U., Piatetsky-Shapiro, G., and Smyth P., “From Data Mining to Knowledge Discovery: An Overview,” in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pp 1–30, MIT Press, 1996.
Friedman, J.H., and Tukey, J.W., “A Projection Pursuit Algorithm for Exploratory Data Analysis,” IEEE Transactions on Computers, 23 (9): 881–889, 1974.
Friedman, J.H., “On bias, variance, 0/1-loss and the curse of dimensionality,” Data Mining and Knowledge Discovery, 1 (1): 55–77, 1997.
Fukunaga, K., Introduction to Statistical Pattern Recognition. San Diego, CA: Academic, 1990.
Hwang J., Lay S., and Lippman A., “Nonparametric multivariate density estimation: A comparative study,” IEEE Trans. Signal Processing, vol. 42, pp. 2795–2810, Oct. 1994.
Jimenez, L. O., and Landgrebe D. A., “Supervised Classification in High-Dimensional Space: Geometrical, Statistical, and Asymptotical Properties of Multivariate Data.” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 28:39–54, 1998.
Kim J.O., and C.W. Mueller, Factor Analysis: Statistical Methods and Practical Issues. Sage Publications, 1978.
Kononenko, I., “Comparison of inductive and naive Bayesian learning approaches to automatic knowledge acquisition”. In Current Trends in Knowledge Acquisition, IOS Press, 1990.
Kononenko, I., “Semi-naive Bayesian classifier,” in Proceedings of the Sixth European Working Session on Learning, Springer-Verlag, pp. 206–219, 1991.
Kusiak, A., Decomposition in Data Mining: An Industrial Case Study, IEEE Transactions on Electronics Packaging Manufacturing, Vol. 23, No. 4, 2000, pp. 345–353
Langley, P., “Selection of relevant features in machine learning,” in Proceedings of the AAAI Fall Symposium on Relevance. AAAI Press, 1994.
Langley, P. and Sage, S., Oblivious decision trees and abstract cases. Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, Seattle, WA: AAAI Press, pp. 113–117, 1994.
Liu, H. and Motoda, H., Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers, 1998.
Maimon, O., and M. Last, Knowledge Discovery and Data Mining: The Info-Fuzzy network (IFN) methodology, Kluwer Academic Publishers, 2000.
Maimon, O. and Rokach, L., “Data Mining by Attribute Decomposition with semiconductors manufacturing case study” in D. Braha, Editor, Data Mining for Design and Manufacturing: Methods and Applications, Kluwer Academic Publishers, 2001.
Mansour, Y., and McAllester, D., Generalization Bounds for Decision Trees, COLT 2000: 220–224.
Merz, C.J, and Murphy. P.M., UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science, 1998.
Michie, D., “Problem decomposition and the learning of skills,” in Proceedings of the European Conference on Machine Learning, Springer-Verlag, pp. 17–31, 1995.
Pfahringer, B., “Controlling constructive induction in CiPF,” in Proceedings of the European Conference on Machine Learning, Springer-Verlag, pp. 242–256. 1994.
Pickard, L., B. Kitchenham, and S. Linkman., “An investigation of analysis techniques for software datasets”, in Proc. 6th IEEE Intl. Software Metrics Symposium. Boca Raton, FL: IEEE Computer Society, 1999.
Quinlan, J.R., C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
Ridgeway, G., Madigan, D., Richardson, T. and O’Kane, J. (1998), “Interpretable Boosted Naive Bayes Classification”, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp 101–104.
Salzberg, S. L. (1997), “On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach”. Data Mining and Knowledge Discovery, 1, 312–327, Kluwer Academic Publishers, Boston.
Schmitt, M. On the complexity of computing and learning with multiplicative neural networks, to appear in Neural Computation, 2001.
Schlimmer, J. C. Efficiently inducing determinations: A complete and systematic search algorithm that uses optimal pruning. In Proceedings of the 1993 International Conference on Machine Learning, pages 284–290, San Mateo, CA, 1993. Morgan Kaufmann.
Shapiro, A. D., Structured induction in expert systems, Turing Institute Press in association with Addison-Wesley Publishing Company, 1987.
Vapnik, V.N., The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
Wallace, C. S., MML Inference of Predictive Trees, Graphs and Nets. In Computational Learning and Probabilistic Reasoning, A. Gammerman (ed.), Wiley, pp. 43–66, 1996.
Walpole, R. E., and Myers, R. H., Probability and Statistics for Engineers and Scientists, pp. 268–272, 1986.
Zaki, M. J., and Ho, C. T., Eds., Large-Scale Parallel Data Mining. New York: Springer-Verlag, 2000.
Zupan, B., Bohanec, M., Demsar, J., and Bratko, I., “Feature transformation by function decomposition,” IEEE Intelligent Systems & Their Applications, 13: 38–43, 1998.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Maimon, O., Rokach, L. (2002). Improving Supervised Learning by Feature Decomposition. In: Eiter, T., Schewe, KD. (eds) Foundations of Information and Knowledge Systems. FoIKS 2002. Lecture Notes in Computer Science, vol 2284. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45758-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43220-3
Online ISBN: 978-3-540-45758-9