
The Effect of Dimensionality Reduction on Large Scale Hierarchical Classification

  • Conference paper
Information Access Evaluation. Multilinguality, Multimodality, and Interaction (CLEF 2014)

Abstract

Many classification problems involve a hierarchy of classes, which can be exploited in order to perform hierarchical classification of test objects. The most basic form of hierarchical classification is cascade classification, which greedily traverses the hierarchy from the root to the predicted leaf. In order to perform cascade classification, a classifier must be trained for each node of the hierarchy. In large-scale problems, the number of features can be prohibitively large for the classifiers at the upper levels of the hierarchy. It is therefore desirable to reduce the dimensionality of the feature space at these levels. In this paper we examine the computational feasibility of the most common dimensionality reduction method, Principal Component Analysis (PCA), for this problem, as well as the computational benefits that it provides for cascade classification and its effect on classification accuracy. Our experiments on two benchmark datasets with large hierarchies show that a particular version of PCA can be performed efficiently in such large hierarchies, with only a slight decrease in the accuracy of the classifiers. Furthermore, we show that PCA can be applied selectively at the top levels of the hierarchy in order to reduce the loss in accuracy. Finally, the reduced feature space produced by PCA facilitates the use of more costly and potentially more accurate classifiers, such as non-linear SVMs.
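To make the cascade procedure concrete, the following is a minimal sketch of top-down classification with dimensionality reduction applied only near the root, where the feature space is largest. It assumes a scikit-learn-style setup: the hierarchy as a parent-to-children mapping, per-node training data, TruncatedSVD as a PCA variant usable on sparse text features, and a linear SVM per internal node. The class and its interface are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch: cascade (top-down) hierarchical classification with
# dimensionality reduction at the upper levels only. The hierarchy layout,
# data format, and component choices below are illustrative assumptions.
from sklearn.decomposition import TruncatedSVD  # PCA variant for sparse text features
from sklearn.svm import LinearSVC

class CascadeClassifier:
    def __init__(self, hierarchy, root, reduce_levels=1, n_components=100):
        self.hierarchy = hierarchy          # {internal node: [child nodes]}
        self.root = root
        self.reduce_levels = reduce_levels  # reduce only this many levels below the root
        self.n_components = n_components
        self.models = {}                    # node -> (reducer or None, classifier)

    def fit(self, node_data, node=None, level=0):
        # node_data maps each internal node to (X, y), where y[i] names the
        # child of that node to which training example X[i] belongs.
        if node is None:
            node = self.root
        if node not in self.hierarchy:      # leaf: nothing to train
            return
        X, y = node_data[node]
        reducer = None
        if level < self.reduce_levels:      # upper levels: project to a low-dimensional space
            reducer = TruncatedSVD(n_components=self.n_components)
            X = reducer.fit_transform(X)
        clf = LinearSVC().fit(X, y)
        self.models[node] = (reducer, clf)
        for child in self.hierarchy[node]:
            self.fit(node_data, child, level + 1)

    def predict(self, x):
        # Greedy cascade: at each node, pick one child until a leaf is reached.
        node = self.root
        while node in self.hierarchy:
            reducer, clf = self.models[node]
            z = reducer.transform(x) if reducer is not None else x
            node = clf.predict(z)[0]
        return node
```

Setting reduce_levels so that only the top one or two levels are reduced mirrors the selective use of PCA described in the abstract: the largest classifiers become cheaper to train, while the lower levels keep the full feature space and most of their accuracy.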

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kosmopoulos, A., Paliouras, G., Androutsopoulos, I. (2014). The Effect of Dimensionality Reduction on Large Scale Hierarchical Classification. In: Kanoulas, E., et al. Information Access Evaluation. Multilinguality, Multimodality, and Interaction. CLEF 2014. Lecture Notes in Computer Science, vol 8685. Springer, Cham. https://doi.org/10.1007/978-3-319-11382-1_16

  • DOI: https://doi.org/10.1007/978-3-319-11382-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11381-4

  • Online ISBN: 978-3-319-11382-1

  • eBook Packages: Computer Science (R0)
