Multi-label Text Classification Using Multinomial Models

Vilar, David; Castro, María José; Sanchis, Emilio

doi:10.1007/978-3-540-30228-5_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3230)

Included in the following conference series:

International Conference on Natural Language Processing (in Spain)

Abstract

Traditional approaches to pattern recognition tasks normally consider only the unilabel classification problem, that is, each observation (in both the training and test sets) has a single class label associated with it. Yet in many real-world tasks this is only a rough approximation, as one sample can be labeled with a set of classes, and thus techniques for the more general multi-label problem have to be explored. In this paper we review the techniques presented in our previous work and discuss their application to the field of text classification, using the multinomial (Naive Bayes) classifier. Results are presented on the Reuters-21578 dataset, and our proposed approach obtains satisfactory results.
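As a rough illustration of how a multinomial (Naive Bayes) model can be applied to multi-label text, the sketch below learns smoothed class-conditional word distributions and then returns every class whose posterior probability clears a threshold. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the choice to count a multi-label document once for each of its labels, the add-alpha smoothing constant `alpha`, and the threshold decision rule are all hypothetical illustrative choices.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model on multi-label data.

    docs: list of (tokens, labels) pairs, where labels is a set of class names.
    Hypothetical choice: a document with several labels is counted once for
    each of its labels, both in the class priors and in the word counts.
    """
    class_docs = Counter()               # documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for tokens, labels in docs:
        vocab.update(tokens)
        for c in labels:
            class_docs[c] += 1
            word_counts[c].update(tokens)
    total = sum(class_docs.values())
    log_prior = {c: math.log(n / total) for c, n in class_docs.items()}
    log_prob = {}
    for c, counts in word_counts.items():
        # Laplace (add-alpha) smoothing over the whole vocabulary
        denom = sum(counts.values()) + alpha * len(vocab)
        log_prob[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_prob, vocab

def posteriors(model, tokens):
    """Return P(class | document) for every class; unseen words are ignored."""
    log_prior, log_prob, vocab = model
    scores = {
        c: log_prior[c] + sum(log_prob[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    m = max(scores.values())             # log-sum-exp for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}

def predict_labels(model, tokens, threshold=0.3):
    """Hypothetical multi-label rule: keep every class whose posterior reaches
    `threshold`; fall back to the single best class if none does."""
    post = posteriors(model, tokens)
    chosen = {c for c, p in post.items() if p >= threshold}
    return chosen or {max(post, key=post.get)}
```

Lowering `threshold` admits more labels per document, while the fallback to the most probable class guarantees at least one label, which reduces to the unilabel decision when no class is confident. Other multi-label decision schemes (for instance, one binary classifier per class) are equally plausible; this one is chosen only for brevity.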

This work has been partially supported by the Spanish CICYT under contracts TIC2002-04103-C03-03 and TIC2003-07158-C04-03.

References

McCallum, A.K.: Multi-Label Text Classification with a Mixture Model Trained by EM. In: NIPS 1999 (1999)
Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. 34, 1–47 (2002)
Castro, M.J., Vilar, D., Sanchis, E., Aibar, P.: Uniclass and Multiclass Connectionist Classification of Dialogue Acts. In: Sanfeliu, A., Ruiz-Shulcloper, J. (eds.) CIARP 2003. LNCS, vol. 2905, pp. 266–273. Springer, Heidelberg (2003)
Schapire, R.E., Singer, Y.: BoosTexter: A boosting-based system for text categorization. Machine Learning 39, 135–168 (2000)
Yang, Y.: An evaluation of statistical approaches to text categorization. Information Retrieval 1, 69–90 (1999)
Joachims, T.: Text categorization with Support Vector Machines: Learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)
Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39, 103–134 (2000)
McCallum, A., Nigam, K.: A comparison of event models for naive Bayes text classification. In: AAAI/ICML 1998 Workshop on Learning for Text Categorization, pp. 41–48. AAAI Press, Menlo Park (1998)
Juan, A., Ney, H.: Reversing and Smoothing the Multinomial Naive Bayes Text Classifier. In: Proc. of the 2nd Int. Workshop on Pattern Recognition in Information Systems (PRIS 2002), Alacant (Spain), pp. 200–212 (2002)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, New York (2001)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
Ney, H., Martin, S., Wessel, F.: Statistical Language Modeling Using Leaving-One-Out. In: Corpus-based Methods in Language and Speech Processing, pp. 174–207. Kluwer Academic Publishers, Dordrecht (1997)

Author information

Authors and Affiliations

Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University, D-52056, Aachen, Germany
David Vilar
Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, E-46022, València, Spain
María José Castro & Emilio Sanchis

Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
José Luis Vicedo
Natural Language Processing and Information Systems Group, Department of Software and Computing Systems, University of Alicante, Spain
Patricio Martínez-Barco
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Departamento de Lenguajes y Sistemas Informáticos, Carretera de San Vicente del Raspeig, Universidad de Alicante, 03690 San Vicente del Raspeig, Alicante, Spain
Maximiliano Saiz Noeda

About this paper

Cite this paper

Vilar, D., Castro, M.J., Sanchis, E. (2004). Multi-label Text Classification Using Multinomial Models. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_20

DOI: https://doi.org/10.1007/978-3-540-30228-5_20
Published: 20 October 2004
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive