Specificity Helps Text Classification

Bouma, Lucas; de Rijke, Maarten

doi:10.1007/11735106_60

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3936)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

We examine the impact on classification effectiveness of semantic differences in categories. Specifically, we measure broadness and narrowness of categories in terms of their distance to the root of a hierarchically organized thesaurus. Using categories of four different degrees of broadness, we show that classifying documents into narrow categories gives better scores than classifying them into broad ones, which we attribute to the fact that more specific categories are associated with terms with a higher discriminatory power.
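As a rough illustration of the broadness measure described in the abstract, the sketch below computes a category's distance to the root of a hierarchy. It is not the authors' implementation: the `depth` function, the `toy_parent` mapping, and the category names in it are hypothetical, and the thesaurus is assumed to be a tree in which every category has exactly one parent.

```python
from typing import Dict, Optional


def depth(category: str, parent: Dict[str, Optional[str]]) -> int:
    """Return the number of edges between a category and the root."""
    steps = 0
    node = category
    # Walk upward until we reach the root, whose parent is None.
    while parent[node] is not None:
        node = parent[node]
        steps += 1
    return steps


# Hypothetical toy thesaurus: each category maps to its parent,
# and the root maps to None. These names are illustrative only.
toy_parent: Dict[str, Optional[str]] = {
    "root": None,
    "diseases": "root",
    "neoplasms": "diseases",
    "leukemia": "neoplasms",
}

# Smaller depth means a broader category; larger depth means a narrower one.
depths = {name: depth(name, toy_parent) for name in toy_parent}
```

Under this reading, categories could be grouped by depth to obtain the different degrees of broadness the abstract refers to. A thesaurus in which a category may have several parents would need a different definition, for example the shortest path to the root.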

Author information

Authors and Affiliations

ISLA, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Lucas Bouma & Maarten de Rijke

Editor information

Editors and Affiliations

Queen Mary, University of London, London, UK
Mounia Lalmas
Department of Information Science, City University, Northampton Square, EC1V 0HB, London, UK
Andy MacFarlane
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Queen Mary University of London, UK
Anastasios Tombros
CWI, Amsterdam, The Netherlands
Theodora Tsikrika
Department of Computing, Imperial College London, South Kensington Campus, SW7 2AZ, London, UK
Alexei Yavlinsky

About this paper

Cite this paper

Bouma, L., de Rijke, M. (2006). Specificity Helps Text Classification. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_60

DOI: https://doi.org/10.1007/11735106_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33347-0
Online ISBN: 978-3-540-33348-7
eBook Packages: Computer Science, Computer Science (R0)
