DOI: 10.1145/1458082.1458361
poster

Incorporating topical support documents into a small training set in text categorization

Published: 26 October 2008

ABSTRACT

This paper explores incorporating topical support documents into a training set to compensate for a shortage of positive training data in text categorization. To support topical representation, our method applies a simple transformation to documents: it creates new documents from existing positive documents by squaring a conventional term weight. The topical support documents thus created are expected not only to preserve the topic but also to improve the topical representation by emphasizing terms with higher weights. Experiments with support vector machines showed the effectiveness of the method on the RCV1 collection with a small number of positive training documents. In micro-averaged F1, our topical support representation achieved improvements of 52.01% for 33 RCV1 Topic categories with fewer than 100 positive training documents and 8.83% for 56 categories with fewer than 300. Analysis of the results in terms of robustness indicates that topical support documents contribute to a steady and stable improvement.
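The transformation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sparse dict representation, the function names (`l2_normalize`, `make_support_document`, `augment_training_set`), the toy term weights, and the L2 re-normalization step are all assumptions made for illustration. The abstract states only that a conventional term weight is squared.

```python
import math


def l2_normalize(vec):
    """Scale a sparse term-weight vector (term -> weight dict) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}


def make_support_document(vec, normalize=True):
    """Create a support document by squaring every term weight.

    Re-normalizing afterwards is an assumption; the abstract does not say
    whether the squared weights are rescaled.
    """
    squared = {term: w * w for term, w in vec.items()}
    return l2_normalize(squared) if normalize else squared


def augment_training_set(docs, labels, positive_label=1):
    """Append one support document for each positive training document."""
    new_docs, new_labels = list(docs), list(labels)
    for vec, label in zip(docs, labels):
        if label == positive_label:
            new_docs.append(make_support_document(vec))
            new_labels.append(positive_label)
    return new_docs, new_labels


# Toy data: one positive and one negative document with made-up weights.
docs = [
    l2_normalize({"oil": 3.0, "price": 1.0}),
    l2_normalize({"football": 2.0, "match": 2.0}),
]
labels = [1, -1]
aug_docs, aug_labels = augment_training_set(docs, labels)
```

In this toy case the positive document's weight ratio between "oil" and "price" grows from 3:1 to 9:1 in its support document, which is the emphasis on higher-weighted terms that the abstract describes. The augmented set could then be passed to any SVM trainer.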


      • Published in

        CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
        October 2008
        1562 pages
        ISBN: 9781595939913
        DOI: 10.1145/1458082

        Copyright © 2008 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
