Using kNN model for automatic text categorization

Soft Computing

Abstract

An investigation is conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbors (kNN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, a new classifier called the kNN model-based classifier (kNN Model) is proposed. It combines the strengths of both kNN and Rocchio. A text categorization prototype, which implements kNN Model along with kNN and Rocchio, is described. An experimental evaluation of the different methods is carried out on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the proposed kNN model-based method outperforms the kNN and Rocchio classifiers, and is therefore a good alternative to kNN and Rocchio in some application areas.
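For orientation, the two baseline approaches named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' prototype or the proposed kNN Model, whose construction the abstract does not describe. The whitespace tokenizer, the raw term-frequency weighting, the similarity-weighted vote, the reduction of Rocchio to one normalized centroid per class, and all function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Map a text to a unit-length term-frequency vector (term -> weight).

    Assumption: lowercased whitespace tokens and raw term frequency.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def knn_classify(train, query, k=3):
    """kNN: compare the query with every training document, then let the
    k most similar documents vote, each vote weighted by its similarity."""
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(text)), label) for text, label in train),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


def rocchio_train(train):
    """Rocchio (simplified, positive examples only): summarize each class
    by one normalized centroid of its training vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    for text, label in train:
        for term, w in vectorize(text).items():
            sums[label][term] += w
    centroids = {}
    for label, vec in sums.items():
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        centroids[label] = {term: w / norm for term, w in vec.items()}
    return centroids


def rocchio_classify(centroids, query):
    """Assign the query to the class whose centroid is most similar."""
    q = vectorize(query)
    return max(centroids, key=lambda label: cosine(q, centroids[label]))
```

In this sketch kNN keeps every training document and scans all of them for each query, whereas the Rocchio variant compares the query with only one prototype per class. That contrast is the trade-off the abstract refers to when it speaks of combining the strengths of both.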

Additional information

This work was partly supported by the European Commission project ICONS, project no. IST-2001-32429.

Cite this article

Guo, G., Wang, H., Bell, D. et al. Using kNN model for automatic text categorization. Soft Comput 10, 423–430 (2006). https://doi.org/10.1007/s00500-005-0503-y

  • DOI: https://doi.org/10.1007/s00500-005-0503-y
