Vector Space Model of Text Classification Based on Inertia Contribution of Document

  • Conference paper
Emerging Technologies for Developing Countries (AFRICATEK 2018)

Abstract

The use of textual data has increased exponentially in recent years, driven by platforms such as Facebook, Twitter, Wikipedia, blogs, and so on. Analyzing this massive body of text can help to automatically categorize and label new content. Before classification, term weighting is the crucial step for representing documents in a form suitable for classification algorithms. In this paper, we survey term weighting schemes and propose an efficient scheme that provides better classification accuracy than that obtained with the well-known TF-IDF, the recent TF-IGM, and other term weighting schemes in the literature.
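
For readers unfamiliar with term weighting, the TF-IDF baseline named in the abstract can be sketched as follows. This is a minimal illustration only, assuming raw term frequency and idf = log(N / df); it is not the scheme proposed in the paper, whose definition is not given in this preview. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes raw term frequency and idf = log(N / df); many other
    TF-IDF variants exist (smoothed idf, log-scaled tf, and so on).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda"],
    ["cheap", "meeting"],
]
vectors = tf_idf(docs)
```

Each resulting dict is a sparse document vector that a classifier can consume. A term that occurs in every document receives weight zero under this variant, which is the behaviour that supervised schemes such as TF-IGM aim to refine by using class information.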

Acknowledgment

We would like to express our sincere thanks to the CEA-MITIC (Centre d’Excellence Africain en Mathématiques, Informatique et TIC), which financed our research by paying the publication fees of the two papers that we published at AFRICATEK 2018. The CEA-MITIC, located at the UFR of Applied Sciences and Technology (UFR SAT) of Gaston Berger University (UGB) in Saint-Louis, Senegal, is a consortium of university institutions in Senegal and the subregion, research institutions, and national, regional, and international companies involved in the ICT sector.

Author information

Corresponding author

Correspondence to Fodé Camara.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Kandé, D., Camara, F., Marone, R.M., Ndiaye, S. (2019). Vector Space Model of Text Classification Based on Inertia Contribution of Document. In: Zitouni, R., Agueh, M. (eds) Emerging Technologies for Developing Countries. AFRICATEK 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 260. Springer, Cham. https://doi.org/10.1007/978-3-030-05198-3_14

  • DOI: https://doi.org/10.1007/978-3-030-05198-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05197-6

  • Online ISBN: 978-3-030-05198-3

  • eBook Packages: Computer Science, Computer Science (R0)
