
Deep Convolutional Neural Network for Knowledge-Infused Text Classification

New Generation Computing

Abstract

Deep neural networks are extensively used in text mining and Natural Language Processing (NLP), whose aim is to enable computers to understand, analyze, and generate natural language data such as text or speech. However, semantic resources such as taxonomies and ontologies are not yet fully integrated into deep learning. In this paper, we use a Deep Convolutional Neural Network (Deep CNN) to classify research papers using the Computer Science Ontology, an ontology of research areas in the field of computer science. The model takes as input the abstract and keywords of a research paper and returns the relevant research topic. To evaluate our approach, we used a gold standard dataset of research articles. To further improve text classification results, we propose a Deep CNN model. We then used ontology matching to reduce the number of classes and obtain better results. Experimental results show that the proposed approach outperforms the compared approach, achieving the highest precision, recall, and F1-score.
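The abstract does not say how the class reduction or the scoring is carried out, so the following is only a minimal sketch under stated assumptions. It substitutes a hand-written label-to-parent lookup for ontology matching, and it computes macro-averaged precision, recall, and F1-score on toy labels. The `PARENT` mapping, the function names, and the sample labels are all hypothetical; none of them come from the paper or from the Computer Science Ontology.

```python
from collections import Counter

# Hypothetical toy mapping from fine-grained topics to broader ones.
# Illustrative only; not taken from the Computer Science Ontology.
PARENT = {
    "convolutional neural networks": "machine learning",
    "support vector machines": "machine learning",
    "ontology matching": "semantic web",
    "linked data": "semantic web",
}


def reduce_label(label):
    """Map a fine-grained label to its broader class; unknown labels pass through."""
    return PARENT.get(label, label)


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # A class with no predictions (or no true instances) scores 0.0.
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy gold labels and toy predictions (made up for illustration).
y_true = ["convolutional neural networks", "support vector machines",
          "ontology matching", "linked data"]
y_pred = ["support vector machines", "support vector machines",
          "ontology matching", "ontology matching"]

fine = macro_scores(y_true, y_pred)
coarse = macro_scores([reduce_label(t) for t in y_true],
                      [reduce_label(p) for p in y_pred])
print("fine-grained:", fine)
print("after reduction:", coarse)
```

In this toy case, confusions between sibling topics disappear once labels are merged into their parent, so the scores rise after reduction. This only illustrates why reducing classes can help; it says nothing about the size of the effect reported in the paper.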



Author information

Corresponding author

Correspondence to Sonika Malik.

Ethics declarations

Conflict of interest

Sarika Jain is a guest editor of the special issue, "The Way Forward with AI-complete Problems". She was not involved in the peer review or handling of the manuscript. On behalf of all authors, the corresponding author states that there is no other potential conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Malik, S., Jain, S. Deep Convolutional Neural Network for Knowledge-Infused Text Classification. New Gener. Comput. (2024). https://doi.org/10.1007/s00354-024-00245-6


  • DOI: https://doi.org/10.1007/s00354-024-00245-6
