Structure-Based Supervised Term Weighting and Regularization for Text Classification

Shanavas, Niloofer; Wang, Hui; Lin, Zhiwei; Hawe, Glenn

doi:10.1007/978-3-030-23281-8_9

Niloofer Shanavas¹⁹,
Hui Wang¹⁹,
Zhiwei Lin¹⁹ &
…
Glenn Hawe¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11608))

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

1504 Accesses
1 Citations

Abstract

Text documents have rich information that can be useful for different tasks. How to utilise the rich information in texts effectively and efficiently for tasks such as text classification is still an active research topic. One approach is to weight the terms in a text document based on their relevance to the classification task at hand. Another approach is to utilise structural information in a text document to regularize learning so that the learned model is more accurate. An important question is, can we combine the two approaches to achieve better performance? This paper presents a novel method for utilising the rich information in texts. We use supervised term weighting, which utilises the class information in a set of pre-classified training documents, thus the resulting term weighting is class specific. We also use structured regularization, which incorporates structural information into the learning process. A graph is built for each class from the pre-classified training documents and structural information in the graphs is used to calculate the supervised term weights and to define the groups for structured regularization. Experimental results for six text classification tasks show the increase in text classification accuracy with the utilisation of structural information in text for both weighting and regularization. Using graph-based text representation for supervised term weighting and structured regularization can build a compact model with considerable improvement in the performance of text classification.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 74.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
http://disi.unitn.it/moschitti/corpora.htm.

References

Aggarwal, C.C.: Data Classification: Algorithms and Applications. CRC Press, Boca Raton (2014)
Book Google Scholar
Bakin, S., et al.: Adaptive regression and model selection in data mining problems (1999)
Google Scholar
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008)
Article Google Scholar
Feldman, R., Sanger, J.: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, New York (2007)
Google Scholar
Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736 (2010)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970)
Article Google Scholar
Lewis, D.D.: Representation quality in text classification: an introduction and experiment. In: Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, 24–27 June 1990 (1990)
Google Scholar
Martins, A.F., Smith, N.A., Aguiar, P.M., Figueiredo, M.A.: Structured sparsity in structured prediction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1500–1511 (2011)
Google Scholar
Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. (CSUR) 34, 1–47 (2002)
Article Google Scholar
Shanavas, N., Wang, H., Lin, Z., Hawe, G.: Centrality-based approach for supervised term weighting. In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pp. 1261–1268. IEEE (2016)
Google Scholar
Skianis, K., Rousseau, F., Vazirgiannis, M.: Regularizing text categorization with clusters of words. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1827–1837 (2016)
Google Scholar
Skianis, K., Tziortziotis, N., Vazirgiannis, M.: Orthogonal matching pursuit for text classification. In: Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pp. 93–103 (2018)
Google Scholar
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
MathSciNet MATH Google Scholar
Yogatama, D., Smith, N.: Making the most of bag of words: sentence regularization with alternating direction method of multipliers. In: International Conference on Machine Learning, pp. 656–664 (2014)
Google Scholar
Yogatama, D., Smith, N.A.: Linguistic structured sparsity in text categorization. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 786–796 (2014)
Google Scholar
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68, 49–67 (2006)
Article MathSciNet Google Scholar
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

School of Computing, Ulster University, Jordanstown, UK
Niloofer Shanavas, Hui Wang, Zhiwei Lin & Glenn Hawe

Authors

Niloofer Shanavas
View author publications
You can also search for this author in PubMed Google Scholar
Hui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zhiwei Lin
View author publications
You can also search for this author in PubMed Google Scholar
Glenn Hawe
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Niloofer Shanavas .

Editor information

Editors and Affiliations

Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais
University of Salford, Salford, UK
Farid Meziane
University of Salford, Salford, UK
Sunil Vadera
Oakland University, Rochester, MI, USA
Vijayan Sugumaran
CSE, University of Salford, Salford, UK
Mohamad Saraee

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Shanavas, N., Wang, H., Lin, Z., Hawe, G. (2019). Structure-Based Supervised Term Weighting and Regularization for Text Classification. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds) Natural Language Processing and Information Systems. NLDB 2019. Lecture Notes in Computer Science(), vol 11608. Springer, Cham. https://doi.org/10.1007/978-3-030-23281-8_9

Download citation

DOI: https://doi.org/10.1007/978-3-030-23281-8_9
Published: 21 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23280-1
Online ISBN: 978-3-030-23281-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics