Inappropriate Text Detection and Rephrasing Using NLP

Jain, Sanyam; Tripathy, B. K.

doi:10.1007/978-3-031-53731-8_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2030)

Included in the following conference series:

International Conference on Soft Computing and its Engineering Applications

Abstract

The impact of offensive language on public and professional discourse highlights the need for effective mitigation measures. This work uses computational linguistic techniques to identify and treat such language. When harmful content is found, a two-pronged mechanism is applied: the offending terms are either removed or put through natural language pre-processing to produce rephrased text that preserves the original meaning. Two freely accessible datasets are used for text classification. The distinguishing feature of the technique is the rephrasing stage, in which we look up synonyms of the inappropriate words and choose the one that best fits as a replacement in the phrase. The best classification accuracy we achieved is about 95%. The method aims to create a setting that encourages courteous and peaceful discussion while maintaining semantic integrity. By addressing inappropriate language through both detection and rephrasing, this research offers an approach to fostering meaningful relationships in public and professional contexts.
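
The two-pronged mechanism described in the abstract (remove the offending term, or replace it with a fitting synonym) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `LEXICON`, the function names, and the rule of taking the first listed synonym are all assumptions made for illustration. The chapter's actual classifier, datasets, and synonym-selection criterion are not reproduced here.

```python
import re

# Hypothetical toy lexicon (an assumption, not from the chapter):
# flagged word -> candidate milder synonyms, in order of preference.
LEXICON = {
    "stupid": ["unwise", "ill-considered"],
    "garbage": ["poor", "weak"],
}

# Split text into alternating word and non-word tokens so that
# spacing and punctuation survive the rewrite.
TOKEN = re.compile(r"[A-Za-z']+|[^A-Za-z']+")


def is_inappropriate(text):
    """Return True if any word in the text appears in the lexicon."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return any(word in LEXICON for word in words)


def clean(text, mode="rephrase"):
    """Remove flagged words (mode='remove') or swap in a synonym.

    The synonym choice here is a placeholder: it takes the first
    candidate. A real system would score candidates against context.
    """
    kept = []
    for token in TOKEN.findall(text):
        key = token.lower()
        if key in LEXICON:
            if mode == "remove":
                continue  # drop the flagged word entirely
            token = LEXICON[key][0]  # placeholder "best fit"
        kept.append(token)
    # Collapse any doubled spaces left behind by removals.
    return re.sub(r"\s+", " ", "".join(kept)).strip()


if __name__ == "__main__":
    sentence = "This plan is stupid."
    if is_inappropriate(sentence):
        print(clean(sentence, mode="rephrase"))
        print(clean(sentence, mode="remove"))
```

In this sketch, rephrasing keeps the sentence grammatical where plain removal may not, which is one reason a synonym-based path can be preferable when the original meaning should be retained.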

Author information

Authors and Affiliations

School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
Sanyam Jain & B. K. Tripathy

Corresponding author

Correspondence to B. K. Tripathy.

Editor information

Editors and Affiliations

Charotar University of Science and Technology, Changa, India
Kanubhai K. Patel
University of South Dakota, Vermillion, SD, USA
KC Santosh
Charotar University of Science and Technology, Changa, India
Atul Patel
Indian Statistical Institute, Kolkata, India
Ashish Ghosh

About this paper

Cite this paper

Jain, S., Tripathy, B.K. (2024). Inappropriate Text Detection and Rephrasing Using NLP. In: Patel, K.K., Santosh, K., Patel, A., Ghosh, A. (eds) Soft Computing and Its Engineering Applications. icSoftComp 2023. Communications in Computer and Information Science, vol 2030. Springer, Cham. https://doi.org/10.1007/978-3-031-53731-8_21

DOI: https://doi.org/10.1007/978-3-031-53731-8_21
Published: 12 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53730-1
Online ISBN: 978-3-031-53731-8
eBook Packages: Computer Science, Computer Science (R0)
