research-article

Detecting Arabic Spam Reviews in Social Networks Based on Classification Algorithms

Published: 01 November 2021

Abstract

Reviews and comments that users leave on social media are of great importance to companies and business entities. New product ideas can be evaluated based on customer reactions. However, this use of social media is complicated by users who post spam in the form of reviews and comments.

Designing methods to automatically detect and block social media spam is difficult because spammers continuously develop new ways to post spam comments. Researchers have proposed several methods to detect English spam reviews, but few studies have addressed Arabic spam reviews. This article proposes a keyword-based method for detecting Arabic spam reviews. Keywords, or features, are subsets of words from the original text that are labelled as important. Term weights, the Term Frequency–Inverse Document Frequency (TF-IDF) matrix, and filter methods (such as information gain, chi-squared, deviation, correlation, and uncertainty) are used to extract keywords from Arabic text.
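As a rough illustration of keyword extraction with TF-IDF weights and a chi-squared filter, the following sketch uses only the Python standard library. The toy corpus, the function names (`tf_idf`, `chi_squared`, `top_keywords`), and the specific weighting variant (length-normalized term frequency times `log(N / df)`) are assumptions made for this example; the abstract does not state the exact formulas or tooling the authors used.

```python
import math
from collections import Counter

# Hypothetical toy corpus: tokenized comments with labels (1 = spam, 0 = not spam).
# Placeholder English tokens stand in for preprocessed Arabic words.
docs = [
    (["win", "free", "prize", "click"], 1),
    (["free", "offer", "click", "link"], 1),
    (["great", "product", "fast", "delivery"], 0),
    (["product", "quality", "great", "price"], 0),
]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens, _ in docs for term in set(tokens))
    weights = []
    for tokens, _ in docs:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights

def chi_squared(docs, term):
    """Chi-squared statistic of the 2x2 table (term present/absent x spam/not spam)."""
    a = sum(1 for tokens, y in docs if term in tokens and y == 1)      # present, spam
    b = sum(1 for tokens, y in docs if term in tokens and y == 0)      # present, not spam
    c = sum(1 for tokens, y in docs if term not in tokens and y == 1)  # absent, spam
    d = sum(1 for tokens, y in docs if term not in tokens and y == 0)  # absent, not spam
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def top_keywords(docs, k):
    """Keep the k terms with the highest chi-squared score as keywords."""
    vocab = sorted({t for tokens, _ in docs for t in tokens})
    return sorted(vocab, key=lambda t: chi_squared(docs, t), reverse=True)[:k]

weights = tf_idf(docs)
keywords = top_keywords(docs, 4)
```

In this sketch the filter step ranks terms by how strongly their presence is associated with the class label, and the TF-IDF weights of the surviving terms would then serve as the feature values passed to a classifier.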

The proposed method detects Arabic spam in Facebook comments. The dataset consists of 3,000 Arabic comments extracted from Facebook pages. Four machine learning algorithms are used in the detection process: C4.5, kNN, SVM, and Naïve Bayes. The results show that the decision tree classifier (C4.5) outperforms the other classification algorithms, with a detection accuracy of 92.63%.
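To make the idea of comparing classifiers by detection accuracy concrete, here is a minimal standard-library sketch of two of the algorithm families named above, Naïve Bayes and kNN, evaluated on a held-out set. It is not the authors' implementation or data: the toy comments, the Laplace smoothing, the Jaccard similarity used for kNN, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (tokens, label), 1 = spam, 0 = not spam.
train = [
    (["win", "free", "prize", "click"], 1),
    (["free", "offer", "click", "link"], 1),
    (["great", "product", "fast", "delivery"], 0),
    (["product", "quality", "great", "price"], 0),
]
held_out = [
    (["click", "free", "link"], 1),
    (["great", "price", "product"], 0),
]

def train_nb(data):
    """Collect the counts needed by a multinomial Naive Bayes model."""
    class_docs = Counter(y for _, y in data)
    term_counts = defaultdict(Counter)
    for tokens, y in data:
        term_counts[y].update(tokens)
    vocab = {t for tokens, _ in data for t in tokens}
    return class_docs, term_counts, vocab

def predict_nb(model, tokens):
    """Pick the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for y in sorted(class_docs):
        total_terms = sum(term_counts[y].values())
        score = math.log(class_docs[y] / total_docs)
        for t in tokens:
            if t in vocab:  # ignore terms never seen in training
                score += math.log((term_counts[y][t] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = y, score
    return best_label

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_knn(data, tokens, k=1):
    """Majority vote among the k training comments most similar to the query."""
    query = set(tokens)
    neighbours = sorted(data, key=lambda d: jaccard(query, set(d[0])), reverse=True)[:k]
    votes = Counter(y for _, y in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(predict, data):
    """Fraction of comments whose predicted label matches the true label."""
    return sum(predict(tokens) == y for tokens, y in data) / len(data)

nb_model = train_nb(train)
nb_accuracy = accuracy(lambda tokens: predict_nb(nb_model, tokens), held_out)
knn_accuracy = accuracy(lambda tokens: predict_knn(train, tokens), held_out)
```

A full comparison along the lines of the abstract would add a decision tree and an SVM and would typically use cross-validation over the whole dataset rather than a single small held-out split.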



      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1
        January 2022
        442 pages
        ISSN: 2375-4699
        EISSN: 2375-4702
        DOI: 10.1145/3494068


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 November 2021
        • Accepted: 1 July 2021
        • Revised: 1 June 2021
        • Received: 1 July 2020


        Qualifiers

        • research-article
        • Refereed
