DOI: 10.1145/3394231.3397920
research-article

User Identity Linkage in Social Media Using Linguistic and Social Interaction Features

Published: 06 July 2020

ABSTRACT

Social media users often hold several accounts in an effort to amplify the spread of their thoughts, ideas, and viewpoints. In the particular case of objectionable content, users tend to create multiple accounts to bypass the countermeasures enforced by social media platforms, thus retaining their online identity even if some of their accounts are suspended. User identity linkage aims to reveal social media accounts that are likely to belong to the same natural person, so as to prevent the spread of abusive or illegal activities. To this end, this work proposes a machine learning-based detection model that uses multiple attributes of users’ online activity to identify whether two or more virtual identities belong to the same natural person. The model’s efficacy is demonstrated in two case studies involving abusive and terrorism-related Twitter content.
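The abstract frames identity linkage as deciding, from attributes of their activity, whether two accounts belong to the same person. The following is a minimal Python sketch of that pairwise framing. It is not the authors' model: the two features (cosine similarity of word counts as a linguistic signal, Jaccard overlap of mentioned accounts as a social-interaction signal), the plain logistic-regression classifier, and all function names and toy accounts (`pair_features`, `train_logistic`, `same_person_prob`, `rally1` and so on) are hypothetical assumptions chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a: set, b: set) -> float:
    """Share of interaction partners (e.g. mentioned accounts) two accounts have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(acc1: dict, acc2: dict) -> list:
    """Hypothetical pairwise vector: [linguistic similarity, social-interaction overlap]."""
    words1 = Counter(acc1["text"].lower().split())
    words2 = Counter(acc2["text"].lower().split())
    return [cosine_similarity(words1, words2), jaccard(acc1["mentions"], acc2["mentions"])]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=5000):
    """Batch gradient-descent logistic regression; a stand-in, not the paper's classifier."""
    w, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        bias -= lr * grad_b / n
    return w, bias


def same_person_prob(model, acc1: dict, acc2: dict) -> float:
    """Estimated probability that two accounts belong to the same natural person."""
    w, bias = model
    x = pair_features(acc1, acc2)
    return sigmoid(bias + sum(wj * xj for wj, xj in zip(w, x)))


# Invented toy accounts for illustration only (not data from the paper).
rally1 = {"text": "join the rally at noon share widely", "mentions": {"@groupx", "@newsy"}}
rally2 = {"text": "share widely join the rally at noon", "mentions": {"@groupx", "@newsy"}}
rally3 = {"text": "join the rally tomorrow share this widely", "mentions": {"@groupx"}}
cook = {"text": "new lentil soup recipe tonight", "mentions": {"@chefdan"}}
sports = {"text": "the match tonight was great", "mentions": {"@sportsfan"}}

# Labelled pairs: 1 = same natural person, 0 = different people (toy labels).
pairs = [(rally1, rally2, 1), (rally1, rally3, 1), (rally1, cook, 0), (rally3, sports, 0), (cook, sports, 0)]
X = [pair_features(p, q) for p, q, _ in pairs]
y = [label for _, _, label in pairs]
model = train_logistic(X, y)

newcomer = {"text": "rally at noon join us and share widely", "mentions": {"@groupx", "@newsy"}}
print(round(same_person_prob(model, newcomer, rally1), 3))  # candidate alias pair
print(round(same_person_prob(model, newcomer, cook), 3))    # unrelated pair
```

Treating linkage as binary classification over account pairs means any classifier can be swapped in for the logistic regression. Both features here are symmetric, so the score does not depend on which account of a pair is listed first.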


Supplemental Material

3394231.3397920.mp4 (MP4, 98 MB)

  • Published in

    ACM Conferences
    WebSci '20: Proceedings of the 12th ACM Conference on Web Science
    July 2020
    361 pages
    ISBN:9781450379892
    DOI:10.1145/3394231

    Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 218 of 875 submissions, 25%
