ABSTRACT
Social media users often hold several accounts to amplify the spread of their thoughts, ideas, and viewpoints. In the particular case of objectionable content, users tend to create multiple accounts to bypass the countermeasures enforced by social media platforms and thus retain their online identity even if some of their accounts are suspended. User identity linkage aims to reveal social media accounts likely to belong to the same natural person so as to prevent the spread of abusive or illegal activities. To this end, this work proposes a machine learning-based detection model that uses multiple attributes of users' online activity to identify whether two or more virtual identities belong to the same natural person. The model's efficacy is demonstrated in two case studies on abusive and terrorism-related Twitter content.