User Identity Linkage in Social Media Using Linguistic and Social Interaction Features

Authors:
Despoina Chatzakou (Information Technologies Institute, Centre for Research and Technology Hellas, Greece)
Juan Soler-Company (Pompeu Fabra University, Spain)
Theodora Tsikrika (Information Technologies Institute, CERTH, Greece)
Leo Wanner (ICREA and University Pompeu Fabra, Spain)
Stefanos Vrochidis (Information Technologies Institute, Greece)
Ioannis Kompatsiaris (CERTH - ITI, Greece)

WebSci '20: Proceedings of the 12th ACM Conference on Web Science, July 2020, Pages 295–304
https://doi.org/10.1145/3394231.3397920

Published: 6 July 2020

ABSTRACT

Social media users often hold several accounts to amplify the spread of their thoughts, ideas, and viewpoints. In the particular case of objectionable content, users tend to create multiple accounts to bypass the countermeasures enforced by social media platforms and thus retain their online identity even if some of their accounts are suspended. User identity linkage aims to reveal social media accounts likely to belong to the same natural person so as to prevent the spread of abusive or illegal activities. To this end, this work proposes a machine learning-based detection model that uses multiple attributes of users’ online activity to identify whether two or more virtual identities belong to the same natural person. The model’s efficacy is demonstrated in two case studies on abusive and terrorism-related Twitter content.

Published in

WebSci '20: Proceedings of the 12th ACM Conference on Web Science
July 2020
361 pages
ISBN:9781450379892
DOI:10.1145/3394231

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Abusive and Illegal content
Actor identity resolution
Twitter
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 218 of 875 submissions, 25%
Article Metrics
- Total Citations: 7
- Total Downloads: 253
- Downloads (Last 12 months): 48
- Downloads (Last 6 weeks): 8