
Collective Spammer Detection in Evolving Multi-Relational Social Networks

Authors:
Shobeir Fakhraei (University of Maryland, College Park, MD, USA),
James Foulds (University of California, Santa Cruz, CA, USA),
Madhusudana Shashanka (if(we), San Francisco, CA, USA),
Lise Getoor (University of California, Santa Cruz, CA, USA)

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2015, Pages 1769–1778. https://doi.org/10.1145/2783258.2788606

Published: 10 August 2015


ABSTRACT

Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. Furthermore, in order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable. We use GraphLab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network achieve significantly better predictive performance than those that do not.
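
To make the k-gram idea concrete, here is a minimal sketch and not the authors' implementation. It assumes each edge of the time-stamped multi-relational graph is a tuple of (timestamp, source user, relation type, target user), and it counts, for every user, the k-grams of relation types in that user's time-ordered outgoing actions. The relation names (`friend_request`, `message`, `profile_view`), the user ids, and the function name `kgram_features` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def kgram_features(events, k=2):
    """Count k-grams of relation types per source user.

    `events` is an iterable of (timestamp, source, relation, target) tuples.
    This tuple layout is an assumption made for this sketch.
    """
    # Build each user's sequence of outgoing actions in time order.
    sequences = defaultdict(list)
    for _timestamp, source, relation, _target in sorted(events, key=lambda e: e[0]):
        sequences[source].append(relation)

    # Slide a window of length k over each sequence and count the windows.
    features = {}
    for user, seq in sequences.items():
        windows = (tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
        features[user] = Counter(windows)
    return features


# Hypothetical toy data: u1 alternates friend requests and messages.
events = [
    (1, "u1", "friend_request", "u2"),
    (2, "u1", "message", "u2"),
    (3, "u1", "friend_request", "u3"),
    (4, "u1", "message", "u3"),
    (5, "u2", "profile_view", "u1"),
]
features = kgram_features(events, k=2)
```

In this toy data, `features["u1"]` counts the bigram `("friend_request", "message")` twice, while `u2` has too few actions to form any bigram. Counts like these could then be fed to a standard classifier as sparse features; how the paper weights or selects them is not stated in the abstract.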

Supplemental Material

p1769.mp4 (mp4, 241.5 MB)


Index Terms

Computing methodologies > Machine learning


Published in
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258
General Chairs: Longbing Cao (University of Technology, Sydney), Chengqi Zhang (University of Technology, Sydney)
Program Chairs: Thorsten Joachims (Cornell University), Geoff Webb (Monash University), Dragos D. Margineantu (Boeing Research), Graham Williams (Australian Taxation Office)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
collective classification
graph mining
graphlab
heterogeneous networks
hinge-loss markov random fields (hl-mrf)
k-grams
multi-relational networks
probabilistic soft logic (psl)
sequence mining
social networks
social spam
spam
tree-augmented naive bayes
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '15 Paper Acceptance Rate: 160 of 819 submissions, 20%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
