DOI: 10.1145/2783258.2788606 (ACM KDD conference proceedings)
research-article

Collective Spammer Detection in Evolving Multi-Relational Social Networks

Published: 10 August 2015

ABSTRACT

Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. Furthermore, in order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable. We use GraphLab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network significantly gain predictive performance over those that do not.
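The k-gram idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the event tuple layout, the relation names (`view`, `friend_request`, `message`), and the function names `action_sequences` and `kgram_counts` are assumptions made for illustration only. It shows one plausible way to turn a time-stamped multi-relational edge list into per-user sequences of relation types and count the k-grams in each sequence.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical event record (an assumption, not the paper's schema):
# (timestamp, source_user, relation_type, target_user)
Event = Tuple[int, str, str, str]


def action_sequences(events: List[Event]) -> Dict[str, List[str]]:
    """Group events by source user and order each user's relation types by time."""
    by_user: Dict[str, List[Tuple[int, str]]] = {}
    for ts, src, rel, _dst in events:
        by_user.setdefault(src, []).append((ts, rel))
    # Stable sort on the timestamp only, so ties keep their input order.
    return {
        user: [rel for _, rel in sorted(pairs, key=lambda p: p[0])]
        for user, pairs in by_user.items()
    }


def kgram_counts(sequence: List[str], k: int) -> Counter:
    """Count every contiguous run of k relation types in one user's sequence."""
    return Counter(
        tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1)
    )


# Toy data with made-up users and relation names.
events = [
    (3, "u1", "message", "u2"),
    (1, "u1", "view", "u2"),
    (2, "u1", "friend_request", "u2"),
    (1, "u9", "message", "u1"),
]
sequences = action_sequences(events)
bigrams_u1 = kgram_counts(sequences["u1"], k=2)
```

Under these assumptions, the resulting counts could be used as sparse features for a per-user classifier; a sequence shorter than k simply yields no k-grams.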
