DOI: 10.1145/3331184.3331197
research-article

Finding Camouflaged Needle in a Haystack?: Pornographic Products Detection via Berrypicking Tree Model

Published: 18 July 2019

ABSTRACT

Detecting illegal products, e.g., unclassified pornographic products, is an important and urgent research problem for decentralized eCommerce services such as eBay, eBid, and Taobao. The task is challenging because some sellers use, and repeatedly change, camouflaged text to deceive current detection algorithms. In this study, we propose a novel task: dynamically locating pornographic products in very large product collections. Unlike prior product classification efforts, which focus on textual information, the proposed model, BerryPIcking TRee MoDel (BIRD), uses both product textual content and buyers' seeking behavior, represented as berrypicking trees. In particular, BIRD encodes both the semantic information of all branch sequences and the overall latent buyer intent during the whole seeking process. An extensive set of experiments has been conducted to demonstrate the advantage of the proposed model over alternative solutions. To facilitate further research on this practical and important problem, the code and buyers' seeking behavior data have been made publicly available.

Supplemental Material

cite2-12h00-d2.mp4 (mp4, 482.7 MB)

Published in

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN: 9781450361729
DOI: 10.1145/3331184

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGIR'19 paper acceptance rate: 84 of 426 submissions, 20%. Overall acceptance rate: 792 of 3,983 submissions, 20%.