
Discovering Progression Stages in Trillion-Scale Behavior Logs

Published: 10 April 2018

ABSTRACT

User engagement is a key factor in the success of web services. Studying the following questions helps establish business strategies that lead to that success: How do the behaviors of users in a web service evolve over time? To reach a certain engagement level, what are the common stages that many users go through? How can we represent the stage that each individual user is in? To answer these questions, we propose a behavior model that discovers the progression of users' behaviors from a given starting point (such as a new subscription or the first experience of certain features) to a particular target stage, such as a predefined engagement level of interest. Under our model, transitions between stages represent the progression of users, and each stage is characterized by probability distributions over types of actions, frequencies of actions, and next stages to move to. Each user performs actions and moves to the next stage following the probability distributions that characterize the current stage. We also develop a fast and memory-efficient algorithm that fits our model to trillions of behavioral logs. Our algorithm scales linearly with the size of the data. In particular, its distributed version, implemented in the MapReduce framework, successfully handles petabyte-scale data with one trillion actions. Lastly, we show the effectiveness of our model and algorithm by applying them to real-world data from LinkedIn. We discover meaningful stages that LinkedIn users go through on the way to predefined target goals. In addition, our trained models are shown to be useful for downstream tasks such as predicting future actions.
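
To make the generative side of the model concrete, here is a minimal sketch, not the authors' implementation. It assumes that every per-stage distribution is a simple categorical table, including the distribution over action frequencies, and that stages only move forward. The stage parameters, the action names (`search`, `profile_view`, `connect`), and the helper names are all hypothetical and chosen for illustration. The fitting algorithm and its MapReduce version are not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage, described by three categorical distributions (an assumption)."""
    action_probs: dict  # action type -> probability
    count_probs: dict   # number of actions emitted per visit -> probability
    next_probs: dict    # index of the next stage -> probability


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def simulate_user(stages, start, target, rng, max_steps=100):
    """Generate (stage, action) pairs from the start stage until the target stage."""
    trace = []
    current = start
    for _ in range(max_steps):
        stage = stages[current]
        # The current stage decides how many actions are emitted and of which type.
        for _ in range(sample(stage.count_probs, rng)):
            trace.append((current, sample(stage.action_probs, rng)))
        if current == target:
            break
        # The current stage also decides where the user moves next.
        current = sample(stage.next_probs, rng)
    return trace


# Hypothetical toy parameters: three stages, with stage 2 as the target.
STAGES = [
    Stage({"search": 0.7, "profile_view": 0.3}, {1: 0.6, 2: 0.4}, {0: 0.5, 1: 0.5}),
    Stage({"profile_view": 0.6, "connect": 0.4}, {1: 0.5, 3: 0.5}, {1: 0.4, 2: 0.6}),
    Stage({"connect": 0.8, "search": 0.2}, {2: 1.0}, {2: 1.0}),
]

if __name__ == "__main__":
    for stage_id, action in simulate_user(STAGES, 0, 2, random.Random(0)):
        print(stage_id, action)
```

Fitting would run in the opposite direction: given only the observed action sequences, estimate the three tables for each stage and the stage each user occupies at each point in the sequence.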

          • Published in

            WWW '18: Proceedings of the 2018 World Wide Web Conference
            April 2018
            2000 pages
            ISBN: 9781450356398

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

            Publisher

            International World Wide Web Conferences Steering Committee

            Republic and Canton of Geneva, Switzerland

            Acceptance Rates

            WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
            Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
