ABSTRACT
User engagement is a key factor in the success of web services. Answering the following questions helps establish business strategies that lead to that success: How do the behaviors of users in a web service evolve over time? What common stages do many users go through to reach a certain engagement level? How can we represent the stage that each individual user is in? To answer these questions, we propose a behavior model that discovers the progression of users' behaviors from a given starting point, such as a new subscription or the first experience of certain features, to a particular target stage, such as a predefined engagement level of interest. Under our model, transitions between stages represent the progression of users, and each stage is characterized by probability distributions over types of actions, frequencies of actions, and next stages to move to. Each user performs actions and moves to a next stage following the probability distributions that characterize the current stage. We also develop a fast and memory-efficient algorithm that fits our model to trillions of behavioral logs. Our algorithm scales linearly with the size of the data. In particular, its distributed version, implemented in the MapReduce framework, successfully handles petabyte-scale data with one trillion actions. Lastly, we show the effectiveness of our model and algorithm by applying them to real-world data from LinkedIn. We discover meaningful stages that LinkedIn users go through on the way to predefined target goals. In addition, our trained models are shown to be useful for downstream tasks such as prediction of future actions.
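As a rough illustration of the generative process described in the abstract, the sketch below simulates one user moving through stages. It is not the authors' implementation. The `Stage` class, the `simulate_user` function, the example `STAGES` table, and the geometric distribution for the number of actions per stage are all assumptions made for this example; the abstract only states that each stage has distributions over action types, action frequencies, and next stages.

```python
import random
from dataclasses import dataclass


@dataclass
class Stage:
    action_probs: dict   # action type -> probability (hypothetical action names)
    mean_actions: float  # assumed mean number of actions per stage visit (>= 1)
    next_probs: dict     # next stage id -> transition probability


def sample(rng, probs):
    """Draw one key from a {key: probability} table."""
    keys = list(probs)
    return rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]


def simulate_user(stages, start, target, rng, max_visits=100):
    """Simulate one user from `start` until `target` is reached.

    Returns (trace, final_stage), where trace is a list of
    (stage id, action type) pairs in the order they were generated.
    """
    trace = []
    current = start
    for _ in range(max_visits):
        if current == target:
            break
        stage = stages[current]
        # Assumption: the action count is geometric with the stage's mean.
        n_actions = 1
        while rng.random() > 1.0 / stage.mean_actions:
            n_actions += 1
        for _ in range(n_actions):
            trace.append((current, sample(rng, stage.action_probs)))
        # Move on according to the current stage's transition distribution.
        current = sample(rng, stage.next_probs)
    return trace, current


# Hypothetical three-stage example: 0 = start, 2 = target engagement level.
STAGES = {
    0: Stage({"view_profile": 0.7, "search": 0.3}, 2.0, {1: 0.8, 2: 0.2}),
    1: Stage({"connect": 0.6, "message": 0.4}, 3.0, {2: 1.0}),
}

trace, final_stage = simulate_user(STAGES, start=0, target=2, rng=random.Random(0))
```

Fitting such a model to data runs in the opposite direction: given observed action sequences, the stage parameters and each user's stage assignments are estimated. That fitting step is what the paper's scalable algorithm addresses and is not shown here.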
Index Terms
- Discovering Progression Stages in Trillion-Scale Behavior Logs