Discovering Progression Stages in Trillion-Scale Behavior Logs

Authors:
Kijung Shin, Carnegie Mellon University, Pittsburgh, PA, USA
Mahdi Shafiei, LinkedIn Corporation, Mountain View, CA, USA
Myunghwan Kim, LinkedIn Corporation, Mountain View, CA, USA
Aastha Jain, LinkedIn Corporation, Mountain View, CA, USA
Hema Raghavan, LinkedIn Corporation, Mountain View, CA, USA

WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, Pages 1765–1774
https://doi.org/10.1145/3178876.3186182

Published: 10 April 2018

ABSTRACT

User engagement is a key factor in the success of web services. Studying the following questions helps establish business strategies that lead to that success: How do the behaviors of users in a web service evolve over time? To reach a certain engagement level, what are the common stages that many users go through? How can we represent the stage that each individual user is in? To answer these questions, we propose a behavior model that discovers the progressions of users' behaviors from a given starting point - such as a new subscription or first experience of certain features - to a particular target stage, such as a predefined engagement level of interest. Under our model, transitions between stages represent the progression of users, and each stage is characterized by probability distributions over types of actions, frequencies of actions, and next stages to move to. Each user performs actions and moves to a next stage following the probability distributions that characterize the current stage. We also develop a fast and memory-efficient algorithm that fits our model to trillions of behavioral logs. Our algorithm scales linearly with the size of the data. In particular, its distributed version, implemented in the MapReduce framework, successfully handles petabyte-scale data with one trillion actions. Lastly, we show the effectiveness of our model and algorithm by applying them to real-world data from LinkedIn. We discover meaningful stages that LinkedIn users go through on the way to predefined target goals. In addition, our trained models are shown to be useful for downstream tasks such as prediction of future actions.
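
The generative process described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: the abstract does not state which distribution families the model uses, so a categorical distribution over action types, a Poisson distribution over the number of actions per stage visit, and a categorical distribution over next stages are all assumptions. The stage names, action names, and every probability are hypothetical values chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Stage:
    action_probs: dict   # action type -> probability (assumed categorical)
    mean_actions: float  # mean number of actions per visit (assumed Poisson)
    next_probs: dict     # next stage name -> probability (assumed categorical)


def sample_categorical(rng, probs):
    """Draw one key from a {key: probability} mapping."""
    keys = list(probs)
    return rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]


def sample_count(rng, mean):
    """Draw a Poisson-distributed count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def simulate_user(stages, start, target, rng, max_steps=100):
    """Generate one user's log of (stage, action) pairs from start toward target."""
    log = []
    stage = start
    for _ in range(max_steps):
        if stage == target:
            break
        spec = stages[stage]
        # Perform a random number of actions drawn from the current stage.
        for _ in range(sample_count(rng, spec.mean_actions)):
            log.append((stage, sample_categorical(rng, spec.action_probs)))
        # Move to a next stage (possibly staying in the same one).
        stage = sample_categorical(rng, spec.next_probs)
    return log, stage


# Hypothetical two-stage progression toward a target stage called "engaged".
STAGES = {
    "new": Stage({"view_profile": 0.8, "send_invite": 0.2}, 2.0,
                 {"new": 0.5, "exploring": 0.5}),
    "exploring": Stage({"view_profile": 0.3, "send_invite": 0.4, "post": 0.3}, 4.0,
                       {"exploring": 0.6, "engaged": 0.4}),
}

if __name__ == "__main__":
    log, final_stage = simulate_user(STAGES, "new", "engaged", random.Random(0))
    print(len(log), "actions; final stage:", final_stage)
```

Fitting the model is the reverse problem: given only the observed actions, recover the per-stage distributions and each user's stage assignments. The sketch covers only the forward (generative) direction and says nothing about the fitting algorithm or its MapReduce version.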

Index Terms

Discovering Progression Stages in Trillion-Scale Behavior Logs
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning in probabilistic graphical models
        Latent variable models
2. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering
  2. World Wide Web
    1. Web applications
      1. Social networks
    2. Web mining
      1. Web log analysis

Published in
WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
General Chairs:
Pierre-Antoine Champin, Université Claude Bernard Lyon 1, France
Fabien Gandon, Inria, Université Côte d'Azur, CNRS, I3S, France
Lionel Médini, Université Claude Bernard Lyon 1, France
Program Chairs:
Mounia Lalmas, Spotify, UK
Panagiotis G. Ipeirotis, New York University, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
behavior log
behavior modeling
mapreduce
online social network
progression
stage
Qualifiers
- research-article
Acceptance Rates
WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%