Viral information propagation in the Digg online social network
Introduction
Every day in online social networks (OSNs), thousands of users post news articles, videos, photos, etc., which become visible to their connected users as new online content. Although most of these forms of media never spread to a wide audience beyond their sources, many users are influenced through their network connections. The causes and dynamics by which information proliferates throughout OSNs are still poorly understood. A better understanding of the mathematics underlying the spread of information in OSNs would have important applications for advertisers seeking to run more effective online marketing campaigns, and may enable more rapid dissemination of information over OSNs in the aftermath of political crises or natural disasters.
The focus of this article is the OSN Digg.com (DOSN). In this network, users are able to post content to a personal web page, vote for (“digg”) or against (“bury”) this content and share the content with users to whom they are connected. There are two forms of connections between users: directed (one user can share content with another user but not vice versa) and bidirected (both users can share content with each other). Once posted content receives a sufficiently large number of votes within a particular period of time, it is promoted to the homepage of Digg.com and becomes visible to all users in the DOSN.
One of the reasons we selected the DOSN to illustrate our epidemiological approach is our access to the dataset of Ref. [1], which contains the voting characteristics of the DOSN for June 2009. In particular, the data contains 3553 distinct stories (online content), the number of votes each story received, the users who voted for each story and the time at which each user cast the vote. On average, each story received approximately 850 votes, with a minimum of 122 votes and a maximum of 24 099. It should be noted that this dataset only includes the stories that were promoted to the homepage of Digg.com in June 2009. In addition to voting data, Ref. [1] also contains the connectivity information of 71 367 distinct users, which includes the users to which each user is connected, the time at which each connection was created and the type of connection that was created (directed or bidirected). We determined that, on average, every user is connected to 24 other users, with approximately half of the connections directed (48.901%) and half bidirected (51.099%). As in Ref. [2], we defined the distance between two users in the DOSN as the minimum number of connections (directed or bidirected) needed to connect them. We defined two users as disconnected if there does not exist any path in the DOSN connecting them. We refer to Ref. [3] for an excellent empirical characterization of this data.
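The distance defined above is a shortest-path length, which can be computed with a breadth-first search. The following Python sketch is a minimal illustration, not the procedure used in this article or in Ref. [2]. It assumes that paths follow the direction in which content can be shared, with each bidirected connection usable both ways; the function names `build_graph` and `distance` and the toy user labels are hypothetical.

```python
from collections import deque


def build_graph(directed, bidirected):
    """Build an adjacency map from (source, target) connection pairs.

    Assumption: a directed pair (a, b) lets content flow from a to b only,
    while a bidirected pair lets content flow both ways.
    """
    graph = {}
    for src, dst in directed:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    for a, b in bidirected:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, source, target):
    """Return the minimum number of connections from source to target.

    Returns None when the two users are disconnected (no path exists).
    """
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, hops = queue.popleft()
        for neighbor in graph[user]:
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Toy example with hypothetical users: a -> b is directed, b <-> c is bidirected.
toy = build_graph(directed=[("a", "b")], bidirected=[("b", "c")])
print(distance(toy, "a", "c"))
print(distance(toy, "c", "a"))
```

In this toy network, user c is two connections away from user a, while a cannot be reached from c because the only connection into a's neighborhood points away from a. If direction were ignored when measuring distance, every connection would simply be added in both directions.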
A goal of modeling the propagation of information in an OSN is to understand the rate at which a piece of online content influences users as a function of time and of distance from the source of the propagation. Feng et al. [4] used a linear diffusive temporal–spatial partial differential equation (PDE) model to explain these rates of spread in the DOSN. By fixing their model’s parameters and altering the initial conditions to replicate the information propagation, they were able to achieve an average model accuracy of 97.41% for the most popular story. Additionally, they examined all stories receiving more than 3000 votes (134 stories). For approximately 60% of these stories, their model achieved accuracies greater than 80%.
Our focus was the adaptation and application of an epidemiological model that describes the spread of a virus in a population. By modifying and applying this model, we were able to predict the cumulative number of users who voted for any shared story at time t (hours) after its initial posting, the time period (viral period) during which the story diffuses quickly through the DOSN, the peaking time at which the total number of “influence users” reaches its maximum, and the turning point at which the information spread starts to slow down. Using this model, we achieved higher model accuracies than Ref. [4] for both the most popular story and the 134 most popular stories. Furthermore, we achieved an average predictive accuracy of approximately 80% for all voted stories.
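For orientation, the classical SIR (susceptible, infected, recovered) model is the standard example of an epidemiological model of this kind. The Python sketch below integrates it with a simple forward-Euler scheme and reports the cumulative number of users who have ever been "infected", read here as having voted. It illustrates the general technique only and is not the modified model used in this article: the function name `simulate_sir`, the rates `beta` and `gamma`, the population size and the time step are all hypothetical choices, not values fitted to the Digg data.

```python
def simulate_sir(population, initial_infected, beta, gamma, hours, dt=0.01):
    """Integrate the classical SIR equations with forward Euler.

        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I

    Returns one (hour, S, I, R) tuple per whole hour, starting at hour 0.
    All parameter values are illustrative assumptions.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_hour = round(1 / dt)
    trajectory = [(0, s, i, r)]
    for hour in range(1, hours + 1):
        for _ in range(steps_per_hour):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((hour, s, i, r))
    return trajectory


# Hypothetical parameters chosen only to produce a visible outbreak.
N = 1000.0
result = simulate_sir(population=N, initial_infected=1.0,
                      beta=0.5, gamma=0.1, hours=50)

# Cumulative "voters" in this sketch: everyone who has left the susceptible class.
cumulative = [N - s for (_, s, _, _) in result]

# Hour at which the infected class is largest (a stand-in for a peak time).
peak_hour = max(result, key=lambda row: row[2])[0]
print(peak_hour, cumulative[-1])
```

The cumulative curve produced this way is S-shaped: slow initial growth, a fast phase, then saturation. Quantities such as a viral period or a turning point would be read off that curve, although how this article defines them is not shown in the text above.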
The model
We modeled the diffusion of a particular story through the DOSN by using a modified epidemiological SIR model [5]. As in modeling the spread of a virus in a population, we used similar SIR definitions from epidemiology to categorize the users of the DOSN at any given time in relation to any given story. The “susceptible” population is comprised of users who have not yet voted for a particular story, the “infected” population consists of users who have voted for a particular story and
Conclusion and discussion
By using a variant of the epidemiological SIR model, we obtained an average predictive accuracy of over 98% for the most popular story, approximately 86% for the 134 highest voted news articles and approximately 80% for all stories. These accuracies show that a viral information propagation model predicts the voting trend of Digg network stories over the first 50 h more accurately than the previous model in Ref. [4].
In addition to achieving higher accuracies, we showed that
Acknowledgments
This research was supported in part by the Fields Institute for Research in Mathematical Sciences, Mitacs, the Canada Research Chair program and the Natural Sciences and Engineering Research Council of Canada. The authors would like to thank Professor Feng Wang at Arizona State University for her expert advice on online social network information dynamics, and Professor Kristina Lerman at the Information Sciences Institute for granting us access to the datasets of Ref. [1]. We also
References (7)
- et al., Richards model revisited: validation by and application to infection dynamics, J. Theoret. Biol. (2012)
- K. Lerman, Digg 2009 Data Set,...
- F. Wang, H. Wang, K. Xu, Diffusive logistic model towards predicting information diffusion in online social networks,...
1 Current address: Centre for Complexity Science, University of Warwick, Coventry CV4 7AL, United Kingdom.