Viral information propagation in the Digg online social network

https://doi.org/10.1016/j.physa.2014.06.011

Highlights

  • We propose and analyze an epidemiological model for information propagation in online social networks.

  • We characterize peak timing, turning point, viral period, and final size of the number of votes.

  • There are significant similarities and differences between information propagation in OSNs and disease spread in populations.

  • Simple dynamic models can provide accurate prediction of information propagation in OSNs.

Abstract

We propose the use of a variant of the epidemiological SIR model to accurately describe the diffusion of online content over the online social network Digg.com. We examine the qualitative properties of our viral information propagation model and demonstrate the model’s applications to social media spread in online social networks, with particular focus on accurately predicting user voting behavior over a period of 50 h. The model allows us to characterize the peak time, turning point, viral period and final size (total number of votes), and gives much better predictions of user voting behavior than other established models.

Introduction

Every day in online social networks (OSNs), thousands of users post news articles, videos, photos and other content, which become visible to their connected users as new online content. Most of these forms of media never spread from their sources to a wide audience, yet many users are influenced by their network connections. The causes and dynamics by which information proliferates throughout OSNs are still poorly understood. A greater comprehension of the mathematics underlying the spread of information in OSNs would have important applications for advertisers seeking to wage more effective online marketing campaigns and may enable a more rapid spread of information over OSNs in the aftermath of political crises or natural disasters.

The focus of this article is the OSN Digg.com (DOSN). In this network, users are able to post content to a personal web page, vote for (“digg”) or against (“bury”) this content and share the content with users to whom they are connected. There are two forms of connections between users: directed (one user can share content with another user but not vice versa) and bidirected (both users can share content with each other). Once posted content receives some large number of votes over a particular period of time, the content is then posted to the homepage of Digg.com and is visible to all users in the DOSN.

One of the reasons we selected the DOSN to illustrate our epidemiological approach is our access to the dataset of Ref. [1], which contains the voting characteristics of the DOSN for June 2009. In particular, the data contains 3553 distinct stories (online content), the number of votes each story received, the users that voted for each story and the time at which each user cast the vote. On average, each story received approximately 850 votes, with a minimum of 122 votes and a maximum of 24 099. It should be noted that this dataset only includes the stories that were promoted to the homepage of Digg.com in June 2009. In addition to voting data, Ref. [1] also contains the connectivity information of 71 367 distinct users, which includes the users to which each user is connected, the time at which each connection was created and the type of connection that was created (directed or bidirected). We determined that, on average, every user is connected to 24 other users, with approximately half of the connections directed (48.901%) and half bidirected (51.099%). As in Ref. [2], we defined the distance between two users in the DOSN as the minimum number of connections (directed or bidirected) needed to connect them. We defined two users as being disconnected if there does not exist any path in the DOSN connecting them. We refer to Ref. [3] for an excellent empirical characterization of this data.
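The distance defined above is a shortest-path hop count, which can be computed with a breadth-first search. The following is a minimal sketch, not the authors' code: the function names, the `(u, v, kind)` tuple layout and the `"directed"`/`"bidirected"` labels are hypothetical, and it assumes that a directed connection can only be traversed from the sharing user to the receiving user.

```python
from collections import deque


def build_adjacency(connections):
    """Build an adjacency map from (u, v, kind) tuples.

    Assumption: kind == "directed" means content flows u -> v only;
    kind == "bidirected" means it flows both ways.
    """
    adjacency = {}
    for u, v, kind in connections:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())
        if kind == "bidirected":
            adjacency[v].add(u)
    return adjacency


def hop_distance(adjacency, source, target):
    """Return the minimum number of connections from source to target,
    or None if the two users are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        for neighbor in adjacency.get(user, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Toy network with made-up users, for illustration only.
toy = build_adjacency([
    ("a", "b", "directed"),
    ("b", "c", "bidirected"),
    ("d", "e", "bidirected"),
])
print(hop_distance(toy, "a", "c"))
print(hop_distance(toy, "a", "d"))
```

If direction were ignored when measuring distance, every connection would simply be added in both directions in `build_adjacency`.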

A goal of modeling the propagation of information in an OSN is to understand the rate at which a piece of online content influences the users as a function of time and distance away from the source of the propagation. Feng et al. [4] used a linear diffusive temporal–spatial partial differential equation (PDE) model to explain these rates of spread in the DOSN. By fixing their model’s parameters and altering the initial conditions to replicate the information propagation, they were able to achieve an average model accuracy of 97.41% for the most popular story. Additionally, they examined all stories receiving more than 3000 votes (134 stories). For approximately 60% of these stories, they had model accuracies greater than 80%.

Our focus was the adaptation and application of an epidemiological model that describes the spread of a virus in a population. By modifying and applying this model, we were able to predict the cumulative number of users who voted for any shared story at time t (hours) after its initial posting, the time period (viral period) during which the story diffuses quickly through the DOSN, the peak time at which the number of “influenced users” reaches its maximum, and the turning point at which the information spread starts to slow down. By using this model, we achieved higher model accuracies than Ref. [4] for both the most popular story and the most popular 134 stories. Furthermore, we achieved an average predictive accuracy of approximately 80% for all voted stories.
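To make the quantities listed above concrete, the sketch below integrates the classical SIR equations, dS/dt = -βSI/N and dI/dt = βSI/N - γI, and reads off a peak time, a turning point and a final size. It is only an illustration under stated assumptions: it is the textbook SIR model rather than the modified variant developed in this article, the parameter values and function names are hypothetical, the cumulative count is taken to be N - S, and the turning point is interpreted as the time at which the cumulative curve grows fastest.

```python
def simulate_sir(beta, gamma, n_users, i0, hours=50.0, dt=0.01):
    """Integrate the classical SIR model with a fixed-step RK4 scheme.

    Returns lists of times, S(t) and I(t). All parameters are illustrative.
    """
    def rates(s, i):
        new_cases = beta * s * i / n_users
        return -new_cases, new_cases - gamma * i

    s, i = float(n_users - i0), float(i0)
    t_vals, s_vals, i_vals = [0.0], [s], [i]
    for step in range(1, int(round(hours / dt)) + 1):
        k1s, k1i = rates(s, i)
        k2s, k2i = rates(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i)
        k3s, k3i = rates(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i)
        k4s, k4i = rates(s + dt * k3s, i + dt * k3i)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6
        i += dt * (k1i + 2 * k2i + 2 * k3i + k4i) / 6
        t_vals.append(step * dt)
        s_vals.append(s)
        i_vals.append(i)
    return t_vals, s_vals, i_vals


def summarize(beta, n_users, t_vals, s_vals, i_vals):
    """Extract peak time, turning point and final size from a trajectory."""
    # Peak time: I(t) is largest.
    peak_idx = max(range(len(i_vals)), key=i_vals.__getitem__)
    # Turning point (assumed meaning): cumulative count N - S grows fastest,
    # i.e. the rate beta * S * I / N is largest.
    growth = [beta * s * i / n_users for s, i in zip(s_vals, i_vals)]
    turn_idx = max(range(len(growth)), key=growth.__getitem__)
    return {
        "peak_time": t_vals[peak_idx],
        "turning_point": t_vals[turn_idx],
        "s_at_peak": s_vals[peak_idx],
        "final_size": n_users - s_vals[-1],
    }


# Hypothetical parameters, chosen only so that the outbreak fits in 50 h.
t, s, i = simulate_sir(beta=1.0, gamma=0.3, n_users=1000, i0=1)
print(summarize(1.0, 1000, t, s, i))
```

In the classical model, I(t) peaks when S falls to γN/β, and the cumulative curve starts to flatten before that peak, so the turning point found this way precedes the peak time.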

The model

We modeled the diffusion of a particular story through the DOSN by using a modified epidemiological SIR model  [5]. As in modeling the spread of a virus in a population, we used similar SIR definitions from epidemiology to categorize the users of the DOSN at any given time in relation to any given story. The “susceptible” population S(t) is comprised of users who have not yet voted for a particular story, the “infected” population I(t) consists of users who have voted for a particular story and

Conclusion and discussion

By using a variant of the epidemiological SIR model, we obtained an average predictive accuracy of over 98% for the most popular story, approximately 86% for the 134 highest voted news articles and approximately 80% for all stories. These accuracies show that a viral information propagation model predicts the voting trend of Digg network stories over the first 50 h more accurately than the previous model in Ref. [4].

In addition to achieving higher accuracies, we showed that

Acknowledgments

This research was supported in part by the Fields Institute for Research in Mathematical Sciences, Mitacs, the Canada Research Chair program and the Natural Sciences and Engineering Research Council of Canada. The authors would like to thank Professor Feng Wang at Arizona State University for her expert advice on online social network information dynamics, and Professor Kristina Lerman at the Information Sciences Institute for granting us access to the datasets of Ref. [1]. We also

1

Current address: Centre for Complexity Science, University of Warwick, Coventry CV4 7AL, United Kingdom.