Viral information propagation in the Digg online social network

doi:10.1016/j.physa.2014.06.011

Physica A: Statistical Mechanics and its Applications

Volume 415, 1 December 2014, Pages 87-94

https://doi.org/10.1016/j.physa.2014.06.011

Highlights

• We propose and analyze an epidemiological model for information propagation in online social networks.
• We characterize the peak time, turning point, viral period, and final size of the number of votes.
• Information propagation in OSNs shows significant similarities to, and differences from, disease spread in populations.
• Simple dynamic models can provide accurate predictions of information propagation in OSNs.

Abstract

We propose using a variant of the epidemiological SIR model to accurately describe the diffusion of online content over the online social network Digg.com. We examine the qualitative properties of our viral information propagation model and demonstrate its applications to social media spread in online social networks, with particular focus on accurately predicting user voting behavior over a period of 50 h. The model allows us to characterize the peak time, turning point, viral period and final size (total number of votes), and it gives much more accurate predictions of user voting behavior than other established models.

Introduction

Every day in online social networks (OSNs), thousands of users post news articles, videos, photos, etc., which become visible to their connected users as new online content. Although most of this content never spreads to a wide audience beyond its source, many users are influenced by what reaches them through their network connections. The causes and dynamics by which information proliferates through OSNs are still poorly understood. A better understanding of the mathematics underlying the spread of information in OSNs would have important applications for advertisers seeking to run more effective online marketing campaigns, and might enable information to spread more rapidly over OSNs in the aftermath of political crises or natural disasters.

The focus of this article is the Digg.com online social network (DOSN). In this network, users can post content to a personal web page, vote for (“digg”) or against (“bury”) this content, and share it with the users to whom they are connected. There are two types of connection between users: directed (one user can share content with another, but not vice versa) and bidirected (both users can share content with each other). Once posted content receives a sufficiently large number of votes within a given period of time, it is promoted to the homepage of Digg.com, where it is visible to all users in the DOSN.

One reason we selected the DOSN to illustrate our epidemiological approach is our access to the dataset of Ref. [1], which contains voting data for the DOSN for June 2009. In particular, the data contain 3553 distinct stories (online content), the number of votes each story received, the users who voted for each story and the time at which each vote was cast. On average, each story received approximately 850 votes, with a minimum of 122 and a maximum of 24 099. Note that this dataset includes only the stories that were promoted to the homepage of Digg.com in June 2009. In addition to the voting data, Ref. [1] contains connectivity information for 71 367 distinct users: the users to whom each user is connected, the time at which each connection was created and the type of connection (directed or bidirected). We determined that, on average, each user is connected to 24 other users; approximately half of these connections are directed (48.901%) and half bidirected (51.099%). As in Ref. [2], we defined the distance between two users in the DOSN as the minimum number of connections (directed or bidirected) needed to connect them, and we defined two users as disconnected if no path in the DOSN connects them. We refer the reader to Ref. [3] for an excellent empirical characterization of these data.
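The distance metric described above is a shortest-path hop count, which a breadth-first search computes directly. The following is a minimal sketch, not the authors' code: the connection-record format `(user_a, user_b, kind)` and the function names `build_adjacency`, `distance` and `directed_share` are hypothetical. It also assumes that a directed connection can only be traversed in the direction content can be shared; the text does not say whether direction is respected, and ignoring it only requires adding the reverse edge for every connection.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

# Hypothetical connection record: (user_a, user_b, kind), where kind is
# "directed" (a can share with b only) or "bidirected" (both directions).
Connection = Tuple[Hashable, Hashable, str]


def build_adjacency(connections: List[Connection]) -> Dict[Hashable, Set[Hashable]]:
    """Adjacency sets in the direction content can be shared (an assumption)."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for a, b, kind in connections:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set())
        if kind == "bidirected":
            adj[b].add(a)
    return adj


def distance(adj: Dict[Hashable, Set[Hashable]], source: Hashable,
             target: Hashable) -> Optional[int]:
    """Minimum number of connections from source to target; None if disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, d = queue.popleft()
        for nxt in adj.get(user, ()):
            if nxt == target:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None  # no path exists, so the users are disconnected


def directed_share(connections: List[Connection]) -> float:
    """Fraction of connections that are directed (cf. the 48.901% figure)."""
    n_directed = sum(1 for _, _, kind in connections if kind == "directed")
    return n_directed / len(connections)
```

For example, with `u1`–`u2` bidirected and `u2`→`u3` directed, `distance` returns 2 from `u1` to `u3` but `None` in the reverse direction under the directed-traversal assumption.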

One goal of modeling information propagation in an OSN is to understand the rate at which a piece of online content influences users as a function of time and of distance from the source of the propagation. The linear diffusive model of Wang et al. [4] used a spatio-temporal partial differential equation (PDE) to describe these rates of spread in the DOSN. By fixing the model's parameters and varying the initial conditions to replicate the observed propagation, they achieved an average model accuracy of 97.41% for the most popular story. They also examined all 134 stories that received more than 3000 votes; for approximately 60% of these stories, the model accuracy exceeded 80%.
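The accuracy figures quoted here compare predicted and observed cumulative vote counts, but this excerpt does not state the formula. The sketch below assumes one common definition, the mean over observation times of 1 − |predicted − observed| / observed; both that definition and the function name `prediction_accuracy` are illustrative assumptions rather than the definition used in Ref. [4] or in this article.

```python
from typing import Sequence


def prediction_accuracy(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of 1 - |predicted - observed| / observed over the time points.

    Assumed definition for illustration only. Time points with zero
    observed votes are skipped to avoid division by zero.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    scores = [
        1.0 - abs(p - o) / o
        for o, p in zip(observed, predicted)
        if o > 0
    ]
    if not scores:
        raise ValueError("no time points with positive observed counts")
    return sum(scores) / len(scores)
```

For instance, observed counts `[100, 200, 400]` and predictions `[90, 210, 400]` give per-point scores of 0.9, 0.95 and 1.0, for a mean of about 0.95.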

We focused on adapting and applying an epidemiological model that describes the spread of a virus in a population. The modified model allowed us to predict the cumulative number of users who had voted for any shared story at time $t$ (hours) after its initial posting; the viral period, during which the story diffuses quickly through the DOSN; the peak time, at which the number of “influenced users” reaches its maximum; and the turning point, at which the spread of information starts to slow down. With this model we achieved higher accuracies than Ref. [4] for both the most popular story and the 134 most popular stories. Furthermore, we achieved an average predictive accuracy of approximately 80% across all stories.
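The authors' modified SIR model is only partially shown in this excerpt, so the following is a minimal sketch of the textbook SIR model, not their variant, showing how a cumulative-vote curve, a peak time and a turning point can be read off an SIR-type system. It assumes $S' = -\beta S I/N$ and $I' = \beta S I/N - \gamma I$, takes the cumulative votes as $V(t) = N - S(t)$, the peak time as the time at which $I(t)$ is largest, and the turning point as the time at which the vote rate $\beta S I/N$ is largest (the inflection point of $V$). The parameter values and all names (`SIRParams`, `simulate`, `characterize`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SIRParams:
    beta: float   # hypothetical "voting" (transmission) rate per hour
    gamma: float  # hypothetical rate at which voters stop influencing others, per hour
    n: float      # hypothetical size of the reachable user population


def sir_rhs(s: float, i: float, p: SIRParams) -> Tuple[float, float]:
    """Textbook SIR right-hand side for (S, I); R is implied by N - S - I."""
    new_votes = p.beta * s * i / p.n
    return -new_votes, new_votes - p.gamma * i


def simulate(p: SIRParams, i0: float, hours: float, dt: float = 0.01):
    """Integrate with classical RK4 and record t, V(t), I(t) and dV/dt."""
    s, i = p.n - i0, i0
    steps = int(round(hours / dt))
    ts: List[float] = []
    cum: List[float] = []
    inf: List[float] = []
    rate: List[float] = []
    for k in range(steps + 1):
        ts.append(k * dt)
        cum.append(p.n - s)                  # V(t) = N - S(t): users who have voted
        inf.append(i)
        rate.append(p.beta * s * i / p.n)    # dV/dt, the instantaneous vote rate
        if k == steps:
            break
        # one fourth-order Runge-Kutta step for (S, I)
        k1 = sir_rhs(s, i, p)
        k2 = sir_rhs(s + dt / 2 * k1[0], i + dt / 2 * k1[1], p)
        k3 = sir_rhs(s + dt / 2 * k2[0], i + dt / 2 * k2[1], p)
        k4 = sir_rhs(s + dt * k3[0], i + dt * k3[1], p)
        s += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return ts, cum, inf, rate


def characterize(ts, cum, inf, rate) -> Tuple[float, float, float]:
    """Return (peak time, turning point, cumulative votes at the last time)."""
    peak_time = ts[max(range(len(inf)), key=inf.__getitem__)]
    turning_point = ts[max(range(len(rate)), key=rate.__getitem__)]
    return peak_time, turning_point, cum[-1]
```

Under these assumptions the turning point always precedes the peak time, since the vote rate peaks while $\beta S/N$ still exceeds $\gamma$. A call such as `simulate(SIRParams(beta=0.8, gamma=0.2, n=10000.0), i0=10.0, hours=50.0)` covers the same 50 h window used in the article.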

Section snippets

The model

We modeled the diffusion of a particular story through the DOSN using a modified epidemiological SIR model [5]. As in modeling the spread of a virus through a population, we used the SIR compartments from epidemiology to categorize the users of the DOSN at any given time with respect to a given story. The “susceptible” population $S(t)$ consists of users who have not yet voted for a particular story, the “infected” population $I(t)$ consists of users who have voted for that story and

Conclusion and discussion

By using a variant of the epidemiological SIR model, we obtained an average predictive accuracy of over 98% for the most popular story, approximately 86% for the 134 most-voted news articles and approximately 80% for all stories. These accuracies indicate that the viral information propagation model predicts the voting trend of Digg stories over the first 50 h more accurately than the previous model of Ref. [4].

In addition to achieving higher accuracies, we showed that

Acknowledgments

This research was supported in part by the Fields Institute for Research in Mathematical Sciences, Mitacs, the Canada Research Chair program and the Natural Sciences and Engineering Research Council of Canada. The authors would like to thank Professor Feng Wang at Arizona State University for her expert advice on online social network information dynamics, and Professor Kristina Lerman at the Information Sciences Institute for granting us access to the datasets of Ref. [1]. We also

References (7)

X. Wang et al., Richards model revisited: validation by and application to infection dynamics, J. Theoret. Biol. (2012).
K. Lerman, Digg 2009 Data Set,...
F. Wang, H. Wang, K. Xu, Diffusive logistic model towards predicting information diffusion in online social networks,...

There are more references available in the full text version of this article.


¹: Current address: Centre for Complexity Science, University of Warwick, Coventry CV4 7AL, United Kingdom.
