An efficient and effective hop-based approach for influence maximization in social networks

  • Original Article
  • Social Network Analysis and Mining

Abstract

Influence maximization in social networks is a classic and extensively studied problem that aims to select a set of initial seed nodes to spread the influence as widely as possible. However, it remains an open challenge to design fast and accurate algorithms to find solutions in large-scale social networks. Prior Monte Carlo simulation-based methods are slow and not scalable, while other heuristic algorithms lack theoretical guarantees and have been shown to produce poor solutions in many cases. In this paper, we propose hop-based algorithms that can be easily applied to billion-scale networks under the commonly used Independent Cascade and Linear Threshold influence diffusion models. Moreover, we provide provable data-dependent approximation guarantees for our proposed hop-based approaches. Experimental evaluations with real social network datasets demonstrate the efficiency and effectiveness of our algorithms.

References

  • Arora A, Galhotra S, Ranu S (2017) Debunking the myths of influence maximization: an in-depth benchmarking study. In: Proceedings of ACM SIGMOD, pp 651–666

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  • Borgs C, Brautbar M, Chayes J, Lucier B (2014) Maximizing social influence in nearly optimal time. In: Proceedings of SODA, pp 946–957

  • Cha M, Mislove A, Gummadi KP (2009) A measurement-driven analysis of information propagation in the Flickr social network. In: Proceedings WWW, pp 721–730

  • Chen W (2009) NetHEPT dataset. http://research.microsoft.com/en-us/people/weic/

  • Cheng S, Shen H, Huang J, Chen W, Cheng X (2014) IMRank: influence maximization via finding self-consistent ranking. In: Proceedings ACM SIGIR, pp 475–484

  • Cheng S, Shen H, Huang J, Zhang G, Cheng X (2013) Staticgreedy: solving the scalability-accuracy dilemma in influence maximization. In: Proceedings ACM CIKM, pp 509–518

  • Chen W, Lu W, Zhang N (2012) Time-critical influence maximization in social networks with time-delayed diffusion process. In: Proceedings of AAAI, pp 592–598

  • Chen W, Wang C, Wang Y (2010a) Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of ACM KDD, pp 1029–1038

  • Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings of ACM KDD, pp 199–208

  • Chen W, Yuan Y, Zhang L (2010b) Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of IEEE ICDM, pp 88–97

  • Cohen E, Delling D, Pajor T, Werneck RF (2014) Sketch-based influence maximization and computation: scaling up with guarantees. In: Proceedings ACM CIKM, pp 629–638

  • Conforti M, Cornuéjols G (1984) Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Appl Math 7(3):251–274

  • Dinh TN, Zhang H, Nguyen DT, Thai MT (2014) Cost-effective viral marketing for time-critical campaigns in large-scale social networks. IEEE ACM Trans Netw 22(6):2001–2011

  • Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings ACM KDD, pp 57–66

  • Galhotra S, Arora A, Roy S (2016) Holistic influence maximization: Combining scalability and efficiency with opinion-aware models. In: Proceedings ACM SIGMOD, pp 743–758

  • Goel S, Watts DJ, Goldstein DG (2012) The structure of online diffusion networks. In: Proceedings ACM EC, pp 623–638

  • Goyal A, Bonchi F, Lakshmanan LVS (2011a) A data-based approach to social influence maximization. Proc VLDB Endow 5(1):73–84

  • Goyal A, Bonchi F, Lakshmanan L, Venkatasubramanian S (2013) On minimizing budget and time in influence propagation over social networks. Social Netw Anal Min 3(2):179–192

  • Goyal A, Lu W, Lakshmanan LV (2011b) Celf++: Optimizing the greedy algorithm for influence maximization in social networks. In: Proceedings WWW Companion, pp 47–48

  • Goyal A, Lu W, Lakshmanan LVS (2011c) Simpath: An efficient algorithm for influence maximization under the linear threshold model. In: Proceedings IEEE ICDM, pp 211–220

  • Jiang F, Jin S, Wu Y, Xu J (2014) A uniform framework for community detection via influence maximization in social networks. In: Proceedings IEEE/ACM ASONAM, pp 27–32

  • Jung K, Heo W, Chen W (2012) IRIE: scalable and robust influence maximization in social networks. In: Proceedings IEEE ICDM, pp 918–923

  • Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings ACM KDD, pp 137–146

  • Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of WWW, pp 591–600

  • Lee JR, Chung CW (2014) A fast approximation for influence maximization in large social networks. In: WWW Companion, pp 1157–1162

  • Leskovec J, Adamic LA, Huberman BA (2007a) The dynamics of viral marketing. ACM Trans Web 1(1):5:1–5:39

  • Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N (2007b) Cost-effective outbreak detection in networks. In: Proceedings of ACM KDD, pp 420–429

  • Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

  • Li Y, Zhao BQ, Lui JCS (2012) On modeling product advertisement in large-scale online social networks. IEEE ACM Trans Netw 20(5):1412–1425

  • Lin Y, Chen W, Lui JC (2017) Boosting information spread: an algorithmic approach. In: Proceedings of IEEE ICDE, pp 883–894

  • Liu B, Cong G, Xu D, Zeng Y (2012) Time constrained influence maximization in social networks. In: Proceedings of IEEE ICDM, pp 439–448

  • Lu W, Chen W, Lakshmanan LV (2015) From competition to complementarity: comparative influence diffusion and maximization. Proc VLDB Endow 9(2):60–71

  • Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14(1):265–294

  • Nguyen HT, Dinh TN, Thai MT (2016a) Cost-aware targeted viral marketing in billion-scale networks. In: Proceedings of IEEE INFOCOM

  • Nguyen HT, Thai MT, Dinh TN (2016b) Stop-and-stare: optimal sampling algorithms for viral marketing in billion-scale networks. In: Proceedings of ACM SIGMOD, pp 695–710

  • Ohsaka N, Akiba T, Yoshida Y, Kawarabayashi K (2014) Fast and accurate influence maximization on large networks with pruned Monte-Carlo simulations. In: Proceedings of AAAI, pp 138–144

  • Ohsaka N, Sonobe T, Fujita S, Kawarabayashi Ki (2017) Coarsening massive influence networks for scalable diffusion analysis. In: Proceedings of ACM SIGMOD, pp 635–650

  • Song G, Zhou X, Wang Y, Xie K (2015) Influence maximization on large-scale mobile social network: a divide-and-conquer method. IEEE Trans Parallel Distrib Syst 26(5):1379–1392

  • Tang Y, Shi Y, Xiao X (2015) Influence maximization in near-linear time: A martingale approach. In: Proceedings of ACM SIGMOD, pp 1539–1554

  • Tang J, Tang X, Xiao X, Yuan J (2018a) Online processing algorithms for influence maximization. In: Proceedings of ACM SIGMOD

  • Tang J, Tang X, Yuan J (2016) Profit maximization for viral marketing in online social networks. In: Proceedings of IEEE ICNP, pp 1–10

  • Tang J, Tang X, Yuan J (2017a) Influence maximization meets efficiency and effectiveness: a hop-based approach. In: Proceedings of IEEE/ACM ASONAM, pp 64–71

  • Tang J, Tang X, Yuan J (2017b) Profit maximization for viral marketing in online social networks: algorithms and analysis. IEEE Trans Knowl Data Eng (Preprint)

  • Tang J, Tang X, Yuan J (2018b) Towards profit maximization for online social network providers. In: Proceedings of IEEE INFOCOM

  • Tang Y, Xiao X, Shi Y (2014) Influence maximization: Near-optimal time complexity meets practical efficiency. In: Proceedings of ACM SIGMOD, pp 75–86

  • Wang Z, Yang Y, Pei J, Chu L, Chen E (2017) Activity maximization by effective information diffusion in social networks. IEEE Trans Knowl Data Eng 29(11):2374–2387

  • Xu W, Lu Z, Wu W, Chen Z (2014) A novel approach to online social influence maximization. Social Netw Anal Min 4(1):153

  • Zhang C, Sun J, Wang K (2013) Information propagation in microblog networks. In: Proceedings of IEEE/ACM ASONAM, pp 190–196

  • Zhou C, Zhang P, Guo J, Guo L (2014) An upper bound based greedy algorithm for mining top-k influential nodes in social networks. In: Proceedings of WWW Companion, pp 421–422

  • Zhou C, Zhang P, Guo J, Zhu X, Guo L (2013) UBLF: an upper bound based approach to discover influential nodes in social networks. In: Proceedings of IEEE ICDM, pp 907–916

Acknowledgements

This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its IDM Futures Funding Initiative, and by Singapore Ministry of Education Academic Research Fund Tier 1 under Grant 2017-T1-002-024 and Tier 2 under Grant MOE2015-T2-2-114.

Author information

Corresponding author

Correspondence to Jing Tang.

Appendix

Proof of Theorem 1

To consider the outgoing edges from u one at a time, we first disable all the edges from u to its neighbors except for one edge \(\langle u,w_1\rangle\). Then, for each neighbor v of \(w_1\), all of v’s inverse neighbors other than \(w_1\) have their one-hop activation probabilities unchanged by adding \(\langle u,w_1\rangle\). Let \(\pi _2^{{S}\cup \{u\}}(v|w_1)\) denote the new two-hop activation probability of v. Then, we have

$$\begin{aligned} \frac{1-\pi _2^{{S}\cup \{u\}}(v|w_1)}{1-\pi _2^{{S}}(v)}=\rho (S,u,v,w_1), \end{aligned}$$
(16)

where \(\rho (S,u,v,w)=\frac{1-p_{w,v}\cdot \pi _1^{{S}\cup \{u\}}(w)}{1-p_{w,v}\cdot \pi _1^{{S}}(w)}\). Next, we enable the second edge \(\langle u,w_2\rangle\). Let \(\pi _2^{{S}\cup \{u\}}(v|w_1,w_2)\) denote the new two-hop activation probability of v. Following similar arguments, for each neighbor v of \(w_2\), we have

$$\begin{aligned} \frac{1-\pi _2^{{S}\cup \{u\}}(v|w_1,w_2)}{1-\pi _2^{{S}\cup \{u\}}(v|w_1)} =\rho (S,u,v,w_2). \end{aligned}$$
(17)

We continue to enable the outgoing edges of u sequentially. In general, when an edge \(\langle u,w_i\rangle\) is enabled after edges \(\langle u,w_1\rangle, \langle u,w_2\rangle, \ldots , \langle u,w_{i-1}\rangle\), for each neighbor v of \(w_i\), we have

$$\begin{aligned} \frac{1-\pi _2^{{S}\cup \{u\}}(v|w_1,\dots ,w_i)}{1-\pi _2^{{S}\cup \{u\}}(v|w_1,\dots ,w_{i-1})}=\rho (S,u,v,w_i). \end{aligned}$$
(18)

Therefore, we can initialize \(\pi _2^{{S}\cup \{u\}}(v)\) with \(\pi _2^{{S}}(v)\) and iteratively update \(\pi _2^{{S}\cup \{u\}}(v)\) with

$$\begin{aligned} 1-\left(1-\pi _2^{{S}\cup \{u\}}(v)\right)\cdot \rho (S,u,v,w), \end{aligned}$$
(19)

for all the nodes \(w\in {N}_u\setminus {S}\) and \(v\in {N}_w\setminus {S}\). Moreover, for the direct neighbors of u, their two-hop activation probabilities also need to be adjusted because u’s one-hop activation probability has changed from \(\pi _1^{{S}}(u)\) to 1. For each neighbor v of u, the adjustment can be made in a similar way by updating \(\pi _2^{{S}\cup \{u\}}(v)\) with

$$\begin{aligned} 1-\left(1-\pi _2^{{S}\cup \{u\}}(v)\right)\cdot \rho (S,u,v,u). \end{aligned}$$
(20)

Then, the final two-hop activation probability \(\pi _2^{{S}\cup \{u\}}(v)\) by the iterative updates (19) and (20) is

$$\begin{aligned} \pi _2^{{S}\cup \{u\}}(v) = 1-\left(1-\pi _2^{{S}}(v)\right)\cdot \prod _{w\in ({M}_{u,v}\cup \{u\})}\rho (S,u,v,w). \end{aligned}$$
(21)

Hence, the theorem is proven. \(\square\)
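
The update in (21) can be checked numerically on a toy graph. The sketch below is illustrative only and rests on assumptions not restated in this appendix: it takes \(\pi _1^{S}(w)\) to be 1 for a seed and otherwise the probability that some seed in-neighbor activates w, takes \(\pi _2^{S}(v)=1-\prod _{w\in {I}_v}(1-p_{w,v}\cdot \pi _1^{S}(w))\), and reads \({M}_{u,v}\) as the nodes that are both neighbors of u and inverse neighbors of v. The graph, the node names and the helper functions are hypothetical.

```python
from math import prod

# Hypothetical toy graph: (source, target) -> propagation probability p_{source,target}.
GRAPH = {
    ("s", "u"): 0.3, ("s", "a"): 0.5, ("s", "b"): 0.7,
    ("u", "a"): 0.4, ("u", "b"): 0.2, ("u", "v"): 0.1,
    ("a", "v"): 0.6, ("b", "v"): 0.5,
}
SEEDS = frozenset({"s"})


def one_hop(seeds, w):
    """Assumed pi_1^S(w): 1 for a seed, else 1 - prod over seed in-neighbors x of (1 - p_{x,w})."""
    if w in seeds:
        return 1.0
    return 1.0 - prod(1.0 - p for (x, y), p in GRAPH.items() if y == w and x in seeds)


def two_hop(seeds, v):
    """Assumed pi_2^S(v): 1 - prod over in-neighbors w of (1 - p_{w,v} * pi_1^S(w))."""
    if v in seeds:
        return 1.0
    return 1.0 - prod(1.0 - p * one_hop(seeds, w) for (w, y), p in GRAPH.items() if y == v)


def rho(seeds, u, v, w):
    """rho(S, u, v, w) as defined after Eq. (16)."""
    p = GRAPH[(w, v)]
    return (1.0 - p * one_hop(seeds | {u}, w)) / (1.0 - p * one_hop(seeds, w))


def two_hop_incremental(seeds, u, v):
    """Eq. (21): rescale 1 - pi_2^S(v) instead of recomputing it from scratch."""
    out_u = {y for (x, y) in GRAPH if x == u}
    in_v = {x for (x, y) in GRAPH if y == v}
    # Assumption: M_{u,v} is the set of common nodes N_u ∩ I_v.
    # u itself contributes a factor only when the edge <u, v> exists.
    affected = (out_u & in_v) | ({u} & in_v)
    factor = prod(rho(seeds, u, v, w) for w in affected)
    return 1.0 - (1.0 - two_hop(seeds, v)) * factor


for node in ("a", "b", "v"):
    direct = two_hop(SEEDS | {"u"}, node)
    updated = two_hop_incremental(SEEDS, "u", node)
    print(node, round(direct, 6), round(updated, 6))
```

Under these assumptions the recomputed and the incrementally updated values coincide for every non-seed node, since only the factors for \(w\in {M}_{u,v}\cup \{u\}\) change when u joins the seed set.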

Proof of Theorem 2

Consider a single seed \(\{u\}\). Let \({A}_u\subseteq {N}_u\) denote a subset of a node u’s neighbors. Let \(p({A}_u)\) denote the probability that all the nodes in \({A}_u\) are activated directly by u under the IC and LT models, while all the nodes in \({N}_u\setminus {A}_u\) are not directly activated by u (they may not even be activated eventually). Since each of u’s neighbors is activated by u independently, we have

$$\begin{aligned} p({A}_u)=\left(\prod _{v\in {A}_u} p_{u,v}\right)\cdot \left(\prod _{v\in {N}_u\setminus {A}_u}(1-p_{u,v})\right). \end{aligned}$$
(22)

Furthermore, with h hops of propagation, for each node \(w\in {V}\setminus \{u\}\), w can only be activated by a propagation path starting from a node \(v\in {A}_u\) whose path length is no longer than \(h-1\) hops. In other words, the probability for w to be activated by \({A}_u\) is \(\pi _{h-1}^{{A}_u}(w)\). Considering all the possible node sets \({A}_u\) activated directly by u, we have

$$\begin{aligned}&\sigma _h(\{u\})\nonumber \\&\quad = 1+\sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot \sum _{w\in {V}\setminus \{u\}}\pi _{h-1}^{{A}_u}(w)\right)\nonumber \\&\quad \le 1+\sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot \sum _{w\in {V}}\pi _{h-1}^{{A}_u}(w)\right)\nonumber \\&\quad = 1+\sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot \sigma _{h-1}({A}_u)\right)\nonumber \\&\quad \le 1+\sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot \sum _{v\in {A}_u}\sigma _{h-1}(\{v\})\right)\nonumber \\&\quad = 1+\sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot \sum _{v\in {N}_u}\left(\sigma _{h-1}(\{v\})\cdot p(v\in {A}_u)\right)\right)\nonumber \\&\quad = 1+\sum _{{A}_u\subseteq {N}_u}\left(\sum _{v\in {N}_u}\left(p({A}_u)\cdot \sigma _{h-1}(\{v\})\cdot p(v\in {A}_u)\right)\right)\nonumber \\&\quad = 1+\sum _{v\in {N}_u}\left(\sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot \sigma _{h-1}(\{v\})\cdot p(v\in {A}_u)\right)\right)\nonumber \\&\quad = 1+\sum _{v\in {N}_u}\left(\sigma _{h-1}(\{v\})\cdot \sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot p(v\in {A}_u)\right)\right). \end{aligned}$$
(23)

The second “\(\le\)” is due to the submodularity of \(\sigma _{h}(\cdot )\) (see Theorem 3), which gives \(\sigma _{h-1}({A}_u)\le \sum _{v\in {A}_u}\sigma _{h-1}(\{v\})\). In the third “=”, \(p(v\in {A}_u)\) is a binary indicator with \(p(v\in {A}_u)=1\) if and only if \(v\in {A}_u\). Meanwhile, we have

$$\begin{aligned}&\sum _{{A}_u\subseteq {N}_u}\left(p({A}_u)\cdot p(v\in {A}_u)\right)\nonumber \\&\quad =\sum _{{A}_u\subseteq {N}_u\setminus \{v\}}\left(p({A}_u)\cdot p(v\in {A}_u)\right)\nonumber \\&\qquad +\sum _{{A}_u\subseteq {N}_u\setminus \{v\}}\left(p({A}_u\cup \{v\})\cdot p(v\in {A}_u\cup \{v\})\right)\nonumber \\&\quad =\sum _{{A}_u\subseteq {N}_u\setminus \{v\}}p({A}_u\cup \{v\}). \end{aligned}$$
(24)

The last “=” follows from the fact that \(p(v\in {A}_u)=0\) since \(v\not \in {A}_u\subseteq {N}_u\setminus \{v\}\) and \(p(v\in {A}_u\cup \{v\})=1\) since \(v\in {A}_u\cup \{v\}\). Therefore, from (23) and (24), we have

$$\begin{aligned} \sigma _h(\{u\})\le 1+\sum _{v\in {N}_u}\left(\sigma _{h-1}(\{v\})\cdot \sum _{{A}_u\subseteq {N}_u\setminus \{v\}}p({A}_u\cup \{v\})\right). \end{aligned}$$
(25)

Furthermore, by definition,

$$\begin{aligned}&\sum _{{A}_u\subseteq {N}_u\setminus \{v\}}p({A}_u\cup \{v\})\nonumber \\&\quad = \sum _{{A}_u\subseteq {N}_u\setminus \{v\}}\left(\left(\prod _{w\in {A}_u\cup \{v\}} p_{u,w}\right)\cdot \left(\prod _{w\in {N}_u\setminus ({A}_u\cup \{v\})}(1-p_{u,w})\right)\right)\nonumber \\&\quad = \sum _{{A}_u\subseteq {N}_u\setminus \{v\}}\left(p_{u,v}\cdot \left(\prod _{w\in {A}_u} p_{u,w}\right)\cdot \left(\prod _{w\in {N}_u\setminus ({A}_u\cup \{v\})}(1-p_{u,w})\right)\right)\nonumber \\&\quad = p_{u,v}\cdot \sum _{{A}_u\subseteq {N}_u\setminus \{v\}}\left(\left(\prod _{w\in {A}_u} p_{u,w}\right)\cdot \left(\prod _{w\in {N}_u\setminus ({A}_u\cup \{v\})}(1-p_{u,w})\right)\right)\nonumber \\&\quad = p_{u,v}\cdot 1\nonumber \\&\quad = p_{u,v}. \end{aligned}$$
(26)
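
The identity in (26) says that summing \(p({A}_u\cup \{v\})\) over all subsets of the remaining neighbors leaves exactly \(p_{u,v}\), because the other factors enumerate a complete probability distribution. A minimal sketch that verifies this by brute force is given below; the neighbor names and probabilities are made up for illustration.

```python
from itertools import combinations
from math import isclose, prod

# Hypothetical out-neighbors of u with propagation probabilities p_{u,w}.
PROBS = {"a": 0.2, "b": 0.5, "c": 0.9}


def p_direct(activated):
    """Eq. (22): probability that exactly the nodes in `activated` are directly activated by u."""
    return prod(p if w in activated else 1.0 - p for w, p in PROBS.items())


def sum_over_subsets_containing(v):
    """Left-hand side of Eq. (26): sum of p(A ∪ {v}) over all A ⊆ N_u \\ {v}."""
    others = [w for w in PROBS if w != v]
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            total += p_direct(set(subset) | {v})
    return total


for v in PROBS:
    print(v, PROBS[v], sum_over_subsets_containing(v), isclose(PROBS[v], sum_over_subsets_containing(v)))
```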

Thus, by (25) and (26), it holds that

$$\begin{aligned} \sigma _h(\{u\})\le 1+\sum _{v\in {N}_u}\left(\sigma _{h-1}(\{v\})\cdot p_{u,v}\right). \end{aligned}$$
(27)

Inequality (11) can be proved by induction. When \(h=1\), the inequality follows directly from Inequality (10). Suppose that it holds for \(h-1\) hops of propagation, i.e., \(\sigma _{h-1}(\{u\}) \le \hat{\sigma }_{h-1}(\{u\})\). Then, for h hops of propagation, we have

$$\begin{aligned} \sigma _h(\{u\})&\le 1+\sum _{v\in {N}_u}\left(p_{u,v}\cdot \sigma _{h-1}(\{v\})\right)\nonumber \\&\le 1+\sum _{v\in {N}_u}\left(p_{u,v}\cdot \hat{\sigma }_{h-1}(\{v\})\right)\nonumber \\&= \hat{\sigma }_{h}(\{u\}). \end{aligned}$$
(28)

Therefore, for any \(h\ge 0\), we have \(\sigma _{h}(\{u\}) \le \hat{\sigma }_{h}(\{u\})\). \(\square\)
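
To make the recursion in (28) concrete, the following sketch compares \(\hat{\sigma }_{h}(\{u\})\) with the exact h-hop spread on a small graph. It is an illustration under stated assumptions, not the implementation used in the paper: it covers the IC model only, sets \(\hat{\sigma }_{0}(\{u\})=1\) as the base case, and computes the exact spread by enumerating every live-edge outcome, which is feasible only for a handful of edges. The graph is hypothetical.

```python
from itertools import product
from math import isclose

# Hypothetical toy graph: (source, target) -> propagation probability.
GRAPH = {
    ("u", "a"): 0.5, ("u", "b"): 0.4, ("a", "b"): 0.3,
    ("a", "c"): 0.6, ("b", "c"): 0.5, ("c", "u"): 0.2,
}


def sigma_hat(u, h):
    """Upper bound from Eq. (28): 1 + sum over neighbors v of p_{u,v} * sigma_hat(v, h - 1)."""
    if h == 0:
        return 1.0  # assumed base case: only the seed itself is counted
    return 1.0 + sum(p * sigma_hat(y, h - 1) for (x, y), p in GRAPH.items() if x == u)


def reach_within(live_edges, source, h):
    """Nodes reachable from `source` in at most h hops over the live edges."""
    frontier, seen = {source}, {source}
    for _ in range(h):
        frontier = {y for (x, y) in live_edges if x in frontier} - seen
        seen |= frontier
    return seen


def sigma_exact(u, h):
    """Exact h-hop spread under IC by summing over all 2^|E| live-edge outcomes."""
    edges = list(GRAPH)
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        for edge, live in zip(edges, mask):
            prob *= GRAPH[edge] if live else 1.0 - GRAPH[edge]
        live_edges = [edge for edge, live in zip(edges, mask) if live]
        total += prob * len(reach_within(live_edges, u, h))
    return total


for h in range(4):
    print(h, round(sigma_exact("u", h), 4), round(sigma_hat("u", h), 4))
```

For one hop the two quantities agree, and for more hops the bound becomes loose because the recursion counts a node once for every path that reaches it.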

Proof of Theorem 3

This can be proved using the live edge approach (Kempe et al. 2003).

  • Under the IC model, for each edge \(\langle u,v\rangle\in {E}\), we independently flip a coin of bias \(p_{u,v}\) to decide whether the edge \(\langle u,v\rangle\) is live or blocked. The resulting set of live edges gives a sample influence propagation outcome X.

  • Under the LT model, each node \(v\in V\) picks at most one of its incoming edges at random, selecting the edge from an inverse neighbor u with probability \(p_{u,v}\) and not selecting any incoming edge with probability \(1-\sum _{u\in {I}_v}p_{u,v}\).

We use p(X) to denote the probability of a specific outcome X in the sample space. Let \({V}_h^X(v)\) denote the node set that can be reached from a node v within h hops in the sample outcome X. Then, the number of nodes that can be reached from a seed set S within h hops in the outcome X is given by \(\sigma _h^X({S})=\Big |\bigcup _{v\in {S}}{V}_h^X(v)\Big |\). Thus,

$$\begin{aligned} \sigma _h({S})=\sum _{X}\left(p(X)\cdot \sigma _h^X({S})\right), \end{aligned}$$
(29)

where the monotonicity of \(\sigma _h({S})\) holds since \(\sigma _h^X({S})\) does not decrease as S expands.

The marginal influence gain

$$\begin{aligned} \sigma _h^X({S}\cup \{u\})-\sigma _h^X({S})=\Big |{V}_h^X(u)\setminus \bigcup _{v\in {S}}{V}_h^X(v)\Big | \end{aligned}$$
(30)

is the number of nodes that are reachable from a node u within h hops but are not reachable from any node in a seed set S within h hops in a sample outcome X. For any two node sets S and T where \({S}\subseteq {T}\), we have \(\bigcup _{v\in {S}}{V}_h^X(v)\subseteq \bigcup _{v\in {T}}{V}_h^X(v)\). Thus, \({V}_h^X(u)\setminus \bigcup _{v\in {S}}{V}_h^X(v)\supseteq {V}_h^X(u)\setminus \bigcup _{v\in {T}}{V}_h^X(v)\), which implies that

$$\begin{aligned} \sigma _h^X({S}\cup \{u\})-\sigma _h^X({S})\ge \sigma _h^X({T}\cup \{u\})-\sigma _h^X({T}). \end{aligned}$$
(31)

Since \(p(X)\ge 0\) for any X, taking the linear combination, we have

$$\begin{aligned} \sigma _h({S}\cup \{u\})-\sigma _h({S})\ge \sigma _h({T}\cup \{u\})-\sigma _h({T}). \end{aligned}$$
(32)

Thus, \(\sigma _h(\cdot )\) is submodular. \(\square\)
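
The live-edge argument can be exercised on sampled outcomes. The sketch below draws live-edge graphs under the IC model for a hypothetical toy graph, computes \(\sigma _h^X({S})\) by hop-limited breadth-first search, and checks inequality (31) for every pair \({S}\subseteq {T}\) and every node u outside T. All names and probabilities are illustrative assumptions.

```python
import random
from itertools import combinations

# Hypothetical toy graph: (source, target) -> propagation probability.
GRAPH = {
    ("a", "b"): 0.5, ("a", "c"): 0.4, ("b", "c"): 0.7,
    ("b", "d"): 0.3, ("c", "d"): 0.6, ("d", "a"): 0.2,
}
NODES = ("a", "b", "c", "d")


def sample_live_edges(rng):
    """IC live-edge sample: each edge is live independently with its own probability."""
    return [edge for edge, p in GRAPH.items() if rng.random() < p]


def reach_within(live_edges, source, h):
    """V_h^X(source): nodes reachable from `source` in at most h hops."""
    frontier, seen = {source}, {source}
    for _ in range(h):
        frontier = {y for (x, y) in live_edges if x in frontier} - seen
        seen |= frontier
    return seen


def spread(live_edges, seeds, h):
    """sigma_h^X(S): size of the union of the h-hop reachable sets of the seeds."""
    covered = set()
    for s in seeds:
        covered |= reach_within(live_edges, s, h)
    return len(covered)


def subsets(items):
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def diminishing_returns_hold(h, trials=50, seed=0):
    """Check inequality (31) on `trials` sampled outcomes for all S ⊆ T and u not in T."""
    rng = random.Random(seed)
    for _ in range(trials):
        live = sample_live_edges(rng)
        for big in subsets(NODES):
            for small in subsets(big):
                for u in NODES:
                    if u in big:
                        continue
                    gain_small = spread(live, small + (u,), h) - spread(live, small, h)
                    gain_big = spread(live, big + (u,), h) - spread(live, big, h)
                    if gain_small < gain_big:
                        return False
    return True


print(diminishing_returns_hold(h=2))
```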

Proof of Theorem 4

Let \({S}_h^*\) denote the optimal seed set for maximizing the influence spread within h hops of propagation, i.e., \(\sigma _h({S}_h^*)=\max _{|{S}|=k}\sigma _h({S})\). We have

$$\begin{aligned} \sigma ({S}_h)&\ge \sigma _h({S}_h)\nonumber \\&\ge \left(\frac{1}{\kappa _{\sigma _h}}(1-e^{-\kappa _{\sigma _h}})\right)\sigma _h({S}_h^*)\nonumber \\&\ge \left(\frac{1}{\kappa _{\sigma _h}}(1-e^{-\kappa _{\sigma _h}})\right)\sigma _h({S}^*)\nonumber \\&=\left(\frac{1}{\kappa _{\sigma _h}}(1-e^{-\kappa _{\sigma _h}})\alpha \right)\sigma ({S}^*) \end{aligned}$$
(33)

The first inequality holds because the exact influence spread \(\sigma (\cdot )\) is the influence spread without any hop limit on propagation, so it is at least the spread within h hops. The second inequality holds because the greedy algorithm achieves a \(\left(\frac{1}{\kappa _f}(1-e^{-\kappa _f})\right)\)-approximation for maximizing a monotone submodular function f under a cardinality constraint (Conforti and Cornuéjols 1984), where the submodularity and monotonicity of \(\sigma _h(\cdot )\) are given by Theorem 3. The third inequality holds because \({S}_h^*\) is the optimal solution for maximizing \(\sigma _h(\cdot )\), so \(\sigma _h({S}_h^*)\ge \sigma _h({S}^*)\). \(\square\)
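
As a quick illustration of how the curvature-dependent factor in (33) behaves, the sketch below evaluates \(\frac{1}{\kappa }(1-e^{-\kappa })\) for a few curvature values. It assumes \(\kappa \in [0,1]\) and treats \(\kappa =0\) as the limiting value 1; the function name is hypothetical.

```python
import math


def curvature_factor(kappa):
    """Greedy approximation factor (1/kappa) * (1 - exp(-kappa)) for kappa in [0, 1]."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("curvature is assumed to lie in [0, 1]")
    if kappa == 0.0:
        return 1.0  # limit of the expression as kappa -> 0
    # expm1 keeps precision when kappa is small.
    return -math.expm1(-kappa) / kappa


for kappa in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(kappa, round(curvature_factor(kappa), 4))
```

At \(\kappa =1\) the factor reduces to the familiar \(1-1/e\), and it improves monotonically toward 1 as the curvature decreases.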

We first introduce some lemmas used to prove Theorem 5.

Lemma 1

For scale-free random graphs with propagation probability \(p_{u,v}=p\) for every edge \(\langle u,v\rangle\in {E}\), the expected influence spread produced within one hop of propagation from a random seed set S satisfies

$$\begin{aligned} \mathbb {E}[\sigma _1({S})]\ge (p+1)k-pk^2/|{V}|. \end{aligned}$$
(34)

Proof of Lemma 1

With one hop of propagation, for a randomly selected node v, it is not activated if and only if v is not a seed and v is not activated by any of its inverse neighbors. The probability for v to be a non-seed node is \(1-\frac{k}{|{V}|}\). The probability for an inverse neighbor of v to be a seed is \(\frac{k}{|V|}\), and thus, the probability for it to activate v is \(p\cdot \frac{k}{|{V}|}\). Therefore, the probability for all of v’s inverse neighbors to fail to activate v is

$$\begin{aligned} \prod _{u\in {I}_v}\left(1-p\cdot \frac{k}{|{V}|}\right)=\left(1-\frac{pk}{|{V}|}\right)^{|{I}_v|}. \end{aligned}$$
(35)

Note that if v is selected as a seed, it must be activated. Hence, the overall activation probability of v is

$$\begin{aligned} \pi _1^{S}(v)=1-\left(1-\frac{k}{|{V}|}\right)\cdot \left(1-\frac{pk}{|{V}|}\right)^{|{I}_v|}. \end{aligned}$$
(36)

As a result, the expectation of the activation probability of a random node v is given by

$$\begin{aligned} \mathbb {E}[\pi _1^{S}(v)]&=\mathbb {E}\left [1-\left(1-\frac{k}{|{V}|}\right )\cdot \left(1-\frac{pk}{|{V}|}\right )^{|{I}_v|}\right ]\nonumber \\&=1-\left (1-\frac{k}{|{V}|}\right )\cdot \sum _{|{I}_v|=1}^{\infty }\left (P_0(|{I}_v|)\cdot \left (1-\frac{pk}{|{V}|}\right )^{|{I}_v|}\right )\nonumber \\&\ge 1-\left (1-\frac{k}{|{V}|}\right )\cdot \left (1-\frac{pk}{|{V}|}\right )\cdot \sum _{|{I}_v|=1}^{\infty }P_0(|{I}_v|)\nonumber \\&=1-\left (1-\frac{k}{|{V}|}\right )\cdot \left (1-\frac{pk}{|{V}|}\right )\nonumber \\&=\frac{(1+p)k}{|{V}|}-\frac{pk^2}{|{V}|^2}. \end{aligned}$$
(37)

Therefore, it holds that \(\mathbb {E}[\sigma _1({S})]=|{V}|\cdot \mathbb {E}[\pi _1^{S}(v)]\ge (p+1)k-pk^2/|{V}|\). This completes the proof. \(\square\)
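
The bound in Lemma 1 can be compared with the expression in the second line of (37) for a concrete degree distribution. The sketch below is illustrative only: it assumes that \(P_0(d)\) is a power law proportional to \(d^{-\gamma }\) on \(d\ge 1\), truncates it at a finite maximum degree, and uses made-up values for k, \(|{V}|\), p and \(\gamma\).

```python
def power_law(gamma, d_max):
    """Assumed in-degree distribution P_0(d) ∝ d^(-gamma) on d = 1..d_max (truncated)."""
    weights = [d ** -gamma for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_one_hop_fraction(k, n, p, degree_probs):
    """Second line of Eq. (37): expected one-hop activation probability of a random node."""
    a = k / n
    miss = sum(q * (1.0 - p * a) ** d for d, q in enumerate(degree_probs, start=1))
    return 1.0 - (1.0 - a) * miss


def lemma1_lower_bound(k, n, p):
    """Right-hand side of Eq. (34)."""
    return (p + 1.0) * k - p * k * k / n


k, n, p = 50, 10_000, 0.1                      # hypothetical parameters
degree_probs = power_law(gamma=2.5, d_max=1_000)
expected_spread = n * expected_one_hop_fraction(k, n, p, degree_probs)
print(round(expected_spread, 3), round(lemma1_lower_bound(k, n, p), 3))
```

The gap between the two printed values comes from replacing \(\left(1-\frac{pk}{|{V}|}\right)^{|{I}_v|}\) by its largest value, attained at \(|{I}_v|=1\).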

Lemma 2

(Li et al. 2012) For an infinite random power law graph, the expected fraction of nodes activated \(\phi ({S})=\mathbb {E}[\sigma ({S})]/|{V}|\) can be computed by

$$\begin{aligned} {\left\{ \begin{array}{ll} 1-\varphi ({S})=\left(1-\frac{k}{|{V}|}\right)\sum _{d=0}^{\infty }P_1(d+1)\left (1-p\varphi ({S})\right )^d,\\ 1-\phi ({S})=\left (1-\frac{k}{|{V}|}\right )\sum _{d=1}^{\infty }P_0(d)\left (1-p\varphi ({S})\right )^d, \end{array}\right. } \end{aligned}$$
(38)

where \(P_1(d)=\frac{d^{1-\gamma }}{\sum _{j=1}^{\infty }j^{1-\gamma }}\) is the probability of a node connecting to a neighbor whose degree is d, and \(\varphi ({S})\) is an auxiliary variable.

Lemma 3

The expected influence spread \(\mathbb {E}[\sigma ({S})]=|{V}|\cdot \phi ({S})\) is bounded by

$$\begin{aligned} \mathbb {E}[\sigma ({S})]\le |{V}|\cdot \left(1-\left(1-\frac{k}{|{V}|}\right)P_0(1)(1-pA)\right), \end{aligned}$$
(39)

where \(A=1-\left (1-\frac{k}{|{V}|}\right )P_1(1)\).

Proof of Lemma 3

From (38) in Lemma 2, we have

$$\begin{aligned} 1-\varphi ({S})\ge \left (1-\frac{k}{|{V}|}\right )P_1(1)\left (1-p\varphi ({S})\right )^0=1-A, \end{aligned}$$
(40)

and

$$\begin{aligned} 1-\phi ({S})\ge \left (1-\frac{k}{|{V}|}\right )P_0(1)\left (1-p\varphi ({S})\right ). \end{aligned}$$
(41)

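By (40), \(\varphi ({S})\le A\), so \(1-p\varphi ({S})\ge 1-pA\). Substituting this into (41) gives \(\phi ({S})\le 1-\left(1-\frac{k}{|{V}|}\right)P_0(1)(1-pA)\), and the lemma follows from \(\mathbb {E}[\sigma ({S})]=|{V}|\cdot \phi ({S})\). \(\square\)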

Proof of Theorem 5

Lemma 1 indicates that

$$\begin{aligned} \mathbb {E}[\sigma _h({S})]\ge \mathbb {E}[\sigma _1({S})]\ge (p+1)k-pk^2/|{V}|. \end{aligned}$$
(42)

Lemma 3 indicates that

$$\begin{aligned} \mathbb {E}[\sigma ({S})]\le |{V}|\cdot \left (1-\left (1-\frac{k}{|{V}|}\right )P_0(1)(1-pA)\right ). \end{aligned}$$
(43)

Putting (42) and (43) together, the theorem follows. \(\square\)
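
For a sense of scale, the two bounds (42) and (43) can be evaluated numerically. The sketch below is a rough illustration under several assumptions: \(P_0(d)\) is taken to be proportional to \(d^{-\gamma }\), both power-law normalizers are approximated by truncated sums (which requires \(\gamma >2\) for \(P_1\) to be normalizable in the limit), and the parameter values are invented. The exact statement of Theorem 5 is not reproduced in this appendix, so the printed ratio is only one natural way to combine the two bounds.

```python
def power_law_mass_at_one(exponent, d_max=100_000):
    """Mass at d = 1 of a distribution ∝ d^(-exponent) on d = 1..d_max (truncated sum)."""
    return 1.0 / sum(d ** -exponent for d in range(1, d_max + 1))


def spread_bounds(k, n, p, gamma):
    """Return (lower bound from Eq. (42), upper bound from Eq. (43))."""
    p0_1 = power_law_mass_at_one(gamma)        # assumed P_0(d) ∝ d^(-gamma)
    p1_1 = power_law_mass_at_one(gamma - 1.0)  # P_1(d) ∝ d^(1-gamma), as in Lemma 2
    a = 1.0 - (1.0 - k / n) * p1_1             # A from Lemma 3
    lower = (p + 1.0) * k - p * k * k / n
    upper = n * (1.0 - (1.0 - k / n) * p0_1 * (1.0 - p * a))
    return lower, upper


lower, upper = spread_bounds(k=50, n=10_000, p=0.1, gamma=3.0)  # hypothetical parameters
print(round(lower, 3), round(upper, 3), round(lower / upper, 4))
```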

Cite this article

Tang, J., Tang, X. & Yuan, J. An efficient and effective hop-based approach for influence maximization in social networks. Soc. Netw. Anal. Min. 8, 10 (2018). https://doi.org/10.1007/s13278-018-0489-y

  • DOI: https://doi.org/10.1007/s13278-018-0489-y