Know by a handful the whole sack: efficient sampling for top-k influential user identification in large graphs

World Wide Web

Abstract

Influence maximization, which remains an important yet challenging problem, aims to find the top-k most influential individuals in a social network so as to maximize the spread of influence. Most existing greedy algorithms focus on computing the exact influence spread. This makes them computationally expensive and limits their application to real-world social networks. In this paper, we show that supervised sampling can estimate the influence spread efficiently at a negligible cost in precision, thereby significantly reducing execution time. Motivated by this observation, we propose ESMCE, a power-law-exponent-supervised Monte Carlo estimation method. ESMCE exploits the power-law exponent of the social network to guide the sampling and employs multiple iterative steps to guarantee estimation accuracy. Moreover, ESMCE scales well and is well suited to large-scale social networks. Extensive experiments on six real-world social networks demonstrate that, compared with state-of-the-art greedy algorithms, ESMCE achieves a speedup of almost two orders of magnitude in execution time with only a negligible error (2.21% on average) in influence spread.
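To make the cost trade-off concrete, here is a minimal sketch of the plain Monte Carlo greedy baseline. It is not ESMCE itself, because the abstract does not specify how the power-law exponent steers the sampling, so that part is omitted. The sketch assumes an independent cascade (IC) model with a single uniform propagation probability `p`. The names `simulate_ic`, `estimate_spread`, and `greedy_top_k` are hypothetical. The `runs` parameter is the number of cascade simulations averaged per spread estimate, which is the cost that a sampling-based estimator tries to cut.

```python
import random
from typing import Dict, List, Set

# Adjacency list: node -> list of out-neighbours (directed edges u -> v).
Graph = Dict[int, List[int]]


def simulate_ic(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> int:
    """Run one independent-cascade simulation and return the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly activated node gets one chance to activate each
                # inactive out-neighbour, succeeding with probability p (assumed uniform).
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: Set[int], p: float, runs: int,
                    rng: random.Random) -> float:
    """Monte Carlo estimate of the expected spread: the mean over `runs` cascades."""
    total = sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs))
    return total / runs


def greedy_top_k(graph: Graph, k: int, p: float, runs: int, seed: int = 0) -> List[int]:
    """Greedy hill-climbing: repeatedly add the node with the largest estimated marginal gain."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen: List[int] = []
    for _ in range(min(k, len(nodes))):
        current = set(chosen)
        base = estimate_spread(graph, current, p, runs, rng) if chosen else 0.0
        best, best_gain = None, float("-inf")
        for v in sorted(nodes - current):  # sorted for deterministic tie-breaking
            gain = estimate_spread(graph, current | {v}, p, runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    star = {0: [1, 2, 3, 4, 5]}
    print(greedy_top_k(star, k=1, p=1.0, runs=10))  # prints [0]
```

On a star graph whose hub 0 points to five leaves, with `p = 1.0` every cascade from the hub reaches all six nodes, so the first greedy pick is the hub. Each greedy round re-estimates the spread for every remaining candidate, so the total work is roughly k × n × `runs` cascade simulations. This is why reducing the number of samples per estimate dominates the running time on large graphs.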

Author information

Correspondence to Xiaodong Liu.


About this article

Cite this article

Liu, X., Li, S., Liao, X. et al. Know by a handful the whole sack: efficient sampling for top-k influential user identification in large graphs. World Wide Web 17, 627–647 (2014). https://doi.org/10.1007/s11280-012-0196-y


  • DOI: https://doi.org/10.1007/s11280-012-0196-y
