Abstract
We develop a new sampling method for estimating eigenvector centrality on incomplete networks. The goal is to estimate this global centrality measure when only a limited amount of data is available. This situation arises in many real-world scenarios: data collection is expensive, the network is too large to store, or only partial information is accessible. The sampling algorithm is theoretically grounded in results from spectral approximation theory. We study the problem on both synthetic and real data and compare performance against state-of-the-art methods. We show that the approximations obtained from those methods are not always reliable and that our algorithm, while preserving computational scalability, improves performance under some relevant error measures.
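To make the problem setting concrete, the following is a minimal sketch of what estimating eigenvector centrality from a sample can look like. It is not the sampling algorithm developed in the chapter. It assumes a naive baseline (uniform node sampling followed by power iteration on the induced subgraph), and the function names `eigenvector_centrality`, `induced_subgraph`, and `sample_and_estimate` are hypothetical, chosen only for illustration.

```python
import random


def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Power iteration on an undirected graph given as {node: set_of_neighbors}.

    Returns a dict of scores normalized to unit Euclidean norm.
    """
    nodes = list(adj)
    if not nodes:
        return {}
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Iterate with (A + I) instead of A: the shift keeps the leading
        # eigenvector unchanged but avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = sum(val * val for val in y.values()) ** 0.5
        y = {v: val / norm for v, val in y.items()}
        diff = max(abs(y[v] - x[v]) for v in nodes)
        x = y
        if diff < tol:
            break
    return x


def induced_subgraph(adj, kept):
    """Keep only the sampled nodes and the edges among them."""
    kept = set(kept)
    return {v: adj[v] & kept for v in kept}


def sample_and_estimate(adj, k, seed=0):
    """Assumed naive baseline: sample k nodes uniformly at random, then
    compute eigenvector centrality on the induced subgraph only."""
    rng = random.Random(seed)
    kept = rng.sample(sorted(adj), k)
    return eigenvector_centrality(induced_subgraph(adj, kept))
```

On a star graph such as `{0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}`, the full-graph scores rank the hub first, whereas a small uniform sample may miss the hub entirely. That kind of unreliability in sample-based approximations is what the abstract points to.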
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ruggeri, N., De Bacco, C. (2020). Sampling on Networks: Estimating Eigenvector Centrality on Incomplete Networks. In: Cherifi, H., Gaito, S., Mendes, J., Moro, E., Rocha, L. (eds) Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol 881. Springer, Cham. https://doi.org/10.1007/978-3-030-36687-2_8
DOI: https://doi.org/10.1007/978-3-030-36687-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36686-5
Online ISBN: 978-3-030-36687-2
eBook Packages: Intelligent Technologies and Robotics (R0)