Link prediction via significant influence
Introduction
In recent years, many studies have investigated topological features and network functions to gain a comprehensive understanding of the essential characteristics of complex networks [[1], [2], [3], [4], [5]]. Link prediction has been put forward to address related problems and has attracted increasing attention [[6], [7]]. It uses information about the endpoints and the network structure to estimate the likelihood that two currently unconnected endpoints will become connected. It has been applied in many fields, such as exploring protein-protein interactions [[8], [9]], studying the mechanisms that drive co-authorship evolution [10], reconstructing airline networks [11], recommending friends [[12], [13]] and promoting e-commerce [[14], [15]].
Similarity-based link prediction is defined on the network structure and is more widely applicable than other methods. A similarity index is usually modeled to describe the probability that a missing or future link exists [[7], [16]]. Methods that rely on endpoint attributes, by contrast, are costly because such attributes are hard to extract [[17], [18]]. In terms of the topological similarity derived from network structure, mainstream link prediction methods fall into three classes. The first class comprises global indices, such as the Katz Index [19], which use global structural information to calculate the topological similarity of the endpoints. Unfortunately, global indices suffer from high computational complexity. The second class is built on the local structure of the network. Traditional local indices model similarity by counting the number of common neighbors (CN) [20], or by setting a penalizing parameter to punish large-degree endpoints, as in the Salton Index [21], Sorensen Index [22], Hub Promoted Index [23], Leicht–Holme–Newman Index [24] and so on. The Adamic–Adar Index (AA) [25] and the Resource Allocation Index (RA) [26] build on the CN index by penalizing large-degree common neighbors. Compared with global indices, local indices have lower complexity but poorer performance. The third class focuses on the quasi-local structure of the network to strike a compromise between performance and complexity. The Local Path Index (LP) [[26], [27]] considers two-hop and three-hop paths but ignores longer ones. The Local Random Walk (LRW) and the Superposed Random Walk (SRW) are similarity indices based on random walks [28]. LRW considers only a limited number of steps, while SRW gives nearby nodes more opportunities to be connected to the target node.
Some recent works propose new methods based on these traditional indices to improve performance. Zhu et al. [29] suppose that paths consisting of small-degree nodes contribute more to the similarity between endpoints, and propose a significant path index that uses the degrees of intermediate nodes to calculate the similarity. Liu et al. [30] filter out redundant links in the network to improve the accuracy of the k-shell method from the perspective of spreading dynamics. Zeng [31] presents an index that combines common neighbors with preferential attachment to estimate the likelihood that a link exists. Ahmed et al. [32] present a fast algorithm based on random walks in temporal networks.
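To make the local indices mentioned above concrete, the following is a minimal Python sketch of CN, AA and RA in their commonly used forms: CN counts the common neighbors, AA sums 1/log(k) over the common neighbors, and RA sums 1/k, where k is the degree of each common neighbor. The helper names (`build_adjacency`, `common_neighbors`, `adamic_adar`, `resource_allocation`) and the toy edge list are illustrative assumptions and do not come from the paper.

```python
import math


def build_adjacency(edges):
    """Build a neighbor-set map for an undirected, unweighted graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """CN: number of neighbors shared by x and y."""
    return len(adj.get(x, set()) & adj.get(y, set()))


def adamic_adar(adj, x, y):
    """AA: each common neighbor z contributes 1 / log(degree of z)."""
    shared = adj.get(x, set()) & adj.get(y, set())
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in shared)


def resource_allocation(adj, x, y):
    """RA: each common neighbor z contributes 1 / (degree of z)."""
    shared = adj.get(x, set()) & adj.get(y, set())
    return sum(1.0 / len(adj[z]) for z in shared)


# Hypothetical toy network, used only for illustration.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"),
             ("c", "d"), ("c", "e"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)
print(common_neighbors(toy_adj, "a", "d"))     # 2 (nodes b and c)
print(resource_allocation(toy_adj, "a", "d"))  # 1/2 + 1/3
print(adamic_adar(toy_adj, "a", "d"))          # 1/log(2) + 1/log(3)
```

In this toy network the two common neighbors have degrees 2 and 3, so AA and RA both weight the larger-degree neighbor less than CN does, and RA penalizes it more strongly than AA.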
Previous studies assume that endpoint influence helps unconnected endpoints connect to each other in the future. However, they simply take the endpoint degree as the effective influence through which one endpoint attracts another. Based on the Three Degrees of Influence Rule [33], we find that the influence of an endpoint is ultimately determined by the paths from it to its target endpoint rather than by its degree. For example, however many relationships each of them has in a social network, two strangers are more likely to get to know each other through a common friend than through an indirect chain of friends; that is, more common friends mean more effective connections, which help two people meet and make them more similar. Moreover, the links that make up the endpoint degree differ in their ability to transfer influence between two endpoints: some deliver more through common neighbors, some deliver less through long paths of three or more hops, and others cannot reach the target endpoint at all. Accordingly, for an endpoint, a short path, which contributes more to a future connection, is called a strong relation, and a long path is called a weak relation. Significant influence therefore comprises more strong relations and fewer weak ones.
A simple example of strong and weak relations in a network is shown in Fig. 1. Three kinds of paths, marked with different colors, lie between the initial endpoint and the target endpoint. On the red two-hop path, the initial endpoint is connected to the target endpoint through a common neighbor. Because it is short, this path produces a strong relation between the two endpoints. The green three-hop paths, each built from two intermediate nodes between the initial and target endpoints, are regarded as weak relations, whose influence is smaller than that of the short path. The blue paths, which do not reach the target endpoint, contribute the least influence. Fig. 1 also illustrates that influence is determined by the relations represented by paths rather than by the degree represented by links: the influence delivered by two paths cannot simply be delivered by one single link. The endpoint degree is therefore inappropriate as a model of effective influence. We believe that more strong relations and fewer weak ones constitute the significant influence, which promotes the future connection and similarity of the endpoints.
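The distinction between strong and weak relations can be sketched by enumerating the two-hop and three-hop paths between a pair of endpoints. The Python sketch below is only an illustration of that counting step; it is not the SI index itself, whose formula is not given in this text. The function names and the toy network (node labels `x`, `y`, `a`, `b`, `c`, `d`) are assumptions made for the example and are not taken from Fig. 1.

```python
def build_adjacency(edges):
    """Build a neighbor-set map for an undirected, unweighted graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_hop_paths(adj, x, y):
    """Strong relations: paths x - z - y, one per common neighbor z."""
    shared = adj.get(x, set()) & adj.get(y, set())
    return [(x, z, y) for z in sorted(shared)]


def three_hop_paths(adj, x, y):
    """Weak relations: simple paths x - a - b - y with two intermediate nodes."""
    paths = []
    for a in sorted(adj.get(x, set())):
        if a == y:
            continue  # a direct link is not a three-hop path
        for b in sorted(adj.get(a, set())):
            if b == x or b == y:
                continue  # keep the path simple
            if y in adj.get(b, set()):
                paths.append((x, a, b, y))
    return paths


# Hypothetical toy network: x has degree 3, but its three links differ.
toy_edges = [("x", "a"), ("a", "y"),              # two-hop path x-a-y
             ("x", "b"), ("b", "c"), ("c", "y"),  # three-hop path x-b-c-y
             ("x", "d")]                          # link that never reaches y
toy_adj = build_adjacency(toy_edges)
print(len(toy_adj["x"]))                 # degree of x
print(two_hop_paths(toy_adj, "x", "y"))
print(three_hop_paths(toy_adj, "x", "y"))
```

In this toy network the endpoint `x` has degree 3, yet only one of its links lies on a two-hop path to `y`, one lies on a three-hop path, and one does not lead to `y` at all, which mirrors the argument that degree alone overstates the effective influence.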
In this paper, by emphasizing the strong relations and penalizing the weak ones, we propose a novel link prediction index that models the significant influence (SI). In comparison experiments against mainstream indices on 12 benchmark datasets, SI improves link prediction accuracy. The remainder of this paper is organized as follows. Section 2 defines the SI index for link prediction and the baseline indices used for comparison. The datasets and evaluation metrics are given in Section 3. We discuss the results in Section 4 and conclude the paper in Section 5.
Definition
First, the definitions of strong and weak relations and of significant influence are given below.
Definition 1 In an undirected and unweighted network, the relation between two endpoints is built through the paths between them. Between two endpoints, a short path, especially a two-hop path, represents a strong relation, whereas a weak relation arises when the endpoints are connected through a long path. When an endpoint has more strong relations and fewer weak ones with another endpoint, we can think there
Experiments
In an unweighted and undirected network, the nodes and the links each form a set. The link set is randomly divided into two parts: the training set, treated as known information, and the testing set, used for prediction. The division should guarantee network connectivity. Clearly, the two parts are disjoint and together make up the whole link set. Let the universal set contain all possible links between node pairs. The set of nonexistent links is then the universal set minus the link set. The purpose of link
Results and discussions
The AUC curves of SI on the twelve independent datasets are shown in Fig. 2. The horizontal axis represents the degree of punishment and the vertical axis indicates the accuracy of the SI index. To show the impact of the penalty factor on the accuracy of the index and to better observe the trend of AUC, we let the penalty factor vary over a wider range of . AUC varies continuously in all datasets, as shown in Fig. 2. The penalization parameter plays an important role in prediction when .
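The evaluation procedure behind such AUC values can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: it assumes that connectivity must be preserved in the training network, and it uses the commonly adopted link prediction AUC, which compares the scores of testing links against those of nonexistent links and counts ties as one half. For simplicity it compares all pairs exactly instead of sampling. The names `is_connected`, `split_links`, `auc_exact` and the parameters `probe_fraction` and `seed` are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def is_connected(nodes, edges):
    """Breadth-first search check that `edges` connect all of `nodes`."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == len(nodes)


def split_links(edges, probe_fraction=0.1, seed=0):
    """Randomly move links to the testing set while the training graph
    stays connected (assumes `edges` holds no duplicate links)."""
    rng = random.Random(seed)
    nodes = {n for e in edges for n in e}
    candidates = list(edges)
    rng.shuffle(candidates)
    target = int(round(probe_fraction * len(edges)))
    train, probe = list(edges), []
    for e in candidates:
        if len(probe) >= target:
            break
        remaining = [f for f in train if f != e]
        if is_connected(nodes, remaining):  # skip links whose removal disconnects
            train = remaining
            probe.append(e)
    return train, probe


def auc_exact(score, nodes, train, probe):
    """Fraction of (testing link, nonexistent link) pairs in which the
    testing link scores higher; ties count as one half."""
    known = {frozenset(e) for e in train} | {frozenset(e) for e in probe}
    nonexistent = [p for p in combinations(sorted(nodes), 2)
                   if frozenset(p) not in known]
    higher = ties = 0
    for e in probe:
        for f in nonexistent:
            s_e, s_f = score(*e), score(*f)
            if s_e > s_f:
                higher += 1
            elif s_e == s_f:
                ties += 1
    return (higher + 0.5 * ties) / (len(probe) * len(nonexistent))
```

Here `score` is any similarity function of two nodes computed from the training links only. An AUC of 0.5 corresponds to random guessing, so values above 0.5 indicate how much better than chance an index ranks the testing links.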
Conclusions
A novel index, SI, which considers the significant influence of endpoints for link prediction in complex networks, is proposed in this paper. In a complex network, the connecting ability between endpoints makes up the influence of the endpoints. We assume that two-hop paths carry the significant influence, whereas paths with three or more hops produce weak influence in transferring resources. Therefore, two-hop paths, denoted by the common neighbors, are separated from the long paths. For receiving
Acknowledgments
This research is supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology (2017ZX03001012-003), in part by the National Natural Science Foundation of China (61461136002 and 61602048), in part by the Fundamental Research Funds for the Central Universities, and in part by MOE-CMCC 1-5.
References (46)
- et al. Complex networks: Structure and dynamics. Phys. Rep. (2006)
- et al. Link prediction in complex networks: A survey. Physica A (2011)
- Uncovering mechanisms of co-authorship evolution by multirelations-based link prediction. Info. Proc. Mgmt. (2017)
- et al. Recommender systems. Phys. Rep. (2012)
- et al. Friends and neighbors on the web. Soc. Netw. (2003)
- Link prediction based on local information considering preferential attachment. Physica A (2016)
- et al. Sampling-based algorithm for link prediction in temporal networks. Inform. Sci. (2016)
- et al. What’s in a crowd? Analysis of face-to-face behavioral networks. J. Theoret. Biol. (2011)
- et al. Self-similar scaling of density in complex real-world networks. Physica A (2012)
- et al. Statistical mechanics of complex networks. Rev. Modern Phys. (2002)
- Pseudofractal scale-free web. Phys. Rev. E
- The structure and function of complex networks. SIAM Rev.
- Characterization of complex networks: A survey of measurements. Adv. Phys.
- Link mining: a survey. ACM SIGKDD Explor. Newslett.
- Mining from protein-protein interactions. Data Min. Knowl. Discov.
- From link-prediction in brain connectomes and protein interactomes to the local-community paradigm in complex networks. Sci. Rep.
- Missing and spurious interactions and the reconstruction of complex networks. Proc. Acad. Nat. Sci.
- Exploiting place features in link prediction on location-based social networks
- Human mobility, social ties, and link prediction
- Link prediction approach to collaborative filtering
- The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol.
- Linkrec: a unified framework for link recommendation with user attributes and graph structure
- Folks in folksonomies: social link prediction from shared metadata
Cited by (24)
Link prediction based on a spatial distribution model with fuzzy link importance
2019, Physica A: Statistical Mechanics and its Applications
Citation Excerpt: Information age blesses us with an overwhelming amount of news, data, and even rumors. Therefore, algorithms that predict the possible connection between users and objects provide essential value and attract various scientists and engineers explore related topics [3,4,12–19]. For instance, collaborative filtering algorithms [20] are often used for link prediction in recommender systems.
Link prediction in complex networks based on the interactions among paths
2018, Physica A: Statistical Mechanics and its Applications
Citation Excerpt: Liu et al. proposed the ERA index which extended RA index with consideration of the resource transfer process of local paths [62]. Yang et al. considered the significant influence of endpoints and proposed a novel SI index to measure the contributions of strong and weak relations [63]. All these indices [37,38,58–63] consider the contributions of paths from the viewpoint of intermediate nodes on paths.
A new similarity measure for link prediction based on local structures in social networks
2018, Physica A: Statistical Mechanics and its Applications
Citation Excerpt: The link prediction is one of the hot research fields in social network analysis [30]. Several studies, for example [31–33] and [34], have been conducted on applying link prediction methods in various applications. Authors in [31] Applied different similarity measures to find out the efficient measure for a complex military network.
A novel link prediction method integrated link attributes for directed graph
2022, International Journal of Modern Physics B