Link prediction via significant influence

https://doi.org/10.1016/j.physa.2017.11.078

Highlights

  • A novel link prediction index, significant influence (SI), is proposed.

  • The strong and weak influences in transferring resources are defined.

  • SI models significant influence by distinguishing the strong influence from the weak.

  • The proposed index achieves better performance than traditional indices.

Abstract

In traditional link prediction, many studies assume that endpoint influence, represented by endpoint degree, favors connections between large-degree endpoints. However, an investigation of network structure shows that influence is determined by the relations built through the paths between endpoints rather than by endpoint degree. Strong relations, which reach the other endpoint through short paths and especially through common neighbors, carry more powerful influence; in contrast, relations through long paths generate weak influence. This paper proposes a novel link prediction index, SI, which models significant influence by distinguishing the strong influence from the weak. Comparison with mainstream baselines on 12 benchmark datasets suggests that SI effectively improves link prediction accuracy.

Introduction

Recently, many studies have investigated topological features and revealed network functions in order to understand the essential characteristics of complex networks [[1], [2], [3], [4], [5]]. Link prediction has been put forward to address related problems and has attracted increasing attention [[6], [7]]. Link prediction uses information about endpoints and network structure to estimate the probability that two unconnected endpoints will connect. It can be applied in many fields, such as exploring protein-protein interactions [[8], [9]], studying the mechanisms that drive co-authorship evolution [10], reconstructing airline networks [11], recommending friends [[12], [13]] and promoting e-commerce [[14], [15]].

The similarity-based method of link prediction is defined on the network structure and is more widely applicable than other methods. A similarity index is usually designed to describe the probability of finding missing and future links [[7], [16]]. In contrast, methods that rely on endpoint attributes incur high costs because such attributes are hard to extract [[17], [18]]. Considering topological similarity based on the network structure, mainstream link prediction methods can be divided into three classes. The first class consists of global indices, such as the Katz Index [19], which use global structural information to calculate the topological similarity of the endpoints. Unfortunately, global indices suffer from high computational complexity. The second class is built on the local structure of the network. Traditional local indices model similarity by counting the number of common neighbors (CN) [20], or by introducing a penalty on large-degree endpoints, as in the Salton Index [21], Sorensen Index [22], Hub Promoted Index [23], Leicht–Holme–Newman Index [24] and so on. The Adamic–Adar Index (AA) [25] and the Resource Allocation Index (RA) [26] extend the CN index by penalizing large-degree common neighbors. Compared with global indices, local indices have lower complexity but poorer performance. The third class focuses on quasi-local structures of the network in order to reach a compromise between performance and complexity. The Local Path Index (LP) [[26], [27]] considers two-hop and three-hop paths but ignores longer paths. The Local Random Walk (LRW) and the Superposed Random Walk (SRW) are similarity indices based on random walks [28]. LRW considers only a limited number of steps, while SRW gives nearby nodes more opportunities to be connected to the target node.

Some recent works propose new methods based on these traditional indices to improve performance. Zhu et al. [29] suppose that paths consisting of small-degree nodes contribute more to the similarity between endpoints and propose a significant path index that uses the degrees of intermediate nodes to calculate the similarity. Liu et al. [30] filter out redundant links in the network to improve the accuracy of the k-shell method from the perspective of spreading dynamics. Zeng [31] presents an index that combines common neighbors with preferential attachment to estimate the likelihood that a link exists. Ahmed et al. [32] present a fast algorithm based on random walks in temporal networks.
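To make the local and quasi-local indices mentioned above concrete, the following is a minimal Python sketch of CN, AA, RA and LP using their commonly cited textbook definitions. The function names, the toy edge list and the default weight `eps=0.01` are illustrative assumptions and do not come from the paper.

```python
import math

def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected, unweighted edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, x, y):
    # CN: number of nodes adjacent to both x and y.
    return len(adj[x] & adj[y])

def adamic_adar(adj, x, y):
    # AA: each common neighbor z contributes 1 / log(k_z).
    # A common neighbor has degree >= 2, so log(k_z) > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

def resource_allocation(adj, x, y):
    # RA: each common neighbor z contributes 1 / k_z.
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])

def local_path(adj, x, y, eps=0.01):
    # LP: (A^2)_xy + eps * (A^3)_xy, where A is the adjacency matrix.
    # (A^3)_xy is the number of length-3 walks x-a-b-y.
    walks3 = sum(1 for a in adj[x] for b in adj[a] if y in adj[b])
    return common_neighbors(adj, x, y) + eps * walks3

# Hypothetical toy network, used only for illustration.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)
print(common_neighbors(toy_adj, "a", "d"))     # 2 (common neighbors b and c)
print(resource_allocation(toy_adj, "a", "d"))  # 1/2 + 1/3
print(local_path(toy_adj, "a", "d"))           # 2 + 0.01 * 1
```

AA and RA differ only in how strongly they discount a high-degree common neighbor, and LP adds a small contribution from three-hop connections on top of CN.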

Previous studies assume that endpoint influence helps unconnected endpoints to connect with each other in the future. However, they simply regard endpoint degree as the effective influence by which one endpoint attracts another unconnected endpoint. Based on the Three Degrees of Influence Rule [33], we find that the influence of an endpoint is ultimately determined by the paths from it to its target endpoint rather than by its degree. For example, however many relationships each of them has in a social network, two strangers are more likely to get to know each other through a common friend than through an indirect chain of friends; that is, more common friends mean more effective connections, which help two people get to know each other and become more similar. Moreover, the links that make up an endpoint's degree differ in their ability to transfer influence between two endpoints: some deliver more through common neighbors, some deliver less through long paths with three or more hops, and others do not reach the target endpoint at all. Accordingly, for an endpoint, a short path that contributes more to a future connection is called a strong relation, and a long path is called a weak relation. Significant influence therefore comprises more strong relations and fewer weak ones.

Fig. 1 gives a simple example of strong and weak relations in a network. Paths of three kinds, marked in different colors, lead from the initial endpoint x toward the target endpoint y. In the red two-hop path x-z1-y, endpoint x reaches y through a common neighbor z1. Because it is short, this path produces a strong relation between x and y. The green three-hop paths, each built from two intermediate nodes between x and y, are regarded as weak relations, whose influence is smaller than that of the short path. The blue paths, on which x is not connected to y, contribute the least influence. Fig. 1 also illustrates that influence is determined by relations represented by paths rather than by degree represented by links: the influence delivered by the two paths x-z2-z3-y and x-z2-z4-y cannot be attributed simply to the single link x-z2. Endpoint degree is therefore inappropriate as a model of effective influence. We believe that more strong relations and fewer weak ones constitute significant influence, which promotes the future connection and similarity of the endpoints.
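The distinction between strong and weak relations can be checked on a small graph. The sketch below counts simple paths between x and y by hop length on a toy network loosely modeled on the description of Fig. 1. The exact edge list, including the dead-end branch through the hypothetical nodes z5 and z6, is an assumption, and the code only counts paths; it does not compute the SI score, whose formula is not reproduced in this excerpt.

```python
def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected, unweighted edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def count_simple_paths(adj, x, y, max_hops=3):
    """Count simple paths from x to y, grouped by number of hops (1..max_hops)."""
    counts = {h: 0 for h in range(1, max_hops + 1)}

    def extend(node, visited, hops):
        for nxt in adj[node]:
            if nxt == y:
                counts[hops + 1] += 1          # path reaches the target
            elif nxt not in visited and hops + 1 < max_hops:
                extend(nxt, visited | {nxt}, hops + 1)

    extend(x, {x}, 0)
    return counts

# Assumed edge list: one two-hop path (via z1), two three-hop paths
# (via z2-z3 and z2-z4), and one branch (z5-z6) that never reaches y.
fig1_edges = [
    ("x", "z1"), ("z1", "y"),
    ("x", "z2"), ("z2", "z3"), ("z2", "z4"), ("z3", "y"), ("z4", "y"),
    ("x", "z5"), ("z5", "z6"),
]
fig1_adj = build_adjacency(fig1_edges)

print(len(fig1_adj["x"]))                       # degree of x: 3
print(count_simple_paths(fig1_adj, "x", "y"))   # {1: 0, 2: 1, 3: 2}
```

Here x has degree 3, yet its three links carry one two-hop path, two three-hop paths and no path at all, which is the sense in which degree alone does not describe the relation between x and y.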

In this paper, by emphasizing strong relations and penalizing weak ones, we propose a novel link prediction index that models significant influence (SI). In comparison experiments with mainstream indices on 12 benchmark datasets, SI shows a clear improvement in link prediction accuracy. The remainder of this paper is organized as follows. Section 2 defines the SI index and several baseline indices for comparison. The datasets and metrics used in the experiments are given in Section 3. We discuss the results in Section 4 and conclude the paper in Section 5.

Section snippets

Definition

First, the definitions of strong and weak relations and of significant influence are given below.

Definition 1

In an undirected and unweighted network G(V,E), the relation between endpoints is built through the paths between them. Between endpoints x and y, a short path, especially a two-hop path, represents a strong relation, whereas a weak relation arises when x connects with y through a long path. When endpoint x has more strong relations and fewer weak ones with y, we can think there

Experiments

In an unweighted and undirected network G(V,E), V and E represent the sets of nodes and links, respectively. The link set E is randomly divided into two parts: the training set $E^T$, treated as known information, and the testing set $E^P$, used for prediction. The division should guarantee the connectivity of $E^T$. Clearly, $E^T \cup E^P = E$ and $E^T \cap E^P = \emptyset$. We denote by U the universal set, which contains all $|V|(|V|-1)/2$ possible links. The set of nonexistent links can then be represented as $U - E$. The purpose of link
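The data division described above can be sketched in Python as follows. This is one possible procedure under stated assumptions, not necessarily the one used in the paper: links are tried in random order and moved to the testing set only if the training network stays connected. The names `split_edges`, `is_connected`, `universal_size`, the `probe_fraction` parameter and the toy network are all hypothetical.

```python
import random

def is_connected(nodes, edges):
    """Check connectivity of an undirected graph with a depth-first search."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)

def split_edges(nodes, edges, probe_fraction=0.1, seed=0):
    """Split edges into (training set, testing set), keeping the training set connected."""
    rng = random.Random(seed)
    candidates = list(edges)
    rng.shuffle(candidates)
    train, probe = set(edges), set()
    target = int(round(probe_fraction * len(candidates)))
    for edge in candidates:
        if len(probe) >= target:
            break
        train.remove(edge)
        if is_connected(nodes, train):
            probe.add(edge)      # safe to hide this link
        else:
            train.add(edge)      # removing it would disconnect the training network
    return train, probe

def universal_size(num_nodes):
    """Number of possible undirected links: |V| * (|V| - 1) / 2."""
    return num_nodes * (num_nodes - 1) // 2

# Hypothetical toy network: a 6-node ring with four chords.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 2), (1, 4), (3, 5), (0, 3)]
train, probe = split_edges(nodes, edges, probe_fraction=0.2, seed=1)
print(len(train), len(probe))                     # 8 2
print(universal_size(len(nodes)) - len(edges))    # 5 nonexistent links
```

By construction the two sets are disjoint and their union is the original link set, so the split satisfies the two set identities stated above.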

Results and discussions

Fig. 2 shows the AUC curves of SI on twelve independent datasets. The horizontal axis represents α, the degree of punishment, and the vertical axis indicates the accuracy of the SI index. To show the impact of the penalty factor on accuracy and to better observe the trend of AUC, we let α vary over a wider range, [-2,2]. As shown in Fig. 2, AUC varies continuously on all datasets. The penalization parameter plays an important role in prediction when α<1.
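The definition of AUC is not reproduced in this excerpt. The sketch below assumes the pairwise definition commonly used in link prediction: the probability that a link from the testing set receives a higher score than a nonexistent link, with ties counted as one half. It compares all pairs exactly instead of sampling them, and the function name `auc` and the example scores are illustrative assumptions.

```python
def auc(probe_scores, nonexistent_scores):
    """Exact pairwise AUC: (higher + 0.5 * ties) / number of comparisons."""
    higher = 0
    ties = 0
    for p in probe_scores:
        for q in nonexistent_scores:
            if p > q:
                higher += 1
            elif p == q:
                ties += 1
    total = len(probe_scores) * len(nonexistent_scores)
    return (higher + 0.5 * ties) / total

# Hypothetical similarity scores for two testing links and three nonexistent links.
print(auc([3, 1], [2, 1, 0]))   # 0.75
```

A curve like those in Fig. 2 would be obtained by recomputing the similarity scores for each value of α and calling such a function once per value; a result of 0.5 corresponds to random guessing.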

Conclusions

A novel index, SI, which considers the significant influence of endpoints for link prediction in complex networks, is proposed in this paper. In a complex network, the connecting ability between endpoints makes up the influence of the endpoints. We assume that two-hop paths bring significant influence and that paths with three or more hops produce weak influence in transferring resources. Therefore, two-hop paths, denoted by common neighbors, are separated from the long paths. For receiving

Acknowledgments

This research is supported in part by National Science and Technology Major Project of the Ministry of Science and Technology (2017ZX03001012-003), in part by National Natural Science Foundation of China (61461136002 and 61602048), in part by Fundamental Research Funds for the Central Universities, and in part by MOE-CMCC 1-5.

References (46)

  • Dorogovtsev S.N. et al. Pseudofractal scale-free web. Phys. Rev. E (2002)

  • Newman M.E.J. The structure and function of complex networks. SIAM Rev. (2003)

  • Costa L.D.F. et al. Characterization of complex networks: A survey of measurements. Adv. Phys. (2007)

  • Getoor L. et al. Link mining: a survey. ACM SIGKDD Explor. Newslett. (2005)

  • Mamitsuka H. Mining from protein-protein interactions. Data Min. Knowl. Discov. (2012)

  • Cannistraci C.V. et al. From link-prediction in brain connectomes and protein interactomes to the local-community paradigm in complex networks. Sci. Rep. (2013)

  • Guimerà R. et al. Missing and spurious interactions and the reconstruction of complex networks. Proc. Natl. Acad. Sci. (2009)

  • Scellato S. et al. Exploiting place features in link prediction on location-based social networks

  • Wang D. et al. Human mobility, social ties, and link prediction

  • Huang Z. et al. Link prediction approach to collaborative filtering

  • Liben-Nowell D. et al. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. (2007)

  • Yin Z. et al. Linkrec: a unified framework for link recommendation with user attributes and graph structure

  • Schifanella R. et al. Folks in folksonomies: social link prediction from shared metadata

Cited by (24)

    • Link prediction based on a spatial distribution model with fuzzy link importance

      2019, Physica A: Statistical Mechanics and its Applications
      Citation Excerpt :

      Information age blesses us with an overwhelming amount of news, data, and even rumors. Therefore, algorithms that predict the possible connection between users and objects provide essential value and attract various scientists and engineers explore related topics [3,4,12–19]. For instance, collaborative filtering algorithms [20] are often used for link prediction in recommender systems.

    • Link prediction in complex networks based on the interactions among paths

      2018, Physica A: Statistical Mechanics and its Applications
      Citation Excerpt :

      Liu et al. proposed the ERA index which extended RA index with consideration of the resource transfer process of local paths [62]. Yang et al. considered the significant influence of endpoints and proposed a novel SI index to measure the contributions of strong and weak relations [63]. All these indices [37,38,58–63] consider the contributions of paths from the viewpoint of intermediate nodes on paths.

    • A new similarity measure for link prediction based on local structures in social networks

      2018, Physica A: Statistical Mechanics and its Applications
      Citation Excerpt :

      The link prediction is one of the hot research fields in social network analysis [30]. Several studies, for example [31–33] and [34], have been conducted on applying link prediction methods in various applications. Authors in [31] Applied different similarity measures to find out the efficient measure for a complex military network.
