Abstract

An accelerated multilevel aggregation method is presented for calculating the stationary probability vector of an irreducible stochastic matrix in PageRank computation, with a vector extrapolation method serving as its accelerator. We show how to periodically combine the extrapolation method with the multilevel aggregation method on the finest level to speed up the PageRank computation. Detailed numerical results illustrate the behavior of this method, and comparisons with typical methods are also made.

1. Introduction

The PageRank algorithm for assigning a rank of importance to web pages has been the key technique in web search [1]. The core of the PageRank computation is the principal eigenvector of a stochastic matrix representing the hyperlink structure of the web. Due to the large size of the web graph (over eight billion nodes [2]), computing PageRank faces a great computational challenge, both in terms of the CPU and RAM resources required and in terms of the speed of updating; that is, as a new crawl is completed, its results should soon be available for searching. Among all the numerical methods to compute PageRank, the Power method is one of the standard ways owing to its stable and reliable performance [1], but its low rate of convergence is a fatal flaw. Many accelerating techniques have been proposed to speed up the convergence of the Power method, including aggregation/disaggregation [3–5], vector extrapolation [6–10], multilevel [11–14], lumping [15, 16], Arnoldi-type [10, 17], and adaptive methods [4, 18]. To some extent, our work follows this vein, that is, seeking a method to accelerate the convergence of the PageRank computation. In this research, we introduce a multilevel aggregation method to accelerate the computation of PageRank, and the acceleration is performed by vector extrapolation, which combines previous iterates.

The remainder of this paper is organized as follows. Section 2 provides preliminaries and related work on PageRank and some acceleration techniques. Section 3 describes our main acceleration algorithm. Section 4 covers our numerical results and comparisons among the main PageRank algorithms. Section 5 contains the conclusion and points out directions for future research.

2. Preliminaries and Related Work

2.1. Background on PageRank

PageRank [19] is "the heart of our software", as Google claims on its webpage, and continues to provide the basis for all of Google's web search tools. Link-based PageRank views the web as a directed graph, where each node represents a page and each edge from node $i$ to node $j$ represents a link from page $i$ to page $j$; in this case, one can view page $j$ as "important." Page and Brin [1] made another basic assumption: the amount of importance distributed to $j$ by $i$ is proportional to the importance of $i$ and inversely proportional to the number of pages $i$ points to. This definition suggests an iterative fixed-point computation for determining the importance of every page on the web. Each page $i$ has a rank score $x_i$, forming the rank vector $x$. Page and Brin convert the computation of PageRank to the computation of the stationary distribution vector of the linear system $x^T P = x^T$, where $P$ is the adjacency matrix of the web graph normalized so that each row sums to 1. For $P$ to be a valid transition probability matrix, every node must have at least one outlink; that is, $P$ should have no rows consisting of all zeros (pages without outlinks are called dangling nodes). Meanwhile, to guarantee the uniqueness of a unit-norm eigenvector corresponding to the eigenvalue 1, by the ergodic theorem for Markov chains [20], $P$ should be aperiodic and irreducible. To solve these problems, Page and Brin introduced a damping parameter $\alpha \in (0,1)$ and transformed the matrix $P$ into a new one $A$:
$$A = \alpha\,(P + d v^T) + (1-\alpha)\, e v^T,$$
where $e$ is the vector with all entries equal to 1, $v = e/n$ is the column vector representing a uniform probability distribution over all nodes, and $d = (d_i)$ with $d_i = 1$ while page $i$ is dangling and $d_i = 0$ otherwise, assuming the total page number is $n$. If one views a surfer as a random walker on the web, the constant $\alpha$ can be explained as follows: a web surfer visiting a page will jump to a random page on the web with probability $1-\alpha$, rather than following an outlink. The matrix $A$ is usually called the Google matrix. Then, the PageRank vector $x$ is the stationary distribution vector of $A$; that is,
$$x^T A = x^T, \qquad x \ge 0, \qquad \|x\|_1 = 1. \tag{1}$$

Obviously, (1) can be rewritten as
$$Bx = 0, \qquad B = I - A^T, \tag{2}$$
where $I$ is an identity matrix and $B$ is a singular M-matrix whose diagonal elements are the negated column sums of its off-diagonal elements. So what remains is to solve the linear system (1) or its homogeneous counterpart (2), corresponding to an irreducible Markov chain.
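To make the construction above concrete, here is a minimal NumPy sketch that assembles the Google matrix $A$ and the singular M-matrix $B = I - A^T$ of (2); the function name and the four-page toy graph are ours, purely for illustration.

```python
import numpy as np

def google_matrix(P, alpha=0.85):
    """Assemble the Google matrix A = alpha*(P + d v^T) + (1 - alpha)*e v^T."""
    n = P.shape[0]
    e = np.ones(n)
    v = e / n                                # uniform teleportation vector
    d = (P.sum(axis=1) == 0).astype(float)   # indicator of dangling pages
    return alpha * (P + np.outer(d, v)) + (1 - alpha) * np.outer(e, v)

# Toy 4-page graph; the last page is dangling (no outlinks).
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1/3, 1/3, 0.0, 1/3],
              [0.0, 0.0, 0.0, 0.0]])
A = google_matrix(P)
B = np.eye(4) - A.T          # the singular M-matrix of system (2)
print(B.sum(axis=0))         # column sums of B are (numerically) zero
```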

2.2. The Vector Extrapolation-Acceleration of PageRank

The Power method is the standard algorithm for computing PageRank: given the initial uniform distribution $x^{(0)} = v$ of system (1), compute successive iterates
$$x^{(j+1)T} = x^{(j)T} A, \quad j = 0, 1, \ldots,$$
until convergence; the limit $\lim_{j \to \infty} x^{(j)}$, when it exists, is exactly the PageRank vector. As mentioned in Section 1, although the Power method is a stable and reliable [1] or even a fast iterative algorithm [21], accelerating the computation is still important, since every search engine crawls a huge number of pages, each matrix-vector multiplication in the iteration is expensive, requiring considerable resources both in terms of CPU and RAM, and the rate of convergence deteriorates as the number of pages grows. There are now many acceleration algorithms for PageRank computation, and among them the vector extrapolation method is a very popular and efficient one. A detailed review of vector extrapolation methods can be found in the work of Kamvar et al., Smith et al., and Sidi [9, 22, 23]. To our knowledge, there are several kinds of extrapolation methods, such as quadratic extrapolation [9], two polynomial-type methods, namely the minimal polynomial extrapolation method (MPE) of Cabay and Jackson [24] and the reduced rank extrapolation (RRE) of Eddy [25], and the three epsilon vector extrapolation methods of Wynn [26]. In recent years, many papers have discussed the application of vector extrapolation methods to computing the stationary probability vector of Markov chains and to web ranking problems; see [6, 8, 9, 22, 23, 27] for details. Numerical experiments in these papers suggest that the polynomial-type methods are in general more economical than the epsilon vector extrapolation methods in terms of computation time and storage requirements. For this reason, our concern in this context is a polynomial-type vector extrapolation, namely, the generalization of the quadratic extrapolation method (GQE) proposed in [23]. Let us now briefly discuss the strategy of GQE. As is known, the starting point of the extrapolation method is to accelerate the convergence of the sequence generated by a fixed-point iterative method for a Markov chain, of the form
$$x^{(j+1)} = A^T x^{(j)}, \quad j = 0, 1, \ldots,$$
where $x^{(0)}$ is an initial guess. Suppose that we have produced a sequence of iterates $x^{(0)}, x^{(1)}, \ldots$. Then, at the $k$th outer iteration, let $U$ be a matrix consisting of the last $m+1$ iterates, with $x^{(k)}$ being the newest, where $m$ is called the window size as usual. It is evident that each column $x^{(i)}$ of $U$ has the following properties:
$$x^{(i)} \ge 0, \qquad \|x^{(i)}\|_1 = 1.$$

The problem to be solved is then transformed into obtaining a weight vector $\gamma = (\gamma_0, \ldots, \gamma_m)^T$ satisfying $\sum_i \gamma_i = 1$, so that the updated probability vector $U\gamma$ is a linear combination of the last $m+1$ iterates. That is to say, the current iterate can be expressed as a linear combination of some of the first eigenvectors and, combined with the Power method, drives the convergence to the principal eigenvector. GQE is derived in this light and can be given as follows [23].

Algorithm 1 (the generalization of quadratic extrapolation (GQE)). (1) Input the vectors $x_0, x_1, \ldots, x_m$.
(2) Compute $u_i = x_{i+1} - x_i$, $i = 0, 1, \ldots, m-1$; set $W = [u_0 \,|\, u_1 \,|\, \cdots \,|\, u_{m-2}]$. Compute the QR-factorization of $W$, namely, $W = QR$. Obtain $Q$, $R$.
(3) Solve the linear system $Rc = -Q^T u_{m-1}$, $c = (c_0, c_1, \ldots, c_{m-2})^T$.
(4) Set $c_{m-1} = 1$ and compute $\gamma = (\gamma_0, \ldots, \gamma_{m-1})^T$ by $\gamma_i = c_i / \sum_{j=0}^{m-1} c_j$, $i = 0, 1, \ldots, m-1$.
(5) Compute $s = \sum_{i=0}^{m-1} \gamma_i x_i$.
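As a concrete illustration of Algorithm 1, the following Python sketch performs one QR-based extrapolation step in the polynomial-extrapolation (MPE/GQE) pattern reconstructed above; the function name, the NumPy implementation, and the final renormalization are our assumptions, not code from [23].

```python
import numpy as np

def gqe_step(X):
    """One QR-based extrapolation step in the spirit of Algorithm 1.

    X : (n, m+1) array whose columns are the iterates x_0, ..., x_m
        (window size m >= 2).  Returns the extrapolated vector s.
    """
    W = np.diff(X, axis=1)                   # u_i = x_{i+1} - x_i, i = 0..m-1
    Q, R = np.linalg.qr(W[:, :-1])           # QR factorization of [u_0 ... u_{m-2}]
    c = np.linalg.solve(R, -Q.T @ W[:, -1])  # coefficients c_0, ..., c_{m-2}
    c = np.append(c, 1.0)                    # fix the last coefficient: c_{m-1} = 1
    gamma = c / c.sum()                      # gamma_i = c_i / sum_j c_j
    s = X[:, :-1] @ gamma                    # s = sum_{i=0}^{m-1} gamma_i x_i
    return s / np.abs(s).sum()               # renormalize to a probability vector
```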

3. Accelerated Aggregation Strategy in PageRank Computation

The core of the PageRank computation, as discussed earlier, is to solve the large sparse Markov chain problem (2). This is evocative of an especially useful method—the multilevel method based on aggregation of Markov states, which has attracted considerable attention [11–14, 28, 29]. Isensee and Horton considered multilevel methods for the steady-state solution of continuous-time and discrete-time Markov chains in [13, 14], respectively. De Sterck et al. proposed a multilevel adaptive aggregation method for calculating the stationary probability vector of Markov matrices in [11], which, as shown in their paper, is a special case of the adaptive smoothed aggregation [30] and adaptive algebraic multigrid methods [31] for sparse linear systems.

The central idea of the multilevel aggregation method is to convert a large linear system into a smaller one by some aggregation strategy, so that the stationary solution can be obtained efficiently.

However, aggregation/disaggregation cannot always deliver algorithmic scalability, owing to the poor approximation of $x$ in problem (2) by unsmoothed intergrid operators, so a careful choice of aggregation strategy is crucial for the efficiency of the multilevel aggregation hierarchy. There are many aggregation methods, such as graph aggregation [32], neighborhood-based aggregation, and (double) pairwise aggregation [4, 13, 14]. In our study, we consider the neighborhood aggregation method as described in [12–14], since it results in well-balanced aggregates of approximately equal size and provides a more regular coarsening throughout the automatic coarsening process [12, 29, 33]. Our aggregation strategy is based on the problem matrix $B$ scaled by the current iterate $x$, rather than on the original coefficient matrix $B$. Let $x$ denote the current iterate; then we say node $i$ is strongly connected to node $j$ in the graph of $B\,\mathrm{diag}(x)$ if
$$|b_{ij} x_j| \ge \theta \max_{k \ne i} |b_{ik} x_k|,$$
where $\theta \in (0,1)$ is a strength-of-connection parameter, for which a fixed value is used there. Suppose that $N_i$ is the set of all points that are strongly connected to node $i$ in the graph of $B\,\mathrm{diag}(x)$, including node $i$ itself. Then, we have the neighborhood-based algorithm as follows (see [12, 29] for more details).

Algorithm 2 (neighborhood-based aggregation). (1) Preparation: set $R = \{1, 2, \ldots, n\}$ (the unassigned points) and $J = 0$ (the number of aggregates).
(2) Assign entire neighborhoods to aggregates: for $i = 1, \ldots, n$, build the strong neighborhood $N_i$ based on $B\,\mathrm{diag}(x)$; if $N_i \subseteq R$, then set $J = J + 1$, $Q_J = N_i$, and $R = R \setminus N_i$.
(3) Put the remaining points into the most connected aggregates: while $R \ne \emptyset$, pick $i \in R$ and the aggregate $Q_j$ to which $i$ is most strongly connected; set $Q_j = Q_j \cup \{i\}$ and $R = R \setminus \{i\}$.
(4) Obtain the aggregation matrix $Q$: if $i \in Q_j$, then $q_{ij} = 1$; otherwise $q_{ij} = 0$.
Note that the aggregation matrix $Q$ in Algorithm 2 has the following form:
$$Q = (q_{ij}) \in \mathbb{R}^{n \times J}, \qquad q_{ij} \in \{0, 1\}, \qquad \sum_{j=1}^{J} q_{ij} = 1, \quad i = 1, \ldots, n. \tag{8}$$
From (8), we find that there exists only one nonzero element (equal to 1) in each row, while each column may have several elements equal to 1, and the sum of the elements in the $j$th column is the number of nodes combined into the $j$th aggregate.
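The following Python sketch illustrates Algorithm 2 under the strength-of-connection definition of Section 3; the default value theta = 0.25 and the tie-breaking details are our assumptions.

```python
import numpy as np

def neighborhood_aggregation(B, x, theta=0.25):
    """Sketch of Algorithm 2: build the binary aggregation matrix Q.

    Strength is measured on the scaled matrix B*diag(x); theta = 0.25
    is an assumed value of the strength-of-connection parameter.
    """
    n = B.shape[0]
    S = np.abs(B * x)                     # |b_ij * x_j|
    np.fill_diagonal(S, 0.0)
    # Strong neighborhoods N_i = {i} U {j : |b_ij x_j| >= theta * max_k |b_ik x_k|}.
    thresh = theta * S.max(axis=1)
    N = [set(np.flatnonzero(S[i] >= max(thresh[i], 1e-300))) | {i}
         for i in range(n)]
    assign = -np.ones(n, dtype=int)       # aggregate index of each node, -1 = free
    J = 0
    for i in range(n):                    # phase 1: whole unassigned neighborhoods
        if all(assign[j] < 0 for j in N[i]):
            for j in N[i]:
                assign[j] = J
            J += 1
    for i in range(n):                    # phase 2: attach each leftover point to
        if assign[i] < 0:                 # its most strongly connected aggregate
            weights = [S[i, assign == a].sum() for a in range(J)]
            assign[i] = int(np.argmax(weights))
    Q = np.zeros((n, J))
    Q[np.arange(n), assign] = 1.0         # exactly one 1 per row, cf. (8)
    return Q
```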

The multilevel aggregation method based on the neighborhood-based aggregation strategy, introduced by De Sterck et al. in [11, 12, 28, 29], can be expressed as follows.

Algorithm 3 (multilevel aggregation method, $x = \mathrm{MA}(B, x, \nu_1, \nu_2)$). (1) Presmoothing: do $\nu_1$ times $x = \mathrm{Relax}(B, x)$.
(2) Build $Q$ according to the automatic coarsening process described below. Obtain the restriction operator $R = Q^T$ and the prolongation operator $P = \mathrm{diag}(x)\, Q\, \mathrm{diag}(Q^T x)^{-1}$.
(3) Form the coarse-level matrix $B_c = Q^T B\, \mathrm{diag}(x)\, Q$ and compute $x_c = Q^T x$.
(4) If on the coarsest level, solve $B_c\, \mathrm{diag}(Q^T x)^{-1} x_c = 0$ by a direct method. Otherwise, apply $\gamma$ iterations of this algorithm: $x_c = \mathrm{MA}(B_c\, \mathrm{diag}(Q^T x)^{-1}, x_c, \nu_1, \nu_2)$.
(5) Coarse-level correction: $x = \mathrm{diag}(x)\, Q\, \mathrm{diag}(Q^T x)^{-1} x_c$.
(6) Postsmoothing: do $\nu_2$ times $x = \mathrm{Relax}(B, x)$.

In Algorithm 3, $B$ is given in system (2), $x$ is an initial guess vector, $B_l$ denotes the level-$l$ matrix, and $x_l$ is the corresponding level-$l$ vector, where $L$ is the number of levels, the finest level is 1, and the coarsest level is $L$. $Q$ is the aggregation matrix generated by the aggregation method, Algorithm 2, and $P = \mathrm{diag}(x)\, Q\, \mathrm{diag}(Q^T x)^{-1}$ and $R = Q^T$ are the prolongation and restriction operators, respectively, which are created by the automatic coarsening process in step 2; then, we get the coarse-level matrices by the equation $B_{l+1} = Q^T B_l\, \mathrm{diag}(x_l)\, Q$. Setting the weighted Jacobi method with weight $\omega$, a variant of the Jacobi method, as the pre- and postsmoothing approach and letting $\nu_1$ and $\nu_2$ be the respective numbers of iterations, the matrix $B$ in system (2) is split into the form
$$B = D - (L + U),$$
where $D$ is the diagonal part of $B$ with $d_{ii} \ne 0$ for all $i$, and $L$ and $U$ are the negated strictly lower- and upper-triangular parts of $B$, respectively. Then, the weighted Jacobi relaxation can be written as
$$x \gets (1 - \omega)\, x + \omega\, D^{-1} (L + U)\, x,$$
with weight $\omega \in (0, 1)$. Here, we use $\mathrm{Norm}(\cdot)$ to denote the normalization operator defined by $\mathrm{Norm}(x) = x / \|x\|_1$ for all $x \ne 0$. Note that we carry out this normalization after each relaxation process in Algorithm 3 to ensure that the fine-level iterates can be interpreted as approximations to the stationary probability vector.
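A minimal Python sketch of the weighted Jacobi smoother with the normalization Norm(·), as used in Algorithm 3; the default weight omega = 0.7 is an assumed value.

```python
import numpy as np

def weighted_jacobi(B, x, omega=0.7, nu=1):
    """nu sweeps of weighted Jacobi for B x = 0, each followed by Norm(.).

    Uses the splitting B = D - (L + U), so that one sweep reads
    x <- (1 - omega) x + omega D^{-1} (L + U) x.  The value omega = 0.7
    is an assumed default for the weight.
    """
    D = np.diag(B)                        # diagonal of B, nonzero by assumption
    for _ in range(nu):
        x = (1 - omega) * x + omega * (D * x - B @ x) / D   # (L+U)x = D x - B x
        x = x / np.abs(x).sum()           # Norm(x) = x / ||x||_1
    return x
```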

It should be noted that the direct method on the coarsest level, used at step 4 of Algorithm 3, is based on the following theorem.

Theorem 4 (see [34, Theorem 4.16]). If $B$ is an irreducible singular M-matrix, then each of its principal submatrices other than $B$ itself is a nonsingular M-matrix.

If $B_L$ is the coarsest-level operator, then we use the direct method presented in the coarsest-level algorithm below to solve the coarsest-level equation $B_L x_L = 0$.

Coarsest-Level Algorithm

Step 1. Compute $n_L = \mathrm{size}(B_L)$; % the size of the coarsest-level operator $B_L$.

Step 2. Obtain $\widehat{B} = B_L(1 : n_L - 1,\, 1 : n_L - 1)$; % the leading principal submatrix of $B_L$ of order $n_L - 1$.

Step 3. Obtain $b = -B_L(1 : n_L - 1,\, n_L)$; % the right-hand vector corresponding to $\widehat{B}$.

Step 4. Compute $y = \widehat{B}^{-1} b$; let $\tilde{x}_L = (y^T, 1)^T$; and obtain the coarsest-level solution $x_L = \mathrm{Norm}(\tilde{x}_L)$.
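In Python, the coarsest-level algorithm might be sketched as follows; the function name is ours, and the code assumes a dense irreducible singular M-matrix B_L so that Theorem 4 applies.

```python
import numpy as np

def coarsest_level_solve(B_L):
    """Direct solver for B_L x_L = 0 following the coarsest-level algorithm.

    By Theorem 4, the leading principal submatrix of the irreducible
    singular M-matrix B_L is nonsingular, so the last component of x_L
    can be fixed to 1 and the remaining components solved for directly.
    """
    nL = B_L.shape[0]                     # Step 1: size of B_L
    B_hat = B_L[:nL - 1, :nL - 1]         # Step 2: leading principal submatrix
    b = -B_L[:nL - 1, nL - 1]             # Step 3: corresponding right-hand side
    y = np.linalg.solve(B_hat, b)         # Step 4: solve the nonsingular system
    x = np.append(y, 1.0)
    return x / np.abs(x).sum()            # normalize to a probability vector
```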

Now, let us introduce the main idea of this paper and the implementation details of the main algorithm. In fact, our idea is derived from the excellent papers [12, 28, 29]. In those works, De Sterck et al. studied several strategies to accelerate the convergence of the multilevel aggregation method, including the application of a smoothing technique to the interpolation and restriction operators in [28] and the analysis of a recursively accelerated multilevel aggregation method that solves quadratic programming problems with inequality constraints in [29]. Of particular note, in [12] they introduced a top-level acceleration of an adaptive algebraic multilevel method by selecting a linear combination of old iterates that minimizes a functional over a subset of the set of probability vectors. Their acceleration strategy mainly concerned small window sizes, and for larger window sizes, such as 3 or 4, they used the active set method from Matlab's quadprog function [35]. With this in mind, enough interest is aroused to look for a general acceleration method for any window size. In view of the effectiveness of vector extrapolation in improving the convergence of an iterate sequence, we consider a new accelerated multilevel aggregation method combining the multilevel aggregation method and the vector extrapolation method. Given the finest-level iterate sequence produced by Algorithm 3 in the multilevel aggregation, the updated iterate is generated as their linear combination by Algorithm 1. We can thus present our accelerated multilevel aggregation method as follows.

Algorithm 5 (extrapolation-accelerated multilevel aggregation method, EAMA). (1) Set $k = 0$; if no initial guess is given, choose $x^{(0)} = e/n$.
(2) Run the multilevel aggregation method: $x^{(k+1)} = \mathrm{MA}(B, x^{(k)}, \nu_1, \nu_2)$.
(3) Set $m$. % set the window size.
(4) Set $U = [x^{(k-m+1)} \,|\, \cdots \,|\, x^{(k)} \,|\, x^{(k+1)}]$. % the last $m+1$ iterates.
(5) Apply the GQE algorithm (Algorithm 1) to obtain $\tilde{x}$ from $U$.
(6) If $\|\tilde{x}\|_1 \ne 1$, then set $\tilde{x} = \mathrm{Norm}(\tilde{x})$; set $x^{(k+1)} = \tilde{x}$.
(7) Check convergence: if $\|B x^{(k+1)}\|_1 < \mathrm{tol}$, then output $x = x^{(k+1)}$; otherwise set $k = k + 1$ and go to step 2.
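Putting the pieces together, a sketch of the top level of Algorithm 5 might read as follows; it reuses the gqe_step sketch given after Algorithm 1, and the callable ma_cycle stands in for one cycle of Algorithm 3. The default parameters and the residual-based stopping test are our assumptions.

```python
import numpy as np

def eama(B, ma_cycle, m=3, tol=1e-8, max_it=200):
    """Top-level sketch of Algorithm 5 (EAMA).

    ma_cycle : callable performing one multilevel aggregation cycle
               (Algorithm 3), e.g. lambda x: MA(B, x, nu1, nu2).
    m        : window size; extrapolation uses the last m + 1 iterates.
    """
    n = B.shape[0]
    x = np.ones(n) / n                    # uniform initial guess x^(0) = e/n
    history = [x]
    for _ in range(max_it):
        x = ma_cycle(x)                   # step 2: one MA cycle on the finest level
        history.append(x)
        if len(history) >= m + 1:         # steps 3-5: extrapolate the window
            X = np.column_stack(history[-(m + 1):])
            x = gqe_step(X)               # GQE (Algorithm 1) on the last m+1 iterates
            history[-1] = x               # step 6: keep the (normalized) update
        if np.abs(B @ x).sum() < tol:     # step 7: 1-norm residual test
            return x
    return x
```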

Note that in Algorithm 5, $m$ is the window size and $\mathrm{tol}$ is a prescribed tolerance. In particular, there is a difference between our Algorithm 5 and the algorithm of [36] in how the matrix $U$ is constructed. From Algorithm 1, the main reason is that the GQE needs $m+1$ vectors as its input. That is to say, when the window size is $m = 3$, four approximate probability vectors, given in the matrix $U$, are required as the input of the GQE algorithm. In particular, the window size $m = 3$ corresponds to the quadratic vector extrapolation method presented in [9]. Hence, the matrix $U$ is given in the form of step 4 in Algorithm 5. The efficiency of our accelerated multilevel aggregation algorithm will be shown in the next section, where the relative advantage of our method over Matlab's quadprog function on a Markov chain problem and over other methods in PageRank computation will be demonstrated.

4. Numerical Results and Analysis

In this section, we report some numerical experiments to show the numerical behavior of extrapolation-accelerated multilevel aggregation methods. All the numerical results are obtained with a Matlab 7.0.1 implementation on a 2.93 GHz processor with 2 GB main memory.

For the sake of fairness, the initial guess vector is selected as $x^{(0)} = e/n$ for all the algorithms. All iterations are terminated when the residual 1-norm satisfies $\|B x^{(k)}\|_1 < \mathrm{tol}$, where $x^{(k)}$ is the current approximate solution and $\mathrm{tol}$ is a user-prescribed tolerance. For convenience, in the tables below we abbreviate the Power method, the generalization of the quadratic extrapolation method, and the extrapolation-accelerated multilevel aggregation method as Power, GQE, and EAMA, respectively. The table columns report the 1-norm residual, the number of levels in the multilevel aggregation, the number of iterations, and the CPU time used in seconds.

Example 1. As described in Section 3, the active set method from Matlab's quadprog function [12, 35] is usually used for the acceleration strategy in Markov chains and other problems. In this paper, we have suggested the extrapolation-acceleration strategy instead of the Matlab one. This example aims to compare the two methods and to test the efficiency of our new method. The test example is a birth-death chain with constant birth and death rates, as shown in Figure 1 [37–39], which is widely used in queuing theory, demographics, performance engineering, and biology [32, 40].
With the cycle parameters, the relaxation parameter $\omega$, and the strength-of-connection parameter $\theta$ fixed in our experiment, we use the suffix "-matlab" to denote Matlab's quadprog-based acceleration method that improves the cycles on the finest level, and $n$ to denote the length of the Markov chain. $C_{op}$ denotes the operator complexity of the last cycle, defined as the sum of the numbers of nonzero entries in the operators on all levels $l = 1, \ldots, L$, divided by the number of nonzero entries in the finest-level operator; that is,
$$C_{op} = \frac{\sum_{l=1}^{L} \mathrm{nnz}(B_l)}{\mathrm{nnz}(B_1)},$$
where $\mathrm{nnz}(B)$ is the number of nonzero entries in the matrix $B$. Numerical results for the EAMA and -matlab methods are reported in Table 1.
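For reference, the operator complexity just defined can be computed as in the following sketch (dense level operators assumed for simplicity):

```python
import numpy as np

def operator_complexity(levels):
    """C_op = sum_l nnz(B_l) / nnz(B_1) for the level operators B_1, ..., B_L."""
    nnz = [np.count_nonzero(B) for B in levels]
    return sum(nnz) / nnz[0]
```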
From Table 1, it is evident that our accelerated multilevel aggregation method EAMA performs better than the quadprog-based -matlab acceleration on the test problem, from both the operator complexity and the iteration count perspectives; moreover, the advantage of our method grows as the length of the Markov chain increases. For instance, for the largest chain length and the window sizes tested, the EAMA method cuts the number of iterations substantially compared with the -matlab method (see Table 1).

Example 2. In this example, we consider the performance of EAMA in the PageRank computation. We take the web graph of a typical website, Hollins, as our numerical example, which is listed in Chapter 14 of [41] and contains 6012 web pages and 23875 hyperlinks. We compare the Power method, the generalization of the quadratic extrapolation method, and the extrapolation-accelerated multilevel aggregation method on this PageRank problem.
In Algorithm 5, accelerated V-cycles ($\gamma = 1$) and W-cycles ($\gamma = 2$) can be treated in a similar way. Some specific parameter settings are used: we use weighted Jacobi with relaxation parameter $\omega$ as the pre- and postsmoothing approach in Algorithm 3, direct coarsest-level solves are performed by the coarsest-level algorithm, and a fixed strength-of-connection parameter $\theta$ is chosen. The numerical results are reported in Tables 2 and 3 below.
The residual analysis in Table 2 indicates that EAMA does much better in the PageRank computation than the other two methods, namely, the classic Power method and GQE, and EAMA's advantage over the other two methods becomes more obvious as the iteration count increases.
From Table 3, when an error tolerance is prescribed, no matter what value the window size takes, the accelerated multilevel aggregation method outperforms Power and GQE in terms of both iteration count and CPU time. For instance, when the window size is 3, the number of iterations required by EAMA is about half of that required by GQE and less than one third of that required by Power.

Example 3. The test matrix is the widely used CS-Stanford Web matrix, which is available from http://www.cise.ufl.edu/research/sparse/matrices/index.html. It contains 9914 nodes and 35,555 links. In this example, we run the Power method, the generalized quadratic extrapolation method, and our extrapolation-accelerated multilevel aggregation method with a fixed convergence tolerance and window size. Table 4 lists the results.
It is seen from Table 4 that the numerical behavior of the three algorithms strongly relies on the choice of the damping factor $\alpha$ in the PageRank computation. When $\alpha$ is high, say 0.95 or 0.99, the computing time and iteration count are sizable. However, when $\alpha$ is moderate, say 0.85, the computing time and iteration count are greatly reduced, to roughly one-seventh of those in the high-damping cases. This confirms Page and Brin's point [1]: $\alpha = 0.85$ is the most common choice.
Besides, just as expected, the GQE method is superior to the Power method, while the new accelerated multilevel aggregation method (EAMA) performs the best, in terms of both computing time and iteration count. For instance, when $\alpha$ is 0.85, EAMA reduces both the computing time and the iteration count considerably compared with the results of Power and GQE (see Table 4).

Example 4. This example aims to examine the influence of the choice of the convergence tolerance on the algorithms. The test matrix is the Stanford-Berkeley Web matrix (available from http://www.stanford.edu/~sdkamvar/research.html). It contains 683,446 pages and 7.6 million links. We run the Power method, the generalized quadratic extrapolation method, and our extrapolation-accelerated multilevel aggregation method on this problem with a sequence of successively smaller convergence tolerances. The window size for the extrapolation procedure is fixed, and the damping factor $\alpha$ in PageRank is set to 0.85. Table 5 reports the results obtained.
It is easy to see from Table 5 that our new algorithm, EAMA, performs the best in most cases, regardless of the convergence tolerance, in terms of both computing time and iteration count. For instance, as the tolerance decreases over the tested range, the iteration counts of all three algorithms (Power, GQE, and EAMA) increase, but nonetheless the count for EAMA remains below the corresponding one for GQE and far below that for Power, at only a small fraction of the latter. About the same holds for the computing time.

5. Conclusions

This paper has presented an accelerated multilevel aggregation method for calculating the stationary probability vector of an irreducible stochastic matrix, the Google matrix, in PageRank computation. The method combines the vector extrapolation method with the multilevel aggregation method, where the neighborhood-based aggregation strategy [11, 12, 28, 29] is used and the vector extrapolation acceleration is carried out on the finest level. Our approach is inspired by De Sterck et al. [12], where the active set method from Matlab's quadprog function was used for window sizes greater than two. It is therefore natural to seek a general acceleration method for any window size, and thus our EAMA was produced.

Although our method is effective, there are still many directions worth exploring. The Google matrix, for example, can be reordered according to the dangling and nondangling nodes of the matrix [16, 42–44], which reduces the computation of PageRank to the solution of a much smaller problem. And for the special linear system (2) in Markov chains, there may be other, even better options than the neighborhood-based aggregation strategy in the multilevel method. Pursuing such improvements of our algorithm will be part of our future work.

Acknowledgments

This research is supported by the Chinese Universities Specialized Research Fund for the Doctoral Program (20110185110020), Scientific Research Fund of Sichuan Provincial Education Department (T11008), and the Science and Technology Planning Project of Hunan Province (2010JT4042).