Abstract
We present a framework for approximating random-walk based probability distributions over Web pages using graph aggregation. The basic idea is to partition the graph into classes of quasi-equivalent vertices, to project the page-based random walk to be approximated onto those classes, and to compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host.
We experimented on a Web graph containing over 1.4 billion pages and over 6.6 billion links, taken from a crawl of the Web conducted by AltaVista in September 2003. We were able to produce a ranking that has a Spearman rank-order correlation of 0.95 with PageRank. The wall-clock time required by a simplistic implementation of our method was less than half the time required by a highly optimized implementation of PageRank, which suggests that larger speedup factors are probably possible.
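The aggregation idea described above can be illustrated with a minimal sketch. It groups pages by host, runs a damped random walk on the host graph, and then spreads each host's probability over its pages. The function names (`stationary`, `aggregated_rank`), the damping factor of 0.85, the uniform jump from hosts without out-links, and the reconstruction rule (share proportional to in-degree plus one) are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from collections import defaultdict

def stationary(weights, nodes, damping=0.85, iters=100):
    """Power iteration for a damped random walk on a weighted graph.

    weights[u][v] is the weight of edge u -> v; nodes lists all states.
    Nodes without out-edges jump uniformly (an assumed convention).
    """
    n = len(nodes)
    p = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0:
                for v in nodes:
                    nxt[v] += damping * p[u] / n
            else:
                for v, w in out.items():
                    nxt[v] += damping * p[u] * w / total
        p = nxt
    return p

def aggregated_rank(links, host_of):
    """Approximate a page-level distribution via a host-level walk."""
    pages = sorted(host_of)
    hosts = sorted(set(host_of.values()))
    # Project the page walk onto classes: count links between hosts.
    host_w = defaultdict(lambda: defaultdict(float))
    indeg = {page: 0 for page in pages}
    for u, outs in links.items():
        for v in outs:
            host_w[host_of[u]][host_of[v]] += 1.0
            indeg[v] += 1
    host_p = stationary(host_w, hosts)
    # Reconstruct page scores (assumed rule: share by in-degree + 1).
    mass = defaultdict(float)
    for page in pages:
        mass[host_of[page]] += indeg[page] + 1
    return {
        page: host_p[host_of[page]] * (indeg[page] + 1) / mass[host_of[page]]
        for page in pages
    }

# Hypothetical three-page, two-host graph.
links = {
    "a.com/1": ["a.com/2", "b.com/1"],
    "a.com/2": ["a.com/1"],
    "b.com/1": ["a.com/1"],
}
host_of = {"a.com/1": "a.com", "a.com/2": "a.com", "b.com/1": "b.com"}
scores = aggregated_rank(links, host_of)
```

The expensive iteration runs over hosts rather than pages, which is where the savings would come from on a real crawl; the final reconstruction is a single pass over the pages.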
Additional information
Significant portions of the work presented here were done while A. Broder and R. Lempel were employed by the AltaVista corporation.
About this article
Cite this article
Broder, A.Z., Lempel, R., Maghoul, F. et al. Efficient PageRank approximation via graph aggregation. Inf Retrieval 9, 123–138 (2006). https://doi.org/10.1007/s10791-006-7146-1
DOI: https://doi.org/10.1007/s10791-006-7146-1