Amy Langville and Carl Meyer, Google’s Page Rank and Beyond: The Science of Search Engine Rankings

White, Bebo

doi:10.1007/s10791-008-9063-y

Princeton University Press, Princeton, 2006, 234 pp, $35.00, ISBN 978069112201

Book Review
Published: 06 June 2008

Volume 11, pages 471–472, (2008)

Information Retrieval

Bebo White¹

With such a compelling title, this book suggests that it holds the secret to one of Web authors’ most coveted goals: a high ranking on the Google search engine. The key to that goal, the authors suggest, is an understanding of Google’s PageRank model. They even float (and quickly reject) the idea that the PageRank equation may rank alongside the great equations of history, such as Maxwell’s equations of electromagnetism and Einstein’s pervasive E = mc². With an understanding of that equation might come a high Google ranking, perhaps more valuable in today’s IT world than an understanding of the processes of the universe. However, after reading this book, one may find it far more realistic to categorize PageRank among the great industrial recipes of our time, such as the closely guarded formula for Coca-Cola. Like the Coke recipe, PageRank is a well-kept industrial secret and a piece of intellectual property that built an empire, contributes to the satisfaction of millions of consumers each day, has inspired numerous competitors, and has even generated some memorable lawsuits. The environment in which PageRank evolved certainly makes for a compelling story.

From the start of this book, it is clear that the authors really wanted to write a comprehensive work about the mathematics underlying PageRank. (Approximately two-thirds of the book is devoted to mathematics.) Perhaps realizing the limited audience that such a book might attract, they wrap their mathematical descriptions in a history of content search (Web and otherwise), an overview of Web search technologies, an outstanding collection of anecdotes (called “asides”), and search-related facts, statistics, and just plain trivia. (Unfortunately, many of these facts are quite out of date with respect to the publication date of the book, but they are interesting nonetheless.) This format was no doubt designed to make the book accessible to both a general and a highly technical audience. For the mathematicians, the authors have certainly succeeded. For those expecting to learn a great deal about “the science of search engine rankings,” this reviewer finds that the book comes up short.

The brief history of information retrieval is good but too abbreviated. (I, for one, did not know the story of how scrolls evolved into books.) Hypertext and Web information retrieval are introduced via the familiar stories of Tim Berners-Lee and Vannevar Bush’s Memex. It was perplexing that no mention was made of Ted Nelson and Xanadu. Nelson had definite ideas about how users might search his “docuverse,” and a number of these ideas have made it into current search technology.

Chapters 2 (Crawling, Indexing, and Query Processing) and 3 (Ranking Webpages by Popularity) should provide the science of search engine rankings promised in the book title. Instead, Chapter 2 provides a superficial description of crawling, supported by an example of crawling code written in MATLAB. The discussions of the content index and query processing proceed by example and are presented in a format that mathematicians are more likely to appreciate. In Chapter 3, which should be the heart of the book given its title, the authors take the opportunity to briefly discuss the history of, and the differences between, PageRank and the HITS (Hypertext Induced Topic Search) algorithm. HITS, used by the Teoma search engine, is the only algorithm besides PageRank highlighted in the book. The “true science of ranking” is found in the mathematical chapters. The best information in Chapter 3 is found in the “asides.”

Search engine optimization (SEO) is a topic of great interest to Website designers and Web content providers, yet it is touched upon only anecdotally. The book offers no clear-cut strategy for how Web-based content can be optimized and made “more attractive” to modern-day search engines.

The heart of this book is the set of chapters (4–12) on the mathematics of PageRank. These chapters are comprehensive in their detail and will surely be a delight for applied mathematicians. They are not for the faint-hearted, and can be skipped without affecting the flow of the book; fortunately, the remaining chapters rarely refer to any of the mathematical content. The mathematical concepts are organized and structured like a well-designed textbook. All that is missing are examples and exercises. Interesting “asides” are dispersed within the mathematical chapters, perhaps to give the reader some respite after a particularly complex concept is explained. MATLAB code is occasionally used to present a programming implementation of concepts.

Chapter 13 promises to give the reader a glimpse into “The Future of Web Information Retrieval.” Instead, it offers a collection of anecdotes, news stories, and issue statements. This chapter could have drawn a great deal from the vast research coming from the WEBIR and WWW communities, but it did not. There is no mention of the search technology in so-called Web 2.0 applications or of the Semantic Web and its related technologies such as RDF and OWL. Examples of these applications could greatly bolster future work in previously mentioned topics such as context-sensitive searching and Boolean searching. The Information Retrieval community has often charged the WEBIR community with “picking the low hanging fruit.” The authors could have addressed this charge.

Apart from the mathematical chapters, this book is very well written in a comfortable and, at times, very informal style. It is entertaining as well as informative. It has a good resource list (though likely somewhat dated) and an excellent bibliography and glossary. There is also a tutorial on the relevant mathematics for readers who need to refresh their knowledge of the mathematical principles discussed.

This book is highly recommended for readers and students who want to understand in depth the mathematical principles behind Web searching, especially Google’s PageRank. For readers interested in a comprehensive discussion of topics such as search engine principles and search engine optimization, there are better and more complete texts.

Author information

Authors and Affiliations

Stanford Linear Accelerator Center, 2575 Sand Hill Road, MailStop 97, Menlo Park, CA, 94305, USA
Bebo White

Corresponding author

Correspondence to Bebo White.

About this article

Cite this article

White, B. Amy Langville and Carl Meyer, Google’s Page Rank and Beyond: The Science of Search Engine Rankings. Inf Retrieval 11, 471–472 (2008). https://doi.org/10.1007/s10791-008-9063-y

Received: 05 April 2008
Accepted: 27 May 2008
Published: 06 June 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10791-008-9063-y
